# Publications By Year

**JOURNALS**

**JMLR = 7, TAC = 2, MLJ = 1, JAIR = 1, Automatica = 1, Foundations and Trends = 1, JAAMAS = 1**

**CONFERENCES**

**ICML = 24, NIPS = 17, IJCAI = 5, AISTATS = 8, AAMAS = 3, AAAI = 4, UAI = 3, ICLR = 2, ALT = 1, ACC = 1**

**2021**

### **Conferences**

- Yinlam Chow, Brandon Cui, Moonkyung Ryu, & Mohammad Ghavamzadeh. “Variational Model-based Policy Optimization”.
*Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence* (IJCAI-2021), 2021.
- Arash Mehrjou, Mohammad Ghavamzadeh, & Bernhard Schölkopf. “Neural Lyapunov Redesign”.
*Proceedings of the Third Annual Learning for Dynamics & Control Conference* (L4DC-2021), 2021.
- Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, & Heinrich Jiang. “Stochastic Bandits with Linear Constraints”.
*Proceedings of the Twenty-Fourth International Conference on Artificial Intelligence and Statistics* (AISTATS-2021), 2021.
- Brandon Cui, Yinlam Chow, & Mohammad Ghavamzadeh. “Control-aware Representations for Model-based Reinforcement Learning”.
*Proceedings of the Ninth International Conference on Learning Representations* (ICLR-2021), 2021.
- Ravi-Tej Akella, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Yisong Yue, & Anima Anandkumar. “Deep Bayesian Quadrature Policy Optimization”.
*Proceedings of the Thirty-Fifth Conference on Artificial Intelligence* (AAAI-2021), 2021.

**2020**

### **Conferences**

- Yonathan Efroni, Mohammad Ghavamzadeh, & Shie Mannor. “Online Planning with Lookahead Policies”.
*Proceedings of the Thirty-Fourth Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2020), 2020.
- Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, & Mohammad Ghavamzadeh. “Safe Policy Learning for Continuous Control”.
*Proceedings of the Fourth Conference on Robot Learning* (CoRL-2020), 2020.
- Shubhanshu Shekhar, Tara Javidi, & Mohammad Ghavamzadeh. “Adaptive Sampling for Estimating Probability Distributions”.
*Proceedings of the Thirty-Seventh International Conference on Machine Learning* (ICML-2020), 2020.
- Manan Tomar, Yonathan Efroni, & Mohammad Ghavamzadeh. “Multi-Step Greedy Reinforcement Learning Algorithms”.
*Proceedings of the Thirty-Seventh International Conference on Machine Learning* (ICML-2020), 2020.
- Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, & Hung Bui. “Predictive Coding for Locally-Linear Control”.
*Proceedings of the Thirty-Seventh International Conference on Machine Learning* (ICML-2020), 2020.
- Jean Tarbouriech, Shubhanshu Shekhar, Mohammad Ghavamzadeh, Matteo Pirotta, & Alessandro Lazaric. “Active Model Estimation in Markov Decision Processes”.
*Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence* (UAI-2020), 2020.
- Shubhanshu Shekhar, Mohammad Ghavamzadeh, & Tara Javidi. “Active Learning for Classification with Abstention”.
*Proceedings of IEEE International Symposium on Information Theory* (ISIT-2020), 2020 *(short-listed as one of the six finalists for the Jack Keil Wolf student paper award).*
- Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, & Matteo Pirotta. “Conservative Exploration in Reinforcement Learning”.
*Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics* (AISTATS-2020), 2020.
- Branislav Kveton, Manzil Zaheer, Csaba Szepesvári, Lihong Li, Mohammad Ghavamzadeh, & Craig Boutilier. “Randomized Exploration in Generalized Linear Bandits”.
*Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics* (AISTATS-2020), 2020.
- Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, & Hung Bui. “Prediction, Consistency, Curvature: Representation Learning for Locally Linear Control”.
*Proceedings of the Eighth International Conference on Learning Representations* (ICLR-2020), 2020.
- Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, & Matteo Pirotta. “Improved Algorithms for Conservative Exploration in Bandits”.
*Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence* (AAAI-2020), 2020.

### **Workshop**

- Manan Tomar, Lior Shani, Yonathan Efroni, & Mohammad Ghavamzadeh. “Mirror Descent Policy Optimization”. Workshop on
*“Deep Reinforcement Learning”, Thirty-Fourth Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2020), 2020 *(selected for a contributed talk - 8 out of over 250 submissions).*
- Ravi-Tej Akella, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Yisong Yue, & Anima Anandkumar. “Deep Bayesian Quadrature Policy Gradient”. Workshops on
*“Deep Reinforcement Learning” and “Challenges of Real-World Reinforcement Learning”, Thirty-Fourth Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2020), 2020.

**2019**

### **Conferences**

- Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, & Shie Mannor. “Tight Regret Bounds for Model-based Reinforcement Learning with Greedy Policies”.
*Accepted for Spotlight Presentation.* *Proceedings of the Thirty-Third Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2019), pp. 12224-12234, 2019.
- Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, & Craig Boutilier. “Perturbed-History Exploration in Stochastic Linear Bandits”.
*Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence* (UAI-2019), 2019.
- Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, & Craig Boutilier. “Perturbed-History Exploration in Stochastic Multi-Armed Bandits”.
*Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence* (IJCAI-2019), pp. 2786-2793, 2019.
- Branislav Kveton, Csaba Szepesvári, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, & Tor Lattimore. “Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits”.
*Proceedings of the Thirty-Sixth International Conference on Machine Learning* (ICML-2019), pp. 3601-3610, 2019.
- Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, & Nikos Vlassis. “Optimizing over a Restricted Policy Class in MDPs”.
*Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics* (AISTATS-2019), pp. 3042-3050, 2019.
- Jonathan Pierre Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, & Marco Pavone. “Risk-sensitive Generative Adversarial Imitation Learning”.
*Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics* (AISTATS-2019), pp. 2154-2163, 2019.

### **Workshop**

- Jorge Méndez, Alborz Geramifard, Mohammad Ghavamzadeh, & Bing Liu. “Reinforcement Learning of Multi-Domain Dialog Policies via Action Embeddings”. Workshop on
*“Conversational AI”, Thirty-Third Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2019), 2019.
- Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, & Matteo Pirotta. “Conservative Exploration in Finite Horizon Markov Decision Processes”. Workshop on
*“Safety and Robustness in Decision-making”, Thirty-Third Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2019), 2019.
- Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, & Matteo Pirotta. “Improved Algorithms for Conservative Exploration in Bandits”. Workshop on
*“Deep Reinforcement Learning”, Thirty-Third Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2019), 2019.
- Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, & Joelle Pineau. “Benchmarking Batch Deep Reinforcement Learning Algorithms”. Workshop on
*“Conversational AI”, Thirty-Third Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2019), 2019.
- Yinlam Chow, Ofir Nachum, Aleksandra Faust, & Mohammad Ghavamzadeh. “Lyapunov-based Safe Policy Optimization for Continuous Control”. Workshop on
*“Reinforcement Learning for Real Life”, Thirty-Sixth International Conference on Machine Learning* (ICML-2019), 2019 *(winner of the best paper award).*

**2018**

### **Journal**

- Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, & Marek Petrik. “Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity”.
*Journal of Artificial Intelligence Research* (JAIR), 63:461-494, 2018.
- Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, & Marco Pavone. “Risk-Constrained Reinforcement Learning with Percentile Risk Criteria”.
*Journal of Machine Learning Research* (JMLR), 18(167):1-51, 2018.

### **Conference**

- Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh, & Edgar Duenez-Guzman. “A Lyapunov-based Approach to Safe Reinforcement Learning”.
*Proceedings of the Thirty-Second Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2018), 2018.
- Tengyang Xie, Bo Liu, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, & Daesub Yoon. “A Block Coordinate Ascent Algorithm for Mean-Variance Optimization”.
*Proceedings of the Thirty-Second Annual Conference on Advances in Neural Information Processing Systems* (NeurIPS-2018), 2018.
- Ofir Nachum, Yinlam Chow, & Mohammad Ghavamzadeh. “Path Consistency Learning in Tsallis Entropy Regularized MDPs”.
*Proceedings of the Thirty-Fifth International Conference on Machine Learning* (ICML-2018), pp. 979-988, Stockholm, Sweden, July 2018.
- Mehrdad Farajtabar, Yinlam Chow, & Mohammad Ghavamzadeh. “More Robust Doubly Robust Off-policy Evaluation”.
*Proceedings of the Thirty-Fifth International Conference on Machine Learning* (ICML-2018), pp. 1447-1456, Stockholm, Sweden, July 2018.
- Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, & Ali Ghodsi. “Robust Locally-Linear Controllable Embedding”.
*Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics* (AISTATS-2018), pp. 1751-1759, 2018.
- Yahel David, Balázs Szörényi, Mohammad Ghavamzadeh, Shie Mannor, & Nahum Shimkin. “PAC Bandits with Risk Constraints”.
*International Symposium on Artificial Intelligence and Mathematics* (ISAIM-2018), Special Session on Theory of Machine Learning, 2018.

### **Workshop**

- Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, & Marco Pavone. “Risk-Sensitive Generative Adversarial Imitation Learning”. Workshop on
*“Safety, Risk, and Uncertainty in Reinforcement Learning”, Thirty-Fourth Conference on Uncertainty in Artificial Intelligence*(UAI-2018), 2018.

**2017**

### **Journal**

- Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, & Shie Mannor. “Sequential Decision-making with Coherent Risk”.
*IEEE Transactions on Automatic Control* (TAC), 62(7):3323-3338, 2017 (DOI: 10.1109/TAC.2016.2644871).

### **Conference**

- Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, & Ben Van Roy. “Conservative Contextual Linear Bandits”.
*Proceedings of the Thirty-First Annual Conference on Neural Information Processing Systems* (NIPS-2017), pp. 3913-3922, 2017.
- Carlos Riquelme, Mohammad Ghavamzadeh, & Alessandro Lazaric. “Active Learning for Accurate Estimation of Linear Models”.
*Proceedings of the Thirty-Fourth International Conference on Machine Learning* (ICML-2017), pp. 2931-2939, Sydney, Australia, August 2017.
- Rui Shu, Hung Bui, & Mohammad Ghavamzadeh. “Bottleneck Conditional Density Estimation”.
*Proceedings of the Thirty-Fourth International Conference on Machine Learning* (ICML-2017), pp. 3164-3172, Sydney, Australia, August 2017.
- Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, & Mark Schmidt. “Model-Independent Online Learning for Influence Maximization”.
*Proceedings of the Thirty-Fourth International Conference on Machine Learning* (ICML-2017), pp. 3530-3539, Sydney, Australia, August 2017.
- Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvári, & Zheng Wen. “Online Learning to Rank in Stochastic Click Models”.
*Proceedings of the Thirty-Fourth International Conference on Machine Learning* (ICML-2017), pp. 4199-4208, Sydney, Australia, August 2017.
- Alan Malek, Yinlam Chow, Sumeet Katariya, & Mohammad Ghavamzadeh. “Sequential Multiple Hypothesis Testing with Type I Error Control”.
*Proceedings of the Twentieth International Conference on Artificial Intelligence and Statistics* (AISTATS-2017), pp. 1468-1476, 2017.
- Philip Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, & Emma Brunskill. “Predictive Off-Policy Evaluation for Nonstationary Decision Problems”.
*Proceedings of the Twenty-Ninth Conference on Innovative Applications of Artificial Intelligence* (IAAI-2017), pp. 4740-4745, 2017.
- Ian Gemp, Georgios Theocharous, & Mohammad Ghavamzadeh. “Automated Data Cleansing through Meta-Learning”.
*Proceedings of the Twenty-Ninth Conference on Innovative Applications of Artificial Intelligence* (IAAI-2017), pp. 4760-4761, 2017.

### **Workshop**

- Ershad Banijamali, Ahmad Khajenezhad, Ali Ghodsi, & Mohammad Ghavamzadeh. “Disentangling Dynamics and Content for Control and Planning”. Workshop on
*“Learning Disentangled Representations: from Perception to Control”, Thirty-First Annual Conference on Neural Information Processing Systems* (NIPS-2017), 2017.
- Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, & Hung Bui. “Robust Controllable Embedding of High-Dimensional Observations of Markov Decision Processes”. Workshop on
*“Implicit Models”, Thirty-Fourth International Conference on Machine Learning* (ICML-2017), Sydney, Australia, August 2017.

**2016**

### **Journal**

- Amir massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, & Shie Mannor. “Regularized Policy Iteration for Non-Parametric Function Spaces”.
*Journal of Machine Learning Research* (JMLR), 17(139):1-66, 2016.
- Prashanth L. A. and Mohammad Ghavamzadeh. “Variance-constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs”.
*Machine Learning Journal* (MLJ), 105(3):367-417, 2016 (DOI: 10.1007/s10994-016-5569-5).
- Mohammad Ghavamzadeh, Yaakov Engel, & Michal Valko. “Bayesian Policy Gradient and Actor-Critic Algorithms”.
*Journal of Machine Learning Research* (JMLR), 17(66):1-53, 2016. **CODE IS AVAILABLE AT** 1 2
- Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Analysis of Classification-based Policy Iteration Algorithms”.
*Journal of Machine Learning Research* (JMLR), 17(19):1-30, 2016.

### **Conference**

- Marek Petrik, Mohammad Ghavamzadeh, & Yinlam Chow. “Safe Policy Improvement by Minimizing Robust Baseline Regret”.
*Proceedings of the Thirtieth Annual Conference on Neural Information Processing Systems* (NIPS-2016), pp. 2298-2306, 2016.
- Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, & Siqi Sun. “Graphical Model Sketch”.
*Proceedings of the European Conference on Machine Learning* (ECML-2016), Riva del Garda, Italy, 2016.
- Bo Liu, Mohammad Ghavamzadeh, Ian Gemp, Ji Liu, & Sridhar Mahadevan. “Proximal Gradient Temporal Difference Learning Algorithms”.
*Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence* (IJCAI-2016), pp. 4195-4199, New York City, NY, July 2016.
- Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Ronald Ortner, & Peter Bartlett. “Improved Learning Complexity in Combinatorial Pure Exploration Bandits”.
*Proceedings of the Nineteenth International Conference on Artificial Intelligence and Statistics* (AISTATS-2016), pp. 1004-1012, Cadiz, Spain, May 2016.

### **Workshop**

- Abbas Kazerouni, Mohammad Ghavamzadeh, & Ben Van Roy. “Safety in Contextual Linear Bandits”. Workshop on
*“Reliable Machine Learning in the Wild”, Thirtieth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2016), Barcelona, Spain, December 2016.
- Rui Shu, Hung Bui, & Mohammad Ghavamzadeh. “Bottleneck Conditional Density Estimators”. Workshop on
*“Bayesian Deep Learning”, Thirtieth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2016), Barcelona, Spain, December 2016.
- Rui Shu, James Brofos, Frank Zhang, Hung Bui, Mohammad Ghavamzadeh, & Mykel Kochenderfer. “Stochastic Video Prediction with Conditional Density Estimation”. Workshop on
*“Action and Anticipation for Visual Learning”, Fourteenth European Conference on Computer Vision* (ECCV-2016), Amsterdam, The Netherlands, October 2016.
- Marek Petrik, Yinlam Chow, & Mohammad Ghavamzadeh. “Optimally Robust Policy Improvement with Baseline Guarantees”. Workshop on
*“Reliable Machine Learning in the Wild”, Thirty-Third International Conference on Machine Learning* (ICML-2016), New York City, NY, June 2016.

**2015**

### **Journal**

- Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, & Aviv Tamar. “Bayesian Reinforcement Learning: A Survey”.
*Foundations and Trends in Machine Learning*, 8(5-6):359-483, 2015 (DOI: 10.1561/2200000049).
- Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner, & Matthieu Geist. “Approximate Modified Policy Iteration and its Application to the Game of Tetris”.
*Journal of Machine Learning Research* (JMLR), 16:1629-1676, 2015.
- Amir massoud Farahmand, Doina Precup, André Barreto, & Mohammad Ghavamzadeh. “Classification-based Approximate Policy Iteration”.
*IEEE Transactions on Automatic Control* (TAC), 60(11):2989-2993, 2015.

### **Conference**

- Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, & Shie Mannor. “Policy Gradient for Coherent Risk Measures”.
*Proceedings of the Twenty-Ninth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2015), pp. 1468-1476, 2015.
- Bo Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, & Marek Petrik. “Finite-Sample Analysis of Proximal Gradient TD Algorithms”.
*Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence* (UAI-2015), pp. 504-513, Amsterdam, Netherlands, July 2015 *(winner of the Facebook best student paper award).*
- Philip Thomas, Georgios Theocharous, & Mohammad Ghavamzadeh. “High Confidence Policy Improvement”.
*Proceedings of the Thirty-Second International Conference on Machine Learning* (ICML-2015), pp. 2380-2388, Lille, France, July 2015.
- Julien Audiffren, Michal Valko, Alessandro Lazaric, & Mohammad Ghavamzadeh. “Maximum Entropy Semi-Supervised Inverse Reinforcement Learning”.
*Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence* (IJCAI-2015), pp. 3315-3321, Buenos Aires, Argentina, July 2015.
- Georgios Theocharous, Philip Thomas, & Mohammad Ghavamzadeh. “Building Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees”.
*Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence* (IJCAI-2015), pp. 1806-1812, Buenos Aires, Argentina, July 2015.
- Philip Thomas, Georgios Theocharous, & Mohammad Ghavamzadeh. “High Confidence Off-Policy Evaluation”.
*Proceedings of the Twenty-Ninth Conference on Artificial Intelligence* (AAAI-2015), pp. 3000-3006, Austin, TX, January 2015.

### **Workshop**

- Sougata Chaudhuri, Georgios Theocharous, & Mohammad Ghavamzadeh. “A Ranking Approach to Address the Click Sparsity Problem in Personalized Ad Recommendation”. Workshop on
*“Machine Learning for eCommerce”, Twenty-Ninth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2015), Montreal, Canada, December 2015.
- Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, & Shie Mannor. “Policy Gradient for Coherent Risk Measures”.
*Twelfth European Workshop on Reinforcement Learning* (EWRL-12) at *the Thirty-Second International Conference on Machine Learning* (ICML), Lille, France, July 2015.
- Georgios Theocharous, Philip Thomas, & Mohammad Ghavamzadeh. “Ad Recommendation Systems for Life-Time Value Optimization”. Workshop on
*“Ad Targeting at Scale”, Twenty-Fourth International World Wide Web Conference* (WWW-2015), Florence, Italy, May 2015.

### **Tech Report**

- Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, and Shie Mannor. “Policy Gradient for Coherent Risk Measures”. arXiv:1502.03919, 2015.

**2014**

### **Conference**

- Yinlam Chow and Mohammad Ghavamzadeh. “Algorithms for CVaR Optimization in MDPs”.
*Proceedings of the Twenty-Eighth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2014), pp. 3509-3517, 2014.

### **Workshop**

- Yinlam Chow and Mohammad Ghavamzadeh. “Constrained Stochastic Optimal Control with a Baseline Performance Guarantee”. Workshop on
*“From Bad Models to Good Policies”, Twenty-Eighth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2014), Montreal, Canada, December 2014.
- Julien Audiffren, Michal Valko, Alessandro Lazaric, & Mohammad Ghavamzadeh. “Maximum Entropy Semi-Supervised Inverse Reinforcement Learning”. Workshop on
*“Novel Trends and Applications in Reinforcement Learning”, Twenty-Eighth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2014), Montreal, Canada, December 2014.
- Philip Thomas, Georgios Theocharous, & Mohammad Ghavamzadeh. “Safe Policy Search”. Workshop on
*“Customer Life-Time Value Optimization in Digital Marketing”, Thirty-First International Conference on Machine Learning* (ICML-2014), Beijing, China, June 2014.

### **Tech Report**

Yinlam Chow and Mohammad Ghavamzadeh. “Constrained Stochastic Optimal Control with a Baseline Performance Guarantee”. arXiv:1410.2726, 2014.

Yinlam Chow and Mohammad Ghavamzadeh. “Algorithms for CVaR Optimization in MDPs”. arXiv:1406.3339, 2014.

### **Habilitation Thesis**

- Mohammad Ghavamzadeh. “Sample Complexity in Sequential Decision-Making”. Department of Mathematics, Université Lille 1 - Sciences et Technologies, France, June 2014.

**2013**

### **Conference**

- Prashanth L. A. and Mohammad Ghavamzadeh. “Actor-Critic Algorithms for Risk-Sensitive MDPs”.
*Accepted for Oral Presentation (1.4% acceptance - 20 out of 1420 submissions).* *Proceedings of the Twenty-Seventh Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2013), pp. 252-260, 2013.
- Victor Gabillon, Mohammad Ghavamzadeh, & Bruno Scherrer. “Approximate Dynamic Programming Finally Performs Well in the Game of Tetris”.
*Proceedings of the Twenty-Seventh Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2013), pp. 1754-1762, 2013.
- Bernardo Ávila Pires, Mohammad Ghavamzadeh, & Csaba Szepesvári. “Cost-sensitive Multiclass Classification Risk Bounds”.
*Proceedings of the Thirtieth International Conference on Machine Learning* (ICML-2013), 28(3):1391-1399, Atlanta, GA, 2013.
- Hachem Kadri, Mohammad Ghavamzadeh, & Philippe Preux. “A Generalized Kernel Approach to Structured Output Learning”.
*Proceedings of the Thirtieth International Conference on Machine Learning* (ICML-2013), 28(1):471-479, Atlanta, GA, 2013.
- Amir massoud Farahmand, Doina Precup, André Barreto, & Mohammad Ghavamzadeh. “CAPI: Generalized Classification-based Approximate Policy Iteration”.
*The First Multidisciplinary Conference on Reinforcement Learning and Decision Making* (RLDM-2013), Princeton, NJ, 2013.

### **Tech Report**

- Prashanth L. A. and Mohammad Ghavamzadeh. “Actor-Critic Algorithms for Risk-Sensitive MDPs”. Technical Report inria-00794721, INRIA, 2013.

**2012**

### **Journal**

- Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Finite-Sample Analysis of Least-Squares Policy Iteration”.
*Journal of Machine Learning Research* (JMLR), 13:3041-3074, 2012.

### **Conference**

- Victor Gabillon, Mohammad Ghavamzadeh, & Alessandro Lazaric. “A Unified Approach to Fixed Budget and Fixed Confidence”.
*Proceedings of the Twenty-Sixth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2012), pp. 3221-3229, 2012.
- Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, & Matthieu Geist. “Approximate Modified Policy Iteration”.
*Proceedings of the Twenty-Ninth International Conference on Machine Learning* (ICML-2012), pp. 1207-1214, Edinburgh, Scotland, 2012.
- Matthieu Geist, Bruno Scherrer, Alessandro Lazaric, & Mohammad Ghavamzadeh. “A Dantzig Selector Approach to Temporal Difference Learning”.
*Proceedings of the Twenty-Ninth International Conference on Machine Learning* (ICML-2012), pp. 1399-1406, Edinburgh, Scotland, 2012.
- Mohammad Ghavamzadeh & Alessandro Lazaric. “Conservative and Greedy Approaches to Classification-based Policy Iteration”.
*Proceedings of the Twenty-Sixth Conference on Artificial Intelligence* (AAAI-2012), pp. 914-920, Toronto, ON, Canada, 2012.

### **Book Chapter**

- Nikos Vlassis, Mohammad Ghavamzadeh, Shie Mannor, & Pascal Poupart. “Bayesian Reinforcement Learning”.
*Reinforcement Learning: State of the Art*, edited by Marco Wiering and Martijn van Otterlo, Springer Verlag, 2012.
- Lucian Busoniu, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Robert Babuska, & Bert De Schutter. “Least-Squares Methods for Policy Iteration”.
*Reinforcement Learning: State of the Art*, edited by Marco Wiering and Martijn van Otterlo, Springer Verlag, 2012.

### **Workshop**

- Michal Valko, Mohammad Ghavamzadeh, & Alessandro Lazaric. “Semi-Supervised Inverse Reinforcement Learning”.
*Tenth European Workshop on Reinforcement Learning* (EWRL-2012), Edinburgh, Scotland, 2012.

### **Tech Report**

Hachem Kadri, Mohammad Ghavamzadeh, & Philippe Preux. “A Generalized Kernel Approach to Structured Output Learning”. Technical Report inria-00695631, INRIA, 2012.

Victor Gabillon, Mohammad Ghavamzadeh, & Alessandro Lazaric. “Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence”. Technical Report inria-00747005, INRIA, 2012.

Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, & Matthieu Geist. “Approximate Modified Policy Iteration”. Technical Report inria-00697169, INRIA, 2012.

**2011**

### **Conference**

- Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, & Sébastien Bubeck. “Multi-Bandit Best Arm Identification”.
*Proceedings of the Twenty-Fifth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2011), pp. 2222-2230, 2011.
- Mohammad Azar, Rémi Munos, Mohammad Ghavamzadeh, & Hilbert Kappen. “Speedy Q-Learning”.
*Proceedings of the Twenty-Fifth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2011), pp. 2411-2419, 2011.
- Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, & Peter Auer. “Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits”.
*Selected for a special issue of the Journal of Theoretical Computer Science.* *Proceedings of the Twenty-Second International Conference on Algorithmic Learning Theory* (ALT-2011), pp. 189-203, Espoo, Finland, October 2011.
- Mohammad Ghavamzadeh, Alessandro Lazaric, Rémi Munos, & Matthew Hoffman. “Finite-Sample Analysis of Lasso-TD”.
*Proceedings of the Twenty-Eighth International Conference on Machine Learning* (ICML-2011), pp. 1177-1184, Bellevue, WA, June 2011.
- Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, & Bruno Scherrer. “Classification-based Policy Iteration with a Critic”.
*Proceedings of the Twenty-Eighth International Conference on Machine Learning* (ICML-2011), pp. 1049-1056, Bellevue, WA, June 2011.

### **Workshop**

- Matthew Hoffman, Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Regularized Least Squares Temporal Difference Learning with Nested L2 and L1 Penalization”.
*Ninth European Workshop on Reinforcement Learning*(EWRL-2011), Athens, Greece, September 2011.

### **Tech Report**

Mohammad Azar, Rémi Munos, Mohammad Ghavamzadeh, & Hilbert Kappen. “Reinforcement Learning with a Near Optimal Rate of Convergence”. Technical Report inria-00636615, INRIA, 2011.

Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, & Sébastien Bubeck. “Multi-Bandit Best Arm Identification”. Technical Report inria-00632523, INRIA, 2011.

Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, & Peter Auer. “Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits”. Technical Report inria-00594131, INRIA, 2011.

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, & Bruno Scherrer. “Classification-based Policy Iteration with a Critic”. Technical Report inria-00590972, INRIA, 2011.

**2010**

### **Conference**

- Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, & Rémi Munos. “LSTD with Random Projections”.
*Accepted for Spotlight Presentation (6% acceptance - 73 out of 1219 submissions).* *Proceedings of the Twenty-Fourth Annual Conference on Advances in Neural Information Processing Systems* (NIPS-2010), pp. 721-729, 2010.
- Odalric Maillard, Rémi Munos, Alessandro Lazaric, & Mohammad Ghavamzadeh. “Finite-Sample Analysis of Bellman Residual Minimization”.
*Proceedings of the Second Asian Conference on Machine Learning* (ACML-2010), pp. 299-314, Tokyo, Japan, November 2010.
- Alessandro Lazaric & Mohammad Ghavamzadeh. “Bayesian Multi-Task Reinforcement Learning”.
*Proceedings of the Twenty-Seventh International Conference on Machine Learning* (ICML-2010), pp. 599-606, Haifa, Israel, June 2010.
- Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Analysis of a Classification-based Policy Iteration Algorithm”.
*Proceedings of the Twenty-Seventh International Conference on Machine Learning* (ICML-2010), pp. 607-614, Haifa, Israel, June 2010.
- Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Finite-Sample Analysis of LSTD”.
*Proceedings of the Twenty-Seventh International Conference on Machine Learning* (ICML-2010), pp. 615-622, Haifa, Israel, June 2010.

### **Workshop**

- Victor Gabillon, Alessandro Lazaric, & Mohammad Ghavamzadeh. “Rollout Allocation Strategies for Classification-based Policy Iteration”. Workshop on
*“Reinforcement Learning and Search in Very Large Spaces”, Twenty-Seventh International Conference on Machine Learning*(ICML-2010), Haifa, Israel, June 2010.

### **Tech Report**

Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, & Rémi Munos. “LSPI with Random Projections,” Technical Report inria-00530762, INRIA, 2010.

Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Finite-Sample Analysis of Least-Squares Policy Iteration,” Technical Report inria-00528596, INRIA, 2010.

Alessandro Lazaric & Mohammad Ghavamzadeh. “Bayesian Multi-Task Reinforcement Learning,” Technical Report inria-00475214, INRIA, 2010.

Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Analysis of a Classification-based Policy Iteration Algorithm,” Technical Report inria-00482065, INRIA, 2010.

Alessandro Lazaric, Mohammad Ghavamzadeh, & Rémi Munos. “Finite-Sample Analysis of LSTD,” Technical Report inria-00482189, INRIA, 2010.

**2009**

### **Journal**

- Shalabh Bhatnagar, Richard Sutton, Mohammad Ghavamzadeh, & Mark Lee. “Natural Actor-Critic Algorithms”.
*Automatica*, 45(11):2471-2482, 2009 (DOI: 10.1016/j.automatica.2009.07.008). (The longer version is available as a UAlberta Tech-Report.)

### **Conference**

- Amir massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, & Shie Mannor. “Regularized Fitted Q-iteration for Planning in Continuous-Space Markovian Decision Problems”.
*Proceedings of the 2009 American Control Conference* (ACC-2009), pp. 725-730, St. Louis, MO, June 2009.

### **Workshop**

- Mohammad Ghavamzadeh. “Hierarchical Hybrid Reinforcement Learning Algorithms”. Workshop on
*“Bridging the Gap between High-level Discrete Representations and Low-level Continuous Behaviors”, Robotics: Science and Systems Conference* (RSS-2009), Seattle, WA, June 2009.
- Amir massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, & Shie Mannor. “Robot Learning with Regularized Reinforcement Learning”. Workshop on
*“Regression in Robotics - Approaches and Applications”, Robotics: Science and Systems Conference* (RSS-2009), Seattle, WA, June 2009.
- Mohammad Ghavamzadeh & Yaakov Engel. “Bayesian Actor Critic: A Bayesian Model for Value Function Approximation and Policy Learning”. Workshop on
*“Regression in Robotics - Approaches and Applications”, Robotics: Science and Systems Conference* (RSS-2009), Seattle, WA, June 2009.
- Amir massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, & Shie Mannor. “Regularization in Reinforcement Learning”.
*Multidisciplinary Symposium on Reinforcement Learning* (MSRL-2009), Montreal, QC, Canada, June 2009.

### **Tech Report**

- Shalabh Bhatnagar, Richard Sutton, Mohammad Ghavamzadeh, & Mark Lee. “Natural Actor-Critic Algorithms,” Technical Report TR09-10, Department of Computing Science, University of Alberta, 2009.

**2008**

### **Conference**

- Amir massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, & Shie Mannor. “Regularized Policy Iteration”.
*Proceedings of the Twenty-Second Annual Conference on Advances in Neural Information Processing Systems*(NIPS-2008), pp. 441-448, 2008. pdf

### **Workshop**

- Amir massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, & Shie Mannor. “Regularized Fitted Q-iteration: Application to Bounded Resource Planning”.
*Proceedings of the Eighth European Workshop on Reinforcement Learning*(EWRL-2008), volume 5323 of Lecture Notes in Artificial Intelligence, pp. 55-68, Villeneuve d’Ascq, France, July 2008. pdf
- Amir massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, & Shie Mannor. “Regularized Policy Iteration”.
*Eighth European Workshop on Reinforcement Learning*(EWRL-2008), Villeneuve d’Ascq, France, July 2008.

**2007**

### **Journal**

- Mohammad Ghavamzadeh & Sridhar Mahadevan. “Hierarchical Average Reward Reinforcement Learning”.
*Journal of Machine Learning Research*(JMLR), 8:2629-2669, 2007. pdf

### **Conference**

- Shalabh Bhatnagar, Richard Sutton, Mohammad Ghavamzadeh, & Mark Lee. “Incremental Natural Actor-Critic Algorithms”.
*Accepted for Spotlight Presentation (10% acceptance - 101 out of 975 submissions).* *Proceedings of the Twenty-First Annual Conference on Advances in Neural Information Processing Systems*(NIPS-2007), pp. 105-112, 2007. pdf
- Mohammad Ghavamzadeh & Yaakov Engel. “Bayesian Actor-Critic Algorithms”.
*Proceedings of the Twenty-Fourth International Conference on Machine Learning*(ICML-2007), pp. 297-304, Oregon State University, Corvallis, OR, June 2007. pdf

**2006**

### **Journal**

- Mohammad Ghavamzadeh, Sridhar Mahadevan, & Rajbala Makar. “Hierarchical Multiagent Reinforcement Learning”.
*Journal of Autonomous Agents and Multi-Agent Systems*(JAAMAS), 13(2):197-229, 2006 (DOI: 10.1007/s10458-006-7035-4). pdf

### **Conference**

- Mohammad Ghavamzadeh & Yaakov Engel. “Bayesian Policy Gradient Algorithms”.
*Accepted for Spotlight Presentation (7.5% acceptance - 63 out of 833 submissions).* *Proceedings of the Twentieth Annual Conference on Advances in Neural Information Processing Systems*(NIPS-2006), pp. 457-464, 2006. pdf

### **Workshop**

- Mohammad Ghavamzadeh & Yaakov Engel. “Bayesian Policy Gradient”. Workshop on
*“Kernel Machines and Reinforcement Learning” (KRL), Twenty-Third International Conference on Machine Learning*(ICML-2006), Pittsburgh, PA, June 2006. pdf
- Mohammad Ghavamzadeh & Sridhar Mahadevan. “Learning to Cooperate using Hierarchical Reinforcement Learning”. Workshop on
*“Hierarchical Autonomous Agents and Multi-Agent Systems” (H-AAMAS), Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems*(AAMAS-2006), Hakodate, Japan, May 2006. pdf

**2005**

### **PhD Thesis**

- Mohammad Ghavamzadeh. “Hierarchical Reinforcement Learning in Continuous State and Multi-Agent Environments”. Department of Computer Science, University of Massachusetts Amherst, May 2005.

**2004**

### **Conference**

- Mohammad Ghavamzadeh & Sridhar Mahadevan. “Learning to Communicate and Act using Hierarchical Reinforcement Learning”.
*Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems*(AAMAS-2004), pp. 1114-1121, New York City, NY, July 2004. pdf

### **Book Chapter**

- Sridhar Mahadevan, Mohammad Ghavamzadeh, Khashayar Rohanimanesh, & Georgios Theocharous. “Hierarchical Approaches to Concurrency, Multiagency, and Partial Observability”.
*Learning and Approximate Dynamic Programming: Scaling up to the Real World*, edited by Jennie Si, Andrew Barto, Warren Powell, and Donald Wunsch, John Wiley & Sons, New York, pp. 285-310, 2004. pdf

### **Tech Report**

- Mohammad Ghavamzadeh & Sridhar Mahadevan. “Hierarchical Multiagent Reinforcement Learning”. Technical Report UM-CS-2004-02. Department of Computer Science, University of Massachusetts Amherst, 2004.

**2003**

### **Conference**

- Mohammad Ghavamzadeh & Sridhar Mahadevan. “Hierarchical Policy Gradient Algorithms”.
*Proceedings of the Twentieth International Conference on Machine Learning*(ICML-2003), pp. 226-233, Washington, D.C., August 2003. pdf

### **Tech Report**

Mohammad Ghavamzadeh, Sridhar Mahadevan, & Rajbala Makar. “Extending Hierarchical Reinforcement Learning to Continuous-Time, Average-Reward, and Multi-Agent Models”. Technical Report UM-CS-2003-23, Department of Computer Science, University of Massachusetts Amherst, 2003.

Mohammad Ghavamzadeh & Sridhar Mahadevan. “Hierarchical Average Reward Reinforcement Learning”. Technical Report UM-CS-2003-19, Department of Computer Science, University of Massachusetts Amherst, 2003.

**2002**

### **Conference**

- Mohammad Ghavamzadeh & Sridhar Mahadevan. “Hierarchically Optimal Average Reward Reinforcement Learning”.
*Proceedings of the Nineteenth International Conference on Machine Learning*(ICML-2002), pp. 195-202, Sydney, Australia, July 2002. pdf
- Mohammad Ghavamzadeh & Sridhar Mahadevan. “A Multiagent Reinforcement Learning Algorithm by Dynamically Merging Markov Decision Processes”.
*Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems*(AAMAS-2002), pp. 845-846, Bologna, Italy, July 2002. pdf

**2001**

### **Journal**

- Ali M. Eydgahi & Mohammad Ghavamzadeh. “Complementary Root Locus Revisited”.
*IEEE Transactions on Education*, 44(2):137-143, 2001. pdf

### **Conference**

- Mohammad Ghavamzadeh & Sridhar Mahadevan. “Continuous-Time Hierarchical Reinforcement Learning”.
*Proceedings of the Eighteenth International Conference on Machine Learning*(ICML-2001), pp. 186-193, Williams College, MA, July 2001. pdf
- Rajbala Makar, Sridhar Mahadevan, & Mohammad Ghavamzadeh. “Hierarchical Multi-Agent Reinforcement Learning”.
*Proceedings of the Fifth International Conference on Autonomous Agents*(Agents-2001), pp. 246-253, Montreal, Canada, June 2001 *(winner of the best student paper award).*

**Before 2001**

### **Conference**

- Mohammad Ghavamzadeh, Caro Lucas, & Shahin Shayan Arani. “Forecasting the International Oil and Gold Prices Using Artificial Neural Networks”.
*Proceedings of the Conference on Computer Science and Information Technologies*(CSIT-1997), Yerevan, Armenia, September 1997.
- Ali M. Eydgahi & Mohammad Ghavamzadeh. (in Farsi) “Properties of Branches Passing through Infinity in Root Locus Method”.
*Journal of Faculty of Engineering University of Tehran*, pp. 1-10, December 1996.
- Ali M. Eydgahi & Mohammad Ghavamzadeh. (in Farsi) “Branches Passing through Infinity in Root Locus Method”.
*Journal of Faculty of Engineering University of Tehran*, pp. 9-15, June 1996.
- Mohammad Ghavamzadeh & Ali M. Eydgahi. (in Farsi) “An Adaptive Fuzzy Controller for Flexible Joint Robots”.
*Proceedings of the International Conference on Intelligent & Cognitive Systems*, pp. 88-92, Tehran, Iran, September 1996.
- Mohammad Ghavamzadeh, Khashayar Rohanimanesh, Ali M. Eydgahi, & Bahram Poorali. “Design of an ISDN Terminal”.
*Proceedings of the Twenty-First IEEE International Conference on Industrial Electronics, Control, Instrumentation and Automation*(IECON-1995), pp. 1598-1601, Orlando, FL, November 1995.
- Mohammad Ghavamzadeh & Ali M. Eydgahi. (in Farsi) “A New Approach to Root Locus for Positive Feedback Systems”.
*Proceedings of the Third Iranian Conference on Electrical Engineering*, pp. 23-30, Tehran, Iran, May 1995.
- Mohammad Ghavamzadeh, Khashayar Rohanimanesh, Ali M. Eydgahi, & Bahram Poorali. (in Farsi) “Design of an ISDN Telephone Terminal with 8751H Intel Micro-Controller”.
*Proceedings of the Third Iranian Conference on Electrical Engineering*, Tehran, Iran, May 1995.