PROFESSIONAL EXPERIENCE
Senior Principal Scientist
Amazon AGI (2023 - present)
Senior Staff Research Scientist
Google Research (2020 - 2023)
Senior Research Scientist
Facebook AI Research (2018 - 2020)
Staff Research Scientist
DeepMind (2017 - 2018)
Senior Analytics Researcher
Adobe Research (2013 - 2017)
Chargé de recherche (Research Scientist)
INRIA Lille - Team SequeL (2008 - 2013)
EDUCATION
Habilitation à Diriger des Recherches (HDR)
Université Lille 1, France (June 2014)
Postdoctoral Fellow
University of Alberta, Canada (2005 - 2008)
Ph.D. in Computer Science
University of Massachusetts Amherst, USA (2001 - 2005)
RESEARCH INTERESTS
Machine Learning, Artificial Intelligence, Reinforcement Learning, Online Learning, Recommendation Systems, Control
EMAIL
ghavamza at amazon dot com
mohammad dot ghavamzadeh51 at gmail dot com
RECENT NEWS
2024
I will serve as a senior area chair at ICML-2025 and as an area chair at ICLR-2025.
A journal paper on “Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors” published at ACM Transactions on Recommender Systems.
Six conference papers published: “Bayesian Regret Minimization in Offline Bandits” at ICML-2024, “Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models” and “Maximum Entropy Model Correction in Reinforcement Learning” at ICLR-2024, “Non-adaptive Online Finetuning for Offline Reinforcement Learning” and “Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms” at RLC-2024, and “Factual and Tailored Recommendation Endorsements using Language Models and Reinforcement Learning” at COLM-2024.
I co-chaired a workshop on “Deployable RL: From Research to Practice” at RLC-2024.
I served as a senior area chair at ICML-2024 and AAAI-2024, and as an area chair at ICLR-2024 and NeurIPS-2024.
2023
Our paper on “Ordering-based Conditions for Global Convergence of Policy Gradient Methods” was accepted for an oral presentation at NeurIPS-2023.
Ten conference papers published: “Ordering-based Conditions for Global Convergence of Policy Gradient Methods”, “On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes”, “Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management”, and “Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models” at NeurIPS-2023, “Multi-Task Off-Policy Learning from Bandit Feedback” at ICML-2023, “A Mixture-of-Expert Approach to RL-based Dialogue Management” at ICLR-2023, “Entropic Risk Optimization in Discounted MDPs” and “Multiple-policy High-confidence Policy Evaluation” at AISTATS-2023, “Distributionally Robust Behavioral Cloning for Robust Imitation Learning” at CDC-2023, and “Meta-Learning for Simple Regret Minimization” at AAAI-2023.
I co-chaired a workshop on “The Many Facets of Preference-Based Learning” at ICML-2023.
I served as a senior area chair at ICML-2023 and AAAI-2023, and as an area chair at NeurIPS-2023 and ICLR-2023.
2022
Our paper “A Review on Uncertainty Quantification in Deep Learning: Techniques, Applications, and Challenges”, published at Elsevier Journal on Information Fusion in 2021, won the journal’s 2022 best survey award.
Our paper on “Fixed-Budget Best-Arm Identification in Structured Bandits” was accepted for a long oral presentation (about 4% acceptance rate) at IJCAI-2022.
Twelve conference papers published: “Robust Reinforcement Learning using Offline Data”, “Operator Splitting Value Iteration”, “Efficient Risk-Averse Reinforcement Learning”, and “Private and Communication-Efficient Algorithms for Entropy Estimation” at NeurIPS-2022, “Feature and Parameter Selection in Stochastic Linear Bandits” and “Deep Hierarchy in Bandits” at ICML-2022, “Fixed-Budget Best-Arm Identification in Structured Bandits” at IJCAI-2022, “Mirror Descent Policy Optimization” at ICLR-2022, “Thompson Sampling with a Mixture Prior” and “Hierarchical Bayesian Bandits” at AISTATS-2022, “Multi-Environment Meta-Learning in Stochastic Linear Bandits” at IEEE International Symposium on Information Theory (ISIT-2022), and “Collaborative Multi-agent Stochastic Linear Bandits” at ACC-2022.
I was selected as a highlighted area chair at ICLR-2022.
I served as a senior area chair for NeurIPS-2022 and an area chair for ICLR-2022 and ICML-2022.
I served as a guest editor for Machine Learning Journal (MLJ), Special Issue on Safe and Fair Machine Learning.
2021
Two journal papers published: “Active Learning for Classification with Abstention” at IEEE Journal on Selected Areas in Information Theory (JSAIT), and “A Review on Uncertainty Quantification in Deep Learning: Techniques, Applications, and Challenges” at Elsevier Journal on Information Fusion.
Seven conference papers published: “Adaptive Sampling for Minimax Fair Classification” at NeurIPS-2021, “PID Accelerated Value Iteration Algorithm” at ICML-2021, “Variational Model-based Policy Optimization” at IJCAI-2021, “Neural Lyapunov Redesign” at the Learning for Dynamics & Control Conference (L4DC-2021), “Stochastic Bandits with Linear Constraints” at AISTATS-2021, “Control-aware Representations for Model-based Reinforcement Learning” at ICLR-2021, and “Deep Bayesian Quadrature Policy Optimization” at AAAI-2021.
I served as a senior area chair for NeurIPS-2021, and as an area chair for ICML-2021 and ICLR-2021.
2020
Our paper on “Active Learning for Classification with Abstention” was short-listed as one of the six finalists for the Jack Keil Wolf student paper award at the IEEE International Symposium on Information Theory (ISIT-2020).
Our paper on “Mirror Descent Policy Optimization” was accepted for a contributed talk (8 out of about 250 submissions) at the Deep Reinforcement Learning Workshop at NeurIPS-2020.
Eleven conference papers published: “Improved Algorithms for Conservative Exploration in Bandits” at AAAI-2020, “Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control” at ICLR-2020, “Randomized Exploration in Generalized Linear Bandits” and “Conservative Exploration in Reinforcement Learning” at AISTATS-2020, “Active Learning for Classification with Abstention” at IEEE International Symposium on Information Theory (ISIT-2020), “Adaptive Sampling for Estimating Probability Distributions”, “Multi-step Greedy Reinforcement Learning Algorithms”, and “Predictive Coding for Locally-Linear Control” at ICML-2020, “Active Model Estimation in Markov Decision Processes” at UAI-2020, “Safe Policy Learning for Continuous Control” at CoRL-2020, and “Online Planning with Lookahead Policies” at NeurIPS-2020.
I gave an invited talk on “Conservative Exploration in Bandits and Reinforcement Learning” at the ICML workshop on “Challenges in Deploying and Monitoring Machine Learning Systems”, and an invited talk at the “Reinforcement Learning Theory Session” at INFORMS-2020.
I co-chaired a tutorial on “Exploration-Exploitation in Reinforcement Learning” at AAAI-2020.
I served as a senior area chair for NeurIPS-2020, and as an area chair for ICML-2020 and AISTATS-2020.
2019
Our paper on “Tight Regret Bounds for Model-based Reinforcement Learning with Greedy Policies” was accepted for a spotlight presentation at NeurIPS-2019.
Six conference papers published: “Tight Regret Bounds for Model-based Reinforcement Learning with Greedy Policies” at NeurIPS-2019, “Perturbed-History Exploration in Stochastic Linear Bandits” at UAI-2019, “Perturbed-History Exploration in Stochastic Multi-Armed Bandits” at IJCAI-2019, “Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits” at ICML-2019, and “Risk-sensitive Generative Adversarial Imitation Learning” and “Optimizing over a Restricted Policy Class in MDPs” at AISTATS-2019.
Our paper on “Lyapunov-based Policy Optimization for Continuous Control” won the best paper award at the ICML-2019 workshop on “Reinforcement Learning in Real Life”.
I co-chaired a workshop on “Safety and Robustness in Decision-making” at NeurIPS-2019.
I served as an area chair for ICML-2019 and NeurIPS-2019.
2018
Two journal papers published: “Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity” at Journal of Artificial Intelligence Research (JAIR), and “Risk-Constrained Reinforcement Learning with Percentile Risk Criteria” at Journal of Machine Learning Research (JMLR).
Six conference papers published: “A Lyapunov-based Approach to Safe Reinforcement Learning” and “A Block Coordinate Ascent Algorithm for Mean-Variance Optimization” at NIPS-2018, “Path Consistency Learning in Tsallis Entropy Regularized MDPs” and “More Robust Doubly Robust Off-policy Evaluation” at ICML-2018, “Robust Locally-Linear Controllable Embedding” at AISTATS-2018, and “PAC Bandits with Risk Constraints” at ISAIM-2018.
I gave an invited talk on “Three Approaches to Safety in Sequential Decision-making” at the ICML workshop on “Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action” (Causal ML).
I taught at the Deep Learning & Reinforcement Learning summer school organized by CIFAR and the Vector Institute at the University of Toronto in August.
I served as an area chair for NIPS-2018 and ICML-2018, and as a senior program committee member for IJCAI-2018 and AAAI-2018.
2017
A journal paper published: “Sequential Decision-making with Coherent Risk” at IEEE Transactions on Automatic Control (TAC).
Eight conference papers published: “Conservative Contextual Linear Bandits” at NIPS-2017, “Active Learning for Accurate Estimation of Linear Models”, “Bottleneck Conditional Density Estimation”, “Diffusion Independent Semi-Bandit Influence Maximization”, and “Online Learning to Rank in Stochastic Click Models” at ICML-2017, “Sequential Multiple Hypothesis Testing with Type I Error Control” at AISTATS-2017, “Predictive Off-Policy Evaluation for Nonstationary Decision Problems” and “Automated Data Cleansing through Meta-Learning” at IAAI-2017.
Marek Petrik and I gave a tutorial on “Risk-averse Decision-making and Control” at AAAI-2017.
I gave an invited talk at the 2nd Asian Workshop on Reinforcement Learning in Seoul, South Korea on November 15, 2017.
I served as an area chair for NIPS-2017 and as a senior program committee member for AAAI-2017.
2016
Four journal papers published: “Analysis of Classification-based Policy Iteration Algorithms”, “Bayesian Policy Gradient and Actor-Critic Algorithms”, and “Regularized Policy Iteration for Non-Parametric Function Spaces” at Journal of Machine Learning Research (JMLR), and “Variance-constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs” at Machine Learning Journal (MLJ).
Four conference papers published: “Safe Policy Improvement by Minimizing Robust Baseline Regret” at NIPS-2016, “Improved Learning Complexity in Combinatorial Pure Exploration Bandits” at AISTATS-2016, “Proximal Gradient Temporal Difference Learning Algorithms” at the sister conference best paper track at IJCAI-2016, and “Graphical Model Sketch” at ECML-2016.
I gave an invited talk at the 13th European Workshop on Reinforcement Learning (EWRL) in Barcelona on December 3-4, 2016.
I served as a senior program committee member for IJCAI-2016 and ECML-2016.
2015
Three journal papers published: “Approximate Modified Policy Iteration and its Application to the Game of Tetris” at Journal of Machine Learning Research (JMLR), “Classification-based Approximate Policy Iteration” at IEEE Transactions on Automatic Control (TAC), and “Bayesian Reinforcement Learning: A Survey” at Foundations and Trends in Machine Learning.
Five conference papers published: “High Confidence Off-Policy Evaluation” at AAAI-2015, “Maximum Entropy Semi-Supervised Inverse Reinforcement Learning” at IJCAI-2015, “Building Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees” at IJCAI-2015, “High Confidence Policy Improvement” at ICML-2015, and “Policy Gradient for Coherent Risk Measures” at NIPS-2015.
Our paper entitled “Finite-Sample Analysis of Proximal Gradient TD Algorithms” won the Facebook best student paper award at UAI-2015.
I co-chaired two workshops: 12th European Workshop on Reinforcement Learning (EWRL-12) as a workshop at ICML-2015 and “Machine Learning in eCommerce” at NIPS-2015.
My student, Victor Gabillon, won the AFIA (French Association for Artificial Intelligence) prize for the 2nd best Ph.D. thesis (completed in 2014) on artificial intelligence in France.
I served as a senior program committee member for IJCAI-2015.
2014
A conference paper published: “Algorithms for CVaR Optimization in MDPs” at NIPS-2014.
I co-chaired three workshops: “Sequential Decision-Making with Big Data” at AAAI-2014, “Customers Value Optimization in Digital Marketing” at ICML-2014, and “Large-scale Reinforcement Learning and Markov Decision Problems” at NIPS-2014.
I successfully defended my “Habilitation à Diriger des Recherches” (HDR) thesis and graduated my Ph.D. student Victor Gabillon in June 2014. Victor will be a postdoc with Prof. Peter Bartlett at UC Berkeley starting October 2014.
I served as an area chair for NIPS-2014.