About me

I am the Research Lead at Phaidra. Our team focuses on modeling and optimizing industrial systems.

Before joining Phaidra, I completed my PhD in Artificial Intelligence at TU Delft, where I worked with Frans Oliehoek and Matthijs Spaan. During my PhD, I specialized in (Multi-Agent) Reinforcement Learning, investigating how abstracting (factorizing) an agent's state space can make learning more efficient and reduce runtime. I am also interested in causality, generalization, partial observability, and memory.

Research

Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations.

M. Suau. Preprint. Presented at RLDM 2025.

Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL.

M. Suau, M. T. J. Spaan, F. A. Oliehoek. RLC 2024. Outstanding Paper Award on Scientific Understanding in RL.

Leveraging Factored State Representations for Enhanced Efficiency in Reinforcement Learning.

M. Suau. Delft University of Technology. Ph.D. Thesis, 2024.

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems.

M. Suau, J. He, M. M. Çelikok, M. T. J. Spaan, F. A. Oliehoek. NeurIPS 2022.

Influence-Augmented Local Simulators: A Scalable Solution for Fast Deep RL in Large Networked Systems.

M. Suau, J. He, M. T. J. Spaan, F. A. Oliehoek. ICML 2022.

Online Planning in POMDPs with Self-Improving Simulators.

J. He, M. Suau, H. Baier, M. Kaisers, F. A. Oliehoek. IJCAI 2022.

Speeding up Deep RL through Influence-augmented Local Simulators.

M. Suau, J. He, M. T. J. Spaan, F. A. Oliehoek. AAMAS 2022.

Influence-aware Memory Architectures for Deep Reinforcement Learning in POMDPs.

M. Suau, E. Congeduti, J. He, R. A. N. Starre, A. Czechowski, F. A. Oliehoek. NCAA 2022.

Offline Contextual Bandits for Wireless Network Optimization.

M. Suau, A. Agapitos, D. Lynch, D. Farrell, M. Zhou, A. Milenovic. Offline RL Workshop, NeurIPS 2021.

Influence-augmented Online Planning for Complex Environments.

J. He, M. Suau, F. A. Oliehoek. NeurIPS 2020.

Selected talks

Cohere Labs 2025 - Policy Confounding: Causes, Consequences, and Corrections.
EWRL 2025 (Tübingen) - Policy Confounding: Causes, Consequences, and Corrections.
RLC 2024 (Amherst) - Bad Habits: Policy Confounding and Out-of-Trajectory Generalization.
Berkeley MARL Seminar 2022 (online) - Scaling up MARL: Distributed Simulation of Large Networked Systems.
ICML 2022 (Baltimore) - Influence-Augmented Local Simulators.
CIG 2018 (Maastricht) - Unity ML Agents Tutorial.

Service

Teaching:

CSE2530, Computational Intelligence, TU Delft, 2021-2022
CSE2530, Computational Intelligence, TU Delft, 2020-2021
CSE2530, Computational Intelligence, TU Delft, 2019-2020

Supervision:

Honours project: Sven Holtrop, Lucas Crijns (2020)
Master’s thesis: Nele Albers (2019)
Bachelor’s thesis: Cian Jansen (2019)
Master’s thesis: Deniz Hofmeister (2018)

Reviewing:

ICML 2021, 2022 (top reviewer), 2023.
NeurIPS 2021, 2022 (top reviewer), 2023 (top reviewer), 2025 (top reviewer).
ICLR 2021, 2022, 2023.
RLC 2024.
AAMAS 2022.

Miguel Suau, Ph.D.