Shengzhuang Chen

sheng<dot>chen17[at]imperial<dot>ac<dot>uk

I am a third-year PhD student at Imperial College London, supervised by Prof. Ying Wei, Prof. Jonathan Richard Schwarz, and Prof. Alessandra Russo. Previously, I was a visiting student at Harvard University and a research scientist intern at Thomson Reuters Foundational Research, where I worked on LLM post-training. I graduated from Imperial College London with an MEng in Electrical and Electronic Engineering, earning first-class honours.


My research focuses on the generalization of machine learning models in dynamic, real-world scenarios. Through meta-learning, sparsity techniques, and data-centric approaches, I address the challenge of adapting foundation models to out-of-distribution downstream tasks, with the goal of improving data efficiency, generalization, and scalability. I have published multiple first-author papers at leading machine learning conferences, including ICML, NeurIPS, ICLR, and ACL, and have received several awards for academic excellence.


If you are interested in my work or would like to collaborate, feel free to reach out! 😊

News

Dec 2025 ADMIRE-BayesOpt accepted to TMLR.
Oct 2025 Invited talk on Data-Centric ML for LLMs at the Allen Institute for AI (AI2).
Jul 2025 Presented SIMoE as an Oral at ACL 2025 in Vienna.
Jun 2025 Invited talk on Sparse Interpolated Mixture-of-Experts for LLM Upcycling at Qingke AI.
Jan 2025 Two papers accepted — CLDyB at ICLR 2025 and SIMoE at ACL 2025.
Jan 2025 Transferred PhD to Imperial College London.
Jan 2025 Started research scientist internship at Thomson Reuters Foundational Research.
Sep 2024 Paper on Learning Where to Edit Vision Transformers accepted to NeurIPS 2024.
May 2024 Paper on Sparse Interpolated Experts for Few-Shot Generalization accepted to ICML 2024.
May 2024 Started visiting research at Harvard Medical School, hosted by Prof. Marinka Zitnik.
Sep 2023 Paper on Secure OOD Task Generalization with EBMs accepted to NeurIPS 2023.
Sep 2022 Awarded the Hong Kong PhD Fellowship (HKPFS).

Selected Publications

ADMIRE-BayesOpt: Accelerated Data Mixture Re-weighting for Language Models with Bayesian Optimization
S. Chen, X. Ouyang, M. A. L. Pearce, T. Hartvigsen, J. R. Schwarz (____ = equal contribution)
TMLR, 2025
Abstract
We propose a multi-fidelity Bayesian optimization framework to efficiently search for optimal data mixture weights in LLM post-training. By leveraging cheap, low-fidelity proxy evaluations to guide the search, our method substantially reduces both the training and evaluation costs of mixture optimization while recovering high-quality data recipes.
data-centric ML · Bayesian optimization · LLM post-training
Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
S. Chen, Y. Wei, J. R. Schwarz
ACL 2025 Oral
Abstract
We introduce SIMoE, a method that automatically discovers domain-specialized experts from a dense LLM and assembles them into a sparse mixture-of-experts architecture. By interpolating between the original and fine-tuned parameters and learning a sparse router, SIMoE enables efficient LLM upcycling with strong multi-domain performance and improved parameter efficiency.
LLM post-training · mixture-of-experts
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
S. Chen, Y. Liao, X. Sun, K. Ma, Y. Wei
ICLR 2025
Abstract
We present CLDyB, a dynamic benchmarking framework for continual learning with pre-trained models. Rather than relying on fixed task splits, CLDyB procedurally generates diverse and evolving task sequences, enabling more realistic evaluation of continual learning methods and better differentiating them under challenging distribution shifts.
continual learning · pre-trained models · dynamic benchmarking
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
S. Chen, J. Tack, Y. Yang, Y. W. Teh, J. R. Schwarz, Y. Wei
ICML 2024
Abstract
We propose SMAT, a meta-tuning approach that constructs sparse interpolated experts by blending pre-trained and fine-tuned model parameters. A learned sparse routing mechanism selects and combines these experts at test time, yielding strong few-shot generalization across diverse tasks with improved parameter efficiency.
meta-learning · few-shot learning · mixture-of-experts
Secure Out-of-Distribution Task Generalization with Energy-Based Models
S. Chen, L.-K. Huang, J. R. Schwarz, Y. Du, Y. Wei
NeurIPS 2023
Abstract
We present a unified framework that leverages energy-based models to jointly detect and adapt to out-of-distribution tasks in few-shot settings. The energy function provides a principled score for OOD detection while also guiding task-conditional adaptation, enabling reliable generalization across diverse domains.
meta-learning · few-shot learning

Research Experience

Research Scientist Intern (Foundational Research) Jan – Jul 2025
Thomson Reuters Ltd
LLM post-training. Supervised by Prof. Jonathan Richard Schwarz.
Visiting Student May – Jul 2024
Harvard Medical School, Harvard University
Instruction tuning for LLMs, with an emphasis on improving cross-task generalization on downstream tasks. Hosted by Prof. Marinka Zitnik.

Invited Talks

Data-Centric ML for LLMs Oct 2025
Allen Institute for AI (AI2)
Sparse Interpolated Mixture-of-Experts for LLM Upcycling Jun 2025
Qingke AI

Education

Ph.D. in Computer Science 2025 – Present
Imperial College London
Research on LLM post-training and meta-learning.
M.Eng. in Electrical and Electronic Engineering 2017 – 2021
Imperial College London
First-class honours; top 10% of the cohort.

Awards & Honours

Research Tuition Scholarship 2024
City University of Hong Kong — Awarded for outstanding research contributions and academic performance.
HKPFS Academic Excellence Award 2024
City University of Hong Kong — Recognizes exceptional academic achievements among HKPFS recipients.
Hong Kong PhD Fellowship (HKPFS) 2022
Research Grants Council of Hong Kong — Highly competitive fellowship supporting outstanding doctoral students (<5% acceptance rate).
Nicholas Battersby Prize 2021
Imperial College London — Best Master's Thesis in Analogue Electronics.
Dean's List for Academic Excellence 2017, 2018, 2021
Imperial College London — Top 10% of the cohort for exceptional academic performance.

Academic Service

Reviewer
ICML, ICLR, NeurIPS, ACL, TMLR
Teaching Assistant 2022 – 2025
CS2402 Computational Probability Modeling (2024–2025) · CS5491 Artificial Intelligence (2022–2023), City University of Hong Kong