I am an Assistant Professor in the UCL Computer Science department, where I co-lead the UCL NLP group. I completed my PhD at UCL NLP, co-supervised by Prof. Pontus Stenetorp and Prof. Sebastian Riedel. I was affiliated with the Montreal Institute for Learning Algorithms (MILA) and the Alberta Machine Intelligence Institute (AMII).
My primary research interest is Natural Language Processing (NLP). I lead the UK-LLM sovereign AI initiative, in collaboration with NVIDIA, building language models for Welsh and other regional languages to support public services including healthcare, education, and legal resources. UK-LLM is one of the flagship projects on the UK AI Research Resource, endorsed by the UK Prime Minister, and its models are trained on Isambard-AI, the UK's most powerful supercomputer. Under the UK-LLM pretraining framework, my team also developed multilingual models covering 10+ languages, including European, Middle Eastern, Southeast Asian, and African languages. My research is part of the UCL-NVIDIA Sovereign AI partnership.
My research has received generous support from NVIDIA, Microsoft Research, Oracle Research, and the UK National AI Infrastructure. My research has won multiple Outstanding Paper Awards at leading NLP/AI conferences, including ACL 2022 and EMNLP 2024.
Selected Publications
Please check the full list of my publications here.
The Role of Mixed-Language Documents for Multilingual Large Language Model Pretraining
Jiandong Shao, Raphael Tang, Crystina Zhang, Karin Sevegnani, Pontus Stenetorp, Jianfei Yang, Yao Lu
arXiv 2026
Drawing Conclusions from Draws: Rethinking Preference Semantics in Arena-Style LLM Evaluation
Raphael Tang, Crystina Zhang, Wenyan Li, Carmen Lai, Pontus Stenetorp, Yao Lu
arXiv 2025
Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs
Jiandong Shao, Yao Lu, Jianfei Yang
NeurIPS 2025
Multilingual Language Model Pretraining using Machine-translated Data
Jiayi Wang*, Yao Lu*, Maurice Weber, Max Ryabinin, David Adelani, Yihong Chen, Raphael Tang, Pontus Stenetorp (* equal contribution)
EMNLP 2025
Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation
Raphael Tang, Crystina Zhang, Lixinyu Xu, Yao Lu, Wenyan Li, Pontus Stenetorp, Jimmy Lin, Ferhan Ture
EMNLP 2024 (Outstanding Paper Award, 0.4% out of 6000+ submissions)
Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation
Yao Lu, Jiayi Wang, Raphael Tang, Sebastian Riedel, Pontus Stenetorp
NAACL 2024
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
ACL 2022 (Outstanding Paper Award, 0.2% out of 3000+ submissions)
