
PhD Thesis

I wrote my PhD thesis, "Towards Machine Learning Foundation Models for Materials Chemistry", under the supervision of Alpha Lee and Ulrich Keyser at the Cavendish Laboratory, University of Cambridge, and submitted it in 2024.

Abstract

This thesis demonstrates how recent advances in machine learning (ML) for materials can accelerate our search for new stable inorganic crystals. We show how best to measure and compare the utility of different models, what range of applications a foundational ML force field can be expected to cover, what remaining shortcomings current ML potentials exhibit and how some of them can be overcome. Specifically, we present Matbench Discovery, an evaluation framework that closely mimics a real-world materials discovery campaign and establishes universal interatomic potentials (UIPs) as the state-of-the-art ML method for accelerating the discovery of thermodynamically stable inorganic crystals.

Next, we design and execute an ML-guided dielectric discovery workflow that integrates rapid ML screening, targeted crystal generation and high-throughput ab initio validation. These steps feed into informed experimental characterization, which culminated in the synthesis of two novel dielectric materials, CsTaTeO₆ and Bi₂Zr₂O₇. CsTaTeO₆ was generated by our workflow.

Finally, we comprehensively analyze MACE-MP, the best-performing model we trained for Matbench Discovery, which has since proven to be a highly versatile foundation model for atomistic simulations. Although pre-trained purely on inorganic bulk crystals, it extrapolates unexpectedly well to diverse chemistries and material classes far beyond its training distribution, showing qualitative and often quantitative agreement with density functional theory (DFT) across 36 diverse test cases, ranging from phonon spectra and aqueous interfaces to combustion, catalysis and batteries.

Taken together, these projects showcase how graph neural network (GNN) force fields can form a central pillar of computational materials science, inhabiting a different point on the cost-accuracy Pareto front than DFT — not much worse in accuracy yet orders of magnitude cheaper thanks to linear instead of cubic scaling with system size. Machine learning force fields have thus unlocked the study of complex phenomena over length and time scales previously inaccessible to numerical simulation.
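
To make the scaling argument concrete, here is a minimal Python sketch of how cost grows with system size under idealized cubic versus linear scaling. It is an illustration only: the helper `cost_growth` is a hypothetical name, prefactors are ignored entirely, and the exponents 3 and 1 are the nominal values quoted above rather than measured timings for any particular DFT code or ML force field.

```python
def cost_growth(size_factor: float, exponent: int) -> float:
    """Factor by which compute cost grows when the atom count is
    multiplied by `size_factor`, assuming cost ~ N**exponent.

    Prefactors are deliberately ignored (an assumption of this sketch),
    so only relative growth is meaningful, not absolute cost.
    """
    return size_factor ** exponent


# Compare idealized cubic scaling (exponent 3) with linear scaling (exponent 1).
for size_factor in (2, 10, 100):
    cubic = cost_growth(size_factor, 3)
    linear = cost_growth(size_factor, 1)
    print(
        f"{size_factor}x more atoms: cubic cost grows {cubic:g}x, "
        f"linear cost grows {linear:g}x, gap widens by {cubic / linear:g}x"
    )
```

Under these assumptions, every tenfold increase in atom count widens the cost gap between the two approaches by a further factor of one hundred, which is why larger systems and longer simulations become reachable.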