TiPES result: Machine learning can be trusted in climate science

New theoretical insights provide a physically sound understanding of machine learning-based simplified climate models

Not just a black box. Santos Gutiérrez et al. show that a machine learning programme widely used in climate science builds a true understanding of the climate system. TiPES/HP

Machine learning, when used in climate science, builds an actual understanding of the climate system, according to a study published in the journal Chaos by Manuel Santos Gutiérrez and Valerio Lucarini, University of Reading, UK; Mickaël Chekroun, the Weizmann Institute, Israel; and Michael Ghil, École Normale Supérieure, Paris, France. This means we can trust machine learning and extend its applications in climate science, say the authors. The study is part of the European Horizon 2020 TiPES project on tipping points in the Earth system.

Man or machine

Large, complex climate models are often impractical to work with as they need to run for months on supercomputers. As an alternative, climate scientists often study simplified models.

Generally, two different approaches are used to simplify climate models: a top-down approach, where climate experts estimate what impact the processes left out will have on the parts kept in the reduced model, and a bottom-up approach, where climate data is fed to a machine learning programme, which then simulates the climate system.

The two methods produce comparable results. It is a challenging problem, however, to understand data-driven (bottom-up) approaches physically, which is needed in order to fully trust them. Do machine learning programmes “understand” that they are dealing with a complex dynamical system, or are they simply good at statistically guessing the right answers?

Intelligent solution

Now, a group of scientists has shown, both analytically and with computer simulations, that a machine learning programme called Empirical Model Reduction (EMR) in fact knows what it is doing. The study shows that this programme reaches results comparable to the top-down reductions of larger models because it constructs its own version of a climate model.

“I think what we do in this investigation is give some sort of physical evidence of why this particular data-driven protocol works. And that to me is quite meaningful, because the method has been in the atmospheric sciences for quite a long time. Yet there was still quite a lot of gaps in the understanding of the methodologies,” says PhD student Manuel Santos Gutiérrez.

Encouraging and useful

The study indicates that the machine learning method is dynamically and physically sound and produces robust simulations. According to the authors, this should motivate the further use of data-driven methods in climate science as well as other sciences.

“It is a very encouraging step. Because in some sense, it means the data-driven method is intelligent. It is not an emulator of data. It is a model that captures the dynamical processes. It is able to reconstruct what lies behind the data. And that indicates these theoretical derivations give you an object which is algorithmically useful,” says Valerio Lucarini, professor of statistical mechanics at the University of Reading.

The result is important in a range of fields: applied mathematics, statistical physics, data science, climate science, and complex systems science. It will also have implications in a range of industrial contexts where complex dynamical systems are studied but only partial information is accessible, such as the engineering of aeroplanes, ships and wind turbines, or the modelling of traffic, energy grids and distribution networks.
