Computing project prepares weather prediction for the exascale

29 November 2021
The three-year EU-funded ESCAPE-2 project coordinated by ECMWF, on energy-efficient scalable algorithms for weather and climate prediction at the exascale, is coming to a successful conclusion.

The project brought together 12 partners across Europe from national meteorological and hydrological services, high-performance computing (HPC) centres, hardware vendors and universities: Deutsches Klimarechenzentrum GmbH and Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (Germany), MeteoSwiss through the Eidgenössisches Departement des Innern (Switzerland), Barcelona Supercomputing Center (Spain), Commissariat à l'Energie Atomique et aux Energies Alternatives (France), Loughborough University (United Kingdom), Institut Royal Météorologique de Belgique (Belgium), Politecnico di Milano (Italy), Danmarks Meteorologiske Institut (Denmark), Fondazione Centro Euro-Mediterraneo sui Cambiamenti Climatici (Italy), Bull SAS (France), and ECMWF.

ESCAPE-2 built on the ESCAPE-1 project, which introduced the concept of ‘weather & climate dwarfs’. The dwarfs are standalone applications representing key patterns of weather and climate models which are particularly relevant for efficiency in upcoming computing architectures.

“They are small enough to facilitate collaboration with hardware vendors and HPC experts but sufficiently complex to be relevant for the performance optimisation of entire operational models,” says Andreas Müller, one of the scientists at ECMWF who worked on the project.

ESCAPE-2 has extended the previous work by adding new dwarfs and algorithmic developments, the possibility of hardware-specific optimisations, benchmarking, and the application of a new tool for uncertainty quantification.

New dwarfs and algorithmic developments

Many new dwarfs have been added in ESCAPE-2: they cover regional and ocean models as well as global and atmosphere models, thanks to partners from the limited-area-model community and the ocean community.

Novel algorithmic advances include the discontinuous Galerkin dynamical core option developed for ECMWF’s Integrated Forecasting System (IFS).

This method divides the globe into small grid elements and supports a high order of accuracy inside each element. It therefore requires much less data exchange between different parts of the supercomputer than the global spectral method, which ECMWF uses operationally.

“The approach chosen in ESCAPE-2 permits the use of a long time-step,” says Giovanni Tumolo, another ECMWF scientist involved in the project. “It also makes it possible to vary the order of accuracy between different grid elements. This flexibility thus allows us to dynamically adapt the complexity of the computation to the local weather conditions inside each element.”

Adaptive discontinuous Galerkin simulation of transport of two tracers by a Rossby-Haurwitz wave with tracer concentration (left) and order of accuracy represented by the polynomial degree (right).

ESCAPE-2 has also developed a new multi-grid approach to the Finite Volume Module (FVM) dynamical core developed for the IFS: weather forecast equations are solved at a relatively low resolution with corrections made at a higher resolution. The result is greater fault tolerance and 4.8 times faster computation of some aspects of the forecast at 6 km resolution without a significant reduction in accuracy.
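The idea of solving at low resolution and correcting at higher resolution can be illustrated with a minimal two-level sketch. It is not the FVM scheme itself: it assumes a toy 1D Poisson problem, -u'' = f with zero boundary values, and all function names are hypothetical. The coarse grid is solved exactly, the result is interpolated to a grid twice as fine, and a few weighted Jacobi sweeps correct it there.

```python
import math


def solve_coarse(rhs, h):
    """Solve -u'' = f exactly on the interior points (Thomas algorithm)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = h * h * rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (h * h * rhs[i] + d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def prolong(coarse):
    """Linearly interpolate a grid (boundaries included) to twice the resolution."""
    fine = [0.0] * (2 * (len(coarse) - 1) + 1)
    for j in range(len(coarse) - 1):
        fine[2 * j] = coarse[j]
        fine[2 * j + 1] = 0.5 * (coarse[j] + coarse[j + 1])
    fine[-1] = coarse[-1]
    return fine


def residual_norm(u, f, h):
    """Euclidean norm of f - A u over the interior points."""
    total = 0.0
    for i in range(1, len(u) - 1):
        r = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
        total += r * r
    return math.sqrt(total)


def jacobi_sweep(u, f, h, omega=2.0 / 3.0):
    """One weighted Jacobi sweep: the fine-grid correction step."""
    new = u[:]
    for i in range(1, len(u) - 1):
        jacobi = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        new[i] = (1.0 - omega) * u[i] + omega * jacobi
    return new


def two_level_solve(n_coarse=16, sweeps=3):
    """Coarse solve, interpolation, then fine-grid correction sweeps."""
    h_c, h_f = 1.0 / n_coarse, 0.5 / n_coarse

    def source(x):
        # Chosen so that the exact solution is sin(pi * x)
        return math.pi ** 2 * math.sin(math.pi * x)

    f_c = [source(j * h_c) for j in range(1, n_coarse)]
    coarse = [0.0] + solve_coarse(f_c, h_c) + [0.0]
    u = prolong(coarse)
    f_f = [source(i * h_f) for i in range(len(u))]
    res_before = residual_norm(u, f_f, h_f)
    for _ in range(sweeps):
        u = jacobi_sweep(u, f_f, h_f)
    res_after = residual_norm(u, f_f, h_f)
    err = max(abs(u[i] - math.sin(math.pi * i * h_f)) for i in range(len(u)))
    return err, res_before, res_after
```

Calling two_level_solve() returns the maximum error against the exact solution together with the fine-grid residual before and after the correction sweeps. The expensive exact solve only ever runs on the coarse grid, and the fine grid sees a handful of cheap sweeps.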

This work led to an in-depth white paper examining different approaches to fault tolerance in weather and climate applications. The topic is growing in importance: as the number of processors increases, so does the likelihood of hardware drop-outs.

Further, the project included the successful demonstration of a machine learning model emulator for the ECMWF radiation scheme ecRAD. As this module is computationally intensive, machine learning can help accelerate code execution without significant loss of accuracy.

Hardware-specific optimisations

Optimising code for different emerging hardware architectures usually comes at the cost of increased complexity and reduced readability of the code.

One approach to mitigating these difficulties is the use of domain-specific languages (DSLs). They make it possible to separate the hardware-specific optimisation from the code containing the scientific algorithm. This improves readability while still allowing complex hardware-specific optimisations.
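The separation described above can be sketched in a few lines of plain Python. This is not the ESCAPE-2 DSL, and every name here is hypothetical: the scientific kernel (a five-point Laplacian) is written once, and two interchangeable "backends" decide how the grid is traversed, one row by row and one tile by tile in the manner of a cache-blocking optimisation.

```python
def laplacian(c, w, e, s, n):
    """Scientific kernel: five-point Laplacian, free of loop or layout details."""
    return w + e + s + n - 4.0 * c


def run_reference(stencil, field):
    """Backend 1: straightforward row-by-row traversal of interior points."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = stencil(field[j][i], field[j][i - 1], field[j][i + 1],
                                field[j - 1][i], field[j + 1][i])
    return out


def run_blocked(stencil, field, block=4):
    """Backend 2: the same stencil, traversed tile by tile."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j0 in range(1, ny - 1, block):
        for i0 in range(1, nx - 1, block):
            for j in range(j0, min(j0 + block, ny - 1)):
                for i in range(i0, min(i0 + block, nx - 1)):
                    out[j][i] = stencil(field[j][i], field[j][i - 1],
                                        field[j][i + 1], field[j - 1][i],
                                        field[j + 1][i])
    return out


# Test field i^2 + j^2, whose discrete Laplacian is 4 at every interior point
field = [[float(i * i + j * j) for i in range(10)] for j in range(10)]
reference = run_reference(laplacian, field)
blocked = run_blocked(laplacian, field)
```

Both backends produce identical output, so the traversal strategy can be changed or tuned per architecture without touching the laplacian function that a scientist would maintain.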

Schematic illustration of the domain-specific language frontends and the toolchain developed in ESCAPE-2. The code which the scientist writes gets translated by one of the frontends (on the left) into a high-level intermediate representation. It then passes through a toolchain of checkers and optimisers and finally gets processed by one of the code generators.

Apart from the full implementation and assessment for atmosphere model dwarfs, the ESCAPE-2 toolchain has also been successfully applied to a dwarf from the NEMO ocean model. The generated code shows good performance. This DSL toolchain will continue to be used in the ongoing ESiWACE-2 project, which will help to establish the ESCAPE-2 DSL in the wider weather and climate community.

Benchmarking

Many of the dwarfs have been incorporated into a newly created benchmark suite called HPCW (High-Performance Computing – Weather) to assess new supercomputers with workloads that are more representative of calculation-, memory- and communication-intensive patterns than standard benchmarks.

To facilitate the use of HPCW by HPC centres around the world, ESCAPE-2 has created test configurations for each of the codes included in HPCW. A common framework to build the codes has been established and a verification routine has been implemented.

In addition, the workload simulator Kronos, developed by ECMWF in the NextGenIO project, has been used to synthesise components of realistic data-handling-intensive workflows and to include real dwarfs in them.

“This allows more accurate predictions of the performance and workflow on upcoming supercomputers before the actual hardware becomes available,” Andreas says. “This work will thus benefit weather and climate prediction centres in their procurements when buying new computers.”

Verification, validation and uncertainty quantification

Verification, validation and uncertainty quantification (VVUQ) has always been an important part of weather prediction. However, there has been very little exchange with VVUQ approaches in other domains, such as the URANIE platform developed by the French Alternative Energies and Atomic Energy Commission (CEA).

In ESCAPE-2, URANIE has been successfully extended to weather and climate applications. URANIE was improved with the help of performance optimisation tools provided by one of our project partners, the Barcelona Supercomputing Center (BSC).

These improvements allow URANIE to run large-scale applications in parallel on large supercomputers, so the ESCAPE-2 experience also benefits the applications URANIE was originally built for.

“URANIE has been applied in ESCAPE-2 to a shallow water model, to the radiation scheme ACRANEB2, and to the limited-area model ensemble prediction system HarmonEPS,” Andreas says. “It has been found to be particularly useful for sensitivity and calibration experiments.”
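To give a flavour of what a sensitivity experiment involves, here is a minimal sketch that does not use URANIE or its API. It assumes a toy model, the shallow-water gravity wave speed sqrt(g * depth), and a hypothetical helper that perturbs one parameter at a time and reports the relative change in the output per relative change in the input.

```python
import math


def wave_speed(params):
    """Toy model: shallow-water gravity wave speed c = sqrt(g * depth)."""
    return math.sqrt(params["g"] * params["depth"])


def relative_sensitivities(model, params, rel_step=0.01):
    """One-at-a-time central differences, normalised to relative changes."""
    base = model(params)
    result = {}
    for name, value in params.items():
        up = dict(params)
        up[name] = value * (1.0 + rel_step)
        down = dict(params)
        down[name] = value * (1.0 - rel_step)
        result[name] = (model(up) - model(down)) / (2.0 * rel_step * base)
    return result


# "coriolis" is included as a parameter the toy model ignores
params = {"g": 9.81, "depth": 1000.0, "coriolis": 1.0e-4}
sens = relative_sensitivities(wave_speed, params)
```

For this model the sensitivities to g and depth both come out close to 0.5, as expected for a square-root dependence, while the unused parameter scores zero. A dedicated platform adds sampling strategies, parallel execution and calibration on top of this basic pattern.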

Outlook

ESCAPE-2 has successfully continued foundational concepts pioneered in ESCAPE-1, namely dwarfs and DSLs. It has produced crucial developments taken forward by the weather and climate community centre of excellence ESiWACE-2, and it has provided valuable technological guidance for Destination Earth.

Follow-up work at ECMWF will build on ESCAPE-2 to prepare our forecasting techniques for future challenges.