HPC, Big Data, and Data Science

Building cloud-based data services to enable earth-science workflows across HPC centres

UB5.132
John Hanley
Weather forecasts produced by ECMWF and environmental services provided by the Copernicus programme are a vital input for many downstream simulations and applications. A variety of products, such as ECMWF reanalyses and archived forecasts, are also available to users via the MARS archive and the Copernicus data portal. Transferring, storing and locally modifying large volumes of such data prior to integration currently presents a significant challenge to users.

ECMWF's key aim within the H2020 HiDALGO project is to migrate some of these tasks to the cloud, facilitating fast and seamless application integration through precise and efficient data delivery to the end user. The required cloud infrastructure development will also feed into ECMWF's contribution to the European Weather Cloud pilot, a collaborative cloud development project between ECMWF and EUMETSAT. ECMWF and its HiDALGO partners aim to implement a set of services that enable the simulation of complex global challenges, which requires massive high-performance computing resources alongside state-of-the-art data analytics and visualisation. ECMWF's role in the project is to enable seamless integration of two pilot applications with its meteorological data and services, delivered via ECMWF's cloud and orchestrated by bespoke HiDALGO workflows. The demonstrated workflows show the increased value of weather forecasts, as well as of derived forecasts for air quality provided by the Copernicus Atmosphere Monitoring Service (CAMS).

The HiDALGO use-case workflows comprise four main components: pre-processing, numerical simulation, post-processing and visualisation. The core simulations are ideally suited to running in a dedicated HPC environment because of their large computational demands and the heavy communication overhead between parallel processes. The pre-/post-processing and visualisation tasks, however, generally need no more than a few cores and no message passing between instances, so they are good candidates for running in a cloud environment. Enabling, managing and orchestrating the integration of both HPC and cloud environments to improve overall performance is the key goal of HiDALGO.

This talk will give a general overview of the HiDALGO project and its main aims and objectives. It will present the two pilot applications used for integration and an overview of the general workflows and services within HiDALGO. In particular, it will focus on how ECMWF's cloud data and services will couple with the pilot applications to improve overall workflow performance and give the pilot users access to new data.

This work is supported by the HiDALGO project and has been partly funded by the European Commission's ICT activity of the H2020 Programme under grant agreement number 824115.
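As a rough illustration of the HPC/cloud split described above, the minimal sketch below places the tightly coupled simulation on an HPC target and the lighter pre-/post-processing and visualisation stages on a cloud target. All stage names, placement targets and the submission function are hypothetical illustrations of the pattern, not HiDALGO's actual orchestration code.

```python
"""Minimal sketch of a four-stage workflow split between cloud and HPC targets."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    target: str                    # "cloud" for light tasks, "hpc" for the core simulation
    run: Callable[[dict], dict]


def preprocess(ctx):               # subset/prepare forecast data close to where it is stored
    return {**ctx, "inputs": "subset of forecast fields"}

def simulate(ctx):                 # large, tightly coupled parallel job -> dedicated HPC system
    return {**ctx, "raw_output": "simulation results"}

def postprocess(ctx):              # few cores, no message passing -> cloud instance
    return {**ctx, "products": "derived fields"}

def visualise(ctx):
    return {**ctx, "plots": "maps and time series"}


PIPELINE = [
    Stage("pre-processing", "cloud", preprocess),
    Stage("simulation", "hpc", simulate),
    Stage("post-processing", "cloud", postprocess),
    Stage("visualisation", "cloud", visualise),
]


def submit(stage: Stage, ctx: dict) -> dict:
    # A real orchestrator would hand the stage to a batch scheduler (HPC) or a
    # cloud API here; this sketch simply runs it locally and records the placement.
    print(f"running {stage.name!r} on {stage.target}")
    return stage.run(ctx)


if __name__ == "__main__":
    context = {}
    for stage in PIPELINE:
        context = submit(stage, context)
    print(context)
```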

Additional information

Type devroom

More sessions

2/2/20
HPC, Big Data, and Data Science
Colin Sauze
UB5.132
This talk will discuss the development of a Raspberry Pi cluster for teaching an introduction to HPC. The motivation was to overcome four key problems faced by new HPC users: the availability of a real HPC system and the effect that running training courses can have on it (conversely, the availability of spare resources on the real system can cause problems for the training course); a fear of using a large and expensive HPC system for the first time and worries that doing something ...
2/2/20
HPC, Big Data, and Data Science
Adrian Woodhead
UB5.132
This presentation will give an overview of the various tools, software, patterns and approaches that Expedia Group uses to operate a number of large-scale data lakes in the cloud and on premises. The data journey undertaken by the Expedia Group is probably similar to that of many others who have been operating in this space over the past two decades - scaling out from relational databases to on-premises Hadoop clusters to a much wider ecosystem in the cloud. This talk will give an overview of that journey ...
2/2/20
HPC, Big Data, and Data Science
Félix-Antoine Fortin
UB5.132
Compute Canada provides HPC infrastructure and support to every academic research institution in Canada. In recent years, Compute Canada has started distributing research software to its HPC clusters using CERN's software distribution service, CVMFS. This opens up access to the software from almost any location and therefore allows the Compute Canada experience to be replicated outside of its physical infrastructure. From these new possibilities emerged an open-source ...
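As a small sketch of what "accessing the software from almost any location" can look like in practice, the snippet below checks whether a CVMFS repository is visible on a host. The repository path shown is an assumption for illustration; consult the Compute Canada documentation for the actual mount points and client configuration.

```python
"""Minimal sketch: check that a CVMFS software repository is reachable on a host."""
import os

REPO = "/cvmfs/soft.computecanada.ca"   # assumed mount point of the software stack


def repo_available(path: str = REPO) -> bool:
    # CVMFS typically mounts lazily via autofs, so listing the directory both
    # triggers the mount and confirms the repository is reachable.
    try:
        return len(os.listdir(path)) > 0
    except OSError:
        return False


if __name__ == "__main__":
    if repo_available():
        print(f"{REPO} is mounted; the software stack can be used on this machine")
    else:
        print(f"{REPO} not reachable; check the CVMFS client configuration")
```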
2/2/20
HPC, Big Data, and Data Science
Moritz Meister
UB5.132
Maggy is an open-source framework built on Apache Spark for asynchronous parallel execution of trials in machine learning experiments. In this talk, we will present our work to tackle search as a general-purpose method efficiently with Maggy, focusing on hyperparameter optimization. We show that an asynchronous system enables state-of-the-art optimization algorithms and allows extensive early stopping in order to increase the number of trials that can be performed in a given period of time on ...
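To illustrate the idea of asynchronous trials with early stopping (not Maggy's API; the training loop, scoring and stopping rule below are hypothetical stand-ins), the sketch runs trials in parallel and abandons those that fall well behind the median of trials that have already finished.

```python
"""Generic sketch of asynchronous hyperparameter trials with early stopping."""
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

finished = []  # final scores of completed trials


def median(xs):
    s = sorted(xs)
    return s[len(s) // 2]


def run_trial(lr, steps=10):
    """Stand-in for a training loop that accumulates a score step by step."""
    score = 0.0
    for step in range(1, steps + 1):
        score += random.random() * lr            # pretend this is one epoch of training
        # early stopping: abandon the trial if, pro-rated for progress, it is
        # far below the median of trials that already finished
        if finished and score < 0.5 * median(finished) * step / steps:
            return score
    finished.append(score)
    return score


if __name__ == "__main__":
    configs = [{"lr": random.uniform(0.001, 0.1)} for _ in range(16)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(run_trial, c["lr"]): c for c in configs}
        for fut in as_completed(futures):
            print(futures[fut], "->", round(fut.result(), 3))
```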
2/2/20
HPC, Big Data, and Data Science
Suneel Marthi
UB5.132
The advent of Deep Learning models has led to a massive growth of real-world machine learning. Deep Learning allows machine learning practitioners to achieve state-of-the-art scores on benchmarks without any hand-engineered features. These Deep Learning models rely on massive hand-labeled training datasets, which is a bottleneck in developing and modifying machine learning models. Most large-scale machine learning systems today, like Google's DryBell, use some form of Weak Supervision to construct ...
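As a toy illustration of weak supervision (the labeling functions, labels and the simple majority vote below are hypothetical; production systems such as DryBell or Snorkel learn how to combine many such heuristics rather than voting), noisy heuristics replace hand labels:

```python
"""Sketch of weak supervision: noisy labeling functions instead of hand labels."""
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_refund(text):            # heuristic rule acting as a weak labeler
    return POSITIVE if "refund" in text.lower() else ABSTAIN


def lf_contains_thanks(text):
    return NEGATIVE if "thanks" in text.lower() else ABSTAIN


def lf_exclamation(text):
    return POSITIVE if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_refund, lf_contains_thanks, lf_exclamation]


def weak_label(text):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN


if __name__ == "__main__":
    for doc in ["I want a refund now!!", "Thanks for the quick reply"]:
        print(doc, "->", weak_label(doc))
```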
2/2/20
HPC, Big Data, and Data Science
Frank McQuillan
UB5.132
In this session we will present an efficient way to train many deep learning model configurations at the same time with Greenplum, a free and open source massively parallel database based on PostgreSQL. The implementation involves distributing data to the workers that have GPUs available and hopping model state between those workers, without sacrificing reproducibility or accuracy. Then we apply optimization algorithms to generate and prune the set of model configurations to try.
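The scheduling idea behind "hopping model state between workers" can be sketched as follows: data shards stay resident on their workers while model configurations rotate between them, so every model eventually trains on every shard. The worker and configuration names are illustrative, and this simple version assumes as many configurations as workers; in Greenplum this scheduling happens inside the database across GPU-equipped segments.

```python
"""Sketch of a 'model hopping' schedule: data stays put, models rotate between workers."""

WORKERS = ["gpu-seg-0", "gpu-seg-1", "gpu-seg-2"]   # each holds one resident data shard
MODELS = ["cfg-A", "cfg-B", "cfg-C"]                # model configurations to train


def hop_schedule(models, workers, epochs=2):
    """Yield (epoch, worker, model) so each model visits each shard once per epoch."""
    n = len(workers)
    for epoch in range(epochs):
        for shift in range(n):                      # one 'hop' per sub-epoch
            for w, worker in enumerate(workers):
                yield epoch, worker, models[(w + shift) % n]


if __name__ == "__main__":
    for epoch, worker, model in hop_schedule(MODELS, WORKERS):
        print(f"epoch {epoch}: train {model} on the data resident at {worker}")
```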
2/2/20
HPC, Big Data, and Data Science
UB5.132
Predictive maintenance and condition monitoring for remote heavy machinery are compelling endeavors to reduce maintenance cost and increase availability. Beneficial factors for such endeavors include the degree of interconnectedness, availability of low cost sensors, and advances in predictive analytics. This work presents a condition monitoring platform built entirely from open-source software. A real world industry example for an escalator use case from Deutsche Bahn underlines the advantages ...