HPC, Big Data and Data Science

EESSI: One Scientific Software Stack to Rule Them All

D.hpc
Bob Dröge
The European Environment for Scientific Software Installations (EESSI, pronounced as “easy”) is a collaboration between different HPC sites and industry partners, with the common goal to set up a shared repository of scientific software installations that can be used on a variety of systems, regardless of which flavor/version of Linux distribution or processor architecture is used, or whether it is a full-size HPC cluster, a cloud environment or a personal workstation.

The EESSI codebase (https://github.com/eessi) is open source and heavily relies on various other open-source software, including Ansible, archspec, CernVM-FS, Cluster-in-the-Cloud, EasyBuild, Gentoo Prefix, Lmod, ReFrame, Singularity, and Terraform.

The concept of the EESSI project was inspired by the Compute Canada software stack, and consists of three main layers:

- a filesystem layer leveraging the established CernVM-FS technology, to globally distribute the EESSI software stack;
- a compatibility layer using Gentoo Prefix, to ensure compatibility with different client operating systems (different Linux distributions, macOS, Windows Subsystem for Linux);
- a software layer, hosting optimized installations of scientific software along with required dependencies, which were built for different processor architectures, and where archspec, EasyBuild and Lmod are leveraged.

We use Ansible for automating the deployment of the EESSI software stack. Terraform is used for creating cloud instances which are used for development, building software, and testing. We also employ ReFrame for testing the different layers of the EESSI project, and the provided installations of scientific software applications. Finally, we use Singularity containers for having clean software build environments and for providing easy access to our software stack, for instance on machines without a native CernVM-FS client.
In this talk, we will present how the EESSI project grew out of a need for more collaboration to tackle the challenges in the changing landscape of scientific software and HPC system architectures. The project structure will be explained in more detail, covering the motivation for the layered approach and the choice of tools, as well as the lessons learned from the work done by Compute Canada. The goals we have in mind and how we plan to achieve them going forward will be outlined. Finally, we will demonstrate the current pilot version of the project, and give you a feeling of the potential impact.
Here we give a more extensive overview of the free and open-source software that EESSI depends on, and how each tool is used in the project.

Ansible

Ansible (https://www.ansible.com/) is a tool for automation and configuration management. We use Ansible for automating the deployment of the EESSI software stack. This includes, for instance, the installation and configuration of all CernVM-FS components, installing Gentoo Prefix on different CPU architectures, and adding our packages and customizations to the Gentoo Prefix installation.

archspec

Archspec (https://github.com/archspec/archspec) is a Python library for detecting, querying, and comparing the architecture of a system. In EESSI it is used to find the CPU type of the host system and the software stack in the repository that best matches the host CPU microarchitecture. In the future, we will also use the library to do the same for GPUs.

CernVM-FS

CernVM-FS (https://cernvm.cern.ch/fs/) is a software distribution service that provides a scalable, read-only, globally distributed filesystem. Clients can mount this filesystem over HTTP. We use CernVM-FS to make the scientific software stack available to any client around the world.

Cluster-in-the-Cloud

Cluster-in-the-Cloud (https://cluster-in-the-cloud.readthedocs.io/) is a tool that allows you to easily set up a scalable and heterogeneous cluster in the cloud. We leverage this tool to automate software builds on specific architectures, and to test the software installations.

EasyBuild

EasyBuild (https://easybuilders.github.io/easybuild/) is an installation tool for scientific software, currently supporting over 2,000 packages. By default, EasyBuild optimizes the software for the build host system. We use EasyBuild to install all the different scientific software that we want to include in our stack, and for all the different architectures that we want to support.
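The archspec use described above, picking the software stack that best matches the host CPU, can be illustrated with a minimal sketch. It does not call archspec itself: the `PARENT` table is a hand-written, heavily simplified stand-in for a real microarchitecture database, and the function names `lineage` and `best_stack` are hypothetical. The idea shown is only the selection rule: walk from the host microarchitecture towards more generic ancestors and take the first one for which a stack is available.

```python
# Hypothetical, simplified ancestry table (NOT archspec's real data):
# each microarchitecture maps to a more generic parent it is compatible with.
PARENT = {
    "x86_64": None,
    "haswell": "x86_64",
    "skylake_avx512": "haswell",
    "zen2": "x86_64",
}


def lineage(cpu):
    """Return cpu followed by its ancestors, most specific first."""
    chain = []
    while cpu is not None:
        chain.append(cpu)
        cpu = PARENT[cpu]
    return chain


def best_stack(host_cpu, available):
    """Pick the most specific available stack that the host CPU can run."""
    for candidate in lineage(host_cpu):
        if candidate in available:
            return candidate
    raise LookupError(f"no compatible software stack for {host_cpu}")


if __name__ == "__main__":
    # Assumed set of targets for which optimized installations exist.
    stacks = {"x86_64", "haswell"}
    for host in ("skylake_avx512", "haswell", "zen2"):
        print(host, "->", best_stack(host, stacks))
```

With only generic `x86_64` and `haswell` stacks available in this toy setup, a newer Intel host falls back to `haswell`, while the AMD host falls back to the generic `x86_64` stack.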
Gentoo Prefix

Gentoo Prefix (https://wiki.gentoo.org/wiki/Project:Prefix) is a Linux distribution that is built from source and can be installed in a given path (the “prefix”). It supports many different architectures, including x86_64, Arm64, and POWER, and can be used on both Linux and macOS systems.

Lmod

Lmod (https://www.tacc.utexas.edu/research-development/tacc-projects/lmod) is an environment modules tool written in Lua, which is used on many different HPC systems to give users intuitive access to software installations. It also allows you to have multiple software versions side-by-side. The EESSI software stack includes an installation of Lmod and environment module files for each scientific software application and its dependencies, providing easy access to those installations to end users.

ReFrame

ReFrame (https://reframe-hpc.readthedocs.io/) is a high-level regression testing framework for HPC systems. EESSI will be using ReFrame to implement tests (written in Python) for verifying the correctness of the different layers of our project, and doing performance checks of software installations.

Singularity

Singularity (https://sylabs.io/singularity/) is a container platform that was created to run complex applications on HPC systems. We use it to set up isolated build environments on different kinds of systems, without requiring root privileges. Furthermore, we use it to provide clients with a way to easily access our repository, without having to install a CernVM-FS client.

Terraform

Terraform (https://www.terraform.io/) is a tool that enables you to easily set up cloud instances on demand. We use it to do exactly that, for instance for build machines.
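The two kinds of checks mentioned for ReFrame above, correctness and performance, can be sketched in plain Python. This is not ReFrame's API: the function `check_run`, the regular expressions, the sample output, and the reference value are all hypothetical and only illustrate the idea of a sanity check followed by a comparison against a reference within a tolerance.

```python
import re


def check_run(output, sanity_pattern, perf_pattern, reference, tolerance=0.1):
    """Return (sane, perf_ok) for the captured output of one test run.

    sane    -- True if sanity_pattern occurs in the output
    perf_ok -- True if the number captured by perf_pattern lies within
               the given relative tolerance of the reference value
    """
    sane = re.search(sanity_pattern, output) is not None
    match = re.search(perf_pattern, output)
    if not sane or match is None:
        return sane, False
    value = float(match.group(1))
    perf_ok = abs(value - reference) <= tolerance * reference
    return sane, perf_ok


if __name__ == "__main__":
    # Made-up program output, used only for illustration.
    sample = "Simulation finished successfully\nPerformance: 95.0 ns/day\n"
    print(check_run(sample, r"finished successfully",
                    r"Performance:\s+([\d.]+)", reference=100.0))
```

A real test suite would obtain the output by running the installed application on the target system; here a fixed string stands in for that step.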

Additional information

Type devroom
