HPC, Big Data, and Data Science

Build for your microarchitecture: experiences with Spack and archspec

UB5.132
Todd Gamblin
In HPC, software is typically distributed as source code so that users can build optimized binaries that take advantage of specific microarchitectures and other hardware. While this approach provides a lot of flexibility, building from source remains a huge barrier for users accustomed to simple, fast binary package managers. Most package managers and container registries label binaries with a high-level architecture family name, e.g., x86_64 or ppc64le, but there is no standard way to label binaries for specific microarchitectures (haswell, skylake, power9, zen2, etc.).

We'll present a new project called "archspec" that aims to bridge this gap. Archspec provides a standard set of human-understandable names for many popular microarchitectures and their ISA features. It models compatibility relationships between microarchitectures, and it aggregates information on ISA extensions, compiler support, and the compiler flags needed to optimize for each target. Together, these features allow container tools and package managers to detect, build, and use optimized binaries.

Archspec grew out of the Spack package manager, but it is intended for widespread use by other build, packaging, and containerization tools. We will describe how it has been used in practice so far, how it has simplified writing generic packages, and our plans to get contributions from vendors and the broader community.
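To make this concrete, here is a minimal sketch using archspec's Python package (pip install archspec). The calls shown (archspec.cpu.host(), the archspec.cpu.TARGETS dictionary, the partial order on targets, the features attribute, and optimization_flags()) reflect the library's public API at the time of writing; the printed values are illustrative and depend on the host machine.

    import archspec.cpu

    # Detect the host microarchitecture (e.g. "skylake" rather than just "x86_64")
    host = archspec.cpu.host()
    print(host.name, host.family)

    # Compatibility is a partial order: a skylake host can run haswell binaries,
    # so host >= haswell holds on such a machine
    haswell = archspec.cpu.TARGETS["haswell"]
    print(host >= haswell)

    # ISA features, and the compiler flags needed to target this microarchitecture
    print("avx2" in host.features)
    print(host.optimization_flags("gcc", "9.2.0"))  # e.g. "-march=skylake -mtune=skylake"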
Expected prior knowledge / intended audience: Audience members should have basic knowledge of build systems, as well as some knowledge of processor architectures; the talk includes brief background on both. It will be of interest to HPC users, developers, packagers, and admins, as well as to anyone writing tools that deal with microarchitecture metadata (such as container systems).

Speaker bio: Todd Gamblin is a Senior Principal Member of Technical Staff in the Advanced Technology Office in Livermore Computing at Lawrence Livermore National Laboratory. His research focuses on scalable tools for measuring, analyzing, and visualizing parallel performance data. In addition to his research, Todd leads LLNL's DevRAMP (Reproducibility, Analysis, Monitoring, and Performance) team and the Software Packaging Technologies project in the U.S. Exascale Computing Project. He created Spack, a popular open source HPC package management tool with a community of over 450 contributors. Todd has been at LLNL since 2008.

Links to code / slides / material for the talk (optional): To be provided closer to FOSDEM.

Links to previous talks by the speaker:
https://www.youtube.com/watch?v=DRuyPDdNr0M
https://www.youtube.com/watch?v=edpgwyOD79E&t=2891s
https://www.youtube.com/watch?v=BxNOxHu6FAI
https://insidehpc.com/2019/03/spack-a-package-manager-for-hpc/
https://www.youtube.com/watch?v=iTLBkpHskzA
See https://tgamblin.github.io/cv/todd-cv.pdf for more (including tutorials and other presentations at major conferences).

Additional information

Type: devroom

More sessions

2/2/20
HPC, Big Data, and Data Science
Colin Sauze
UB5.132
This talk will discuss the development of a Raspberry Pi cluster for teaching an introduction to HPC. The motivation was to overcome four key problems faced by new HPC users: the availability of a real HPC system and the effect that running training courses can have on it (conversely, the availability of spare resources on the real system can cause problems for the training course); a fear of using a large and expensive HPC system for the first time, and worries that doing something ...
2/2/20
HPC, Big Data, and Data Science
Adrian Woodhead
UB5.132
This presentation will give an overview of the various tools, software, patterns, and approaches that Expedia Group uses to operate a number of large-scale data lakes in the cloud and on premises. The data journey undertaken by Expedia Group is probably similar to that of many others who have been operating in this space over the past two decades: scaling out from relational databases, to on-premises Hadoop clusters, to a much wider ecosystem in the cloud. This talk will give an overview of that journey ...
2/2/20
HPC, Big Data, and Data Science
Félix-Antoine Fortin
UB5.132
Compute Canada provides HPC infrastructure and support to every academic research institution in Canada. In recent years, Compute Canada has started distributing research software to its HPC clusters using CERN's software distribution service, CVMFS. This opened the possibility of accessing the software from almost any location, and therefore allows the Compute Canada experience to be replicated outside of its physical infrastructure. From these new possibilities emerged an open-source ...
2/2/20
HPC, Big Data, and Data Science
Moritz Meister
UB5.132
Maggy is an open-source framework built on Apache Spark for asynchronous parallel execution of trials in machine learning experiments. In this talk, we will present our work to tackle search as a general-purpose method efficiently with Maggy, focusing on hyperparameter optimization. We show that an asynchronous system enables state-of-the-art optimization algorithms and allows extensive early stopping in order to increase the number of trials that can be performed in a given period of time on ...
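The core idea, that removing synchronization barriers plus aggressive early stopping lets more trials finish in the same time budget, can be illustrated independently of Maggy's actual API. The sketch below is a generic, hypothetical illustration in plain Python (it uses neither Spark nor Maggy): workers pick up a new hyperparameter trial as soon as they finish or abandon one, and trials trailing the median of completed trials are stopped early.

    import random
    from concurrent.futures import ThreadPoolExecutor, as_completed

    finished = []  # final scores of completed trials, shared by the stopping rule

    def run_trial(lr):
        """Stand-in for training with learning rate `lr`, scored per epoch."""
        score = 0.0
        for epoch in range(10):
            score += random.random() * lr  # fake per-epoch improvement
            # Early stopping: abandon a trial trailing well behind the median
            # final score of already-finished trials, pro-rated to this epoch
            if finished:
                median = sorted(finished)[len(finished) // 2]
                if score < 0.5 * median * (epoch + 1) / 10:
                    return ("stopped", round(lr, 3), round(score, 2))
        finished.append(score)
        return ("done", round(lr, 3), round(score, 2))

    # Asynchronous execution: no barrier between trials; a freed worker
    # immediately starts the next configuration
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_trial, random.uniform(0.01, 1.0)) for _ in range(20)]
        for fut in as_completed(futures):
            print(fut.result())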
2/2/20
HPC, Big Data, and Data Science
Suneel Marthi
UB5.132
The advent of Deep Learning models has led to a massive growth of real-world machine learning. Deep Learning allows Machine Learning practitioners to achieve state-of-the-art scores on benchmarks without any hand-engineered features. These Deep Learning models rely on massive hand-labeled training datasets, which are a bottleneck in developing and modifying machine learning models. Most large-scale Machine Learning systems today, like Google's DryBell, use some form of Weak Supervision to construct ...
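As a hedged illustration of the general technique (not of DryBell's internals or any particular library's API): weak supervision replaces hand labels with many noisy heuristic "labeling functions" whose votes are combined, here by simple majority, into training labels.

    # Generic weak supervision sketch: noisy heuristics vote on unlabeled text
    SPAM, HAM, ABSTAIN = 1, 0, -1

    def lf_contains_link(text):
        # Heuristic: messages with URLs are often spam
        return SPAM if "http://" in text or "https://" in text else ABSTAIN

    def lf_all_caps(text):
        # Heuristic: shouting tends to be spam
        return SPAM if text.isupper() else ABSTAIN

    def lf_short_greeting(text):
        # Heuristic: short messages opening with a greeting are usually legitimate
        return HAM if text.lower().startswith(("hi", "hello")) and len(text) < 40 else ABSTAIN

    def weak_label(text, lfs=(lf_contains_link, lf_all_caps, lf_short_greeting)):
        """Combine the labeling functions' votes by majority; real systems
        instead learn per-function accuracies rather than weighting votes equally."""
        votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
        if not votes:
            return ABSTAIN  # no heuristic fired on this example
        return max(set(votes), key=votes.count)

    print(weak_label("Hello there, lunch at noon?"))   # -> 0 (ham)
    print(weak_label("WIN CASH NOW https://spam.io"))  # -> 1 (spam)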
2/2/20
HPC, Big Data, and Data Science
Frank McQuillan
UB5.132
In this session we will present an efficient way to train many deep learning model configurations at the same time with Greenplum, a free and open source massively parallel database based on PostgreSQL. The implementation involves distributing data to the workers that have GPUs available and hopping model state between those workers, without sacrificing reproducibility or accuracy. Then we apply optimization algorithms to generate and prune the set of model configurations to try.
2/2/20
HPC, Big Data, and Data Science
UB5.132
Predictive maintenance and condition monitoring for remote heavy machinery are compelling endeavors to reduce maintenance cost and increase availability. Beneficial factors for such endeavors include the degree of interconnectedness, availability of low cost sensors, and advances in predictive analytics. This work presents a condition monitoring platform built entirely from open-source software. A real world industry example for an escalator use case from Deutsche Bahn underlines the advantages ...