Session
FOSDEM Schedule 2021
HPC, Big Data and Data Science

Accelerating HPC applications with Out-of-Order Commit Processors

D.hpc
Ali Hajiabadi
With the end of Moore’s law, improving single-core processor performance in an energy-efficient manner can be extremely difficult. One alternative is to rethink conventional processor design methodologies and propose innovative ideas to unlock additional performance and efficiency. In an attempt to overcome these difficulties, we propose a compiler-informed, non-speculative out-of-order commit processor that attacks the limitations of in-order commit in current out-of-order cores to increase the effective instruction window and use the critical resources of the core more intelligently. We build our core on the open source RISC-V ISA. The hardware and software ecosystem around RISC-V enables building custom hardware and experimenting with new HW/SW cooperative ideas.
While modern out-of-order processors execute instructions out of order to increase instruction-level parallelism, they retire instructions and manage their limited resources (register file, load/store queue, etc.) in program order to guarantee safe instruction retirement. However, this implementation requires instructions to wait for all preceding branches to resolve before they can release their critical resources, which leaves a significant amount of performance on the table. We propose a HW/SW co-design that enables non-speculative out-of-order commit in a lightweight manner, improving performance and efficiency. The key insight of our work is that identifying true branch dependencies can lead to higher performance. Dependency analysis shows that not all instructions depend on the most recent branch in the reorder buffer; holding on to the critical resources of such independent instructions until that branch resolves is therefore a missed opportunity to improve performance. In our processor, the compiler detects true branch dependencies, which enables the hardware to manage critical resources more intelligently. We also introduce a new interface between hardware and OS that enables precise exception handling by exposing recent changes made by out-of-order committed instructions.
In our talk, we will look at the potential of our out-of-order commit core for HPC workloads. Initial studies with C-based HPC applications show promising results, and we intend to show results for a variety of additional HPC workloads to evaluate the potential of the design. We believe our HW/SW co-design might be a way to build the processors of the future. This work will appear in the proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021).
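The difference between in-order commit and commit guided by true branch dependencies can be illustrated with a toy model. The sketch below is not the processor described in the talk: the `Instr` class, the `dep_branch` hint, and the cycle numbers are all hypothetical, and the model ignores exceptions, memory ordering, and every other condition a real non-speculative design must check. It only compares when each instruction in a small window could release its resources under the two policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    name: str
    done_cycle: int                    # cycle at which execution (or branch resolution) finishes
    dep_branch: Optional[int] = None   # hypothetical compiler hint: index of the older branch
                                       # this instruction truly depends on; None = independent


def commit_cycles_in_order(window: List[Instr]) -> List[int]:
    """In-order commit: an instruction commits only after it and every
    older instruction in the window have finished."""
    cycles = []
    latest = 0
    for ins in window:
        latest = max(latest, ins.done_cycle)
        cycles.append(latest)
    return cycles


def commit_cycles_out_of_order(window: List[Instr]) -> List[int]:
    """Toy out-of-order commit: an instruction commits once it has finished
    and the branch it truly depends on (if any) has itself committed."""
    cycles = []
    for ins in window:
        ready = ins.done_cycle
        if ins.dep_branch is not None:
            # The branch is older, so its commit cycle is already known.
            ready = max(ready, cycles[ins.dep_branch])
        cycles.append(ready)
    return cycles


# Made-up window: one slow branch followed by three short instructions.
window = [
    Instr("b0: slow branch", done_cycle=50),
    Instr("i1: add, depends on b0", done_cycle=3, dep_branch=0),
    Instr("i2: mul, independent of b0", done_cycle=4),
    Instr("i3: load, independent of b0", done_cycle=6),
]

print(commit_cycles_in_order(window))      # [50, 50, 50, 50]
print(commit_cycles_out_of_order(window))  # [50, 50, 4, 6]
```

In this made-up window, the two instructions that do not depend on the slow branch free their resources at cycles 4 and 6 instead of waiting until cycle 50, which is the kind of opportunity the abstract refers to.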

Additional information

Type devroom

More sessions

2/6/21
HPC, Big Data and Data Science
Christian Kniep
D.hpc
The container ecosystem spans from spawning a process into an isolated and constrained region of the kernel at the bottom layer, through building and distributing images just above, to discussions on how to schedule a fleet of containers around the world at the very top. While the top layers get all the attention and buzz, this session will baseline the audience's understanding of how to execute containers.
2/6/21
HPC, Big Data and Data Science
Nicolas Poggi
D.hpc
Over the years, there has been extensive and continuous effort to improve Spark SQL's query optimizer and planner in order to generate high-quality query execution plans. One of the biggest improvements is the cost-based optimization framework, which collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values) to help Spark make better decisions when picking the best query plan.
2/6/21
HPC, Big Data and Data Science
Mohammad Norouzi
D.hpc
This talk introduces DiscoPoP, a tool that identifies parallelization opportunities in sequential programs and suggests to programmers how to parallelize them using OpenMP. The tool first identifies computational units (CUs), which, in our terminology, are the atoms of parallelization. Then, it profiles memory accesses inside the source code to detect data dependencies. Mapping dependencies to CUs, we create a data structure which we call the program execution tree (PET). Further, DiscoPoP inspects the ...
2/6/21
HPC, Big Data and Data Science
Alaina Edwards
D.hpc
In this talk we explore two programming models for GPU-accelerated computing in a Fortran application: OpenMP with target directives and CUDA. We use an example application, the Riemann problem, a common problem in fluid dynamics, as our testing ground. This example application is implemented in GenASiS, a code being developed for astrophysics simulations. While OpenMP and CUDA are supported on the Summit supercomputer, its successor, the exascale supercomputer Frontier, will support OpenMP and ...
2/6/21
HPC, Big Data and Data Science
Bob Dröge
D.hpc
The European Environment for Scientific Software Installations (EESSI, pronounced as “easy”) is a collaboration between different HPC sites and industry partners, with the common goal to set up a shared repository of scientific software installations that can be used on a variety of systems, regardless of which flavor/version of Linux distribution or processor architecture is used, or whether it is a full-size HPC cluster, a cloud environment or a personal workstation. The EESSI codebase ...
2/6/21
HPC, Big Data and Data Science
Robert McLay
D.hpc
XALT is a tool run on clusters to find out what programs and libraries are run. XALT uses the environment variable LD_PRELOAD to attach a shared library that executes code before and after main(). This means that the XALT shared library is, in effect, a developer on every program run under Linux, because it is part of every program run. This talk will discuss the various lessons learned about routine names and memory usage. Adding XALT to track container usage presents new issues because of what shared ...
2/6/21
HPC, Big Data and Data Science
Carsten Kutzner
D.hpc
In this session we present several approaches to migrating from traditional HPC to cloud-native, containerized HPC, using an ensemble run of the molecular dynamics code GROMACS as an example. The session will show how containerization via software management is coming to the rescue and what a palatable journey might look like.