Virtualization and IaaS

ML inference acceleration for lightweight VMMs

D.virtualization
The debate over how to deploy applications, as monoliths or as microservices, is in full swing. Part of this discussion concerns how the new paradigm incorporates support for accessing accelerators such as GPUs and FPGAs. Traditional programming models have had such support for the last two decades, and the tooling has evolved to be stable and standardized (e.g., CUDA, OpenCL/OpenACC, TensorFlow). On the other hand, what does it mean for a highly distributed application instance (i.e., a serverless deployment) to access an accelerator? Should the function invoked to classify an image, for instance, link against the whole acceleration runtime and program the hardware device itself? Creating such bloated functions seems quite counter-intuitive.

Things get more complicated at the lower layers of the service architecture. To ensure user and data isolation, infrastructure providers employ virtualization techniques. However, generic hardware accelerators are not designed to be shared by multiple untrusted tenants, and current solutions (device passthrough, API remoting) impose inflexible setups, present security trade-offs, and add significant performance overheads.

To this end, we introduce vAccel, a lightweight framework that exposes hardware acceleration functionality to VM tenants. Our framework is based on a thin runtime system, vAccelRT, which is essentially an acceleration API: it offers a set of operators that use generic hardware acceleration frameworks to increase performance, such as machine learning and linear algebra operators. vAccelRT abstracts away any hardware- or vendor-specific code through a modular design: backends implement bindings for popular acceleration frameworks, while the frontend exposes a function prototype for each available acceleration function. On top of that, an optimized paravirtual interface exposes vAccelRT to a VM's user space, where applications can benefit from hardware acceleration via a simple function call.

In this talk we present the design and implementation of vAccel on two KVM VMMs: QEMU and AWS Firecracker. We give a brief description of the design and focus on the key aspects of enabling hardware acceleration for machine learning inference in lightweight VMs, on both x86_64 and aarch64 architectures. Our current implementation supports jetson-inference & TensorRT, as well as the Google Coral TPU, while facilitating integration with NVIDIA GPUs (CUDA) and Intel Iris GPUs (OpenCL). Finally, we present a demo of vAccel in action, using a containerized environment to simplify configuration and deployment.

[1] https://blog.cloudkernels.net/posts/vaccel
[2] https://blog.cloudkernels.net/posts/vaccel_v2
[3] https://vaccel.org
[4] https://github.com/nubificus/docker-jetson-inference
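To give a flavor of the programming model, the sketch below shows what a guest application calling an image-classification operator through a vAccelRT-style frontend could look like. This is a minimal illustration, not the project's confirmed API: the names (vaccel_session_init, vaccel_image_classify, and friends) are hypothetical placeholders standing in for the real interface documented at [3].

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical vAccelRT-style frontend declarations (assumed, for
     * illustration). The guest links against a thin runtime only; the
     * paravirtual transport and the host-side backend (TensorRT, Coral
     * TPU, ...) are chosen by host configuration, not application code. */
    struct vaccel_session;
    int  vaccel_session_init(struct vaccel_session **sess);
    int  vaccel_image_classify(struct vaccel_session *sess,
                               const void *img, size_t img_len,
                               char *tags_out, size_t tags_len);
    void vaccel_session_free(struct vaccel_session *sess);

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <image-file>\n", argv[0]);
            return 1;
        }

        /* Read the image file into a memory buffer. */
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }
        fseek(f, 0, SEEK_END);
        long len = ftell(f);
        fseek(f, 0, SEEK_SET);
        void *img = malloc(len);
        if (!img || fread(img, 1, len, f) != (size_t)len) {
            fclose(f);
            return 1;
        }
        fclose(f);

        /* From the guest's point of view this is a plain function call;
         * the runtime forwards it to the accelerator on the host. */
        struct vaccel_session *sess;
        char tags[1024];
        if (vaccel_session_init(&sess) != 0 ||
            vaccel_image_classify(sess, img, len, tags, sizeof(tags)) != 0) {
            fprintf(stderr, "vAccel operation failed\n");
            free(img);
            return 1;
        }
        printf("classification: %s\n", tags);

        vaccel_session_free(sess);
        free(img);
        return 0;
    }

The point of the modular design is that a guest binary like this runs unmodified regardless of which backend the host plugs in; swapping TensorRT for a Coral TPU is a host-side configuration change, not an application change.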

Additional information

Type: devroom

More sessions

2/6/21
Virtualization and IaaS
Simone Tiraboschi
D.virtualization
KubeVirt enables developers to run containerized applications and virtual machines in a common, shared Kubernetes/OKD/OpenShift environment. An Operator is a method of packaging, deploying and managing a Kubernetes/OpenShift application. The Hyperconverged Cluster Operator is a unified operator that deploys and controls KubeVirt and several adjacent operators in a controlled and opinionated way.
2/6/21
Virtualization and IaaS
Miguel Barroso
D.virtualization
KubeVirt's architecture is composed of two main components: virt-handler, a trusted DaemonSet running on each node, which operates as the virtualization agent, and virt-launcher, an untrusted Kubernetes pod encapsulating a single libvirt + qemu process. To reduce the attack surface of the overall solution, the untrusted virt-launcher component should run with as few Linux capabilities as possible. The goal of this talk is to explain the journey to get there, and the steps taken to drop CAP ...
2/6/21
Virtualization and IaaS
D.virtualization
VM sockets (vsock) enable communication between hosts and VMs. The vsock use cases have grown over recent years to also cover cloud and container projects. Andra and Stefano will walk through the details of a set of isolation-focused projects that use vsock as a communication channel. They will then present debugging tools and further work items for improving vsock and adding new features.
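For readers unfamiliar with the API, here is a minimal guest-side C sketch using the standard Linux vsock interface (AF_VSOCK and struct sockaddr_vm from <linux/vm_sockets.h>); the port number is an arbitrary example value.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>   /* struct sockaddr_vm, VMADDR_CID_HOST */

    int main(void)
    {
        /* Create a vsock stream socket: it behaves like TCP, but
         * addresses are (CID, port) pairs instead of (IP, port). */
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Connect from the guest to a service listening on the host.
         * VMADDR_CID_HOST is the well-known CID of the host; port 1234
         * is just an example. */
        struct sockaddr_vm addr = {
            .svm_family = AF_VSOCK,
            .svm_cid    = VMADDR_CID_HOST,
            .svm_port   = 1234,
        };
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            close(fd);
            return 1;
        }

        const char msg[] = "hello from the guest\n";
        write(fd, msg, sizeof(msg) - 1);
        close(fd);
        return 0;
    }

Because addressing uses (CID, port) pairs rather than IP addresses, the channel needs no network configuration inside the guest, which is what makes it attractive for the isolation-focused projects this session covers.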
2/6/21
Virtualization and IaaS
Jakub Dżon
D.virtualization
Operator SDK is a solid foundation for building robust applications for Kubernetes. One such application is the VM import operator (https://github.com/kubevirt/vm-import-operator), which allows Kubernetes administrators to easily import their oVirt-managed virtual machines into KubeVirt. In this talk, the speaker will show how his team used Operator SDK to build the VM import operator and how that operator can be used.
2/6/21
Virtualization and IaaS
D.virtualization
In this session, participants will get an overview of the new oVirt monitoring feature, including its data warehouse (DWH) and Grafana, its architecture, and a demo. The session will also cover creating new dashboards based on the oVirt DWH schema; for that part, attendees should be familiar with SQL querying.
2/6/21
Virtualization and IaaS
Christian Gonzalez
D.virtualization
OpenNebula has recently incorporated a newly supported hypervisor: Firecracker. This next-generation virtualization technology was launched by AWS in late 2018 and is designed for secure multi-tenant container-based services. The integration offers a way out of the classic dilemma between containers (lighter, but with weaker security) and virtual machines (strong security, but high overhead). Firecracker is an open source technology that makes use of KVM to launch ...
2/6/21
Virtualization and IaaS
Simon Kuenzer
D.virtualization
Cloud computing has revolutionized the way we think about IT infrastructure: another web server? More database capacity? Resources for your artificial intelligence use case? Just spin up another instance and you are good to go. While most cloud images (e.g., AMIs on Amazon EC2) are meant to run a single service (e.g., nginx), for convenience they tend to be built on top of general-purpose OSes and full distributions, often resulting in GB-sized images that sometimes only need to perform a ...