Software Defined Storage

Ephemeral Pinning: A Dynamic Metadata Management Strategy for CephFS

H.1308 (Rolin)
Sidharth Anupkrishnan
Having a separate cluster of Metadata Servers (MDS) is a well-known design strategy among distributed file-system architectures. One challenge faced by this approach is how to distribute metadata among the MDSs. Unlike data storage and its associated I/O throughput, which can be scaled linearly with the number of storage devices, file-system metadata is a fairly complex entity to scale due to its hierarchical nature. At first glance, a pure hashing-based metadata distribution strategy seems like a perfect fit, but this is not exactly the case. What are the pitfalls, then? Pure hashing causes too many inter-MDS hops (due to POSIX traversal semantics) and loses hierarchical locality, which degrades file-system performance. As a result, it is a poor fit for a workload whose directory hierarchy tree grows in depth rather than breadth. CephFS's metadata balancer takes a different approach by partitioning metadata sub-trees across MDSs, thereby preserving locality. Although efficient, this involves a lot of back-and-forth migration of sub-trees, and the locality benefits are sometimes outweighed by sub-optimal distributions. In this talk, we present a new metadata distribution strategy employed in CephFS: Ephemeral Pinning. This strategy combines the benefits of hashing and naive sub-tree partitioning by intelligently pinning sub-trees to MDSs so as to obtain a balanced distribution as the workload metadata grows in depth and breadth. A consistent-hashing-based load balancer helps maintain an optimal distribution when MDSs are added or fail.
This talk will cover the following key ideas: how metadata is handled in distributed file systems; why it is so important to have an optimal distribution of metadata among Metadata Servers; the drawbacks and advantages of commonly used metadata distribution strategies; and Ephemeral Pinning, the new metadata distribution strategy employed by CephFS. We will explain the design and implementation of the distribution strategy and delineate its strong suits. This talk should be beneficial for every distributed file-system project that handles file metadata separately. Attendees will get an overview of existing metadata distribution strategies, their pitfalls and benefits, and the reason why we at CephFS came up with this approach. The benefits of using consistent hashing for distributing metadata are also discussed.
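
To make the consistent-hashing idea concrete, here is a minimal Python sketch of how directory sub-trees could be pinned to MDS ranks through a hash ring. It is not the CephFS implementation: the `HashRing` class, the `mds.<rank>#<i>` point labels, the choice of SHA-256, and the number of virtual nodes are all assumptions made for illustration. It only demonstrates the property the abstract relies on, namely that adding or losing an MDS remaps just the sub-trees owned by that MDS.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (hypothetical choice of hash)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring mapping directory paths to MDS ranks."""

    def __init__(self, ranks, vnodes=64):
        self.vnodes = vnodes      # virtual nodes per rank, to smooth the distribution
        self._points = []         # sorted list of (ring position, rank)
        for rank in ranks:
            self.add(rank)

    def add(self, rank):
        """Place `vnodes` points for a new MDS rank on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"mds.{rank}#{i}"), rank))

    def remove(self, rank):
        """Drop every point owned by a failed or stopped MDS rank."""
        self._points = [p for p in self._points if p[1] != rank]

    def lookup(self, path):
        """Return the rank owning the first ring point clockwise from hash(path)."""
        if not self._points:
            raise ValueError("no MDS ranks in the ring")
        idx = bisect.bisect_right(self._points, (_hash(path), float("inf")))
        return self._points[idx % len(self._points)][1]


if __name__ == "__main__":
    ring = HashRing([0, 1, 2])
    dirs = [f"/home/user{i}" for i in range(200)]
    before = {d: ring.lookup(d) for d in dirs}

    ring.add(3)  # a new MDS joins the cluster
    after = {d: ring.lookup(d) for d in dirs}
    moved = sum(1 for d in dirs if before[d] != after[d])
    print(f"{moved} of {len(dirs)} sub-trees moved, all of them to rank 3")
```

Because each directory is hashed as a whole, everything beneath it stays on one MDS, which keeps sub-tree locality while the ring spreads sibling directories across ranks.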

Additional information

Type devroom

More sessions

2/2/20
Software Defined Storage
Pritha Srivastava
H.1308 (Rolin)
Ceph is an open source, highly scalable, software defined storage that provides object, file and block interfaces under a unified system. Ceph Object Storage Gateway (RGW) provides a RESTful object storage interface to the Ceph Storage cluster. It provides an interface that is compatible with a large subset of AWS S3 APIs. In this talk we discuss the implementation of a subset of the APIs of AWS Secure Token Service (STS). AWS STS is a web service which enables identity federation and ...
2/2/20
Software Defined Storage
Arjun Sharma
H.1308 (Rolin)
NFS-Ganesha is an extensible user-space NFS server that supports NFS v3, v4, v4.1, v4.2, pNFS, and 9P protocol. It has an easily pluggable architecture called FSAL (File System Abstraction Layer), which enables seamless integration with many filesystem backends (GlusterFS, Ceph, etc.). There will be a discussion on the components along with an architectural explanation of NFS Ganesha with a detailed look at how a request flows through the various layers of NFS Ganesha and see some critical ...
2/2/20
Software Defined Storage
Hari Gowtham
H.1308 (Rolin)
As data becomes more and more important in the world, we can't afford to lose it even if there is a natural calamity. We will see how Geo-Replication came in to solve this problem for us and how it evolved over time. Through this session, users will learn how easy it is to set up Georep for Gluster, use it for their storage, and back up their data with minimal understanding of storage and Linux. Having basic Gluster knowledge will make it even easier.
2/2/20
Software Defined Storage
Harshita Sharma
H.1308 (Rolin)
While running in user space, ZFS utilizes a user-space binary called ztest. In cStor, we followed a similar approach to create a binary called ‘zrepl’ that is part of cStor. It has been built using libraries similar to those used for ztest and contains transactional, pooled storage layers. cStor uses ZFS behind the scenes by running it in user space. In this talk we will discuss in detail how we used ZFS in user space for the cStor storage engine and highlight a few challenges that our team ...
2/2/20
Software Defined Storage
Jeremy Allison
H.1308 (Rolin)
The presentation will give an overview of all the changes happening in the Samba project code, from the fileserver virtual filesystem (VFS) rewrite, the new features in the SMB3 code, the quest to remove the old SMB1 protocol and much more. Improvements in Samba scalability, clustering and the Active Directory code will be discussed. The intended audience is anyone who uses the Samba code, creates products with Samba or is interested in the SMB protocol.
2/2/20
Software Defined Storage
H.1308 (Rolin)
Metadata-heavy workloads are often the bane of networked and clustered filesystems. Directory operations (create and unlink, in particular) usually involve making a synchronous request to a server on the network, which can be very slow. CephFS, however, has a novel mechanism for delegating to clients the ability to do certain operations locally. While that mechanism has mostly been used to delegate capabilities on normal files in the past, it's possible to extend this to cover certain types of ...
2/2/20
Software Defined Storage
Alexander Trost
H.1308 (Rolin)
What Rook is, and the architecture Rook uses to run storage in Kubernetes. We'll also take a look at new features added to Rook.