One of the salient features in Gluster 3.10 goes by the rather boring – and slightly opaque – name of brick multiplexing. To understand what it is, and why it’s a good thing, read on.
First, let’s review some relevant parts of how Gluster works already. All storage in Gluster is managed as bricks, which are just directories on servers. Often they’re whole disks or volumes on servers, but that doesn’t have to be the case. It has always been possible to have multiple bricks per server, or even per disk, though carving things up into two many pieces could have some unpleasant effects involving various kinds of resources.
- Ports. Each brick has its own port, which means hundreds of bricks could use up hundreds of ports – and hundreds of firewall rules to manage.
- Memory. Some data structures in Gluster are global, associated with the process, while others are associated with the translators within a brick. Replicating these global parts for hundreds of processes can mean a lot of wasted space.
- CPU. Like global memory, each process has a global pool of threads – for handling network I/O, disk I/O, and various “housekeeping” purposes. Replicating these across hundreds of processes can result in many more threads system-wide, and thus more context switching.
Brick multiplexing is just a term for putting multiple bricks into one process. Therefore, many bricks can consume *one* port, *one* set of global data structures, and *one* pool of global threads. This reduces resource consumption, allowing us to run more bricks than before – up to three times as many in some tests involving the very large numbers of bricks that might be involved in a container/hyperconverged kind of configuration.
In addition to reducing overall contention for these resources, brick multiplexing also brings that contention under more direct control. Previously, we were at the mercy of the operating system’s scheduler and paging system to manage this contention. They’d have to make many guesses about what we need, and often they’d guess wrong. We *know*. Now that multiple bricks can run in one process, we can manage contention more carefully to match our priorities and policies. Some day, this will even be the lever we can use to provide multi-tenant isolation and quality of service.
It’s important to note that multiplexing is *not* primarily a performance enhancer. At low brick counts – e.g. less than the number of CPU cores on a system – you’re probably better off not multiplexing. This is true both because of performance and to keep failure domains smaller. In the mid range (hundreds of bricks) multiplexing might or might not perform process-per-brick, depending on workload. Mostly this is not because of multiplexing itself but because of other changes – such as a much more scalable memory-pool implementation – that were developed along with it. There’s still some untapped potential here, so over time multiplexing is likely to improve performance in more cases. At the high end (thousands of bricks) multiplexing is the only option; that many separate brick processes eventually consume so many resources and thrash so much that no useful work gets done.
In short, multiplexing is already helping us address the particular needs of container and hyperconverged workloads. It also provides the infrastructure on which other enhancements can be built which will provider greater benefits to even more users.