March 18, 2014

Experiments Using SSDs with Gluster

This post explores the question: how can Gluster make use of SSDs? It reviews three tests done by the Red Hat performance group, each using SSDs in a different configuration. The configurations vary in cost of ownership and in how much the caching behavior can be tuned.

Testing was done on systems in Red Hat’s performance labs using the LSI Nytro MegaRAID 8110-4e card. The card can be configured for different RAID levels across its disk drives, and it has an attached SSD that can be configured in several ways. Other cards could have been tested; this one was chosen because it was available in the lab.

Here are the tests:

  1. Replace every disk with SSDs (expensive)
  2. Use SSDs as a cache at the LSI controller level (less expensive, no control over cache settings).
  3. Use SSDs as a cache at the kernel level using dm-cache (less expensive, some control over cache settings).

We would expect SSDs to perform best, relative to disks, on random I/O workloads. The SSD should also be larger than RAM, because the Linux buffer cache already absorbs workloads that fit in memory.

How can users realize the benefit of SSDs at the least cost?

Replacing disks with SSDs

The most obvious deployment method is to simply replace all the disks in Gluster with SSDs. This may be prohibitive from a cost perspective, but it suggests an upper bound on what is achievable.

The experiments met expectations: SSDs performed much better than disks in most cases, particularly with small files.

To compare the Nytro SSD to traditional spindles, three modes of accessing storage were tried:

  • pure SSD – just put XFS on top of the boot drive
  • traditional – XFS on a concatenated LVM volume consisting of 8 RAID1 disk drive pairs, with disk-local writeback caching DISABLED (WCE=0, for SCSI people)
    • with write-back caching – fsync’ed writes complete as soon as they reach NVRAM on the Nytro
    • with write-thru caching – fsync’ed writes do NOT complete until the disk drive has written them to the platter

These tests used two workload types, with varying file sizes and numbers of threads; files were placed into directories at random, with sizes drawn from a random exponential distribution.

  • create — opens a brand-new file, writes data to it, fsyncs it (so it persists in the event of a crash or power failure), then closes it
  • read – opens an existing file, reads it, then closes it
operation type   threads   file size   SSD files/sec   write-back files/sec   write-thru files/sec
create               1          4           4325            2485                  n/a
create               1         16           3890            2662                  n/a
create               1         64           2527            1616                  n/a
create               4          4          12648            5354                  192
create               4         16          10785            5077                  189
create               4         64           6027            2761                  175
create              16          4          24110            5803                  492
create              16         16          18596            7427                  496
create              16         64           7963            3852                  461
read                 1          4           9869            5384                  n/a
read                 1         16           6641            4157                  n/a
read                 1         64           4222            2144                  n/a
read                 4          4          22585           14619                11841
read                 4         16          19457            7394                 7109
read                 4         64           9437            5164                 2092
read                16          4          58010            5260                 4286
read                16         16          36636            4356                 3837
read                16         64          10379            3756                 2425

(n/a: no write-thru figure was given for the single-threaded runs.)

Note the highlighted results: with 16 threads, pure SSD delivered roughly 5 times the write-back rate for creates and roughly 10 times the write-back rate for reads.
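As a rough sketch (not the actual benchmark code), the two operation types boil down to the following open/write/fsync/close sequence in Python:

```python
import os
import tempfile

def create_file(path, data):
    """create: open a brand-new file, write data to it, fsync it
    (so it persists across a crash or power failure), then close it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # the create is not counted done until the data is durable
    finally:
        os.close(fd)

def read_file(path):
    """read: open an existing file, read it, close it."""
    with open(path, "rb") as f:
        return f.read()

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = os.path.join(d, "file0")
    create_file(p, b"x" * 4096)  # a small file, like those in the test
    assert read_file(p) == b"x" * 4096
```

The fsync on every create is why write-back versus write-thru caching matters so much in the create numbers: each operation waits for durability, either at the NVRAM or at the platter.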

Using SSDs as a cache at the controller level

By default, the LSI Nytro card uses its SSD as a cache in front of the disks. The hope is that frequently used data will be served quickly from the SSD rather than the disks. The caching policies are internal to the hardware; in effect, the cache is a “black box”. This experiment tries to show how well those policies work.

The tests appeared to show the SSDs had some benefit, but not nearly as significant as when the disks were completely replaced. For example, the best results showed a 70% improvement, while replacing the disks entirely yielded up to a 500% improvement in some cases.

I/O was run directly against RAID-6 volumes, without Gluster, in order to isolate the effect of the SSD. I/O was generated using the smallfile tool, which produces a “small file” workload of random I/O operations.

  • Run swift (object protocol) over XFS
  • fsync after every create
  • extended attributes written to every file
  • A deep directory tree was generated with few files/directory
  • 20 workload generator threads
  • average file size is 64 KB
  • an exponential file size distribution: most files smaller than the chosen file size, with a few much larger
  • 200,000 separate files accessed per thread
  • 5 extended attributes of 32 bytes each accessed per file for swift-put/swift-get operations
  • files randomly sprayed across the directory tree (default is to access directories one at a time)
  • fsync issued by thread after each file is written.

200,000 x 20 = 4 million files were written as part of each test, totaling 256 GB of data, twice the amount of RAM available on the host.
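The exponential size distribution and the random spraying of files across directories can be modeled roughly as below; this is an illustrative sketch with assumed parameters, not the smallfile tool’s actual code:

```python
import random

def exp_file_size_kb(mean_kb=64, rng=random):
    """Draw a file size (KB) from an exponential distribution: most files
    come out smaller than the mean, a few come out much larger."""
    return max(1, int(rng.expovariate(1.0 / mean_kb)))

def pick_directory(dirs, rng=random):
    """Spray files at random across the directory tree, rather than
    filling one directory at a time (the tool's default)."""
    return rng.choice(dirs)

if __name__ == "__main__":
    rng = random.Random(1)
    sizes = [exp_file_size_kb(64, rng) for _ in range(100_000)]
    mean = sum(sizes) / len(sizes)
    median = sorted(sizes)[len(sizes) // 2]
    print(f"mean ~{mean:.0f} KB, median ~{median} KB")  # median well below mean
```

The random directory placement matters for caching tests: it defeats locality that a cache might otherwise exploit, which is the point of the workload.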

Three passes of each test were done. The write tests used the swift-put operation, and the read tests used the swift-get operation.

  • swift-put — create and write the files
  • swift-get – read the files after dropping cache
  • swift-get2 — read the files again after dropping cache;  a 2nd read will detect whether Nytro is starting to cache portions of data/metadata in SSD
  • swift-getcached — read files without dropping the cache, to see the speed when host RAM is allowed to buffer data beforehand
  • swift-getnomem — read files after severely restricting memory usage (this could simulate the effect of having the Nytro configured with far more SSD than RAM)

[Figure: multi-threaded-fsynced-gain]

Using SSDs as a cache at the kernel level (dm-cache)

The Nytro card’s default SSD caching configuration did not produce very impressive improvements. The 3.9 Linux kernel introduced an alternative called dm-cache, a dynamic block-level storage cache that operates at the device-mapper layer within the kernel. Cached blocks reside on a “cache device”, typically an SSD. A related project is bcache.

The dm-cache module has a tunable caching policy (e.g. LRU, MFU), and the file system can send hints to dm-cache to “encourage” blocks to be cached or not cached.

For the test, the Nytro card’s SSD was re-purposed to act as the caching device for the dm-cache. The test compared using dm-cache with using the normal RAID write-back cache on the Nytro controller.
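For reference, a minimal dm-cache setup looks roughly like the following. This is a configuration sketch, not the exact commands used in the test: the device paths are placeholders, and the cache block size and policy are illustrative defaults from the kernel’s device-mapper cache documentation.

```shell
# Placeholders: /dev/sdb is the slow origin device (the RAID volume);
# the metadata and cache devices are partitions carved from the SSD.
ORIGIN=/dev/sdb
META=/dev/nytro-ssd-meta
CACHE=/dev/nytro-ssd-cache

# Size of the origin device in 512-byte sectors.
SECTORS=$(blockdev --getsz "$ORIGIN")

# Create the cached device: 256 KB cache blocks (512 sectors),
# writeback mode, and the default caching policy.
dmsetup create cached --table \
  "0 $SECTORS cache $META $CACHE $ORIGIN 512 1 writeback default 0"

# The cached volume then appears as /dev/mapper/cached and can hold XFS:
# mkfs.xfs /dev/mapper/cached
```

Note that dm-cache keeps its metadata on a separate device from the cache data; where that metadata lives can itself affect performance, which comes up in the caveats below.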

The results showed that dm-cache performed well when there was no caching at the controller level (write-through set on the Nytro). When write-back was set, little benefit was observed in most cases, with the exception of the small-file workload.

[Figure: extjournal-dm-vs-wb]

[Figure: smf-creates-100kf-try2]

The test was preliminary; for example, RAID-10 rather than RAID-6 was used, and the metadata device used by dm-cache was housed on a disk.

That said, dm-cache appears promising. The Nytro’s cache helps performance, but many users prefer JBODs to expensive controllers, and such JBOD users would see worse performance without a controller cache. They may be able to recover that performance by combining dm-cache with SSDs.

 

3 Comments

  1. Dan,

    Thanks for sharing these insights.

    You mention 70% improvement with caching (most cases) and best case with pure SSD as 500% better performance. How about a real-world application, say a database – MySQL / NoSQL – MongoDB? Did you perform any experiments with, say, MySQL (Sysbench benchmark) or a NoSQL DB, say MongoDB (YCSB benchmark)?

    best,
    Shaloo

  2. No, we did not investigate key value stores or noSQL databases. We are sometimes a backend for Swift objects.

    I would be interested in your use case and your company’s technology, particularly how it compares with other similar caching products from companies such as Infinio and PernixData.

  3. SzymonM says:

    Hi,
    I realize it’s been three years since you ran this test, but do you possibly have, and can you share, the test setup and test run details?
    I’ve been trying for two weeks to max out Gluster performance on SSDs in a small-files scenario, and the best I could do is 5.6K files/sec.
    You were able to do 24K files/sec and I have no idea how you did that.

    Best regards,
    Szymon
