

Posted on September 2, 2016

delete, info, config: GlusterFS Snapshots CLI Part 2

Now that we know how to create GlusterFS snapshots, it will be handy to know how to delete them as well.
Right now I have a cluster with two volumes at my disposal. As can be seen below, each volume has one brick.

# gluster volume info

Volume Name: test_vol
Type: Distribute
Volume ID: 74e21265-7060-48c5-9f32-faadaf986d85
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: VM1:/brick/brick-dirs1/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Volume Name: test_vol1
Type: Distribute
Volume ID: b6698e0f-748f-4667-8956-ec66dd91bd84
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: VM2:/brick/brick-dirs/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

We are going to take a bunch of snapshots for both these volumes using the create command.

# gluster snapshot create snap1 test_vol no-timestamp
snapshot create: success: Snap snap1 created successfully
# gluster snapshot create snap2 test_vol no-timestamp
snapshot create: success: Snap snap2 created successfully
# gluster snapshot create snap3 test_vol no-timestamp
snapshot create: success: Snap snap3 created successfully
# gluster snapshot create snap4 test_vol1 no-timestamp
snapshot create: success: Snap snap4 created successfully
# gluster snapshot create snap5 test_vol1 no-timestamp
snapshot create: success: Snap snap5 created successfully
# gluster snapshot create snap6 test_vol1 no-timestamp
snapshot create: success: Snap snap6 created successfully
# gluster snapshot list
snap1
snap2
snap3
snap4
snap5
snap6
#

Now we have three snapshots for each volume. To delete a snapshot, we use the delete command along with the snap name.

# gluster snapshot delete snap1
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap1: snap removed successfully
# gluster snapshot list
snap2
snap3
snap4
snap5
snap6
#

We can also choose to delete all snapshots that belong to a particular volume. Before doing that, let's see which snapshots are present for the volume “test_vol”. Apart from snapshot list, there is also the snapshot info command, which provides more elaborate details about snapshots. Like snapshot list, snapshot info can take a volume name as an option to show information about only that volume's snapshots.

# gluster snapshot list test_vol
snap2
snap3
# gluster snapshot info volume test_vol
Volume Name               : test_vol
Snaps Taken               : 2
Snaps Available           : 254
Snapshot                  : snap2
Snap UUID                 : d17fbfac-1cb1-4276-9b96-0b73b90fb545
Created                   : 2016-07-15 09:32:07
Status                    : Stopped

Snapshot                  : snap3
Snap UUID                 : 0f319761-eca2-491e-b678-75b56790f3a0
Created                   : 2016-07-15 09:32:12
Status                    : Stopped
#

As we can see from both the list and the info commands, test_vol has two snapshots: snap2 and snap3. Instead of deleting these snapshots one by one, we can choose to delete all snapshots that belong to a particular volume, in this case test_vol.

# gluster snapshot delete volume test_vol
Volume (test_vol) contains 2 snapshot(s).
Do you still want to continue and delete them?  (y/n) y
snapshot delete: snap2: snap removed successfully
snapshot delete: snap3: snap removed successfully
#
# gluster snapshot list
snap4
snap5
snap6
# gluster snapshot list test_vol
No snapshots present
# gluster snapshot info volume test_vol
Volume Name               : test_vol
Snaps Taken               : 0
Snaps Available           : 256
#

With the above volume option we successfully deleted both snapshots of test_vol with a single command. Now only three snapshots remain, all of which belong to the volume “test_vol1”. Before proceeding further, let's create one more snapshot for the volume “test_vol”.

# gluster snapshot create snap7 test_vol no-timestamp
snapshot create: success: Snap snap7 created successfully
# gluster snapshot list
snap4
snap5
snap6
snap7
#

With this we have four snapshots: three belong to test_vol1 and one belongs to test_vol. Now, with the ‘delete all’ command, we can delete all snapshots present in the system, irrespective of which volume they belong to.

 # gluster snapshot delete all
System contains 4 snapshot(s).
Do you still want to continue and delete them?  (y/n) y
snapshot delete: snap4: snap removed successfully
snapshot delete: snap5: snap removed successfully
snapshot delete: snap6: snap removed successfully
snapshot delete: snap7: snap removed successfully
# gluster snapshot list
No snapshots present
#
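
To summarize, the snapshot delete command we have used takes one of three forms: delete a single snapshot by name, delete every snapshot of one volume, or delete every snapshot in the cluster.

# gluster snapshot delete <snapname>
# gluster snapshot delete volume <volname>
# gluster snapshot delete all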

So that is how you delete GlusterFS snapshots. There are some configurable options for Gluster snapshots, which can be viewed and modified using the snapshot config option.

# gluster snapshot config

Snapshot System Configuration:
snap-max-hard-limit : 256
snap-max-soft-limit : 90%
auto-delete : disable
activate-on-create : disable

Snapshot Volume Configuration:

Volume : test_vol
snap-max-hard-limit : 256
Effective snap-max-hard-limit : 256
Effective snap-max-soft-limit : 230 (90%)

Volume : test_vol1
snap-max-hard-limit : 256
Effective snap-max-hard-limit : 256
Effective snap-max-soft-limit : 230 (90%)
#

Just running the config command, as shown above, displays the current configuration of the system. What we are looking at are the default configuration values. There are four different configurable parameters. Let's go through them one by one.
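
All four parameters are modified through the same command; its general shape (paraphrased from the CLI usage message that appears later in this post) is roughly:

# gluster snapshot config [volname] [snap-max-hard-limit <count>] [snap-max-soft-limit <percent>] [auto-delete <enable|disable>] [activate-on-create <enable|disable>]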

  • snap-max-hard-limit: Set to 256 by default, snap-max-hard-limit is the maximum number of snapshots that may exist for a volume (and, as we will see, for the system as a whole). Once a volume reaches this limit in terms of the number of snapshots it has, we are not allowed to create any more snapshots of it, unless we either delete a snapshot or increase this limit.

    # gluster snapshot config test_vol snap-max-hard-limit 2
    Changing snapshot-max-hard-limit will limit the creation of new snapshots if they exceed the new limit.
    Do you want to continue? (y/n) y
    snapshot config: snap-max-hard-limit for test_vol set successfully
    # gluster snapshot config

    Snapshot System Configuration:
    snap-max-hard-limit : 256
    snap-max-soft-limit : 90%
    auto-delete : disable
    activate-on-create : disable

    Snapshot Volume Configuration:

    Volume : test_vol
    snap-max-hard-limit : 2
    Effective snap-max-hard-limit : 2
    Effective snap-max-soft-limit : 1 (90%)

    Volume : test_vol1
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 256
    Effective snap-max-soft-limit : 230 (90%)
    #
    #
    # gluster snapshot info volume test_vol
    Volume Name               : test_vol
    Snaps Taken               : 0
    Snaps Available           : 2
    #

    As can be seen with the config command, I have modified the snap-max-hard-limit for the volume test_vol to 2. This means that after taking two snapshots it will not allow me to take any more, till I either delete one of them or increase this value. Note how the snapshot info for the volume test_vol now shows ‘Snaps Available’ as 2.

    # gluster snapshot create snap1 test_vol no-timestamp
    snapshot create: success: Snap snap1 created successfully
    # gluster snapshot create snap2 test_vol no-timestamp
    snapshot create: success: Snap snap2 created successfully
    Warning: Soft-limit of volume (test_vol) is reached. Snapshot creation is not possible once hard-limit is reached.
    #
    #
    # gluster snapshot info volume test_vol
    Volume Name               : test_vol
    Snaps Taken               : 2
    Snaps Available           : 0
    Snapshot                  : snap1
    Snap UUID                 : 2ee5f237-d4d2-47a6-8a0c-53a887b33b26
    Created                   : 2016-07-15 10:12:55
    Status                    : Stopped

    Snapshot                  : snap2
    Snap UUID                 : 2c74925e-4c75-4824-b39e-7e1e22f3b758
    Created                   : 2016-07-15 10:13:02
    Status                    : Stopped

    #
    # gluster snapshot create snap3 test_vol no-timestamp
    snapshot create: failed: The number of existing snaps has reached the effective maximum limit of 2, for the volume (test_vol). Please delete few snapshots before taking further snapshots.
    Snapshot command failed
    #

    What we have done above is create two snapshots for the volume test_vol, thereby reaching its snap-max-hard-limit. Notice two things here: first, when we created the second snapshot we got a warning that the soft-limit for this volume has been reached (we will come to the soft-limit in a while); second, ‘Snaps Available’ in snapshot info has now become 0. As explained, when we try to take the third snapshot it fails, telling us that we have reached the maximum limit and asking us to delete a few snapshots first.

    # gluster snapshot delete snap1
    Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
    snapshot delete: snap1: snap removed successfully
    # gluster snapshot create snap3 test_vol no-timestamp
    snapshot create: success: Snap snap3 created successfully
    Warning: Soft-limit of volume (test_vol) is reached. Snapshot creation is not possible once hard-limit is reached.
    #
    # gluster snapshot config test_vol snap-max-hard-limit 3
    Changing snapshot-max-hard-limit will limit the creation of new snapshots if they exceed the new limit.
    Do you want to continue? (y/n) y
    snapshot config: snap-max-hard-limit for test_vol set successfully
    # gluster snapshot info volume test_vol
    Volume Name               : test_vol
    Snaps Taken               : 2
    Snaps Available           : 1
    Snapshot                  : snap2
    Snap UUID                 : 2c74925e-4c75-4824-b39e-7e1e22f3b758
    Created                   : 2016-07-15 10:13:02
    Status                    : Stopped

    Snapshot                  : snap3
    Snap UUID                 : bfd080f3-848e-490a-83ed-066858bd96fc
    Created                   : 2016-07-15 10:19:17
    Status                    : Stopped

    # gluster snapshot create snap4 test_vol no-timestamp
    snapshot create: success: Snap snap4 created successfully
    Warning: Soft-limit of volume (test_vol) is reached. Snapshot creation is not possible once hard-limit is reached.
    #

    As seen above, once we delete a snapshot the system allows us to create another one. It also allows us to do so when we increase the snap-max-hard-limit. I am curious to see what happens when we have hit the snap-max-hard-limit and then go ahead and decrease the limit even further. Does the system delete snapshots to bring their number down to the new limit?

    # gluster snapshot config test_vol snap-max-hard-limit 1
    Changing snapshot-max-hard-limit will limit the creation of new snapshots if they exceed the new limit.
    Do you want to continue? (y/n) y
    snapshot config: snap-max-hard-limit for test_vol set successfully
    # gluster snapshot config

    Snapshot System Configuration:
    snap-max-hard-limit : 256
    snap-max-soft-limit : 90%
    auto-delete : disable
    activate-on-create : disable

    Snapshot Volume Configuration:

    Volume : test_vol
    snap-max-hard-limit : 1
    Effective snap-max-hard-limit : 1
    Effective snap-max-soft-limit : 0 (90%)

    Volume : test_vol1
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 256
    Effective snap-max-soft-limit : 230 (90%)
    # gluster snapshot info volume test_vol
    Volume Name               : test_vol
    Snaps Taken               : 3
    Snaps Available           : 0
    Snapshot                  : snap2
    Snap UUID                 : 2c74925e-4c75-4824-b39e-7e1e22f3b758
    Created                   : 2016-07-15 10:13:02
    Status                    : Stopped

    Snapshot                  : snap3
    Snap UUID                 : bfd080f3-848e-490a-83ed-066858bd96fc
    Created                   : 2016-07-15 10:19:17
    Status                    : Stopped

    Snapshot                  : snap4
    Snap UUID                 : bd9a5297-0eb5-47d1-b250-9b57f4e57427
    Created                   : 2016-07-15 10:20:08
    Status                    : Stopped

    #
    # gluster snapshot create snap5 test_vol no-timestamp
    snapshot create: failed: The number of existing snaps has reached the effective maximum limit of 1, for the volume (test_vol). Please delete few snapshots before taking further snapshots.
    Snapshot command failed
    #

    So the answer to that question is a big NO. Snapshots are not deleted when you decrease the snap-max-hard-limit to a number below the current number of snapshots; doing so would make it far too easy to lose important snapshots. What the system does instead is refuse to create new snapshots, till you… (yeah, you guessed it right) either delete a snapshot or increase the snap-max-hard-limit.

    snap-max-hard-limit is both a system config and a volume config, which means we can set this value for individual volumes and we can also set a system-wide value.

    # gluster snapshot config snap-max-hard-limit 10
    Changing snapshot-max-hard-limit will limit the creation of new snapshots if they exceed the new limit.
    Do you want to continue? (y/n) y
    snapshot config: snap-max-hard-limit for System set successfully
    # gluster snapshot config

    Snapshot System Configuration:
    snap-max-hard-limit : 10
    snap-max-soft-limit : 90%
    auto-delete : disable
    activate-on-create : disable

    Snapshot Volume Configuration:

    Volume : test_vol
    snap-max-hard-limit : 1
    Effective snap-max-hard-limit : 1
    Effective snap-max-soft-limit : 0 (90%)

    Volume : test_vol1
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 10
    Effective snap-max-soft-limit : 9 (90%)
    #

    Notice how not mentioning a volume name in snapshot config sets that particular option for the whole system instead of for a particular volume, as is clearly visible in the ‘Snapshot System Configuration’ section of the snapshot config output. Think of this system option as an umbrella limit for the entire cluster. You are still allowed to configure an individual volume's snap-max-hard-limit: if the volume's limit is lower than the system's limit it is honored, otherwise the system limit is honored.

    For example, we can see that the system snap-max-hard-limit is set to 10. In the case of the volume test_vol, the snap-max-hard-limit for the volume is set to 1, which is lower than the system's limit and is hence honored, making the effective snap-max-hard-limit 1. This effective snap-max-hard-limit is the limit that is taken into consideration during snapshot create, and, together with the number of snapshots already taken, determines the ‘Snaps Available’ shown in snapshot info. Similarly, for the volume test_vol1, the snap-max-hard-limit is 256, which is higher than the system's limit and is hence not honored, making the effective snap-max-hard-limit of that volume 10, the system's snap-max-hard-limit. Pretty intuitive, huh!
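
    To make that calculation explicit, here is the arithmetic implied by the config output above, written as a small shell sketch (my own illustration, not GlusterFS code):

    vol_hard=1; sys_hard=10; soft_pct=90                        # values taken from the config output above
    eff_hard=$(( vol_hard < sys_hard ? vol_hard : sys_hard ))   # the lower of the two hard-limits wins
    eff_soft=$(( eff_hard * soft_pct / 100 ))                   # soft-limit is a percentage of the effective hard-limit
    echo "effective hard-limit: $eff_hard, effective soft-limit: $eff_soft"

    For test_vol this prints an effective hard-limit of 1 and an effective soft-limit of 0; plugging in 256 and 10 for test_vol1 gives 10 and 9, matching the config output.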

  • snap-max-soft-limit: This option is set as a percentage (of snap-max-hard-limit), and as we have seen in the examples above, on crossing this limit a warning is shown saying that the soft-limit has been reached. It serves as a reminder that you are nearing the hard-limit and should do something about it in order to keep taking snapshots. By default the snap-max-soft-limit is set to 90%, and it can be modified using the snapshot config command.

    # gluster snapshot config test_vol snap-max-soft-limit 50
    Soft limit cannot be set to individual volumes.
    Usage: snapshot config [volname] ([snap-max-hard-limit <count>] [snap-max-soft-limit <percent>]) | ([auto-delete <enable|disable>])| ([activate-on-create <enable|disable>])
    #

    So what do we have here… Yes, snap-max-soft-limit is a system option only and cannot be set for individual volumes. When the snap-max-soft-limit option is set for the system, it is applied to the effective snap-max-hard-limit of each volume to arrive at that volume's effective snap-max-soft-limit.

    # gluster snapshot config snap-max-soft-limit 50
    If Auto-delete is enabled, snap-max-soft-limit will trigger deletion of oldest snapshot, on the creation of new snapshot, when the snap-max-soft-limit is reached.
    Do you want to change the snap-max-soft-limit? (y/n) y
    snapshot config: snap-max-soft-limit for System set successfully
    # gluster snapshot config

    Snapshot System Configuration:
    snap-max-hard-limit : 10
    snap-max-soft-limit : 50%
    auto-delete : disable
    activate-on-create : disable

    Snapshot Volume Configuration:

    Volume : test_vol
    snap-max-hard-limit : 1
    Effective snap-max-hard-limit : 1
    Effective snap-max-soft-limit : 0 (50%)

    Volume : test_vol1
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 10
    Effective snap-max-soft-limit : 5 (50%)
    #

    As we can see above, setting the option at the system level applies the new percentage to each volume's effective snap-max-hard-limit (see test_vol1) to produce that volume's effective snap-max-soft-limit.

    I am sure the keen-eyed observer in you has noticed the auto-delete warning in the output above, and it's just as well, because auto-delete is our third configurable parameter.

  • auto-delete: This option is tightly tied to snap-max-soft-limit, or rather to the effective snap-max-soft-limit of individual volumes. It is, however, a system option and cannot be set for individual volumes. When it is enabled and a volume exceeds its effective snap-max-soft-limit, the oldest snapshot of that volume is automatically deleted, making sure the total number of snapshots never exceeds the effective snap-max-soft-limit (and hence never reaches the effective snap-max-hard-limit), so you can keep taking snapshots without hassle.

    NOTE: Extreme Caution Should Be Exercised When Enabling This Option, As It Automatically Deletes The Oldest Snapshot Of A Volume, When The Number Of Snapshots For That Volume Exceeds The Effective snap-max-soft-limit Of That Volume.

    # gluster snapshot config auto-delete enable
    snapshot config: auto-delete successfully set
    # gluster snapshot config

    Snapshot System Configuration:
    snap-max-hard-limit : 10
    snap-max-soft-limit : 50%
    auto-delete : enable
    activate-on-create : disable

    Snapshot Volume Configuration:

    Volume : test_vol
    snap-max-hard-limit : 1
    Effective snap-max-hard-limit : 1
    Effective snap-max-soft-limit : 0 (50%)

    Volume : test_vol1
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 10
    Effective snap-max-soft-limit : 5 (50%)
    #
    # gluster snapshot list
    snap2
    snap3
    snap4
    # gluster snapshot delete all
    System contains 3 snapshot(s).
    Do you still want to continue and delete them?  (y/n) y
    snapshot delete: snap2: snap removed successfully
    snapshot delete: snap3: snap removed successfully
    snapshot delete: snap4: snap removed successfully
    # gluster snapshot create snap1 test_vol1 no-timestamp
    snapshot create: success: Snap snap1 created successfully
    # gluster snapshot create snap2 test_vol1 no-timestamp
    snapshot create: success: Snap snap2 created successfully
    # gluster snapshot create snap3 test_vol1 no-timestamp
    snapshot create: success: Snap snap3 created successfully
    # gluster snapshot create snap4 test_vol1 no-timestamp
    snapshot create: success: Snap snap4 created successfully
    # gluster snapshot create snap5 test_vol1 no-timestamp
    snapshot create: success: Snap snap5 created successfully

    In the above example, we first enable the auto-delete option in snapshot config, and then delete all the snapshots currently in the system. Next we create five snapshots for test_vol1, whose effective snap-max-soft-limit is 5. On creating one more snapshot, we will exceed the limit and the oldest snapshot will be deleted.

    # gluster snapshot create snap6 test_vol1 no-timestamp
    snapshot create: success: Snap snap6 created successfully
    #
    # gluster snapshot list volume test_vol1
    snap2
    snap3
    snap4
    snap5
    snap6
    #

    As soon as we create snap6, the total number of snapshots becomes 6, exceeding the effective snap-max-soft-limit for test_vol1. The oldest snapshot of test_vol1 (which is snap1) is then deleted in the background, bringing the total number of snapshots back to 5.

  • activate-on-create: As we discussed while creating snapshots, a snapshot is in a deactivated state by default when created, and needs to be activated before it can be used (the activate command itself is shown at the end of this section). On enabling this option in snapshot config, every snapshot created thereafter is activated automatically on creation. This too is a system option, and cannot be set for individual volumes.

    # gluster snapshot status snap6

    Snap Name : snap6
    Snap UUID : 7fc0a0e7-950d-4c1b-913d-caea6037e633

    Brick Path        :   VM2:/var/run/gluster/snaps/db383315d5a448d6973f71ae3e45573e/brick1/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   1.80
    LV Size           :   616.00m

    #
    # gluster snapshot config activate-on-create enable
    snapshot config: activate-on-create successfully set
    # gluster snapshot config

    Snapshot System Configuration:
    snap-max-hard-limit : 10
    snap-max-soft-limit : 50%
    auto-delete : enable
    activate-on-create : enable

    Snapshot Volume Configuration:

    Volume : test_vol
    snap-max-hard-limit : 1
    Effective snap-max-hard-limit : 1
    Effective snap-max-soft-limit : 0 (50%)

    Volume : test_vol1
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 10
    Effective snap-max-soft-limit : 5 (50%)
    # gluster snapshot create snap7 test_vol1 no-timestamp
    snapshot create: success: Snap snap7 created successfully
    # gluster snapshot status snap7

    Snap Name : snap7
    Snap UUID : b1864a86-1fa4-4d42-b20a-3d95c2f9e277

    Brick Path        :   VM2:/var/run/gluster/snaps/38b1d9a2f3d24b0eb224f142ae5d33ca/brick1/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   Yes
    Brick PID         :   6731
    Data Percentage   :   1.80
    LV Size           :   616.00m

    #

    As can be seen, while this option was disabled snap6 was not activated on creation (Brick Running: No). After enabling the option, snap7 was in an activated state right after creation (Brick Running: Yes).
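
    For completeness, a snapshot created while activate-on-create was disabled (snap6 above, for instance) can still be activated, and later deactivated, manually. The two commands below only illustrate the syntax; they are not part of the session above.

    # gluster snapshot activate snap6
    # gluster snapshot deactivate snap6

    In the next post we will be discussing snapshot restore and snapshot clone.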


Posted on January 31, 2013

Volume Files and a Sneak Peek at Translators

In my last post, we went through the three vanilla types of volumes: Distribute, Replicate and Stripe, and we now have a basic understanding of what each of them does. I also mentioned that we would be creating volumes which are a mix of these three types. But before we do so, let's have a look at volume files.

Whenever a volume is created, the corresponding volume files are also created. Volume files are located in a directory (bearing the same name as the volume) inside /var/lib/glusterd/vols/. Let's have a look at the volume files for the distribute volume (test-vol) we created last time.
# gluster volume info

Volume Name: test-vol
Type: Distribute
Volume ID: 5d28ca28-9363-4b79-b922-5f28d0c0db65
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: Gotham:/home/asengupt/node1
Brick2: Gotham:/home/asengupt/node2

# cd /var/lib/glusterd/vols/
# ls -lrt
total 4
drwxr-xr-x. 4 root root 4096 Jan 31 14:20 test-vol
# cd test-vol/
# ls -lrt
total 32
-rw-------. 1 root root 1406 Jan 31 14:19 test-vol.Gotham.home-asengupt-node1.vol
-rw-------. 1 root root 1406 Jan 31 14:19 test-vol.Gotham.home-asengupt-node2.vol
-rw-------. 1 root root 1349 Jan 31 14:19 trusted-test-vol-fuse.vol
-rw-------. 1 root root 1121 Jan 31 14:20 test-vol-fuse.vol
drwxr-xr-x. 2 root root   80 Jan 31 14:20 run
-rw-------. 1 root root  305 Jan 31 14:20 info
drwxr-xr-x. 2 root root   74 Jan 31 14:20 bricks
-rw-------. 1 root root   12 Jan 31 14:20 rbstate
-rw-------. 1 root root   34 Jan 31 14:20 node_state.info
-rw-------. 1 root root   16 Jan 31 14:20 cksum
#
As the bricks are on the same machine as the mount, we see all the volume files here. test-vol.Gotham.home-asengupt-node1.vol and test-vol.Gotham.home-asengupt-node2.vol are the volume files for Brick1 and Brick2 respectively. The volume file for the test-vol volume itself is trusted-test-vol-fuse.vol. Let's have a look inside:
# cat trusted-test-vol-fuse.vol
volume test-vol-client-0
    type protocol/client
    option password 010f5d80-9d99-4b7c-a39e-1f964764213e
    option username 6969e53a-438a-4b92-a113-de5e5b7b5464
    option transport-type tcp
    option remote-subvolume /home/asengupt/node1
    option remote-host Gotham
end-volume

volume test-vol-client-1
    type protocol/client
    option password 010f5d80-9d99-4b7c-a39e-1f964764213e
    option username 6969e53a-438a-4b92-a113-de5e5b7b5464
    option transport-type tcp
    option remote-subvolume /home/asengupt/node2
    option remote-host Gotham
end-volume

volume test-vol-dht
    type cluster/distribute
    subvolumes test-vol-client-0 test-vol-client-1
end-volume

volume test-vol-write-behind
    type performance/write-behind
    subvolumes test-vol-dht
end-volume

volume test-vol-read-ahead
    type performance/read-ahead
    subvolumes test-vol-write-behind
end-volume

volume test-vol-io-cache
    type performance/io-cache
    subvolumes test-vol-read-ahead
end-volume

volume test-vol-quick-read
    type performance/quick-read
    subvolumes test-vol-io-cache
end-volume

volume test-vol-md-cache
    type performance/md-cache
    subvolumes test-vol-quick-read
end-volume

volume test-vol
    type debug/io-stats
    option count-fop-hits off
    option latency-measurement off
    subvolumes test-vol-md-cache
end-volume
#
This is what a volume file looks like. It is actually an inverted graph of the path that data from the mount point is supposed to follow. Savvy? No? Ok, let's have a closer look at it. It is made up of a number of sections, each of which begins with "volume test-vol-xxxxx" and ends with "end-volume". Each section stores the information (type, options, subvolumes, etc.) for the respective translator (we will come back to what translators are in a minute).

For example: let's say a read fop (file operation) is attempted at the mount point. The request information (type of fop: read, the name of the file the user tried to read, etc.) is passed on from one translator to another, starting from the io-stats translator at the bottom of the file up to one of the client translators at the top. Similarly, the response is transferred back from the client translator all the way down to io-stats and finally to the user.
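
Tracing the "subvolumes" entries in the volume file above, the full path of that read fop is (just the graph we already printed, laid out in order):

io-stats (test-vol) -> md-cache -> quick-read -> io-cache -> read-ahead -> write-behind -> dht -> client-0 / client-1

The response then retraces the same chain in the opposite direction.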

So what are these translators? A translator is a module with a very specific purpose: to receive data, perform the necessary operations, and pass the data on to the next translator. That's about it in a nutshell. For example, let's look at the dht translator in the above volume file.
volume test-vol-dht
    type cluster/distribute
    subvolumes test-vol-client-0 test-vol-client-1
end-volume
"test-vol" is a distribute type volume, and hence has a dht translator. DHT a cluster translator, as is visible in it's "type". We know that in a distribute volume, a hashing algorithm, decides in which of the "subvolumes" is the data actually present. DHT translator is the one who does that for us.

The dht translator receives our read fop along with the filename. Based on the filename, the hashing algorithm finds the correct subvolume (between "test-vol-client-0" and "test-vol-client-1") and passes the read fop on to the subsequent translator. That is how every other translator in the graph works as well (receive the data, perform its part of the processing, and pass the data on to the next translator). As is quite visible here, the concept of translators provides us with a lot of modularity.
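
As a side note of mine (not part of the original walkthrough), the hash layout that DHT uses is stored as extended attributes on the directories of each brick, and can be inspected with getfattr if the attr package is installed:

# getfattr -d -m . -e hex /home/asengupt/node1/
# getfattr -d -m . -e hex /home/asengupt/node2/

The trusted.glusterfs.dht attribute in that output encodes the hash range assigned to the corresponding brick.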

The volume files are created by the volume create command; based on the type of the volume and its options, a graph is built with the appropriate translators. But we can edit an existing volume file (add, remove, or modify a couple of translators), and the volume will change behaviour accordingly. Let's try that. Currently "test-vol" is a distribute volume, so any file that is created will be present in exactly one of the bricks.
# mount -t glusterfs Gotham:/test-vol /mnt/test-vol-mnt/
# cd /mnt/test-vol-mnt/
# ls -lrt
total 0
# touch file1
# ls -lrt
total 0
-rw-r--r--. 1 root root 0 Jan 31 15:57 file1
# ls -lrt /home/asengupt/node1/
total 0
# ls -lrt /home/asengupt/node2/
total 0
-rw-r--r--. 2 root root 0 Jan 31 15:57 file1
#
The dht translator created the file in node2 only. Let's edit the volume file for test-vol (/var/lib/glusterd/vols/test-vol/trusted-test-vol-fuse.vol). But before that we need to stop the volume and unmount it.
# gluster volume stop test-vol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test-vol: success
# umount /mnt/test-vol-mnt 
Then let's edit the volume file, and replace the dht-translator with a replicate translator.
# vi /var/lib/glusterd/vols/test-vol/trusted-test-vol-fuse.vol 
**********Necessary Changes in the volume file************
volume test-vol-afr
    type cluster/replicate
    subvolumes test-vol-client-0 test-vol-client-1
end-volume

volume test-vol-write-behind
    type performance/write-behind
    subvolumes test-vol-afr
end-volume
**********************************************************
Now start the volume, and mount it again.
# gluster volume start test-vol
volume start: test-vol: success
# mount -t glusterfs Gotham:/test-vol /mnt/test-vol-mnt/
Let's create a file at this mount-point and check the behaviour of "test-vol".
# cd /mnt/test-vol-mnt/
# ls -lrt
total 0
-rw-r--r--. 1 root root 0 Jan 31 15:57 file1
# touch file2
# ls -lrt /home/asengupt/node1/
total 4
-rw-r--r--. 2 root root 0 Jan 31 16:06 file1
-rw-r--r--. 2 root root 0 Jan 31 16:06 file2
# ls -lrt /home/asengupt/node2/
total 4
-rw-r--r--. 2 root root 0 Jan 31 16:06 file1
-rw-r--r--. 2 root root 0 Jan 31 16:06 file2
#
Now we have the same set of files in all the bricks, as is the behaviour of a replicate volume. We also observe that not only was the new file (file2) created in all the bricks, but when we re-started the volume after changing the volume file, a copy of the existing file (file1) was also created in all the bricks. Let's check the volume info.
# gluster volume info 
  
Volume Name: test-vol 
Type: Distribute 
Volume ID: 5d28ca28-9363-4b79-b922-5f28d0c0db65 
Status: Started 
Number of Bricks: 2 
Transport-type: tcp 
Bricks: 
Brick1: Gotham:/home/asengupt/node1 
Brick2: Gotham:/home/asengupt/node2
It should be noted that, although the changes we made to the volume file change the volume's behaviour, the volume info still reflects the original details of the volume.
As promised, we will now create a mix of the vanilla volume types, i.e. a distributed-replicate volume.
# gluster volume create mix-vol replica 2 Gotham:/home/asengupt/node1 Gotham:/home/asengupt/node2 Gotham:/home/asengupt/node3 Gotham:/home/asengupt/node4
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume?  (y/n) y
volume create: mix-vol: success: please start the volume to access data
# gluster volume start mix-vol;
volume start: mix-vol: success
# gluster volume info

Volume Name: mix-vol
Type: Distributed-Replicate
Volume ID: 2fc6f11e-254e-444a-8179-43da62cc56e9
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: Gotham:/home/asengupt/node1
Brick2: Gotham:/home/asengupt/node2
Brick3: Gotham:/home/asengupt/node3
Brick4: Gotham:/home/asengupt/node4
#
As it is a distributed-replicate volume, the distribute translator should have two sub-volumes, each of which is in turn a replicate translator with two bricks as its sub-volumes. A quick look at the volume file (/var/lib/glusterd/vols/mix-vol/trusted-mix-vol-fuse.vol) gives us more clarity. This is what the cluster translators in the volume file look like:
volume mix-vol-replicate-0
    type cluster/replicate
    subvolumes mix-vol-client-0 mix-vol-client-1
end-volume

volume mix-vol-replicate-1
    type cluster/replicate
    subvolumes mix-vol-client-2 mix-vol-client-3
end-volume

volume mix-vol-dht
    type cluster/distribute
    subvolumes mix-vol-replicate-0 mix-vol-replicate-1
end-volume
As we can see, the dht translator has two replicate translators as its sub-volumes: "mix-vol-replicate-0" and "mix-vol-replicate-1". Every file created at the mount point is sent by the dht translator to one of the replicate sub-volumes. Each replicate sub-volume in turn has two bricks as its own sub-volumes, so after the write fop reaches the appropriate replicate sub-volume, the replicate translator creates a copy in each of the bricks listed as its sub-volumes. Let's check this behaviour:
# mount -t glusterfs Gotham:/mix-vol /mnt/test-vol-mnt/
# cd /mnt/test-vol-mnt/
# touch file1
# ls -lrt /home/asengupt/node1/
total 0
# ls -lrt /home/asengupt/node2/
total 0
# ls -lrt /home/asengupt/node3/
total 0
-rw-r--r--. 2 root root 0 Jan 31 16:31 file1
# ls -lrt /home/asengupt/node4/
total 0
-rw-r--r--. 2 root root 0 Jan 31 16:31 file1
#
Similarly, a distributed-stripe or a replicated-stripe volume can also be created. Jeff Darcy's blog has an awesome set of articles on translators; it's a great read.
EDIT: The above links are broken, but the same information is available in glusterfs/doc/developer-guide/translator-development.md in the source code. Thanks to Jo for pointing that out.
    Posted on January 11, 2013

    Volumes

    I feel it's safe to say that we now have a fair idea of what GlusterFS is, and that we are pretty comfortable installing GlusterFS and creating a volume.
    Let's create a volume with two local directories as two bricks.
    # gluster volume create test-vol Gotham:/home/asengupt/node1 Gotham:/home/asengupt/node2
    volume create: test-vol: success: please start the volume to access data
    # gluster volume start test-vol;
    volume start: test-vol: success
    Let's mount this volume, and create a file in that volume.
    # mount -t glusterfs Gotham:/test-vol /mnt/test-vol-mnt/
    # touch /mnt/test-vol-mnt/file1
    # cd /mnt/test-vol-mnt/
    # ls -lrt
    total 1
    -rw-r--r--. 1 root root   0 Jan 10 14:40 file1
    Now where does this file really get created in the backend? Let's have a look at the two directories we used as bricks (sub-volumes):
    # cd /home/asengupt/node1
    # ls -lrt
    total 0
    # cd ../node2/
    # ls -lrt
    total 1
    -rw-r--r--. 1 root root   0 Jan 10 14:40 file1
    So the file we created at the mount point (/mnt/test-vol-mnt) got created in one of the bricks. But why in this particular brick, and why not the other one? The answer to that question lies in the volume information.
    # gluster volume info
     
    Volume Name: test-vol
    Type: Distribute
    Volume ID: 5d28ca28-9363-4b79-b922-5f28d0c0db65
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: Gotham:/home/asengupt/node1
    Brick2: Gotham:/home/asengupt/node2
    It gives us a lot of info. While creating a volume we have the liberty of providing a number of options, like the transport type, the volume type, etc., which eventually decide the behaviour of the volume. But at this moment what interests us most is the type. It says that our volume "test-vol" is a distribute volume. What does that mean?

    The type of a volume decides how exactly the volume stores the data in the bricks. A volume can be of the following types:
    • Distribute: A distribute volume is one in which all the data of the volume is distributed across the bricks. Based on an algorithm that takes into account the space available in each brick, the data is stored in any one of the available bricks. Distribute is the default volume type, which is why "test-vol" is a distribute volume, and why, based on this algorithm, "file1" was created in node2.
    • Replicate: In a replicate volume, the data is replicated (duplicated) across the bricks. The number of bricks must be a multiple of the replica count. So when "file1" is created in a replicate volume having two bricks, it is stored in brick1 and then replicated to brick2, so the file is present in both bricks. Let's create one and see for ourselves.
      # gluster volume create test-vol replica 2 Gotham:/home/asengupt/node1 Gotham:/home/asengupt/node2
      Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
      Do you still want to continue creating the volume?  (y/n) y
      volume create: test-vol: success: please start the volume to access data
      # gluster volume start test-vol
      volume start: test-vol: success
      # gluster volume info
       
      Volume Name: test-vol
      Type: Replicate
      Volume ID: bfb685e9-d30d-484c-beaf-e5fd3b6e66c7
      Status: Started
      Number of Bricks: 1 x 2 = 2
      Transport-type: tcp
      Bricks:
      Brick1: Gotham:/home/asengupt/node1
      Brick2: Gotham:/home/asengupt/node2
      # mount -t glusterfs Gotham:/test-vol /mnt/test-vol-mnt/
      # touch /mnt/test-vol-mnt/file1
      # cd /mnt/test-vol-mnt/
      # ls -lrt
      total 0
      -rw-r--r--. 1 root root 0 Jan 10 14:58 file1
      # ls -lrt /home/asengupt/node1/
      total 0
      -rw-r--r--. 2 root root 0 Jan 10 14:58 file1
      # ls -lrt /home/asengupt/node2/
      total 0
      -rw-r--r--. 2 root root 0 Jan 10 14:58 file1
    • Stripe: A stripe volume is one in which the data stored in the backend is striped across the bricks in units of a particular size. The default unit size is 128KB, but it is configurable. If we create a striped volume with a stripe count of 3 and then create a 300KB file at the mount point, the first 128KB is stored in the first sub-volume (brick in our case), the next 128KB in the second, and the remaining 44KB in the third. The number of bricks should be a multiple of the stripe count.
      # gluster volume create test-vol stripe 3 Gotham:/home/asengupt/node1 Gotham:/home/asengupt/node2 Gotham:/home/asengupt/node3
      volume create: test-vol: success: please start the volume to access data
      # gluster volume start test-vol
      volume start: test-vol: success
      # gluster volume info
       
      Volume Name: test-vol
      Type: Stripe
      Volume ID: c5aa1590-2f6e-464d-a783-cd9bc222db30
      Status: Started
      Number of Bricks: 1 x 3 = 3
      Transport-type: tcp
      Bricks:
      Brick1: Gotham:/home/asengupt/node1
      Brick2: Gotham:/home/asengupt/node2
      Brick3: Gotham:/home/asengupt/node3
      # mount -t glusterfs Gotham:/test-vol /mnt/test-vol-mnt/
      # cd /mnt/test-vol-mnt/
      # ls -lrt
      total 0
      # cp /home/asengupt/300KB_File .
      # ls -lrt
      total 308
      -rwxr-xr-x. 1 root root 307200 Jan 11 12:46 300KB_File
      # ls -lrt /home/asengupt/node1/
      total 132
      -rwxr-xr-x. 2 root root 131072 Jan 11 12:46 300KB_File
      # ls -lrt /home/asengupt/node2/
      total 132
      -rwxr-xr-x. 2 root root 262144 Jan 11 12:46 300KB_File
      # ls -lrt /home/asengupt/node3/
      total 48
      -rwxr-xr-x. 2 root root 307200 Jan 11 12:46 300KB_File
      Why do we see that the first sub-volume indeed has 128KB of data, while ls reports 256KB and 300KB for the second and third sub-volumes respectively?
      That's because of holes (sparse files). The filesystem simply pretends that a particular region of the file contains zero bytes, without reserving any actual disk sectors for that region, so ls shows the apparent file size rather than the space actually used. To prove this, let's check the disk usage (the stripe arithmetic is spelled out after the du output below).
      # cd /home/asengupt
      # du | grep node.$
      136    ./node2
      136    ./node1
      52    ./node3
      0    ./node4
      Here we observe that node1 and node2 indeed have 128KB of data each, while node3 has 44KB. The additional 8KB present in these directories is used by GlusterFS's internal files.
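
      To spell out the stripe arithmetic behind those numbers (sizes in KB; my own breakdown, not output from the commands above):

      brick1 (node1) : bytes 0-128K   -> 128KB of real data, apparent size 128KB
      brick2 (node2) : bytes 128-256K -> 128KB of real data, apparent size 256KB (the first 128KB is a hole)
      brick3 (node3) : bytes 256-300K ->  44KB of real data, apparent size 300KB (the first 256KB is a hole)

      This is why ls reports 128KB/256KB/300KB, while du reports roughly 128KB/128KB/44KB per brick.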

    Apart from these three vanilla types of volume, we can also create a volume which is a mix of these types. We will go through these and the respective volume files in the next post.