all posts tagged yaml

by on June 4, 2014

Hiera data in modules and OS independent puppet

Earlier this year, R.I.Pienaar released his brilliant data in modules hack, a few months ago, I got the chance to start implementing it in Puppet-Gluster, and today I have found the time to blog about it.

What is it?

R.I.’s hack lets you store hiera data inside a puppet module. This can have many uses including letting you throw out the nested mess that is commonly params.pp, and replace it with something file based that is elegant and hierarchical. For my use case, I’m using it to build OS independent puppet modules, without storing this data as code. The secondary win is that porting your module to a new GNU/Linux distribution or version could be as simple as adding a YAML file.

How does it work?

(For the specifics on the hack in general, please read R.I. Pienaar’s blog post. After you’re comfortable with that, please continue…)

In the hiera.yaml data/ hierarchy, I define an OS / version structure that should probably cover all use cases. It looks like this:

- tree/%{::osfamily}/%{::operatingsystem}/%{::operatingsystemrelease}
- tree/%{::osfamily}/%{::operatingsystem}
- tree/%{::osfamily}
- common

At the bottom, you can specify common data, which can be overridden by OS family specific data (think RedHat “like” vs. Debian “like”), which can be overridden with operating system specific data (think CentOS vs. Fedora), which can finally be overridden with operating system version specific data (think RHEL6 vs. RHEL7).

Grouping the commonalities near the bottom of the tree, avoids duplication, and makes it possible to support new OS versions with fewer changes. It would be especially cool if someone could write a script to refactor commonalities downwards, and to refactor new uniqueness upwards.

This is an except of the Fedora specific YAML file:

gluster::params::package_glusterfs_server: 'glusterfs-server'
gluster::params::program_mkfs_xfs: '/usr/sbin/mkfs.xfs'
gluster::params::program_mkfs_ext4: '/usr/sbin/mkfs.ext4'
gluster::params::program_findmnt: '/usr/bin/findmnt'
gluster::params::service_glusterd: 'glusterd'
gluster::params::misc_gluster_reload: '/usr/bin/systemctl reload glusterd'

Since we use full paths in Puppet-Gluster, and since they are uniquely different in Fedora (no more: /bin) it’s nice to specify them all here. The added advantage is that you can easily drop in different versions of these utilities if you want to test a patched release without having to edit your system utilities. In addition, you’ll see that the OS specific RPM package name and service names are in here too. On a Debian system, they are usually different.


This depends on Puppet >= 3.x and having the puppet-module-data module included. I do so for integration with vagrant like so.

Should I still use params.pp?

I think that this answer is yes. I use a params.pp file with a single class specifying all the defaults:

class gluster::params(
    # packages...
    $package_glusterfs_server = 'glusterfs-server',

    $program_mkfs_xfs = '/sbin/mkfs.xfs',
    $program_mkfs_ext4 = '/sbin/mkfs.ext4',

    # services...
    $service_glusterd = 'glusterd',

    # misc...
    $misc_gluster_reload = '/sbin/service glusterd reload',

    # comment...
    $comment = ''
) {
    if "${comment}" == '' {
        warning('Unable to load yaml data/ directory!')

    # ...


In my data/common.yaml I include a bogus comment canary so that I can trigger a warning if the data in modules module isn’t working. This shouldn’t be a fail as long as you want to allow backwards compatibility, otherwise it should be! The defaults I use correspond to the primary OS I hack and use this module with, which in this case is CentOS 6.x.

To use this data in your module, include the params.pp file, and start using it. Example:

include gluster::params
package { "${::gluster::params::package_glusterfs_server}":
    ensure => present,

Unfortunately the readability isn’t nearly as nice as it is without this, however it’s an essential evil, due to the puppet language limitations.

Common patterns:

There are a few common code patterns, which you might need for this technique. The first few, I’ve already mentioned above. These are the tree layout in hiera.yaml, the comment canary, and the params.pp defaults. There’s one more that you might find helpful…

The split package pattern:

Certain packages are split into multiple pieces on some operating systems, and grouped together on others. This means there isn’t always a one-to-one mapping between the data and the package type. For simple cases you can use a hiera array:

# this hiera value could be an array of strings...
package { $::some_module::params::package::some_package_list:
    ensure => present,
    alias => 'some_package',
service { 'foo':
    require => Package['some_package'],

For this to work you must always define at least one element in the array. For more complex cases you might need to test for the secondary package in the split:

if "${::some_module::params::package::some_package}" != '' {
    package { "${::some_module::params::package::some_package}":
        ensure => present,
        alias => 'some_package', # or use the $name and skip this

service { 'foo':
    require => "${::some_module::params::package::some_package}" ? {
        '' => undef,
        default => Package['some_package'],

This pattern is used in Puppet-Gluster in more than one place. It turns out that it’s also useful when optional python packages get pulled into the system python. (example)

Hopefully you found this useful. Please help increase the multi-os aspect of Puppet-Gluster by submitting patches to the YAML files, and by testing it on your favourite GNU/Linux distro!

Happy hacking!



by on November 17, 2013

Iteration in Puppet

People often ask how to do iteration in Puppet. Most Puppet users have a background in imperative programming, and are already very familiar with for loops. Puppet is sometimes confusing at first, because it is actually (or technically, contains) a declarative, domain-specific language. In general, DSL’s aren’t always Turing complete, nor do they need to support loops, but this doesn’t mean you can’t iterate.

Until recently, Puppet didn’t have an explicit looping construct, and it is quite possible to build complex modules without using this new construct. There are even some who believe that the language shouldn’t even contain this feature. I’ll abstain from that debate for now, but instead, I would like to show you some iteration techniques that you can use to get your desired result.


puppets-all-the-way-downMany people forget that recursion is a form of iteration. Even more don’t realize that you can do recursion in Puppet:

#!/usr/bin/puppet apply

define recursion(
) {
    # do something here...
    notify { "count-${count}":
    $minus1 = inline_template('<%= count.to_i - 1 %>')
    if "${minus1}" == '0' {
        notify { 'done counting!':
    } else {
        # recurse
        recursion { "count-${minus1}":
            count => $minus1,

# kick it off...
recursion { 'start':
    count => 4,

If you really want to Push Puppet, even more advanced recursion is possible. In general, I haven’t found this technique very useful for module design, but it’s worth mentioning as a form of iteration. If you do find a legitimate use of this technique, please let me know!

Type iteration

We’re used to seeing simple type declarations such as:

user { 'james':
    ensure => present,
    comment => 'james is awesome!',

In fact, the namevar can actually accept a list:

$users = ['kermit', 'beaker', 'statler', 'waldorf', 'tom']
user { $users:
    ensure => present,
    comment => 'who gave these muppets user accounts?',

Which will cause Puppet to effectively iterate across the elements in $users. This is the most important type of iteration in Puppet. Please get familiar with it.

This technique can be used with any type. It can even be used to express a many-to-one dependency relationship:

# where $bricks is a list of gluster::brick names
Gluster::Brick[$bricks] -> Gluster::Volume[$name]    # volume requires bricks

Suppose you’d like to use type iteration, but you’d also like to know the index of each element. This can be useful to avoid duplicate sub-types, or to provide a unique index:

define some_module::process_array(
    $array    # pass in the original $name
) {
    #notice(inline_template('NAME: <%= name.inspect %>'))

    # do something here...

    # build a unique name...
    $length = inline_template('<%= array.length %>')
    $ulength = inline_template('<%= array.uniq.length %>')
    if ( "${length}" != '0' ) and ( "${length}" != "${ulength}" ) {
        fail('Array must not have duplicates.')
    # if array had duplicates, this wouldn't be a unique index
    $index = inline_template('<%= array.index(name) %>')

    # iterate, knowing your index
    some::type { "${foo}:${index}":
        foo => 'hello',
        index => "${index}",

# a list
$some_array = ['a', 'b', 'c']    # must not have duplicates

# using the type requires that you pass in $some_array twice!
some_module::process_array { $some_array:    # array
    foo => 'bar',
    array => $some_array,    # same array as above

While this example might seem contrived, it is actually a modified excerpt from a module that I wrote.


This is a similar technique for when you want to specify different arguments for each type:

$defaults = {
    ensure => present,
    comment => 'a muppet',
$data = {
    'kermit' => {
        comment => 'the frog',
    'beaker' => {
        comment => 'keep him away from your glassware',
    'fozzie' => {
        home => '/home/fozzie',
    'tom' => {
        comment => 'the swedish chef',
create_resources('user', $data, $defaults)

This creates each user resource with its own arguments. If an argument isn’t given in the $data, it is taken from the $defaults hash. A similar example, and the official documentation is found here.

Template iteration

You might want to iterate to perform a simple computation, or to modify an array in some way. For static value computations, you can often use a template. Remember that the template will get executed at compile time on the Puppet Master, so code accordingly. Here are a few contrived examples:

# filter out all the integers less than zero
$array_in = [-4,3,-8,-2,1,4,-2,1,5,-1,-7,9,-3,2,6,-8,5,3,5,-6,8,9,7,-5,9,3,-3]
$array_out = split(inline_template('<%= array_in.delete_if {|x| x < 0 }.join(",") %>'), ',')

We can also use the ruby map:

# build out a greeting string
$names = ['animal', 'gonzo', 'rowlf']
# NOTE: you can also use a regular multi-line template for readability
$message = inline_template('<% if names == [] %>Hello... Anyone there?<% else %><%= {|x| "Hello "+x.capitalize}.join(", ") %>.<% end %>')

Use your imagination! Remember that you can also write a custom function if necessary, but first check that there isn’t already a built-in function, or a stdlib function that suits your needs.

Advanced template iteration

When you really need to get fancy, it’s often time to call in a custom function. Custom functions require that you split them off into separate files, and away from the module logic, instead of keeping the functions inline and accessible as lambdas. The downside to using these “inline_template” lambdas instead, is that they can quickly turn into parlous one-liners.

# transform the $data hash
$data = {
    'waldorf' => {
        'heckles' => 'absolutely',
        'comment' => 'a critic',
    'statler' => {
        'heckles' => 'all the time',
        'comment' => 'another critic!',
# rename and filter on the 'heckles' key
$yaml = inline_template('<%= data.inject({}) {|h, (x,y)| h[x] = {"applauds" => y.fetch("heckles", "yes")}; h}.to_yaml %>')
$output = parseyaml($yaml) # parseyaml is in the puppetlabs-stdlib

As with simple template iteration, the key problem is transferring the data in and out of the template. In the simple case, arrays can be joined and split as long as there is a reserved character that won’t be used in the data. For the advanced template iteration, we rely on the YAML transformation functions.

Some reminders

If you properly understand the functionality that your module is trying to model/manage, you can usually break it up into separate classes and defined types, such that re-use via type iteration can fulfill your needs. Usually you’ll end up with a more properly designed module.

Test using the same version of Ruby that will run your module. Newer versions of Ruby have some incompatible changes, and new features, with respect to older versions of Ruby.

Remember that templates and functions run on the Puppet Master, but facts and types run on the client (agent).

The Puppet language is mostly declarative. Because this might be an unfamiliar paradigm, try not to look for all the imperative features that you’re used to. Having a programming background can help, because there’s certainly programming mixed in, whether you’re writing custom functions, or erb templates.

Future parser

For completeness, I should mention that the future parser now supports native iteration. If you need it, it probably means that you’re writing a fairly advanced module, and you’re comfortable manual diving. If you have a legitimate use case that isn’t possible with the existing constructs, and isn’t only a readability improvement, please let me know.


I hope you enjoyed this article. The next time someone asks you how to iterate in Puppet, feel free to link them this way.

Happy hacking,