Masterzen’s Blog

Journey in a software world…

Storeconfigs (advanced) use cases

This week on #puppet, Nico asked for a live storeconfigs example. So I thought a blog post would be the perfect place to show a storeconfigs use case along with its full explanation. Of course, if you’re interested in some discussions around storeconfigs, please refer to the earlier blog posts on the subject.

At Days of Wonder, I use storeconfigs for only one type of use: exchanging information between nodes. But I know some other people use this feature as an inventory system (to know what node gets what configuration).

Use case 1: website document root replication

Let’s start with a simple, easily understandable example.

At Days of Wonder we have a bunch of webservers arranged in a kind of cluster. The document roots of all these webservers (where the various php and image files reside) should always be in sync. So we rsync to all webservers from a central build server each time the developers commit a change.

The tedious part with this scheme is that you have to make sure all the webservers have the correct ssh authorized_keys and ssh authorization for the build server to contact them successfully.

The manifest


# Class: devl
# This class is implemented on the build server
#
# Usage:
# Generate a ssh key and store the private key and public key
# on the puppetmaster files mount as keys/buildkey and keys/buildkey.pub
#
#   node build {
#       include devl
#       devl::pushkey{
#           "build":
#               keyfile => "files/keys/buildkey"
#       }
#   }
#
#
class devl {
    ...
    define pushkey($keyfile) {
        @@ssh_authorized_key {
            "push-${name}@${fqdn}":
                user => "push",
                type => "ssh-rsa",
                tag => "push",
                # this is to remove the ssh-rsa prefix, the suffix and trim any \n
                key => gsub(gsub(file("/etc/puppet/${keyfile}.pub"), '^ssh-rsa (.*) .*$', '\1'), "\n", ""),
                options => ['command="rsync --server -vlgDtpr --delete . /path/to/docroot/"', 'no-port-forwarding','no-X11-forwarding','no-agent-forwarding','no-pty'],
        }

        # store the private key locally, for our rsync build
        file {
            "/home/build/.ssh/id_${name}":
                ensure => file, owner => "build", group => "build",
                source => "puppet:///${keyfile}", mode => 0400,
                alias => "pkey-${name}",
                require => [User["build"], File["/home/build/.ssh"]]
        }
    }
    ...
}

# Class: www::push
# This class is implemented on webservers
#
class www::push {
    ... create here the push user and so on...
    Ssh_authorized_key <<| tag == "push" |>>
    ...
}

Inner workings

It’s easy: when the build server applies its configuration, it creates an exported ssh_authorized_key (notice the double @), which is not applied locally. Instead it is stored in the storeconfigs database.

We also create a local file containing the ssh private key.

When one of the webservers comes to check out its configuration, it implements the www::push class, which collects all ssh_authorized_key resources tagged with “push”.

That is, all the authorized keys we created with the pushkey definition in the build configuration. Collecting a resource means it is created as if we had defined it in the node that collects it. So the webserver will get a new ssh authorized key whose user, options and key are the ones defined in the build server configuration.

Of course this manifest doesn’t show everything. It also drops a handful of shell scripts that do the rsync using the local private keys, along with more configuration files for some other parts of the build.

Note: the gsub function is a custom parser function I borrowed from David Schmitt’s repository. In 0.25 it would be replaced by regsubst.
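To make the two nested gsub calls in the manifest easier to follow, here is a minimal Python sketch of the same transformation. The function name extract_key_body is only an illustration and is not part of the manifest. Like the original regular expression, it assumes a single-line ssh-rsa public key whose trailing comment contains no spaces.

```python
import re


def extract_key_body(pubkey_text: str) -> str:
    """Return only the base64 body of a single-line ssh-rsa public key.

    Hypothetical helper mirroring the two gsub calls of the manifest:
    1. keep the middle field (drop the "ssh-rsa " prefix and the comment),
    2. remove any newline.
    """
    # Same pattern as in the manifest: prefix, captured key, trailing comment.
    body = re.sub(r'^ssh-rsa (.*) .*$', r'\1', pubkey_text)
    # Trim newlines, as the outer gsub does.
    return body.replace("\n", "")


if __name__ == "__main__":
    # Made-up key material, for illustration only.
    print(extract_key_body("ssh-rsa AAAAB3NzaC1yc2E build@buildserver\n"))
```

Because the capture group is greedy, a comment containing a space would leave part of the comment in the result, which is one reason to keep key comments simple with this approach.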

Use case 2: tinydns master and slaves

Once again at Days of Wonder, we run tinydns as our DNS server. Tinydns doesn’t have a fancy zone transfer system full of security holes, so we emulate this functionality by rsync’ing the zone files from the master to the slaves each time the zones are changed (the zones are managed by Puppet of course).

This is more or less the same system as the one we saw in use case 1, except that there is one key for all the slaves and, more importantly, each slave registers itself with the master to be part of the replication.

The manifest

class djbdns {
    ...

    # Define: tinydns::master
    # define a master with its listening +ip+, +keyfile+, and zonefile.
    # Usage:
    #     djbdns::tinydns::master {
    #         "root":
    #             keyfile => "files/keys/tinydns",
    #             content => "files/dow/zone"
    #     }
    #
    define tinydns::master($ip, $keyfile, $content='') {
        $root = "/var/lib/service/${name}"
        tinydns::common { $name: ip => $ip, content=>$content }

        # send our public key to our slaves
        @@ssh_authorized_key {
            "dns-${name}@${fqdn}":
                user => "root",
                type => "ssh-rsa",
                tag => "djbdns-master",
                key => file("/etc/puppet/${keyfile}.pub"),
                options => ["command=\"rsync --server -logDtprz . ${root}/root/data.cdb\"", "from=\"${fqdn}\"", 'no-port-forwarding','no-X11-forwarding','no-agent-forwarding','no-pty']
        }

        # store our private key locally
        file {
            "/root/.ssh/${name}_identity":
            ensure => file,
            source => "puppet:///${keyfile}", mode => 0600,
            alias => "master-pkey-${name}"
        }

        # replicate with the help of the propagate-key script
        # this exec subscribes to the zone file and to the slaves.d directory,
        # which means the data is rsynced each time we add a slave
        # or each time the zone file changes.
        exec {
            "propagate-data-${name}":
                command => "/usr/local/bin/propagate-key ${name} /var/lib/puppet/modules/djbdns/slaves.d /root/.ssh/${name}_identity",
                subscribe => [File["/var/lib/puppet/modules/djbdns/slaves.d"] , File["${root}/root/data"], Exec["data-${name}"]],
                require => [File["/usr/local/bin/propagate-key"], Exec["data-${name}"]],
                refreshonly => true
        }

        # collect slaves address
        File<<| tag == 'djbdns' |>>
    }

    # Define: tinydns::slave
    # this define is implemented on each tinydns slave
    define tinydns::slave($ip) {
        $root = "/var/lib/service/${name}"

        tinydns::common { $name: ip => $ip }

        # publish our addresses back to the master
        # our ip address ends up being in a file name in the slaves.d directory
        # where the propagate-key shell script will get it.
        @@file {
            "/var/lib/puppet/modules/djbdns/slaves.d/${name}-${ipaddress}":
            ensure => file, content => "\n",
            alias => "slave-address-${name}",
            tag => 'djbdns'
        }

        # collect the ssh public keys of our master
        Ssh_authorized_key <<| tag == 'djbdns-master' |>>
    }
}

Inner workings

This time we have a double exchange system:

  1. The master exports its public key to be collected by the slaves
  2. and the slaves export their IP addresses back to the master, in the form of an empty file. Their IP address is encoded in those file names.

When the zone file has to be propagated, the propagate-key shell script is executed. This script lists all the files in the /var/lib/puppet/modules/djbdns/slaves.d folder, where the slaves export their IP addresses, extracts the IP address from each file name and calls rsync with the correct private key. Simple and elegant, isn’t it?
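The propagate-key script itself is not shown in this post. As a rough illustration of the idea only, here is a minimal Python sketch of the "extract the IP address from the file names" step. The function names are hypothetical, the sketch assumes the slave define uses the same name as the master (so the files are called name-ipaddress), and the rsync arguments are a guess rather than the real script’s command line.

```python
def slave_addresses(filenames, service):
    """Return the IP addresses encoded in slaves.d file names.

    Assumes every exported file is called "<service>-<ipaddress>",
    as in the @@file resource of the tinydns::slave define.
    """
    prefix = service + "-"
    return sorted(name[len(prefix):] for name in filenames
                  if name.startswith(prefix))


def rsync_command(ip, identity, datafile):
    """Build a hypothetical rsync invocation for one slave (an assumption,
    not the command used by the real propagate-key script)."""
    return ["rsync", "-az", "-e", "ssh -i " + identity,
            datafile, "root@" + ip + ":" + datafile]


if __name__ == "__main__":
    # In the real setup the names would come from listing the slaves.d directory.
    files = ["root-192.168.1.10", "root-10.0.0.2"]
    for ip in slave_addresses(files, "root"):
        print(rsync_command(ip, "/root/.ssh/root_identity",
                            "/var/lib/service/root/root/data.cdb"))
```

The nice property of encoding the address in the file name is that the script needs no parsing beyond stripping a prefix: adding or removing a slave is just a file appearing or disappearing in the directory.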

Other ideas

There’s simply no limitation to what we can do with storeconfigs, because you can export any kind of resources, not only files or ssh authorized keys.

Here are some ideas (some of which we are implementing here):

  • Centralized backups. Using rdiff-backup for instance, we could propagate the central backup server key to all servers, and get back the list of files to back up.
  • Resolv.conf building. This is something we’re doing at Days of Wonder. Each dnscache server exports its IP address, and we build resolv.conf on each host from those addresses.
  • Automated NTP configuration: each NTP server at the top of the hierarchy exports its IP address (or ntp.conf configuration fragments), which the other NTP servers collect so that they point to those servers and form the next stratum.
  • Automated monitoring configurations: each service and node exports configuration fragments that are collected on the NMS host to build the NMS configuration. People running nagios or munin seem to do that.

If you have some creative uses of storeconfigs, do not hesitate to publish them, either on the Puppet-user list, the Puppet wiki or elsewhere (and why not in a blog post that could be aggregated by Planet Puppet).


OMG!! storedconfigs killed my database!

When I wrote my previous post titled all about storedconfigs, I was pretty confident I explained everything I could about storedconfigs… I was wrong of course :-)

A couple of days ago, I was helping some USG admins who were facing an interesting issue. Interesting for me, but I don’t think they’d share my views on this, as their servers were melting down under the database load.

But first let me explain the issue.

The issue

The thing is that when a client checks in to get its configuration, the puppetmaster compiles its configuration to a digestible format and returns it. This operation is the process of transforming the AST built by parsing the manifests into what is called the catalog in Puppet. It is this catalog (which is in fact a graph of resources) that is later played by the client.

When the compilation process is over, and if storedconfigs is enabled on the master, the master connects to the RDBMS and retrieves all the resources, parameters, tags and facts stored for this host. Those, if any, are compared to what has just been compiled, and if some resources differ (by value/content, or if some are missing or new), they get written to the database.
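To make the comparison step concrete, here is a minimal Python sketch of such a diff. This is not Puppet’s actual code: the data model (a dictionary keyed by resource type and title, mapping to a dictionary of parameters) and the function name are assumptions chosen only for illustration.

```python
def diff_resources(stored, compiled):
    """Compare stored resources with freshly compiled ones.

    Both arguments map (type, title) to a dict of parameters
    (a hypothetical model, not Puppet's real schema).
    Returns (added, changed, removed).
    """
    # New resources: compiled but not yet in the database.
    added = {k: v for k, v in compiled.items() if k not in stored}
    # Changed resources: present on both sides with different parameters.
    changed = {k: v for k, v in compiled.items()
               if k in stored and stored[k] != v}
    # Missing resources: in the database but no longer compiled.
    removed = [k for k in stored if k not in compiled]
    return added, changed, removed


if __name__ == "__main__":
    stored = {("File", "/etc/motd"): {"mode": "0644"},
              ("User", "push"): {"ensure": "present"}}
    compiled = {("File", "/etc/motd"): {"mode": "0600"},
                ("Package", "rsync"): {"ensure": "installed"}}
    print(diff_resources(stored, compiled))
```

On a first run the stored side is empty, so everything lands in the "added" set and has to be written, which is exactly the situation described in the rest of this post.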

Pretty straightforward, isn’t it?

As you can see, this process is synchronous and while the master processes the storedconfigs operations, it doesn’t serve anybody else.

Now, imagine you have a large site (i.e. hundreds of puppetd clients), and you decide to turn on storedconfigs. All the clients checking in will see their current configuration stored in the database.

Unfortunately, on the first storedconfigs run for a client, the database is empty, so the puppetmaster has to send all the information to the RDBMS, which in turn has to write it to disk. Of course on subsequent runs only what is modified needs to reach the RDBMS, which is much less than the first time (provided you are running 0.24.8 or applied my patch).

But if your RDBMS is not correctly set up or not sized for so much concurrent write load, the storedconfigs process will take time. During this time the master is pinned to the database and can’t serve clients. So the immediate effect is that new clients checking in will see timeouts, load will rise, and so on.

The database

If you are in the aforementioned scenario you must be sure your RDBMS hardware is properly sized for this peak load, and that your database is properly tuned.

I’ll soon give some generic MySQL tuning advice to let MySQL handle the load, but remember it is generic, so YMMV.

Size the I/O subsystem

What people usually forget is that disks (i.e. those with rotating platters, not SSDs) have a maximum number of I/O operations per second. For professional high-end disks, this value is about 250 IOP/s.

Now, to simplify, let’s say your average puppet client has 500 resources with an average of 4 parameters each. That means the master will have to perform at least 500 * 4 + 500 = 2500 writes to the database (that’s naive since there are indices to modify, transactions can be grouped, etc., but you see the point).

Add to this the tags, hmm, let’s say an average of 4 tags per resource, and we have 500 * 4 + 500 + 500 * 4 = 4500 writes to perform to store the configuration of a given host.

Now remember our 250 IOP/s: how many seconds does the disk need to perform 4500 writes?

The answer is 18s!! Which is a high value. During this time you can’t do anything else. Now add concurrency to the mix, and imagine what that means.
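For those who want to plug in their own numbers, here is the same back-of-the-envelope computation as a small Python sketch. The function names are mine, and the model is as naive as the one above: one write per resource, per parameter and per tag, and one disk operation per write.

```python
def writes_per_host(resources, params_per_resource, tags_per_resource):
    """Naive count of database writes needed to store one host:
    one per resource, one per parameter and one per tag."""
    return (resources
            + resources * params_per_resource
            + resources * tags_per_resource)


def seconds_to_store(writes, iops):
    """Time needed if every write costs exactly one disk I/O operation."""
    return writes / iops


if __name__ == "__main__":
    writes = writes_per_host(500, 4, 4)
    print(writes, "writes,", seconds_to_store(writes, 250), "seconds")
```

With 500 resources, 4 parameters and 4 tags each, this gives the 4500 writes and the 18 seconds at 250 IOP/s computed above.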

Of course this supposes we have to wait for the disk to have finished (i.e. synchronous writing), but in fact that’s pretty much how RDBMSes work if you really want to trust your data.

So the result is that if you want a fast RDBMS you must be ready to pay for an expensive I/O subsystem.

That’s certainly the most important part of your server.

You need:

  • fast disks (15k RPM, because there is a real latency benefit compared to 10k)
  • as many spindles as possible, grouped in a sane RAID array like RAID10. Please forget RAID5 if you want your data to be safe (and your writes fast). I have seen too many horror stories with RAID5. I should really join the BAARF.
  • a Battery Backed RAID Cache unit (that will absorb the fsyncs gracefully).
  • Tune the RAID for the largest stripe size. Remove the RAID read cache if possible (innodb will take care of the READ cache with the innodb buffer pool).

If you don’t have this, do not even think about turning on storedconfigs for a large site.

Size the RDBMS server

Of course other things matter. If the database can fit in RAM (the best case if you don’t want to be I/O bound), then you obviously need RAM. Preferably ECC Registered RAM. Use 64-bit hardware with a 64-bit OS.

Then you need some CPU. Nowadays they’re cheap, but beware of InnoDB scaling issues on multi-core/multi-CPU systems (see below).

Tune the database configuration

Here is a checklist on how to tune MySQL for a mostly write load:

InnoDB of course

For concurrency, stability and durability reasons InnoDB is mandatory. MyISAM is at best usable for read workloads but suffers from concurrency issues, so it is a no-no for our topic.

Tuned InnoDB

The default InnoDB settings are tailored to very small, 10-year-old servers…

Things to look at:

  • innodb_buffer_pool_size. Usual advice says 70% to 80% of the physical RAM of the server if MySQL is the only running application. I’d say that it depends on the size of the database. If you know you’ll store only a few MiB, no need to allocate 2 GiB :-). More information can be found in a useful and interesting blog post from the Percona guys.
  • innodb_log_file_size. We want those to be as large as we can, to ease the mostly-write load we have. Once all the clients are stored in the database we’ll reduce this to something lower. The trade-off with large logs is the recovery time in case of a crash. It isn’t uncommon to see several hundred MiB, or even GiB.
  • innodb_flush_method = O_DIRECT on Linux. This is to prevent the OS from caching the innodb_buffer_pool content (thus ending up with a double cache).
  • innodb_flush_log_at_trx_commit=2, if your MySQL server has no other use than storedconfigs or you don’t care about the D in ACID. Otherwise use 1 (the default, which flushes the log to disk at every commit). It is also possible to temporarily change it to 2, and then move back to 1 when all clients have their configs stored.
  • transaction-isolation=READ-COMMITTED. This one can also help, although I never tested it myself.

Patch MySQL

The fine people at Percona and OurDelta produce patched builds of MySQL that remove some of the MySQL InnoDB scalability issues. This matters most for high-concurrency workloads on multi-core/multi-CPU systems.

It can also be good to run MySQL with Google’s perftools TCMalloc. TCMalloc is a memory allocator which scales way better than the Glibc one.

On the Puppet side

The immediate and most straightforward idea is to limit the number of clients that can check in at the same time. This can be done by disabling puppetd on each client (puppetd --disable), blocking network access, or any other creative means…

When all the active hosts have checked in, you can then enable the other ones. This can be done hundreds of hosts at a time, until all hosts have a configuration stored.
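As a rough illustration of enabling hosts "hundreds at a time", here is a minimal Python sketch that splits a host list into batches. The function name and the host names are made up, and how you actually enable each batch (re-enabling puppetd, opening network access, etc.) is left to your environment.

```python
def batches(hosts, size):
    """Yield successive groups of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hosts = ["web%03d.example.com" % i for i in range(1, 451)]
    for number, group in enumerate(batches(hosts, 200), start=1):
        # Enable this group, wait until all of its hosts have checked in,
        # then move on to the next one.
        print("batch", number, "->", len(group), "hosts")
```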

Another solution is to direct some hosts to a special puppetmaster with storedconfigs on (the regular one still has storedconfigs disabled), by playing with DNS or by configuration, whatever is simplest in your environment. Once those hosts have their config stored, move them back to their regular puppetmaster and move newer hosts there.

Since that’s completely manual, it might be impractical for you, but that’s the simplest method.

And after that?

As long as your manifests change only slightly, subsequent runs will see only very limited database activity (if you run a puppetmaster >= 0.24.8). That means the tuning we did earlier can be undone (for instance you can lower the innodb_log_file_size, and adjust the innodb_buffer_pool_size to the size of the hot set).

But still, storedconfigs can double your compilation time. If your master is already at its limit for the number of hosts it serves, you might see some client timeouts.

The Future

Today Luke announced on the puppet-dev list that they were working on a queuing system to defer storedconfigs and smooth out the load by spreading it over a longer time. But still, tuning the database is important.

The idea is to offload the storedconfigs work to another daemon which is hooked behind a queuing system. After the compilation the puppetmaster queues the catalog, which is later dequeued by the puppet queue daemon, which in turn executes the storedconfigs process.
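To illustrate the principle (and only the principle, since I don’t know how the real feature will be implemented), here is a minimal Python sketch using an in-process queue and a worker thread. All names are hypothetical, and the "database write" is replaced by appending to a list.

```python
import queue
import threading


def run_demo(catalogs):
    """Queue catalogs from a 'master' and store them from a worker thread.

    Returns the list of catalogs the worker has 'stored', in order.
    """
    pending = queue.Queue()
    stored = []

    def queue_daemon():
        # Stand-in for the queue daemon: dequeue and store, one at a time.
        while True:
            catalog = pending.get()
            if catalog is None:      # sentinel: no more work
                pending.task_done()
                break
            stored.append(catalog)   # stand-in for the slow database write
            pending.task_done()

    worker = threading.Thread(target=queue_daemon)
    worker.start()

    for catalog in catalogs:
        # The master only queues the catalog and is immediately free
        # to serve the next client.
        pending.put(catalog)

    pending.put(None)
    pending.join()
    worker.join()
    return stored


if __name__ == "__main__":
    print(run_demo(["web1 catalog", "web2 catalog", "dns1 catalog"]))
```

The point of the design is that the slow part (the database writes) no longer sits between the compilation and the answer to the client. The queue absorbs the peaks and the database sees a steadier stream of work.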

I don’t know the ETA for this interesting feature, but meanwhile I hope the tips I provided here can be of some help :-)

Stay tuned for more puppet stories!
