Masterzen’s Blog

Journey in a software world…

Puppet Memory Usage - not a fatality

As every reader of this blog certainly knows, I’m a big fan of Puppet. I use it in production on the Days of Wonder servers, and I used to regularly contribute bug fixes and new features (not that I stopped, it’s just that my spare time is a scarce resource nowadays).

Still, I think there are some issues in terms of scalability and resource consumption (CPU or memory), for which we can find workarounds or even fixes. Those issues are not a symptom of bad programming or bad design. No, most of them come either from Ruby itself or from issues in some library.

Let’s review the things I have been thinking about lately.

Memory consumption

This is by far one of the most commonly reported issues, both on the client side and on the server side. I’ve mainly seen this problem on the client side, up to the point that most people recommend running puppetd from cron instead of as a long-lived process.

Ruby allocator

It all boils down to the Ruby allocator (at least in MRI 1.8.x). This is the part of the Ruby interpreter that deals with memory allocation. Like in many dynamic languages, the allocator manages a memory pool called a heap. And like in some other languages (among them Java), this heap can never shrink and always grows when more memory is needed. It is done this way because it is simpler and much faster. Applications usually end up using their nominal amount of memory, after which the kernel no longer has to allocate memory to the process, which makes for faster applications.

The problem is that if the application transiently needs a large amount of memory that becomes garbage a couple of milliseconds later, the process pays this penalty for its whole life, even if, say, 80% of the memory held by the process is free but never returned to the OS.

And it gets even worse. When the Ruby interpreter grows the heap, it doesn’t allocate byte by byte (which would be really slow) but chunk by chunk. The whole question is: what is the proper size of a chunk?

In the default implementation of MRI 1.8.x, a chunk is the size of the previous heap times 1.8. That means at worst a ruby process might end up allocating 1.8 times more than what it really needs at a given time. (This is a gross simplification, read the code if you want to know more).
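To get a feeling for what such a growth factor means, here is a toy model in Ruby. It is only a sketch of the simplified rule described above, not the actual MRI code: the function name and the initial chunk of 10,000 slots are assumptions made for the illustration.

```ruby
# Toy model of a heap that never shrinks and grows chunk by chunk, each new
# chunk being the previous one times a growth factor.
# ASSUMPTIONS: grow_heap, the 10_000 initial slots and the slot-based
# accounting are illustrative only; the real allocator is more subtle.
def grow_heap(needed_slots, initial_chunk: 10_000, growth_factor: 1.8)
  chunks = [initial_chunk]
  # Keep adding chunks until the heap can hold the requested number of slots.
  while chunks.sum < needed_slots
    chunks << (chunks.last * growth_factor).to_i
  end
  chunks
end

needed = 1_000_000
[1.8, 1.0].each do |factor|
  total = grow_heap(needed, growth_factor: factor).sum
  puts "factor #{factor}: #{total} slots allocated for #{needed} needed " \
       "(#{total - needed} unused)"
end
```

With a factor of 1.0 every chunk has the same size, so the heap tracks the real need much more closely, at the price of more allocations.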

Yes but what happens in Puppet?

So how does it apply to puppetd?

It’s easy, puppetd uses memory for two things (beside maintaining some core data to be able to run):

  1. the catalog (which contains all resources, along with all templates), both as shipped by the puppetmaster (i.e. serialized) and as live Ruby objects.
  2. the content of the sourced files (one at a time, so it’s the biggest transmitted file that sets the high watermark for puppetd). Of course this is still better than in 0.24, where the content was transmitted encoded in XMLRPC, adding the penalty of escaping everything…

Hopefully, nobody distributes large files with Puppet :-) If you’re tempted to do so, see below…

But again there’s more, as Peter Meier (known as duritong in the community) discovered a couple of months ago: when puppetd gets its catalog (which by the way is transmitted in JSON nowadays), it also stores it in a local cache, to be able to run if it can’t contact the master on a subsequent run. This is done by unserializing the catalog from JSON to live Ruby objects, and then serializing the latter to YAML. Besides the obvious loss of time on large catalogs, YAML is a real memory hog. Peter’s experience showed that about 200MB of the live memory his puppetd process was using came from this final serialization!

So I had the following idea: why not store the serialized version of the catalog (the json one) since we already have it in a serialized form when we receive it from the master (it’s a little bit more complex than that of course). This way no need to serialize it again in YAML. This is what ticket #2892 is all about. Luke is committed to have this enhancement in Rowlf, so there’s good hope!
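The idea can be sketched in a few lines of Ruby. This is not Puppet's code: cache_catalog, load_cached_catalog and the sample catalog are hypothetical names and data, only there to show that the received string can be written to disk as-is and parsed again only when the cache is actually needed.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: persist the catalog exactly as received from the
# master, without building Ruby objects and re-serializing them.
def cache_catalog(raw_json, cache_path)
  File.write(cache_path, raw_json)
end

# Hypothetical helper: parse the cached JSON only when the master is
# unreachable and we have to fall back on the cache.
def load_cached_catalog(cache_path)
  JSON.parse(File.read(cache_path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'catalog.json')
  # Made-up catalog, just for the demonstration.
  raw = '{"name":"web01","resources":[{"type":"File","title":"/etc/motd"}]}'
  cache_catalog(raw, path)
  puts load_cached_catalog(path)['resources'].size
end
```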

Some puppet solutions?

So what can we do to help Puppet consume less memory?

In theory we could play on several factors:

  • Transmit smaller catalogs. For instance get rid of all those templates you love (ok that’s not a solution)
  • Stream the serialization/deserialization with something like Yajl-Ruby
  • Use another ruby interpreter with a better allocator (like for instance JRuby)
  • Use a different constant for resizing the heap (ie replace this 1.8 by 1.0 or less on line 410 of gc.c). This can be done easily when using Rails machine GC patches or Ruby Enterprise Edition, in which case setting the environment variable RUBY_HEAP_SLOTS_GROWTH_FACTOR is enough. Check the documentation for more information.
  • Stream the sourced file on the server and the client (this way only a small buffer is used, and the total size of the file is never allocated). This one is hard.
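To illustrate the last point, here is what streaming means at the file level, as a minimal Ruby sketch. The stream_copy name and the 64 KB buffer are assumptions for the example; the hard part mentioned above (doing this through Puppet's HTTP layer) is not covered here.

```ruby
require 'digest'
require 'tempfile'

CHUNK_SIZE = 64 * 1024 # assumed buffer size, pick what suits you

# Copy +source+ to +destination+ one chunk at a time, so that at most
# +chunk_size+ bytes of file content are held in memory at once.
# Returns the MD5 hex digest of the content, computed on the fly.
def stream_copy(source, destination, chunk_size = CHUNK_SIZE)
  digest = Digest::MD5.new
  File.open(source, 'rb') do |input|
    File.open(destination, 'wb') do |output|
      # IO#read(length) returns nil at end of file, which ends the loop.
      while (chunk = input.read(chunk_size))
        digest.update(chunk)
        output.write(chunk)
      end
    end
  end
  digest.hexdigest
end

src = Tempfile.new('source')
src.write('some file content' * 10_000)
src.close
dst = Tempfile.new('destination')
dst.close
puts stream_copy(src.path, dst.path)
```

Compare this with File.read, which allocates the whole file as a single string and so raises the high watermark of the process by the size of the biggest file.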

Note that the same issues apply to the master too (especially for the file serving part). But it’s usually easier to run a different ruby interpreter (like REE) on the master than on all your clients.

Streaming HTTP requests is promising, but unfortunately it would require large changes to how Puppet deals with HTTP. Maybe it can be done only for file content requests… This is something I’ll definitely explore.

This file serving business makes me think about the following idea, which I have already discussed several times with Peter…

File serving offloading

One of the missions of the puppetmaster is to serve sourced files to its clients. We saw in the previous section that to do that the master has to read the whole file into memory. That’s one reason it is recommended to dedicate a puppetmaster server to act as a pure fileserver.

But there’s a better way, provided you run puppet behind nginx or apache. Those two proxies are also static file servers: why not leverage what they do best to serve the sourced files and thus offload our puppetmaster?

This has some advantages:

  • it frees lots of resources on the puppetmasters, so that they can serve more catalogs per unit of time
  • the job is done faster and with fewer resources. Those static servers have been created to spoon-feed our puppet clients…

In fact it was impossible in 0.24.x, but now that file content serving is RESTful it becomes trivial.

Of course, offloading gives its best if your clients require lots of sourced files that change often, or if you provision lots of new hosts at the same time, because we’re offloading only file content, not file metadata. File content is served only if the client doesn’t have the file or if the file checksum on the client is different.
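The client-side decision can be pictured with this small Ruby sketch. It is not the actual Puppet code: content_needed? is a hypothetical name, and I assume an MD5 checksum coming from the file metadata.

```ruby
require 'digest'
require 'tempfile'

# Returns true when the content has to be downloaded: either the file does
# not exist locally, or its checksum differs from the one announced in the
# metadata. ASSUMPTION: the metadata carries an MD5 hex digest.
def content_needed?(local_path, remote_md5)
  return true unless File.file?(local_path)

  Digest::MD5.file(local_path).hexdigest != remote_md5
end

local = Tempfile.new('motd')
local.write("welcome\n")
local.close

puts content_needed?(local.path, Digest::MD5.hexdigest("welcome\n")) # same content
puts content_needed?(local.path, Digest::MD5.hexdigest("changed\n")) # content differs
puts content_needed?('/nonexistent/file', 'whatever')                # no local file
```

Only the last two cases generate a file content request, which is the request our static file server answers instead of the puppetmaster.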

An example is better than a thousand words

Imagine we have a standard manifest layout with:

  • some globally sourced files under /etc/puppet/files and
  • some modules files under /etc/puppet/modules/<modulename>/files.

Here is what the nginx configuration for such a scheme would look like:

server {
    listen 8140;

    ssl                     on;
    ssl_session_timeout     5m;
    ssl_certificate         /var/lib/puppet/ssl/certs/master.pem;
    ssl_certificate_key     /var/lib/puppet/ssl/private_keys/master.pem;
    ssl_client_certificate  /var/lib/puppet/ssl/ca/ca_crt.pem;
    ssl_crl                 /var/lib/puppet/ssl/ca/ca_crl.pem;
    ssl_verify_client       optional;

    root                    /etc/puppet;

    # those locations are for the "production" environment
    # update according to your configuration

    # serve static file for the [files] mountpoint
    location /production/file_content/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        alias /etc/puppet/files/;
    }

    # serve modules files sections
    location ~ /production/file_content/[^/]+/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        root /etc/puppet/modules;
        # rewrite /production/file_content/module/files/file.txt
        # to /module/file.txt
        rewrite ^/production/file_content/([^/]+)/files/(.+)$  $1/$2 break;
    }

    # ask the puppetmaster for everything else
    location / {
        proxy_pass          http://puppet-production;
        proxy_redirect      off;
        proxy_set_header    Host             $host;
        proxy_set_header    X-Real-IP        $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header    X-Client-Verify  $ssl_client_verify;
        proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
        proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;
        proxy_buffer_size   16k;
        proxy_buffers       8 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;
        proxy_read_timeout  65;
    }
}

EDIT: the above configuration was missing the only content-type that nginx can return for Puppet to be able to actually receive the file content (that is raw).

I leave the Apache configuration as an exercise to the reader.

It would also be possible to write some ruby/sh/whatever to generate the nginx configuration from the puppet fileserver.conf file.
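A rough Ruby sketch of such a generator could look like the following. It only understands the [mountpoint] and path lines of fileserver.conf and deliberately ignores allow/deny rules, which you would have to translate by hand since nginx only knows about addresses. The function names and the sample content are mine, so take it as a starting point and not as a finished tool.

```ruby
# Parse a fileserver.conf-style text into { mount_name => path }.
# Only "[name]" and "path /some/dir" lines are understood; allow/deny
# lines are skipped on purpose.
def parse_mounts(text)
  mounts = {}
  current = nil
  text.each_line do |line|
    line = line.sub(/#.*/, '').strip # drop comments and surrounding blanks
    next if line.empty?

    if line =~ /\A\[(.+)\]\z/
      current = $1
    elsif current && line =~ /\Apath\s+(\S+)\z/
      mounts[current] = $1
    end
  end
  mounts
end

# Emit one nginx location block per mount, modeled on the hand-written
# configuration shown earlier in this post.
def nginx_locations(mounts, environment = 'production')
  mounts.map do |name, path|
    <<~CONF
      location /#{environment}/file_content/#{name}/ {
          types { }
          default_type application/x-raw;
          alias #{path.chomp('/')}/;
      }
    CONF
  end.join("\n")
end

# Sample content, made up for the example.
sample = <<~CONF
  [files]
    path /etc/puppet/files
    allow 172.16.0.0/16
CONF

puts nginx_locations(parse_mounts(sample))
```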

And that’s all folks, stay tuned for more Puppet (or even different) content.

This morning I got the joy to see that my Puppet Camp 2009 slides had been selected by Slideshare to appear on their home page:

Waouh. For a surprise, that’s a surprise. I guess those stock photos I used are the underlying reason for this.

Still now that I talk about Puppet Camp again, I forgot to give the links to some pictures taken during the event:

and

Puppet Camp 2009 debriefing time!

I attended Puppet Camp 2009 in San Francisco last week. It was a wonderful event and I could meet a lot of really smart developers and sysadmins from a lot of different countries (US, Australia, Europe and even Singapore).

The format of the event (an unconference with some scheduled talks in the morning) was really great. Everybody got a chance to enter or propose a discussion topic they care about. I could attend some development sessions about the Ruby DSL vs Parser DSL, Code smells, Puppet Provider/Type developments, Augeas, and so on…

Morning talks were awesome. I was presenting a talk about storeconfigs, called “All About Storeconfigs”. Puppet Storeconfigs is a feature where you can store nodes configuration and export/collect resources between nodes with the help of a database. I already talked about this in a couple of posts:

You can enjoy the recording of the session (even though they cut the first part, which is not that good), and have a closer look at my slides here:

What’s great with those conferences in foreign countries is that you usually finish at the pub with some local people to continue to share Puppet (or not) experiences. Those parties were plenty of fun, so thank you everybody for this.

So thanks everybody and Reductive Labs team (especially Andrew who organized everything) for this event, and thanks to Days of Wonder for funding my trip!

Storeconfigs (advanced) use cases

This week on #puppet, Nico asked for a live storeconfigs example. So I thought a blog post would be the perfect place to post a storeconfigs use case and its full explanation. Of course, if you’re interested in some discussions around storeconfigs, please refer to the following blog posts:

At Days of Wonder, I use storeconfigs for only one type of use: exchanging information between nodes. But I know some other people use this feature as an inventory system (to know what node gets what configuration).

Use case 1: website document root replication

Let’s start with a simple example, easily understandable.

At Days of Wonder we have a bunch of webservers arranged in a kind of cluster. The document roots of all these webservers (where the various PHP and image files reside) should always be in sync. So we rsync to all the webservers from a central build server each time the developers commit a change.

The tedious part with this scheme is that you have to make sure all the webservers have the correct ssh authorized_keys and ssh authorization for the build server to contact them successfully.

The manifest


# Class:: devl
# This class is implemented on the build server
#
# Usage:
# Generate a ssh key and store the private key and public key
# on the puppetmaster files mount as keys/buildkey and keys/buildkey.pub
#
#   node build {
#       include devl
#       devl::pushkey{
#           "build":
#               keyfile => "files/keys/buildkey"
#       }
#   }
#
#
class devl {
    ...
    define pushkey($keyfile) {
        @@ssh_authorized_key {
            "push-${name}@${fqdn}":
                user => "push",
                type => "ssh-rsa",
                tag => "push",
                # this is to remove the ssh-rsa prefix, the suffix and trim any \n
                key => gsub(gsub(file("/etc/puppet/${keyfile}.pub"), '^ssh-rsa (.*) .*$', '\1'), "\n", ""),
                options => ['command="rsync --server -vlgDtpr --delete . /path/to/docroot/"', 'no-port-forwarding','no-X11-forwarding','no-agent-forwarding','no-pty'],
        }

        # store the private key locally, for our rsync build
        file {
            "/home/build/.ssh/id_${name}":
                ensure => file, owner => "build", group => "build",
                source => "puppet:///${keyfile}", mode => 0400,
                alias => "pkey-${name}",
                require => [User["build"], File["/home/build/.ssh"]]
        }
    }
    ...
}

# Class: www::push
# This class is implemented on webservers
#
class www::push {
    ... create here the push user and so on...
    Ssh_authorized_key <<| tag == "push" |>>
    ...
}

Inner workings

It’s easy: when the build server applies its configuration, it creates an exported ssh_authorized_key (notice the double @), which is not applied locally. Instead it is stored in the storeconfigs database.

We also create locally a file containing the ssh private key.

When one of the webservers comes to check out its configuration, it implements the www::push class which collects all ssh_authorized_key resources tagged with “push”.

That is, all the authorized keys we created with the pushkey definition in the build configuration. Collecting means that the resource is created as if we had defined it in the node that collects it. So the webserver ends up with a new ssh authorized key whose action, options and key are the ones defined in the build server configuration.

Of course this manifest doesn’t show everything, it also drops a handful of shell scripts to do the rsync using the local private keys, along with more configuration files for some other parts of the build.

Note: the gsub function is a custom parser function I borrowed from David Schmitt’s repository. In 0.25 it would be replaced by regsubst.
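For those wondering what the two nested gsub calls of the manifest do to the public key file, here is the equivalent in plain Ruby. The extract_key name and the (obviously fake) key are made up for the example.

```ruby
# Keep only the base64 part of an "ssh-rsa <key> <comment>" line:
# the first gsub drops the "ssh-rsa " prefix and the trailing comment,
# the second one removes the newline left at the end.
def extract_key(pubkey_line)
  pubkey_line.gsub(/^ssh-rsa (.*) .*$/, '\1').gsub("\n", '')
end

puts extract_key("ssh-rsa AAAAB3NzaC1yc2E build@example\n")
# prints AAAAB3NzaC1yc2E
```

Note that the first group is greedy: if the key comment itself contains spaces, everything up to the last space ends up in the key, so keep your key comments free of spaces.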

Use case 2: tinydns master and slaves

Once again at Days of Wonder, we run tinydns as our DNS server. Tinydns doesn’t have a fancy, full-of-security-holes zone transfer system, so we emulate this functionality by rsync’ing the zone files from the master to the slaves each time the zones are changed (the zones are managed by Puppet of course).

This is pretty much the same system as the one we saw in use case 1, except that there is one key for all the slaves and, more importantly, each slave registers itself with the master to take part in the replication.

The manifest

class djbdns {
    ...

    # Define: tinydns::master
    # define a master with its listening +ip+, +keyfile+, and zonefile.
    # Usage:
    #     djbdns::tinydns::master {
    #         "root":
    #             keyfile => "files/keys/tinydns",
    #             content => "files/dow/zone"
    #     }
    #
    define tinydns::master($ip, $keyfile, $content='') {
        $root = "/var/lib/service/${name}"
        tinydns::common { $name: ip => $ip, content=>$content }

        # send our public key to our slaves
        @@ssh_authorized_key {
            "dns-${name}@${fqdn}":
                user => "root",
                type => "ssh-rsa",
                tag => "djbdns-master",
                key => file("/etc/puppet/${keyfile}.pub"),
                options => ["command=\"rsync --server -logDtprz . ${root}/root/data.cdb\"", "from=\"${fqdn}\"", 'no-port-forwarding','no-X11-forwarding','no-agent-forwarding','no-pty']
        }

        # store our private key locally
        file {
            "/root/.ssh/${name}_identity":
            ensure => file,
            source => "puppet:///${keyfile}", mode => 0600,
            alias => "master-pkey-${name}"
        }

        # replicate with the help of the propagate-key script
        # this exec subscribe to the zone file and the slaves
        # which means each time we add a slave it is rsynced
        # or each time the zone file changes.
        exec {
            "propagate-data-${name}":
                command => "/usr/local/bin/propagate-key ${name} /var/lib/puppet/modules/djbdns/slaves.d /root/.ssh/${name}_identity",
                subscribe => [File["/var/lib/puppet/modules/djbdns/slaves.d"] , File["${root}/root/data"], Exec["data-${name}"]],
                require => [File["/usr/local/bin/propagate-key"], Exec["data-${name}"]],
                refreshonly => true
        }

        # collect slaves address
        File<<| tag == 'djbdns' |>>
    }

    # Define:: tinydns::slave
    # this define is implemented on each tinydns slaves
    define tinydns::slave($ip) {
        $root = "/var/lib/service/${name}"

        tinydns::common { $name: ip => $ip }

        # publish our addresses back to the master
        # our ip address ends up being in a file name in the slaves.d directory
        # where the propagate-key shell script will get it.
        @@file {
            "/var/lib/puppet/modules/djbdns/slaves.d/${name}-${ipaddress}":
            ensure => file, content => "\n",
            alias => "slave-address-${name}",
            tag => 'djbdns'
        }

        # collect the ssh public keys of our master
        Ssh_authorized_key <<| tag == 'djbdns-master' |>>
    }
}

Inner workings

This time we have a double exchange system:

  1. The master exports its public key to be collected by the slaves
  2. and the slaves export their IP addresses back to the master, in the form of an empty file. Their IP address is encoded in the file name.

When the zone file has to be propagated, the propagate-key shell script is executed. This script lists all the files in the /var/lib/puppet/modules/djbdns/slaves.d folder where the slaves export their IP addresses, extracts the IP addresses from the file names and calls rsync with the correct private key. Simple and elegant, isn’t it?
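The real propagate-key is a shell script that I don't reproduce here, but its logic can be sketched in Ruby as follows. The function names and the rsync arguments are assumptions of mine; in particular the destination is only a placeholder, since the forced command in the authorized key decides where the data really lands on the slave. The sketch only builds the commands, it doesn't run them.

```ruby
require 'tmpdir'
require 'fileutils'

# Extract the slave IPs for zone +name+ from the files named "<name>-<ip>"
# that the slaves exported into +dir+.
def slave_ips(dir, name)
  prefix = "#{name}-"
  Dir.children(dir).sort
     .select { |file| file.start_with?(prefix) }
     .map { |file| file.delete_prefix(prefix) }
end

# Build (but do not run) one rsync command per slave.
# ASSUMPTION: the flags and the "root@ip:data.cdb" destination are
# illustrative, not copied from the real script.
def rsync_commands(dir, name, identity, data_file)
  slave_ips(dir, name).map do |ip|
    ['rsync', '-az', '-e', "ssh -i #{identity}", data_file, "root@#{ip}:data.cdb"]
  end
end

Dir.mktmpdir do |dir|
  # Simulate two slaves having exported their addresses.
  FileUtils.touch(File.join(dir, 'root-192.0.2.10'))
  FileUtils.touch(File.join(dir, 'root-192.0.2.11'))
  rsync_commands(dir, 'root', '/root/.ssh/root_identity',
                 '/var/lib/service/root/root/data.cdb').each do |cmd|
    puts cmd.join(' ')
  end
end
```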

Other ideas

There’s simply no limitation to what we can do with storeconfigs, because you can export any kind of resources, not only files or ssh authorized keys.

I’m giving here some ideas (some that we are implementing here):

  • Centralized backups. Using rdiff-backup for instance, we could propagate the central backup server key to all servers, and get back the list of files to backup.
  • Resolv.conf building. This is something we’re doing at Days of Wonder. Each dnscache server exports their IP address, and we build resolv.conf on each host from those addresses.
  • Automated NTP configuration: each NTP server (of a high stratum) exports its IP address (or ntp.conf configuration fragments), which all the other NTP servers can use to point to it and form the lower stratum servers.
  • Automated monitoring configurations: each service and node exports configuration fragments that are collected on the NMS host to build the NMS configuration. People running nagios or munin seem to do that.

If you have some creative uses of storeconfigs, do not hesitate to publish them, either on the Puppet-user list, the Puppet wiki or elsewhere (and why not in a blog post that could be aggregated by Planet Puppet).


Planet Puppet is born!

As usual, I’m faster to create things than to talk about them.

Last week, after talking with several members of #puppet, I decided to register planetpuppet.org, and to install moonmoon to aggregate the few Puppet blogs out there in the blogosphere.

The whole aim of this attempt is to provide more exposure to our own blogs (we have a saying in France which basically means “unity is strength”). This is not to be confused with Puppet Planet.

If you run a blog with a Puppet tag or category from which we can extract a RSS or Atom Feed, then please contact me or drop a comment here, and I’ll happily add it to the Planet Puppet.

There is still some work to do on the site. For instance it looks ugly, has no logo, and there’s no explanation of what it is. My plan is to add this incrementally; I first wanted to have the site up and running. And since I plain suck at graphic design, I’ll wait for some Days of Wonder co-workers to return from vacation to ask them for some help in this area :-)

Meanwhile, do not forget to visit Planet Puppet from time to time (once a day would be good!). It is also possible to subscribe to the Planet Puppet feed.

Puppet and JRuby a love story!

    As announced in the last edit of yesterday’s post, Puppet and JRuby a love and hate story, I finally managed to run a webrick puppetmaster under JRuby, with an MRI client connecting and fetching its config.

    The Recipe

    Puppet side

    Unfortunately Puppet creates its first certificate with a serial number of 0, which JRuby-OpenSSL finds invalid (in fact that’s Bouncy Castle JCE Provider). So the first thing is to check if you already have some certificate generated with a serial of 0. If you have none, then everything is great you can skip this.

    You can see a certificate content with openssl:

    
    % openssl x509 -text -in /path/to/my/puppet/ssl/ca/ca_crt.pem
    
    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number: 1 (0x1)
            Signature Algorithm: sha1WithRSAEncryption
            Issuer: CN=ca
            Validity
                Not Before: May 23 18:38:19 2009 GMT
                Not After : May 22 18:38:19 2014 GMT
            Subject: CN=ca
    ...
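If you have many certificates to check, the same verification can be scripted with Ruby's OpenSSL bindings. This is only a sketch: the function names are mine, and the directory you pass to it should be your own Puppet ssl directory.

```ruby
require 'openssl'

# True when the PEM-encoded certificate has a serial number of 0.
def zero_serial?(pem)
  OpenSSL::X509::Certificate.new(pem).serial.to_i.zero?
end

# List the .pem files under +dir+ that are certificates with a serial of 0.
def zero_serial_certs(dir)
  Dir.glob(File.join(dir, '**', '*.pem')).select do |path|
    begin
      zero_serial?(File.read(path))
    rescue OpenSSL::X509::CertificateError
      false # not a certificate (private key, CRL, request...), skip it
    end
  end
end

# Usage: ruby check_serials.rb /path/to/my/puppet/ssl
zero_serial_certs(ARGV[0]).each { |path| puts path } if ARGV[0]
```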
    

    If no certificate has a serial of 0, then it’s OK, otherwise I’m afraid you’ll have to start the PKI from scratch (which means rm -rf $vardir/ssl and authenticate clients again), after applying the following Puppet patch:

    
    JRuby fix: make sure certificate serial > 0
    
    JRuby OpenSSL implementation is more strict than real ruby one and
    requires certificate serial number to be strictly positive.
    
    Signed-off-by: Brice Figureau <brice-puppet@daysofwonder.com>
    
    diff --git a/lib/puppet/ssl/certificate_authority.rb b/lib/puppet/ssl/certificate_authority.rb
    index 08feff0..4a7d461 100644
    --- a/lib/puppet/ssl/certificate_authority.rb
    +++ b/lib/puppet/ssl/certificate_authority.rb
    @@ -184,7 +184,7 @@ class Puppet::SSL::CertificateAuthority
             # it, but with a mode we can't actually read in some cases.  So, use
             # a default before the lock.
             unless FileTest.exist?(Puppet[:serial])
    -            serial = 0x0
    +            serial = 0x1
             end
     
             Puppet.settings.readwritelock(:serial) { |f|
    

    I’ll post this patch to puppet-dev soon, so I hope it’ll eventually get merged in mainline.

    JRuby

    You need the freshest JRuby available at this time. My tests were conducted with the latest JRuby as of commit “3aadd8a”. The best is to clone the github jruby repository and build it (it requires of course a JDK and Ant, but that’s pretty much all).

    Then install jruby in your path (if you need assistance for this, I’m not sure this blog post is for you :-))

    JRuby-OpenSSL

    As I explained in my previous blog post about the same subject, Puppet exercises a lot the Ruby OpenSSL subsystem. During this experiment, I found a few shortcomings in the current JRuby-OpenSSL 0.5, including missing methods, or missing behaviors needed by Puppet to run fine.

    So to get a fully Puppet-enabled JRuby-OpenSSL you need either to get the very latest JRuby-OpenSSL from its own github repository (or check out the puppet-fixes branch of my fork of said repository on github), or to manually apply the following patches on top of the 0.5 source tarball:

    • JRUBY-3689: OpenSSL::X509::CRL can’t be created with PEM content
    • JRUBY-3690: OpenSSL::X509::Request can’t be created from PEM content
    • JRUBY-3691: Implement OpenSSL::X509::Request#to_pem
    • JRUBY-3692: Implement OpenSSL::X509::Store#add_file
    • JRUBY-3693: OpenSSL::X509::Certificate#check_private_key is not implemented
    • JRUBY-3556: Webrick doesn’t start in https
    • JRUBY-3694: Webrick HTTPS produces some SSL stack trace

    Then rebuild JRuby-OpenSSL which is a straightforward process (copy build.properties.SAMPLE to build.properties, adjust jruby.jar path, and then issue ant jar to build the jopenssl.jar).

    Once done, install the 0.5 JRuby-OpenSSL gem in your jruby install, and copy over the built jar to lib/ruby/gems/1.8/gems/jruby-openssl-0.5/lib.

    Let’s try it!

    Then it’s time to run your puppetmaster, just start it with jruby instead of ruby. Of course you need the puppet dependencies installed (Facter).

    My next try will be to run Puppet on JRuby and mongrel (or whatever replaces it in the JRuby world), then to try with storeconfigs on…

    Hope that helps, and for any question, please post in the puppet-dev list.

    Puppet and JRuby, a love and hate story

    Since I heard about JRuby about a year ago, I wanted to try to run my favorite ruby program on it. I’m working with Java almost all day long, so I know for sure that the Sun JVM is a precious tool for running long-lived server. It is pretty fast, and has a very good (and tunable) garbage collector.

    In a word: the perfect system to run a long-lived puppetmaster!

    The first time I tried, back in February 2009, I unfortunately encountered the bug JRUBY-3349, which prevented Puppet from running quite early on, because the Fcntl constants weren’t defined. Since my understanding of JRuby internals is near zero, I left it there.

    But thanks to Luke Kanies (Puppet creator), one of the main JRuby developers, Charles Oliver Nutter, fixed the issue a couple of weeks ago (thanks to him too; they even fixed another fcntl issue at about the same time, as it didn’t support SET_FD).

    That was just in time for another test…

    But what I forgot was that Puppet is not just any Ruby app on the block. It uses lots of cryptography behind the scenes. Remember that Puppet manages its own PKI, including:

    • a full Certification Authority.
    • a CRL.
    • authenticated clients connections, through SSL.

    That just means Puppet exercises the Ruby OpenSSL extension a lot.

    The main issue is that MRI uses OpenSSL for all the cryptographic stuff, and JRuby uses a specific Java version of this extension. Of course the latter is still young (presently at v0.5) and doesn’t yet contain everything needed to run Puppet.

    In another life I wrote a proprietary cryptographic Java library, so I’m not a complete cryptography newcomer (OK, I forgot almost everything, but I still have some good books to refer to). So I decided to implement what is missing in JRuby-openssl to allow a webrick Puppetmaster to run.

    You can find my contributions in the various JRUBY-3689, JRUBY-3690, JRUBY-3691, JRUBY-3692, JRUBY-3693 bugs.

    I still have another minor patch to submit (OpenSSL::X509::Certificate#to_text implementation).

    So the question is: with all those patches applied, did I get a puppetmaster running?

    And the answer is unfortunately no.

    I can get the puppetmaster to start on a fresh configuration (i.e. it creates everything SSL related and such), but it fails as soon as a client connects (hey, that’s way better than before I started :-)).

    It all comes from SSL. The issue is that with the C OpenSSL implementation it is possible to get the peer certificate at any time, but the Java SSL implementation (which is provided by the Sun virtual machine) requires the client to be authenticated before anyone gets access to the peer certificate.

    That’s unfortunate because to be able to authenticate a not-yet-registered client, we must have access to its certificate. I couldn’t find any easy code fix, so I stopped my investigations there.

    There are still some possible workarounds, like running in mongrel mode (provided JRuby supports mongrel, which I didn’t check) and letting Nginx (or Apache) handle the SSL stuff, but still it would be great to be able to run a full-fledged puppetmaster on JRuby.

    I tried with a known client and got the same issue, so maybe that’s a whole different issue. I guess I’ll have to dig deeper in the Java SSL code, which unfortunately is not available :-)

    Stay tuned for more info about this. I hope to be able to have a full puppetmaster running on JRuby soon!

    EDIT: I could run a full puppetmaster on webrick from scratch under JRuby with a normal ruby client. I’ll post the recipe in a subsequent article soon.

    OMG!! storedconfigs killed my database!

    When I wrote my previous post titled all about storedconfigs, I was pretty confident I explained everything I could about storedconfigs… I was wrong of course :-)

    A couple of days ago, I was helping some USG admins who were facing an interesting issue. Interesting for me, but I don’t think they’d share my views on this, as their servers were melting down under the database load.

    But first let me explain the issue.

    The issue

    The thing is that when a client checks in to get its configuration, the puppetmaster compiles its configuration to a digestible format and returns it. This operation is the process of transforming the AST built by parsing the manifests into what is called the catalog in Puppet. It is this catalog (which in fact is a graph of resources) that is later applied by the client.

    When the compilation process is over, and if storedconfigs is enabled on the master, the master connects to the RDBMS and retrieves all the resources, parameters, tags and facts. Those, if any, are compared to what has just been compiled, and if some resources differ (by value/content, or if some are missing or new), they get written to the database.
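This comparison step can be pictured with the following Ruby sketch. It is not the storedconfigs code, which works on database records: here resources are plain hashes keyed by a resource reference, and diff_resources is a hypothetical name.

```ruby
# Compare what is stored in the database with what has just been compiled,
# and return only what would have to be written back.
# Both arguments are { "Type[title]" => { parameter => value } } hashes.
def diff_resources(stored, compiled)
  common = compiled.keys & stored.keys
  {
    added:   compiled.keys - stored.keys,
    removed: stored.keys - compiled.keys,
    changed: common.reject { |ref| compiled[ref] == stored[ref] }
  }
end

# Made-up data for the demonstration.
stored = {
  'File[/etc/motd]' => { 'mode' => '0644' },
  'User[push]'      => { 'ensure' => 'present' }
}
compiled = {
  'File[/etc/motd]' => { 'mode' => '0600' },
  'Package[rsync]'  => { 'ensure' => 'installed' }
}

p diff_resources(stored, compiled)
```

On the very first run the stored hash is empty, so every compiled resource ends up in the added list, which is exactly the heavy write load discussed in this post.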

    Pretty straightforward, isn’t it?

    As you can see, this process is synchronous and while the master processes the storedconfigs operations, it doesn’t serve anybody else.

    Now, imagine you have a large site (ie hundreds of puppetd clients), and you decide to turn on storedconfigs. All the clients checking in will see their current configuration stored in the database.

    Unfortunately, on the first storedconfigs run for a client the database is empty, so the puppetmaster has to send all the information to the RDBMS, which in turn has to write it to disk. Of course, on subsequent runs only what has changed needs to reach the RDBMS, which is much less than the first time (provided you are running 0.24.8 or applied my patch).

    But if your RDBMS is not correctly setup or not sized for so much concurrent write load, the storedconfigs process will take time. During this time this master is pinned to the database and can’t serve clients. So the immediate effect is that new clients checking in will see timeouts, load will rise, and so on.

    The database

    If you are in the aforementioned scenario you must be sure your RDBMS hardware is properly sized for this peak load, and that your database is properly tuned.

    I’ll soon give some generic MySQL tuning advice to let MySQL handle the load, but remember it is generic, so YMMV.

    Size the I/O subsystem

    What people usually forget is that disks (ie those with rotating platters, not SSDs) have a maximum number of I/O operations per second. For professional high-end disks this value is about 250 IOP/s.

    Now, to simplify, let’s say your average puppet client has 500 resources with an average of 4 parameters each. That means the master will have to perform at least 500 * 4 + 500 = 2500 writes to the database (that’s naive since there are indices to modify, transactions can be grouped, etc., but you see the point).

    Add to this the tags, hmm let’s say an average of 4 tags per resource, and we have 500 * 4 + 500 + 500 * 4 = 4500 writes to perform to store the configuration of a given host.

    Now remember our 250 IOP/s: how many seconds does the disk need to perform 4500 writes?

    The answer is 18s!! Which is a high value. During this time you can’t do anything else. Now add concurrency to the mix, and imagine what that means.
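    The back-of-the-envelope computation above can be written down as a few lines of Python. The figures are the ones from the text; the `first_run_seconds` helper is a hypothetical addition that assumes the worst case where every host's first store is serialized on a single disk.

```python
RESOURCES = 500            # resources per client (figure from the text)
PARAMS_PER_RESOURCE = 4    # average parameters per resource
TAGS_PER_RESOURCE = 4      # average tags per resource
DISK_IOPS = 250            # I/O operations per second for one disk

# One write per resource, plus one per parameter, plus one per tag.
writes = (
    RESOURCES
    + RESOURCES * PARAMS_PER_RESOURCE
    + RESOURCES * TAGS_PER_RESOURCE
)
seconds_per_host = writes / DISK_IOPS


def first_run_seconds(hosts):
    """Naive total if every host's first store is serialized on one disk."""
    return hosts * seconds_per_host
```

    Under the same naive assumptions, `first_run_seconds(100)` shows why a few hundred clients checking in for the first time can keep a single disk busy for a very long time.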

    Of course this supposes we have to wait for the disk to have finished (ie synchronous writes), but in fact that’s pretty much how an RDBMS works if you really want to trust your data.

    So the result is that if you want a fast RDBMS you must be ready to pay for an expensive I/O subsystem.

    Size the I/O subsystem

    That’s certainly the most important part of your server.

    You need:

    • fast disks (15k RPM, because there is a real latency benefit compared to 10k)
    • as many spindles as possible, grouped in a sane RAID array like RAID10. Please forget RAID5 if you want your data to be safe (and your writes to be fast). I have seen too many horror stories with RAID5. I should really join the BAARF.
    • a Battery Backed RAID Cache unit (that will absorb the fsyncs gracefully).
    • Tune the RAID for the largest stripe size. Remove the RAID read cache if possible (innodb will take care of the READ cache with the innodb buffer pool).

    If you don’t have this, do not even think about turning on storedconfigs for a large site.

    Size the RDBMS server

    Of course other things matter. If the database can fit in RAM (the best option if you don’t want to be I/O bound), then you obviously need RAM. Preferably ECC Registered RAM. Use 64-bit hardware with a 64-bit OS.

    Then you need some CPU. Nowadays they’re cheap, but beware of InnoDB scaling issues on multi-core/multi-CPU systems (see below).

    Tune the database configuration

    Here is a checklist on how to tune MySQL for a mostly write load:

    InnoDB of course

    For concurrency, stability and durability reasons InnoDB is mandatory. MyISAM is at best usable for READ workloads but suffers from concurrency issues, so it is a no-no for our topic.

    Tuned InnoDB

    The default InnoDB settings are tailored to very small, 10-year-old servers…

    Things to look at:

    • innodb_buffer_pool_size. Usual advice says 70% to 80% of the physical RAM of the server if MySQL is the only running application. I’d say that it depends on the size of the database. If you know you’ll store only a few MiB, no need to allocate 2 GiB :-). More information in this useful and interesting blog post from the Percona guys.
    • innodb_log_file_size. We want the logs to be as large as we can to ease the mostly write load we have. Once all the clients are stored in the database, we can reduce this to something lower. The trade-off with large logs is the recovery time in case of a crash. It isn’t uncommon to see several hundreds of MiB, or even GiB.
    • innodb_flush_method = O_DIRECT on Linux. This prevents the OS from caching the innodb_buffer_pool content (thus ending up with a double cache).
    • innodb_flush_log_at_trx_commit=2, if your MySQL server doesn’t have any other use than storedconfigs or you don’t care about the D in ACID. Otherwise use 1, which is the fully durable setting. It is also possible to temporarily change it to 2, and then move back to 1 when all clients have their configs stored.
    • transaction-isolation=READ-COMMITTED. This one can also help, although I never tested it myself.
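    As an illustration only, the checklist above can be collected into a `[mysqld]` fragment. The sketch below renders one from a Python dict; the two sizes are placeholders I made up, not recommendations, and must be adapted to your RAM and database size. The option names are the ones listed above, and `render_mysqld_section` is a hypothetical helper.

```python
# Placeholder values: the two sizes are assumptions for illustration,
# the other settings are the ones discussed in the checklist.
SETTINGS = {
    "innodb_buffer_pool_size": "2G",       # assumption, size to your dataset
    "innodb_log_file_size": "256M",        # assumption, larger eases writes
    "innodb_flush_method": "O_DIRECT",     # Linux: avoid double caching
    "innodb_flush_log_at_trx_commit": "2", # relaxed durability during import
    "transaction-isolation": "READ-COMMITTED",
}


def render_mysqld_section(settings):
    """Render the settings as a [mysqld] section for my.cnf."""
    lines = ["[mysqld]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


fragment = render_mysqld_section(SETTINGS)
```

    Printing `fragment` gives text that can be reviewed and merged by hand into an existing my.cnf; keeping the values in one dict makes it easy to produce a second, more conservative fragment once all clients are stored.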

    Patch MySQL

    The fine people at Percona or Ourdelta produce patched builds of MySQL that remove some of the MySQL InnoDB scalability issues. This matters most for high concurrency workloads on multi-core/multi-CPU systems.

    It can also be good to run MySQL with Google’s perftools TCMalloc. TCMalloc is a memory allocator which scales way better than the Glibc one.

    On the Puppet side

    The immediate and most straightforward idea is to limit the number of clients that can check in at the same time. This can be done by disabling puppetd on each client (puppetd --disable), blocking network access, or any other creative means…

    When all the active hosts have checked in, you can then enable the other ones. This can be done hundreds of hosts at a time, until all hosts have a configuration stored.

    Another solution is to direct some hosts to a special puppetmaster with storedconfigs on (the regular one still has storedconfigs disabled), by playing with DNS or by configuration, whatever is simplest in your environment. Once those hosts have their config stored, move them back to their regular puppetmaster and move newer hosts there.

    Since that’s completely manual, it might be impractical for you, but that’s the simplest method.
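    The "hundreds of hosts at a time" approach boils down to splitting your host list into waves. Here is a minimal sketch; the host names and the wave size are hypothetical, and how you actually enable puppetd on each wave (ssh loop, DNS switch, etc.) is left out.

```python
def batches(hosts, size):
    """Yield successive groups of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


# Hypothetical host names, for illustration only.
hosts = [f"node{n:03d}.example.com" for n in range(1, 251)]

# Each wave is enabled only once the previous one has its config stored.
waves = list(batches(hosts, 100))
```

    For each wave you would re-enable puppetd on those hosts (puppetd --enable), wait until the master has logged a stored catalog for all of them, and only then move on to the next wave.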

    And after that?

    As long as your manifests are only slightly changing, subsequent runs will see only very limited database activity (if you run a puppetmaster >= 0.24.8). That means the tuning we did earlier can be undone (for instance you can lower the innodb_log_file_size and adjust the innodb_buffer_pool_size to the size of the hot set).

    But storedconfigs can still double your compilation time. If your master is already at its limit for the number of hosts it serves, you might see some client timeouts.

    The Future

    Today Luke announced on the puppet-dev list that they were working on a queuing system to defer storedconfigs and smooth out the load by spreading it over a longer time. But still, tuning the database is important.

    The idea is to offload the storedconfigs to another daemon which is hooked behind a queuing system. After the compilation the puppetmaster queues the catalog, where it will be unqueued by the puppet queue daemon which will in turn execute the storedconfigs process.
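    To picture the flow, here is a toy model of the queuing idea using Python's standard library. It is only a sketch under my own assumptions: the real feature was not released when this was written, and every name here (`compile_catalog`, `handle_checkin`, `queue_daemon`, the `stored` dict standing in for the database) is hypothetical.

```python
import queue
import threading

catalog_queue = queue.Queue()
stored = {}  # stands in for the storedconfigs database


def compile_catalog(host):
    # Placeholder for the real compilation step.
    return {"host": host, "resources": [f"File[/etc/{host}.conf]"]}


def handle_checkin(host):
    """Master side: compile, enqueue the catalog, return at once."""
    catalog = compile_catalog(host)
    catalog_queue.put(catalog)
    return catalog


def queue_daemon():
    """Separate worker: drains the queue and performs the slow store."""
    while True:
        catalog = catalog_queue.get()
        if catalog is None:  # sentinel: stop the worker
            catalog_queue.task_done()
            break
        stored[catalog["host"]] = catalog["resources"]
        catalog_queue.task_done()


worker = threading.Thread(target=queue_daemon)
worker.start()
for host in ("web1", "web2", "db1"):
    handle_checkin(host)
catalog_queue.put(None)
catalog_queue.join()
worker.join()
```

    The point of the design is visible in `handle_checkin`: the master no longer waits for the database, so a slow store delays the queue daemon instead of the next client.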

    I don’t know the ETA for this interesting feature, but meanwhile I hope the tips I provided here can be of any help to anyone :-)

    Stay tuned for more puppet stories!

    All about Puppet storeconfigs

    For a long time people (including me) have complained that storeconfigs is a real resource hog. Unfortunately for us, this option is so cool and useful.

    What’s storeconfigs

    Storeconfigs is a puppetmasterd option that stores the nodes’ actual configuration to a database. It does this by comparing the result of the last compilation against what is actually in the database, resource per resource, then parameter per parameter, and so on.

    The actual implementation is based on Rails’ Active Record, which is a great way to abstract the gory details of the database, and prototype code easily and quickly (but has a few shortcomings).

    Storeconfigs uses

    The immediate use of storeconfigs is exported resources. Exported resources are resources which are prefixed by @@. Those resources are marked specially so that they can be collected on several other nodes.

    A small, completely dumb example speaks for itself:

    class exporter {
      @@file {
        "/var/lib/puppet/nodes/$fqdn": content => "$ipaddress\n", tag => "ip"
      }
    }
    
    node "export1.daysofwonder.com" {
      include exporter
    }
    
    node "export2.daysofwonder.com" {
      include exporter
    }
    
    node "collector.daysofwonder.com" {
      File <<| tag == "ip" |>>
    }
    

    What does this example do?
    That’s simple: each exporter node exports a file in /var/lib/puppet/nodes, named after the node, whose content is its primary IP address.
    What is interesting is that the node “collector.daysofwonder.com” collects all files tagged “ip”, that is all the exported files. In the end, after export1, export2 and collector have run a compilation, the collector host will have /var/lib/puppet/nodes/export1.daysofwonder.com and /var/lib/puppet/nodes/export2.daysofwonder.com with their respective content.
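    If the export/collect mechanics are still fuzzy, this toy Python model may help. It is not how Puppet implements it: the `exported` list stands in for the storeconfigs database, `export` and `collect` are hypothetical helpers, and the IP addresses are made up (taken from the documentation range).

```python
exported = []  # stands in for the storeconfigs database


def export(node, rtype, title, params, tag):
    """Record a resource declared with @@ by `node`."""
    exported.append(
        {"node": node, "type": rtype, "title": title, "params": params, "tag": tag}
    )


def collect(rtype, tag):
    """Rough equivalent of: File <<| tag == "ip" |>>"""
    return [r for r in exported if r["type"] == rtype and r["tag"] == tag]


# Made-up addresses from the 192.0.2.0/24 documentation range.
for fqdn, ip in (
    ("export1.daysofwonder.com", "192.0.2.1"),
    ("export2.daysofwonder.com", "192.0.2.2"),
):
    export(fqdn, "File", f"/var/lib/puppet/nodes/{fqdn}", {"content": ip + "\n"}, "ip")

# What the collector node would end up managing.
files = collect("File", "ip")
```

    The exporters only write to the shared store; it is the collector's compilation that reads it back, which is why the exporters must have been compiled before the collector sees their files.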

    Got it?

    That’s the perfect tool for instance to automatically:

    • share/distribute public keys (ssh or openssl or other types)
    • build list of hosts running some services (for monitoring)
    • build configuration files which require multiple hosts (for instance /etc/resolv.conf can be the concatenation of files exported by your dns cache hosts)
    • and certainly other creative uses

    There is still another use: since the whole configuration of your nodes is in an RDBMS, you can use it to do some data-mining on your hosts’ configuration. That’s what puppetshow does.

    Shortcomings

    The issue with storeconfigs in its current incarnation (ie 0.24.7) is that it is a slow feature (it usually doubles the compilation time), and it imposes a higher load on the puppetmaster and the database engine.

    For large installations it might not be possible to run with this feature on. There were also some reports of high memory usage or leaks with this feature on (see my recommendation about this in my puppetmaster memory leak post).

    Recommendations

    Here are my usual puppet and storeconfigs recommendations:

    • use a fairly new ruby interpreter (at least one that is known to be memory leak free)
    • use a fairly new Rails (I’m currently using rails 2.1.0 on my master without any issues)
    • use the mysql ruby connector if you use mysql (otherwise rails will use a pure ruby implementation which is reported to not be stable)
    • use a powerful database engine (not sqlite), and for large deployments use a dedicated server (or cluster of servers). If you are using mysql and you want to trust your data, use InnoDB of course.
    • properly tune your database engine for a mix of writes and reads (for InnoDB a properly sized buffer pool and logs are mandatory).
    • make sure your manifests are deterministic

    I think the last point deserves a little bit more explanation:

    I had the following schematized pattern in some of my manifests, which I took from David Schmitt’s excellent modules:

    in one class:
    if defined(File["/var/lib/puppet/modules/djbdns.d/"]) {
      warning("already defined")
    } else {
      file {
        "/var/lib/puppet/modules/djbdns.d/": ...
      }
    }
    
    and in another class the exact same code:
    if defined(File["/var/lib/puppet/modules/djbdns.d/"]) {
      warning("already defined")
    } else {
      file {
        "/var/lib/puppet/modules/djbdns.d/": ...
      }
    }
    

    What happened is that the evaluation order could change from run to run: sometimes the resource was defined by the first class, sometimes by the second. Each time it flipped, the storeconfigs code had to remove the resource from the database and re-create it. Clearly not the best way to reduce database workload :-)

    What’s cooking

    I contributed for 0.24.8 a partial rewrite of some parts of the storeconfigs feature to increase its performance.

    My analysis is that what was slow in the feature is threefold:

    1. creating tons of Active Record objects is slow (one object per resource parameter)
    2. although the code was clearly rails optimized code (ie using association prefetching and so on), there was still a large number of read operations for all the tags and parameters
    3. there was still a large number of writes to the database on successive runs because the order of tags evaluation is not guaranteed.

    I fixed the first two points by querying the database directly to fetch the parameters and tags, keeping them in hashes instead of objects. This saves a large number of database requests and at the same time avoids creating a large number of ruby objects (it should even save some memory).

    The last point was fixed by imposing a strict order (although not completely correct, but still better than how it was) in the way the tags are assigned to resources.
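    Why does the order of tags matter so much? The sketch below is not the actual patch (which imposes an order when tags are assigned); it is just a hypothetical illustration of how an order-sensitive comparison sees a change where there is none, while an order-independent one does not. The tag names are made up.

```python
def needs_write_ordered(stored_tags, compiled_tags):
    """Order-sensitive comparison: any reordering looks like a change."""
    return stored_tags != compiled_tags


def needs_write_sorted(stored_tags, compiled_tags):
    """Order-independent comparison: only real differences count."""
    return sorted(stored_tags) != sorted(compiled_tags)


# Same tags, but evaluated in a different order on two successive runs.
run1 = ["ip", "exporter", "file"]
run2 = ["file", "ip", "exporter"]

spurious = needs_write_ordered(run1, run2)
real = needs_write_sorted(run1, run2)
```

    Here `spurious` is true and `real` is false: with a stable order, an unchanged manifest generates no tag writes at all.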

    Both patches have been merged for 0.24.8, and some people reported some performance improvements.

    On the Days of Wonder infrastructure, with a 562-resource node on a tuned mysql database, I found:

    • 0.24.7:
      info: Stored catalog for corp2.daysofwonder.com in 4.05 seconds
      notice: Compiled catalog for corp2.daysofwonder.com in 6.31 seconds
    • 0.24.7 with the patch:
      info: Stored catalog for corp2.daysofwonder.com in 1.39 seconds
      notice: Compiled catalog for corp2.daysofwonder.com in 3.80 seconds

    That’s a nice improvement, isn’t it :-)
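    To put numbers on it, here is the arithmetic for the timings quoted above (the `reduction` helper is just for illustration):

```python
def reduction(before, after):
    """Percentage of time saved going from `before` to `after` seconds."""
    return (before - after) / before * 100


# Timings from the two log excerpts above.
store_saving = reduction(4.05, 1.39)    # storing the catalog
compile_saving = reduction(6.31, 3.80)  # compiling the catalog
```

    That works out to roughly two thirds less time spent storing the catalog, and roughly 40% less time for the reported compilation.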

    The future?

    Luke and I discussed this, and it was also discussed on the puppet-dev list a few times. I think that an RDBMS might not be the right storage choice for this feature, because clearly there is almost no random keyed access to the individual parameters of a resource (so having a table dedicated to parameters is of almost no use).

    I know Luke’s plan is to abstract the storeconfigs feature from the current implementation (certainly through the indirector), so that we can use different storeconfigs engines.

    I also know that someone is working on a promising CouchDB implementation. I myself can see a memcached implementation (which I’d really like to start working on). Maybe even the filesystem would be enough.

    Of course, I’m open to any other improvements or storage engine ideas :-)

    Help! Puppetd is eating my server!

    This seems to have been a recurring topic these last 3 or 4 days, with a few #puppet, redmine or puppet-user requests asking why puppetd is consuming so much CPU and/or memory.

    While I don’t have a definitive answer about why it could happen (hey, all software components have bugs), I think it is important to at least know how to see what happens. I also include some common issues I have observed myself.

    Know your puppetd

    I mean, know what puppetd is doing. That’s easy: disable puppetd on the host where you have an issue, and try to run it manually in debug mode. I’m really astonished that almost nobody tries a debug run before complaining that something doesn’t work :-)

    % puppetd --disable
    % puppetd --test --debug --trace
    ... full output on the console ...
    

    At the same time, monitor the CPU usage and look at the debug entries when most of the CPU is consumed.
    If nothing is printed at this same moment, and it still uses CPU, CTRL-C the process, maybe it will print a useful stack trace that will help you (or us) understand what happens.

    With this you will certainly catch things you didn’t intend (see below, computing checksums when it is not necessary).

    Inspect your ruby interpreter

    I already mentioned this tip in my puppetmaster memory leak post a month ago. You can’t imagine how much useful information you can get with this tool.

    Install the ruby file into ~/.gdb/ruby as explained in the original article, then copy the following into your ~/.gdbinit:

    define session-ruby
      source ~/.gdb/ruby
    end
    

    Here I’m going to show how to do this with a puppetmasterd, but it is exactly the same thing with puppetd.

    Basically, the idea is to attach gdb to the puppet process, halt it and look at the current stack trace:

    % ps auxgww | grep puppetmasterd
    puppet   28602  2.0  8.9 275508 184492 pts/3   Sl+  Feb19  65:25 ruby /usr/bin/puppetmasterd --debug
    
    % gdb /usr/bin/ruby
    GNU gdb 6.8-debian
    Copyright (C) 2008 Free Software Foundation, Inc.
    ...
    (gdb) session-ruby
    (gdb) attach 28602
    Attaching to program: /usr/bin/ruby, process 28602
    ...
    

    Now our gdb is attached to our ruby interpreter.
    Let’s see where we stopped:

    (gdb) rb_backtrace
    $3 = 34
    

    Note: the output is displayed by default on the stdout/stderr of the attached process, so in our case my puppetmasterd. Going to the terminal where it runs (actually the screen):

    ...
            from /usr/lib/ruby/1.8/webrick/server.rb:91:in `select'
            from /usr/lib/ruby/1.8/webrick/server.rb:91:in `start'
            from /usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
            from /usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
            from /usr/lib/ruby/1.8/puppet.rb:293:in `start'
            from /usr/lib/ruby/1.8/puppet.rb:144:in `newthread'
            from /usr/lib/ruby/1.8/puppet.rb:143:in `initialize'
            from /usr/lib/ruby/1.8/puppet.rb:143:in `new'
            from /usr/lib/ruby/1.8/puppet.rb:143:in `newthread'
            from /usr/lib/ruby/1.8/puppet.rb:291:in `start'
            from /usr/lib/ruby/1.8/puppet.rb:290:in `each'
            from /usr/lib/ruby/1.8/puppet.rb:290:in `start'
            from /usr/sbin/puppetmasterd:285
    

    It works!
    It is now easy to see what puppetd is doing:

    1. attach gdb to your running, CPU-eating puppetd
    2. stop it (issue CTRL-C in gdb)
    3. run rb_backtrace, copy the backtrace into a file
    4. issue ‘continue’ in gdb to let the process run again
    5. repeat from step 2 several times

    Examining the stack traces should give you (or us) hints about what your puppetd is doing at that moment.
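    Once you have a handful of saved backtraces, counting which frame is on top most often is a cheap way to find the hot spot. The sketch below is a hypothetical helper, not part of any Puppet or gdb tooling; it assumes each saved backtrace uses the `from file:line:in method` layout shown above, and the sample data just reuses frames from that output.

```python
import re
from collections import Counter

# Matches lines such as: from /path/file.rb:91:in `select'
# The "from" prefix and the ":in `method'" suffix are both optional.
FRAME = re.compile(r"(?:from\s+)?(\S+?):(\d+)(?::in `([^']+)')?\s*$")


def top_frame(backtrace):
    """Return (file, line, method) for the first frame of one backtrace."""
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            path, lineno, method = match.groups()
            return path, int(lineno), method
    return None


def hot_spots(backtraces):
    """Count how often each frame was on top across all samples."""
    return Counter(frame for frame in map(top_frame, backtraces) if frame)


# Three hypothetical samples built from frames shown earlier.
samples = [
    "from /usr/lib/ruby/1.8/webrick/server.rb:91:in `select'\n"
    "from /usr/lib/ruby/1.8/webrick/server.rb:91:in `start'",
    "...\nfrom /usr/lib/ruby/1.8/webrick/server.rb:91:in `select'",
    "from /usr/lib/ruby/1.8/puppet.rb:293:in `start'\n"
    "from /usr/sbin/puppetmasterd:285",
]
counts = hot_spots(samples)
```

    `counts.most_common()` then lists the frames from most to least frequent; the more samples you take, the more trustworthy the ranking.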

    Possible causes of puppetd CPU consumption

    A potential bug

    You might have encountered a bug. Please report it in Puppet redmine, and enclose all the useful information you gathered by following the two points above.

    A recursive file resource with checksum on

    That’s the usual suspect, and one I encountered myself.

    Let’s say you have something like this in your manifest:

    File { checksum => md5 }
    ...
    
    file {
      "/path/to/so/many/files":
        owner => myself, mode => 0644, recurse => true
    }
    

    What does that mean?
    You’re telling puppet that every file resource should compute a checksum, and you have a recursive file operation managing owner and mode. What puppetd will do is traverse the whole ‘/path/to/so/many/files’ tree and happily manage the files, changing owner and mode when needed.
    What you might have forgotten, is that you requested checksum to be MD5, so puppetd instead of only doing a bunch of stat(3) on your files will also compute MD5 sums of their content.
    If you have tons of files in this hierarchy this can take quite some time. Since checksums are cached, it can also take quite some memory.
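    The difference in work is easy to see in a small Python sketch. This is not what puppetd runs, just a hypothetical comparison of the two kinds of scan: `scan_metadata` only needs one stat() per file, while `scan_md5` has to read every byte and keeps every digest in memory.

```python
import hashlib
import os


def scan_metadata(root):
    """Managing owner/mode only needs one stat() per file."""
    return {
        os.path.join(directory, name):
            os.stat(os.path.join(directory, name)).st_mode & 0o7777
        for directory, _, names in os.walk(root)
        for name in names
    }


def scan_md5(root):
    """Checksumming must read the full content of every file."""
    sums = {}
    for directory, _, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                # Read in chunks so large files do not blow up memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            sums[path] = digest.hexdigest()  # kept around, like a cache
    return sums
```

    Timing both functions on a large tree gives a rough idea of the price paid for a checksum you did not actually need.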

    How to solve this issue:

    File { checksum => md5 }
    ...
    
    file {
      "/path/to/so/many/files":
        owner => myself, mode => 0644, recurse => true, checksum => undef
    }
    

    Sometimes it isn’t possible to solve the issue this way: if your file {} resource retrieves files (ie there is a source parameter), you need the checksum to manage them. In this case, just bite the bullet, change the checksum to mtime, limit recursion or wait for my fix of Puppet bug #1469.

    Simply no reason

    Actually it is in your interest that puppetd is taking 100% of CPU while applying the configuration the puppetmaster has given. That just means it’ll do its job faster than if it was consuming 10% of CPU :-)

    I mean, puppetd has a fixed amount of things to perform, some are CPU bound, some are I/O bound (actually most are I/O bound), so it is perfectly normal that it takes wall clock time and consumes resources to play your manifests.

    What is not normal is consuming CPU or memory between configuration runs. But you already know how to diagnose such issues if you read the start of this post :-)

    Conclusion

    Not all resource consumption is bad.

    We’re all dreaming of a faster puppetd.

    On this subject, I think it should be possible (provided ruby supports native threads, maybe a task for JRuby) to apply the catalog in a multi-threaded way. I never really thought about this (I mean technically), but I don’t see why it couldn’t be possible. That would allow puppetd to do several I/O bound operations in parallel (like installing packages and managing files at the same time).
