Masterzen’s Blog

Journey in a software world…

Puppet Memory Usage - not a fatality

As every reader of this blog certainly know, I’m a big fan of Puppet, using it in production on Days of Wonder servers, up to the point I used to contribute regularly bug fixes and new features (not that I stopped, it’s just that my spare time is a scarce resource nowadays).

Still, I think there are some issues in term of scalability or resource consumption (CPU or memory), for which we can find some workarounds or even fixes. Those issues are not a symptom bad programming or bad design. No, most of the issues come either from ruby itself or some random library issues.

Let’s review the things I have been thinking about lately.

Memory consumption

This is by far one of the most seen issues both on the client side and the server side. I’ve mainly seen this problem on the client side, up to the point that most people recommend running puppetd as cronjobs, instead of being a long lived process.

Ruby allocator

All boils down to the ruby (at least the the MRI 1.8.x version) allocator. This is the part in the ruby interpreter that deals with memory allocations. Like in many dynamic languages, the allocator manages a memory pool that is called a heap. And like some other languages (among them Java), this heap can never shrink and always grows when more memory is needed. This is done this way because it is simpler and way faster. Usually applications ends using their nominal part of memory and no more memory has to be allocated by the kernel to the process, which gives faster applications.

The problem is that if the application needs transiently a high amount of memory that will be trashed a couple of millisecond after, the process will pay this penalty all its life, even though say 80% of the memory used by the process is free but not reclaimed by the OS.

And it’s even worst. The ruby interpreter when it grows the heap, instead of allocating bytes per bytes (which would be really slow) does this by chunk. The whole question is what is the proper size of a chunk?

In the default implementation of MRI 1.8.x, a chunk is the size of the previous heap times 1.8. That means at worst a ruby process might end up allocating 1.8 times more than what it really needs at a given time. (This is a gross simplification, read the code if you want to know more).

Yes but what happens in Puppet?

So how does it apply to puppetd?

It’s easy, puppetd uses memory for two things (beside maintaining some core data to be able to run):

  1. the catalog (which contains all resources, along with all templates) as shipped by the puppetmaster (i.e. serialized) and live as ruby objects.
  2. the content of the sourced files (one at a time, so it’s the biggest transmitted file that imposes it’s high watermark for puppetd). Of course this is still better than in 0.24 where the content was transmitted encoded in XMLRPC adding the penalty of escaping everything…

Hopefully, nobody distributes large files with Puppet :-) If you’re tempted to do so, see below…

But again there’s more, as Peter Meier (known as duritong in the community) discovered a couple of month ago: when puppetd gets its catalog (which by the way is transmitted in json nowadays), it also stores it as a local cache to be able to run if it can’t contact the master for a subsequent run. This operation is done by unserializing the catalog from json to ruby live objects, and then serializing the laters to YAML. Beside the evident loss of time to do that on large catalog, YAML is a real memory hog. Peter’s experience showed that about 200MB of live memory his puppetd process was using came from this final serialization!

So I had the following idea: why not store the serialized version of the catalog (the json one) since we already have it in a serialized form when we receive it from the master (it’s a little bit more complex than that of course). This way no need to serialize it again in YAML. This is what ticket #2892 is all about. Luke is committed to have this enhancement in Rowlf, so there’s good hope!

Some puppet solutions?

So what can we do to help puppet not consume that many memory?

In theory we could play on several factors:

  • Transmit smaller catalogs. For instance get rid of all those templates you love (ok that’s not a solution)
  • Stream the serialization/deserialization with something like Yajl-Ruby
  • Use another ruby interpreter with a better allocator (like for instance JRuby)
  • Use a different constant for resizing the heap (ie replace this 1.8 by 1.0 or less on line 410 of gc.c). This can be done easily when using Rails machine GC patches or Ruby Enterprise Edition, in which case setting the environment variable RUBY_HEAP_SLOTS_GROWTH_FACTOR is enough. Check the documentation for more information.
  • Stream the sourced file on the server and the client (this way only a small buffer is used, and the total size of the file is never allocated). This one is hard.

Note that the same issues apply to the master too (especially for the file serving part). But it’s usually easier to run a different ruby interpreter (like REE) on the master than on all your clients.

Streaming HTTP requests is promising but unfortunately would require large change to how Puppet deals with HTTP. Maybe it can be done only for file content requests… This is something I’ll definitely explore.

This file serving thing let me think about the following which I already discussed several time with Peter…

File serving offloading

One of the mission of the puppetmaster is to serve sourced file to its clients. We saw in the previous section that to do that the master has to read the file in memory. That’s one reason it is recommended to use a dedicated puppetmaster server to act as a pure fileserver.

But there’s a better way, provided you run puppet behind nginx or apache. Those two proxies are also static file servers: why not leverage what they do best to serve the sourced files and thus offload our puppetmaster?

This has some advantages:

  • it frees lots of resources on the puppetmaster, so that they can serve more catalogs by unit time
  • the job will be done faster and by using less resources. Those static servers have been created to spoon-feed our puppet clients…

In fact it was impossible in 0.24.x, but now that file content serving is RESTful it becomes trivial.

Of course offloading would give its best if your clients requires lots of sourced files that change often, or if you provision lots of new hosts at the same time because we’re offloading only content, not file metadata. File content is served only if the client hasn’t the file or the file checksum on the client is different.

An example is better than thousand words

Imagine we have a standard manifest layout with:

  • some globally sourced files under /etc/puppet/files and
  • some modules files under /etc/puppet/modules/<modulename>/files.

Here is what would be the nginx configuration for such scheme:

server {
    listen 8140;

    ssl                     on;
    ssl_session_timeout     5m;
    ssl_certificate         /var/lib/puppet/ssl/certs/master.pem;
    ssl_certificate_key     /var/lib/puppet/ssl/private_keys/master.pem;
    ssl_client_certificate  /var/lib/puppet/ssl/ca/ca_crt.pem;
    ssl_crl                 /var/lib/puppet/ssl/ca/ca_crl.pem;
    ssl_verify_client       optional;

    root                    /etc/puppet;

    # those locations are for the "production" environment
    # update according to your configuration

    # serve static file for the [files] mountpoint
    location /production/file_content/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        alias /etc/puppet/files/;
    }

    # serve modules files sections
    location ~ /production/file_content/[^/]+/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        root /etc/puppet/modules;
        # rewrite /production/file_content/module/files/file.txt
        # to /module/file.text
        rewrite ^/production/file_content/([^/]+)/files/(.+)$  $1/$2 break;
    }

    # ask the puppetmaster for everything else
    location / {
        proxy_pass          http://puppet-production;
        proxy_redirect      off;
        proxy_set_header    Host             $host;
        proxy_set_header    X-Real-IP        $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header    X-Client-Verify  $ssl_client_verify;
        proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
        proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;
        proxy_buffer_size   16k;
        proxy_buffers       8 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;
        proxy_read_timeout  65;
    }
}

EDIT: the above configuration was missing the only content-type that nginx can return for Puppet to be able to actually receive the file content (that is raw).

I leave as an exercise to the reader the apache configuration.

It would also be possible to write some ruby/sh/whatever to generate the nginx configuration from the puppet fileserver.conf file.

And that’s all folks, stay tuned for more Puppet (or even different) content.

At Days of Wonder we are huge fans of MySQL (and since about a year of the various Open Query, Percona, Google or other community patches), up to the point we’re using MySQL for about everything in production.

But since we moved to 5.0, back 3 years ago our production databases which hold our website and online game systems has a unique issue: the mysqld process uses more and more RAM, up to the point where the kernel OOM decide to kill the process.

You’d certainly think we are complete morons because we didn’t do anything in the last 3 years to fix the issue :-)

Unfortunately, I never couldn’t replicate the issue in the lab, mainly because it is difficult to replicate the exact same load the production server sees (mainly because of the online games activity).

During those 3 years, I tried everything I could, from using other allocators, valgrind, debug builds and so on, without any success.

What is nice, is that we moved to an OurDelta build about a year ago, where InnoDB is able to print more memory statistics than the default MySQL version.

For instance it shows

Internal hash tables (constant factor + variable factor)
    Adaptive hash index 1455381240      (118999688 + 1336381552)
    Page hash           7438328
    Dictionary cache    281544240       (89251896 + 192292344)
    File system         254712  (82672 + 172040)
    Lock system         18597112        (18594536 + 2576)
    Recovery system     0       (0 + 0)
    Threads             408056  (406936 + 1120)
    innodb_io_pattern   0       (0 + 0)

Back several month ago, I analyzed this output just to see what figures were growing, and found that the Dictionary Cache variable part was increasing (slowly but definitely).

Sure fine MySQL experts would have been able to tell me exactly what, when and where the problem was, but since I’m not familiar with the code-base, I looked up what this number was and where it was increased (all in dict0dict.c) and added some logs each time it was increased.

I then installed this version for a quite long time (just to check it wouldn’t crash on production) on a slave server. But this server didn’t print anything interesting because it doesn’t see the exact same load the production masters.

A couple of months after that, I moved this code to one of the master and bingo! I found the operation and the tables exhibiting an increase:

mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank_tmp2 193330680 + 8112
mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank 193338792 + 8112

As soon as I saw the operation and table (ie rank), I found what the culprit is. We have a daemon that every 10s computes the player ranks for our online games.

To do this, we’re using the following pattern:

-- compute the ranks
SELECT NULL, playerID
FROM game_score as g
ORDER BY g.rankscore DESC
INTO OUTFILE "/tmp/rank_tmp.tmp"

-- load back the scores
LOAD DATA INFILE "/tmp/rank_tmp.tmp" INTO TABLE rank_tmp

-- swap tables so that clients see new ranks atomatically
RENAME TABLE rank TO rank_tmp2 , rank_tmp TO rank, rank_tmp2 TO rank_tmp

-- truncate the old ranks for a new pass
TRUNCATE TABLE rank_tmp

-- go back to the select above

You might ask why I’m doing a so much convoluted system, especially the SELECT INTO OUTFILE and the LOAD DATA. It’s just because INSERT … SELECT with innodb and binlog enabled can produce transactions abort (which we were getting tons of).

Back to the original issue, apparently the issue lies in the RENAME part of the daemon.

Looking at the dict0dict.c dict_table_rename_in_cache function we see:

ibool
dict_table_rename_in_cache(...)
...
  old_name = mem_heap_strdup(table-&gt;heap, table-&gt;name);
  table-&gt;name = mem_heap_strdup(table-&gt;heap, new_name);
...
}

Looking to mem_heap stuff, I discovered that each table has a heap associated in which InnoDB allocates various things. This heap can only grow (by block of 8112 bytes it seems), since the allocator is not a real one. This is done for performance reasons.

So each time we rename a table, the old name (why? since it is already allocated) is duplicated, along with the new name. Each time.

This heap is freed when the table is dropped, so there is a possibility to reclaim the used memory. That means this issue is not a memory leak per-se.

By the way, I’ve filed this bug on mysql bug system.

One work-around, beside fixing the code itself, would be to drop the rank table instead of truncating it. The issue with dropping/creating InnoDB table on a fast pace is that the dictionary cache itself will grow, because it can only grow as there is no way to purge it from old tables (except running one of the Percona patches). So the more tables we create the more we’ll use memory - back to square 0, but worst.

So right now, I don’t really have any idea on how to really fix the issue. Anyone having an idea, please do not hesitate to comment on this blog post :-)

And please, don’t tell me to move to MyISAM…

Help! Puppetd is eating my server!

This seems to be recurrent this last 3 or 4 days with a few #puppet, redmine or puppet-user requests, asking about why puppetd is consuming so much CPU and/or memory.

While I don’t have a definitive answer about why it could happen (hey all software components have bugs), I think it is important to at least know how to see what happens. I even include some common issues I myself have observed.

Know your puppetd

I mean, know what is puppetd doing. That’s easy, disable puppetd on the host where you have an issue, and try to run it manually in debug mode. I’m really astonished that almost nobody tries a debug run before complaining that something doesn’t work :-)

% puppetd --disable
% puppetd --test --debug --trace
... full output on the console ...

At the same time, monitor the CPU usage and look at the debug entries when most of the CPU is consumed.
If nothing is printed at this same moment, and it still uses CPU, CTRL-C the process, maybe it will print a useful stack trace that will help you (or us) understand what happens.

With this you will certainly catch things you didn’t intend (see below computing checksums when it is not necessary).

Inspect your ruby interpreter

I already mentioned this tip in my puppetmaster memory leak post a month ago. You can’t imagine how much useful information you can get with this tool.

Install as explained in the original article the ruby file into ~/.gdb/ruby, copy the following into your ~/.gdbinit:

define session-ruby
  source ~/.gdb/ruby
end

Here I’m going to show how to do this with a puppetmasterd, but it is exactly the same thing with puppetd.

Basically, the idea is to attach gdb to the puppet process, halt it and look to the current stack trace:

% ps auxgww | grep puppetd
puppet   28602  2.0  8.9 275508 184492 pts/3   Sl+  Feb19  65:25 ruby /usr/bin/puppetmasterd --debug

% gdb /usr/bin/ruby
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
...
(gdb) session-ruby
(gdb) attach 28602
Attaching to program: /usr/bin/ruby, process 28602
...

Now our gdb is attached to our ruby interpreter.
Lets see where we stopped:

(gdb) rb_backtrace
$3 = 34

Note: the output is displayed by default on the stdout/stderr of the attached process, so in our case my puppetmasterd. Going to the terminal where it runs (actually the screen):

...
        from /usr/lib/ruby/1.8/webrick/server.rb:91:in `select'
        from /usr/lib/ruby/1.8/webrick/server.rb:91:in `start'
        from /usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
        from /usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
        from /usr/lib/ruby/1.8/puppet.rb:293:in `start'
        from /usr/lib/ruby/1.8/puppet.rb:144:in `newthread'
        from /usr/lib/ruby/1.8/puppet.rb:143:in `initialize'
        from /usr/lib/ruby/1.8/puppet.rb:143:in `new'
        from /usr/lib/ruby/1.8/puppet.rb:143:in `newthread'
        from /usr/lib/ruby/1.8/puppet.rb:291:in `start'
        from /usr/lib/ruby/1.8/puppet.rb:290:in `each'
        from /usr/lib/ruby/1.8/puppet.rb:290:in `start'
        from /usr/sbin/puppetmasterd:285

It works!
It is now easy to see what puppetd is doing:

  1. introspect your running and eating puppetd
  2. stop it (issue CTRL-C in gdb)
  3. rb_backtrace, copy the backtrace in a file
  4. issue ‘continue’ in gdb to let the process run again
  5. go to 2. several times

Examining the stack traces should give you hints (or us) to what your puppetd is doing at this moment.

Possible causes of puppetd CPU consumption

A potential bug

You might have encountered a bug. Please report it in Puppet redmine, and enclose all the useful information you gathered by following the two points above.

A recursive file resource with checksum on

That’s the usual suspect, and one I encountered myself.

Let’s say you have something like this in your manifest:

File { checksum =&gt; md5 }
...

file {
  &quot;/path/to/so/many/files&quot;:
    owner =&gt; myself, mode =&gt; 0644, recurse =&gt; true
}

What does that mean?
You’re telling puppet that every file resource should compute checksum, and you have a recursive file operation managing owner and mode. What puppetd will do is to traverse the whole ‘/path/to/so/many/files’ and happily manage them changing owner and mode when needed.
What you might have forgotten, is that you requested checksum to be MD5, so puppetd instead of only doing a bunch of stat(3) on your files will also compute MD5 sums of their content.
If you have tons of files in this hierarchy this can take quite some time. Since checksums are cached, it can also take quite some memory.

How to solve this issue:

File { checksum =&gt; md5 }
...

file {
  &quot;/path/to/so/many/files&quot;:
    owner =&gt; myself, mode =&gt; 0644, recurse =&gt; true, checksum =&gt; undef
}

Sometimes, it isn’t possible to solve this issue, if your file {} resource is a retrieve file (ie there is a source parameter), because you need to have checksum to manage the files. In this case, just byte the bullet, change the checksum to mtime, limit recursion or wait for my fix of Puppet bug #1469.

Simply no reason

Actually it is in your interest that puppetd is taking 100% of CPU while applying the configuration the puppetmaster has given. That just means it’ll do its job faster than if it was consuming 10% of CPU :-)

I mean, puppetd has a fixed amount of things to perform, some are CPU bound, some are I/O bound (actually most are I/O bound), so it is perfectly normal that it takes wall clock time and consume resources to play your manifests.

What is not normal is consuming CPU or memory between configuration run. But you already know how to diagnose such issues if you read the start of this post :-)

Conclusion

Not all resource consumption are bad.

We’re all dreaming of a faster puppetd.

And at this subject, I think it should be possible (provided ruby supports native thread (maybe a task for JRuby)) to apply the catalog in a multi-threaded way. I never really thought about this (I mean technically), but I don’t see why it couldn’t be possible. That would allow puppetd to do several I/O bound operations in parallel (like installing packages and managing files at the same time).

Masterzen's Pictures

Golden Gate from Fort Point

Golden Gate from Fort Point

This tree was old...

This tree was old...

Golden Gate from the other Side

Golden Gate from the other Side

masterzen's photo

masterzen's photo

Puppeteers on the Pier

Puppeteers on the Pier

Ohad testing the offroad Segway

Ohad testing the offroad Segway

Muir Woods redwoods

Muir Woods redwoods

Golden Gate from the other Side

Golden Gate from the other Side