Journey in a software world…
8 Mar
For a long time, people (including me) have complained that storeconfigs is a real resource hog. Unfortunately for us, this option is so cool and useful.
Storeconfigs is a puppetmasterd option that stores the nodes' actual configuration in a database. It does this by comparing the result of the last compilation against what is actually in the database, resource by resource, then parameter by parameter, and so on.
The actual implementation is based on Rails’ Active Record, which is a great way to abstract the gory details of the database, and prototype code easily and quickly (but has a few shortcomings).
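The comparison step described above can be sketched in a few lines of Ruby. This is only an illustration under simplifying assumptions, not the real storeconfigs code: resources are modeled as plain hashes keyed by a "Type[title]" string, and `diff_resources` is a hypothetical helper name.

```ruby
# Hypothetical in-memory model: a resource is identified by "Type[title]"
# and maps to a hash of parameter name => value.
def diff_resources(stored, compiled)
  changes = { :add => [], :remove => [], :update => {} }

  # Resources in the database that the new catalog no longer contains.
  (stored.keys - compiled.keys).each { |ref| changes[:remove] << ref }

  compiled.each do |ref, params|
    unless stored.key?(ref)
      changes[:add] << ref
      next
    end
    # Same resource on both sides: compare parameter by parameter.
    old = stored[ref]
    changed = (old.keys | params.keys).select { |name| old[name] != params[name] }
    changes[:update][ref] = changed unless changed.empty?
  end
  changes
end

stored = {
  "File[/etc/motd]" => { "owner" => "root", "mode" => "0644" },
  "File[/tmp/old]"  => { "ensure" => "absent" }
}
compiled = {
  "File[/etc/motd]" => { "owner" => "root", "mode" => "0600" },
  "Package[ntp]"    => { "ensure" => "installed" }
}
changes = diff_resources(stored, compiled)
```

Every resource and every parameter has to be visited on each compilation, which gives a feel for why the feature is costly on large catalogs.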
The immediate use of storeconfigs is exported resources. Exported resources are resources which are prefixed by @@. Those resources are marked specially so that they can be collected on several other nodes.
A small, completely dumb example speaks for itself:
class exporter {
    @@file {
        "/var/lib/puppet/nodes/$fqdn": content => "$ipaddress\n", tag => "ip"
    }
}

node "export1.daysofwonder.com" {
    include exporter
}

node "export2.daysofwonder.com" {
    include exporter
}

node "collector.daysofwonder.com" {
    File <<| tag == "ip" |>>
}
What does this example do?
That’s simple: each exporter node exports a file in /var/lib/puppet/nodes whose name is the node name and whose content is its primary IP address.
What is interesting is that the node “collector.daysofwonder.com” collects all files tagged with “ip”, that is, all the exported files. In the end, after export1, export2 and collector have each run a compilation, the collector host will have /var/lib/puppet/nodes/export1.daysofwonder.com and /var/lib/puppet/nodes/export2.daysofwonder.com with their respective content.
Got it?
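For readers who prefer code to prose, here is a toy Ruby model of the export/collect cycle. It is a sketch of the idea only: `ExportStore` is a hypothetical stand-in for the storeconfigs database, not Puppet's real schema or API, and the IP addresses are made-up documentation addresses.

```ruby
# Toy stand-in for the storeconfigs database (not Puppet's real schema).
class ExportStore
  Exported = Struct.new(:host, :type, :title, :params, :tags)

  def initialize
    @resources = []
  end

  # What compiling a node that declares an @@resource does: record it for others.
  def export(host, type, title, params, tags)
    # Re-exporting replaces the node's previous copy of that resource.
    @resources.reject! { |r| r.host == host && r.type == type && r.title == title }
    @resources << Exported.new(host, type, title, params, tags)
  end

  # Roughly what `File <<| tag == "ip" |>>` asks for on the collecting node.
  def collect(type, tag)
    @resources.select { |r| r.type == type && r.tags.include?(tag) }
  end
end

store = ExportStore.new
{ "export1.daysofwonder.com" => "192.0.2.1",
  "export2.daysofwonder.com" => "192.0.2.2" }.each do |fqdn, ip|
  store.export(fqdn, "file", "/var/lib/puppet/nodes/#{fqdn}",
               { "content" => "#{ip}\n" }, ["ip"])
end

collected = store.collect("file", "ip").map { |r| r.title }.sort
```

The collector only sees what the exporters stored during their own compilations, which is why each exporting node has to run at least once before the collector picks up its file.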
That’s the perfect tool for instance to automatically:
There is still another use: since the whole configuration of your nodes is in an RDBMS, you can use it to perform some data mining on your hosts' configuration. That’s what puppetshow does.
The issue with storeconfigs in its current incarnation (i.e. 0.24.7) is that it is slow (it usually doubles the compilation time) and imposes a higher load on the puppetmaster and the database engine.
For large installations it might not be possible to run with this feature on. There were also some reports of high memory usage or leaks with this feature on (see my recommendation about this in my puppetmaster memory leak post).
Here are my usual puppet and storeconfigs recommendations:
I think the last point deserves a little bit more explanation:
I had the following schematized pattern in some of my manifests, which I took from David Schmitt's excellent modules:
in one class:
if defined(File["/var/lib/puppet/modules/djbdns.d/"]) {
    warning("already defined")
} else {
    file {
        "/var/lib/puppet/modules/djbdns.d/": ...
    }
}
and in another class the exact same code:
if defined(File["/var/lib/puppet/modules/djbdns.d/"]) {
    warning("already defined")
} else {
    file {
        "/var/lib/puppet/modules/djbdns.d/": ...
    }
}
What happens is that the evaluation order could change from run to run: one time the defined resource would be the one in the first class, another time the one in the second class. This meant the storeconfigs code had to remove the resources from the database and re-create them. Clearly not the best way to reduce the database workload.
For 0.24.8, I contributed a partial rewrite of some parts of the storeconfigs feature to increase its performance.
My analysis is that what was slow in the feature is threefold:
I fixed the first two points by querying the database directly to fetch the parameters and tags, keeping them in hashes instead of objects. This saves a large number of database requests and at the same time prevents a large number of ruby objects from being created (it should even save some memory).
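The "hashes instead of objects" idea can be illustrated as follows. This is a minimal sketch and not the merged patch: the row layout `[resource_id, parameter_name, value]` is an assumption about what a single bulk query might return, and `index_params` is a hypothetical helper.

```ruby
# Rows as a single bulk query might return them:
# [resource_id, parameter_name, value]. The column layout is an assumption.
def index_params(rows)
  by_resource = Hash.new { |hash, id| hash[id] = {} }
  rows.each do |resource_id, name, value|
    # Multi-valued parameters accumulate into an array.
    (by_resource[resource_id][name] ||= []) << value
  end
  by_resource
end

rows = [
  [1, "owner", "root"],
  [1, "mode", "0644"],
  [2, "require", "File[/a]"],
  [2, "require", "File[/b]"]
]
params = index_params(rows)
```

One query and one pass over the rows replace a lookup per resource, and the result is made of plain hashes, arrays and strings rather than one model object per parameter.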
The last point was fixed by imposing a strict order (not completely correct, but still better than how it was) on the way the tags are assigned to resources.
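To see why ordering matters, consider this small sketch. It is a simplification for illustration, not the actual fix: tags are plain strings in arrays, and both method names are hypothetical.

```ruby
# Naive comparison: position matters, so a reordering looks like a change.
def tags_changed_naive?(stored, compiled)
  stored != compiled
end

# Order-insensitive comparison: sort both sides before comparing.
def tags_changed?(stored, compiled)
  stored.sort != compiled.sort
end

stored   = ["ip", "exporter", "file"]
compiled = ["file", "ip", "exporter"]   # same tags, different evaluation order
```

With the naive version, every reordering would trigger database writes even though nothing really changed; with a stable order the two runs compare as equal.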
Both patches have been merged for 0.24.8, and some people reported some performance improvements.
On the Days of Wonder infrastructure I found that with a 562 resources node, on a tuned mysql database:
That’s a nice improvement, isn’t it?
Luke and I discussed this, and it was also discussed on the puppet-dev list a few times. I think that an RDBMS might not be the right storage choice for this feature, because clearly there is almost no random keyed access to the individual parameters of a resource (so having a table dedicated to parameters is of almost no use).
I know Luke’s plan is to abstract the storeconfigs feature from the current implementation (certainly through the indirector), so that we can use different storeconfigs engines.
I also know that someone is working on a promising CouchDB implementation. I myself can see a memcached implementation (which I’d really like to start working on). Maybe even the filesystem would be enough.
Of course, I’m open to any other improvements or storage engine ideas.
21 Feb
This seems to have been a recurring topic over the last 3 or 4 days, with a few requests on #puppet, redmine or puppet-user asking why puppetd is consuming so much CPU and/or memory.
While I don’t have a definitive answer about why it could happen (hey all software components have bugs), I think it is important to at least know how to see what happens. I even include some common issues I myself have observed.
I mean, know what puppetd is doing. That’s easy: disable puppetd on the host where you have an issue, and try to run it manually in debug mode. I’m really astonished that almost nobody tries a debug run before complaining that something doesn’t work.
% puppetd --disable
% puppetd --test --debug --trace
... full output on the console ...
At the same time, monitor the CPU usage and look at the debug entries when most of the CPU is consumed.
If nothing is printed at this same moment, and it still uses CPU, CTRL-C the process, maybe it will print a useful stack trace that will help you (or us) understand what happens.
With this you will certainly catch things you didn’t intend (see below computing checksums when it is not necessary).
I already mentioned this tip in my puppetmaster memory leak post a month ago. You can’t imagine how much useful information you can get with this tool.
As explained in the original article, install the ruby file into ~/.gdb/ruby, then copy the following into your ~/.gdbinit:
define session-ruby
source ~/.gdb/ruby
end
Here I’m going to show how to do this with a puppetmasterd, but it is exactly the same thing with puppetd.
Basically, the idea is to attach gdb to the puppet process, halt it and look at the current stack trace:
% ps auxgww | grep puppetd
puppet 28602 2.0 8.9 275508 184492 pts/3 Sl+ Feb19 65:25 ruby /usr/bin/puppetmasterd --debug
% gdb /usr/bin/ruby
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
...
(gdb) session-ruby
(gdb) attach 28602
Attaching to program: /usr/bin/ruby, process 28602
...
Now our gdb is attached to our ruby interpreter.
Let’s see where we stopped:
(gdb) rb_backtrace
$3 = 34
Note: the output is displayed by default on the stdout/stderr of the attached process, so in our case my puppetmasterd. Going to the terminal where it runs (actually the screen):
...
from /usr/lib/ruby/1.8/webrick/server.rb:91:in `select'
from /usr/lib/ruby/1.8/webrick/server.rb:91:in `start'
from /usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
from /usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
from /usr/lib/ruby/1.8/puppet.rb:293:in `start'
from /usr/lib/ruby/1.8/puppet.rb:144:in `newthread'
from /usr/lib/ruby/1.8/puppet.rb:143:in `initialize'
from /usr/lib/ruby/1.8/puppet.rb:143:in `new'
from /usr/lib/ruby/1.8/puppet.rb:143:in `newthread'
from /usr/lib/ruby/1.8/puppet.rb:291:in `start'
from /usr/lib/ruby/1.8/puppet.rb:290:in `each'
from /usr/lib/ruby/1.8/puppet.rb:290:in `start'
from /usr/sbin/puppetmasterd:285
It works!
It is now easy to see what puppetd is doing:
Examining the stack traces should give you (or us) hints about what your puppetd is doing at this moment.
You might have encountered a bug. Please report it in Puppet redmine, and enclose all the useful information you gathered by following the two points above.
That’s the usual suspect, and one I encountered myself.
Let’s say you have something like this in your manifest:
File { checksum => md5 }
...
file {
    "/path/to/so/many/files":
        owner => myself, mode => 0644, recurse => true
}
What does that mean?
You’re telling puppet that every file resource should compute a checksum, and you have a recursive file operation managing owner and mode. What puppetd will do is traverse the whole ‘/path/to/so/many/files’ hierarchy and happily manage the files, changing owner and mode when needed.
What you might have forgotten is that you requested the checksum to be MD5, so instead of only doing a bunch of stat(3) calls on your files, puppetd will also compute the MD5 sums of their content.
If you have tons of files in this hierarchy this can take quite some time. Since checksums are cached, it can also take quite some memory.
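The difference in work can be made visible with a short Ruby sketch. It is an illustration only and does not reproduce puppetd's file handling: `scan` is a hypothetical helper that walks a tree, always stats each file, and reads the content only when an MD5 checksum is requested.

```ruby
require "digest/md5"
require "find"
require "tmpdir"

# Walk a tree and gather what a recursive file resource would need.
# Returns the per-file information and the number of content bytes read.
def scan(root, checksum)
  bytes_read = 0
  entries = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    stat = File.stat(path)            # metadata only, no content is read
    info = { :mode => stat.mode & 0777, :uid => stat.uid }
    if checksum == :md5
      data = File.binread(path)       # the whole content has to be read
      bytes_read += data.bytesize
      info[:md5] = Digest::MD5.hexdigest(data)
    end
    entries[path] = info
  end
  [entries, bytes_read]
end

read_without = read_with = nil
Dir.mktmpdir do |dir|
  3.times { |i| File.write(File.join(dir, "f#{i}"), "x" * 1000) }
  _, read_without = scan(dir, nil)
  _, read_with    = scan(dir, :md5)
end
puts "bytes read without checksum: #{read_without}, with md5: #{read_with}"
```

Managing owner and mode needs only the stat call; the checksum forces a full read of every file, and the resulting digests are extra data to keep around.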
How to solve this issue:
File { checksum => md5 }
...
file {
    "/path/to/so/many/files":
        owner => myself, mode => 0644, recurse => true, checksum => undef
}
Sometimes it isn’t possible to solve this issue: if your file {} resource retrieves its content from a source (i.e. there is a source parameter), you need a checksum to manage the files. In this case, just bite the bullet, change the checksum to mtime, limit recursion or wait for my fix of Puppet bug #1469.
Actually it is in your interest that puppetd takes 100% of CPU while applying the configuration the puppetmaster has given it. That just means it’ll do its job faster than if it were consuming 10% of CPU.
I mean, puppetd has a fixed amount of things to perform, some CPU bound, some I/O bound (actually most are I/O bound), so it is perfectly normal that it takes wall clock time and consumes resources to play your manifests.
What is not normal is consuming CPU or memory between configuration runs. But you already know how to diagnose such issues if you read the start of this post.
Not all resource consumption is bad.
We’re all dreaming of a faster puppetd.
On this subject, I think it should be possible (provided ruby supports native threads (maybe a task for JRuby)) to apply the catalog in a multi-threaded way. I never really thought about this (I mean technically), but I don’t see why it couldn’t be possible. That would allow puppetd to do several I/O bound operations in parallel (like installing packages and managing files at the same time).
5 Feb
Yesterday we had the February Puppet Dev Call. Unfortunately the audio was poor, with lots of Skype disconnections, which made the call difficult to follow for a non-native English speaker like me (what is strange is that the one I could hear the best was Luke).
But that was an important meeting, as we know how the development process will continue from now on. It was agreed (because it makes real sense) to have the master as current stable and fork a ‘next’ branch for on-going development of the next version.
The idea is that newcomers will just have to git clone the repository to produce a bug fix or stable feature, without having to wonder (or read the development process wiki page) where/how to get the code.
It was also decided that 0.25 was really imminent with a planned release date later this month. Arghhh, this doesn’t leave me lots of time to finish the Application Controller stuff I’m currently working on. The issue is that I procrastinated a little bit with the storeconfigs speed-up patch (which I hope will be merged for 0.25), and a few important 0.24.x bug fixes.
There was also a discussion about what should be part of the Puppet core and what shouldn’t (like the recent zenoss patch). Digression: I’m considering doing an OpenNMS type/provider like the Zenoss or Nagios one. Back to the real topic. It was proposed to have a repository of non-core features, but this essentially only creates more trouble, including but not limited to:
Someone suggested (sorry can’t remember who) that we need a packaging system to fill this hole, but I don’t think it is satisfactory. I understand the issue, but have no immediate answer to this question (that’s why I didn’t comment on this topic during the call).
Second digression: if you read this and want to contribute to Puppet (because it’s wonderful software, with a great developer team and a nicely done codebase), I can’t stress enough that you should read the following wiki pages:
Also come by #puppet and/or the puppet-dev google group, we’re ready to help!
19 Jan
From time to time we get some complaints about so-called Puppet memory leaks either on #puppet, on the puppet-user list or in the Puppet redmine.
I tried hard to reproduce the issue on the Days of Wonder servers (mostly up-to-date debian), but never could. Starting from there I tried to gather from the various people I talked to on various channels what could be the cause, if they solved it and how.
You can also be sure there are no memory leaks in the Puppet source code. All of the identified memory leaks are either not memory leaks per se or are caused by code outside of Puppet's control (ruby itself or a library).
It is known that there are some ruby versions (around 1.8.5 and 1.8.6) exhibiting some leaks of some sort. This is especially true for RHEL 4 and 5 versions (and some Fedora ones too), as I found with the help of one Puppet user, or as others found.
Upgrading Ruby to 1.8.7-pl72 either from source or any repositories is usually enough to fix it.
I also encountered some people who told me that storeconfigs with MySQL, but without the real ruby-mysql gem, leads to an increasing memory footprint for their puppetmaster.
It also seems to be common advice to use Rails 2.1 if you use storeconfigs. I don’t know if Puppet uses this, but it seems that nested includes leak in rails 2.0.
The previous items I outlined above are real leaks. Some people (including myself) encountered a different issue: the puppetmaster is consuming lots of memory while doing file transfer to the clients.
In fact, up to Puppet 0.25 (not yet released at this time), Puppet is using XMLRPC as its communication protocol. Unfortunately this is not a transfer protocol, it is a Remote Procedure Call protocol. It means that to transfer binary files, Puppet has to load the whole file in memory, and then it escapes its content (same escaping as URL, which means every byte outside of 32-127 will take 3 bytes). Usually that means the master has to allocate roughly 2.5 times the size of the current transferred file. Puppet 0.25 will use REST (so native HTTP) to transfer files, which will bring speed and streaming to file serving.
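The size blow-up can be worked out with a few lines of Ruby. This is only arithmetic on the rule stated above (bytes in 32..127 stay as one byte, every other byte becomes three); it does not reproduce the real escaping code, and `escaped_size` is a hypothetical helper.

```ruby
# Size of a payload after escaping, using the rule described above:
# bytes in 32..127 stay as one byte, every other byte becomes three.
def escaped_size(data)
  data.each_byte.inject(0) { |sum, b| sum + ((32..127).include?(b) ? 1 : 3) }
end

text   = "plain ASCII text"
binary = (0..255).to_a.pack("C*")   # every possible byte value exactly once
ratio  = escaped_size(binary).to_f / binary.bytesize
```

For a payload in which all byte values are equally likely, 96 of 256 values stay at one byte and 160 grow to three, so the escaped copy is (96 + 480) / 256 = 2.25 times the raw size. That is the same order of magnitude as the figure quoted above; the exact factor depends on the file content and on the real escaping rules.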
With luck, if the Garbage Collector has a chance to trigger (because your ruby interpreter is not too loaded), it will de-allocate all the memory used for files. If you are not so lucky, the ruby interpreter doesn’t have time to run a full garbage collection cycle, and the memory usage grows.
Some people running high-load puppetmaster have separated their file serving puppetmaster from their config serving puppetmaster to alleviate this issue.
Also, if like me you are using file recursive copy, you might encounter Bug #1469 File recursion with a remote source should not recurse locally.
Here is how you can find leaks in a ruby application:
I tried the three aforementioned techniques, and found that the GDB trick is the easiest one to use and set up.
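Independently of the techniques mentioned above, a very crude first check can be done from inside Ruby itself by counting live objects per class before and after a suspicious operation. The sketch below uses the standard `ObjectSpace.each_object` and `GC.start` calls; the `Session` class and the `cache` array are hypothetical stand-ins for whatever you suspect of leaking.

```ruby
# Count live instances of the given classes after forcing a GC run.
def live_counts(classes)
  GC.start
  counts = {}
  classes.each { |klass| counts[klass] = ObjectSpace.each_object(klass).count }
  counts
end

# Classes whose live instance count changed between two snapshots.
def growth(before, after)
  delta = {}
  after.each { |klass, n| delta[klass] = n - before.fetch(klass, 0) }
  delta.reject { |_, d| d.zero? }
end

class Session; end     # hypothetical class suspected of leaking
cache = []             # hypothetical cache that is never emptied

before = live_counts([Session])
100.times { cache << Session.new }
after = live_counts([Session])
leaked = growth(before, after)
```

A count that keeps growing from snapshot to snapshot points at objects that are still referenced somewhere, which is a retention problem in the application rather than a leak in the interpreter.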
There’s also something that I think hasn’t been tried yet: running Puppet under a different Ruby interpreter (we’d say Virtual Machine in this case). For instance JRuby is running on top of the Java Virtual Machine which has more than 14 years of Garbage Collection development behind it.
You can also be sure that a different Ruby interpreter won’t have the same bugs or memory leaks as the regular one (the so-called Matz Ruby interpreter, from the name of its author).
There are some nice Ruby VMs under development right now, and I’m sure I’ll blog about using Puppet on some of them soon.