Journey in a software world…
10 Jan
I’m really proud to announce the release of the version 1.0 of mysql-snmp.
mysql-snmp is a mix between the excellent MySQL Cacti Templates and a Net-SNMP agent. The idea is that combining the power of the MySQL Cacti Templates and any SNMP based monitoring would unleash a powerful mysql monitoring system. Of course this project favorite monitoring system is OpenNMS.
mysql-snmp is shipped with the necessary OpenNMS configuration files, but any other SNMP monitoring software can be used (provided you configure it).
To get there, you need to run a SNMP agent on each MySQL server, along with mysql-snmp. Then OpenNMS (or any SNMP monitoring software) will contact it and fetch the various values.
Mysql-snmp exposes a lot of useful values including but not limited to:
Here are some graph examples produced with OpenNMS 1.6.5 and mysql-snmp 1.0 on one of Days of Wonder MySQL server (running a MySQL 5.0 Percona build):
mysql-snmp is available in my github repository. The repository contains a spec file to build a RPM and what is needed to build a Debian package. Refer to the README or the mysql-snmp page for more information.
Thanks to gihub, it is possible to download the tarball instead of using Git:
This lists all new features/options from the initial version v0.6:
Please use Github issue system to report any issues.
There is a little issue here. mysql-snmp uses Net-Snmp. Not all versions of Net-Snmp are supported as some older versions have some bug for dealing with Counter64. Version 5.4.2.1 with this patch is known to work fine.
Also note that this project uses some Counter64, so make sure you configure your SNMP monitoring software to use SNMP v2c or v3 (SNMP v1 doesn’t support 64 bits values).
I wish everybody an happy new year. Consider this new version as my Christmas present to the community
15 Oct
At Days of Wonder we are huge fans of MySQL (and since about a year of the various Open Query, Percona, Google or other community patches), up to the point we’re using MySQL for about everything in production.
But since we moved to 5.0, back 3 years ago our production databases which hold our website and online game systems has a unique issue: the mysqld process uses more and more RAM, up to the point where the kernel OOM decide to kill the process.
You’d certainly think we are complete morons because we didn’t do anything in the last 3 years to fix the issue
Unfortunately, I never couldn’t replicate the issue in the lab, mainly because it is difficult to replicate the exact same load the production server sees (mainly because of the online games activity).
During those 3 years, I tried everything I could, from using other allocators, valgrind, debug builds and so on, without any success.
What is nice, is that we moved to an OurDelta build about a year ago, where InnoDB is able to print more memory statistics than the default MySQL version.
For instance it shows
Internal hash tables (constant factor + variable factor) Adaptive hash index 1455381240 (118999688 + 1336381552) Page hash 7438328 Dictionary cache 281544240 (89251896 + 192292344) File system 254712 (82672 + 172040) Lock system 18597112 (18594536 + 2576) Recovery system 0 (0 + 0) Threads 408056 (406936 + 1120) innodb_io_pattern 0 (0 + 0)
Back several month ago, I analyzed this output just to see what figures were growing, and found that the Dictionary Cache variable part was increasing (slowly but definitely).
Sure fine MySQL experts would have been able to tell me exactly what, when and where the problem was, but since I’m not familiar with the code-base, I looked up what this number was and where it was increased (all in dict0dict.c) and added some logs each time it was increased.
I then installed this version for a quite long time (just to check it wouldn’t crash on production) on a slave server. But this server didn’t print anything interesting because it doesn’t see the exact same load the production masters.
A couple of months after that, I moved this code to one of the master and bingo! I found the operation and the tables exhibiting an increase:
mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank_tmp2 193330680 + 8112 mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank 193338792 + 8112
As soon as I saw the operation and table (ie rank), I found what the culprit is. We have a daemon that every 10s computes the player ranks for our online games.
To do this, we’re using the following pattern:
-- compute the ranks SELECT NULL, playerID FROM game_score as g ORDER BY g.rankscore DESC INTO OUTFILE "/tmp/rank_tmp.tmp" -- load back the scores LOAD DATA INFILE "/tmp/rank_tmp.tmp" INTO TABLE rank_tmp -- swap tables so that clients see new ranks atomatically RENAME TABLE rank TO rank_tmp2 , rank_tmp TO rank, rank_tmp2 TO rank_tmp -- truncate the old ranks for a new pass TRUNCATE TABLE rank_tmp -- go back to the select above
You might ask why I’m doing a so much convoluted system, especially the SELECT INTO OUTFILE and the LOAD DATA. It’s just because INSERT … SELECT with innodb and binlog enabled can produce transactions abort (which we were getting tons of).
Back to the original issue, apparently the issue lies in the RENAME part of the daemon.
Looking at the dict0dict.c dict_table_rename_in_cache function we see:
ibool dict_table_rename_in_cache(...) ... old_name = mem_heap_strdup(table->heap, table->name); table->name = mem_heap_strdup(table->heap, new_name); ... }
Looking to mem_heap stuff, I discovered that each table has a heap associated in which InnoDB allocates various things. This heap can only grow (by block of 8112 bytes it seems), since the allocator is not a real one. This is done for performance reasons.
So each time we rename a table, the old name (why? since it is already allocated) is duplicated, along with the new name. Each time.
This heap is freed when the table is dropped, so there is a possibility to reclaim the used memory. That means this issue is not a memory leak per-se.
By the way, I’ve filed this bug on mysql bug system.
One work-around, beside fixing the code itself, would be to drop the rank table instead of truncating it. The issue with dropping/creating InnoDB table on a fast pace is that the dictionary cache itself will grow, because it can only grow as there is no way to purge it from old tables (except running one of the Percona patches). So the more tables we create the more we’ll use memory – back to square 0, but worst.
So right now, I don’t really have any idea on how to really fix the issue. Anyone having an idea, please do not hesitate to comment on this blog post
And please, don’t tell me to move to MyISAM…
13 Apr
Thanks to Days of Wonder the company I work for, I’m proud to release in Free Software (GPL):
At Days of Wonder, we’re using MySQL for almost everything since the beginning of the company. We were initially monitoring all our infrastructure with mon and Cricket, including our MySQL servers. Nine months ago I migrated the monitoring infrastructure to OpenNMS, and at the same we lost the Cricket MySQL monitoring (which was done with direct SQL SHOW STATUS LIKE commands).
I had to find another way, and since OpenNMS excels at SNMP, it was natural to monitor MySQL through SNMP. My browsing crossed this blog post. At about the same time I noticed that Baron Schwartz had released some very good MySQL Cacti Templates, so I decided I should cross both project and started working on mysql-snmp on my free time.
Hopefully, Days of Wonder has an IANA SNMP enterprises sub-number (20267, we use this for monitoring our game servers), so the MIB I wrote for this project is hosted in a natural place in the MIB hierarchy.
It’s a Net-SNMP perl subagent that connects to your MySQL server, and reports various statistics (from show status or show innodb status, or even replication) through SNMP.
If you followed this blog from the very start, you know we’re using OpenNMS to monitor Days of Wonder infrastructure. So I included the various OpenNMS configuration bit to display nice and usable graphs, inspired by the excellent MySQL Cacti Templates.
Here are some examples:
The code is hosted in my github repository, and everything you should know is in the mysql-snmp page on my site.
If you use this software, please do not hesitate to contribute, and/or fix bugs
18 Mar
When I wrote my previous post titled all about storedconfigs, I was pretty confident I explained everything I could about storedconfigs… I was wrong of course
A couple of days ago, I was helping some USG admins who were facing an interesting issue. Interesting for me, but I don’t think they’d share my views on this, as their servers were melting down under the database load.
But first let me explain the issue.
The thing is that when a client checks in to get its configuration, the puppetmaster compiles its configuration to a digestible format and returns it. This operation is the process of transforming the AST built by parsing the manifests to what is called the catalog in Puppet. This is this catalog (which in fact is a graph of resources) which is later played by the client.
When the compilation process is over, and if storedconfigs is enabled on the master, the master connects to the RDBMS, and retrieves all the resources, parameters, tags and facts. Those, if any, are compared to what has just been compiled, and if some resources differs (by value/content, or if there are some missing or new ones), they get written to the database.
Pretty straightforward, isn’t it?
As you can see, this process is synchronous and while the master processes the storedconfigs operations, it doesn’t serve anybody else.
Now, imagine you have a large site (ie hundreds of puppetd clients), and you decide to turn on storedconfigs. All the clients checking in will see their current configuration stored in the database.
Unfortunately the first run of storedconfigs for a client, the database is empty, so the puppetmaster has to send all the information to the RDBMS which in turns as to write it to the disks. Of course on subsequent runs only what is modified needs to reach the RDBMS which is much less than the first time (provided you are running 0.24.8 or applied my patch).
But if your RDBMS is not correctly setup or not sized for so much concurrent write load, the storedconfigs process will take time. During this time this master is pinned to the database and can’t serve clients. So the immediate effect is that new clients checking in will see timeouts, load will rise, and so on.
If you are in the aforementioned scenario you must be sure your RDBMS hardware is properly sized for this peak load, and that your database is properly tuned.
I’ll soon give some generic MySQL tuning advices to let MySQL handle the load, but remember those are generic so YMMV.
What people usually forget is that disk (ie those with rotating plates, not SSDs) have a maximum number of I/O operations per seconds. This value is for professional high-end disks about 250 IOP/s.
Now, to simplify, let’s say your average puppet client has 500 resources with an average of 4 parameters each. That means the master will have to perform at least 500 * 4 + 500 = 2500 writes to the database (that’s naive since there are indices to modify, and transactions can be grouped, etc.. but you see the point).
Add to this the tags, hmm let’s say an average of 4 tags per resources, and we have 500 * 4 + 500 + 500 * 4 = 4500 writes to perform to store the configuration of a given host.
Now remember our 250 IOP/s, how many seconds does the disk need to performs 4500 writes?
The answer is 18s!! Which is a high value. During this time you can’t do anything else. Now add concurrency to the mix, and imagine what that means.
Of course this supposes we have to wait for the disk to have finished (ie synchronous writing), but in fact that’s pretty how RDBMS are working if you really want to trust your data.
So the result is that if you want a fast RDBMS you must be ready to pay for an expensive I/O subsystem.
That’s certainly the most important part of your server.
You need:
If you don’t have this, do not even think turning on storedconfigs for a large site.
Of course other things matters. If the database can fit in RAM (the best if you don’t want to be I/O bound), then you obviously need RAM. Preferably ECC Registered RAM. Use 64 bits hardware with a 64 bits OS.
Then you need some CPU. Nowadays they’re cheap, but beware of InnoDB scaling issues on multi-core/multi-CPU systems (see below).
Here is a checklist on how to tune MySQL for a mostly write load:
For concurrency, stability and durability reasons InnoDB is mandatory. MyISAM is at best usable for READ workload but suffers concurrency issues so it is a no-no for our topic
The default InnoDB settings are tailored to very small 10 years old servers…
Things to look to:
The fine people at Percona or Ourdelta produces some patched builds of MySQL that removes some of the MySQL InnoDB scalability issues. This is more important on high concurrency workload on multi-core/multi-cpu systems.
It can also be good to run MySQL with Google’s perftools TCMalloc. TCMalloc is a memory allocator which scales way better than the Glibc one.
The immediate and most straightforward idea is to limit the number of clients that can check in at the same time. This can be done by disabling puppetd on each client (puppetd –disable), blocking network access, or any other creative mean…
When all the active hosts have checked in, you can then enable the other ones. This can be done hundreds of hosts at a time, until all hosts have a configuration stored.
Another solution is to direct some hosts to a special puppetmaster with storedconfigs on (the regular one still has storedconfigs disabled), by playing with DNS or by configuration, whatever is simplest in your environment. Once those hosts have their config stored, move them back to their regular puppetmaster and move newer hosts there.
Since that’s completely manual, it might be unpractical for you, but that’s the simplest method.
As long as your manifests are only slightly changing, subsequent runs will see only a really limited database activity (if you run a puppetmaster >= 0.24.8). That means the tuning we did earlier can be undone (for instance you can lower the innodb_log_file_size for instance, and adjust the innodb_buffer_pool_size to the size of the hot set).
But still storedconfigs can double your compilation time. If you are already at the limit compared to the number of hosts, you might see some client timeouts.
Today Luke announced on the puppet-dev list that they were working on a queuing system to defer storedconfigs and smooth out the load by spreading it on a longer time. But still, tuning the database is important.
The idea is to offload the storedconfigs to another daemon which is hooked behind a queuing system. After the compilation the puppetmaster queues the catalog, where it will be unqueued by the puppet queue daemon which will in turn execute the storedconfigs process.
I don’t know the ETA for this interesting feature, but meanwhile I hope the tips I provided here can be of any help to anyone
Stay tuned for more puppet stories!
8 Mar
Since a long time people (including me) complained that storeconfigs was a real resource hog. Unfortunately for us, this option is so cool and useful.
Storeconfigs is a puppetmasterd option that stores the nodes actual configuration to a database. It does this by comparing the result of the last compilation against what is actually in the database, resource per resource, then parameter per parameter, and so on.
The actual implementation is based on Rails’ Active Record, which is a great way to abstract the gory details of the database, and prototype code easily and quickly (but has a few shortcomings).
The immediate use of storeconfigs is exported resources. Exported resources are resources which are prefixed by @@. Those resources are marked specially so that they can be collected on several other nodes.
A little completely dumb example speaks by itself:
class exporter {
@@file {
"/var/lib/puppet/nodes/$fqdn": content => "$ipaddress\n", tag => "ip"
}
}
node "export1.daysofwonder.com" {
include exporter
}
node "export2.daysofwonder.com" {
include exporter
}
node "collector.daysofwonder.com" {
File <<| tag == "ip" |>>
}
What does this example do?
That’s simple, all the exporter nodes creates a file in /var/lib/puppet/nodes whose name is the node name and whose content is its primary IP address.
What is interesting is that the node “collector.daysofwonder.com” collects all files tagged by “ip“, that is all the exported files. In the end, after exporter1, exporter2 and collector have run a compilation, the collector host will have the /var/lib/puppet/nodes/exporter1.daysofwonder.com and /var/lib/puppet/nodes/exporter2.daysofwonder.com and their respective content.
Got it?
That’s the perfect tool for instance to automatically:
Still there is another use, since the whole configuration of your nodes is in an RDBMS, you can use that to perform some data-mining about your hosts configuration. That’s what puppetshow does.
The storeconfigs issue its current incarnation (ie 0.24.7) is that it is a slow feature (it usually doubles the compilation time), and imposes an higher load on the puppetmaster and the database engine.
For large installation it might not possible to be able to run with this feature on. There were also some reports of high memory usage or leak with this feature on (see my recommendation about this in my puppetmaster memory leak post).
Here my usual puppet and storeconfigs recommendations:
I think the last point deserves a little bit more explanation:
I had the following schematized pattern in some of my manifests, that I took from David Schmitt excellent modules:
in one class:
if defined(File["/var/lib/puppet/modules/djbdns.d/"]) {
warn("already defined")
} else {
file {
"/var/lib/puppet/modules/djbdns.d/": ...
}
}
and in another class the exact same code:
if defined(File["/var/lib/puppet/modules/djbdns.d/"]) {
warn("already defined")
} else {
file {
"/var/lib/puppet/modules/djbdns.d/": ...
}
}
What happens is that from run to run the evaluation order could change, and the defined resource could be the one in the first class and another time it could be the one in the second class, which meant the storeconfigs code had to remove the resources from the database and re-create them again. Clearly not the best way to have less database workload
I contributed for 0.24.8 a partial rewrite of some parts of the storeconfigs feature to increase its performance.
My analysis is that what was slow in the feature is threefold:
I fixed the first two points by attacking directly the database to fetch the parameters and tags, keeping them in hash instead of objects. This saves a large number of database request and at the same time it prevents a large number of ruby objects to be created (it should even save some memory).
The last point was fixed by imposing a strict order (although not completely correct, but still better that how it was) in the way the tags are assigned to resources.
Both patches have been merged for 0.24.8, and some people reported some performance improvements.
On the Days of Wonder infrastructure I found that with a 562 resources node, on a tuned mysql database:
That’s a nice improvement, isn’t it
Luke and I discussed about this, it was also discussed on the puppet-dev list a few times. I think that a RDBMS might not be the right storage choice for this feature, because clearly there is almost no random keyed access to the individual parameters of a resource (so having a table dedicated to parameters is of almost no use).
I know Luke’s plan is to abstract the storeconfigs feature from the current implementation (certainly through the indirector), so that we can use different storeconfigs engines.
I also know that someone is working on a promising CouchDB implementation. I myself can see a memcached implementation (which I’d really like to start working on). Maybe even the filesystem would be enough.
Of course, I’m open to any other improvements or storage engine ideas
12 Jan
Since a few months we are monitoring our infrastructure at Days of Wonder with OpenNMS. Until this afternoon we were running the beta/final candidate version 1.5.93.
We are monitoring a few things with the JDBC Stored Procedure Poller, which is really great to monitor complex business operations without writing remote or GP scripts.
Unfortunately the migration to OpenNMS 1.6.1 led me to discover that the JDBC Stored Procedure poller was not working anymore, crashing with a NullPointerException in the MySQL JDBC Driver while trying to fetch the output parameter.
In fact it turned out I was plain wrong. I was using a MySQL PROCEDURE:
DELIMITER // CREATE PROCEDURE `check_for_something`() READS SQL DATA BEGIN SELECT ... as valid FROM ... END //
But this OpenNMS poller uses the following JDBC procedure call:
{ ? = call check_for_something() }
After a few struggling, wrestling, and various MySQL JDBC Connector/J driver upgrades, I finally figured out what the driver was doing:
The driver rewrites the call I gave above to something like this:
SELECT check_for_something();
This means that the procedure should in fact be a SQL FUNCTION.
Here is the same procedure rewritten as a FUNCTION:
DELIMITER // CREATE FUNCTION `check_for_something`() RETURNS int(11) READS SQL DATA DETERMINISTIC BEGIN DECLARE valid INTEGER; SELECT ... INTO valid FROM ... RETURN valid; END //
It now works. I’m amazed it even worked in the first time with 1.5.93 (it was for sure).
Recent Comments