Journey in a software world…
10 Jan
I’m really proud to announce the release of version 1.0 of mysql-snmp.
mysql-snmp is a mix between the excellent MySQL Cacti Templates and a Net-SNMP agent. The idea is that combining the power of the MySQL Cacti Templates with any SNMP-based monitoring software would unleash a powerful MySQL monitoring system. Of course, this project’s favorite monitoring system is OpenNMS.
mysql-snmp is shipped with the necessary OpenNMS configuration files, but any other SNMP monitoring software can be used (provided you configure it).
To get there, you need to run an SNMP agent on each MySQL server, along with mysql-snmp. Then OpenNMS (or any SNMP monitoring software) will contact it and fetch the various values.
Mysql-snmp exposes a lot of useful values including but not limited to:
Here are some graph examples produced with OpenNMS 1.6.5 and mysql-snmp 1.0 on one of the Days of Wonder MySQL servers (running a MySQL 5.0 Percona build):
mysql-snmp is available in my GitHub repository. The repository contains a spec file to build an RPM and what is needed to build a Debian package. Refer to the README or the mysql-snmp page for more information.
Thanks to GitHub, it is possible to download the tarball instead of using Git:
This lists all new features/options since the initial version v0.6:
Please use the GitHub issue system to report any issues.
There is a little issue here. mysql-snmp uses Net-SNMP. Not all versions of Net-SNMP are supported, as some older versions have bugs in dealing with Counter64. Version 5.4.2.1 with this patch is known to work fine.
Also note that this project uses some Counter64 values, so make sure you configure your SNMP monitoring software to use SNMP v2c or v3 (SNMP v1 doesn’t support 64-bit values).
I wish everybody a happy new year. Consider this new version as my Christmas present to the community.
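To give a feel for why 64-bit counters matter, here is a minimal sketch of the arithmetic a monitoring system performs on wrapping counters. The function names and the sample rate are assumptions made up for this illustration; they are not part of mysql-snmp or OpenNMS.

```python
def counter_delta(previous: int, current: int, bits: int) -> int:
    """Increase between two samples of a counter that wraps at 2**bits.

    Only correct if the counter wrapped at most once between the two polls.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


def seconds_until_wrap(rate_per_second: float, bits: int) -> float:
    """How long a counter growing at a steady rate takes to wrap around."""
    return (1 << bits) / rate_per_second


if __name__ == "__main__":
    # A 32-bit counter that wrapped once between two polls.
    print(counter_delta(4_294_967_000, 500, bits=32))  # 796

    # Hypothetical rate: a byte counter growing by 100 MB every second.
    print(seconds_until_wrap(100_000_000, bits=32))  # a bit under 43 seconds
    print(seconds_until_wrap(100_000_000, bits=64))  # thousands of years
```

With a polling interval of a few minutes, a fast 32-bit counter can wrap several times between two polls and the delta becomes meaningless, which a Counter64 avoids in practice.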
13 Apr
Thanks to Days of Wonder, the company I work for, I’m proud to release as Free Software (GPL):
At Days of Wonder, we have been using MySQL for almost everything since the beginning of the company. We were initially monitoring all our infrastructure with mon and Cricket, including our MySQL servers. Nine months ago I migrated the monitoring infrastructure to OpenNMS, and at the same time we lost the Cricket MySQL monitoring (which was done with direct SQL SHOW STATUS LIKE commands).
I had to find another way, and since OpenNMS excels at SNMP, it was natural to monitor MySQL through SNMP. My browsing led me to this blog post. At about the same time I noticed that Baron Schwartz had released some very good MySQL Cacti Templates, so I decided to combine both projects and started working on mysql-snmp in my free time.
Fortunately, Days of Wonder has an IANA SNMP enterprises sub-number (20267, we use this for monitoring our game servers), so the MIB I wrote for this project is hosted in a natural place in the MIB hierarchy.
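As an illustration of where that enterprise number sits, here is a small sketch that builds the numeric OID and the matching Net-SNMP `snmpwalk` command line. The standard enterprises prefix is .1.3.6.1.4.1; the host name and community string are placeholders, and the layout of the sub-tree below 20267 is not described in this post, so the sketch only walks the enterprise root.

```python
# Standard prefix: iso.org.dod.internet.private.enterprises
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)
DAYS_OF_WONDER_ENTERPRISE = 20267  # IANA sub-number quoted in this post


def enterprise_oid(enterprise_number: int, *suffix: int) -> str:
    """Build a dotted numeric OID under the enterprises branch."""
    parts = ENTERPRISES_PREFIX + (enterprise_number,) + suffix
    return "." + ".".join(str(part) for part in parts)


def snmpwalk_command(host: str, community: str, oid: str) -> list:
    """Argument list for Net-SNMP's snmpwalk, forcing SNMP v2c."""
    return ["snmpwalk", "-v", "2c", "-c", community, host, oid]


if __name__ == "__main__":
    oid = enterprise_oid(DAYS_OF_WONDER_ENTERPRISE)
    # "db1.example.com" and "public" are hypothetical values.
    print(" ".join(snmpwalk_command("db1.example.com", "public", oid)))
```

Running the printed command against a host where the agent is set up is a quick way to check that the values are exposed before configuring the monitoring software.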
It’s a Net-SNMP perl subagent that connects to your MySQL server, and reports various statistics (from show status or show innodb status, or even replication) through SNMP.
If you followed this blog from the very start, you know we’re using OpenNMS to monitor the Days of Wonder infrastructure. So I included the various OpenNMS configuration bits to display nice and usable graphs, inspired by the excellent MySQL Cacti Templates.
Here are some examples:
The code is hosted in my github repository, and everything you should know is in the mysql-snmp page on my site.
If you use this software, please do not hesitate to contribute and/or fix bugs.
21 Feb
This seems to have been a recurring topic these last 3 or 4 days, with a few #puppet, redmine or puppet-user requests asking why puppetd is consuming so much CPU and/or memory.
While I don’t have a definitive answer about why it could happen (hey all software components have bugs), I think it is important to at least know how to see what happens. I even include some common issues I myself have observed.
I mean, know what puppetd is doing. That’s easy: disable puppetd on the host where you have an issue, and try to run it manually in debug mode. I’m really astonished that almost nobody tries a debug run before complaining that something doesn’t work.
% puppetd --disable
% puppetd --test --debug --trace
... full output on the console ...
At the same time, monitor the CPU usage and look at the debug entries when most of the CPU is consumed.
If nothing is printed at that moment and it still uses CPU, press CTRL-C to interrupt the process; it may print a useful stack trace that will help you (or us) understand what happens.
With this you will certainly catch things you didn’t intend (see below: computing checksums when it is not necessary).
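If you want numbers rather than eyeballing top while the debug run is going, here is a rough sketch, assuming a Linux host, that turns two readings of /proc/<pid>/stat into a CPU percentage. The sample line is made up for the example, and the default of 100 clock ticks per second is an assumption (on a live system `os.sysconf("SC_CLK_TCK")` gives the real value).

```python
def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may contain spaces,
    so the line is split after the last ')'. The remaining fields start
    at field 3 (state), which puts utime (14) and stime (15) at 11 and 12.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_seconds: float, ticks_per_second: int = 100) -> float:
    """CPU usage of one process over the interval, as a percentage of one core."""
    used = ticks_after - ticks_before
    return 100.0 * used / (ticks_per_second * interval_seconds)


if __name__ == "__main__":
    # Made-up sample, truncated after the fields we need.
    sample = "28602 (ruby) S 1 28602 28602 0 -1 4202496 1500 0 0 0 6000 540 0 0 20 0"
    before = cpu_ticks(sample)          # 6540 ticks
    after = before + 250                # pretend 250 more ticks were used
    print(cpu_percent(before, after, interval_seconds=5))  # 50.0
```

Sampling every few seconds and noting the timestamps of the busy intervals makes it easier to match them with the debug entries printed by puppetd.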
I already mentioned this tip in my puppetmaster memory leak post a month ago. You can’t imagine how much useful information you can get with this tool.
As explained in the original article, install the ruby file as ~/.gdb/ruby, then copy the following into your ~/.gdbinit:
define session-ruby
source ~/.gdb/ruby
end
Here I’m going to show how to do this with a puppetmasterd, but it is exactly the same thing with puppetd.
Basically, the idea is to attach gdb to the puppet process, halt it and look at the current stack trace:
% ps auxgww | grep puppetmasterd
puppet 28602 2.0 8.9 275508 184492 pts/3 Sl+ Feb19 65:25 ruby /usr/bin/puppetmasterd --debug
% gdb /usr/bin/ruby
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
...
(gdb) session-ruby
(gdb) attach 28602
Attaching to program: /usr/bin/ruby, process 28602
...
Now our gdb is attached to our ruby interpreter.
Let’s see where we stopped:
(gdb) rb_backtrace
$3 = 34
Note: the output is displayed by default on the stdout/stderr of the attached process, so in our case my puppetmasterd. Going to the terminal where it runs (actually the screen):
...
from /usr/lib/ruby/1.8/webrick/server.rb:91:in `select'
from /usr/lib/ruby/1.8/webrick/server.rb:91:in `start'
from /usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
from /usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
from /usr/lib/ruby/1.8/puppet.rb:293:in `start'
from /usr/lib/ruby/1.8/puppet.rb:144:in `newthread'
from /usr/lib/ruby/1.8/puppet.rb:143:in `initialize'
from /usr/lib/ruby/1.8/puppet.rb:143:in `new'
from /usr/lib/ruby/1.8/puppet.rb:143:in `newthread'
from /usr/lib/ruby/1.8/puppet.rb:291:in `start'
from /usr/lib/ruby/1.8/puppet.rb:290:in `each'
from /usr/lib/ruby/1.8/puppet.rb:290:in `start'
from /usr/sbin/puppetmasterd:285
It works!
It is now easy to see what puppetd is doing:
Examining the stack traces should give you (or us) hints about what your puppetd is doing at this moment.
You might have encountered a bug. Please report it in the Puppet redmine, and enclose all the useful information you gathered by following the two points above.
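When you collect several backtraces, a few lines of script help to see which files keep showing up. The following is only a sketch that assumes the Ruby 1.8 frame format shown above (from FILE:LINE:in `method'); the helper names are made up for this example.

```python
import re
from collections import Counter

# Matches lines such as:  from /path/file.rb:91:in `select'
# The ":in `method'" part is optional (top-level frames do not have it).
FRAME_RE = re.compile(
    r"^\s*from (?P<path>.+?):(?P<line>\d+)(?::in `(?P<method>[^']+)')?\s*$"
)


def parse_frames(text):
    """Return (path, line, method) tuples for every frame line found in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(
                (match.group("path"), int(match.group("line")), match.group("method"))
            )
    return frames


def frames_per_file(frames):
    """Count how many frames belong to each source file."""
    return Counter(path for path, _line, _method in frames)


if __name__ == "__main__":
    sample = """\
from /usr/lib/ruby/1.8/webrick/server.rb:91:in `select'
from /usr/lib/ruby/1.8/webrick/server.rb:91:in `start'
from /usr/lib/ruby/1.8/puppet.rb:293:in `start'
from /usr/sbin/puppetmasterd:285
"""
    for path, count in frames_per_file(parse_frames(sample)).most_common():
        print(count, path)
```

A process that is merely idle tends to show the same `select' frame at the top each time, while a busy one shows frames that change between samples.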
That’s the usual suspect, and one I encountered myself.
Let’s say you have something like this in your manifest:
File { checksum => md5 }
...
file {
    "/path/to/so/many/files":
        owner => myself, mode => 0644, recurse => true
}
What does that mean?
You’re telling puppet that every file resource should compute a checksum, and you have a recursive file operation managing owner and mode. What puppetd will do is traverse the whole ‘/path/to/so/many/files’ hierarchy and happily manage the files, changing owner and mode when needed.
What you might have forgotten is that you requested the checksum to be MD5, so puppetd, instead of only doing a bunch of stat(3) calls on your files, will also compute MD5 sums of their content.
If you have tons of files in this hierarchy this can take quite some time. Since checksums are cached, it can also take quite some memory.
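To get a feel for the difference, here is a minimal sketch (plain Python, not Puppet code) comparing a metadata-only walk with a walk that also hashes every file. The temporary tree, file count and file sizes are arbitrary values picked for the demonstration.

```python
import hashlib
import os
import tempfile
import time


def stat_tree(root):
    """Visit every file and read only its metadata (owner, mode, mtime...)."""
    count = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            os.stat(os.path.join(dirpath, name))
            count += 1
    return count


def md5_tree(root):
    """Visit every file, read its whole content and keep the MD5 sums."""
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            sums[path] = digest.hexdigest()  # cached sums cost memory too
    return sums


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for i in range(200):  # arbitrary: 200 files of 256 KiB each
            with open(os.path.join(root, "file%d" % i), "wb") as handle:
                handle.write(os.urandom(256 * 1024))

        start = time.perf_counter()
        stat_tree(root)
        print("stat only: %.4fs" % (time.perf_counter() - start))

        start = time.perf_counter()
        md5_tree(root)
        print("stat + md5: %.4fs" % (time.perf_counter() - start))
```

The exact timings depend on the machine and on the page cache, but the second walk has to read every byte of every file, and its result grows with the number of files.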
How to solve this issue:
File { checksum => md5 }
...
file {
    "/path/to/so/many/files":
        owner => myself, mode => 0644, recurse => true, checksum => undef
}
Sometimes it isn’t possible to solve this issue: if your file {} resource retrieves files (i.e. there is a source parameter), you need a checksum to manage them. In this case, just bite the bullet, change the checksum to mtime, limit recursion or wait for my fix of Puppet bug #1469.
Actually it is in your interest that puppetd takes 100% of CPU while applying the configuration the puppetmaster has given. That just means it’ll do its job faster than if it was consuming 10% of CPU.
I mean, puppetd has a fixed amount of things to perform, some are CPU bound, some are I/O bound (actually most are I/O bound), so it is perfectly normal that it takes wall clock time and consumes resources to play your manifests.
What is not normal is consuming CPU or memory between configuration runs. But you already know how to diagnose such issues if you read the start of this post.
Not all resource consumption is bad.
We’re all dreaming of a faster puppetd.
On this subject, I think it should be possible (provided ruby supports native threads (maybe a task for JRuby)) to apply the catalog in a multi-threaded way. I never really thought about this (I mean technically), but I don’t see why it couldn’t be possible. That would allow puppetd to do several I/O bound operations in parallel (like installing packages and managing files at the same time).
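To make the idea a bit more concrete, here is a purely hypothetical sketch, written in Python rather than Ruby, of applying resources in parallel while still honouring their dependencies. It is not how puppetd works; the resource names and the `apply` callback are invented for the illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def apply_in_parallel(dependencies, apply, max_workers=4):
    """Apply every resource, running independent ones concurrently.

    dependencies maps a resource to the set of resources that must be
    applied before it. Returns the resources in completion order.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Start everything whose dependencies are already satisfied.
            for resource in sorter.get_ready():
                running[pool.submit(apply, resource)] = resource
            done, _pending = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                resource = running.pop(future)
                future.result()        # re-raise a failure from the worker
                completed.append(resource)
                sorter.done(resource)  # may unblock dependent resources
    return completed


if __name__ == "__main__":
    # Hypothetical catalog: the service needs its package and its config,
    # while the unrelated file can be managed at any time.
    catalog = {
        "package": set(),
        "config": {"package"},
        "service": {"package", "config"},
        "unrelated-file": set(),
    }
    print(apply_in_parallel(catalog, lambda name: name.upper()))
```

Threads are enough here because the work is assumed to be I/O bound; the ordering constraints are what keep the result equivalent to a sequential run.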
12 Jan
For a few months now we have been monitoring our infrastructure at Days of Wonder with OpenNMS. Until this afternoon we were running the beta/final candidate version 1.5.93.
We are monitoring a few things with the JDBC Stored Procedure Poller, which is really great for monitoring complex business operations without writing remote or GP scripts.
Unfortunately the migration to OpenNMS 1.6.1 led me to discover that the JDBC Stored Procedure poller was not working anymore, crashing with a NullPointerException in the MySQL JDBC Driver while trying to fetch the output parameter.
In fact it turned out I was plain wrong. I was using a MySQL PROCEDURE:
DELIMITER //
CREATE PROCEDURE `check_for_something`()
READS SQL DATA
BEGIN
    SELECT ... as valid FROM ...
END
//
But this OpenNMS poller uses the following JDBC procedure call:
{ ? = call check_for_something() }
After some struggling, wrestling, and various MySQL Connector/J driver upgrades, I finally figured out what the driver was doing:
The driver rewrites the call I gave above to something like this:
SELECT check_for_something();
This means that the procedure should in fact be a SQL FUNCTION.
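As a toy illustration of that rewrite (this is not Connector/J code, just a sketch of the transformation described above), the JDBC escape syntax with a return placeholder can be mapped to a plain SELECT like this. The regular expression and the function name are assumptions made for the example.

```python
import re

# JDBC escape syntax for a call that returns a value: { ? = call name(args) }
CALL_WITH_RETURN = re.compile(
    r"^\s*\{\s*\?\s*=\s*call\s+(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"\s*\((?P<args>[^)]*)\)\s*\}\s*$",
    re.IGNORECASE,
)


def rewrite_function_call(escape):
    """Turn '{ ? = call f(...) }' into 'SELECT f(...);', or return None.

    A call without the '? =' placeholder is a plain procedure call and is
    deliberately not matched.
    """
    match = CALL_WITH_RETURN.match(escape)
    if match is None:
        return None
    return "SELECT %s(%s);" % (match.group("name"), match.group("args").strip())


if __name__ == "__main__":
    print(rewrite_function_call("{ ? = call check_for_something() }"))
    # SELECT check_for_something();
```

Seen this way, it is clear why the target has to be something that can appear inside a SELECT expression, which a PROCEDURE cannot.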
Here is the same procedure rewritten as a FUNCTION:
DELIMITER //
CREATE FUNCTION `check_for_something`()
RETURNS int(11)
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE valid INTEGER;
    SELECT ... INTO valid FROM ...
    RETURN valid;
END
//
It now works. I’m amazed it ever worked in the first place with 1.5.93 (and it certainly did).