Masterzen’s Blog

Journey in a software world…

At Days of Wonder we are huge fans of MySQL (and since about a year of the various Open Query, Percona, Google or other community patches), up to the point we’re using MySQL for about everything in production.

But since we moved to 5.0, back 3 years ago our production databases which hold our website and online game systems has a unique issue: the mysqld process uses more and more RAM, up to the point where the kernel OOM decide to kill the process.

You’d certainly think we are complete morons because we didn’t do anything in the last 3 years to fix the issue :-)

Unfortunately, I never couldn’t replicate the issue in the lab, mainly because it is difficult to replicate the exact same load the production server sees (mainly because of the online games activity).

During those 3 years, I tried everything I could, from using other allocators, valgrind, debug builds and so on, without any success.

What is nice, is that we moved to an OurDelta build about a year ago, where InnoDB is able to print more memory statistics than the default MySQL version.

For instance it shows

Internal hash tables (constant factor + variable factor)
    Adaptive hash index 1455381240      (118999688 + 1336381552)
    Page hash           7438328
    Dictionary cache    281544240       (89251896 + 192292344)
    File system         254712  (82672 + 172040)
    Lock system         18597112        (18594536 + 2576)
    Recovery system     0       (0 + 0)
    Threads             408056  (406936 + 1120)
    innodb_io_pattern   0       (0 + 0)

Back several month ago, I analyzed this output just to see what figures were growing, and found that the Dictionary Cache variable part was increasing (slowly but definitely).

Sure fine MySQL experts would have been able to tell me exactly what, when and where the problem was, but since I’m not familiar with the code-base, I looked up what this number was and where it was increased (all in dict0dict.c) and added some logs each time it was increased.

I then installed this version for a quite long time (just to check it wouldn’t crash on production) on a slave server. But this server didn’t print anything interesting because it doesn’t see the exact same load the production masters.

A couple of months after that, I moved this code to one of the master and bingo! I found the operation and the tables exhibiting an increase:

mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank_tmp2 193330680 + 8112
mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank 193338792 + 8112

As soon as I saw the operation and table (ie rank), I found what the culprit is. We have a daemon that every 10s computes the player ranks for our online games.

To do this, we’re using the following pattern:

-- compute the ranks
SELECT NULL, playerID
FROM game_score as g
ORDER BY g.rankscore DESC
INTO OUTFILE "/tmp/rank_tmp.tmp"

-- load back the scores
LOAD DATA INFILE "/tmp/rank_tmp.tmp" INTO TABLE rank_tmp

-- swap tables so that clients see new ranks atomatically
RENAME TABLE rank TO rank_tmp2 , rank_tmp TO rank, rank_tmp2 TO rank_tmp

-- truncate the old ranks for a new pass
TRUNCATE TABLE rank_tmp

-- go back to the select above

You might ask why I’m doing a so much convoluted system, especially the SELECT INTO OUTFILE and the LOAD DATA. It’s just because INSERT … SELECT with innodb and binlog enabled can produce transactions abort (which we were getting tons of).

Back to the original issue, apparently the issue lies in the RENAME part of the daemon.

Looking at the dict0dict.c dict_table_rename_in_cache function we see:

ibool
dict_table_rename_in_cache(...)
...
  old_name = mem_heap_strdup(table->heap, table->name);
  table->name = mem_heap_strdup(table->heap, new_name);
...
}

Looking to mem_heap stuff, I discovered that each table has a heap associated in which InnoDB allocates various things. This heap can only grow (by block of 8112 bytes it seems), since the allocator is not a real one. This is done for performance reasons.

So each time we rename a table, the old name (why? since it is already allocated) is duplicated, along with the new name. Each time.

This heap is freed when the table is dropped, so there is a possibility to reclaim the used memory. That means this issue is not a memory leak per-se.

By the way, I’ve filed this bug on mysql bug system.

One work-around, beside fixing the code itself, would be to drop the rank table instead of truncating it. The issue with dropping/creating InnoDB table on a fast pace is that the dictionary cache itself will grow, because it can only grow as there is no way to purge it from old tables (except running one of the Percona patches). So the more tables we create the more we’ll use memory – back to square 0, but worst.

So right now, I don’t really have any idea on how to really fix the issue. Anyone having an idea, please do not hesitate to comment on this blog post :-)

And please, don’t tell me to move to MyISAM…

Puppet Memory Leaks… Or not…

From time to time we get some complaints about so-called Puppet memory leaks either on #puppet, on the puppet-user list or in the Puppet redmine.

I tried hard to reproduce the issue on the Days of Wonder servers (mostly up-to-date debian), but never could. Starting from there I tried to gather from the various people I talked to on various channels what could be the cause, if they solved it and how.

You also can be sure there are no memory leaks in the Puppet source code. All of the identified memory leaks are either not memory leaks per-se or are caused by an out of control code base (ruby itself or a library).

Watch your Ruby

It is known that there are some ruby versions (around 1.8.5 and 1.8.6) exhibiting some leaks of some sort. This is especially true for RHEL 4 and 5 versions (and some Fedora ones too), as I found with the help of one Puppet user, or as others found.

Upgrading Ruby to 1.8.7-pl72 either from source or any repositories is usually enough to fix it.

Storeconfigs and MySQL

I also encountered some people that told me that storeconfigs with MySQL but without the real ruby-mysql gem, lead to some increasing memory footprint for their puppetmaster.

Storeconfigs and Rails < 2.1

It seems also to be a common advice to use Rails 2.1 if you use storeconfigs. I don’t know if Puppet uses this, but it seems that nested includes leaks in rails 2.0.

Is it really a leak?

The previous items I outlined above are real leaks. Some people (including myself) encountered a different issue: the puppetmaster is consuming lots of memory while doing file transfer to the clients.

In fact, up to Puppet 0.25 (not yet released at this time), Puppet is using XMLRPC as its communication protocol. Unfortunately this is not a transfer protocol, it is a Remote Procedure Call protocol. It means that to transfer binary files, Puppet has to load the whole file in memory, and then it escapes its content (same escaping as URL, which means every byte outside of 32-127 will take 3 bytes). Usually that means the master has to allocate roughly 2.5 times the size of the current transferred file. Puppet 0.25 will use REST (so native HTTP) to transfer files, which will bring speed and streaming to file serving.

Hopefully, if the Garbage Collector has a chance to trigger (because your ruby interpreter is not too much loaded), it will de-allocate all these memory used for files. If you are not so lucky, the ruby interpreter don’t have time to run a full garbage cycle, and the memory usage grows.

Some people running high-load puppetmaster have separated their file serving puppetmaster from their config serving puppetmaster to alleviate this issue.

Also, if like me you are using file recursive copy, you might encounter Bug #1469 File recursion with a remote source should not recurse locally.

I still have a leak you didn’t explain

Here is how you can find leaks in a ruby application:

I tried the three aforementioned techniques, and found that the GDB trick is the easier one to use and setup.

Another Ruby?

There’s also something that I think hasn’t been tried yet: running Puppet under a different Ruby interpreter (we’d say Virtual Machine in this case). For instance JRuby is running on top of the Java Virtual Machine which has more than 14 years of Garbage Collection development behind it.

You also can be sure than a different Ruby interpreter won’t have the same bug or memory leak as the regular one (the so called Matz Ruby interpreter from the name of his author).

There are some nice Ruby VM under development right now, and I’m sure I’ll blog about using Puppet on some of them soon :-)

Masterzen's Pictures

Ticket to Ride World Championship 2010

Ticket to Ride World Championship 2010

masterzen's photo

masterzen's photo

Musée de la Carte à jouer
Ticket to Ride World Championship 2010

Musée de la Carte à jouer Ticket to Ride World Championship...

Ticket to Ride World Championship 2010

Ticket to Ride World Championship 2010

Qualification Round - Table2
Ticket to Ride World Championship 2010

Qualification Round - Table2 Ticket to Ride World Championship...

Ian Andrew (UK), semi-final
Ticket to Ride World Championship 2010

Ian Andrew (UK), semi-final Ticket to Ride World Championship...

Ticket to Ride World Championship 2010

Ticket to Ride World Championship 2010

Qualification Round
Ticket to Ride World Championship 2010

Qualification Round Ticket to Ride World Championship 2010