Masterzen’s Blog

Journey in a software world…

Archive for the ‘War stories’ Category

The Days of Wonder News Center is running Wordpress which until a couple of days used Gengo for multilingual stuff. Back when we started using Wordpress for our news, we wanted to be able to have those in three (and maybe more) languages.

At that time (in 2007, wordpress 2.3), only Gengo was available.
During the last years, Gengo was unfortunately not maintained anymore, and it was difficult to upgrade Wordpress to new versions.

Recently we took the decision to upgrade our Wordpress installation, and at the same time ditch Gengo and start over using WPML, which is actively maintained (and looks superior to Gengo).

So, I started thinking about the conversion, then looked on the web and  found how to convert posts, with the help of those two blog posts:

Those two posts were invaluable for the conversion of posts, but unfortunately nobody solved the conversion of translated categories… until I did :)

So here is the most complete recipe to convert from Gengo 2.5 to WPML 1.8, with updated and working SQL requests.

Pre-requisites

You might want to stop the traffic to your blog during all this procedure. One way to do that is to return an HTTP error code 503 by modifying your Apache/Nginx/Whatever configuration.

  1. Log-in as an administrator in the Wordpress back-end, and deactivate Gengo.
  2. Install WPML 1.8, and activates it to create the necessary tables. I had to massage WPML a little bit to let it create the tables, YMMV.
  3. In the WPML settings, define the same languages as in Gengo (in my case English (primary), French and German)
  4. Finish the WPML configuration.
  5. If you had a define(WP_LANG,…) in your wordpress config, get rid of it.

Converting Posts

Connect to your MySQL server and issue the following revised SQL requests (thanks for the above blog posts for them):

Converting Pages

This is the same procedure, except we track ‘post_page’ instead of ‘post_post’:

Category conversion

This part is a little bit tricky. In Gengo, we translated the categories without creating new categories, but in WPML we have to create new categories that would be translations of a primary category.
To do this, I created the following SQL procedure that simplifies the creation of a translated category:

Then we need to create translated categories with this procedure (this can be done with the Wordpress admin interface, but if you have many categories it is simpler to do this with a bunch of SQL statements):

Bind translated categories to translated posts

And this is the last step, we need to make sure our posts translations have the correct translated categories (for the moment they use the English primary categories).

To do this, I created the following SQL request:

The request is in two parts. The first one will list all the French translations posts IDs that we will report in the second request to update the categories links.

At Days of Wonder we are huge fans of MySQL (and since about a year of the various Open Query, Percona, Google or other community patches), up to the point we’re using MySQL for about everything in production.

But since we moved to 5.0, back 3 years ago our production databases which hold our website and online game systems has a unique issue: the mysqld process uses more and more RAM, up to the point where the kernel OOM decide to kill the process.

You’d certainly think we are complete morons because we didn’t do anything in the last 3 years to fix the issue :-)

Unfortunately, I never couldn’t replicate the issue in the lab, mainly because it is difficult to replicate the exact same load the production server sees (mainly because of the online games activity).

During those 3 years, I tried everything I could, from using other allocators, valgrind, debug builds and so on, without any success.

What is nice, is that we moved to an OurDelta build about a year ago, where InnoDB is able to print more memory statistics than the default MySQL version.

For instance it shows

Internal hash tables (constant factor + variable factor)
    Adaptive hash index 1455381240      (118999688 + 1336381552)
    Page hash           7438328
    Dictionary cache    281544240       (89251896 + 192292344)
    File system         254712  (82672 + 172040)
    Lock system         18597112        (18594536 + 2576)
    Recovery system     0       (0 + 0)
    Threads             408056  (406936 + 1120)
    innodb_io_pattern   0       (0 + 0)

Back several month ago, I analyzed this output just to see what figures were growing, and found that the Dictionary Cache variable part was increasing (slowly but definitely).

Sure fine MySQL experts would have been able to tell me exactly what, when and where the problem was, but since I’m not familiar with the code-base, I looked up what this number was and where it was increased (all in dict0dict.c) and added some logs each time it was increased.

I then installed this version for a quite long time (just to check it wouldn’t crash on production) on a slave server. But this server didn’t print anything interesting because it doesn’t see the exact same load the production masters.

A couple of months after that, I moved this code to one of the master and bingo! I found the operation and the tables exhibiting an increase:

mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank_tmp2 193330680 + 8112
mysqld[8131]: InnoDB: dict_table_rename_in_cache production/rank 193338792 + 8112

As soon as I saw the operation and table (ie rank), I found what the culprit is. We have a daemon that every 10s computes the player ranks for our online games.

To do this, we’re using the following pattern:

-- compute the ranks
SELECT NULL, playerID
FROM game_score as g
ORDER BY g.rankscore DESC
INTO OUTFILE "/tmp/rank_tmp.tmp"

-- load back the scores
LOAD DATA INFILE "/tmp/rank_tmp.tmp" INTO TABLE rank_tmp

-- swap tables so that clients see new ranks atomatically
RENAME TABLE rank TO rank_tmp2 , rank_tmp TO rank, rank_tmp2 TO rank_tmp

-- truncate the old ranks for a new pass
TRUNCATE TABLE rank_tmp

-- go back to the select above

You might ask why I’m doing a so much convoluted system, especially the SELECT INTO OUTFILE and the LOAD DATA. It’s just because INSERT … SELECT with innodb and binlog enabled can produce transactions abort (which we were getting tons of).

Back to the original issue, apparently the issue lies in the RENAME part of the daemon.

Looking at the dict0dict.c dict_table_rename_in_cache function we see:

ibool
dict_table_rename_in_cache(...)
...
  old_name = mem_heap_strdup(table->heap, table->name);
  table->name = mem_heap_strdup(table->heap, new_name);
...
}

Looking to mem_heap stuff, I discovered that each table has a heap associated in which InnoDB allocates various things. This heap can only grow (by block of 8112 bytes it seems), since the allocator is not a real one. This is done for performance reasons.

So each time we rename a table, the old name (why? since it is already allocated) is duplicated, along with the new name. Each time.

This heap is freed when the table is dropped, so there is a possibility to reclaim the used memory. That means this issue is not a memory leak per-se.

By the way, I’ve filed this bug on mysql bug system.

One work-around, beside fixing the code itself, would be to drop the rank table instead of truncating it. The issue with dropping/creating InnoDB table on a fast pace is that the dictionary cache itself will grow, because it can only grow as there is no way to purge it from old tables (except running one of the Percona patches). So the more tables we create the more we’ll use memory – back to square 0, but worst.

So right now, I don’t really have any idea on how to really fix the issue. Anyone having an idea, please do not hesitate to comment on this blog post :-)

And please, don’t tell me to move to MyISAM…

PLD to the rescue!

There is something I used to hate to do. And I think all admins also hate to do that.

It’s when you need to reboot a server on a rescue environment to perform an administration task (i.e. fixing unbootable servers, fixing crashed root filesystems, and so on).

The commonly found problems with rescue environment are:

  • they’re not always remotely usable
  • they’re not always updated to your specific kernel version or tool
  • they can be difficult to use
  • some are CD or DVD only (no netboot, no usb keys…)
  • they don’t recognize your dumb azerty keyboard (argh, too much time spent looking for / or .)

OK, so a long time ago, I had a crashed server refusing to start on a reboot, and I had to chose a rescue environment for linux servers, other than booting on the Debian CD once again.

That’s how I discovered PLD Linux rescue CD:

PLD Rescue

and GRML:

GRML

My heart still goes to PLD rescue (because it’s really light), but I must admit that GRML has a really good zsh configuration (I even used some of their configuration ideas for my day to day zsh).

On that subject, if you don’t use zsh or don’t even know it and still want to qualify as a knowledgeable Unix admin, then please try it (preferably with GRML so that you’ll have an idea of what’s possible, and they even have a good documentation), another solution is to buy of course this really good book: “From Bash to Z Shell: Conquering the Command Line

That makes me think I should do a whole blog post on zsh.

OK, so let’s go back to our sheep (yes that’s a literally French translated expression, so I don’t expect anyone to grasp the funny part except the occasional French guys reading me :-) ).

So what’s so good about PLD Rescue:

  • it supports serial console (and that’s invaluable if you like me use a console server, and you should)
  • it can be booted:
  • it’s fully packed with only sysadmin tools – that’s the perfect sysadmin swiss-knife
  • it always stay up to date (currently kernel 2.6.28)
  • it works on x86 and amd64 servers

So my basic usage is to have a PXE netboot environment in our remote colocation, a console server (it is a real damn good Opengear CM4116).

With this setup I can netboot remotely any server to a PLD Rescue image with serial support, and then rescue my servers without going to the datacenter (it’s not that it is far from home or the office, but at 3AM, you don’t usually want to go out).

If you have a preferred rescue setup, please share it!

In the Days of Wonder Paris Office (where is located our graphic studio, and incidentally where I work), we are using Bacula to perform the multi-terabyte backup of the laaaaarge graphic files the studio produces every day.

The setup is the following:

Both servers are connected to the switch through two gigabit ethernet copper links, each one forming a 802.3ad link. The Apple Xserve and the linux box uses a layer3 hash algorithm to spread the load between each slave.

OK, that’s the fine print.

Usually about network gears, I’m pretty Cisco only (sorry, but I never found anything better than IOS). When we installed this setup back in 2006, the management decided to not go the full cisco route for the office network because of the price (a Dell 5324 is about 800 EUR, compared to a 2960G-24 which is more around 2000 EUR).

So, this switch was installed there, and never received an update (if it ain’t broken don’t fix it is my motto). Until last saturday, when I noticed that in fact the switch with the 1.0.0.47 firmware uses only layer-2 hashing to select the outgoing slave in a 802.3ad channel bonding. As you might have understood, it ruins all the efforts of both servers, since they have a constant and unique MAC address, so always the same slave is selected to move data from the switch to any server.

Brave as I am, I download the new firmware revision (which needs a new boot image), and I remotely installs it. And that was the start of the nightmare…

The switch upgraded the configuration to the new version, but unfortunately both 802.3ad channel groups were not up after the restart. After enquiring I couldn’t find any valid reason why the peers wouldn’t form such group.

OK, so back to the previous firmware (so that at least the backup scheduled for the same night would succeed). Unfortunately, something I didn’t think about, was that the new boot image couldn’t boot the old firmware. And if it did, I was still screwed up because it wouldn’t have been possible to run the configuration since it had been internally converted to the newer format… I already downgraded cisco gear, and I never had such failure… Back to the topic.

So the switch was bricked, sitting in the cabinet without switching any packets. Since we don’t have any remote console server (and I was at home), I left the switch as is until early Monday…

On Monday, I connected my helpful eeePC (and an USB/Serial converter), launched Minicom, and connected to the switch serial console. I rebooted the switch, erased the config, rebooted, reloaded the config from our tftp server and I was back to 1.0.0.47 with both 802.3ad channel groups working… but still no layer-3 hashing…

But since I’m someone that wants to understand why things are failing, I also tried again the move to firmware 2.0.1.3 to see where I was wrong. And still the same result: no more channel groups, so back to 1.0.0.47 (because some angry users wanted to actually work that day :-) )

After exchanging a few forum posts with some people on the Dell Community forum (I don’t have any support for this switch), I was suggested to actually erase the configuration before moving to the new firmware.

And that did it. It seems that the process of upgrading the configuration to the newest version is buggy and gave a somewhat invalid configuration from which the switch was unable to recover.

In fact, the switch seems to compile the configuration in a binary form/structure it uses to talk to the hardware. And when it upgraded the previous binary version, certainly some bits flipped somewhere and the various ports although still in the channel groups were setup as INDIVIDUAL instead of AGGREGATABLE.

Now the switch is running with a layer-3 hash algorithm, but it doesn’t seem to work fine, as if I run two parallel netcats on 2 IP addresses on the first server, connected to two other netcats on the second server, everything goes on only one path. I think this part needs more testing…

How would you test 802.3ad hashing?

The curse of bad blocks (is no more)

If you like me are struggling with old disks (in my case SCSI 10k RPM Ultra Wide 2 HP disks) that exhibits bad blocks, here is a short survival howto. Those disks are placed in a refurbished HP Network RS/12 I use as a spool area for Bacula backups of our Apple XServe RAID which is used by Days of Wonder graphic Studio (and those guys knows how to produce huge files, trust me).

Since a couple of days, one of the disk exhibits read errors on some sectors (did I say they are old), so waiting to get replaced by other (old) disks, I had to find a way to have it working.

Of course the SCSI utility in the Adaptec SCSI card has a remapping tool, but you have reboot the server and have it offline during the verify, which can take a long time, so that wasn’t an option.

I then learnt about sg3_utils (sg3-utils for the debian package) thanks to the very good page of smartmontools bad blocks handling.

This set of tools directly address SCSI disks through mode page, to instruct the disk to do some things. What’s interesting is that it comes with two commands of great use (there might be more of course):

  • sg_verify: to check for the health of a sector
  • sg_reassign: to remap a dead sector to one from the good sector list

Here is the use case:

backup:~# dd if=/dev/sda iflag=direct of=/dev/zero skip=1915 bs=1M
dd: reading `/dev/sda': Input/output error
12+0 records in
12+0 records out
12582912 bytes (13 MB) copied, 1.41468 seconds, 8.9 MB/s

Something is wrong, we only read 13MB instead of the whole disk.
Let’s have look to the kernel log:

backup:~# dmesg | tail
[331709.192108] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[331709.192108] sd 0:0:0:0: [sda] Sense Key : Medium Error [current]
[331709.192108] Info fld=0x3c3bb1
[331709.192108] sd 0:0:0:0: [sda] Add. Sense: Read retries exhausted
[331709.192108] end_request: I/O error, dev sda, sector 3947441

Indeed /dev/sda has a failed sector (at lba 3947441).
Let’s confirm it:

backup:~# sg_verify --lba=3947441 /dev/sda
verify (10):  Fixed format, current;  Sense key: Medium Error
 Additional sense: Read retries exhausted
  Info fld=0x3c3bb1 [3947441]
  Actual retry count: 0x003f
medium or hardware error, reported lba=0x3c3bb1

Check the defect list:

sg_reassign --grown /dev/sda
>> Elements in grown defect list: 0

And tell the disk firmware to reassign the sector

backup:~# sg_reassign --address=3947441 /dev/sda

Now verify that it was remapped:

backup:~# sg_reassign --grown /dev/sda
>> Elements in grown defect list: 1

Do we have a working sector?

backup:~# dd if=/dev/sda iflag=direct of=/dev/null bs=512 count=1 skip=3947441
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00780813 seconds, 65.6 kB/s

The sector could be read! The disk is now safe.

Of course, this tutorial might not work for every disks: PATA and SATA disks don’t respond to SCSI commands. For those disks, you have to write on the failed sector with dd and the disk firmware should automatically remap the sector. This can be proved by looking at the Reallocated_Sector_Ct output of smartctl -a.

Good luck :-)

Masterzen's Pictures

masterzen's photo

masterzen's photo

Patrick (NL)
Ticket to Ride World Championship 2010

Patrick (NL) Ticket to Ride World Championship 2010

Erick (CA)
Ticket to Ride World Championship 2010

Erick (CA) Ticket to Ride World Championship 2010

Ticket to Ride World Championship 2010

Ticket to Ride World Championship 2010

masterzen's photo

masterzen's photo

Ticket to Ride World Championship 2010

Ticket to Ride World Championship 2010

Ian Andrew (UK)
Ticket to Ride World Championship 2010

Ian Andrew (UK) Ticket to Ride World Championship 2010

Qualification Round
Ticket to Ride World Championship 2010

Qualification Round Ticket to Ride World Championship 2010