
Failed upgrade, impossible to downgrade… Oh my…

3 minute read

At the Days of Wonder Paris office (where our graphic studio is located, and incidentally where I work), we use Bacula to perform the multi-terabyte backup of the laaaaarge graphic files the studio produces every day.

The setup is the following:

Both servers (the Apple Xserve and the Linux Bacula box) are connected to the switch through two gigabit ethernet copper links, each pair forming an 802.3ad aggregate. The Apple Xserve and the Linux box use a layer-3 hash algorithm to spread the load across the slaves.
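For reference, here is a minimal sketch of what the Linux side of such a setup can look like with the standard bonding driver (interface names and addresses are made up, and the actual configuration on our box may differ):

# Load the bonding driver in 802.3ad (LACP) mode with an IP-based transmit
# hash policy; miimon enables link monitoring.
modprobe bonding mode=802.3ad xmit_hash_policy=layer2+3 miimon=100

# Bring the bond up and enslave both gigabit interfaces.
ifconfig bond0 192.168.0.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1

# Check the aggregator and the per-slave state.
cat /proc/net/bonding/bond0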

OK, that’s the fine print.

When it comes to network gear, I'm pretty much Cisco-only (sorry, but I've never found anything better than IOS). When we installed this setup back in 2006, management decided not to go the full Cisco route for the office network because of the price (a Dell 5324 is about 800 EUR, compared to a 2960G-24 which is closer to 2000 EUR).

So this switch was installed there and never received an update ("if it ain't broken, don't fix it" is my motto). Until last Saturday, when I noticed that the switch with the 1.0.0.47 firmware uses only layer-2 hashing to select the outgoing slave in an 802.3ad channel bond. As you might have understood, this ruins all the efforts of both servers: since each has a constant, unique MAC address, the same slave is always selected to move data from the switch to a given server.
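To make the problem concrete: the Dell's exact algorithm isn't documented here, but a typical layer-2 hash (the Linux bonding driver documents the same scheme for xmit_hash_policy=layer2) XORs the two MAC addresses and takes the result modulo the number of slaves, so a fixed pair of MACs always yields the same slave:

# Toy example with made-up MAC address bytes: the inputs never change
# between these two servers, so neither does the selected slave.
src=0x1a; dst=0x2b; slaves=2
echo $(( (src ^ dst) % slaves ))   # always prints the same slave index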

Brave as I am, I downloaded the new firmware revision (which needs a new boot image) and remotely installed it. And that was the start of the nightmare…

The switch upgraded the configuration to the new version, but unfortunately neither 802.3ad channel group came up after the restart. After investigating, I couldn't find any valid reason why the peers wouldn't form the groups.

OK, so back to the previous firmware (so that at least the backup scheduled for that night would succeed). Unfortunately, something I hadn't thought about: the new boot image couldn't boot the old firmware. And even if it had, I was still screwed, because it wouldn't have been possible to run the configuration since it had been internally converted to the newer format…

I have downgraded Cisco gear before and never had such a failure… Back to the topic.

So the switch was bricked, sitting in the cabinet without switching any packets. Since we don’t have any remote console server (and I was at home), I left the switch as is until early Monday…

On Monday, I connected my helpful eeePC (and a USB/serial converter), launched Minicom, and connected to the switch's serial console. I rebooted the switch, erased the config, rebooted, reloaded the config from our TFTP server, and I was back to 1.0.0.47 with both 802.3ad channel groups working… but still no layer-3 hashing…
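For the record, the console connection itself is nothing fancy; something along these lines (the device node depends on the USB/serial adapter, and 9600 8N1 is only my assumption of the usual PowerConnect console setting):

# Open the switch console through the USB/serial adapter.
minicom -D /dev/ttyUSB0 -b 9600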

But since I'm someone who wants to understand why things fail, I tried the move to firmware 2.0.1.3 again to see what I had done wrong. Still the same result: no more channel groups, so back to 1.0.0.47 (because some angry users wanted to actually work that day :-))

After exchanging a few posts with some people on the Dell Community forum (I don't have any support contract for this switch), it was suggested that I erase the configuration before moving to the new firmware.

And that did it. It seems that the process of upgrading the configuration to the newest version is buggy and produced a somewhat invalid configuration from which the switch was unable to recover.

In fact, the switch seems to compile the configuration into a binary structure it uses to talk to the hardware. When it upgraded the previous binary version, some bits must have flipped somewhere, and the various ports, although still in the channel groups, ended up set to INDIVIDUAL instead of AGGREGATABLE.

Now the switch is running with a layer-3 hash algorithm, but it doesn't seem to work properly: if I run two parallel netcats, bound to two IP addresses on the first server and connected to two other netcats on the second server, everything still goes over only one path. I think this part needs more testing…
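For the curious, the test looked roughly like this (addresses, ports and interface names are illustrative; note that with only two flows a hash collision on the same link is still possible, which may explain the result):

# On the receiving server, which has two addresses on its bond, start two sinks.
nc -l -p 9001 > /dev/null &
nc -l -p 9002 > /dev/null &

# On the sending server, push data to both addresses in parallel.
dd if=/dev/zero bs=1M count=1000 | nc 10.0.0.10 9001 &
dd if=/dev/zero bs=1M count=1000 | nc 10.0.0.11 9002 &

# Back on the receiving side, watch the per-slave RX counters: with a working
# layer-3 hash on the switch, the two flows should land on different links.
watch -n1 'grep eth /proc/net/dev'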

How would you test 802.3ad hashing?

February Puppet Dev Call

1 minute read

Yesterday we had the February Puppet Dev Call, with unfortunately poor audio and lots of Skype disconnections, which for a non-native English speaker like me made the call difficult to follow (what is strange is that the one I could hear best was Luke).


But it was an important meeting, as we now know how the development process will continue. It was agreed (because it makes real sense) to keep master as the current stable branch and fork a 'next' branch for ongoing development of the next version.

The idea is that newcomers will just have to git clone the repository to produce a bug fix or stable feature, without having to wonder (or read the development process wiki page) where/how to get the code.
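A minimal sketch of that workflow, assuming the 'next' branch exists and using the repository location I believe is current (double-check the URL):

# Grab the code; master is the current stable branch.
git clone git://github.com/reductivelabs/puppet.git
cd puppet

# Bug fixes for the stable release branch off master...
git checkout -b my-bugfix master

# ...while work on the next version branches off 'next'.
git checkout -b my-feature origin/next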

It was also decided that 0.25 was really imminent with a planned release date later this month.

Arghhh, this doesn't leave me much time to finish the Application Controller stuff I'm currently working on. The issue is that I procrastinated a little bit with the storeconfigs speed-up patch (which I hope will be merged for 0.25) and a few important 0.24.x bug fixes.

There was also a discussion about what should be part of the Puppet core and what shouldn't (like the recent Zenoss patch). Digression: I'm considering doing an OpenNMS type/provider like the Zenoss or Nagios ones.

Back to the real topic. It was proposed to have a repository of non-core features, but this essentially only creates more trouble, including but not limited to:

  • Versioning of interdependent modules
  • Module dependencies
  • Module distribution
  • Testing (how do you run exhaustive tests if everything is scattered?)
  • Responsibility

Someone suggested (sorry, I can't remember who) that we need a packaging system to fill this hole, but I don't find that satisfactory. I understand the issue, but have no immediate answer to this question (that's why I didn't comment on this topic during the call).

Second digression: if you read this and want to contribute to Puppet (because it's a wonderful piece of software, with a great developer team and a nicely done codebase), I can't stress enough how important it is to read the following wiki pages:

Also come by #puppet and/or the puppet-dev Google group; we're ready to help!

The curse of bad blocks (is no more)

2 minute read

If, like me, you are struggling with old disks (in my case 10k RPM Ultra Wide 2 SCSI HP disks) that exhibit bad blocks, here is a short survival howto.

Those disks sit in a refurbished HP Network RS/12 I use as a spool area for Bacula backups of our Apple XServe RAID, which is used by the Days of Wonder graphic studio (and those guys know how to produce huge files, trust me).

For the past couple of days, one of the disks has exhibited read errors on some sectors (did I mention they are old?), so while waiting for it to be replaced by other (old) disks, I had to find a way to keep it working.

Of course the SCSI utility in the Adaptec SCSI card has a remapping tool, but you have to reboot the server and have it offline during the verify, which can take a long time, so that wasn’t an option.

I then learnt about sg3_utils (sg3-utils for the Debian package) thanks to the very good smartmontools page on bad block handling.

This set of tools addresses SCSI disks directly through SCSI commands to instruct the disk to do various things. What's interesting is that it comes with two commands of great use (there are more, of course):

  • sg_verify: to check the health of a sector
  • sg_reassign: to remap a defective sector to one of the disk's spare sectors

Here is the use case:

backup:~# dd if=/dev/sda iflag=direct of=/dev/null skip=1915 bs=1M
dd: reading `/dev/sda': Input/output error
12+0 records in
12+0 records out
12582912 bytes (13 MB) copied, 1.41468 seconds, 8.9 MB/s

Something is wrong: we read only 13 MB instead of the whole disk. Let's have a look at the kernel log:

backup:~# dmesg | tail
[331709.192108] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[331709.192108] sd 0:0:0:0: [sda] Sense Key : Medium Error [current]
[331709.192108] Info fld=0x3c3bb1
[331709.192108] sd 0:0:0:0: [sda] Add. Sense: Read retries exhausted
[331709.192108] end_request: I/O error, dev sda, sector 3947441

Indeed, /dev/sda has a failed sector (at LBA 3947441).
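As a quick sanity check, the LBA reported by the kernel is consistent with where dd stopped: 1915 MiB skipped plus 12 MiB read lands just short of the bad sector's byte offset, so the failure sits inside the thirteenth 1 MiB block, the one dd could not finish.

echo $(( 3947441 * 512 ))             # 2021089792: byte offset of the bad sector
echo $(( (1915 + 12) * 1024 * 1024 )) # 2020605952: bytes skipped plus bytes read OK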

Let’s confirm it:

backup:~# sg_verify --lba=3947441 /dev/sda
verify (10):  Fixed format, current;  Sense key: Medium Error
 Additional sense: Read retries exhausted
 Info fld=0x3c3bb1 [3947441]
 Actual retry count: 0x003f
medium or hardware error, reported lba=0x3c3bb1

Check the defect list:

backup:~# sg_reassign --grown /dev/sda
>> Elements in grown defect list: 0

And tell the disk firmware to reassign the sector:

backup:~# sg_reassign --address=3947441 /dev/sda

Now verify that it was remapped:

backup:~# sg_reassign --grown /dev/sda
>> Elements in grown defect list: 1

Do we have a working sector?

backup:~# dd if=/dev/sda iflag=direct of=/dev/null bs=512 count=1 skip=3947441
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00780813 seconds, 65.6 kB/s

The sector can be read again! The disk is usable once more.

Of course, this tutorial might not work for every disk: PATA and SATA disks don't respond to these SCSI commands. For those disks, you have to write to the failed sector with dd, and the disk firmware should automatically remap it. You can verify this by looking at the Reallocated_Sector_Ct value in the output of smartctl -a.
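For those (S)ATA disks, here is a hedged sketch of that approach, reusing the LBA from the example above (the device name is illustrative, and overwriting the sector destroys whatever data it held):

# Overwrite only the failed sector so the firmware can remap it
# (seek= is in units of the 512-byte block size given by bs=).
dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=3947441 oflag=direct

# Then check whether the reallocation counter increased.
smartctl -a /dev/sdb | grep -i Reallocated_Sector_Ct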

Good luck :-)