Masterzen's Blog

Journey in a software world…

Bootstrapping Windows Servers With Puppet

| Comments

It all started a handful of months ago, when it appeared that we’d need to build some of our native software on Windows. Before that time, all our desktop software at Days of Wonder was mostly cross-platform Java code that could be cross-compiled on Linux. Unfortunately, we now badly needed a Windows build machine.

In this blog post, I’ll tell you the whole story, from my zero knowledge of Windows administration to an almost fully automated Windows build machine image construction.

Jenkins

But, first let’s digress a bit to explain in which context we operate our builds.

Our CI system is built around Jenkins, with a specific twist. We run the Jenkins master on our own infrastructure and our build slaves on AWS EC2. The reason behind this choice is out of the scope of this article (but you can still ask me, I’ll happily answer).

So, we’re using the Jenkins EC2 plugin, and a Jenkins S3 Plugin revamped by yours truly. We produce somewhat large binary artifacts when building our client software, and the bandwidth between EC2 and our master is not that great (and is expensive), so using the aforementioned patch I contributed, we host all our artifacts in S3, fully managed by our out-of-AWS Jenkins master.

The problem I faced when starting to explore the intricate world of Windows in relation to Jenkins slaves is that we wanted to keep the Linux model we had: on-demand slaves spawned by the master when scheduling a build. Unfortunately the current state of the Jenkins EC2 plugin only supports Linux slaves.

Enter WinRM and WinRS

The EC2 plugin for Linux slaves works like this:

  1. it starts the slave
  2. using an internal scp implementation, it copies ‘slave.jar’, which implements the client side of the Jenkins remoting protocol
  3. using an internal ssh implementation, it executes java -jar slave.jar. The stdin and stdout of the slave.jar process are then connected to the Jenkins master through an ssh tunnel.
  4. now, Jenkins does its job (basically sending more jars, classes)
  5. at this stage the slave is considered up

I needed to replicate this behavior. In the Windows world, ssh is nonexistent. You can find some native implementations (like FreeSSHd or some commercial ones), but those options either weren’t easy to set up or simply didn’t work.

In the Windows world, remote process execution is achieved through Windows Remote Management, WinRM for short. WinRM is an implementation of the WSMAN specification. It gives access to Windows Management Instrumentation, and through it to hardware counters (à la SNMP or IPMI in the Unix world).

One component of WinRM is WinRS: Windows Remote Shell. This is the part that allows running remote commands. Recent Windows versions (at least since Server 2003) ship with WinRM installed (but not started by default).

WinRM is an HTTP/SOAP based protocol. By default, the payload is encrypted if the protocol is used in a Domain Controller environment (in this case, it uses Kerberos), which will not be our case on EC2.

Digging further, I found two client implementations:

I started integrating Overthere into the ec2-plugin but encountered several incompatibilities, most notably that Overthere depended on more recent versions of some libraries than Jenkins itself.

I finally decided to create my own WinRM client implementation and released Windows support for the EC2 plugin. This hasn’t been merged upstream, and should still be considered experimental.

We’ve been using this version of the plugin for about a couple of months and it works, but to be honest WinRM doesn’t seem to be as stable as ssh would be. There are times the slave is unable to start correctly because WinRM abruptly stops working (especially shortly after the machine boots).

WinRM, the bootstrap

So all is great: we know how to execute commands remotely from Jenkins. But that’s not enough for our sysadmin needs. In particular, we need to be able to create a Windows AMI that contains all the software required to build our own applications.

Since I’m a long time Puppet user (which you certainly noticed if you read this blog in the past), using Puppet to configure our Windows build slaves was the only possibility. So we need to run Puppet on a Windows base AMI, then create an AMI from there that will be used for our build slaves. And if we can make this process repeatable and automatic, that’d be wonderful.

In the Linux world, this task is usually devoted to tools like Packer or Veewee (which, BTW, supports provisioning Windows machines). Unfortunately Packer, which is written in Go, doesn’t yet support Windows, and Veewee doesn’t support EC2.

That’s the reason I ported the small implementation I wrote for the Jenkins EC2 plugin to a WinRM Go library. This was the perfect pet project to learn a new language :)

Windows Base AMI

So, armed with all those tools, we’re ready to start our project. But there’s a caveat: WinRM is not enabled by default on Windows. So before automating anything we need to create a Windows base AMI that contains the necessary tools to later automate the installation of our build tools.

Windows boot on EC2

There’s a service running on the AWS Windows AMI called EC2config that does the following at the first boot:

  1. sets a random password for the ‘Administrator’ account
  2. generates and installs the host certificate used for Remote Desktop Connection
  3. executes the specified user data (and cloud-init if installed)

On first and subsequent boot, it also does:

  1. it might set the computer host name to match the private DNS name
  2. it configures the key management server (KMS), checks the Windows activation status, and activates Windows as necessary
  3. it formats and mounts any Amazon EBS volumes and instance store volumes, and maps volume names to drive letters
  4. it performs some other administrative tasks

One thing that is problematic with Windows on EC2 is that the Administrator password is unfortunately set randomly at first boot. That means that to do anything further on the machine (usually using Remote Desktop to administer it), you first need to retrieve the password from AWS (with the command line you can do: aws ec2 get-password-data).

Next, we might also want to set a custom password instead of this dynamic one. We might also want to enable WinRM and install several utilities that will help us later.

To do that we can inject specific AMI user data at the first boot of the Windows base AMI. This user data can contain one or more cmd.exe or PowerShell scripts that will get executed at boot.

I created this Windows bootstrap Gist (actually I forked and edited the part I needed) to prepare the slave.

First bootstrap

First, we’ll create a Windows security group allowing incoming WinRM, SMB and RDP:

aws ec2 create-security-group --group-name "Windows" --description "Remote access to Windows instances"
# WinRM
aws ec2 authorize-security-group-ingress --group-name "Windows" --protocol tcp --port 5985 --cidr <YOURIP>/32
# Incoming SMB/TCP 
aws ec2 authorize-security-group-ingress --group-name "Windows" --protocol tcp --port 445 --cidr <YOURIP>/32
# RDP
aws ec2 authorize-security-group-ingress --group-name "Windows" --protocol tcp --port 3389 --cidr <YOURIP>/32

Now, let’s start our base image with the following user-data (let’s put it into userdata.txt):

<powershell>
Set-ExecutionPolicy Unrestricted
icm $executioncontext.InvokeCommand.NewScriptBlock((New-Object Net.WebClient).DownloadString('https://gist.github.com/masterzen/6714787/raw')) -ArgumentList "VerySecret"
</powershell>

This powershell script will download the Windows bootstrap Gist and execute it, passing the desired administrator password.

Next we launch the instance:

aws ec2 run-instances --image-id ami-4524002c --instance-type m1.small --security-groups Windows --key-name <YOURKEY> --user-data "$(cat userdata.txt)"

Unlike what is written in the ec2config documentation, the user-data must not be encoded in Base64.

Note, the first boot can be quite long :)

After that we can connect through WinRM with the “VerySecret” password. To check we’ll use the WinRM Go tool I wrote and talked about above:

./winrm -hostname <publicip> -username Administrator -password VerySecret "ipconfig /all"

We should see the output of the ipconfig command.

Note: in the next winrm commands, I’ve omitted the various credentials to improve legibility (a future version of the tool will be able to read a config file; meanwhile we can create a shell alias).

A few caveats:

  • BITS doesn’t work in the user-data PowerShell script, because it requires a logged-in user, which is not possible during boot; that’s the reason downloading is done through System.Net.WebClient
  • WinRM enforces some resource limits; you might have to increase the allowed shell resources to run some memory-hungry commands: winrm set winrm/config/winrs @{MaxMemoryPerShellMB="1024"}. Unfortunately, this is completely broken in Windows Server 2008 unless you install this Microsoft hotfix. The linked bootstrap code doesn’t install this hotfix, because I’m not sure I can redistribute the file; that’s an exercise left to the reader :)
  • the WinRM traffic is neither encrypted nor protected (if you use my tool). Use at your own risk. It’s possible to set up WinRM over HTTPS, but it’s a bit more involved. The current version of my WinRM tool doesn’t support HTTPS yet (but it’s very easy to add).

Baking our base image

Now that we have our base system with WinRM and Puppet installed by the bootstrap code, we need to create a derived AMI that will become our base image later when we create our different Windows machines.

aws ec2 create-image --instance-id <ourid> --name 'windows-2008-base'

For a real world example we might have defragmented and blanked the free space of the root volume before creating the image (on Windows you can use sdelete for this task).

Note that we don’t run the Ec2config sysprep prior to creating the image, which means the first boot of any instances created from this image won’t run the whole boot sequence and our Administrator password will not be reset to a random password.

Where does Puppet fit?

Now that we have this base image, we can start deriving it to create other images, but this time using Puppet instead of a powershell script. Puppet has been installed on the base image, by virtue of the powershell bootstrap we used as user-data.

First, let’s get rid of the current instance and run a fresh one coming from the new AMI we just created:

aws ec2 run-instances --image-id <newami> --instance-type m1.small --security-groups Windows --key-name <YOURKEY>

Anatomy of running Puppet

We’re going to run Puppet in masterless mode for this project. So we need to upload our set of manifests and modules to the target host.

One way to do this is to connect to the host with SMB over TCP (which our base image supports):

sudo mkdir -p /mnt/win
sudo mount -t cifs -o user="Administrator%VerySecret",uid="$USER",forceuid "//<instance-ip>/C\$/Users/Administrator/AppData/Local/Temp" /mnt/win

Note how we’re using an Administrative Share (the C$ above). On Windows the Administrator user has access to the local drives through Administrative Shares without having to explicitly share them, as normal users would.

The user-data script we ran in the base image opens the windows firewall to allow inbound SMB over TCP (port 445).

We can then just zip our manifests/modules, send the file over there, and unzip remotely:

zip -q -r /mnt/win/puppet-windows.zip manifests/jenkins-steam.pp modules -x .git
./winrm "7z x -y -oC:\\Users\\Administrator\\AppData\\Local\\Temp\\ C:\\Users\\Administrator\\AppData\\Local\\Temp\\puppet-windows.zip | FIND /V \"ing  \""

And finally, let’s run Puppet there:

./winrm "\"C:\\Program Files (x86)\\Puppet Labs\\Puppet\\bin\\puppet.bat\" apply --debug --modulepath C:\\Users\\Administrator\\AppData\\Local\\Temp\\modules C:\\Users\\Administrator\\AppData\\Local\\Temp\\manifests\\site.pp"

And voilà: shortly after, we’ll have a configured running instance. Now we can create a new image from it and use it as our Windows build slave in the ec2 plugin configuration.

Puppet on Windows

Puppet on Windows is not like your regular Puppet on Unix. Let’s focus on what works or not when running Puppet on Windows.

Core resources known to work

The obvious ones known to work:

  • File: aside from symbolic links, which are supported only on Puppet >= 3.4 and Windows 2008+, there are a few things to take care of when using files:

    • NTFS is case-insensitive (but the file resource namevar is not)
    • Managing permissions: octal Unix permissions are mapped to Windows permissions, but the translation is imperfect. Puppet doesn’t manage Windows ACLs (for more information check Managing File Permissions on Windows)
  • User: Puppet can create/delete/modify local users. The Security Identifier (SID) can’t be set. User names are case-insensitive on Windows. To my knowledge you can’t manage domain users.

  • Group: Puppet can create/delete/modify local groups. Puppet can’t manage domain groups.

  • Package: Puppet can install MSI or exe installers present on a local path (you need to specify the source). For a more comprehensive package system, check the paragraph about Chocolatey below.

  • Service: Puppet can start/stop/enable/disable services. You need to specify the short service name, not the human-readable display name.

  • Exec: Puppet can run executables (any .exe, .com or .bat). But unlike on Unix there is no shell, so you might need to wrap commands with cmd /c. Check the Powershell exec provider module for a more comprehensive Exec system on Windows.

  • Host: works the same as on Unix systems.

  • Cron: there’s no cron system on Windows. Instead you must use the Scheduled_task type (a combined sketch follows this list).
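
Here is a minimal combined sketch of those resource types on a Windows node. All names, paths and package sources below are hypothetical examples, not taken from our real manifests:

class buildslave {
  # the windows package provider needs a local source; the name should match the installed DisplayName
  package { "7-Zip 9.20":
    ensure => installed,
    source => "C:/temp/7z920.msi",
  }

  # use the short service name (here the Windows Update service), not the display name
  service { "wuauserv":
    ensure => running,
    enable => true,
  }

  # there is no shell on Windows: wrap shell built-ins with cmd /c
  exec { "clean-temp":
    command => "cmd.exe /c del /q C:\\temp\\*.log",
    path    => "C:\\Windows\\System32",
  }

  # no cron on Windows: use a scheduled_task instead
  scheduled_task { "nightly-cleanup":
    ensure    => present,
    command   => "C:\\Windows\\System32\\cmd.exe",
    arguments => "/c del /q C:\\temp\\*.log",
    trigger   => {
      schedule   => 'daily',
      start_time => '03:00',
    },
  }
}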

Do not expect your average unix module to work out-of-the-box

Of course that’s expected, mostly because of the packages they use. Most Forge modules, for instance, target Unix systems. Some Forge modules are Windows-only, but they tend to cover specific Windows aspects (like the registry, PowerShell, etc…); still, make sure to check those, as they are invaluable in your module portfolio.

My Path is not your Path!

You certainly know that Windows paths are not like Unix paths. They use \ where Unix uses /.

The problem is that in most languages (including the Puppet DSL) ‘\’ is considered an escape character when used in double-quoted string literals, so it must be doubled: \\.

Puppet single-quoted strings don’t understand all of the escape sequences double-quoted strings know (they only parse \' and \\), so it is safe to use a lone \ as long as it is not the last character of the string.

Why is that?

Let’s take the path C:\Users\Administrator\. When it is enclosed in a single-quoted string, 'C:\Users\Administrator\', the last two characters are \', which forms an escape sequence, and thus for Puppet the string is not correctly terminated by a single quote. The safe way to write a single-quoted path like the one above is to double the final backslash: 'C:\Users\Administrator\\', which looks a bit strange. My suggestion is to double all \ in all kinds of strings for simplicity.
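
To make those quoting rules concrete, here is a small sketch (the paths are arbitrary examples):

$single  = 'C:\Users\Administrator\\'     # single quotes: only a trailing \ needs doubling
$double  = "C:\\Users\\Administrator\\"   # double quotes: every \ must be doubled
$slashes = 'C:/Users/Administrator'       # forward slashes need no escaping at all

notice($single)
notice($double)
notice($slashes)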

Finally, when writing a UNC path in a string literal you need to use four backslashes: \\\\host\\path.

Back to the slash/backslash problem, there’s a simple rule: if the path is interpreted directly by Puppet, then you can safely use /. If the path is destined for a Windows command (like in an Exec), use \.

Here’s a list of the possible types of paths for Puppet resources:

  • Puppet URL: this is a URL, so use /
  • template paths: this is a path for the master, so use /
  • File path: it is preferred to use / for consistency
  • Exec command: it is preferred to use /, but beware that most Windows executables require \ paths (especially cmd.exe)
  • Package source: it is preferred to use /
  • Scheduled task command: use \ as this will be used directly by Windows (a short sketch follows this list)
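
As a small illustration of that rule, here is a hedged sketch mixing both conventions (the paths and command are arbitrary examples):

file { 'C:/tmp/build.txt':                # interpreted by Puppet: / is safe
  ensure  => file,
  content => "managed by puppet",
}

exec { 'list-temp':
  command => 'cmd.exe /c dir C:\\tmp',    # handed to a Windows executable: use \
  path    => 'C:\\Windows\\System32',
}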

Windows facts to help detect Windows

To identify a Windows client in a Puppet manifest you can use the kernel, operatingsystem and osfamily facts, which all resolve to windows.

Other facts, like hostname, fqdn, domain or memory*, processorcount, architecture, hardwaremodel and so on, work like their Unix counterparts.

Networking facts also work, but with the Windows interface name (ie Local_Area_Connection), so for instance the local IP address of a server will be in ipaddress_local_area_connection. The ipaddress fact also works, but on my Windows EC2 server it returns a link-local IPv6 address instead of the IPv4 Local Area Connection address (but that might be because it’s running on EC2).
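
For instance, here is a minimal sketch of how a manifest can branch on those facts (the class names are hypothetical):

case $::osfamily {
  'windows': {
    include buildtools::windows
  }
  default: {
    include buildtools::unix
  }
}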

Do yourself a favor and use Chocolatey

We’ve seen that Puppet Package type has a Windows provider that knows how to install MSI and/or exe installers when provided with a local source. Unfortunately this model is very far from what Apt or Yum is able to do on Linux servers, allowing access to multiple repositories of software and on-demand download and installation (on the same subject, we’re still missing something like that for OSX).

Fortunately, in the Windows world there’s Chocolatey. Chocolatey is a package manager (based on NuGet) and a public repository of software (there’s no easy way to have a private repository yet). If you read the bootstrap code I used earlier, you’ve seen that it installs Chocolatey.

Chocolatey is quite straightforward to install (beware that it doesn’t work for Windows Server Core, because it is missing the shell Zip extension, which is the reason the bootstrap code installs Chocolatey manually).

Once installed, the chocolatey command allows installing and removing software that comes in several flavors: either command-line packages or install packages. The former only gives access to a command-line tool, whereas the latter does a full installation of the software.

So for instance to install Git on a Windows machine, it’s as simple as:

chocolatey install git.install

To make things much more enjoyable for Puppet users, there’s a Chocolatey Package Provider Module on the Forge that allows doing the following:

package {
  "cmake":
    ensure => installed,
    provider => "chocolatey"
}

Unfortunately at this stage it’s not possible to easily host your own Chocolatey repository. But it is possible to host your own Chocolatey packages and use the source parameter. In the following example we assume that I packaged cmake version 2.8.12 (which I did, by the way), and hosted this package on my own web server:

# download_file uses powershell to emulate wget
# check here: http://forge.puppetlabs.com/opentable/download_file
download_file { "cmake":
  url                   => "http://chocolatey.domain.com/packages/cmake.2.8.12.nupkg",
  destination_directory => "C:\\Users\\Administrator\\AppData\\Local\\Temp\\",
}
->
package {
  "cmake":
    ensure   => installed,
    provider => "chocolatey",
    source   => "C:\\Users\\Administrator\\AppData\\Local\\Temp\\"
}

You can also decide that chocolatey will be the default provider by adding this to your site.pp:

Package {
  provider => "chocolatey"
}

Finally read how to create chocolatey packages if you wish to create your own chocolatey packages.

Line endings and character encodings

There’s one final thing that the Windows Puppet user must take care of: line endings and character encodings. If you use Puppet File resources to install files on a Windows node, you must be aware that file content is transferred verbatim from the master (whether you use content or source).

That means that if the file uses Unix LF line endings, the file content on your Windows machine will use the same. If you need Windows line endings, make sure your file on the master (or the content in the manifest) uses Windows \r\n line endings.
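
For example, here is a hedged sketch of a file resource that forces CRLF line endings by embedding \r\n in a double-quoted string (the path and content are arbitrary):

file { 'C:/build/run-tests.bat':
  ensure  => file,
  content => "@echo off\r\nrem placeholder build step\r\necho building...\r\n",
}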

That also means that your text files might not use a Windows character set. It’s less problematic nowadays than it would have been in the past because of the ubiquitous UTF-8 encoding. But be aware that the default character set on western Windows systems is CP-1252, not UTF-8 or ISO-8859-15. It’s possible that cmd.exe scripts not encoded in CP-1252 won’t work as intended if they use characters outside the ASCII range.

Conclusion

I hope this article will help you tackle the hard task of provisioning Windows VMs and running Puppet on Windows. It is the result of many hours of hard work finding the right tools and building up Windows knowledge.

During this journey, I started learning a new language (Go), remembered how much I dislike Windows (and its administration), contributed to several open-source projects, discovered a whole lot about Puppet on Windows, and finally learned a lot about WinRM/WinRS.

Stay tuned on this channel for more articles (when I have the time) about Puppet, programming and/or system administration :)

FOSDEM and Config Management Camp 2014

| Comments

So I know this blog has been almost abandoned, but that’s not because I don’t have anything to say. On the contrary, I’ve never dealt with so many different tools, technologies and programming languages as in the recent past, and that alone could fuel a large number of posts. Unfortunately my spare time is close to nil, and my personal priorities are much more geared toward my private life than this blog :)

So why this new post?

Well, a bunch of very fine people led by @KrisBuytaert have decided to organize a conference about Configuration Management, alongside the Configuration Management DevRoom of the FOSDEM.

This post is all about why you should attend and/or even submit a talk for those events.

FOSDEM

If you live in Europe and have never heard of the FOSDEM, you’re certainly not part of the Open Source movement, and now is the time to fix that!

The FOSDEM takes place every year in Brussels the first week-end of February at the Université Libre de Bruxelles. This year it’s the week-end of Feb 1st and 2nd.

This free conference (it’s usually good practice to buy the T-shirt to support them) has certainly been the largest Open Source event in Europe for more than 10 years. It gathers about 5000 geeks from all over the world, attending more than 400 talks in dozens of tracks ranging from databases (MySQL, PostgreSQL, NoSQL…) to languages (Java, Go…) through software practices (testing…) and system administration (configuration management, cloud and IaaS), and a lot more subjects!

For 4 years now, there has been a Configuration Management DevRoom at the FOSDEM. I’m part of the people selecting the talks, and it looks like the program this year will be awesome!

And if that alone is not compelling enough for you to attend the FOSDEM, here is a short list of extra reasons:

  • It’s close to almost everywhere in Europe by train or plane
  • It’s the occasion to meet a lot of Open Source stars, talk with your peers, meet people from communities you are part of
  • Belgian beers and food (do not forget the pre-FOSDEM crowded beer event, usually at the Delirium Café)
  • Brussels is a very nice city (if you’re tired of computer geeks, you can visit the city – there used to be a spouse tour for non-geek relatives, but I don’t know if that still happens)

But wait, there’s more. While in Brussels you’re pretty close to anywhere in the country by train. And since you’ll be there for the week-end, you can extend your trip by two days and attend the conference I’ll talk about in the next paragraph :)

Config Management Camp

So, last year before FOSDEM we had a large free two-day PuppetCamp in Ghent (one of the nicest Belgian cities, 30 minutes by train from Brussels). This was an awesome event.

This year, Kris et al. had the crazy idea of organizing a multi-tool conference. Last year’s camp was a success, so why not bring all the communities gravitating around the configuration management space to the same place at the same time and see what happens?

This new conference will happen on Monday and Tuesday February 3rd and 4th.

This edition of the Config Management Camp Ghent 2014 will have a main general track and separate per-community tracks. We’ll be able to attend the following lectures:

So if you’re a user of or a contributor to one of these tools, you’ll be able to attend two days of interesting talks.

Even better, you might want to actually propose a talk for this conference. Fear not, the Call for Presentations deadline is December 15th for most of the community tracks.

I’ve heard from insiders that there will be some configuration management stars attending this conference (I don’t know at this stage if I can name them, but it appears that most, if not all, of the above-mentioned tools’ creators will be among us).

Even though the exact schedule of the lectures is not yet known at this time, having those people on board will certainly sparkle some very interesting discussions!

The good news is that the free registration has been open since Friday evening.

Go register. Now!

Now for some practical details. You can go from Brussels to Ghent by direct train departing from the Brussel Zuid station (Bruxelles Midi). The trip takes about 30 minutes, with a lot of trains during the day, so it’s even possible to stay in Brussels and commute every day. But I wouldn’t advise that if you want to attend the Monday evening social event.

The conference takes place at the SchoonMeersen Campus of the University College Gent, which is about a 5-minute walk from the main Ghent station.

You can book a hotel/room/flat near the station (there are plenty of nice hotels there), but you would miss the very nice car-free city center with all those monuments, churches and canals (and bars and restaurants too). My recommendation (actually @ripienaar’s) would be to stay near the city center.

Of course I’ll be there, ready for some very interesting hallway discussions and socialization :)

See you all there! Do not forget to register!

Update: I think the organizers are still looking for Config Management Camp potential sponsors.

Devopsdays Paris for Real!

| Comments

Paris is more and more becoming the DevOps place to be. We (apparently) successfully rebooted the Paris DevOps Meetups, with two events so far and two more already in the pipeline (stay tuned for the announcements).

We’re now announcing and pushing hard the Paris edition of devopsdays (well, we’re quite in the shadow of devopsdays London at this time; I take this as an extension of our long-time friendly fight with the Britons :)

The Paris edition of DevOpsDays is being held 18 – 19 April 2013, and we want you to be a part of it! This conference brings together speakers and attendees from around the world, with a focus on DevOps culture, techniques, and best practices.

devopsdays Paris

The format is simple: talks in the morning, and open/hack spaces in the afternoon. We’ve done 17 very successful events on 5 continents, and we’re looking forward to another great edition here!

Perhaps you’re curious, “what exactly is DevOps?”

Well, if you already follow this blog or my twitter account, you might already know what lies behind this term. The term is, of course, a portmanteau of Development and Operations, and is perhaps best thought of as a cultural movement within the IT world. It stresses communication, collaboration and integration between software developers and IT professionals. DevOps is a response to, and evolution of, the interdependence of software development and IT operations.

The conference itself will be held at the MAS. Tickets can be purchased for one or both days, and include full access to the talks and spaces, as well as a catered lunch.

What’s more, we’re currently offering 25% off of the ticket price – just use the code WELOVEDEVOPS when you register. This is a limited-time offer (until the end of this week), so don’t delay!

And, of course, the Call for Proposals is still open until 20 March 2013.

Finally, we invite you to peruse the list of proposals, and to comment and vote for your favorite ones!

So if you had to choose one devopsdays this year, choose ours and come

  • exchange with lots of talented French people
  • taste French food and wine
  • learn how we do devops in 3 hours of work per day (just kidding of course)
  • smell the fragrance of Paris in spring (there won’t be any more snow, I promise)
  • visit the City of Lights

Looking forward to seeing you in April!

The 10 Commandments of Logging

| Comments

Welcome to my blog for this new year. After writing an answer to a thread regarding monitoring and log monitoring on the Paris devops mailing list, I thought back to a blog post project I had had in mind for a long time.

I wrote this blog post while wearing my Ops hat and this is mostly addressed to developers.

Knowing how and what to log is, to me, one of the hardest tasks a software engineer will have to solve. Mostly because this task is akin to divination. It’s very hard to know what information you’ll need during troubleshooting… That’s the reason I hope these 10 commandments will help you enhance your application logging for the great benefit of the ops engineers :)

1. Thou shalt not write log by yourself

Never, ever use printf or write your log entries to files by yourself, or handle log rotation by yourself. Please do your ops guys a favor and use a standard library or system API call for this.

This way, you’re sure the running application will play nicely with the other system components, will log to the right place or network services without special system configuration.

So, if you just use the system API, then this means logging with syslog(3). Learn how to use it.

If you instead prefer to use a logging library, there are plenty of those especially in the Java world, like Log4j, JCL, slf4j and logback. My favorite is the combination of slf4j and logback because it is very powerful and relatively easy to configure (and allows JMX configuration or reloading of the configuration file).

The best thing about slf4j is that you can change the logging backend when you see fit. It is especially important if you’re coding a library, because it allows anyone to use your library with their own logging backend without any modification to your library.

There are also several other logging libraries for different languages, like, for Ruby: Log4r, stdlib logger, or the almost perfect Jordan Sissel’s Ruby-cabin.

And if your argument for not using a logging library is CPU consumption, then you have my permission to skip this blog post. Sure you should not put log statements in tight inner loops, but otherwise you’ll never see the difference.

2. Thou shalt log at the proper level

If you followed the 1st commandment, then you can use a different log level per log statement in your application. One of the most difficult tasks is to find at what level each log entry should be logged.

Here is some advice:

  • TRACE level: this is a code smell if used in production. This should be used during development to track bugs, but never committed to your VCS.

  • DEBUG level: log at this level about anything that happens in the program. This is mostly used during debugging, and I’d advocate trimming down the number of debug statements before entering the production stage, so that only the most meaningful entries are left and can be activated during troubleshooting.

  • INFO level: log at this level all actions that are user-driven, or system specific (ie regularly scheduled operations…)

  • NOTICE level: this will certainly be the level at which the program will run when in production. Log at this level all the notable events that are not considered errors.

  • WARN level: log at this level all events that could potentially become errors. For instance if one database call took more than a predefined time, or if an in-memory cache is near capacity. This will allow proper automated alerting, and during troubleshooting will make it easier to understand how the system was behaving before the failure.

  • ERROR level: log every error condition at this level. That can be API calls that return errors or internal error conditions.

  • FATAL level: too bad, it’s doomsday. Use this very sparingly; it shouldn’t happen a lot in a real program. Usually logging at this level signifies the end of the program. For instance, if a network daemon can’t bind a network socket, logging at this level and exiting is the only sensible thing to do.

Note that the default running level in your program or service might vary widely. For instance I usually run my server code at level INFO, but my desktop programs run at level DEBUG. That’s because it’s very hard to troubleshoot an issue on a computer you don’t have access to, and it’s far easier when doing support or customer service to ask the user to send you the log than to teach her to change the log level and then send you the log. Of course YMMV :)

3. Honor thy log category

Most logging libraries I cited in the 1st commandment allow specifying a logging category. This category allows classifying the log message, and will ultimately, based on the logging framework configuration, be logged in a distinct way or not logged at all.

Most of the time Java developers use, as the category, the fully qualified name of the class where the log statement appears. This is a scheme that works relatively well if your program respects the single responsibility principle.

Log categories in java logging libraries are hierarchical, so for instance logging with category com.daysofwonder.ranking.ELORankingComputation would match the top level category com.daysofwonder.ranking. This would allow the ops engineer to setup a logging configuration that works for all the ranking subsystem by just specifying configuration for this category. But it could at the same time, produce logging configuration for child categories if needed.

We can extend this paradigm a little bit further to help troubleshoot specific situations. Imagine that you are dealing with a server software that responds to user-based requests (like a REST API for instance). If your server logs with the category my.service.api.<apitoken> (where apitoken is specific to a given user), then you could either log all the API calls by enabling my.service.api, or a single misbehaving API user by logging with a more detailed level and the category my.service.api.<bad-user-api-token>. Of course this requires a system where you can change the logging configuration on the fly.

4. Thou shalt write meaningful logs

This is probably the most important commandment. There’s nothing worse than cryptic log entries that assume you have a deep understanding of the program internals.

When writing your log entries messages, always anticipate that there are emergency situations where the only thing you have is the log file, from which you have to understand what happened. Doing it right might be the subtle difference between getting fired and promoted :)

When a developer writes a log message, it is in the context of the code in which the log directive is inserted. In those conditions we tend to write messages that assume the current context. Unfortunately, when reading the log itself this context is absent, and those messages might not be understandable.

One way to overcome this situation (and it’s particularly important when writing at the warn or error level) is to add remediation information to the log message, or if that’s not possible, to state what the purpose of the operation was, and its outcome.

Also, do not log messages that depend on the content of previous messages. The reason is that those previous messages might not appear if they are logged in a different category or level, or worse, they can appear in a different place (or way before) in a multi-threaded or asynchronous context.

5. Thy log shalt be written in English

This might seem a strange commandment, especially coming from a French guy. First, I still think English is much more concise than French and better suits technical language. Why would you want to log in French if the message contains more than 50% of English words in it?

This being put aside, here are the essential reasons behind this commandment:

  • English means your messages will be logged with ASCII characters. This is particularly important because you can’t really know what will happen to the log message, nor what software layer or media it will cross before being archived somewhere. If your message uses a special charset or even UTF-8, it might not render correctly at the end, but worse, it could be corrupted in transit and become unreadable. There still remains the question of logging user input, which might be in diverse charsets and/or encodings.

  • If your program is to be used by many people and you don’t have the resources for a full localization, then English is probably your best alternative. Now, if you have to localize one thing, localize the interface that is closest to the end user (it’s usually not the log entries).

  • if you localize your log entries (like, for instance, all the warning and error levels), make sure you prefix them with a specific meaningful error code. This way people can do language-independent Internet searches and find information. Such a scheme was used a while ago in the VMS operating system, and I must admit it is very effective. If you were to design such a scheme, you can adopt this one: APP-S-CODE or APP-S-SUB-CODE, with respectively:

    • APP: your application name on 3 letters
    • S: severity on 1 letter (ie D: debug, I: info…)
    • SUB: the sub part of the application this code pertains to
    • CODE: a numeric code specific to the error in question

6. Thou shalt log with context

So, there’s nothing worse than this kind of log message:

 Transaction failed

or

User operation succeeds

or in case of API exceptions:

java.lang.IndexOutOfBoundsException

Without proper context, those messages are only noise: they don’t add value and they consume space that could have been useful during troubleshooting.

Messages are much more valuable with added context, like:

Transaction 2346432 failed: cc number checksum incorrect

or

User 54543 successfully registered e-mail user@domain.com

or

IndexOutOfBoundsException: index 12 is greater than collection size 10

Since we’re talking about exceptions in this last context example, if you happen to propagate up exceptions, make sure to enhance them with context appropriate to the current level, to ease troubleshooting, as in this java example:

Context propagation
  public void storeUserRank(int userId, int rank, String game) {
    try {
      ... deal database ...
    } catch(DatabaseException de) {
      throw new RankingException("Can't store ranking for user "+userId+" in game "+ game + " because " + de.getMessage() );
    }
  }

So the upper-layer client of the rank API will be able to log the error with enough context information. It’s even better if the context becomes parameters of the exception itself instead of being baked into the message; this way the upper layer can apply remediation if needed.

An easy way to keep a context is to use the MDC that some of the Java logging libraries implement. The MDC is a per-thread associative array. The logger configuration can be modified to always print the MDC content for every log line. If your program uses a per-thread paradigm, this can help solve the issue of keeping the context. For instance this Java example uses the MDC to log per-user information for a given request:

MDC example
  class UserRequest {
    ...
    public void execute(int userid) {
      // MDC values must be strings
      MDC.put("user", String.valueOf(userid));

      // ... all messages logged from now on will display user=<userid> for this thread context ...
      log.info("Successful execution of request");

      // user request processing is now finished, no need to log our current user anymore
      MDC.remove("user");
    }
  }

Note that the MDC system doesn’t play nicely with asynchronous logging schemes, like Akka’s logging system. Because the MDC is kept in a per-thread storage area, in asynchronous systems you don’t have the guarantee that the thread doing the log write is the one that holds the MDC. In such situations, you need to log the context manually with every log statement.

7. Thou shalt log in machine parseable format

Log entries are really good for humans, but very poor for machines. Sometimes it is not enough to manually read log files; you need to perform some automated processing (for instance for alerting or auditing). Or you want to store your logs centrally and be able to perform search requests.

So what happens when you embed the log context in the string like in this hypothetical logging statement:

Difficult to parse
log.info("User {} plays {} in game {}", userId, card, gameId);

This will produce this kind of text:

Difficult to parse
2013-01-12 17:49:37,656 [T1] INFO  c.d.g.UserRequest  User 1334563 plays 4 of spades in game 23425656

Now, if you want to parse this, you’d need the following (untested) regex:

Difficult to parse
  /User (\d+) plays (.+?) in game (\d+)$/

Well, this is not easy and very error-prone, just to get access to string parameters your code already knows natively.

So what about this idea, which I believe Jordan Sissel first introduced in his ruby-cabin library: let’s add the context in a machine-parseable format to the log entry. Our aforementioned example could use JSON like this:

Easier to parse
2013-01-12 17:49:37,656 [T1] INFO  c.d.g.UserRequest  User plays {'user':1334563, 'card':'4 of spade', 'game':23425656}

Now your log parsers are much easier to write, indexing becomes straightforward, and you can leverage all of Logstash’s power.

8. Thou shalt not log too much or too little

That might sound stupid, but there is a right balance to strike in the amount of logging.

Too much logging and it will really become hard to get any value from it. When manually browsing such logs, there is too much clutter, which is not a good thing when trying to troubleshoot a production issue at 3AM.

Too little logging and you risk not being able to troubleshoot problems: troubleshooting is like solving a difficult puzzle; you need enough material to work with.

Unfortunately there is no magic rule when coding to know what to log. It is thus very important to strictly respect the 1st and 2nd commandments so that once the application is live it is easier to increase or decrease the log verbosity.

One way to overcome this issue is, during development, to log as much as possible (do not confuse this with logging added to debug the program). Then when the application enters production, perform an analysis of the produced logs and reduce or increase the logging statements according to the problems found. Especially during troubleshooting, note the parts of the application where you wished you had more context or logging, and make sure to add those log statements in the next version (if possible at the same time you fix the issue, to keep the problem fresh in memory). Of course that requires a certain amount of communication between ops and devs.

This can be a complex task, but I would recommend refactoring logging statements as much as you refactor the code. The idea is to have a tight feedback loop between the production logs and the modification of such logging statements. It’s even more efficient if your organization has a continuous delivery process in place, as the refactoring can be constant.

Logging statements are a kind of code metadata, at the same level as code comments. It’s really important to keep the logging statements in sync with the code. There’s nothing worse when troubleshooting issues than getting irrelevant messages that have no relation to the code being executed.

9. Thou shalt think of the reader

Why add logging to an application?

The only answer is that someone will have to read it one day or another (or what is the point?). More importantly, it is interesting to think about who will read those lines. Depending on the person you think will read the log messages you’re about to write, the log message content, context, category and level will be quite different.

Those persons can be:

  • an end user trying to troubleshoot a problem herself (imagine a client or desktop program)
  • a system-administrator or operation engineer troubleshooting a production issue
  • a developer either for debugging during development or solving a production issue

Of course the developer knows the internals of the program, so her log messages can be much more complex than if the log message is addressed to an end user. So adapt your language to the intended target audience; you can even dedicate separate categories to this.

10. Thou shalt not log only for troubleshooting

As log messages are written for different audiences, they will be used for different reasons. Even though troubleshooting is certainly the most evident target of log messages, you can also use log messages very efficiently for:

  • Auditing: this is sometimes a business requirement. The idea is to capture significant events that matter to the management or legal people. These are statements that usually describe what users of the system are doing (like who signed in, who edited that, etc…).

  • Profiling: as logs are timestamped (sometimes to the millisecond level), it can become a good tool to profile sections of a program, for instance by logging the start and end of an operation, you can either automatically (by parsing the log) or during troubleshooting infer some performance metrics without adding those metrics to the program itself.

  • Statistics: if you log each time a certain event happens (like a certain kind of error or event) you can compute interesting statistics about the running program (or the user behaviors). It’s also possible to hook this to an alert system that can detect too many errors in a row.

Conclusion

I hope this will help you produce more useful logs, and bear with me if I forgot an essential (to you) commandment. Oh, and I can’t be held responsible if your logs don’t get better after reading this blog :)

It’s possible that those 10 commandments are not enough, so feel free to use the comment section (or twitter or your own blog) to add more useful tips.

Thanks for reading.

Ticket to Devopsdays Roma?

| Comments

After seeing @KrisBuytaert tweet a couple of days ago about offering priority registration for the upcoming Devopsdays Roma next October to people blogging about Devops, I thought why not me?

I already wanted to attend the last European Devopsdays session in Goteborg last year, but its proximity in time to PuppetConf and my work schedule prevented me from joining the party.

This year, for various reasons, I won’t show up in San Francisco for the yearly big Puppet event (which, at least from here, sounds quite terrific), so attending a Devopsdays conference had a certain appeal I really couldn’t resist (and choosing one in Europe is much easier for me).

Qualifying

I believe (feel free to speak up if you think not) I can qualify as being part of the devops movement (and as such can attend Devopsdays Roma) :)

One reason is that my day job is mostly programming. But I also have the fine responsibility of bringing what I write to production systems, along with creating and managing the infrastructure.

This puts me in a situation that not a lot of my fellow programmers have experienced. For a long time, producing software, for software engineers, meant compiling a binary artifact, handing it over to other hands, and then working on new software as soon as possible.

Unfortunately in the real world this doesn’t really work. The ops people in charge of the application (and the same goes for client-side products with respect to support engineers) will struggle to deploy, analyze and simply maintain it.

When you have experienced both sides of the world, you know that what matters to software engineers (ie clean code, re-usable components, TDD and unit testing…) doesn’t really matter to operations engineers. The latter currently want:

  • logs
  • ease of administration (like changing configuration gracefully)
  • ease of introspection (like usable logs, integrated consoles, useful metrics, trace mechanisms à la Dapper)
  • ease of deployment (like a packaging system, artifact assembly that contains comprehensible dependencies)
  • external (and internal) monitoring of all aspects
  • wisely chosen hard dependencies (like database, message queues or cache systems)

Well, if you’re a developer you’ll recognize, like me, that the above list is really not trivial to deliver, and it usually tends to be overlooked.

Unfortunately, an application that doesn’t implement this will be hard or impossible to properly maintain.

The first time I tried to deploy software I wrote, I discovered the hard way that those requirements are, well, requirements. Now, I make sure that the projects I work on have user stories encompassing those essential facets.

Ops already know it

If you’re on the operation side, then you already know the value of those software requirements. Our role as operation engineers is to evangelize and teach software engineers those good practices. I strongly believe that devops is all about that.

But it’s not only this. It’s also being agile on the infrastructure side…

Puppet opened my eyes

I always had been interested in managing server infrastructures. I started managing linux servers about 12 years ago (about 5 years after professionally starting as a software developer). Managing servers has always been something on the side for me, my main job being producing software (either client or server side).

I had been lurking around the configuration management space and community for a long time before I adopted Puppet. For the small infrastructure I was maintaining, I thought it would be overkill.

Oh, how I was wrong at that time :)

Back in 2007, I started using Puppet, and began to write modules for the software we were using in production. I was still too shy to run this in production. Then in 2008, I really started using puppet (and BTW, contributing some features I thought interesting, which you already know if you read this infrequently updated blog).

Puppet helped me to:

  • have all my configuration centralized in git
  • deploy servers with repeatable process (ie recreate servers from scratch)
  • parallelize (clusters can be spawned much more easily)
  • prevent configuration drift
  • orchestrate multiple nodes configurations (ie publish monitoring or backup information to other nodes)

Puppet helped me understand that tools are real life savers.

I also learnt (and actually enforced) good administration practices:

  • deploy native packages instead of source installs
  • setup monitoring for every installed critical software
  • no manual configuration on servers anymore
  • configuration testing on VM (thanks Vagrant BTW)
  • automate as much as you can

Puppet also helped me join sysadmin communities (like the fine folks at ##infra-talk), which in turn helped me discover other life-saving tools.

This, I think, is part of the devops culture and community.

It’s not reserved to server-side

Indeed, it isn’t. Support staff or customer service staff share the same responsibilities as the operations teams, but for client-side applications. And there is now more client software than ever, with the number of smartphones out there.

The same devops benefits I talked about earlier can and should also be applied to client-side software. Logs are invaluable when trying to understand why some software your dev team wrote doesn’t work in the hands of your clients.

It’s even more complex than analyzing server-side issues, because even when you have the chance of managing client applications that produce logs, it’s most of the time impossible to get access to them…

Is devops the future?

My own small goal (my stone in the devops edifice) is to start the cultural mindset shift in as many developers as I can (starting with my fellow co-workers). And I think it’s our own responsibility, as part of the devops movement (if we can use this word), to initiate such a shift.

I always smile when I see “devops engineer” job positions. Devops is not a role, it’s a mindset that everybody in a given dev and ops team should share. Maybe recruiters use this word as a synonym for “help us use automation tools”, as if it was the only solution to a human problem (well obviously if you don’t use any configuration management you have more problems to solve)

The same way Olympic athletes practice hard every day to reach their level, I strongly believe that the devops practices I described should be adopted by all software developers.

Now it’s our job to spread the word and help engineers do so.

Finally, in my humble opinion, devops is all about common sense. I think it’s easier to implement such practices in small companies/teams than in larger, already established teams (people are usually reluctant to change, be it good or bad). Nevertheless, if developers and operations unite and walk in the same direction, big things can be achieved.

What happens at Devopsdays…

… should not stay at Devopsdays (well except maybe for the drinks outcome)

What do I want to get from attending Devopsdays Roma?

I really want to:

  • learn new things
  • learn about other people experiences in the field
  • share about fixing problems that plague us all, like:
  • monitoring sucks
  • log processing/centralization (and developer access)
  • network and learn from wise and knowledgeable peers

And now I wish I’ll be there and that I’ll meet you :)

Puppet Internals: The Compiler

| Comments

And I’m now proud to present the second installment of my series of posts about Puppet internals:

Today we’ll focus on the compiler.

The Compiler

The compiler is at the heart of Puppet, master/agent or masterless. Its responsibility is to transform the AST into a set of resources called the catalog that the agent can consume to perform the necessary changes on the node.

You can see the compiler as a function that takes the AST and the facts as input and returns the catalog.

The compiler lives in the lib/puppet/parser/compiler.rb file and more specifically in the Puppet::Parser::Compiler class. When a node connects to a master to ask for a catalog, the Indirector directs the request to the compiler.

In a classic master/agent system, the agent does a REST find catalog request to the master. The master catalog indirection is configured to delegate to the compiler. This happens in the lib/puppet/indirector/catalog/compiler.rb file. Check this previous article about the Indirector if you want to know more.

The indirector request contains two things:

  • what node we should compile
  • the node’s facts

Produced Catalog

When we’re talking about catalog, in the Puppet system it can mean two distinct things:

  • a containment catalog
  • a relationship resource catalog

The first one is the product of the compiler (which we’ll delve into in this article). The second one is formed by the transformation of the first one in the agent. It is the latter that we usually call the Puppet catalog.

Here is a simple manifest and the containment catalog that I obtained after compiling:

class test {
  file {
    "/tmp/a": content => "test!"
  }
}

include test

And here is the produced catalog:

Out of compiler containment catalog

You’ll notice that as its name implies, the containment catalog is a graph of classes and resources that follows the structure of the manifest.

When Facts matter

In a master/agent system the facts are coming from the request in a serialized form. Those facts were created by calling Facter on the remote node.

Once unserialized, the facts are cached locally as YAML (as per the default terminus for facts on a master). You can find them in the $vardir/yaml/facts/$certname.yaml file.

At the same time, the compiler catalog terminus computes some server facts that are injected into the current node instance.

Looking for the node

In Puppet there are several ways to store node definitions. They can be defined by node {} blocks in the site.pp, by an ENC, in an LDAP directory, etc…

Before the compiler can start, it needs to create an instance of the Puppet::Node class and fill it with the node information.

The node indirection terminus is controlled by the node_terminus puppet settings which by default is plain. This terminus just creates a new empty instance of a Puppet::Node.

In an ENC setup, the terminus for the node indirection will be exec. This terminus will create a Puppet::Node instance initialized with a set of classes and global parameters the compiler will be able to use.

The plain terminus for nodes calls Puppet::Node#fact_merge. This method finds the current set of facts for this node. In the plain case, it involves reading the YAML facts we wrote to disk earlier and merging them into the current node instance's parameters.
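
As an illustration, here is a rough sketch of what Puppet::Node#fact_merge does (simplified and paraphrased from the actual source, so treat it as an approximation):

def fact_merge
  # Ask the facts indirection (backed by the YAML cache on the master)
  # for this node's facts, and merge their values into the node parameters.
  if facts = Puppet::Node::Facts.indirection.find(name)
    merge(facts.values)
  end
end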

Back to the compiler catalog terminus: it tries to find the node with the given request information and, if not present, by using the node's certname. Remember that the request to get a catalog from REST matches /catalog/node.domain.com, in which case the request key is node.domain.com.

Let’s compile

After that we really enter the compiler code, when the compiler catalog terminus calls Puppet::Parser::Compiler.compile, which creates a new Puppet::Parser::Compiler instance and gives it our node instance.

When creating this compiler instance, the following are created:

  • an empty catalog (an instance of Puppet::Resource::Catalog). This one will hold the result of the compilation.
  • a companion top scope (an instance of Puppet::Parser::Scope)
  • some other internal data structures helping the compilation

If the given node was coming from an ENC, the catalog is bootstrapped with the known node classes.

Once done, the compile method is called on the compiler instance. The first thing done is to bootstrap the top scope with the node parameters (which contain the global data coming from the ENC, if one is used, and the facts).

Remember the AST

When we left the Parser post, we obtained an AST. This AST is a tree of AST instances that implement the guts of the Puppet language.

In this previous article we left aside 3 types of AST:

  • Node AST
  • Hostclass AST
  • Definition AST

Those are different in the sense that we don't strictly evaluate them during compilation (more on this step later). Instead, they are instantiated as part of the initial import of the known types. If you're wondering why I spelled the Class AST as Hostclass, it's because that's how it is spelled in the Puppet code; the reason being that class is a reserved word in Ruby :)

Using a lazy evaluation scheme, Puppet keeps (per environment, actually) a list of all the parsed known types (classes, definitions and nodes that the parser encountered during parsing); this is called the known types.

When this list is first accessed, if it doesn’t exist, Puppet triggers the parser to populate it. This happens in Puppet::Node::Environment.known_resource_types which calls the import_ast method with the result of the parsing phase.

import_ast adds to the known types an instance of every definition, hostclass and node returned by their respective instantiate methods.
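
A simplified sketch of import_ast could look like this (close in spirit to the real method, but not a verbatim copy):

def import_ast(ast, modname)
  # Every hostclass, definition and node AST element returns one or more
  # Puppet::Resource::Type instances from instantiate; add them all to
  # the known types collection.
  ast.instantiate(modname).each do |resource_type|
    known_resource_types.add(resource_type)
  end
end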

Let's have a closer look at the hostclass instantiate method:

Simplified Hostclass AST instantiate method
def instantiate(modname)
  new_class = Puppet::Resource::Type.new(:hostclass, @name)
  all_types = [new_class]
  code.each do |nested_ast_node|
    if nested_ast_node.respond_to? :instantiate
      all_types += nested_ast_node.instantiate(modname)
    end
  end
  return all_types
end

So instantiate returns an array of Puppet::Resource::Type of the given type. You'll notice that the hostclass code above analyzes its current class AST children for other 'instantiable' AST elements that will also end up in the known types.

Known Types

The known types I've been talking about for a while all live in the Puppet::Resource::TypeCollection object. There's one per Puppet environment, in fact.

This object's main responsibility is to store all known classes, nodes and definitions so that the compiler can easily access them. It also watches all files loaded by the parser, so that it can trigger a re-parse when one of them is updated. It also serves as the Puppet class/module autoloader (when asked for an unknown class, it will first try to load it from disk and parse it).
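
As an illustration, here is how one could ask an environment's known types for a class (an illustrative snippet, assuming Puppet 2.7 is installed, its settings are initialized as on a master, and a class named test exists in the manifests):

require 'puppet'

env = Puppet::Node::Environment.new("production")
# This populates the known types (triggering parsing) on first access,
# and returns a Puppet::Resource::Type or nil if the class is unknown.
test_class = env.known_resource_types.hostclass("test")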

Scopes

Let’s open a parenthesis to explain a little bit what the scope is. The scope is an instance of Puppet::Parser::Scope and is simply a symbol table (as explained in the Dragon Book). It just keeps the values of Puppet variables.

It forms a tree, with the top scope (the one whose creation we saw earlier) being the root of all scopes. This tree contains one child scope per new namespace.

The scope supports two operations:

  1. Looking up a variable value
  2. Setting a variable value

Lookup is done with the lookupvar method. If the variable is qualified, it will directly ask the correct scope for its value. For instance $::hostname will directly fetch the hostname fact from the top scope.

Otherwise it will either return its value in the local scope if it exists or delegate to the parent scope. This can happen up until the top scope. If the value can’t be found anywhere, the :undef ruby symbol will be returned.
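
The unqualified lookup logic can be summarized by this toy model (not Puppet's actual code, just the algorithm described above):

def lookupvar(name)
  # The local symbol table wins, then we walk up the parent chain,
  # and :undef is the answer when nothing matches anywhere.
  return @symtable[name] if @symtable.include?(name)
  return parent.lookupvar(name) if parent
  :undef
end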

Note that this dynamic scope behavior will be removed in the next Puppet version, where only the local scope and the top scope will be supported. More information is available in this Scope and Puppet article.

Setting a variable is done with the setvar method. This method is called, for instance, by the AST class responsible for variable assignment (AST::VarDef).

Along with regular variables, each scope has the notion of an ephemeral scope. An ephemeral scope is a special transient scope that only stores the regex capture variables ($0, $1, and so on).

Each scope level maintains a stack of ephemeral scopes, which is by default empty.

In Puppet there are no scopes for language structures other than classes (and nodes and definitions), so inside the following if, an ephemeral scope is created and pushed on the stack to store the result of the regex match:

Ephemeral scope in action
if $var =~ /test(.*)/ {
  # here $0, $1... are available
}

When Puppet execution reaches the closing ‘}’, the ephemeral scope is popped from the ephemeral scope stack, removing the $0 definition.

lookupvar will also ask the ephemeral scope stack if needed.

Orthogonally, the scope instance will also store resource defaults.

Talking about AST Evaluation

And here we need to take a break from compilation to talk about AST evaluation, which I elegantly dodged in the previous post on the Parser.

Every AST node (both branch and leaf ones) implements the evaluate method. This method takes a Puppet::Parser::Scope instance as a parameter. This is the scope instance that is valid at the moment we evaluate this AST node (usually the scope associated with the class containing the code we're evaluating).

There are several outcomes possible after evaluation:

  • Manipulation of the scope (like variable assignment, variable lookup, parser function call)
  • Evaluation of the AST children of this node (for instance if, case and selectors need to evaluate code in one of their children branches)
  • Creation of Puppet::Parser::Resource when encountering a puppet resource
  • Creation of Puppet::Resource::Type (more puppet classes)

When an AST node evaluates its children, it does so by calling safeevaluate on them, which in turn will call evaluate. Safeevaluate will shield the caller from exceptions, transforming them into parse errors that can specify the line and file of the Puppet instruction that triggered the problem.
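
The pattern looks roughly like this (a simplified sketch of the AST base class behavior, not the exact source):

def safeevaluate(scope)
  evaluate(scope)
rescue Puppet::ParseError
  # Already a parse error carrying file/line context: let it bubble up.
  raise
rescue => detail
  # Wrap any other exception into a parse error pointing to the file and
  # line of the AST node being evaluated (the real code uses a helper to
  # attach this context).
  raise Puppet::ParseError, "#{detail} at #{file}:#{line}"
end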

Shouldn’t we talk about compilation?

Let's go back to the compiler now. We left the compiler after we populated the top scope with the node's facts, and we still haven't properly started the compilation phase itself.

Here is what happens after:

  1. Main class evaluation
  2. Node AST evaluation
  3. Evaluation of the node classes if any
  4. Recursive evaluation of definitions and collections (called generators)
  5. Evaluation of relationships
  6. Resource overrides evaluation
  7. Resource finish
  8. Ship the catalog
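
To make the sequencing concrete, here is a rough outline of how the compile entry point chains these phases (method names mirror the real compiler's, but the body is simplified, so don't read it as the exact source):

def compile
  set_node_parameters       # bootstrap the top scope with facts/ENC data
  evaluate_main             # 1. main class evaluation
  evaluate_ast_node         # 2. node AST evaluation
  evaluate_node_classes     # 3. classes coming from the ENC
  evaluate_generators       # 4. definitions and collections
  evaluate_relationships    # 5. -> style relationships
  finish                    # 6./7. overrides, defaults, resource finish
  fail_on_unevaluated       # sanity checks before shipping
  @catalog                  # 8. ship the catalog
end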

After that, what remains is the containment catalog. This one will be transformed into a resource containment catalog. We call a resource catalog an instance of Puppet::Resource::Catalog where all Puppet::Parser::Resource objects have been transformed into Puppet::Resource instances.

Let's now go through, in order, the list of operations outlined above that form the compilation.

Main class evaluation

The main class is a hidden class where all code outside any definition, node or class ends up. It's a kind of top class of which every other class is an inner class. This class is special because it has an empty name.

Evaluating the main class means:

  1. Creating a companion resource (an instance of Puppet::Parser::Resource) whose scope is the top scope
  2. Adding this resource to the catalog
  3. Evaluating the class code of this resource

Let's focus on this last step, which happens in Puppet::Parser::Resource.evaluate. It mainly involves getting the Puppet::Resource::Type instance matching our class (its type, in fact) from the known types, and then calling the Puppet::Resource::Type.evaluate_code method.

Evaluating code of a class

I’m putting aside the main class evaluation to talk a little bit about code evaluation of a given class because that’s something we’ll see for every class or node during compilation.

This happens during Puppet::Resource::Type.evaluate_code which essentially does:

  1. Create a scope for this class (unless we’re compiling the main class which already uses the top scope)
  2. Ask the class AST children to evaluate with this scope

We saw in the Puppet Parser post how the AST is produced. Eventually some of those AST nodes end up in the code element of a given Puppet class (you can refer to the Puppet grammar and Puppet::Parser::AST::Hostclass for the code), in the form of an ASTArray (which is an array of AST nodes).

Node Evaluation

As for the main class, the current node compilation phase:

  • asks the known types about the current node, and if none is found asks for a default node
  • creates a resource for this node and adds it to the catalog
  • evaluates this node resource

This last evaluation will execute the given node AST code.

Node class evaluation

If the node was provided by an ENC, the compiler will then evaluate those classes. This is the same process as for the main class: for every class we create a resource, add it to the catalog and then evaluate it.

Evaluation of Generators

In Puppet the generators are the entities that are able to spawn new resources:

  • collections, including storeconfig exported resources
  • definitions

This part of the compilation loops calling evaluate_definitions and evaluate_collections, until none of those produces new resources.
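
The loop itself is roughly the following (a simplified sketch; the real method also guards against runaway generation in a similar way):

def evaluate_generators
  count = 0
  loop do
    done = true
    # Each pass may create new unevaluated resources or collections...
    done = false if evaluate_definitions
    done = false if evaluate_collections
    # ...so we only stop once a full pass produced nothing new.
    break if done
    count += 1
    raise Puppet::ParseError, "Looped too many times while evaluating the catalog" if count > 1000
  end
end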

Definitions

During the AST code evaluation, if the compiler encounters a definition call, Puppet::Parser::AST::Resource.evaluate will be called (as for every resource).

Since this resource comes from a definition, a type resource will be instantiated and added to the catalog. This resource will not be evaluated at this stage.

Later, when evaluate_definitions is called, it will pick up any resource that hasn't been evaluated yet (which is the case for our definition resources) and evaluate them.

This operation might in turn create more unevaluated resources (i.e. a definition spawning more definition resources), which will be evaluated in a subsequent pass of evaluate_definitions.

Collections

When the parser parses a collection, which is defined like this in the Puppet language:

File <<| tag == 'key' |>>

it creates an AST node of type Puppet::Parser::AST::Collection. The same happens if you use the realize function.

Later, when the compiler evaluates code and encounters this collection instance, it will create a Puppet::Parser::Collector and register it with the compiler.

Even later, during evaluate_collections, the evaluate method of all the registered collectors will be called. This method will either fetch exported resources from storeconfigs or match virtual resources, and create Puppet::Parser::Resource instances that are registered with the compiler.

If the collector has created all its resources, it is removed from the compiler.

Relationship evaluation

The compiler holds the list of relationships defined with the -> family of relationship operators (but not the ones defined with the require or before meta-parameters).

During code evaluation, when the compiler encounters the relationship AST node (an instance of Puppet::Parser::AST::Relationship), it will register a Puppet::Parser::Relationship instance to the compiler.

During the evaluate_relationships method of the compiler, every registered relationship will be evaluated. This evaluation simply adds the destination resource reference to the source resource meta-parameter matching the operator.

Resource overrides

The next compilation phase consists in applying all the overrides we discovered during the AST code evaluation. Normally overrides are applied as soon as they are discovered, but it can happen that an override (especially a collection override) cannot be applied yet because the resources it should apply to have not been created.

Applying an override consists in setting a given resource parameter to the overridden value.

Resource finishing

During this phase, the compiler will call the finish method on every created resource. This method is responsible for:

  • adding resource defaults to the resource parameters
  • tagging the resource with the current scope tags
  • checking that resource parameters are valid

Resource meta-parameters

The next step in the compilation process is to set all the meta-parameters of our created resources, starting from the main class and walking the catalog from there.

Finish

Once everything has been done, the compiler runs some checks to make sure all overrides and collections have been evaluated. Then the catalog is transformed into a Puppet::Resource catalog (which doesn't change its layout, just the type of its vertices).

Conclusion

I hope you now have a better view of the compilation process. As you've seen, compilation is a complex process, which is one of the reasons it can take some time. But that's the price to pay to produce a data-only graph tailored to one host that can then be applied on that host.

Stay tuned here for the next episode of my Puppet Internals series of posts. The next installment will certainly cover the Puppet transaction system, whose role is to apply the catalog on the agent.

Benchmarking Puppet Stacks

| Comments

I decided this week-end to try the more popular puppet master stacks and benchmark them with puppet-load (which is a tool I wrote to simulate concurrent clients).

My idea was to check the common stacks and see which one would deliver the best concurrency. This article is a follow-up to my previous post about puppet-load and puppet master benchmarking.

Methodology

I decided to try the following stacks:

  • Apache and Passenger, which is the blessed stack, with MRI 1.8.7 and 1.9.2
  • Nginx and Mongrel
  • JRuby with Mizuno

The setup is the following:

  • one m1.large ec2 instance as the master
  • one m1.small ec2 instance as the client (in the same availability zone if that matters)

To recap, m1.large instances are:

  • 2 CPUs with 2 virtual cores each
  • 8 GiB of RAM

All the benchmarks were run on the same instance couples to prevent skew in the numbers.

The master uses my own production manifests, consisting of about 100 modules. The node for which we’ll compile a catalog contains 1902 resources exactly (which makes it a big catalog).

There is no storeconfigs involved at all (this was to reduce setup complexity).

The methodology is to set up the various stacks on the master instance and run puppet-load on the client instance. To ensure everything is hot on the master, a first run of the benchmark is done at full concurrency. Then multiple runs of puppet-load are performed, simulating an increasing number of clients. This pre-heat phase also makes sure the manifests are already parsed and no I/O is involved.

Tuning was done as well as I could on all stacks. And care was taken for the master instance to never swap (all the benchmarks involved consumed about 4 GiB of RAM or less).

Puppet Master workload

Essentially, a puppet master compiling catalogs is a CPU-bound process (the fact that a master speaks HTTP doesn't make its workload a webserver workload). That means that during the compilation phase of a client connection, you can be guaranteed that puppet will consume 100% of a CPU core.

Which essentially means there is usually little benefit in using more puppet master processes than there are CPU cores on a server.

A little bit of scaling math

When we want to scale a puppet master server, there is a rough computation that allows us to see how it will work.

Here are the elements of our problem:

  • 2000 clients
  • 30 minutes sleep interval, clients evenly distributed in time
  • master with 8 CPU core and 8GiB of RAM
  • our average catalog compilation is 10s

30 minutes interval means that every 30 minutes we must compile 2000 catalogs for our 2000 nodes. That leaves us with 2000/30 = 66 catalogs per minute.

That's about one new client checking in every second.

Since we have 8 CPU cores, we can accommodate 8 catalog compilations in parallel, no more (because CPU time is a finite quantity).

Since 66/8 = 8.25, each core must handle about 8.25 catalog compilations per minute, which means each catalog must be compiled in less than 60/8.25 = 7.27s.

Since our catalogs take about 10s to compile (in my example), we’re clearly in trouble and we would need to either add more master servers or increase our client sleep time (or not compile catalogs, but that’s another story).
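
The same back-of-the-envelope computation, as a quick Ruby one-off (numbers taken from the example above):

clients      = 2000
interval     = 30.0    # minutes between two runs of the same client
cores        = 8
compile_time = 10.0    # seconds per catalog in our example

catalogs_per_minute = clients / interval           # ~66.7 catalogs/minute
per_core_per_minute = catalogs_per_minute / cores  # ~8.3 catalogs/minute/core
budget_per_catalog  = 60 / per_core_per_minute     # ~7.2 s per catalog

puts "time budget per catalog: %.2f s (ours takes %.0f s)" % [budget_per_catalog, compile_time]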

Results

Comparing our stacks

Let’s first compare our favorite stacks for an increasing concurrent clients number (increasing concurrency).

For setups that require a fixed number of workers (Passenger, Mongrel), those were set up with 25 puppet master workers. This was fitting in the available RAM.

For JRuby, I had to use (at the time of writing) jruby-head because of a bug in 1.6.5.1. I also had to comment out the Puppet execution system (in lib/puppet/util.rb).

Normally this sub-system is used only on clients, but when the master loads the types it knows for validation, it also autoloads the providers. Those check whether some support commands are available by trying to execute them (yes, I'm talking to you, rpm and yum providers).

I also had to comment out the code where puppet tries to become the puppet user, because that's not supported under JRuby.

JRuby was run with Sun Java 1.6.0_26, so it couldn't benefit from the invokedynamic work that went into Java 7. I fully expect this feature to improve performance dramatically.

The main metric I'm using to compare stacks is TPS (transactions per second). This is in fact the number of catalogs a master stack can compile in one second. The higher the better. Since compiling a catalog on our server takes about 12s, we get TPS numbers less than 1.

Here are the main results:

Puppet Master Stack / Catalog compiled per Seconds

And, here is the failure rate:

Puppet Master Stack / Failure rate

First, notice that some of the stacks exhibited failures at high concurrency. The errors I could observe were client timeouts, even though I configured a large client-side timeout (around 10 minutes). This is what happens when too many clients connect at the same time: everything slows down until the clients time out.

Fairness

In this graph, I plotted the min, average, median and max time of compilation for a concurrency of 16 clients.

Puppet Master Stack / fairness

Of course, the best case is when min and max are almost the same.

Digging into the number of workers

For the stacks that support a configurable number of workers (Mongrel and Passenger), I wanted to check what impact the worker count could have. I strongly believe that there's no reason to use a large number (unlike for I/O-bound workloads).

Puppet Master Stack / Worker # influence

Conclusions

Besides being fun, this project shows why Passenger is still the best stack to run Puppet on. JRuby shows great promise, but I had to massage the Puppet codebase to make it run (I might publish the patches later).

It would be really awesome if we could settle on a corpus of manifests to allow comparing benchmark results between Puppet users. Anyone want to take a stab at this?

Puppet Internals: The Parser

| Comments

As more or less promised in my series of posts about Puppet Extension Points, here is the first post about Puppet Internals.

The idea is to produce a series of blog posts, each one about a Puppet sub-system.

Before starting, I first want to present the various sub-blocks that form Puppet, or Puppet: the Big Picture:

Puppet the Big Picture

I hope to be able to cover each of those sub-blocks in various posts, but we’ll today focus on the Puppet Parser.

The Puppet Parser

The Puppet Parser's responsibility is to transform the textual manifests into a computer-usable data structure that can be fed to the compiler to produce the catalog. This data structure is called an AST (Abstract Syntax Tree).

The Puppet Parser is the combination of several sub-systems:

  • the lexer
  • the racc-based parser
  • the AST model

The Lexer

The purpose of the lexer is to read manifests character by character and to produce a stream of tokens. A token is just a symbol (combined with data) that represents a valid part of the Puppet language.

For instance, the lexer is able to find things such as (but not limited to):

  • reserved keywords (like case, class, define…)
  • quoted strings
  • identifiers
  • variables
  • various operators (like left parenthesis or right curly braces…)
  • regexes

Let’s take an example and follow what comes out of the lexer when scanning this manifest:

$variable = "this is a string"

And here is the stream of tokens that is the outcome of the lexer:

:VARIABLE(VARIABLE) {:line=>1, :value=>"variable"}
:EQUALS(EQUALS) {:line=>1, :value=>"="}
:STRING(STRING) {:line=>1, :value=>"this is a string"}

As you can see, a puppet token is the combination of a symbol and a hash.

Let's see how we achieved this result. First, you must know that the Puppet lexer is a regex-based system. Each token is defined as a regex (or a plain string). When reading the input, the lexer 'just' checks if one of the strings or regexes matches. If there is a match, the lexer emits the corresponding token.

Let’s take our example manifest (the variable assignment above), and see what happens in the lexer:

  1. read $ character
  2. no regex match, let’s read some more characters
  3. read ‘variable’, still no match, our current buffer contains $variable
  4. read ‘ ’, oh we have a match against the DOLLAR_VARIABLE token regex
  5. this token is special: it is defined with a ruby block. When one of those tokens is read and matched, the block is executed.
  6. the block just emits the VARIABLE("variable") token

The lexer's scanner doesn't try every regex or string; it does this in a particular order. In short, it tries to maximize the length of the matched string; in a word, the lexer is greedy. This helps remove ambiguity.
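
To make the mechanism more tangible, here is a toy regex-driven scanner in the same spirit (purely illustrative: Puppet's real lexer has many more tokens, lexing contexts and the longest-match behavior described above):

TOKENS = [
  [:DOLLAR_VARIABLE, /\$(\w+)/],
  [:EQUALS,          /=/],
  [:STRING,          /"([^"]*)"/],
]

def tokenize(input)
  tokens = []
  until (input = input.lstrip).empty?
    name, match = nil, nil
    TOKENS.each do |token_name, regex|
      if (m = /\A#{regex}/.match(input))
        name, match = token_name, m
        break
      end
    end
    raise "no token matches #{input.inspect}" unless name
    # Emit the token symbol plus a small data hash, like Puppet does.
    tokens << [name, { :value => match[1] || match[0] }]
    input = match.post_match
  end
  tokens
end

tokenize(%q{$variable = "this is a string"})
# => [[:DOLLAR_VARIABLE, {:value=>"variable"}],
#     [:EQUALS,          {:value=>"="}],
#     [:STRING,          {:value=>"this is a string"}]]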

As seen in the token stream above, the lexer associates with each token a hash containing the line number where it was found. This allows error messages, in case of a parsing error, to point to the correct line. It also helps puppetdoc associate the right comment with the right language structure.

The lexer also supports lexing contexts. Some tokens are valid only in specific contexts; this is especially true when parsing quoted strings for variable interpolation.

Not everything the lexer scans emits a token for the parser. For instance comments are scanned (and stored in a stack for puppetdoc use), but they don't produce a token for the parser: they're skipped.

Finally, the lexer also maintains a stack of the class names it crossed. This is to be able to find the correct fully qualified name of inner classes as seen in the following example:

Fully qualified class names by keeping a stack of names during lexing
class outer {
  class middle {
    class inner {
      # we're in outer::middle::inner
    }
  }
}

If you want more information about the lexer, check the Puppet::Parser::Lexer class.

The parser

The parser is based on Racc. Racc is a ruby port of the good old Yacc. Racc, like Yacc, is what we call an LALR parser generator.

The 'cc' in Racc means 'compiler of compilers'. It means in fact that the parser is generated from what we call a grammar (and for LALR parsers, a context-free grammar). The generated parser is table driven and consumes tokens one by one. These kinds of parsers are sometimes called Shift/Reduce parsers.

This grammar is written in a language that is a machine-readable version of Backus-Naur Form, or "BNF".

There are different subclasses of context-free grammars. Racc works best with LR(1) grammars, which means it must be possible to parse any portion of an input string with just a single token of lookahead. Parsers for LR(1) grammars are deterministic: we only need a fixed number of lookahead tokens (in our case 1) and what we already parsed to decide which rule to apply next.

Roughly it does the following:

  1. read a token
  2. shift (this means putting the token on the stack), go to 1 until we can reduce
  3. reduce the read tokens with a grammar rule (this involves looking ahead)

We'll have a deeper look in the subsequent sections. Meanwhile, if you want to learn everything about LALR parsers or parsers in general, I highly recommend the Dragon Book.

The Puppet Grammar

The Puppet grammar can be found in lib/puppet/parser/grammar.ra in the sources. It is a typical racc/yacc grammar that:

  • defines the known tokens (these match the lexed token names)
  • defines the precedence of operators
  • contains the various recursive rules that form the definition of the Puppet language

Let's have a look at a bit of the Puppet grammar to better understand how it works:

Excerpt of the Puppet Grammar
statement_or_declaration:    resource
  | collection
  | assignment
  | casestatement
  | ifstatement_begin
...
assignment:     VARIABLE EQUALS expression {
  variable = ast AST::Name, :value => val[0][:value], :line => val[0][:line]
  result = ast AST::VarDef, :name => variable, :value => val[2], :line => val[0][:line]
}
...
expression:   rvalue
  | hash
  ...

rvalue:       quotedtext
  | name
  ...

quotedtext: STRING  { result = ast AST::String, :value => val[0][:value], :line => val[0][:line] }

So the excerpt above shows 4 rules:

  • a non-terminal rule called statement_or_declaration which is an alternation of sub-rules
  • a terminal rule called assignment, with a ruby code block that will be executed when this rule is reduced
  • a non-terminal rule called expression
  • a terminal rule quotedtext with a ruby block

To understand what that means, we could translate those rules by:

  1. A statement or declaration can be either a resource, a collection or an assignment
  2. An assignment is when the parser finds a VARIABLE token followed by an EQUALS token and an expression
  3. An expression can be an rvalue or a hash (both defined later in the grammar file)
  4. An rvalue can be, among other things, a quotedtext
  5. And finally a quotedtext can be a STRING (among other things)

You can generate the Puppet parser yourself using racc; it's as simple as:

  1. Installing racc (available as a gem)
  2. running: make -C lib/puppet/parser

This rebuilds the lib/puppet/parser/parser.rb file.

You can generate a debug parser that prints everything it does by passing the -g command-line switch to racc (check the lib/puppet/parser/makefile) and defining @@yydebug = true in the parser class.

The parser itself is controlled by the Puppet::Parser::Parser class, which lives in lib/puppet/parser/parser_support.rb. This class requires the generated parser (both share the same ruby class). That means the ruby blocks in the grammar are executed in the context of an instance of the Puppet::Parser::Parser class. In other words, from the grammar you can call methods defined in the parser_support.rb file. That's the reason we refer to the ast method in the above example. This method just creates an instance of the given class and attaches some context to it.
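
As an illustration, the ast helper boils down to something like this (a simplified sketch of what lives in parser_support.rb, not the exact code):

def ast(klass, hash = {})
  # Build the AST node and attach the current parsing context
  # (file and line from the lexer) to it.
  context = { :file => lexer.file, :line => lexer.line }
  klass.new(context.merge(hash))
end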

Let's go back a little bit to the reduce operation. When the parser reduces, it pops the reduced tokens from the stack and pushes the result back onto the stack. The result is either what ends up in the result field of the grammar's ruby block, or the result of the reduction of the mentioned rule (when it's a non-terminal one).

In the ruby block of a terminal rule, it is possible to access the tokens and rule results currently being parsed through the val array. Going back to the assignment statement above, val[0] is the VARIABLE token and val[2] is the result of reducing the expression rule.

The AST

The AST is the computer model of the parsed manifests. It forms a tree of instances of the AST base class. There are AST classes (all inheriting from the AST base class) for every element of the language. For instance there's one for puppet classes, one for if, one for case, and so on. You'll find all of them in the lib/puppet/parser/ast/ directory.

There are two kinds of AST classes:

  • leaves: which represent some kind of value (like an identifier or a string)
  • branches: which encompass more than one other AST class (like if, case or class). This is what forms the tree.

All AST classes implement the evaluate method which we’ll cover in the compiler article.

For instance when parsing an if/else statement like this:

An If/Else statement
if $var {
  notice("var is true")
} else {
  notice("var is false")
}

Once parsed, the whole if/else will be an instance of Puppet::Parser::AST::IfStatement (which can be found in lib/puppet/parser/ast/ifstatement.rb).

This class defines three instance variables:

  1. @test
  2. @statements
  3. @else

The grammar rule for ifstatement is (I simplified it for the purpose of the article):

Simplified Grammar rule for If/else
ifstatement:  IF expression LBRACE statements RBRACE else {
  args = {
    :test => val[1],
    :statements => val[3],
    :else => val[5]
  }
  result = ast AST::IfStatement, args
}

Notice how the AST::IfStatement is initialized with the args hash containing the test, statements and else results of those rules. Those results will themselves be AST instances, and will end up in the IfStatement fields we talked about earlier.

Thus this forms a tree. If you look at the AST::IfStatement#evaluate implementation, you'll see that depending on the result of evaluating @test it will evaluate either @statements or @else.
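
That implementation is essentially the following (a slightly simplified version, so consider it an approximation of the real method):

def evaluate(scope)
  value = @test.safeevaluate(scope)
  if Puppet::Parser::Scope.true?(value)
    @statements.safeevaluate(scope)
  elsif @else
    @else.safeevaluate(scope)
  end
end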

Calling the evaluate method of the root element of this tree will in turn trigger calling evaluate on its children, as in the IfStatement example. This process will be explained in detail in the compiler article, but that's essentially how the Puppet compiler works.

An Example Step by Step

Let’s see an end-to-end example of parsing a simple manifest:

A Simple Example
class test {
  file {
    "/tmp/a": content => "test!"
  }
}

This will produce the following stream of tokens:

A Simple Example
:CLASS(CLASS) {:line=>1, :value=>"class"}
:NAME(NAME) {:line=>1, :value=>"test"}
:LBRACE(LBRACE) {:line=>1, :value=>"{"}
:NAME(NAME) {:line=>2, :value=>"file"}
:LBRACE(LBRACE) {:line=>2, :value=>"{"}
:STRING(STRING) {:line=>3, :value=>"/tmp/a"}
:COLON(COLON) {:line=>3, :value=>":"}
:NAME(NAME) {:line=>3, :value=>"content"}
:FARROW(FARROW) {:line=>3, :value=>"=>"}
:STRING(STRING) {:line=>3, :value=>"test!"}
:RBRACE(RBRACE) {:line=>4, :value=>"}"}
:RBRACE(RBRACE) {:line=>5, :value=>"}"}

And now let's dive into the parser events (I simplified the outcome because the Puppet grammar is a little more complex than necessary for this article). The following example shows all the actions of the parser and what the parser stack looks like after each operation. I elided some of the stacks when they weren't strictly needed to understand what happened.

  1. receive: CLASS (our parser got the first token from the lexer)
  2. shift CLASS (there’s nothing else to do for the moment)

    the result of the shift is that we now have one token in the parser stack

    stack: [ CLASS ]

  3. receive: NAME("test") (we get one more token)

  4. shift NAME (still no rules can match so we shift it)

    stack: [ CLASS NAME("test") ]

  5. reduce NAME —> classname (oh and now we can reduce a rule)

    notice how the stack now contains a classname and not a NAME

    stack: [ CLASS (classname "test") ]

  6. receive: LBRACE

  7. shift LBRACE

    stack: [ CLASS (classname "test") LBRACE ]

  8. receive: NAME("file")

  9. shift NAME

    stack: [ CLASS (classname "test") LBRACE NAME("file") ]

  10. receive: LBRACE

  11. reduce NAME —> classname

    stack: [ CLASS (classname "test") LBRACE (classname "file") ]

  12. shift: LBRACE

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE ]

  13. receive STRING("/tmp/a")

  14. shift STRING

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE STRING("/tmp/a") ]

  15. reduce STRING —> quotedtext

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (quotedtext AST::String("/tmp/a")) ]

  16. receive COLON

  17. reduce quotedtext —> resourcename

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) ]

  18. shift COLON

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON ]

  19. receive: NAME("content")

  20. shift NAME

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") ]

  21. receive: FARROW

  22. shift FARROW

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") FARROW ]

  23. receive: STRING("test!")

  24. shift: STRING
  25. reduce STRING —> quotedtext
  26. receive: RBRACE
  27. reduce quotedtext —> rvalue

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") FARROW (rvalue AST::String("test!"))]

  28. reduce rvalue —> expression

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") FARROW (expression AST::String("test!"))]

  29. reduce NAME FARROW expression —> param (we now have a resource parameter)

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON (param AST::ResourceParam("content"=>"test!")))]

  30. reduce param —> params (multiple parameters can form a params)

  31. reduce resourcename COLON params —> resourceinst (name: parameters form a resource)

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourceinst (AST::ResourceInstance(...)))]

  32. reduce resourceinst —> resourceinstances (more than one resourceinst can form resourceinstances)

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourceinstances [(resourceinst (AST::ResourceInstance(...)))] )]

  33. shift RBRACE

  34. reduce classname LBRACE resourceinstances RBRACE —> resource (we’ve discovered a resource)

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resource AST::Resource(...))]

  35. receive: RBRACE

  36. reduce resource —> statement_or_declaration (a resource is one statement)
  37. reduce statement_or_declaration —> statement_and_declarations
  38. shift RBRACE

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resource AST::Resource(...)) RBRACE ]

  39. reduce CLASS classname LBRACE statements_and_declarations RBRACE —> hostclass (we’ve discovered a puppet class)

    stack: [ (hostclass AST::Hostclass(...)) ]

  40. reduce hostclass —> statement_or_declaration

  41. reduce statement_or_declaration —> statements_and_declarations
  42. receive: end of file
  43. reduce statements_and_declarations —> program
  44. shift end of file

    stack: [ (program (AST::ASTArray [AST::Hostclass(...))])) ]

And the parsing is now over. What is returned is this program, which is in fact an instance of an AST::ASTArray.

If we now analyze the produced AST, we find:

  • AST::ASTArray: array of AST instances, this is our program
    • AST::Hostclass: an instance of a class
      • AST::Resource: contains an array of resource instances
        • AST::ResourceInstance
          • AST::ResourceParam: contains the “content” parameter
            • AST::String("content")
            • AST::String("test!")

What's important to understand is that the AST depends only on the manifests. Thus the Puppet master needs to reparse the manifests only when they change.

What’s next?

The next episode will follow up right after the Parser: the compilation. The Puppet compiler takes the AST, injects the facts into it and produces what we call a catalog; that's exactly what we'll learn in the next article (sorry, no ETA yet).

Do not hesitate to comment or ask questions on this article with the comment system below :)

And happy new year all!

Protobuf, Maven, M2E and Eclipse Are on a Boat

| Comments

At Days of Wonder we develop several Java projects (for instance our online game servers). Those are built with Maven, and most if not all are using Google Protocol Buffers for data interchange.

Development happens mostly in Eclipse, and until a couple of months ago with m2eclipse. With the release of m2e (m2eclipse's successor), our builds no longer work as-is in Eclipse.

The reason is that we run the maven-protoc-plugin (the David Trott fork, which is more or less the only one still seeing development). This maven plugin allows the protoc Protocol Buffers compiler to be run at the generate-sources phase of the Maven lifecycle. Under m2eclipse, this phase was happening outside Eclipse and the builds were running fine.

Unfortunately m2e is not able to handle this correctly by itself. It requires using a connector. Those connectors are Eclipse plugins that tie a maven plugin to an m2e build lifecycle phase. This way, when m2e needs to execute this phase of the build, it can do so through the connector.

Until now, there wasn't any lifecycle connector for the maven-protoc-plugin. It wasn't possible for our development team to continue without one in the long term, so I took a stab at building it.

In fact it was way simpler than what I first thought. I used the m2e Extension Development Guide as a bootstrap (and especially the EGit extension).

The result of these few hours of development is now open source and available in the m2e-protoc-connector Github repository.

Installation

I didn’t release an Eclipse p2 update repository (mainly because I don’t really know how to do that), so you’ll have to build the project by yourself (but it’s easy).

  1. Clone the repository

git clone git://github.com/masterzen/m2e-protoc-connector.git

  2. Build with Maven 3

mvn package

Once built, you’ll find the feature packaged in com.daysofwonder.tools.m2e-protoc-connector.feature/target/com.daysofwonder.tools.m2e-protoc-connector.feature-1.0.0.20111130-1035-site.zip.

To install in Eclipse Indigo:

  1. Open the Install New Software window from the Help menu
  2. Click on the Add button
  3. Select the Archive button and point it to the com.daysofwonder.tools.m2e-protoc-connector.feature/target/com.daysofwonder.tools.m2e-protoc-connector.feature-1.0.0.20111130-1035-site.zip file
  4. Accept the license terms and restart Eclipse

Usage

To use it, there is nothing specific to do, as long as your pom.xml roughly conforms to what we use:

<plugin>
    <groupId>com.google.protobuf.tools</groupId>
    <artifactId>maven-protoc-plugin</artifactId>
    <executions>
        <execution>
            <id>generate proto sources</id>
            <goals>
                <goal>compile</goal>
            </goals>
            <phase>generate-sources</phase>
            <configuration>
                <protoSourceRoot>${basedir}/src/main/proto/</protoSourceRoot>
                <includes>
                    <param>**/*.proto</param>
                </includes>
            </configuration>
        </execution>
    </executions>
</plugin>
...
  <dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>2.4.1</version>
  </dependency>
...
<pluginRepositories>
    <pluginRepository>
        <id>dtrott-public</id>
        <name>David Trott's Public Repository</name>
        <url>http://maven.davidtrott.com/repository</url>
    </pluginRepository>
</pluginRepositories>

If you find any problem, do not hesitate to open an issue on the github repository.

Redis-snmp: Redis Performance Monitoring Through SNMP

| Comments

The same way I created mysql-snmp, a small Net-SNMP subagent that allows exporting performance data from MySQL through SNMP, I'm proud to announce the first release of redis-snmp to monitor Redis servers. It is also inspired by the Cacti MySQL Templates (which also cover Redis).

I originally created this Net-SNMP perl subagent to monitor some Redis performance metrics with OpenNMS.

The where

You'll find the sources (which allow building a Debian package) in the redis-snmp github repository.

The what

Here are the kinds of graphs and metrics you can export from a redis server:

Redis Connections

Redis Commands

Redis Memory

The how

Like mysql-snmp, you need to run redis-snmp on a host that has connectivity with the monitored redis server (the same host makes sense). You also need the following dependencies:

  • Net-SNMP >= 5.4.2.1 (older versions contain a 64-bit varbind issue)
  • perl (tested under perl 5.10 from debian squeeze)

Once running, you should be able to ask your snmpd about redis values:

1
2
3
4
5
6
7
$ snmpbulkwalk -m'REDIS-SERVER-MIB' -v 2c  -c public redis-server.domain.com .1.3.6.1.4.1.20267.400
REDIS-SERVER-MIB::redisConnectedClients.0 = Gauge32: 1
REDIS-SERVER-MIB::redisConnectedSlaves.0 = Gauge32: 0
REDIS-SERVER-MIB::redisUsedMemory.0 = Counter64: 154007648
REDIS-SERVER-MIB::redisChangesSinceLastSave.0 = Gauge32: 542
REDIS-SERVER-MIB::redisTotalConnections.0 = Counter64: 6794739
REDIS-SERVER-MIB::redisCommandsProcessed.0 = Counter64: 37574019

Of course you must adjust the hostname and community. SNMP v2c (or better) is mandatory since we're reporting 64-bit values.

Note that you get the OID-to-name translation only if the REDIS-SNMP-SERVER MIB is installed on the host where you run the above command.

OpenNMS integration

To integrate with OpenNMS, it's as simple as adding the following group to your datacollection-config.xml file:

<!-- REDIS-SERVER MIB -->
<group name="redis" ifType="ignore">
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.1" instance="0" alias="redisConnectedClnts" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.2" instance="0" alias="redisConnectedSlavs" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.3" instance="0" alias="redisUsedMemory" type="Gauge64" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.4" instance="0" alias="redisChangsSncLstSv" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.5" instance="0" alias="redisTotalConnectns" type="Counter64" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.6" instance="0" alias="redisCommandsPrcssd" type="Counter64" />
</group>

And the following graph definitions to your snmp-graph.properties file:

report.redis.redisconnections.name=Redis Connections
report.redis.redisconnections.columns=redisConnectedClnts,redisConnectedSlavs,redisTotalConnectns
report.redis.redisconnections.type=nodeSnmp
report.redis.redisconnections.width=565
report.redis.redisconnections.height=200
report.redis.redisconnections.command=--title "Redis Connections" \
 --width 565 \
 --height 200 \
 DEF:redisConnectedClnts={rrd1}:redisConnectedClnts:AVERAGE \
 DEF:redisConnectedSlavs={rrd2}:redisConnectedSlavs:AVERAGE \
 DEF:redisTotalConnectns={rrd3}:redisTotalConnectns:AVERAGE \
 LINE1:redisConnectedClnts#9B2B1B:"REDIS Connected Clients         " \
 GPRINT:redisConnectedClnts:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisConnectedClnts:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisConnectedClnts:MAX:"Max \\: %8.2lf %s\\n" \
 LINE1:redisConnectedSlavs#4A170F:"REDIS Connected Slaves          " \
 GPRINT:redisConnectedSlavs:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisConnectedSlavs:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisConnectedSlavs:MAX:"Max \\: %8.2lf %s\\n" \
 LINE1:redisTotalConnectns#38524B:"REDIS Total Connections Received" \
 GPRINT:redisTotalConnectns:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisTotalConnectns:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisTotalConnectns:MAX:"Max \\: %8.2lf %s\\n"

report.redis.redismemory.name=Redis Memory
report.redis.redismemory.columns=redisUsedMemory
report.redis.redismemory.type=nodeSnmp
report.redis.redismemory.width=565
report.redis.redismemory.height=200
report.redis.redismemory.command=--title "Redis Memory" \
  --width 565 \
  --height 200 \
  DEF:redisUsedMemory={rrd1}:redisUsedMemory:AVERAGE \
  AREA:redisUsedMemory#3B7AD9:"REDIS Used Memory" \
  GPRINT:redisUsedMemory:AVERAGE:"Avg \\: %8.2lf %s" \
  GPRINT:redisUsedMemory:MIN:"Min \\: %8.2lf %s" \
  GPRINT:redisUsedMemory:MAX:"Max \\: %8.2lf %s\\n"

report.redis.rediscommands.name=Redis Commands
report.redis.rediscommands.columns=redisCommandsPrcssd
report.redis.rediscommands.type=nodeSnmp
report.redis.rediscommands.width=565
report.redis.rediscommands.height=200
report.redis.rediscommands.command=--title "Redis Commands" \
 --width 565 \
 --height 200 \
 DEF:redisCommandsPrcssd={rrd1}:redisCommandsPrcssd:AVERAGE \
 AREA:redisCommandsPrcssd#FF7200:"REDIS Total Commands Processed" \
 GPRINT:redisCommandsPrcssd:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisCommandsPrcssd:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisCommandsPrcssd:MAX:"Max \\: %8.2lf %s\\n"

report.redis.redisunsavedchanges.name=Redis Unsaved Changes
report.redis.redisunsavedchanges.columns=redisChangsSncLstSv
report.redis.redisunsavedchanges.type=nodeSnmp
report.redis.redisunsavedchanges.width=565
report.redis.redisunsavedchanges.height=200
report.redis.redisunsavedchanges.command=--title "Redis Unsaved Changes" \
  --width 565 \
  --height 200 \
  DEF:redisChangsSncLstSv={rrd1}:redisChangsSncLstSv:AVERAGE \
  AREA:redisChangsSncLstSv#A88558:"REDIS Changes Since Last Save" \
  GPRINT:redisChangsSncLstSv:AVERAGE:"Avg \\: %8.2lf %s" \
  GPRINT:redisChangsSncLstSv:MIN:"Min \\: %8.2lf %s" \
  GPRINT:redisChangsSncLstSv:MAX:"Max \\: %8.2lf %s\\n"

Do not forget to register the new graphs in the report list at the top of the snmp-graph.properties file.

Restart OpenNMS, and it should start graphing your redis performance metrics. You’ll find those files in the opennms directory of the source distribution.

Enjoy :)