Masterzen's Blog

Journey in a software world…

The Indirector - Puppet Extension Points 3


This article is a follow-up to the previous two articles in this series on Puppet Internals (Puppet Extension Points, parts 1 and 2).

Today we'll cover the Indirector. I believe that by the end of this post, you'll know exactly what the Indirector is and how it works.

The scene

The puppet source code needs to deal with lots of different abstractions to do its job. Among those abstractions you'll find:

  • Certificates
  • Nodes
  • Facts
  • Catalogs

Each one of those abstractions can be found in the Puppet source code in the form of a model class. For instance, when Puppet needs to deal with the current node, it in fact deals with an instance of the node model class, which is called Puppet::Node.

Each model can physically exist in different forms: Facts can come from Facter or a YAML file, Nodes can come from an ENC, LDAP, site.pp and so on. Each of those concrete sources is what we call a Terminus.

The Indirector allows the Puppet programmer to deal with model instances without having to manage the gory details of where a given instance is coming from or going to.

For instance, the call-site code that finds a node is the same whether the node comes from an ENC or from LDAP, because the source is irrelevant to the calling code.

Actions

So you might be wondering what the Indirector allows us to do with our models. Basically, the Indirector implements a basic CRUD (Create, Retrieve, Update, Delete) system. In fact it implements 4 verbs (that map to the CRUD and REST verb sets):

  • Find: retrieves a specific instance, identified by its key
  • Search: retrieves a set of instances matching a search term
  • Destroy: removes a given instance
  • Save: stores a given instance

You'll see a little bit later how it is wired, but those verbs exist as class and/or instance methods in the model classes.

So back to our Puppet::Node example, we can say this:

  # Finding a specific node
  node = Puppet::Node.find('test.daysofwonder.com')

  # here I can use node, being an instance of Puppet::Node
  puts "node: #{node.name}"

  # I can also save the given node (if the terminus allows it of course)
  # Note: save is implemented as an instance method
  node.save

  # we can also destroy a given node (if the terminus implements it):
  Puppet::Node.destroy('unwanted.daysofwonder.com')

And this works for all the managed models; I could have written the exact same code with certificates instead of nodes.

Terminii

For the Latin-illiterate out there, terminii is the Latin plural of terminus.

So a terminus is a concrete class that knows how to deal with a specific model type. A terminus exists only for a given model. For instance the catalog indirection can use the Compiler or the YAML terminus, among the half-dozen available terminii.

A terminus is a class that should inherit, somewhere in its class hierarchy, from Puppet::Indirector::Terminus. This last sentence might be obscure, but note that if your terminus for a given model inherits directly from Puppet::Indirector::Terminus, it is considered an abstract terminus and won't work.

  def find(request)
    # request.key contains the instance to find
  end

  def destroy(request)
  end

  def search(request)
  end

  def save(request)
    # request.instance contains the model instance to save
  end

The request parameter used above is an instance of Puppet::Indirector::Request. This request object carries a handful of properties that are of interest when implementing a terminus. The first one is the key method, which returns the name of the instance we want to manipulate. The other is instance, which is only available when saving and holds the concrete model instance to save.

Implementing a terminus

To implement a new terminus for a given model, you need to add a ruby file named after the terminus at puppet/indirector/<indirection>/<terminus>.rb.

For instance, if we wanted to implement a new source of puppet nodes, like storing node classes in DNS TXT resource records, we'd create a puppet/indirector/node/dns.rb file whose find method would query the TXT RRs of request.key.
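To make this concrete, here is a minimal, hypothetical sketch of what such a terminus could look like (this DNS terminus does not exist in Puppet; the Resolv lookup and the convention that each TXT string is a class name are assumptions made for illustration):

require 'resolv'
require 'puppet/node'
require 'puppet/indirector/code'

class Puppet::Node::Dns < Puppet::Indirector::Code
  desc "Hypothetical terminus building nodes from DNS TXT resource records."

  def find(request)
    # assumption: each TXT string attached to the node name is a puppet class name
    classes = Resolv::DNS.open do |dns|
      dns.getresources(request.key, Resolv::DNS::Resource::IN::TXT).map(&:strings).flatten
    end
    Puppet::Node.new(request.key, :classes => classes)
  end
end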

Puppet already defines some common behaviors: yaml file based, REST based, code based or executable based. A new terminus can inherit from one of those abstract terminii to reuse its behavior.

I contributed (though it hasn't been merged yet) an OCSP system for Puppet. It defines a new indirection: ocsp. This indirection contains two terminii:

The first is the concrete one; it inherits from Puppet::Indirector::Code and delegates the OCSP request verification to the OCSP layer:

require 'puppet/indirector/ocsp'
require 'puppet/indirector/code'
require 'puppet/ssl/ocsp/responder'

class Puppet::Indirector::Ocsp::Ca < Puppet::Indirector::Code
  desc "OCSP request revocation verification through the local CA."

  def save(request)
    Puppet::SSL::Ocsp::Responder.respond(request.instance)
  end
end

The second is a REST terminus. It allows a given implementation to talk to a remote puppet process (usually a puppetmaster) through the indirector, without modifying client or server code:

require 'puppet/indirector/ocsp'
require 'puppet/indirector/rest'

class Puppet::Indirector::Ocsp::Rest < Puppet::Indirector::REST
  desc "Remote OCSP certificate REST remote revocation status."

  use_server_setting(:ca_server)
  use_port_setting(:ca_port)
end

As you can see we can do a REST client without implementing any network stuff!

Indirection creation

To tell Puppet that a given model class can be indirected, it's just a matter of adding a little bit of Ruby metaprogramming.

To keep my OCSP system example, the OCSP request model class is declared like this:

class Puppet::SSL::Ocsp::Request < Puppet::SSL::Base
  ...

  extend Puppet::Indirector
  # this will tell puppet that we have a new indirection
  # and our default terminus will be found in puppet/indirector/ocsp/ca.rb
  indirects :ocsp, :terminus_class => :ca

  ...
end

Basically we're saying that our model Puppet::SSL::Ocsp::Request declares an indirection ocsp, whose default terminus class is ca. That means that if we simply call Puppet::SSL::Ocsp::Request.find, the puppet/indirector/ocsp/ca.rb file will be used.

Terminus selection

There's something I didn't talk about yet: how does Puppet know which terminus it should use when we call one of the indirector verbs? As seen above, if nothing is done to configure it, it defaults to the terminus given in the indirects call.

But it is configurable: the Puppet::Indirector module defines the terminus_class= method, which can be called to change the active terminus.

For instance in the puppet agent, the catalog indirection has a REST terminus, but in the master the same indirection uses the compiler:

  # puppet agent equivalent code
  Puppet::Resource::Catalog.terminus_class = :rest

  # puppet master equivalent code
  Puppet::Resource::Catalog.terminus_class = :compiler

In fact the code is a little bit more complicated than this for the catalog but in the end it’s equivalent.

A puppet application can also specify a routing table between indirections and terminii to simplify the wiring.

More than one type of terminii

There’s something I left aside earlier. There are in fact two types of terminii per indirection:

  • regular terminus as we saw earlier
  • cache terminus

For every model class we can define the regular indirection terminus and an optional cache terminus.

Then, when finding an instance, the cache terminus is asked first. If the instance is not found in the cache (or we asked to bypass the cache), the regular terminus is used. Afterward the instance is saved in the cache terminus.

This cache is exploited in lots of places in the Puppet code base.

Among those, the catalog cache terminus is set to :yaml on the agent. The effect is that when the agent retrieves the catalog from the master through the :rest regular terminus, it is locally saved by the yaml terminus. This way, if the next agent run fails when retrieving the catalog through REST, it will use the one locally cached during the previous run.
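Schematically, the agent-side wiring is equivalent to this (a simplified sketch; the real code goes through the application settings):

  # puppet agent equivalent code (simplified)
  Puppet::Resource::Catalog.terminus_class = :rest
  Puppet::Resource::Catalog.cache_class    = :yaml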

Most of the certificate stuff is handled along the same lines as the catalog, with local caching through a file terminus.

REST Terminus in details

There is a direct translation between the REST verbs and the indirection verbs. Thus the :rest terminus:

  1. transforms the indirection and key to a URI: /<environment>/<indirection>/<key>
  2. does an HTTP GET|PUT|DELETE|POST depending on the indirection verb

On the server side, the Puppet network layer does the reverse, calling the right indirection methods based on the URI and the REST verb.

It is also possible to send parameters to the indirection; with REST, those are transformed into URL query parameters.

The indirection name used in the URI is pluralized by adding a trailing 's' when doing a search, to be more RESTful. For example:

  • GET /production/certificate/test.daysofwonder.com is find
  • GET /production/certificates/unused is a search

When indirecting a model class, Puppet mixes in the Puppet::Network::FormatHandler module. This module allows rendering and converting an instance to and from a serialized format. The most used one in Puppet is called pson, which is in fact JSON in disguise.

During a REST transaction, the instance can be serialized and deserialized using this format. Each model can define its preferred serialization format (for instance catalogs use pson, but certificates prefer raw encoding).

At the HTTP level, the various encoding headers reflecting the serialization used are set accordingly.

You will find a comprehensive list of all REST endpoints in Puppet here.

Puppet 2.7 indirection

The syntax I used in my samples is derived from the 2.6 puppet source. In Puppet 2.7, the dev team introduced (and is now contemplating removing) an indirection property on the model class which implements the indirector verbs (instead of them being implemented directly on the model class).

This translates to:

  # 2.6 way, and possibly 2.8 onward
  Puppet::Node.find(...)

  # 2.7 way
  Puppet::Node.indirection.find(...)

Gory details anyone?

OK, so how does it work?

Let’s focus on Puppet::Node.find call:

  1. Ruby loads the Puppet::Node class
  2. Mixing in Puppet::Indirector creates a bunch of find/destroy… methods in the current model class
  3. Ruby executes the indirects call from the Puppet::Indirector module
    1. This one creates a Puppet::Indirector::Indirection stored locally in the indirection class instance variable
    2. It also registers the given indirection in a global indirection list
    3. It also registers the given default terminus class. Terminii are loaded by Puppet::Util::Autoload through a set of Puppet::Util::InstanceLoader instances
  4. When this terminus class is loaded, since it (indirectly) inherits from Puppet::Indirector::Terminus, the Puppet::Indirector::Terminus#inherited ruby callback is executed. After doing a bunch of safety checks, this callback registers the terminus class as a valid terminus for the loaded indirection.
  5. We're now ready to really call Puppet::Node.find. find is one of the methods we got when we mixed in Puppet::Indirector
    1. find first creates a Puppet::Indirector::Request with the given key.
    2. It then checks the cache terminus if one has been defined. If the cache terminus finds an instance, that instance is returned
    3. Otherwise find delegates to the registered terminus, by calling terminus.find(request)
    4. If there's a result, it is cached in the cache terminus
    5. and the result is returned

Pretty simple, isn’t it? And that’s about the same mechanism for the three other verbs.
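To make the flow concrete, here is a toy illustration of the cache-then-terminus logic described above (this is not the actual Puppet::Indirector::Indirection code, just its general shape):

  class ToyIndirection
    def initialize(terminus, cache = nil)
      @terminus = terminus
      @cache = cache
    end

    # mirrors the find flow: ask the cache first, fall back to the
    # regular terminus, then populate the cache with the result
    def find(key)
      if @cache && (cached = @cache.find(key))
        return cached
      end
      result = @terminus.find(key)
      @cache.save(key, result) if @cache && result
      result
    end
  end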

Note that terminii are loaded with the puppet autoloader. That means it should be possible to add more indirections and/or terminii as long as the paths are respected and they are in the RUBYLIB. I don't think, though, that those paths are pluginsync'ed.

Conclusion

I know that the indirector can be intimidating at first, but even without completely understanding the internals, it is quite easy to add a new terminus for a given indirection.

On the same subject, I highly recommend this presentation about Extending Puppet by Richard Crowley. This presentation also covers the indirector.

This article will certainly close the Puppet Extension Points series. The last remaining extension type (Faces) has already been covered thoroughly on the Puppetlabs Docs site.

The next article will, I think, cover the full picture of a complete puppet agent/master run.

Puppet Extension Points - Part 2


After the first part in this series of articles on Puppet extension points, I'm proud to deliver a new episode focusing on Types and Providers.

Note that there's a really good chapter on the same topic in James Turnbull and Jeff McCune's Pro Puppet (which I highly recommend if you're a serious puppeteer). Also note that you can attend Puppetlabs Developer Training, which covers this topic.

Of Types and Providers

One of the great strengths of Puppet is how various heterogeneous aspects of a given POSIX system (or non-POSIX, like the Network Device system I contributed) are abstracted into simple elements: types.

Types are the foundation bricks of Puppet; you use them every day to model how your systems are formed. Among the core types, you'll find user, group, file, …

In Puppet, manifests define resources which are instances of their type. There can be only one resource of a given name (what we call the namevar, name or title) for a given catalog (which usually maps to a given host).

A type models what facets of a physical entity (like a host user) are managed by Puppet. These model facets are called “properties” in Puppet lingo.

Essentially a type is a name, some properties to be managed and some parameters. Parameters are values that help or direct Puppet in managing the resource (for instance the managehome parameter of the user type is not part of a given user on the host, but explains to Puppet that this user's home directory is to be managed).

Let’s follow the life of a resource during a puppet run.

  1. During compilation, the puppet parser will instantiate Puppet::Parser::Resource instances, which are Puppet::Resource objects. Those contain the various property and parameter values defined in the manifest.

  2. Those resources are then inserted into the catalog (an instance of Puppet::Resource::Catalog)

  3. The catalog is then sent to the agent (usually in json format)

  4. The agent converts the catalog individual resources into RAL resources by virtue of Puppet::Resource#to_ral. We’re now dealing with instances of the real puppet type class. RAL means Resource Abstraction Layer.

    1. The agent then applies the catalog. This process creates the relationships graph so that we can manage resources in an order obeying require/before metaparameters. During catalog application, every RAL resource is evaluated. This process tells a given type to do what is necessary so that every managed property of the real underlying resource matches what was specified in the manifest. The software component that does this is the provider.

So to summarize, a type declares to Puppet what properties it can manage, and an accompanying provider is the code that manages them. Those two elements form the Puppet RAL.

There can be more than one provider per type, depending on the host or platform. For instance, every user has a login name on all kinds of systems, but the way to create a new user can be completely different on Windows or Unix. In this case we can have a provider for Windows, one for OSX, one for Linux… Puppet knows how to select the best provider based on the facts (the same way you can confine facts to some operating systems, you can confine providers to some operating systems).

Looking Types into the eyes

I've written a combination of types/providers for this article. It allows managing DNS zones and DNS Resource Records for DNS hosting providers (like AWS Route 53 or Zerigo). To simplify development I based the system on Fog's DNS providers (you need the Fog gem installed to use those types on the agent). The full code of this system is available in my puppet-dns github repository.

This work defines two new Puppet types:

  • dnszone: manage a given DNS zone (ie a domain)
  • dnsrr: manage an individual DNS RR (like an A, AAAA, … record). It takes a name, a value and a type.

Here is how to use it in a manifest:





Let's focus on the dnszone type, which is the simpler of the two:
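The original snippet is embedded from the puppet-dns repository and isn't reproduced here; what follows is a hedged reconstruction based on the line-by-line walkthrough below (parameter names and exact line numbers are approximations, the authoritative code lives in the repository):

Puppet::Type.newtype(:dnszone) do
  @doc = "Manage a DNS zone through Fog."

  ensurable

  newparam(:name) do
    desc "The zone name (ie the domain)."
    isnamevar
  end

  newparam(:email) do
    desc "The contact e-mail for this zone."
    newvalues(/@/)
  end

  newparam(:yaml_fog_file) do
    desc "The yaml file containing the Fog DNS options and credentials."
    defaultto "/etc/puppet/fog.yaml"
  end

  # evaluate the credentials file before this zone if puppet manages it
  autorequire(:file) do
    self[:yaml_fog_file]
  end
end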





Note that the dnszone type assumes there is a /etc/puppet/fog.yaml file containing the Fog DNS options and credentials as a yaml-encoded hash. Refer to the aforementioned github repository for more information and use cases.

Exactly like parser functions, types are defined in ruby, and Puppet can autoload them. Thus types should obey the Puppet type ruby namespace; that's the reason we have to put types in puppet/type/. Once again this is ruby metaprogramming (in all its glory), used to create a specific internal DSL that describes types to Puppet with simple directives (the alternative would have been to define a data structure, which would have been much less practical).

Let’s dive into the dnszone type.

  • Line 1: we're calling the Puppet::Type#newtype method, passing first the type name as a ruby symbol (which should be unique among types), and second a block (from line 1 to the end). The newtype method is reachable through Puppet::Type but is in fact defined in Puppet::Metatype::Manager. Newtype's job is to create a new singleton class whose parent is Puppet::Type (or a descendant if needed). The given block is then evaluated in class context (which means the block is executed with self being the just-created class). This singleton class is called Puppet::Type::Dnszone in our case (but you see the pattern).

  • Line 2: we're assigning a string to the Puppet::Type class variable @doc. This will be used to extract the type documentation.

  • Line 4: this single word, ensurable, is a class method of Puppet::Type. So when our type block is evaluated, this method is called. It installs a new special property, Ensure. This is a shortcut to automatically manage creation/deletion/existence of the managed resource, and it adds support for ensure => (present|absent) to your type. The provider still has to manage ensurability, though.

  • Line 6: here we're calling Puppet::Type#newparam. This tells our type that we're going to have a parameter called "name". Every resource in Puppet must have a unique key; this key is usually called the name or the title. We're giving a block to this newparam method. The job of newparam is to create a new class descending from Puppet::Parameter, and to evaluate the given block in the context of this class (which means that in this block self is a singleton class descending from Puppet::Parameter). Puppet::Parameter defines a bunch of utility class methods (which become the apparent directives of our parameter DSL); among those we find isnamevar, which we've used for the name parameter. This tells the Puppet type system that the name parameter holds the unique key of this type. The desc method allows attaching some documentation to the parameter.

  • Line 12: we're now defining the email parameter, using the newvalues class method of Puppet::Parameter. This method defines which values can be set for this parameter. We're passing a regex that allows any string containing an '@', which is certainly the worst regex to validate an e-mail address :) Puppet will raise an error if we don't give a valid value to this parameter.

  • Line 17: again a new parameter. This one is used to control Fog's behavior (ie give it your credentials and the fog provider to use). Here we're using defaultto, which means that if we don't pass a value, the defaultto value will be used.

  • Line 22: a given resource can auto-require another resource, the same way a file resource can automatically require its ancestor directory. In our case, we're autorequiring the yaml_fog_file, so that if it is managed by puppet, it will be evaluated before our dnszone resource (otherwise our fog provider might not have its credentials available).

Let’s now see another type which uses some other type DSL directives:
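Again, the embedded snippet isn't reproduced here; this is a hedged reconstruction of the dnsrr type from the walkthrough below (attribute names, values and line numbers are approximations, the authoritative code lives in the puppet-dns repository):

Puppet::Type.newtype(:dnsrr) do
  @doc = "Manage an individual DNS Resource Record through Fog."

  ensurable

  newparam(:name) do
    desc "The RR name, e.g. www.daysofwonder.com."
    isnamevar
  end

  newproperty(:type) do
    desc "The RR type."
    newvalues(:A, :AAAA, :CNAME, :MX, :NS, :TXT)
  end

  newproperty(:value) do
    desc "The RR value."
    isrequired
    validate do |value|
      raise ArgumentError, "A dnsrr needs a non-empty value" if value.nil? || value.empty?
    end
  end

  # global validation, run once all properties have a value
  validate do
    raise ArgumentError, "dnsrr needs both a type and a value" if self[:type].nil? || self[:value].nil?
  end

  newparam(:zone) do
    desc "The dnszone this RR belongs to."
    defaultto do
      # derive the zone from the resource name, e.g. www.example.com -> example.com
      resource[:name].split('.')[-2..-1].join('.') if resource[:name]
    end
  end
end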





We’ll pass over the bits we already covered with the first type, and concentrate on new things:

  • Line 12: our dnszone type contained only parameters; now for the first time we define a property. A property is exactly like a parameter but is fully managed by Puppet (see the chapter below). A property is an instance of a Puppet::Property class, which itself inherits from Puppet::Parameter, meaning that all the methods we covered for parameters in the first example are also available for properties. This type property is interesting because it defines a discrete set of values. If you try to set something outside of this list of possible values, Puppet will raise an error. Values can be either ruby symbols or strings.

  • Line 17: a new property is defined here. With the isrequired method we tell Puppet that it is mandatory to have a value. The validate method stores the given block so that when Puppet sets the desired value of this property, the block is executed. In our case we report an error if the given value is empty.

  • Line 24: here we define a global validation. It is called once all properties have been assigned a value. This block executes in the instance context of the type, which means we can access all instance variables and methods of Puppet::Type (in particular the [] method that gives access to parameter/property values). This allows performing validation across the boundaries of a single parameter/property.

  • Line 25: finally, we declare a new parameter that references a dnszone. Note that we use a dynamic defaultto (with a block), so that we can look up the given resource name and derive our zone from the FQDN. This highlights an important feature of the type system: the declaration order of the various blocks matters. Puppet always respects the declaration order of the various properties when evaluating their values. That means a given property can access the value of another property defined earlier.

I left managing RR TTLs as an exercise for the astute reader :) Also note that we didn't cover all the directives the type DSL offers. Notably, we didn't see value munging (which allows transforming the string representation coming from the manifest into an internal, type-specific format). For instance, it can be used to transform a string IP address into a ruby IPAddr instance for later use, as sketched below. I highly recommend browsing the default types in the Puppet source distribution and checking the various directives used there. You can also read the Puppet::Parameter, Puppet::Property and Puppet::Type source code to see the ones we didn't cover.
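As an illustration, a munged property could look like this (a hypothetical fragment, not taken from the dnsrr type):

  require 'ipaddr'

  newproperty(:address) do
    desc "An IP address, stored internally as an IPAddr."

    # convert the manifest string into an IPAddr before it is compared or used
    munge do |value|
      IPAddr.new(value)
    end
  end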

Life and death of Properties

So, we saw that a Puppet::Parameter is just a holder for the value coming from the manifest. A Puppet::Property is a parameter that, along with the desired value (the one coming from the manifest), contains the current value (the one coming from the managed resource on the host). The first one is called the "should", and the latter one is called the "value". Both are methods of the Puppet::Property object and return, respectively, those values. A property implements the following aspects:

  • it can retrieve a value from the managed resource. This is the operation of asking the real host resource to fetch its value. This is usually performed by delegation to the provider.

  • it can report its should which is the value given in the manifest

  • it can be insync?. This returns true if the retrieved value is equal to the “should” value.

  • and finally it can sync, which means doing whatever is necessary so that insync? becomes true. If there is a provider for the given type, it will be called to take care of the change.

When Puppet manages a resource, it does it with the help of a Puppet::Transaction. The transaction orders the various properties that are not insync? to sync. Of course it is a bit more complex than that, because this is done while respecting resource ordering (given by the require/before metaparameters), propagating change events (so that services can be restarted and so on), allowing resources to spawn child resources, etc… It's perfectly possible to write a type without a provider, as long as all the properties used implement their respective retrieve and sync methods; some of the core types do this.
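For instance, a provider-less type could implement a property along these lines (a schematic example assuming the type also defines a path parameter; it is not taken from a core type):

  newproperty(:content) do
    desc "The file content (schematic example of a provider-less property)."

    # ask the real resource for its current value
    def retrieve
      File.exist?(resource[:path]) ? File.read(resource[:path]) : :absent
    end

    # make the real resource match the desired value
    def sync
      File.open(resource[:path], 'w') { |f| f.write(@should.first) }
    end
  end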

Providers

We’ve seen that properties usually delegate to the providers for managing the underlying real resource. In our example, we’ll have two providers, one for each defined type. There are two types of providers:

  • prefetch/flush
  • per properties

The per-property providers need to implement a getter and a setter for every property of the accompanying type. When the transaction manipulates a given property, its provider getter is called, and later the setter is called if the property is not insync?. It is the responsibility of those setters to flush the values to the physically managed resource. For some providers it is highly impractical or inefficient to flush on every property value change. To solve this, a given provider can be a prefetch/flush one. A prefetch/flush provider implements only two methods:

  • prefetch, which, given a list of resources, will in one call return a set of provider instances filled with the values fetched from the real resources.
  • flush, which is called after all values have been set, so that they can be persisted to the real resource.

The two providers I’ve written for this article are prefetch/flush ones, because it was impractical to call Fog for every property.

Anatomy of the dnszone provider

We'll focus only on this provider and leave the analysis of the second one as an exercise for the reader. Providers, being ruby extensions too, must live in the correct path respecting their ruby namespace: our dnszone fog provider must be in the puppet/provider/dnszone/fog.rb file. Unlike what I did for the types, I'll split the provider code into parts so that I can explain them in context. You can still browse the whole code.





This is how we tell Puppet that we have a new provider for a given type. If we decipher this, we're fetching the dnszone type (which returns the singleton class of our dnszone type) and calling the class method provide, passing it a name, some options and a big block. In our case the provider is called fog, and its parent is Puppet::Provider::Fog (which defines common methods for both of our fog providers, and is itself a descendant of Puppet::Provider). As for types, we have a desc class method in Puppet::Provider to store some documentation strings. We also have the confine method. This method helps Puppet choose the correct provider for a given type, ie its suitability. The confining system is managed by Puppet::Provider::Confiner. You can use:

  • a fact or puppet settings value, as in: confine :operatingsystem => :windows
  • a file existence: confine :exists => "/etc/passwd"
  • a Puppet “feature”, like we did for testing the fog library presence
  • an arbitrary boolean expression confine :true => 2 == 2
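Putting that together, a provider declaration along the lines described above could look like this (a hedged sketch that mirrors the description; the actual code lives in the repository):

Puppet::Type.type(:dnszone).provide(:fog, :parent => Puppet::Provider::Fog) do
  desc "Manage DNS zones through Fog."

  # only suitable when the fog library is available
  confine :feature => :fog

  mk_resource_methods
end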

A provider can also be the default for a given fact value. This makes sure the correct provider is used for a given type; for instance, the apt provider is the default on debian/ubuntu platforms.

And to finish, a provider might need to call executables on the platform (and in fact most of them do). The Puppet::Provider class defines a shortcut to declare and use those executables easily:
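For example (a hypothetical executable, not used by the fog providers):

  commands :dnscmd => "/usr/bin/dnscmd"

  def refresh(zone)
    # runs the declared binary; raises Puppet::ExecutionFailure if it fails,
    # and the provider is deemed unsuitable if the binary is missing
    dnscmd("refresh", zone)
  end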





Let's continue our exploration of the dnszone provider:





mk_resource_methods is a handy helper that creates a bunch of getters/setters for every parameter/property for us. Those read and fill values in the @property_hash hash.





The prefetch method calls fog to fetch all the DNS zones, and then we match those against the ones managed by Puppet (from the resources hash).

For each match we instantiate a provider filled with the values coming from the underlying physical resource (in our case fog). For those that don't match, we create a provider whose only property is ensure, set to absent, as sketched below.
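A rough sketch of such a prefetch (the fog calls and the dns helper are assumptions; the real implementation is in the repository):

  def self.prefetch(resources)
    zones = dns.zones.to_a   # assumption: Puppet::Provider::Fog exposes a dns connection
    resources.each do |name, resource|
      if zone = zones.find { |z| z.domain == name }
        # fill the provider with the values fetched from fog
        resource.provider = new(:ensure => :present)
      else
        resource.provider = new(:ensure => :absent)
      end
    end
  end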





Flush does the reverse of prefetch. Its role is to make sure the real underlying resource conforms to what Puppet wants it to be.

There are 3 possibilities:

  • the desired state is absent. We thus tell fog to destroy the given zone.
  • the desired state is present, but during prefetch we didn't find the zone: we tell fog to create it.
  • the desired state is present, and we could find it during prefetch, in which case we’re just refreshing the fog zone.

To my knowledge this is used only for ralsh (puppet resource). The problem is that our provider can't know how to access fog until it has a dnszone (which creates a chicken-and-egg problem :)).

And finally we need to manage the Ensure property, which requires our provider to implement create, destroy and exists?.

In a prefetch/flush provider there’s no need to do more than controlling the ensure value.

Things to note:

  • a provider instance can access its resource with the resource accessor
  • a provider can access the current catalog through its resource.catalog accessor. This allows, as I did in the dnsrr/fog.rb provider, retrieving another resource (in this case the dnszone a given dnsrr depends on, to find how to access its zone through fog).

Conclusion

We have just scratched the surface of the provider/type system (if you read everything you might disagree, though).

For instance we didn’t review the parsed file provider which is a beast in itself (the Pro Puppet book has a section about it if you want to learn how it works, the Puppet core host type is also a parsed file provider if you need a reference).

Anyway make sure to read the Puppet core code if you want to know more :) feel free to ask questions about Puppet on the puppet-dev mailing list or on the #puppet-dev irc channel on freenode, where you’ll find me under the masterzen nick.

And finally, expect a little bit of time before the next episode, which will certainly cover the Indirector and how to add new terminii (but I first need to find an example, so suggestions are welcome).

Puppet Extension Points - Part 1


It's been a long time since my last blog post, almost a year. Not that I stopped hacking on Puppet or other things (even though I'm not as productive as I have been in the past); it's just that so many things happened last year (the Memoir'44 release, architecture work at Days of Wonder) that I lost the motivation to maintain this blog.

But that's over: I plan to start a series of Puppet internals articles. The first one (yes, this one) is devoted to Puppet Extension Points.

For a long time, Puppet has contained a system to dynamically load ruby fragments providing new functionality to both the client and the master. Among the available extension points you'll find:

  • manifest (parser) functions
  • custom facts
  • types and providers
  • faces

Moreover, Puppet contains a synchronization mechanism that allows you to ship your extensions in your modules; those will be replicated automatically to the clients. This system is called pluginsync.

This first article will dive into the ruby metaprogramming used to create (some of) the extension DSLs (not to be confused with the Puppet DSL, which is the language used in the manifests). We'll talk a lot about DSLs and ruby metaprogramming. If you want to know more on those two topics, I urge you to read these books:

Anatomy of a simple extension

Let’s start with the simplest form of extension: Parser Functions.

Functions are extensions of the Puppet Parser, the entity that reads and analyzes the puppet DSL (ie the manifests). This language contains a structure which is called “function”. You already use them a lot, for instance “include” or “template” are functions.

When the parser analyzes a given manifest, it detects the use of functions, and later on during the compilation phase the function code is executed and the result may be injected back into the compilation.

Here is a simple function:
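The embedded snippet isn't reproduced here, but it looks like the following (the doc string is an approximation; the line numbers referenced in the walkthrough below refer to the original snippet and may not line up exactly):

module Puppet::Parser::Functions
  newfunction(:basename,
              :type => :rvalue,
              :doc  => "Returns the basename of a path,
                        like the basename shell command.") do |args|
    raise Puppet::ParseError, "basename(): expects one argument" if args.length != 1
    path = args[0]
    # strip any leading directories, keep the final path component
    File.basename(path)
  end
end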

The given function uses the puppet function DSL to load the extension code into Puppet core code. This function is simple and does what its shell equivalent basename does: stripping the leading path from a given filename. For this function to work you need to drop it in the lib/puppet/parser/functions directory of your module. Why is that? Because, after all, extensions are written in ruby and integrate into the Puppet ruby namespace. Functions in puppet live in the Puppet::Parser::Functions class, which itself belongs to the Puppet namespace.

The Puppet::Parser::Functions class in Puppet core has the task of loading all functions defined in any puppet/parser/functions directory it can find in the whole ruby load path. When Puppet uses a module, the module's lib directory is automatically added to the ruby load path. Later on, when parsing manifests and a function call is detected, Puppet::Parser::Functions will try to load all the ruby files in all the puppet/parser/functions directories available in the ruby load path. This last task is done by the Puppet autoloader (available in Puppet::Util::Autoload). Let's see how the above code is formed:

  • Line 1: this is the ruby way to say that this file belongs to the puppet function namespace, so that Puppet::Parser::Functions will be able to load it. In reality, we're reopening the ruby class Puppet::Parser::Functions, and everything that follows applies to this specific puppet class.

  • Line 2: this is where ruby meta-programming is used. Translated to standard ruby, we’re just calling the “newfunction” method. Since we’re in the Puppet::Parser::Functions class, we in fact are just calling the class method Puppet::Parser::Functions#newfunction.

We pass to it 4 arguments:

  • the function name, encoded as a symbol. Function names should be unique in a given environment
  • the function type: either your function is an rvalue (meaning a right-value, an entity that lies on the right side of an assignment operation; in plain English, a function that returns a value), or it is not (in which case the function is just a side-effect function not returning any value).
  • a documentation string (here we used a ruby heredoc) which might be extracted later.
  • and finally we're passing a ruby code block (from the do on line 5 to the inner end on line 10). This code block won't be executed when puppet loads the function.

  • Lines 5 to 10: the body of the method. When ruby loads the function file on behalf of Puppet, it happily passes the code block to newfunction. The latter stores the code block for later use, and makes it available in the Puppet scope class under the name function_basename (that's one of the cool things about ruby: you can arbitrarily create new methods on classes, objects or even instances).

So let’s see what happens when puppet parses and executes the following manifest:

The first thing that happens when compiling manifests is that the Puppet lexer triggers. It reads the manifest content and splits it into tokens that the parser knows. So essentially the above content will be transformed into the following stream of tokens:

The parser, given this input, reduces it to what we call an Abstract Syntax Tree: an in-memory data structure (usually a tree), derived from the language grammar and the stream of tokens, that represents the operations to be executed. In our case this will schematically be parsed as:

In turn, when puppet compiles the manifest (ie executes the above AST), this is equivalent to the following ruby operation:
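Schematically, and assuming the manifest called basename on a made-up path, the compiler ends up invoking something like this on the current Puppet::Parser::Scope instance (2.6-style, where arguments are passed as an array):

  scope.function_basename(["/tmp/test/file.txt"])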

Remember how Puppet::Parser::Functions#newfunction created function_basename? At that time I didn't really tell you the exact truth. In fact newfunction creates the function in an environment-specific object instance (so that functions can't leak from one Puppet environment to another, which was one of the problems of 0.25.x). Any given Puppet scope (scopes are instances of Puppet::Parser::Scope), when constructed, mixes in this environment object, and thus brings to life our shiny function as if it were defined in the scope ruby code itself.

Pluginsync

Let's talk briefly about the way your module extensions are propagated to the clients. So far we've seen that functions live in the master, but some other extension types (like facts or types) essentially live in the client. Since it would be cumbersome for an admin to manually replicate all the extensions to all the clients, Puppet offers pluginsync, a way to distribute this ruby code to the clients. It's part of every puppet agent run, before the agent asks the master for a catalog. The interesting thing (and this happens in a lot of places in Puppet, which always amazes me) is that this pluginsync process uses Puppet itself to perform the synchronization. Puppet is good at synchronizing remotely and recursively a set of files living on the master. So pluginsync just creates a small catalog containing a recursive File resource whose source is the plugins fileserver mount on the master, and whose destination is the current agent's puppet lib directory (which is part of the ruby load path). Then this catalog is evaluated and the Puppet File resource mechanism does its magic and creates all the files locally, or synchronizes them if they differ. Finally, the agent loads all the ruby files it synchronized, registering the various extensions they contain, before asking for its host catalog.

Wants some facts?

The other extension point that you have certainly already encountered is adding custom facts. A fact is simply a key/value tuple (both are strings). But we also usually call "a fact" the code that dynamically produces this tuple. Let's see how it works internally. We'll use the following example custom fact:
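The original snippet isn't reproduced here; the fact below is a stand-in of the same shape, i.e. a shell-based fact that wouldn't work on Windows (the fact name and command are hypothetical, and the line numbers in the explanation that follows refer to the original example):

Facter.add(:rootfs_device) do
  setcode do
    # shell out to find the device backing /; obviously not Windows-friendly
    out = Facter::Util::Resolution.exec("df -P / | tail -1")
    out.split.first if out
  end
end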





It's no secret that Puppet uses Facter a lot. When a puppet agent wants a catalog, the first thing it does is ask Facter for a set of facts pertaining to the current machine. Those facts are then sent to the master when the agent asks for a catalog. The master injects those facts as variables in the root scope when compiling the manifests.

So, facts are executed on the agent. They are pluginsync'ed as explained above, then loaded into the running process.

When that happens, the add method of the Facter class is called. The block defined between lines 2 and 6 is then executed in the Facter::Util::Resolution context. So the Facter::Util::Resolution#setcode method is called and the block between lines 3 and 5 is stored for later use.

This Facter::Util::Resolution instance holding our fact code will be in turn stored in the facts collection under the name of the fact (see line 2).

Why is it done this way? Because not all facts can run on every host. For instance, our fact above does not work on the Windows platform. So we should use facter's way of confining facts to the platforms on which we know they'll work. Thus Facter defines a set of methods like confine that can be called during the Facter#add call (just add those outside of the setcode block). Those methods modify how the facts collection will be executed later on. It wouldn't have been possible to confine our facts if we had stored the whole Facter#add block and called it directly at fact resolution time, hence this two-step system.

Conclusion

And, that’s all folks for the moment. Next episode will explain types and providers inner workings. I also plan an episode about other Puppet internals, like the parser, catalog evaluation, and/or the indirector system.

Tell me (through comments here or through my twitter handle @masterzen) if you're interested in this kind of Puppet stuff, or if there are any specific topics you'd like me to cover :)

Puppet SSL Explained


The puppet-users list and the #puppet freenode irc channel are full of questions from people struggling with the puppet SSL PKI. To my despair, there are also people wanting to completely get rid of any security.

While I don't advocate the "live happy, live without security" motto of some puppet users (and I really think a corporate firewall is only one layer of defense among many, not the ultimate one), I hope this blog post will help them shoot themselves in the foot :)

I really think SSL and the X509 PKI are simple once you grasp their underlying concepts. If you want to know more about SSL, I really think everybody should read Eric Rescorla's excellent "SSL and TLS: Designing and Building Secure Systems".

I myself had to deal with SSL internals and the X509 PKI while implementing a secure network protocol in java in a previous life, including a cryptographic library.

Purpose of Puppet SSL PKI

The current puppet security layer has 3 aims:

  1. authenticate any node to the master (so that no rogue node can get a catalog from your master)
  2. authenticate the master on any node (so that your nodes are not tricked into getting a catalog from a rogue master).
  3. prevent communication eavesdropping between master and nodes (so that no rogue users can grab configuration secrets by listening to your traffic, which is useful in the cloud)

A notion of PKI

PKI means Public Key Infrastructure. But what is that?

A PKI is a computer security framework that allows authentication of individual components based on public key cryptography. The best-known system is x509, which is used to protect the current web.

A public key cryptographic system works like this:

  • every component of the system has a secret key (known as the private key) and a public key (this one can be shared with other participants of the system). The public and private keys are usually bound by a cryptographic algorithm.
  • authentication of any component is done with a simple process: a component signs a message with its own private key. The receiver can authenticate the message (ie know the message comes from the original component) by validating the signature. To do this, only the public key is needed.

There are different public/private key pair cryptosystems; the best-known ones are RSA, DSA and those based on Elliptic Curve cryptography.

Usually it is not practical for all participants of the system to have to know each other in order to communicate. So most current PKI systems use a hierarchical validation system, where each participant only needs to know one of the parents in the hierarchy to be able to validate the others.

X509 PKI

X509 is an ITU-T PKI standard. It is the basis of the SSL protocol authentication that puppet uses. This standard specifies certificates, certificate revocation lists, authorities and so on…

A given X509 certificate contains several pieces of information, such as:

  • Serial number (which is unique for a given CA)
  • Issuer (who created this certificate, in puppet this is the CA)
  • Subject (who this certificate represents, in puppet this is the node certname or fqdn)
  • Validity (valid from, expiration date)
  • Public key (and what kind of public key algorithm has been used)
  • Various extensions (usually what this certificate can be used for,…)

You can check RFC1422 for more details.

The certificate is usually the DER encoding of the ASN.1 representation of this information, and is usually stored as PEM for consumption.

A given X509 certificate is signed by what we call a Certificate Authority (CA for short). A CA is an infrastructure that can sign new certificates. Anyone holding the public key of the CA can verify that a given certificate has been signed by that CA.

Usually an X509 certificate embeds an RSA public key with an exponent of 0x10001 (see below). Along with a certificate, you need a private key (usually also PEM-encoded).

So basically the X509 system works on the following principle: CAs use their own private keys to sign component certificates, and it is the CA's role to sign only trusted components' certificates. The trust is usually established out-of-band of the signing request.

Then every component in the system knows the CA certificate (ie its public key). When a component gets a message from another component, it checks the signature of the attached certificate against the CA certificate. If that validates, the component is authenticated. Of course the component should also check the certificate validity period, whether the certificate has been revoked (through OCSP or a given CRL), and finally that the certificate subject matches who the peer pretends to be (usually this is a hostname validation against some part of the certificate Subject).

RSA system

Most X509 certificates are based on the RSA cryptosystem, so let's see what it is.

The RSA cryptosystem is a public key pair system that works like this:

Key Generation

To generate an RSA key, we choose two prime numbers p and q.

We compute n=pq. We call n the modulus.

We compute φ(pq) = (p − 1)(q − 1).

We choose e so that e > 1 and e < φ(pq) (e and φ(pq) must be coprime). e is called the public exponent. It usually is 0x10001 (65537) because that greatly simplifies the later computations (and you know what I mean if you've already implemented this :)).

Finally we compute d = e^-1 mod φ(pq), that is, d is the modular inverse of e modulo (p-1)(q-1). This will be our secret exponent. Note that it is not possible to compute d from e and n alone (and since p and q are never disclosed, this works).

In the end:

  • e and n form the public key
  • d is our private key

Encryption

So the usual actors when describing cryptosystems are Alice and Bob. Let’s use them.

Alice wants to send a message M to Bob. Alice knows Bob's public key (e, n). She transforms M into a number m < n (this is called padding), then she computes: c = m^e mod n

Decryption

When Bob wants to decrypt the message, he computes with his private key d: m = c^d mod n

Signing message

Now suppose Alice wants to sign a message for Bob. She first computes a hash of her message called H, then computes: s = H^d mod n. So she uses her own private key. She sends both the message and the signature s.

Bob then receives the message, computes H from it, and computes h' = s^e mod n with Alice's public key. If h' = H, then only Alice could have sent it.
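You don't have to implement any of this by hand, of course. For instance, in ruby the whole generate/sign/verify cycle looks like this with the OpenSSL library (a standalone illustration, unrelated to the puppet code):

  require 'openssl'

  key     = OpenSSL::PKey::RSA.new(2048)   # picks p and q, computes n and d; e defaults to 65537
  message = "hello Bob"

  signature = key.sign(OpenSSL::Digest::SHA256.new, message)                     # roughly s = H^d mod n (plus padding)
  puts key.public_key.verify(OpenSSL::Digest::SHA256.new, signature, message)    # checks s^e mod n against H => true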

Security

What makes this scheme work is the fundamental fact that finding p and q from n is a hard problem (understand: for big values of n, it would take far longer than the validity period of the message). This operation is called factorization. Current certificates use 2048-bit moduli, which is roughly a 617-digit number to factor.

Want to know more?

Then there are a couple of books really worth reading:

How does this fit in SSL?

So SSL (which BTW means Secure Sockets Layer), and now TLS (SSL's successor), is a protocol that aims to secure communications between two peers. It sits above the transport protocol (usually TCP/IP) in the OSI model. It does this by using symmetric encryption and message authentication codes (MAC for short). The standard is (now) described in RFC5246.

It works by first performing a handshake between the peers; all the remaining communications are then encrypted and tamperproof.

This handshake contains several phases (some are optional):

  1. The client and server agree on the best encryption scheme and MAC from the list supported by both (in fact the server chooses).
  2. The server then sends its certificate and any intermediate CA that the client might need
  3. The server may ask for the client certificate. The client may send its certificate.
  4. Both peers may validate those certificates (against a common CA, from the CRL, etc…)
  5. They then generate the session keys: the client generates a random number and encrypts it with the server public key, so only the server can decrypt it. From this random number, both peers derive the symmetric keys that will be used for encryption and decryption.
  6. The client may send a signed digest of the previous handshake messages. This way the server can verify the client knows its private key (this is client validation). This phase is optional.

After that, each message is encrypted with the generated session keys using a symmetric cipher, and validated with an agreed on MAC. Usual symmetric ciphers range from RC4 to AES. A symmetric cipher is used because those are usually way faster than any asymmetric systems.

Application to Puppet

Puppet defines its own Certificate Authority, which usually runs on the master (it is possible to run a CA-only server, for instance if you have more than one master).

This CA can be used to:

  • generate a new certificate for a given client out-of-band
  • sign the certificate of a new node that just sent its Certificate Signing Request
  • revoke any signed certificate
  • display certificate fingerprints

What is important to understand is the following:

  • Every node knows the CA certificate. This allows a node to check the validity of the master
  • The master doesn't need to store node certificates, since each one is sent by the client when connecting. It just needs to make sure the client knows the matching private key and that the certificate has been signed by the master CA.

It is also important to understand that when your master is running behind an Apache proxy (for Passenger setups) or Nginx proxy (ie some mongrel setups):

  • The proxy is the SSL endpoint. It does all the validation and authentication of the node.
  • Traffic between the proxy and the master happens in clear
  • The master knows the client has been authenticated because the proxy adds an HTTP header that says so (usually X-Client-Verify for Apache/Passenger).

When running with webrick, webrick runs inside the puppetmaster process and does all this internally. Webrick tells the master internally if the node is authenticated or not.

When the master starts for the 1st time, it generates its own CA certificate and private key, initializes the CRL and generates a special certificate which I will call the server certificate. This certificate will be the one used in the SSL/TLS communication as the server certificate that is later sent to the client. This certificate subject will be the current master FQDN. If your master is also a client of itself (ie it runs a puppet agent), I recommend using this certificate as the client certificate.

The most important thing is that this server certificate advertises the following extension:

X509v3 Subject Alternative Name:
                DNS:puppet, DNS:$fqdn, DNS:puppet.$domain

What this means is that this certificate will validate if the connection endpoint using it has any name matching puppet, the current fqdn or puppet in the current domain.

By default a client tries to connect to the “puppet” host (this can be changed with --server, which I don't recommend as it is usually the source of most SSL trouble).

If your DNS system is well-behaved, the client will connect to puppet.$domain. If your DNS contains a CNAME from puppet to your real master fqdn, then when the client validates the server certificate it will succeed, because it will compare “puppet” to one of those DNS: entries in the aforementioned certificate extension. BTW, if you need to change this list, you can use the --certdnsname option (note: this can be done afterward, but requires re-generating the server certificate).

The whole client process is the following:

  1. if the client runs for the 1st time, it generates a Certificate Signing Request and a private key. The former is essentially the node's public key and identity, signed with its private key.
  2. the client connects to the master (at this time the client is not authenticated) and sends its CSR; it also receives the CA certificate and the CRL in return.
  3. the master stores the CSR locally
  4. the administrator checks the CSR and can eventually sign it (this process can be automated with autosigning). I strongly suggest verifying certificate fingerprints at this stage.
  5. the client then waits for its signed certificate, which the master ultimately sends
  6. All subsequent communications will use this client certificate. Both the master and the client will authenticate each other by virtue of sharing the same CA.

Tips and Tricks

Troubleshooting SSL

Certificate content

First you can check any certificate content with this:

Simulate a SSL connection

You can learn more about an SSL error by simulating a client connection. Log in to the troubled node and:

Check the last line of the output; it should say “Verify return code: 0 (ok)” if both the server and the client authenticated each other. Also check the various information bits to see which certificates were sent. In case of error, you can learn about the failure by looking at the verification error message.

ssldump

Using ssldump or wireshark you can also learn more about SSL issues. For this to work, it is usually necessary to force a simple cipher like RC4 (and ssldump also needs to know the private keys if you want it to decrypt the application data).

Some known issues

Also, in case of SSL trouble, make sure your master isn't using a different $ssldir than the one you think it is. If it is, it's possible your master has regenerated its CA there, in which case no node can connect to it anymore. This can happen if you upgrade a master from gem when it was first installed with a package (or the reverse).

If you regenerate a host but forget to remove its cert from the CA (with puppetca --clean), the master will refuse to sign it. If for any reason you need to fully re-install a given node without changing its fqdn, either reuse the previous certificate or clean this node's certificate (which will automatically revoke the certificate for your own security).

Looking at the CRL content:

Notice how the certificate serial number 3 has been revoked.

Fingerprinting

Since puppet 2.6.0, it is possible to fingerprint certificates. If you manually sign your nodes, it is important to make sure you are signing the correct node and not a rogue system pretending to be a genuine node. To do this you can get the certificate fingerprint of a node by running puppet agent --fingerprint, and when listing the various CSRs on the master, you can make sure both fingerprints match.

Dirty Trick

Earlier I said that when running with a reverse proxy in front of Puppet, the proxy is the SSL endpoint and it propagates the authentication status to Puppet.

I strongly recommend against implementing the following; it will compromise your setup's security.

This can be used to severely weaken Puppet's security. For instance you can:

  • make it so that every node is authenticated for the server by always returning the correct header
  • make it so that nodes are authenticated based on their IP address or fqdn

You can even combine this with a mono-certificate deployment. The idea is that every node shares the same certificate. This can be useful when you need to provision tons of short-lived nodes. Just generate a certificate on your master:

You can then use the generated certificate (which will end up in /var/lib/puppet/ssl/certs and /var/lib/puppet/ssl/private_keys) in a pre-canned $ssldir, provided you rename it to the local fqdn (or symlink it). Since this certificate is already signed by the CA, it is valid. The only remaining issue is that the master will serve the catalog of this certificate's certname. I proposed a patch to fix this, which will be part of 2.6.3; with it, the master will serve the catalog of the connecting node and not of the connecting certname. Of course you need a relaxed auth.conf:

Caveat: I didn’t try, but it should work. YMMV :)

Of course, if you follow this and shoot yourself in the foot, I can't be held responsible in any way; you have been warned. Think twice and maybe thrice before implementing this.

Multiple CA or reusing an existing CA

This goes beyond the scope of this blog post, and I must admit I never tried it. Please refer to: Managing Multiple Certificate Authorities and Puppet Scalability

Conclusion

If there is one: security is necessary when dealing with configuration management. We don’t want any node to trust rogue masters, we don’t want masters to distribute sensitive configuration data to rogue nodes. We even don’t want a rogue user sharing the same network to read the configuration traffic. Now that you fully understand SSL, and the X509 PKI, I’m sure you’ll be able to design some clever attacks against a Puppet setup :)

Benchmarking Puppetmaster Stacks

| Comments

It’s been a long time since my last Puppet blog post about file content offloading. Two PuppetCamps have even passed since (more on the last one in a future blog article). A new major Puppet release (2.6) also came out, addressing lots of performance issues (including the file streaming patch I contributed).

In this new major version, I contributed a new third-party executable (available in the ext/ directory of the source tree) that simulates concurrent nodes hammering a puppetmaster. This tool is called puppet-load.

Rationale

I created this tool for several reasons:

  • I wanted to be able to benchmark and compare several ruby interpreters (like comparing JRuby against MRI)
  • I wanted to be able to benchmark and compare several deployment solutions (like Passenger against Mongrel)

There was already a testing tool (called puppet-test) that could do that. Unfortunately puppet-test had the following issues:

  • No REST support besides some never-merged patches I contributed, which renders it moot for testing 0.25 or 2.6 :(
  • based on a forking process model, so simulating many clients is not resource friendly
  • it consumes the master response and fully unserializes it into Puppet internal objects, which takes plenty of RAM and time, penalizing concurrency
  • no useful metrics, except the time the operation took (which in my tests was mostly dominated by the unserialization of the response)

Based on those issues, I crafted from scratch a tool that:

  • is able to impose a high concurrency on a puppetmaster, because it is based on EventMachine (no threads or processes are harmed in this program)
  • is lightweight because it doesn’t consume puppet responses
  • is able to gather some (useful or not) metrics and aggregate them

Caveats

For the moment, puppet-load is still very new and only supports catalog compilations for a single node (even though it simulates many clients in parallel requesting this catalog). I just released a patch to support multiple node catalogs. I also plan to support file sourcing in the future.

So far, since puppet-load exercises a puppetmaster so hard, achieving concurrency levels nobody has seen on production puppetmasters, we were able to find and fix half a dozen threading race conditions in the Puppet code (some have been fixed in 2.6.1 and 2.6.2, the others will be fixed soon).

Usage

The first thing to do is to generate a certificate and its accompanying private key:
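
For instance (the certname is arbitrary, it just has to be signed by your CA):

puppetca --generate puppet-load.daysofwonder.com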

Then modify your auth.conf (or create one if you don’t have one) to allow puppet-load to compile catalogs. Unfortunately, until #5020 is merged, the puppetmaster will use the client certname as the node to compile instead of the given URI. Let’s pretend your master has patch #5020 applied (it is a one-liner).
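
A possible auth.conf stanza, assuming the certname generated above:

# each node can still fetch its own catalog, and puppet-load can fetch anyone's
path ~ ^/catalog/([^/]+)$
auth yes
allow $1, puppet-load.daysofwonder.com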

Next, we need the facts of the client we’ll simulate. Puppet-load will overwrite the ‘fqdn’, ‘hostname’ and ‘domain’ facts with values inferred from the current node name.
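
One way to get them, assuming the master already has cached facts for a real node under its $vardir (path and node name are illustrative):

cp /var/lib/puppet/yaml/facts/web01.daysofwonder.com.yaml .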

Then launch puppet-load against a puppet master:

If we try with a higher concurrency (here my master is running under WEBrick with a one-resource catalog, so compilations are extremely fast):

It returns a bunch of information. First, if you run it in debug mode, it prints when each simulated client starts (up to the given concurrency) and when it receives the response.

Then it displays some important information:

  • availability %: the percentage of non-error responses received
  • min and max request time
  • average and median request time (this can be used to see if the master served clients in a fair way)
  • real concurrency: how many clients the master was able to serve in parallel
  • transaction rate: how many compilations per second the master was able to perform (I expect this number to vary as a function of the applied concurrency)
  • various transfer metrics like throughput and catalog size transferred: this can be useful to understand the amount of information transferred to every client (hint: puppet 2.6 and puppet-load both support HTTP compression)

At the last PuppetCamp, Jason Wright from Google briefly talked about puppet-load (thanks Jason!). It apparently already helped him diagnose performance issues in his external node classifier tool.

If you also use puppet-load, and/or have ideas on how to improve it, please let me know! If you have interesting results to share like comparison of various puppet master stacks, let me know!

The Definitive Recipe for Wordpress Gengo to WPML Conversion

| Comments

The Days of Wonder News Center runs Wordpress, which until a couple of days ago used Gengo for multilingual content. Back when we started using Wordpress for our news, we wanted to be able to publish it in three (and maybe more) languages.

At that time (in 2007, Wordpress 2.3), only Gengo was available. Over the last few years, Gengo unfortunately stopped being maintained, which made it difficult to upgrade Wordpress to new versions.

Recently we took the decision to upgrade our Wordpress installation, and at the same time ditch Gengo and start over using WPML, which is actively maintained (and looks superior to Gengo).

So I started thinking about the conversion, then looked on the web and found how to convert posts, with the help of these two blog posts:

Those two posts were invaluable for the conversion of posts, but unfortunately nobody solved the conversion of translated categories… until I did :)

So here is the most complete recipe to convert from Gengo 2.5 to WPML 1.8, with updated and working SQL queries.

Pre-requisites

You might want to stop the traffic to your blog during all this procedure. One way to do that is to return an HTTP error code 503 by modifying your Apache/Nginx/Whatever configuration.

  1. Log in as an administrator in the Wordpress back-end, and deactivate Gengo.
  2. Install WPML 1.8, and activate it to create the necessary tables. I had to massage WPML a little bit to let it create the tables, YMMV.
  3. In the WPML settings, define the same languages as in Gengo (in my case English (primary), French and German)
  4. Finish the WPML configuration.
  5. If you had a define(WP_LANG,…) in your wordpress config, get rid of it.

Converting Posts

Connect to your MySQL server and issue the following revised SQL queries (thanks to the above blog posts for them):

Converting Pages

This is the same procedure, except we track ‘post_page’ instead of ‘post_post’:

Category conversion

This part is a little bit tricky. In Gengo, we translated the categories without creating new ones, but in WPML we have to create new categories that are translations of a primary category. To do this, I created the following SQL procedure that simplifies the creation of a translated category:
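
The original procedure isn’t reproduced here, but a minimal sketch could look like the following. It assumes stock WordPress table names, WPML 1.8’s wp_icl_translations schema, English as the primary language, and that WPML already created an 'en' row (and thus a trid) for each original category when it was activated; the procedure name and parameters are mine:

DELIMITER //
CREATE PROCEDURE add_translated_category(
    IN orig_term_id BIGINT,        -- term_id of the English category
    IN trans_name   VARCHAR(200),  -- translated category name
    IN trans_slug   VARCHAR(200),  -- translated category slug
    IN lang         VARCHAR(7)     -- 'fr', 'de', ...
)
BEGIN
    DECLARE new_term_id BIGINT;
    DECLARE new_tt_id   BIGINT;
    DECLARE orig_trid   BIGINT;

    -- create the translated term and its taxonomy row
    INSERT INTO wp_terms (name, slug) VALUES (trans_name, trans_slug);
    SET new_term_id = LAST_INSERT_ID();
    INSERT INTO wp_term_taxonomy (term_id, taxonomy, description, parent, count)
        VALUES (new_term_id, 'category', '', 0, 0);
    SET new_tt_id = LAST_INSERT_ID();

    -- find the translation group of the original category
    -- (WPML keys taxonomy translations on term_taxonomy_id)
    SELECT trid INTO orig_trid
      FROM wp_icl_translations
     WHERE element_type = 'tax_category'
       AND element_id = (SELECT term_taxonomy_id
                           FROM wp_term_taxonomy
                          WHERE term_id = orig_term_id
                            AND taxonomy = 'category');

    -- register the new term as a translation of the original
    INSERT INTO wp_icl_translations
        (element_type, element_id, trid, language_code, source_language_code)
    VALUES ('tax_category', new_tt_id, orig_trid, lang, 'en');
END //
DELIMITER ;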

Then we need to create translated categories with this procedure (this can be done with the Wordpress admin interface, but if you have many categories it is simpler to do this with a bunch of SQL statements):
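
For example, with the sketch above (the term_id and the names are made up):

CALL add_translated_category(3, 'Actualités', 'actualites', 'fr');
CALL add_translated_category(3, 'Neuigkeiten', 'neuigkeiten', 'de');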

Bind translated categories to translated posts

And this is the last step: we need to make sure our post translations have the correct translated categories (for the moment they still use the English primary categories).

To do this, I created the following SQL query:

The query is in two parts. The first one lists all the French translated posts’ IDs, which we then plug into the second query to update the category links.

More Puppet Offloading

| Comments

Puppet really shines at configuration management, but there are some things it is not good at, for instance sourcing large files or managing deep file hierarchies.

Fortunately, most of these efficiency issues will be addressed in a subsequent major version (thanks to some of my patches and other refactorings).

Meanwhile, it is interesting to work around those issues. Since most of us run our masters as part of a more complete stack rather than in isolation, we can leverage the power of this stack to address some of them.

In this article, I’ll present two techniques to help your overloaded masters serve more and more clients.

Offloading file sourcing

I already talked about offloading file sourcing in a previous blog post about Puppet memory consumption. The idea is to prevent our puppetmasters from reading the whole content of files into memory at once in order to serve them. Most puppetmasterd installations out there sit behind an HTTP reverse proxy of some sort (i.e. Apache or Nginx).

The idea is that file serving is an activity that a small static server is better placed to do than Puppet itself (that might change when #3373 is fully addressed). Note: I produced an experimental patch, pending review, to stream Puppet file sourcing on the client side, which this tip doesn’t address.

So I implemented this in Nginx (my favorite HTTP server, of course, but it can be ported to any other webserver quite easily, which is left as an exercise to the reader):

And if you use multiple module paths (for instance to separate common modules from other modules), it is still possible to use this trick with the help of nginx’s try_files directive.

The try_files directive lets nginx try several physical paths (the first matching one is served); if none matches, you can fall back to the generic location that proxies to the master, which certainly knows what to do.
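
A sketch of what this can look like (the module paths and the puppet-production upstream are assumptions, adapt them to your fileserver layout):

location ~ ^/production/file_content/([^/]+)/files/(.+)$ {
    root /etc/puppet;

    # serve everything as raw content
    types { }
    default_type application/x-raw;

    # try each module path in turn, then fall back to the master
    try_files /common-modules/$1/files/$2 /modules/$1/files/$2 @puppetmaster;
}

location @puppetmaster {
    proxy_pass http://puppet-production;
}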

It could be useful to write a small script that generates the nginx configuration from your fileserver.conf and puppet.conf. Since mine is pretty simple, I did it manually.

Optimize Catalog Compilation

The normal process is for Puppet to contact the puppetmaster at a given time interval and ask for a catalog. The catalog is the product of compiling the parsed manifests, into which the node facts are injected. This operation takes some time, depending on the manifest complexity and the server capacity or current load.

Most of the time a host requests a catalog even though the manifests didn’t change at all. In my own infrastructure I rarely change my manifests once a type of host becomes stable (I might make a change every week at most when in production).

Since 0.25, Puppet is fully RESTful; that means that to get a catalog, puppetd contacts the master over its SSL-protected channel and asks for this URL:
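
For a node in the production environment, that URL looks like this (the placeholder is mine):

GET /production/catalog/<node certname>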

In return the puppetmaster responds with a JSON-encoded catalog. The actual compilation of a catalog for one of my largest hosts takes about 4s (excluding storeconfigs). During those 4s one ruby thread inside the master is using the CPU. And this is done once every 30 minutes, even if the manifests don’t change.

What if we could compile only when something changes? This would really free our masters!

Since Puppet uses HTTP, it is easy to put an HTTP cache in front of our master to cache the catalog the first time it is compiled and serve the cached copy on subsequent requests.

Although we could do this with any HTTP cache (e.g. Varnish), it is really easy to add with Nginx (which is already running in my own stack):
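
A sketch of such a configuration (the zone name, cache path, sizes and TTL are my choices, tune them to your needs):

# in the http block
proxy_cache_path /var/cache/nginx/puppet levels=1:2 keys_zone=puppetcache:10m max_size=256m;

# in the SSL server block proxying to the master
location ~ ^/[^/]+/catalog/ {
    proxy_pass          http://puppet-production;
    proxy_set_header    X-Client-Verify  $ssl_client_verify;
    proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
    proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;

    proxy_cache         puppetcache;
    # puppet sends no caching headers, so impose a validity period ourselves
    proxy_cache_valid   200 30m;
    # cache per node URI only; the facts passed as query arguments are
    # deliberately ignored (see the caveats below)
    proxy_cache_key     "$uri";
}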

Puppet currently doesn’t return any HTTP caching headers (i.e. Cache-Control or Expires), so we use nginx’s ability to cache despite that (see proxy_cache_valid). Of course, I have a custom Puppet branch that introduces a new parameter called --catalog_ttl which allows Puppet to set those cache headers.

One thing to note is that the cache expiration won’t coincide with when you change your manifests, so we need some way to purge the cache when deploying new manifests.

With Nginx this can be done with:
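
One blunt way, assuming the cache path used in the sketch above (the third-party ngx_cache_purge module is a cleaner alternative):

# wipe the on-disk cache; nginx will simply re-fetch catalogs on the next requests
rm -rf /var/cache/nginx/puppet/*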

It’s easy to add one of those methods to an svn hook or git post-receive hook so that deploying manifests actually purges the cache.

Note: I think that ReductiveLabs has some plans to add catalog compilation caching directly to Puppet (which would make sense). This method is the way to go until that feature gets added to Puppet. I have no doubt that caching inside Puppet will be much better than outside caching, mainly because Puppet would be able to expire the cache when the manifests change.

There are a few caveats to note:

  • any host with a valid certificate can request another host’s cached catalog, unlike with a normal puppetmaster which makes sure to serve catalogs only to the correct host. This can be a problem for some configurations
  • if your manifests rely on “dynamic” facts (like uptime or free memory), obviously you shouldn’t cache the catalog at all.
  • the above nginx configuration doesn’t include the facts as part of the cache key. That means the catalog won’t be re-generated when facts change and the cached catalog will always be served. If that’s an issue, you need to purge the cache when the host itself changes.

I should also mention that caching is certainly not a panacea for reducing the master load.

Some other people are using clever methods to smooth out master load. One notable example is the MCollective puppet scheduler that R.I. Pienaar has written. In essence, he wrote a puppet run scheduler on top of MCollective that schedules puppet runs (triggered through MCollective) when the master load is appropriate. This allows for the best use of the host running the master.

If you also have some tricks or tips for running puppet, do not hesitate to contact me (I’m masterzen on freenode’s #puppet or @masterzen on twitter).

Puppet Memory Usage - Not a Fatality

| Comments

As every reader of this blog certainly knows, I’m a big fan of Puppet, using it in production on Days of Wonder servers, to the point that I used to regularly contribute bug fixes and new features (not that I’ve stopped, it’s just that my spare time is a scarce resource nowadays).

Still, I think there are some issues in terms of scalability or resource consumption (CPU or memory) for which we can find workarounds or even fixes. Those issues are not a symptom of bad programming or bad design; most of them come either from ruby itself or from some random library issue.

Let’s review the things I have been thinking about lately.

Memory consumption

This is by far one of the most commonly seen issues, both on the client side and the server side. I’ve mainly seen this problem on the client side, to the point that most people recommend running puppetd as a cron job instead of as a long-lived process.

Ruby allocator

It all boils down to the ruby allocator (at least in the MRI 1.8.x version). This is the part of the ruby interpreter that deals with memory allocation. Like in many dynamic languages, the allocator manages a memory pool called a heap. And like in some other languages (among them Java), this heap can never shrink and only grows when more memory is needed. It is done this way because it is simpler and way faster. Usually applications end up using their nominal amount of memory, no more memory has to be allocated by the kernel to the process, and that gives faster applications.

The problem is that if the application transiently needs a large amount of memory that will be thrown away a couple of milliseconds later, the process pays this penalty for the rest of its life, even though, say, 80% of the memory used by the process is free but never reclaimed by the OS.

And it gets even worse. When the ruby interpreter grows the heap, instead of allocating byte by byte (which would be really slow), it does so in chunks. The whole question is: what is the proper size of a chunk?

In the default implementation of MRI 1.8.x, a chunk is the size of the previous heap times 1.8. That means at worst a ruby process might end up allocating 1.8 times more than what it really needs at a given time. (This is a gross simplification, read the code if you want to know more).

Yes but what happens in Puppet?

So how does it apply to puppetd?

It’s easy: puppetd uses memory for two things (besides maintaining some core data to be able to run):

  1. the catalog (which contains all resources, along with all templates) as shipped by the puppetmaster (i.e. serialized) and live as ruby objects.
  2. the content of the sourced files (one at a time, so it’s the biggest transmitted file that sets the high-water mark for puppetd). Of course this is still better than in 0.24, where the content was transmitted encoded in XMLRPC, adding the penalty of escaping everything…

Hopefully, nobody distributes large files with Puppet :-) If you’re tempted to do so, see below…

But again there’s more, as Peter Meier (known as duritong in the community) discovered a couple of months ago: when puppetd gets its catalog (which by the way is transmitted as JSON nowadays), it also stores it in a local cache so that it can run if it can’t contact the master for a subsequent run. This is done by unserializing the catalog from JSON into live ruby objects, and then serializing the latter to YAML. Besides the evident loss of time doing that on a large catalog, YAML is a real memory hog. Peter’s experience showed that about 200MB of the live memory his puppetd process was using came from this final serialization!

So I had the following idea: why not store the serialized version of the catalog (the JSON one), since we already have it in serialized form when we receive it from the master (it’s a little bit more complex than that, of course)? This way there is no need to serialize it again to YAML. This is what ticket #2892 is all about. Luke is committed to having this enhancement in Rowlf, so there’s good hope!

Some puppet solutions?

So what can we do to help Puppet not consume that much memory?

In theory we could play on several factors:

  • Transmit smaller catalogs. For instance get rid of all those templates you love (ok that’s not a solution)
  • Stream the serialization/deserialization with something like Yajl-Ruby
  • Use another ruby interpreter with a better allocator (like for instance JRuby)
  • Use a different constant for resizing the heap (i.e. replace the 1.8 by 1.0 or less on line 410 of gc.c). This can be done easily when using the Rails machine GC patches or Ruby Enterprise Edition, in which case setting the environment variable RUBY_HEAP_SLOTS_GROWTH_FACTOR is enough (see the sketch after this list). Check the documentation for more information.
  • Stream the sourced files on the server and the client (this way only a small buffer is used, and the total size of the file is never allocated). This one is hard.
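
A minimal sketch of the heap-growth tweak, assuming a Ruby Enterprise Edition (or similarly patched) interpreter; stock MRI 1.8 simply ignores the variable:

# grow the heap linearly instead of geometrically before starting the agent
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1.0
puppetd --verbose --no-daemonize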

Note that the same issues apply to the master too (especially for the file serving part). But it’s usually easier to run a different ruby interpreter (like REE) on the master than on all your clients.

Streaming HTTP requests is promising but would unfortunately require large changes to how Puppet deals with HTTP. Maybe it can be done only for file content requests… This is something I’ll definitely explore.

This file serving issue got me thinking about the following, which I have already discussed several times with Peter…

File serving offloading

One of the missions of the puppetmaster is to serve sourced files to its clients. We saw in the previous section that to do so, the master has to read the file into memory. That’s one reason it is recommended to use a dedicated puppetmaster server acting as a pure fileserver.

But there’s a better way, provided you run puppet behind nginx or apache. Those two proxies are also static file servers: why not leverage what they do best to serve the sourced files and thus offload our puppetmaster?

This has some advantages:

  • it frees lots of resources on the puppetmaster, so that it can serve more catalogs per unit of time
  • the job is done faster and with fewer resources; those static servers were created to spoon-feed our puppet clients…

In fact it was impossible in 0.24.x, but now that file content serving is RESTful it becomes trivial.

Of course, offloading gives its best results if your clients require lots of sourced files that change often, or if you provision lots of new hosts at the same time, because we’re offloading only content, not file metadata. File content is served only if the client doesn’t have the file or if the file checksum on the client is different.

An example is better than thousand words

Imagine we have a standard manifest layout with:

  • some globally sourced files under /etc/puppet/files and
  • some module files under /etc/puppet/modules/<module name>/files.

Here is what the nginx configuration for such a scheme would look like:

server {
    listen 8140;

    ssl                     on;
    ssl_session_timeout     5m;
    ssl_certificate         /var/lib/puppet/ssl/certs/master.pem;
    ssl_certificate_key     /var/lib/puppet/ssl/private_keys/master.pem;
    ssl_client_certificate  /var/lib/puppet/ssl/ca/ca_crt.pem;
    ssl_crl                 /var/lib/puppet/ssl/ca/ca_crl.pem;
    ssl_verify_client       optional;

    root                    /etc/puppet;

    # those locations are for the "production" environment
    # update according to your configuration

    # serve static file for the [files] mountpoint
    location /production/file_content/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        alias /etc/puppet/files/;
    }

    # serve modules files sections
    location ~ /production/file_content/[^/]+/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        root /etc/puppet/modules;
        # rewrite /production/file_content/module/files/file.txt
        # to /module/file.txt
        rewrite ^/production/file_content/([^/]+)/files/(.+)$  $1/$2 break;
    }

    # ask the puppetmaster for everything else
    location / {
        proxy_pass          http://puppet-production;
        proxy_redirect      off;
        proxy_set_header    Host             $host;
        proxy_set_header    X-Real-IP        $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header    X-Client-Verify  $ssl_client_verify;
        proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
        proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;
        proxy_buffer_size   16k;
        proxy_buffers       8 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;
        proxy_read_timeout  65;
    }
}

EDIT: the above configuration was initially missing the only content type that nginx can return for Puppet to actually accept the file content (that is, raw).

I leave the Apache configuration as an exercise to the reader.

It would also be possible to write some ruby/sh/whatever to generate the nginx configuration from the puppet fileserver.conf file.

And that’s all folks, stay tuned for more Puppet (or even different) content.

Mysql-snmp 1.0 - SNMP Monitoring for MySQL

| Comments

I’m really proud to announce the release of the version 1.0 of mysql-snmp.

What is mysql-snmp?

mysql-snmp is a mix between the excellent MySQL Cacti Templates and a Net-SNMP agent. The idea is that combining the power of the MySQL Cacti Templates with any SNMP-based monitoring unleashes a powerful MySQL monitoring system. Of course, this project’s favorite monitoring system is OpenNMS.

mysql-snmp is shipped with the necessary OpenNMS configuration files, but any other SNMP monitoring software can be used (provided you configure it).

To get there, you need to run an SNMP agent on each MySQL server, along with mysql-snmp. Then OpenNMS (or any SNMP monitoring software) will contact it and fetch the various values.

Mysql-snmp exposes a lot of useful values including but not limited to:

  • SHOW STATUS values
  • SHOW ENGINE INNODB STATUS parsed values (MySQL 5.0, 5.1, XtraDB or Innodb plugin are supported)

Here are some graph examples produced with OpenNMS 1.6.5 and mysql-snmp 1.0 on one of Days of Wonder’s MySQL servers (running a MySQL 5.0 Percona build):

(Example graphs: commands, mem, tmp, innodbwrites, graph, tablelocks)

Where to get it

mysql-snmp is available in my GitHub repository. The repository contains a spec file to build an RPM and what is needed to build a Debian package. Refer to the README or the mysql-snmp page for more information.

Thanks to GitHub, it is possible to download a tarball instead of using Git:

Mysql-snmp v1.0 tarball

Changelog

This lists all new features/options from the initial version v0.6:

  • Spec file to build RPM
  • Use of configuration file for storing mysql password
  • Fix of slave handling
  • Fix for mk-heartbeat slave lag
  • Support of InnoDB plugin and Percona XtraDB
  • Automated testing of InnoDB parsing
  • Removed some false positive errors
  • OpenNMS configuration generation from MySQL Cacti Templates core files
  • 64 bits computation done in Perl instead of (ab)using MySQL
  • More InnoDB values (memory, locked tables, …)

Reporting Issues

Please use the GitHub issue system to report any issues.

Requirements

There is a little issue here: mysql-snmp uses Net-SNMP, and not all versions of Net-SNMP are supported, as some older versions have bugs dealing with Counter64 values. Version 5.4.2.1 with this patch is known to work fine.

Also note that this project uses some Counter64 values, so make sure you configure your SNMP monitoring software to use SNMP v2c or v3 (SNMP v1 doesn’t support 64-bit values).

Final words!

I wish everybody a happy new year. Consider this new version my Christmas present to the community :-)

Nginx Upload Progress Module v0.8!

| Comments

Yes, I know… I released v0.7 less than a month ago. But that release was crippled by a crash that could happen at startup or on reload.

Changes

Bonus in this new version, brought to you by Tizoc:

  • JSONP support
  • Long-awaited fix for X-Progress-ID being the last parameter in the request query string

If you wonder what JSONP is (as I did when I got the merge request), you can check the original blog post that led to it.

To activate JSONP you need:

  1. to use the upload_progress_jsonp_output directive in the progress probe location
  2. to declare the JSONP parameter name with the upload_progress_jsonp_parameter directive (see the configuration sketch below)
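
A configuration sketch, assuming an upload location tracked in a zone called "proxied" and a /progress probe location (the zone name, URIs and upstream are mine, the directives are the module’s):

# in the http block: upload_progress proxied 1m;

location /upload {
    proxy_pass http://backend;
    # track this upload in the shared zone
    track_uploads proxied 30s;
}

location ^~ /progress {
    # report the progress of uploads tracked in the zone
    report_uploads proxied;
    # return JSONP instead of the default output
    upload_progress_jsonp_output;
    # name of the GET parameter holding the JSONP callback
    upload_progress_jsonp_parameter callback;
}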

This version has been tested with nginx 0.7.64 and 0.8.30.

How do you get it?

Easy: download the tarball from the nginx upload progress module GitHub repository download section.

If you want to report a bug, please use the Github issue section.