Outline

  • Problem statement - what are we trying to solve
  • Solution space - puppet (with and without master), cfengine etc.
  • Implementation
    • Git and bootstrapping
    • Authentication and monkeysphere
    • Monitoring
    • Case study: Supplying entropy
  • Future work
    • Secret server - maintaining secret information in an open infrastructure
    • Automated configuration of backups
    • Incorporating existing infrastructure
  • Pre-practical: monkeysphere setup. Actual practical: git repository setup.

The problem

The DTG has a large and slowly growing number of servers and services which get configured and then forgotten about until many years later, frequently when the person who originally set them up has left. In that time the server becomes progressively outdated and drops out of OS support, potentially resulting in security issues. Similarly, when things go wrong it is difficult to gain access to the machine, and without any monitoring it can be a long time before we realise that something has gone wrong.

Solution space

A variety of tools let us keep the configuration for all our servers in one place and apply standard software engineering practices such as version control and code reuse. Most of them seem to use Ruby.

One popular solution is puppet. In its default configuration it has a master server which all the nodes talk to: the master compiles the configuration and serves the compiled rules to each node, which executes them. Unfortunately this involves running a Ruby webapp on the master with root privileges over all your nodes, and a root Ruby daemon on every node. The security issues with this are not just theoretical: I have seen four separate critical security vulnerability disclosures since February.

The solution to this is to move to a fully distributed approach using git and ssh, where the configuration for all the nodes gets pushed to each node individually as required and is then automatically compiled and applied on that node. This is achieved by pushing to /etc/puppet-bare, which contains a post-update hook that pushes the config to /etc/puppet and then applies it.
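
As a rough illustration of the hook mechanism described above, here is a minimal sketch of what a post-update hook in /etc/puppet-bare might look like. It is not the real DTG hook: the branch name, the use of git pull from inside /etc/puppet (rather than a push), the run helper and the DRY_RUN switch are all assumptions, and privilege handling (the hook runs as whoever pushed) is left out.

```shell
#!/bin/sh
# Sketch of a post-update hook for /etc/puppet-bare (assumed layout, not the real DTG hook).
BARE_REPO="${BARE_REPO:-/etc/puppet-bare}"   # repository that admins push to
WORK_TREE="${WORK_TREE:-/etc/puppet}"        # checkout that puppet reads
BRANCH="${BRANCH:-master}"                   # assumed branch name

# Run a command, or just print it when DRY_RUN is set (hypothetical helper for testing).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

post_update() {
  # Git exports GIT_DIR while a hook runs; clear it so the commands below
  # operate on the checkout in $WORK_TREE instead of the bare repository.
  unset GIT_DIR
  run cd "$WORK_TREE" || return 1
  run git pull "$BARE_REPO" "$BRANCH" || return 1
  run git submodule update --init || return 1
  run puppet apply --modulepath="$WORK_TREE/modules" "$WORK_TREE/manifests/site.pp"
}

# Git passes the names of the updated refs as arguments, so only act when invoked as a hook.
if [ "$#" -gt 0 ]; then
  post_update
fi
```

Setting DRY_RUN=1 and calling post_update by hand prints the four commands without touching the machine, which is a cheap way to check the hook before trusting it with root.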

Implementation

Our puppet configuration is stored on code.dtg and on github and so can be cloned with git clone git@code.dtg.cl.cam.ac.uk:infrastructure/dtg-puppet. This is a rather complex git project as it uses git submodules, which allow git repositories to be included inside other git repositories, with the outer repository tracking just the commit id of each inner one.

File hierarchy (we follow the standard puppet layout):

  • manifests/
    • nodes/{code.pp,entropy.pp,jenkins-master.pp,monitor.pp,test.pp} These specify the config to use for each node
    • site.pp The entry point into the code and points at nodes/* and globals.pp
    • globals.pp Defines constants
  • modules/
    • dtg/ this is the module that defines dtg specific stuff
      • files/ Contains the static files we want to deploy
        • apache/
        • apt/
        • nexus/
        • raven/
        • sbin/setuserpassword Sets a randomly generated password for the user and emails it to them
        • ssh/
        • ssl/
        • bootstrap.sh Bootstraps the node from a clean OS install with correctly set hostname to being a proper node
        • pre-commit.hook Does a syntax check before commits are allowed to be made
      • lib/puppet/parser/functions/
        • dnsLookup.rb Looks up the IP address at compile time
        • escapeRegexp.rb Escape regular expression
        • random_number.rb Generate a random integer (useful for cron job definitions)
      • manifests/ Definitions of the services
        • apache.pp Raven etc.
        • aptrepository.pp Use UCS apt repositories
        • entropy.pp Entropy server and client
        • firewall.pp Firewall rules
        • git.pp Git server
        • init.pp Entry point for the module
        • jenkins.pp Jenkins server
        • maven.pp Maven repository
        • minimal.pp Defines minimal config for a node
        • tomcat.pp Tomcat, tomcat with raven
        • unattendedupgrades.pp Install security updates automatically
        • user.pp Create local user accounts
      • templates/apt/50unattended-upgrades.erb
    • lots of others, mainly external modules (apt, apache), but also some that we define ourselves in the same repo (such as munin) and some that we have forked (stunnel)

Git and bootstrapping

The process of going from a clean OS install to a configured puppet node can't itself use the puppet configuration. Instead there is a bootstrap.sh file which, when run as root, sets things up correctly: it creates /etc/puppet-bare and /etc/puppet, pulls the configuration from github and runs it (github is publicly visible, whereas code.dtg requires authentication).
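
The bootstrap steps just described could be sketched roughly as follows. This is not the contents of the real bootstrap.sh: the function name bootstrap_plan, the package list and the exact git invocations are assumptions, and the public clone URL has to be supplied by the caller because it is not given here. Installing the post-update hook into /etc/puppet-bare is omitted.

```shell
#!/bin/sh
# Print the commands a bootstrap might run, in order (assumed steps, not the real script).
# $1 = public clone URL of the puppet configuration
bootstrap_plan() {
  config_url="$1"
  cat <<EOF
apt-get install -y git puppet-common
git init --bare /etc/puppet-bare
git clone --recursive '$config_url' /etc/puppet
puppet apply --modulepath=/etc/puppet/modules /etc/puppet/manifests/site.pp
EOF
}
```

Printing the plan rather than running it makes it easy to read before committing to it; as root on a fresh machine it could then be executed with bootstrap_plan URL | sh -e.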

Authentication and monkeysphere

Anyone in the DTG can make changes to dtg-puppet on code.dtg, but only current admins (as defined in globals.pp or elsewhere) can push the config to the machines. Authentication to the machines is done using monkeysphere. Monkeysphere uses the PGP web of trust to authenticate ssh public keys; this allows distributed rather than centralised management of keys and simplifies key revocation and the addition of new keys. It works by having a list of trusted keys which sign the keys of the people who are allowed access to the systems, together with a mapping from username to PGP user id which is used to download the correct keys. The monkeysphere ssh subkey of each PGP key is then used as one of the authorized keys for that user. Users are also issued with randomly generated passwords by email (for use with sudo), but password-based login is disabled: you can generate new keys which will give you access to the machine remotely, so the fallback is not needed in normal usage (if something goes badly wrong we can use a local console via xencenter or turn up in person).
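
To make the username to user id mapping more concrete, here is a small sketch of the server side for one user. It assumes monkeysphere's default location for the mapping (an authorized_user_ids file under the user's home directory); our puppet config may well manage this differently, and the function name and the example user id are made up.

```shell
#!/bin/sh
# Append a PGP user ID to a user's authorized_user_ids file, which
# monkeysphere-authentication reads to decide whose keys to fetch.
# $1 = the user's home directory
# $2 = PGP user ID, e.g. 'Jane Doe <jd@example.org>' (hypothetical)
add_authorized_user_id() {
  ids_file="$1/.monkeysphere/authorized_user_ids"
  mkdir -p "$1/.monkeysphere" || return 1
  # Skip the ID if it is already listed, so repeated runs do not add duplicates.
  grep -qxF -e "$2" "$ids_file" 2>/dev/null || printf '%s\n' "$2" >> "$ids_file"
}

# Afterwards, as root on the node (not run by this sketch):
#   monkeysphere-authentication add-identity-certifier <ADMIN_KEY_ID>
#   monkeysphere-authentication update-users
# and in sshd_config:
#   AuthorizedKeysFile /var/lib/monkeysphere/authorized_keys/%u
#   PasswordAuthentication no
```

The certifier step is what ties this to the web of trust: update-users only writes out ssh keys for user ids that a trusted certifier has signed.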

Monitoring

One of the neat tricks of having all the config in one place is that it is easy to put the config saying 'monitor this service' next to the config saying 'run this service', and then have nagios complain if the service goes away and munin monitor it for performance. This should hopefully solve our present problem of only finding out that something has gone down long after the fact, when one of our users tells us.

Case study: supplying entropy

One problem with doing lots of crypto is that you need lots of good quality entropy, and unfortunately virtual machines find entropy rather hard to come by as they live in rather deterministic worlds. Hence entropy.dtg, which is a Raspberry Pi with a simtec entropy key that uses hardware to produce entropy.
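
To see whether a particular machine is actually short of entropy, the kernel's own estimate can be read from procfs. The sketch below does that; the procfs path is the standard Linux one, but the function name and the threshold of 200 bits are arbitrary choices for illustration.

```shell
#!/bin/sh
# Report the kernel's entropy estimate and flag it when it looks low.
# $1 = file to read (defaults to the Linux procfs entry)
# $2 = threshold in bits (illustrative default)
entropy_status() {
  pool_file="${1:-/proc/sys/kernel/random/entropy_avail}"
  threshold="${2:-200}"
  avail=$(cat "$pool_file") || return 2
  if [ "$avail" -lt "$threshold" ]; then
    echo "low: $avail bits available"
    return 1
  fi
  echo "ok: $avail bits available"
}
```

Running entropy_status with no arguments on a VM before and after it becomes an entropy client gives a quick feel for whether the key is helping.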

The node specific config for entropy.dtg is in manifests/nodes/entropy.pp:

node 'entropy.dtg.cl.cam.ac.uk' { # stuff in this block will only be executed by nodes with a matching hostname
  # We don't have a local mirror of raspbian to point at
  class { 'minimal': manageapt => false, } # this pulls in the minimal config for all servers without using the unix support mirror for apt
  class { 'dtg::entropy::host': # this specifies we want this node to be an entropy host using an entropy key
    certificate => '/root/puppet/ssl/stunnel.pem',
    private_key => '/root/puppet/ssl/stunnel.pem',
    ca          => '/usr/local/share/ssl/cafile',
    stage       => 'entropy-host'
  }
}
if ( $::fqdn == $::nagios_machine_fqdn ) { # Tell nagios to monitor the node's ssh
  nagios_monitor { 'entropy':
    parents    => '',
    address    => 'entropy.dtg.cl.cam.ac.uk',
    hostgroups => [ 'ssh-servers'],
  }
}
if ( $::fqdn == $::munin_machine_fqdn ) { # Tell munin to monitor the node
  munin::gatherer::configure_node { 'entropy': }
}

Via dtg::entropy::host this pulls in config from modules/dtg/manifests/entropy.pp, and via 'minimal' it pulls in modules/dtg/manifests/minimal.pp. The relevant section from entropy.pp:

# Entropy server
# Relies on the ekeyd and stunnel modules
class dtg::entropy {
  class { 'stunnel': stage => $stage, }# This sorts out stunnel being setup sufficiently early on
  file {'/usr/local/share/ssl/':
    ensure => directory,
    mode   => '0755',
    owner  => 'root',
    group  => 'root',
  }
  file {'/usr/local/share/ssl/cafile': # Specify the list of trusted keys
    ensure => file,
    source => 'puppet:///modules/dtg/ssl/cafile',
    mode   => '0644',
    owner  => 'root',
    group  => 'root',
  }
}
class dtg::entropy::host ($certificate, $private_key, $ca, $crl = false){
  if ! defined(Class['dtg::entropy']) {
    fail("Class['dtg::entropy'] must be defined")
  }
  group { 'egd-host': ensure => present, }
  user { 'egd-host':
    gid => 'egd-host',
    comment => 'Entropy Generating Device user',
    ensure => present,
  }
  file { '/var/lib/stunnel4/egd-host':
    ensure => directory,
    owner  => 'egd-host',
    group  => 'egd-host',
    mode   => '0700', # quoted octal, as for the other file resources
    require => Package[$stunnel::data::package],
  }
  stunnel::tun { 'egd-host': # create a SSL tunnel
    cert        => $certificate,
    key         => $private_key,
    cafile      => $ca,
    crlfile     => $crl,
    chroot      => '/var/lib/stunnel4/egd-host',
    pid         => '/egd-host.pid',
    output      => '/egd-host.log',
    user        => 'egd-host',
    group       => 'egd-host',
    services    => {'egd-host' => {accept => '7776'}},
    connect     => '777',
    client      => false,
    protocol    => false,
    require     => User['egd-host'],
  }
  class { 'ekeyd':
    host => true,
    port => '777',
    masterkey => '',
  }
  # Allow connections to 7776
  class { 'dtg::firewall::entropy':}
}

Future work

There is still rather a lot left to do: we have many servers and services which are not yet in the new world, and there are still some new things which need to be developed. Currently backups are rather ad hoc or completely lacking, which is rather worrying and dangerous. Some things (our git and maven repositories) are backed up to NAS by cron jobs running as me to /anfs/dtgscratch/drt24/svr-acr31-code/backups/, but really we want a better solution which is more space efficient and which is integrated into the puppet world.

One service which still needs development is the 'secret server' which serves secrets to the root user on the nodes and so allows them to do things like access the nexus server and use predefined ssh keys.

Practical

Before this talk

To allow everything to run smoothly when we actually get to the practical, I want to set up the monkeysphere-enabled gpg keys in advance. To do that you will need a gpg/pgp key, which may involve creating one. Then you can follow the monkeysphere instructions on setting up a monkeysphere-enabled ssh subkey. Then you can tell me what your GPG key id is and I can sign it and add you to the list of admins. Unfortunately Johnny still can't encrypt, so if you get stuck come and ask me.

Actual practical

Check out the dtg-puppet git repository:

git clone git@dtg-code.cl.cam.ac.uk:infrastructure/dtg-puppet
cd dtg-puppet
git submodule update --init # get the submodules

Install puppet-common:

cl-asuser apt-get install puppet-common

Test the puppet config:

puppet apply --noop --modulepath=modules --node_name_value=test-puppet.dtg.cl.cam.ac.uk manifests/site.pp

You can change 'test-puppet' to one of the other node names, like 'entropy'. Unfortunately this may fail if not run as root, if puppet tries to check whether it could execute a script as a different user. There is a pre-commit hook in modules/dtg/files/pre-commit.hook which you might want to symlink into .git/hooks so that you can't commit code with syntax errors.
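
Symlinking the pre-commit hook can be done along these lines. The function name is made up; the paths are the ones mentioned above, and the hook file needs to be executable for git to run it.

```shell
#!/bin/sh
# Link the repository's pre-commit hook into .git/hooks.
# $1 = path to a dtg-puppet checkout (defaults to the current directory)
install_pre_commit_hook() {
  repo="${1:-.}"
  # The link target is resolved relative to .git/hooks, hence the ../../ prefix.
  ln -sf ../../modules/dtg/files/pre-commit.hook "$repo/.git/hooks/pre-commit"
}
```

Because it is a symlink rather than a copy, later changes to the hook in the repository take effect without reinstalling it.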

You can then push changes you commit to the config out to nodes by adding them as remotes:

git remote add test-puppet CRSID@test-puppet.dtg.cl.cam.ac.uk:/etc/puppet-bare
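
If you look after several nodes, adding the remotes one at a time gets tedious. The sketch below follows the URL pattern of the command above; the function names are made up, and the branch name in the usage note is an assumption.

```shell
#!/bin/sh
# Build the remote URL for a node, following the pattern used above.
# $1 = CRSID, $2 = short node name (e.g. test-puppet)
node_remote_url() {
  echo "$1@$2.dtg.cl.cam.ac.uk:/etc/puppet-bare"
}

# Add one remote per node; run from inside the dtg-puppet checkout.
# $1 = CRSID, remaining arguments = short node names
add_node_remotes() {
  crsid="$1"
  shift
  for node in "$@"; do
    git remote add "$node" "$(node_remote_url "$crsid" "$node")" || return 1
  done
}
```

After add_node_remotes CRSID test-puppet entropy, pushing to a node is something like git push test-puppet master (assuming the config lives on master), at which point the post-update hook on that node takes over.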