Outline
- Problem statement - what are we trying to solve
- Solution space - puppet (with and without master), cfengine etc.
- Implementation
  - Git and bootstrapping
  - Authentication and monkeysphere
  - Monitoring
  - Case study: Supplying entropy
- Future work
  - Secret server - maintaining secret information in an open infrastructure
  - Automated configuration of backups
  - Incorporating existing infrastructure
- Pre-practical monkeysphere setup. Actual practical: git repository setup.
The problem
The DTG has a large and slowly growing number of servers and services which get configured and then forgotten about until many years later, frequently after the person who originally set them up has left. In that time the server becomes progressively outdated and drops out of OS support, potentially resulting in security issues. Similarly, when things go wrong it is difficult to gain access to the machine, and without any monitoring it can be a long time before we realise that something has gone wrong.
Solution space
There are a variety of solutions which allow us to do what we want: keep all the configuration for all the servers in one place and use standard software engineering practices like version control and code reuse. Most of them seem to use Ruby.
One popular solution is puppet. In its default configuration it has a master server which all the nodes talk to. The master compiles the configuration and serves the compiled rules to the node, which executes them. Unfortunately this involves running a Ruby webapp with root privileges on the master and a root Ruby daemon on all the nodes. The security issues with this are not just theoretical - I have seen four separate critical security vulnerability disclosures since February.
The solution to this is to move to a fully distributed approach using git and ssh where the configuration for all the nodes gets pushed to all the nodes individually as required and is then automatically compiled and applied on the node. This is achieved by pushing to /etc/puppet-bare which contains a post-update hook which pushes the config to /etc/puppet and then applies it.
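The hook described above could look roughly like the following. This is a minimal sketch, not the DTG's actual hook: the two paths come from the text, while the `master` branch name, the checkout-based update and the `puppet apply` arguments are assumptions, and submodule handling is left out. It defaults to a dry run which only prints the commands.

```shell
#!/bin/sh
# Hypothetical post-update hook for /etc/puppet-bare (sketch only).
# Paths are from the text; the branch name and puppet arguments are assumptions.
BARE_REPO="${BARE_REPO:-/etc/puppet-bare}"
WORK_TREE="${WORK_TREE:-/etc/puppet}"

# run CMD...: print the command in dry-run mode (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# deploy: update the live tree from the bare repository, then apply it locally.
deploy() {
    run git --git-dir="$BARE_REPO" --work-tree="$WORK_TREE" checkout -f master &&
    run puppet apply --modulepath="$WORK_TREE/modules" "$WORK_TREE/manifests/site.pp"
}

deploy
```

Setting DRY_RUN=0 in the hook's environment would make it execute the two commands instead of printing them.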
Implementation
Our puppet configuration is stored on code.dtg and on github, and so can be cloned with:
git clone git@code.dtg.cl.cam.ac.uk:infrastructure/dtg-puppet
This is a rather complex git project as it uses git submodules, a feature which allows git repositories to be included inside other git repositories, with the outer repository just tracking the commit id of each inner one.
File hierarchy: we follow the standard puppet layout:
- manifests/
  - nodes/{code.pp,entropy.pp,jenkins-master.pp,monitor.pp,test.pp} These specify the config to use for each node
  - site.pp The entry point into the code; points at nodes/* and globals.pp
  - globals.pp Defines constants
- modules/
  - dtg/ This is the module that defines DTG-specific stuff
    - files/ Contains the static files we want to deploy
      - apache/
      - apt/
      - nexus/
      - raven/
      - sbin/setuserpassword Sets a randomly generated password for the user and emails it to them
      - ssh/
      - ssl/
      - bootstrap.sh Bootstraps the node from a clean OS install with correctly set hostname to being a proper node
      - pre-commit.hook Does a syntax check before commits are allowed to be made
    - lib/puppet/parser/functions/
      - dnsLookup.rb Look up the IP address at compile time
      - escapeRegexp.rb Escape a regular expression
      - random_number.rb Generate a random integer (useful for cron job definitions)
    - manifests/ Definitions of the services
      - apache.pp Raven etc.
      - aptrepository.pp Use UCS apt repositories
      - entropy.pp Entropy server and client
      - firewall.pp Firewall rules
      - git.pp Git server
      - init.pp Entry point for the module
      - jenkins.pp Jenkins server
      - maven.pp Maven repository
      - minimal.pp Defines minimal config for a node
      - tomcat.pp Tomcat, tomcat with raven
      - unattendedupgrades.pp Install security updates automatically
      - user.pp Create local user accounts
    - templates/apt/50unattended-upgrades.erb
  - lots of others, mainly external modules (apt, apache) but some that we define ourselves in the same repo like munin and some which we have forked (stunnel)
Git and bootstrapping
The process of going from a clean OS install to a configured puppet node can't use the puppet configuration itself. Instead there is a bootstrap.sh file which, when run as root, sets things up correctly: it creates /etc/puppet-bare and /etc/puppet, pulls the configuration from github and runs it. (It pulls from github because that is publicly visible, whereas code.dtg requires authentication.)
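As an illustration of those steps, here is a hypothetical outline of such a bootstrap script. It is a sketch under stated assumptions and the real bootstrap.sh may differ: the package names, the use of `git clone --recursive` and the `puppet apply` arguments are guesses, and the repository address is a parameter because the text does not give it. By default it only prints what it would do.

```shell
#!/bin/sh
# Hypothetical outline of a bootstrap script; the real bootstrap.sh may differ.

# run CMD...: print the command in dry-run mode (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bootstrap REPO_URL: take a clean install to a node that has applied its config.
bootstrap() {
    repo_url="$1"
    # Assumed package names for git and puppet.
    run apt-get install -y git puppet-common &&
    # The bare repository that later pushes will target.
    run git init --bare /etc/puppet-bare &&
    # The live configuration, including its submodules.
    run git clone --recursive "$repo_url" /etc/puppet &&
    # Compile and apply the configuration locally.
    run puppet apply --modulepath=/etc/puppet/modules /etc/puppet/manifests/site.pp
}
```

Calling `bootstrap` with the repository address prints the four steps; running it as root with DRY_RUN=0 would execute them.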
Authentication and monkeysphere
Anyone in the DTG can make changes to dtg-puppet on code.dtg but only current admins (as defined in globals.pp or elsewhere) can push the config to the machines. Authentication to the machines is done using monkeysphere. Monkeysphere uses the PGP web of trust to authenticate ssh public keys; this allows distributed rather than centralised management of keys and simplifies key revocation and the addition of new keys. It works by having a list of trusted keys which sign the keys of the people who are allowed access to the systems, together with a mapping from username to key user id which is used to download the correct keys. The monkeysphere ssh subkey of the PGP key is then used as one of the authorized keys for the user.
Users are also issued with randomly generated passwords by email (for use with sudo) but password-based login is disabled: you can generate new keys which will give you access to the machine remotely, and so you don't need the fallback mechanism in normal usage. If something goes badly wrong we can use a local console via xencenter or turn up in person.
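To make the username to user id mapping concrete, here is a small sketch of how it ends up on a node when monkeysphere is used directly. In our setup puppet manages this, so the helper below is purely illustrative and the names passed to it are made up. The file location is monkeysphere's default (~/.monkeysphere/authorized_user_ids), and the two commands in the comments are from the monkeysphere-authentication tool.

```shell
#!/bin/sh
# Sketch of the username -> OpenPGP user ID mapping (all names are illustrative).

# write_user_ids HOME_DIR USER_ID...: record which OpenPGP user IDs may log in
# as the account whose home directory is HOME_DIR, one ID per line.
write_user_ids() {
    home_dir="$1"
    shift
    mkdir -p "$home_dir/.monkeysphere" &&
    printf '%s\n' "$@" > "$home_dir/.monkeysphere/authorized_user_ids"
}

# On a real node, root would then trust a certifier and regenerate the keys:
#   monkeysphere-authentication add-identity-certifier <certifier key fingerprint>
#   monkeysphere-authentication update-users <username>
```

For example, `write_user_ids /home/alice 'Alice Example <alice@example.org>'` records one user id for a hypothetical account alice.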
Monitoring
One of the neat tricks of having all the config in one place is that it is easy to put the config saying 'monitor this service' next to the config saying 'run this service', and then have nagios complain if the service goes away and munin monitor it for performance. This should hopefully solve our present problem of only finding out that something has gone down long after the fact, when one of our users tells us.
Case study: supplying entropy
One problem with doing lots of crypto is that you need lots of good quality entropy, and unfortunately virtual machines find entropy rather hard to come by as they live in rather deterministic worlds. Hence entropy.dtg, which is a Raspberry Pi with a Simtec Entropy Key that produces entropy in hardware.
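A quick way to see whether a machine is short of entropy is to look at the kernel's estimate of its pool. The helper below is a hypothetical sketch, not part of dtg-puppet, and its default threshold of 200 bits is an arbitrary illustrative value.

```shell
#!/bin/sh
# entropy_status AVAILABLE_BITS [THRESHOLD]: report "low" or "ok".
# The default threshold of 200 bits is an arbitrary illustrative value.
entropy_status() {
    avail="$1"
    threshold="${2:-200}"
    if [ "$avail" -lt "$threshold" ]; then
        echo low
    else
        echo ok
    fi
}
```

On Linux it can be fed from the kernel with `entropy_status "$(cat /proc/sys/kernel/random/entropy_avail)"`. Recent kernels report a fixed value there, so this is only informative on older ones.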
The node specific config for entropy.dtg is in manifests/nodes/entropy.pp:
node 'entropy.dtg.cl.cam.ac.uk' { # stuff in this block will only be executed by nodes with a matching hostname
  # We don't have a local mirror of raspbian to point at
  class { 'minimal': manageapt => false, } # this pulls in the minimal config for all servers without using the unix support mirror for apt
  class { 'dtg::entropy::host': # this specifies we want this node to be an entropy host using an entropy key
    certificate => '/root/puppet/ssl/stunnel.pem',
    private_key => '/root/puppet/ssl/stunnel.pem',
    ca          => '/usr/local/share/ssl/cafile',
    stage       => 'entropy-host'
  }
}
if ( $::fqdn == $::nagios_machine_fqdn ) { # Tell nagios to monitor the node's ssh
  nagios_monitor { 'entropy':
    parents    => '',
    address    => 'entropy.dtg.cl.cam.ac.uk',
    hostgroups => [ 'ssh-servers' ],
  }
}
if ( $::fqdn == $::munin_machine_fqdn ) { # Tell munin to monitor the node
  munin::gatherer::configure_node { 'entropy': }
}
This then via dtg::entropy::host pulls in config from modules/dtg/manifests/entropy.pp and via 'minimal' pulls in modules/dtg/manifests/minimal.pp. The relevant section from entropy.pp:
# Entropy server
# Relies on the ekeyd and stunnel modules
class dtg::entropy {
  class { 'stunnel': stage => $stage, } # This sorts out stunnel being setup sufficiently early on
  file { '/usr/local/share/ssl/':
    ensure => directory,
    mode   => '0755',
    owner  => 'root',
    group  => 'root',
  }
  file { '/usr/local/share/ssl/cafile': # Specify the list of trusted keys
    ensure => file,
    source => 'puppet:///modules/dtg/ssl/cafile',
    mode   => '0644',
    owner  => 'root',
    group  => 'root',
  }
}
class dtg::entropy::host ($certificate, $private_key, $ca, $crl = false) {
  if ! defined(Class['dtg::entropy']) {
    fail("Class['dtg::entropy'] must be defined")
  }
  group { 'egd-host': ensure => present, }
  user { 'egd-host':
    gid     => 'egd-host',
    comment => 'Entropy Generating Device user',
    ensure  => present,
  }
  file { '/var/lib/stunnel4/egd-host':
    ensure  => directory,
    owner   => 'egd-host',
    group   => 'egd-host',
    mode    => '0700', # quoted octal, like the other file modes
    require => Package[$stunnel::data::package],
  }
  stunnel::tun { 'egd-host': # create a SSL tunnel
    cert     => $certificate,
    key      => $private_key,
    cafile   => $ca,
    crlfile  => $crl,
    chroot   => '/var/lib/stunnel4/egd-host',
    pid      => '/egd-host.pid',
    output   => '/egd-host.log',
    user     => 'egd-host',
    group    => 'egd-host',
    services => {'egd-host' => {accept => '7776'}},
    connect  => '777',
    client   => false,
    protocol => false,
    require  => User['egd-host'],
  }
  class { 'ekeyd':
    host      => true,
    port      => '777',
    masterkey => '',
  }
  # Allow connections to 7776
  class { 'dtg::firewall::entropy': }
}
Future work
There is still rather a lot left to do: we have many servers and services which are not yet in the new world, and there are still some new things which need to be developed. Currently backups are rather ad hoc or completely lacking, which is rather worrying and dangerous. Some things (our git and maven repositories) are backed up to NAS by cron jobs running as me to /anfs/dtgscratch/drt24/svr-acr31-code/backups/ but really we want a better solution which is more space efficient and which is integrated into the puppet world.
One service which still needs development is the 'secret server' which serves secrets to the root user on the nodes and so allows them to do things like access the nexus server and use predefined ssh keys.
Practical
Before this talk
To allow everything to run smoothly when we actually get to the practical I want to set up the monkeysphere enabled gpg keys in advance. To do that you will need to have a gpg/pgp key, which may involve creating one. Then you can follow the monkeysphere instructions on setting up a monkeysphere enabled ssh subkey. Then you can tell me what your GPG key id is and I can sign it and add you to the list of admins. Unfortunately Johnny still can't encrypt, so if you get stuck come and ask me.
Actual practical
Check out the dtg-puppet git repository:
git clone git@dtg-code.cl.cam.ac.uk:infrastructure/dtg-puppet
cd dtg-puppet
git submodule update --init # get the submodules
Install puppet-common:
cl-asuser apt-get install puppet-common
Test the puppet config:
puppet apply --noop --modulepath=modules --node_name_value=test-puppet.dtg.cl.cam.ac.uk manifests/site.pp
You can change 'test-puppet' to one of the other node names like 'entropy'. Unfortunately this may fail if not run as root, if it tries to see whether it could execute a script as a different user. There is a pre-commit hook in modules/dtg/files/pre-commit.hook which you might want to symlink into .git/hooks so that you can't commit code with syntax errors.
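For a feel of what such a syntax check involves, here is a hypothetical sketch; the real pre-commit.hook may work differently. It assumes `puppet parser validate` as the checker and, for simplicity, validates the files as they are in the working tree rather than the staged content.

```shell
#!/bin/sh
# Hypothetical pre-commit check (sketch; the real pre-commit.hook may differ).
# VALIDATE is the command run on each manifest; `puppet parser validate` is assumed.
VALIDATE="${VALIDATE:-puppet parser validate}"

# check_manifests: read file names on stdin, validate each *.pp file,
# and return non-zero if any of them fails.
check_manifests() {
    status=0
    while IFS= read -r file; do
        case "$file" in
            *.pp) $VALIDATE "$file" </dev/null || status=1 ;;
        esac
    done
    return "$status"
}
```

Used as a hook, the script would end with `git diff --cached --name-only --diff-filter=ACM | check_manifests`, so that a failing manifest aborts the commit.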
You can then push changes you commit to the config out to nodes by adding them as remotes:
git remote add test-puppet CRSID@test-puppet.dtg.cl.cam.ac.uk:/etc/puppet-bare
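If you push to several nodes it can be handy to generate the remote address rather than type it each time. The helper below is a hypothetical convenience, not part of dtg-puppet; the host suffix and path follow the example above, and the `master` branch name in the comments is an assumption.

```shell
#!/bin/sh
# node_remote_url CRSID NODE: build the push URL for a node's bare repository.
# The host suffix and path follow the example in the text.
node_remote_url() {
    printf '%s@%s.dtg.cl.cam.ac.uk:/etc/puppet-bare\n' "$1" "$2"
}

# Typical use (the branch name `master` is an assumption):
#   git remote add entropy "$(node_remote_url CRSID entropy)"
#   git push entropy master
```

The push is what triggers the post-update hook on the node, so the config is applied as soon as the push completes.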
