A SaltStack Highstate Killswitch

On rare occasion it’s necessary to debug a condition on a system by making temporary changes to the running system. If you’re using config management, especially as part of your deployment process, it’s necessary to disable it so that your temporary changes won’t be reset. salt-call doesn’t natively have a mechanism for this like Puppet does (puppet agent –disable; puppet agent –enable). It’s possible to do this yourself, though.

This requires that you’re using the failhard option in your configuration, that you’re using the 2014.7 (Helium) or above release, and also assumes you have some base state that is always included and is always included first.

SaltStack AWS Orchestration and Masterless Bootstrapping

In my last post, I mentioned that we’re using SaltStack (Salt) without a master. Without a master, how are we bootstrapping our instances? How are we updating the code that’s managing the instances? For this, we’re using python virtualenvs, S3, autoscaling groups with IAM roles, cloud-init and an artifact-based deployer that stores artifacts in S3 and pulls them onto the instances. Let’s start with how we’re creating the AWS resources.

Moving away from Puppet: SaltStack or Ansible?

Over the past month at Lyft we’ve been working on porting our infrastructure code away from Puppet. We had some difficulty coming to agreement on whether we wanted to use SaltStack (Salt) or Ansible. We were already using Salt for AWS orchestration, but we were divided on whether Salt or Ansible would be better for configuration management. We decided to settle it the thorough way by implementing the port in both Salt and Ansible, comparing them over multiple criteria.

Truly ordered execution using SaltStack

SaltStack’s documentation implies that by default, since the Hydrogen (2014.1.x) release, execution of states is ordered as defined. In practice, however, this isn’t true. SaltStack supports a feature called requisites, which provide features like require, watch, onchange, etc.. Some requisites, like watch, are basically impossible to live without. For instance, if you want to conditionally restart a service when a configuration file changes you need watch. If you use requisites you can’t ensure the state run will execute in order.

Per-project users and groups (aka service groups)

In Wikimedia Labs, we’re using OpenStack with heavy LDAP integration. With this integration we have a concept of global groups and users. When a user registers with Labs, the user’s account immediately becomes a global user, usable in all of Labs and its related infrastructure. When the user is added to an OpenStack project it’s also added to a global group, which is usable throughout the infrastructure.

Global users and groups are really useful for handing authentication and authorization at a global level, especially when interacting with things like Gerrit and other global services. Global users can also be used as service accounts within instances, between instances or between projects. There’s a number of downsides to global users though:

OpenStack wiki migration

On Feb 15th we migrated the MoinMoin powered OpenStack wiki to a new wiki powered by MediaWiki. Overall the migration went well. There was a large amount of cleanup that needed to get done, but we followed up the migration with a doc cleanup sprint. The wiki should be in a mostly good state. If you happen to find any articles that need cleanup, be bold!

So, what’s new with the wiki?

  1. All articles now have discussion pages
  2. It’s possible to make PDFs out of individual pages or to create a book (as a PDF or an actual physical book) from collections of articles

Extending a flatdhcp network the hard way

The title may make you think there’s an easy way. No such luck. Nova has no facility for extending a flatdhcp network, and as far as I can tell Quantum also has no facility for doing so.

Extending the flatdhcp network can be kind of a pain in the ass, so here’s how I handled it:

Assumptions

  • Network before extension:
    • Network CIDR: 10.4.0.0/24
    • Broadcast: 10.4.0.255
    • Netmask: 255.255.255.0
    • Network ID: 2
  • Network after extension:
    • Network CIDR: 10.4.0.0/21
    • Broadcast: 10.4.7.255
    • Netmask: 255.255.248.0
    • Network ID: 2

Modify the network

First modify the network via the database:

OpenStack Foundation Board Candidacy

Voting has started for the OpenStack board and I’m one of the 39 candidates. Many of the candidates have posted answers to a set of questions asked of all candidates. You can read my responses at the candidate site. Rather than reiterating those answers, I’d like to bring up some of the specific things I’d like to do as a board member.

Fight for the users

Being an OpenStack user is difficult, currently. Unless you have an OpenStack developer on your team, it’s difficult to even run OpenStack, let alone migrate between versions. Many deployments will run into bugs in the stable version of OpenStack and getting those bugs fixed and moved into the stable branch is difficult.

I’ve been with the Wikimedia Foundation for a second year. Have I met my goals?

I’m actually on time for this update, this year! Here’s my goals from last year; I’ll give feedback inline:

  1. Continue with the Labs project. Finish set up of test/dev Labs, and begin work and make major progress on tool Labs.
    • Partial success: Test/dev Labs is going really well. At the time of this writing we have 99 projects, 174 instances, and 446 users. We have per-project nagios, ganglia, puppet, and sudo. We also have an all-in-one MediaWiki puppet configuration. We currently have one zone with 5 compute nodes, and will mostly triple the capacity of that in the next month. We have another zone coming up in another datacenter that will be 8 large compute nodes. Stability is still currently a concern, and we haven’t come out of closed beta, yet, though. Also, work on Tool Labs is mostly not started. We do have a bots cluster that’s community managed, but we don’t have database replication and don’t have a simple way for tool authors to contribute.

Announcing OATHAuth, a two-factor authentication extension for MediaWiki

I’ve just released OATHAuth 0.1 for MediaWiki. This is an HMAC based One Time Password (HOTP) implementation providing two factor authentication. This is the same technology used for Google’s two-factor authentication.

OATHAuth is an opt-in feature that adds more security accounts in a wiki. It provides two-factor authentication, using your phone as the something you have, and your username/password as the something you know. If you are using iPhone or Android, you can use the Google Authenticator app as a client. There are also clients for most other phones and desktops; Wikipedia has a good list of clients.