Investigating local queuing: Redis, NSQ and LMDB

Systems designed for cloud services assume instances can die at any time, so they’re written to defend against this. It’s also important to remember that networks in cloud services are incredibly unreliable, often much less reliable than the instances themselves. When considering a design, keep in mind that a node can be partitioned from other services, possibly for long periods of time.

One easy consideration here is logs (including stats and analytics events). We want to ensure delivery of logs, but we also don’t want delivery to affect service operation.
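To make the idea concrete, here is a minimal sketch of a local spool in Python. It is only an illustration: sqlite3 stands in for the queues named in the title (Redis, NSQ, LMDB), and the `LocalSpool` class and its `send` callback are hypothetical names, not anything from those tools. The service writes each log line locally and returns immediately; a separate flush step forwards lines when the network cooperates and keeps them when it doesn’t.

```python
import sqlite3
from typing import Callable


class LocalSpool:
    """Durable FIFO spool; sqlite3 is a stand-in for Redis, NSQ or LMDB."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to survive process restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS spool "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, line: str) -> None:
        # Local write only: the service never waits on the network here.
        self.db.execute("INSERT INTO spool (line) VALUES (?)", (line,))
        self.db.commit()

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM spool").fetchone()[0]

    def flush(self, send: Callable[[str], None]) -> int:
        """Deliver queued lines in order; stop at the first failure."""
        delivered = 0
        rows = self.db.execute(
            "SELECT id, line FROM spool ORDER BY id"
        ).fetchall()
        for row_id, line in rows:
            try:
                send(line)
            except OSError:
                break  # partitioned: keep this line and the rest for later
            # Delete only after a successful send (at-least-once delivery).
            self.db.execute("DELETE FROM spool WHERE id = ?", (row_id,))
            self.db.commit()
            delivered += 1
        return delivered
```

Because a line is deleted only after `send` succeeds, a crash between the send and the delete can produce a duplicate, so the receiving side would need to tolerate repeats.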

DynamoDB support in SaltStack 2015.2

In the 2015.2 SaltStack release we’ve added the boto_dynamodb execution module and boto_dynamodb state module. This allows you to create DynamoDB tables via states:

Ensure DynamoDB table exists:
  boto_dynamodb.present:
    - table_name: {{ grains.cluster_name }}
    - read_capacity_units: 10
    - write_capacity_units: 10
    - hash_key: id
    - hash_key_data_type: S
    - global_indexes:
      - data_type_date_index:
        - name: "data_type_date_index"
        - read_capacity_units: 10
        - write_capacity_units: 10
        - hash_key: data_type
        - hash_key_data_type: S
        - range_key: modified_date
        - range_key_data_type: S
      - data_type_revision_index:
        - name: "data_type_revision_index"
        - read_capacity_units: 10
        - write_capacity_units: 10
        - hash_key: data_type
        - hash_key_data_type: S
        - range_key: revision
        - range_key_data_type: N

Note that at this point the module will only create and delete tables and indexes. It doesn’t currently support dynamically adding and removing indexes or changing read or write capacity units. These are features we’d love to see, and will likely add in the future. If you’d like to beat us to this, please send in pull requests!

Splunk saved search state and execution module support in SaltStack

We (Lyft) believe strongly in the concept of infrastructure as code. If it isn’t automated, it isn’t finished. This belief also applies to our monitoring and alerting. We’re using Splunk saved searches for portions of our alerting and want to ensure that our developers can quickly and easily define alarms in a standard way to be able to share alarms between services.

We’ve added the splunk_search execution module and splunk_search state module to the 2015.2 SaltStack release (in release candidate status at the time of this writing) so that we can manage our searches via orchestration.

Grafana dashboard orchestration using SaltStack

As mentioned in my post on CloudWatch alarms, we (Lyft) believe that it should be easy to do the right thing and difficult to do the wrong thing. We operate on the concept “If you build it, you run it.” Running your own service isn’t easy if you don’t have the right tools to help you, though.

We’re using Graphite and Grafana for time series data dashboards. With a consistent configuration management pattern, all new services start with their data flowing into Graphite. Dashboard management is tricky, though. We encourage teams to add custom metrics to their services and use them in panels and rows for their services, but we also want to provide a number of consistent panels/rows for all services. We also want to avoid making teams go between multiple dashboards to monitor their own services.
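One way to picture that goal is a single dashboard assembled from a shared set of rows plus whatever rows a team adds. The Python sketch below is only an illustration of that composition: `BASE_ROWS`, `build_dashboard` and the row contents are hypothetical, and the dictionary layout is a simplified stand-in for a real Grafana dashboard definition.

```python
from copy import deepcopy
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical shared rows that every service dashboard starts with.
BASE_ROWS: List[Row] = [
    {"title": "Requests", "panels": [{"title": "Request rate"}]},
    {"title": "Errors", "panels": [{"title": "5xx rate"}]},
]


def build_dashboard(service: str, custom_rows: List[Row]) -> Dict[str, Any]:
    """Return one dashboard: shared rows first, then the team's own rows."""
    # Copy both lists so later edits to a dashboard never leak into the
    # shared defaults or into the caller's data.
    rows = deepcopy(BASE_ROWS) + deepcopy(custom_rows)
    return {"title": service, "rows": rows}
```

A team would then call something like `build_dashboard("api", [{"title": "Queue", "panels": [{"title": "Depth"}]}])` and get the common rows and its own row in one place.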

SaltConf15: Sequentially Ordered Execution in SaltStack talk and slides

Here’s another talk that I gave at SaltConf15. It’s about sequentially ordered Salt. If you’ve read my blog posts on the topic, it probably won’t add a lot of technical info, but it gives a lot more context on why you’d want to use Salt in a sequentially ordered way. Enjoy!

Sequentially Ordered Execution in SaltStack

Here are the slides. Note: though my blog is creative commons licensed, the slides are all rights reserved (sorry!).

SaltConf15: Masterless SaltStack at Scale talk and slides

I gave a talk at SaltConf15. It’s about masterless SaltStack, AWS orchestration, Docker management using Salt and other fun things. Enjoy!

Here are the slides. Note: though my blog is creative commons licensed, the slides are all rights reserved (sorry!).

SaltStack: Automated cloudwatch alarm management for AWS resources

For the Salt 2014.7 release we (Lyft) upstreamed a number of Salt execution and state modules for AWS. These modules manage various AWS resources. For most of the resources you’ll want to create, you’ll probably want to add CloudWatch alarms to go along with them. It’s not really difficult to do:

Truly ordered execution using SaltStack (Part 2)

A while back I wrote a post about sequentially ordered SaltStack execution. 2014.7 (Helium) has been released and the listen/listen_in feature I described is now generally available. I’ve been using Salt in a sequentially ordered manner for about 6 months now, and there are some other patterns I’ve picked up along the way. In particular, there are a couple of gotchas to watch out for: includes and Jinja.

Includes imply a requirement between modules. Requirements can modify ordering, so it’s important to be strict about how you handle them. For example, when reading the following, remember that include implies require:


Using Lua in Nginx for unique request IDs and millisecond times in logs

Nginx is awesome, but it’s missing some common features. For instance, a common thing to add to access logs is a unique ID per request, so that you can track the flow of a single request through multiple services. Another thing it’s missing is the ability to log request_time in milliseconds, rather than seconds with a millisecond granularity. Using Lua, we can add these features ourselves.

I’ll show the whole solution, then I’ll break it down into parts:

Reloading grains and pillars during a SaltStack run

If you use the grain/state pattern a lot, or if you use external pillars, you’ve probably stumbled upon a limitation with grains and pillars.

During a Salt run, if you set a grain or update an external pillar, the change won’t be reflected in the grains and pillar dictionaries. The value has been updated, but it hasn’t been reloaded into the in-memory data structures that Salt creates at the beginning of the run. From a performance point of view this is good, since reloading grains and especially loading external pillars is quite slow.
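The behaviour is easy to reproduce with a toy model. The Python sketch below is not Salt’s implementation; `ToyRun` and its methods are hypothetical names that only mimic the pattern of taking a snapshot at the start of a run and never refreshing it unless asked.

```python
from typing import Any, Dict


class ToyRun:
    """Toy model of a run that snapshots its data once, at start-up."""

    def __init__(self, source: Dict[str, Any]) -> None:
        # Stands in for grains on disk or an external pillar.
        self.source = source
        # In-memory copy taken at the beginning of the run.
        self.grains = dict(source)

    def set_grain(self, key: str, value: Any) -> None:
        # Updates the backing store only; the snapshot is left untouched.
        self.source[key] = value

    def reload(self) -> None:
        # The explicit (and, in real life, potentially slow) reload step.
        self.grains = dict(self.source)
```

After `set_grain("cluster_name", "prod")`, the new key is missing from `run.grains` until `run.reload()` is called, which mirrors the stale view described above.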