Recovering A Sentinel Configuration

Sentinel configuration and configuration management systems don’t play well together, and neither do package management systems and the config file. As a result it is possible to have your sentinel configuration file wiped clean under a running sentinel. Here are some ways you might be able to recover your running configuration.

Update: Article updated to reflect the release of a version supporting the flushconfig command.

Requirements

First, let us consider the non-recoverable scenario. If every sentinel in your constellation has had the file cleaned and has been restarted, and you are not running Red Skull, there is no recovering it without rebuilding it the way you built it in the first place. However, if even one sentinel server is still running with the configuration in memory, you can recover it. If you are running Red Skull, you can recover even if all the sentinels have gone down - provided at least one Red Skull server has not been restarted.

How do you know the configuration is still in memory? Connect to sentinel using redis-cli -p 26379 -h <host> and issue an INFO command. If you see your pods listed, congratulations - you can recover.
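If you would rather script that check than eyeball it, the following is a minimal sketch. It assumes you have already captured the INFO output as text (for example by redirecting the redis-cli output to a file) and that pods appear on lines of the form master0:name=...,status=...,address=..., which is how Sentinel lists them. The function name pods_in_info and the sample values are made up for illustration.

```python
import re

# Sentinel lists each pod on a line such as:
# master0:name=mymaster,status=ok,address=10.0.0.5:6379,slaves=2,sentinels=3
MASTER_LINE = re.compile(r"^master\d+:(.*)$")


def pods_in_info(info_text):
    """Return one dict per pod found in Sentinel INFO output."""
    pods = []
    for line in info_text.splitlines():
        match = MASTER_LINE.match(line.strip())
        if not match:
            continue
        # Split "key=value,key=value" into a dictionary.
        fields = dict(pair.split("=", 1) for pair in match.group(1).split(","))
        pods.append(fields)
    return pods


# Illustrative sample only; real output has more sections and fields.
sample = """# Sentinel
sentinel_masters:1
sentinel_tilt:0
master0:name=mymaster,status=ok,address=10.0.0.5:6379,slaves=2,sentinels=3
"""

found = pods_in_info(sample)
if found:
    print("Recoverable pods:", [pod["name"] for pod in found])
else:
    print("No pods in memory on this sentinel.")
```

An empty result means this particular sentinel has nothing left in memory; check the others before giving up.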

Scenario 0: Redis version >= 2.8.21 or 3.0.2

If you are running at least 2.8.21 or 3.0.2 you will have the easiest time. For this you connect to your sentinels and execute “SENTINEL FLUSHCONFIG”. Done. Now wasn’t that easy?
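To decide which sentinels qualify for this scenario, you can compare the redis_version value from each sentinel's INFO output against the two minimums above. The following is a small sketch of that comparison; the function name supports_flushconfig is hypothetical, and it assumes plain numeric version strings such as 2.8.21.

```python
def supports_flushconfig(version):
    """True if a Redis version string meets the minimums named above.

    The 2.8 line needs at least 2.8.21; everything else needs at least 3.0.2.
    """
    parts = tuple(int(piece) for piece in version.split(".")[:3])
    if parts[:2] == (2, 8):
        return parts >= (2, 8, 21)
    return parts >= (3, 0, 2)


for candidate in ("2.8.19", "2.8.21", "3.0.1", "3.0.2"):
    verdict = "FLUSHCONFIG available" if supports_flushconfig(candidate) else "use another scenario"
    print(candidate, "->", verdict)
```

Sentinels that fail the check fall under the scenarios that follow.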

Scenario 1: No Red Skull, sentinel process still running

The simplest option is to pick one of the pods in Sentinel and “change” its configuration. For example, you can do a SENTINEL SET <podname> parallel-syncs 1. Ideally you’d use the same value the setting has now. If you’ve not modified the settings, setting parallel-syncs to 1 doesn’t really change the config, but it makes Sentinel think it did.

Once sentinel thinks it has changed a setting it will trigger a full write of the config to disk. This rebuilds your file. Do this on each sentinel in your constellation and you’ve recovered.
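With several sentinels to touch, it can help to generate the commands rather than type them. The sketch below builds one redis-cli invocation per pod, re-setting parallel-syncs to whatever value you pass in. It assumes you have already collected each pod's current value (SENTINEL MASTERS reports it); the function name rewrite_commands, the host, and the pod names are illustrative only.

```python
def rewrite_commands(sentinel_host, pods, port=26379):
    """Build a redis-cli argument list per pod that re-sets its
    parallel-syncs value, which prompts Sentinel to rewrite its file.

    pods maps pod name -> current parallel-syncs value.
    """
    commands = []
    for name, parallel_syncs in sorted(pods.items()):
        commands.append([
            "redis-cli", "-h", sentinel_host, "-p", str(port),
            "SENTINEL", "SET", name, "parallel-syncs", str(parallel_syncs),
        ])
    return commands


# Hypothetical pods and host, for illustration.
current = {"pod-a": 1, "pod-b": 2}
for command in rewrite_commands("10.0.0.1", current):
    print(" ".join(command))
```

The argument lists can be handed to subprocess.run as they are, or printed and pasted into a shell. Setting a single pod per sentinel should be enough, since the write covers the whole config.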

Scenario 2: Red Skull running, Sentinels Restarted

For this one you’ll need to pull the constellation configuration from Red Skull as your sentinel daemons have a clean slate. For this scenario you will rely on the fact that Red Skull stores the configuration of every single pod it knows about. There are two ways: simple and complex.

Simple Recovery

With this option all of your sentinels will look alike. If you’re running Red Skull to distribute the sentinel job across a bank of them, this may not work cleanly for you - but it will get you to a state you can recover from manually by rebalancing each pod.

For the simple route you pull the JSON data from the Red Skull API at http://red.skull.host:8000/api/knownpods, then iterate over the pods, adding each one back into Sentinel either via the Sentinel API or by writing a new file and starting sentinel back up. Here is a short Python script to do the latter for you: Redskull-To-Sentinel-Config-File
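To give a feel for the file-writing route, here is a minimal sketch that turns pod records into sentinel config lines. The JSON field names used here (name, master_ip, master_port, quorum, auth_pass) are assumptions for illustration and are not taken from the Red Skull API; inspect the real /api/knownpods response and adjust the keys. The sentinel monitor and sentinel auth-pass directives are standard Sentinel config lines.

```python
import json


def sentinel_config_lines(pods):
    """Turn a list of pod records into sentinel.conf directives.

    The record keys are hypothetical; map them to the real API fields.
    """
    lines = []
    for pod in pods:
        name = pod["name"]
        lines.append("sentinel monitor {} {} {} {}".format(
            name, pod["master_ip"], pod["master_port"], pod["quorum"]))
        # Only emit auth-pass when the pod actually has one.
        if pod.get("auth_pass"):
            lines.append("sentinel auth-pass {} {}".format(name, pod["auth_pass"]))
    return lines


# Stand-in for the API response body; the shape is assumed.
sample_json = """[
  {"name": "pod-a", "master_ip": "10.0.0.5", "master_port": 6379,
   "quorum": 2, "auth_pass": "s3cret"},
  {"name": "pod-b", "master_ip": "10.0.0.6", "master_port": 6380,
   "quorum": 2}
]"""

print("\n".join(sentinel_config_lines(json.loads(sample_json))))
```

Write the resulting lines to the config file while sentinel is stopped, then start it back up.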

However, if you are leveraging Red Skull’s ability to manage a cluster of Sentinels for you, you’ll probably prefer option two: complex recovery.

Complex Recovery

For this option we are going to do a more in-depth data dump from Red Skull, and it is not yet guaranteed to work as it uses Red Skull pathways which pull data from Sentinel.

For this option you will iterate over every known pod and write (append) to a file for each known-sentinel. Depending on the elapsed time from the event to when you do this there may be no other known-sentinels. However, it is likely the data is still there, so it should in general work.

To do this you will need to generate and store a mapping of sentinel -> managed-pod. For example, in a Python script you might have a dictionary for each sentinel, keyed by a name such as ip_port, whose contents hold the pod name, current master, and any settings for it such as the auth-pass setting.

You would iterate over the /api/knownpods pod listing building these dictionaries up by talking to every Red Skull server in the constellation. Once you have the set of dictionaries you can then loop over each one and follow the basic procedure outlined under the Simple Recovery section.
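The mapping step might look something like the sketch below. As before, the record keys (name, master, quorum, auth_pass, known_sentinels) are assumptions made for illustration rather than the actual Red Skull schema, and pods_by_sentinel is a hypothetical helper name.

```python
from collections import defaultdict


def pods_by_sentinel(pods):
    """Group pod settings under each sentinel that knows the pod.

    Returns {"ip_port": {pod_name: settings}}. Record keys are hypothetical.
    """
    mapping = defaultdict(dict)
    for pod in pods:
        settings = {"master": pod["master"], "quorum": pod["quorum"]}
        if pod.get("auth_pass"):
            settings["auth-pass"] = pod["auth_pass"]
        # A pod with no known sentinels simply contributes nothing.
        for sentinel in pod.get("known_sentinels", []):
            key = sentinel.replace(":", "_")
            mapping[key][pod["name"]] = settings
    return dict(mapping)


# Illustrative records, as if merged from every Red Skull server.
records = [
    {"name": "pod-a", "master": "10.0.0.5:6379", "quorum": 2,
     "auth_pass": "s3cret",
     "known_sentinels": ["10.0.1.1:26379", "10.0.1.2:26379"]},
    {"name": "pod-b", "master": "10.0.0.6:6379", "quorum": 2,
     "known_sentinels": ["10.0.1.2:26379"]},
]

for sentinel, managed in sorted(pods_by_sentinel(records).items()):
    print(sentinel, "->", sorted(managed))
```

Each top-level key then corresponds to one sentinel's file or one round of API calls.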

Caveats

There is a big caveat to recovery via Red Skull. If a current master goes down before you can recover, that pod will need to be manually removed and re-added, because Red Skull won’t know about the failure when Sentinel didn’t. You could possibly recover by looking at the “old” master/slave host data and manually checking and updating it.

As far as adding via the Sentinel API or by writing a config file, I prefer using the API. Additions are more immediate and you don’t need to bother with stopping sentinel, writing files, and restarting. Indeed you would not even need to do these operations on the Sentinel nodes directly but, if you have connectivity, you can do it from a bastion or your laptop/desktop.

Future Options

Despite our best efforts these scenarios can still plague us - whether from automated code or from an “accident” by a human. Ideally Sentinel would not store constellation state in its config, and there is work in progress to do just that. However, it could still happen even with a state file instead of the config file.

Because it could still happen, we now have the ability to tell Sentinel to flush its config to disk, provided you’re running an up-to-date version. If not, then at least now you have some other options - especially if this happens while trying to update to the newer versions.

Hopefully should you ever find yourself in this unfortunate situation these methods will provide you some means of retrieving at least some of your sanity. And, of course, your configuration.