Recovering A Sentinel Configuration
Sentinel configuration and configuration management systems don’t play well together, and neither do package management systems and the config file. As a result it is possible to have your sentinel configuration file wiped clean under a running sentinel. Here are some ways you might be able to recover your running configuration.
Update: Article updated to reflect the release of a version supporting the flushconfig command.
Requirements
First, let us consider the non-recoverable scenario. If every sentinel in your constellation has had its file cleaned and has been restarted, and you are not running Red Skull, there is no recovering it without rebuilding it the way you built it in the first place. However, if even one sentinel server is still running with the configuration in memory, you can recover it. If you are running Red Skull, you can recover even if all the sentinels have gone down - provided at least one Red Skull server has not been restarted.
How do you know the configuration is still in memory? Connect to sentinel using redis-cli -p 26379 -h <host> and issue an INFO command. If you see your pods listed, congratulations - you can recover.
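If you have several sentinels to check, the same test can be scripted. The following is a minimal Python sketch, assuming redis-cli is on your PATH and the sentinels need no authentication. It runs INFO sentinel (which restricts the output to the Sentinel section) and picks out the masterN lines, which is where Sentinel lists each pod it monitors. The helper names are hypothetical.

```python
import subprocess


def sentinel_info(host: str, port: int = 26379) -> str:
    """Run `redis-cli INFO sentinel` against one sentinel and return the raw text."""
    result = subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port), "INFO", "sentinel"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def monitored_pods(info_text: str) -> list[dict[str, str]]:
    """Parse lines like master0:name=pod,status=ok,address=ip:port,... into dicts."""
    pods = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Only masterN entries describe monitored pods; skip headers and counters.
        if not sep or not key.startswith("master") or not key[len("master"):].isdigit():
            continue
        # Split on commas first; the address field keeps its own ip:port colon.
        pods.append(dict(item.split("=", 1) for item in value.split(",")))
    return pods
```

An empty list from monitored_pods(sentinel_info(host)) means that sentinel has nothing left in memory to recover from.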
Scenario 0: Redis version >= 2.8.21 or 3.0.2
If you are running at least 2.8.21 or 3.0.2, you will have the easiest time. Connect to each of your sentinels and execute “SENTINEL FLUSHCONFIG”. Done. Now wasn’t that easy?
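With many sentinels, a loop saves some typing. This is a sketch rather than a hardened tool: it assumes redis-cli is installed wherever you run it, and the host/port pairs are whatever you supply. The run parameter exists only so the loop can be exercised without a live Sentinel.

```python
import subprocess
from typing import Callable, Sequence


def flushconfig_argv(host: str, port: int) -> list[str]:
    """Build the redis-cli invocation that asks one sentinel to rewrite its config."""
    return ["redis-cli", "-h", host, "-p", str(port), "SENTINEL", "FLUSHCONFIG"]


def flush_all(sentinels: Sequence[tuple[str, int]],
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> dict[str, str]:
    """Send SENTINEL FLUSHCONFIG to every sentinel and collect each reply by address."""
    replies = {}
    for host, port in sentinels:
        proc = run(flushconfig_argv(host, port), capture_output=True, text=True, check=True)
        replies[f"{host}:{port}"] = proc.stdout.strip()
    return replies
```

For example, flush_all([("10.0.0.1", 26379), ("10.0.0.2", 26379)]) with your own (here hypothetical) sentinel addresses.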
Scenario 1: No Red Skull, sentinel process still running
The simplest option is to pick one of the pods in Sentinel and “change” its configuration. For example, you can do a SENTINEL SET <podname> parallel-syncs 1. Ideally you’d use the same value it has now. If you haven’t modified the defaults, setting parallel-syncs 1 doesn’t really change the config, but it makes Sentinel think it did.
Once sentinel thinks it has changed a setting it will trigger a full write of the config to disk. This rebuilds your file. Do this on each sentinel in your constellation and you’ve recovered.
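To re-apply the value Sentinel already holds, you can read it back first with SENTINEL MASTER <podname>, which returns a flat list of field/value pairs that includes parallel-syncs. The sketch below assumes redis-cli is on your PATH; by default, when its output is piped rather than sent to a terminal, redis-cli prints one array element per line, which is what to_dict relies on. The function names are hypothetical.

```python
import subprocess
from typing import Callable


def cli(host: str, port: int, *args: str, run: Callable = subprocess.run) -> str:
    """Run one redis-cli command against a sentinel and return its stdout."""
    proc = run(["redis-cli", "-h", host, "-p", str(port), *args],
               capture_output=True, text=True, check=True)
    return proc.stdout


def to_dict(flat_reply: str) -> dict[str, str]:
    """Turn alternating field/value lines (as in SENTINEL MASTER) into a dict."""
    lines = flat_reply.splitlines()
    return dict(zip(lines[0::2], lines[1::2]))


def force_rewrite(host: str, port: int, pod: str, run: Callable = subprocess.run) -> str:
    """Re-apply the pod's current parallel-syncs so Sentinel rewrites its config file."""
    current = to_dict(cli(host, port, "SENTINEL", "MASTER", pod, run=run))["parallel-syncs"]
    return cli(host, port, "SENTINEL", "SET", pod, "parallel-syncs", current, run=run).strip()
```

Calling force_rewrite once per sentinel, with any one pod that sentinel monitors, is enough to trigger the full rewrite.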
Scenario 2: Red Skull running, Sentinels Restarted
For this one you’ll need to pull the constellation configuration from Red Skull as your sentinel daemons have a clean slate. For this scenario you will rely on the fact that Red Skull stores the configuration of every single pod it knows about. There are two ways: simple and complex.
Simple Recovery
With this option all of your sentinels will look alike. If you’re running Red Skull to distribute the sentinel job across a bank of them, this may not work cleanly for you - but it will get you to a state you can recover from manually by rebalancing each pod.
For the simple route, pull the JSON data from the Red Skull API at http://red.skull.host:8000/api/knownpods, then iterate over the pods, adding each one back into Sentinel via the Sentinel API or by writing a new file and starting Sentinel back up. Here is a short Python script to do the latter for you:
Redskull-To-Sentinel-Config-File
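The script itself did not survive here. As a rough stand-in, the sketch below downloads the pod list and renders sentinel.conf directives (port, sentinel monitor, sentinel auth-pass). The JSON field names (name, master_ip, master_port, quorum, auth_token) are hypothetical placeholders, not Red Skull's documented schema; map them to whatever your Red Skull version actually returns.

```python
import json
import urllib.request


def fetch_known_pods(base_url: str) -> list[dict]:
    """Download the pod list from Red Skull's /api/knownpods endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/knownpods") as resp:
        return json.load(resp)


def sentinel_config_lines(pods: list[dict], port: int = 26379) -> list[str]:
    """Render sentinel.conf directives for each pod.

    NOTE: name/master_ip/master_port/quorum/auth_token are hypothetical keys,
    not a confirmed Red Skull JSON schema.
    """
    lines = [f"port {port}"]
    for pod in pods:
        name = pod["name"]
        lines.append(f"sentinel monitor {name} {pod['master_ip']} {pod['master_port']} {pod['quorum']}")
        if pod.get("auth_token"):
            lines.append(f"sentinel auth-pass {name} {pod['auth_token']}")
    return lines
```

Write "\n".join(lines) + "\n" to the sentinel's config file while the sentinel is stopped, then start it. Keep the file writable by the Sentinel process, since Sentinel refuses to start with a config file it cannot rewrite.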
However, if you are leveraging Red Skull’s ability to manage a cluster of Sentinels for you, you’ll probably prefer option two: complex recovery.
Complex Recovery
For this option we are going to do a more in-depth data dump from Red Skull, and it is not yet guaranteed to work as it uses Red Skull pathways which pull data from Sentinel.
For this option you will iterate over every known pod and write (append) to a file for each known-sentinel. Depending on how much time has elapsed between the event and when you do this, there may be no other known-sentinels. However, it is likely the data is still there, so in general it should work.
To do this you will need to generate and store a mapping of sentinel -> managed-pod. For example, in a Python script you might have a dictionary for each sentinel, where the name of the sentinel might be ip_port and the dictionary in that variable contains the pod name, current master, and any settings for it, such as the auth-pass setting.
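One possible shape for that mapping, as a Python sketch. It assumes each pod record carries a hypothetical sentinels list of ip:port strings naming the sentinels that monitored it, alongside the same hypothetical name/master_ip/master_port/auth_token fields; none of these names come from Red Skull itself.

```python
from collections import defaultdict


def sentinel_key(address: str) -> str:
    """Turn '10.0.0.5:26379' into the ip_port style name used for each sentinel."""
    ip, port = address.rsplit(":", 1)
    return f"{ip}_{port}"


def map_sentinels_to_pods(pods: list[dict]) -> dict[str, list[dict]]:
    """Group pod name, current master and settings under every sentinel that watched it.

    Hypothetical input keys: name, master_ip, master_port, auth_token, sentinels.
    """
    mapping: dict[str, list[dict]] = defaultdict(list)
    for pod in pods:
        entry = {
            "name": pod["name"],
            "master": f"{pod['master_ip']}:{pod['master_port']}",
            "settings": {"auth-pass": pod["auth_token"]} if pod.get("auth_token") else {},
        }
        for address in pod.get("sentinels", []):
            mapping[sentinel_key(address)].append(entry)
    return dict(mapping)
```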
You would iterate over the /api/knownpods pod listing, building these dictionaries up by talking to every Red Skull server in the constellation. Once you have the set of dictionaries, you can loop over each one and follow the basic procedure outlined under the Simple Recovery section.
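Because the dictionaries come from more than one Red Skull server, the same pod may show up several times. A minimal sketch of the merge-and-append step follows, assuming each server's result has already been reduced to a hypothetical {ip_port: [pod, ...]} map where each pod has name, master (ip:port), quorum, and an optional settings dict. It keeps the first report of each pod and appends sentinel.conf directives to one <ip_port>.conf file per sentinel.

```python
from pathlib import Path


def merge_mappings(per_server: list[dict[str, list[dict]]]) -> dict[str, dict[str, dict]]:
    """Combine the sentinel -> pods maps gathered from each Red Skull server.

    Pods are de-duplicated by name; the first server to report a pod wins.
    """
    merged: dict[str, dict[str, dict]] = {}
    for mapping in per_server:
        for sentinel, pods in mapping.items():
            bucket = merged.setdefault(sentinel, {})
            for pod in pods:
                bucket.setdefault(pod["name"], pod)
    return merged


def write_sentinel_files(merged: dict[str, dict[str, dict]], outdir: Path) -> list[Path]:
    """Append monitor and setting directives to one <ip_port>.conf file per sentinel."""
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for sentinel, pods in sorted(merged.items()):
        path = outdir / f"{sentinel}.conf"
        with path.open("a") as fh:
            for name, pod in sorted(pods.items()):
                ip, port = pod["master"].rsplit(":", 1)
                fh.write(f"sentinel monitor {name} {ip} {port} {pod['quorum']}\n")
                # Settings use the sentinel.conf form: sentinel <option> <name> <value>
                for key, value in pod.get("settings", {}).items():
                    fh.write(f"sentinel {key} {name} {value}\n")
        written.append(path)
    return written
```

Keeping the first report is an arbitrary choice; if your Red Skull servers disagree about a pod's master, compare them by hand before trusting either.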
Caveats
There is a big caveat to recovery via Red Skull. If a current master goes down before you can recover, that pod will need to be manually removed and re-added, as Red Skull won’t know about the change because Sentinel didn’t either. You could possibly recover by looking at the “old” master/slave host data and manually checking and updating it.
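For that manual check, the ROLE command (available since Redis 2.8.12) tells you whether a host currently considers itself a master. This sketch assumes redis-cli is on your PATH and the data nodes need no password; the candidate list is the “old” master/slave addresses you pulled from Red Skull, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence


def role_of(host: str, port: int, run: Callable = subprocess.run) -> Optional[str]:
    """Return the first line of ROLE ('master', 'slave', ...) or None if unreachable."""
    proc = run(["redis-cli", "-h", host, "-p", str(port), "ROLE"],
               capture_output=True, text=True)
    if proc.returncode != 0 or not proc.stdout:
        return None
    return proc.stdout.splitlines()[0].strip()


def find_masters(candidates: Sequence[tuple[str, int]],
                 run: Callable = subprocess.run) -> list[tuple[str, int]]:
    """Return every former master/slave address that now reports itself as master."""
    return [(h, p) for h, p in candidates if role_of(h, p, run) == "master"]
```

It returns every host claiming to be master rather than just the first: more than one answer is a sign you need to sort the pod out by hand before re-adding it.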
As far as adding via the Sentinel API or by writing a config file goes, I prefer using the API. Additions are more immediate, and you don’t need to bother with stopping Sentinel, writing files, and restarting. Indeed, you don’t even need to do these operations on the Sentinel nodes directly: if you have connectivity, you can do it from a bastion host or your laptop/desktop.
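Re-adding a pod through the API amounts to a SENTINEL MONITOR followed by a SENTINEL SET for each setting. The sketch below builds those calls and sends them with redis-cli, so it can run from a bastion or workstation; the function names and arguments are hypothetical. It returns the raw replies because, depending on the redis-cli version, an error reply (such as a duplicated master name) may not be reflected in the exit status.

```python
import subprocess
from typing import Callable


def monitor_commands(name: str, ip: str, port: int, quorum: int,
                     settings: dict[str, str]) -> list[list[str]]:
    """Build the SENTINEL calls that re-register one pod through the Sentinel API."""
    cmds = [["SENTINEL", "MONITOR", name, ip, str(port), str(quorum)]]
    for option, value in settings.items():
        cmds.append(["SENTINEL", "SET", name, option, str(value)])
    return cmds


def readd_pod(sentinel_host: str, sentinel_port: int, name: str, ip: str, port: int,
              quorum: int, settings: dict[str, str],
              run: Callable = subprocess.run) -> list[str]:
    """Send the commands to one sentinel and return each reply for inspection."""
    replies = []
    for cmd in monitor_commands(name, ip, port, quorum, settings):
        proc = run(["redis-cli", "-h", sentinel_host, "-p", str(sentinel_port), *cmd],
                   capture_output=True, text=True, check=True)
        replies.append(proc.stdout.strip())
    return replies
```

Run readd_pod against each sentinel that should watch the pod, and check that every reply is OK.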
Future Options
Despite our best efforts, these scenarios can still plague us - whether from automated code or from a human “accident”. Ideally Sentinel would not store constellation state in its config, and there is work in progress to do just that. However, it could still happen even with a state file instead of the config file.
Because it could still happen, we now have the ability to tell Sentinel to flush its config to disk, provided you’re running an up-to-date version. If not, then at least now you have some other options - especially if this happens while trying to update to the newer versions.
Hopefully, should you ever find yourself in this unfortunate situation, these methods will provide you some means of retrieving at least some of your sanity. And, of course, your configuration.