Redis already has one of the most extensive sets of configuration options available in any data store. But we can do better. I proposed these changes at Redis Developer Day 2015 in London this week; this post is the more detailed version.
Change Summary And Raison d’être
You can configure Redis via the traditional config file, through the API, and, though this is lesser known, via the command line, where you prepend ‘--’ to a directive name to turn it into a command-line switch. But with the increasing prevalence of containers, and to meet growing ideas in operational management, Redis can benefit from some improvements in this space. This proposal is a collection of changes which bring these benefits to Redis users and to those who need to support it in an operational capacity.
Some of these changes are tied to non-configuration changes, so this proposal overlaps somewhat with some of the others.
Environment Based Configuration
One methodology for designing software with operations in mind is known as the “12 Factor App” methodology. Redis, though unintentionally, already follows several of its factors. The factor addressed here is environment-based configuration: the software reads its configuration from the shell’s environment variables.
When a value is supplied in more than one location, the priority order would be:
- Environment Variables -> Command Line -> Config file
The order of the first two is potentially contentious. Normally the command line would override the environment variables. However, I think there is a reason for inverting these two: Docker, or more specifically the Dockerfile. When running Redis via Docker, the Dockerfile has to be set up in a way that allows you to pass command line options; if it is not, you can’t pass them in. Ordinarily that would be irrelevant to the priority order. However, these Dockerfiles often set CLI options in the Dockerfile itself.
If we go the traditional CLI > ENV route, you are locked into whatever the Dockerfile specifies. By prioritizing ENV over CLI we get around this problem, ensuring that all non-file configuration options are available at container run time, regardless of choices made at container creation time.
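To make the proposed precedence concrete, here is a minimal Python sketch. It assumes each source has already been parsed into a dict keyed by directive name; `resolve` and `effective_config` are hypothetical helpers, not part of Redis.

```python
from typing import Dict, Mapping, Optional


def resolve(directive: str,
            env: Mapping[str, str],
            cli: Mapping[str, str],
            file: Mapping[str, str],
            default: Optional[str] = None) -> Optional[str]:
    """Effective value of one directive under the proposed ENV > CLI > file order."""
    # Check the highest-priority source first and stop at the first hit.
    for source in (env, cli, file):
        if directive in source:
            return source[directive]
    return default


def effective_config(env: Mapping[str, str],
                     cli: Mapping[str, str],
                     file: Mapping[str, str]) -> Dict[str, str]:
    """Merge all three sources; apply lowest priority first so later layers win."""
    merged: Dict[str, str] = dict(file)
    merged.update(cli)
    merged.update(env)
    return merged
```

Applying the sources lowest-priority first means each layer only has to override what it actually sets; everything else falls through to the layer below.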
That is to say, if your config file has “port 6379” and you launch with ‘--port 7000’, that Redis instance will bind to port 7000. If you then added the environment variable for port and set it to 8000, it would listen on port 8000. This priority sequence allows you to have sane defaults which can be customized at runtime. Redis already embraces sane defaults by not requiring a config file at all, so every directive has a usable default. I’d propose the prefix be SENTINEL_. Since hyphens are not allowed in environment variable names, directive names would need to be converted, with underscores standing in for hyphens. As such, to set set-max-intset-entries via the environment you would set the SENTINEL_-prefixed variable whose name replaces each hyphen with an underscore.
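The name mapping could look like the following Python sketch. The SENTINEL_ prefix comes from the proposal; upper-casing the name is an assumption (environment variables are conventionally upper case), and both helper functions are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "SENTINEL_"  # prefix proposed in the text


def env_name_for(directive: str, prefix: str = PREFIX) -> str:
    """'set-max-intset-entries' -> 'SENTINEL_SET_MAX_INTSET_ENTRIES'.

    Upper-casing is an assumption; hyphens become underscores because
    hyphens are not valid in shell variable names.
    """
    return prefix + directive.replace("-", "_").upper()


def directives_from_environ(environ: Optional[Mapping[str, str]] = None,
                            prefix: str = PREFIX) -> Dict[str, str]:
    """Collect prefixed variables and map them back to directive names.

    Assumes directive names never contain underscores, so turning every
    underscore back into a hyphen is unambiguous.
    """
    if environ is None:
        environ = os.environ
    found: Dict[str, str] = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            directive = name[len(prefix):].lower().replace("_", "-")
            found[directive] = value
    return found
```

Variables without the prefix (such as HOME) are simply ignored, so the instance only picks up settings meant for it.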
From an operations point of view, configuring via environment variables provides flexibility while reducing sources of conflict - a combination you don’t see much of. Because Redis can rewrite its own configuration file, you don’t want a configuration management system stomping over those changes. Rather than maintaining custom config files, you set environment variables at runtime, eliminating this point of operational contention. Containerization is another growing area where environment-based configuration is a significant win.
Docker provides a nice mechanism for dynamically configuring Redis containers: the environment. You may not want to put everything in a configuration file, or pass it all via the command line. Adding environment variables as a configuration route would therefore be most excellent.
Announce IP, Announce Port
As we already have in Sentinel, Redis needs the ability to be told which IP and/or port to advertise when it connects to a master as a slave or interacts with Sentinel, and for the same reasons. Behind a NAT, your Redis instance sees different connectivity information than what is actually used for off-host connectivity. To handle this we need to follow in Sentinel’s footsteps and add ‘announce-ip’ and ‘announce-port’. These need to be settable via all configuration mechanisms (environment, command line, file, and API). Note that changing these at runtime after a slaveof directive or command will mean the slave needs to inform the master of its true connectivity. There is an alternative I’ll discuss later in this proposal.
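As a rough illustration of the intended behavior, this Python sketch computes the address a slave would report. The `NetConfig` type and `advertised_address` function are hypothetical; `announce_ip` and `announce_port` stand in for the proposed ‘announce-ip’ and ‘announce-port’ directives.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NetConfig:
    bind_ip: str                         # address the process actually sees
    port: int                            # port the process actually listens on
    announce_ip: Optional[str] = None    # stands in for proposed 'announce-ip'
    announce_port: Optional[int] = None  # stands in for proposed 'announce-port'


def advertised_address(cfg: NetConfig) -> Tuple[str, int]:
    """Address a slave would report to its master and to Sentinel.

    Each announce-* setting overrides its local counterpart independently,
    so you can announce just a port, just an IP, or both.
    """
    ip = cfg.announce_ip if cfg.announce_ip is not None else cfg.bind_ip
    port = cfg.announce_port if cfg.announce_port is not None else cfg.port
    return ip, port
```

Keeping the two overrides independent matches the common NAT case where only the published port differs from the one the process listens on.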
An instance name is a simple variable useful for identifying a specific instance. It is distinct from the run ID in that it persists across restarts. It will need to be settable via all the regular configuration mechanisms. As Sentinel (and Redis itself) is already very discoverable, being able to name instances becomes extremely handy in dynamic and/or large deployments. It will improve the ability of Redis management tools to discover and report on Redis instances, pods, and clusters. Note: there are already some cases of a “name” being associated with an instance, but those names are currently IP:PORT combinations.
Another way to handle this would be to treat the name as a type of metadata for the instance. I’m of two minds on this subject. Which is better depends on how deeply Redis internally (including Sentinel and/or Cluster) uses the instance name. If it becomes heavily used by Redis itself, it should be a configuration item. Otherwise it is probably a better fit for the next section: metadata.
Another way of providing the instance name is to have a ‘meta’ command which acts like the config command but sets or gets metadata about the instance instead. This data would only be accessible via the meta command. With it you could run ‘meta set name roslave-01’ to set the name. You could also do things such as ‘meta set business-group operations’ and ‘meta set zone a’ to further classify the instance.
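A toy model of the proposed meta command, sketched in Python, shows the key property: metadata lives in its own store rather than in the keyspace. `MetaStore` is hypothetical and only handles the set/get forms used in the examples above.

```python
from typing import Dict, List, Optional


class MetaStore:
    """Per-instance metadata kept outside the keyspace.

    Understands only 'meta set <field> <value>' and 'meta get <field>';
    the command name comes from the proposal, the rest is illustrative.
    """

    USAGE = "expected: meta set <field> <value> | meta get <field>"

    def __init__(self) -> None:
        # Separate from any key/value data, so KEYS, DBSIZE and FLUSH*
        # would never see these fields.
        self._fields: Dict[str, str] = {}

    def execute(self, args: List[str]) -> Optional[str]:
        if len(args) < 2 or args[0].lower() != "meta":
            raise ValueError(self.USAGE)
        sub = args[1].lower()
        if sub == "set" and len(args) == 4:
            self._fields[args[2]] = args[3]
            return "OK"
        if sub == "get" and len(args) == 3:
            return self._fields.get(args[2])  # None if the field is unset
        raise ValueError(self.USAGE)
```

Because the store is reached only through its own command, it could be renamed or restricted independently of normal data commands.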
Why a metadata store in Redis? Because Redis is already highly discoverable, and being able to essentially tag an instance with additional data extends the usefulness of that discoverability. It becomes highly useful for Redis management systems. Metadata should be settable by the same means as configuration data, namely file, CLI options, environment, and the ‘meta’ command in the API.
Why not store it in Redis as regular keys?
- An admin may not want their users to have access to that data via normal commands. By using a dedicated command it could be renamed or could be restricted if/when we get multi-user or multi-role capability.
Say you are doing multi-regional availability. Sure, Redis isn’t built for it, but you need it. How can you ensure your Sentinels pick a new master based on data center first? You could manually configure DCs two and three with a different slave-priority, but what happens when the “master” DC dies? You have to go reconfigure everything. So perhaps you’d like to store decision-making information in the Redis instances themselves and implement your own version of Sentinel. By assigning metadata to each instance you could do exactly this, without standing up a separate datastore or overloading an existing one.
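The DC-first selection described above can be sketched as follows. This is a hypothetical Python helper, not Sentinel’s algorithm: it assumes each replica carries a metadata field named `dc` and follows Redis’s slave-priority convention (lower is preferred, 0 means never promote).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str
    slave_priority: int  # lower wins; 0 means never promote
    meta: Dict[str, str] = field(default_factory=dict)  # e.g. {"dc": "dc2"}


def pick_promotion_candidate(replicas: List[Replica],
                             dc_preference: List[str]) -> Optional[Replica]:
    """Prefer replicas by data-center order first, then by slave-priority."""
    eligible = [r for r in replicas if r.slave_priority > 0]
    for dc in dc_preference:
        in_dc = [r for r in eligible if r.meta.get("dc") == dc]
        if in_dc:
            # Break priority ties by name so the choice is deterministic.
            return min(in_dc, key=lambda r: (r.slave_priority, r.name))
    return None
```

When the “master” DC dies, only the preference order passed in changes, rather than slave-priority on every instance.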
Anyone who has poked around ElastiCache may have noticed there is a key they don’t control. IMO this should not live somewhere the user has access to, if for no other reason than that it can skew the results of code which counts or iterates over the keys in a database, or produce unintended results such as a flush not resulting in a dbsize of 0. With a built-in metadata store, keeping such data out of the keyspace becomes possible.
Configuration Sync From Master to Slave(s)
Currently, if you connect to a master and change a config variable, such as persistence or memory optimization settings, the change applies only to the master. Redis needs the ability to push certain changes to one or more slaves, rather than requiring client or management code to do it.
There are a few ways to do this, and they are not mutually exclusive. One is to specify a list of directives to always sync to all slaves, or to specific slaves. For example:
config sync all hash-max-ziplist-entries hash-max-ziplist-values
config sync slave-01 save
The first tells Redis that whenever a config set is executed for the hash-max-ziplist-* settings, it should push those changes to all slaves. The second tells Redis to push to slave-01 any changes made to the save setting.
Another option is to express the policy as exclusions:
config sync all save hash-max-ziplist-entries hash-max-ziplist-values
config nosync slave-01 save
config sync slave-02 set-max-intset-entries
In this form all slaves except slave-01 will have changes made to save and the hash-max-ziplist-* settings replicated to them. Slave-01 will get the hash-max-ziplist-* settings but NOT the save changes. Slave-02 will ALSO get the set-max-intset-entries changes. Not all directives should be replicated, so there would be a blacklist of sorts. Some key examples: announce-* changes, name, slave settings, etc.
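The include/exclude policy can be modeled in a few lines of Python. This is a sketch under assumptions: `SyncPolicy` and `NEVER_SYNC` are hypothetical, directive names are matched exactly (no `*` wildcards), and the blacklist contents are only illustrative examples drawn from the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Illustrative blacklist: directives that must never be pushed to slaves.
NEVER_SYNC: Set[str] = {"announce-ip", "announce-port", "name", "slaveof"}


@dataclass
class SyncPolicy:
    all_slaves: Set[str] = field(default_factory=set)            # config sync all ...
    extra: Dict[str, Set[str]] = field(default_factory=dict)     # config sync <slave> ...
    excluded: Dict[str, Set[str]] = field(default_factory=dict)  # config nosync <slave> ...

    def targets(self, directive: str, slaves: List[str]) -> List[str]:
        """Slaves that should receive a CONFIG SET of this directive."""
        if directive in NEVER_SYNC:
            return []
        out = []
        for s in slaves:
            wanted = directive in self.all_slaves or directive in self.extra.get(s, set())
            # A per-slave exclusion always beats an "all" rule.
            if wanted and directive not in self.excluded.get(s, set()):
                out.append(s)
        return out
```

Letting exclusions win over the “all” rule is what allows slave-01 to opt out of save changes while still receiving the hash-max-ziplist-* settings.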
Another way to go about it is to add a ‘sync’ option to the config set command. For example:
config set save "" sync all
config set save "60 100" sync slave-01 slave-03
This could even be done in tandem with the config sync/config nosync options to allow per-invocation syncs. This option is likely quicker to implement and lets you decide, for a given command, whether and where to sync it. On the other hand, it means your code always has to take this into account, whereas with the other option it is set by policy.
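A parser for the per-invocation form might look like this Python sketch. It assumes arguments arrive already tokenized (so the empty string and `60 100` are single arguments, as redis-cli quoting would produce); `parse_config_set` and `ConfigSet` are hypothetical names.

```python
from typing import List, NamedTuple


class ConfigSet(NamedTuple):
    directive: str
    value: str
    sync_to: List[str]  # [] = master only, ["all"] = every slave, else slave names


def parse_config_set(args: List[str]) -> ConfigSet:
    """Parse 'config set <directive> <value> [sync all|<slave> ...]'.

    The trailing sync clause is the proposed extension; without it the
    command behaves like today's CONFIG SET and only changes the master.
    """
    if len(args) < 4 or [a.lower() for a in args[:2]] != ["config", "set"]:
        raise ValueError("expected: config set <directive> <value> [sync ...]")
    directive, value, rest = args[2], args[3], args[4:]
    if not rest:
        return ConfigSet(directive, value, [])
    if rest[0].lower() != "sync" or len(rest) < 2:
        raise ValueError("expected 'sync all' or 'sync <slave> ...'")
    return ConfigSet(directive, value, rest[1:])
```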
The first option also enables configs to be pushed on startup, and handles cases where a management tool already exists and makes changes but shouldn’t be making the sync decisions.
Metadata Sync
While most metadata would be specific to that instance, there are cases where it should be replicated. For those cases I propose we do the same thing for metadata. A real example: in Sentinel we name each pod, yet that information is not available in the instances in that pod. This means you can’t easily reverse-discover your setup.
Additionally, and thanks to Salvatore for spotting it, if Sentinel were to set the metadata key ‘sentinel-name’ (or whatever we decide to call it) on the master when you call sentinel monitor, it could be used by Sentinel for cross-checking configuration. For example, Sentinel could interrogate a new master only to find it already has that field set to a different name. In that case it could refuse to pile on, returning an error.
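The cross-check could be as simple as this Python sketch. The ‘sentinel-name’ field comes from the text (its final name is still open); `check_monitor` and the error wording are hypothetical.

```python
from typing import Dict, Tuple

SENTINEL_NAME_FIELD = "sentinel-name"  # name still open in the proposal


def check_monitor(master_meta: Dict[str, str], pod_name: str) -> Tuple[bool, str]:
    """Decide whether a Sentinel may start monitoring a master as pod_name.

    On success the name is recorded in the master's metadata, so replicas
    (via metadata sync) and later Sentinels can see which pod it belongs to.
    """
    existing = master_meta.get(SENTINEL_NAME_FIELD)
    if existing is not None and existing != pod_name:
        # Refuse to pile on: another pod already claims this master.
        return False, f"ERR master already monitored as '{existing}'"
    master_meta[SENTINEL_NAME_FIELD] = pod_name
    return True, "OK"
```

Re-issuing the same name succeeds, so a restarted Sentinel re-running its monitor command is not rejected.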
Post-Start Initialization Phase
This change provides a window of time after startup wherein Redis does not serve or accept data. It would be configurable via a config parameter (such as readiness-delay): 0 disables it (the time to wait being 0), while a number of (milli?)seconds means the server will wait that long for post-start initialization and configuration commands. A new config sub-command, such as config ready, would also be introduced to end the delay early and place the server in normal serving mode.
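One way to model the readiness window is a small gate object, sketched here in Python. It treats readiness-delay as seconds (the text leaves the unit open), and the set of commands allowed during the window is an assumption; `ReadinessGate` is a hypothetical name.

```python
import time
from typing import Callable


class ReadinessGate:
    """Models the proposed post-start initialization window.

    A delay of 0 disables the window; config ready ends it early. The
    clock is injectable so the timing logic can be tested.
    """

    # Assumption: commands accepted while data serving is still gated.
    INIT_COMMANDS = ("config", "slaveof", "meta", "info", "ping")

    def __init__(self, readiness_delay: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + readiness_delay
        self._ready_called = False

    def config_ready(self) -> str:
        self._ready_called = True
        return "OK"

    def serving(self) -> bool:
        """True once data commands may be served."""
        return self._ready_called or self._clock() >= self._deadline

    def allows(self, command: str) -> bool:
        if command.lower() in self.INIT_COMMANDS:
            return True
        return self.serving()
```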
The purpose is to let the administrator set various things that cannot, or should not, be specified in the config file, or that should be overridden at run time. We already have a few examples of this: Redis/Sentinel behind a NAT, and Redis being reconfigured as a slave or master.
One of the items Salvatore discussed this year was the concept of a “protected restart” mode, to prevent a scenario where a setup with diskless replication enabled could lose all of its data through a failed restart. That mode is quite similar to the one I propose here in that it doesn’t serve or accept data or data changes while active. After some discussion we arrived at this: since the protected-restart state can be queried, if you are using the readiness mode and (as you should) using the ready command to signal that the server should start, you should query for that state before issuing ready. This ensures both cases are covered without requiring Redis to always do Yet More Checks.
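The agreed ordering, query first, then ready, can be sketched in Python as below. The state field name `protected_restart` is a placeholder, since the text does not say how the protected-restart state would be exposed; the two callables stand in for whatever client calls an orchestrator uses.

```python
from typing import Callable, Dict

# Placeholder: the proposal does not name the field exposing this state.
PROTECTED_FIELD = "protected_restart"


def finish_initialization(query_state: Callable[[], Dict[str, str]],
                          send_ready: Callable[[], str]) -> bool:
    """Issue config ready only if the instance is not in protected-restart mode.

    query_state returns the instance's state fields (for example parsed
    from INFO); send_ready issues config ready. Returns True if it was sent.
    """
    state = query_state()
    if state.get(PROTECTED_FIELD) == "1":
        return False  # leave the instance gated for an operator to inspect
    return send_ready() == "OK"
```

Putting the check in the orchestrator rather than in Redis is the trade-off described above: Redis stays simple, and the caller that already issues ready takes responsibility for asking first.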
Consider this scenario: a Redis server runs behind a NAT-ed IP address, such as in a VM or container. You do not know what IP and port will be assigned to the service when starting it. With the new ‘announce-ip’ and ‘announce-port’ directives, and a readiness-delay long enough to discover them, the code launching the instance in a container can discover the client-facing IP:PORT pair and then configure the instance before it serves data:
- launch the instance, discover its connectivity
- call on the new instance: config set announce-port 7654
- call on the new instance: config set announce-ip 22.214.171.124
- obtain the master IP through some mechanism, such as Docker’s API
- call on the new instance: slaveof 126.96.36.199 6379
- call on the new instance: config ready
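The launch sequence above can be expressed as an ordered list of commands, as in this Python sketch. `init_commands` is hypothetical, and announce-ip, announce-port and config ready are proposed rather than existing commands, so a stock Redis server would reject them.

```python
from typing import List


def init_commands(announce_ip: str, announce_port: int,
                  master_ip: str, master_port: int) -> List[List[str]]:
    """Commands to send, in order, during the readiness window.

    Discovering the values (e.g. from Docker's API) happens before this
    and is not shown. slaveof is an existing command; the rest are proposed.
    """
    return [
        ["CONFIG", "SET", "announce-port", str(announce_port)],
        ["CONFIG", "SET", "announce-ip", announce_ip],
        ["SLAVEOF", master_ip, str(master_port)],
        ["CONFIG", "READY"],  # must come last: it opens the instance for data
    ]
```

With a client such as redis-py, an orchestrator could send each entry using `execute_command(*cmd)`, but only against a server that implements the proposal.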
Additional Thoughts on Config Mode
With some time to think more about this, I wonder if the config ready command should have a counterpart, config notready, for cases where you need to go the other way. Perhaps you have found an issue which requires reconfiguration. Being able to stop serving data could give you time to reconfigure without needing to restart.
An Idea From The Void
To throw out a wild idea, I also brought up the possibility of storing and accessing configuration in a backing store such as Consul. While it isn’t part of the forthcoming RCP, it is an idea with a lot of merit: it enables many new capabilities and interactions between Redis nodes, both in pods and in clusters, and also for proxies and client/server information.