More Thoughts on Redis Performance

In the previous installment I discussed topics and approaches to preventing your Redis instance from becoming slow.

Now it is time to look at ways of measuring that performance.

What To Measure

For this installment we are looking at command latency and its components. Because the number of commands you can push through a Redis server or client library is a function of the speed of each command, this is the place to start.

Quick and Easy: The CLI

The first way to check your command latency is to use the command-line client, redis-cli. It is quick and gives you an immediate checkpoint to start from. The examples here will use localhost for simplicity, but in your setup you should use the -h <host> option and, if needed, -a <auth> and -p <port>.
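For example, each of the checks below could be pointed at a hypothetical remote, password-protected instance like so:

redis-cli -h 10.0.0.5 -p 6379 -a yourpassword --latency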

Basic Latency

To get the basic latency results, run redis-cli --latency. This will output a set of numbers indicating min, max, avg, and the number of iterations. These iterations consist of running the Redis command PING against an open connection. Consequently, it does not include any time spent connecting to the server. If your code connects on every command, you’ll see much worse performance than this command will indicate.

This command will run until you kill it, easily done via CTRL-C. The latency numbers are in milliseconds. Thus a result of

min: 0, max: 1, avg: 0.11 (11369 samples)

means the minimum latency was less than 1 millisecond, the maximum was 1 millisecond, and the average per-PING latency was 0.11 milliseconds. Yes, this is a localhost connection. ;) These numbers were computed from 11,369 “samples”, or PING commands.
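If you are curious what that measurement looks like from code, here is a minimal Go sketch of the same idea; the address localhost:6379 and the sample count are assumptions for illustration, not anything redis-cli itself uses. It times PING over a single connection opened up front, which is exactly why connection setup never shows up in the numbers:

package main

import (
    "bufio"
    "fmt"
    "net"
    "time"
)

func main() {
    // Connect once, up front. This cost is NOT included in the samples below,
    // which is why code that reconnects per command will look much worse than
    // redis-cli --latency suggests.
    conn, err := net.Dial("tcp", "localhost:6379") // assumed address
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    reader := bufio.NewReader(conn)

    const samples = 1000
    var min, max, total time.Duration
    min = time.Hour // sentinel; any real sample will be smaller

    for i := 0; i < samples; i++ {
        start := time.Now()
        // PING as a RESP command; the reply is "+PONG\r\n".
        if _, err := conn.Write([]byte("*1\r\n$4\r\nPING\r\n")); err != nil {
            panic(err)
        }
        if _, err := reader.ReadString('\n'); err != nil {
            panic(err)
        }
        elapsed := time.Since(start)

        total += elapsed
        if elapsed < min {
            min = elapsed
        }
        if elapsed > max {
            max = elapsed
        }
    }

    fmt.Printf("min: %v, max: %v, avg: %v (%d samples)\n",
        min, max, total/time.Duration(samples), samples)
}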

An alternative is to use --latency-history instead. This option lets you break the results out into time slices. Running redis-cli --latency-history will use a default window of 15 seconds. The result will look something like this:

min: 0, max: 1, avg: 0.11 (1475 samples) -- 15.01 seconds range
min: 0, max: 1, avg: 0.10 (1474 samples) -- 15.01 seconds range

Like --latency, this command will run until you kill it. If you want to run with a different interval you can pass -i <number-of-seconds>. For example:

redis-cli --latency-history -i 10
min: 0, max: 1, avg: 0.11 (984 samples) -- 10.01 seconds range
min: 0, max: 1, avg: 0.11 (983 samples) -- 10.01 seconds range
min: 0, max: 1, avg: 0.11 (983 samples) -- 10.01 seconds range

Since these operations simply use the PING command this should work against any version of Redis.

Intrinsic Latency

The redis-cli also offers an --intrinsic-latency mode. Upon an initial reading, the name might lead you to believe it is measuring the intrinsic latency of the server. It isn’t. If you look at the redis-cli source on GitHub you’ll find it doesn’t even talk to the server:

 start = ustime();
 compute_something_fast();
 end = ustime();
 latency = end-start;

According to the source comments on intrinsic latency mode:

* Measure max latency of a running process that does not result from
* syscalls. Basically this software should provide an hint about how much
* time the kernel leaves the process without a chance to run.

What this is actually doing is testing the intrinsic latency of your client host. This is useful for knowing whether the problem lies on the client side rather than with the server itself.
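To make that concrete, here is a rough Go analogue of the idea; it is a sketch, not the actual redis-cli implementation, and the work function and test duration are placeholders. It repeatedly times a small burst of CPU work and records the worst case, which balloons whenever the kernel leaves the process without a chance to run:

package main

import (
    "fmt"
    "time"
)

// sink keeps the compiler from optimizing the busy work away.
var sink uint64

// computeSomethingFast is a stand-in for redis-cli's compute_something_fast():
// a short burst of CPU work that should take roughly the same time every call.
func computeSomethingFast() uint64 {
    var x uint64 = 1
    for i := 0; i < 1000; i++ {
        x = x*2654435761 + 1
    }
    return x
}

func main() {
    testDuration := 5 * time.Second // assumed test length
    deadline := time.Now().Add(testDuration)

    var maxLatency time.Duration
    for time.Now().Before(deadline) {
        start := time.Now()
        sink = computeSomethingFast()
        latency := time.Since(start)
        if latency > maxLatency {
            maxLatency = latency
        }
    }
    fmt.Printf("max intrinsic latency observed: %v\n", maxLatency)
}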

So while it is useful, it should probably not be among your first steps when checking your latency.

Using Candui

Candui is a small suite I am working on for tracking and testing Redis latency and overall performance. Warning: it is early in its development and much will likely change. For this reason, use it only if you are willing to run non-production-quality software. Candui is in the GitHub Candui repository.

Specifically for this article we will be looking at the golatency tool/directory.

This tool, configured via environment variables, will connect to a given Redis instance and run the same “time the PING command” mechanism the redis-cli does. I did this to maintain parity between the tests. Where it differs is its ability to output more information and (currently) even store the results in MongoDB, with more stores to come, for later analysis.

A run of golatency might look like this, though the output is likely to have changed since this article was written:

./golatency 
Connected to <host:ip>

100000 iterations over 53698us, average 536us/operation

Percentile breakout:
====================
99.00%: 3,876.99us
95.00%: 640.00us
90.00%: 514.00us
75.00%: 452.00us
50.00%: 414.00us

Min: 243us
Max: 44,686us
Mean: 536.98us
Jitter: 764.37us

Notice the temporal unit is ‘us’, microseconds, in this run. You can also see it gives you the percentile breakout, letting you get a better picture of the distribution than a simple min/max/avg will. In this context, ‘Jitter’ is the standard deviation of the sample.

This output tells us that overall we’re looking pretty good. This particular example is over a private network. We can see the peak is much higher than the mean, but 95% of the iterations come in at or below 640 microseconds. What does this tell us?

It could tell us there is a sporadic network issue, a sporadic server-side delay, or even something like garbage collection in the test client affecting the output. This is why the distribution of the requests is made available: so you can see how common a “high” result is.
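To make the percentile and jitter numbers concrete, here is a minimal Go sketch of computing them from a slice of recorded latencies. The sample values and the nearest-rank percentile method are illustrative choices, not golatency's actual internals:

package main

import (
    "fmt"
    "math"
    "sort"
)

// percentile returns the nearest-rank percentile of an already-sorted slice.
func percentile(sorted []float64, p float64) float64 {
    if len(sorted) == 0 {
        return 0
    }
    idx := int(math.Ceil(p/100*float64(len(sorted)))) - 1
    if idx < 0 {
        idx = 0
    }
    return sorted[idx]
}

func main() {
    // Example latencies in microseconds; in practice these come from timing PING.
    samples := []float64{243, 398, 414, 430, 452, 470, 514, 640, 3877, 44686}

    sorted := append([]float64(nil), samples...)
    sort.Float64s(sorted)

    var sum float64
    for _, s := range sorted {
        sum += s
    }
    mean := sum / float64(len(sorted))

    // Jitter here is the standard deviation of the samples.
    var variance float64
    for _, s := range sorted {
        variance += (s - mean) * (s - mean)
    }
    jitter := math.Sqrt(variance / float64(len(sorted)))

    for _, p := range []float64{99, 95, 90, 75, 50} {
        fmt.Printf("%.2f%%: %.2fus\n", p, percentile(sorted, p))
    }
    fmt.Printf("Min: %.0fus Max: %.0fus Mean: %.2fus Jitter: %.2fus\n",
        sorted[0], sorted[len(sorted)-1], mean, jitter)
}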

In order to determine where the problem is, you will want to first look at how long the commands on the server are taking. This will tell us if the server is the problem.

I’ll go into more detail on that subject in the next installment, but a quick check you can make, in this case, is to look at the Redis INFO output:

redis-cli info commandstats
# Commandstats
cmdstat_ping:calls=32497193,usec=22213723,usec_per_call=0.68

In this particular case we can see the average time to execute PING on the server is 0.68 microseconds, which is simply usec divided by calls: 22,213,723 / 32,497,193 ≈ 0.68. Clearly the PING command itself is not likely the culprit. However, other commands executed during the test could have backed the Redis process up, causing the delay. Delving into that is the subject of the next installment.
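As an aside, if you would rather pull that commandstats check from code than from the CLI, a minimal sketch might look like the following. It speaks raw RESP to stay self-contained; the address is an assumption, no AUTH is assumed, and a real client library would normally handle the protocol for you:

package main

import (
    "bufio"
    "fmt"
    "io"
    "net"
    "strconv"
    "strings"
)

func main() {
    conn, err := net.Dial("tcp", "localhost:6379") // assumed address
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    reader := bufio.NewReader(conn)

    // INFO commandstats as a RESP command.
    if _, err := conn.Write([]byte("*2\r\n$4\r\nINFO\r\n$12\r\ncommandstats\r\n")); err != nil {
        panic(err)
    }

    // The reply is a bulk string: "$<length>\r\n<payload>\r\n".
    header, err := reader.ReadString('\n')
    if err != nil {
        panic(err)
    }
    length, err := strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(header, "$")))
    if err != nil {
        panic(err)
    }
    payload := make([]byte, length+2) // +2 for the trailing \r\n
    if _, err := io.ReadFull(reader, payload); err != nil {
        panic(err)
    }

    // Pick out the PING stats: calls, total usec, and usec_per_call.
    for _, line := range strings.Split(string(payload), "\r\n") {
        if strings.HasPrefix(line, "cmdstat_ping:") {
            fmt.Println(line)
        }
    }
}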