Redis Zen

You can work with Redis using the usual database mindset, but approaching it differently can reap significant rewards.

The fundamental approach to most data storage during development is “OK, so how do I fit this into an SQL DB?”. Then you spend time figuring out how to munge the data you get out when you run queries. When you are using Redis, you need to turn this around.

Instead, you first work out how you want to retrieve the data. Since Redis is more of a Data Structure Server than a database, how you intend to consume the data should drive which data structure(s) you use.

For example, consider a common use case: statistics.

Let us say we want to track traffic in log-time. That is, we have something that processes access logs as they come in via Syslog. Specifically, we want to track every page’s request rate. With the standard mindset we would assume we want a table with the page-id, perhaps a timestamp, and an integer. Or perhaps we design a pair of related tables, one detailing the page and one storing one hit per record. Then we write a bunch of code to tease the information out of the database.

Now we take the Redis approach of what I call “Access Based Design”. We know we want to show the top 10 most requested pages in our site for the last day and the last 7 days. This leads to a far different data structure design.

Instead we might wind up with a sorted set key for each window we want to track and report on, using zincrby to increment the counter each time we see a page-id. This reflects the access-driven design.

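To pull the data out we can retrieve precisely the window we want. We use the zrevrange command with a range of 0 to 9 to pull the ten most accessed page-ids, highest count first. The zrevrangebyscore command with a LIMIT of 10 gives the same result.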
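As a minimal sketch of the read side, assuming a redis-py style client (any object with a compatible zrevrange method would do), the top pages for one window could be fetched as below. The function name top_pages is illustrative and not part of any library. The sketch asks for ranks 0 through 9 in descending score order, which returns the highest-scoring members directly.

```python
def top_pages(client, key, count=10):
    """Return up to `count` (page_id, hits) pairs, most requested first.

    `client` is assumed to be a redis-py style client, for example
    redis.Redis(decode_responses=True). `key` names one window's sorted set.
    """
    # Ranks 0..count-1 of the sorted set, ordered from highest score down.
    return client.zrevrange(key, 0, count - 1, withscores=True)
```

Called with the key for the window of interest, this is the single round trip the design is built around: no aggregation or sorting happens in application code.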

Storing the data is simple, though it does require a break from the “do it all in one DB update” mentality. You use a zincrby command for each window you track. Thus, you zincrby the key “sitename:hourly:YYYY:MM:DD:13”, adding one to the page-id stored in the sorted set to register a view at 1PM of the day in question. If you want to only keep that data for 24 hours, use expireat to have it automatically purged 24 hours later. By doing this on every update you ensure the data stays available for 24 hours after the last update.
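The write for a single hourly window might look like the following sketch. It assumes a redis-py style client (version 3 or later, where zincrby takes the key, the amount, and then the member). The helper names hourly_key and record_hourly_hit are hypothetical, and the key layout simply follows the example key above.

```python
from datetime import datetime, timedelta, timezone


def hourly_key(site, when):
    """Build a key shaped like sitename:hourly:YYYY:MM:DD:HH."""
    return f"{site}:hourly:" + when.strftime("%Y:%m:%d:%H")


def record_hourly_hit(client, site, page_id, when=None):
    """Count one view of `page_id` in the hourly window containing `when`."""
    when = when or datetime.now(timezone.utc)
    key = hourly_key(site, when)
    # Add 1 to this page's score; Redis creates the key and member if missing.
    client.zincrby(key, 1, page_id)
    # Move the purge time to 24 hours after this (the latest) update.
    client.expireat(key, int((when + timedelta(hours=24)).timestamp()))
    return key
```

Because expireat takes an absolute Unix timestamp, resetting it on every hit is what keeps the window alive for 24 hours past its last update.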

You would do this for each window on each hit. Redis is fast enough to handle running multiple commands in quick succession and you can pipeline them as well. Why munge data going in when you can simply increment each counter? Especially when you’d need to rework the data coming back out.
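Updating every window on each hit could be batched with a pipeline, roughly as sketched here. This assumes a redis-py style client whose pipeline() object queues commands until execute() is called. The window list and the daily key layout are assumptions made for illustration, and expiry handling is left out to keep the sketch short.

```python
from datetime import datetime, timezone


def window_keys(site, when):
    """Return one sorted-set key per tracked window (naming is an assumption)."""
    return [
        f"{site}:hourly:" + when.strftime("%Y:%m:%d:%H"),
        f"{site}:daily:" + when.strftime("%Y:%m:%d"),
    ]


def record_hit(client, site, page_id, when=None):
    """Increment the page's counter in every window using one pipeline."""
    when = when or datetime.now(timezone.utc)
    pipe = client.pipeline()
    for key in window_keys(site, when):
        pipe.zincrby(key, 1, page_id)  # queued locally, not sent yet
    # execute() sends the queued commands together and returns their replies.
    return pipe.execute()
```

Adding a new counter then means adding one more key to window_keys; nothing about the stored data or the read path has to change.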

By implementing it this way you eliminate a lot of overhead. You don’t have to manage the data expiration, you don’t have to do calculations in code, you make one call to retrieve precisely the data you are after, and you store data in an extensible fashion which makes adding new counters trivial.

Choose the structure for your data depending on the best way you would access it and you can eliminate a bunch of code and logic. While not everything is as obvious as this common example, the mindset it teaches is what is important. Once you get the hang of it you’ll be able to know immediately whether the data fits a traditional relational DB model or a Data Structure Server model. When it fits the Redis way the design will be quick and easy for you to see, saving you significant design and coding time.