We've covered some good ground already, some blabber about Redis in general, and also some thoughts on when using it could be beneficial. The other big question is: How do I integrate that stuff in my application? How do I get my objects to be stored neatly in Redis?

The simplest way is to just use the redis-rb library and talk to Redis almost directly. That's quite low level compared to what you're used to with ActiveRecord though. Hurl wraps most of the convenience stuff into a neat Model class that implements basic functionality like saving, validating and handling identifiers.

Ohm takes it to the next level, adding more complex validations, integrating the Redis data types as top level class attributes, and even handling associations for you. It's serious awesomesauce, and the implementation is quite simple too. It even supports having indexes on attributes you want to query for, updating them transparently for you as you create new objects or update existing ones.

As for integrating these things alongside your existing application, based e.g. on ActiveRecord, there is no really easy way to do that. You can't have both Ohm and ActiveRecord in the same class. Data stored in Redis that references records in other data stores, usually by storing type and identifier, is obviously only weakly coupled to them. You have to take care of cleaning up and of ensuring that the keys match. Associating them means you'll have to stuff some way to fetch the data from Redis into your existing model. Let's look at some ways you could store objects and associated data in Redis. The approaches are derived from Ruby code, but are easily applicable to other languages. They're more general implementation ideas than specific pieces of code.

Keys

Simple thing, use the fully qualified class name, add a unique identifier, and you're done. The identifier can either be derived from a separate key generator attribute in Redis, using the atomic increment command to generate new ones, or by using something like a UUID. It could basically look like this, the simple rule still applying that your keys need to be unique at least on a per-class basis:

"User:ef12abc"
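As a sketch of the counter-based approach: all the class, method and key names below are made up for illustration, and a tiny in-memory stub stands in for the server so the example runs as-is. With redis-rb, any client object responding to `incr` would do the same job.

```ruby
# Generate unique per-class keys from an atomic counter.
class KeyGenerator
  def initialize(redis)
    @redis = redis
  end

  # Increment a per-class counter key and build "ClassName:id".
  def next_key(klass)
    id = @redis.incr("#{klass}:_id")
    "#{klass}:#{id}"
  end
end

# Minimal in-memory stand-in for a Redis connection. INCR returns
# the new value after incrementing, which is what we mimic here.
class FakeRedis
  def initialize; @data = Hash.new(0); end
  def incr(key); @data[key] += 1; end
end

keys = KeyGenerator.new(FakeRedis.new)
keys.next_key("User") # => "User:1"
keys.next_key("User") # => "User:2"
```

Swap in a real connection and you get cluster-safe identifiers for free, since INCR is atomic on the server.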

Attributes

I can think of two ways to store your simple attribute data, and both are implemented in Ohm and in hurl respectively. The first one is to combine the key and the attribute name and store each attribute's value as a separate value. Ohm follows that approach, but I honestly can't think of any good reason to do it this way. Resulting key-value pairs could look like this:

"User:ef12abc:name" => "salvatore"

hurl uses a much nicer approach, it just serializes the hash of attributes into a JSON string, storing it with the generated key as mentioned above. I much prefer this approach as it doesn't create dozens of keys for each object stored in Redis. Data is serialized before storing and deserialized when loading, simple story, and not a new approach in the world of post-relational databases. You'd do the same with CouchDB.

"User:ef12abc" => "{name: 'salvatore'}"
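A sketch of that serialize-everything approach: dump the attribute hash to JSON under the object's key, parse it back on load. The method names are made up for illustration, and a plain Hash stands in for the Redis connection (with redis-rb it would be `set` and `get` on the client) so the example runs without a server.

```ruby
require 'json'

# Store the whole attribute hash as one JSON string under the key.
def save_attributes(store, key, attributes)
  store[key] = attributes.to_json
end

# Deserialize the attributes on the way back out.
def load_attributes(store, key)
  JSON.parse(store[key])
end

store = {}
save_attributes(store, "User:ef12abc", "name" => "salvatore")
load_attributes(store, "User:ef12abc") # => {"name"=>"salvatore"}
```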

Of course an attribute can simply be a list or a set of values, no big deal with Redis. Ohm wraps that kind of functionality into simple proxy classes.

Finding Things

With a key structure in place it's easy to fetch objects and their attributes, but what about querying by attribute values? Redis doesn't have any query mechanism as you know it from relational databases. You just can't do find_by_name('salvatore') and be done with it. Ohm has a neat approach that just seems logical applied to something like Redis. For attributes you wish to index it stores a reverse lookup list. So if you have users with different names, there's basically a key for each name in use, referencing the objects that have that very value.

You could extend that for any combination of attributes you want to query for, but if detailed ad-hoc querying is what you're after, maybe Redis is not the right tool for the job. The stored key-value combination could look like this:

"User:name:salvatore" => ['User:ef12abc']

Combine class name, attribute and value with a list of objects having that value. It's up to you whether to just keep a list of their identifiers or to store the fully qualified key. Of course for arbitrary values the keys can get quite long, so Ohm solves that problem by encoding the values with Base64, which does have the disadvantage of being less readable when debugging.

With that key-value combination in place a find on a particular value is merely fetching a single key in Redis. And if you didn't realize it until now, Ohm is pretty neat. It solves Redis usage for Ruby models quite elegantly.
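A sketch of how that indexing could be maintained, assuming nothing beyond what's described above. The method names are made up, a Hash of Arrays stands in for Redis sets so the example runs without a server, and the Base64 encoding mirrors what Ohm does with values.

```ruby
require 'base64'

# Build "Class:attribute:<base64 of value>", Base64-encoded as Ohm
# does to keep arbitrary values safe in key names. encode64 appends
# a newline, so strip it.
def index_key(klass, attribute, value)
  "#{klass}:#{attribute}:#{Base64.encode64(value.to_s).strip}"
end

# On save, add the object's key to the reverse lookup list.
def add_to_index(store, klass, attribute, value, object_key)
  (store[index_key(klass, attribute, value)] ||= []) << object_key
end

# A find is then just fetching a single key.
def find_by(store, klass, attribute, value)
  store[index_key(klass, attribute, value)] || []
end

store = {}
add_to_index(store, "User", "name", "salvatore", "User:ef12abc")
find_by(store, "User", "name", "salvatore") # => ["User:ef12abc"]
find_by(store, "User", "name", "antirez")   # => []
```

The one thing the sketch glosses over is updates: when an attribute changes, you also have to remove the object's key from the old value's list, which is exactly the transparent bookkeeping Ohm does for you.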

Associations

At this point you should get an idea how things can be solved with Redis. For associating objects the solution is pretty simple. Just keep a list of all the objects belonging to the current one. Say a user has a bunch of posts attached to them. Given that posts are also stored in Redis (and why wouldn't they?), it's just a list of keys, or a set, if you don't care for the ordering:

"User:ef12abc:posts" => ['Post:1', 'Post:2']

Each post gets its own attribute storing the reference back up, but depending on how you usually navigate your objects, you can leave it out.
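A sketch of that association list: in Redis this would be RPUSH and LRANGE on "User:ef12abc:posts", while here an Array in a Hash stands in so the example runs without a server. The method names are made up for illustration.

```ruby
# Append a post's key to the user's association list (RPUSH in Redis).
def add_post(store, user_key, post_key)
  (store["#{user_key}:posts"] ||= []) << post_key
end

# Fetch the associated keys back (LRANGE 0 -1 in Redis).
def posts_for(store, user_key)
  store["#{user_key}:posts"] || []
end

store = {}
add_post(store, "User:ef12abc", "Post:1")
add_post(store, "User:ef12abc", "Post:2")
posts_for(store, "User:ef12abc") # => ["Post:1", "Post:2"]
```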

So how do you mix objects using ActiveRecord with objects stored in Redis? The answer is right there: with a lookup list similar to the one above, just using a different key according to the ActiveRecord model. It could just be as simple as class and primary key.

Locking Objects

After my Redis talk someone raised the question of how Redis solves the problem of concurrent writes and their potential to overwrite each other's data. I was a bit caught off guard by this question, and while I think this is a problem not solely related to Redis, but one relational databases have to deal with in general, the answer is: it doesn't. Apart from the atomic operations you can do on sets and lists and incrementing counters, there just is no way.

But if you still want to ensure that at least your write is successful, Ohm to the rescue. It uses a simple lock value to lock a specific object in Redis, at least for this very write. That a write waiting for the lock to be removed might overwrite the data just written is another issue. The code in Ohm to obtain the lock looks like this (I've replaced one method call with a real value to give you an idea):

def lock!
  lock = db.setnx('User:ef12abc:_lock', 1) until lock == 1
end

setnx sets a value only if it hasn't been previously set. So it loops until it acquires the lock, and then performs the operation. Ohm utilizes this kind of mutex for several of its operations where Redis itself can't guarantee atomicity.
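To round that out, here's a sketch of the same mutex with the unlock step included. The FakeRedis class is a minimal stand-in implementing setnx and del semantics so the example runs without a server; with redis-rb the calls are the same. Note that this busy-waits, just like the Ohm snippet above.

```ruby
# Minimal stand-in for a Redis connection: SETNX sets a key only if
# it doesn't exist yet and returns 1 on success, 0 otherwise.
class FakeRedis
  def initialize; @data = {}; end

  def setnx(key, value)
    return 0 if @data.key?(key)
    @data[key] = value
    1
  end

  def del(key); @data.delete(key); end
end

# Spin until the lock is acquired, run the block, always release.
def with_lock(redis, key)
  lock = redis.setnx("#{key}:_lock", 1) until lock == 1
  yield
ensure
  redis.del("#{key}:_lock")
end

redis = FakeRedis.new
with_lock(redis, "User:ef12abc") { "critical section" } # => "critical section"
```

One caveat the sketch shares with the real thing: if the process holding the lock dies before deleting the key, everyone else spins forever, so a production version would want an expiry on the lock.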

There are some discussions going on about introducing some sort of batch command syntax to the Redis protocol though. So while I don't want to see the full logic of complex transactions in Redis, having a command that makes a whole batch of other commands either run together or fail together is pretty neat.

Select * from users

So how do you get all of your users for easy pagination? Easy, create a list for all your User objects. You can access specific ranges and always have the number of total objects at hand:

"User:all" => ['User:12acf', 'User:f31ad']

Now you can use the lrange command to get to your objects or the llen command (llen User:all) to get the total number of objects of a particular type.
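A sketch of what paginating through that list could look like. With redis-rb the calls would be lrange and llen on the client; here an Array stands in, since LRANGE's start/stop maps neatly onto an inclusive Ruby range (including -1 for "last element").

```ruby
# LRANGE semantics on a plain Array: inclusive start/stop, negative
# indices count from the end, out-of-range yields an empty list.
def lrange(list, start, stop)
  list[start..stop] || []
end

# Hypothetical identifiers for illustration.
all_users = ["User:12acf", "User:f31ad", "User:9b2ef"]

lrange(all_users, 0, 1)  # first page of two => ["User:12acf", "User:f31ad"]
lrange(all_users, 0, -1) # everything => all three keys
all_users.length         # what LLEN would return => 3
```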

Sorting

Now this is where it gets tricky. Sorting by an attribute would involve a list of attribute values somehow associated to the objects they belong to. I can't really think of a simple way to solve this in Redis 1.0, since there is no data structure that allows linking two values like that. For the record, sorting a simple list in Redis is easy as pie, check Chris Wanstrath's post on sorting.

With Redis 1.1 however, sorted sets will be introduced. While they still don't solve the problem entirely, they're an acceptable solution. They work based on a score you specify when adding the attribute to the list. The score is really the kicker though: for this to fully work, it needs to be a number, a double precision floating point to be exact.

Since you're dealing mostly with strings, you'd need to run some sort of hashing on them. My first thought was to just string their numerical ASCII or UTF-8 codes together. But that falls apart when you consider that you'd need to pad the numbers depending on how many digits they have. Since the score itself is treated as a number, at least the padding of the first character code is lost.

But wait, did I say floating point earlier? I think I did. What if we prefix the generated string with e.g. a "1." and get a nice floating point number? That way no padding is lost and we still get a valid representation in terms of how Redis treats scores. I'd restrict the scoring to the first couple of characters though, the float could get quite long. You'd have the same problem with strings of arbitrary length anyway.

Using this simple method, which is probably far from being of any scientific value, we can get a basic score for a string using its ASCII values:

def score(string)
  base = '1.'
  base << string.split("").collect { |c| '%03d' % c.unpack('c') }[0...7].join
end

So "salvatore" would get you a score of 1.115097108118097116111. As you can see depending on the string this might break down pretty fast, so I'd restrict it to the first couple of characters in the string. My algorithm knowledge is embarrassingly rusty, so if you have good ideas how to solve that problem, please let me know. An elegant solution would be what I'm after, I'm not too happy with the one above, but it should give you an idea how you could solve the problem of sorting.

Now that you have a somewhat decent way of putting a score on an attribute value, all you need is to turn them into a sorted set saving score and the key and you're done.

redis-cli zadd User:name:sorted 1.115097108118097116111 User:12acf
redis-cli zadd User:name:sorted 1.109097116104105097115 User:f31ad

redis-cli zrange User:name:sorted 0 1

> 1. User:f31ad
> 2. User:12acf

Update: There's a caveat (a good one) that Salvatore brought to my attention, and it's very much worth mentioning, although it doesn't help when referencing identifiers. When you specify the same scoring number for every attribute with zadd, Redis will start sorting lexicographically. How cool is that? So you could spread the same score across different attributes and still get sorting for strings. Neat stuff. Salvatore explained this in good detail a while back on the mailing list.
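A quick sketch of what that could look like from redis-cli (the key name is made up); with every member added at the same score, zrange hands the members back in lexicographical order:

```
redis-cli zadd User:name:lex 0 salvatore
redis-cli zadd User:name:lex 0 antirez
redis-cli zrange User:name:lex 0 -1

> 1. antirez
> 2. salvatore
```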

So what now?

I can hear you think: Why would I go through all that trouble just to get my data into Redis? Isn't that too much work and not really worth the hassle? Let me tell you why: because in return you get blazing speed. All data is in memory and is accessed accordingly fast. As opposed to your database, the data here doesn't clog up the whole system, taking up precious memory that could be used for really important data and queries.

That's why you want to start getting data out of your database, into a key-value store. Redis is just one of your options, but the implementation would usually end up being quite similar.

The inspiration for most of the ideas came from going through Ohm's source, but they're pretty similar to how I imagined they'd work, so I don't take full credit for them.

In other news, the videos for last week's N✮SQL Berlin meetup are available for your viewing pleasure. If you want to hear me use the word awesome more than 30 times, now's your chance.

Tags: redis

A very valid question is: What's a good use case for Redis? There are quite a few, as Redis isn't your everyday key-value store. It allows you to keep lists and sets in your datastore and to run atomic operations on them, like pushing and popping elements. All that stuff is incredibly fast, as your data is obviously held in memory and only persisted to disk when necessary, and, to top it off, asynchronously, without reducing the throughput of the server itself.

The simplest and most obvious use case is a cache. Redis clocks in at almost the speed of Memcached, with a couple of features sprinkled on top. If you need a cache, but maybe have a use case where you also want some of the data in it to be persisted, Redis is a decent tool for your caching needs. If you already have a Memcached instance in place I'd look at my options before adding a new component to my infrastructure though.

Pushing and popping elements atomically, does that ring a bell? Correct, that's what you want from a worker queue. Look at delayed_job, you'll find that it uses a locking column in your jobs table. Some people argue that a database should not be the place where you keep your worker jobs. Up to a certain amount of work I disagree, but at some point the performance costs outweigh the benefits, and it's time to move on. Redis is a perfect fit here. No locking needed, just push onto the list of jobs and pop back off it in your workers, simple as that. It's the GitHub way, and the more I think about it, the more sense it makes.
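The pattern boils down to a few lines. With redis-rb the producer would call rpush and the worker lpop on a shared list key; here an Array stands in so the sketch runs without a server, and the job format is made up for illustration.

```ruby
require 'json'

queue = []

# Producer: serialize a job and push it onto the end of the list
# (RPUSH in Redis).
queue.push({ "class" => "Mailer", "args" => [42] }.to_json)

# Worker: pop the oldest job back off the front (LPOP in Redis)
# and deserialize it.
job = JSON.parse(queue.shift)
job["class"] # => "Mailer"
job["args"]  # => [42]
```

In Redis both ends of this are atomic, so multiple workers can pop off the same list without ever locking or seeing the same job twice.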

For Redis 1.1 Salvatore has been working on a proposal by Ezra from Engine Yard to implement a command that would move items from one list to another in one step, atomically. The idea is to mark a job as in progress, while not removing it entirely from the data storage. Reliable messaging anyone? It's such a simple yet genius idea, and Redis has most of the functionality already in place. There's heaps more planned for future Redis releases, I'd highly recommend keeping an eye on the mailing list and on Salvatore's Twitter stream.

As I'm sure you noticed, Redis is used for data storage in hurl, a neat little app to debug HTTP calls. Redis simply stores your personalized list of URLs you checked. Should some data be lost between database dumps, it's not a big deal. Sure, it's not great, but not a big deal.

The simple answer for when to use Redis is: Whenever you want to store data fast that doesn't need to be 100% consistent. In past projects I've worked on, that includes classic examples of web application data, especially when there's social stuff sprinkled on top: ratings, comments, views, clicks, all the social stuff you could think of. With Redis, some of it is just a simple increment command or pushing something onto a list. Here's a nice example of affiliate click tracking using Rack and Redis.
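A sketch of that increment-style tracking: with redis-rb this would be a single redis.incr call per click. A Hash with a default of 0 stands in for the counter keys here, and the key names are made up for illustration.

```ruby
# Each click on an affiliate link is one counter bump; in Redis
# the INCR command makes this atomic across processes.
def track_click(store, affiliate)
  store["clicks:#{affiliate}"] += 1
end

counters = Hash.new(0)
track_click(counters, "acme")
track_click(counters, "acme") # => 2
```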

Why is that a good match? Because if some of that data is lost, it doesn't make much of a difference. Throw in all the statistical or historical data you can think of that's accumulated in some way through your application and could be recalculated if necessary. That data usually just keeps clogging up your database, and is harder and harder to get rid of as it grows.

Same is true for activity streams, logging history, all that stuff that is nonvolatile yet doesn't need to be fully consistent, where some data loss is acceptable. You'd be surprised how much of your data that includes. It does not, and let me be perfectly clear on that, include data that involves any sort of business transaction, be it for a shopping platform or for data involved in transactions for software as a service applications. While I don't insist you store that data in a relational database, at least it needs to go into a reliable and fully recoverable datastore.

One last example, the one that brought me and Redis together is Nanite, a self-assembling fabric of Ruby daemons. The mapper layer in Nanite keeps track of the state of the daemons in the cluster. That state can be kept on each mapper redundantly, but better yet, it should be stored in Redis. I've written a post about that a while back, but it's still another prime use for Redis. State that, should it or part of it get lost, will recover all by itself and automatically (best case scenario, but that's how it works in Nanite).

One thing to be careful about though is that Redis can only take as much data as it has memory available. Especially for data that has the potential to grow exponentially with users and their actions in your application, it's good to keep an eye on it and do some basic calculations, but you should do that even when using something like MySQL. When in doubt, throw more memory at it. With Redis and its master-slave replication it's very easy to add a new machine with more memory, do one sync and promote the slave to the new master within a matter of minutes. Try doing that with MySQL.

For me, this stuff is not about just being awesome. I've had countless situations where I had data that could've been handled more elegantly using something like Redis, or a fully persistent key-value store like Tokyo Tyrant. Now there's really no excuse not to get that pesky data clogging up your database out of there. These are just some examples.

By the way, if you want to know what your Redis server is doing, telnet to your Redis instance on port 6379, and just enter "monitor". Watch in awe as all the commands coming in from other clients appear on your screen.

In the next post we'll dig into how you can store data from your objects conveniently into Redis.

Redis, consider us for your next project.

Tags: nosql, redis

I like to think that there's never been a more exciting time when it comes to playing with new technologies. Sure, that's a bit selfish, but that's just how I feel. Doing Java after I got my diploma was interesting, but it wasn't exciting. Definitely not compared to the tools that keep popping up everywhere.

One "movement" (if you can even call it that) is NoSQL. I've never been particularly happy with relational databases, and I happily dropped MySQL and the like when an opportunity to work with something entirely new came up. Since it's my own project I'm not putting anything at risk, and I don't regret taking that step. We're working with two members of the NoSQL family in particular, CouchDB and Redis.

Last week people interested in and people working with and on these new and pretty fascinating tools came together for the first NoSQL meetup in Berlin. I talked about Redis, and before I keep blabbering on about it, here are my slides. The talks have been filmed, so expect an announcement for the videos soon-ish.

I was up against tough competition, including CouchDB, Riak and MongoDB (but we're all friends, no hard feelings). During my talk I might've overused the word awesome. But after all the talks were over, it hit me: Redis is awesome. It seriously is. Not because it does a lot of things, is distributed, is written in Erlang (it's written in old-school, wicked fast C), has support for JSON (though that's planned), and all that stuff. No, it's awesome because it does only a very small set of work for you, but it does it extremely well, and wicked fast. I don't know about you, but I like tools like that. I took a tour of the C code last week, and even though my skills in that area are a bit rusty, it was quite pleasant to read, and easy to follow the flow.

I like Redis, and while I don't ask you to love it too, do yourself a favor and check it out. It gives Memcached a serious run for its money. Everyone loves benchmarks, and I do too, but I'm careful not to read too much into them. I ran Redis and Memcached through their paces, using the available Ruby libraries. I tried both the C-based and the Ruby-based version for Memcached, and the canonical Ruby version for Redis. It's like a cache, with sugar sprinkled on top.

Without putting out any numbers, let me just say that, first, it's a shame Rails ships with the Ruby version of the Memcached library, because it is sloooow. Okay, not so slow you should be worried, but slower than the competition. Second, Redis clocks in right in the middle between both Memcached libraries. While it's faster than memcache-client, it's still a bit slower than memcached. Did I mention that the library for Redis is pure Ruby? Pretty impressive, especially considering what you get in return. Sit back for a moment, and think about how much work went into Memcached already, and how young Redis still is. Oh the possibilities.

Redis is more than just a key-value store, it's a lifestyle. No wait, that's something different. But it still requires you to think differently. Shouldn't be a surprise really, most of the new generation of data stores do. It takes any data you give to it, and you're good to go as long as it fits into your memory. Let me tell you, that's still a lot of data. Salvatore is constantly working on new features for Redis, so keep an eye on its GitHub repository. If you thought that pushing and popping elements atomically off lists was cool, there might be a big warm surprise for you in the near future.

I first came across it using Nanite, where it's used to store the state of the daemon cluster. Running it through its paces in preparation for the talk I realized how underused it is. For our use case, Redis is the perfect place to store stuff like history of system data, e.g. CPU usage, load, memory usage and the like. It's also a great fit for a worker queue, but since we have RabbitMQ in place, there's no need for that.

When you look at it closely, there's heaps of uses for Redis. Chris Wanstrath wrote about how he used it writing hurl, and Simon Willison also published a love letter to Redis. There's also more info on how to use it with the Ruby library over at the EngineYard blog, and James Edward Grey published a whole series on how to install, set up and use Redis with Ruby. Just like CouchDB I want to put Redis to more uses in the future. That doesn't mean I'm looking to find a problem for a solution, it just means that when I have a problem I'm gonna consider my options, and Redis is one of them. It's a perfect mix between a simple yet insanely speedy data store, but with the little twist that is Redis' way of persisting data.

Tags: redis, nosql