I like to think that there's never been a more exciting time when it comes to playing with new technologies. Sure, that's a bit selfish, but that's just how I feel. Doing Java after I got my diploma was interesting, but it wasn't exciting. Definitely not compared to the tools that keep popping up everywhere.

One "movement" (if you can even call it that) is NoSQL. I've never been particularly happy with relational databases, and I happily dropped MySQL and the like when an opportunity to work with something entirely new came up. Since it's my own project I'm not putting anything at risk, and I don't regret taking that step. We're working with two members of the NoSQL family in particular, CouchDB and Redis.

Last week people interested in and people working with and on these new and pretty fascinating tools came together for the first NoSQL meetup in Berlin. I talked about Redis, and before I keep blabbering on about it, here are my slides. The talks have been filmed, so expect an announcement for the videos soon-ish.

I was up against tough competition, including CouchDB, Riak and MongoDB (but we're all friends, no hard feelings). During my talk, I might've overused the word awesome. But after all the talks were over, it hit me: Redis is awesome. It seriously is. Not because it does a lot of things, is distributed, is written in Erlang (it's written in old-school, wicked fast C), or has support for JSON (though that's planned), and all that stuff. No, it's awesome because it does only a very small set of work for you, but it does it extremely well, and wicked fast. I don't know about you, but I like tools like that. I took a tour of the C code last week, and even though my skills in that area are a bit rusty, it was quite pleasant to read, and easy to follow the flow.

I like Redis, and while I don't ask you to love it too, do yourself a favor and check it out. It gives Memcached a serious run for its money. Everyone loves benchmarks, and I do too, but I'm careful not to read too much into them. I ran Redis and Memcached through their paces, using the available Ruby libraries. I tried both the C-based and the Ruby-based versions for Memcached, and the canonical Ruby version for Redis. It's like a cache, with sugar sprinkled on top.

Without putting out any numbers, let me just say that, first, it's a shame Rails ships with the Ruby version of the Memcached library, because it is sloooow. Okay, not so slow you should be worried, but slower than the competition. Second, Redis clocks in right in the middle between both Memcached libraries. While it's faster than memcache-client, it's still a bit slower than memcached. Did I mention that the library for Redis is pure Ruby? Pretty impressive, especially considering what you get in return. Sit back for a moment, and think about how much work went into Memcached already, and how young Redis still is. Oh the possibilities.

Redis is more than just a key-value store, it's a lifestyle. No wait, that's something different. But it still requires you to think differently. Shouldn't be a surprise really, most of the new generation of data stores do. It takes any data you give to it, and you're good to go as long as it fits into your memory. Let me tell you, that's still a lot of data. Salvatore is constantly working on new features for Redis, so keep an eye on its GitHub repository. If you thought that pushing and popping elements atomically off lists was cool, there might be a big warm surprise for you in the near future.

I first came across it using Nanite, where it's used to store the state of the daemon cluster. Running it through its paces in preparation for the talk I realized how underused it is. For our use case, Redis is the perfect place to store stuff like history of system data, e.g. CPU usage, load, memory usage and the like. It's also a great fit for a worker queue, but since we have RabbitMQ in place, there's no need for that.

When you look at it closely, there are heaps of uses for Redis. Chris Wanstrath wrote about how he used it writing hurl, and Simon Willison published a love letter to Redis. There's also more info on how to use it with the Ruby library over at the EngineYard blog, and James Edward Gray published a whole series on how to install, set up and use Redis with Ruby. Just like CouchDB I want to put Redis to more uses in the future. That doesn't mean I'm looking to find a problem for a solution, it just means that when I have a problem I'm gonna consider my options, and Redis is one of them. It's a perfect mix: a simple yet insanely speedy data store, with the little twist that is Redis' way of persisting data.

Tags: redis, nosql

We've covered some good ground already, some blabber about Redis in general, and also some thoughts on when using it could be beneficial. The other big question is: How do I integrate that stuff into my application? How do I get my objects to be stored neatly in Redis?

The simplest way is to just use the redis-rb library and talk to Redis almost directly. That's quite low level compared to what you're used to with ActiveRecord though. Hurl wraps most of the convenience stuff into a neat Model class that implements basic functionality like saving, validating and handling identifiers.

Ohm takes it to the next level, adding more complex validations, integrating the Redis data types as top level class attributes, and even handling associations for you. It's serious awesomesauce, and the implementation is quite simple too. It even supports having indexes on attributes you want to query for, updating them transparently for you as you create new objects or update existing ones.

As for integrating these things alongside your existing application, based e.g. on ActiveRecord, there is no really easy way to do that. You can't have both Ohm and AR in the same class. Data stored in Redis referencing records in other data stores, usually by storing type and identifier, is obviously weakly coupled to them. You have to take care of cleaning up and of ensuring that the keys match. Associating them means you'll have to stuff some way to fetch the data from Redis into your existing model. Let's look at some ways you could store objects and associated data in Redis. The approaches are derived from Ruby code, but are easily applicable to other languages. They're more general implementation ideas than specific pieces of code.

Keys

Simple thing: use the fully qualified class name, add a unique identifier, and you're done. The identifier can either be derived from a separate key generator attribute in Redis, using the atomic increment command to generate new ones, or be something like a UUID. It could basically look like this, the simple rule still applying that your keys need to be unique at least on a per-class basis:

"User:ef12abc"

Attributes

I can think of two ways to store your simple attribute data, and both are implemented in Ohm and in hurl respectively. The first one is to combine the key and the attribute name and store each attribute's value as a separate value. Ohm follows that approach, but I honestly can't think of any good reason to do it this way. Resulting key-value pairs could look like this:

"User:ef12abc:name" => "salvatore"

hurl uses a much nicer approach, it just serializes the hash of attributes into a JSON string, storing it with the generated key as mentioned above. I much prefer this approach as it doesn't create dozens of keys for each object stored in Redis. Data is serialized before storing and deserialized when loading, simple story, and not a new approach in the world of post-relational databases. You'd do the same with CouchDB.

"User:ef12abc" => "{name: 'salvatore'}"

Of course an attribute can simply be a list or a set of values, no big deal with Redis. Ohm wraps that kind of functionality into simple proxy classes.

Finding Things

With a key structure in place it's easy to fetch objects and their attributes, but what about querying by attribute values? Redis doesn't have a query mechanism as you know it from relational databases. You just can't do find_by_name('salvatore') and be done with it. Ohm has a neat approach that just seems logical applied to something like Redis. For attributes you wish to index it stores a reverse lookup list. So if you have users with different names, there's basically a key for each name used in any object, referencing the objects having that very value.

You could extend that for any combination of attributes you want to query for, but if detailed ad-hoc querying is what you're after, maybe Redis is not the right tool for the job. The stored key-value combination could look like this:

"User:name:salvatore" => ['User:ef12abc']

Combine class name, attribute and value with a list of objects having that value. It's up to you whether to just store a list of their identifiers or the fully qualified keys. Of course for arbitrary values the keys can get quite long, so Ohm solves that problem by encoding the values with Base64, which does have the disadvantage of being less readable when debugging.
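
To sketch the idea in code (leaving out the Base64 encoding for readability, and with made-up key names; I'm using a set here since the order of the references doesn't matter):

# On save: add the object's key to the reverse lookup entry
# for every indexed attribute.
redis.sadd('User:name:salvatore', 'User:ef12abc')

# A find is then just reading that one entry back.
def find_by_name(redis, name)
  redis.smembers("User:name:#{name}")
end

find_by_name(redis, 'salvatore') # => ["User:ef12abc"]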

With that key-value combination in place a find on a particular value is merely fetching a single key in Redis. And if you didn't realize it until now, Ohm is pretty neat. It solves Redis usage for Ruby models quite elegantly.

Associations

At this point you should get an idea of how things can be solved with Redis. For associating objects the solution is pretty simple. Just keep a list of all the objects belonging to the current one. Say a user has a bunch of posts attached to them. Given that posts are also stored in Redis (and why wouldn't they be?), it's just a list of keys, or a set, if you don't care for the ordering:

"User:ef12abc:posts" => ['Post:1', 'Post:2']

Each post gets its own attribute storing the reference back up, but depending on how you usually navigate your objects, you can leave it out.
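
In code, maintaining such an association could look roughly like this:

# Attach a new post to the user by appending its key to the list.
redis.rpush('User:ef12abc:posts', 'Post:3')

# Loading the association is just reading the whole list back.
redis.lrange('User:ef12abc:posts', 0, -1)
# => ["Post:1", "Post:2", "Post:3"]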

So how do you mix objects using ActiveRecord with objects stored in Redis? The answer is right there: with a lookup list similar to the one above, just using a different key according to the ActiveRecord model. It could just be as simple as class and primary key.

Locking Objects

After my Redis talk someone raised the question of how Redis solves the problem of concurrent writes and their potential to overwrite each other's data. I was a bit caught off guard by this question, and while I think this is not a problem specific to Redis but one that applies to databases in general, the answer is: it doesn't. Apart from the atomic operations you can do on sets and lists and incrementing counters, there just is no way.

But if you still want to ensure that at least your write is successful, Ohm to the rescue. It uses a simple lock value to lock a specific object in Redis, at least for this very write. That a write waiting for the lock to be removed might overwrite the data just written is another issue. The code in Ohm to obtain the lock looks like this (I've replaced one method call with a real value to give you an idea):

def lock!
  lock = db.setnx('User:ef12abc:_lock', 1) until lock == 1
end

setnx sets a value only if the key hasn't been set before. So it loops until it acquires the lock, and then performs the operation. Ohm utilizes this kind of mutex for several of its operations where Redis itself can't guarantee atomicity.
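
Releasing the lock is then just a matter of deleting the key again, something along these lines:

def unlock!
  # Removing the key frees the lock for the next setnx loop.
  db.del('User:ef12abc:_lock')
end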

There are some discussions going on about introducing some sort of batch command syntax to the Redis protocol though. So while I don't want to see the full logic of complex transactions in Redis, having a command that makes a whole batch of other commands either run together or fail together is pretty neat.
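
To illustrate, such a batch (this is what later shipped as Redis' MULTI/EXEC commands) gets wrapped in a block by redis-rb:

# All commands inside the block are queued up and then
# executed as one atomic batch.
redis.multi do
  redis.rpush('User:ef12abc:posts', 'Post:3')
  redis.incr('Post:_id')
end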

Select * from users

So how do you get all of your users for easy pagination? Easy, create a list for all your User objects. You can access specific ranges and always have the number of total objects at hand:

"User:all" => ['User:12acf', 'User:f31ad']

Now you can use the lrange command to get to your objects, or the llen command (llen User:all) to get the total number of objects of a particular type.
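
Pagination then boils down to a range call plus a length call, e.g. for pages of ten:

page     = 0
per_page = 10

# One page of user keys, plus the total for the pagination links.
redis.lrange('User:all', page * per_page, (page + 1) * per_page - 1)
redis.llen('User:all')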

Sorting

Now this is where it gets tricky. Sorting by an attribute would involve a list of attribute values somehow associated to the objects they belong to. I can't really think of a simple way to solve this in Redis 1.0, since there is no data structure that allows linking two values like that. For the record, sorting a simple list in Redis is easy as pie, check Chris Wanstrath's post on sorting.

With Redis 1.1 however, sorted sets will be introduced. While they still don't solve the problem entirely, they're an acceptable solution. They work based on a score you specify when adding the attribute to the set. The score is really the kicker though: to fully work, it needs to be a number, a double precision floating point to be exact.

Since you're dealing mostly with strings, you'd need to run some sort of hashing on them. My first thought was to just put their numerical ASCII or UTF-8 codes together. But that falls apart considering that you'd need to pad the numbers depending on how many digits they have. Since the score itself is considered a number, at least padding of the first character code is lost.

But wait, did I say floating point earlier? I think I did. What if we append the generated string to e.g. a "1." and get a nice floating point number? That way padding is not lost and we still get a valid representation in terms of how Redis treats scores. I'd restrict the scoring to the first couple of characters, as the float could get quite long. You'd have the same problem with strings of arbitrary length anyway.

Using this simple method that's probably far from being of any scientific value, we can get a basic score for a string using its ASCII values:

def score(string)
  base = '1.'
  # Zero-pad each character's ASCII code to three digits, take the
  # first seven characters, and append them to the "1." prefix.
  base << string.split("").collect { |c| '%03d' % c.unpack('c') }[0...7].join
end

So "salvatore" would get you a score of 1.115097108118097116111. As you can see depending on the string this might break down pretty fast, so I'd restrict it to the first couple of characters in the string. My algorithm knowledge is embarrassingly rusty, so if you have good ideas how to solve that problem, please let me know. An elegant solution would be what I'm after, I'm not too happy with the one above, but it should give you an idea how you could solve the problem of sorting.

Now that you have a somewhat decent way of putting a score on an attribute value, all you need to do is add the score and the key to a sorted set, and you're done.

redis-cli zadd User:name:sorted 1.115097108118097116111 User:12acf
redis-cli zadd User:name:sorted 1.109097116104105097115 User:f31ad

redis-cli zrange User:name:sorted 0 1

> 1. User:f31ad
> 2. User:12acf

Update: There's a caveat (a good one) that Salvatore brought to my attention, and it's very much worth mentioning, although it doesn't help when referencing identifiers. When you specify the same scoring number for every attribute with zadd, Redis will start sorting lexicographically. How cool is that? So you could spread the same score across different attributes and still get sorting for strings. Neat stuff. Salvatore explained this in good detail a while back on the mailing list.

So what now?

I can hear you think: Why would I go through all that trouble just to get my data into Redis? Isn't that too much work and not really worth the hassle? Let me tell you why: because in return you get blazing speed. All data is in memory and is accessed accordingly fast. As opposed to your database, the data here doesn't clog up the whole system, taking up precious memory that could be used for really important data and queries.

That's why you want to start getting data out of your database, into a key-value store. Redis is just one of your options, but the implementation would usually end up being quite similar.

The inspiration for most of the ideas came from going through Ohm's source, but they're pretty similar to how I imagined they'd work, so I don't take full credit for them.

In other news, the videos for last week's N✮SQL Berlin meetup are available for your viewing pleasure. If you want to hear me use the word awesome more than 30 times, now's your chance.

Tags: redis

June was an exhausting month for me. I spoke at four different conferences, two of which were not in Berlin. I finished the last talk today, so it's time to recap the conferences and talks. All in all I had good fun. It was a lot of work to get the presentations done (around 400 single slides altogether), but I would dare say that it was all more than good practice to work on my presentation skills and to lose a bit of the fear of talking in front of people. But I'll follow up on that stuff in particular in a later post.

RailsWayCon in Berlin

I have to admit that I didn't see much of the conference, I mainly hung around, talked to people, and gave a talk on Redis and how to use it with Ruby. Like last year the conference was mingled in with the International PHP Conference and the German Webinale, a somewhat web-related conference. I made a pretty comprehensive set of slides for Redis, available for your viewing pleasure.

Berlin Buzzwords in Berlin

Hadoop, Lucene, NoSQL, Berlin Buzzwords had it all. I spent most of my time in the talks on the topics around NoSQL, having been given the honor of opening the track with a general introduction to the topic. I can't remember having given a talk in front of this many people. The room held about 250, and it seemed pretty full. Not to toot my own horn here, but I've never been more anxious about how a talk would go. Obviously there were heaps of people in the room who had only heard of the term, and people who work with or on the tools on a daily basis. Feedback was quite positive, so I guess it turned out pretty okay. Rusty Klophaus wrote two very good recaps of the whole event, read on about day one and day two.

The slide set for my talk has some 120 slides in all, trying to give a no-fuss overview of the NoSQL ecosystem and the ideas and inspirations behind it. There are some historical references in the talk, because in general the technologies aren't revolutionary: they use ideas that've been around for a while and combine them with some newer ones. Do check out the slides for some more details on that.

MongoUK in London

10gen is running MongoDB related conferences in a couple of cities, one of them in London, where I was asked to speak on something related to MongoDB. Since I'm all about diversity, that's pretty much what I ended up talking about, with a hint of MongoDB sprinkled on top of it. Document databases, the web, the universe, all the philosophical foundation knowledge you could ask for. I talked about CouchDB, Riak, and about what makes MongoDB stand out from the rest.

Most enjoyable about MongoUK was to hear about real life experiences of MongoDB users, what kind of problems they had and such. Also, I finally got to see some of London and meet friends, but I'll write more about that (and coffee) on my personal blog. Again, the slide set is available for your document database comparison pleasure.

Cloud Expo Europe in Prague

Just 36 hours after I got back from London I jumped on the train to Prague to speak about MongoDB at Cloud Expo Europe. Cloud is something I can get on board with (hint: Scalarium), so why the hell not? It turned out to be a pretty enterprisey conference, but still, I got some new food for thought on cloud computing in general.

I already gave a talk on MongoDB at Berlin's Ruby brigade, but I built a different slide set this time, improving on the details I found to be a bit confusing at first. Do check out the slides, if you don't know anything about MongoDB yet, it should give you a good idea.

Showing off

As you'll surely notice, my slides are all websites, and not on Slideshare. Two months ago I looked into Scott Chacon's Showoff, a tool to build web-based presentations that simply run as tiny JavaScript apps in the browser. I very much like that idea, because even though Keynote is still the king of the crop, it's still awful. Using Markdown, CSS and JavaScript appeals much more to the geek in me. It's so easy to crank out slides as simple text, and worry about the styling later. Plus, I can easily keep my slides in Git, and who doesn't enjoy that? I'd very much recommend giving it a go. If you want to look at some sources, all my talks and their sources are available on the GitHubs: MongoDB, Redis, NoSQL, document databases and again MongoDB.

It's a pleasure to build slides with Showoff, and it has helped me focus my slides on very short phrases and as few bullet points as possible. Sure, it's not Keynote and doesn't have all the fancy features, but I noticed that it forced me to focus more, and that keeping slides short helped me stay focused. But again, more on that in a follow-up post.

Feel free to use my slides as inspiration to play with Showoff, there's surprisingly little magic involved. Also, if you think I should speak at a conference you know of or that you're organising, do get in touch.

Interested in Redis? You might be interested in the Redis Handbook I'm currently working on.

Over at Scalarium we constantly find ourselves adding new statistics to track specific parts of the system. I thought it'd be a good idea to share some of them, and how we're using Redis to store them.

Yesterday I was looking for a way to track the time it takes for an EC2 instance to boot up. Booting up in this case means how long it takes for the instance to change from state "pending" to "running" on EC2. Depending on utilization and availability zone this can take anywhere from 30 seconds to even 30 minutes (us-east, I'm looking at you). I want to get a feel for how long it takes on average.

We poll the APIs every so many seconds, so we'll never get an exact number, but that's fine. It actually makes the tracking easier, because the intervals are pretty fixed, and all I need to do is store the interval and increment a number.

Sounds like a job for a sorted set. We could achieve similar results with a hash structure too, but let's look at the sorted set nonetheless, because it's pre-sorted, which suits me well in this case. For every instance that's been booted up I simply store the interval and increment the number of instances.

In terms of a sorted set, my interval will be the member in the sorted set, and the number of instances falling into that particular interval will be the score, the value determining the member's rank. The advantage here is that the set will automatically be sorted by the number of instances in that particular interval, so that e.g. the interval with the most instances always comes first.

We don't need anything to get started; we just have to increment the score for the particular interval (or member), in this case 60 seconds. Redis will start from zero automatically. I'll use the Redis Ruby library for brevity.

redis.zincrby('instance_startup_time', 1, 60)

Another instance took 120 seconds to boot up, so we'll increment the score for that interval too.

redis.zincrby('instance_startup_time', 1, 120)

After some time we have added some good numbers to this sorted set, and we can start keeping an eye on the top five.

redis.zrevrange('instance_startup_time', 0, 4, :with_scores => true)
# => ["160", "22", "60", "21", "90", "10", "120", "10", "40", "5"]

The default sort order in a sorted set is ascending, hence we get a reverse range (using the zrevrange command) of the five intervals with the highest scores, i.e. the ones the most instances fall into.

To get the number of instances for a particular interval, we can use the zscore command.

redis.zscore('instance_startup_time', 60)
# => 21

To find the rank in the sorted set for a particular interval, e.g. to find out if it falls into the top five intervals, use zrevrank.

redis.zrevrank('instance_startup_time', 160)
# => 0

Now we want to find the intervals that a particular number of instances fall into, say everything from 10 to 20 instances. We can use zrangebyscore for this purpose.

redis.zrangebyscore('instance_startup_time', 10, 20, :with_scores => true)
# => ["120", "10", "90", "10"] 

Note that Redis has some nifty operators where you can e.g. ask for every interval that has more than 10 instances, using the +inf operator, useful when you don't know the highest score in the sorted set.

redis.zrangebyscore('instance_startup_time', 10, '+inf', :with_scores => true)
# => ["120", "10", "90", "10", "60", "21", "160", "22"]

Now you want to sort the sorted set by the interval, e.g. to display the numbers in a table. You can use the sort command to sort the set by its elements, but unfortunately there doesn't seem to be a way to get the scores in the same call.

redis.sort('instance_startup_time')
# => ["20", "40", "60", "90", "120", "160"]

To make up for this you could iterate over the members and fetch the scores in one go using the multi command.

members = redis.sort('instance_startup_time')
redis.multi do
  members.each do |member|
    redis.zscore('instance_startup_time', member)
  end
end

So far we've stored all numbers in one big sorted set, which will grow over time, making the statistical numbers very broad and less informative. Suppose we want to store daily metrics and then run the numbers weekly and monthly. We just use a different key derived from the current date.

today = Date.today.strftime("%Y%m%d")
redis.zincrby("instance_startup_time:#{today}", 1, 60)

Suppose we have collected data over the last two days. Thanks to zunionstore we can add the two sets together. Given data from all days of a week, you can use zunionstore to accumulate it and store the result under a different key.

redis.zunionstore('instance_startup_time:week49',
                  ['instance_startup_time:20101129', 'instance_startup_time:20101130'])

This will create a union of the sorted sets for the two subsequent days. The neat part is that it will aggregate the data of the elements in the sets. So if on one day 12 instances took 60 seconds to start, and on the second day 15, Redis will create the sum of all the scores. Neat, huh? What you get is a weekly aggregate of the collected data, and of course it's easy to create monthly data as well.

Instead of summing up the scores you could also store the maximum or minimum across all the sets.

redis.zunionstore('instance_startup_time:week49',
                  ['instance_startup_time:20101129', 'instance_startup_time:20101130'],
                  :aggregate => 'max')

Of course you could save the extra union and just create counters for days, weeks and months in one go, but that wouldn't give me much material to highlight the awesomeness of sorted set unions now, would it?

You could achieve a similar data structure by using hashes, but sorted sets give you some neat things that you'd have to implement manually with hashes. Sorted sets are pretty neat when you need a weighted counter, e.g. for download statistics, clicks or views, pre-sorted by the number of hits (the scores) for the particular element.
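
For comparison, a minimal sketch of the same counter built on a hash (key name invented); you trade the automatic ordering for doing the sorting yourself:

# Increment the counter for the 60 second interval by one.
redis.hincrby('instance_startup_times', 60, 1)

# Reading everything back gives you members and counts, but
# sorting them by count is now your job.
redis.hgetall('instance_startup_times')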

Tags: redis

I've been spending some quality time with two of my new favorite tools lately (CouchDB and Redis, duh!), and while integrating them into Scalarium, some needs and, as a result, some smaller hacks emerged. I don't want to deprive the world of their joy, so here they are.

The first one is a tiny gem that allows you to use Redis as a session store. What's so special about it, there's redis-store, right? Sure, but I couldn't for the life of me get it to work reliably. Seems that's due to some oddity in Rack or something; at least that's where my interest in investigating the issue further faded, and I decided to just rip the code off MemCacheStore, and there you have it: redis-session-store. Rails-only and proud of it.

While working on it I constantly kept a monitor process open on Redis. Great feature by the way, if not awesome. I used telnet, and somehow I constantly managed to hit Ctrl-C in the terminal I had the telnet session open in. Reconnecting manually is tedious, so I give you my little redis-monitor script:
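
A minimal version of such a script, assuming Redis runs on localhost:6379, could look something like this in plain Ruby:

require 'socket'

# Keep reconnecting whenever the connection drops.
loop do
  begin
    socket = TCPSocket.new('localhost', 6379)
    socket.write("MONITOR\r\n")
    while line = socket.gets
      puts line
    end
  rescue Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::EPIPE
    sleep 1
  end
end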

Incredibly simple, but saves those precious moments you'd waste typing everything by hand.

Last but not least, here's a hack-ish patch to make CouchPotato (great CouchDB Ruby library by the way) dump view queries into the log file. The ugly part at the end is me trying to get the log that's output at the end of each request to include the DB time for CouchDB queries.

It's not great, but it works for now. We'll very likely include something decent in CouchPotato without hacking into ActionController like that. Unfortunately, to get this far there's really no other way. I tried faking ActiveRecord, but you open a whole other can of worms doing that, because a lot of code in Rails seems to rely on the existence of the ActiveRecord constant, assuming you're using the full AR stack when the constant is defined. Here's hoping that stuff is out the door in Rails 3. Haven't checked to be honest.

Dump that file into an initializer, and you're good to go (for the moment).

Tags: rails, redis, couchdb

A very valid question is: What's a good use case for Redis? There are quite a few, as Redis isn't your everyday key-value store: it allows you to keep lists and sets in your datastore, and to run atomic operations on them, like pushing and popping elements. All that stuff is incredibly fast, as your data is held in memory and only persisted to disk when necessary, and, to top it off, asynchronously, without reducing the throughput of the server itself.

The simplest and most obvious use case is a cache. Redis clocks in at almost the speed of Memcached, with a couple of features sprinkled on top. If you need a cache, but maybe have a use case where you also want some of the data you store in it to be persisted, Redis is a decent tool for your caching needs. If you already have a Memcached instance in place I'd look at my options before adding a new component to my infrastructure though.

Pushing and popping elements atomically, does that ring a bell? Correct, that's what you want from a worker queue. Look at delayed_job, and you'll find that it uses a locking column in your jobs table. Some people argue that a database should not be the place where you keep your worker jobs. Up to a certain amount of work I disagree, but at some point the performance costs outweigh the benefits, and it's time to move on. Redis is a perfect fit here. No locking needed, just push onto the list of jobs and pop back off it in your workers, simple as that. It's the GitHub way, and the more I think about it, the more sense it makes.
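
As a minimal sketch (queue name and payload invented), the producer pushes, the worker pops:

require 'redis'
require 'json'

redis = Redis.new

# Producer: push a new job onto the queue.
redis.lpush('jobs', {'action' => 'resize', 'id' => 42}.to_json)

# Worker: pop jobs off the other end, oldest first.
if payload = redis.rpop('jobs')
  job = JSON.parse(payload)
  # ... do the work
end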

For Redis 1.1 Salvatore has been working on a proposal by Ezra from Engine Yard to implement a command that would move items from one list to another in one step, atomically. The idea is to mark a job as in progress, while not removing it entirely from the data storage. Reliable messaging anyone? It's such a simple yet genius idea, and Redis has most of the functionality already in place. There's heaps more planned for future Redis releases, I'd highly recommend keeping an eye on the mailing list and on Salvatore's Twitter stream.
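
That command eventually shipped as rpoplpush; as a sketch (list names invented), marking a job as in progress looks like this:

# Atomically move the next job to an in-progress list so it
# can't get lost if the worker dies mid-job.
if payload = redis.rpoplpush('jobs', 'jobs:in_progress')
  # ... do the work, then acknowledge the job by removing it.
  redis.lrem('jobs:in_progress', 1, payload)
end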

As I'm sure you noticed, Redis is used for data storage in hurl, a neat little app to debug HTTP calls. Redis is simply used to store your personalized list of URLs you checked. Should some data be lost in between database dumps, it's not a big deal; sure, it's not great, but not a big deal.

The simple answer for when to use Redis is: whenever you want to store data fast that doesn't need to be 100% consistent. In past projects I've worked on, that includes classic examples of web application data, especially when there's social stuff sprinkled on top: ratings, comments, views, clicks, all the social stuff you could think of. With Redis, some of it is just a simple increment command or pushing something onto a list. Here's a nice example of affiliate click tracking using Rack and Redis.
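
Most of that boils down to one-liners, along these lines (key names invented):

# Count a view on a post.
redis.incr('posts:42:views')

# Track who rated it; a set excludes duplicates for free.
redis.sadd('posts:42:voters', 'user:7')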

Why is that a good match? Because if some of that data is lost, it doesn't make much of a difference. Throw in all the statistical or historical data you can think of that's accumulated in some way through your application and could be recalculated if necessary. That data usually just keeps clogging up your database, and is harder and harder to get rid of as it grows.

Same is true for activity streams, logging history, all that stuff that is nonvolatile yet doesn't need to be fully consistent, where some data loss is acceptable. You'd be surprised how much of your data that includes. It does not, and let me be perfectly clear on that, include data that involves any sort of business transaction, be it for a shopping platform or for data involved in transactions for software as a service applications. While I don't insist you store that data in a relational database, at least it needs to go into a reliable and fully recoverable datastore.

One last example, the one that brought me and Redis together is Nanite, a self-assembling fabric of Ruby daemons. The mapper layer in Nanite keeps track of the state of the daemons in the cluster. That state can be kept on each mapper redundantly, but better yet, it should be stored in Redis. I've written a post about that a while back, but it's still another prime use for Redis. State that, should it or part of it get lost, will recover all by itself and automatically (best case scenario, but that's how it works in Nanite).

One thing to be careful about though is that Redis can only take as much data as it has memory available. Especially for data that has the potential to grow exponentially with users and their actions in your application, it's good to keep an eye on it and do some basic calculations, but you should do that even when using something like MySQL. When in doubt, throw more memory at it. With Redis and its master-slave replication it's very easy to add a new machine with more memory, do one sync and promote the slave to the new master within a matter of minutes. Try doing that with MySQL.
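
The promotion itself is just two commands on the new machine (hostnames invented):

redis-cli -h new-box slaveof old-master 6379
redis-cli -h new-box slaveof no one

The first starts replicating from the old master, the second promotes the slave once the sync is through.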

For me, this stuff is not about just being awesome. I've had countless situations where I had data that could've been handled more elegantly using something like Redis, or a fully persistent key-value store like Tokyo Tyrant. Now there's really no excuse to get that pesky data clogging up your database out of there. These are just some examples.

By the way, if you want to know what your Redis server is doing, telnet to your Redis instance on port 6379, and just enter "monitor". Watch in awe as all the commands coming in from other clients appear on your screen.

In the next post we'll dig into how you can store data from your objects conveniently into Redis.

Redis, consider us for your next project.

Tags: nosql, redis

Interested in Redis? You might be interested in the Redis Handbook I'm currently working on.

I'm gonna eat my own dog food here, and start you off with a collection of links and ideas of people using Redis. Redis' particular way of treating data requires some rethinking of how to store your data to benefit from its speed, atomicity and data types. I've already written about Redis in abundance; this post's purpose is to complement those posts with real-world scenarios. Maybe you can gather some ideas on how to deal with things.

There's a couple of well-known use cases already, the most popular of them being Resque, a worker queue. RestMQ, an HTTP-based worker queue using Redis, was just recently released too. Neither makes use yet of the rather new blocking pop commands like Redactor does, so there's still room for improvement, and to make them even more reliable.
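
With the blocking pop a worker doesn't have to poll, it just waits on the queue (sketch, queue name invented):

# BLPOP blocks until an element arrives; a timeout of 0 means
# wait forever. It returns the queue name and the element.
queue, payload = redis.blpop('jobs', 0)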

Ohm is a library to store objects in Redis. While I'm not sure I'd put this layer of abstraction on top of it, it's well worth looking at the code to get inspiration. Same is true for redis-types.

Redis' simplicity, atomicity and speed make it an excellent tool when tracking things directly from the web, e.g. through WebSockets or Comet. If you can use it asynchronously, all the better.

  • Affiliate Click Tracking with Rack and Redis.

    Simple approach to tracking clicks. I probably wouldn't use a list for all clicks, but instead have one for each path, though there are always several ways to get to your goal with Redis. Not exactly the same, but Almaz can track URLs visited by users in Rails applications.

    Update: Turns out that in the affiliate click tracking code above, the list is only used to push clicks into a queue, where they're popped off and handled by a worker, as pointed out by Kris in the comments.

  • Building a NLTK FreqDist on Redis

    Calculation of frequency distribution, with data stored in Redis.

  • Gemcutter: Download Statistics

    The RubyGems resource par excellence is going to use Redis' sorted sets to track daily download statistics. While just a proposal, the ideas are well applicable to all sorts of statistics being tracked in today's web applications.

  • Usage stats and Redis

    More on tracking views statistics with Redis.

  • Vanity - Experiment Driven Development

    Split testing tool based on Redis to integrate in your Rails application. Another kind of tracking statistics. If you didn't realize it up to now, Redis is an excellent tool for this kind of application, for data that you wouldn't want to offload to your main database, because let's face it, it's got enough crap to do already.

  • Flow Analysis & Time-based Bloom Filters

    Streaming data analysis for the masses.

  • Crowdsourced document analysis and MP expenses

    While being more prose than code, it still shows areas where Redis is a much better choice than e.g. MySQL.

Using Redis to store any suitable kind of statistics is pretty much an immediate use case for a lot of web applications. I could think of several projects I've worked on that could gain something from moving certain parts of their application to Redis. It's the kind of data you just don't want to clutter your database with. Clicks, views, history and all that stuff put an unnecessary amount of data and load on it. The more data it accumulates, the harder it will be to get rid of, especially in MySQL.

It's not hard to tell that we're still far from having heaps of inspiration and real-life use cases to choose from, but these should give you an idea. It can get a lot simpler too, if you want. When you're using Redis already, it makes sense to use it for storing Rails sessions.

Redis is a great way to share data between different processes, be it Ruby or something else. The atomic access to lists, strings and sets, together with speedy access ensures that you don't even need to worry about concurrency issues when reading and writing data. On Scalarium, we're using it mostly for sharing data between processes.

E.g., all communication between our system and clients on the instances we boot for our users is encrypted and signed. To ensure that all processes have access to the keys, they're stored conveniently in Redis. Even though that means the data is duplicated from our main database (which is CouchDB if you must know), access to Redis is a lot faster. We keep statistics about the instances in Redis too, because CouchDB is just not made for writing heaps and heaps of data quickly. Redis also tracks a request token that is used to authenticate internal requests in our asynchronous messaging system, to make sure that they can't be compromised from some external source. Each request gets assigned a unique token. The token is stored in Redis before the message is published and checked before the message is consumed. That way we turned Redis into a trusted source for shared data between web and worker processes.
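
Stripped of all the surrounding machinery, the token dance can be as small as this (key names invented):

require 'securerandom'

# Publisher: store a one-off token before sending the message.
token = SecureRandom.hex(16)
redis.set("request_tokens:#{token}", '1')

# Consumer: DEL returns the number of keys removed, so checking
# for 1 both verifies and invalidates the token in one atomic step.
trusted = redis.del("request_tokens:#{token}") == 1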

The library memodis makes sharing data incredibly easy; it offers Redis-based memoization. When you assign a memodis'd attribute in your code, it'll be stored in Redis and therefore can easily be read from other processes.

Redis is incredibly versatile, and if you have a real-life Redis story or usage scenario to share, please do.

Tags: nosql, redis