Recently I've been having a foul taste in my mouth, or just a bad feeling if you will, whenever I started adding validations and callbacks to a model in a Rails application. It just felt wrong. It felt like I was adding code that shouldn't be there, that makes everything a lot more complicated, and that turns explicit code into implicit code. Code that only runs depending on the persistence state of an object. Code that is hard to test, because you need to save an object to test parts of your business logic. Don't get me started on observers, I never was a fan of them. Putting the stuff that should run when an object is saved somewhere else entirely is the worst kind of hiding business logic.

The two are somewhat bound together: create an object, run callbacks and validations. Update an object, check if an attribute is set, run a different callback. I started to loathe the implicitness that callbacks brought into my code, and I started to loathe that libraries everywhere copied that behaviour from ActiveRecord, that you implicitly end up involving your database in your unit tests, and that no one had a better answer for how that stuff should work. Sure, it's not big news that involving your database in your unit tests is not a recommended practice, hence the overuse of mocking and stubbing fostered by RSpec, but it took a while for me to really feel the pain.

Ingredients: REST, Rails, Resources

The root problem (in a very philosophical way, and in my very own opinion) lies in Rails' understanding of RESTful resources. The path to a resource is supposed to be thin, and a resource is mostly represented by something that's stored in a database. So far, so good. That thin layer, the controller, however implies that running any logic on a resource somehow ends up simply updating the resource, setting some attribute that signals an action needs to be taken. That's how tools like resource_controller, make_resourceful and inherited_resources came about: to get rid of the boilerplate code in that thin controller tier. And I loved them, because I hated writing the same code over and over again.

But over time I realized that controllers aren't the problem. It's the way Rails treats resources that led us to this mess where everything's declared in a model's class body, where you have to look at callbacks to find out what's going on when you create a new object, and where you clutter your models with code that just doesn't need to be there (sending mails, for example, or doing custom validations that are bound to specific state). It all started to lack explicitness for me. There's no clear point of entry, no list of method calls you can follow along.

I'm not sure how other people see this. Obviously a lot of people don't realize it's a problem, or they're simply fine with how things work. I was too, for a long time, almost five years to be exact. I know I'm not alone in this though; talking to others made me realize they're on a similar page. But given that it's the way Rails works, what can you do?

Validations

What about validations? I can live with validations being part of the model, but they're still part of the problem created by callbacks, simply because validations are callbacks themselves. I always found myself writing custom validations in before_validation et al. hooks, and it just felt wrong, because the list of callbacks grew noticeably. Validations are not always the same, they depend on your object's state. They can even be quite useless when you're handling a lot of backend code that's asynchronous and decoupled from the web interface.

What do we do to work around it? We resort to using save(false), whose lack of explicitness I cannot even begin to explain in a peaceful manner. Validations tend to depend on the object's state, at least the more complex ones do. They're not universal, they can differ between creating and saving objects. Yes, I know that in ActiveRecord you can say :on => :create, but for me that just emphasizes my point: persistence state determines callbacks and therefore business logic, decreasing expressiveness.
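
To make that concrete, here's a minimal sketch of the pattern I'm complaining about, in ActiveRecord 2.x style; the model, attributes and mailer are made up:

```ruby
# Validations and callbacks that only apply in certain persistence states,
# plus business logic hidden in a callback.
class Invitation < ActiveRecord::Base
  validates_presence_of :recipient_email, :on => :create

  before_validation_on_create :normalize_email

  after_create :deliver_invitation_mail # sending mail, hidden in the model

  private

  def normalize_email
    self.recipient_email = recipient_email.to_s.strip.downcase
  end

  def deliver_invitation_mail
    Mailer.deliver_invitation(self) # hypothetical mailer
  end
end

# And elsewhere, when the rules get in the way:
# invitation.save(false) # skips validations, the other callbacks still run
```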

Forms? Presenters?

I've looked into how Django is dealing with this, and it sort of struck a chord with me. Their approach is to define form objects that do the validations for you. They're somewhat decoupled from your business logic and are your intermediate layer between the web interface and the business model. Good lord, did I just recommend we all use presenter objects, even though Jay Fields said they don't work? I think I did, Bob!

It's not the way, but it's one way, and there's been work on bringing this to Rails, have a look at Bureaucrat for example. Again, it's one way to try to solve the problem, and it doesn't come without downsides either. Tying the idea of explicit services to form data may or may not make unit testing harder, or at least it mixes two different concerns (forms and creating/manipulating objects). What I like about it though is that it decouples validation from the business logic, and ties it to the layer where it's really necessary: the view.
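
To illustrate the idea, here's a rough, hand-rolled sketch of such a form object; it's not Bureaucrat's actual API, just the general shape, and User is whatever model you end up persisting:

```ruby
# Validation lives at the boundary between the web layer and the model,
# the model just persists what it's handed.
class SignupForm
  attr_reader :params, :errors

  def initialize(params)
    @params = params
    @errors = []
  end

  def valid?
    errors.clear
    errors << "email can't be blank"  if params[:email].to_s.strip.empty?
    errors << "password is too short" if params[:password].to_s.length < 6
    errors.empty?
  end

  def save
    return false unless valid?
    User.create(:email => params[:email], :password => params[:password])
  end
end

# In the controller:
#   form = SignupForm.new(params[:signup])
#   form.save ? redirect_to(root_path) : render(:new)
```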

Not without irony, I've started using something similar to handle embedded objects on a project using CouchDB. I wanted to extract the code handling embedded objects, because I was simply too lazy to add support for that to CouchPotato. The result felt amazingly good, almost like the forms above mixed with a *gasp* service.

Services?

I'm still seeking the answer to that one. Pat Maddox and James Golick suggest services as one way to make callbacks explicit, and be sure to read through the comments on James' post. Now, I don't know about your history, but coming from a Java background that struck a bad chord with me. I laughed off the idea, to be frank.

One thing still bugs me about the approach James is laying out in his post: the extensive use of mocking. It sure does make for fast unit tests, and I'm starting to see the sense in it when using a service model, but I've grown fond of the fakes approach, which is why we wrote Rocking Chair, and why people came up with things like SuperModel: in-memory database implementations you can easily switch on and off. They don't exactly decouple your unit tests from the persistence layer, but they do decouple them from having a database running, making it easier to e.g. split batches of tests across several cores and processors, and simply making them faster.

However, the more I thought about it, the more sense the services approach made. You don't necessarily have to make it a separate class, depending on how much you like having all of your business logic in the model. I'm not sure which approach I'd prefer, because honestly, I haven't tried it yet, but I'm willing to give in. I'm willing to try things that let me ditch callbacks and validations, because I simply loathe them; they just don't feel right to me anymore when used the way ActiveRecord uses them.
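
For the sake of argument, here's one possible shape of such a service; the names are mine, and it's just a sketch of what "explicit instead of callbacks" could look like:

```ruby
# Create the record and run the follow-up logic explicitly, instead of
# hiding it in after_create callbacks.
class InvitationService
  def self.invite(inviter, email)
    invitation = Invitation.create!(:inviter => inviter, :recipient_email => email)
    Mailer.deliver_invitation(invitation) # explicit, easy to follow and to stub
    invitation
  end

  def self.accept(invitation, user)
    invitation.update_attributes!(:accepted_at => Time.now, :user => user)
    Account.create!(:owner => user) # whatever accepting actually entails
  end
end
```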

With Rails 3, for example, you can put your validations anywhere. They're not bound to ActiveRecord anymore, which leaves a lot more room to use them in different ways. I'm not suggesting you should get rid of validations, they're just a part of the callbacks madness, and they tend not to always be the same depending on what you're doing in a particular use case. Using save(false) just started to feel like a bad workaround to avoid having callbacks run. I want to save an object no matter what, and run whatever code I require in that course of action, in an explicit fashion.
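
A small example of what that looks like: ActiveModel's validations mixed into a plain Ruby object, no database involved (class and attribute names are made up):

```ruby
require 'active_model'

class AcceptInvitation
  include ActiveModel::Validations

  attr_accessor :token, :user_name

  validates_presence_of :token
  validates_presence_of :user_name

  def initialize(attributes = {})
    attributes.each { |key, value| send("#{key}=", value) }
  end
end

form = AcceptInvitation.new(:token => "abc123")
form.valid?               # => false
form.errors.full_messages # => e.g. ["User name can't be blank"]
```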

Are Fine-grained Resources the Future?

One question is left for me, and this is where I start to *gasp* question Rails' approach to resources. I'm starting to think it's wrong, and that for the service or the forms model to work, the idea of a single resource you update whenever you want your model to do something needs to change. Having very specific services and their methods would mean a switch to more lightweight and finer-grained resources, where you e.g. accept an invitation not by PUTting to /invitation/12345, but to /invitation/12345/accept.
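
In Rails 3 routing terms, such a finer-grained resource could look roughly like this (application and controller names are made up):

```ruby
# config/routes.rb
MyApp::Application.routes.draw do
  resources :invitations do
    member do
      # POST /invitations/12345/accept => InvitationsController#accept
      post :accept
    end
  end
end
```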

But, and here's the kicker, to me this leaves the question of how RESTful that approach is. This is where I'd like to get input. How do people using services already solve this problem? Have lots of tiny methods in your controllers? Maybe it's time to work more with Sinatra to get different ideas in my head, to rethink resources. I'm still not certain what a good and RESTful way to approach this would be. Whatever it is, it'd leave my business logic in a way that's a lot more approachable both for people reading it, and for my tests. That's something pretty dear to me, and more important than fully conforming to a REST model. There must be something better, because the current status quo of handling business logic with Rails' resources just doesn't work for me anymore, it doesn't make me a happy programmer.

The answers have yet to be found. My intention is not to rant, but to start a discussion around it. In my opinion the way Rails recommends handling resources doesn't have a future, if only for me, even if that means using other frameworks. I don't think we wasted time with Rails' current way; instead we're going to see an evolution, and I find that pretty exciting, because the future has yet to be written in this regard, and we can be a part of that.

Tags: rails, rest

For an article in a German magazine I've been researching MongoDB over the last week or so. While I didn't need a lot of the information I came across, I collected some nicely distilled notes on some of its inner workings. You won't find information on how to get data into or out of MongoDB here. The notes deal with the way MongoDB treats and handles your data, a high-low-level view if you will. I tried to keep them as objective as possible, but I added some commentary below.

Most of this is distilled knowledge I gathered from the MongoDB documentation; credit for making such a good resource available for us to read goes to the Mongo team. I added some of my own conclusions where it made sense. They're doing a great job documenting it, and I can highly recommend spending the time to go through as much of it as possible to get a good overview of the whys and hows of MongoDB. Also, thanks to Mathias Stearn for hooking me up with some more details on future plans and inner workings in general. If you want to know more about its inner workings, there's a webcast coming up where they're gonna explain how it works.

Basics

  • Name stems from humongous, though (fun fact) mongo has some unfortunate meanings in other languages than English (German for example)
  • Written in C++.
  • Lots of language drivers available, pushed and backed by the MongoDB team. Good momentum here.
  • According to The Changelog Show ([1]) MongoDB was originally part of a cloud web development platform, and at some point was extracted from the rest, open sourced and turned into what it is today.

Collections

  • Data in MongoDB is stored in collections, which in turn are stored in databases. Collections are a way of storing related data (think relational tables, but sans the schema). Collections contain documents, which in turn have keys, another name for attributes.
  • Data is limited to around 2 GB on 32-bit systems, because MongoDB uses memory-mapped files, as they're tied to the available memory addressing. (see [2])
  • Documents in collections usually have a similar data structure, but any arbitrary kind of document could be stored; similarity is just recommended for index efficiency. Documents can have a maximum size of 4 MB.
  • Collections can be namespaced, i.e. logically nested: db.blog.posts, but the collection is still flat as far as MongoDB is concerned, purely an organizational means. Indexes created on a namespaced collection only seem to apply to the namespace they were created on though.
  • A collection is physically created as soon as the first document is inserted into it (see the sketch after this list).
  • Default limit on number of namespaces per database is 24000 (includes all collections as they're practically the top level namespace in a database), which also includes indexes, so with the maximum of 40 indexes applied to each collection you could have 585 collections in a database. The default can be changed of course, but requires repairing the database if changed on an active instance.
  • While you can put all your data into one single collection, from a performance point of view, it seems to make sense to separate them into different collections, because it allows MongoDB to keep its indexes clean, as they won't index attributes for totally unrelated documents.
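
To make that a bit more tangible, here's a rough sketch using the Ruby driver of that era (1.x style API, so method names may differ in other versions; database and data are made up):

```ruby
require 'rubygems'
require 'mongo'

connection = Mongo::Connection.new("localhost", 27017)
db         = connection.db("blog")

# The "namespaced" collection blog.posts is still just one flat collection
# named "blog.posts" as far as MongoDB is concerned.
posts = db.collection("blog.posts")

# No schema, no explicit create step: the collection only comes into
# existence physically when the first document is inserted.
posts.insert("title" => "Hello MongoDB", "tags" => ["nosql", "mongodb"])

puts db.collection_names.inspect
```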

Capped Collections

  • Capped collections are fixed-size collections that automatically age out the oldest entries. Sounds fancier than it probably is; I'm thinking that documents are just appended at the last writing position, which wraps around to the beginning when the limit of the collection is reached. Preferable for insert-only use cases: updates of existing documents fail when the data size is larger than before the update, which makes sense, because moving an object would destroy the natural insertion order. Limited to ~1 GB on 32-bit systems, sky's the limit on 64-bit.
  • Capped collections seem like a good tool for logging data, well knowing that old data is purged automatically, being replaced with new data when the limit is reached (see the sketch after this list). Documents can't be deleted; only the entire collection can be dropped. Capped collections have no index on _id by default, ensuring good write performance. Indexes are generally not recommended here, to keep write performance high. No index on _id means that walking the collection is preferred over looking up documents by a key.
  • Documents fetched from a capped collection are returned in their insertion order; query in reverse natural order to get the newest first, think log tailing.
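
Here's what creating and using a capped collection looks like with the 1.x-era Ruby driver; the option names mirror the server's createCollection options, exact driver details may vary:

```ruby
require 'rubygems'
require 'mongo'

db = Mongo::Connection.new.db("blog")

log = db.create_collection("request_log",
  :capped => true,
  :size   => 50 * 1024 * 1024, # fixed size in bytes, old entries get overwritten
  :max    => 100_000)          # optional cap on the number of documents

log.insert("path" => "/posts", "status" => 200, "time" => Time.now.utc)

# Walking the collection returns documents in natural (insertion) order.
log.find.each { |entry| puts entry.inspect }
```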

Data Format

  • Data is stored and queried in BSON, think binary-serialized JSON-like data. Its features are a superset of JSON, adding support for regular expressions, dates, binary data, and its own object id type. All strings are stored as UTF-8 in BSON; sorting, on the other hand, does not (yet) honor that, it uses strcmp, so the order might be different from what you'd expect. There's a sort of specification for BSON, if you're into that kind of stuff: [3] and [4]
  • Documents are not identified by a simple ID, but by an object identifier type, optimized for storage and indexing. Uses machine identifier, timestamp and process id to be reasonably unique. That's the default, and the user is free to assign any value he wishes as a document's ID.
  • MongoDB has a "standard" way of storing references to other documents using the DBRef type, but it doesn't seem to have any advantages (e.g. fetch associated objects with parent) just yet. Some language drivers can take the DBRef object and dereference it.
  • Binary data is serialized in little-endian.
  • Being a binary format, MongoDB doesn't have to parse documents like it would with JSON; they're already a valid in-memory representation when they come across the wire.

References

  • Documents can embed a tree of associated data, e.g. tags, comments and the like, instead of storing them in different MongoDB documents. This is not specific to MongoDB but to document databases in general (see [5]). When using find you can reach into nested objects with dot notation, e.g. blog.posts.comments.body, and index them with the same notation.
  • It's mostly left to the language drivers to implement automatic dereferencing of associated documents.
  • It's possible to reference documents in other databases.

Indexes

  • Every document gets a default index on the _id attribute, which also enforces uniqueness. It's recommended to index any attribute that's being queried or sorted on.
  • Indexes can be set on any attribute or embedded attributes and documents. Indexes can also be created on multiple attributes, additionally specifying a sort order.
  • If an array attribute is indexed, MongoDB will index all the values in it (Multikeys); see the sketch after this list.
  • Unique keys are possible, missing attributes are set to null to ensure a document with the same missing attribute can only be stored once.
  • If it can, MongoDB will only update indexes on keys that changed when updating a document, but only if the document hasn't grown so much that it must be moved.
  • MongoDB up to 1.2 creates and updates indexes synchronously; 1.3 has support for doing that in the background.
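
For illustration, roughly what defining indexes looks like with the 1.x-era Ruby driver (collection and attribute names are made up):

```ruby
require 'rubygems'
require 'mongo'

posts = Mongo::Connection.new.db("blog").collection("blog.posts")

posts.create_index("title")                              # single attribute
posts.create_index([["author",     Mongo::ASCENDING],    # compound index,
                    ["created_at", Mongo::DESCENDING]])  # with sort order
posts.create_index("slug", :unique => true)              # unique constraint
posts.create_index("tags")                               # array attribute => Multikeys

puts posts.index_information.inspect
```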

Updates

  • Updates to documents are in-place, allowing for partial updates and atomic operations on attributes (set for all attributes, incr, decr on numbers, push, pop, pull et al. on arrays), also known as modifier operations (a sketch follows at the end of this list). If an object grows out of the space originally allocated for it, it'll be moved, which is obviously a lot slower than updating in-place, since indexes need to be updated as well. MongoDB tries to adapt by allocating padding based on update history (see [6]). Writes are lazy.
  • Not using any modifier operation will result in the full document being updated.
  • Updates can be done with criteria, affecting a whole bunch of matching documents; think "update ... where" in SQL. This also allows updating objects based on a particular snapshot, i.e. an update based on the id and some value in the criteria will only go through when the document still has that value. This kind of update is atomic. Reliably updating multiple documents atomically (think transaction) is not possible. There's also findAndModify in 1.3 (see [7]), which allows atomically updating and returning a document.
  • Upserts insert when a record with the given criteria doesn't exist, otherwise updates the found record. They're executed on the collection. A normal save() will do that automatically for any given document. Think find_or_create_by in ActiveRecord.
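
A sketch of modifier operations, criteria-based updates and upserts with the 1.x-era Ruby driver (again, names and data are made up):

```ruby
require 'rubygems'
require 'mongo'

posts = Mongo::Connection.new.db("blog").collection("blog.posts")

# Modifier operations update in place and are atomic per document.
posts.update({"slug" => "hello-mongodb"}, {"$inc"  => {"views" => 1}})
posts.update({"slug" => "hello-mongodb"}, {"$push" => {"tags"  => "databases"}})

# Criteria-based update: only applies while the document still matches,
# which gives you a compare-and-set style update without transactions.
posts.update({"slug" => "hello-mongodb", "locked" => false},
             {"$set" => {"locked" => true}})

# Upsert: insert if nothing matches the criteria, update otherwise.
posts.update({"slug" => "new-post"},
             {"$set" => {"title" => "New Post", "views" => 0}},
             :upsert => true)

# Without a modifier operation, the matched document is replaced wholesale.
posts.update({"slug" => "new-post"}, {"slug" => "new-post", "title" => "Replaced"})
```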

Querying

  • Results are returned as cursors which walk the collection as they advance. That explains why you can potentially get records that had to be moved: they pop up again in a spot that may lie after the cursor's current position, even if there would have been free space in a spot before it. Cursors are fetched in batches of 100 documents or 4 MB of data, whichever is reached first (see the sketch after this list).
  • That's also why it's better to store similar data in a separate collection. Traversing similar data is cheaper than traversing totally unrelated data: the bigger the other documents are compared to the documents that match your find, the more data has to be fetched from the database and skipped when it doesn't match your criteria.
  • Data is returned in natural order which doesn't necessarily relate to insertion order, as data can be moved if it doesn't fit into its old spot anymore when updated. For capped collections, natural order is always insertion order.
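
And a small cursor example with the 1.x-era Ruby driver; the :sort and :fields options may be named differently in other driver versions:

```ruby
require 'rubygems'
require 'mongo'

posts = Mongo::Connection.new.db("blog").collection("blog.posts")

# find returns a cursor; documents are fetched lazily in batches as you iterate.
cursor = posts.find({"author" => "mathias"},
                    :sort   => [["created_at", Mongo::DESCENDING]],
                    :fields => ["title", "created_at"])

cursor.each do |doc|
  puts "#{doc['created_at']} #{doc['title']}"
end
```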

Durability

  • By default, data in MongoDB is flushed to disk every 60 seconds. Writes to MongoDB (i.e. document creates, updates and deletes) are not stored on disk until the next sync. The tradeoff is high write performance vs. durability; if you need more durability, reduce the sync delay. The closest comparison for this durability behaviour is MySQL's MyISAM.
  • Data is not written transactionally, so if the server is killed during a write operation, the data is likely to be inconsistent or even corrupted and needs repair. Think classic file systems like ext2, or MyISAM.
  • In MongoDB 1.3 a database flush to disk can be enforced by sending the fsync command to the server.

Replication

  • Replication is the recommended way of ensuring data durability and failover in MongoDB. A new (i.e. bare and dataless) instance can be hooked onto another at any time, doing an initial cloning of all data, fetching only updates after that.
  • Replica pairs offer an auto-failover mechanism. Initially both settle on which is master and which is slave, the slave taking over should the master go down. Can be used e.g. in the Ruby driver using the :left and :right options. There's an algorithm to handle changes when master and slave get out of sync, but it's not fully obvious to me (see [8]). Replica pairs will be replaced by replica sets, allowing for more than one slave. The slave with the most recent data will be promoted to master should the master go down. The slaves agree on which one of them is the new master, so a client could ask any one server in the set which one of them is the master.
  • Replication is asynchronous, so updates won't propagate to the slaves immediately. There are ideas to require a write to be propagated to at least N slaves before reporting it back to the client as successful (similar to the feature in MySQL 5.4) (see [9]).
  • A master collects its writes in an oplog on which the slaves simply poll for changes. The oplog is a capped collection and therefore not a fully usable transaction log (not written to disk?), as old data is purged automatically, hence it's not reliable for restoring the database after a crash.
  • After the initial clone, slaves poll once across the full oplog; subsequent polls remember the position where the previous poll ended.
  • Replication is not transactional, so the durability of the data on a slave is subject to the same conditions as on the master, though in a way that still increases overall durability: having a slave allows you to decrease the sync delay on it, shortening the timespan during which data isn't written to disk anywhere in the setup.

Caching

  • With the default storage engine, caching is basically handled by the operating system's virtual memory manager, since it uses memory-mapped files. File cache == Database cache
  • Caching behaviour relies on the operating system, and can vary, not necessarily the same on every operating system.

Backup

  • If you can live with a temporary write lock on your database, MongoDB 1.3 offers fsync with lock to take a reliable snapshot of the database's files.
  • Otherwise, take the old school way of dumping the data using mongodump, or snapshotting/dumping from a slave database.

Storage

  • Data is stored in subsequently numbered data files, each new one being larger than the former, 2GB being the maximum size a data file can have.
  • Allocation of new datafiles doesn't seem to be exactly related to the amount of data currently being stored. E.g. storage size returned by MongoDB for a collection was 2874825392 bytes, but it had already created almost six gigabytes worth of database files. Maybe that's the result of padding space for records. I haven't found a clear documentation on this behaviour.
  • When MongoDB moves data into a different spot or deletes documents, it keeps track of the free space to reuse it in the future. The command repairDatabase() can be used to compact it, but that's a slow and blocking operation.

Concurrency

  • MongoDB refrains from using any kind of locking on data; it has no notion of transactions or isolation levels. Concurrent writes will simply overwrite each other's data, as they go straight to memory. Exceptions are modifier operations, which are guaranteed to be atomic. As there is no way to update multiple records in some sort of transaction, optimistic locking is not possible, at least not in a fully reliable way. Since writes are in-place and hit memory first, they're wicked fast.
  • Reads from the database are usually done through cursors, fetching a batch of documents lazily while iterating through them. If records in the cursor are updated while the cursor is being read from, the updated data may or may not show up. There's no kind of isolation level (as there are no locks or snapshotting). Deleted records will be skipped. If a record is updated by another process so that its size increases and the object has to be moved to another spot, there's a chance it's returned twice.
  • There are snapshot queries, but even they may or may not return inserted and deleted records. They do ensure that even updated records will only be returned once, but they're slower than normal queries.

Memory

  • New data is allocated in memory first, increments seem to be fully related to the amount of data saved.
  • MongoDB seems happy to hold on to whatever memory it can get, but at least during fsync it frees as much as possible. Sometimes it went back to consuming about 512 MB of real memory, other times it went down to just a couple of megs; I couldn't for the life of me make out a pattern.
  • When a new database file needs to be created, it looks like MongoDB is forcing all data to be flushed to disk, freeing a dramatic amount of memory. On normal fsyncs, there's no real pattern as to how MongoDB frees memory.
  • It's not obvious how a user can configure how much memory MongoDB can or should use, I guess it's not possible as of now. Memory-mapped files probably just use whatever's available, and be cleaned up automatically by the operating system's virtual memory system.
  • The need to add an additional caching layer is reduced, as the object and database representation are the same, and the file system and memory cache can work together to speed up access. There's no data conversion involved, at least not on MongoDB's side, data will just be sent serialized and unparsed across the wire. Obviously it depends on the use case whether this really is an advantage or a secondary caching layer is still needed.

GridFS

  • Overcomes the 4MB limit on documents by chunking larger binary objects into smaller bits.
  • Can store metadata alongside file data. Metadata can be specified by the user and be arbitrary, e.g. contain access control information, tags, etc.
  • Chunks can be randomly accessed, so it's possible to easily fetch data whose position in the file is well known. If random access is required, it makes sense to keep chunks small. Default size is 256K.

Protocol Access

  • MongoDB's protocol is binary and in its own right proprietary, hence they offer a lot of language drivers to take that pain away from developers, but they also offer a full specification of both BSON and the protocol.

Sharding

  • MongoDB has alpha support for sharding. Its functionality shouldn't be confused with Riak's way of partitioning, it's a whole different story. The current functionality is far from what is planned for production, so take everything listed here with a grain of salt, it merely presents the current state. The final sharding feature is supposed to be free of the restrictions listed here.
  • A shard ideally (but not necessarily) consists of two servers which form a replica pair, or a replica set in the future.
  • All shards are known to a number of config server instances that also know how and where data is partitioned to.
  • Data can be sharded by a specific key. That key can't be changed afterwards, neither can the key's value.
  • The key chosen should be granular enough that you don't end up with too many records sharing the same key value. Data is split into chunks of 50 MB, so with big documents it's probably better to store them in GridFS, as a chunk can contain as few as ~12 documents when each takes up the maximum document size of 4 MB.
  • Sharding is handled by a number of mongos instances which are connected to the shards which in turn are all known to a number of mongod config server instances. These can run on the same machines as the data-handling mongod instances, with the risk that when the servers go down they also disappear. Having backup services seems to be appropriate in this scenario.
  • Sharding is still in alpha, e.g. replicated shards currently aren't supported in alpha 2, so a reliable sharding setup is not possible yet. If a shard goes down, the data on it is simply unavailable until it's brought back up. Until that happens, all reads will raise an error, even when looking up data that's known to live on the still-available shards.
  • There's no auto-balancing to move chunks to new shards, but that can be done manually.

There you go. If you have something to add or to correct, feel free to leave a comment. I'm happy to stand corrected should I have drawn wrong conclusions anywhere.

As a user of CouchDB I gotta say, I was quite sceptical about some of MongoDB's approaches to handling data. Durability especially is something I was worried about. But while I read through the documentation and played with MongoDB I realized that it's the same story as always: it depends. It's a problem when it's a problem. CouchDB and MongoDB don't necessarily cover the same set of use cases. There are enough use cases where the durability approach of MongoDB is acceptable compared to what you gain, e.g. in development speed, or speed when accessing data, because holy crap, that stuff is fast. There's a good reason for that, as I hope you'll agree after going through these notes. I'm glad I took the time to get to know it better, because use cases kept popping up in my head where I would prefer it over CouchDB, which isn't always a sweet treat either.

If you haven't already, do give MongoDB a spin, go through their documentation, throw data at it. It's a fun database, and the entrance barrier couldn't be lower. It's a good combination of relational database technologies, with schemaless and JavaScript sprinkled on top.

Tags: nosql, mongodb

Interested in Redis? You might be interested in the Redis Handbook I'm currently working on.

I'm gonna eat my own dog food here, and start you off with a collection of links and ideas of people using Redis. Redis' particular way of treating data requires some rethinking of how to store your data to benefit from its speed, atomicity and data types. I've already written about Redis in abundance; this post's purpose is to complement those posts with real-world scenarios. Maybe you can gather some ideas on how to deal with things.

There's a couple of well-known use cases already, the most popular of them being Resque, a worker queue. RestMQ, an HTTP-based worker queue using Redis, was just recently released too. Neither makes use of the rather new blocking pop commands yet, like Redactor does, so there's still room for improvement, and to make them even more reliable.
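
For illustration, here's roughly what a blocking-pop based worker looks like with redis-rb; the BLPOP argument style differs between redis-rb versions, and the queue and job names are made up:

```ruby
require 'rubygems'
require 'redis'

redis = Redis.new

# Producer: push jobs onto a list.
redis.rpush("queue:mails", "deliver_invitation:42")

# Worker: BLPOP blocks until a job arrives instead of polling in a busy loop.
loop do
  queue, job = redis.blpop("queue:mails", 0) # 0 = block forever
  puts "working on #{job} from #{queue}"
end
```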

Ohm is a library to store objects in Redis. While I'm not sure I'd put this layer of abstraction on top of it, it's well worth looking at the code to get inspiration. Same is true for redis-types.

Redis' simplicity, atomicity and speed make it an excellent tool when tracking things directly from the web, e.g. through WebSockets or Comet. If you can use it asynchronously, all the better.

  • Affiliate Click Tracking with Rack and Redis.

    A simple approach to tracking clicks. I probably wouldn't use a single list for all clicks, but instead have one for each path, but there's always more than one way to reach your goal with Redis. Not exactly the same, but Almaz can track URLs visited by users in Rails applications.

    Update: Turns out that in the affiliate click tracking code above, the list is only used to push clicks into a queue, where they're popped off and handled by a worker, as pointed out by Kris in the comments.

  • Building a NLTK FreqDist on Redis

    Calculation of frequency distribution, with data stored in Redis.

  • Gemcutter: Download Statistics

    The RubyGems resource par excellence is going to use Redis's sorted sets to track daily download statistics. While still just a proposal, the ideas are well applicable to all sorts of statistics being tracked in today's web applications (see the sketch after this list).

  • Usage stats and Redis

    More on tracking views statistics with Redis.

  • Vanity - Experiment Driven Development

    A split testing tool based on Redis to integrate into your Rails application. Another kind of tracking statistics. If you didn't realize it by now, Redis is an excellent tool for this kind of application, for data that you wouldn't want to offload to your main database, because let's face it, it's got enough crap to do already.

  • Flow Analysis & Time-based Bloom Filters

    Streaming data analysis for the masses.

  • Crowdsourced document analysis and MP expenses

    While being more prose than code, it still shows areas where Redis is a much better choice than e.g. MySQL.
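
To give you an idea of what sorted-set based statistics can look like, here's a rough sketch with redis-rb; the key names are made up and this is not Gemcutter's actual implementation:

```ruby
require 'rubygems'
require 'redis'

redis = Redis.new
today = Time.now.strftime("%Y-%m-%d")

# One sorted set per day, the score being the download count per gem.
redis.zincrby("downloads:#{today}", 1, "rails")
redis.zincrby("downloads:#{today}", 1, "sinatra")
redis.zincrby("downloads:#{today}", 1, "rails")

# Top downloads for the day, highest count first (option naming varies
# slightly across redis-rb versions).
redis.zrevrange("downloads:#{today}", 0, 9, :with_scores => true)
# => e.g. [["rails", 2.0], ["sinatra", 1.0]]
```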

Using Redis to store any suitable kind of statistics is pretty much an immediate use case for a lot of web applications. I can think of several projects I've worked on that could gain something from moving certain parts of their application to Redis. It's the kind of data you just don't want to clutter your database with. Clicks, views, history and all that stuff put an unnecessary amount of data and load on it, and the more data it accumulates, the harder it will be to get rid of, especially in MySQL.

It's not hard to tell that we're still far from having heaps of inspiration and real-life use cases to choose from, but these should give you an idea. If you want, it can get a lot simpler too: when you're using Redis already, it makes sense to use it for storing Rails sessions.

Redis is a great way to share data between different processes, be it Ruby or something else. The atomic access to lists, strings and sets, together with speedy access ensures that you don't even need to worry about concurrency issues when reading and writing data. On Scalarium, we're using it mostly for sharing data between processes.

E.g., all communication between our system and clients on the instances we boot for our users is encrypted and signed. To ensure that all processes have access to the keys, they're stored conveniently in Redis. Even though that means the data is duplicated from our main database (which is CouchDB if you must know), access to Redis is a lot faster. We keep statistics about the instances in Redis too, because CouchDB is just not made for writing heaps and heaps of data quickly. Redis also tracks a request token that is used to authenticate internal requests in our asynchronous messaging system, to make sure that they can't be compromised from some external source. Each request gets assigned a unique token. The token is stored in Redis before the message is published and checked before the message is consumed. That way we turned Redis into a trusted source for shared data between web and worker processes.
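
Here's a rough sketch of how such a one-off request token can be handled with redis-rb; this is an illustration rather than our actual code, and the messaging call is just a stand-in:

```ruby
require 'rubygems'
require 'redis'
require 'securerandom'

redis = Redis.new

# Publisher side: store a one-off token before publishing the message.
token = SecureRandom.hex(16)
redis.setex("request_token:#{token}", 3600, "1") # expire stale tokens after an hour
# publish_message(:token => token, :payload => payload) # your messaging layer here

# Consumer side: DEL is atomic and returns the number of keys removed,
# so a token can only ever be consumed once.
def trusted?(redis, token)
  redis.del("request_token:#{token}") == 1
end
```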

The memodis library makes sharing data incredibly easy: it offers Redis-based memoization. When you assign a memodis'd attribute in your code, it'll be stored in Redis and can therefore easily be read from other processes.

Redis is incredibly versatile, and if you have a real-life Redis story or usage scenario to share, please do.

Tags: nosql, redis