Last weekend I tweeted links to two tweets by a poor guy who apparently got his MongoDB database into an unrecoverable state during shutdown whilst upgrading to a newer version. That tweet quickly made the rounds, and the next morning I found myself staring at replies stating that it was all his fault, because he 1.) used kill -9 to shut it down because the process apparently hung (my guess is it was in the middle of flushing all data to disk) and 2.) didn't have a slave, just one database instance.

Others went as far as indirectly calling him an idiot. Oh interwebs, you make me sad. If you check out the thread on the mailing list, you'll notice a similar pattern in reasoning. The folks over at http://learnmongo.com seem to want to be the wittiest of them all, recommending that you always have a recent backup, a slave or replica set, and that you never kill -9 your database.

While you can argue that the guy should've known better, there's something very much at odds here, and it seems to have become a terrifying meme among fans of MongoDB: the idea that you need to do all of these things to get any assurance that your data is durable. Don't have a replica? Your fault. kill -9 on a database, any database? You mad? Should've read the documentation first, dude. This whole issue goes a bit deeper than just reading documentation. It's about the fundamental design decision of how MongoDB treats your data, and it's been my biggest gripe from the get-go. I can't help but be horrified by these comments.

I've heard the same reasoning over and over again, along with the claim that it just hasn't happened so far, that no one's really lost any considerable amount of data. The problem is, most people never talk about it publicly, because it's embarrassing; the best proof is the poor guy above. That part isn't even specific to MongoDB, it's a general problem.

Memory-Mapped Persistence

But let me start at the beginning, with MongoDB's persistence cycle, and then get to what's being done to improve its reliability and your data's durability. At the very heart, MongoDB uses memory-mapped files to store data. A memory-mapped file is a region of a file on disk mapped directly into a process's address space, so it has the same representation in memory as it does on disk. When you access a document in MongoDB, loading it from disk is transparent to MongoDB itself; it can just go ahead and write to an address in memory, as every database in MongoDB is mapped to a dynamically allocated set of files on disk. Note that memory-mapped files are something you won't find in a lot of other databases, if any at all. Most do their own house-keeping and use custom data structures for that purpose.
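
To illustrate the general idea (this is plain Python and its mmap module, not MongoDB's actual code): writing to a memory-mapped file is just a memory write, and it's the operating system that decides when the modified pages actually hit the disk, unless you explicitly flush.

    import mmap

    # Create a 4 KB file and map it into memory.
    with open("data.bin", "wb") as f:
        f.write(b"\x00" * 4096)

    with open("data.bin", "r+b") as f:
        mem = mmap.mmap(f.fileno(), 4096)
        mem[0:5] = b"hello"  # a plain memory write, no syscall per write
        mem.flush()          # only now are the changes forced out to disk
        mem.close()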

The memory mapping machinery (in MongoDB's case the POSIX functions, and whatever Windows offers in that area) will take care of flushing back to disk every 60 seconds (configurable). Everything in between happens solely in memory. Does the database crash one second before the next flush? You just lost most of the data that was written in the last 59 seconds. Since the flushing cycle is configurable, you should consider choosing a value that suits the kind of data you're storing.
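
The flush interval is a startup setting. If I remember the flag correctly, something like this shortens it from the default 60 seconds to 10, trading some write throughput for a smaller window of potential loss:

    mongod --syncdelay 10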

MongoDB's much praised insert speed? This is where it comes from. When you write directly to local memory, writes had better be fast. The persistence cycle is simple: accept writes for 60 seconds, then flush the whole thing to disk. Wait another 60 seconds, then flush again, and so on. Of course MongoDB also flushes the data when you shut it down. But, and here's the kicker, that flush will of course fail when you kill the process without mercy, using the KILL signal, just like the poor guy above apparently did. When you kill something that's writing a big set of binary data to disk, all bets are off. One bit landing on the wrong foot and the database can get corrupted.
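
Part of the speed also came from the drivers: in that era, writes were fire-and-forget by default. With today's PyMongo you have to opt into that behavior explicitly, which makes for a rough sketch of what unacknowledged writes look like (the shop/orders names are made up for illustration):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("localhost", 27017)
    db = client.shop

    # w=0 means the driver doesn't wait for any acknowledgement; the
    # server takes the write into memory and that's it. Fast, but if the
    # process dies before the next flush, the write is simply gone.
    orders = db.get_collection("orders", write_concern=WriteConcern(w=0))
    orders.insert_one({"item": "book", "qty": 1})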

Database Crashes are Unavoidable

This scenario can and does happen with e.g. MySQL too, it even happens with CouchDB. But the difference is that in MySQL you usually only have a slightly damaged region, which can be fixed by deleting and re-inserting it. In CouchDB, all that happens is that your last writes may be broken; CouchDB simply walks all the way back to the last successful write and runs happily ever after.

My point here is simple: even when killed using the KILL signal, a database should not be unrecoverable. It simply shouldn't be allowed to happen. You can blame the guy all you want for using kill -9, but consider the fact that it's the process equivalent of a server, or even just the database process, crashing hard. Which happens, believe it or not.

Yes, you can and probably will have a replica eventually, but it shouldn't be the sole precondition for getting a durable database. And this is what horrifies me: people seem to accept that this is simply one of MongoDB's trade-offs, and that it should just be considered normal. They shouldn't. We need more guys like the one who caused all the stir bringing up these issues, even though it's partly his fault, to show the world what can happen when worst comes to worst.

People need to ask more questions, and not just accept answers like: don't use kill -9, or always have a replica around. Servers crash, and your database needs to be able to deal with it.

Durability Improvements in MongoDB 1.7/1.8

Now, the MongoDB folks aren't completely deaf, and I'm happy to report they've been working on improvements in the area of data durability for a while. You can play with the new durability option in the latest builds of the 1.7 branch, and just a couple of hours ago there was activity in improving the repair tools to better deal with corrupted databases. I welcome these changes, very much so. MongoDB has great traction, a pretty good feature set, and its speed seems to blow people's minds. Data durability has not been one of its strengths though, so I'm glad there's been a lot of activity in that area.

If you start the MongoDB server with the new --dur option, it will start keeping a journal. When your database crashes, the journal is simply replayed to restore all changes since the last successful flush. This is not a particularly novel idea, because it's how your favorite relational database has been working for ages, and it's not dissimilar to the storage model of other databases in the NoSQL space. It's a good trade-off between keeping good write speed and getting a much more durable dataset.
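
For reference, in the 1.7 builds this is a plain startup flag (I believe it was later renamed, so check the documentation of whatever version you're running):

    mongod --dur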

When you kill your database harshly between flushes with a good pile of writes in between, you don't lose a lot of data anymore, maybe a second's worth (just as with MySQL when you use InnoDB's delayed flushing), if any at all, but not much more than that. Note that these are observations based on a build that's now more than a month old; the situation may have improved since then. Operations are put into a buffer in memory, from where they're both logged to disk into the journal and then applied to the dataset. By the time the data is written to memory, it has already been written to the journal. Journals are rotated once they reach a certain size, after it's been ensured that all their data has been applied to the dataset.

A recovery process applies all uncommitted changes from the log when the database crashes. This way it's ensured that you only lose a minimal set of data, if any at all, when your database server crashes hard. In theory the journal could also be used to restore a corrupted database in a scenario like the one outlined above, so it's pretty neat in my opinion. Either way, the risk of losing data is now pretty low. In case you're curious for code, the magic happens in this method.

I for one am glad to see improvements in this area of MongoDB, and I'm secretly hoping that durability will become the default mode, though I don't see it happening anytime soon, for marketing reasons. Also, be aware that durability brings more overhead. In some initial tests, however, the speed difference between non-durable and durable MongoDB was almost not worth mentioning. I wouldn't call those tests representative, but in general there's really no excuse not to use it.

It's not yet production ready, but nothing should keep you from playing with it to get an idea of what it does.

Bottom Line

It's okay to accept trade-offs with whatever database you choose, to your own liking. However, in my opinion, the potential of losing all your data when you use kill -9 to stop the database should not be one of them, and neither should accepting that you always need a slave to achieve any level of durability. The problem is less the fact that this is MongoDB's current way of doing persistence, and more that people imply it's a good choice. I don't accept it as such. If you can live with it, which hopefully you won't have to for much longer anyway, that's fine with me, it's not my data anyway. Or maybe I'm just too paranoid.

Let me go ahead and say it: I love distributed systems. Why? Simply because they bend my brain. Yesterday I tweeted "Distributed databases are my happy place." One response I got was along the lines of: "then you're probably not running a distributed database in production." Busted! But does it matter? We all love distributed stuff, we love thinking about scaling. They seem like problems everyone wants to have and solve.

But the truth is, I don't, and I can assure you, you don't want to either. Sometimes I doubt my brain is even capable of properly solving these problems, but that doesn't prevent me from trying. I prefer to work on as small a scale as possible; you could even say I hate distributed systems. Scaling and distribution are problems most of us don't have, and are probably better off not having.

Truth be told, I'm not terribly interested in running highly distributed systems in production, quite the opposite. I prefer maxing out what I have as far as possible. Sometimes I do take the plunge and just try something new in production, but I'm happily prepared to replace it with something different, even something simpler, if that seems like the better option in the end. Everyone should experiment at some point, but not all the time.

But why then do I love distributed systems? Simply because they make me think about how they could be put to use, what the algorithms and problems involved are, and what implications they'd have on a production system, both from an operations and a developer perspective. That's where the value is for me: it allows me to make informed decisions when the time comes.

Take Riak, for example, on which I gave a shortish talk at yesterday's meet-up of the local Ruby brigade. Riak's distribution model is based on Amazon's Dynamo, with some neat features sprinkled on top. Riak is built by a bunch of really, really smart guys at Basho, whose work I have nothing but respect for, but who also are sane and open enough to tell people when their database may or may not be a good fit (something a certain other database is severely lacking).

Riak is exciting for me because it was the first database that really made me dive into Amazon's Dynamo paper, and once I started grokking it, it blew my mind. If you haven't read it, please do. It blew my mind simply because it introduced me to a whole new way of thinking, to heavily distributed storage, with all the potential hot spots, downsides, and business use cases for specific parts of it thrown in. The same is true for Google's BigTable. The technologies involved in both are true mind-benders.

And there's my bottom line. Distributed systems aren't necessarily awesome just because they allow scaling to infinite heights (exaggeration intended), but because they broaden your personal horizon. It's like learning new programming languages. It's about getting new ideas in your head, ideas outside of your everyday working realm. Ideas you can maybe even take back to what you're working on and start applying them where it makes sense, and only if it makes sense. Learning about distributed systems is not just about learning how to use them, but when. Knowing is half the battle.

While you're at it, check out Evan Weaver's "Distributed Systems Primer", a collection of papers on distributed systems, or the papers collection over at NoSQL Summer. Get ready to have your mind blown in whole new ways. Say what you will, that stuff is just fascinating. It appeals to the distributed database lover in me.

By now it should be obvious that I'm quite fond of alternative data stores (call them NoSQL if you must). I've given quite a few talks on the subject recently, and had the honor of being a guest on the (German) heise Developer Podcast on NoSQL.

There are some comments and questions that pop up every time alternative databases are being talked about, especially by people deeply rooted in relational thinking. I've been there, I know it requires some rethinking, and I'm also quite aware that some of this is controversial, basically the exact opposite of everything you learned in university.

I'd like to address a couple of those with some commentary and my personal experience (Disclaimer: my experience is not the universal truth, it's simply that: my experience, your mileage may vary). When I speak of things done in practice, I'm talking about how I witnessed things getting done in Real Life™, and how I've done them myself, both good and bad. I'm focussing on document databases, but in general everything below holds true for any other kind of non-relational database.

It's easy to say that all the nice features document databases offer are just aiming at one thing, scaling up. Whether or not that's true, it just doesn't matter for a lot of people. Scaling is awesome, and it's a problem everyone wants to solve, but in reality it's not the main issue, at least not for most people. Scaling also isn't impossible with MySQL; I've had my fun doing it, and it sure was an experience, but it can be done.

It's about getting stuff done. There's a lot more to alternative databases in general, and document databases in particular, that I like, not just the ability to scale up. They simply can make my life easier, if I let them. If I can gain productivity while still being aware of the potential risks and pitfalls, it's a big win in my book.

What you'll find, when you really think about it, is that everything below holds true no matter what database you're using. Depending on your use case, it can even apply to relational databases.

Relational Databases are all about the Data

Yes, they are. They're about trying to fit your data into a constrained schema, constrained in length, type, and whatever else you see fit. They're about building relationships between your data in a strongly coupled way, think foreign key constraints. Whenever you need to store new kinds of data, you need to migrate your schema first. That's what they do. They're good at enforcing a set of ground rules on your data.

See where I'm going with this? Even though relational databases tried to be a perfect fit for data, they ended up being a pain once that data needed to evolve. If you haven't felt that pain yet, good for you. I certainly have. Tabular data sounds nice in theory, and is pretty easy to handle in Excel, but in practice, it causes some pain. A lot of that pain stemmed from people using MySQL, yes, but take that argument to the guy who wrote it and sold it to people as the nicest and simplest SQL database out there.

It's easy to get your data into a schema once, but it gets a lot harder to change the schema and the data into a different schema at a later point in time. While data sticks around, the schema evolves constantly. Something relational databases aren't very good at supporting.

Relational Databases Enforce Data Consistency

They sure do, that's what they were built for. Constraints, foreign keys, all the magic tricks. Take Rails as a counter-example: it fostered the idea that all that stuff is supposed to be part of the application, not the database. Does that have trade-offs? Sure, but it's part of your application. In practice, that turned out to be right, for the most part, although I can hear a thousand Postgres users scream. There will always be areas that require constraints on the database level, otherwise those features wouldn't have been created in the first place.

But most web applications can live fine without them. They benefit from being free about their data, shaping it in whichever way they like, and adding consistency on the application level. The consistency suddenly lies in your hands, a responsibility not everyone is comfortable with, and you're suddenly forced to think more about edge cases. But you sure as hell don't have to live without consistent data, quite the opposite. The difference is that you're taking care of the consistency yourself, tailored to your use case, instead of using a generic one-size-fits-all solution.
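
As a minimal sketch of what that looks like in practice (the field names and rules here are invented for illustration): the checks a relational schema would enforce become plain code that runs before the write.

    from pymongo import MongoClient

    db = MongoClient().shop

    def valid_order(order):
        # Rules the database used to enforce now live in the application:
        # required fields, plus anything domain-specific on top.
        required = ("customer_id", "items", "billing_address")
        if any(field not in order for field in required):
            return False
        return len(order["items"]) > 0

    order = {"customer_id": 42,
             "items": [{"sku": "x1", "qty": 2}],
             "billing_address": {"city": "Berlin"}}
    if valid_order(order):
        db.orders.insert_one(order)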

Relationships between data aren't always strict. They can be loosely linked, and what's the point of enforcing referential integrity when you don't care whether a piece of data still exists or not? If you do care, you handle it gracefully in your application code.

SQL is a Standard

The basics of SQL are similar, if not the same, but under the hood there are subtle differences. Why? Because under the hood, every relational database works differently. Which is exactly what document databases acknowledge: every database is different, and trying to put a common language on top will only get you so far. If you want to get the best out of it, you're going to specialize.

Thinking in map/reduce, as CouchDB or Riak force you to, is no piece of cake. It takes a while to get used to the ideas around it and the implications it has for you and your data. It's worth it either way, but sometimes SQL is just a must, no question. Business reporting can be a big issue: if your company relies on standard tools, you're out of luck.

While standards are important, in the end what matters is what you need to do with your data. If a standard gets in your way, how is that helpful? Don't expect a standard query language for document databases any time soon. They all solve different types of problems in different ways, and they don't intend to hide that from you with a standard query language. If, on the other hand, all you need is a dynamic language for doing ad-hoc queries, check out MongoDB.
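
A quick sketch of what those ad-hoc queries look like through PyMongo (the collection and fields are invented for the example): the query itself is just a data structure.

    from pymongo import MongoClient

    db = MongoClient().shop

    # No schema, no prepared statements: the query is a plain document.
    expensive = db.orders.find({"total": {"$gt": 100}}).sort("total", -1)
    for order in expensive:
        print(order["_id"], order["total"])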

Normalized Data is a Myth

I learned a lot in uni about all the different kinds of normalization. It just sounded so nice in theory. Model your data upfront, then normalize the hell out of it, until it's as DRY as the desert.

So far so good. One thing I noticed in practice: normalized data almost never worked out. Why? Because you need to duplicate data, even in e-commerce applications, an area traditionally cited as one where relational databases are going strong.

Denormalizing data is simply a natural step. Going back to the e-commerce example, you need to store a lot of things separately when someone places an order: shipping and billing address, payment data used, product price and taxes, and so on. Should you duplicate data all over the place? Of course not, not even in a document database. Even document databases only encourage duplicating data to a certain extent, though with some of them it's simply a must. But you're free to make these decisions on your own. They're not implying you need to stop normalizing; normalizing still makes sense, even in a document database.
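
To make that concrete, here's a rough sketch of what such an order could look like as a single document (names and values invented for illustration): the addresses and prices are snapshotted at the time the order is placed, so later changes to the product or customer don't rewrite history.

    from pymongo import MongoClient

    db = MongoClient().shop

    order = {
        "customer_id": 42,
        "billing_address": {"street": "1 Main St", "city": "Berlin"},
        "shipping_address": {"street": "1 Main St", "city": "Berlin"},
        "items": [
            {"sku": "book-123", "title": "Some Book",
             "price": 12.95, "tax": 0.07},
        ],
    }
    db.orders.insert_one(order)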

Schemaless is not Schemaless

But there's one important thing denormalization is not about, something that's brought up quite frequently and easily misunderstood. Denormalization doesn't mean you're not thinking about any kind of schema. While the word schemaless is thrown around regularly, schemaless simply isn't schemaless.

Of course you'll end up having documents of the same type, with a similar set of attributes. Some tools, for instance MongoDB, even encourage (if not force) you to store different types of documents in different collections. But here's the kicker: I deliberately used the word similar. Documents don't all need to be the same. One document can have a specific attribute while another doesn't; if it doesn't, you just assume it's empty, it's that easy. If it needs to be filled at some point, write the data lazily, so that your schema eventually becomes complete again. It evolves naturally, which does sound easy, but in practice requires more logic in your application to catch these corner cases.

So instead of running migrations that add new tables and columns, and in the end push your data around, you migrate the data on the next access; whether that's a read or a write is up to your particular use case. In the end you simply migrate the data, not the schema. The schema will evolve eventually, but first and foremost it's about the data, not the constraints it lives in. The funny thing: in larger projects, I've ended up doing the same thing with a relational database. It's just easier to do, and gentler on the load, than running a huge batch job on a production database.
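
A minimal sketch of that lazy migration, assuming a made-up customers collection and a newly introduced newsletter flag:

    from pymongo import MongoClient

    db = MongoClient().shop

    def find_customer(customer_id):
        doc = db.customers.find_one({"_id": customer_id})
        if doc is None:
            return None
        # Older documents don't have the newsletter flag yet. Fill in the
        # default on first access and write it back, so the data slowly
        # converges towards the new shape.
        if "newsletter" not in doc:
            doc["newsletter"] = False
            db.customers.update_one({"_id": customer_id},
                                    {"$set": {"newsletter": False}})
        return doc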

No Joins, No Dice

No document database supports joins, simple as that. If you need joins, you have two options: use a database that supports joins, or adapt your documents so that they remove the need for joins.

Documents have one powerful advantage: it's easy to embed other documents. If there's data you'd usually fetch using a join, and it's suitable for embedding (and therefore, oftentimes, denormalizing), there's your second option. Going back to the e-commerce example: whereas in a relational database you'd need a lot of extra tables to keep that data around (unless you're serializing it into a single column), in a document database you just add it as embedded data to the order document. You have all the important data in one place, and you can fetch it in one go. Someone said that relational databases are a perfect fit for e-commerce. Funny, I've worked on a market platform, and I found that to be a ludicrous statement. I'd have benefited from looser data storage several times, joins be damned.

It's not always viable, sure, and it'd be foolish to stick with a document database if joins are an important criterion for your particular use case. Then, no dice: it's relational data storage or bust.

Of course there's secret option number three, which is to just ignore the problem until it actually is a problem: go with a document database and see how far you get, though obviously that doesn't come without risks. It's worth noting though that Riak supports links between documents, and even fetching linked documents together with the parent in one request. In CouchDB, on the other hand, you can emit linked documents in views. You can't be fully selective about the document data you're interested in, but if all you want is to fetch linked documents, there are one or two ways to do that. Also, graph databases have made it their main focus to make traversal of associated documents an incredibly cheap operation. Something your relational database is pretty bad at.

Documents killed my Model

There's this myth that you just stop thinking about how to model your data with document databases or key-value storage. That myth is downright wrong. Just because you're using schemaless storage doesn't mean you stop thinking about your data, quite the opposite, you think even more about it, and in different ways, because you simply have more options to model and store it. Embedding documents is a nice luxury to have, but isn't always the right way to go, just like normalizing the crap out of a schema isn't always the way to go.

It's a matter of discipline, but so is relational modelling. You can make a mess of a document database just like you can make a mess of a relational database. When you migrate data on the fly in a document database, there's more responsibility in your hands, and it requires good care with regard to testing. The same is true for keeping track of data consistency; it's been moved from the database into your application's code. Is that a bad thing? No, it's a sign of the times. You're in charge of your data. It's not your database's task anymore to ensure it's correct and valid, it's yours. With great power comes great responsibility, but I sure like that about document databases. It's something I missed a lot when working with relational databases: the freedom to do whatever the heck I want with my data.

Read vs. Write Patterns

I just like including this simply because it always holds true, no matter what kind of database you're using. If you're not thinking about how you're going to access your data with both reads and writes, you should do something about that. In the end, your schema should reflect your business use case, but what good is that when it's awkward to access the data, when it takes joins across several tables to fetch the data you're interested in?

If you need to denormalize to improve read access, go for it, but be aware of the consequences. A schema is easy to build up, migrating on the go, but if document databases force you to do one thing, and one thing only, it's to think about how you're reading and writing your data. It's safe to say that you're not going to figure it all out upfront, but you're encouraged to put as much effort into it as you can. When you find out you're wrong down the line, you might be surprised to find that they make it even easier to change paths.

Do your Homework

Someone recently wrote a blog post on why he went back to MySQL from MongoDB, and one of his reasons was that MongoDB doesn't support transactions. While that's a silly complaint to raise only in hindsight, it makes one thing clear: you need to do the research yourself, no one's going to do it for you. If you don't want to live up to that, use the tools you're familiar with, no harm done.

It should be pretty clear up front what your business use case requires, and which tools may or may not support you in fulfilling those requirements. Not all tool providers are upfront about all the downsides, but hey, neither was MySQL. Read up, try, and learn. That's the only thing you can do, and no one will do it for you. Nothing has changed here; it's simply become more obvious, because you suddenly have a lot more options to work with.

Polyglot Data Storage

Which brings me to the most important part of all: document databases (and alternative, non-relational data stores in general) are not here to replace relational databases. They live alongside them, with both sides hopefully learning from each other. Your projects won't be about just one database anymore; it's not unlikely you'll end up using two or more, for different use cases.

Polyglot persistence is the future. If there's one thing I'm certain of, this is it. Don't let anyone fool you into thinking that their database is the only one you'll need; they all have their place. The hard part is figuring out what that place is. Again, that's up to you to find out. People ask me for particular use cases for non-relational databases, but honestly, there's no clear-cut line. Without knowing the tools, you'll never find out what the use cases are. Other people can give you ideas, or talk about how they're using the tools, but they can't draw the line for you.

Back to the Future

You shouldn't think of it as something totally new, document databases just don't hide these things from you. Lots of the things I mentioned here are things you should be doing anyway, no matter if you're using a relational or a non-relational data store. They should be common sense really. We're not trying to repeat what went wrong in history, we're learning from it.

If there's one thing you should do, it's to start playing with one of the new tools immediately. I shouldn't even be telling you this, since you should hone your craft all the time, and that includes playing the field and broadening your personal and professional horizon. Only then will you be able to judge what use case is a good fit for e.g. a document database. I'd highly suggest starting to play with e.g. CouchDB, MongoDB, Riak or Redis.

Call it NoSQL, call it post-relational, call it what you like, but it's hard to ignore that things are happening in the database world. A paradigm shift is not too far ahead, and it's a big one, and I for one am welcoming our post-relational overlords. Whatever you call them, CouchDB, MongoDB (although you really shouldn't call a database MongoDB), Cassandra, Redis, Tokyo Cabinet, etc. I'm well aware that they're not all the same, but they do try to fill similar gaps: making data storage easy as pie, and offering storage that fits the kind of evolving data we usually find on the web.

The web is slowly moving away from relational databases and with that, SQL. Let me just say it upfront: I hate SQL. I physically hate it. It doesn't fit with my way of thinking about problems, and it doesn't fit with the web. That's my opinion anyway, but I'm sure I'm not alone with it.

There is however one area where object-oriented databases failed, and where the new generation of document databases will face similar problems. You could argue that object-oriented databases are in some way a predecessor to modern post-relational databases: they made storing objects insanely easy, no matter how complex they were, and they made navigating through object trees even easier and insanely fast. That made them applicable to some problems, but they weren't flexible enough, in my opinion. Still, they laid some groundwork.


It mainly concerns The Enterprise and its giant collection of reporting tools. Everybody loves tools, and The Enterprise especially loves them. The more expensive, the better. Reporting tools are the basis for those awesome pie charts they just love to fill entire PowerPoint presentations with. They work on "standardized" interfaces and languages and therefore, with SQL.

I've worked on a project where we switched from an object-oriented to a relational database just because of that. Sure, there are proprietary query languages, and there's JQL when you're into JDO, EJB3 and the like. But they're nowhere near as powerful as SQL. They're also not as brain-twisting. That should be a good thing really, but there you have it.

NoSQL databases are facing a similar dilemma. Just like object-oriented databases, they're awesome for just dumping data into, more or less structured. It's easy to get the data out too, and it's usually easy to aggregate it in some way. Is the reporting gap a big deal? Of course not, at least not in my opinion. But if it is some sort of deal for you, what can you do to work around it?

  • Ignore it. Simple, isn't it? The reporting requirement can usually be solved in a different way. Sure, it can be more work, but usually reporting is less of a killer than some might think. Give the client some way to express a query and let him at it. Give him a spare instance of your replicated database and let him work off that data. The best thing you can do is pre-aggregate the data as much as possible, so there's less work for the client.

  • If you really need structured data in a relational database, consider replicating the data into one from your post-relational database of choice (see the sketch after this list). I can hear you say: that guy's crazy, that'd involve so much work keeping the two in sync! No, it wouldn't. Create a fresh dump every time you need a current dataset, and dump it into your SQL database. Simple as that.

  • Put an interface in front of the new database. Yes, it's insane, but I've done it, and it works. It doesn't have to be an SQL interface, just a common interface that works with one set of reporting tools. Yes, it's not ideal, but it's an option.

  • Don't ignore it, keep using a relational database. Yep, not all of us are lucky enough; someone still has to serve the market's demands. Legacy projects or clients force us to stick with the old and dusty model of storing and retrieving data. Quite a lot of people are happy with that, but I'm not.
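
To illustrate the second option, here's a minimal sketch of a throwaway dump from MongoDB into SQLite (collection and field names invented for the example). The point is that the relational copy is a disposable snapshot for reporting, not a synchronized replica:

    import sqlite3
    from pymongo import MongoClient

    mongo = MongoClient().shop
    sql = sqlite3.connect("reporting.db")

    # Recreate the table on every run: the dump is a fresh snapshot,
    # so there's no sync logic to maintain.
    sql.execute("DROP TABLE IF EXISTS orders")
    sql.execute("CREATE TABLE orders (id TEXT, customer_id INTEGER, total REAL)")

    for doc in mongo.orders.find():
        sql.execute("INSERT INTO orders VALUES (?, ?, ?)",
                    (str(doc["_id"]), doc.get("customer_id"), doc.get("total")))

    sql.commit()
    sql.close()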

I'm sure there are other options; these are just off the top of my head, and I can say that I've practiced all of them with more or less good results. I for one am sick of still having to use MySQL on new projects. I've had my fun with it, and sure, there's a whole bunch of patches that make it a bit more fun, but it's still MySQL. Yes, I am aware that there's PostgreSQL, but it's the same story. Old, old and old.

Should you still try to get a new-generation database into new projects? Yes, yes and yes, you definitely should. Consider yourself lucky if you succeed, because you're still an awesome early adopter. Even use SimpleDB if you must, but maybe reconsider before you really commit to it; it's not great. And don't lie to your clients, they should be aware of what they're getting into. It's no big deal, but the bigger they are, the more likely their administrators aren't yet familiar with the new tools. But the more people start using them now, the better they'll get before they hit the mainstream. Which they will eventually, rest assured. I'm ready, the web is ready, and the tools are ready. What about you?

Tags: nosql, databases