By now it should be obvious that I'm quite fond of alternative data stores (call them NoSQL if you must). I've given quite a few talks on the subject recently, and had the honor of being a guest on the (German) heise Developer Podcast on NoSQL.

There are some comments and questions that pop up every time alternative databases come up, especially from people deeply rooted in relational thinking. I've been there, I know it requires some rethinking, and I'm also quite aware that some of these ideas are controversial, basically the exact opposite of everything you learned in university.

I'd like to address a couple of those with some commentary and my personal experience (Disclaimer: my experience is not the universal truth, it's simply that: my experience, your mileage may vary). When I speak of things done in practice, I'm talking about how I witnessed things getting done in Real Life™, and how I've done them myself, both good and bad. I'm focussing on document databases, but in general everything below holds true for any other kind of non-relational database.

It's easy to say that all the nice features document databases offer are aimed at just one thing: scaling up. While that may or may not be true, it just doesn't matter for a lot of people. Scaling is awesome, and it's a problem everyone wants to solve, but in reality it's not the main issue, at least not for most people. Also, scaling isn't impossible even with MySQL. I've had my fun doing it, and it sure was an experience, but it can be done.

It's about getting stuff done. There's a lot more to alternative databases in general, and document databases in particular, that I like, not just the ability to scale up. They simply can make my life easier, if I let them. If I can gain productivity while still being aware of the potential risks and pitfalls, it's a big win in my book.

What you'll find, when you really think about it, is that everything below holds true no matter what database you're using. Depending on your use case, it can even apply to relational databases.

Relational Databases are all about the Data

Yes, they are. They are about fitting your data into a constrained schema, constrained in length, type, and whatever else you see fit. They're about building relationships between your data in a strongly coupled way; think foreign key constraints. Whenever you need to add a new kind of data, you need to migrate your schema. That's what they do. They're good at enforcing a set of ground rules on your data.

See where I'm going with this? Even though relational databases tried to be a perfect fit for data, they ended up being a pain once that data needed to evolve. If you haven't felt that pain yet, good for you. I certainly have. Tabular data sounds nice in theory, and is pretty easy to handle in Excel, but in practice, it causes some pain. A lot of that pain stemmed from people using MySQL, yes, but take that argument to the guy who wrote it and sold it to people as the nicest and simplest SQL database out there.

It's easy to get your data into a schema once, but it gets a lot harder to move both the schema and the data to a different schema later on. While data sticks around, the schema evolves constantly, and that's something relational databases aren't very good at supporting.

Relational Databases Enforce Data Consistency

They sure do, that's what they were built for. Constraints, foreign keys, all the magic tricks. Take Rails as a counter-example. It fostered the idea that all that stuff is supposed to be part of the application, not the database. Does it have trade-offs? Sure, but it's part of your application. In practice, that was correct, for the most part, although I can hear a thousand Postgres users scream. There's always an area that requires constraints on the database level, otherwise they wouldn't have been created in the first place.

But most web applications can live fine without them. They benefit from being free about their data, shaping it in whichever way they like and adding consistency on the application level. The consistency suddenly lies in your hands, a responsibility not everyone is comfortable with. You're forced to think more about edge cases. But you sure as hell don't have to live without consistent data, quite the opposite. The difference is that you're taking care of consistency yourself, in terms of your use case, instead of relying on a generic one-size-fits-all solution.

Relationships between data aren't always strict. They can be loosely linked, and then what's the point of enforcing consistency when you don't care whether a piece of data still exists? If you do care, you handle a missing piece gracefully in your application code.
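To make that concrete, here's a minimal Python sketch of what consistency on the application level can look like. The in-memory dictionaries stand in for document collections, and all names (`users`, `comments`, `author_name`, `validate_comment`) as well as the 2,000-character limit are hypothetical, purely for illustration. A comment refers to its author only loosely, by ID, so a deleted author is handled when reading instead of being prevented by a foreign key.

```python
from typing import Optional

# Hypothetical in-memory "collections"; a real document store would be queried instead.
users = {"u1": {"_id": "u1", "name": "Alice"}}
comments = [
    {"_id": "c1", "author_id": "u1", "body": "Nice post!"},
    {"_id": "c2", "author_id": "u2", "body": "First!"},  # u2 has since been deleted
]

def author_name(comment: dict) -> str:
    """Resolve the loosely linked author, falling back gracefully if it's gone."""
    author: Optional[dict] = users.get(comment.get("author_id"))
    return author["name"] if author else "[deleted user]"

def validate_comment(comment: dict) -> list[str]:
    """Application-level checks standing in for database constraints."""
    body = comment.get("body", "")
    errors = []
    if not body.strip():
        errors.append("body must not be empty")
    if len(body) > 2000:  # hypothetical limit
        errors.append("body too long")
    return errors

for c in comments:
    print(author_name(c), "-", c["body"])
```

The checks live next to the code that knows the use case, which is exactly the trade-off described above: more freedom, and more edge cases to test yourself.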

SQL is a Standard

The basics of SQL are similar, if not the same, but there are subtle differences. Why? Because under the hood, every relational database works differently. That's exactly what document databases acknowledge: every database is different, and putting a common language on top will only get you so far. If you want to get the best out of a database, you're going to specialize.

Thinking in Map/Reduce, as CouchDB and Riak force you to, is no piece of cake. It takes a while to get used to the ideas behind it and what they imply for you and your data. It's worth it either way, but sometimes SQL is simply a must, no question. Business reporting can be a big issue: if your company relies on standard reporting tools that expect SQL, you're out of luck.
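If Map/Reduce is new to you, the following plain Python simulation may help with the mental model. It is not CouchDB's or Riak's API (CouchDB views, for example, are usually written in JavaScript and call `emit()`); it only mimics the idea: a map function runs once per document and emits key/value pairs, and a reduce function aggregates all values that share a key. The document shape and function names are assumptions.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

def map_order_totals(doc: dict) -> Iterator[tuple[str, float]]:
    # Called once per document; may emit zero or more (key, value) pairs.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]

def reduce_sum(values: list[float]) -> float:
    # Aggregates all values emitted under the same key.
    return sum(values)

def run_view(docs: Iterable[dict],
             map_fn: Callable[[dict], Iterator[tuple[str, float]]],
             reduce_fn: Callable[[list[float]], float]) -> dict[str, float]:
    grouped: dict[str, list[float]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Results sorted by key, loosely mirroring how views are ordered by key.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}

docs = [
    {"type": "order", "customer": "alice", "total": 20.0},
    {"type": "order", "customer": "bob", "total": 5.5},
    {"type": "order", "customer": "alice", "total": 10.0},
    {"type": "user", "name": "alice"},  # not an order, so nothing is emitted
]

print(run_view(docs, map_order_totals, reduce_sum))  # {'alice': 30.0, 'bob': 5.5}
```

Note how the user document simply emits nothing. You don't query for "all orders"; you decide per document what, if anything, it contributes to the view.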

While standards are important, what matters in the end is what you need to do with your data. If a standard gets in your way, how is that helpful? Don't expect a standard query language for document databases any time soon. They all solve different types of problems in different ways, and they don't intend to hide that from you behind a standard query language. If, on the other hand, all you need is a dynamic query language for ad-hoc queries, check out MongoDB.

Normalized Data is a Myth

I learned a lot in uni about all the different kinds of normalization. It just sounded so nice in theory. Model your data upfront, then normalize the hell out of it, until it's as DRY as the desert.

So far so good. I noticed one thing in practice: Normalized data almost never worked out. Why? Because you need to duplicate data, even in e-commerce applications, an area that's traditionally mentioned as an example where relational databases are going strong.

Denormalizing data is simply a natural step. Going back to the e-commerce example, you need to store a lot of things separately when someone places an order: Shipping and billing address, payment data used, product price and taxes, and so on. Should you do it all over the place? Of course not, not even in a document database. Even they encourage storing similar data to a certain extent, and with some of them, it's simply a must. But you're free to make these decisions on your own. They're not implying you need to stop normalizing, it still makes sense, even in a document database.

Schemaless is not Schemaless

But there's one important thing denormalization is not about, something that's being brought up quite frequently and misunderstood easily. Denormalization doesn't mean you're not thinking about any kind of schema. While the word schemaless is brought up regularly, schemaless is simply not schemaless.

Of course you'll end up having documents of the same type, with a similar set of attributes. Some tools, MongoDB for instance, even encourage (if not force) you to store different types of documents in different collections. But here's the kicker: I deliberately used the word similar. Documents don't all need to look the same. One document can have a specific attribute, another doesn't. If it doesn't, just assume it's empty; it's that easy. If the attribute needs to be filled at some point, write the data lazily, so that your schema eventually becomes complete again. It evolves naturally, which does sound easy, but in practice it requires more logic in your application to catch these corner cases.

So instead of running migrations that add new tables and columns and then push your data around, you migrate the data on the next access; whether that's a read or a write is up to your particular use case. In the end you simply migrate data, not your schema. The schema will evolve eventually, but first and foremost, it's about the data, not the constraints it lives in. The funny thing: in larger projects, I ended up doing the same thing with a relational database. It's just easier to do, and gentler on the load than running a huge batch job on a production database.
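Here's a rough sketch of migrating data on access, in Python. It assumes each document carries a hypothetical `schema_version` field and that a plain dictionary stands in for the store; the two migration steps (splitting `name`, defaulting a `newsletter` flag) are made-up examples. Documents are upgraded step by step when read, and written back only if something changed.

```python
from typing import Callable

CURRENT_VERSION = 2

def _v0_to_v1(doc: dict) -> dict:
    # Hypothetical change: split a single "name" into first and last name.
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    return doc

def _v1_to_v2(doc: dict) -> dict:
    # Hypothetical change: new optional flag; missing simply means "off".
    doc.setdefault("newsletter", False)
    return doc

# Maps "from version" to the function that upgrades a document by one step.
MIGRATIONS: dict[int, Callable[[dict], dict]] = {0: _v0_to_v1, 1: _v1_to_v2}

def migrate(doc: dict) -> tuple[dict, bool]:
    """Bring a document up to CURRENT_VERSION; report whether it changed."""
    version = doc.get("schema_version", 0)
    changed = version < CURRENT_VERSION
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
    doc["schema_version"] = version
    return doc, changed

def load_user(store: dict, user_id: str) -> dict:
    """Read path: migrate lazily and write back only when needed."""
    doc, changed = migrate(dict(store[user_id]))
    if changed:
        store[user_id] = doc
    return doc
```

Old documents keep working until they're touched, and there's no big batch job. The price is that the migration functions have to stay around, and be tested, for as long as old documents might exist.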

No Joins, No Dice

No document database supports joins, simple as that. If you need joins, you have two options: use a database that supports joins, or adapt your documents so that they remove the need for joins.

Documents have one powerful advantage: it's easy to embed other documents. If there's data you'd usually fetch using a join, and it's suitable for embedding (and therefore, oftentimes, denormalizing), there's your second option. Going back to the e-commerce example: whereas in a relational database you'd need a lot of extra tables to keep that data around (unless you're serializing it into a single column), in a document database you just add it as embedded data to the order document. You have all the important data in one place, and you can fetch it in one go. Someone said that relational databases are a perfect fit for e-commerce. Funny, I've worked on a marketplace platform, and I've found that to be a ludicrous statement. I'd have benefited from a looser data store several times, joins be damned.
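A sketch of what such an embedded order document might look like, in Python. The field names, the `build_order` helper, and the tax handling are hypothetical; the point is that addresses, product names, and prices are copied into the order at checkout, so later changes to the catalog or the customer's profile don't alter historic orders, and a single read returns everything.

```python
from decimal import Decimal, ROUND_HALF_UP

def build_order(order_id: str, customer: dict, cart: dict[str, int],
                catalog: dict[str, dict], tax_rate: Decimal) -> dict:
    """Snapshot everything the order needs into a single document."""
    items = []
    for sku, quantity in cart.items():
        product = catalog[sku]
        items.append({
            "sku": sku,
            "name": product["name"],         # copied, not referenced
            "unit_price": product["price"],  # price at time of purchase
            "quantity": quantity,
        })
    subtotal = sum((i["unit_price"] * i["quantity"] for i in items), Decimal("0"))
    tax = (subtotal * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {
        "_id": order_id,
        "type": "order",
        "customer_id": customer["_id"],  # a loose link back, for lookups only
        "shipping_address": dict(customer["shipping_address"]),
        "billing_address": dict(customer["billing_address"]),
        "items": items,
        "subtotal": subtotal,
        "tax": tax,
        "total": subtotal + tax,
    }
```

Monetary values use `Decimal` here to avoid floating-point rounding; how you serialize them (as strings, as integer cents) depends on your database.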

It's not always viable, sure. If joins are an important criterion for your particular use case, it'd be foolish to stick with a document database. No dice; it's relational data storage or bust.

Of course there's a secret option number three: ignore the problem until it becomes a problem, go with a document database and see how it goes. Obviously that doesn't come without risks. It's worth noting, though, that Riak supports links between documents, and even fetching linked documents together with the parent in one request. In CouchDB, on the other hand, you can emit linked documents in views. You can't be fully selective about the document data you're interested in, but if all you want is to fetch linked documents, there are one or two ways to do that. Also, graph databases have made it their main focus to make traversing associated documents an incredibly cheap operation, something your relational database is pretty bad at.

Documents killed my Model

There's this myth that you just stop thinking about how to model your data with document databases or key-value storage. That myth is downright wrong. Just because you're using schemaless storage doesn't mean you stop thinking about your data, quite the opposite, you think even more about it, and in different ways, because you simply have more options to model and store it. Embedding documents is a nice luxury to have, but isn't always the right way to go, just like normalizing the crap out of a schema isn't always the way to go.

It's a matter of discipline, but so is relational modelling. You can make a mess of a document database just like you can make a mess of a relational database. When you migrate data on the fly in a document database, there's more responsibility in your hands, and it requires good care with regards to testing. The same is true for keeping track of data consistency. It's been moved from the database into your application's code. Is that a bad thing? No, it's a sign of the times. You're in charge of your data, it's not your database's task anymore to ensure it's correct and valid, it's yours. With great power comes great responsibility, but I sure like that fact about document databases. It's something I've been missing a lot when working with relational databases: The freedom to do whatever the heck I want with my data.

Read vs. Write Patterns

I just like including this simply because it always holds true, no matter what kind of database you're using. If you're not thinking about how you're going to access your data with both reads and writes, you should do something about that. In the end, your schema should reflect your business use case, but what good is that when it's awkward to access the data, when it takes joins across several tables to fetch the data you're interested in?

If you need to denormalize to improve read access, go for it, but be aware of the consequences. A schema is easy to build up, migrating on the go, but if document databases force you to do one thing, and one thing only, it's to think about how you're reading and writing your data. It's safe to say that you're not going to figure it all out upfront, but you're encouraged to put as much effort into it as you can. When you find out you're wrong down the line, you might be surprised to find that they make it even easier to change paths.
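As an illustration of those consequences, here's a hypothetical Python sketch where blog posts embed their author's name so a post can be rendered with a single read. The price shows up on writes: renaming the author means touching every post that embeds the old name. In a real document store that loop would be a multi-document update, which may not be atomic, so plan for partial failures. All names here are made up for illustration.

```python
def render_post(post: dict) -> str:
    # Read path: everything needed is in the post document, no second lookup.
    return f'{post["title"]} by {post["author"]["name"]}'

def rename_author(users: dict, posts: list, user_id: str, new_name: str) -> int:
    # Write path: update the source of truth, then every embedded copy.
    users[user_id]["name"] = new_name
    updated = 0
    for post in posts:
        if post["author"]["_id"] == user_id:
            post["author"]["name"] = new_name
            updated += 1
    return updated
```

If authors get renamed rarely but posts get read constantly, that's a good trade. If the embedded data changes all the time, referencing it is probably the better choice.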

Do your Homework

Someone recently wrote a blog post on why he went back from MongoDB to MySQL, and one of his reasons was that MongoDB doesn't support transactions. While that's a stupid argument to bring up in hindsight, it makes one thing clear: you need to do your own research; no one's going to do it for you. If you don't want to live up to that, use the tools you're familiar with, no harm done.

It should be pretty clear up front what your business use case requires, and what tools may or may not support you in fulfilling these requirements. Not all tool providers are upfront about all the downsides, but hey, neither was MySQL. Read up, try and learn. That's the only thing you can do, and no one will do it for you. Nothing has changed here; it's simply becoming more obvious, because you suddenly have a lot more options to work with.

Polyglot Data Storage

Which brings me to the most important part of them all: Document databases (and alternative, non-relational data stores in general) are not here to replace relational databases. They're living alongside them, with both sides hopefully learning a bit from each other. Your projects won't be about just one database any more; it's not unlikely you're going to end up using two or more, for different use cases.

Polyglot persistence is the future. If there's one thing I'm certain of, this is it. Don't let anyone fool you into thinking that their database is the only one you'll need, they all have their place. The hard part is to figure out what place that is. Again, that's up to you to find out. People ask me for particular use cases for non-relational databases, but honestly, there is no real distinction. Without knowing the tools, you'll never find out what the use cases are. Other people can just give you ideas, or talk about how they're using the tools, they can't draw the line for you.

Back to the Future

You shouldn't think of it as something totally new, document databases just don't hide these things from you. Lots of the things I mentioned here are things you should be doing anyway, no matter if you're using a relational or a non-relational data store. They should be common sense really. We're not trying to repeat what went wrong in history, we're learning from it.

If there's one thing you should do, it's to start playing with one of the new tools immediately. I shouldn't even be telling you this, since you should hone your craft all the time, and that includes playing the field and broadening your personal and professional horizon. Only then will you be able to judge what use case is a good fit for e.g. a document database. I'd highly suggest starting to play with e.g. CouchDB, MongoDB, Riak or Redis.

About eighteen months ago I wrote about going back to Vim as my daily text editor. It was a bust, and I went back to TextMate after about a week.

Suddenly it's the year 2010, and I'm typing this in Vim. What happened? My itch was re-scratched, if you will. I was weary of some of TextMate's perceived shortcomings, and honestly I missed having a command and insert mode. It may sound stupid, but I really prefer that way of working with text and code. TextMate is still a nice editor, but seeing its development come to a perceived halt made me realize that Vim is simply forever, developed not by just one guy, but by a community.

It's also worth mentioning that I simply started from scratch. Last time I built upon a configuration that grew over the years, and that included things about whose purpose I just had no idea. I watched the Smash Into Vim PeepCode too, and started with the clean slate configuration set that comes with it. If you're thinking of getting (back) into Vim, it's highly recommended and sure to whet your appetite. There's also a collection of screencasts and a free book on Vim 7 available on the interwebs. I have some useful links in my bookmark collection too.

There have been a lot of developments around Vim scripts that bring TextMate-like functionality, support things like Cucumber, smart quotes and auto-closing braces, or even offer the most awesome Git integration you'll find. But the nicest of them all is Pathogen, a script that lets you keep all your other scripts in separate places, so you don't lose track of what's installed where, and in which version.

Coming from TextMate, you're gonna miss the "Go To File" dialog, I'm sure. Check out Command-T, which does exactly that, only with path-matching sprinkled on top. It's not as fast unfortunately, but a lot faster to use than the annoying fuzzy thing I used the last time I tried to live on Vim. There's also PeepOpen, but it always opens files in new tabs, and that can get quite annoying, as new Vim tabs are quite different from Vim buffers. For project views I use NERDtree, though LustyExplorer also seems acceptable.

As I said, I started from scratch, with a clean slate. So the decent thing to do was to put all my Vim configuration files on GitHub. They include all the scripts I'm using, and my configuration, all neatly separated into different bundles thanks to Pathogen. There are a couple of things that are still a bit wonky. Lusty Juggler doesn't work as advertised all the time, though it's a neat tool, allowing you to quickly select one of the most recently opened buffers. RubyTest is quite weird, and I'm thinking of dumping it completely and simply rolling my own commands to run tests based on it. The rails.vim script package does include some support for running tests too, but not for executing a single test case.

In general, I haven't found anything that works in TextMate that you can't somehow get to work in Vim. Yes, I've used the word somehow. It's not easy as pie all of the time, and it can be different, heck it's a different editor. But I willingly accept that, because as a text editor, I find Vim to be a lot better than TextMate.

I've been back on Vim for a month now, and I'm not looking back at all. It's like coming back to an old friend and learning what awesome things he's been up to. It's pretty much as exciting as playing with new technologies at the moment. Learning new things can be pretty exciting, even if it's just another text editor. But it's not all fun and giggles. I have some annoyances still, but no editor is perfect. I'm more willing to accept Vim's for the increased text surgeon skills than TextMate's, to be frank. TextMate is still a nice editor, don't get me wrong, my heart just always belonged to Vim.

Honestly, I'm more willing to invest my learning time in an editor that I know I can use everywhere than one I can only use on the Mac with a running user interface. I'm using Vim on every server I'm managing, so why not on my local machine? Vim makes me think about how I can edit text in the most efficient way possible, and I like that very much. It even made me map my caps-lock key to control, finally!

Update: Was just tipped off that PeepOpen can be made to behave properly and open files in the current MacVim tab. When you set your MacVim options like in the picture below (notice the part "Open files from applications"), it works a treat. Thanks, Mutwin!

MacVim Options


I've attended my fair share of conferences this month alone, plus a Seedcamp, and I can safely say that I learned a lot about how to build slides, how to keep the audience engaged, and what one just shouldn't do in a talk or in slides. While I certainly don't claim to be an expert on the topic now, I wanted to put all my impressions and lessons learned into a post.

I'm definitely not the first person to write about this kind of stuff: a year ago Geoffrey Grosenbach wrote on presenting, and just recently John Nunemaker wrote a post on improving your presentations for less than $50. Both are well worth reading, but they don't cover everything I find annoying in presentations, so there you go.


Keep them small

Seven bullet points per slide is bullshit, that's way too much. One phrase per slide is a decent rule, though I'm not dogmatic about it. One phrase and a couple of short bullet points (not more than four) work from time to time, but not all the time. I usually go for a bigger slide set these days, with less content on each slide.

I can run through 80 slides in 45 minutes. I know that sounds like a lot, and I certainly go through them fast, but I'd rather give people something to think about than bore them to death. Slides with too much text on them also distract the audience. They shouldn't be reading the slide text, they should be listening to what you have to say. Even if you talk slowly, less text on slides is always a good idea. People should listen to you, not try to understand what your slides are saying.

What I usually do is crank out slides with any text that I'd like to say, and then go through them one or two times to refine and shorten the phrases to no more than four or five words for the most part. I also throw out slides when I realize they're disrupting the flow or contain things I'm likely to talk about when I'm on a different slide.

Use a large font

Just do it. Not only does it make your slides more readable for everyone in the audience, it forces you to keep the information on a single slide short. My headlines are usually 60pt, my subheadings and bullet points around 45pt. The bigger the better.

While we're talking about fonts, avoid italic. It's a lot harder to read, especially when you mix it with a regular font. If you need to emphasize something, just make it bold. Italic fonts disrupt your slides' flow.

Avoid full sentences

Except when you're quoting someone. Short phrases or even just a single word are much easier to grasp for the audience, and they give you a better sense of flow.

Dark text on a bright background

A dark background only works for Steve Jobs, because his team does everything they can to adjust the lighting on location for his talk. You, on the other hand, have to assume the worst. If there's just a little too much light coming into the room, your slides will be unreadable when you use a dark background. I've even seen slides where people chose a dark background and a font only slightly lighter than it.

You have no influence on the lighting in the room, and you'll pretty much just embarrass yourself when your slides are unreadable. There's just no excuse why you shouldn't just use a light background and a dark font.

Avoid dark photos

Photos are at a similar risk. The less contrast the photos in your preso have, the less likely people will be able to make them out. I tend not to use a lot of photos in my slides anyway, but I just hate having to say: "Geee, that's a bit hard to see, isn't it?"

Slides are for the people attending the talk

Your slide set should not be focussed on being fully understandable by people who have not attended your talk. You end up with so called slideuments, presentations that read like a document. You're talking for the people attending your talk, they probably paid to hear you speak, so focus your energy on giving them a good talk. If you want the rest of the world to know about details of your preso, write a blog post or put it into the presenter notes.

Video killed the conference star

I've seen video in presentations quite a few times, and honestly, it bores me to death, especially when there's a voiceover on the video. If you must include video, at least do the talking yourself, taking the audience through whatever happens on the screen, especially because you don't know how the audio is going to be at the venue. I'm well aware that live demos are a finicky thing, but so is video. You don't always have the luxury of using your own computer for the presentation.

Avoid long code snippets

Code is simply hard to grasp within just a couple of seconds, and it's awkward trying to explain larger chunks of it. Use short snippets instead. If you must include some longer examples, split them into smaller bits, explaining them one by one. I tend to avoid overly complex code snippets. Trying to explain them properly just takes too much time.

Avoid flashy animations

They simply take up valuable time and distract the audience. Even though they're nice to look at in theory, in practice they're the bane of a well-built presentation. This is true both for transitions between slides and for elements of a single slide appearing later. Just make them appear, don't make them sparkle or fade in.

The Talk

Practice, practice, practice

I find practicing a talk by speaking to myself awkward, not because it's embarrassing, but simply because, with the butterflies in my stomach, I always end up saying different things in the actual talk. That's not to say you shouldn't think about what you want to say. I tend to go through my slides several times, going through the things I associate with every single one of them, which gives me a rough idea and a line of thought on what I want to say. This definitely is a lot easier to do when it's a topic you've talked about before, but in general it has worked much better for me.

Drink, drink, drink

It's a simple fact that talking a lot makes your mouth run dry. I need about half a liter of water to get through a talk, or at least I make sure I have that amount ready. Before you run dry and faint in the middle of your talk, drink. It's not a shameful thing to do, it simply keeps you going. Shame on conference organizers who don't think about having drinks ready for their speakers. When in doubt, scout the talks before yours and make sure you have a bottle ready in case it's not taken care of.

Look at the audience, not the big screen

It should be so obvious, yet I've just seen people do it again at Cloud Expo. One speaker had 14 bullet points on a slide, and the font was probably too small for him to read it from the laptop screen. That's another reason why I keep my slides short: their purpose is to keep me in a flow, giving me short reminders of what I want to talk about.

Don't read your presenter notes

If you need presenter notes to run your talk, you need to practice more. They're surely useful for people just looking at your slides, but if it takes full sentences to keep your talk running, you'll end up wasting a lot of time trying to read what your notes say. Talking freely is a challenge, but the earlier you take it on, the faster you'll get used to it. I've seen people use index cards with their presenter notes on them, handwritten, trying to decipher what they've written on them.

If you know what you're talking about (at least the slightest bit), you'll be fine without them, trust me.

Two's not a company

Having more than one speaker is awkward, especially when one of them is just standing there most of the time, waiting for his turn. Have one person up front at any one time, and bring in the next person when it's his turn. Simple as that.

Don't ask questions

The audience simply won't answer. If you ask anything, make the audience raise their hands on a topic, but don't expect anyone to answer a specific question. That's your task. Involving the audience sounds like a good idea, but they're lazy, they want to learn something.

Jokes, tiny bits and stories

Stories and jokes can really lighten up a presentation. Sure, you shouldn't tell jokes all the time, but something sarcastic thrown in from time to time sure can help to wake up the audience. Stories are even better, people love benefitting from real life experiences in any way. If it has a happy ending, even better.

Talking slowly is for wimps

The rule of spending two minutes on a slide is bullshit. It would only mean you'd have seven bullet points on a particular slide. You shouldn't rush through anything, and I certainly try to avoid doing that, and it definitely depends on the topic you're talking about, but when I talk about technical things I expect the audience to be curious about it and try to keep up. If they can't, they can always come back to my slides or ask questions. But as always, it depends.

Talking fast is for the impatient

If it's about more generic, higher-level topics, or some sort of longer-running workshop, it's only appropriate to walk people through it and take your time doing so. Usually in these situations it's a lot easier to focus on a single topic. It just depends on how broad your talk's topic is.

Take tiny breaks

Should you realize you're sort of losing track, simply bring yourself back on the rails. Take a tiny break or just stop talking. You don't need to apologize for that. It's easy to start blabbering on about a topic you didn't even intend to cover in your talk. On the other hand, that's what makes every talk unique, and it's exactly why shorter phrases on slides are so much better. They keep your brain engaged, making associations as you go, and they help keep a talk interesting.

Avoid longer breaks though as people end up being bored, and you're losing precious time. Longer breaks are usually a sign that you're not as prepared as you should be. If you need to switch in between e.g. slides and a live demo, make sure that everything is prepared before the talk.

Talking in front of others is a challenge, no doubt about it, but there's really no point trying to avoid it, because the only way to improve your skills is to simply talk in front of people. This is my view of the talking world. I constantly try to improve my slides and think about what I'm doing wrong during talks so I can improve on that. I'll never lose the excitement right before a talk, and that's a good thing. When it becomes routine, you tend to bore people instead of engaging them. It's about constantly improving yourself to simply become better at talking in front of others.

This is my view of giving presentations. Feel free to throw in your ideas, or even to disagree. These guidelines probably aren't for everyone, and they might even change for me within just a couple of months, but most of them simply make sense to me. I do need to get me a good remote though, since with my larger slide sets, I find myself hitting the space bar a lot.