It's been slightly more than six months since I released the first version of the Riak Handbook. It's been an incredible ride so far, and it's about time I wrote about how things went from the perspective of publishing, marketing and selling this book all on my own. For this I draw inspiration from Jarrod Drysdale's post on his book Bootstrapping Design and Jesse Storimer's post on the sales of his book Working With Unix Processes. Both books are awesome, by the way, and well worth checking out.

Introduction

Originally I wanted to write a book on NoSQL. I called it "NoSQL Handbook", and I wanted it to cover the better-known NoSQL databases, including Riak, Redis, Cassandra, CouchDB and MongoDB. That was in early 2011. I wrote a chapter on MongoDB, but it didn't feel right to me, so I tried a different approach on the Riak chapter, much more focused on being practical and introducing the user-facing concepts in a streamlined way, building up as the book progresses.

That approach felt much more natural to me, so eventually I started on the Redis chapter and thought I nailed it. I continued working on the Riak chapter afterwards in a similar style. Focusing on practical and interesting usage made the material much more approachable to me, so I figured it might do the same for others. That's not exactly how you build a product, though; I was lucky that so many customers agreed with this approach to writing the book.

The Riak chapter grew and grew; eventually it clocked in at 120 pages and kept growing. My list of things to write about was long, and it still is. So I turned to Amy Hoy's 30x500 class, which I was part of last winter, and brought up the idea of first creating a product focused on a single database and following up with the bigger book later on.

I started to like the idea, and was encouraged by Amy to try this route. Six months later, going down that route was a success, though there was no guarantee at all it would end up being one.

Let's look at the numbers.

Numbers, Spikes, Cause and Effect

It's weird talking about this, because it involves sales numbers, money and being very open about the selling part of the book. It feels weird because not a lot of people selling products do it. It feels weird because book authors are usually not very open about the royalties they get. I have the most respect for anyone writing a book, make no mistake. But I do want to encourage people to go down the self-publishing route, to build products, to write books, sell and market them.

Since the book went on sale on December 15th 2011, it has sold 746 copies to this very day, grossing a total of $22,007.67. 8.9% of that goes to my fulfillment provider, leaving a profit of $20,048.99. The total splits into 7 site licenses and 739 single licenses.

It's hard to express how grateful I am. The number of sales has exceeded my wildest expectations. I thought that if I got to $2,000 I would chalk the book up as a success. I was insanely happy when I reached 2k, and every step afterwards I could hardly believe, eventually reaching the unbelievable milestone of $10,000.

But on the other hand, maybe the sales numbers aren't entirely a coincidence. Here's a graph representing sales in the last six months.

[Graph: Sales over the last six months]

There's a huge and a smaller spike at the beginning and several smaller spikes along the way. Even though I can't put my finger on all of the smaller spikes (something which I must work on for sure), there are some that can be explained in terms of marketing. Let's put some context on each and see what happened.

[Graph: Sales, with context on each spike]

The first spike is the first day of publishing the book. I ended up selling 89 copies on the first day, 52 on the second. The book got posted on Hacker News, and so did an article on building activity feeds with Riak that I published the same day.

The Monday after the release I sent out a newsletter to everyone who signed up for the NoSQL Handbook, some 1400 people. That's the second spike, and it sold some 70 books in total, looking at the first three days of the newsletter going out. Some sales kept trickling in from that newsletter over the following days too.

The next spike is in early February. If you're into NoSQL you'll remember Amazon releasing their new cloud database offering DynamoDB to the public in January. I wrote a long post about it, discussing its features and how I thought (and still think) it differs from Dynamo, Amazon's original distributed database whose ideas Riak was built on. Interestingly, that article brought quite a few sales of the book.

It's curious because the article is only slightly related to the contents of the book, but it still builds up enough interest for people to dig deeper into the roots of both Riak and DynamoDB. The book covers a good amount of ground in terms of where Riak comes from and the theories and ideas behind it. In total that article got somewhere around 20,000 views in just a few days.

The last spike is the 1.1 update that I released in late May. I put the book on sale for a week, with a 25% discount and sold quite a few copies in that time.

Marketing

The interesting bit about marketing started not too long before I published the book. Following Amy's suggestion I worked my way up to the launch by publishing three blog posts around Riak. That was also the first time I publicly mentioned my intention of publishing a book on Riak, the "Riak Handbook".

It's hard to put numbers on that kind of marketing, but it certainly helps a lot to build up excitement and interest for an upcoming product.

The really hard part about it all was writing the copy for the website. Bringing out the value of the book for someone interested in Riak is hard, really hard. I think I did okay for my first ever try writing what you'd call marketing copy, but only the people who did or didn't buy the book can be the judge of that.

Picking up the spirit of Seth Godin, I'd certainly do more variations on copy and website and do more in-depth testing on a future product instead of leaving it the way it is for too long. I'd also experiment with different ways of highlighting the existence of the site license.

The same is true for the newsletter. Writing an email to 1400 people, people that'd see my copy and that'd read my email, that was scary. It took me a full day to write the text, lay out the email, re-read it multiple times and finally hit that send button.

It was really, really scary. It felt like being exposed to an unknown public with my pants off. But then again, these people showed interest in what I'm working on by signing up. That's what eventually pushed me over the edge to hit send.

The newsletter helped a lot, and so did the initial blog posts and other blog posts following over time. Regularly writing and publishing something related to the book and to Riak, and being helpful on IRC and on the Riak mailing list, made a difference too. It got people curious about the book. That's the first step. I need to work on the second step, taking the customer from there.

As a last thought, the best kind of marketing you can get is word of mouth. Great, sincere and personal customer support goes a long way. If someone has a problem, you help them fix it, you help them get along, even if you're not to blame. People will remember that and tell others about it. The same is true for offering sincere apologies to people who are annoyed for valid reasons. With the right kind of support you can turn them into loyal customers. An apology goes a long way.

The Price

When I worked towards a beta of the NoSQL Handbook, I figured I'd start selling it for $12 and the final book for double that amount. After going through Amy's process and reading some of her excellent blog posts on pricing I reconsidered. I bumped up the price for the Riak Handbook alone to $19. After talking to some of my reviewers (who are awesome, thank you guys so much!) I raised it to $25. Giving it some more thought I settled on $29.

Let's look at how that worked out. To make the same amount at $19, I'd have to sell 1158 copies. That's more than 50% more sales than I actually made, which feels unlikely to me. A cheaper price doesn't automatically mean more books sold. The value of the book to the customer, to people genuinely interested in learning about Riak, is what really matters. The price follows the value, not what feels right.

Looking back, I had only a handful of complaints about the price. Given that the book just got a huge update, and that it's going to get more over the coming months, I'm still pondering the price in the longer term.

The Rest

As I said before, it has been an amazing experience. Selling something you've built yourself gives you a totally new kind of insight into building a business around products, into developing customer relationships, into providing value to the people to whom it matters the most.

You appreciate every single customer. You get to build a personal relationship with every one of them, should you choose to and should they accept your offer. Compare that with a traditional publisher, where you just get paid by the publisher, and you'll never know who your readers are unless they write a review about your book somewhere.

As a self-publisher, you also get to redefine the process of publishing. You don't have to just ship a finished book and leave it at that like most traditional publishers do. For example, the book's recent update clocked in at more than 40 pages of new content, which existing customers got for free.

The update also included the entire book as a man page. The idea still cracks me up, but just think how useful that is. Customers can search the book quickly and never have to leave the command line they spend so much time in.

Or you can ditch the book format entirely like Steve Klabnik did with his Designing Hypermedia APIs. You get to decide how you want to publish your product, and you get to validate if it's the right approach or not.

I'm still working on pushing the book forward, adding more content, updating examples and text for new Riak releases, improving things as I go along. And yes, that means there are more free updates in the future. You should check out the book if you like that idea :)

I've been on vacation in France for most of June, and that means lots of time to read. Originally I planned on reading more on distributed systems, but I had a decent backlog of books on my Kindle, so this was just the right time to plow through them. By the way, if you don't have a Kindle yet, you should get one. It's a great little device. I've been reading so much more since I got it. Anyhoo, here's the list of books I've been reading in June.

Java Concurrency in Practice by Brian Goetz. This is a classic on programming for concurrency in Java. While all the code examples are Java, they're easy to understand and should be readily applicable to your programming language of choice, provided, of course, that there are libraries offering similar data structures.

The book goes to great lengths discussing what's wrong with just using threads and synchronizing access to data, and how newer concurrent APIs in Java can help you avoid the hassle. It covers a mind-boggling number of details and data structures: concurrent collections, designing thread-safe code, latches, barriers, queues, atomic data types, locks, semaphores, deadlocks, thread liveness, execution pools and so on. The part that really surprised me was the insight into the JVM's memory model, and why you need to protect data structures when they're shared across threads and multiple cores and processors. A must-read when it comes to programming for concurrency, and not just on the JVM. This book is a true gem.

Designing With Data by Brian Suda. A great, short introduction to visualizing data. The book is for everyone new to the area of graphing and exploring data. Don't expect a thorough introduction to statistics and everything around the numbers. The book focuses more on introducing the reader to the different types of graphs, why and when they work, and also why some of them don't.

Scalable Internet Architectures by Theo Schlossnagle. This book was written in 2007 and was way ahead of its time. Never mind the examples being mostly in Perl, this book covers all the little details of what it takes to build scalable web applications. Heck, it even shows you how you can build your own cross-vendor database replicator. A highly recommended read. It's right up there with Release It! by Michael Nygard, which you should read too.

Small Is The New Big by Seth Godin. I gotta admit, I hadn't read anything by Seth before, but this was a great start. It's a collection of 183 posts from his blog, carefully selected to represent little stories on why big companies fail and how small companies can succeed. It's a great read, and I'm amazed how well Seth can take small examples, like chucking a large pile of jewel cases, and extrapolate them into a big picture to exemplify why the music industry is doomed. Looking forward to reading more of his books.

Drive: The Surprising Truth About What Motivates Us by Daniel Pink. The title says it all, the book explores, through scientific (but not at all boring) analysis, why money is not our sole motivator. We have an inner drive to expand our personal horizons, to master what we do every day and to work towards a purpose bigger than ourselves. Tom Preston-Werner (of GitHub) recommended the book at a conference, and you can see how it reflects the work culture at GitHub. Fits in very well with the aforementioned book.

Programming Concurrency on the JVM by Venkat Subramaniam. This book picks up where "Java Concurrency in Practice" left off. To recap things in terms of more traditional synchronization and concurrency APIs, it builds on several simple examples that are constantly rebuilt with new tools as the book progresses. The interesting bits are the parts that cover software transactional memory and actors, both mostly focusing on Akka.

As the title suggests this book is very code-heavy, which sometimes, at least on the Kindle, makes it a bit hard to read. It takes you through all the details of using STM and actors, both in Java and Scala, but also with examples in Groovy, JRuby and Clojure. This is pretty neat, because you pick up some new things along the way. I'd wish for some more depth here and there, but I feel much better informed on STM and actors after reading it.

Knack by Norm Brodsky and Bo Burlingham. A book focused on founding, running and growing a business, this one is full of stories from Norm Brodsky's experiences with his own businesses, which began as start-ups and grew into bigger, yet still customer-focused and in their own way still small, companies.

Added to the mix are stories from people and companies Norm has advised over the years. You don't have to believe or take for granted everything he has to say and recommends doing or not doing, but this one is a great read either way, very much so because it is full of stories. If you read "Drive" and "Small Is The New Big", you'll find similar patterns occurring in all of them.

As days go by this book keeps coming back to me. Lots of little details that I want to apply to my own business practices. The more I think about it the more I think you should read this book.

Clojure Programming by Chas Emerick, Brian Carper and Christophe Grand. Clojure pushes all the right buttons for me as a language, and this book so far has helped me grasp more and more of it. While some of the code examples aren't very practical, and here and there new concepts are introduced without being discussed, the book is still a great introduction to the language. I just wish it weren't more than 600 pages, but still, lots of content to plow through.

Pricing with Confidence by Reed Holden. I came across this book by way of Amy Hoy's blog posts on pricing. The book deserved an emergency spot on my reading list because it's very relevant for the product I'm currently working on. The book's focus is on basing the price of a product on its value to the customer. Granted, I just started reading it, but so far it reads well and the points make a lot of sense. If you're looking to dive deeper into pricing your products, there's also Don't Just Roll The Dice, whose PDF version is available as a free download.

Now go read!

Tags: reading, books

I recently spent some quality time with CRDTs, which is short for commutative replicated data types. I got curious about them while working on the Riak Handbook, and I gave a talk about designing data structures for Riak the other week at NoSQL matters; the slides are available too.

What are commutative replicated data types? It's a fancy term for describing data structures suitable for eventually consistent systems. You know what's an eventually consistent system? Riak!

When working with Riak, your data needs to be designed in a way that allows coping with its eventually consistent nature. This poses a problem for data types like counters, sets, graphs, essentially all data structures that require operations to be executed in a monotonic fashion.

For instance, with a counter, you don't want to lose a single increment when multiple clients add values. But due to Riak's eventually consistent nature you can't guarantee monotonic order of updates. Instead you need to make sure you can restore a logical order of operations at any time given any number of conflicting writes.

When multiple clients update the same object concurrently they cause siblings: multiple versions of the object, potentially with different values. If every sibling has a different value for the counter, how do you make sure you can restore order and therefore the final value? Let's look at a worst-case scenario of a data structure that won't work well in this case. Two clients see an object already stored in Riak representing a counter, currently having the value 1.

{
  "value": 1
}

Two clients now want to update the counter, incrementing its value by 1. They both store the updated data back to Riak, causing a conflict. Now you have two siblings, both having the value 2. You also have the original sibling around as referenced by both clients when they wrote their data.

{
  "value": 2
}

It's unlikely you'll be able to restore the correct total from both values, because you don't know what the previous value was for each client. You can assume the value was 1, but what if a client incremented by 2? In an eventually consistent system it's hard to say how much has changed since the last time you saw the data, unless you specifically keep track of what has changed.

Commutative replicated data types are data structures designed to help you here. Let's look at an alternative for a counter. What if, instead of keeping a single value, every client keeps its own value and only updates that number instead of the total value?

We can assume that updates of a single client will happen in a monotonic fashion. There shouldn't be more than one client with the same identifier in the system.

Here's an example of our updated data structure:

{
  "client-1": 2,
  "client-2": 2,
  "client-3": 3
}

When a client wants to update a value, it only updates its own entry. It's a contract between all clients to never touch any other client's data, other than to merge it back together. When a client finds an object with siblings it can merge them simply by picking the highest value for every client. Part of the contract is also that a client must merge the data when it finds an object with siblings.

To get the total value for the counter, just calculate the sum of all values, et voila! This surprisingly simple data structure is called a G-counter.

Let's look at some code. I'm assuming your bucket has support for siblings enabled.
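If siblings aren't turned on for the bucket yet, here's a minimal sketch of how to enable them, assuming the Ruby client and a bucket named g-counters:

require 'riak'

riak = Riak::Client.new
# Tell Riak to keep conflicting writes around as siblings
# instead of silently picking a winner.
riak.bucket('g-counters').allow_mult = true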

The bits to generate a counter value are straight-forward. You just have to make sure to assign unique but stable client identifiers to your client objects, so that every client keeps using the same identifier across sessions. Here we're using the Ruby client.

require 'riak'

# Every client gets a unique but stable identifier.
riak = Riak::Client.new(client_id: 'client-1')
counter = riak.bucket('g-counters').get_or_new('counter-1')

# Initialize the data structure and this client's entry if necessary,
# then increment only this client's value.
counter.data ||= {}
counter.data[riak.client_id] ||= 0
counter.data[riak.client_id] += 1
counter.store

After fetching the object we initialize the data structure, assigning it a default if necessary, and increment the counter for our client. This code can nicely be hidden in a library function somewhere. The interesting bit is merging the data structures back together should the client find siblings. The Ruby client has a convenient way to specify callbacks that are run when more than one object is returned.

We're writing code that iterates over all siblings, picking the highest value for every client along the way.

Riak::RObject.on_conflict do |robject|
  # Only handle objects from the g-counters bucket.
  next nil if robject.bucket.name != 'g-counters'
  data = robject.siblings.each_with_object({}) do |sibling, merged|
    (sibling.data || {}).each do |client_id, value|
      # Pick the highest value seen for every client.
      if (merged[client_id] || 0) < value
        merged[client_id] = value
      end
    end
  end
  robject.data = data
  robject
end

The next time you fetch the data and the Ruby client detects a conflict, the callback will be run, merging the data back together into a single data structure.
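As a quick sketch of what that looks like in practice (bucket and key names as above), fetching the counter again after concurrent writes hands you the merged structure:

counter = riak.bucket('g-counters').get_or_new('counter-1')
# If siblings were found, the on_conflict callback above has already
# merged them, so data holds one entry per client.
counter.data
# => {"client-1"=>2, "client-2"=>2, "client-3"=>3}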

I'll leave the code to calculate the sum of all values as an exercise to the reader.

All this assumes that you're enforcing stronger consistency on your data. You need to make sure that R + W > N, because even when a client only updates its own values, it has little control over where its data is written. When you don't make sure that consistency is enforced, you can run into situations where a client comes across two siblings caused by its own updates. This can happen when a primary replica fails, a secondary replica takes its place, and the client uses only a small read quorum. These scenarios deserve their own write-up.
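To give an idea of how to enforce that with the Ruby client, here's a hedged sketch; the quorum values are illustrative, pick them so that R + W > N holds for your cluster:

# With the default N of 3, R = 2 and W = 2 satisfy R + W > N.
counter = riak.bucket('g-counters').get_or_new('counter-1', r: 2)
counter.data ||= {}
counter.data[riak.client_id] ||= 0
counter.data[riak.client_id] += 1
counter.store(w: 2)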

If you want to know more about commutative replicated data types, I highly suggest reading the relevant paper on them. It's almost fifty pages long and took me several reads to get a good grasp of it, but it's totally worth it. There are implementations of CRDTs available too, specifically statebox for Erlang, knockbox for Clojure and a sample implementation in Ruby. The latter comes with a handy README that shows examples for the specific data types. None of them are specific to Riak, but they can all be used with it. Also fresh from the world of academic papers is this one by Neil Conway et al. on lattices in distributed computing by way of Bloom, a language for disorderly distributed computing.

There are some other caveats with CRDTs and Riak but we'll look at them in more detail in another installment of this series, in particular regarding consistency and garbage collection. There's a lot to be said about CRDTs and there's a lot of brain matter to be spent on trying to understand them. The next update for Riak Handbook might even include a section on them. The topic is certainly fascinating enough to warrant one, as it addresses the issues people commonly encounter when designing data structures for eventual consistency.

Tags: riak, crdt