Here's a list of things I've been reading lately or that I'm about to read, and that I found to be worth sharing. If you're looking for something to read over the holidays, I'm happy to give you some suggestions. Books, papers, articles, and videos, something for everyone.

Scalability Rules

A list of 50 rules related to scalability, in an easy-to-read recipe style. They leave some stuff to the imagination, and I don't agree with every single rule, especially not with the one that demands software should always be easy to roll back, but they give you good food for thought for your own applications.

Time, Clocks, and the Ordering of Events in a Distributed System

The earliest paper (1978!) to mention the notion of clocks as a means to track ordering of events in distributed systems, the predecessor to vector clocks, if you will. A must read.
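To make the idea a bit more tangible, here's a minimal sketch of a logical clock in the spirit of the paper. The class and method names are my own and not taken from the paper: every local event bumps a counter, and a received message pushes the receiver's counter past the timestamp it carries.

```python
class LamportClock:
    """A logical clock: a counter that only ever moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event (sending a message counts as one) advances the clock.
        self.time += 1
        return self.time

    def receive(self, sent_time):
        # On receipt, jump past both our own time and the sender's timestamp,
        # so the receive event is ordered after the send event.
        self.time = max(self.time, sent_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()             # a does some local work
stamp = a.tick()     # a sends a message carrying this timestamp
b.receive(stamp)     # b's clock now sits after the send
```

The timestamps only guarantee that a cause gets a smaller number than its effect. Two unrelated events can still end up with any relative order, which is the gap vector clocks later filled.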

Harvest, Yield, and Scalable Tolerant Systems

A recap of CAP, making the whole notion of it a bit more flexible by adding tuning knobs for graceful degradation. Hat tip to Coda Hale and his article "You can't sacrifice partition tolerance" for pointing me to this.

Problems with CAP, and Yahoo's little known NoSQL system

Also related to CAP, this article introduces the notion of PACELC, which basically adds latency to the CAP equation. CAP has been criticized quite a few times for being too strict in this regard, and while the name PACELC is a bit odd, the added notion of latency makes a lot of sense.

Replication and the latency-consistency tradeoff

Another one from Daniel Abadi, another one related to CAP, this time talking about replication, consistency, and latency.

It's the latency, stupid!

Going further back in time, this paper talks about latency in all its glory. Sure, it talks about modem speed connections, but extrapolate that into today's network bandwidth and you still have latency. Or you can read the next posts too.

It's Still The Latency, Stupid...pt. 1 and It's Still The Latency, Stupid...pt. 2

A more recent update on latency, because it still matters more than bandwidth.
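A quick back-of-the-envelope calculation shows why. This is only a rough model with made-up numbers, assuming a fetch costs one round trip plus the time it takes to push the bytes through the pipe; it ignores things like TCP slow start.

```python
def transfer_time(size_bytes, bandwidth_bps, rtt_seconds, round_trips=1):
    """Rough time to fetch a payload: round trips plus time on the wire."""
    return round_trips * rtt_seconds + (size_bytes * 8) / bandwidth_bps


payload = 10_000  # a 10 kB response

baseline = transfer_time(payload, 10_000_000, 0.100)         # 10 Mbit/s, 100 ms RTT
more_bandwidth = transfer_time(payload, 100_000_000, 0.100)  # 10x the bandwidth
less_latency = transfer_time(payload, 10_000_000, 0.010)     # a tenth of the latency

print(f"baseline:       {baseline * 1000:.1f} ms")
print(f"more bandwidth: {more_bandwidth * 1000:.1f} ms")
print(f"less latency:   {less_latency * 1000:.1f} ms")
```

With these numbers, ten times the bandwidth takes the fetch from roughly 108 ms to roughly 101 ms, while a tenth of the latency takes it down to roughly 18 ms. For small payloads, the round trip dominates.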

Crash-only Software

I've been pondering fault-tolerant and cloud-ready systems for a while now, and here's one related to the topic: software that crashes as a means to make it more fault-tolerant.

Systems that Never Stop (video)

Great talk by Joe Armstrong, inventor of Erlang, laying down six laws for fault-tolerant systems. All laws lead to Erlang obviously, but it all makes a lot of sense.

Why Do Computers Stop and What Can Be Done About It?

Related to Joe's talk, this paper discusses hardware redundancy and reliable storage by means of process pairs, modularity and transactions. I have yet to read this one, but it's going to be interesting to think about how these ideas, stemming from hardware, apply to software and have been implemented by Erlang.

Working With Unix Processes

A little indie-published ebook on handling Unix processes. Code is focused on Ruby, but most if not all of the book is easily applicable to any other language or a basic Unix environment.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

SEDA was an idea for web and application server concurrency based on using queues to condition and handle requests. While the idea has not exactly made it through, I found the model to be strikingly similar to the actor model, in a different way, but still very similar.
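Here's a toy sketch of what stages connected by queues can look like. It is not the paper's implementation: the `stage` helper and the two handlers are made up for illustration, each stage gets a single thread, and a bounded queue that blocks the producer is about the crudest form of conditioning there is.

```python
import queue
import threading


def stage(handler, inbox, outbox):
    """Run one stage: pull events from inbox, handle them, push to outbox."""
    def loop():
        while True:
            event = inbox.get()
            if event is None:        # sentinel: pass it on and shut down
                outbox.put(None)
                return
            outbox.put(handler(event))

    thread = threading.Thread(target=loop)
    thread.start()
    return thread


# Bounded queues between stages: when one fills up, the producer has to wait.
parse_q = queue.Queue(maxsize=8)
render_q = queue.Queue(maxsize=8)
done_q = queue.Queue()

threads = [
    stage(str.strip, parse_q, render_q),   # stage 1: clean up the request
    stage(str.upper, render_q, done_q),    # stage 2: "render" a response
]

for request in ["  get /index ", " get /about  "]:
    parse_q.put(request)
parse_q.put(None)

for thread in threads:
    thread.join()

results = []
while (item := done_q.get()) is not None:
    results.append(item)

print(results)
```

Squint a little and each stage looks like an actor with a mailbox, which is the similarity I mean.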

A Retrospective on SEDA

SEDA, ten years later, by the author of the original paper. I gotta say, he talks a lot about what they got wrong, but I for one think SEDA had a pretty big impact on the bigger picture of web application architecture. Probably something worth discussing in a separate post.

Why Events Are A Bad Idea

A paper comparing threads and events for highly concurrent servers. I'd recommend taking this with a grain of salt. A lot has changed since this paper was written, but what I like about reading papers like this is that it gives you a historic perspective, same for SEDA.

Understanding Virtual Memory

Nice summary of how virtual memory works on Linux.

The Declarative Imperative: Experiences and Conjectures in Distributed Logic

To be honest, this is a slightly confusing paper. It starts out modeling things in an obscure language called Datalog, but then dives into making some conjectures about distributed logic, which was to me the more interesting part.

A brief history of Consensus, 2PC and Transaction Commit

This article is full of gold. An extraordinarily compact view on the topic, but with an abundance of links to papers to dive deeper.

Going to keep posting reading lists like this in the future. So much good stuff to read out there. Lots of great knowledge collected in papers.

Last but not least, why not add the Riak Handbook to your reading list as well?

Tags: reading

Been a while since the last reading list (here's a handy link, in case you're looking for more to read). Time to remedy that. Disclaimer: All links below are Amazon affiliate links. You'll be feeding my reading habit. Thank you in advance!

Pricing With Confidence by Reed Holden

I know I already mentioned this on the previous list, but it's just so good. A must read for pricing products or even your time as a freelancer. Must. Read.

Poke the Box by Seth Godin

A nice and short manifesto about starting (and finishing) things. If you don't finish, technically you never really started, right? Pretty delightful read and a nice kick in the pants about starting something, anything, about making things happen. Because if you don't, who else is there?

Fool's Gold by Gillian Tett

An excellent rundown of how the 2008 financial crisis came about and where derivatives and collateralized debt obligations came from. The interesting bit is that they were created with good intentions originally, but as with a lot of things, the short-sightedness and greed of investors and banks turned them into a mind-boggling web that was bound to end up as a cataclysmic and cascading failure across the entire financial system.

Start Small, Stay Small by Rob Walling

If you're interested in running a small business, built around profitable products, marketing and building them yourself, this is a great little introduction on everything you need to know. I got quite a few ideas from this book for my next ventures.

After you're done with it, and you want to keep going, Amy Hoy's 30x500 class is highly recommended.

Architecture of Open Source Applications Vol. 2

The second edition of this great compilation is upon us, and it's great. I loved the chapter on ZeroMQ in particular, but there's still a lot I need to read, e.g. the chapter on nginx or the one on PyPy.

How To Win Friends and Influence People by Dale Carnegie

This book is now 80 years old yet its content is pretty much timeless. The title might be a bit misleading about what it's really about. If you're interested in improving your people skills, how to make people want something you have to offer and how you can turn them over to your side, this book is for you. If you're running a business of any kind, this is a must read. The single most revealing book I've read in a while.

It turns out, people and how we interact have barely changed at all. Still so much to learn.

Predictably Irrational by Dan Ariely

A delightful and pretty revealing book about how irrational yet predictable human behaviour is. Driven by scientific experiments, this book is also rather revealing when it comes to marketing products, for example. I'd call this another must-read if you run a business of sorts or sell something for a living.

It Will Be Exhilarating by Dan Provost

A very short but nice read about how Studio Neat, makers of the Glif and the Cosmonaut, came about. Talks a bit about successfully running a Kickstarter campaign, but also about running their small business in general. A few bits and pieces to pick up in this one. Most importantly, it's another inspiration to start something.

Happy reading!

Tags: reading, books

I've been on vacation in France for most of June, and that means lots of time to read. Originally I planned on reading more on distributed systems, but I had a decent backlog of books on my Kindle, so this was just the right time to plow through them. By the way, if you don't have a Kindle yet, you should get one. It's a great little device. I've been reading so much more since I got it. Anyhoo, here's the list of books I've been reading in June.

Java Concurrency in Practice by Brian Goetz. This is a classic on programming for concurrency in Java. While all the code examples are Java, they're just as easy to understand, and should be easily applicable to your programming language of choice. Given, of course, that there are libraries offering similar data structures.

The book goes to great lengths discussing what's wrong with just using threads and synchronizing access to data and how newer concurrent APIs in Java can help you avoid the hassle. It covers a mind-boggling number of details and data structures: concurrent collections, designing thread-safe code, latches, barriers, queues, atomic data types, locks, semaphores, deadlocks, thread liveness, execution pools and so on. The part that really surprised me was the insight on the JVM's memory model, and why you need to protect data structures when they're shared across threads and multiple cores and processors. A must-read when it comes to programming for concurrency, and not just on the JVM. This book is a true gem.

Designing With Data by Brian Suda. A great, short introduction to visualizing data. The book is for everyone new to the area of graphing and exploring data. Don't expect a thorough introduction on statistics and everything around the numbers. The book focuses more on introducing the reader to the different types of graphs, why and when they work and also why some of them don't work.

Scalable Internet Architectures by Theo Schlossnagle. This book was written in 2007 and was way ahead of its time. Never mind the examples being mostly in Perl, this book covers all the little details on what it takes to build scalable web applications. Heck, it even shows you how you can build your own cross-vendor database replicator. A highly recommended read. It's right up there with Release It! by Michael Nygard, which you should read too.

Small Is The New Big by Seth Godin. I gotta admit, I haven't read anything by Seth so far, but this was a great start. It's a collection of 183 posts from his blog, carefully selected to represent little stories on why big companies fail and how small companies can succeed. It's a great read. I'm amazed how well Seth can take small examples like chucking a large pile of jewel cases and extrapolate them into a big picture to exemplify why the music industry is doomed. Looking forward to reading more of his books.

Drive: The Surprising Truth About What Motivates Us by Daniel Pink. The title says it all, the book explores, through scientific (but not at all boring) analysis, why money is not our sole motivator. We have an inner drive to expand our personal horizons, to master what we do every day and to work towards a purpose bigger than ourselves. Tom Preston-Werner (of GitHub) recommended the book at a conference, and you can see how it reflects the work culture at GitHub. Fits in very well with the aforementioned book.

Programming Concurrency on the JVM by Venkat Subramaniam. This book picks up where "Java Concurrency in Practice" left off. To recap things in terms of more traditional synchronization and concurrency APIs, it builds on several simple examples that are rebuilt again and again using new tools as the book progresses. The interesting bits are the parts that cover software transactional memory and actors, both mostly focusing on Akka.

As the title suggests this book is very code-heavy, which sometimes, at least on the Kindle, is a bit unreadable. It takes you through all the details of using STM and actors, both in Java and Scala, but also with examples in Groovy, JRuby and Clojure. This is pretty neat, because you pick up some new things along the way. I'd wish for some more depth here and there but I feel much better informed on STM and actors after reading it.

Knack by Norm Brodsky and Bo Burlingham. A book focused around founding, running and growing a business, this one is full of stories from the author's experiences with his businesses, beginning as start-ups, growing into big yet still customer-focused and in their own right still small companies.

Added to the mix are stories from people and companies Norm has advised over the years. You don't have to believe or take for granted everything he has to say and recommends doing or not doing, but this one is a great read either way, very much so because it is full of stories. If you read "Drive" and "Small Is The New Big", you'll find similar patterns occurring in all of them.

As days go by this book keeps coming back to me. Lots of little details that I want to apply to my own business practices. The more I think about it the more I think you should read this book.

Clojure Programming by Chas Emerick, Brian Carper and Christophe Grand. Clojure pushes all the right buttons for me as a language, and this book so far has helped me grasp more and more of it. While some of the code examples aren't very practical and introduce new concepts without discussing them here and there, the book is still a great introduction to the language. I just wish it wasn't > 600 pages, but still, lots of content to plow through.

Pricing with Confidence by Reed Holden. I came across this book by way of Amy Hoy's blog posts on pricing. The book deserved an emergency spot on my reading list because it's very relevant for the product I'm currently working on. The book's focus is on basing the price of a product on its value to the customer. Granted, I just started reading it, but so far it reads well and the points make a lot of sense. If you're looking to dive deeper into pricing your products, there's also Don't Just Roll The Dice, whose PDF version is available as a free download.

Now go read!

Tags: reading, books

With February almost over, it's time to give you new things to read, or at least to make a list of things I've been reading lately.

Continuous Delivery

A rather wordy and very repetitive excursion into the ideas behind continuous delivery, which involves continuous integration, continuous deployment and lots of other things. While I'm all on board with the ideas in the book, it's simply too long. Every chapter repeats a lot of the things from other chapters, mostly with the purpose of being easily accessible on their own. I like the book in general, but 500 pages for a book like this is simply too long.

The Architecture of Open Source Applications

This is an incredible resource, and best of all, it's free. Several open source projects are outlined from how their architecture evolved over time and how they came about to begin with.

The chapters I've very much enjoyed so far are Graphite, HDFS, Erlang and Riak, Sendmail. But the best chapter by far is the one on BerkeleyDB, the key-value store you didn't know you'd find everywhere. It's an exceptional read and should be mandatory for software developers. It's a great story on how to evolve architecture of what started out as a simple library over the course of 20 years.

While the book is free to read online, please consider buying if you like it. It's for a good cause too. There's a second volume in the works, due in March.

Things Caches Do

A short and simple read on what web caches do. If you build web apps, read this. Also has links to more in-depth articles on HTTP and caching.

Probabilistically Bounded Staleness for Practical Partial Quorums

An analysis of how eventual and consistent eventual consistency is. Relevant if you're dealing with Dynamo-style databases, but an interesting read any way you look at it. Accompanied by a website that allows you to calculate the probability that data in a quorum-replicated cluster is consistent over time.
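To get a feel for the numbers, here's a rough sketch of the simplest case: the chance that a read quorum misses every replica that acknowledged the latest write. It assumes replicas are picked uniformly at random and ignores that writes keep propagating in the background, which is exactly the part the paper models in much more detail. The function name is my own, and `math.comb` needs Python 3.8 or newer.

```python
from math import comb


def stale_read_probability(n, r, w):
    """Chance that r replicas picked at random include none of the
    w replicas that acknowledged the latest write, out of n total."""
    # Ways to pick r replicas from the n - w that missed the write,
    # divided by all ways to pick r replicas.
    return comb(n - w, r) / comb(n, r)


for n, r, w in [(3, 1, 1), (3, 1, 2), (3, 2, 2)]:
    p = stale_read_probability(n, r, w)
    print(f"N={n} R={r} W={w}: {p:.3f}")
```

Once R + W is larger than N, the quorums always overlap and the probability drops to zero. Below that, you're trading consistency for lower latency.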

Design of Apache Kafka, a high-throughput distributed messaging system

If you're into messaging, you should read this, especially if all you know is RabbitMQ and Redis for queueing. Neither scales well, and neither is easy to make fault-tolerant. Kafka, built at LinkedIn, follows a very different design that allows it to run fully distributed.

Notes on Varnish, from the Architect

The ideas behind Varnish, why Squid's way is outdated, and a perspective on great uses of memory-mapped files. Short and self-congratulatory read.

The LMAX Architecture

After reading this, you'll have a different perspective on things when it comes to building high-throughput systems. LMAX is a real-time trading site, and this article describes how they built the service that manages millions of trades per second. Unlike your typical architecture description on highscalability.com, this one has a lot of great information.

ZooKeeper

A paper on ZooKeeper, a system for coordinating processes in distributed systems. This is a surprisingly good read, and it also outlines several use cases for ZooKeeper.

The Lean Startup

People raved about this book, and it was recommended to me by several folks. After reading it, I'm rather meh on it. It felt like reading an over-glorified handbook for a process that startups must adopt to be successful. Overuse of the words "disruptive", "pivot", "startup", and "entrepreneur" only added to the slightly weird taste the book left me with.

Still, I don't think of reading something as a waste. If anything, this book helped me clarify my thoughts on the matter and gave me more perspective. That's what reading to me is all about.

Why We Get Fat

I got curiously interested in how the body works and, in particular, how carbohydrates affect it. While this book should be taken with a spoonful of salt, I learned quite a bit from it.

Read, Read!

If you want to read something really good, like right now, make it the aforementioned chapter on BerkeleyDB and the ZooKeeper paper.

Tags: reading