I've had the dubious pleasure of working with a certain library. It's a layer to talk to most of the Amazon Web Services APIs. While working with web services usually is a particular awful experience, this library doesn't make much of an effort to hide their awkwardness, in fact, in some ways it even makes it worse. It's pretty old news that I enjoy bitching about code I don't like, but I also like to keep it positive in at least thinking about how it could be improved.

Let's take a moment and have a look at what part of their API is particularly nasty and how the situation could be improved. We actually wrote a micro-wrapper around that library to make things a bit less awful for us, and to at least keep their code out of ours.

I'm looking in particular at the EC2 part of the library. Here's the signature of the method that runs a new instance:

run_instances(image_id, min_count, max_count, group_ids, key_name,
              user_data='', addressing_type = nil, instance_type = nil,
              kernel_id = nil, ramdisk_id = nil, availability_zone = nil,
              block_device_mappings = nil)

Yes, that's 12 arguments you can hand over to the method, most of them being optional. Now, what's it look like calling it:

run_instances('ami-123445', 1, 1, [], 'default')

So far so good. But what if you want to specify a different availability zone for e.g. the EU:

run_instances('ami-123445', 1, 1, [], 'default', '', nil, nil, nil, nil, 'eu-west1b')

Nice, eh? It's starting to get really ugly. The important parameters are hidden by unnecessary noise. What's with max_count and min_count not having default values? I'd argue that you normally it'd be quite reasonable to assume that you only want to run one instance when calling the method without these two parameters.

Anyway, that's just one example, in general, there's only one important piece of information, and that's the AMI identifier. Let's look at what this method could look like when you sprinkle some very common Rubyisms on top, things that should be quite common sense when writing code in Ruby, but seem to have been lost here in favor of resembling the original API as close as possible, even if that means writing cumbersome code:

run_instance('ami-123445', :availability_zone => 'eu-west1b', :ssh_key => 'default')

Now the method only takes one argument and a has of options. Is it more code to write? Yes. Are its intentions clearer? I'd argue that they are. Is it close to what the API underneath expects as arguments? Closer than before. Unnecessary information is left out, because either the method itself could send sensible defaults, or the web service itself does it. The EC2 API even has a default for the instance type, and it doesn't care if you leave out the RAM disk or specific kernel. It does what's necessary to ensure your instance is up.

Knowing that, it's not that hard to hide those details behind a slightly improved version of run_instances, in this case even called run_instance. Throw in a method run_instances to handle specifics of launching multiple instances at once. Is it more code to write? Sure. But it sure as hell makes for a nicer usage of the library itself. Instead of having to look up the order of the method's parameters and fill in blanks with nil, I only have to look what options the methods accepts.

In general I don't like giving a method more than three arguments. It simply means it's doing too much, or that the data it needs should be wrapped in a simple data structure, e.g. a hash (which is just called for in Ruby) or a simple struct. The more arguments you give a method, the harder it will be for users to figure out its usage, especially if you make a whole stash of them optional, because then the order of them starts to matter, and users have to dig through an endless line of documentation just to figure out where the argument most important to them needs to be placed among a pile of nils.

run_instances in said library returns an array of hashes. Those hashes contain the data returned by the web service, each entry in the array describing one instance. Apart from the fact that you always have to get the first entry of that array when you just launched one instance, it also does something weird. EC2 returns attributes in the hash using names like imageId, instanceId, keyName and so on. You could argue about the naming style, but the names sure are reasonable. What run_instances does is go ahead and prefix almost every attribute with aws_, but not all of them. Most methods in said library do the same, so at least there's some consistency, but it sure isn't great.

So while trying to resemble the API call, it makes you care about what's what instead of just giving you the attributes as it receives them from the API. The answer to this is simple, give me the damn attributes as you get them, simple like that. If you need to ensure that there are no naming clashes, put them in a subhash e.g. using the key :ec2 if you must.

Another example is this really simple method:


It takes an array as an argument, but nothing more. Even if you just want to get the data for one instance, you have to call it like this:


I don't know about you, but I think that looks just gross. We're using Ruby for chrissakes, there's a simple mechanism to ensure you get an array, but the library user doesn't need to specify it as such, it's called splat arguments. It's a simple fix, and has great effect.

describe_instances('i-123454', 'i-213143')

There, much better. Again, I'd make the case for a convenience method for getting the data for just one instance, which in turn fetches the first element from the resulting array. If you must, generate the code for it, but it makes code using the library a lot easier on the eyes. Sure, it's more work for you as the library's author, but your users will be thankful, I'm sure.

Last example, the library's layer to access S3. Now, when I reach for my bucket, I can use this method:

bucket(name, create=false, perms=nil, headers={})

The second parameter caught my eye, according to the RDoc it creates the bucket if it doesn't exist. Looking at the code it just doesn't even check if the bucket already exists, it just goes ahead and does a PUT request whenever you specify true. So to make the intention clearer you actually have to do the checking yourself and recall the method with create set to true to create the bucket on the second run. Because if it doesn't exist bucket will just return nil. You could end up with something like this.

bucket('bucket-for-monsieur') or bucket('bucket-for-monsieur', true)

So the RDoc is actually lying to us, and we have to bend over and ruin one line of code just because our library is too lazy to do the job for us. If you ask me, the whole method signature looks a bit odd. Somehow it also calls for a little makeover with an options hash, and it obviously should do what it says and check the existence of the bucket for us if we ask it to, and maybe raise an error otherwise, but the latter is a matter of taste. Sometimes I'd rather like to have an error instead of having to check for nil, since in this case you could argue that it clearly is an exceptional situation.

bucket('bucket-for-monsieur', :create => true, :permissions => 'public')

Designing a good API is hard, especially when you're hiding one that's already not great, though that's not an excuse. But sometimes it's well worth looking at how you think others would want to use your library, putting aside the fact that you need to do more work maybe to make it more usable. It's not too much to ask, and it'll sure make your users happy, because their code will look nicer too. The only failure I could add to the above is when the methods would have the real names of the API calls:


Gross! You can laugh now, but some Ruby libraries for accessing Amazon's Web Services don't shy away from exposing you to these undoubtedly not Ruby-like method names.

This post has been lying in my drafts folder for a while now, and since I'm trying out new approaches to shrink oversized controllers, it's about time to put this out, and get ready for describing alternatives.

One of the basic memes of Rails is "Skinny controller, fat model." And even though most examples, especially the ones using the newer REST features in Rails advertise it, there are still heaps of controllers out there that have grown too big, in both the sense of lines of code and the number of actions. I've had my fair share of super-sized controllers over the last months, and it's never a pleasant experience to work with them.

If you don’t believe me, look at the Redmine code. Whether it's laziness or lack of knowing better doesn't really matter. Fact is, if it keeps growing you'll have a hard time adding new features or fixing bugs. Error handling will become more and more cumbersome the more code you stuff into a single action.

And if it's pretty big already, chances are that people will throw in more and more code over time. Why not? There's so much code in there already, what difference does it make? Broken windows are in full effect here. If the controller is full of garbled and untested code already, people will add more code like that. Why should they bother writing tests for it or refactoring it anyway? Sounds stupid, but that's just the way it is.

On a current project I refactored several (and by that, I mean a lot) of these actions to merely four to ten lines of code. The project is not using RESTful Rails, but it wouldn't make much of a difference anyway. I've made some experiences that worked out pretty well for me, which would very likely help to make a controller RESTful. But that wasn't really the main objective on my current project. If they're still up to par when fully using REST I'll leave up to you to decide or, even better, update.

I’m not going to put code into this article, since most of the stuff is graspable without looking at code. If you see the sort of code I’m talking about you’ll understand.

It's actually just a few simple steps, but they can be both frustrating and exhausting, even when you take on step at a time (which you should, really).

Understand what it does

It's too obvious, isn't it? But still, a beast consisting of 160 of code should be approached carefully. Understand what each line does, and more importantly, why it does it. In an ideal world you can just read the code, and understand what it does, but we both know that’s wishful thinking. If you’re lucky you can just ask someone who knows their way around. But oftentimes you’re out of luck, and just have to gather as much information from the code as you can, or from playing around with the application itself.

Don't just look at the code, run it from the application, look at the parameters coming in from the views. Annotate the code if it helps you, it might help others as well.

Look at the view code too. This also isn't always a pleasant experience, but you will find hidden fields, parameters set and handed through in the weirdest ways for no apparent reason.

Test the hell out of it

Most likely the action at hand does not have any tests available to ensure your refactoring will work out well, otherwise you very likely wouldn't be in your current position. If it does they might've been abandoned a long time ago, and it's not even safe to say if the tests are still testing the right thing. If you have a big controller with a good test suite in place, even better. Check if they're testing all aspects of the code about to be refactored.

If not, take this opportunity to write as much tests for it as possible. Test even the simplest features, with as much detail as you can or as the time available allows for. You don't want even those features to break, do you?

I easily end up with 50 new test cases for a bigger action during such a run. Resist the temptation to refactor while you write tests. Mark parts of the code if you get ideas what to do with them, and get back to them in the refactoring phase.

Avoid testing too much functionality at once in a single test case. Keep them small and focused on a single aspect of the code in question. It doesn't have to be tests with just one assertion, but keep it focussed on a specific aspect of the method in question.

Basically it's now or never. This is the chance to improve test coverage, so do it. You'll be happy you invested the time, it will give you a better understanding of the code, and will ensure that the code still works.

It’s a painful process, but it also helps you to really understand what the code does.

Avoid complex fixtures

I don't use a lot of fixtures anymore in my functional tests. They're not just a pain in the ass, they're hard to set up, especially for complex actions, and they kill test performance. Try to use mocks and stubs instead. If you test your action line by line you can easily identify the methods that need to be stubbed or mocked. If you prefer it the scenario way, use something like factory_girl to setup objects for your tests. I'm a fan of stubbing and mocking, but too much of it will clutter your test code. I've been using it heavily for a while, but it tends to become a sign for bad design when you're using too much of it. So I've returned to using scenarios based on the task at hand, even if they hit the database.

If you turn to mocking/stubbing initially, make sure you untangle the potential mess afterwards. Even though the database can make your tests slower, in the end you want to test the whole thing.

You also want to stub out external collaborators, like web services, Amazon's S3 and the like. They don't belong into your controllers anyway, but moving them somewhere else might just open another can of worms (think asynchronous messaging), and introducing that level of complexity is just not what you need right now. Though you might want to consider it eventually.

Move blocks into methods

I'm not speaking of a block in terms of proc and lambda, but in the sense of conditional branching and the like. Longer if/else clauses usually are good candidates for getting code out of a long method into a new one, and you usually can go ahead and do just that. Once you've moved stuff out into methods, it's a lot easier to move them into modules or the model, but only if the blocks depend on parameters you can't or don't want to reach in your model.

Try to avoid the temptation to look for similar code to consolidate in the same or other controllers just yet. Make a note, and wait until you have tests in place for all the affected code. Then start to make the code DRYer by moving it into a place more common for all the classes that require it.

Break out new controllers and new actions

The same rule that applies to adding new actions to controllers also applies to working on existing ones: Having lot of actions in one controller usually means that it's doing more than it's supposed to. More and more actions usually mean there's code that's not really the controller's responsibility, solely speaking in terms of concerns. If the controller responsible for logins also takes care of a user's messages, then it breaks the separation of concerns. Move that stuff out into a new controller.

But if you can, it's also feasible to break out new actions. That's a good option when you have an action that responds differently based on input parameters or depending on the HTTP method, an old Rails classic. It will have the advantage that stuff like error handling will get a lot simpler. Big actions that do different things all at once tend to have a complex setup for catching errors. Several variables are assigned along the process, and at the end there's a long statement that checks if somewhere along the way an error occurred. If you separate the different tasks into smaller actions, you'll end up with much simpler error handling code, since it can focus on one thing, and one thing only.

The same goes for all classes really. Although with a model it's not always easy to break out a new class. But what you can do is break out code into modules and just include them.

Extract filters

Filters are a nice fit for parts of the code where objects are fetched and redirects are sent in case something isn't right, especially since the latter always require that you step out of your action as soon as possible, before doing any further logic. Moving that code out into methods, checking their return code and returning based upon that will make your code look pretty awkward. Filters are also nice to set pre-conditions for an action, pre-fetch data and the like. Whatever will help your controller actions do their thing with less code, and doesn’t fit into the model, try fitting it into a filter.

Try to keep them small though. It's too easy to just break out filters instead of moving code into the model, and it will slightly improve the code for sure. But what you really want is a small and focussed controller with a couple of lines of code in each action, and a few supporting filters around them.

Move code into the model

This is where it gets tricky, but now you can get all that business logic where it belongs. To get code out of the controller and into the model, you have to make sure it doesn't rely on things that's only available in it. params, session, cookies and the flash are the usual candidates here.

But there's always a way to work around that. Oftentimes you'll find code that assigns an error message or something similar to the flash. That kind of stuff is sometimes easier to handle in validations in the model, if it's dealing with error messages. I've seen that a lot, and it's just not the controllers responsibility to do all that work.

If your controller code is heavily dealing with stuff from the params hash, you can usually just hand that over to the model. Given of course that you properly cleaned it up first into a before_filter, or ensured that proper validations are in place.

You’ll usually find lots of constructed finders in controllers. Go after those too. Either use named scopes if you can, or create new finders in the model. It's already a lot easier on the eye when all that hand-crafted finder code is out of the way, and tucked neatly into a model.

Code that checks model objects for errors, validity, etc. belongs into validations or callbacks in your model. Just like any other code that’s none of the controllers’ business. Which basically is to mediate between the view and the model, and to do everything required to get that task done fast and without hassle. So that's the next step. A lot of times you'll find controllers setting arbitrary instance variables based on the state of the model. Rings a bell? Sure, why should the controller store the state of the model? It just should not. That's what the model is for, right?

When you’re done moving code into the model, move the according test code from functional to unit tests. Tests that used to test the business logic from the controllers perspective can now do so from a unit test. That way your functional tests can solely focus on what your web layer does.

Over time you will get an eye for code that just belongs into the model, and code that could be moved into the view, or that needs to stay in the controller. It takes practice, but the more the better. Working with legacy code is oftentimes an opportunity, not a punishment.

Know when to stop

Now that you have a skinny and tested controller, why not just keep going? It’s easy to fall into the refactoring trap. It’s just such a nice feeling of accomplishment. If you look at it that way, you could just keep refactoring your application’s code forever. But who will build the new features? Who will fix the bugs?

Avoid refactoring for the sake of it. Refactoring is an important part of the development life-cycle, but you need to find a balance between your role as a programmer, where you add new features and write tests for them, and the refactoring part.

So I could say “rinse and repeat”, but when you’re done with the controller at hand, leave it be. Get a coffee, and bask in the glorious feeling of just having done your code and the developers to come, a big favor. Unless of course, you have a lot of time. In that case, for the love of code, keep going. But that’s usually a luxury. What you can do instead is plan in time for more refactorings when you’re adding features on a controller that’s another mess. Clean it up first, then get going with the new code. When you see code that could need a refactoring while working on something different make a note (and by note I don't mean TODO, FIXME, and the like, they will get lost in the code, and never be looked at again), and get cracking on it later.

This was just a beginning though. There's still things that surprise me when I work with legacy Rails code, and they want to be dealt with in their own specific ways. As I mentioned earlier, I'm still trying out new things, and I'm going to keep posting about them.

Please, share your experiences on the matter. I'm pretty sure I'm not alone with the pain of oversized controllers.

It's very simple: Because it won't go away. There, that was easy.

Every line of code you write becomes legacy code, whether you like it or not. Doesn't matter if you're using a shiny new framework or something that's been develop in-house. Sooner or later someone will have to maintain your code, whether it's adding features or fixing bugs. It doesn't matter if it's you or someone else. At some point some human will come back to your code, and will want to do something with it.

So what do you do? You take all precautions so that this person can do his job. You write tests, you refactor and clean up your code, and you make it as readable as possible. Simple as that. You work your way in from the outside.

What happens when you get back to someone else's legacy code? That depends really. If there are tests you run them, you read them. If there are none, you just write tests. It's a neat way to discover a piece of code's functionality. You'll find some first pointers on where to start refactoring, or where you can add that new feature marketing wants so badly.

Legacy code is something that will give you a chance to learn.

You can learn how to refactor code. It's not always fun, but I find working with legacy code to be a more rewarding if not always pleasant experience. You get the chance to make code better, to make it easier to read, and finally to add new things. It sounds more glorious than it really is, cleaning up after others isn't something that'll win you a Jolt award, but it will deepen your knowledge of the domain, the code, and it will improve your personal skills.

Over time it will get easier to work with other people's code, no matter how big the mess they've made.

Working with legacy code is not a glamorous job. But it's a skill that'll give you a lot more knowledge about working with code than constantly working on new projects will ever do.

Mind you, legacy code doesn't need to be a mess. It's up to you to prove that.

Code metrics are a nice tool, and they're starting to become popular in the Ruby community. They statically analyze your code, and tell you with the simple power of numbers what's wrong with your code. You can spend ages changing your code to make the numbers look good. Still that doesn't tell your customer anything. He doesn't care about those numbers, he wants to see a finished product.

Why do people come back to metrics then? Not an easy question, and there are many answers:

  • They need an excuse to waste time, to beautify their code so that the metrics approve.
  • Their company has a policy that code must fulfill the strict regiment of the measuring tools.
  • They just love numbers.
  • They need a tool to tell them that they're code looks nice.

That's pretty much it. I haven't found any other real use for them. They're just a waste of time. Everything they tell you is just based on a set of rules, not common sense.

Instead of trying to fix your code to satisfy the code metrics tools, work in pairs. Work together to find code that looks doubtful, that reduces readability, that may increase the risk of bugs.

It will do a lot more for the quality of your code than any tool will ever do. Relying on tools to ensure your code's quality will only cover some deeper-lying problems within your team or organisation.

One thing I explicitly except from this is code coverage. While it's not important to have exactly 100% of your code covered with tests, it's important to have a code coverage tool and your common sense at hand to find code that still requires testing, to get the code that matters under tests. Check out this post by the Thoughtbot guys (who write excellent testing tools, by the way) on test metrics. All you ever need in my book.

Other code metrics won't tell you anything a trained eye wouldn't see without them. So read code, write code, and try to make it look neat and readable. That's the experience you need.

I'm not religious about anything, but if there's one thing I adopted and applied rigorously over the last years it's testing. Out of that habit I've developed using some principles that drive my writing of tests.

  • You don't have to avoid stubbing and mocking everything, but you can try

    Reducing the number of collaborators will result in smaller methods which are in turn easier to test. In the end stubbing things out is better than having to have complex dependencies in place to run a test suite.

  • Test everything that's important

    And if it's not important, why not just throw it out? If you can't throw it out, then it's probably important. So make sure you write tests for it. Simple as that.

  • If you don't know what else to do, write a test

    Stop thinking about how the code should look, start thinking what you want it to do. Some people call it working test-first, I just use it to find out how I want the code to work from the outside. When you put every aspect of the method in question into test cases, the way it should work will take shape naturally.

  • The testing framework is not your biggest problem

    While I prefer the style of RSpec and Shoulda for writing tests, it's usually not the problem of which of them you actually use. The point is to write tests, choose whichever framework will make that the easiest for you. You can write tests with all of them.

  • Don't DRY out your tests

    Tests are about readability. They're supposed to tell the reader what the code is supposed to do, without having to browse up and down in the test file to find methods that are used to create objects, or worse yet, through fixtures. Don't try too hard to put common code to setup scenarios into separate methods. It's just not important in tests. Create your scenarios where you need them, or use tools like factory_girl, where you collect object scenarios in a single file.

    Don't put too much code into separate methods for reuse in multiple test cases. If you do, make sure you name them in a way that will ensure readability, and think twice before you move code away from the point where other developers will look first.

  • Keep your tests small

    Instead of testing everything in one test method, create a separate test for every aspect of the method under test. The tests are easier to read and errors are easier to find.

Writing good tests is an art in itself, but trying your best to learn and constantly improve it is the best thing you can do for yourself in my opinion.

I've been on a lot of projects, where people put an awful lot of time into coming up with the right coding style. Everyone of course wants to have his knack for a certain way of doing things included: "I want my opening curly brace at the end of the line." Or: "A single space between if and the following opening brace."

I've been on several projects where I've written either coding styleguides myself or had people scream at me or others for not conforming to the coding styleguide.

It all seemed like an incredible waste of time to me, because people just tend to stop caring, or find a way around your automated checks. Everyone has its own style to code, and it's healthier for a project to just accept that fact.

Because they want to focus on the task at hand instead of some guidelines that just stand in their way of building something awesome, and simply because they don't like having every aspect of their work controlled. Each developer has a different personality. That will reflect upon his code, and that's a good thing.

Instead of a document of several pages you just could've come up with four bullet points, like the Rails team has. Simple as that. No long discussions, no enforcing the rules. Let peer review do the rest. If your programmers work in pairs, you don't need a fancy coding guidelines document anyway. People will just adapt to a common coding style naturally. People will point out possible issues with code readability on the spot without the need to control and micromanage them.

The most useful coding guideline I've ever seen and used: Indent with two spaces. If you don't think so, I highly recommend this rather dogmatic Rails coding style checklist.

It's hard to believe, but for some people it still doesn't. I've heard something along the lines of "The refactoring is done, now we can code again" or "I can't refactor that code right now, I'll just add a little code here and be done with it" far too often over the last months and years. The irrational but persistent thought that refactoring is a once-in-a-product-lifecycle activity is an annoyingly sticky idiom. Refactoring is not a one time thing, it is (or at least should be) an important part of your development process, equally important as coding and testing.

Granted, it's not easy to sell a refactoring, and it very likely won't work to bring a project to a full stop every once in a while to clean up the mess that gathered over the last few months. And most of the time it's not the way to go. Refactoring is in iterative process, something that is constantly needed and should be done as needed. But it's still something that lots of people don't realize.

If it's not done, controllers will grow, replication will crawl in, code will become less and less readable, and therefore understandable. Hence it'll take more time to add features, to fix bugs, and to bring new team members up to speed. The worst thing of it all is that once code bloat creeps into your project, things will only get worse. If one part of it is broken or bloated, people will just start adding more code, introduce more bloat, more bugs, and more complexity. The code will smell really bad really quickly. Everyone will notice it, but most of the people just will get used to the smell and ignore it.

If you don't fix these broken windows on a regular basis, soon your whole project will be full of them. And let me tell you looking at these broken windows, and being the one having to fix them is not a very satisfying job. It's an experience you can do once, but if you're constantly cleaning up after others, it's just frustrating.

The cost of not doing refactoring only becomes obvious after a while, but sometimes even not at all. Though the latter will very likely only happen when you actually make refactoring a part of your development lifecycle.

Only if you don't do it will it take longer to work with the code, and will in the end cost you more than a comparably (if not exactly squeaky) clean code base. Productivity of your development will decrease more and more over time.

But refactoring is not only something that's sometimes misunderstood by management, developers still forget the most important thing of it all, to clean up after themselves. Code a little, test a little, refactor a little. That should be the daily mantra. Never stop refactoring your code, otherwise you'll have to fix a lot of broken windows sooner than you think.

This morning, on day two, Marcel Molina and Michael Koziarski did a little Best Practices session, a welcome change to the keynotes and sessions. It was very code-oriented. I did even take something out of it I didn't know before. Though I wish it would've gone into a little bit more detail (which I actually wish for a lot of the other sessions as well, but more on this in a later post), it was something that you could relate to on a practical level.

I took some notes, without code though, and here they are:

  • Keep the controllers skinny, keep logic that’s on the model's low level out of the controller
  • All that logic in the model makes it the fat model
  • The controller should not deal with that logic, because it’s a different layer of abstraction
  • Rough guide: 6 to 7 actions per controller, 6 to 7 lines per action
  • Use association proxy methods. Add custom finders for associations to keep the association logic in the model and to represent the business logic more clearly
  • Use explicit and short validation callbacks (e.g. validate :make_sure_something_is_as_it_should_be) instead of just long validate methods. It’s easier to read and understand
  • with_scope can make code harder to read and is (apparently) used in situations where it isn’t necessary. It can be used to fake associations through proxies, e.g. to find objects that aren’t associated with an object through the database, but through some conditions, e.g. a smart group or a smart folder

Short, but sweet.