There's a term that, in the context of hiring (and firing), is more loaded than any other. I'm talking about culture fit.

A recent post on the Lighthouse blog (a product I'm actively using and that I'm a big fan of, by the way!) states that not checking for culture fit is one of the eight interview mistakes that cost you great candidates.

The post brings up the most peculiar example in this regard, a company culture of drinking (disclaimer: I've been a non-drinker for more than 16 years now). If your company has a culture of heavy drinking, then you can be sure as hell that someone like me will neither fit nor want to fit into that culture.

Startup culture in particular is known for its silent rituals and expectations of new hires. If a mum or dad can't go out to a bar at night, that's a thumbs-down, right? After all, they're not willing to socialize, and your company is like a family. A family that drinks together to create and maintain bonds.

The term culture fit continues to be thrown around both as a reason not to hire someone and as a reason to fire someone. It's a simple explanation, and it should be clear to anyone on the team (and the person affected) why they're not a culture fit, right?

There's one fundamental mistake in both using and looking for culture fit as a means for hiring: You're assuming that your current culture is healthy and doesn't need to be changed.

Using culture fit as a reason to fire or not to hire says more about you than it says about them. It says that you're not willing to dig deep and figure out what exactly you think doesn't match between your expectations and a candidate's personality. It shows that your culture is a fixed property of your company and team, one that can't be changed, one that is exactly where you want it to be.

Culture fit is a reason to continue maintaining the status quo.

But let's take it one step at a time.

What is culture?

Culture fit is a loaded word, because the word culture has so many possible definitions. There are a lot of layers in your company.

It's safe to assume that culture is represented by your company's and team's values. If you've poured lots of work into those, then you have a good definition of your team's expected behaviour.

But culture goes beyond that. Culture is what happens at your company every day. Culture includes founders buying themselves expensive cars from a secondary investment.

Culture includes silent expectations like going to socialization events in evenings, like drinking at bars, dinners and other company events. No one wants to put those on job ads, right?

Culture is how you write and phrase your job ads. Culture is whether you're looking for rock stars or want to build a great team and help people grow. Culture is how you pay your people. Culture is how a CEO behaves towards their team and in public. Culture is how leadership fosters and drives change. Culture is how you treat your customers. Culture is how you treat your team. Culture is how open you are to changing the status quo. Culture is a team that only consists of white dudes in their late twenties.

Culture is that ping pong table in your office. Culture is all those free and unhealthy soft drinks that your company keeps in the fridge. Culture is serving your team breakfast or lunch (or both?) every day to make sure they're in the office for as long as possible. Culture is talking about commitment issues when someone on your team needs to leave early because they have children to take care of.

Everything you do every day, in your company and team, is part of your culture.

If your assumption is that no one else should be able to be a part of that, and that it's written in stone and cannot be changed, then by all means, hire or fire based on culture fit.

Culture fit is a means to keep people out of a protected and privileged circle, rather than a means to protect that circle's values, which is probably what you think it is.

Culture fit is a means to avoid talking about whether your culture is healthy and whether it needs to be improved, and most importantly, to avoid actively changing and improving it.

Stop using "culture fit"

If culture fit isn't a reason for not hiring someone or firing someone, then what is?

Culture fit is a loaded word because it can have so many meanings and can apply to so many layers of your company. Using it means you want to spend little time on figuring out where exactly someone isn't a good match for your own expectations and why those expectations exist.

The best way to avoid falling into the culture fit trap is to have an honest look at why someone doesn't match your expectations. Did they not match implicit or explicit expectations? If they're implicit, are they really a part of your company culture? If they are, why are they not explicit?

If socializing over drinks is an expectation you have, then you should be honest enough to make it an explicit expectation. Or, if you want my advice, you should revisit why and whether it's such an important part of your culture.

Because I can tell you right here, making it explicit will help keep even more people out of your precious circle. Parents, people of religion who don't drink, non-drinkers. You can be sure that those people will never be part of your team, and that your team will continue to attract the same kind of people that are already a part of it.

"Culture fit" hampers the biggest benefit of any great team: diversity. Stop using it and start looking at the real reasons why you don't want to hire someone. They might not be their flaws but yours.


Listening to the “This American Life” episode on the GM/Toyota NUMMI plant recently, one particular part struck me as interesting when it comes to culture.

Culture is something that everyone would love to be able to easily replicate. Companies like Etsy, Netflix and others are forging ahead with openness, open source and empowering employees when it comes to their production systems.

NUMMI was an attempt to bring Toyota’s principles in building cars to General Motors, the automotive giant that was struggling hard in the eighties and was eventually bailed out by the American taxpayers in 2009.

Toyota’s production line is famous for a simple tool, the Andon cord that allowed every worker on the factory floor to stop the assembly line whenever they encountered a problem. This empowered every employee to work towards a single goal: quality.

At NUMMI, this same system was implemented, and very successfully so. Every worker in the factory initially worked for two weeks with a team at Toyota in Japan to fully experience what teamwork looks like. It didn’t exist inside GM before NUMMI was conceived.

The Andon cord is an essential tool in learning and improving quality continuously. Every stop of the production line is an opportunity to learn and to improve the production process.

Before the NUMMI experiment, and in the rest of GM, the one goal was to never stop the production line. Quantity over quality, at all times.

Quality at NUMMI thrived, and GM looked into implementing this in more of their factories.

This experiment failed as there was a lot of resistance in management, amongst the workers and in the unions (all of whom had been fully onboard at NUMMI).

One bit in particular was interesting about the adoption issues.

The Andon cord was installed in other factories too, but when workers used it, they were reprimanded for stopping the production line. Managers were paid by volume of cars leaving the factory. In other factories, the cord was cut down so it was harder to reach.

I found this bit fascinating in so many ways, and it made me think about culture.

We’d love to just take a blueprint from another company and apply that to ours. But culture is something you need to work hard on, something that takes years of learning and improving to bring about, and it requires continuous nurturing to stay healthy.

You can’t just replicate culture.

Failure is still one of the most undervalued things in our business, in most businesses really. We still tend to point fingers elsewhere, blame the other department, or try anything to cover our asses.

How about we do something else instead? We embrace failure openly, turn it into our company's culture and do everything we can to make sure every failure is turned into a learning experience, into an opportunity?

Let me start with some illustrating examples.

Wings of Fury

In 2010, Boeing tested the wings of a brand new 787 Dreamliner. In a giant hangar, they set up a contraption that'd pull the wings of a 787 up, with so much pull that the wings were bound to break.

Eventually, after they'd been flexed upwards of 25 feet, the wings broke spectacularly.

The amazing bit: all the engineers watching it happen started to cheer and applaud.

Why? Because they had anticipated the failure under the exact circumstances where it broke, at about 150% of what the wings handle in normal operation.

They can break things loud and proud, they can predict when their engineering work falls apart. Can we do the same?

Safety first

I've been reading a great book, "The Power of Habit", and it outlines another story of failure and how tackling that was turned into an opportunity to improve company culture.

When Paul O'Neill, later to become Secretary of the Treasury, took over management of Alcoa, one of the United States' largest aluminum production companies, he made it his first and foremost priority to tackle the safety issues in the company's production plants.

He put rules in place requiring that any accident be reported to him within just a few hours, along with a plan for how this kind of accident would be prevented in the future.

While his main focus was to prevent failures, because they would harm or even kill workers, what he eventually managed to do was to implement a company culture where even the smallest suggestion from any worker to improve safety or efficiency would be considered and handed up the chain of management.

This fostered a culture of highly increased communication between production plants, between managers, between workers.

Failures and accidents still happened, but were in sharp decline, as every single one was taken as an opportunity to learn and improve the situation to prevent them from happening again.

It was a chain of post-mortems if you will. O'Neill's interest was to make everyone part of improving the overall situation without having to fear blame. Everyone was made to feel like an important part of the company. By then, 15,000 people worked at Alcoa.

This had an interesting effect on the company. In twelve years, O'Neill managed to increase Alcoa's revenues from $1.5 billion to $23 billion.

His policies became an integral part of the company's culture and ensured that everyone working for it felt like an integral part of the production chain.

Floor workers were given permission to shut down the production chain if they deemed it necessary and were encouraged to blow the whistle when they noticed even the slightest risk in any activity in the company's facilities.

To be quite fair, competitors were pretty much in the dark about these practices, which gave Alcoa a great advantage on the market.

But within a decade of running the company, he transformed its culture into one that sounds strikingly similar to the ideas of DevOps. He managed to make everyone feel responsible for delivering a great product and enabled everyone to take charge should something go wrong.

All that is based on the premise of trust. Trust that when someone speaks up, they will be taken seriously.

Three Habits of Failure

If you look at the examples above, some patterns come up. There are companies outside of our field that have mastered or at least taken on an attitude of accepting that failure is inevitable, anticipating failure and dealing with and learning from failure.

Looking at some more examples it occurred to me that even doing one of these things will improve your company's culture significantly.

How do we fare?

We fail, a lot. It's in the nature of the hardware we use and the software we build. Networks partition, hard drives fail, and software bugs creep into systems and can lead to cascading failures.

But do we, as a community, take enough advantage of what we learn from each outage?

Does your company hold post-mortem meetings after a production outage? Do you write public post-mortems for your customers?

If you don't, what's keeping you from doing so? Is it fear of giving your competitors an advantage? Is it fear of giving away too many internal details? Fear of admitting fault in public?

There's a great advantage in making this information public. Usually, your customers don't really care what happened in full detail. What does concern them is knowing that you're in control of the situation.

A post-mortem follows three Rs: regret, reason and remedy.

They're a means to say sorry to your customers, to tell them that you know what caused the issues and how you're going to fix them.

On the other hand, post-mortems are a great learning opportunity for your peer ops and development people.

Web Operations

This learning is an important part of improving the awareness of web operations, especially during development. There's a great deal to be learned from other people's experiences.

Web operations is a field that is mostly learning by doing right now. Which is an important part of the profession, without a doubt.

If you look at the available books, there are currently three books that give insight into what it means to build and run reliable and scalable systems.

"Release It!", "Web Operations" and "Scalable Internet Architectures" are the ones that come to mind.

My personal favorite is "Release It!", because it raises developer awareness on how to handle and prevent production issues in code.

It's great to see the circuit breaker and the bulkhead pattern introduced in this book now being popularized by Netflix, who openly write about their experiences implementing it.
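If you haven't come across the circuit breaker before, here's a minimal sketch of the idea in Python. It's an illustration only, not code from the book or from Netflix. The names CircuitBreaker, CircuitOpenError, max_failures and reset_timeout are made up for this example, and a production version would need thread safety and better bookkeeping.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical, single-threaded sketch of the circuit breaker pattern."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so it can be tested
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of hammering a dependency that is down.
                raise CircuitOpenError("circuit open, call rejected")
            # Timeout elapsed: half-open, let a single trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

You'd wrap calls to a flaky dependency in breaker.call(...). Once the dependency has failed often enough, callers get an immediate CircuitOpenError instead of piling up on something that's already struggling.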

Netflix is a great example here. They're very open about what they do, and they write detailed post-mortems when there's an outage. You should read their engineering blog, and Etsy's too.

Why? Because it attracts engineering talent.

If you're looking for a job, which company would you rather work for? One that encourages taking risks while also taking responsibility for fixing issues when failure does come up, one that enables a culture of fixing and improving issues as a whole? Or one that assigns blame?

I'd certainly choose the former.

Over the last two years, Amazon has also realized how important this is. Their post-mortems have become very valuable for anyone interested in things that can happen in multi-tenant, distributed systems.

If you remember the most recent outage on Christmas Eve, they even had the guts to come out and say that production data was deleted by accident.

Can you imagine the shame these developers must feel? But can you imagine a culture where the issue itself is considered an opportunity to learn instead of blaming or firing you? If only to learn that accessing production data needs stricter policies.

It's a culture I'd love to see fostered in every company.

Regarding ops education, there have been some great things last year that are worth mentioning. hangops is a nice little circle, streamed live (mostly) every Friday, and available for anyone to watch on YouTube afterwards.

Ops School has started a great collection of introductory material on operations topics. It's still very young, but it's a great start, and you can help move it forward.

Travis CI

At Travis CI, we're learning from failure, a lot. As a continuous integration platform, it started out as a hobby project and was built with a lot of positive assumptions.

It used to be a distributed system that always assumed everything would work correctly all the time.

As we grew and added more languages and more projects, this ideal fell apart pretty quickly.

It is a symptom of a lot of projects that are developer-driven, because there's just so little public information on how to do it right, on how distributed systems are built and run at other companies for them to work reliably.

We decided to turn every failure into an opportunity to share our learnings. We're an open source project, so it only makes sense to be open about our problems too.

Our audience and customers, who are mostly developers themselves, seem to appreciate that. I for one am convinced that we owe it to them.

I encourage you to do the same, to share details on your development, on how you run your systems. You'll be surprised how introducing these changes can affect the way you work together as a team.

Cultural evolution

This insight didn't come easy. We're a small team, and we were all on board with the general idea of openness about our operational work and about the failures in our system.

That openness brings with it the need to own your systems, to own your failures. It took a while for us to get used to working together as a team to get these issues out of the way as quickly as possible and to find a path for a fix.

In the beginning, it was still too easy to look elsewhere for the cause of the problem. Blame is one side of the story, hindsight bias is the other. It's too easy to point out that the issue has been brought up in the past, but that doesn't contribute anything to fixing it.

The more helpful attitude than saying "I've been saying this has been broken for months" is to say "Here's how I'll fix it." You own your failures.

The only thing that matters is delivering value to the customer. Putting aside blame and admitting fault while doing everything you can to make sure the issue is under control is, in my opinion, the only way you can do that, with everyone in your company on board.

Accepting this might just help transform your company's culture significantly.