Book Notes and Main Takeaways: “The Unicorn Project” by Gene Kim
(LM → my personal comments)
Applying the strangler pattern to dismantle decades-old code monoliths and replace them safely, confidently, and brilliantly.
LM: incrementally replacing parts of a legacy system with new components instead of doing a risky big-bang rewrite.
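A minimal sketch of the strangler idea in Python, assuming the legacy system can be called through a set of named operations. `StranglerFacade`, the operation names, and the handlers are all hypothetical. In practice the routing layer is often an HTTP proxy or API gateway rather than an in-process object.

```python
from typing import Callable, Dict

# Hypothetical handler type: takes a request payload, returns a response.
Handler = Callable[[str], str]


class StranglerFacade:
    """Single entry point that routes each operation to the new component
    once it has been migrated, and to the legacy system otherwise."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        # Replace one operation at a time instead of doing a big-bang rewrite.
        self._migrated[operation] = handler

    def rollback(self, operation: str) -> None:
        # If the new component misbehaves, route traffic back to the legacy path.
        self._migrated.pop(operation, None)

    def handle(self, operation: str, payload: str) -> str:
        if operation in self._migrated:
            return self._migrated[operation](payload)
        return self._legacy[operation](payload)


# Hypothetical usage: only "get_price" has been rewritten so far.
facade = StranglerFacade({
    "get_price": lambda sku: f"legacy-price:{sku}",
    "get_stock": lambda sku: f"legacy-stock:{sku}",
})
facade.migrate("get_price", lambda sku: f"new-price:{sku}")
print(facade.handle("get_price", "A1"))  # new-price:A1
print(facade.handle("get_stock", "A1"))  # legacy-stock:A1
```

Once every operation routes to a new handler, nothing calls the legacy system any more and it can be retired; until then, `rollback` keeps each step reversible.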
A healthy software system is one that you can change at the speed you need, where people can contribute easily, without jumping through hoops. This is how you make a project that’s fun and worthwhile contributing to, and where you often find the most vibrant communities.
She is able to build things with focus, flow, and joy. She had fast feedback in her work. People were able to do what they wanted without being dependent on scores of other people. This is what great architecture enables.
LM: fast feedback in her work = a proper testing pyramid / a proper understanding of the subject under test
You can’t do anything without first convincing a bunch of steering committees and architects or having to fill out a bunch of forms or work with three or four different teams who each have their own priorities. Everything is by committee. No one can make decisions, and implementing even the smallest thing seems to require consensus from everyone.
LM: laughing and crying at the same time, because I’ve lived through exactly that in the past
“This company is run by a bunch of executives with no clue about technology, and project managers who want us to follow a bunch of arcane processes. I’ll scream at the next one who wants me to write a Product Requirements Document.” “The PRD!” everyone shouts, laughing.
LM: When you have a great product owner on the team, the PRD can actually help, but the PRD process must not become dogmatic.
And this is what an effective network is all about — when you can assemble a group of motivated people to solve a big problem, even though the team looks nothing like the official org chart.
LM: Searching for and joining forces with other value-focused people in the company is a great way to get things done and improve joy at work in big organizations.
“Code deployment lead time, code deployment frequency, and time to resolve problems are predictive of software delivery, operational performance, and organizational performance, and they correlate with burnout, employee engagement, and so much more.
Which is why putting cross-cutting concerns in one place is so great, like logging, security, or retry policies. You change it there, and you’ve changed it everywhere,” he says.
LM: Not sure how this would work. Is “one place” a library? What about security?
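One possible reading of “one place”, sketched in Python: a shared library that every service imports, exposing a decorator that applies the logging and retry policy. The name `resilient`, its defaults, and `fetch_inventory` are hypothetical. Security concerns such as authentication are often centralized in a similar way (shared middleware or an API gateway), but that is not shown here.

```python
import functools
import logging
import time
from typing import Any, Callable, Tuple, Type, TypeVar

T = TypeVar("T")

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shared_policies")

# Hypothetical shared defaults: edit them here and every decorated call changes.
DEFAULT_ATTEMPTS = 3
DEFAULT_BACKOFF_SECONDS = 0.01


def resilient(
    attempts: int = DEFAULT_ATTEMPTS,
    backoff: float = DEFAULT_BACKOFF_SECONDS,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Apply logging plus a retry policy with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    log.info("calling %s (attempt %d/%d)", func.__name__, attempt, attempts)
                    return func(*args, **kwargs)
                except retry_on as exc:
                    log.warning("%s failed: %s", func.__name__, exc)
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    time.sleep(backoff * 2 ** (attempt - 1))
            raise RuntimeError("unreachable")
        return wrapper
    return decorator


# Hypothetical service call that fails twice before succeeding.
calls = {"n": 0}


@resilient()
def fetch_inventory(sku: str) -> int:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("inventory service unavailable")
    return 42


print(fetch_inventory("A1"))  # 42, after two logged retries
```

The service code only declares `@resilient()`; how many times to retry and what to log is decided in the shared module.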
Technical debt is what you feel the next time you want to make a change.
LM: the opposite being total confidence when making a change, plus a fast feedback cycle confirming that it will work in production
- The First Ideal: Locality and Simplicity
- The Second Ideal: Focus, Flow, and Joy
- The Third Ideal: Improvement of Daily Work
- The Fourth Ideal: Psychological Safety
- The Fifth Ideal: Customer Focus
The First Ideal is Locality and Simplicity. We need to design things so that we have locality in our systems and the organizations that build them. And we need simplicity in everything we do. The last place we want complexity is internally, whether it’s in our code, in our organization, or in our processes. The external world is complex enough, so it would be intolerable if we allow it in things we can actually control! We must make it easy to do our work.
The Second Ideal is Focus, Flow, and Joy. It’s all about how our daily work feels. Is our work marked by boredom and waiting for other people to get things done on our behalf? Do we blindly work on small pieces of the whole, only seeing the outcomes of our work during a deployment when everything blows up, leading to firefighting, punishment, and burnout? Or do we work in small batches, ideally single-piece flow, getting fast and continual feedback on our work? These are the conditions that allow for focus and flow, challenge, learning, discovery, mastering our domain, and even joy.”
She doesn’t judge or dismiss any of the technology stacks — after all, it’s been successfully serving the enterprise for decades. It may not be the most elegant piece of software she’s seen, but things that have been in production for twenty years rarely are. Software is like a city, constantly undergoing change, needing renovations and repair. She will, however, acknowledge that Data Hub is not the hippest neighborhood. It’s undoubtedly difficult to recruit new college grads who want to learn and use the hottest, most in-demand languages and frameworks.
In her MRP team, any developer could test their own code and even push code into production themselves. They didn’t have to wait weeks for other people to do that work for them. Being able to test and push code to production is more productive, makes for happier customers, creates accountability of code quality to the people who write it, and also makes the work more joyful and rewarding.
CEO Bill Gates was so concerned that he wrote a famous internal memo to every employee, stating that if a developer has to choose between implementing a feature or improving security, they must choose security, because nothing less than the survival of the company was at stake. And thus began the famous security stand-down that affected every product at Microsoft. Interestingly, Satya Nadella, CEO of Microsoft, still has a culture that if a developer ever has a choice between working on a feature or developer productivity, they should always choose developer productivity.
Each adds to the coordination cost for everything we do, and drives up our cost of delay. And because the distance between where decisions are made and where work is performed keeps growing, the quality of our outcomes diminishes.
Every incident is a learning opportunity, an unplanned investment that was made without our consent.
LM: an unplanned investment -> Since you made this investment anyway, it’s better to capitalize on it.
But repeating platitudes isn’t enough. The leader must constantly model and coach and positively reinforce these desired behaviors every day. Psychological safety slips away so easily, like when the leader micromanages, can’t say ‘I don’t know,’ or acts like a know-it-all, pompous jackass. And it’s not just leaders, it’s also how one’s peers behave.
LM: the importance of constant reinforcement
‘Everyone must be responsible for their own safety and the safety of their teammates. If you see something that could hurt someone, you must fix it as quickly as possible.’ He told everyone that fixing safety issues should never be budgeted — just fix it, and they’d figure out how to pay for it later,”
Steve talks about workplace injuries at every Town Hall. He knows he can’t directly influence everyone’s daily work. However, Steve can reinforce and model his desired values and norms, which he does so effectively, Maxine realizes.
The last thing a QA person wants to hear from a developer they just met is their ideas on how to automate their job away.
“‘LARB’ stands for Lead Architecture Review Board,” Dwayne explains. “It was a committee created decades ago after a whole bunch of bad things happened in technology, long before I joined the company. Someone decided to create a bunch of rules to make sure anything new was ‘properly reviewed,’” Dwayne says, air quoting with his hands. “It’s a committee of committees. There are seven Ops architects, seven Dev architects, two Security architects, and two Enterprise architects. It’s like they’re frozen in time, still acting like it’s the 1990s,” he says. “Any major technology initiative needs their sign-off.
Instead of one product manager working on this the entire time, we could have had five people working on it. And we could have been learning the whole time, Maxine thinks. She wonders how much of this specification document that was written two years ago is now out of date.
We’ll need someone really good at databases, because we’ll probably need to reduce our reliance on the big, centralized Phoenix databases and all those systems of record. We’ll need some serious infrastructure skills to support a new deployment and operations model. And because we’ll likely be running things in production ourselves again, we’ll need people with superb skills in Security and Ops.”
Maxine heartily agrees and is again impressed with Kurt’s ability to deliver the things that the teams need, able to navigate the organization in a way very different than the official org chart would suggest.
Their use exploded after the famous 2004 Google MapReduce research paper was published, which described the techniques Google used to massively parallelize the indexing of the entire internet on commodity hardware, using techniques at the core of functional programming. This led to the invention of Hadoop, Spark, Beam, and so many other exciting technologies that transformed this space.
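To make the map/shuffle/reduce shape concrete, here is a single-process Python sketch of the classic word-count example. It only illustrates the programming model and is not Google's implementation: a real framework runs the map and reduce calls in parallel across many machines and performs the shuffle over the network.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain
from operator import add
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word; each document can be mapped independently.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key; each key can be reduced independently.
    return key, reduce(add, values, 0)


def word_count(documents: List[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map(map_phase, documents))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["the parts store", "the unicorn project", "Parts Unlimited"]))
# {'the': 2, 'parts': 2, 'store': 1, 'unicorn': 1, 'project': 1, 'unlimited': 1}
```

Because `map_phase` and `reduce_phase` are pure functions with no shared state, they can be distributed freely, which is the functional-programming property the passage refers to.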
Shannon describes how this new data platform would be fed by a new event streaming technology. “Unlike Data Hub, where almost every business rule change also requires a change from the Data Hub team, this new scheme would allow a massive decoupling of services and data. It would enable developers to change things independently, without needing a centralized team to write intermediary code. And unlike the centralized Data Warehouse, the responsibility for cleaning, ingesting, analyzing, and publishing accurate data to the rest of the organization would be pushed into each business and application team, where they have the most knowledge of what the data actually means.”
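A toy, in-memory Python sketch of the decoupling Shannon describes, assuming a topic-based event stream. `EventStream`, the `orders` topic, and both consumers are hypothetical; a real platform would persist the log, let consumers read it at their own pace, and run across many machines. The point is that the producer publishes once and each team adds its own consumer without a central team writing intermediary code.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Event = Dict[str, Any]
Consumer = Callable[[Event], None]


class EventStream:
    """Toy stand-in for an event streaming platform: an append-only log per topic."""

    def __init__(self) -> None:
        self.log: DefaultDict[str, List[Event]] = defaultdict(list)
        self._subscribers: DefaultDict[str, List[Consumer]] = defaultdict(list)

    def publish(self, topic: str, event: Event) -> None:
        # The producer only knows the topic, not who consumes it.
        self.log[topic].append(event)
        for consumer in self._subscribers[topic]:
            consumer(event)

    def subscribe(self, topic: str, consumer: Consumer, replay: bool = False) -> None:
        # New consumers can join at any time without changing the producer;
        # with replay=True they first catch up on events already in the log.
        if replay:
            for event in self.log[topic]:
                consumer(event)
        self._subscribers[topic].append(consumer)


stream = EventStream()
revenue_by_region: DefaultDict[str, float] = defaultdict(float)
units_reserved: DefaultDict[str, int] = defaultdict(int)


def track_revenue(event: Event) -> None:
    # Owned by a hypothetical analytics team.
    revenue_by_region[event["region"]] += event["total"]


def reserve_stock(event: Event) -> None:
    # Owned by a hypothetical inventory team.
    units_reserved[event["sku"]] += event["quantity"]


stream.subscribe("orders", track_revenue)
stream.publish("orders", {"sku": "BRAKE-01", "quantity": 2, "total": 80.0, "region": "west"})
# The inventory team joins later and replays history instead of requesting a data feed.
stream.subscribe("orders", reserve_stock, replay=True)
stream.publish("orders", {"sku": "BRAKE-01", "quantity": 1, "total": 40.0, "region": "west"})
print(dict(revenue_by_region), dict(units_reserved))  # {'west': 120.0} {'BRAKE-01': 3}
```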
I’ve got twenty-five data scientists and analysts across five teams who never have the data they need. But it’s not just them — almost everyone in Marketing accesses or manipulates data. Operations is mostly about data. Sales operations and management is all about data. In fact, I’d bet half of all Parts Unlimited employees access or manipulate data every day. And for years, we’ve been handcuffed by the way everything has to go through the Data Warehouse team.
There’s so much we wanted to fix, but we were all put on other projects, so until recently, there have been no full-time developers on the mobile apps. But as Maggie said, that has changed.
Worse, some of the automated tests were failing intermittently. Last week, she cringed watching as a developer, whose tests failed, just ran them again, and they also failed. So, he ran them a third time, as if it were a slot machine in a casino. This time they passed. This is no way to run a development shop, Maxine thought with embarrassment and distaste.
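Instead of re-running until green, a runner can execute a suspicious test several times and classify it. A minimal Python sketch, assuming tests are plain functions that raise `AssertionError` on failure; `classify_test` and both sample tests are hypothetical. The goal is to surface flakiness so it gets fixed, not to hide it behind retries.

```python
from typing import Callable, List

TestFn = Callable[[], None]


def classify_test(test: TestFn, runs: int = 10) -> str:
    """Run a test several times and report 'pass', 'fail', or 'flaky'."""
    outcomes: List[bool] = []
    for _ in range(runs):
        try:
            test()
            outcomes.append(True)
        except AssertionError:
            outcomes.append(False)
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        return "fail"
    # Mixed results: quarantine the test and fix the nondeterminism.
    return "flaky"


def line_total_check() -> None:
    assert sum([10, 20, 30]) == 60


run_counter = 0


def shared_state_check() -> None:
    # Simulates hidden, order-dependent state: every third run fails.
    global run_counter
    run_counter += 1
    assert run_counter % 3 != 0


print(classify_test(line_total_check))    # pass
print(classify_test(shared_state_check))  # flaky
```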
They spend two hours writing tests around the code to make sure they really understand how it works, and then they start pulling out common operations, putting them where they belong.
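The technique in that passage is often called characterization testing: pin down what the code does today, then refactor while those tests stay green. A minimal Python sketch; `legacy_invoice_total`, the tax multiplier, and the service fee are hypothetical, and in a real codebase the checks would live in a test framework rather than in plain `assert` statements.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Callable

# Hypothetical legacy code with the same tax-and-rounding logic copy-pasted twice.
def legacy_invoice_total(kind: str, amount: str) -> str:
    if kind == "part":
        value = Decimal(amount) * Decimal("1.08")
        value = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        return f"PART {value}"
    else:
        value = (Decimal(amount) + Decimal("15.00")) * Decimal("1.08")
        value = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        return f"SERVICE {value}"


# Step 1: characterization cases record what the code does today, right or wrong.
CHARACTERIZATION_CASES = {
    ("part", "10.00"): "PART 10.80",
    ("part", "0.05"): "PART 0.05",
    ("service", "100.00"): "SERVICE 124.20",
}


def check(fn: Callable[[str, str], str]) -> None:
    for (kind, amount), expected in CHARACTERIZATION_CASES.items():
        assert fn(kind, amount) == expected, (kind, amount)


check(legacy_invoice_total)  # pins current behavior before touching the code

# Step 2: pull the duplicated operation out into one place.
TAX_MULTIPLIER = Decimal("1.08")
SERVICE_FEE = Decimal("15.00")


def with_tax(value: Decimal) -> Decimal:
    return (value * TAX_MULTIPLIER).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def invoice_total(kind: str, amount: str) -> str:
    if kind == "part":
        return f"PART {with_tax(Decimal(amount))}"
    return f"SERVICE {with_tax(Decimal(amount) + SERVICE_FEE)}"


check(invoice_total)  # same cases, same results: the refactor preserved behavior
```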
Main takeaways
- In larger, non-startup organizations, the problems that individual contributors and the organization itself face are comically similar. They are caused by:
- — each director trying to maximize his/her local metrics by adding new processes and bureaucracy
- — not managing complexity (in software and in processes)
- What is great architecture?
- — ability to build things with focus, flow, and joy
- — fast feedback in her work
- — People were able to do what they wanted without being dependent on scores of other people
- Searching for and joining forces with other value-focused people in the company is a great way to get things done and improve joy at work.
- — These people are already doing a rogue version of the practices described in the book Reinventing Organizations:
- — Flexible roles. People fill the roles in which they can generate the most value.
- — full autonomy on local decision making
- Technical debt is what you feel the next time you want to make a change
- Developer focus: Security > Developer Productivity > Features