Writing a Reliability Strategy (2019)

A while ago I wrote about modeling a hiring funnel as an example of creating a system model, but that post doesn't explore how the process of evolving a system model can be helpful. This post does. | Continue reading


@lethain.com | 4 years ago

How to build your company's engineering brand

If you end up working in an engineering team that wants to accelerate hiring, at some point you’ll hear the dreaded statement, “We need to grow our eng brand.” The method to accomplish that aim isn’t always clear, but the goal is: what can we do so that candidates enter our process … | Continue reading


@lethain.com | 4 years ago

First 90 Days as CTO or VP Engineering

Whenever I transition to a new opportunity, I think about how to “start well.” How can I ramp up as effectively as possible? How do I balance the urge to “show value” immediately with making the right decisions? | Continue reading


@lethain.com | 4 years ago

How the Digg team was acquihired

About a year after the catastrophic Digg V4 launch, our last-ditch experiment to salvage the site showed a spark of hope. We’d cajoled our way into a Facebook beta that allowed us to publish each Digg user’s read articles into their Facebook newsfeed, sending every clicking frie … | Continue reading


@lethain.com | 4 years ago

Incident response, programs and you(r startup)

During an incident at Digg, a coworker once quipped, “We serve funny cat pictures, who cares if we’re down for a little while?” If that’s your attitude towards reliability, then you probably don’t need to formalize handling incidents, but if you believe what you’re doing matters … | Continue reading


@lethain.com | 4 years ago

An Elegant Puzzle: Systems of Eng Management

I wrote a book, An Elegant Puzzle, which will be available in late May, 2019. This is something I've been working on over the past year, and which I'm extraordinarily excited to share! | Continue reading


@lethain.com | 4 years ago

Expanding on S[a-z]{3,} Reliability Engineer roles

One of my foundational learning experiences occurred in 2014, when I designed and rolled out Uber’s original Site Reliability Engineering role and organization. While I’d make many decisions a bit differently if I could rewind and try again, for the most part I’m proud when revie … | Continue reading


@lethain.com | 4 years ago

Reclaim Unreasonable Software

Big Ball of Mud was published twenty years ago, and rings just as true today: the most prominent architecture in successful, growth-stage companies is non-architecture. Crisp patterns are slowly overgrown by the chaotic tendrils of quick fixes, and productivity creeps towards zer … | Continue reading


@lethain.com | 4 years ago

You only learn when you reflect

Early in your career, the majority of problems you work on are difficult because they are _new_ for _you_. You’ve never done it before, and it’s challenging to do good work on problems you’ve never encountered before. However, the good news is that there are other folks on your t … | Continue reading


@lethain.com | 4 years ago

A Forty Year Career

The Silicon Valley narrative centers on entrepreneurial protagonists who are poised one predestined step away from changing the world. A decade ago they were heroes, and more recently they’ve become villains, but either way they are absolutely the protagonists. Working within the … | Continue reading


@lethain.com | 5 years ago

Describing Fault Domains

Fault domains are one of the most useful concepts I've found in architecting reliable systems, and don't get enough attention. If you want to make your software predictably reliable, including measuring your reliability risk, then it's an extremely useful concept to spend some ti … | Continue reading


@lethain.com | 5 years ago

What's the Inverse of Literate Programming

This is an ongoing collection of various press from the An Elegant Puzzle book release. | Continue reading


@lethain.com | 5 years ago

Some Career Advice

One unexpected perk of publishing a book is that folks start to ask you questions about all sorts of loosely related things. One pretty common thread has been career advice, so I’ve written up most of my advice for easier reuse. Some of the ideas are a bit contradictory … | Continue reading


@lethain.com | 5 years ago

What I learned writing a book on engineering management

My favorite story from releasing An Elegant Puzzle is about preorders. After a few days of preorders, Amazon asks you to ship them a certain number of books that they’ll use to fulfill your orders. They originally asked us to ship some number of thousands -- exciting! -- then ask … | Continue reading


@lethain.com | 5 years ago

How to Invest in Technical Infrastructure

I'm speaking at Velocity on June 12th on 'How Stripe invests in technical infrastructure', and this is the rough outline of the content the talk will cover. I hope to see y'all there. | Continue reading


@lethain.com | 5 years ago

Notes on Structure and Interpretation of Computer Programs

Almost a decade ago, I bought a copy of Structure and Interpretation of Computer Programs. My purchase was inspired by folks calling it a great work, and I wanted to love it. In the decade since, I've started working through the book probably a dozen times, but never got too far. … | Continue reading


@lethain.com | 5 years ago

Infrastructure planning: users, baselines and timeframes

Technical infrastructure is never complete. System processes can always run with less overhead or be bin-packed onto fewer machines. Data can be retrieved more quickly and stored at a cheaper cost per terabyte. System design can broaden the gap between failure and user impact. Tr … | Continue reading


@lethain.com | 5 years ago

Introducing new specialized roles in your organization

Folks are sometimes surprised to learn that I started out working as a frontend engineer. I'd like to imagine it's because I'm so terribly knowledgeable about infrastructure, but I suspect it's mostly grounded in my unconscionably poor design aesthetic. Something that has stuck w … | Continue reading


@lethain.com | 6 years ago

Introduction to systems thinking

Many effective leaders I've worked with have an uncanny knack for working on leverage problems. In some problem domains, the product management skillset is extraordinarily effective for identifying useful problems, but systems thinking is the most universally useful toolkit I've … | Continue reading


@lethain.com | 6 years ago

Notes on “A Philosophy of Software Design.”

Jumping on the recent trend, I picked up a copy of A Philosophy of Software Design based on Cindy's recommendation. It's a fairly concise read at 160 pages, and I skimmed through it over the last few days, writing up some notes along the way. | Continue reading


@lethain.com | 6 years ago

Sizing engineering teams

I've come to believe that most organizational design questions can be answered by recursively applying a framework for sizing teams. Over the past year I've refined my approach to team sizing into a bit of a framework, and even changed my mind on several aspects, especially the v … | Continue reading


@lethain.com | 6 years ago

Digg's v4 launch: an optimism born of necessity

Digg was having a rough year. Our CEO left the day before I joined. Senior engineers ghosted out the door, dampening productivity and pulling their remaining friends. We had only one remaining shot at revival, launching our two-years-in-the-making rewrite: Digg v4. It did not go … | Continue reading


@lethain.com | 6 years ago
