James Shore: New

The Accountability Problem

October 18, 2025

This is a transcript of my keynote presentation at the Agile Cambridge conference in England on October 2nd, 2025. The topic was “The Accountability Problem.” How do we define software department accountability so our business partners don’t do it for us?

Introduction

The Accountability Problem

Demonstrating Accountability

Quantifying the Bet

The First Two Years

Conclusion

A picture of buildings at Cambridge University, taken from the water of the River Cam.

Thanks for having me. I’m very happy to be here in Cambridge. This is my first time visiting, so I spent the afternoon Tuesday doing some sightseeing, including a lovely ride down the River Cam. I was delighted to learn yesterday that I had Simon Wardley to thank for chauffeured punt rides, including the completely fictional story I was told about the Mathematical Bridge.

One of the things I love about Cambridge is its rich history. Of course, lots of history is important when you have...

A picture of the Chronophage, a large circular clock with a grasshopper-like monster at the top.

...this monster eating up every second.

That’s the Chronophage outside of Corpus Christi College, if you aren’t familiar with it, and much more impressive in person than in my terrible vertical picture with window glare.

A picture of a rapper holding a stop sign.

Before we get going, I should explain my context. You’ll hear a lot of advice at this conference, and how relevant that advice is to you has a lot to do with how closely the speaker’s context matches yours.

I’m currently VP of Engineering at OpenSesame, and for the 23 years prior to that, I was a consultant. As VP, and as a consultant, I specialize in late-stage startups: entrepreneurial organizations that were successful enough that they were able to grow. These are companies with a product mindset that value entrepreneurial thinking, but they’re also trying to grow up and be “real companies,” and they’re trying to figure out how to do that without losing their entrepreneurial edge.

So that’s the context of my material: entrepreneurial companies building software products that they sell. If you’re not in that situation, I encourage you to mine my talk for ideas, but don’t try to apply it blindly. And if you are in that situation... well, mine my talk for ideas, and don’t try to apply it blindly!

A picture of a rapper holding a stop sign next to a set of disclaimers (explained in the text).

A few more disclaimers. All the substantive content of this talk (the words, diagrams, examples, and so forth) was created with my actual meat brain, without any AI. Large images have been sourced from various locations, and are credited in the bottom left corner.

I’ve also dressed up some of the slides with decorative AI-generated images from ChatGPT 5, like that rapper holding a stop sign. If there’s one thing GenAI is good at, it’s embellishment.

I should also mention that, although I work for OpenSesame, I’m not speaking for OpenSesame. I created this talk on my own time, and I’m technically on vacation right now. The opinions I express are my own.

A grainy black-and-white picture of LP Hartley. He’s standing outside, smiling genially at the camera, and holding a pipe.

Anyway, as I was saying, one of the things I love about Cambridge is its rich history. I’m sure you’ve all heard several times by now that the university was founded back in 1209, by people fleeing [waves hand dismissively] the other university. In comparison, my home town of Astoria, Oregon, which is the oldest permanent settlement on the west coast of the US, was founded in 1811. I think that’s last Tuesday by British standards.

Part of the history surrounding Cambridge is this man: LP Hartley. He was born in Cambridgeshire in 1895, although he never went to Cambridge University. He went to... the other one. But, despite that choice, he went on to become a successful novelist.

A picture of the cover of the book, “The Go-Between,” by LP Hartley. The cover has a watercolor picture of a boy in the English countryside.

His most famous novel is “The Go-Between.” It begins with a wonderful opening line:

“The past is a foreign country: they do things differently there.”

A picture of the cover of the book, “The Past is a Foreign Country,” by David Lowenthal. The cover has a scene that could be set in the ancient middle-east with pillars of ruined structures in the background.

And that connects us back to Cambridge. Cambridge University Press published this book in 1985. It’s by David Lowenthal, and it created an entire sub-genre of history called Heritage Studies. It’s still in print today, in a revised edition.

The idea behind the book is that, although the past informs the present, the present also informs the past. Our thoughts and actions today extend from events that occurred in the past. But, at the same time, our understanding of the past is colored by our thoughts and actions today.

The past is a foreign country. They do things differently there. But we can’t visit the past. We can’t see what they did differently. We can only interpret what they’ve left behind.

Two pictures of medieval manuscripts featuring drawings of elephants. The elephants are oddly proportioned, with flaring trunks, boar-like tusks, hairy bodies, and (in some cases) no knees. A few of the elephants have castle turrets on their backs, with knights standing within.

And like medieval scholars drawing elephants they’ve never seen, we make those interpretations through the lens of our own biases.

I love these medieval drawings of elephants. They’re so delightfully strange.

But I’m not showing you these pictures to make a point about how difficult it is to draw an elephant when you haven’t seen one.

A picture of another medieval manuscript. It contains a realistic drawing of an elephant.

If you go back to Corpus Christi, where they have the Chronophage—not right now! I’ll start talking about software soon, promise. Anyway, at Corpus Christi, they have Matthew Paris’ Chronica Majora. It contains this drawing of an elephant. You might assume that it came much later, because it’s so much more accurate. But all of these drawings were created around the same time, in the 13th century.

A side-by-side comparison of the two styles of elephants.

It’s quite the difference, isn’t it?

I’m not showing you these images to make a point about medieval monks. I’m actually showing them to make a point about your biases. In the modern era, we expect images to be true to life. We have cameras that give us nearly perfect representations of the world. But realism isn’t what medieval monks were always trying to accomplish. Religion and metaphor were a central part of their lives, to a degree that I think we in the modern world have trouble understanding.

The elephants on the left aren’t really elephants. They’re a way of presenting a moral lesson about your place in the world. The image serves that story. It’s not there to teach you about elephants. It’s there to teach you about God.

So if your first reaction to these elephants was to laugh at those ignorant medieval monks... then perhaps you’ve fallen prey to your biases. The elephant doesn’t look like an elephant because the metaphor was more important than the reality.

The past is a foreign country. They do things differently there.

[beat]

The past informs the present, but the present informs the past. We can’t help but interpret it through the lens of our own experience, and those biases distort the reality of what it was actually like to live there.

This idea fascinates me, because it’s not only true of the past; it’s true of everything. Our biases and experiences influence so much of how we interpret the world.

A slide labelled “XP’s TDD.” It shows a loop of “Think”, “Red”, “Green”, “Refactor,” and then back to “Think.” There’s a smaller loop from “Refactor” back to “Green.”

I taught teams Extreme Programming for a few decades, as a consultant. Now that I’m VP of Engineering, I’m still teaching it, in a way. One thing that’s stood out to me over the years is that the people who struggle the most to learn XP are the ones who are more senior.

Junior developers have no problem! It’s the senior developers who struggle. They have too much baggage from their preconceptions.

A good example of this comes from Microsoft. XP was popular in the early 2000s, and practices like test-driven development, which come from XP, were entering the mainstream. So Microsoft published a set of “Guidelines for Test-Driven Development.”

There was a big backlash, and Microsoft took their guidelines down pretty quickly, because they got them terribly, ridiculously, horribly wrong. Microsoft didn’t actually practice XP, as far as I can tell, so they didn’t know that XP is a way of keeping software design simple and evolving it in response to customer needs. In XP, you don’t create your design in advance; you discover it as you go, and you focus on keeping it as simple as you can.

People who have practiced XP know that TDD is about tests and code evolving in step with each other, so that you learn as you go. A few lines of test code. See the tests fail. A few lines of production code. See the tests pass. A few improvements to the design. See the tests pass. A few more lines of test code. See the tests fail. And so on, and so forth, until the software is done, without following a preconceived path.
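To make that rhythm concrete, here’s a minimal Python sketch showing where the code ends up after two short passes through the loop. The function name `herd_label` and the elephant-counting example are hypothetical, invented purely for illustration. The comments record the order in which each piece would have been written.

```python
def herd_label(count: int) -> str:
    """Production code, grown a few lines at a time."""
    # Step 2 (green): the first version simply returned f"{count} elephants".
    # Step 4 (green): the singular case was added after the second test failed.
    # Step 5 (refactor): the noun choice was pulled into a single expression,
    #                    and both tests were re-run to confirm they still pass.
    noun = "elephant" if count == 1 else "elephants"
    return f"{count} {noun}"


def test_plural():
    # Step 1 (red): written first; it failed until herd_label existed.
    assert herd_label(3) == "3 elephants"


def test_singular():
    # Step 3 (red): failed against the first version, which said "1 elephants".
    assert herd_label(1) == "1 elephant"


# Run the tests directly; a test runner such as pytest would also find them.
test_plural()
test_singular()
print("all tests pass")
```

Nothing here was designed up front. The singular case only appeared because a test asked for it.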

A slide labelled “Microsoft’s ‘TDD’.” It shows a waterfall-style process. The details are explained in the text.

As with many companies past and present, the Microsoft way wasn’t to evolve their design; it was to come up with a software design in advance, then build to that preconceived design. And so they saw what Kent Beck and others had said about TDD and interpreted it in the only way they knew how: as a way of coming up with a software design in advance, and then building to that design. Their guidelines for TDD were to:

1. Gather the requirements for your new feature
2. Make a list of tests that will satisfy the requirements
3. File work items for the tests that need to be written
4. Generate all the interfaces and classes you’ll need (using Visual Studio, of course!)
5. Write all the tests
6. Write all the production code

I’m not exaggerating! This is what they actually said. Refactoring, iteration, learning as you go—key ideas of XP and TDD—nowhere to be found.

Microsoft’s approach to TDD was the exact opposite of what TDD was about. But they were only able to interpret TDD through the lens of their corporate approach to software development. And to this day, you see this same misunderstanding about TDD repeated by people who are steeped in up-front thinking.

XP is a foreign country. We do things differently here.

[beat]

As people, we can’t help but interpret the world through the lens of our own biases. But that means that we make assumptions about the world that aren’t true, and we can’t even recognize that we’re doing it. It’s not just the past that’s a foreign country... almost everything is.

That leads to problems. And in software, one of the biggest is...

A title slide reading “The Accountability Problem.” It’s presented in the style of an illuminated medieval manuscript.

...the Accountability Problem.

[beat]

A dramatic cinematic shot of a man typing frantically at a keyboard as he stares into the camera.

People who aren’t software developers have probably seen more “programming” in movies and TV shows than in real life. Those shows are filled with magical people who can “hack” anything off-camera and in moments.

“The ship’s going to ram us, captain!” “Quick, hack into their retro-encabulator and reverse the polarity of their thrusters!” (frantic typing, dramatic music, camera zoooooooom) “It just barely missed us! Hoorah!”

I only wish software development was that cool.

A screenshot of a simple BASIC program that uses GOTO to print “HELLO” onto the screen over and over again, along with its output.

Of course, people do know that’s fiction. Some of them might have even written code in school. But in school, people write small programs that fulfill an assignment and don’t have to be maintained.

An image of a robot programming a computer.

Or maybe they’ve vibe-coded an app using GenAI.

A slide showing all three previous images together.

None of these experiences bear any relationship to the modern world of software development.

TV show hackers are just another deus ex machina... quite literally. It’s lazy writing.

School projects don’t require long-term maintenance or large-scale coordination.

Unsupervised AI coding assistants feel magical, but they break down once you get past the prototype stage.

All of these things trick people into thinking that software development is about code. About hands on keyboard. But that’s not what it’s about at all.

A slide showing Kent Beck’s Extreme Programming Values: Communication, Feedback, Simplicity, Courage, and Respect.

To paraphrase Kent Beck, professional software development is about...

Communication and collaboration between large numbers of people with different perspectives.

Feedback loops that enable us to tell when we’re building the right thing, and the thing right... and when we’re not.

Simplicity, because it’s our ability to understand and change software that determines timelines and cost.

Courage to do the right thing even when it’s hard, and it’s often hard.

Respect for the people doing the work and the people affected by the work.

A slide labelled “Business Assumptions.” It’s marked “Incorrect” in bold red text. The contents of the slide are explained in the text.

We know that software development is a matter of discovery and coordination. But to our business partners, we’re a foreign country. They can only see us through the lens of their experience.

Their experience is that software development is about writing code, in the same manner that someone might do a homework assignment. It’s tedious, perhaps; time-consuming, maybe; but ultimately, a matter of buckling down and doing the assignment... following a straight path from here to there.

If you think this way—if you think that software development is like a big homework assignment—then you start making a bunch of assumptions.

You assume that you only need to define the assignment correctly to get the right answer.

You assume that the assignment has one right answer, and there’s a clear path to that answer.

You assume that people can tell you what that path is and how long it will take.

You assume that, when work isn’t getting done according to that schedule, it’s because people aren’t working hard enough.

And you assume that, when work’s behind, putting pressure on people will make them work harder and get it done on time.

Ultimately, you think software development looks like this [play animation]: a trip from point A to point B.

When in reality, it’s more like this [play animation]: a process of exploration and discovery, where the outcome isn’t known until you get there.

Software development is a foreign country. We do things differently here.

A slide labelled “Project-Based Governance.” It’s marked “Avoid” in bold red text. The slide shows three steps: 1. Build the plan; 2. Work the plan; 3. Track progress vs. plan. The slide defines “Success” as “On time, on budget, as specified.”

These misconceptions aren’t harmless. They extend deep into organizational structures. The biggest impact is on how software development is run in most organizations. Most organizations use project-based governance. You create a plan, then you work the plan. If you execute the plan properly, you’ll be successful, and you’ll finish on time.

In this environment, it’s management’s job to make sure that the plan is created correctly, worked correctly, and that people don’t slack off.

How do you know management is doing their job? What are they accountable for?

Delivering software on time and on budget.

It’s clean, it’s neat, it’s easy to understand, and it matches people’s misconceptions about software development.

And it results in bad software.

A slide showing a waterfall with four stages: “Analyze market,” “Define exactly what to build,” “Build it,” and “Profit.” The first two stages (“analyze market” and “define exactly what to build”) are labelled “wishful thinking” in bold red text.

The whole premise that we can define the assignment in advance is incorrect. Software development is a process of discovery—of iteration and refinement. We learn as we go, and that changes our plans.

This is an Agile development conference. You’ve heard it all before. I’m not going to belabor the point.

But our business partners haven’t heard it before, or if they have, it’s counter to their experiences. Like us seeing medieval pictures of elephants, like Microsoft with TDD, they can’t help but interpret the world through their own biases. And those biases lead to project-based governance.

In their minds, anything less... is a lack of accountability.

A slide showing four departments and what they’re accountable for. The contents of the slide are explained in the text.

So what can we do about this?

Ultimately, accountability is about being responsible for a set of results. At the executive level, everybody has to be accountable.

Marketing is responsible for generating leads for your Sales department. They say how many qualifying leads they’re going to create, and they’re accountable for having done so.

Partners also generates leads, or even sales, from people who are using complementary products and services. They’re accountable for bringing in partners, and for the revenue those partners generate.

Sales converts leads into paying customers. They’re accountable for the revenue generated by those customers.

Customer Success takes care of your customers. They’re accountable for retention, and for generating additional revenue from upsells.

Everyone is accountable for doing what they say they’ll do, including us in software development. But there’s something different about how everyone else is accountable. Did you notice?

For other departments, accountability is about the results they’re bringing to the organization, not the work they’re putting in. Sales isn’t saying, “we’re going to land customer X on date Y.” Everybody knows that sales take time, and things go sideways. So Sales says “we don’t know exactly which customers we’re going to land, or when, but overall, we’re going to generate X dollars of revenue.” Same for Marketing, and Partners, and Customer Success. We in software are the only ones who have to predict exactly what and when.

Our business colleagues aren’t unreasonable. They understand that things go wrong. But they also believe, deep in their hearts, that if you aren’t accountable, you won’t put forth your full effort.

And if we don’t define how we’re going to be accountable, they’ll do it for us, in the only way they know how. Which features are you going to deliver? When? If you don’t deliver them on time, you aren’t being accountable.

The same list of departments with their purpose removed.

We have to change the script.

So what should we be accountable for instead? What, exactly, do we do? What results do we create?

[beat]

We create new opportunities. Let’s say that the trajectory of your company is to grow its annual revenue by $10mm per year. Our job is to increase that rate of growth, to $12, $15, $20mm per year. Every time we ship a new feature, we should be increasing that rate of growth.

The same list of departments again, but now they’re larger. Sentences describing how product engineering helps them grow have been added. Those sentences are described in the text.

Our features should open up new markets, allowing Marketing to generate more leads.

We should provide useful APIs, allowing Partners to build new relationships.

We should respond to market trends, allowing Sales to convert more leads.

And we should fix the problems that get in customers’ way, reducing churn and increasing upsell.

What are we accountable for? We’re accountable for improving our companies’ trajectories. Every dollar invested into software development, other than keeping the lights on, should be reflected in permanent improvements to the value your company creates. That value may not be literal dollars or pounds; it may be helping to cure malaria or fighting climate change. But however you define value, the purpose of our work is to change that trajectory for the better.

A title slide reading “Demonstrating Accountability.” It’s presented in the style of an illuminated medieval manuscript.

It’s easy to say that we’ll be accountable for improving our companies’ trajectories. But how do we actually demonstrate that we’re doing so?

It’s nearly impossible to quantify the impact of any individual feature. It takes months to see an impact from a new feature, and even then, we can’t say that feature X resulted in change in behavior Y. Let’s say churn went down by half a percent. That’s great! Did it go down because of the feature we just released? Or because of a different one? Or is it more that interest rates just dropped and we hired an amazing new director for our customer success department?

This is why it’s tempting to look at when you’ll deliver a feature. It’s easy to measure.

An illustration of an elephant using a shovel to dig a hole.

But ultimately, features are a means to an end, not the end itself. There’s an old cliché that people don’t want a shovel, they want a hole in the ground. And they don’t want a hole in the ground, they want a building foundation. And they don’t want a building foundation, they want a nice big stable. And they don’t want a stable, they want war elephants that make their enemies say things like, “Carthago delenda est!”

When we talk about delivering features, we’re talking about shovels when we should be talking about striking fear into the hearts of Roman soldiers.

A slide labelled “product bet.” It reads, “Strike fear into the hearts of Roman infantry by fielding a battalion of war-capable elephants.”

So instead of talking about features, I’ve introduced a way of talking about value. At OpenSesame, we’re calling them “Product Bets.”

Before we go further, a quick disclaimer. The term “bet” is common among startups and other entrepreneurial organizations, so you’ll hear the phrase “product bet” from a lot of different people. Each of us is using it in our own way. So my use of “product bets” isn’t the same as what you might have seen from somewhere else.

Okay, so what do we mean when we say product bet?

Ultimately, it’s a strategic investment in a business result. It’s summarized with a single sentence that has two parts:

First, the business outcome: Strike fear into the hearts of Roman infantry!

Second, the means by which we do so: ...by fielding a battalion of war-capable elephants.

The result always comes first: strike fear. The mechanism comes second: war elephants. And even then, it’s high level. We need a stable, we need animal breeders and trainers, we need to train soldiers, we need a supply line. We need so many things, and not just software. Those are features. We don’t talk about features in our product bet. We keep it high level. Just the headline.

The same slide as before, but a new line has been added. It reads, “Sponsor: General Hannibal.”

Next, we need a sponsor. Who amongst our leadership team is going to advocate for this result? At OpenSesame, it’s usually our Chief Product Officer. But sometimes it’s our Chief Customer Officer, who’s in charge of sales and retention.

For the Carthaginians, of course, the sponsor is General Hannibal.

The same slide as before, but another new line has been added. It reads, “Present Value: 10,385,202 shekels.”

Next we talk about estimated present value. This is a core innovation. As I said, it’s nearly impossible to measure the impact of any feature, or even set of features. There are too many confounding factors.

So we don’t measure the impact. We estimate the impact.

My software department takes accountability for delivering estimated value, not measured value.

Now, that’s not to say that we don’t want to validate results. Jeff Patton talks about using Dave McClure’s Pirate Metrics to do so. I welcome and encourage that kind of validation. Ultimately, you have to decide if the bet was successful.

(Spoilers: Hannibal’s bet isn’t going to be as successful as he was hoping.)

But the key idea of these product bets is that you don’t have to measure value. You only have to decide if the bet was successful. If it was, we get credit for the estimated value, not the actual value, which saves us a lot of time and trouble.

Estimating value allows us to be accountable without predicting specific dates and features.

Remember that the head of Sales is accountable for delivering a certain amount of new business every year. Let’s say it’s 10 million dollars. They’re going to deploy a certain number of salespeople towards small-to-medium businesses, some towards mid-market, some towards enterprise. They’re going to conduct training and organize incentive programs. They’re going to get everybody fired up about how they need to sell, sell, sell! They’re going to monitor calls, check Salesforce, make sure people are following up.

But they’re not going to say, “Enterprise X is going to sign on date Y.” Because they can’t. The buyer’s going to go on vacation. Legal’s going to demand redlines. A year in advance, nobody knows when the contract will be signed, or if it will even be signed at all. But overall, they’ve got enough going on that they can say, “yes, we’re going to close $10mm in sales this year.”

The same is true of us. A year in advance, we don’t know which bets we’re going to do. We don’t know how much it’s going to cost to build them. We don’t know which ones are going to be successful and which ones are going to fail. But overall, we can say, “Yes, we’re going to deliver bets that are worth $10mm in estimated value this year.”
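That yearly tally can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ProductBet` class, its field names, the bet headlines, and all the numbers are hypothetical, and the only rule it encodes is the one described above, that a bet judged successful earns credit for its estimated value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductBet:
    headline: str
    estimated_value: float              # estimated present value, agreed up front
    successful: Optional[bool] = None   # None = still in progress; otherwise a judgment call


def credited_value(bets):
    """Sum the *estimated* value of the bets judged successful."""
    return sum(b.estimated_value for b in bets if b.successful is True)


def commitment_met(bets, commitment):
    """True if the credited value reaches the yearly commitment."""
    return credited_value(bets) >= commitment


# Hypothetical year: two wins, one failure, one bet still running.
bets = [
    ProductBet("Open a new market", 6_000_000, successful=True),
    ProductBet("Reduce churn", 5_000_000, successful=True),
    ProductBet("Partner API", 4_000_000, successful=False),
    ProductBet("Enterprise upsell", 3_000_000),
]

print(f"{credited_value(bets):,.0f}")      # prints 11,000,000
print(commitment_met(bets, 10_000_000))    # prints True
```

Note that nothing in the tally depends on which bets were chosen in advance, what they cost, or when they shipped.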

And that’s accountability.

[beat]

A picture of Eric Ries’ “Build-Measure-Learn” loop. It shows “Ideas” proceeding to “Build” proceeding to “Code.” “Code” proceeds to “Measure” proceeds to “Data.” Finally, “Data” proceeds to “Learn” back to “Ideas.” The center of the loop is labelled “Minimize total time through the loop.”

Wait a moment. “We don’t know how much it’s going to cost to build a bet?” How can we decide what to do if we don’t know how much it’s going to cost?

At this point, all we have is a headline. There’s no way for us to know how much it will cost, because we don’t know exactly what we’re going to build.

And if we’re doing Agile right, we will never know exactly what we’re going to build until after it’s done. As you all know, Agile software development is iterative and incremental. It’s a process of discovery.

I like Eric Ries’ characterization of this idea: we build, we measure, we learn, over and over again. And we don’t know what we’re going to do here [points at “build” step] until we know what happened here [points at “learn” step]. As long as we’re genuinely learning, we can’t know our costs in advance.

The product bet slide again, with another line added. It reads, “Maximum Wager: 5,000,000 shekels.”

What we can do, though, is put a maximum limit on how much we’ll spend. I call it the “maximum wager,” to continue with the betting theme. We track our spending, and if we’re not successful by the time we hit the limit, the bet has failed. We shut it down and move on to the next one. Or, at the very least, take a hard look at where things are at and decide on a new wager. As long as the total spending is less than the present value, it could still be a good investment.

The amount of the maximum wager is for your leadership team to decide. It’s not an estimate of cost. It’s a gut check about risk and value. The higher the value of the bet, the more you can wager. But you don’t want to wager so much that it would be crippling if the bet failed. Some bets will fail, and you’ll get nothing for your efforts. Success doesn’t mean fielding elephants. Success means winning a war with our elephants, and those Romans can be tricky.

The maximum wager is based on your leadership team’s gut feel of the risk and value involved. It’s not based on how much we think the bet will cost; it’s based on how much we’re willing to lose.
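As a rough sketch, the rule amounts to a simple check on spending. The function names and status strings below are hypothetical; the shekel figures are the ones from the slides, and the "stop and review" outcome covers both options mentioned above: shutting the bet down or deciding on a new wager.

```python
def check_bet(spent: float, maximum_wager: float, succeeded: bool = False) -> str:
    """Compare spending so far against the maximum wager."""
    if succeeded:
        return "success"
    if spent >= maximum_wager:
        # The wager is used up without success: the bet has failed,
        # or at least needs a hard look before anything more is spent.
        return "stop and review"
    return "keep going"


def new_wager_is_plausible(proposed_total_spend: float, present_value: float) -> bool:
    """A larger wager can only make sense while total spend stays below the value."""
    return proposed_total_spend < present_value


MAXIMUM_WAGER = 5_000_000    # shekels, from the slide
PRESENT_VALUE = 10_385_202   # shekels, from the slide

print(check_bet(2_000_000, MAXIMUM_WAGER))               # prints keep going
print(check_bet(5_000_000, MAXIMUM_WAGER))               # prints stop and review
print(new_wager_is_plausible(8_000_000, PRESENT_VALUE))  # prints True
```

The check says nothing about what the work will cost. It only enforces how much the leadership team agreed it was willing to lose.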

The build-measure-learn slide again.

And then we do our best to make sure that potential loss is minimized. We use the “Build, Measure, Learn” loop to validate whether the bet is going to be successful early on. Maybe one loop is focused on taking elephants up into the mountains to see how they handle the harsh conditions, and another loop dedicates a “red team” to see if the elephants can be spooked into fleeing during battle.

It turns out they can. It would be nice to discover that early, not in the middle of battle with the Romans.

Although we in software are accountable for estimated value, not actual value, we only get to take credit for successful bets. It’s in our interest, and everyone’s interest, to weed out the unsuccessful bets early, so we can spend more time focusing on the successful ones. And so, we should design our build-measure-learn loops to test for failure as early as possible.

The product bet slide again, with nothing added.

With value and a maximum cost, we can perform an apples-to-apples comparison between bets and choose the one that seems like the best one to do next. Often, that will be the one with the highest value.

But don’t be fooled by all these numbers! They’re just estimates and guesses. A smart leadership team will go with their gut, not just follow the numbers like robots. The numbers are there to feed a conversation: to get people thinking. They’re not there to substitute for experience and judgment.

An elaborate image, in a medieval style, of an elephant, rabbit, and man working a complicated machine.

The big question: Does this work?

For me, so far, yes. It took me nearly two years to get my leadership team to really engage with this approach, and I needed the strong support of my CEO and CPO to get there. My CEO, in particular, had to get pretty insistent before people would engage.

The fact is, putting together bets, even such high-level ones, takes work. It also makes people accountable by putting concrete numbers on previously vague statements about value. And despite everybody’s desire for other people to be accountable, most leadership teams I’ve worked with aren’t really looking to take on more accountability themselves.

But, thanks to my CPO and CEO’s support, I can say that we are building software using product bets. We identified a handful to take to the leadership team earlier this year. They estimated the value, then chose a specific set of bets for us to pursue based on our capacity. It’s definitely elevated our conversation around product strategy, and I can see it getting even better as we gain familiarity with the approach.

What we haven’t done yet is finish any bets. We just started our first formal bets this year. So I can’t yet tell you how it will turn out.

What I can tell you is that I’m getting a lot less pushback than I used to about features and dates. The conversation is focused on bets, not features and dates, and when we talk about what folks want from Engineering, it’s less about, “tell me when you’re going to be done,” and more about how we can take on more bets.

So, even though I haven’t yet used product bets to truly demonstrate accountability, they already seem to be helping.

Does it work? For me, so far, yes.

A slide summarizing product bets. It’s described in the text.

To summarize, we’re working on demonstrating accountability with product bets.

Specifically, we’re going to commit to delivering a certain amount of estimated value each year.

That estimated value comes from product bets. Each product bet is summarized by a headline that focuses on a business result, with a high-level description of how we’ll achieve that result.

The bets to pursue are decided by the leadership team, and each bet has a leadership sponsor who champions it within that team.

Bets have an estimated value, and we focus on the estimate rather than trying to prove out actual value.

The leadership team also defines a maximum wager for each bet, which is based on a gut feel of risk and benefits, not costs, and together with the present value allows us to perform apples-to-apples comparisons of the bets.

A title slide reading “Quantifying the Bet.” It’s presented in the style of an illuminated medieval manuscript.

At this point, you might be wondering: where does that “present value” number come from?

The answer, like all things in business, is spreadsheets. Magical spreadsheets filled with arbitrary guesses.

The secret to spreadsheets is that they make our guesses look official. Professional. Good Business-y.

But seriously, yeah, spreadsheets. Let me show you.

A slide showing a spreadsheet calculating present value from future value. It’s described in the text.

Let me start out by explaining what “present value” is, just in case some of you aren’t familiar with it.

The core idea of “present value” is that money—let’s say $10—is worth more today than it is tomorrow. Today, I can buy a couple of candy bars with $10. In a few decades, I’ll only be able to buy half a candy bar due to inflation.

This is called “the time value of money,” but it’s very simple: money today is worth more than money tomorrow.

What this means is that earning $10 today is better than earning $10 next year, and even better still than earning $10 in two years. If inflation was 20%, $10 in future value next year would be equivalent to $8.33 in present value today. $10 in future value two years from now would be equivalent to $6.94 today. And so forth.

Of course, inflation isn’t 20%, thank goodness. But when your company makes an investment, they expect a certain return on that investment. The return they expect is called “cost of capital.” Your leadership team will tell you the cost of capital to use. It’s based on their judgment of how much they could get from using the money on other investments along with an adjustment for risk. For these examples, I’m arbitrarily choosing a 20% cost of capital.

The neat thing about cost of capital is that you can wager your entire present value and still get a good return on investment. As long as the bet is successful, even if you spend all of the present value, you’re still making money.

If you ask me for an investment and promise to return $10 to me today, $10 next year, and so on for the next three years, you’ll return $40 total. If my cost of capital is 20%, then I can look at the present value of each of those returns. It’s $10 today, $8.33 next year, $6.94 the following year, and so forth. Adding up those future returns gives me the total present value, which is $31.06, which means that I can invest up to $31.06 and still get at least a 20% return on my investment.
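Here is a minimal sketch of that arithmetic in Python. The function names are made up for illustration, and it assumes the first payment arrives today (year 0) and so is not discounted.

```python
def present_value(future_value, years_from_now, cost_of_capital):
    """Discount a single future payment back to today's money."""
    return future_value / (1 + cost_of_capital) ** years_from_now


def total_present_value(future_values, cost_of_capital):
    """Sum the present values of yearly payments, starting with year 0 (today)."""
    return sum(
        present_value(fv, year, cost_of_capital)
        for year, fv in enumerate(future_values)
    )


# $10 today, then $10 in each of the next three years, at a 20% cost of capital.
returns = [10, 10, 10, 10]
for year, fv in enumerate(returns):
    print(f"year {year}: ${present_value(fv, year, 0.20):.2f}")
# year 0: $10.00, year 1: $8.33, year 2: $6.94, year 3: $5.79

print(f"total: ${total_present_value(returns, 0.20):.2f}")  # total: $31.06
```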

A slide labelled “Present Value Components.” The components are “Sales to new customers,” “Upsell of existing customers,” “Retention,” “Cost savings,” and “Expenditures.”

Okay, so that’s what present value is. Now, how do we determine what numbers to use?

As I said before—spreadsheets and guesses. You build a financial model that makes guesses about the future.

I’m going to share the model I used, but I have to be honest: I had a lot of trouble getting my leadership team to engage with product bets at first. In order to get this off the ground, I had to provide the financial model myself... and honestly, I think it could be a lot better.

We have a new CFO at OpenSesame, so I showed him the model I’m about to show you. He said, and this is a direct quote, “it’s an okay framework to start.” He also said, “come talk to me early when you start on the next set of bets.”

So, yeah. Thank you for coming to my okay talk. I’m sure it will be better next year.

In all seriousness, our CFO liked the general idea of product bets, and the categories I was using. He just thinks he can make the specifics more rigorous, which is great, and I’m looking forward to his help.

The fact is, it doesn’t really matter if the model is accurate or not. The important thing is to get people to engage with value rather than cost and dates being the primary driver of decision-making. You can use a rough, back-of-the-envelope model to get started. That’s what I did. As long as you’re consistent with your approach across bets, it’s still useful.

With that said, our product bets are broken down into five sections. Each one has its own little present value calculation.

There’s Sales, which represents the money we make from new customers as a result of the bet.

Upsell, which is the money we make from existing customers as a result of the bet.

Retention, which has to do with the fact that we sell subscriptions. Once we make a sale, we keep making money from that customer every year, so long as we can retain them. This is typical in the modern software-as-a-service world. So retention is a very important number.

Cost savings is reduction in spending, which counts as value, because spending $5 less on candy each year means I have $5 more in my pocket.

And then expenditures, which is additional spending we’ll incur as a consequence of the bet. For example, maybe I spend $5 less on candy each year, but I have to spend $1 every year on a budget tracking app that reminds me not to waste money on candy.

A logo drawn in a pseudo-medieval style. It reads, “War Elephants as a Service.” It shows an elephant with a castle turret on its back. The elephant has two trunks.

To illustrate these ideas, let me introduce you to my new employer: War Elephants as a Service.

We’re your one stop shop for all elephant-related warfare. We take care of the elephants, so you can take care of the invasion! Look at our glowing testimonials from top customers: Carthage... and Rome! Business is good. Or at least, it was. There’s not much demand for war elephants these days.

PS: Apologies for the mutant two-trunked elephant in the logo. Our ex-CEO tried to solve our financial problems with cost-cutting, so he replaced all of our graphic designers with AI. His last words as he was escorted out the building were, “I’ve made a terrible mistake.”

A slide showing the headline and sponsor for a product bet. It’s decorated with a cute baby elephant. The headline of the bet reads, “Open up new markets and improve retention with family-friendly elephant activities.” The sponsor is Babar, the CEO.

But we have a new CEO now! Babar is our new “Chief Elephant Officer,” and he has an idea for keeping our business relevant in today’s fast-paced world. Since nobody seems to want war elephants any more, we’re going to switch from “war elephants” to “more elephants!” Elephant parades! Elephant-themed merchandise! And especially, cute baby elephants! Nothing says “more elephants” like an adorable fuzzy pachyderm.

Specifically, we’re going to open up new markets and improve retention by introducing family-friendly elephant activities. That’s our bet.

A spreadsheet labelled “Sales to New Customers.” It has four main rows: “Service Obtainable Market,” “Sales Rate,” “Future Value,” and “Present Value.” At the bottom are cells labelled “Cost of Capital” (which is set to 20%) and “Total Present Value.” The numbers are described in the text.

To quantify this bet, we’re going to look at the five categories I mentioned before: Sales to new customers, upsells to existing customers, retention, cost savings, and expenditures.

[points to “Service Obtainable Market” row] For new sales, we’re going to look at the “service obtainable market,” which is the total size of the market that we can reach for family-friendly elephant activities. Let’s say it’s 100 million dollars at the end of the first year, and grows over time as word gets out.

[points to “Sales Rate” row] Next, we’re going to estimate how much of that market we can capture. We face competition from zoos, but nobody has quite the expertise deploying large numbers of elephants that we do, so we’re going to say we can sell into 1% of the market, and that will also grow over time.

[points to “Future Value” row] Multiplying the service obtainable market by our sales rate of 1% gives us the amount we expect to make each year in future dollars. [points to “Present Value” row] Then we apply our present value formula at a 20% cost of capital and [points to “Total Present Value”] add it all up to get a total present value of nearly $5 million from new sales.

It all looks very official, doesn’t it? But how do we know it’s a 100 million dollar market? How do we know we can sell into 1% of it?

We don’t! It’s guesses. Educated guesses, maybe, but ultimately... guesses. That’s how these things work, and that’s why you need your leadership team to get involved. You can make your models more and more rigorous, but at the end of the day, somebody’s making their best guess, and those guesses should be overseen by the people in charge of those departments.
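The sales sheet described above boils down to a multiply-then-discount loop. The sketch below is one way to write it, not the actual spreadsheet. Only the first-year market ($100 million) and sales rate (1%) come from the talk. The later-year numbers are hypothetical placeholders, and the sketch assumes each year’s revenue lands at the end of that year, so year 1 is discounted once. Because the growth figures are invented, the total will not match the slide.

```python
def category_present_value(markets, sales_rates, cost_of_capital):
    """Present value of one category: market size times sales rate, discounted per year.

    markets[i] and sales_rates[i] describe year i + 1 (end-of-year cash flows).
    """
    total = 0.0
    for year, (market, rate) in enumerate(zip(markets, sales_rates), start=1):
        future_value = market * rate  # the "Future Value" row
        total += future_value / (1 + cost_of_capital) ** year  # the "Present Value" row
    return total


# Year 1 values are from the talk; years 2 and 3 are hypothetical guesses.
markets = [100_000_000, 110_000_000, 120_000_000]
sales_rates = [0.01, 0.015, 0.02]

print(f"${category_present_value(markets, sales_rates, 0.20):,.0f}")
```

The upsell and retention sheets would reuse the same function with different inputs, which is part of what keeps the bets comparable with each other.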

A spreadsheet labelled “Upsell to Existing Customers.” It has the same structure as the previous spreadsheet, but the numbers are different.

Next, we look at upsell. How many of our existing customers can we convince to try our new family-friendly elephant activities?

[points to “Service Obtainable Market” row] As before, we start with the total market that we can reasonably reach. This is the amount we think that our existing customers would be willing to spend on our new offering. In our case, it turns out our customers aren’t actually using their war elephants for war, but for things like parades. We think there’s a good $25 million to be made from our existing customers, and we don’t expect that to change much over time. To be clear, that’s not what we make from our existing customers, it’s the extra amount we think they’d pay for our new service.

[points to “Sales Rate” row] Then we look at our sales rate for that market. Given that our customers are already using their elephants for parades, we think they’re going to be pretty receptive to us providing services to support them. We estimate that we’ll be able to convert 5% of the upsell market, and that number will also grow over time.

[points to “Total Present Value”] Multiply the numbers, apply the present value formula, and we have a total upsell value of $6.3mm.

A spreadsheet labelled “Retention.” It has nearly the same structure as the previous two spreadsheets, but the first two rows are labelled “Customer ARR” and “Retention Change.”

Now let’s talk about retention. Our retention numbers have been pretty bad. As I said, countries don’t really need war elephants any more. [points to “Customer ARR” row] But we still have hundreds of millions in recurring revenue, even though it’s going down each year. That’s the ARR line: annual recurring revenue.

[points to “Retention Change” row] By pivoting from a focus on war to a focus on the military parades our clients are actually using elephants for, we think we can stem the bleeding a bit. Not much... about a quarter of a percent each year, going up slightly over time.

[points to “Total Present Value”] Multiply, present value, and there you have it. Three and a half million.

A spreadsheet labelled “Cost Savings.” It’s similar to the previous spreadsheets, with two major differences: the first two rows are labelled “Work Eliminated” and “Expenses Eliminated,” and the “Future Value” row adds those two rows together rather than multiplying them. All the dollar values are zero.

What about cost savings? [points to “Work Eliminated” row] Is this bet going to eliminate any of the existing work our employees do? Not really. [points to “Expenses Eliminated” row] Is it going to eliminate any expensive software subscriptions or other expenses? No, probably not.

[points to “Total Present Value”] Normally, we’d add up the cost savings and apply the present value calculation, but the numbers total out to zero in this case.

A spreadsheet labelled “Expenditures.” It only has two rows: “Future Value” and “Present Value.” (Like the other spreadsheets, it also has a “Cost of Capital” cell, which is set to 20%, and a “Total Present Value” cell.) All the numbers are negative.

And finally, expenditures. How much more are we going to spend as a result of this bet?

Well, there’s the cost of developing the bet itself, which is our wager, but we’ll bring that in later. In this section, we’re looking at the ongoing costs of running the program. [points to “Future Value” row] I’m going to hand-wave that a bit—you might have multiple line items here normally—but let’s just say it’s $2mm per year, going up as the program becomes more popular. Elephants aren’t cheap.

[points to “Total Present Value”] Present value, etc., gives us a total of $8.5mm in expenditures.

A summary spreadsheet labelled “Net Present Value.” This has a completely different structure from the previous sheets. It adds up the total present value of the previous bets, then the wager, to arrive at a net present value. The numbers are described in the text.

Bringing it all together, we have $5mm in new sales, $6.3mm in upsell, $3.5mm in improved retention, $0 in cost savings, and $8.5mm in expenditures. That comes to a total present value of $6.3mm before our development costs.

Now, how much do we want to wager on development? The leadership team thinks this is a slam dunk, and a way to save the business, so they’re going to wager nearly all of the value. Five million dollars. Remember, using cost of capital to determine present value means that we could wager the entire present value and still come out ahead... if the bet is successful.

That said, bets still have a risk of failure. Our leadership team is making some assumptions about how much people will be excited about baby elephants, so we’ll want to work incrementally and iteratively to test their assumptions early.

To summarize, the present value of the bet is based on sales to new customers, upsell to existing customers, change in retention, cost savings, and non-development expenditures related to those benefits.
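As a rough sketch, the roll-up on the summary sheet is just addition and subtraction. The figures below are the rounded numbers from the talk, in millions of dollars. The dictionary keys and function name are made up for illustration, and the net figure after the wager is plain subtraction rather than a number quoted in the talk.

```python
def net_present_value(category_values_mm, wager_mm):
    """Return (total present value, net present value after the wager), in $mm."""
    total = sum(category_values_mm.values())
    return total, total - wager_mm


# Rounded figures from the talk; expenditures count against the bet.
bet = {
    "sales": 5.0,
    "upsell": 6.3,
    "retention": 3.5,
    "cost_savings": 0.0,
    "expenditures": -8.5,
}

total, net = net_present_value(bet, wager_mm=5.0)
print(f"total present value: ${total:.1f}mm")  # total present value: $6.3mm
print(f"net of wager: ${net:.1f}mm")           # net of wager: $1.3mm
```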

The product bet slide again (the one with the cute baby elephant). The present value and wager numbers have been added to the headline and sponsor from before.

And that’s how we come up with the numbers in the product bet. To bring it back around, we’re betting that we can open up new markets and improve retention with family-friendly elephant activities. Babar is the sponsor for this bet and he thinks it’s worth $6mm in present value, and he’s willing to spend up to $5mm to try to make it work.

To calculate the value of those categories, we took a back-of-the-napkin approach where we estimated the size of the market and our ability to sell into that market. There’s certainly room for more rigor, and I encourage you to talk to your finance team about how to improve the model.

But do remember that it’s all still guesses at the end of the day. It’s better to have some model now than to wait for a perfect one. The real benefit is in shifting the conversation from features and dates to being accountable for value.

We may be a foreign country, but we can still speak our business partners’ language.

[beat]

But how do we get them to talk to us?

A title slide reading “The First Two Years.” It’s presented in the style of an illuminated medieval manuscript.

A leader I respect once told me, “You have 18-24 months after becoming VP of Engineering to make a difference. After that, the organization’s problems become your problems.”

I think he was right on target. As a leader, your colleagues in other departments will reserve judgement for the first six months or so. They’ll get impatient over the course of the next year. By the end of two years, they’ll be holding you accountable. If you don’t define what that looks like, they’ll define it for you, and they’re going to default to features and dates.

The problem with product bets, as an idea, is that they require leadership participation. You can’t create these spreadsheets on your own. Even if you did, nobody’s going to pay attention if you don’t have their buy-in. I’ve tried variants of the product bet idea many times over the years and getting that participation has been extraordinarily difficult. I’m a little surprised we’re able to do it at OpenSesame, to be honest.

Before you can get people to buy in to your definition of accountability, you need them to trust you. And in order for them to trust you, you need to be accountable.

A slide labelled “My Journey.” It has five steps: 1. Product-centric teams with FaST; 2. Results focus with VIs; 3. Reliability with forecasts; 4. Visibility with cost tracking; 5. Ongoing push for product bets.

I’m not sure how to solve this chicken-and-egg problem for your organization. I can tell you how I solved it for mine. Any change you introduce has to be in the context of your specific situation, so I’m not saying that you should do it my way. Some of my changes were pretty radical, and they’re not going to be a good idea for every situation.

We don’t have time to go into every detail, so this is going to be more of an overview than a how-to guide. I’ll provide resources for further investigation.

A QR code labelled “FaST.” The QR code’s URL is linked in the text.

QR Code: FaST: An Innovative Way to Scale

When I joined OpenSesame, I started by getting the lay of the land and deciding what to do. One of the things I saw was that the teams were heavily siloed by technology area, rather than by product line. Cross-team delays weren’t too bad, although they often can be in this situation, but it did mean that teams’ work didn’t line up to our business needs. So the first thing I did was to introduce Quentin Quartel’s Fluid Scaling Technology, or FaST.

We don’t have time to discuss FaST today, but you can learn more about my approach to it by following this QR code. The short version is that we combined teams into product-centric “collectives” and created a single queue of work for each collective. Each product has a dedicated collective and work queue. Those collectives self-organize into teams as needed to tackle the highest priority work.

FaST solved the problem of teams not matching business needs. A related problem was that the teams planned their work in terms of technical priorities rather than business results. They called them “stories” and “epics,” and recorded them in Jira, but they were more like technical tasks. At the same time that I introduced FaST, I also introduced the idea of “Valuable Increments” from my book. (In case it’s not clear on the slide, my book is The Art of Agile Development, and it’s now available in a second edition. You can find this material in the “Adaptive Planning” section.)

A valuable increment is a similar idea to an epic, in that it groups together multiple stories, but an “epic” is literally a “big story.” A valuable increment isn’t focused on size; it’s focused on value. Each VI is something that stands alone. When it’s done, you can release it, and you’ll have gotten value out of it even if you never work on anything related to it ever again.

Introducing FaST and VIs allowed me to talk in terms of the business results my teams were creating for each product line, not just their technical accomplishments.

I also knew, from experience, that one of my biggest battles was going to be around estimates and forecasting. Before I could gain the trust of the organization, I needed to be able to demonstrate that I could do what I said I would. Up to this point, their experience of software development was that we never delivered on time. At the same time, I didn’t want people to over-focus on features and dates.

So I played a game that, to this day, I’m not sure was the right approach. I had my engineering managers start collecting data so we could provide more accurate forecasts. While they did that, I told teams to stop providing estimates to stakeholders.

This caused a lot of anger in my stakeholders. They didn’t like hearing that they couldn’t have estimates. I told them that our estimates weren’t accurate, and we were working on getting better information, but they still didn’t like it. I think I only got away with it because there had been high-profile failures with the old approach, and I was still in my honeymoon period, but it still caused a lot of friction.

It worked out in the end, I think, because the new forecasts really are much more reliable, but I had to collect data for about six months before I could provide the new forecasts. That was an uncomfortable period. I could have kept the old approach to forecasting, but it definitely didn’t work. I’m not sure if “wrong estimates” would have been better than “no estimates.” On the one hand, a clean break meant that it was obvious that I had switched to a new approach, and—as I said—it really works. On the other hand, I made some important members of the leadership team angry in the meantime.

Anyway, the way it works is that we get a “wisdom of the crowd” estimate for each VI before work starts. That involves a product manager providing a very brief description of what the VI involves: just a minute or two of verbal explanation. People can ask clarifying questions, but there usually aren’t many. Then everyone provides their gut feel of how long the work will take a team to accomplish, in weeks. We collect the answers without discussing them and record the median response. That’s the estimate. It only takes a few minutes per VI. Since our collectives have between 12 and 25 people, including managers, product managers, and designers, there are enough people to make the “crowd” part of “wisdom of the crowd” work.
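The recording step is small enough to sketch in a few lines of Python. The function name and the sample responses are hypothetical. The only part taken from the talk is that the estimate is the median of everyone’s undiscussed gut-feel answers, in weeks.

```python
from statistics import median


def crowd_estimate(gut_feel_weeks):
    """Return the median of the collected gut-feel answers, in weeks."""
    if not gut_feel_weeks:
        raise ValueError("need at least one response")
    return median(gut_feel_weeks)


# Hypothetical answers from a 12-person collective for a single VI.
responses = [2, 3, 3, 4, 4, 4, 5, 6, 8, 12, 3, 4]
print(f"estimate: {crowd_estimate(responses)} weeks")
```

Using the median rather than the mean means a single very pessimistic answer (the 12 above) doesn’t drag the estimate upward.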

Our Wisdom of the Crowd estimates are stunningly accurate. The median estimate for a VI actually matches the median reality. It’s amazing. The approach comes from Quentin Quartel and his FaST method, and I’ve never seen anything so good. It’s easy and it’s accurate.

However, although Wisdom of the Crowd estimates are accurate, in aggregate, they’re not very precise. We graph estimates versus actuals—you can see it on the right there. About 30% of VIs take twice as long as estimated, and about 30% take half as long as estimated. That’s a pretty big range.

So we don’t present the raw estimates to stakeholders. If we did, we’d be late half the time. Instead, we increase the estimate so we’re early more often than we’re late.

Doing this requires me to play a political balancing act. According to our data, never being late would require us to multiply our estimates by six or seven, and that wouldn’t fly. We can’t tell them that a small, two-week VI is going to take 3-4 months. On the other hand, it’s also not acceptable to be late half the time.

Right now, I’ve chosen to be 75% accurate. In other words, we’re early 75% of the time and late 25% of the time. For us, that’s about a 2x multiplier, depending on the team. I’ve also told stakeholders to expect about 1 in 4 VIs to go longer than expected. So far, it’s working well.

If you’d like to know more about the analysis behind this technique, it’s in my book in the “Forecasting” section.
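One way to pick a multiplier like that from historical data is to take a percentile of the ratio of actual to estimated duration. The sketch below uses a simple nearest-rank percentile. It is an assumption about how such a calculation could work, not the analysis from the book, and the ratios are hypothetical values chosen to echo the figures in the talk.

```python
import math


def multiplier_for_confidence(ratios, confidence):
    """Smallest historical actual/estimate ratio that covers `confidence` of past VIs.

    Uses the nearest-rank percentile method on the sorted ratios.
    """
    if not ratios:
        raise ValueError("need at least one historical ratio")
    ordered = sorted(ratios)
    rank = max(1, math.ceil(confidence * len(ordered)))
    return ordered[rank - 1]


# Hypothetical history: actual weeks divided by estimated weeks, one entry per VI.
ratios = [0.5, 0.5, 0.6, 0.8, 1.0, 1.0, 1.2, 1.5, 2.0, 2.0, 2.5, 6.0]

multiplier = multiplier_for_confidence(ratios, 0.75)
raw_estimate_weeks = 2
print(f"multiplier: {multiplier}x")                                # multiplier: 2.0x
print(f"forecast: {raw_estimate_weeks * multiplier} weeks")        # forecast: 4.0 weeks
print(f"never late: {multiplier_for_confidence(ratios, 1.0)}x")    # never late: 6.0x
```

Raising the confidence level pushes the multiplier toward the worst case in the history, which is the trade-off behind choosing 75% over 100%.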

A slide labelled “Visibility,” with a graph showing how effort is spent over time. The graph has five sections: “Value Add,” “Bugs,” “Routine Maintenance,” “On Call & Incidents,” and “Deferred Maintenance.” The “Value Add” section is in blue, and is a small portion of the overall time. The other sections are in grey.

Collecting all that data for forecasting had a side benefit. My CEO pushed me to report productivity—that’s a whole ’nother story—and I decided to do it by reporting the percentage of time spent on muda versus the percentage of time spent on adding value to the business. Muda is activity that doesn’t add value. It’s the grey sections in the graph: maintenance, bugs, and on call.

This isn’t the real graph, for confidentiality reasons, but the story it tells is all too familiar: lots of time spent on deferred maintenance, lots of time spent on incidents, lots of bugs. And then just a fraction of time left over for doing valuable work.

I shared the real version of this graph with my leadership team and it was eye opening. All of a sudden, they understood exactly why things took so long, and why they didn’t ever get what they wanted. They had thought we had way more capacity than we actually did.

I told them that my responsibility was to reduce muda—the grey part—and make more room for valuable work—the blue part. That was an act of deliberate accountability, and it flipped the script. Yes, people still wanted me to be accountable for making teams deliver feature X on date Y, with all the fighting about deadlines that involves, but even more importantly, and primarily, I was accountable for decreasing muda. That’s precisely what I needed to be focused on, because that was our biggest problem.

And, over the past two years, that’s exactly what I’ve done. I report on my progress every quarter, and every quarter it’s a little bit better than it was before. And every quarter, I get a little bit less pushback on predicting dates.

A slide labelled “Push Push Push.” It has images of two books: “Fearless Change: Patterns for Introducing New Ideas,” and “More Fearless Change: Strategies for Making Your Ideas Happen.” Both are by Mary-Lynn Manns, Ph.D., and Linda Rising, Ph.D.

And then, finally, I just kept pushing. These two books are excellent resources on how to do so.

I introduced the original variant of the product bet idea in January 2024, or maybe even earlier. It didn’t go anywhere. I brought it up again in March 2024. We sort of tried it, without leadership buy-in, and it sort of fizzled. I brought it up again, and again. I worked with my colleague, the VP of Product. I talked to the Chief Product Officer. I included it in a presentation to leadership about how Agile works. I piggybacked on the CEO’s passion for quantifying results. I stopped asking Leadership to create financial models and just created my own, then asked them to fill in the values. (That’s why they’re not very rigorous.)

And then finally, in March of 2025, the stars aligned. The CPO started pushing the rest of the leadership team to get involved. We created five product bets, the leadership team filled in my spreadsheet, and we started working on the first bet. And now we’re off to the races. We just started our second bet a few months ago, and we’re talking about how to increase capacity for more bets.

There’s a lot more to do, and lots more to learn, but now that the logjam has broken, I think it’s going to stick. Our new CFO is intrigued and I’m able to show steady progress with my VIs and forecasting techniques. I’m well on my way to erasing the stigma that engineering can’t be trusted to deliver. I had 18-24 months to make a difference. I’ve just passed my 2nd year at OpenSesame, and I’m still here. I think it’s going to work out.

A title slide reading “Conclusion.” It’s presented in the style of an illuminated medieval manuscript.

Software development may be a foreign country to the rest of the business, but we can still be a trusted part of their empire.

To do so, we have to take accountability, rather than allowing it to be forced upon us. Rather than falling into the habit of delivering X features on Y date, we can be accountable for what really matters: results, just like our colleagues in sales, marketing, and other parts of the business. And the results we create are new opportunities. Enabling more prospects. New partners. More leads. Better retention.

Product bets allow us to be accountable for the estimated value of those results. So far, they’ve been working for me. I hope they work for you, too.

The Best Product Engineering Org in the World [video]

April 18, 2025

I gave the opening keynote at the Regional Scrum Gathering Tokyo conference on January 8th, 2025. My topic was “The Best Product Engineering Org in the World:”

How do you create the best product engineering organization in the world? James Shore had to face that question in his new role as Vice President of Engineering. In the end, it came down to six answers, and six new questions:

People. We’d have the best people in the business, and we’d be the best place for them to work. But how do we compete with the reputation and money of Google, Apple, and Facebook?

Internal Quality. Our software would be easy to modify and maintain, with no bugs and no downtime. But what do we do about the software that’s not?

Lovability. Our customers, users, and internal consumers would love our products. But how do we decide what to work on?

Visibility. Our internal stakeholders would trust our decisions. But what do we do about their frustrations?

Agility. We’d actively seek out new opportunities and change directions to take advantage of them. But how do we create the technical ability to do so?

Profitability. We’d be the engine of a profitable and growing business. But how do we turn our engineering work into profit?

In this keynote, we’ll explore these six topics, the answers to these questions, and how they tie together to create the best product engineering org in the world.

Read the transcript here.

“Decoding Leadership” Podcast

April 18, 2025

Jade Rubick interviewed me for his “Decoding Leadership” podcast recently. We had a delightful conversation about scaling organizations. We talked about using FaST for increasing the size of teams, the role of management, and scaling development practices using player-coaches. It’s an engaging conversation that’s chock-full of interesting ideas. Take a look.

Upscale and Team Self-Selection

February 9, 2025

Brent Miller:

In April we would announce what we were doing for the conference in October... the year before Upscale we got zero done, which was just a terrible situation for us to be in.

...We started working on [Upscale] in the fall after the whiff on our conference, knowing we were going to roll it out in the spring—probably after we made the commitments for the fall—and we were going to have to do the organizational transformation, do all the change management, have everybody look around at the new world and be like, “How the heck does this work?” ...and still produce enough work to land at the conference and have a good talk.

And so the punchline is, we did the Upscale rollout, which culminates in the big self-selection event. We did that in April or May—I believe it was May. And then by October, we not only delivered all the big items, but a couple of extras that were not on the list.

...From one year to the next, we went from hitting zero of the big three to hitting at least the top ten plus extras, having [also] spent all the time and effort to do the transformation.

It was just a mind-blowing result at the end.

In this video, Jade Rubick and Brent Miller talk about Project Upscale, an intensive effort I led to reduce gridlock at New Relic. They look back at what worked, what didn’t work, and what they’ve learned. There are a lot of gems in here. Take a look.

The Best Product Engineering Org in the World

January 10, 2025

This is a transcript of my keynote presentation for the Regional Scrum Gathering Tokyo conference on January 8th, 2025. Watch the video here.

Introduction

People

Internal Quality

Lovability

Visibility

Agility

Profitability

“How are you measuring productivity?”

It was September 2023 and my CEO was asking me a question.

“How are you measuring productivity?”

It was September 2023, my CEO was asking me a question, and my position as Vice President of Engineering was less than three months old.

“How are you measuring productivity?”

It was September 2023, my CEO was asking me a question, my position was less than three months old, and I didn’t have an answer.

So I told the truth.

“How am I measuring productivity? I’m not. Software engineering productivity can’t be measured.”

It’s true! The question of measuring productivity is a famous one, and the best minds in the industry have concluded it can’t be done. Martin Fowler wrote an article in 2003 titled “Cannot Measure Productivity.” Kent Beck and Gergely Orosz revisited the question 20 years later. Kent Beck concluded, “Measure developer productivity? Not possible.”

My favorite discussion of the topic is Robert Austin’s, who wrote Measuring and Managing Performance in Organizations. He says a measurement-based approach “generates relatively weak improvements” and “significant distortion of incentives.”

How do I measure productivity? It can’t be done. At least, not without creating a lot of dysfunctional incentives.

But this isn’t a talk about measuring productivity. This is a talk about what you do, as VP of Engineering, when somebody asks for the impossible.

[turn right] “How are you measuring productivity?” [turn left] “I’m not. It can’t be done.” [turn right] “You’re wrong. I don’t believe you.”

I don’t... respond well to that sort of flat dismissal. I said some things that you’re not supposed to say to your CEO.

It was September 2023, my position was less than three months old, and it didn’t look like I was going to make it to the end of month four.

[beat]

Luckily, my CEO’s actually a pretty reasonable person. Our company is fully remote, so he invited me to come to his house next time I was in his city so we could discuss it face-to-face.

That gave me a month to cool off and think about what I wanted to say. I had an impossible—or at least, dangerous—request: measure productivity. Given that I couldn’t give my CEO what he wanted without creating dysfunction in engineering, what could I give him?

That’s what this talk is really about.

A slide with the text “The Best Product Engineering Org in the World” written in a bold white font on a black background.

The CEO, chief product officer, chief technical officer, and I met a month later. I said, “If we had the best product engineering organization in the world, what would it look like?” I walked them through an exercise right there on the CEO’s dining room table. It had a lot of index cards and sticky notes... of course!

A picture of a dining room table covered in cards and sticky notes.

In the end, we came up with six categories. Imagine we’re the best product engineering org in the world. What does that look like?

A slide with six categories written in a bold white font on a black background. The categories are: People, Internal Quality, Lovability, Visibility, Agility, and Profitability.

For us, it means these six things.

People. We’d have the best people in the business, and we’d be the best place for them to work. They’d beg to work for us, and people who left would try to replicate their experience everywhere they went.
Internal Quality. Our software would be easy to modify and maintain. We’d have no bugs and no downtime.
Lovability. Our customers, users, and internal consumers would love our products. We’d excel at understanding what stakeholders truly need and put our effort where it mattered the most.
Visibility. Our internal stakeholders would trust our decisions. Not because we’d be perfect, but because we’d go out of our way to keep them involved and informed.
Agility. We’d be entrepreneurial, scrappy, and hungry. We’d actively search out new opportunities and change direction to take advantage of them.
Profitability. We’d be the engine of a profitable and growing business. We’d work with our internal stakeholders to ensure our products were ready for the real world of sales, marketing, content, support, partners, accounting, and every other aspect of our business.

Are we the best product engineering org in the world today? No. Will we ever be? Probably not.

But we don’t need to be. It’s not about literally being the best product engineering org in the world. It’s about constantly striving to improve. These six categories are the ways we want to improve.

What does this mean for you? If you did this exercise with your leadership team, you’d probably get different answers. I’m not saying that our categories are right for everyone.

But it’s still an interesting thought exercise. We’re an organization that’s steeped in Agile thinking. These six categories may not be exactly what your org would use, but these six—People. Internal Quality. Lovability. Visibility. Agility. Profitability—these are worth investing in. I’m going to talk about what we’re doing in each of these six categories. If you’re a senior manager, some of these techniques might be worth using.

If you’re not a senior manager, these are techniques you could potentially take to your managers. Agile only succeeds if the organization really gets behind it. You can share these ideas as examples of what to do to support your Agile teams.

Let’s dig in.

People

A cute scene of chibi people and animals working together at a table.

Everybody wants the best people in the business. But our company is relatively small. We can’t compete with the likes of Google, Amazon, Apple... the FAANG companies. They’re looking for the best people, too, and they have way more money than we do.

An image of a lone chibi programmer working in the dark.

But we can still get the best people in the business. That’s because we define “best” differently than they do. They’re looking for people who went to prestigious schools, who have worked for other FAANG companies, who can solve Leetcode problems in their sleep.

We don’t want those people.

An image of two chibi programmers working together at a desk.

We’re an inverted organization. That means that tactical decisions are made by the people who are doing the work, not managers. (In theory, anyway, we’re not perfect.) So we’re looking for people who have peer leadership skills, who are great at teamwork, who will take ownership and make decisions on their own.

And we’re an XP shop. We use Extreme Programming as our model of how to develop software. As it turns out, XPers love teamwork, peer leadership, and ownership. They also love test-driven development, pairing, continuous integration, and evolutionary design. They tend to be passionate, senior developers. And they’re dying to be part of an XP team again.

You see, Extreme Programming is too... extreme... for most companies. Just like real Agile and real Scrum are too extreme for most companies. How many times have you seen Scrum used as an excuse for micromanagement, or a senior leader tell you that you have to be Agile and give them a detailed plan for what your team is going to do over the next year?

In other words, there aren’t many companies using XP. There are a lot of great people who wish they could use XP. We have our pick of top-quality candidates. And, as a fully remote company, we have a lot of flexibility in where we hire.

An image of a chibi manager mentoring a young programmer.

I said we’re an XP shop, but that’s not exactly true. The founders were immersed in XP, and XP is where we want to return, but there was a period of time where the company grew quickly and lost that XP culture. We have a bunch of engineers who don’t have the XP mindset. We need to bring them on board, too.

This is a matter of changing organizational culture, and organizational culture isn’t easy to change. Our engineering managers are at the forefront of that effort.

To help them along, we’ve revised our career ladder. This is the document that describes what you need to do in order to be promoted. The old career ladder emphasized understanding advanced technologies and building complex systems. The new one emphasizes teamwork, peer leadership, ownership, and XP engineering skills such as test-driven development, refactoring, and simple design.

QR Code: Career Ladder

This is what it looks like. It’s a big spreadsheet which describes each title in our engineering organization, along with the skills required to reach each title.

For example, Associate Software Engineers are hired fresh out of university. They’re only expected to have classroom engineering skills. They contribute to the team with the help of other team members.

Mid-level software engineers are expected to be able to contribute to the team without explicit guidance. We expect them to have basic communication, leadership, product, implementation, design, and operations skills.

Senior software engineers are expected to have the advanced version of those skills; technical leads are expected to mentor and exercise peer leadership; and so forth.

As I said, each level defines sets of skills. For example, associate software engineers are expected to be fluent at the skills in classroom engineering, which includes object-oriented programming, following direction as part of a pair, basic debugging skills, and basic function and variable abstraction.

Mid-level software engineers are expected to be fluent at basic communication, which includes skills such as working collaboratively with other team members, disagreeing constructively, building on other people’s ideas, and so forth.

There are more details here than I can explain today, but you can use the QR code to find a detailed article, including the documentation we use for the skills.

Today, I’d like to highlight a few skills that I think are particularly important.

A slide labelled “Communication & Teamwork.” It shows a progression of skills. Associate Software Engineers have the skill “Active participation.” Software Engineers have “Active listening.” Senior Software Engineers have “Ensure everyone’s voice is heard.” Technical Leads have “Psychological safety.”

First: communication and teamwork. Before I joined, work was assigned to individual engineers. They would go off and work for a week or two, then come back with a finished result.

Now, rather than assigning work to individual engineers, we assign it to teams. (I’ll talk more about how teams are defined later.) We expect those teams to take a valuable increment, go off and work on it together, including collaborating with product management and stakeholders to understand what needs to be done, and to take responsibility for figuring out how to work together as a team.

This is a big cultural shift! It’s uncomfortable for some folks. To help change the engineering culture, we’ve defined a lot of skills around communication and teamwork. This is just one example.

A new engineer is expected to participate actively in team conversations. Then, as they grow, they’re expected not only to share their perspective, but to actively work to understand other people’s perspectives as well.

As engineers grow further, into a senior role on the team, they’re expected to pay attention to who is participating and who isn’t, and make sure there’s room for everybody to speak and be heard.

And ultimately, as the team’s most senior engineer, they’re expected to actively work with management to create an environment where people feel safe speaking their mind and expressing disagreement.

As a reminder, what I’m trying to do here is to change the engineering culture at my company. One of the ways I think the culture needs to change is to have more teamwork and less individual work. This ladder of growing expectations and responsibility is one of the ways I’m encouraging those changes.

A slide labelled “Peer Leadership.” It shows a similar progression of skills. Associate Software Engineers have the skill “Follow the process.” Software Engineers have “Team steward.” Senior Software Engineers have “Peer leadership.” Technical Leads have “Leaderful teams.”

Let me show you another example.

If we want to delegate decisions to the people doing the work, and we do, then peer leadership skills are essential. Peer leadership is the ability for everyone on the team to take a leadership role, at appropriate times, according to their skills and the needs of the team, regardless of titles.

We have many leadership skills, but one path starts with our most junior engineers, with a skill called “Follow the process.” But this skill isn’t just about following our existing process; it’s also about working with the rest of the team to adjust the process, or make exceptions, when the process isn’t a good fit for the situation.

As engineers grow, they start to take on explicit leadership roles. One of those roles is “team steward.” Each team has a “team steward” who’s responsible for defining how the team works together and keeping everyone aligned. This is a role we expect our engineers to take on early in their careers, to start building their leadership muscles.

Senior engineers are expected to have a more nuanced and fluid understanding of leadership. They’re supposed to understand that leadership isn’t about who’s “in charge”—who’s been formally identified as a leader—but instead about reacting to what the situation demands and following the lead of the people who know the most about the situation. They’re expected to identify and follow the people who are best suited to lead in any given situation, and build their own ability to do so, regardless of formal leadership roles.

Our most senior engineers take this one step further. They work with managers to understand the leadership skills of everyone on the team, the leadership skills that are missing or that need to be developed further, and to help team members grow their peer leadership skills where they’re most needed.

Another slide showing skill progression. This one is labelled “Ownership.” Associate Software Engineers have the skill “Intrinsic motivation.” Software Engineers have “Scut work.” Senior Software Engineers have “Critique the process.” Technical Leads have “Impediment removal.”

Similarly to leadership, if we’re going to delegate decisions to team contributors, then we need them to take ownership of creating great results.

One of these paths starts with intrinsic motivation: the idea that our engineers are motivated by the joy of engineering and working with a great team. We want people who put in their best effort because that’s the kind of person they are, not people who have to be constantly monitored by a manager.

Before they can advance out of a junior role, they have to have the maturity to take on the unpleasant tasks that exist on every team. In English, this is called “scut work”—the tedious, disagreeable chores that have to be done.

Scut work isn’t something that only juniors do. It’s something everyone has to do. What we’re looking for is the ability to take it on without being asked. We’re looking for the ability to recognize and take responsibility for things that need to be done, even if they aren’t the most fun.

Every engineer participates in the teams’ retrospectives, but to be a senior engineer, they have to do more than participate. They have to take ownership of making improvements. Our senior engineers are constantly identifying and proposing tweaks to improve how teams work together and how they interact with people outside the team.

And our most senior engineers take it one step further, identifying impediments to the team’s success that are outside of the team’s control and working with management and other leads to remove them.

A skill progression slide labelled “Design.” Associate Software Engineers have the skill “Function and variable abstraction.” Software Engineers have “Class abstraction” and “Function and variable refactoring.” Senior Software Engineers have “Simple design” and “Class refactoring.” Technical Leads have “Risk-driven architecture” and “Architectural refactoring.”

We also put a lot of emphasis on XP skills, and particularly on simplifying our design. This is one example.

Junior engineers are expected to know how to use functions and variables to make code more readable.
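As a hypothetical illustration of that skill (the pricing rule and every name below are invented for the example, not taken from the career ladder documentation), here is the same calculation before and after introducing named variables and a small function:

```python
# Before: the intent is buried in a single expression.
def price_before(quantity, unit_price):
    return quantity * unit_price - (quantity * unit_price * 0.1 if quantity >= 10 else 0)


# After: named constants, a named variable, and a small function spell out the intent.
BULK_THRESHOLD = 10        # hypothetical business rule
BULK_DISCOUNT_RATE = 0.1   # hypothetical business rule


def bulk_discount(subtotal, quantity):
    """Discount applied to orders at or above the bulk threshold."""
    if quantity >= BULK_THRESHOLD:
        return subtotal * BULK_DISCOUNT_RATE
    return 0


def price_after(quantity, unit_price):
    subtotal = quantity * unit_price
    return subtotal - bulk_discount(subtotal, quantity)
```

Both versions return the same result. The second one just makes the rule readable at a glance.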

As they grow into mid-level engineers, they learn how to refactor those functions and variables to improve the abstractions, and they also learn how to create appropriate class abstractions.

Senior engineers know how to refactor those class abstractions, and they use that skill to simplify the design of the system. It’s common for an initial design to be overly complicated, so it’s important for people to be paying attention to how to simplify it over time.

This emphasis on simple design is the opposite of what I’ve seen in some companies. In some companies, the more senior you are, the more complicated your designs are expected to be. But we do the opposite. We think complexity is easy, and it’s simplicity that’s hard, so we expect our more senior engineers to produce simpler designs.

And finally, our most senior engineers understand how to refactor the system as a whole, and how to prioritize those refactorings according to the risks and costs of change.

To recap, our career ladder is a tool for cultural change. We’re using it to move from being an organization that prized individual work, advanced technologies, and complex systems, to one that focuses on teamwork, peer leadership, ownership, and simplicity.

We launched the new career ladder in June of last year—about six months ago. It seems to be working. My managers tell me that they’re seeing shifts in behavior, with people volunteering to lead meetings and take on work they didn’t before. We’ve also been able to use the new career ladder as a touchstone for people who are having performance problems.

A cute image of chibi programmers working in pairs in an Extreme Programming team room.

Of course, the career ladder isn’t enough on its own. It helps people know what’s expected of them, but it doesn’t do any good if people don’t know how to perform these skills.

To help out, we’re supporting the career ladder changes with an XP coaching team. Every open headcount I’ve gotten since I joined has gone towards hiring very senior XP coaches. These are player coaches who get hands on with the code and lead by example. They work alongside the rest of the engineers, demonstrating as part of the team’s normal work how XP works and why it’s such a great way to work.

We’re at a ratio of about one XP coach for every 11 engineers, which isn’t quite enough yet. But it’s enough that we can start developing coaches internally rather than hiring externally. And, of course, as additional positions open up, we’ll be hiring people who already have XP skills, although not necessarily at the same level of seniority.

When I look at how other companies approach this problem, the main thing I see is a lack of commitment. They’ll have an “Agile Center of Excellence,” but the ratio will be closer to 1 coach for every 50 or 100 engineers. Those coaches often aren’t engineers, so they can’t lead by example. And even if they could, they’re spread too thin. The best way to learn XP is to be immersed in it, day in, day out, on your real-world work. You need a coach working alongside you as you learn. With ratios of 1 to 50 or worse, there’s just no way for that to happen.

As I said, we have a ratio of 1 XP coach to 11 engineers, and I would like it to be closer to 1 to 6. Our initial coaching hires are jump-starting that path, and we’re training people internally to get the rest of the way.

A slide summarizing the “People” topic. It says, “Define best differently,” with four bullet points: Teamwork, Peer leadership, Ownership, and XP skills. It also says, “Change company culture,” with two bullet points: Career ladder and XP player-coaches.

People are the lifeblood of any organization, and they’re particularly important in an Agile organization, where so many decisions are made by the team contributors, not managers.

If we were the best product engineering org in the world, we’d have the best people in the business, and we’d be the best place for them to work. To get there, we’re defining “best” differently than other companies. We’re looking for teamwork, peer leadership, and ownership. We’re attracting people who love XP and emphasize simple, clean design rather than algorithms and complex solutions. And we’re changing our company culture with a new career ladder and player-coaches who lead by example.

Internal Quality

A cute image of a chibi programmer working intently, with springs and gears flying out of the computer.

I don’t speak Japanese, but I do have a favorite Japanese word: muda. I learned about muda from the Toyota Production System.

Muda is my biggest problem, and it’s the biggest problem at many companies I know. Let me give you an example.

A graph showing how effort is spent over time. The details are described in the following text.

This graph shows five months of engineering effort on a product. This isn’t real data, for confidentiality reasons, but it’s based on my real-world experiences.

On the X axis, we have months of data. On the Y axis, we have the amount of time people spent on various types of work.

Over these five months, the example team spent about 35% of their time on deferred maintenance. They had a key technology that they hadn’t kept up to date, and the vendor was dropping support, so they had to put everything else on hold to replace it.

They spent about 25% of their time on call and responding to production incidents.

They spent about 5% of their time on routine maintenance.

They spent about 20% of their time fixing bugs.

And only about 15% of their time on doing things that added new value to their business.

Let me put it another way: if this fictional organization had spent one million dollars on development during this time, only $150 thousand would have been spent on things their business partners really valued. The other $850 thousand would have been wasted. It was necessary, but not valuable. Muda.
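The arithmetic behind those figures can be sketched in a few lines. The percentages are the illustrative ones from the graph (not real data), and the category names and the `split_spend` helper are assumptions made for this example:

```python
# Illustrative effort shares from the example graph (not real data).
effort_share = {
    "deferred maintenance": 0.35,
    "on call and incidents": 0.25,
    "routine maintenance": 0.05,
    "bug fixes": 0.20,
    "new value": 0.15,
}


def split_spend(total_spend, shares, value_category="new value"):
    """Split a total spend into (value, muda), both in whole dollars."""
    value = round(total_spend * shares[value_category])
    muda = total_spend - value
    return value, muda


value, muda = split_spend(1_000_000, effort_share)
print(f"value: ${value:,}  muda: ${muda:,}")  # value: $150,000  muda: $850,000
```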

And that’s why “muda” is my favorite Japanese word. It’s the thing I need us all to fix.

A slide with three categories written in a bold black font on a grey background: “Complexity,” “Slow Feedback Loops,” and “Deferred Maintenance.”

Why is there so much muda? I see three common problems:

Complexity
Slow feedback loops
Deferred maintenance

They’re often lumped together as “technical debt” or “legacy code,” but each is its own problem. Let’s take a closer look at each one.

Complexity

Complexity is the result of having lots of different systems. It’s hard for any one developer to understand how everything works, so they have to work very slowly and carefully, and even then, you still get bugs and production incidents.

Our systems don’t have to be that complicated. In the rush to deliver features, people chose complicated technologies that promised fast results. This has been repeated many times. Each technology requires a bunch of expertise, and so it’s become impossible for any one person to be an expert in all of them. It’s become difficult to make those tools do exactly what we want, too, and it’s hard to make them work well together.

This is a fundamental mistake I see a lot of companies making. When they’re deciding how to deliver a feature, they focus on how much it will cost to build a feature. They don’t think about how much it will cost to maintain the feature. They choose solutions that are easy to build, but hard to maintain. But the majority of software development costs are maintenance costs, not build costs. Neglecting maintenance costs puts them in a difficult position.

A repeat image of chibi programmers working in pairs in an Extreme Programming team room. On closer inspection, the image has a number of oddities, including a plant that’s floating in mid-air, a character with his head on backwards, and a character with a coffee cup on his head. The image credit reads, “ChatGPT/Dall-E, 2024.”

In 2025, we can’t talk about development costs without also talking about AI. Don’t get me wrong! AI is a great tool. I used it for the images in this talk, and it allowed me to add character and interest that I otherwise wouldn’t be able to add.

But remember this image? Take a closer look at the character in the middle.

A zoomed in view of the character with the coffee cup on his head.

Did you wonder why he has a coffee cup on his head?

That’s because...

The same image with the coffee cup removed, revealing an eyeball in the character’s hair.

...he has an eyeball in his hair.

[beat]

A chibi image of a person with a skeptical look on his face. His arms are crossed and a coffee cup is resting on his elbow near his hand. The image credit reads, “ChatGPT/Dall-E, 2024.”

Or how about this character?

The same image with the coffee cup removed, revealing a third hand.

He’s hiding a third hand.

My point is that these tools are never going to be as good as the people selling them want you to believe. They’ll get better, but they won’t be perfect.

The problems with image generation are fairly obvious. The problems with AI in code are more subtle, and they come down to that tradeoff between speed of building and cost of maintenance.

You can build code quickly with AI, but getting it all to work together nicely is harder. If you’re using AI to write code, are you considering how you’ll maintain that code? Are you considering how you’re developing the skills of your junior engineers?

You can also build features that use AI, such as automatic content generators. It’s pretty easy to do, actually. But fine-tuning those prompts is tricky, and it takes a lot of manual effort to get them just right. Have you considered how you’ll keep those prompts up to date as the AI engines change out from under you? Have you thought about how you’ll find out when those fine-tuned prompts aren’t working like they’re supposed to?

Slow Feedback Loops

Ultimately, complexity comes from teams that prioritize building over maintaining, and the costs of doing so are devastating. Now let’s look at another source of muda: slow feedback loops.

Feedback loops are about how effectively developers can work. After an engineer makes a change, they have to check to see if that change did what they intended. How long does that take? That’s your feedback loop.

If it takes less than a second, then they can check every single change. Every line of code, even. This is what test-driven development is all about, and it’s an amazing way to work. Let me show you what it looks like.

QR Code: Fast Tests

This is the full build for a real production system. The system is on the small side, but it’s over 12 years old, so it’s had the opportunity to accumulate some technical debt. Let’s see how long the build takes. Don’t look away—this won’t take long.

[play video]

That’s it! Just over eight seconds.

For context, most organizations I meet are happy with a build that takes eight minutes, not eight seconds, and most are much, much slower.

A screen capture from the video, showing the “unit tests” portion of the build. The build ran 1,342 tests in 2.45 seconds, averaging 1.8ms per test.

A big part of the reason this build is so fast—or rather, why most builds are so slow—is the tests. Most teams have slow, brittle tests. This codebase has over 1300 tests, but they only take two and a half seconds.

Describing how to achieve these sorts of fast tests is a whole talk of its own, but I have a lot of material on this topic. You can find it at jamesshore.com/s/nullables, or just follow the QR code on the slides.

Eight seconds is a nice, fast build, but it’s actually still too slow for a great development experience. In a perfect world, we want engineers to be able to check their work at the speed of thought. If they can check every single line of code they write, as soon as they write it, finding bugs becomes easy: you make a change, run the build, and immediately find out if there was a mistake. There’s no need to debug because you know your mistake is in the line of code you just changed.

To get these kinds of results, you need your build to be less than five seconds—preferably less than one second. This isn’t a fantasy! It’s possible to do this with real production code. Let me show you:

[play video]

That’s less than half a second each time the build runs.

There are a few tricks here. First, the build automatically runs when the code is changed. Second, the build is written in a real programming language, and it stays in memory. It’s able to cache a bunch of information, such as the location and age of all the files, the relationships between files, and so forth. So when it detects a change, it doesn’t have to scan the file system again, and it only runs the tests on the code that’s changed.
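A minimal sketch of that idea, not the actual build from the video: the class name, the `on_change` hook, and the hand-written dependency map are all assumptions for illustration. A real build would get change events from a file watcher and work out the relationships between files itself.

```python
from collections import defaultdict


class IncrementalBuild:
    """Stays in memory between runs, so nothing is rescanned from scratch."""

    def __init__(self, test_dependencies):
        # Built once and cached: file -> the tests that need to run when it changes.
        self.tests_for = defaultdict(set)
        for test, sources in test_dependencies.items():
            self.tests_for[test].add(test)  # editing a test reruns that test
            for source in sources:
                self.tests_for[source].add(test)
        self.last_seen = {}  # cached file ages: path -> modification time

    def on_change(self, path, mtime):
        """Hypothetical watcher callback. Returns the tests to rerun, sorted."""
        if self.last_seen.get(path) == mtime:
            return []  # file hasn't changed since the last run
        self.last_seen[path] = mtime
        return sorted(self.tests_for.get(path, ()))


build = IncrementalBuild({
    "test_cart.py": ["cart.py", "pricing.py"],
    "test_pricing.py": ["pricing.py"],
})
print(build.on_change("pricing.py", mtime=100.0))  # ['test_cart.py', 'test_pricing.py']
print(build.on_change("pricing.py", mtime=100.0))  # []
print(build.on_change("cart.py", mtime=101.0))     # ['test_cart.py']
```

Because the dependency map and file ages live in memory, each change costs a dictionary lookup instead of a file system scan, and only the affected tests run.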

[beat]

I have to be honest. Some of our code has these sorts of feedback loops. But for the systems with the most muda, it takes much longer than a second to get feedback. Sometimes it can take tens of minutes just to get the computer into a state where a manual test is even possible. So people don’t check every change. They batch up their work, test it all at once, and then have to go through long, tedious debugging sessions to figure out the cause of each error. Some errors aren’t caught at all. That leads to muda, and that’s why fast feedback loops are important.

Deferred Maintenance

A third issue that leads to muda is deferred maintenance. Deferred maintenance is really a consequence of the other two problems. If the system was simple, you could upgrade critical dependencies easily. But most companies have a lot of complicated dependencies, and some of their updates require major rearchitectures.

Similarly, if the feedback loops were fast, you could make changes quickly and safely. But most companies’ feedback loops are slow, so making changes takes a long time.

Major rearchitectures plus slow changes means that upgrading dependencies can take weeks or months of effort. Now you have to make tough prioritization decisions. Do we build an important new feature? Or do we upgrade a component for no visible benefit? Business partners often choose to defer the maintenance. I can’t really blame them. But that deferred maintenance compounds, things get even more expensive to upgrade...

A repeat image of the graph showing how effort is spent over time.

...and eventually the bill comes due.

A repeat image of the slide with three categories written in a bold black font on a grey background: “Complexity,” “Slow Feedback Loops,” and “Deferred Maintenance.”

What do you do about such a difficult set of problems? Complexity, slow feedback loops, deferred maintenance. These problems are common. Usually you hear people talking about “legacy systems” and “technical debt.” Whatever they call it, the underlying problem is the same: low internal quality. High muda.

A slide labelled “Fixing Internal Quality.” It has four bullet points: “Big bang rewrite,” “Modular rewrite,” “Change-driven rewrite,” and “Improve in place.” The first two (“Big bang rewrite” and “modular rewrite”) are labelled “DANGER.”

I’ve seen four approaches to fixing systems with low internal quality:

Big-bang rewrite
Modular rewrite
Change-driven rewrite
Improve in place

Be careful: The first two approaches are popular... and usually fail.

A slide labelled “Big Bang Rewrite (Dream).” It shows two teams replacing a messy plate of spaghetti with a clean set of shiny dishes.

In the “big bang” rewrite, you start up a new team to write a new version of the software. Meanwhile, the old team keeps maintaining the old software: adding features, fixing bugs, and so forth. When the new software is done, the old software will be retired, and the new software will take its place.

In this slide, the old system is represented by the spaghetti in the top row, and the new system is represented by the clean, shiny dishes in the bottom row.

A variation of the previous slide labelled “Big Bang Rewrite (Reality).” The messy plate of spaghetti has turned into an even bigger mess, and the clean set of shiny dishes is shown to be missing.

This sounds nice in theory, but what really happens is the rewrite always takes much longer than expected. Meanwhile, the original software keeps getting bigger and bigger, and the mess gets worse and worse, because people think it’s going to be thrown away.

The replacement keeps taking longer than expected. You’re spending twice as much money, because you’re running two teams, but not getting good results for your money. So the replacement is either cancelled or rushed out the door. Customers are unhappy because it doesn’t do everything the old system did, and your engineers are unhappy because, in the rush to get the replacement done, they made a mess. It’s better than the old system is, but not that much better. It’s going to need another rewrite soon.

A repeat of the slide titled “Fixing Internal Quality.” The first bullet point, “Big bang rewrite,” has been crossed off.

In other words, “big bang” rewrites are dangerous. They should be avoided.

Another option, not quite as common, is the modular rewrite.

A slide labelled “Modular Rewrite (Dream).” It shows two teams converting a messy plate of spaghetti into a clean set of shiny dishes, one piece at a time.

A modular rewrite takes a big existing system and identifies pieces that can be split off and rewritten. Then each piece is rewritten, one at a time. This might be done by a separate team, but sometimes it’s done by the same team. Over time, the whole system is replaced, without the risk of a big-bang rewrite.

A variation of the previous slide labelled “Modular Rewrite (Reality).” Only the edges of the spaghetti have been converted into shiny dishes, and ugly connecting arrows go back and forth between the spaghetti and dishes.

But, as always, things are harder than expected. You end up chipping away at the easy edges of the system, but the big complicated core stays just as big and complicated as ever.

And, as always, other priorities intervene before you can finish. Now, instead of one complicated system, you have multiple complicated systems, all interfacing with each other in confusing ways. If you’re not careful, you end up with a bigger, uglier mess than you started with.

A repeat of the “Fixing Internal Quality” slide. The second bullet point, “Modular rewrite,” has also been crossed off.

Modular rewrites are safer than big-bang rewrites because they work in smaller pieces, but they suffer the same problem: everything is bigger and more complicated than expected, and if you stop part way, you’re left with a mess.

A slide labelled “Change-Driven Rewrite.” It shows a single team converting a messy plate of spaghetti into a clean set of shiny dishes, but in very tiny steps. Only a small portion of the spaghetti has been converted. Two connecting arrows go back and forth between the spaghetti and the dishes.

The trick to a rewrite is to realize that you can never compete with features. Instead of establishing a rewrite team, continue with a single team. Instead of prioritizing rewrite work, continue to prioritize features and bug fixes.

But... whenever you make a change, migrate the code you’re changing to the new system. Don’t describe it as a rewrite; it’s just part of the work to complete the new feature. This way, you don’t have to compete for budget, and you don’t have to justify the cost of the rewrite. You just do it as part of your normal work.

This approach is slow. It will take years to complete. But thanks to the Pareto Principle—the 80/20 rule, which says that 80% of the changes to the system will occur in 20% of the code—you don’t have to rewrite everything to see a benefit. You do need to commit to seeing it through, but it’s easier to do so when people aren’t breathing down your neck asking when the rewrite will be done.
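One way to check whether the 80/20 pattern holds for a given codebase is to count how often each file changes. The following is a minimal sketch, not something shown in the talk: the `hotspot_share` function and the sample change log are hypothetical, and a real log would come from your version control history.

```python
from collections import Counter


def hotspot_share(changed_files, change_fraction=0.8):
    """Return the fraction of files that together receive `change_fraction` of all changes.

    `changed_files` has one entry per file touched by a change (hypothetical input).
    """
    counts = Counter(changed_files)
    if not counts:
        return 0.0
    total = sum(counts.values())
    running = 0
    files_needed = 0
    # Walk files from most-changed to least-changed until the target share is covered.
    for _, n in counts.most_common():
        running += n
        files_needed += 1
        if running >= change_fraction * total:
            break
    return files_needed / len(counts)


# Made-up example: two busy files and eight rarely touched ones.
change_log = (
    ["billing.py"] * 50
    + ["checkout.py"] * 35
    + [f"misc_{i}.py" for i in range(8)]
)
share = hotspot_share(change_log)
print(f"{share:.0%} of files receive 80% of the changes")
```

If a small share of files dominates, those are the files a change-driven rewrite will migrate first, simply because that is where the feature work lands.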

In this approach, you don’t add a new team. You can still add people, if you want, but you add them to your existing team, and use them to develop new features... migrating code to the new system as you go.

A slide labelled “Improve in Place.” It shows a single team gradually transforming a messy plate of spaghetti into a clean set of shiny dishes without creating a new system.

But even better than a rewrite is not rewriting at all. Instead, improve your existing system in place. As you work on features and bug fixes, add tests, clean up your automated build, and file off the rough edges. It will never be perfect, but the Pareto Principle will kick in, and the parts of the system you work with most often will be the parts you improve the most.

Remember that eight-second build I showed you? That’s a 12-year-old codebase, and it wasn’t always that smooth. Ten years ago, it was kind of a mess. But steady, consistent effort to improve it in place means that, today, it’s a pleasure to work in, and better than it’s ever been.

A repeat of the “Fixing Internal Quality” slide. The fourth bullet point, “Improve in place,” has been marked “BEST.”

If you have internal quality problems, improve your existing systems in place. That takes specialized skills, so you might need to hire people for those skills. Personally, I hired a bunch of Extreme Programming coaches, and we’re doing a lot of training.

Sometimes, you can’t improve in place. If you want to change fundamental technologies, such as the programming language a system uses, or a core framework, you may not be able to improve the existing system. In that case, you can do a change-driven rewrite, and migrate code to a new system as part of your work on the old system.

Modular and big-bang rewrites can work, but they’re dangerous. Modular rewrites risk leaving you with a bigger, more complicated mess than before. Big bang rewrites risk leaving you with a half-baked product and angry customers. Although they can work, the risk is high, and I don’t recommend them.

If we were the best product engineering org in the world, our software would be easy to modify and maintain, we’d have no bugs, and we’d have no downtime.

If only that were true. We’re looking at three things:

Simplifying our technology stack. We’re focusing on the cost of maintenance instead of the cost to build.
Improving our feedback loops. We’re building systems that allow developers to check their work in less than five seconds, preferably less than a second, and introducing test-driven development.
No longer deferring maintenance. When a dependency has a new version, we want to upgrade it immediately. Don’t wait. Don’t allow it to be a prioritization decision. Make it a requirement and stop the line. We’re not there yet, but as our technology stack becomes simpler and our developer experience better, those upgrades will get easier.

And for the existing systems, where it’s not as easy as we’d like, we’re avoiding big rewrites. We’re improving our systems in place, where possible, and undertaking change-driven rewrites where not.

Lovability

A cute picture of a chibi vampire hugging a computer.

Jeff Patton is here this week! He’s giving tomorrow’s keynote, and from previous experience, I can tell you that he’s not to be missed.

Jeff can tell you much more about making software people love than I can. So I’m going to talk about how we make time to put Jeff’s ideas into practice.

We just talked about internal quality and reducing muda. The better your software’s internal quality, the faster your teams will go. So investments in internal quality are easy: as much as you can afford. The main challenge is managing cash flows and balancing that investment with forward progress.

Lovability is external quality. It’s about building software that our users, buyers, and internal stakeholders love. We’re going to put all our remaining capacity towards external quality. But there are always more ideas than time to build them all. So lovability is also about understanding what stakeholders really need and putting our limited capacity toward what matters most.

A waterfall-style diagram showing a progression starting with “Analyze market” and proceeding to “Define exactly what to build,” then “Build it,” and finally “Profit!”

People have understood this for a long time. Back in the days before Agile, companies would put immense amounts of effort into requirements analysis in order to make sure they were building the right thing. They would analyze the market, define exactly what to build, and then build it.

Those of you who weren’t around in the 90’s might think this “waterfall” idea is just a fable—a straw man trotted out by Agile proponents to prove that they’re better than the old way. Surely no one really worked that way!

But they did... and a lot of companies still do.

They say they’re “Agile,” but when you look at how they plan, what they actually do is... analyze the market, decide exactly what to build, and then build it. The only difference is that they don’t make requirements documents any more. Instead, they make Jiras. They chop up their requirements documents into lots of itty-bitty pieces, and then they move those pieces around a lot.

A slide labelled “Project-Based Governance (Avoid).” It says, “1. Build the plan; 2. Work the plan; 3. Track progress vs. plan.” Success is defined as “On Time, On Budget, and As Specified.”

This approach is called “project-based governance.” You create a plan, then you work the plan. If you execute the plan perfectly, coming in on time, on budget, and as specified, you’re going to be successful.

At least, that’s the theory.

As Ryan Nelson wrote in CIO Magazine in 2006:

Projects that were found to meet all of the traditional criteria for success—time, budget, and specifications—may still be failures in the end because they fail to appeal to the intended users or because they ultimately fail to add much value to the business… Similarly, projects considered failures according to traditional IT metrics may wind up being successes because despite cost, time or specification problems, the system is loved by its target audience or provides unexpected value.

CIO Magazine, Sep 2006

A slide labelled “FBI’s Virtual Case File System.” It shows the following timeline: June 2001: Launch; Nov 2002: Requirements established; Dec 2003: Delivered—“The FBI immediately discovered a number of deficiencies in VCF that made it unusable”; Feb 2005: Cancelled. It closes with “$170 million spent; $104.5 million ‘completely unrecoverable.’”

One of my favorite stories about how this approach fails is the FBI’s “Virtual Case File” system, because there was a US Senate investigation, and we have a lot of details about what happened. It’s unusual in how high-profile it was, but the story is very typical of its time.

In June 2001, the FBI launched the Virtual Case File project.

17 months later, they had established “solid requirements.” Seventeen months! They knew exactly what their users needed—or thought they knew—and had a detailed plan.

A year after that, the project was delivered, and it didn’t work. The FBI “immediately discovered a number of deficiencies in VCF that made it unusable.”

In 2005, it was officially cancelled, and the director of the FBI appeared before a Senate subcommittee to explain how the FBI had managed to waste $104.5 million of taxpayers’ money.

The problem wasn’t that the software didn’t function; the problem was that all that detailed planning resulted in the wrong thing. The software didn’t meet the FBI’s needs.

A repeat of the “Project-Based Governance” slide.

Before you say, “that was then... we’re Agile,” look at this list again. How does your company manage projects? Do they define success as delivering on time, on budget, and as specified? Do they ask you to prepare a plan, and then track progress against that plan?

[beat]

A repeat of the waterfall-style diagram. The first two steps of “Analyze market” and “Define exactly what to build” are labelled “Wishful thinking.”

The problem with this approach is that we can’t predict what our customers and users really want. We can only guess. Some of those guesses are right; some are wrong. We have to conduct experiments to find out which ones are really worth pursuing.

One of my favorite expressions of this idea is Eric Ries’ “Build, Measure, Learn” loop. We have an idea about what customers might love. We build a simple experiment that lets us test that idea—the smallest, simplest experiment we can think of! It might not even be code. It might just be interviews, surveys, or Figma prototypes.

Then we conduct the experiment and see what data comes out of it. We learn from that data and that improves our ideas of what we should build next.

This loop is where the idea of “Minimum Viable Product” comes from. But it’s often misunderstood. Minimum Viable Product isn’t the smallest thing we can deliver to customers; it’s the smallest test we can perform to learn what we’re going to deliver to customers. Those tests can be very small. Because the faster we can get through this loop, the more we can learn, and the less time we waste on failed ideas.

A slide labelled “Product-Based Governance (Prefer).” It says, “1. Build-Measure-Learn; 2. Iterate quickly; 3. Track return on investment.” Success is defined as “Delighted customers/partners/users” and “Business impact.”

This leads to product-based governance. Rather than creating a plan and working the plan, we iterate on a series of very small plans. If we learn and change our plans, we can steer our way to success.

Success means delighting our stakeholders, and doing so in a way that impacts the business.

We aren’t tracking adherence to plan, but rather, return on investment.

A slide labelled “The Essence of Agile.” It quotes Martin Fowler: “Agile development is adaptive rather than predictive; people-oriented rather than process-oriented.”

Adapting plans is one of the key ideas of Agile, of course. Martin Fowler describes the essence of Agile this way:

“Agile development is adaptive rather than predictive; people-oriented rather than process-oriented.”

So now that Agile’s taken over the world, and just about everybody’s using some flavor of Scrum, we’re all able to adapt our plans, right?

I’d like to say that’s true... but we know it’s not, don’t we.

There are a lot of companies saying they do Agile, or Scrum, and they’re not either of these things. They predict rather than adapt, and they orient around process rather than people. And if you’re not adaptive, if you’re not people-oriented... you’re not Agile, no matter how many Scrummasters, Sprints, or standups you have.

[beat]

A cute image showing chibi executives gathered around a large round table.

The build-measure-learn loop and adapting your plans are important at a tactical level. But what about big-picture strategy? At my company, strategy is decided by our Leadership team, which consists of our CEO and heads of Product, Marketing, Sales, Partners, Content, Finance, and so forth.

Alignment at this level is critical. Each person has their own view of the world, and their own theories about what’s needed for success. Product wants to improve usability. Marketing and Sales want splashy new features. Finance wants to reduce manual billing. How do you balance those competing concerns? How do you allocate engineering’s limited capacity?

We’re trying something we call “product bets.”

A slide labelled “Product Bets.” It says, “Increase vampire sales with an AI-powered night salesperson.” The remaining details are described in the following text.

A Product Bet is a big-picture hypothesis. For example, our head of sales might say, “A lot of our customers are vampires. We can close a lot more deals if we introduce an AI-powered salesperson that’s available at night.”

(I’ll tell you a secret. Our customers aren’t actually vampires. I have to keep the specifics of our situation confidential. They could be werewolves.)

Product bets are proposed with a short, one-page summary of the hypothesis. It has a thesis statement: “Increase vampire sales with an AI-powered night salesperson.”

A sponsor: the head of sales.

The value we expect to gain, which is typically linked to a spreadsheet with a financial model.

A summary of the reasoning for the value: Vampires spend $80 million on services like ours. Many of them use their human servants to do their shopping, but if we sell to them directly, they’ll be more likely to buy, and we’ll increase our market share among vampires by at least one percent.

And finally, a wager, which is the maximum amount of money we’re willing to spend on this bet.
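The one-page summary can be pictured as a small record. The sketch below is an illustration only: the `ProductBet` class, its field names, and its helper methods are hypothetical. The expected value assumes that one percent of the $80 million market translates directly into revenue, and the $400K wager is the figure the head of Finance objects to later in the talk.

```python
from dataclasses import dataclass


@dataclass
class ProductBet:
    """Hypothetical record mirroring the one-page product bet summary."""
    thesis: str
    sponsor: str
    expected_value: float  # would normally come from the linked financial model
    reasoning: str
    wager: float           # maximum amount we're willing to spend on this bet

    def value_per_dollar_wagered(self) -> float:
        # Rough payoff ratio if the hypothesis turns out to be true.
        return self.expected_value / self.wager

    def wager_exhausted(self, spent_so_far: float) -> bool:
        # The wager is a cap: once it is spent, the bet is up for review.
        return spent_so_far >= self.wager


market_spend = 80_000_000   # vampires' spend on services like ours
share_gain_percent = 1      # "at least one percent" (simplifying assumption)

bet = ProductBet(
    thesis="Increase vampire sales with an AI-powered night salesperson",
    sponsor="Head of Sales",
    expected_value=market_spend * share_gain_percent / 100,
    reasoning="Selling directly to vampires at night makes them more likely to buy",
    wager=400_000,          # assumed from the Finance objection
)
print(bet.value_per_dollar_wagered())
```

Keeping the wager as an explicit field makes the cap visible next to the expected value, which is what lets other leaders challenge either number.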

A slide labelled “Bets Lead to Objections.” It shows several objections that are described in the following text.

Other members of the leadership team have their own priorities and bets, so naturally, they’re going to come up with objections.

The head of Content might say, “Shopping is beneath vampires. That’s what they have human servants for.”

The head of Marketing might say, “AI isn’t good enough to do sales.”

The head of Finance might say, “That’s going to cost a lot more than $400K.”

We want these objections. We want an open and honest dialog between members of the Leadership team, picking holes in the bets and finding ways to make them stronger.

A slide labelled “Objections Lead to Experiments.” It says, “Vampires have human servants to do their shopping. It’s beneath them.” The remaining details are described in the following text.

Ultimately, if a bet is chosen, those objections lead to experiments. “Shopping is beneath vampires? Let’s find out!”

Our hypothesis is that vampires will actually feel appreciated by us having vampire-friendly hours.

We want to test that hypothesis as quickly and cheaply as possible, so we don’t build software; we “build” by temporarily moving some salespeople to a night shift.

Then we measure the difference in vampire sales.

Let’s say we got five times as many sales. That’s a clear win.

Another slide labelled “Objections Lead to Experiments.” This one says, “AI isn’t good enough to do sales.” The remaining details are described in the following text.

Our next objection is that AI isn’t good enough to do sales.

Our hypothesis is that no, it’s not going to be as good, but it’s going to be good enough.

Again, we want to test that hypothesis with the minimum effort possible, so we still don’t build software. Instead, we build a custom prompt in ChatGPT for the human night shift to use. Some salespeople use ChatGPT and follow the script; others continue as they were. We measure the difference.

In this case, let’s say the sales decreased from 5x as many sales to 2x as many. What did we learn? A night shift is a great idea, but AI-driven sales isn’t. We change our plans, and decide to build a permanent human night shift instead of using AI.

And now we’ve saved nearly 400 thousand dollars of investment. We spent a little bit of time and money on bonuses for the salespeople who participated in the night shift, and some on the custom ChatGPT prompt, but much less than we would have spent on building the full solution.

A slide labelled “Product Bets.” It shows three bullet points: “Strategic Prioritization;” “Critical Thinking;” and “Hypothesis Generation.”

Ultimately, we want our Leadership team to align around strategy. Product bets are our tool for doing so.

I have to be honest. Getting adoption on this idea has been very slow, and we’re still rolling it out. So, of all the things I’m presenting today, this one is the most experimental, and the one to be most careful about adopting yourselves.

But if it works out, it won’t just be a tool for strategic prioritization. It will also be a tool for Leadership to think critically about their ideas, and a way to generate hypotheses for us to test using the Build-Measure-Learn loop.

A slide summarizing the “Lovability” topic. It has two major points, each with sub-bullets. The first point is, “Experiment with product bets,” with the sub-bullets “Strategic prioritization,” “Critical thinking,” and “Hypothesis generation.” The second point is, “Adapt your plans,” with the sub-bullets “Product-based governance” and “Build-Measure-Learn.”

If we were the best product engineering org in the world, our users, customers, and internal consumers would love our products. But even more, we would understand what our stakeholders need and put our limited capacity where it matters most.

First, we need to achieve strategic alignment at the leadership level. We’re starting to use product bets for that. They’re not just for prioritization, though. They’re also a way to think critically about our plans and generate hypotheses that we can test.

Second, we need to validate those bets and adapt our plans based on what we learn. We’re using product-based governance that focuses on impact rather than adherence to plan, and we’re using the build-measure-learn loop to test our hypotheses as quickly and cheaply as possible, preferably without building software at all.

Visibility

A cute image of a sheepish programmer standing in front of a collection of chibi stakeholders with a mix of emotions from happy to angry.

Given limited capacity—and there’s always limited capacity—there will be winners and losers in the prioritization game. Some people will be happy about the amount of time they’ve gotten from Engineering, and some will be sad. Even angry.

Transparency is vital for building trust with internal stakeholders. Where are we spending our time, and why? We send out regular reports, but that isn’t enough. We also have to talk to people, understand their perspective, share what’s going on and why.

And even if you do all that, there will still be people who are unhappy. As we say in the US, you can’t please all the people all the time.

I have to admit: I don’t have a lot of good answers about how to build trust. I chose the company I’m in now because I already had their trust. I’d worked with the founders before, 15 years ago. They knew what they were getting, and what I brought was what they wanted.

Even so, the founders’ trust didn’t automatically extend to the rest of the Leadership team. A lot had changed in 15 years, and nobody else knew me.

In fact, what the founders and I wanted, and what the rest of Leadership wanted, weren’t in alignment. We wanted product-based governance and Build-Measure-Learn. But they wanted predictability. The way they judged trustworthiness was simple: was Engineering doing what they said they would do? In other words, did we ship on time?

And the answer was “no.” Engineering was not shipping on time.

A repeat of the “Build-Measure-Learn” slide.

There are a lot of reasons Engineering wasn’t shipping on time. We didn’t have a good approach to forecasting, to begin with. But even if we did, predictability wasn’t something I was planning to bring to the organization. Predictability is the realm of project-based governance. I was planning to introduce product-based governance.

Product-based governance can be frustrating to people who want predictability. Because we don’t know what we’ll do here [motions to “Build” step] until we know what happened here [motions to “Learn” step].

We can predict how long it will take us to get through a single loop, but we can’t predict what the next loop will look like.

Well, we could, but that would mean we didn’t learn anything, and if we didn’t learn anything, we’re not producing as much value as we could.

A screen shot from the movie “War Games.” It shows a computer terminal. On the terminal is the phrase: “Hello. A strange game. The only winning move is not to play.”

So, as another classic American saying has it, “the only winning move is not to play.”

...How about a nice game of chess?

One of the first things I did after joining the company was to introduce a more rigorous approach to forecasting. This approach requires gathering a lot of data, and while that was happening, I told my teams to stop making predictions to stakeholders.

This wasn’t popular with my stakeholders.

It’s not that they needed the predictions for any purpose. Predictions were being used as a political weapon. “You promised this would be done!” “Well, we had to set it aside, because the Leadership team decided we should work on this other thing.” “It doesn’t matter—you promised it would be done, and this is more important to me than that other thing they decided on!”

In other words, predictions were causing more harm than good.

The new forecasting approach is in place now, but we’re still not providing dates. Instead, we’re providing time ranges: “Historically, a valuable increment of this size has taken between two and six weeks.”

But we’re not predicting dates, because we don’t know when the work will start. As an example, one of my stakeholders has a project that’s very important to him. It keeps getting delayed by other priorities, and he’s upset about that. So he asks me, “when will it be done?” And I say, “2-6 weeks after it starts.” And he says, “so when will it start?” And I say, “that’s up to the Leadership team to decide. We’re ready to start as soon as they say ‘go.’”

He’s still unhappy, but now he’s unhappy with the prioritization process, and putting his effort into influencing prioritization decisions, which I’m going to call a win.

An image of people playing the Agile Fluency Game.

QR Code: Agile Fluency Game

People really want date predictions. The reason I can get away with not providing them is that the CEO, CTO, and Chief Product Officer are on my side. They understand the value of adaptive planning, and they trust my leadership. It’s the reason I’m working there. I was a consultant for 23 years prior to joining this company, and was looking for companies to join for five years prior to choosing this one. I chose them because I knew I would get this level of support.

One way the CEO is supporting me is that he invited me to give a presentation about Agile at one of our quarterly Leadership retreats. It’s normal for me to attend these retreats, but for this one, I was given a full four hours out of the schedule to use as I please.

I used the time to explain muda and the reasons for our capacity constraints. I talked about product-based governance and many of the things we’ve discussed today. And, most importantly, I had them sit down and play a game.

In that game, which is called the Agile Fluency Game, participants experience what it’s like to be part of a team that’s adopting Agile for the first time. There are a lot of different lessons in the game, but one of the biggest is the cost of maintenance. If you aren’t careful about managing your maintenance costs, you’ll go out of business. Before you do, you’ll have several uncomfortable rounds struggling to make progress while all your spare capacity is spent on muda.

In other words, exactly the problem we were facing.

That opportunity turned things around for me at the company. I wouldn’t say I have everyone’s trust yet. But I do have their respect and understanding. They understand why Engineering isn’t giving them what they want, they understand why we’re focused on adapting our plans, and they respect my ability to improve it... or at least, the founders’ trust in me.

A cute image of a chibi programmer working on a spreadsheet while a variety of people watch intently.

I still have a long way to go. In 2025, we’re putting more emphasis on product bets and strategic planning. As part of that work, I’ll be working with the leadership team to create financial models of the cost and value of each of those bets. I’ll be providing forecasts of the capacity available in each of our product collectives, and helping stakeholders understand how their prioritization decisions result in tradeoffs of engineering capacity.

I’m hoping that working together in this way will help us further develop the visibility and trust we need. It’s a long, slow process, but without trust, we can’t be successful.

If we were the best product engineering organization in the world, our internal stakeholders would trust our decisions. I don’t think we’re there yet. Most of that comes down to unhappiness with our capacity, and with prioritization decisions.

To be honest, I’m no political genius. Somebody who’s more clever than me can probably figure out a better way to build the trust we need. For me, though, it started with having champions in the organization who already trusted me; being transparent about our capacity challenges; and showing people why things were operating the way they were with a hands-on experience.

But at the same time, I’m staying true to our goals. People always want predictive, not adaptive approaches, and I’m holding firm on staying adaptive. I am providing forecasts, but I’m doing it by extrapolating from historical data that compares estimates to actual results. And even then, I’m only forecasting how long things will take once they’ve started. I’m not providing dates.
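The talk doesn’t spell out the forecasting method, so the following is only a sketch of one way to turn estimate-versus-actual history into a time range. The `forecast_range` function, the percentile choices, and the sample history are all hypothetical.

```python
from statistics import quantiles


def forecast_range(history, estimate_weeks, low_pct=10, high_pct=90):
    """Scale a new estimate by the historical spread of actual/estimate ratios.

    `history` is a list of (estimated_weeks, actual_weeks) pairs with at least
    two entries. Returns a (low, high) duration in weeks, counted from when the
    work starts. It says nothing about when that start will be.
    """
    ratios = [actual / estimate for estimate, actual in history]
    # 99 cut points; cuts[p - 1] is the p-th percentile of the ratios.
    cuts = quantiles(ratios, n=100, method="inclusive")
    return estimate_weeks * cuts[low_pct - 1], estimate_weeks * cuts[high_pct - 1]


# Made-up history of past increments: (estimated weeks, actual weeks).
history = [(2, 2), (2, 3), (3, 6), (1, 1), (4, 6), (2, 5)]
low, high = forecast_range(history, estimate_weeks=2)
print(f"Historically, work like this has taken {low:.1f} to {high:.1f} weeks once started")
```

The output is a duration range rather than a date, which matches the distinction above: the range starts counting when Leadership says “go.”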

This is what’s working for me. Your situation is going to be very different, so I’m not suggesting that you follow this approach exactly... or even at all. For example, if your CEO doesn’t support adaptive planning the way mine does, you might need to make predictions. Please adapt these plans for your situation.

Agility

A cute image of a lot of people working together in small groups. Sticky notes are scattered around the room.

If we’re going to adapt our plans and follow that build-measure-learn loop, we need the technical ability to do so. There are two aspects to this: tactical and strategic.

A slide comparing waterfall, Scrum, and Extreme Programming lifecycles. “Waterfall” shows the phases, “Plan, Analyze, Design, Code, Test, Deploy.” “Scrum” shows the same phases, but repeated in short cycles. “Extreme Programming” shows the same short cycles as Scrum, but the phases are performed simultaneously and continuously, not sequentially.

From a tactical perspective, most engineers don’t know how to design software incrementally. Most software development education still comes from a waterfall perspective, which assumes time spent on analysis and design before coding.

If you break up their work into short Sprints, they’re going to create mini waterfalls, where they do a little bit of planning, a little bit of design, a little bit of programming, and a little bit of testing. They won’t have enough time to do the design and test work that they really need to do. They’ll struggle to create a cohesive design, and they’ll struggle with bugs. If you adapt your plans frequently—as you should!—it becomes even harder. Your internal quality will suffer. Muda will rise.

To be successful in an adaptive environment, you need to be able to keep your design and code clean at all times. Extreme Programming practices such as evolutionary design, merciless refactoring, and test-driven development allow you to test, code, and design simultaneously, so you always have enough time for design and testing, even if you’re using short Sprints.

In other words, before we can have business agility, we need to have technical agility. This is where Extreme Programming shines, and it’s why I’m hiring for XP skills at my company.

A slide showing four teams: Database, Front-End, Back-End, and Operations. Red arrows show dependencies between nearly all the teams.

There’s also a strategic component to supporting business agility. As our business strategy changes, the amount of investment we put into this product or that product changes. Our ability to respond to those changes depends on how we organize our teams.

The “classic” way to organize software teams is functionally. A front-end team here, a back-end team there, a database team over there. I think we know the problems this causes by now. In order to deliver any value, we have to coordinate all four teams. It leads to delays and errors. Muda.

A slide labelled “BizDevSecDataKitchenSinkOps.” It shows a large number of people. In the corner, a chibi character lugs a kitchen sink on his back with tears running down his face.

Agile teams are cross-functional. We create teams that can own an entire portion of a product: product and the front-end and the back-end and the database and operations and security. We could call that “BizDevSecDataKitchenSinkOps.”

Well, in theory. In practice, it’s hard to fit that many people into a single team. So we end up having to divide people amongst teams. The most popular book on this subject is called "Team Topologies."

A slide showing the cover of the book “Team Topologies.” It lists four types of teams: Stream-aligned teams, Enabling teams, Complicated-subsystem teams, and platform teams.

Team Topologies provides ways of organizing teams so you can keep them small—seven or so people in each team—while still keeping them autonomous and focused on delivering value.

We have stream-aligned teams, which is what we really want, and then we have enabling teams, complicated-subsystem teams, and platform teams as a way of working around the fact that we can’t really have what we want with such small teams.

A slide showing four teams: “Legacy Money-Maker,” “Big Future Bet,” “Market Expansion,” and “Far Future Bet.”

So if we have too many people for one team, we divide them into multiple teams.

For example, let’s say we’re at a company with four products: the legacy money-maker, a big bet on the future, a way of expanding into new markets, and a bet on the far future. We can create a stream-aligned team for each product.

But now we have a problem. Some of these teams are still way too big.

The same slide with the “Legacy Money-Maker” and “Big Future Bet” teams broken up into smaller teams. Red arrows show dependencies between the smaller teams.

Team Topologies says to split these teams up further. Depending on how you design the teams, this can work pretty well. I’ve used this approach many times myself.

A slide labelled “Team Topologies Issues.” The contents are described in the following text.

But over time, I’ve found several problems with the Team Topologies approach.

First, everyone is so isolated in their teams that silos form. In my experience, people barely interact across teams, even in the same product. This makes it difficult to succeed at cross-team initiatives, and hard to move people between teams.

Second, teams are rigid. When business needs change, adding and removing people from teams is a problem. Because teams are limited in size, new business needs often require you to reorganize your teams, which is a huge disruption. Often, Engineering resists the reorg, leaving their business partners frustrated, because effort isn’t being directed at their highest priorities.

Third, specialties such as user experience, security, and operations, are spread too thin. You don’t need a full time person on every team, but having people work part time on each team doesn’t work either. It leads to constant task switching, which makes it difficult for people to focus and wastes a lot of time.

And fourth, the teams often aren’t really independent. When you have a legacy codebase, it usually has to be shared across multiple teams, and no team really wants to own it. Quality degrades further as people focus on the code they do own.

We need to build in another direction. We need to build up, not out.

A slide labelled “Types of Scaling.” It shows “Horizontal scaling,” which involves adding more teams, and “Vertical scaling,” which involves making an existing team bigger.

When people think about adding a lot of people to Engineering, they usually think about adding more teams, but that’s just one way to scale: horizontal scaling. You can also make teams bigger. That’s vertical scaling.

A repeat of the slide with “Legacy Money-Maker” and “Big Future Bet” teams broken up into smaller teams. Red arrows show dependencies between the smaller teams.

Vertical scaling allows you to remove the complexities of Team Topologies and...

The same slide again, with dependencies between teams removed and just one team per product.

...return to one value stream per product.

My favorite way to do this is an approach called FaST.

An image showing a group of people on the left and several stickies in the top right. One group of four stickies is labelled “In Progress” and another group of two stickies is labelled “Up Next.”

QR Code: FaST

FaST was invented by Quinton Quartel. It stands for “Fluid Scaling Technology.” You can learn more at fastagile.io, and I have several in-depth presentations on my website, which you can find by following this QR code. There’s also a session on FaST at this conference later today! Yoshiki Iida and Takeo Imai will be speaking in room C at 3:15.

Here’s how FaST works.

First, everybody gathers together in a big room. All the teams I’ve used FaST with have been remote, so we use a videoconference and Miro, a collaborative whiteboarding tool.

The meeting starts with announcements, then team leaders from the previous cycle, called stewards, describe what their teams have worked on since the last FaST meeting, two or three days ago.

Next, product leaders describe their business priorities.

On your whiteboard, you’ll have a set of high-level priorities. My teams work in terms of valuable increments, which are things we can release that bring value to our organization. You can see that some increments are in progress, and some are waiting to be started.

When product leads describe their business priorities, they’re describing how these have changed. Usually it’s just a quick, "no changes." But if something has changed, a product lead will explain what has changed and why.

The same image again, but now four people have volunteered to lead teams. They have stickies showing what their teams will work on: “Reticulate the splines,” “Hunt for grues,” “Reverse entropy,” and “Transect encabulator.”

Next, team members volunteer to steward a team. Anybody can steward a team, but most teams are stewarded by engineers. The maximum number of stewards is limited to ensure each team has about 3-5 people on it.

Each steward describes what their team is going to work on. The stewards are expected to work on something that advances the collective’s business priorities, but they use their own judgment on what that is. Most of the time, it will be feature work, but it can also be things like improving the build or cleaning up noisy logs. Usually, they’re a continuation from the previous cycle.

The next stage of the same image. Now all the people have moved to join the four teams shown in the previous image.

Finally, people self-select onto the teams, based on what they want to work on, what they want to learn, and who they want to work with. Ultimately, they’re expected to do what’s best for the organization. Most of the time, they’ll continue with the same team as the previous cycle.

And that’s the FaST meeting. It takes 10-20 minutes, and it’s really all there is to FaST. It’s a way of having a single large group of people work together collectively by dynamically breaking into teams every few days. It’s simple, it’s fast, and it’s effective.
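To make the mechanics concrete, here is a minimal sketch of one FaST cycle in Python. It is an illustration only, not part of FaST itself: the function names, the example people, and the hard 3-to-5 bounds are assumptions chosen to mirror the description above.

```python
from math import ceil


def steward_limits(collective_size: int, min_team: int = 3, max_team: int = 5) -> tuple[int, int]:
    """Return (fewest, most) teams that keep every team at min_team..max_team people.

    The 3 and 5 defaults are assumptions based on "about 3-5 people" per team.
    """
    fewest = ceil(collective_size / max_team)  # fewer teams than this and some team exceeds max_team
    most = collective_size // min_team         # more teams than this and some team falls below min_team
    return fewest, most


def form_teams(stewards: dict[str, str], choices: dict[str, str]) -> dict[str, list[str]]:
    """Group people by the steward they chose to join.

    stewards maps each steward to the work their team will do.
    choices maps every other person to the steward they self-selected.
    """
    teams = {steward: [steward] for steward in stewards}
    for person, steward in choices.items():
        teams[steward].append(person)
    return teams


# Hypothetical collective of six people forming two teams.
stewards = {"Ana": "Reticulate the splines", "Ben": "Hunt for grues"}
choices = {"Cy": "Ana", "Di": "Ana", "Ed": "Ben", "Flo": "Ben"}

print(steward_limits(16))             # (4, 5)
print(form_teams(stewards, choices))  # {'Ana': ['Ana', 'Cy', 'Di'], 'Ben': ['Ben', 'Ed', 'Flo']}
```

In a real meeting nobody runs code, of course. The point is that the only fixed rule is the cap on stewards; everything else is people choosing for themselves.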

A slide labelled “Team Topologies Issues: Solved!” The contents are described in the following text.

FaST completely solves the issues I’ve seen with Team Topologies. I’ve stopped using Team Topologies in favor of just having large product teams.

First, when you have lots of small teams, it’s hard to plan work that involves multiple teams. With FaST, you have larger teams, so cross-team initiatives are much less likely. We haven’t had any at my company since we started using FaST over a year ago.

Second, Team Topologies has trouble with big business priority changes. That’s not a problem with FaST—the "F" stands for "Fluid," and it’s really true. People dynamically adjust to whatever we need. It’s incredibly responsive, too—if there’s an urgent need, we bring it to the next FaST meeting, which happens twice a week. People form a team around it and go! We just have to be careful to manage priorities and minimize work in progress. I’m constantly reminding the product managers that it’s better to finish work than to start it.

Third, when you have small teams, specialists tend to get spread across multiple teams, leading to a lot of frustration and task switching. That isn’t an issue with FaST because the collectives are large enough that each one can have a dedicated specialist. They self-select into whatever work needs to be done.

And finally, shared code is no longer an issue because you can combine the teams that share code into a single collective.

FaST isn’t perfect, and there are some real challenges with moving to FaST. If you’re interested in trying it, come talk to me about those challenges, or watch my presentations about it. But I haven’t seen anything better for solving the team organization problems that occur at scale in engineering organizations.

A slide summarizing the “Agility” topic. It says “Tactical Agility: Extreme Programming” and “Strategic Agility: FaST.”

If we were the best product engineering organization in the world, we would seek out opportunities to change our plans. We would work in small pieces and adjust our strategy based on what we learned. To do that, we not only need the business agility we’ve already discussed, we need technical agility. Specifically:

Extreme Programming practices, which allow us to change direction without creating a technical mess.
FaST, which allows teams to shift fluidly in response to changing business needs.

Profitability

A cute image showing a chibi team gathered around a large screen. Sparkles and dollar signs float in the air.

Profitability is last on the list for a reason. If we take care of our people, if we take care of our internal quality, if we take care of our customers and users, if we take care of our internal stakeholders, and if we are responsive to changes in the market... we will be profitable.

Almost.

We have to remember that the only way we can take care of our people, our customers, our users, and everyone else... is if we stay in business. It’s not enough to build great software. We also have to build it to be sold, to be cost effective, and to be put into production.

There’s a funny paradox about engineering. Great engineering doesn’t seem to be heavily correlated with success. In my career as a consultant, I met a lot of companies that were really struggling from an engineering perspective, but were still very successful from a business perspective. That’s because, no matter how much of a mess they were under the covers, they served the needs of the business.

A slide showing business departments and the purpose they serve in the business. The contents are described in the following text.

Here are a bunch of departments you might see in a business-to-business product company. Each of them directly contributes to the company’s yearly revenue.

Marketing generates leads—people who might want to buy your software—for your Sales department. They’re judged on the number of qualifying leads they create.

Partners also generates leads, or even sales, from people who are using complementary software. They’re judged on the revenue partners generate.

Sales converts leads into paying customers. They’re judged on the new revenue they generate.

Customer Success takes care of your customers. They’re judged on customer retention and upsell rates.

The same list of departments with their purpose removed. A chibi character sits on top of the list with a thoughtful expression on her face.

So what does product engineering do?

[beat]

The same list of departments again, but now they’re larger. Sentences describing how product engineering helps them grow have been added. (Those sentences are described in the following text.)

Our features should open up new markets, allowing Marketing to generate more leads.

We should provide useful APIs, allowing Partners to build new relationships.

We should respond to market trends, allowing Sales to convert more leads.

And we should fix the problems that get in customers’ way, reducing churn and increasing upsell.

Every dollar invested into engineering should be reflected in permanent improvements to the value your company creates. It may not be dollars or yen; it may be helping to cure malaria or fighting climate change. But however you define value, the purpose of product engineering is to change that trajectory for the better.

A slide summarizing the “Profitability” topic. It has two major points, each with sub-bullets. The first point is, “Build products to be sold,” with the sub-bullets “Sales,” “Marketing,” “Support,” “Partners,” and “Finance.” The second point is, “Change business trajectory,” with the sub-bullets “Attract more prospects,” “Enable new partners,” “Convert more leads,” and “Reduce churn.”

If we were the best product engineering organization in the world, we would build our products to be sold. We would work closely with our internal stakeholders to ensure our products were ready for the real world of sales, marketing, content, support, partners, accounting, and every other aspect of our business. We would plan for observability and operability, for outages and data security. We would build software that changes the trajectory of our business.

A repeat of the slide with six categories written in a bold white font on a black background. The categories are: People, Internal Quality, Lovability, Visibility, Agility, and Profitability.

And we would do it by having the best people in the business; having such high internal quality that changes were easy and bugs were rare; focusing our efforts on the changes that would make the most difference for our users and customers; having the trust of our internal stakeholders; and seeking out new opportunities and adapting our strategy.

Are we the best product engineering organization in the world? No. We’re not. But we would like to be. And we’re never going to stop improving.

I hope these ideas will help your companies continue to improve as well. Thank you for listening.

Update on Software Engineering Career Ladder

August 19, 2024

Back in April, I posted the new career ladder I was planning to introduce at OpenSesame, which I’ve joined as VP of Engineering. We rolled it out in July, so now’s a good time to share what we’ve learned so far.

Here’s the latest version of the ladder. (PDF)

Culture Changes

The purpose of the new career ladder is to help change the engineering culture at OpenSesame. The previous career ladder put a lot of emphasis on individual ownership and investigating new technologies. The new ladder focuses on teamwork, peer leadership, and maintainable code.

So far, this seems to be working. My managers tell me that they’re seeing shifts in behavior, with people volunteering to lead meetings and take on work they didn’t before. We’ve also been able to use the new career ladder as a touchstone for people who are having performance problems.

Verdict: So far, so good.

Manager-Led Evaluation

The old career ladder was employee-led. Each employee was expected to take ownership of their progression by providing examples of their skills. Each example was entered into a spreadsheet and given a score of 1-5 by their manager (or rejected entirely). Once the employee had reached a certain score threshold, they were eligible for promotion.

Employees felt that this approach was fair, as unbiased as these things can be, and gave clear direction about what they needed to accomplish. On the minus side, it required a lot of work, and some employees didn't put in the effort. It was inadvertently weighted toward “butt in seat” time and ability to self-promote.

The new career ladder makes evaluations manager-led. Managers are expected to fill out a spreadsheet evaluating employees on a wide range of skills. Each skill is rated on the following scale:

None: the employee doesn’t have the skill
Learning: they’re still learning the skill
Proficient: they can exercise the skill without help in some situations
Fluent: the skill is second nature and they exercise it whenever it’s appropriate

Promotion requires fluency in all the skills for a given title, including previous titles, although individual skills can be waived at managers’ discretion.
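The promotion rule can be sketched in a few lines of Python. This is a hypothetical illustration, not our actual spreadsheet: the `Rating` enum, the function names, and the example skills are assumptions, and treating an unrated skill as "None" is my own simplification.

```python
from enum import IntEnum


class Rating(IntEnum):
    NONE = 0        # the employee doesn't have the skill
    LEARNING = 1    # they're still learning the skill
    PROFICIENT = 2  # they can exercise it without help in some situations
    FLUENT = 3      # second nature, exercised whenever it's appropriate


def missing_for_promotion(ratings, required_skills, waived=frozenset()):
    """Return the required skills that are neither fluent nor waived.

    required_skills should include the skills for the target title
    and for every previous title. Unrated skills count as NONE.
    """
    return [
        skill
        for skill in required_skills
        if skill not in waived and ratings.get(skill, Rating.NONE) != Rating.FLUENT
    ]


def eligible_for_promotion(ratings, required_skills, waived=frozenset()):
    """Eligible only when nothing is missing."""
    return not missing_for_promotion(ratings, required_skills, waived)


# Hypothetical evaluation using three skills from the ladder.
ratings = {
    "Source control": Rating.FLUENT,
    "Basic SQL": Rating.PROFICIENT,
    "Spike solutions": Rating.FLUENT,
}
required = ["Source control", "Basic SQL", "Spike solutions"]

print(missing_for_promotion(ratings, required))                         # ['Basic SQL']
print(eligible_for_promotion(ratings, required, waived={"Basic SQL"}))  # True
```

Returning the list of missing skills, rather than just a yes or no, matches how the ladder is meant to be used: as a guide to what someone should work on next.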

This lighter-weight approach allows us to have a lot more skills, and we were hoping it would remove the bias the previous system had toward self-promotion and longevity. Our concern was that it would put a lot more burden on the individual managers to understand their employees and fill out their spreadsheets.

As we’ve put it into practice, it’s definitely been a lot of work for managers to fill out the spreadsheets. For managers that know their employees well, the work’s been tolerable... mostly a matter of documenting what they already know. For managers that are new to their team, it’s been tough. They don’t have the intimate knowledge that filling out the spreadsheet requires, and it’s taking a lot of time to find out.

You could argue that the system is working as intended, and that managers should know their people well enough to fill out the spreadsheet. I tend to agree. It’s still a burden. The theory is that it’s only going to be difficult the first time, and then will get easier. We’ll see if that happens.

The other open question is whether engineers feel this system is better. People had mixed feelings about the old system—they didn’t like the bias, but they did like how clear it was, and they thought it was fair if you were willing to put in the effort. I haven’t had a chance to go back and interview the engineers about what they think about the new system yet, but I’d like to do so.

Verdict: Jury’s still out on this one.

Latest Update

In April, the ladder only covered up to Technical Lead. Now it also covers the advanced titles—Staff Engineer for the engineer track and three Engineering Manager titles for the management track. We still need to add Principal Engineer for the engineer track, and specialty skills, but this is enough for a foundation.

The management track is the biggest addition, with 78 new skills. Just as the ladder sets new expectations of engineers, the management track sets new expectations for managers, with material about managing the system rather than just managing the work.

Read the full career ladder here. (PDF)

Here’s a summary of the titles and skills, with changes marked:

Associate Software Engineer

Associate Software Engineers are just starting their software development careers. They're expected to understand the basics of software development, and be able to work in a professional setting, but they're mostly working under the guidance of more experienced engineers.

Baseline Expectations (was “Professionalism”)
- Spoken and written English
- Work ethic
- Intrinsic motivation
- Remote attendance
- In-person attendance
- Active participation
- Assume positive intent (new)
- Respectful communication
- Transparency
- Team orientation
- Follow the process
- Persistence (was “Grit”)
- Absorb feedback
- Growth mindset
- OpenSesame Qualified¹
Classroom Engineering
- Object-oriented programming language
- Pairing/teaming driver
- Classroom-level debugging
- Function and variable abstraction

¹“OpenSesame Qualified” is our internal training program.

Software Engineer

Software Engineers contribute to the work of their team without explicit guidance. They've begun to demonstrate peer leadership skills and are developing their abilities as generalizing specialists.

Basic Communication
- Collective ownership
- Defend a contrary stance
- “Yes, and...”
- Try it their way
- Technical feedback
- Active listening
- As-built documentation
Basic Leadership
- Basic facilitation
- Team steward
- Valuable increment steward
- Scut work
Basic Product
- Your team’s product
- Your team’s customers and users
- User story definition
Basic Implementation
- Your team’s programming language
- Your team’s codebase
- Basic test-driven development
- Sociable unit tests
- Narrow integration tests
- End-to-end tests
- Manual validation
- Spike solutions
- Basic SQL
- Pairing/teaming navigator
- Basic algorithms
- Basic performance optimization
- Debugging your team’s components
- Simple dependency integration
- Unhappy path thinking
Basic Design
- Decompose problem into tasks
- Class abstraction
- Mental model of your team’s codebase
- Mental model of a complex dependency
- Method and variable refactoring
- Campsite rule
- Fail fast
- Paranoiac telemetry
- Evaluate simple dependencies
Basic Operations
- Source control
- Your team’s release process
- On-call responsibility
- On-call triaging
- Issue investigation
- Your team’s cloud infrastructure
- Code vulnerability awareness
- Cloud vulnerability awareness

Senior Software Engineer

Despite the name, Senior Software Engineers are still fairly early in their careers. However, they have enough experience to take a strong peer leadership role in their teams. They've developed broader generalist skills and deeper specialist skills.

Advanced Communication
- Clear and concise speaking
- Clear and concise writing
- Technical diagramming
- Explain mental model
- Ensure everyone’s voice is heard
- Coalition building
- Interpersonal feedback
- Runbook documentation
Advanced Leadership
- Peer leadership
- Comfort with ambiguity
- Risk management
- Intermediate facilitation
- Mentoring and coaching
- Critique the process
- Build quality in (new)
- Circles and soup
Advanced Product
- Ownership
- Vertical slices
- Cost/value optimization
Advanced Implementation
- All of your team’s programming languages
- All of your team’s codebases
- Codebase specialty
- Code performance optimization
- Complex dependency integration
- Retrofitting tests
- Exploratory testing
Advanced Design
- Codebase design
- Simple design
- Reflective design
- Cross-class refactoring
- Basic database design
- Mental model of team dependencies
- ~~Evaluate complex dependencies~~ (moved to Technical Lead)
- Simplify and remove dependencies
Advanced Operations
- Observability
- Basic build automation
- Basic deployment automation
- Incident leader
- Incident communicator
- Incident fixer
Senior SE Specialty
- Choose one of the specialty skill sets listed below.

Technical Lead

Technical Leads are the backbone of a team. They combine deep expertise in several specialties with the ability to mentor and coach less experienced team members. They work closely with the team's other technical leads to advise engineering managers on the capabilities and needs of the team. However, this remains a coding-centric role, and the majority of their time is spent as a player-coach working alongside other team members.

Team Leadership
- Personal authority
- Leaderful teams
- Leadership specialty
- Assess technical skills
- Assess interpersonal skills
- Assess product skills
- Technical interview
- Impediment removal
Interpersonal Leadership
- Humility
- Psychological safety
- Calm the flames
- Ignite the spark
Product Leadership
- Options thinking
- Status and forecasting
- Progress and priorities
Design Leadership
- Pragmatic idealism (new)
- Simple codebase architecture
- ~~Reflective codebase architecture~~ (moved to Staff Engineer)
- Risk-driven codebase architecture
- Architectural refactoring
- Evaluate complex dependencies (moved from Senior Software Engineer)
- Published API design
Technical Lead Specialties
- Choose three(?) additional specialty skill sets.

After the Technical Lead level, people can choose the Technical Track, which consists of the Staff Engineer and Principal Engineer titles, or the Management Track, which consists of the Manager and (eventually) Director titles.

Staff Engineer (New)

Staff Engineers make a difference to the performance of Engineering as a whole. They spend time with each team in turn, working hands-on as player-coaches, and cross-pollinating information and ideas. They bring a breadth and depth of expertise that people are happy to learn from.

Departmental Leadership
- Cross-team personal authority
- Management team personal authority
- Cross-pollination
- All teams’ products
- All teams’ codebases
- Reflective codebase architecture (moved from Technical Lead)
- Evaluate shared dependencies
- Pedagogy
- The heart of the matter
- Identify systemic technical weaknesses
- Identify systemic product weaknesses
- Identify systemic process weaknesses

Principal Engineer (WIP)

This level hasn’t been defined yet.

Associate Manager of Engineering (New)

Associate Engineering Managers work under the guidance of more senior managers to learn the craft of managing teams. In contrast to the Engineer Track, which focuses on peer leadership, managers focus on the team as a system, with an emphasis on people, process, and product.

Manager Baseline
- Self-starter
- Time management
- Clear communication
- Situational awareness
- Responsiveness
- Reliability
- Organization
- Detail orientation
- Radical transparency
- Disagree
- Commit to vision
- Commit to action
Basic People Management
- Team rapport
- Tailored approach
- Performance evaluation
- Performance development
- Career development
- Assess team dynamics
Basic Process Management
- Systems thinking
- Understand the why
- Delegation
- Assess team process
- Development resources
- Management specialty
Basic Product Management
- Engineering credibility
- Go to gemba
- Engineering partner
- Assess team technical skills

Manager of Engineering (New)

Engineering Managers are capable of managing teams without guidance. They're not about telling people what to do; instead, their job is to engineer a system that provides the team with the context, skills, resources, and peer leadership it needs to excel without management involvement.

Advanced People Management
- Difficult conversations
- Performance remediation
- Improve team dynamics
- Conflict resolution
- Hiring manager
- Diverse perspectives
- Bench strength
Advanced Process Management
- Manage the system
- Improve team ownership
- Improve team process
Advanced Product Management
- Improve team technical skills
- Chartering
- Business context
- Stakeholder context
- Permeable shield
- Stakeholder communication
- Advanced forecasting
Organizational Management
- Presentations
- Spreadsheets
- Administration
- Procurement
- Governance
- Event planning
- Team-level advisor

Senior Manager of Engineering (New)

Senior Engineering Managers are leaders on the engineering management team. They look beyond their own team to how they can improve the performance of all engineering teams. They're mentors to other managers and advisors to senior management.

Management Leadership
- Management team leader
- Cross-team initiatives
- Cross-team pollination and growth
- Cross-team process alignment
- Identify systemic issues
- Address systemic issues
- Division-level advisor
Process Design
- Process design mentoring
- Change management
- Principle: Rely on people
- Principle: Deliver value
- Principle: Eliminate waste
- Principle: Seek technical excellence
- Principle: Improve your process
- Key idea: Build quality in
- Key idea: Collective ownership
- Key idea: Continuous improvement
- Key idea: Embrace failure
- Key idea: Face-to-face communication
- Key idea: Feedback and iteration
- Key idea: Fast feedback
- Key idea: Last responsible moment
- Key idea: Minimize work in progress
- Key idea: Optimize for maintenance
- Key idea: Self-organizing teams
- Key idea: Simplicity

Feedback

Please share your thoughts!

Agile Fluency eBook in Portuguese

May 14, 2024

One of my most enduring works is the Agile Fluency Model, which I created with Diana Larsen. Our original article, The Agile Fluency Model: A Brief Guide to Success with Agile, has been translated into multiple languages. And now... that includes Brazilian Portuguese!

Download the Brazilian Portuguese version here.

Many thanks to Renato Barbieri for creating this translation for us. His book, Uma Breve História da Agilidade, tells the history of the Agile movement. I haven't read it yet—partly because I don’t know Portuguese—but I trust that it’s excellent.

Renato is donating all the royalties from the book to help victims of the current flooding in the south of Brazil. You can help him do so by buying his book here.

Free Self-Guided “Testing Without Mocks” Training

May 12, 2024

I’m thrilled to announce that my commercial “Testing Without Mocks” training course is now available for free!

“Testing Without Mocks” Training

My “Testing Without Mocks” resources—also known as “Nullables”—are consistently among the most popular material on this site. I used to offer an instructor-led course for it. But I’m too busy for that now, so I’ve released that same high-quality course in a self-guided format.

There’s just one caveat: the self-guided version of this course is offered without support. If you need tutoring or want a live, instructor-led course, contact me about paid options.

Other than that, it’s free for you to enjoy! Find it here.

A Useful Productivity Measure?

May 5, 2024

In my new role as VP of Engineering, there was one question I was dreading more than any other: “How are you measuring productivity?”

I can’t fault the question. I mean, sure, I’d rather it be phrased about how I’m improving productivity, rather than how I’m measuring it, but fair enough. I need to be accountable for engineering productivity. There are real problems in the org, I do need to fix them, and I need to demonstrate that I’m doing so.

Just one little problem: software productivity is famously unmeasurable. Martin Fowler: “Cannot Measure Productivity.” From 2003. 2003!

More recently, Kent Beck and Gergely Orosz tackled the same question. Kent concluded: “Measure developer productivity? Not possible.”¹

¹Kent and Gergely’s two-part article is excellent and worth reading. Part one. Part two. And a later followup.

So now what do I do? That’s it, I’m screwed, make up some bullshit metric and watch my soul die, McKinsey style? Fight with my CEO about his impossible request until he gives up and fires me?

Maybe not. I think I’ve found another way. It’s early, but it’s working for me so far. Will it work for you? Eeehhhhh... maybe. Probably not. But maybe.

My Solution

It started half a year ago, in September 2023. My CEO asked me how I was measuring productivity. I told him it wasn’t possible. He told me I was wrong. I took offense. It got heated.

After things cooled off, he invited me to his house to talk things over in person. (We’re a fully remote company, in different parts of the country, so face time takes some arranging.) I knew I couldn’t just blow off the request, so I decided to approach the question from the standpoint of accountability. How could I demonstrate to my CEO that I was being accountable to the org?

We met at the CEO’s house, along with the CTO and CPO (Chief Product Officer). I led them through an exercise: “Imagine we’ve built the best product engineering organization in the world. What does that look like?” We came up with six categories of ideas. Then I asked, “Which indicators will help us understand how we’re getting closer to these ideals?” We came up with indicators in each category. Blissfully, none of those categories were “productivity.” I don’t think anyone noticed. Bullet dodged.

I came away feeling fairly positive about the conversation. I discussed the results with my Engineering and Product peers, we refined, and I finally presented the first “Product Engineering Accountability Review” a few weeks ago. It went well! I used the indicators to support a qualitative discussion of what’s happening in Engineering, rather than just reporting numbers.

One problem: the CEO had a scheduling conflict and couldn’t come. So I don’t know what he would have thought. But at least the CTO and CPO liked it.

The Productivity OKR

Meanwhile, back in January, the leadership team had established that one of our company-wide OKRs¹ would be to define and improve productivity metrics for each department. I was to present mine to the full Leadership team at the end of April. Crap. Bullet un-dodged.

¹“OKRs” are “Objectives and Key Results.” They’re a way of setting and tracking goals. Similar to Management by Objectives, about which Deming said: “Eliminate management by objective. Eliminate management by numbers, numerical goals. Substitute leadership.” But that’s a rant for another day.

In October, we had defined six aspects of being the greatest product engineering company in the world. One of them was “profitability.” Its indicators were the most related to outcomes. If I had to measure productivity—and I did—they were the ones to use.

We had three indicators for profitability: actual RoI, estimated RoI, and value-add capacity. The first was best, in theory. In practice, it might be impossible to measure. Before I explain why, I need to explain how we calculate RoI.

Product Bets

Every engineering organization I’ve ever seen has had more demand than capacity. Prioritizing those demands is crucial. And fraught. Lots and lots of opportunity for conflict.

We’re no exception. To help bring order to the chaos, the VP of Product and I have introduced the idea of “Product Bets.” Each major initiative needs a Product Bet Proposal. It's a short, one-page document that explains:

What we’re going to accomplish
The value it’s estimated to generate
The amount we’re willing to bet
The justification for the value
How we’ll measure the value

In order for a proposal to be accepted, a member of the Leadership team needs to sponsor it, take accountability for its success, and convince the other Leadership members that their proposal is more important than all the others.

In theory, anyway. I’ve tried variants of this idea before, and it’s never lasted. Turns out leadership teams like accountability more when it’s other people who have to be accountable.

But we’re trying. So far... it’s kind of working. Maybe. Too early to tell, honestly. I’ll write more after the verdict’s in.

A True Measure of Productivity

But if the product bet process does work... well. That would be cool. It would give us a true measure of productivity. We would know the value of a bet, and it’s easy to know how much we spend on a bet. Value produced over dollars spent. Boom. Productivity. Done.

Even better, the numbers are nice. Very nice. Each bet has a “maximum wager,” which determines how many engineer-days we’ll invest in the bet before giving up. Those wagers are based on one tenth of the expected value over five years. In other words, 10x return on investment.

10x return on investment is enough to make anybody take notice. But... measuring value is a problem. Sure, each bet has a section on how we’ll measure value, but can we really tease that out from sales team effort, customer success team effort, other feature changes, and changes in the market? Probably not.

It might not matter. We may not be able to measure the actual value, but every bet also has an estimated value attached. Combined with actual cost, that gives us a measure of estimated RoI. It may not be real RoI, but it’s good enough for understanding the productivity of the engineering team.
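
To make the arithmetic concrete, here's a minimal sketch of the wager and estimated RoI calculations. Only the one-tenth rule and the "value over cost" ratio come from the description above. The `ProductBet` shape, its field names, and the dollar figures are hypothetical, and the wager is expressed in dollars rather than engineer-days because the conversion rate isn't given.

```typescript
// Hypothetical shape for a product bet; field names are illustrative only.
interface ProductBet {
  name: string;
  estimatedFiveYearValue: number; // dollars of value expected over five years
  actualCost: number; // dollars actually spent on the bet
}

// The maximum wager is one tenth of the expected five-year value.
function maximumWager(bet: ProductBet): number {
  return bet.estimatedFiveYearValue / 10;
}

// Estimated RoI: estimated value divided by actual cost.
function estimatedRoi(bet: ProductBet): number {
  if (bet.actualCost <= 0) {
    throw new Error("actualCost must be positive to compute RoI");
  }
  return bet.estimatedFiveYearValue / bet.actualCost;
}

// Made-up numbers: a bet expected to be worth $2M over five years.
const exampleBet: ProductBet = {
  name: "Example bet",
  estimatedFiveYearValue: 2000000,
  actualCost: 150000,
};
```

With these made-up numbers, the maximum wager is $200,000. Because the bet came in under its wager, its estimated RoI is a little better than 10x. A bet that spends exactly its maximum wager lands at 10x.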

That’s our first two productivity measures: actual RoI and estimated RoI. Pretty good. Except that we don’t have any data.

A Better Measure of Productivity... For Now

The RoI metrics rely on us having product bets. But we don’t. Not yet. We’re still rolling them out. So, no matter how good the metrics might be, we can’t use them. No data.

There’s a third indicator in the “profitability” category we can use, though. It’s value-add capacity.

Like any engineering organization, we spend some percent of our time on fixing bugs, performing maintenance, and other things that are necessary but don’t add value from a customer or user perspective. The Japanese term for this is muda.

If we didn’t have any muda, spent all our time on value-add work, and achieved a 10x return on each investment, our productivity would be ten: $10 for every $1 in salary (or close enough). If we spent 80% of our time on value-add work, our productivity would be eight. Twenty percent, two.

In other words, in the absence of RoI measures, the percent of engineering time spent on value-add activities is a pretty good proxy for productivity.
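
As a rough sketch of that proxy, assuming time is tracked in hours and every piece of value-add work earns the 10x return described above (both assumptions, and the function names are hypothetical):

```typescript
// Assumed return on value-add work, taken from the 10x wager rule.
const ASSUMED_RETURN_MULTIPLE = 10;

// Fraction of engineering time spent on value-add work (0 to 1).
function valueAddCapacity(valueAddHours: number, mudaHours: number): number {
  const total = valueAddHours + mudaHours;
  if (total <= 0) {
    throw new Error("no hours recorded");
  }
  return valueAddHours / total;
}

// Productivity proxy: estimated dollars of value per dollar of salary.
function productivityProxy(capacity: number): number {
  return capacity * ASSUMED_RETURN_MULTIPLE;
}
```

Calling `productivityProxy(valueAddCapacity(80, 20))` reproduces the 80% case from the text: a productivity of about eight.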

That’s the productivity number I reported to Leadership last week.

How It Was Received

It worked really well. The nice thing about reporting this number was that people were already frustrated with Engineering’s progress. They could see that we had capacity problems, but they didn’t know why. It was easy for them to assume that it was because people weren’t working hard, or didn’t know what they were doing.

I presented our metric as a single stacked bar chart. (Like a pie chart, but in a rectangle.) Muda on the bottom, value-add on the top. Then I expanded out the muda into another stacked bar chart, showing how much time was being spent across all of Engineering on deferred maintenance, bugs, on call, incident response, deployments, and so forth. Then expanded out again with more detail for the worst of those categories.
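
The numbers behind charts like these could be computed along the following lines. This is only a sketch: the `TimeEntry` shape and function names are hypothetical, the hours are invented, and "new features" is an assumed label for value-add work. The muda category names come from the description above.

```typescript
// Hypothetical time log entry.
interface TimeEntry {
  category: string;
  hours: number;
  valueAdd: boolean; // false means the category counts as muda
}

// Invented hours, for illustration only.
const timeLog: TimeEntry[] = [
  { category: "new features", hours: 400, valueAdd: true },
  { category: "deferred maintenance", hours: 250, valueAdd: false },
  { category: "bugs", hours: 150, valueAdd: false },
  { category: "on call", hours: 80, valueAdd: false },
  { category: "incident response", hours: 70, valueAdd: false },
  { category: "deployments", hours: 50, valueAdd: false },
];

function totalHours(entries: TimeEntry[]): number {
  return entries.reduce((sum, entry) => sum + entry.hours, 0);
}

// First bar: muda versus value-add across everything.
function topLevelBar(entries: TimeEntry[]): { valueAddPercent: number; mudaPercent: number } {
  const total = totalHours(entries);
  if (total <= 0) {
    throw new Error("no hours recorded");
  }
  const valueAdd = totalHours(entries.filter((entry) => entry.valueAdd));
  return {
    valueAddPercent: (valueAdd / total) * 100,
    mudaPercent: ((total - valueAdd) / total) * 100,
  };
}

// Second bar: each muda category as a percentage of all muda.
function mudaBar(entries: TimeEntry[]): Array<{ category: string; percent: number }> {
  const muda = entries.filter((entry) => !entry.valueAdd);
  const total = totalHours(muda);
  if (total <= 0) {
    throw new Error("no muda hours recorded");
  }
  return muda.map((entry) => ({
    category: entry.category,
    percent: (entry.hours / total) * 100,
  }));
}
```

Feeding `topLevelBar(timeLog)` and `mudaBar(timeLog)` into any charting tool gives the two stacked bars. The third level of detail would repeat the same calculation on a narrower set of entries.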

It completely changed the tenor of the conversation. Suddenly, the conversation shifted from, “how can we get the stuff we want sooner,” to “how can we decrease muda and spend more time on value-add work?” That’s exactly the conversation we need to be having.

Earlier in the week, the CEO told me that, next quarter, Leadership wants a briefing from me about how Engineering works. What my deliverables are, essentially. With the capacity measure, I have a good answer: my job is to double our value-add capacity over the next three years. Essentially, to double our output without increasing spending.

You know what? With my XP plans and the XP coaches I’ve hired, it’s totally doable. I think I’m being kind of conservative, actually.

A Fatal Flaw

So that’s my productivity measure: value-add capacity. The percentage of engineering time we spend on adding value for users and customers. Can you use it? Eehhhhh... maybe.

I can use value-add capacity—so far—because I’m aggressively stubborn about honest data. I refuse to skew our numbers or do worse work to make my department look good, and I’m keeping a close eye on what my teams are doing, too. This is important, because value-add capacity has a fatal flaw:

When a measure becomes a target, it ceases to be a good measure.

Goodhart’s Law

It’s ridiculously easy to cheat this metric. Even if you correctly categorize your muda—it’s very tempting to let edge cases slide—all you have to do is stop fixing bugs, defer some needed upgrades, ignore a security vulnerability... and poof! Happy numbers. At a horrible cost.

Actually, that’s the root of my org’s current capacity problems. They weren’t cheating a metric, but they were under pressure to deliver as much as possible. So they deferred a bunch of maintenance and took some questionable engineering shortcuts. Now they’re paying the price.

Unfortunately, you can get away with cheating this metric for a long time. Years, really. It’s not like you cut quality one month and then the truth comes out the next month. This is a metric that only works when people are scrupulously honest, including with themselves.

So, yeah, I’m not sure if this will work for you. It depends on how much ability you have to police things. The RoI indicators might not work for you either. They require product bets, or something similar, and that requires a lot of org changes. Even if they do work for me—jury’s out on that—they’re not something you can introduce overnight.

But, so far, value-add capacity is working for me, and I thought that might be interesting to you. Maybe spark some ideas. Just be cautious. Goodhart’s Law is a vengeful bastard. Remember, all this productivity metric stuff is a sideshow to what really matters:

Deliver valuable software. Do it often. And write it well.

Good luck.

A Software Engineering Career Ladder

April 27, 2024

Update: See my August update for our progress.

I’ve been quiet lately, and that’s because I’ve joined OpenSesame as Vice President of Engineering. It’s been a fascinating opportunity to rebuild an engineering organization from the inside, and I’m loving every minute. We’re introducing a lot of cutting-edge software development practices, such as self-organizing vertically-scaled teams and Extreme Programming.

As you might expect, introducing these changes to an organization with [REDACTED] number of engineers has been challenging. (I’m not sure if I’m allowed to say how many engineers we have, so let’s just say “lots,” but not “tons.” Bigger than a breadbox, anyway. Enough that I don’t do any coding myself, and the managers that report to me don’t have time to do much either.)

What I’m really doing is changing the engineering culture at OpenSesame. Culture doesn’t change easily. It tends to snap back. True change involves changing hundreds of little day-to-day decisions. That’s hard, even when people want to make those changes, and full buy-in is hard to come by. I’ve hired several XP coaches to help, but even they’re stretched thin.

A Lever for Change

This is where the new career ladder comes in. OpenSesame had a pretty innovative approach to career development before I joined. It involved a spreadsheet where engineers would gather evidence of their skills. Each piece of evidence contributed towards an engineer’s promotion. It did a nice job of being objective (or at least, as objective as these things can be) and clear about expectations.

The new career ladder builds on the ideas of the previous spreadsheet to introduce the changes I want. Where the old spreadsheet focused on individual ownership and investigating new technologies, the new one emphasizes teamwork, peer leadership, and maintainable code. I’m hoping this will help direct people to new behaviors, which will in turn start to change the engineering culture.

The new spreadsheet also replaces the previous evidence-based approach with a simple manager-led evaluation of skills. This makes room for a lot more skills. Too many, possibly. It’s a very fine-grained approach. But I’m hoping that will help provide clarity to engineers and give them the opportunity to pick and choose which skills they want to work on first.

How It Works

Each title has certain skill requirements, which are grouped into skill sets. For example, “Software Engineer” requires these skill sets:

Basic Communication
Basic Leadership
Basic Product
Basic Implementation
Basic Design
Basic Operations

Each skill set includes several skills. For example, “Basic Design” includes these skills:

Decompose problem into tasks
Class abstraction
Mental model of your team’s codebase
Mental model of a complex dependency
Method and variable refactoring
Campsite rule
Fail fast
Paranoiac telemetry
Evaluate simple dependencies

(There’s a document that explains each skill in more detail.)

Managers evaluate each engineer’s skills by talking to team members and observing their work. Each skill is graded on this scale:

None. The engineer doesn’t have this skill.
Learning. The engineer is learning this skill.
Proficient. The engineer can succeed at the skill when they concentrate on it, but it isn’t second nature.
Fluent. The engineer uses the skill automatically, without special effort, whenever it’s appropriate.

When an employee is fluent at all the skills for a particular title (and all previous titles), they’re eligible for promotion to that title.

(We also offer step promotions, such as Software Engineer 1 to Software Engineer 2, which come when the engineer is proportionally far along their way to the next title.)
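
The eligibility rule can be sketched as a small data model. This is an illustration, not how the actual spreadsheet works: the `Title` shape, the function names, and the two-skill ladder are hypothetical, and skill sets are flattened into plain skill lists to keep it short.

```typescript
// Grades from the scale above.
type Grade = "None" | "Learning" | "Proficient" | "Fluent";

// A title lists the skills it requires; `previous` names the title below it.
interface Title {
  name: string;
  previous?: string;
  skills: string[];
}

// Heavily abbreviated ladder; the real one has far more skills per title.
const ladder: Title[] = [
  { name: "Associate Software Engineer", skills: ["Work ethic", "Grit"] },
  {
    name: "Software Engineer",
    previous: "Associate Software Engineer",
    skills: ["Campsite rule", "Fail fast"],
  },
];

// Collect the skills for a title plus every title beneath it.
function requiredSkills(titleName: string, titles: Title[]): string[] {
  const title = titles.find((candidate) => candidate.name === titleName);
  if (!title) {
    throw new Error(`unknown title: ${titleName}`);
  }
  const inherited = title.previous ? requiredSkills(title.previous, titles) : [];
  return inherited.concat(title.skills);
}

// Eligible only when every required skill is graded "Fluent".
// A skill with no recorded grade counts as not fluent.
function isEligible(titleName: string, grades: Map<string, Grade>, titles: Title[]): boolean {
  return requiredSkills(titleName, titles).every((skill) => grades.get(skill) === "Fluent");
}
```

An engineer who is fluent in everything except "Fail fast" would be eligible for Associate Software Engineer but not yet for Software Engineer.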

Submitted for Your Approval

Why tell you all this? Because I want your feedback. We have an early draft that we’re starting to roll out to a handful of engineers. I’m sure there are opportunities for improvement. We’ve probably forgotten some skills, or set the bar too high in some areas, or too low.

So I’d love for you to take a look and share what you think. Maybe you’ll find some of the ideas useful for your own teams, too. You can find the spreadsheet and documentation here:

Skill documentation (Updated August 18th)
Career ladder spreadsheet (somewhat mangled)

Please share your feedback in one of these places:

Full Career Ladder

Here’s the full list of titles and skills. You can find descriptions of each skill in the documentation.

Associate Software Engineers

Associate Software Engineer 1s are at the start of their career. They’re expected to understand the basics of software development, and be able to work in a professional setting, but they’re mostly working under the guidance of more experienced engineers.

Professionalism
- Spoken and written English
- Work ethic
- Intrinsic motivation
- Remote attendance
- In-person attendance
- Active participation
- Respectful communication
- Transparency
- Team orientation
- Follow the process
- Grit
- Absorb feedback
- Growth mindset
- OpenSesame Qualified¹
Classroom Engineering
- Object-oriented programming language
- Pairing/teaming driver
- Classroom-level debugging
- Function and variable abstraction

¹“OpenSesame Qualified” is our internal training program.

Software Engineers

Software Engineer 1s still have a lot to learn, but they’re able to contribute to the work of their team without explicit guidance. They’re beginning to demonstrate peer leadership skills and develop their abilities as generalizing specialists.

Basic Communication
- Collective ownership
- Defend a contrary stance
- “Yes, and...”
- Try it their way
- Technical feedback
- Active listening
- As-built documentation
Basic Leadership
- Basic facilitation
- Team steward
- Valuable increment steward
- Scut work
Basic Product
- Your team’s product
- Your team’s customers and users
- User story definition
Basic Implementation
- Your team’s programming language
- Your team’s codebase
- Basic test-driven development
- Sociable unit tests
- Narrow integration tests
- End-to-end tests
- Manual validation
- Spike solutions
- Basic SQL
- Pairing/teaming navigator
- Basic algorithms
- Basic performance optimization
- Debugging your team’s components
- Simple dependency integration
- Unhappy path thinking
Basic Design
- Decompose problem into tasks
- Class abstraction
- Mental model of your team’s codebase
- Mental model of a complex dependency
- Method and variable refactoring
- Campsite rule
- Fail fast
- Paranoiac telemetry
- Evaluate simple dependencies
Basic Operations
- Source control
- Your team’s release process
- On-call responsibility
- On-call triaging
- Issue investigation
- Your team’s cloud infrastructure
- Code vulnerability awareness
- Cloud vulnerability awareness

Senior Software Engineers

Despite the name, Senior Software Engineer 1s are still fairly early in their careers. However, they have enough experience to take a strong peer leadership role in their teams. They’ve developed broader generalist skills and deeper specialist skills.

Advanced Communication
- Clear and concise speaking
- Clear and concise writing
- Technical diagramming
- Explain mental model
- Ensure everyone’s voice is heard
- Coalition building
- Interpersonal feedback
- Runbook documentation
Advanced Leadership
- Peer leadership
- Comfort with ambiguity
- Risk management
- Intermediate facilitation
- Mentoring and coaching
- Critique the process
- Circles and soup
Advanced Product
- Ownership
- Vertical slices
- Cost/value optimization
Advanced Implementation
- All of your team’s programming languages
- All of your team’s codebases
- Codebase specialty
- Code performance optimization
- Complex dependency integration
- Retrofitting tests
- Exploratory testing
Advanced Design
- Codebase design
- Simple design
- Reflective design
- Cross-class refactoring
- Basic database design
- Mental model of team dependencies
- Evaluate complex dependencies
- Simplify and remove dependencies
Advanced Operations
- Observability
- Basic build automation
- Basic deployment automation
- Incident leader
- Incident communicator
- Incident fixer
Senior SE Specialty
- Choose one of the specialty skill sets listed below.

Technical Leads

Technical Leads are the backbone of a team. They combine deep expertise in several specialties with the ability to mentor and coach less experienced team members. They work closely with the team’s other technical leads to advise engineering managers on the capabilities and needs of the team. However, this remains a coding-centric role, and the majority of their time is spent as a player-coach working alongside other team members.

Team Leadership
- Personal authority
- Leaderful teams
- Leadership specialty
- Assess technical skills
- Assess interpersonal skills
- Assess product skills
- Technical interview
- Impediment removal
Interpersonal Leadership
- Humility
- Psychological safety
- Calm the flames
- Ignite the spark
Product Leadership
- Options thinking
- Status and forecasting
- Progress and priorities
Design Leadership
- Simple codebase architecture
- Reflective codebase architecture
- Risk-driven codebase architecture
- Architectural refactoring
- Published API design
Technical Lead Specialties
- Choose three(?) additional specialty skill sets.

Staff Engineers

Staff Engineers make a difference to the performance of Engineering as a whole. They rove between teams, cross-pollinating information and ideas. They work hands-on with each team, acting as player-coaches, bringing a breadth and depth of expertise that people are happy to learn from.

These skill sets haven’t been defined yet.

Principal Engineers

This level hasn’t been defined yet.

Specialty Skill Sets

Starting at the Senior Software Engineer level, engineers choose specialty skill sets in addition to the foundational skill sets described above. We haven’t defined these skill sets yet, but here are some of the ones we’re considering:

Product
Distributed systems
Databases
Security
Extreme Programming
Developer Automation
Algorithms
Machine Learning
Front-End
iOS
Android

Feedback

Please share your thoughts!

Art of Agile Development in Korean

November 19, 2023

Book cover for the Korean translation of “The Art of Agile Development, Second Edition” by James Shore. The title reads, “[국내도서] 애자일 개발의 기술 2/e”. It’s translated by 김모세 and published by O’Reilly. Other than translated text, the cover is the same as the English edition, showing a water glass containing a goldfish and a small sapling with green leaves.

I’m pleased to announce that the Korean translation of The Art of Agile Development is now available! You can buy it here.

Many thanks to 김모세 for their hard work on this translation.

Art of Agile Development in India and Africa (English)

July 24, 2023

Book cover for the Indian edition of “The Art of Agile Development, Second Edition” by James Shore. It’s the same as the normal edition, showing a water glass containing a goldfish and a small sapling with green leaves, except that the publisher is listed as SPD as well as O’Reilly. There’s also a black badge labelled “Greyscale Edition” that reads, “For Sale in the Indian Subcontinent and Selected Countries Only (refer back cover).”

I’m pleased to announce that there’s a special edition of The Art of Agile Development available in the Indian subcontinent and Africa! (It’s in English.) You can buy it here.

Many thanks to Shroff Publishers & Distributors Pvt. Ltd. (SPD) for making this edition available.

AI Chronicles #7: Configurable Client

July 2, 2023

In this weekly livestream series, Ted M. Young and I build an AI-powered role-playing game using React, Spring Boot, and Nullables. And, of course, plenty of discussion about design, architecture, and effective programming practices.

Watch us live every Monday! For details, see the event page. For more episode recordings, see the episode archive.

In this episode...

We turn to parsing the response returned from the “say” API, which will be the OpenAI response to a message. To do that, we add the ability to configure the nullable HttpClient so it returns predetermined responses from our tests. We discover that the HTTP library's Response object provides a default Content-Type, which we don't want in our tests, and we deal with the window vs. global implementations of fetch().

After we get everything working, we add types to make the TypeScript type checker happy. With that done, we're ready for the next episode, where we'll return to the Spring Boot back-end and implement the “say” API endpoint.
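
For readers who haven't seen the pattern, here's a rough sketch of a nulled client configured with predetermined responses. It is not the code from the episode. `createNull`, `ConfiguredResponses`, `resolveResponse`, and the default response values are all assumptions made for illustration, and the response type is defined locally rather than reused from a library.

```typescript
// Minimal response shape, defined locally rather than borrowed from a library.
interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Responses keyed by full endpoint URL; each may be only partially specified.
type ConfiguredResponses = Record<string, Partial<HttpResponse>>;

// Assumed default for anything left unconfigured. The headers are empty,
// so no Content-Type appears unless a test asks for one.
const DEFAULT_RESPONSE: HttpResponse = {
  status: 503,
  headers: {},
  body: "default nulled response",
};

// Fill in whatever the configuration for this URL leaves out.
function resolveResponse(responses: ConfiguredResponses, url: string): HttpResponse {
  return { ...DEFAULT_RESPONSE, ...responses[url] };
}

type Fetcher = (url: string) => Promise<HttpResponse>;

class HttpClient {
  private readonly fetcher: Fetcher;

  private constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // Hypothetical factory for a client that never touches the network.
  static createNull(responses: ConfiguredResponses = {}): HttpClient {
    return new HttpClient((url) => Promise.resolve(resolveResponse(responses, url)));
  }

  request(url: string): Promise<HttpResponse> {
    return this.fetcher(url);
  }
}
```

A test might call `HttpClient.createNull({ "https://example.test/say": { body: "hi" } })` and then assert on what the code under test does with that body. A production factory wrapping the real fetch() would sit alongside `createNull` and is omitted here.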

Ted spoke at the Kansas City Developer Conference (0:16)
Lack of Java-focused conferences in the USA (1:19)
Conferences in the USA vs. Europe/rest of world (2:10)
Trying to aim talks at the right level for the audience (4:38)
Ted's research to prepare for the AssertJ talk (6:05)
AssertJ assertions for Joda Money (7:23)
Joda Money project vs. JSR-354 Money & Currency (8:45)
Programming language cultures (10:34)
Checked exceptions and API design (12:20)
Language convention vs. enforced rules (14:02)
Why we wrap third-party libraries and objects (17:49)
Primitive Obsession (20:16)
Exploring and learning your tools (22:51)
Learning design from Martin Fowler's "Refactoring" book (23:25)
Small changes, small steps (24:39)
Loss of awareness of design? (25:32)
Reading books in a group (29:12)
Refactorings and their trade-offs (30:29)
James talks about CTO vs. VP Engineering (31:56)
Reviewing where we left off in the code (34:30)
Sidebar: forgetting what you were doing in a project (39:27)
Planning and doing one thing at a time (41:01)
Context-switching in a heavy pull-request environment (41:48)
Feedback loops and eXtreme Programming (42:55)
Testing parsing of responses in the BackEndClient (45:43)
Sidebar: who holds the state? (49:06)
Configuring the answer for the BackEndClient (50:05)
Test-driving HttpClient's default response (54:45)
Where is that text/plain content type coming from? (1:05:18)
Sidebar: differencing in test output and coding in VB, and QB (1:06:10)
Should our stubbed fetch() return content length? (1:09:06)
Configuring HttpClient's response for an endpoint (1:10:48)
Discovered need to specify full endpoint URL, not just path (1:14:58)
Test failed as expected, on to implementation (1:17:37)
Who has fetch()? Window vs. Global vs. globalThis (1:21:26)
Using Optional chaining and nullish coalescing (1:29:23)
Troubleshooting "headers.entries" (1:30:09)
Specifying content-type in configured response (1:36:30)
Generalize to allow partially configured response (1:42:10)
Sidebar on readability of "advanced" syntax in code (1:48:40)
Allowing multiple endpoints to be configured (1:52:13)
Avoiding real-world values in configuration tests (1:58:42)
Spiking some attempts at improving code (1:59:20)
Adding types to make TypeScript type checker happy (2:05:10)
Defining own type often easier than reusing library types (2:14:24)
Back to the BackEndClient failing test (2:16:12)
Refactor test code now that it passes (2:20:36)
Reviewing the test refactor (2:30:45)
BackEndClient is done: updated the plan and integrated (2:31:32)
Next time we'll start with the Spring Boot back-end endpoint (2:33:10)
Review our work (2:34:05)
Downside of sociable vs. isolated tests with mocks (2:35:02)
The Rubber Chicken (2:38:18)

Source code

Visit the episode archive for more.

AI Chronicles #6: Output Tracker

June 18, 2023

Watch us live every Monday! For details, see the event page. For more episode recordings, see the episode archive.

In this episode...

We continue working on our front end. After some conversation about working in small steps, we turn our attention to BackEndClient, our front-end wrapper for communication to the back-end server. We start out by writing a test to define the back-end API, then modify our HttpClient wrapper to track requests. By the end of the episode, we have the back-end requests tested and working.
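
The request-tracking idea can be sketched like this. It's a simplified illustration rather than the code we wrote on stream: `OutputTracker`, `RequestTrackingClient`, and `TrackedRequest` are hypothetical names, and the request method is synchronous and sends nothing, where a real wrapper would be asynchronous and call fetch().

```typescript
// Hypothetical record of one outgoing request.
interface TrackedRequest {
  method: string;
  url: string;
  body?: string;
}

// Collects everything recorded since tracking started.
class OutputTracker<T> {
  private readonly items: T[] = [];

  record(item: T): void {
    this.items.push(item);
  }

  // Return a copy so callers can't alter the recorded history.
  get data(): T[] {
    return this.items.slice();
  }
}

class RequestTrackingClient {
  private readonly trackers: OutputTracker<TrackedRequest>[] = [];

  // Tests call this first, then inspect `tracker.data` afterwards.
  trackRequests(): OutputTracker<TrackedRequest> {
    const tracker = new OutputTracker<TrackedRequest>();
    this.trackers.push(tracker);
    return tracker;
  }

  // Stand-in for a real request: it only notifies the trackers.
  request(request: TrackedRequest): void {
    this.trackers.forEach((tracker) => tracker.record(request));
  }
}
```

A test of something like BackEndClient would start tracking, exercise the code, and then compare `tracker.data` against the requests it expected to see.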

Program Note (0:12)
Multi-Step Refactorings (2:26)
Work in Small Steps (6:31)
Evaluating Complexity (11:11)
Collaborative Development (19:04)
Continuous Improvement (25:24)
Fixing the Typechecker (28:02)
Today’s Plan (31:11)
James Shore’s Housecleaning Tips (33:03)
Build the BackEndClient (35:59)
- Sidebar: Delaying Good Code (38:48)
- End sidebar (41:41)
Make HttpClient Nullable (51:02)
- Sidebar: Tuple (58:48)
- End sidebar (1:00:29)
Stubbing the fetch() Response (1:09:24)
- Sidebar: In-Browser Testing (1:26:37)
Build OutputListener (1:28:00)
Request Tracking (1:55:04)
- Sidebar: Lint Error (2:12:54)
- End sidebar (2:16:50)
Back to the BackEndClient (2:33:51)
Debrief (2:44:23)

Source code

Visit the episode archive for more.

AI Chronicles #5: fetch() Wraps

June 9, 2023

Watch us live every Monday! For details, see the event page. For more episode recordings, see the episode archive.

In this episode...

It’s an eventful episode as we start off with a discussion of event sourcing, event-driven code, event storming, and more. Then we return to working on our fetch() wrapper. We factor our test-based prototype into a real production class, clean up the tests, and add TypeScript types.

Event Sourcing (0:22)
Event-Driven Code (11:11)
Event Storming (23:30)
The Original Sin of Software Scaling (27:23)
Refactoring Events (28:45)
Naming Conventions (32:47)
Java 21 (42:08)
Inappropriate Abstractions (44:25)
Let’s Do Some Coding (53:01)
Design the fetch() Wrapper (56:34)
Factor Out HttpClient (1:17:41)
Add TypeScript Types (1:23:40)
Node/TypeScript Incompatibility (1:39:52)
Clean Up the Tests (1:58:36)
Close SpyServer with Extreme Prejudice (2:05:06)
Back to Cleaning Up Tests (2:11:08)
Debrief (2:32:22)

Source code

Visit the episode archive for more.
