Useful engineering metrics and why velocity is not one of them

Here’s my horoscope for today:

Things should improve for you as the day progresses, Taurus. You shouldn’t depend on something that may not pan out the way you want.

As you can see, it’s useless, just like your team’s velocity metrics and burndown charts.

Velocity metrics are as loathsome as the horoscope because neither provides any insight into why something went wrong or how to fix it. Moreover, if you squint hard enough, both burndown charts and the horoscope will show you whatever you want to see.

Stop and think about it. Besides telling you that “velocity is low,” what else does a burndown chart reveal about your team’s bottlenecks, problems, and inefficiencies? Nothing.

Low velocity, just like retrograde Mercury, can explain anything you want. Because velocity gives people absolutely no insight into the team’s problems, managers can use it to justify whatever unproductive decisions they already had in mind, like adding resources to an inefficient system or telling folks to “work harder” and develop a greater “sense of urgency,” words as vague as today’s horoscope.

To summarise, velocity is a terrible metric because it offers no predictive power and doesn’t help you make decisions.

In this post, I’ll expound on the metrics and visualizations you should use to help you improve your processes and make your team more productive and predictable.

I’ll start by modeling an engineering team as a queueing system. Then, I’ll explain the four core metrics which impact the queueing system’s performance and how they relate to each other.

Next, I’ll turn those metrics into charts to demonstrate how you could monitor your system’s performance over time and spot problematic patterns at a glance.

The third section of this post covers a few other granular metrics and visualizations that offer predictive power and help managers spot inefficiencies.

At the end of this post, there’s also a concise summary for you to share with your team and a short warning to help folks avoid misusing these metrics and visualizations.

The system and its metrics

The best way to understand which metrics represent an engineering team’s performance is to model the team as a queueing system. In this system, tasks come in on one end, and software comes out on the other. The team itself is the processing mechanism in the middle.

In an engineering system, tasks come in on one end and valuable software comes out on the other

To monitor the performance of this system, we must attach metrics to its parts. That way, we’ll understand how each part performs and how they influence one another.

Let’s start by attaching metrics to the left and right sides.

On the left side, where tasks come in, we have the arrival rate, which represents arrivals over time. On the right side, where tasks come out, we have the departure rate (also called throughput), which represents departures over time.

The arrival rate determines how quickly tasks arrive on the left side, and the throughput determines how quickly tasks depart on the right side

Whenever the arrival rate exceeds the system’s departure rate, the number of items in the system (its WIP, or work in progress) increases. Therefore, queues form. When queues form, each task takes increasingly longer to get done.

The larger a system's queue, the longer teams will take to get to the queue's end

The greater the difference between the arrival and departure rates, the more dramatic the rise in WIP will be. Consequently, the rate at which cycle times increase will be greater too.
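To make this dynamic concrete, here is a minimal sketch that simulates a first-in, first-out queue one day at a time. It assumes a constant number of arrivals and departures per day, which no real team has, and `simulate_queue` and its parameters are hypothetical names used only for illustration.

```python
from collections import deque


def simulate_queue(arrivals_per_day, departures_per_day, days):
    """Simulate a first-in, first-out queue with constant daily rates.

    Returns the WIP at the end of each day and the cycle time of every
    finished task, in completion order. A task that arrives and finishes on
    the same day has a cycle time of 1 day.
    """
    queue = deque()  # arrival day of each task still in the system, oldest first
    wip_by_day = []
    cycle_times = []
    for day in range(days):
        # New tasks join the back of the queue at the start of the day.
        queue.extend([day] * arrivals_per_day)
        # The team finishes up to `departures_per_day` tasks, oldest first.
        for _ in range(min(departures_per_day, len(queue))):
            arrived = queue.popleft()
            cycle_times.append(day - arrived + 1)
        wip_by_day.append(len(queue))
    return wip_by_day, cycle_times


# Three tasks arrive per day, but the team only finishes two.
wip, cycle_times = simulate_queue(arrivals_per_day=3, departures_per_day=2, days=30)
print(wip[-1], cycle_times[0], cycle_times[-1])  # 30 1 11

# Matched rates: nothing accumulates.
wip, cycle_times = simulate_queue(arrivals_per_day=2, departures_per_day=2, days=30)
print(wip[-1], max(cycle_times))  # 0 1
```

With just one extra task per day, WIP grows by one task every day, and the last task to finish in the first run spends 11 days in the system instead of 1.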

Another way to visualize this phenomenon is through a cumulative flow diagram. This diagram shows the cumulative number of tasks entering and leaving the system over time.

The cumulative flow diagram is a helpful chart because it reveals an enormous amount of information about the team’s performance at a glance.

In that chart, the slope of the bottom line represents the average number of tasks delivered over time (throughput), while the slope of the top line represents the average number of tasks entering the system over time (the arrival rate).

As the average arrival rate grows relative to throughput, the gap between the top and bottom lines widens more quickly over time. Consequently, the amount of work in progress in the system, represented by the vertical distance between bands, grows faster. In turn, average cycle times, represented by the horizontal distance between bands, also increase more quickly.
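Here is a minimal sketch of how the numbers behind such a chart could be derived from a log of task start and end days. The `(start_day, end_day)` format and the function names are assumptions for illustration; real tools usually record a timestamp per stage. The horizontal distance matches each task's cycle time only when tasks finish in the order they started; otherwise it only approximates per-task cycle times.

```python
def cumulative_flow(tasks, days):
    """Cumulative arrivals and departures at the end of each day.

    `tasks` is a list of (start_day, end_day) pairs; end_day is None for
    tasks that are still in progress.
    """
    arrivals = [sum(1 for start, _ in tasks if start <= day) for day in range(days)]
    departures = [
        sum(1 for _, end in tasks if end is not None and end <= day)
        for day in range(days)
    ]
    return arrivals, departures


def wip_on(arrivals, departures, day):
    """Vertical distance between the top and bottom lines: work in progress."""
    return arrivals[day] - departures[day]


def horizontal_distance(arrivals, departures, n):
    """Days between the n-th arrival and the n-th departure (n starts at 1).

    Returns None if the n-th departure hasn't happened within the chart.
    """

    def first_day_reaching(series):
        return next((day for day, count in enumerate(series) if count >= n), None)

    arrived, departed = first_day_reaching(arrivals), first_day_reaching(departures)
    if arrived is None or departed is None:
        return None
    return departed - arrived


tasks = [(0, 2), (0, 3), (1, 5), (2, None), (3, None)]
arrivals, departures = cumulative_flow(tasks, days=6)
print(arrivals)                                        # [2, 3, 4, 5, 5, 5]
print(departures)                                      # [0, 0, 1, 2, 2, 3]
print(wip_on(arrivals, departures, day=3))             # 3
print(horizontal_distance(arrivals, departures, n=3))  # 4
```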

The chart below shows how these metrics change over time when the average arrival rate exceeds the average completion rate.

When arrival rates are greater than departure rates, WIP increases over time, increasing average cycle times

A manager who wishes to make their team’s cycle times more uniform can try matching the rate at which tasks enter the system to the rate at which they leave.

When the average arrival rate equals the average departure rate, WIP remains constant, and so do cycle times

That way, WIP remains constant, and so do cycle times.

By being aware of the relationship between these four metrics, managers know how their team will behave as the variables change. That way, they know which variables to influence to obtain regular cycle times, making their teams predictable and productive.

Another way to describe this behavior is by using Little’s Law, which establishes a clear relationship between these variables.

\text{Avg. Cycle Time} = \frac{\text{Avg. Work in Progress}}{\text{Avg. Throughput}}

This simple formula summarises the behavior you’ve just seen in the cumulative flow diagram: at a constant throughput, more work in progress means longer average cycle times.
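As a worked example with made-up numbers, the sketch below applies the formula. It assumes a stable system, one where items arrive roughly as fast as they depart over the measurement period, because Little’s Law relates long-run averages. `average_cycle_time` is a hypothetical helper.

```python
def average_cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law: average cycle time = average WIP / average throughput.

    The result uses the same time unit as the throughput (tasks per week
    gives weeks). Only meaningful for a stable system, where arrivals and
    departures roughly balance over the period being measured.
    """
    if avg_throughput <= 0:
        raise ValueError("average throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical team: 12 tasks in progress on average, 3 tasks finished per week.
print(average_cycle_time(avg_wip=12, avg_throughput=3))  # 4.0 (weeks)

# Same throughput, half the WIP: the average cycle time halves too.
print(average_cycle_time(avg_wip=6, avg_throughput=3))   # 2.0 (weeks)
```

Read the other way around, the formula says that reducing WIP at a constant throughput shortens average cycle times.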

Breaking down the system into multiple sub-systems

For an engineer to deliver a task, they don’t simply type away a bunch of code and send it straight to production. Instead, they write some code, have someone review it, deploy the code to a staging environment, validate it, and only then send it to production.

Once again, we can model that process as a queueing system. The difference is that we’re now dealing with a queueing system composed of multiple queues feeding one another.

An engineering system modeled as a system composed of multiple queues feeding one another

The advantage of modeling our engineering system as a multi-queue system is that we can still use the same metrics to analyze its behavior. Furthermore, we can still use cumulative flow diagrams to monitor its performance.

Let’s go ahead and plot a cumulative flow diagram for a multi-queue system. This time, we’ll break down the “in progress” band into multiple other bands representing the various queues, which are the different parts of our process.

A cumulative flow diagram for a multi-queue engineering system

Despite having broken down the cumulative flow diagram’s bands, the same principles apply. This time, however, we have much more granular information about how the different parts of the system behave.

If we want to know the number of items that need reviews, we can look at the vertical distance between the “development” and “review” bands, for example. Similarly, we can look at the horizontal distance between those bands to determine the approximate average time items take from “development” to “review.”

The cumulative flow diagram's properties remain the same in spite of us having broken it down into multiple bands
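The same arithmetic works band by band. This sketch assumes each line on the diagram counts the items that have ever entered a stage, so the items currently in a stage are the difference between its line and the next stage's line. The stage names and counts are hypothetical.

```python
# Hypothetical stage names, listed in process order.
STAGES = ["development", "review", "staging", "done"]


def stage_wip(entered):
    """Number of items currently in each stage on a given day.

    `entered[stage]` is the cumulative number of items that have ever entered
    `stage`, i.e. the height of that stage's line on the cumulative flow diagram.
    An item counts as being in a stage until it enters the next one.
    """
    return {
        # Vertical distance between two adjacent lines.
        current: entered[current] - entered[following]
        for current, following in zip(STAGES, STAGES[1:])
    }


snapshot = {"development": 40, "review": 31, "staging": 22, "done": 20}
print(stage_wip(snapshot))  # {'development': 9, 'review': 9, 'staging': 2}
```

The per-stage numbers add up to the total WIP, which is the vertical distance between the topmost and bottommost lines.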

In addition to the visual representation of metrics remaining the same, the dynamics between them persist.

Assume, for example, that the rate at which tasks enter the review stage is greater than the rate at which they are deployed to a staging environment. In that case, the diagram’s red band will bulge, revealing an increase in work-in-progress and, consequently, in average cycle time.

When tasks become ready for review more quickly than they are reviewed, WIP increases, and, consequently, cycle times elongate
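A minimal simulation shows the same bulge. It assumes three hypothetical stages (development, review, staging) with fixed daily capacities and a steady five new tasks per day; `simulate_pipeline` is an illustrative name, not a real tool.

```python
def simulate_pipeline(capacities, arrivals_per_day, days):
    """Simulate work flowing through stages in order, one day at a time.

    `capacities[i]` is how many tasks stage i can finish per day. Tasks that
    leave the last stage leave the system. Returns, for each day, how many
    tasks sit in each stage at the end of that day.
    """
    in_stage = [0] * len(capacities)
    history = []
    for _ in range(days):
        in_stage[0] += arrivals_per_day
        # Process stages from last to first so a task advances at most one stage per day.
        for i in reversed(range(len(capacities))):
            finished = min(capacities[i], in_stage[i])
            in_stage[i] -= finished
            if i + 1 < len(capacities):
                in_stage[i + 1] += finished
        history.append(list(in_stage))
    return history


# Development finishes 5 tasks a day, but review only manages 3.
print(simulate_pipeline([5, 3, 5], arrivals_per_day=5, days=4))
# [[0, 5, 0], [0, 7, 3], [0, 9, 3], [0, 11, 3]]  <- the review band keeps bulging

# Rate-matched stages keep WIP constant.
print(simulate_pipeline([5, 5, 5], arrivals_per_day=5, days=4)[-1])  # [0, 5, 5]
```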

This dynamic between metrics once again reveals how important it is to match arrival and departure rates. This practice applies both to the system as a whole and its different parts.

This rate-matching is the precise reason why Kanban worked so well for Japanese manufacturers during their “economic miracle.”

For example, a car manufacturer that uses Kanban would only send parts to the “painting” stage once the folks in the painting stage send back a message saying, “we have capacity to paint more parts.”

By sending these signals from the end of the queueing system to its beginning, the manufacturers could rate-match the different parts of their process, increasing their predictability and reducing WIP, which is particularly damaging when you have hundreds of pieces sitting on the factory floor.
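The pull signal itself fits in a few lines. This is a simplified model and an assumption, not a description of how any particular manufacturer implemented Kanban: each stage has a WIP limit and only pulls work from upstream while it is below that limit. `KanbanStage` is a hypothetical class.

```python
from collections import deque


class KanbanStage:
    """A stage that only accepts work while it is below its WIP limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = deque()

    def has_capacity(self):
        # The "we have capacity" signal sent back upstream.
        return len(self.items) < self.wip_limit

    def pull_from(self, upstream):
        """Move items from the upstream queue until the WIP limit is reached."""
        pulled = 0
        while self.has_capacity() and upstream:
            self.items.append(upstream.popleft())
            pulled += 1
        return pulled

    def finish(self, count):
        """Complete up to `count` items, oldest first, and return them."""
        return [self.items.popleft() for _ in range(min(count, len(self.items)))]


waiting_parts = deque(f"part-{i}" for i in range(10))
painting = KanbanStage("painting", wip_limit=3)
print(painting.pull_from(waiting_parts))  # 3: painting is now full
print(len(waiting_parts))                 # 7: the rest wait upstream
print(painting.finish(2))                 # ['part-0', 'part-1']
print(painting.pull_from(waiting_parts))  # 2: only the freed slots are refilled
```

Because work only moves when the downstream stage signals capacity, the painting stage can never accumulate more than three parts, no matter how fast parts pile up upstream.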

The same idea underpins Goldratt’s Theory of Constraints, a management paradigm focused on identifying and iteratively fixing bottlenecks so that you’re constantly adjusting segments’ departure or arrival rates to match one another.

In the software industry, we sometimes have similar stationary bottlenecks. These may occur, for example, when a company relies on manual testing instead of automated testing. In that case, we can identify the bottleneck and fix it to increase our testing segment’s departure rate.
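Finding a stationary constraint like this one boils down to finding the slowest stage, because in a serial process the whole system can’t deliver faster than its slowest stage. A minimal sketch, with hypothetical stage names and weekly capacities:

```python
def find_constraint(capacities):
    """Return the bottleneck stage and the throughput it caps the system at.

    `capacities` maps each stage of a serial process to the number of items
    it can finish per week. Work can't flow out faster than the slowest stage
    processes it, so that stage bounds the whole system's throughput.
    """
    bottleneck = min(capacities, key=capacities.get)
    return bottleneck, capacities[bottleneck]


# Hypothetical numbers; "testing" is slow because it is done manually.
capacities = {"development": 12, "review": 10, "testing": 4, "deploy": 20}
print(find_constraint(capacities))  # ('testing', 4)

# Hypothetical fix: automating tests raises that stage's capacity.
capacities["testing"] = 15
print(find_constraint(capacities))  # ('review', 10)
```

Once the constraint is fixed, another stage becomes the bottleneck, which is why the Theory of Constraints treats this as an iterative process.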

Other times, our bottlenecks are temporary because we’re not reproducing the same work repeatedly. Instead, we’re creating the recipe for new workpieces, which implies variability. For that reason, besides knowing how to code, engineers need to understand how to test and operate their software. That way, we can dynamically allocate resources to fill bottlenecks at different process stages.
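A greedy rule is one simple way to sketch that dynamic allocation: each day, send every cross-trained engineer to whichever stage currently has the most work waiting. The one-item-per-engineer-per-day assumption and the function name are hypothetical, and a real team would weigh far more than queue length.

```python
def assign_flexible_engineers(queue_sizes, flexible_engineers):
    """Greedily send each cross-trained engineer to the stage with the most waiting work.

    `queue_sizes` maps stage name to the number of items waiting there.
    Assumes (hypothetically) that each engineer clears one item per day.
    Ties go to the stage listed first. Returns engineers assigned per stage.
    """
    remaining = dict(queue_sizes)
    assignment = {stage: 0 for stage in queue_sizes}
    for _ in range(flexible_engineers):
        busiest = max(remaining, key=remaining.get)
        if remaining[busiest] == 0:
            break  # nothing is waiting anywhere
        assignment[busiest] += 1
        remaining[busiest] -= 1
    return assignment


print(assign_flexible_engineers({"review": 5, "testing": 3, "deploy": 0}, 4))
# {'review': 3, 'testing': 1, 'deploy': 0}
```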

Some teams may not be aware of those dynamics, but, as humans, we are good at naturally finding ways to improve our processes. That’s why we came up with automated testing practices and the principles behind a “DevOps” culture.

In any case, when managers are aware of the principles behind these metrics, they can more easily see where the bottlenecks are and come up with creative solutions to improve their processes rather than simply adopting automated tests or instilling a “DevOps culture,” which may not apply to all cases.

Putting it all together

An engineering team can be modeled as a queueing system. In such a system, tasks come in on one end, and software comes out on the other.

There are four metrics you can use to measure the performance of such a queueing system:

Arrival rate — the rate at which tasks arrive in the system
Work in progress — the number of items in progress at any given time
Departure rate or throughput — the rate at which tasks leave the system
Cycle time — the time it takes for tasks to leave the system

The amount of work-in-progress will increase whenever the system’s arrival rate exceeds the system’s departure rate. This increase in WIP represents the growth of a task queue. When queues form, cycle times elongate because it takes increasingly longer to get to the end of the queue.

One excellent way to visualize a system’s performance over time is to use a cumulative flow diagram, which plots the cumulative number of tasks that have entered and left the system over time.

In a cumulative flow diagram, the top slope represents the arrival rate, and the bottom slope represents the departure rate (or throughput). The vertical distance between bands represents the amount of work in progress, and the horizontal distance between them represents the approximate average cycle time.

To analyze the system’s performance in more detail, you can break down your queueing system into multiple connected queueing sub-systems, each feeding the next. You can also perform such a breakdown in your cumulative flow diagrams by breaking down the “in progress” band into multiple bands representing the different stages of your process.

When a sub-system’s throughput exceeds the next sub-system’s departure rate, there will be a mismatch between arrivals and departures in the downstream sub-system. Therefore, queues will form, and cycle times will elongate downstream. That’s why it’s essential to rate-match your processes.

To rate-match processes more easily, you could cross-train engineers to work on different parts of your process. That way, you can dynamically direct efforts towards bottlenecks, which tend to move around in stochastic processes like software development.

Such a mismatch in arrival and departure rates between sub-systems will manifest itself in a cumulative flow diagram as the bulging of a particular band. That bulging band represents the process that can’t keep up with the rate at which the upstream sub-system sends it work.

Besides bulging bands, there are two other problems that a cumulative flow diagram may help you spot:

Large batch transfers or broken processes — when lines flatten
Disappearing bands — when a sub-system starves
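Both patterns can also be flagged in the raw numbers. This sketch assumes you have each line’s cumulative count and each stage’s WIP per day; the `min_days` threshold and the function names are hypothetical, and a flag is only a prompt to investigate, not a diagnosis.

```python
def flat_stretches(cumulative, min_days):
    """Find stretches of at least `min_days` days in which a cumulative line didn't rise.

    Returns (first_day, last_day) pairs. A flat line means nothing moved past
    that point of the process, which may indicate a large batch transfer or a
    broken process.
    """
    stretches = []
    run_start = 0
    for day in range(1, len(cumulative) + 1):
        # A run ends at the last day or as soon as the line rises.
        if day == len(cumulative) or cumulative[day] != cumulative[run_start]:
            if day - 1 - run_start >= min_days:
                stretches.append((run_start, day - 1))
            run_start = day
    return stretches


def starved_days(stage_wip):
    """Days on which a stage had no work at all, i.e. its band disappears."""
    return [day for day, count in enumerate(stage_wip) if count == 0]


done_line = [0, 2, 2, 2, 2, 9, 10]  # cumulative items that reached "done"
print(flat_stretches(done_line, min_days=3))  # [(1, 4)]: no deliveries on days 2-4, then a batch of 7

review_wip = [3, 1, 0, 0, 2]
print(starved_days(review_wip))  # [2, 3]: the review band disappears on days 2 and 3
```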

Just because a cumulative flow diagram seems okay, that doesn’t mean your team is performing well.

The cumulative flow diagram shows quantities of items in different parts of the process but not exactly what those items are. Therefore, your arrival rate may match your departure rate simply because some items are being expedited while others are left to rot and never get done. That phenomenon is known as “flow debt.”

You can detect flow debt by plotting cycle times on a scatterplot and looking for outliers. Alternatively, you could plot a cycle-time histogram and look for a bimodal distribution.
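A rough numeric version of both checks might look like the sketch below. The data is made up, the function names are hypothetical, and the “three times the median” cut-off is an arbitrary choice for illustration; the median is used because, unlike the mean, a few rotting tasks barely move it.

```python
from collections import Counter
from statistics import median


def cycle_time_outliers(cycle_times, factor=3.0):
    """Return task IDs whose cycle time exceeds `factor` times the median.

    `cycle_times` maps task ID to cycle time in days.
    """
    threshold = factor * median(cycle_times.values())
    return sorted(task for task, days in cycle_times.items() if days > threshold)


def histogram(cycle_times, bucket_days=5):
    """Count tasks per cycle-time bucket; two separate humps hint at a bimodal distribution."""
    buckets = Counter(days // bucket_days * bucket_days for days in cycle_times.values())
    return dict(sorted(buckets.items()))


cycle_times = {"A": 2, "B": 3, "C": 2, "D": 4, "E": 3, "F": 21, "G": 2, "H": 25}
print(cycle_time_outliers(cycle_times))        # ['F', 'H']
print(histogram(cycle_times, bucket_days=10))  # {0: 6, 20: 2}: two separate humps
```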

Notes on good metrics and a few words of caution

A good metric has three characteristics:

It’s not a target — Instead of determining an end state, it helps determine whether you’re going in the right direction.
It’s actionable — It helps managers make decisions and intervene in a system to generate improvements.
It has a clear relationship to other metrics that share the first two characteristics — You know how other metrics will change as you pull different levers.

I consider the metrics I advocate in this post to be good metrics because they have all three characteristics. Velocity, on the other hand, has none of them.

Despite considering these metrics “good,” I must urge managers not to:

Turn metrics into targets — As Goodhart’s law states, when you turn metrics into targets, they cease to be good metrics. That’s because people will game the system to improve the metrics at all costs instead of actually fulfilling the organization’s purpose.
Weaponize metrics — As Deming says, “whenever there’s fear, you get the wrong figures”.
Use metrics as a substitute for talking to your team and performing qualitative analysis — Metrics must serve as an “alarm.” They ring a bell when something goes wrong, but it’s your job to determine how to intervene.

Managers must also be aware that managing exclusively by metrics (management by results) is like “driving a car looking through the rearview mirror”, as W. E. Deming would say.

The past will not necessarily resemble the future. Yet, spotting patterns in past behavior can help you adapt to the future and avoid making the same mistakes.

Wanna talk?

I currently offer mentorship and consulting packages for individuals and startups wishing to ship more software in less time and with less stress. If you’re interested in improving your processes and pipelines, book a free introduction call here. I’d love to help you solve any problems you might be facing or answer any questions you might have.

You can also send me a tweet or DM @thewizardlucas or an email at lucas@lucasfcosta.com.
