In 2022, Andrew Sales (Chief Methodologist and SAFe Fellow at Scaled Agile, Inc.) spoke at the Tecnifor SAFe Day. You can find the talk here. His subject was “accelerating flow of customer value”. I want to reflect on this talk and share how the Org Topologies perspective can help improve the flow of customer value in SAFe systemically.
This post invites you to improve SAFe beyond the solutions proposed in this talk.
There is no blame.
In his introduction, Andrew says:
“The question arises: if the (SAFe) framework has flow built-in and the teams and the ARTs are cross-functional, why do we sometimes struggle so much to deliver value seamlessly to our customers?”
He proposes finding the answers in 8 flow fundamentals that will significantly change the work we do and SAFe in the future. Let’s have a look at them one by one.
#1 Visualize and limit WIP
The value-delivery capability of a team reduces as the Work in Progress increases.
Proposed remedial actions:
Create awareness (in the team) by making the WIP visible, and setting the WIP limit.
Reflection:
It’s an excellent idea to create awareness of the problem by making WIP visible. This will engage people in embracing improvement proposals. Awareness and knowledge are key stepping stones toward real change.
The proposed solution of limiting WIP will allow more flow in the team’s work. There will be immediate relief and improvement inside the team. The number of outputs over time will likely rise for this team. Limiting WIP does not move the team's position on the Org Topologies map. There has been no increase in the scope of work, and the scope of capabilities remains the same.
Unfortunately, this solution will cause new problems later. How does WIP-limiting one team affect the rest of the development system? Will this solution improve the ART, Solution, Portfolio, or Business flow? Chances are 100% it will not.
Russell L. Ackoff, pioneer in systems thinking, says:
“the performance of a system does not depend on how the parts perform taken separately, but on how they perform together as a whole”.
Think about the dynamics: One team will process more items than before (but not more than their WIP limit). However, the system demands more work than they can handle (the reason for the WIP limit). By setting a WIP limit at the team level, we hide the work from the team. The work is still there and will pile up in a queue. What happens with the flow of work toward other teams in the ART that depend on the delivery of our WIP-limited team? They will probably be idle because the work flows more slowly through the ART. Increasing flow at the team level has a cascading effect that slows down the planned delivery at higher levels (ART, Solution, etc.). What happens if we slow down one step of an assembly line?
To optimize flow, we must think beyond the team limit and consider at least the ART level. Setting a WIP limit on the ART backlog will show us the actual system performance at a level where customer value is delivered. This action will surface flow problems at the solution level and above (where the ARTs need to be merged). These are better problems to have. We are no longer looking at the work at the team input level but at a level that binds a group of teams. Moving from the team to the ART level to find solutions is a systemic improvement corresponding to a vertical move on the Org Topologies map.
But will this vertical move solve the team's problem of being swamped with work? Yes, it will. There will be less work in flight in the ART, so the system will be less stressed at the team level, too. But some teams will still be swamped at some point! That’s why we need to monitor the teams for bottlenecks.
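The link between work in flight and waiting time in this section can be illustrated with Little's Law, a standard queueing result that the talk and this post do not name explicitly; it is brought in here as an assumption. It states that, for a stable system observed over a long period, average lead time equals average WIP divided by average throughput. The sketch below applies it at the ART level; the throughput figure and the function name are hypothetical.

```python
def average_lead_time(wip_items: float, throughput_per_week: float) -> float:
    """Little's Law: average lead time = average WIP / average throughput.

    Only holds for long-run averages of a reasonably stable system.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_week


# Hypothetical ART that finishes 5 features per week on average.
THROUGHPUT_PER_WEEK = 5.0

for wip in (10, 20, 40):
    weeks = average_lead_time(wip, THROUGHPUT_PER_WEEK)
    print(f"WIP of {wip} features -> average lead time of {weeks:.1f} weeks")
```

With throughput unchanged, doubling the work in flight on the ART backlog doubles the average time a feature spends in the system, which is why the limit is more informative at the level where customer value is delivered.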
#2 Address Bottlenecks
The team has a problem with shared knowledge, collaboration, or discipline. Or there is a hard dependency on external resources (including missing customer feedback).
Proposed remedial actions:
Increase capacity at the bottleneck or replan the work.
Reflection:
Multiple approaches exist to increase the capacity at the bottleneck. The fastest way is to add a person to the team with the missing skill. This increases the bottleneck's capacity, but does it remove the bottleneck? From a team perspective, it does. Is that enough? Depending on how often we need the skill, we might want more teams to have it. This will give us more flexibility at the ART level because more teams can pull that work into their team.
Another approach is learning the new skill through pairing and mobbing. People with scarce skills work in teams that do not have them and transfer their knowledge. This takes time and will reduce the velocity of one or more teams.
These two solutions increase a team's capabilities. They are systemic and can be represented by a move to the right on the Org Topologies map.
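To see why adding capacity at one bottleneck does not automatically remove bottlenecks from the system, here is a minimal sketch. It assumes a simplified model that is not part of the talk: work passes through a chain of steps, and the slowest step caps the throughput of the whole chain. The step names and capacities are invented for illustration.

```python
def system_throughput(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the throughput it imposes on the chain."""
    bottleneck = min(stage_capacity, key=lambda stage: stage_capacity[stage])
    return bottleneck, stage_capacity[bottleneck]


# Hypothetical capacities, in items per iteration.
stages = {"analysis": 12.0, "build": 10.0, "security review": 4.0, "release": 9.0}
print(system_throughput(stages))

# Add people with the scarce skill: the old bottleneck disappears,
# but the next-slowest step now limits the system.
stages["security review"] = 11.0
print(system_throughput(stages))
```

In this model the gain from the extra capacity is capped by the next constraint, which supports spreading a frequently needed skill across more teams rather than fixing a single team.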
Replanning the work is deferring the problem. This does not solve the single-skill problem but reinforces it. The problem is guaranteed to recur in future iterations because the availability of the skill is unchanged. Another aspect is the delivery of customer value. We assume work has been prioritized on value in a queue (backlog). Replanning means teams will work on less valuable items because of the unavailability of an internal resource. We must ask ourselves if that’s the best option from the customer’s perspective.
Another effect of spreading the work in time and resequencing it is that it disrupts the flow at the ART level. It will distribute the work packages more asynchronously across the teams. There will be more handoffs and rework, causing delays. If you want to learn how we can handle that problem, continue to section #3: minimize hand-offs and dependencies.
#3 Minimize hand-offs and dependencies
There will always be handoffs and dependencies when I have to wait for another person to finish their work before starting.
Proposed remedial actions:
Visualize the dependencies discovered during PI planning to create awareness. Reorganize around value to minimize handoffs and dependencies. Reorganize the ART and teams to reduce handoffs. Redefine the system design, change the way the work is split between the teams, or redesign the teams themselves.
Reflection:
Not all dependencies are a terrible thing to have. We want to avoid dependencies created by deep specialization. And we want to prevent asynchronous dependencies. Instead, we want synchronous dependencies, as they are opportunities for collaboration.
The proposals for solving the dependency problems make sense. There are two ways to go about this: moving left/down on the Org Topologies map or right/up. Going left/down is choosing the path of specialization. Applying Team Topologies™ will help you achieve this by creating groups of teams dedicated to specific work. Accompanying this with strategies to reduce the communication between these groups will give more control over the dependencies.
The right/up move is more challenging and less intuitive. It requires improving the way we are organized around value.
Given that we know which solutions or systems we deliver to our customers, we need to redefine our ARTs to optimize them for the least number of unwanted dependencies. Org Topologies proposes redesigning the system so that all teams can jointly deliver value at the ART level. Inside the ART, we want all those teams that need each other almost all the time to provide value. We must ensure these teams are as cross-functional and cross-component as possible. The magic happens when we let them pull work directly from the ART backlog as one big team of teams and remove the separate team backlogs. Since all teams will work on the same problem simultaneously, they must collaborate and coordinate the work.
This might sound like an impossible move toward chaos, but rest assured, many companies have tried it before you. It makes sense to support this approach with the practices from section #1: visualize the work (at the ART level) and limit WIP (on the ART backlog, to create focus and collaboration). Creating a synchronous team of teams inside an ART requires applying various practices we call “Elevating Structures”. Note that the interface of the outside world with the ART remains unchanged.
When teams collaborate to complete ART items during the sprint, we create a fast feedback loop between them.
#4 Get fast feedback
We have technical feedback (building the thing right) and customer feedback (building the right thing). Broken feedback loops will disrupt the system's flow and create rework and technical debt.
Proposed remedial actions:
Investigate which (kind of) feedback is missing. Shift the feedback loop left and practice CI/CD.
Reflection:
Technical feedback can be improved by encouraging close collaboration between members of different teams. This reduces the chances of building something wrong. Another way of obtaining this is using communities of practice, where (technical) approaches can be shared across teams and ARTs. A third way we encourage improving technical feedback loops is automation (AI, test automation, CI, CD, etc.). Improving a team's technical capabilities is a move to the right on the Org Topologies map and closes the capability gap.
We can also simplify the SAFe system implementation and see if it has design elements that slow down feedback loops. Such elements might include handoffs (specialization or authorization), queues (describing the same work at different levels), groups of teams not working in the same cadence, etc.
The customer is not interested in inspecting at the team level. There are way too many teams, and the output delivered by a team is not what the customer wants. The customer wants integrated value supplied at the ART level or higher. This will make a visit to a Review time well spent.
A SAFe system tries to deliver value every three months. This gives you four windows per year for inspection. That’s not a lot. We encourage you to increase the inspection rate to prevent the incurred costs of not building the right thing. The teams must integrate their work more frequently since the customer wants to review at the ART level or higher.
Closing the (value) gap between the individual teams and the customer (with frequent/continuous reviews at the ART or Solution level) is a vertical move on the Org Topologies map. Delivering more frequently becomes more manageable if we work on smaller value items. To learn more, see the next section #5, on working in small batches.
#5 Work in small batches
Working in large batches has undesired effects such as feedback delays, rework, and increased variability in predictability.
Proposed remedial actions:
Try to match the size of the work realistically to the cadence (do the stories fit in the iteration, and the features in the PIs?). Can we process smaller batches? Do we need different kinds of batches (release batches, integration batches, customer feedback batches)?
We should also change our processes (planning and execution) if needed to allow for the processing of smaller pieces of work. Prioritize enabler work to improve automation.
Reflection:
Most of us will agree that smaller batches are better. However, an important aspect should not be forgotten: How do we make smaller batches? Andrew proposes creating different delivery types: release, integration, and customer batches. This will slow down the customer feedback loop (see section #4) because we decrease the number of deliveries to the customer. The same goes for prioritizing enabler work, but this choice can be justified because it’s slowing down to accelerate.
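The feedback delay that large batches cause can be made concrete with a little arithmetic. The sketch below assumes a deliberately simple model that is not taken from the talk: items are finished one after another at a steady pace, and nothing reaches the customer until the whole batch is done. The function name and the pace of one item per day are hypothetical.

```python
def average_feedback_delay(batch_size: int, days_per_item: float = 1.0) -> float:
    """Average number of days a finished item waits before its batch is delivered."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Item i is finished after i * days_per_item days,
    # but is only delivered when the last item of the batch is finished.
    waits = [(batch_size - i) * days_per_item for i in range(1, batch_size + 1)]
    return sum(waits) / batch_size


for size in (1, 5, 20):
    delay = average_feedback_delay(size)
    print(f"batch of {size} items -> average wait of {delay:.1f} days")
```

Under these assumptions the average wait grows roughly linearly with the batch size, so each extra item in a batch delays feedback on everything already finished.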
Org Topologies proposes making a vertical move to solve the batching problem. This move implies that we organize around value delivery by making the teams work on issues at the customer level. When teams can work on a Business Area Problem, i.e., a customer journey (or possibly an epic in your terminology), we can slice the work vertically without hindering customer feedback speed. This means the teams still deliver value end to end, but not as broadly as we would like. To match the amount of work with the available capacity, consider vertical slicing using personas or delivering MVPs that can be incrementally enriched.
Moving up the map by broadening the scope of work offers other advantages such as limiting the queue lengths.
#6 Manage queue lengths
Queues represent committed work, and a queue represents wait time for the customer.
Proposed remedial actions:
To keep focus, we need to work hard to never commit beyond the current PI. Too much commitment, in combination with long queues, can cause us to lose our ability to be agile. In other words, we need to be careful not to plan too much and stay focused on what is promised. Reduce informal work by ensuring teams only work on work that is on their backlogs.
Reflection:
SAFe tries to create a predictable software delivery process by making teams commit to their planning. This is an understandable need from the perspective of an enterprise where hundreds of teams contribute. Unfortunately, creating predictable planning is impossible. That’s because software development is complex, i.e., too many unknowns exist. Advising the teams to be conservative in committing to planning to allow for Agility is the exact opposite of what Agile is trying to achieve. I was saddened to learn that SAFe does not encourage people to go that extra mile but promotes staying in the “predictable, secure world of committed work” to prevent getting on the “slippery slope of forecasting.”
Reducing queue length is sensible because shorter queues improve agility. How does that work? The shorter the queue, the closer we are to “the now,” and the faster we can respond to an opportunity or request. If the queues of teams are filled for the coming three months, and something important pops up, three months is our fastest response time. And this seems to be good enough for most enterprises. In Org Topologies, this refers to the vertical axis, describing the switching cost: How much effort (money) does it cost to unwind current work and spin up new work?
When we approach the challenge of managing queue lengths systemically, we think not only of limiting the size of (team) queues but also of limiting the number of queues in the system. Backlogs are queues. How many Backlogs do you see? They all need to be maintained and managed. Can we remove backlogs? What will be the effect of having fewer backlogs on the speed of our system? Having fewer backlogs means fewer people need to align and manage their worklists. There will be fewer handoffs, hence more speed.
And which backlogs can be removed? If we can have teams collaborate in the same cadence on the same list of work, we will create opportunities for collaboration (see section #3, minimizing handoffs and dependencies). In other words, teams would all work from the ART Backlog, and each team would create a Sprint backlog directly from that.
I have observed that most Team Product Owners are unwilling to let go of their position. This is stopping us from making a vertical move to become a team of teams. These valuable people are vital to the system, but they are wasted on team queue management. They can deliver more effective value inside the team as requirement engineers or subject matter experts. How much prioritization can they decide on anyway?
As proposed in section #3, a team of teams coordinates the work by themselves. This concept can be viewed as a revolution or as not so different from how we (informally) already do it. I was surprised to see how resourceful people are to make SAFe work.
#7 Optimize time in the zone
Being in the zone means focusing on the complex work we do. We want to reduce context switching.
Proposed remedial actions:
Optimize meetings and events and question their efficiency. Keep WIP low to reduce context switching. Use collaboration patterns like pairing and mobbing.
Reflection:
Similar to what was proposed in the previous section #6, we should not only look at the efficiency of meetings and events but also question their existence. Why is this meeting needed? What happens if we stop having this meeting? In which (Scrum) event does the subject of this meeting belong? The large number of roles and levels in the enterprise model requires many meetings to keep everybody current. Scattered information and split responsibilities create a need for meetings as well. We need to address these root causes to allow the workers to stay in their zone.
The assumption that lowering the WIP will reduce context switching implies that a team backlog contains many unrelated pieces of work. Moving the team upward on the Org Topologies map increases the scope of work (from task to feature or from feature to business problem). This will give them a consistent context to work in. Task switching inside the same context is inevitable and mostly not experienced as disturbing.
Mobbing and pairing should be the default modus operandi, efficiently removing bottlenecks (section #2), minimizing handoffs and dependencies (section #3), and providing fast feedback (section #4).
Another great way of improving time in the zone is to eliminate waste caused by legacy policies and practices.
#8 Remediate legacy policies and practices
Old patterns and ways of working can hinder the proper functioning of the SAFe implementation.
Proposed remedial actions:
Identify old patterns and practices and address them.
Reflection:
One way to address them is to teach people who adhere to these old habits systemic thinking using the Org Topologies map. The map will help clarify the effect of any existing or new pattern, habit, and practice. It will allow you to assess if they are helping to move in “the right direction.” What that direction is depends on the context, but in most organizations, it is a movement toward business agility. Our visual language will help you engage in a deeper dialogue of systemic organizational design rather than having a discussion on what SAFe does or does not prescribe.
© 2024, Roland Flemm and Alexey Krivitsky