A case against central platform teams
This article was originally published on my blog: https://kislayverma.com/organizations/a-case-against-platform-teams/
Most technology companies above a certain size start thinking about creating an internal platform team to build and manage systems that are used by multiple teams or products. This is a very high-leverage team because it can benefit many products at once and supercharge the organization. However, today I want to lay out some things that do not work well with internal platform teams, at least in the versions I have encountered.
TL;DR: It is better to operate multiple platform teams, each specializing in its own techno-business domain, than to operate a single platform team. Platform thinking is not about reuse; it is about facilitating evolution, and fixating on reuse destroys that opportunity. Instil platform thinking in all teams and allow self-reliant domains and platforms to emerge organically instead of forcing the issue up front.
What does this team do?
Having a separate platform team often means that all horizontal concerns start getting pushed to it. Eventually, this team cannot identify its customers and simply ends up supporting many useful but disjoint systems. It is difficult to set goals and a north star for such a team because it is decoupled from the end user "by definition". Its only purpose becomes maximizing the reuse of systems within the organization, without regard to what those systems are for or the expertise needed to run them.
In worse situations, other teams have no qualms about building reusable components. But when it comes to supporting that reuse in production and honoring SLOs and SLAs with other teams, people immediately start looking for a platform team to dump the operations on. The result is an operations-laden central team forever firefighting because of its poor understanding of what it owns.
As the number of shared use cases in a company grows, the platform team becomes a repository of unconnected components whose business purpose it is disconnected from. Eventually we end up with a "team" whose members have no idea what the others are working on, as each member ends up specializing in one part or another. We get technical specialists but not subject matter experts in the topics that most impact the business. If the business gets into trouble, the platform team and its work are often among the first on the chopping block, because it is very difficult to explain to the business how this collection of quasi-experts will help it make money.
The developer gold rush
There is something fundamentally unsound about the popular two-dimensional view of an organization's technical stack. It biases us toward the idea that things at the bottom are more foundational, or more complex. It is certainly true that lower layers support more "scale" than upper layers. All of this causes developers to rush to join the internal platform team as soon as it is incubated. The work is seen as more prestigious, or in some way more central to the organization. In truth, at best the work is more technical in nature, which spares developers the messiness of real-world, customer-facing products.
This represents an organizational challenge in multiple ways. One is simply managing who gets to do what and which roles are considered cool (why are non-platform roles considered "business features" rather than "awesome engineering"?). The other is managing expectations in the team that eventually does get formed. Creating the team is easy, but keeping it business- and customer-focused is harder than expected. So many platform teams are mired in a strange mix of technical arrogance and customer detachment, making them far less powerful than they could be.
Other teams aren’t platform teams?
If there is a platform team, does that mean other teams aren't platform teams? With a catch-all team in place, other teams abandon platform thinking and start thinking in product silos.
If platformization has value, then it should be the mindset and strategy of all teams, not the domain of a single team. Since all business problems are made up of domain and organizational context, it stands to reason that all teams should be producing and operating platformized components. These platforms may be built on top of others owned by other teams. The purpose of a platform or a generic product is not just reuse, but also to model a domain boundary where organizational expertise is centralized. The platform team cannot keep pulling in horizontal components wherever they arise, because then it would have to be expert in every aspect of the business. The typical definition of a platform team that owns a bottom tier of reusable things is simply not compatible with the way organizations work.
And that leads us to…
Lines of Ownership
The tech stack of a company can be visualized with higher-order system abstractions at the top and lower ones at the bottom. This means that the skills needed to operate the architecture can vary substantially as you go down the stack, which gives credence to the idea that the lower layers are somehow "different" from the upper layers and should be managed differently.
When we look at the technical stack, how should the lines of ownership run? Should they run horizontally, along components of similar level of abstraction, or should they run vertically, grouping systems generating end-to-end business value?
If you are running an autonomous, cross-functional team (and many, many companies claim to), does the depth of the stack matter? The very premise of such a team is that it can work at all levels of the stack, in loose alignment with other teams at each level. In such a setup, the idea of a platform team with a fixed horizontal charter is meaningless. Every team creates the layers of the platform that it requires and shares them with others as needed. This keeps alignment low-overhead, because a component isn't created only for reuse; it is created first for use and then gets reused as the need arises.
This kind of ownership also solves the problem of orphaned components that everyone critically depends on but no single team maintains. Open sourcing is a great idea for writing code but a terrible idea for operations. Software MUST have an operational owner: one team that is responsible for ensuring that it runs as it is supposed to and meets the benchmarks it is supposed to. Autonomous teams operate what they build, and if we can teach them to build platforms, then we don't need to explicitly create platform teams.
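One way to make the "every component has an operational owner" rule checkable is a service catalog that records an owning team and the consuming teams for each component. The following is a minimal sketch, assuming a hypothetical in-house catalog; the `Component` fields and the `orphaned` helper are illustrative names, not part of any real tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    name: str
    owner: Optional[str]                                # hypothetical: the one team that operates it; None = no owner
    consumers: List[str] = field(default_factory=list)  # hypothetical: teams that depend on it


def orphaned(components: List[Component]) -> List[str]:
    """Return names of components that other teams depend on but no team operates."""
    return [c.name for c in components if c.owner is None and c.consumers]


# Illustrative catalog entries (made up for this sketch).
catalog = [
    Component("notifications", owner="marketing", consumers=["marketing", "logistics", "payments"]),
    Component("shared-utils", owner=None, consumers=["search", "checkout"]),
    Component("scratch-script", owner=None),
]

print(orphaned(catalog))  # ['shared-utils']
```

An unowned component with no consumers is not flagged here, on the assumption that nobody is harmed if it stops working; the check only targets the "everyone depends on it, nobody runs it" case.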
That said, one of the big arguments against autonomous teams is that they often end up duplicating work. This is obviously suboptimal, so how can we minimize it?
Solving the reuse problem
We can minimize duplication by solving problems in a platformized manner, so that once something is built, all teams can benefit from the systems that emerge. There might be short-term problems in aligning around launch dates and such, but in the long run, all use cases of a given capability or entity start getting centralized in one place.
However, this does not mean that the team that built the platform no longer owns it. Reuse does not mean separating ownership from creation. A well-designed platform keeps the platform team out of the client's decision cycle (external programmability), so if a notification platform is built by the marketing team and adopted by many others, the marketing team can continue to own and operate it; there is nothing wrong with that.
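To illustrate external programmability, here is a minimal sketch of a notification platform whose clients register their own message templates and channel-selection rules, so they can change behavior without asking the owning team. It assumes a hypothetical in-process API; `NotificationPlatform`, `ClientConfig`, and the channel names are illustrative, and a real platform would expose this through a service API rather than Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: a client-supplied rule that picks delivery channels for a payload.
ChannelSelector = Callable[[dict], List[str]]


@dataclass
class ClientConfig:
    template: str                    # client-owned message template, filled via str.format
    select_channels: ChannelSelector  # client-owned routing decision


@dataclass
class NotificationPlatform:
    channels: Dict[str, Callable[[str, str], None]]  # channel name -> send(recipient, message)
    clients: Dict[str, ClientConfig] = field(default_factory=dict)

    def register(self, client_id: str, config: ClientConfig) -> None:
        # Self-service: clients own their configuration; the platform team is not consulted.
        self.clients[client_id] = config

    def notify(self, client_id: str, recipient: str, payload: dict) -> List[str]:
        config = self.clients[client_id]
        message = config.template.format(**payload)  # unused payload keys are ignored
        used = []
        for name in config.select_channels(payload):
            self.channels[name](recipient, message)
            used.append(name)
        return used


# Usage: two client teams with different rules, one platform, no platform-team changes.
outbox = []
platform = NotificationPlatform(channels={
    "email": lambda to, msg: outbox.append(("email", to, msg)),
    "sms": lambda to, msg: outbox.append(("sms", to, msg)),
})
platform.register("marketing", ClientConfig(
    template="Hi {name}, {offer} is live",
    select_channels=lambda p: ["email"],
))
platform.register("logistics", ClientConfig(
    template="Order {order_id} shipped",
    select_channels=lambda p: ["sms", "email"] if p.get("urgent") else ["email"],
))
print(platform.notify("logistics", "alice", {"order_id": 42, "urgent": True}))  # ['sms', 'email']
```

The design choice is that the owning team controls the channels and delivery mechanics, while every decision that is specific to a client lives in that client's registered configuration.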
Once a reusable component finds more than three customers (the "rule of three" for reuse that Jeff Atwood has written about), it may be time to treat it as a horizontal responsibility and move it to a separate team. I don't mean moving it to a general-purpose platform team, but to a specific team that can specialize in the domain modelled by the component and commit to evolving it for the benefit of all its customers. For example, a notification system can kick-start a whole new business domain: a little Twilio in its own right. Or it may be only a shared technical capability, like managed Elasticsearch, which can be handled by a team of storage or Elasticsearch experts in a storage platform team or ES platform team.
The above examples show that new teams and domains can be spun off from any team's work, either at the same level of abstraction or at a lower one, provided the team owns all levels of abstraction in its work (i.e., it is a full-stack team). A new domain is, in a sense, a whole new reusable component. We grow the organization vertically by layering components, with each layer built from reusable components stacked atop one another. We also grow the organization horizontally by adding whole new problem spaces that need solving, and each of these in turn creates new opportunities for deepening.
This organic process runs counter to the idea of a fixed, co-located platform charter. The idea of a "platform team" conflates ownership with reuse, just as it conflates a platform's "depth" in the stack with domain knowledge. I think it is far better to think in terms of teams in specific domains each developing their own platforms, with components moving across teams and merging as redundancy or reuse is discovered.
To my mind, the only platform teams on day one are the infra team (managing the hardware) and perhaps the authentication/authorization team, and even these I think of as business/tech capabilities rather than teams driven by reuse or depth in the stack. All other platforms should emerge from inside product teams. Think cellular division rather than divine hand.