Platform Nuts & Bolts : Enforcing Constraints in Platform Design
What rules should platform architecture enforce, and when should it embrace chaos?
I have discussed earlier on this blog how the way constraints are enforced on business and technology usage shifts profoundly as an organization adopts platform architecture. In platform architecture, process-driven enforcement systems do not work well. They just get in the way because they put the platform owner in the path of any change that needs to be made.
To achieve the platform’s true potential, we want its users to be able to navigate it independently while the platform owner is able to maintain the operational and functional safeguards needed for a rock-solid experience. And we want to be able to do this at scale across any number of users and any volume of usage. And we want to be careful about what we allow and what we don’t, because too much of the former leads to instability and too much of the latter to rigidity and inaccessibility.
The only way to achieve all these goals is to bake the rules into the system’s usage pattern and make it impossible for them to be broken. The constraints and best practices of the system should not be imposed on top of the platform; they should be a part of the platform, and any usage should be defined in their terms. There are two types of constraints that a platform typically enforces: technical constraints and data constraints. We will review both here.
Enforcing Technical constraints
Technical constraints are technical rules (duh!) around the usage of the platform that allow the platform owner to effectively administer and operate the system. These are typically not very different from the considerations we have while operating technical products, except that we have an additional “per tenant” dimension in all of them (products MAY have multi-tenancy, but platforms MUST have it).
The most common technical constraints are:
- Identity management : Identity management in platform systems goes deep. The platform wants to identify its tenants. Tenants want to identify their sub-systems or human users, and they might also want to identify specific resources (APIs, databases, queues, etc.). Typically, identity management ends up being the first dog-fooded mini-platform within the overall platform.
- Authentication/Authorization : Once we can assign identities, we want to enforce that the users using the system are who they claim to be and that they are doing only what they are allowed to do.
- Rate Limiting : This governs the amount of usage of the platform by any tenant. This ensures that the entire platform is not tied up by the surge in traffic from one tenant. We typically use identity management to identify who is using the platform.
- Billing and Charge back : All of the above, along with the per-usage cost terms, result in the billing of a tenant. The same billing mini-platform may be used by the platform to bill its tenants and by tenants to monitor the charge back between their sub-systems. Platforms might enforce rules around unexpected billing spikes.
Enforcing these kinds of constraints is a well understood part of administering a platform, since they are all widely used in the product domain as well. While I am calling them constraints, we may also consider them technical capabilities in their own right because having them makes the platform more usable: it is not just the owner of the platform who wants a stable system, but also its users.
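To make the “per tenant” dimension concrete, here is a minimal sketch of rate limiting keyed by tenant identity, using a token bucket. The class name `TenantRateLimiter`, its parameters, and the in-memory storage are all assumptions made for illustration; a real platform would likely keep this state in a shared store and take the tenant id from its identity system.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Hypothetical per-tenant token bucket (illustrative, in-memory only)."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = float(capacity)
        self._refill = refill_per_second
        self._clock = clock
        # tenant_id -> (tokens remaining, time of last request)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str) -> bool:
        """Return True if this tenant may make one more request right now."""
        now = self._clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self._capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(self._capacity, tokens + (now - last) * self._refill)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

Because each tenant has its own bucket, a surge from one tenant exhausts only that tenant's tokens and leaves everyone else's quota untouched.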
Enforcing business/data constraints
As I have written earlier, a lot of the daily discussion in our technical teams (rightly) centres around solving business problems. There is almost always a business context in which we are building a technical solution, and this backdrop has a significant impact on the way we think about how the system should behave and what is core to the system. Imagine designing an employee management application. What are the first things that come to mind?
- Employee age < 200 years
- Salary never negative
- Joining date <= resignation date
- Unique employee id
- Salary payout cycle
These and many more such rules would go through our heads immediately. This is the classic product technology mindset. It tries to understand how the business is to be run, and implements those rules in code. Most of our focus is on ensuring that these rules are never broken. Now imagine that you are building a SaaS employee management platform. We do not know what our clients might want because we do not know the specifics of their business. We are solid on some things (employees will likely be < 200 years in age and joining date <= resignation date), but not so much on other details. For example:
- What kind of unique employee id — numeric, prefixed by department name?
- What is the basic employee’s lifecycle : Offered->Joined->Resigned or Offered->Probation->Confirmed->Resigned?
How can we enforce the rules when we do not know many of them? We cannot, and perhaps not surprisingly in the platform world, embracing this chaos is essential to building a viable system. Let’s define two terms: domain context and organization context.
Domain context is the overall, high-level knowledge of the various rules of any business domain (e-commerce, payments, CRM, retail banking). It identifies the invariants of that system regardless of where or how it is being used (salary cannot be given to a person who does not work with an organization anymore).
Organization context identifies how a specific business operates. It comprises the entire domain context as well as a myriad of additional rules about how one specific organization is set up and conducts its business. Platforms are built with domain context in mind but with little to no reliance on specific organization context. This means that the only rules they should enforce are the rules that the entire industry complies with. As you might imagine, there are very few such rules. Much of the complexity of writing software comes from the business domain, and platforms cannot police that without becoming coupled with their tenants (which is, of course, the death of the platform).
Some specific examples of the kind of data constraints that should and should not be enforced follow.
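As a rough sketch of this split, the validator below checks only the two rules the employee example treats as safe domain invariants (age under 200 years, joining date on or before resignation date). The `Employee` shape and the function name are hypothetical; note what is deliberately missing: no check on the employee id format or the lifecycle stages, because those belong to organization context.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Employee:
    employee_id: str  # format is organization context, so it is never validated here
    date_of_birth: date
    joining_date: date
    resignation_date: Optional[date] = None


def domain_violations(emp: Employee, today: date) -> List[str]:
    """Return violations of domain-level invariants only (hypothetical helper)."""
    problems: List[str] = []
    # Completed years of age as of `today`.
    had_birthday = (today.month, today.day) >= (emp.date_of_birth.month, emp.date_of_birth.day)
    age = today.year - emp.date_of_birth.year - (0 if had_birthday else 1)
    if age >= 200:
        problems.append("age must be under 200 years")
    if emp.resignation_date is not None and emp.joining_date > emp.resignation_date:
        problems.append("joining date must be on or before resignation date")
    return problems
```

An id like `"SALES-0042"` and an id like `"17"` are equally acceptable to this code, which is the point: the platform has no opinion on them.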
Platform entities should be built only of domain concepts whose values can be bound by the platform. Ideally, this limited set of attributes will still allow useful applications to be built on top. However, it is likely that tenants will want to store extra data specific to their use-cases against these entities. There are several ways of doing this, but in none of those ways does the platform ever allow itself to become aware of the details of these data model extensions, nor does it ever try to enforce any sort of validity rules against them. The tenant is solely responsible for maintaining these fields.
The platform exposes standard APIs that use standard entity schemas as payloads. Tenants may, however, want to pass in extra information so that it can be relayed further downstream. A platform component should either let these pass through or if it exposes some other standard way of passing extra data, throw an error to force the tenant to use the standard mechanism. In either case, it is not the platform’s business to interpret what the payload is.
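The two options described here (let extra data pass through, or reject it and point the tenant at the standard mechanism) might look something like the sketch below. The field names, the `extensions` key, and the `strict` flag are assumptions for illustration only. In both modes the platform only moves the extra data around and never looks inside it.

```python
from typing import Any, Dict, Tuple

# Hypothetical standard entity fields and extension slot.
STANDARD_FIELDS = {"employee_id", "joining_date", "resignation_date"}
EXTENSION_KEY = "extensions"


def accept_payload(payload: Dict[str, Any],
                   strict: bool) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a payload into (standard fields, opaque tenant data)."""
    known = STANDARD_FIELDS | {EXTENSION_KEY}
    unknown = sorted(k for k in payload if k not in known)
    if unknown and strict:
        # Force the tenant onto the standard mechanism for extra data.
        raise ValueError(
            f"unknown fields {unknown}; put extra data under '{EXTENSION_KEY}'"
        )
    standard = {k: v for k, v in payload.items() if k in STANDARD_FIELDS}
    # Tenant data is carried along as-is; its values are never interpreted.
    extras = dict(payload.get(EXTENSION_KEY, {}))
    extras.update({k: payload[k] for k in unknown})
    return standard, extras
```

With `strict=False` a stray field such as `badge_colour` is relayed downstream alongside the declared extensions; with `strict=True` the same payload is rejected with an error naming the extension slot.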
Platforms are all about external programmability. They always expose mechanisms that allow tenants to plug in new workflows (a rule defining how something is to be done) or extend core entities (a rule defining what a thing is) with custom metadata. The platform can maintain and enforce these rules. However, the responsibility of defining the rules still rests with the tenant. The key consideration for platforms in this regard is to not get “conceptually coupled” with the use cases of their tenants.
For example, the platform can expose a schema registry where a tenant can define her schemas for custom fields. Now the platform can enforce the schema, but still in a dumb way. It does not understand “what” the field is, just that its value should fit some criteria.
Giving up attempts to control the tenant’s business workflows and data is one of the most fundamental technical underpinnings of a platform implementation. To the product thinker, this is also the most difficult idea to wrap her head around. And this is one of the areas where we have to be very careful when we are exposing hooks for extensions.
Imagine that you are a database administrator. In a sense you are the owner of the company’s database platform. You can enforce that no queries run for more than some stipulated time, no tables grow beyond a given size, no single IP or set of IPs fires too many queries, etc. However, you cannot enforce that teams that use your DB infra as a platform are putting the right values in the columns of their tables. Those values are not an invariant of your “database domain” (in the sense that they are valid values in some tables, but maybe not “these” tables). The tenant teams are responsible for ensuring the sanity of their data. This is an example of the platform owner consciously relinquishing control over the tenant’s business.
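A minimal sketch of such a “dumb” registry is shown below. Everything in it is hypothetical (the class name, the rule vocabulary of `type` and `max_length`, the in-memory storage); it is only meant to show that the platform can check a value against criteria the tenant declared without knowing what the field means.

```python
from typing import Any, Callable, Dict, List

# The only "knowledge" the platform has: how to check a primitive type.
_TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


class SchemaRegistry:
    """Hypothetical registry of tenant-defined custom field rules."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def register(self, tenant_id: str, fields: Dict[str, Dict[str, Any]]) -> None:
        # The tenant defines the rules; the platform checks only that it can apply them.
        for name, rule in fields.items():
            if rule.get("type") not in _TYPE_CHECKS:
                raise ValueError(f"{name}: unsupported type {rule.get('type')!r}")
        self._schemas[tenant_id] = fields

    def violations(self, tenant_id: str, custom_fields: Dict[str, Any]) -> List[str]:
        schema = self._schemas.get(tenant_id, {})
        problems: List[str] = []
        for name, value in custom_fields.items():
            rule = schema.get(name)
            if rule is None:
                problems.append(f"{name}: not declared")
            elif not _TYPE_CHECKS[rule["type"]](value):
                problems.append(f"{name}: expected {rule['type']}")
            elif isinstance(value, str) and len(value) > rule.get("max_length", len(value)):
                problems.append(f"{name}: longer than {rule['max_length']} characters")
        return problems
```

The registry would happily enforce a five-character limit on `badge_colour` without ever learning what a badge is; whether that limit is sensible remains the tenant's problem.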
- Platforms perform policy enforcement via systems and controlled usage patterns.
- Platforms should enforce technical considerations very rigidly since they tie in directly with the ability to operate a reliable system with transparency.
- Platforms should enforce business rules loosely because they don’t understand, or want to understand, the specifics of the tenant’s business.
- They can provide mechanisms that tenants can use to set up the rules they want to enforce on their data. The platform can then enforce these rules without taking any responsibility for their veracity.
An interesting fallout of the need to bake the constraints into the system is that it forces our hand on the technical choices in the system. A platform has to be built all at once, in one go, with much of the core functionality, including the checks and balances. It cannot be delivered piecemeal, because the users do not want an order system that does not do billing, and the platform owner does not want to operate an order system which does not have permissions. All aspects of a platform, including basic functional completeness and control mechanisms, have to be delivered together. This is not to say that we have to think of all features and capabilities up front, but that even the first version of the platform has to be useful as well as stable. We will discuss this more in future posts.
I hope this has helped you get a deeper understanding of the kind of control structure platform system design entails. I’d love to hear your experiences and thoughts about these types of systems — drop a comment below!
If you are interested in further musings on the whats, whys, and hows of building platform architectures, sign up to my mailing list and stay tuned!