Opinionated Guidelines for Any Codebase
Credit for these ideas goes to Domain-Driven Design, Martin Fowler's work, and ARF, a pattern developed (but not super well documented) by my colleague.
First, a quick note. These principles make several assumptions about the domain, and are written primarily with backend web software in mind. I don't expect them to always work well for code running very close to the hardware, or in new or obscure domains. I would encourage anyone reading this to "chew the meat and spit out the bones."
An Uncompromising Paradigm
These guidelines describe a simple way to structure your codebase that is uncompromising in its rules, yet flexible enough for most domains.
It can be simplified into 3 points:
- Code relating to business logic should consist of pure functions
- All IO (data) logic and models should be isolated from other IO logic or business logic
- Top level orchestration logic is flat and unobscured
But why?
Writing code is hard. Designing codebases that both accomplish the task and are easy to understand is hard.
But why is software design still so unsettled? Real-world trades seem to have already figured out the best way to build a wheel. Not many people have become frustrated with its design and set out to invent a new one.
So why is it that, every day, software engineers must deal with endless reinventions of the wheel?
Why choose a set of principles to abide by?
New code must both accomplish the desired outcome and be easy for a team of humans to comprehend and work with.
Considering everything that goes into creating a new service is enough cognitive load by itself, let alone deciding which architectural principles to follow. A simple set of guidelines can offload much of this, in the same way the Python formatter Black makes all the formatting decisions for us.
Benefits to the recommended principles
"Good fences make good neighbors"
These principles dictate, most importantly, that good abstractions are built around the core domain. Giving your business logic a little love in the beginning will allow it to love you back in the end.
The benefits of the proposed guidelines can be summed up in 3 things: readability, testing, and being conducive to change.
Readability
The orchestration layer reads like a script or recipe, and business logic is boiled down to the core logic, making it easier to digest.
Testing
The primary benefit of the proposed principles is easier unit testing and integration testing, for different but related reasons.
When the business logic is written as a set of pure functions, there are no side effects, so mocking is not strictly necessary. Test cases can be succinct and run without relying on any database. Small, easy unit tests are easier to write, so use cases tend to be tested more thoroughly, resulting in a more comprehensive test suite.
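As a minimal sketch of what that looks like (LineItem, order_total_cents, and the discount rule are hypothetical, not from any real codebase), a pure business function can be tested with plain assertions: no mocks, no fixtures, no database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """Hypothetical domain model: one line on an order."""
    unit_price_cents: int
    quantity: int


def order_total_cents(items: tuple[LineItem, ...], discount_percent: int = 0) -> int:
    """Pure business logic: same inputs always give the same output, no IO."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    subtotal = sum(item.unit_price_cents * item.quantity for item in items)
    # Integer arithmetic; the discount is rounded down to whole cents.
    return subtotal - subtotal * discount_percent // 100


# Unit tests need nothing but inputs and expected outputs.
def test_order_total_without_discount() -> None:
    items = (LineItem(500, 2), LineItem(250, 1))
    assert order_total_cents(items) == 1250


def test_order_total_with_discount() -> None:
    items = (LineItem(1000, 1),)
    assert order_total_cents(items, discount_percent=15) == 850


test_order_total_without_discount()
test_order_total_with_discount()
```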
When the top-level "orchestration logic" is mostly flat (meaning minimal control flow), and all of the important use cases are covered by unit tests, solid integration testing becomes possible. Integration tests only need to verify that the data layer is invoked appropriately, and that the data makes it through the business logic and is returned. This means fewer integration tests are needed, which results in a fast test suite.
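A sketch of such a test, using a hypothetical summarize_order orchestration function. Passing the data-layer function in as a parameter, and using unittest.mock.Mock in place of it, are assumptions of this sketch rather than something the guidelines prescribe; a real suite might point it at a test database instead. Since the business rules are already unit tested, the test only checks the call into the data layer and that the data comes back out:

```python
from unittest.mock import Mock


def order_total_cents(prices_cents: list[int]) -> int:
    """Business logic, assumed to be covered by its own unit tests."""
    return sum(prices_cents)


def summarize_order(order_id: int, fetch_prices) -> dict:
    """Flat orchestration: fetch, compute, return, with no branching."""
    prices = fetch_prices(order_id)  # data layer
    total = order_total_cents(prices)  # business layer
    return {"order_id": order_id, "total_cents": total}


def test_summarize_order() -> None:
    # The Mock stands in for the data layer (hypothetical; could be a test DB).
    fetch_prices = Mock(return_value=[300, 200])
    result = summarize_order(42, fetch_prices)
    fetch_prices.assert_called_once_with(42)  # data layer invoked appropriately
    assert result == {"order_id": 42, "total_cents": 500}  # data made it through


test_summarize_order()
```

Because the orchestration has no branches, one test like this covers it; the combinatorics live in the unit tests.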
Conducive to change
Any code that cannot be scrapped and rewritten with confidence should be scrapped and rewritten. (I'm being melodramatic.)
My point is, with good enough testing, anything can be changed with confidence. It also means iterative adjustments and additions stay simple, as long as the guidelines continue to be followed.
When data logic is well abstracted, it can also be swapped out for a new implementation.
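One way to get this in Python is to have the calling code depend on a typing.Protocol, so any implementation with a matching method can be swapped in. A sketch, where RateSource, both implementations, price_in, and the CSV layout are all hypothetical:

```python
import csv
import io
from typing import Protocol, TextIO


class RateSource(Protocol):
    """The interface the calling code relies on, not a concrete class."""
    def get_rate(self, currency: str) -> float: ...


class DictRateSource:
    """Data logic backed by an in-memory dict (handy in tests)."""
    def __init__(self, rates: dict[str, float]) -> None:
        self._rates = rates

    def get_rate(self, currency: str) -> float:
        return self._rates[currency]


class CsvRateSource:
    """Data logic backed by a CSV stream with 'currency,rate' rows."""
    def __init__(self, stream: TextIO) -> None:
        self._rates = {row["currency"]: float(row["rate"]) for row in csv.DictReader(stream)}

    def get_rate(self, currency: str) -> float:
        return self._rates[currency]


def price_in(currency: str, usd_cents: int, source: RateSource) -> int:
    """Works with any RateSource; it never knows which one it was given."""
    return round(usd_cents * source.get_rate(currency))


assert price_in("EUR", 1000, DictRateSource({"EUR": 0.5})) == 500
assert price_in("EUR", 1000, CsvRateSource(io.StringIO("currency,rate\nEUR,0.5\n"))) == 500
```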
Style Preference
There are a lot of flame wars and ways programmers tend to divide themselves: what editor to use, what distro you're repping, "functional vs. object-oriented", cloud services, reactive frameworks, web frameworks, databases; the list goes on and on.
In order to mature as an industry, one big thing we're all going to have to do is drop these heated debates. In my experience, ego is the biggest contributor to code stink.
You see this especially often when someone has been "burned" by a particular poorly implemented pattern. What often happens is they become disillusioned, forget any positive aspects of the system, and swing to the complete opposite of that pattern.
As it turns out, some patterns are better than others at addressing certain problems. Functional programming is great for writing logic and handling streams of data. Do you know what it's not good at? Functional programming sucks at interfacing with external systems, like calling APIs or working with the database. And there's a reason for this: functional programming specifically strives to eliminate side effects, when in essence that is literally what a database is. The database is like one big side effect.
So how do these guidelines help? Well, this holistic approach does not prescribe any particular framework or style. In fact, it encourages mixing styles depending on the context your code is in. If you find that your orchestration logic is really well suited to a procedural style, then forget about your declarative ideals for a minute and write some subroutine calls.
Guidelines
Categorizations of Logic
There are 4 types of logic/layers to define: Data, Business, Orchestration and Presentation, each with different concerns and responsibilities.
Data
Data logic is logic concerned with external systems. These purely logistical concerns include pretty much any form of IO: SQL, ORMs, external APIs, file IO, etc.
Data logic serves the needs of the business, but does not execute any business logic on its own. This relationship is purely transactional, with a strong or loose interface depending on the desired coupling. How tight that coupling is should be a choice: some IO is more inherent to the problem space, while other IO specifically calls for less commitment.
Depending on the use case, methods on objects can work well to encapsulate the external logic and tell a clearer story in the orchestration layer (you see this with ORMs and API clients).
Note: Data logic of one kind must be isolated from data logic of another, i.e. no mixing ORM logic (or models!) with API logic. If different kinds of data logic need to be coordinated, that can be done in the Orchestration layer, and any pure business logic involved can be moved into the business logic layer.
Business
AKA the Domain layer
Business logic represents code concerned with the needs and the intrinsic complexity that comes with the business. This is the core logic that must work according to the business specifications. Because of the inherent coupling to the business, this code should be isolated from any external systems and concerns not inherent to the business. This should materialize as a set of pure functions which are in charge of the transition of data from one form into another. It is expected that the business logic scales linearly along with the complexity and scope of the features and value shipped.
Under no circumstances should Data logic enter the realm of Business logic. Doing so introduces side effects into the business logic; at the very least, anything relying on that logic can no longer be considered a pure function. Note that this includes working with objects tightly coupled to external systems, and especially objects with methods that interact with external systems. One good exception is logging.
Business logic works best when the code is purely functional, which makes it simple to unit test without mocks. Many use cases can be run without any load on the DB or other external systems. In my experience, OOP works very poorly here (let me know if you find a case where OOP works well). That said, there is no need to forbid OOP in business logic outright, as long as it does not produce side effects outside the operation itself.
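A sketch of a business rule written this way. The late-fee rule, its constants, and late_fee_cents are hypothetical. The function takes plain dates rather than an ORM invoice object, and today is passed in rather than read from the clock, so the result depends only on the arguments; its one side effect is logging:

```python
import logging
from datetime import date

logger = logging.getLogger(__name__)

DAILY_LATE_FEE_CENTS = 50  # hypothetical business rule


def late_fee_cents(due: date, today: date, cap_cents: int = 1_000) -> int:
    """Pure business rule: plain values in, a plain value out.

    No ORM objects and no date.today() call inside, so the same
    arguments always produce the same fee.
    """
    days_late = (today - due).days
    if days_late <= 0:
        return 0
    fee = min(days_late * DAILY_LATE_FEE_CENTS, cap_cents)
    logger.debug("late fee for %d days: %d cents", days_late, fee)  # the allowed exception
    return fee


assert late_fee_cents(date(2024, 1, 31), date(2024, 1, 30)) == 0
assert late_fee_cents(date(2024, 1, 31), date(2024, 2, 10)) == 500
assert late_fee_cents(date(2024, 1, 31), date(2024, 12, 31)) == 1_000
```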
Orchestration
AKA the Service layer
Orchestration logic should be the smallest layer of them all, concerned only with working with each part to orchestrate a process. Think of it like snapping together legos, or writing a script. Orchestration logic controls the major moving parts working together to produce a result or perform a task.
Orchestration can include data or business logic, as it's inherently coupled with both. In fact, leaving simple data logic inline can help readability. In essence, an orchestration function should tell the story of the data from beginning to end. If either business or data logic grows too large within the Orchestration layer, it must be abstracted out into its respective layer.
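A sketch of a flat orchestration function. Every name here is hypothetical, and the two data sources (a database lookup and an external API call) are stubbed with in-memory dicts so the example runs. The orchestration reads top to bottom as the story of the data, and it is the only place the two isolated data sources meet:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Domain model passed between layers."""
    id: int
    name: str
    country: str


# Data layer stand-ins: in a real app these would be a database query and an
# HTTP call, kept in separate modules that know nothing about each other.
_CUSTOMER_ROWS = {7: {"id": 7, "name": "Acme", "country": "DE"}}
_TAX_PERCENT_BY_COUNTRY = {"DE": 19}


def db_get_customer(customer_id: int) -> Customer:
    row = _CUSTOMER_ROWS[customer_id]
    return Customer(id=row["id"], name=row["name"], country=row["country"])


def api_get_tax_percent(country: str) -> int:
    return _TAX_PERCENT_BY_COUNTRY[country]


# Business layer: pure, integer arithmetic in cents.
def gross_cents(net_cents: int, tax_percent: int) -> int:
    return net_cents * (100 + tax_percent) // 100


# Orchestration layer: flat, reads like a recipe.
def quote_for_customer(customer_id: int, net_cents: int) -> dict:
    customer = db_get_customer(customer_id)
    tax_percent = api_get_tax_percent(customer.country)
    total = gross_cents(net_cents, tax_percent)
    return {"customer": customer.name, "total_cents": total}


assert quote_for_customer(7, 10_000) == {"customer": "Acme", "total_cents": 11_900}
```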
Presentation
Presentation logic, also known as view logic, forms an additional layer in any system. It can be treated similarly to the data layer: it is concerned with external systems, but the dependency runs in the opposite direction. Presentation logic should avoid dealing with data/business logic directly and pass down to the orchestration layer as soon as possible, although doing so is not strictly forbidden.
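A framework-agnostic sketch of a thin presentation layer. In Django this would be a view; here a plain function returning a (status, body) pair stands in so the example runs without Django. handle_quote_request and the stubbed quote_for_customer service are hypothetical. The handler only parses and validates input, hands off to the orchestration layer, and shapes the response:

```python
def quote_for_customer(customer_id: int, net_cents: int) -> dict:
    """Stub standing in for an orchestration-layer function."""
    return {"customer_id": customer_id, "total_cents": net_cents * 119 // 100}


def handle_quote_request(query: dict[str, str]) -> tuple[int, dict]:
    """Presentation: translate the request, delegate, translate the result."""
    try:
        customer_id = int(query["customer_id"])
        net_cents = int(query["net_cents"])
    except (KeyError, ValueError):
        return 400, {"error": "customer_id and net_cents must be integers"}
    result = quote_for_customer(customer_id, net_cents)  # pass down immediately
    return 200, result


assert handle_quote_request({"customer_id": "7", "net_cents": "10000"}) == (
    200,
    {"customer_id": 7, "total_cents": 11900},
)
assert handle_quote_request({"customer_id": "x"})[0] == 400
```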
Categorizations of Data
Similar to logic, data can take various forms which couple to concerns more or less than others. Arguably how we define data is more important than the structure of logic or layers of code. It can be more difficult to do this though, since data is the bloodstream of any codebase, and tends to be shared more than we intend. I’d like to define just 2 types of data here.
Data models
Data models are the models specific to each data logic module. These models are strongly coupled to their data logic, and therefore should never be used directly in Business logic. Similarly, data models of one kind should not be used directly with data logic of a different kind. Data models tend to be conducive to OOP, so methods can be a good way to operate on the data.
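For example, a data model for a hypothetical payment provider's "charge" object can carry its own parsing and serialization methods, keeping that external protocol (including its camelCase field names, which are assumptions here) in one place:

```python
import json
from dataclasses import dataclass


@dataclass
class ApiCharge:
    """Data model mirroring a hypothetical payment API's charge object."""
    charge_id: str
    amount_minor: int
    currency_code: str

    @classmethod
    def from_json(cls, raw: str) -> "ApiCharge":
        # The external field names only appear inside this class.
        payload = json.loads(raw)
        return cls(payload["chargeId"], payload["amountMinor"], payload["currencyCode"])

    def to_json(self) -> str:
        return json.dumps(
            {
                "chargeId": self.charge_id,
                "amountMinor": self.amount_minor,
                "currencyCode": self.currency_code,
            }
        )


charge = ApiCharge.from_json('{"chargeId": "ch_1", "amountMinor": 1500, "currencyCode": "EUR"}')
assert charge.amount_minor == 1500
assert ApiCharge.from_json(charge.to_json()) == charge  # round trip
```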
Domain models
Domain models provide a domain-specific protocol for working with data within the business logic. If any data must be used by business logic or by other kinds of data logic, the corresponding domain models should be used. These models allow the code to work with and transform data without being tied to any specific external system. Domain models should preferably be made up of immutable data structures only, since mutation often increases complexity and causes implicit side effects. As mentioned earlier, OOP is not forbidden and can be used as long as it doesn't result in side effects outside of the operation.
Domain models can and should be named intentionally to help build the use case or story (see "Domain model" on Wikipedia).
Domain models can feel excessive if always used. It's recommended to use them only when the data cannot be extracted from the data models cleanly. For example, there's not much need for a "company" domain model unless the domain is directly concerned with companies, because we can extract all we need (the company id, name, etc.) directly off of the model in the orchestration layer. Domain models only really become necessary when the data is more than just a single record with basic field types.
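A sketch of a domain model as a frozen dataclass, plus a mapping function that would live in the data layer. Subscription, SubscriptionRow, and their fields are hypothetical; the point is that business logic only ever sees Subscription, and "changing" one produces a new value instead of mutating shared state:

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Subscription:
    """Domain model: immutable and unaware of where its data came from."""
    customer_id: int
    plan: str
    renews_on: date


@dataclass
class SubscriptionRow:
    """Stand-in for a data model (e.g. an ORM row) with storage-specific fields."""
    pk: int
    customer_fk: int
    plan_code: str
    renews_on_iso: str


def to_domain(row: SubscriptionRow) -> Subscription:
    """Would live in the data layer: maps the data model to the domain model."""
    return Subscription(
        customer_id=row.customer_fk,
        plan=row.plan_code,
        renews_on=date.fromisoformat(row.renews_on_iso),
    )


def upgrade(sub: Subscription, new_plan: str) -> Subscription:
    """Business logic: returns a new value instead of mutating its input."""
    return replace(sub, plan=new_plan)


sub = to_domain(SubscriptionRow(pk=1, customer_fk=7, plan_code="basic", renews_on_iso="2024-06-01"))
upgraded = upgrade(sub, "pro")
assert (sub.plan, upgraded.plan) == ("basic", "pro")

try:
    sub.plan = "pro"  # frozen dataclasses reject in-place mutation
except FrozenInstanceError:
    pass
```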
An Example
Django doesn't always make it easy to separate ORM, templating and business logic. For this reason, these guidelines don't fit well with Django templates and favor a reactive frontend framework.
An example Django app structure looks something like this:
app
├── urls - Presentation layer
├── views - Presentation layer
├── service - Orchestration layer
├── types - Domain models
├── logic - Domain layer
├── db - Data layer
├── models - Data models
├── api - Data layer
└── api_types - Data models
Example Details
Splitting into separate Django apps is encouraged if "vertical slices" appear.
app.types would contain the classes for your domain models, the types the domain layer works with. Dataclasses work well, but NamedTuple, TypedDict, or just regular Python classes work too.
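As a sketch, here is the same hypothetical Money domain type written two of those ways. Both are immutable as written and compare by value:

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass(frozen=True)
class MoneyDC:
    """Dataclass version: frozen=True makes instances immutable."""
    amount_cents: int
    currency: str


class MoneyNT(NamedTuple):
    """NamedTuple version: immutable by nature, and unpacks like a tuple."""
    amount_cents: int
    currency: str


a = MoneyDC(1500, "EUR")
b = MoneyNT(1500, "EUR")
assert a == MoneyDC(1500, "EUR")
amount, currency = b  # tuple unpacking works for NamedTuple, not for dataclasses
assert (amount, currency) == (1500, "EUR")
```

One trade-off worth knowing: a NamedTuple also compares equal to a plain tuple with the same values (b == (1500, "EUR") is true), which can hide mistakes, while a frozen dataclass does not.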
Similarly, app.api_types is for types that will be sent to or received from API calls. This isn't necessary, but it's nice to define what these are rather than guessing at whatever point in the code the data is extracted, which runs a higher risk of KeyErrors. As an added bonus, you can link to the API documentation in the code as a reference. To reiterate, this may be completely unnecessary if you're only calling a couple of trivial APIs, so do what makes sense for your project here.
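A sketch of an app.api_types entry using TypedDict, for a hypothetical weather API; the field names and the docs URL are placeholders. TypedDict does not validate anything at runtime, but a static type checker such as mypy will flag a misspelled or missing key, and extracting the keys in one parsing function means a real KeyError surfaces at the API boundary:

```python
from typing import TypedDict


class ForecastResponse(TypedDict):
    """Response body of GET /forecast on a hypothetical weather API.

    Docs: https://weather.example.com/docs/forecast (placeholder link)
    """
    city: str
    temperature_c: float
    conditions: str


def parse_forecast(payload: dict) -> ForecastResponse:
    # All key lookups happen here, so a missing key fails at the boundary,
    # not deep inside unrelated code.
    return ForecastResponse(
        city=payload["city"],
        temperature_c=float(payload["temperature_c"]),
        conditions=payload["conditions"],
    )


forecast = parse_forecast({"city": "Oslo", "temperature_c": "4.5", "conditions": "rain"})
assert forecast["temperature_c"] == 4.5
```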
app.api defines functions that make API calls. Most of the time these functions should accept either basic values or domain models, and be responsible for mapping to app.api_types. The reason is that if app.api or app.api_types isn't responsible for controlling the protocol, some other module is, which means they're coupled.
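A sketch of an app.api function that accepts a domain model and owns the mapping to the API's shape. The Invoice domain model, the InvoicePayload API type, the endpoint URL, and the field names are all hypothetical. Building the request is separated from sending it so the example runs offline; create_invoice shows where urllib.request.urlopen would make the real call:

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import TypedDict


@dataclass(frozen=True)
class Invoice:
    """Domain model (would live in app.types)."""
    customer_id: int
    amount_cents: int


class InvoicePayload(TypedDict):
    """API model (would live in app.api_types)."""
    customerRef: str
    amount: str


BILLING_URL = "https://billing.example.com/v1/invoices"  # placeholder endpoint


def to_payload(invoice: Invoice) -> InvoicePayload:
    """The api module, not its callers, decides what the protocol looks like."""
    cents = invoice.amount_cents
    return InvoicePayload(customerRef=f"cust-{invoice.customer_id}", amount=f"{cents // 100}.{cents % 100:02d}")


def build_create_invoice_request(invoice: Invoice) -> urllib.request.Request:
    body = json.dumps(to_payload(invoice)).encode("utf-8")
    return urllib.request.Request(
        BILLING_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def create_invoice(invoice: Invoice) -> dict:
    """Performs the network call (not exercised in the example below)."""
    with urllib.request.urlopen(build_create_invoice_request(invoice)) as response:
        return json.loads(response.read())


req = build_create_invoice_request(Invoice(customer_id=7, amount_cents=1999))
assert req.get_method() == "POST"
assert json.loads(req.data) == {"customerRef": "cust-7", "amount": "19.99"}
```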
app.db is similar to app.api in that it should usually accept simple values or domain models, then return domain models. There's nothing wrong with returning ORM models from these functions, as long as they're only used in the orchestration layer and don't make their way into other data or business logic.
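Django's ORM needs a configured project to run, so this sketch swaps in the standard library's sqlite3 as a stand-in data layer; a Django version would query through the ORM and map each model instance the same way. The orders table, the Order domain model, and get_orders_for_customer are hypothetical. The point is the signature: simple values in, domain models out, with no database objects escaping the module:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Domain model (would live in app.types)."""
    id: int
    customer_id: int
    total_cents: int


def get_orders_for_customer(conn: sqlite3.Connection, customer_id: int) -> list[Order]:
    """Data layer: accepts simple values, returns domain models."""
    rows = conn.execute(
        "SELECT id, customer_id, total_cents FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()
    return [Order(id=r[0], customer_id=r[1], total_cents=r[2]) for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total_cents INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(7, 1200), (7, 800), (9, 500)],
)
orders = get_orders_for_customer(conn, 7)
assert [o.total_cents for o in orders] == [1200, 800]
```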
app.logic can be named anything that fits the logic, and if there is any substantial logic there, it should be split into multiple modules and maybe apps. This represents the business logic, and it's more strictly defined: it's similar to the last two data modules, except it must not incur any side effects.
app.service is the orchestration layer which ties everything together. This layer should be very small. It consumes functions in app.api, app.db and app.logic to perform tasks or generate results.
Then app.views and app.urls are what's left. They're the presentation layer, which ties into app.service (it can also tie directly into app.db/app.models for simple views, or for DRF views, since DRF handles the logic).
A quick note on Django REST framework. DRF works well for simple CRUD applications, but most of the time we're not building simple CRUD apps, and there's some nuance there. This should be identified early, and we should avoid turning our serializers into mega classes that run all of the API, DB, and business logic.