How to use event storming to achieve domain-driven design
First, you should read this article: https://abbasimhoseinit.medium.com/domain-driven-design-and-microservice-boundaries-63e7253e41b6
Domain-Driven Design is a powerful methodology for analyzing both the system-level (called “strategic” in DDD) and the in-depth (called “tactical”) composition of your large, complex systems. We have also seen that DDD analysis can help us identify fairly autonomous subcomponents, loosely coupled across the bounded contexts of their respective domains.
It’s very easy to jump to the conclusion that in order to fully learn how to properly size microservices, we just need to become really good at domain-driven analysis; if we make our entire company also learn and fall in love with it (because DDD is certainly a team sport), we’ll be on our way to success!
In the early days of microservices architectures, DDD was so universally proclaimed as the one true way to size microservices that the rise of microservices gave a huge boost to the practice of DDD, as well — or at least more people became aware of it, and referenced it. Suddenly, many speakers were talking about DDD at all kinds of software conferences, and a lot of teams started claiming that they were employing it in their daily work. Alas, a close look easily uncovered that the reality was somewhat different and that DDD had become one of those “much-talked-about-less-practiced” things.
Don’t get us wrong: there were people using DDD way before microservices, and there are plenty using it now as well, but speaking specifically of using it as a tool for sizing microservices, it was more hype and vaporware than reality.
There are two primary reasons why more people talked about DDD than practiced it in earnest: it is complex and it is expensive. Practicing DDD requires quite a lot of knowledge and experience. Eric Evans’s original book on the subject is a hefty 520 pages long, and you would need to read at least a few more books to really get it, not to mention gain some experience actually implementing it on a number of projects. There simply were not enough people with the skills and experience and the learning curve was steep.
To exacerbate the problem, as we mentioned, DDD is a team sport and a time-consuming one at that. It’s not enough to have a handful of technologists well-versed in DDD; you also need to sell your business, product, design, etc., teams on participating in long and intense domain-design sessions, not to mention explain to them at least the basics of what you are trying to achieve. Now, in the grand scheme of things, is it worth it? Very likely, yes: especially for large, risky, expensive systems, DDD can have many benefits. However, if you are just looking to move quickly and size some microservices, and you have already cashed in your political capital at work, selling everybody on the new thing called microservices — good luck also asking a whole bunch of busy people to give you enough time to size your services right! It was just not happening — too expensive and too time-consuming.
And then suddenly a fellow by the name of Alberto Brandolini, who had invested decades in understanding better ways for teams to collaborate, found a shortcut! He proposed a fun, lightweight, and inexpensive process called Event Storming, which is heavily based on and inspired by the concepts of DDD but can help you find bounded contexts in a matter of hours instead of weeks or months. The introduction of Event Storming was a breakthrough for the inexpensive applicability of DDD specifically for the sake of service sizing. Of course, it’s not a full replacement, and it won’t give you all the benefits of formal DDD (otherwise it would be magic). But as far as the discovery of bounded contexts goes, to a good approximation, it is indeed magical!
Event Storming is a highly efficient exercise that helps identify the bounded contexts of a domain in a streamlined, fun manner, typically much faster than with more traditional, full DDD. It is a pragmatic approach that lowers the cost of DDD analysis enough to make it viable in situations in which DDD would not be affordable otherwise. Let’s see how this “magic” of Event Storming is actually executed.
Key Decision: Use Event Storming Instead of Formal DDD
Use the more lightweight Event Storming process instead of formal DDD to discover the main aggregates in your subdomain and identify edges of the various bounded contexts present in your system.
The Event-Storming Process
The beauty of Event Storming is in its ingenious simplicity. In physical spaces (preferred, when possible), all you need to hold a session of Event Storming is a very long wall (the longer the better), a bunch of supplies, mostly stickies and Sharpies, and four to five hours of time from well-represented members of your team. For a successful Event Storming session, it is critical that participants are not only engineers. Broad participation from such groups as product, design and business stakeholders makes a significant difference. You can also host virtual Event Storming sessions using digital collaboration tools that can mimic the physical process described here.
The process of hosting physical Event Storming sessions starts by purchasing the supplies. To make things easier, we’ve created an Amazon shopping list that we use for Event Storming sessions (see Figure 4–9). It comprises:
- A large number of stickies of different colors, most importantly, orange and blue, and then several other colors for various object types. You need a lot of those. (Stores never had enough for me, so I got in the habit of buying online.)
- A roll of 1/2-inch white artist tape.
- A long roll of paper (e.g., IKEA Mala Drawing Paper) that we are going to hang on the wall using the artist tape. Go ahead and create multiple “lanes.”
- At least as many Sharpies as the number of session participants. Everybody needs to have their own!
- Did we already mention a long, unobstructed wall that we can tape the roll of paper to?
During Event Storming sessions, broad participation, e.g., from subject-matter experts, product owners, and interaction designers, is very valuable. Event Storming sessions are short enough (just several hours rather than analysis requiring days or weeks) that, considering the value of their outcomes, the clarity they bring for all represented groups, and the time they save in the long term, they are time well-invested for all participants. An Event Storming session that is limited to just software engineers is mostly useless, since it happens in a bubble and cannot lead to the cross-functional conversations necessary for desired outcomes.
Once we have the supplies, the large room with a wide-open wall with a roll of paper we have taped to it, and all the required people, we (the facilitator) ask everybody to grab a bunch of orange stickies and a personal Sharpie. Then we give them a simple assignment: to write the key events of the domain being analyzed as orange sticky notes (one event per note), expressed as a verb in the past tense, and place the notes along a timeline on the paper taped to the wall to create a “lane” of time, as shown in Figure 4–10.
Participants should not obsess about the exact sequence of events, and at this stage, there should be no coordination of events among participants. The only thing they are asked is to individually think of as many events as possible, putting the events they think occur earlier in time to the left and the later events more to the right. It is not their job to weed out duplicates. At least, not yet. This phase of the assignment usually takes 30 minutes to an hour, depending on the size of the problem and the number of participants. Usually, you want to see at least 100 event sticky notes generated before you can call it a success.
In the second phase of the exercise, the group is asked to look at the resulting set of notes on the wall, and with the help of the facilitator, to start arranging them into a more coherent timeline, identifying and removing duplicates. Given enough time, it is very helpful for the participants to start creating a “storyline,” walking through the events in an order that creates something like a “user journey.” In this phase, the team may have some questions or confusion; we don’t try to solve these issues, but rather capture them as “hotspots” — differently colored sticky notes (typically purple) that have the questions on them. Hotspots will need to be answered offline, in follow-ups. This phase can likewise take 30 to 60 minutes.
In the third stage, we create what in Event Storming is known as a reverse narrative. Basically, we walk the timeline backward, from the end to the start, and identify commands: the things that caused the events. We use sticky notes of a different color (typically blue) for the commands. At this stage, your storyboard may look something like Figure 4–11.
Be aware that a lot of commands will have a one-to-one relationship with an event. It will feel redundant, like the same thing worded in the past tense versus the present tense. Indeed, if you look at the previous figure, the first two commands are like that. It often confuses people new to Event Storming. Just ignore it! We don’t pass judgment during Event Storming, and while some commands may be 1:1 with events, some will not be. For example, the “Submit payment authorization” command triggers a whole bunch of events. Just capture what you know/think happens in real life and don’t worry about making things “pretty” or “neat.” The real world you are modeling is also usually messy.
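To make the command-to-event relationship concrete, here is a minimal Python sketch of what the wall might hold after the reverse narrative. Only the “Submit payment authorization” command comes from the text above; every other command and event name is a made-up example, and the `Event` and `Command` classes are our own illustration rather than part of any Event Storming tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    name: str  # orange sticky: a verb in the past tense


@dataclass
class Command:
    name: str  # blue sticky: what caused the events
    triggers: list[Event] = field(default_factory=list)


# Hypothetical board contents (assumed for illustration only).
commands = [
    Command("Add item to cart", [Event("Item added to cart")]),
    Command(
        "Submit payment authorization",
        [
            Event("Payment authorized"),
            Event("Funds reserved"),
            Event("Receipt emailed"),
        ],
    ),
]

# Commands that map 1:1 to an event, and commands that fan out to several.
one_to_one = [c.name for c in commands if len(c.triggers) == 1]
one_to_many = [c.name for c in commands if len(c.triggers) > 1]

print("1:1 commands:", one_to_one)
print("Fan-out commands:", one_to_many)
```

Both kinds of command are equally valid on the wall; the split is shown only to illustrate that a 1:1 pairing is normal and not a modeling mistake.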
In the next phase, we acknowledge that commands do not produce events directly. Rather, special types of domain entities accept commands and produce events. In Event Storming, these entities are called aggregates (yes, the name is inspired by the similar notion in DDD). What we do in this stage is rearrange our commands and events, breaking the timeline when needed, such that the commands that go to the same aggregate are grouped around that aggregate, and the events “fired” by that aggregate are also moved to it. You can see an example of this stage of Event Storming in Figure 4–12.
This phase of the exercise can take 15 to 25 minutes. Once we are done with it, we should find that our wall now looks less like a timeline of events and more like a cluster of events and commands grouped around aggregates.
Guess what? These clusters are the bounded contexts we were looking for.
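The regrouping step can be sketched in a few lines of Python. This is only an illustration under assumptions: the sticky notes, the aggregate names (`Cart`, `Payment`), and the `cluster_by_aggregate` helper are all hypothetical, and in a real session the grouping is a conversation at the wall, not an algorithm.

```python
from collections import defaultdict

# Each note is (kind, text, aggregate). All values are hypothetical examples.
notes = [
    ("command", "Add item to cart", "Cart"),
    ("event", "Item added to cart", "Cart"),
    ("command", "Submit payment authorization", "Payment"),
    ("event", "Payment authorized", "Payment"),
    ("event", "Funds reserved", "Payment"),
]


def cluster_by_aggregate(notes):
    """Group commands and events around the aggregate that handles them."""
    clusters = defaultdict(lambda: {"command": [], "event": []})
    for kind, text, aggregate in notes:
        clusters[aggregate][kind].append(text)
    return dict(clusters)


clusters = cluster_by_aggregate(notes)
for aggregate, stickies in clusters.items():
    print(aggregate, stickies)
```

Each key in `clusters` corresponds to one candidate bounded context: the aggregate together with the commands it accepts and the events it fires.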
The only thing left is to classify the various contexts by their level of priority (similar to the “core,” “supporting,” and “generic” subdomains in DDD). To do this, we create a matrix of bounded contexts/subdomains and rank them across two properties: difficulty and competitive edge. In each category, we use T-shirt sizes to rank accordingly. In the end, the decision about where to invest effort is based on the following guidelines:
- Large competitive advantage/large effort: these are the contexts to design and implement in-house and spend the most time on.
- Small advantage/large effort: buy!
- Small advantage/small effort: great assignments for trainees.
- Other combinations are a coin toss and require a judgment call.
This last phase, the “competitive analysis,” is not part of Brandolini’s original Event Storming process, and was proposed by Greg Young for prioritizing domains in DDD in general. We find it to be a useful and fun exercise when done with an adequate level of humor.
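The prioritization guidelines above can be written down as a small lookup table. The sketch below is our own illustration: the `recommend` function, the `"S"`/`"L"` size codes, and the wording of the returned strings are assumptions, and anything the guidelines do not cover explicitly falls through to a judgment call.

```python
# Keys are (competitive advantage, effort) as T-shirt sizes.
# Only the three combinations named in the guidelines are listed.
RULES = {
    ("L", "L"): "design and build in-house",
    ("S", "L"): "buy",
    ("S", "S"): "assign to trainees",
}


def recommend(advantage: str, effort: str) -> str:
    """Return the suggested strategy for one bounded context."""
    return RULES.get((advantage, effort), "judgment call")


print(recommend("L", "L"))
print(recommend("S", "L"))
print(recommend("L", "S"))
```

Medium sizes, or a large advantage with small effort, are deliberately left to the team: the matrix is meant to start the discussion, not to end it.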
The entire process is very interactive, requires the involvement of all participants, and usually ends up being fun. It will require an experienced facilitator to keep things moving smoothly, but the good news is that being a good facilitator doesn’t take the same effort as becoming a rocket scientist (or DDD expert). After reading this book and facilitating some mock sessions for practice, you can easily become a world-class Event Storming facilitator!
As a facilitator, it is a good idea to watch the time and have a plan for your session. For a four-hour session, a rough allocation of time would look like this:
- Phase 1 (~30 min): Discover domain events
- Phase 2 (~45 min): Enforce the timeline
- Phase 3 (~60 min): Reverse narrative and command identification
- Phase 4 (~30 min): Identify aggregates/bounded contexts
- Phase 5 (~15 min): Competitive analysis
And if you noticed that these times do not add up to 4 hours, keep in mind that you will want to give people some breaks in the middle, as well as leave yourself time to prepare the space and provide guidance in the beginning.
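A quick way to check the schedule is to add up the phases and see how much slack is left. The sketch below uses the durations from the list above; the variable names are ours.

```python
# Planned duration of each phase, in minutes, for a four-hour session.
phases = {
    "Discover domain events": 30,
    "Enforce the timeline": 45,
    "Reverse narrative and command identification": 60,
    "Identify aggregates/bounded contexts": 30,
    "Competitive analysis": 15,
}

session_minutes = 4 * 60
planned_minutes = sum(phases.values())
slack_minutes = session_minutes - planned_minutes

print(f"Planned: {planned_minutes} min, slack: {slack_minutes} min")
```

The five phases add up to 180 minutes, which leaves 60 minutes for setup, the opening guidance, and breaks.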
Introducing the Universal Sizing Formula
Bounded contexts are a fantastic starting point for rightsizing microservices. We have to be cautious, however, not to assume that microservice boundaries are synonymous with the bounded contexts from DDD or Event Storming. They are not. As a matter of fact, microservice boundaries cannot be assumed to be constant over time. They evolve and tend toward increasing granularity as the organizations and applications they are part of mature. For example, Adrian Cockcroft noted that this was a repeating trend he had observed during his time at Netflix.
Nobody Gets Microservice Boundaries Perfectly at the Outset
In successful cases of microservices adoption, teams do not start with hundreds of microservices. They start with a much smaller number, closely aligned with bounded contexts. As time goes by, teams split microservices when they run into coordination dependencies that they need to eliminate. This also means that teams are not expected to get service boundaries “right” out of the gate. Instead, boundaries evolve over time, with a general direction of increased granularity.
It is worth noting that it’s typically easier to split a service than to merge several services back together, or to move a capability from one service to another. This is another reason why we recommend starting with a coarse-grained design and waiting until we learn more about the domain and have enough complexity before we split and increase service granularity.
We have found that there are three principles that work well together when thinking about the granularity of microservices. We call these principles the Universal Sizing Formula for microservices.
The Universal Sizing Formula
To achieve a reasonable sizing of microservices, you should:
- Start with just a few microservices, possibly using bounded contexts.
- Keep splitting as your application and services grow, being guided by the needs of coordination avoidance.
- Be on the right trajectory for decreasing coordination. This is vastly more important than the current state of how “perfectly” you get service sizing.
This article is based on what I learned from reading the book “Microservices: Up and Running.”