In this chapter we’ll introduce the final piece of the puzzle that ties together the Repository and Service Layer: the Unit of Work pattern.
If the Repository is our abstraction over the idea of persistent storage, the Unit of Work is our abstraction over the idea of atomic operations. It will allow us to finally, fully, decouple our Service Layer from the data layer.
Without UoW: API talks directly to 3 layers shows that, currently, there is a lot of communication across the layers of our infrastructure: the API talks directly to the DB layer to start a session, it talks to the repository layer to initialize a SqlAlchemyRepository, and it talks to the service layer to ask it to allocate.
With UoW: UoW now manages DB state shows our target state: the Flask API now only does two things: it initializes a Unit of Work, and it invokes a service. The service collaborates with the UoW (we like to think of the UoW as being part of the service layer), but neither the service function itself nor Flask needs to talk directly to the database any more.
And we’ll do it all using a lovely piece of Python syntax, a context manager.
Tip
You can find our code for this chapter at github.com/cosmicpython/code/tree/chapter_06_uow.

git clone https://github.com/cosmicpython/code.git && cd code
git checkout chapter_06_uow
# or, if you want to code along, checkout the previous chapter:
git checkout chapter_04_service_layer
Here’s how it’ll look in use, when it’s finished:
def allocate(
    orderid: str, sku: str, qty: int,
    uow: unit_of_work.AbstractUnitOfWork
) -> str:
    line = OrderLine(orderid, sku, qty)
    with uow:  #(1)
        batches = uow.batches.list()  #(2)
        ...
        batchref = model.allocate(line, batches)
        uow.commit()  #(3)
1. We’ll start a unit of work as a context manager.
2. uow.batches is the batches repo, so the unit of work provides us access to our permanent storage.
3. When we’re done, we commit or roll back our work, using the UoW.
The unit of work acts as a single entry point to our persistent storage, and keeps track of what objects were loaded and what the latest state is.[1] This gives us three useful things:
- It gives us a stable snapshot of the database to work with, so that the objects we use aren’t changing halfway through an operation.
- It gives us a way to persist all of our changes at once, so that if something goes wrong, we don’t end up in an inconsistent state.
- It offers a simple API to our persistence concerns and gives us a handy place to get a repository.
Here’s a test for a new UnitOfWork (or UoW, which we pronounce "you-wow"). It’s a context manager that allows us to start a transaction, retrieve things from repos, and commit:
def test_uow_can_retrieve_a_batch_and_allocate_to_it(session_factory):
    session = session_factory()
    insert_batch(session, 'batch1', 'HIPSTER-WORKBENCH', 100, None)
    session.commit()

    uow = unit_of_work.SqlAlchemyUnitOfWork(session_factory)  #(1)
    with uow:
        batch = uow.batches.get(reference='batch1')  #(2)
        line = model.OrderLine('o1', 'HIPSTER-WORKBENCH', 10)
        batch.allocate(line)
        uow.commit()  #(3)

    batchref = get_allocated_batch_ref(session, 'o1', 'HIPSTER-WORKBENCH')
    assert batchref == 'batch1'
1. We initialize the Unit of Work using our custom session factory, and get back a uow object to use in our with block.
2. The UoW gives us access to the batches repository via uow.batches.
3. And we call commit() on it when we’re done.
For the curious, the insert_batch and get_allocated_batch_ref helpers look like this:
def insert_batch(session, ref, sku, qty, eta):
    session.execute(
        'INSERT INTO batches (reference, sku, _purchased_quantity, eta)'
        ' VALUES (:ref, :sku, :qty, :eta)',
        dict(ref=ref, sku=sku, qty=qty, eta=eta)
    )


def get_allocated_batch_ref(session, orderid, sku):
    [[orderlineid]] = session.execute(
        'SELECT id FROM order_lines WHERE orderid=:orderid AND sku=:sku',
        dict(orderid=orderid, sku=sku)
    )
    [[batchref]] = session.execute(
        'SELECT b.reference FROM allocations JOIN batches AS b ON batch_id = b.id'
        ' WHERE orderline_id=:orderlineid',
        dict(orderlineid=orderlineid)
    )
    return batchref
In our tests we’ve implicitly defined an interface for what a unit of work needs to do. Let’s make that explicit by using an abstract base class:
class AbstractUnitOfWork(abc.ABC):
    batches: repository.AbstractRepository  #(1)

    def __enter__(self) -> AbstractUnitOfWork:  #(2)
        return self  #(3)

    def __exit__(self, *args):  #(2)
        self.rollback()

    @abc.abstractmethod
    def commit(self):  #(4)
        raise NotImplementedError

    @abc.abstractmethod
    def rollback(self):  #(5)
        raise NotImplementedError
1. The UoW provides an attribute called .batches, which will give us access to the batches repository.
2. If you’ve never seen a context manager, __enter__ and __exit__ are the two magic methods that execute when we enter the with block and when we exit it. They’re our setup and teardown phases (if this is new to you, there’s a short standalone sketch just after this list).
3. The enter returns self, because we want access to the uow instance, and its attributes and methods, inside the with block.
4. It provides a way to explicitly commit our work.
5. If we don’t commit, or if we exit the context manager by raising an error, we do a rollback. (The rollback has no effect if commit() has been called. Read on for more discussion of this.)
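As promised, here’s a minimal, standalone sketch of the context-manager protocol itself. The class name and the prints are ours, purely for illustration; it has nothing to do with our domain, it just shows the setup/teardown choreography that AbstractUnitOfWork relies on:

class LoggingScope:

    def __enter__(self):
        print('setup: entering the with block')
        return self  # whatever we return here is bound by "with ... as x"

    def __exit__(self, exc_type, exc_value, traceback):
        print('teardown: leaving the with block (runs even if an exception was raised)')
        # returning None (falsy) lets any exception propagate, just like our UoW


with LoggingScope() as scope:
    print('inside the block, working with', scope)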
Note
We’ve made the UoW part of the service layer, in our folder hierarchy. It is also an adapter, but as we’ll see later in the book, it often takes on more service-layer responsibilities to do with domain events, so the service layer feels like the right place to us.
The main thing that our concrete implementation adds is the database session:
DEFAULT_SESSION_FACTORY = sessionmaker(bind=create_engine(  #(1)
    config.get_postgres_uri(),
))


class SqlAlchemyUnitOfWork(AbstractUnitOfWork):

    def __init__(self, session_factory=DEFAULT_SESSION_FACTORY):
        self.session_factory = session_factory  #(1)

    def __enter__(self):
        self.session = self.session_factory()  # type: Session  #(2)
        self.batches = repository.SqlAlchemyRepository(self.session)  #(2)
        return super().__enter__()

    def __exit__(self, *args):
        super().__exit__(*args)
        self.session.close()  #(3)

    def commit(self):  #(4)
        self.session.commit()

    def rollback(self):  #(4)
        self.session.rollback()
1. The module defines a default session factory that will connect to Postgres, but we allow that to be overridden in our integration tests, so that we can use SQLite instead (a sketch of that kind of fixture follows this list).
2. The dunder-enter is responsible for starting a database session, and for instantiating a real repository that can use that session.
3. We close the session on exit.
4. Finally, we provide concrete commit() and rollback() methods that use our database session.
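As a sketch of what that override can look like in practice — the fixture names and the orm import are assumptions based on the kind of conftest.py we’ve been building up in earlier chapters, not this chapter’s verbatim code:

import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, clear_mappers

import orm  # our module exposing metadata and start_mappers; the path is an assumption


@pytest.fixture
def in_memory_db():
    # SQLite in memory: fast, disposable, and good enough for most integration tests
    engine = create_engine('sqlite:///:memory:')
    orm.metadata.create_all(engine)
    return engine


@pytest.fixture
def session_factory(in_memory_db):
    orm.start_mappers()
    yield sessionmaker(bind=in_memory_db)
    clear_mappers()

The integration tests then build a SqlAlchemyUnitOfWork(session_factory) instead of relying on DEFAULT_SESSION_FACTORY, so nothing ever touches Postgres.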
Here’s how we use a fake Unit of Work in our service layer tests:
class FakeUnitOfWork(unit_of_work.AbstractUnitOfWork):

    def __init__(self):
        self.batches = FakeRepository([])  #(1)
        self.committed = False  #(2)

    def commit(self):
        self.committed = True  #(2)

    def rollback(self):
        pass


def test_add_batch():
    uow = FakeUnitOfWork()  #(3)
    services.add_batch("b1", "CRUNCHY-ARMCHAIR", 100, None, uow)  #(3)
    assert uow.batches.get("b1") is not None
    assert uow.committed


def test_allocate_returns_allocation():
    uow = FakeUnitOfWork()  #(3)
    services.add_batch("batch1", "COMPLICATED-LAMP", 100, None, uow)  #(3)
    result = services.allocate("o1", "COMPLICATED-LAMP", 10, uow)  #(3)
    assert result == "batch1"
...
1. FakeUnitOfWork and FakeRepository are tightly coupled, just like the real Unit of Work and Repository classes. That’s fine, because we recognize that the objects are collaborators.
2. Notice the similarity with the fake commit() function from FakeSession (which we can now get rid of). But it’s a substantial improvement, because we’re now faking out code that we wrote rather than third-party code. Some people say, "Don’t mock what you don’t own."
3. And in our tests, we can instantiate a UoW and pass it to our service layer, instead of a repository and a session, which is considerably less cumbersome.
Why do we feel more comfortable mocking the Unit of Work than the Session? Both of our fakes achieve the same thing: they give us a way to swap out our persistence layer so that we can run tests in memory instead of needing to talk to a real database. The difference is in the resulting design.
If we only cared about writing tests that ran quickly, we could create mocks that replaced SQLAlchemy and use those throughout our codebase. The problem is that the Session is a complex object that exposes lots of persistence-related functionality. It’s easy to use the Session to make arbitrary queries against the database, but that quickly leads to data access code being sprinkled all over the codebase. To avoid that, we want to limit access to our persistence layer, so that each component has exactly what it needs, and nothing more.
By coupling to the Session interface, you’re choosing to couple to all the complexity of SQLAlchemy. Instead, we want to choose a simpler abstraction and use that to clearly separate responsibilities. Our Unit of Work is much simpler than a session, and we feel comfortable with the service layer being able to start and stop units of work.
"Don’t Mock What You Don’t Own" is a rule of thumb that forces us to build these simple abstractions over messy subsystems. This has the same performance benefit as mocking the SQLAlchemy Session, but encourages us to think carefully about our designs.
And here’s what our new service layer looks like:
def add_batch(
ref: str, sku: str, qty: int, eta: Optional[date],
uow: unit_of_work.AbstractUnitOfWork #(1)
):
with uow:
uow.batches.add(model.Batch(ref, sku, qty, eta))
uow.commit()
def allocate(
orderid: str, sku: str, qty: int,
uow: unit_of_work.AbstractUnitOfWork #(1)
) -> str:
line = OrderLine(orderid, sku, qty)
with uow:
batches = uow.batches.list()
if not is_valid_sku(line.sku, batches):
raise InvalidSku(f'Invalid sku {line.sku}')
batchref = model.allocate(line, batches)
uow.commit()
return batchref
1. Our service layer now has only one dependency, once again on an abstract Unit of Work.
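To see what this buys us at the edge of the system, here’s roughly what the Flask endpoint ends up looking like. The import paths and the exact error handling are approximations of the app we’ve been building, not verbatim chapter code; the point is that Flask just builds a UoW and calls a service:

from flask import Flask, jsonify, request

import model
import services
import unit_of_work

app = Flask(__name__)


@app.route('/allocate', methods=['POST'])
def allocate_endpoint():
    try:
        batchref = services.allocate(
            request.json['orderid'],
            request.json['sku'],
            request.json['qty'],
            unit_of_work.SqlAlchemyUnitOfWork(),  # default session factory, i.e. Postgres
        )
    except (model.OutOfStock, services.InvalidSku) as e:
        return jsonify({'message': str(e)}), 400
    return jsonify({'batchref': batchref}), 201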
To convince ourselves that the commit/rollback behavior works, we wrote a couple of tests:
def test_rolls_back_uncommitted_work_by_default(session_factory):
    uow = unit_of_work.SqlAlchemyUnitOfWork(session_factory)
    with uow:
        insert_batch(uow.session, 'batch1', 'MEDIUM-PLINTH', 100, None)

    new_session = session_factory()
    rows = list(new_session.execute('SELECT * FROM "batches"'))
    assert rows == []


def test_rolls_back_on_error(session_factory):
    class MyException(Exception):
        pass

    uow = unit_of_work.SqlAlchemyUnitOfWork(session_factory)
    with pytest.raises(MyException):
        with uow:
            insert_batch(uow.session, 'batch1', 'LARGE-FORK', 100, None)
            raise MyException()

    new_session = session_factory()
    rows = list(new_session.execute('SELECT * FROM "batches"'))
    assert rows == []
Tip
We haven’t shown it here, but it can be worth testing some of the more "obscure" database behavior, like transactions, against the "real" database, i.e. the same engine. For now we’re getting away with using SQLite instead of Postgres, but in [chapter_07_aggregate] we’ll switch some of the tests to using the real DB. It’s convenient that our UoW class makes that easy!
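As a sketch of that switch (not code from this chapter): the only thing the tests need to change is the session factory they hand to the UoW, reusing the same config.get_postgres_uri() helper that DEFAULT_SESSION_FACTORY uses.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

import config
import unit_of_work  # import paths are assumptions about the project layout

# a session factory bound to the real Postgres engine...
postgres_session_factory = sessionmaker(bind=create_engine(config.get_postgres_uri()))

# ...and the very same UoW (and tests) can run against it
uow = unit_of_work.SqlAlchemyUnitOfWork(postgres_session_factory)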
A brief digression on different ways of implementing the UoW pattern.
We could imagine a slightly different version of the UoW, which commits by default, and only rolls back if it spots an exception:
class AbstractUnitOfWork(abc.ABC):

    def __enter__(self):
        return self

    def __exit__(self, exn_type, exn_value, traceback):
        if exn_type is None:
            self.commit()  #(1)
        else:
            self.rollback()  #(2)
1. Should we have an implicit commit in the happy path?
2. And roll back only on exception?
It would allow us to save a line of code, and remove the explicit commit from our client code:
def add_batch(ref: str, sku: str, qty: int, eta: Optional[date], uow):
    with uow:
        uow.batches.add(model.Batch(ref, sku, qty, eta))
        # uow.commit()
This is a judgement call, but we tend to prefer requiring the explicit commit so that we have to choose when to flush state.
Although it’s an extra line of code, this makes the software safe by default. The default behavior is to not change anything. In turn, that makes our code easier to reason about, because there’s only one code path that leads to changes in the system: total success and an explicit commit. Any other code path, any exception, any early exit from the UoW’s scope, leads to a safe state.
Similarly, we prefer "always-rollback" to "only-rollback-on-error," because the former feels easier to understand; rollback rolls back to the last commit, so either the user did one, or we blow their changes away. Harsh but simple.
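To make that concrete, here’s an extra test we could write. It isn’t one of the chapter’s tests; it reuses the session_factory fixture and insert_batch helper from above, and the SKU is made up. It shows that the rollback our __exit__ always issues can’t undo explicitly committed work:

def test_rollback_after_commit_is_a_noop(session_factory):
    uow = unit_of_work.SqlAlchemyUnitOfWork(session_factory)
    with uow:
        insert_batch(uow.session, 'batch1', 'STURDY-SIDEBOARD', 100, None)
        uow.commit()  # the one code path that persists changes
        # on exit, __exit__ calls rollback(), which has nothing left to undo

    new_session = session_factory()
    rows = list(new_session.execute('SELECT reference FROM batches'))
    assert rows == [('batch1',)]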
Here are a few examples showing the Unit of Work pattern in use. You can see how it leads to simple reasoning about which blocks of code happen together.
Suppose we want to be able to deallocate and then reallocate orders:
def reallocate(line: OrderLine, uow: AbstractUnitOfWork) -> str:
    with uow:
        batch = uow.batches.get(sku=line.sku)
        if batch is None:
            raise InvalidSku(f'Invalid sku {line.sku}')
        batch.deallocate(line)  #(1)
        allocate(line)  #(2)
        uow.commit()
1. If deallocate() fails, we don’t want to do allocate(), obviously.
2. But if allocate() fails, we probably don’t want to actually commit the deallocate() either.
Our shipping company gives us a call to say that one of the container doors opened and half our sofas have fallen into the Indian Ocean. Oops!
def change_batch_quantity(batchref: str, new_qty: int, uow: AbstractUnitOfWork):
    with uow:
        batch = uow.batches.get(reference=batchref)
        batch.change_purchased_quantity(new_qty)
        while batch.available_quantity < 0:
            line = batch.deallocate_one()  #(1)
        uow.commit()
1. Here we may need to deallocate any number of lines. If we get a failure at any stage, we probably want to commit none of the changes.
We now have three sets of tests, all essentially pointing at the database: test_orm.py, test_repository.py, and test_uow.py. Should we throw any away?
└── tests
├── conftest.py
├── e2e
│ └── test_api.py
├── integration
│ ├── test_orm.py
│ ├── test_repository.py
│ └── test_uow.py
├── pytest.ini
└── unit
├── test_allocate.py
├── test_batches.py
└── test_services.py
You should always feel free to throw away tests if you feel they’re not going to add value longer term. We’d say that test_orm.py was primarily a tool to help us learn SQLAlchemy, so we won’t need that long term, especially if the main things it’s doing are covered in test_repository.py. That last one you might keep around, but we could certainly see an argument for just keeping everything at the highest possible level of abstraction (just as we did for the unit tests).
Tip
This is another example of the lesson from [chapter_05_high_gear_low_gear]: as we build better abstractions, we can move our tests to run against them, which leaves us free to change the underlying details.
For this chapter, probably the best thing to do is to try to implement a UoW from scratch. You could either follow the model we have quite closely, or perhaps experiment with separating the UoW (whose responsibilities are commit(), rollback(), and providing the .batches repository) from the context manager, whose job is to initialize things, and then do the commit or rollback on exit. If you feel like going all-functional rather than messing about with all these classes, you could use @contextmanager from contextlib.
We’ve stripped out both the actual UoW and the fakes, as well as paring back the abstract UoW. Why not send us a link to your repo if you come up with something you’re particularly proud of?
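If you fancy a head start on the all-functional version, here’s one possible sketch using @contextmanager. The function name and the tuple it yields are our own choices, not the exercise’s official answer; it reuses SqlAlchemyRepository and a session factory as above, and the import path is an assumption about your layout:

from contextlib import contextmanager

import repository


@contextmanager
def sqlalchemy_unit_of_work(session_factory):
    session = session_factory()
    batches = repository.SqlAlchemyRepository(session)
    try:
        # hand the service layer the repo plus commit/rollback callables
        yield batches, session.commit, session.rollback
        session.rollback()  # safe default; a no-op if commit() was called
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

A service function would then unpack the repo and the commit callable, much as it uses uow.batches and uow.commit() in the class-based version.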
Hopefully we’ve convinced you that the Unit of Work is a useful pattern, and hopefully you’ll agree that the context manager is a really nice Pythonic way of visually grouping code into blocks that we want to happen atomically.
This pattern is so useful, in fact, that SQLAlchemy already uses a unit-of-work in the shape of the Session object. The Session object in SQLAlchemy is the way that your application loads data from the database.
Every time you load a new entity from the database, the Session begins to track changes to the entity, and when the Session is flushed, all your changes are persisted together.
Why do we go to the effort of abstracting away the SQLAlchemy session if it already implements the pattern we want?
For one thing, the Session API is rich and supports operations that we don’t want or need in our domain. Our UnitOfWork simplifies the Session to its essential core: it can be started, committed, or thrown away.
For another, we’re using the UnitOfWork to access our Repository objects. This is a neat bit of developer usability that we couldn’t do with a plain SQLAlchemy Session.
Lastly, we’re motivated again by the dependency inversion principle: our service layer depends on a thin abstraction, and we attach a concrete implementation at the outside edge of the system. This lines up nicely with SQLAlchemy’s own recommendations:
Keep the lifecycle of the session (and usually the transaction) separate and external. The most comprehensive approach, recommended for more substantial applications, will try to keep the details of session, transaction and exception management as far as possible from the details of the program doing its work.
- Unit of Work is an abstraction around data integrity: it helps to enforce the consistency of our domain model, and improves performance, by letting us perform a single flush operation at the end of an operation.
- It works closely with the Repository and Service Layer: the Unit of Work pattern completes our abstractions over data access by representing atomic updates. Each of our service-layer use cases runs in a single unit of work that succeeds or fails as a block.
- This is a lovely case for a context manager: context managers are an idiomatic way of defining scope in Python. We can use a context manager to automatically roll back our work at the end of a request, which means the system is safe by default.
- SQLAlchemy already implements this pattern: we introduce an even simpler abstraction over the SQLAlchemy Session object in order to "narrow" the interface between the ORM and our code. This helps to keep us loosely coupled.
Pros | Cons
---|---