Skip to content

Latest commit

 

History

History
716 lines (543 loc) · 24.4 KB

chapter_06_uow.asciidoc

File metadata and controls

716 lines (543 loc) · 24.4 KB

Unit of Work Pattern

In this chapter we’ll introduce the final piece of the puzzle that ties together the Repository and Service Layer: the Unit of Work pattern.

If the Repository is our abstraction over the idea of persistent storage, the Unit of Work is our abstraction over the idea of atomic operations. It will allow us to finally, fully, decouple our Service Layer from the data layer.

Without UoW: API talks directly to 3 layers shows that, currently, there is a lot of communication across the layers of our infrastructure: the API talks directly to the DB layer to start a session, it talks to the repository layer to initialize a SQLAlchemyRepository, and it talks to the service layer to ask it to allocate.

apwp 0601
Figure 1. Without UoW: API talks directly to 3 layers

With UoW: UoW now manages DB state shows our target state: the Flask API now only does two things: it initializes a Unit of Work, and it invokes a service. The service collaborates with the UoW (we like to think of the UoW as being part of the service layer), but neither the service function itself nor Flask longer needs to talk directly to the DB.

apwp 0602
Figure 2. With UoW: UoW now manages DB state

And we’ll do it all using a lovely piece of Python syntax, a context manager.

Tip

You can find our code for this chapter at github.com/cosmicpython/code/tree/chapter_06_uow.

git clone https://github.com/cosmicpython/code.git && cd code
git checkout chapter_06_uow
# or, if you want to code along, checkout the previous chapter:
git checkout chapter_04_service_layer

The Unit of Work Collaborates with Repository(-Ies)

Here’s how it’ll look in use, when it’s finished:

Example 1. Preview of unit of work in action (src/allocation/service_layer/services.py)
def allocate(
        orderid: str, sku: str, qty: int,
        uow: unit_of_work.AbstractUnitOfWork
) -> str:
    line = OrderLine(orderid, sku, qty)
    with uow:  #(1)
        batches = uow.batches.list()  #(2)
        ...
        batchref = model.allocate(line, batches)
        uow.commit()  #(3)
  1. We’ll start a unit of work as a context manager

  2. uow.batches is the batches repo, so the unit of work provides us access to our permanent storage.

  3. When we’re done, we commit or roll back our work, using the UoW

The unit of work acts as a single entry point to our persistent storage, and keeps track of what objects were loaded and what the latest state is.[1] This gives us three useful things:

  1. It gives us a stable snapshot of the database to work with, so that the objects we use aren’t changing halfway through an operation.

  2. It gives us a way to persist all of our changes at once so that if something goes wrong, we don’t end up in an inconsistent state.

  3. It offers a simple API to our persistence concerns and gives us a handy place to get a repository.

Test-Driving a UoW with Integration Tests

Here’s a test for a new UnitofWork (or UoW, which we pronounce "you-wow"). It’s a context manager that allows us to start a transaction, retrieve and get things from repos, and commit:

Example 2. A basic "roundtrip" test for a Unit of Work (tests/integration/test_uow.py)
def test_uow_can_retrieve_a_batch_and_allocate_to_it(session_factory):
    session = session_factory()
    insert_batch(session, 'batch1', 'HIPSTER-WORKBENCH', 100, None)
    session.commit()

    uow = unit_of_work.SqlAlchemyUnitOfWork(session_factory)  #(1)
    with uow:
        batch = uow.batches.get(reference='batch1')  #(2)
        line = model.OrderLine('o1', 'HIPSTER-WORKBENCH', 10)
        batch.allocate(line)
        uow.commit()  #(3)

    batchref = get_allocated_batch_ref(session, 'o1', 'HIPSTER-WORKBENCH')
    assert batchref == 'batch1'
  1. We initialize the Unit of Work using our custom session factory, and get back a uow object to use in our with block.

  2. The UoW gives us access to the batches repository via uow.batches

  3. And we call commit() on it when we’re done.

For the curious, the insert_batch and get_allocated_batch_ref helpers look like this:

Example 3. Helpers for doing SQL stuff (tests/integration/test_uow.py)
def insert_batch(session, ref, sku, qty, eta):
    session.execute(
        'INSERT INTO batches (reference, sku, _purchased_quantity, eta)'
        ' VALUES (:ref, :sku, :qty, :eta)',
        dict(ref=ref, sku=sku, qty=qty, eta=eta)
    )


def get_allocated_batch_ref(session, orderid, sku):
    [[orderlineid]] = session.execute(
        'SELECT id FROM order_lines WHERE orderid=:orderid AND sku=:sku',
        dict(orderid=orderid, sku=sku)
    )
    [[batchref]] = session.execute(
        'SELECT b.reference FROM allocations JOIN batches AS b ON batch_id = b.id'
        ' WHERE orderline_id=:orderlineid',
        dict(orderlineid=orderlineid)
    )
    return batchref

Unit of Work and Its Context Manager

In our tests we’ve implicitly defined an interface for what a unit of work needs to do, let’s make that explicit by using an abstract base class:

Example 4. Abstract Unit of Work context manager (src/allocation/service_layer/unit_of_work.py)
class AbstractUnitOfWork(abc.ABC):
    batches: repository.AbstractRepository  #(1)

    def __enter__(self) -> AbstractUnitOfWork:  #(2)
        return self  #(3)

    def __exit__(self, *args):  #(3)
        self.rollback()

    @abc.abstractmethod
    def commit(self):  #(4)
        raise NotImplementedError

    @abc.abstractmethod
    def rollback(self):  #(5)
        raise NotImplementedError
  1. The UoW provides an attribute called .batches, which will give us access to the batches repository.

  2. If you’ve never seen a context manager, __enter__ and __exit__ are the two magic methods that execute when we enter the with block and when we exit it. They’re our setup and teardown phases.

  3. The enter returns self, because we want access to the uow instance and its attributes and methods, inside the with block.

  4. It provides a way to explicitly commit our work

  5. If we don’t commit, or if we exit the context manager by raising an error, we do a rollback. (the rollback has no effect if commit() has been called. Read on for more discussion of this).

Note
We’ve made the UoW part of the service layer, in our folder hierarchy. It is also an adapter, but as we’ll see later in the book, it often takes on more service-layer responsibilities to do with domain events, so the service layer feels like the right place to us.

The Real Unit of Work Uses SQLAlchemy Sessions

The main thing that our concrete implementation adds is the database session:

Example 5. the real SQLAlchemy Unit of Work (src/allocation/service_layer/unit_of_work.py)
DEFAULT_SESSION_FACTORY = sessionmaker(bind=create_engine(  #(1)
    config.get_postgres_uri(),
))

class SqlAlchemyUnitOfWork(AbstractUnitOfWork):

    def __init__(self, session_factory=DEFAULT_SESSION_FACTORY):
        self.session_factory = session_factory  #(1)

    def __enter__(self):
        self.session = self.session_factory()  # type: Session  #(2)
        self.batches = repository.SqlAlchemyRepository(self.session)  #(2)
        return super().__enter__()

    def __exit__(self, *args):
        super().__exit__(*args)
        self.session.close()  #(3)

    def commit(self):  #(4)
        self.session.commit()

    def rollback(self):  #(4)
        self.session.rollback()
  1. The module defines a default session factory that will connect to postgres, but we allow that to be overriden in our integration tests, so that we can use SQLite instead.

  2. The dunder-enter is responsible for starting a database session, and instantiating a real repository that can use that session.

  3. We close the session on exit.

  4. Finally, we provide concrete commit() and rollback() methods that use our database session.

Fake Unit of Work for Testing:

Here’s how we use a fake Unit of Work in our service layer tests

Example 6. Fake unit of work (tests/unit/test_services.py)
class FakeUnitOfWork(unit_of_work.AbstractUnitOfWork):

    def __init__(self):
        self.batches = FakeRepository([])  #(1)
        self.committed = False  #(2)

    def commit(self):
        self.committed = True  #(2)

    def rollback(self):
        pass



def test_add_batch():
    uow = FakeUnitOfWork()  #(3)
    services.add_batch("b1", "CRUNCHY-ARMCHAIR", 100, None, uow)  #(3)
    assert uow.batches.get("b1") is not None
    assert uow.committed


def test_allocate_returns_allocation():
    uow = FakeUnitOfWork()  #(3)
    services.add_batch("batch1", "COMPLICATED-LAMP", 100, None, uow)  #(3)
    result = services.allocate("o1", "COMPLICATED-LAMP", 10, uow)  #(3)
    assert result == "batch1"
...
  1. FakeUnitOfWork and FakeRepository are tightly coupled, just like the real Unit of Work and Repository classes. That’s fine because we recognize that the objects are collaborators.

  2. Notice the similarity with the fake commit() function from FakeSession (which we can now get rid of). But it’s a substantial improvement because we’re now faking out code that we wrote, rather than 3rd party code. Some people say "don’t mock what you don’t own".

  3. And in our tests, we can instantiate a UoW and pass it to our service layer, instead of a repository and a session, which is considerably less cumbersome.

Don’t Mock What You Don’t Own

Why do we feel more comfortable mocking the Unit of Work than the Session? Both of our fakes achieve the same thing: they give us a way to swap out our persistence layer so that we can run tests in memory instead of needing to talk to a real database. The difference is in the resulting design.

If we only cared about writing tests that ran quickly, we could create mocks that replaced SQLAlchemy and use those throughout our codebase. The problem is that the Session is a complex object that exposes lots of persistence-related functionality. It’s easy to use the Session to make arbitrary queries against the database, but that quickly leads to data access code being sprinkled all over the codebase. To avoid that, we want to limit access to our persistence layer, so that each component has exactly what it needs, and nothing more.

By coupling to the Session interface, you’re choosing to couple to all the complexity of SQLAlchemy. Instead, we want to choose a simpler abstraction and use that to clearly separate responsibilities. Our Unit of Work is much simpler than a session, and we feel comfortable with the service layer being able to start and stop units of work.

"Don’t Mock What You Don’t Own" is a rule of thumb that forces us to build these simple abstractions over messy subsystems. This has the same performance benefit as mocking the SQLAlchemy Session, but encourages us to think carefully about our designs.

Using the UoW in the Service Layer

And here’s what our new service layer looks like:

Example 7. Service layer using UoW (src/allocation/service_layer/services.py)
def add_batch(
        ref: str, sku: str, qty: int, eta: Optional[date],
        uow: unit_of_work.AbstractUnitOfWork  #(1)
):
    with uow:
        uow.batches.add(model.Batch(ref, sku, qty, eta))
        uow.commit()


def allocate(
        orderid: str, sku: str, qty: int,
        uow: unit_of_work.AbstractUnitOfWork  #(1)
) -> str:
    line = OrderLine(orderid, sku, qty)
    with uow:
        batches = uow.batches.list()
        if not is_valid_sku(line.sku, batches):
            raise InvalidSku(f'Invalid sku {line.sku}')
        batchref = model.allocate(line, batches)
        uow.commit()
    return batchref
  1. Our service layer now only has the one dependency, once again on an abstract Unit of Work.

Explicit Tests for Commit/Rollback Behavior

To convince ourselves that the commit/rollback behavior works, we wrote a couple of tests:

Example 8. Integration tests for rollback behavior (tests/integration/test_uow.py)
def test_rolls_back_uncommitted_work_by_default(session_factory):
    uow = unit_of_work.SqlAlchemyUnitOfWork(session_factory)
    with uow:
        insert_batch(uow.session, 'batch1', 'MEDIUM-PLINTH', 100, None)

    new_session = session_factory()
    rows = list(new_session.execute('SELECT * FROM "batches"'))
    assert rows == []


def test_rolls_back_on_error(session_factory):
    class MyException(Exception):
        pass

    uow = unit_of_work.SqlAlchemyUnitOfWork(session_factory)
    with pytest.raises(MyException):
        with uow:
            insert_batch(uow.session, 'batch1', 'LARGE-FORK', 100, None)
            raise MyException()

    new_session = session_factory()
    rows = list(new_session.execute('SELECT * FROM "batches"'))
    assert rows == []
Tip
We haven’t shown it here, but it can be worth testing some of the more "obscure" database behavior, like transactions, against the "real" database, ie the same engine. For now we’re getting away with using SQLite instead of Postgres, but in [chapter_07_aggregate] we’ll switch some of the tests to using the real DB. It’s convenient that our UoW class makes that easy!

Explicit vs Implicit Commits

A brief digression on different ways of implementing the UoW pattern.

We could imagine a slightly different version of the UoW, which commits by default, and only rolls back if it spots an exception:

Example 9. A UoW with implicit commit…​ (src/allocation/unit_of_work.py)
class AbstractUnitOfWork(abc.ABC):

    def __enter__(self):
        return self

    def __exit__(self, exn_type, exn_value, traceback):
        if exn_type is None:
            self.commit()  #(1)
        else:
            self.rollback()  #(2)
  1. should we have an implicit commit in the happy path?

  2. and roll back only on exception?

It would allow us to save a line of code, and remove the explicit commit from our client code:

Example 10. ... would save us a line of code (src/allocation/service_layer/services.py)
def add_batch(ref: str, sku: str, qty: int, eta: Optional[date], uow):
    with uow:
        uow.batches.add(model.Batch(ref, sku, qty, eta))
        # uow.commit()

This is a judgement call, but we tend to prefer requiring the explicit commit so that we have to choose when to flush state.

Although it’s an extra line of code this makes the software safe-by-default. The default behavior is to not change anything. In turn, that makes our code easier to reason about because there’s only one code path that leads to changes in the system: total success and an explicit commit. Any other code path, any exception, any early exit from the uow’s scope, leads to a safe state.

Similarly, we prefer "always-rollback" to "only-rollback-on-error," because the former feels easier to understand; rollback rolls back to the last commit, so either the user did one, or we blow their changes away. Harsh but simple.

Examples: Using UoW to Group Multiple Operations Into an Atomic Unit

Here’s a few examples showing the Unit of Work pattern in use. You can see how it leads to simple reasoning about what blocks of code happen together:

Example 1: Reallocate

Supposing we want to be able to deallocate and then reallocate orders?

Example 11. Reallocate service function
def reallocate(line: OrderLine, uow: AbstractUnitOfWork) -> str:
    with uow:
        batch = uow.batches.get(sku=line.sku)
        if batch is None:
            raise InvalidSku(f'Invalid sku {line.sku}')
        batch.deallocate(line)  #(1)
        allocate(line)  #(2)
        uow.commit()
  1. If deallocate() fails, we don’t want to do allocate(), obviously.

  2. But if allocate() fails, we probably don’t want to actually commit the deallocate(), either.

Example 2: Change Batch Quantity

Our shipping company gives us a call to say that one of the container doors opened and half our sofas have fallen into the Indian Ocean. Oops!

Example 12. Change quantity
def change_batch_quantity(batchref: str, new_qty: int, uow: AbstractUnitOfWork):
    with uow:
        batch = uow.batches.get(reference=batchref)
        batch.change_purchased_quantity(new_qty)
        while batch.available_quantity < 0:
            line = batch.deallocate_one()  #(1)
        uow.commit()
  1. Here we may need to deallocate any number of lines. If we get a failure at any stage, we probably want to commit none of the changes.

Tidying Up the Integration Tests

We now have three sets of tests all essentially pointing at the database, test_orm.py, test_repository.py and test_uow.py. Should we throw any away?

└── tests
    ├── conftest.py
    ├── e2e
    │   └── test_api.py
    ├── integration
    │   ├── test_orm.py
    │   ├── test_repository.py
    │   └── test_uow.py
    ├── pytest.ini
    └── unit
        ├── test_allocate.py
        ├── test_batches.py
        └── test_services.py

You should always feel free to throw away tests if you feel they’re not going to add value, longer term. We’d say that test_orm.py was primarily a tool to help us learn SQLAlchemy, so we won’t need that long term, especially if the main things it’s doing are covered in test_repository.py. That last you might keep around, but we could certainly see an argument for just keeping everything at the highest possible level of abstraction (just as we did for the unit tests).

Tip
This is another example of the lesson from [chapter_05_high_gear_low_gear]: As we build better abstractions, we can move our tests to run against them, which leaves us free to change the underlying details.
Exercise for the Reader

For this chapter, probably the best thing to do is try to implement a UoW from scratch. You could either follow the model we have quite closely, or perhaps experiment with separating the UoW (whose responsibilities are commit(), rollback() and providing the .batches repository) from the context manager, whose job is to initialize things, and then do the commit or rollback on exit. If you feel like going all-functional rather than messing about with all these classes, you could use @contextmanager from contextlib.

We’ve stripped out both the actual UoW and the fakes, as well as paring back the abstract UoW. Why not send us a link to your repo if you come up with something you’re particularly proud of?

Wrap-Up

Hopefully we’ve convinced you that the Unit of Work is a useful pattern, and hopefully you’ll agree that the context manager is a really nice Pythonic way of visually grouping code into blocks that we want to happen atomically.

This pattern is so useful, in fact, that SQLAlchemy already uses a unit-of-work in the shape of the Session object. The Session object in SQLAlchemy is the way that your application loads data from the database.

Every time you load a new entity from the database, the Session begins to track changes to the entity, and when the Session is flushed, all your changes are persisted together.

Why do we go to the effort of abstracting away the SQLAlchemy session if it already implements the pattern we want?

For one thing, the Session API is rich and supports operations that we don’t want or need in our domain. Our UnitOfWork simplifies the Session to its essential core: it can be started, committed, or thrown away.

For another, we’re using the UnitOfWork to access our Repository objects. This is a neat bit of developer usability that we couldn’t do with a plain SQLAlchemy Session.

Lastly, we’re motivated again by the dependency inversion principle: our service layer depends on a thin abstraction, and we attach a concrete implementation at the outside edge of the system. This lines up nicely with SQLAlchemy’s own recommendations:

Keep the lifecycle of the session (and usually the transaction) separate and external. The most comprehensive approach, recommended for more substantial applications, will try to keep the details of session, transaction and exception management as far as possible from the details of the program doing its work.

Unit of Work Pattern: Wrap-up
Unit of Work is an abstraction around data integrity

It helps to enforce the consistency of our domain model, and improves performance, by letting us perform a single flush operation at the end of an operation.

It works closely with the Repository and Service Layer

The Unit of Work pattern completes our abstractions over data-access by representing atomic updates. Each of our service-layer use-cases runs in a single unit of work which succeeds or fails as a block.

This is a lovely case for a context manager

Context managers are an idiomatic way of defining scope in Python. We can use a context manager to automatically rollback our work at the end of request which means the system is safe by default.

SQLAlchemy already implements this pattern

We introduce an even simpler abstraction over the SQLAlchemy Session object in order to "narrow" the interface between the ORM and our code. This helps to keep us loosely coupled.

Table 1. Unit of Work: The Trade-Offs
Pros Cons
  • We’ve got a nice abstraction over the concept of atomic operations, and the context manager makes it very easy to see, visually, what blocks of code are grouped together atomically.

  • We have explicit control over when a transaction starts and finishes, and our application fails in a way that is safe by default. We never have to worry that an operation is partially committed.

  • It’s a nice place to put all your repositories so client code can access it

  • And we’ll see in later chapters, atomicity isn’t only about transactions, it can help us to work with events and the message bus.

  • Your ORM probably already has some perfectly good abstractions around atomicity. SQLAlchemy even has context managers. You can go a long way just passing a Session around.

  • We’ve made it look easy, but you actually have to think quite carefully about things like rollbacks, multithreading, and nested transactions. Perhaps just sticking to what Django or Flask-SQLAlchemy gives you will keep your life simpler.


1. You may have come across the word collaborators, to describe objects that work together to achieve a goal. The unit of work and the repository are a great example of collaborators in the object modeling sense. In responsibility-driven design, clusters of objects that collaborate in their roles are called object neighborhoods which is, in our professional opinion, totally adorable.