Skip to content

Latest commit

 

History

History
677 lines (474 loc) · 25.4 KB

README.md

File metadata and controls

677 lines (474 loc) · 25.4 KB

Monorepo architecture for Python code with Polylith and Poetry

By: Yoel Benitez Fonseca

Review and Corrections by: Robmay S. Garcia

This article explores a technique or methodology for achieving a "monorepo" architecture for code by leveraging the Polylith philosophy, along with a few poetry plugins, for implementation in Python CDK applications.


Requirements

Let's go ahead and get our dependencies installed for poetry and the plugins:

curl -sSL https://install.python-poetry.org | python3 -
poetry self add poetry-multiproject-plugin
poetry self add poetry-polylith-plugin

You will also need AWS CDK installed on your system. I will recommend you to follow the requirements section of the aws cdk workshop site:

npm install -g aws-cdk

Here I leave some additional readings about poetry and its plugins.multiproject and for polylith.

Starting point

Let's get started. As a starting point we will be using the final version of the CDK Python Workshop. Simply clone the code from the repository https://github.com/aws-samples/aws-cdk-intro-workshop/tree/master/code/python/main-workshop. Our first commit will be like this and the resulting source tree should looks like:

.
├── app.py
├── cdk.json
├── cdk_workshop
│   ├── cdk_workshop_stack.py
│   ├── hitcounter.py
│   └── __init__.py
├── lambda
│   ├── hello.py
│   └── hitcount.py
├── README.md
├── requirements-dev.txt
├── requirements.txt
└── source.bat

Now, let's turn this source tree into a poetry project by executing:

poetry init

When asked for the main and development dependencies answer no, we will add then later. The result should look like (pyproject.toml):

[tool.poetry]
name = "cdk-polylith"
version = "0.1.0"
description = ""
authors = ["Yoel Benitez Fonseca <[email protected]>"]
readme = "README.md"
packages = [{include = "cdk_polylith"}]

[tool.poetry.dependencies]
python = "^3.10"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Additionally, let's add a poetry configuration file poetry.toml to have poetry build the python virtual environment in the project folder before installing the dependencies. The file content should look like:

[virtualenvs]
path = ".venv"
in-project = true

Create a commit with what we already have.

Configuring polylith

Before delving into the code, I strongly recommend you to read the following polylith core concepts: workspace, component, base, project and development project taking into account that the python-polylith is an adaptation of those concepts for Python.

For those of you who are impatient, running the following command (only once) in our repository will create the necessary folder structure for our project:

poetry poly create workspace --name="cdk_workshop" --theme=loose

note: The --name parameter here will set the base package structure, all the code then will be imported from this namespace, for example from cdk_workshop ... for more details on this read the official documentation

After running the above command our source tree will look like this, note the new folders created (bases, components, development, and projects):

.
├── app.py
├── bases
├── cdk.json
├── cdk_workshop
│   ├── cdk_workshop_stack.py
│   ├── hitcounter.py
│   └── __init__.py
├── components
├── development
├── lambda
│   ├── hello.py
│   └── hitcount.py
├── poetry.toml
├── projects
├── pyproject.toml
├── README.md
├── requirements-dev.txt
├── requirements.txt
├── source.bat
└── workspace.toml

The workspace.toml file will configure the behavior of the poetry poly ... commands

Let's create a new commit. From now on, we will be moving the old code to this new structure.

Managing project requirements.

Before we go any further let's install the poetry project:

poetry install

note: ignore the warning about the project not containing any element's

And now we move our dependencies from requirements.txt and requirements-dev.txt to the pyproject.toml format:

poetry add aws-cdk-lib~=2.68
poetry add 'constructs>=10.0.0,<11.0.0'
poetry add cdk-dynamo-table-view==0.2.438

And for dev requirements:

poetry add pytest==7.2.2 -G dev

Now we can remove requirements.txt and requirements-dev.txt files because they will be managed by the pyproject.toml. The content of the file will now look like:

pyproject.toml after adding the requirements

All the changes will be visible in this commit.

note: Poetry developers recommend to add the poetry.lock to the repository. Other developers have reported problems with architecture changes and the .lock file, so I will leave it up to you to decide if you want to use it or not.

Components

From the polylith documentation (https://polylith.gitbook.io/polylith/architecture/2.3.-component):

A component is an encapsulated block of code that can be assembled together with a base (it's often just a single base) and a set of components and libraries into services, libraries or tools. Components achieve encapsulation and composability by separating their private implementation from their public interface.

So in CDK term's our component should be Stacks or Constructs since this are the reusable parts.

In this application we have the HitCounter construct and the CdkWorkshopStack stack, lets add them as components to our project:

poetry poly create component --name hit_counter
poetry poly create component --name cdk_workshop_stack

We will get a new directory under components with the name of the workspace (cdk_workshop) and under this a python package for each of the components. The same has happened to the tests folder (that is why we used --theme=loose when creating the workspace).

Next, we need to modify pyproject.toml to recognize this components. Edit and add the following to the package property in the [tool.poetry] section:

packages = [
    {include = "cdk_workshop/hit_counter", from = "components"},
    {include = "cdk_workshop/cdk_workshop_stack", from = "components"}
]

To make sure all is fine, run:

poetry install && poetry run pytest test/

if we run poetry poly info we will see our new components listed under the bricks section

poetry poly info

Alright, let's commit this changes before moving into the code.

The hit_counter component

Now that we have a HitCounter construct, we will copy the code in cdk_workshop/hitcounter.py to components/cdk_workshop/hit_counter/core.py by executing:

cp cdk_workshop/hitcounter.py components/cdk_workshop/hit_counter/core.py
git rm cdk_workshop/hitcounter.py

The code in this construct will need additional refactoring but we will come back to it later, for now we commit this change as is.

The cdk_workshop_stack component

We repeat the same process for the CdkWorkshopStack component, just change the file name and destination as shown below:

cp cdk_workshop/cdk_workshop_stack.py components/cdk_workshop/cdk_workshop_stack/core.py
git rm cdk_workshop/*

Now, pay attention to this little but important detail. There is a dependency between both components, cdk_workshop_stack needs the construct defined in hit_counter so we need to edit components/cdk_workshop/cdk_workshop_stack/core.py file to fix the import statement as shown in line 8 of the following snippet:

from constructs import Construct
from aws_cdk import (
    Stack,
    aws_lambda as _lambda,
    aws_apigateway as apigw,
)
from cdk_dynamo_table_view import TableViewer
from cdk_workshop.hit_counter.core import HitCounter

...

Note: Now we are able to use the fully qualified path to the class component like (cdk_workshop.hit_counter.core). The path is composed by cdk_workshop the workspace, hit_counter the component, and core the module in hit_counter.

Let's add another commit.

Bases

From the polylith documentation (https://polylith.gitbook.io/polylith/architecture/2.2.-base), bases are the building blocks that exposes a public API to the outside world.

A base has a "thin" implementation which delegates to components where the business logic is implemented. A base has one role and that is to be a bridge between the outside world and the logic that performs the "real work", our components. Bases don't perform any business logic themselves, they only delegate to components.

So, in the context of the AWS CDK application the candidate for a base will be the module that defines the application and do the synthesis, in other words the code that now resides on app.py.

Let's add a base to the project:

poetry poly create base --name workshop_app

Like in the case of the components, the previous command, will add a new package but in the bases directory. This time, under the path bases/cdk_workshop/workshop_app with a module for us to define the code of our base - poetry poly will add a demo test code too.

We need to alter our package list on pyproject.toml to add the newly created base to the Python project:

packages = [
    {include = "cdk_workshop/workshop_app", from = "bases"},
    {include = "cdk_workshop/hit_counter", from = "components"},
    {include = "cdk_workshop/cdk_workshop_stack", from = "components"}
]

Let's copy the code and fix the imports:

cp app.py bases/cdk_workshop/workshop_app/core.py
git rm app.py

The file content should look like:

import aws_cdk as cdk

from cdk_workshop.cdk_workshop_stack.core import CdkWorkshopStack

app = cdk.App()
CdkWorkshopStack(app, "cdk-workshop")

app.synth()

The result can be seen in this commit.

If you run poetry poly info you should see something like this:

Workspace Summary

Remarks

I suggest the use of a single base for each cdk application, but if more than one is necessary, each base should reuse the stacks and constructs defined in the components.

If you are facing a large CDK project, I recommend maintaining a single component package (a single component in polylith is a python package) for all the constructs, one construct per module. And a component for each Stack, the reason being to maintain a single source of dependencies between the components in the project: construct component -> stack component assuming the stack's components do not depend on the others stack components.

Projects

Projects configure Polylith's deployable artifacts.

In other words, projects define what we deploy, we combine one (or several bases but that's rare) base and several components into an artifact that allow us to deploy our code.

In polylith the projects live in the projects folder and they should not contain code unless such code is related to the deployment or building of the artifacts, in other words no python code there.

A CDK application is defined by the cdk.json file, in our case:

{
  "app": "python3 app.py",
  "context": {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
    "@aws-cdk/core:stackRelativeExports": true,
    "@aws-cdk/aws-rds:lowercaseDbIdentifier": true,
    "@aws-cdk/aws-lambda:recognizeVersionProps": true,
    "@aws-cdk/aws-cloudfront:defaultSecurityPolicyTLSv1.2_2021": true
  }
}

Note the content of the "app" key, we've removed app.py and now we need to do something else, beginning by adding a new project to our polylith repository:

poetry poly create project --name cdk_app

The project name can be anything you need or want, this will be used to build a python package. Now the projects folder have a new subfolder cdk_app with a pyproject.toml file on it. In this file is where we combine our bases and components to build the artifact to deploy. Edit this file to add our include statements under the package property as shown below:

packages = [
    {include = "cdk_workshop/workshop_app", from = "../../bases"},
    {include = "cdk_workshop/hit_counter", from = "../../components"},
    {include = "cdk_workshop/cdk_workshop_stack", from = "../../components"}
]

Note that we've added a ../../ to bases and components because this pyproject file is two levels down in the path

Next, we need to add the necessary dependencies form the pyproject.toml in the root folder, from there we only copy what we need for the bases and components, no dev dependencies.

[tool.poetry.dependencies]
python = "^3.10"
aws-cdk-lib = ">=2.68,<3.0"
constructs = ">=10.0.0,<11.0.0"
cdk-dynamo-table-view = "0.2.438"

The final result should be something like:

[tool.poetry]
name = "cdk_app"
version = "0.1.0"
description = ""
authors = ['Yoel Benitez Fonseca <[email protected]>']
license = ""

packages = [
    {include = "cdk_workshop/workshop_app", from = "../../bases"},
    {include = "cdk_workshop/hit_counter", from = "../../components"},
    {include = "cdk_workshop/cdk_workshop_stack", from = "../../components"}
]

[tool.poetry.dependencies]
python = "^3.10"
aws-cdk-lib = ">=2.68,<3.0"
constructs = ">=10.0.0,<11.0.0"
cdk-dynamo-table-view = "0.2.438"

[tool.poetry.group.dev.dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Running poetry poly info will show:

Workspace Summary

As you can see a new column has appeared and the bricks (bases and components) used by the project are marked.

Next, move the cdk.json file to the project folder

mv cdk.json projects/cdk_app/cdk.json

But because we move our app object to the bases/cdk_workshop/workshop_app/core.py module we need to edit cdk.json and change the app entry to:

  "app": "python3 -m cdk_workshop.workshop_app.core"

Let's add a checkpoint here and commit our changes.

cdk project new home

At this point we should be able to deploy our CDK application (theoretically speaking), let's test that assumption:

cd projects/cdk_app
poetry build-project

This build-project command will create a dist directory under projects/cdk_app containing the python package.

This new directory need to be include in the .gitignore file. To make this step simpler, copy the content of the recommended gitignore for python file and add it to the .gitignore in the repository root as shown in this example commit.

Here is a view of the project folder file tree

This python package contains our CDK app. So, to test our theory we need to created a python virtual env, install this package, and run cdk synth (under the projects/cdk_app folder) to see the CloudFormation template:

python3 -m venv .venv
source .venv/bin/activate
pip install dist/cdk_app-0.1.0-py3-none-any.whl
cdk synth

But wait, we get and error. Something like:

RuntimeError: Cannot find asset at cdk_polylith/projects/cdk_app/lambda

The root cause for this error is that the previous implementation assumed that any cdk command would be execute on the root of the repository but our app has been moved to projects/cdk_app. To fix this, we need to move the lambda folder under projects/cdk_app and run cdk synth again:

cd ../../
mv lambda/ projects/cdk_app/
cd projects/cdk_app/
cdk synth

Now all should work great!!! ... ummm no, not really. The idea behind polylith is that all code should live in the components or bases folders.

So, let's go back, discard these last changes and solve this problem in the polylith way - (don't forget to exit the venv created for the cdk_app project).

Include lambda functions code, the polilyth way.

In this project we have 2 lambdas:

./lambda/
├── hello.py
└── hitcount.py

The plan here is to add to bases (one for each function) to the project. Both are pretty simple, only hitcount.py have an external dependency to boto3.

Let's add the bases first:

poetry poly create base --name hello_lambda
poetry poly create base --name hitcounter_lambda

Note: If these functions shared code (e.g: something that could be refactored so that they both use it), it would be a good idea to add a new component for this feature.

Next, we add this new bases to the main pyproject.toml packages property:

packages = [
    {include = "cdk_workshop/workshop_app", from = "bases"},
    {include = "cdk_workshop/hello_lambda", from = "bases"},
    {include = "cdk_workshop/hitcounter_lambda", from = "bases"},
    {include = "cdk_workshop/hit_counter", from = "components"},
    {include = "cdk_workshop/cdk_workshop_stack", from = "components"}
]

Adding any dependencies too:

poetry add boto3

Run poetry install && poetry run pytest test/ to ensure all is correct.

Now, let's move the code:

mv lambda/hello.py bases/cdk_workshop/hello_lambda/core.py
mv lambda/hitcount.py bases/cdk_workshop/hitcounter_lambda/core.py
rm -rf lambda/

Let's add a checkpoint here and commit our changes.

The trick now is to generate a python package for each lambda function and use the bundling options of the lambda cdk construct to inject our code and requirements for the lambdas. Let's begin by adding the projects for each lambda:

poetry poly create project --name hello_lambda_project
poetry poly create project --name hitcounter_lambda_project

Similar to the cdk_app, the projects/hello_lambda_project/pyproject.toml should reference the corresponding hello_lambda base:

...

packages = [
    {include = "cdk_workshop/hello_lambda", from = "../../bases"}
]

...

And, the same for projects/hitcounter_lambda_project/pyproject.toml for hitcounter_lambda - including the dependency for boto3:

packages = [
    {include = "cdk_workshop/hitcounter_lambda", from = "../../bases"}
]

[tool.poetry.dependencies]
python = "^3.10"
boto3 = "^1.26.123"

In the CdkWorkshopStack file code we change the lambda function definition to:

        hello = _lambda.Function(
            self,
            "HelloHandler",
            runtime=_lambda.Runtime.PYTHON_3_9,
            code=_lambda.Code.from_asset(
                "lambda/hello",
                bundling=BundlingOptions(
                    image=_lambda.Runtime.PYTHON_3_9.bundling_image,
                    command=[
                        "bash", "-c",
                        "pip install -r requirements.txt -t"
                        " /asset-output && cp -au . /asset-output"
                    ]
                )
            ),
            handler="cdk_workshop.hello_lambda.core.handler",
        )

Note the handler declaration, like in cdk.json file we are using the package fully qualified namespace to declare our handler. The _lambda.Runtime.PYTHON_3_9.bundling_image property will build the lambda distribution using a requirements.txt file that we will generate.

Let's repeat the process for the hitcounter_lambda. In components/cdk_workshop/hit_counter/core.py we change:

            handler="cdk_workshop.hitcounter_lambda.core.handler",
            code=_lambda.Code.from_asset(
                "lambda/hello",
                bundling=BundlingOptions(
                    image=_lambda.Runtime.PYTHON_3_9.bundling_image,
                    command=[
                        "bash", "-c",
                        "pip install -r requirements.txt -t"
                        " /asset-output && cp -au . /asset-output"
                    ]
                )
            ),
            runtime=_lambda.Runtime.PYTHON_3_9,

Add the required folders (assets folders) to the cdk_app project.

mkdir -p mkdir -p projects/cdk_app/lambda/{hello,hitcounter}
touch projects/cdk_app/lambda/{hello,hitcounter}/requirements.txt

Alright, time for a checkpoint and commit our changes.

Ok, let's try the deploy again. First, we build the lambda packages:

cd projects/hello_lambda_project
poetry build-project
cd ../hitcounter_lambda_project/
poetry build-project
cd ../../

Our projects folder structure should look like this:

./projects/
├── cdk_app
│   ├── cdk.json
│   ├── dist
│   │   ├── cdk_app-0.1.0-py3-none-any.whl
│   │   └── cdk_app-0.1.0.tar.gz
│   ├── lambda
│   │   ├── hello
│   │   │   └── requirements.txt
│   │   └── hitcounter
│   │       └── requirements.txt
│   └── pyproject.toml
├── hello_lambda_project
│   ├── dist
│   │   ├── hello_lambda_project-0.1.0-py3-none-any.whl
│   │   └── hello_lambda_project-0.1.0.tar.gz
│   └── pyproject.toml
└── hitcounter_lambda_project
    ├── dist
    │   ├── hitcounter_lambda_project-0.1.0-py3-none-any.whl
    │   └── hitcounter_lambda_project-0.1.0.tar.gz
    └── pyproject.toml

We will need to add the .whl of the lambdas to the respective requirements.txt files on the cdk_app project:

cd projects/cdk_app/
cp ../hello_lambda_project/dist/*.whl lambda/hello/
cp ../hitcounter_lambda_project/dist/*.whl lambda/hitcounter/
cd lambda/hello/
ls * | find -type f -name "*.whl" > requirements.txt
cd ../hitcounter/
ls * | find -type f -name "*.whl" > requirements.txt
cd ../../ # back to projects/cdk_app
poetry build-project # need to rebuild since we make changes
source .venv/bin/activate
# --force-reinstall is necessary unless we change the package version
pip install --force-reinstall dist/cdk_app-0.1.0-py3-none-any.whl

note: Runtime.PYTHON_3_9.bundling_image will fail if any of the packages need a greater version of python.

Now we can deploy again:

# from the projects/cdk_app/ with the python virtual env active
cdk deploy

It is important to note that most of this process is probably part of the DevOps setup, and rarely you will have to do any of this manually. But hey! it is better to know where things come from and be able to fix it than waiting on somebody else to fix it for you.

IMPORTANT, the lambdas will fail complaining that they can not find the handler module event if it is included correctly in the lambda package code. For this to work you'll need Runtime.PYTHON_3_9 at least

Let's add the last checkpoint here and commit our changes.

Final considerations

  • This monorepo methodology makes it easy to start a new project or change an existing one.
  • All your repositories will look consistent with the same structure and elements.
  • With all the code in the same repository you can detect if something could potentially break other parts of the system even if they are deployed separately.
  • Last but not least, there is a clear separation between the code and the deploy artifacts

I hope this article help you improve your coding skills, make your projects more organized and professional, and save you some time in the future. Go do something fun with that extra time.

Until the next post, Take care and happy coding.

Acknowledgments

Thanks to Robmay S. Garcia for the review, corrections and help.

Thanks David Vujic for this excellent tool.