Skip to content
This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Latest commit

 

History

History
591 lines (418 loc) · 31.3 KB

README.md

File metadata and controls

591 lines (418 loc) · 31.3 KB

Rust Pokédex API 🦀

CI codecov Security audit crates.io docs.rs Contributor Covenant

This project implements a simple web application that contains a CRUD API for a Pokédex - a database of Pokémons. It is written in the Rust programming language and is meant as an experiment in building fully-working web applications in that language.

Rust is a systems programming language offering low resource footprint and excellent performance, but contrarily to other systems language like C, it also includes memory safety features that makes it an attractive alternative to higher-level languages sometimes used to build web applications, like Java, Ruby or JavaScript.

This project includes several components usually found in modern web applications, including:

  • A high-performance HTTP server to handle incoming requests
  • A REST API with CRUD endpoints for Pokémon entities
  • Automatic serialization/deserialization of Pokémon entities as JSON
  • Automatic OpenAPI documentation including Swagger UI support (and others)
  • An ORM-like interface to persist Pokémons in a Postgres database
  • Support for managing and applying database migrations
  • Validation of incoming data at the endpoint level
  • Database connection pooling to improve performance
  • Configurable logging using a simple logging facade
  • Error handling with separation between service errors and their HTTP response counterparts
  • Support for development and production environments

Building and running

Supported platforms

In theory, the web application should work on all platforms supported by Rust. However, Windows does not support natively running Linux Docker containers. The easiest way to run the app on Windows is through WSL. Otherwise, the Docker-based commands will not work out-of-the-box. This includes running the local Postgres database, which will need to be installed manually.

Prerequisites

In order to build and run the web application and related utilities, you will need at the minimum:

  • A Docker Engine installation, including Docker Compose. If you do not already have this, the easiest way to get it is to install Docker Desktop. (As mentioned, on Windows, the native Docker Desktop will not work; you can however use Docker on WSL.)
  • The just command runner. This make-like tool is used to run project-specific commands. It can be installed in a variety of ways.

With Docker

Follow these steps to build, setup and run the service using Docker. Installing Rust locally is not required in such a case.

Build the image

just docker-build

This will create a local Docker image named clechasseur/pokerust that contains the web application binary and related tools. It runs on Debian Linux. The first build can take a while as you need to download the builder Docker image and compile all the code. (If you're not used to a compiled language, this will seem to take forever, but hang in there 😉)

Start the local database

just db up

This will launch two local containers running Postgres. One will serve as the database server when running the web application locally; the other is for running integration tests (in case you want to do so later).

When you're done with the local database, you can stop it:

just db down

Run database migrations

just docker-migrate

This will execute a small tool named run_migrations (compiled in the above Docker image) that runs the database migrations on the locally-running Postgres databases. It will set up the database so it is ready to be used by the web application.

Seed the database (optional)

just docker-seed

This will execute a small tool named seed_db (compiled in the above Docker image) that will read a CSV file containing the data of 800 Pokémons and insert them in the local database. Any existing data in the DB will be wiped first. This step is optional, but can be useful to showcase the possibilities of the REST API without having to insert many Pokémons by hand.

Start the Pokédex server

just docker-serve

This will launch the web application server, listening locally on port 8080. You should see log entries on the console, including one indicating that the server has been successfully started:

[2023-10-29T03:38:50Z INFO  pokedex_rs] Pokedex server started in Production! Listening on 0.0.0.0:8080.

Afterwards, the API can be accessed at /api/v1/pokemons. It is also possible to see what endpoints are supported by accessing the application's Swagger UI.

Locally

In order to build and run the application locally, you need the following additional components:

  • A recent stable Rust toolchain (1.75.0 is required at the minimum). If you do not have Rust installed, the easiest way to do so is via rustup.
  • The libpq library (a C interface for Postgres). If you do not have it installed locally, you can install it in a variety of ways, including:
    • Homebrew (macOS / Linux): brew install libpq
    • Debian-based Linux: sudo apt install libpq-dev
  • If you wish to work with the database schema, you will need the Diesel CLI. It is not strictly required to run database migrations however, since the locally-built run_migrations tool works for this.

By default, the Diesel CLI requires some local libraries for Postgres, MySQL and SQLite; however, only the Postgres support is required for Pokédex. To install the CLI with only Postgres support, you can run:

cargo install diesel_cli --no-default-features --features "postgres"
  • If you wish to run rustfmt or build the docs, you will need a Nightly Rust toolchain. If you do not have one, you can install one by running:
rustup toolchain install nightly
  • If you wish to run tests locally with code coverage, you will need to install cargo-tarpaulin. If you do not already have it, you can install it in a variety of ways.
  • If you wish to locally determine the Minimum Supported Rust Version (MSRV) of the project, you will need to install cargo-msrv. If you do not already have it, you can install it in a variety of ways.

Please be aware that running cargo-msrv will install a lot of Rust toolchains locally.

Build the binaries

just build

This will build the web application and related binaries in the target/ folder. It can take a while the first time around.

Start and set up the local database

Even when building the application locally, you still need a Postgres database to store the Pokédex data. The easiest way to do so is via Docker:

just db up

When running on Windows (natively), this will need to be performed by hand. See the .env file and the test app.rs file for details on what the application expects for the local databases.

Then, just like above, you need to run migrations and (optionally) seed the database:

just migrate seed

As before, when you're done with the local DB, you can stop it:

just db down

Start the local server

just serve

This will launch the application server locally. As before, it is then accessible via /api/v1/pokemons.

If you check the console log, you might notice that running the server locally starts it in Development mode:

[2023-10-29T04:16:21Z INFO  pokedex_rs] Pokedex server started in Development! Listening on 127.0.0.1:8080.

This affects the content of error messages returned by the API (see below).

Run the tests

just test

This will run all the tests included in the project: unit tests, integration tests (which require the local test database to be up) and documentation tests (a cool Rust feature that allows you to embed tests in your code's documentation).

To run the tests with code coverage, use:

just tarpaulin

This will run all the tests and generate an HTML report named tarpaulin-report.html at the project root. Please be aware that this takes much longer, as code coverage requires a special build with instrumentation (and because of an apparent bug, the tests need to be rebuilt on every run 😔).

Generate the docs

just doc

This will generate documentation for the types and functions used in the code via rustdoc. The resulting HTML will then be launched in your local web browser.

rustdoc generates quite nice extensive documentation. For an example output, see the actix-web documentation on docs.rs.

Linting and code formatting

Rust comes with two tools to help you check your code:

  • clippy: a Rust linter. Checks your code for common mistakes that, while not technically bugs, could be improved.
  • rustfmt: a Rust code formatter. Formats your code automatically according to predefined rules which can be configured.

You can run both tools on the codebase:

just tidy

Features

This section explores some of the interesting features found in the project's code.

OpenAPI support

The application includes support for generating an OpenAPI 3.0 documentation of the API via the utoipa crate (note: the name is not a typo). When the app is running, the documentation can be accessed at /api-docs/openapi.json. The documentation can also be viewed via built-in frontends:

Internal errors when running in development

The application can run in two modes: Development or Production. It runs in the latter mode by default, but the mode can be set via the POKEDEX_ENV environment variable. In the local repo, this is set to Development in the .env file.

When the application runs in Development, any error returned by the API will contain an internal_error field containing the recursive error messages that caused the error to be returned.

For example, if you were to run a query on the API while the database was down:

% just db down
docker compose down
[+] Running 3/3
 ✔ Container pokerust-pokedex-db-test-1  Removed 
 ✔ Container pokerust-pokedex-db-1       Removed 
 ✔ Network pokerust_pokedex-net          Removed

% curl http://localhost:8080/api/v1/pokemons | jq
{
  "status_code": 500,
  "error": "Internal Server Error",
  "internal_error": "database connection error\ncaused by: Error occurred while creating a new object: error connecting to server: Connection refused (os error 61)\ncaused by: error connecting to server: Connection refused (os error 61)"
}

For security reasons, the internal_error field is not returned when running in Production, because it might expose security details. However, some errors (like validation errors) still include a details field that include more information:

% curl http://localhost:8080/api/v1/pokemons/-1 | jq
{
  "status_code": 400,
  "error": "Bad Request",
  "details": "Validation error: id: Validation error: range [{\"value\": Number(-1), \"min\": Number(0.0)}]"
}

Backtrace support

Rust includes support for generating a "backtrace" (e.g. a callstack) when an error occurs. However, although the Backtrace struct is available in stable Rust, storing one when an error occurs is only supported in Nightly Rust.

The application supports including backtraces with errors when running in Development. This requires two things:

  • Building the application with the Nightly toolchain
  • Setting the RUST_BACKTRACE environment variable (to 1) to enable backtrace capture (otherwise, backtraces will be empty)

Testing backtrace support locally

RUST_BACKTRACE=1 just toolchain=nightly serve

This will build and run the app using the Nightly Rust toolchain and also enable backtrace generation. Backtrace support can be verified in the server logs:

[2023-10-29T04:56:08Z INFO  pokedex_rs] Rust version used: 1.75.0-nightly
[2023-10-29T04:56:08Z INFO  pokedex_rs] Backtrace support: supported

The inclusion of a backtrace with errors can be tested by sending an invalid query to the API:

curl http://localhost:8080/api/v1/pokemons/-1

Testing backtrace support via Docker

Testing via Docker is a bit trickier since it requires building another Docker image for the application using the Nightly Rust toolchain:

just toolchain=nightly docker-build

This should build another version of the app's Docker image (clechasseur/pokerust:nightly); again, this will probably take a while the first time. To then run the application using that image and enable backtrace support, use:

just toolchain=nightly docker-serve --env POKEDEX_ENV=development --env RUST_BACKTRACE=1

Then, like before, you can test that backtrace support works:

curl http://localhost:8080/api/v1/pokemons/-1

You might notice that the returned backtrace is different from one generated locally. This is because backtrace generation is highly platform-specific (it is even completely unsupported on some platforms).

Logging level

The Pokédex application includes logging of various operations. As with other popular frameworks, log entries have different levels: error, warning, info, debug or trace (see log::Level).

By default, the application only displays log entries of level info or above. This can be configured via the RUST_LOG environment variable, however. For example, to enable trace logging when running locally:

RUST_LOG=trace just serve

Or via Docker:

just docker-serve --env RUST_LOG=trace

Lots of other options exist to control logging output, including filtering certain entries and only enable logging for specific modules. For more information, see the env_logger crate documentation.

Pagination support

The GET /api/v1/pokemons endpoint supports listing Pokémons in the Pokédex in pages. By default, the endpoint returns a maximum of 10 Pokémons at a time. Pagination is controlled via the page and page_size query parameters. For example:

curl "http://localhost:8080/api/v1/pokemons?page=2&page_size=5"

The returned JSON will include the Pokémons, as well as information about the total number of pages available for the specified page_size:

{
  "pokemons": [
    ...
  ],
  "page": 2,
  "page_size": 5,
  "total_pages": 160
}

For performance reasons, the page_size is limited (currently to 100). This is currently hardcoded in the service code (see MAX_PAGE_SIZE in service/pokemon.rs).

Documentation

Although the Pokédex application is a bin crate, the main program's code only includes what is necessary to actually start the HTTP server and listen to connections. The body of the code is in the project's lib crate (see lib.rs) and is all documented. As mentioned before, the documentation can be generated and viewed locally via:

just doc

Note that all the types and functions in the library are currently public. This would not normally be the case; it was done this way here so that it's easier to explore the app's code via the doc.

Integration testing

The project includes integration tests that launch a test service using actix-web's testing helpers, connecting it to a test DB hosted on a separate Postgres server. The integration tests perform requests on the actual API endpoints and parse the data to validate the result.

In order for the tests to be able to perform validations on entity counts, etc. every test creates a new test service and when the test concludes, the test DB is cleared of its content. This works well, but has one drawback: because changes are actually persisted to the database by running tests, they interfere with each other so have to be serialized (e.g. they cannot run in parallel).

In this sample project, the small number of tests makes this manageable. In a large project however, it would be quite a problem. Many frameworks go around this problem by creating a database connection and starting a transaction before handing the connection to the test code, then rolls back the transaction when the test is done. Since no actual data is ever committed to the database, tests can easily run in parallel.

I have not been able to find an easy way to implement this in the project, though. DB connections used by the API endpoints come from a connection pool. Furthermore, some tests perform multiple requests, so they'd have to reuse the same connection throughout so that they are in the same transaction. I'm not entirely sure what's the best way to hook into the API code in order to achieve this. I have a feeling that using mockall_double could help, but I haven't spent enough time thinking about it.

Interesting crates

In Rust, external libraries are stored in units called crates. Many open-source libraries are available and can be viewed and downloaded from the crates.io registry. It is not necessary to download them manually though; cargo, the Rust package manager, does so automatically when building by looking up dependencies in Cargo.toml.

Contrarily to many ecosystems, the Rust ecosystem does not include an everything-but-the-kitchen-sink framework to develop web applications (such as Ruby's Rails framework, or Elixir's Phoenix). Instead, Rust libraries tend to be broken down into small, reusable components that offer one or more related features. Because of this, building a web application requires the use of several crates (much of the time spent building this experimental project was spent looking for and testing various libraries for the different parts of the app).

The following list includes some of the more interesting crates used in the application's project. They are certainly useful to know to build similar Rust projects (or even other Rust projects that are unrelated, since some of those are quite ubiquitous in the Rust ecosystem).

Web frameworks

actix-web is probably the most popular web framework for Rust. It offers great performance while still being relatively easy to set up and use. For other options in terms of web development, check out the Are we web yet? website.

actix-web uses Rust's asynchronous programming support to handle requests in an efficient manner. This requires an asynchronous runtime. Enter tokio, an asynchronous runtime that is designed for building network applications. Although other asynchronous runtime implementations exist in the Rust ecosystem, tokio is by far the most widely used.

Database and persistence

  • diesel : A flexible ORM framework
  • deadpool : A library for asynchronous pooling of database connections (or any type of object, really)
  • diesel-async : Asynchronous wrapper around diesel that includes connection pooling support

diesel is probably the most popular ORM in the Rust ecosystem. One potential issue with diesel, however, is that it does not provide an asynchronous interface; this means that when you perform a database operation inside an async function, the thread in the runtime thread pool is hung until the DB call returns. However, whether this is a real "issue" is debated; asynchronous code is not in and out of itself necessarily faster.

This project uses the diesel-async crate to wrap calls to the database made with diesel so that they appear async. This is mostly "for show", however, since diesel remains synchronous. Rather, the DB calls are offloaded to other threads that are not part of the runtime thread pool. Whether this improves performance significantly would need to be benchmarked.

Alternatives to diesel include:

  • sqlx : A library to perform SQL queries using an asynchronous interface, albeit without a DSL
  • ormx : A small library adding ORM-like features to sqlx, albeit in a limited way
  • sea-orm : A truly asynchronous ORM that uses sqlx under the hood

sea-orm looks promising and seems to require less boilerplate than diesel. It also supports writing database migrations as Rust code instead of pure SQL (whether this is better is a matter of opinion, I guess).

Validators

  • validator : Simple library to add validation support for Rust structs
  • actix-web-validator : Adds validator support to actix-web projects, allowing automatic validation of API input

The combination of both of these crates allow code to add validations at the struct level, which will then be enforced at the API level. Validation errors can then be converted to proper HTTP responses (e.g. 400 Bad Request) via Actix's built-in error handling facilities.

OpenAPI support

  • utoipa : OpenAPI 3.0 documentation generator for your API (with a weird name to boot)
  • utoipa-swagger-ui : Automatic Swagger UI support (via utoipa)
  • utoipa-redoc : Automatic Redocly support (via utoipa)
  • utoipa-rapidoc : Automatic RapiDoc support (via utoipa)

Generating OpenAPI documentation for your API endpoints is easy with utoipa. It allows you to use derive macros on endpoints as well as schema and response structs to document them, then bind everything together to generate one OpenAPI JSON documentation. This documentation can then be used to host viewers like Swagger UI via the other crates.

I did find a few quirks when using utoipa (namely, many derive macros use the rustdoc documentation to generate the OpenAPI documentation, but sometimes you want it to be different); however, it is definitely quite the time saver.

Serialization

  • serde : The de facto sereliazation / deserialization library of the Rust ecosystem
  • serde_json : JSON parser and validator that uses serde to allow (de)serialization of Rust types
  • csv : CSV parser that supports (de)serialization via serde
  • serde_with / serde-this-or-that : Helpers for implementing serde support

Supporting serialization of data structures in formats like JSON is easier in languages that support some kind of reflection API. Compiled languages can have that kind of support (see: Java), but alas this is not the case for Rust. This is however often replaced with generic traits combined with clever proc macros.

Enter serde, a compile-time serialization library that showcases the power of Rust's trait system by implementing a format-agnostic serialization framework. Wait, what?

Basically, serde separates the serialization of a type into primitive instructions from the actual implementation of a serializer that persists the data in a specific format. To do this, serde offers the generic Serialize trait that can be implemented. This trait's only method, serialize, is passed a Serializer (another generic trait) and must use the serializer to save the type's data. For instance, a struct would persist itself by serializing a struct, then each of its named (or unnamed) fields.

Then, other crates like serde_json provide actual implementations of the Serializer trait for their specific data format. These serializers will take the provided serialization instructions and create an appropriate output. Magic!

But the fun doesn't end there. Because serde comes with support for serializing most basic Rust types out-of-the-box, the Serialize trait can be derived automatically for almost all types. For example, structs and enums support deriving Serialize as long as they contain fields that can all already be serialized themselves (e.g. their types already implement Serialize). In practice, this means that structs can be tagged with a simple #[derive(Serialize)] directly and boom, they can automatically be serialized in all data formats for which there exists a serde-based library (and there are many).

(Deserialization is similarly supported via the Deserialize and Deserializer traits.)

Logging

  • log : Simple logging facade for Rust
  • env_logger : Console logger that can be configured via an environment variable
  • simple_logger : Dead-simple console logger for simple cases

log is a logging facade that is heavily used in the Rust ecosystem. It includes easy macros to log data, like info!, error!, etc. Then, to perform actual logging, you can initialize a logger implementation (like env_logger) at the start of your program.

There exists multiple logger implementations; in particular, some can be used to log to files. They weren't explored in this project, but some can be found in the log crate documentation.

Error handling

  • thiserror : Useful derive macro to ease implementation of error types
  • anyhow : A type-erased error type for easy error handling in applications

For those used to handle errors through exceptions, Rust's error handling capabilities might feel weird at first (they are more akin to Go, for example).

In Rust, the basic way of handling errors is by using a type called Result. Result is an enum - a Rust concept that is similar in aspect to enums in other languages like Java or C++, but are actually more powerful: in Rust, each enum variant can optionally contain additional data and the enum "knows" which variant it is storing at any moment. Data in each enum variant is not shared, so you can only have one variant's data members at a time (a little like C's unions).

The Result enum has only two variants:

  • Ok(T) : represents a success and contains the resulting data of type T
  • Err(E) : represents an error and contains error information of type E

When a function is fallible, it usually returns a Result that can be used to determine if the call succeeded. If the function returns Err, then an error occurred, and it must be handled. Because Result is generic but strongly-typed, errors can be bubbled up, but their type will be clearly identified.

Rust also includes a trait called Error that is usually used for error types (although it is not required). The goal of this trait is to be able to fetch the "source" of the error - the underlying error that is the root cause. In many languages, this is actually called cause.

For external libraries, it is common to define a custom error type that implements Error and can be used to represent the different types of errors that can be returned by the library (often through an enum). This is where the thiserror crate comes in: it offers a derive macro to automatically implement the Error trait (and some related traits like Display) for your error type.

For applications, it is sometimes desirable to be able to handle any kind of error, because we might call many different libraries, so creating a custom type could be unwieldy. The anyhow crate can be used for this: its anyhow::Error type can be used to store any source error, as long as it implements the standard library's Error trait. This makes it easier to add proper error handling at the application level.

Rust's error handling design means you need to think carefully about how you handle errors in your code: when an error occurs in a deeper layer, should you bubble it up as-is? Should you wrap it in a more friendly error type to add context? Maybe you can simply compensate via other means? Although this is something that should be present in all applications, Rust's reliance on an actual Result type that is returned explicitly instead of via exceptions that can easily propagate through layers unchecked means you are forced to think it through. This can feel a little daunting at first, but it could be argued that the resulting API for your library will be more solid.

Contributing

Although this project is meant as an example only, if you want to discuss it, feel free to open a Discussion (or an Issue if you find a bug somewhere). Also see CODE_OF_CONDUCT.md.