Update 0008-pdf-generation.md (#44)
* Update 0008-pdf-generation.md

* updated links to permalinks
DraKen0009 authored Sep 23, 2024
1 parent db259ff commit 4f8171f
`docs/care/CEP/Completed/0008-pdf-generation.md`: 138 changes (26 additions and 112 deletions)

# Proposed Solution

## Initial Solution Approach

Initially, I considered using a Python library such as `WeasyPrint` or `xhtml2pdf`, both of which integrate well with Django. Their documentation is linked below for reference:
- [Xhtml2pdf](https://xhtml2pdf.readthedocs.io/)
- [WeasyPrint](https://weasyprint.readthedocs.io/)

After conducting further research and receiving feedback, I came across a recent [article](https://zerodha.tech/blog/1-5-million-pdfs-in-25-minutes/) by Zerodha that discussed PDF generation. This is where I discovered Typst.

## Final Solution (Using Typst)

Initially, PDFs were generated from HTML using Puppeteer, which spawns headless instances of Chrome. According to the Zerodha article, their earlier tech stack, similar to ours, relied on headless Chrome for PDF rendering, which proved inefficient at scale. They then moved to Typst, citing its efficiency and scalability. This sparked my interest, and further research across various references confirmed Typst's effectiveness and credibility.

By adopting Typst for our PDF generation needs, we can significantly improve the efficiency and scalability of our discharge report generation process, ensuring lower resource consumption and enhanced performance.


## Implementation Plan

- **Step 1:** Update Docker Files to Add Typst Dependencies


### Step 1: Update Docker Files to Add Typst Dependencies

Since there is no apt package for Typst, we download the prebuilt binary for the build architecture from the official releases. Updated Dockerfile: [dev.Dockerfile](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/docker/dev.Dockerfile)

Here is how the Dockerfile can be updated:

```docker
FROM python:3.11-slim-bullseye
ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1
ENV PATH /venv/bin:$PATH
ARG TYPST_VERSION=0.11.0
RUN apt-get update && apt-get install --no-install-recommends -y \
    build-essential libjpeg-dev zlib1g-dev \
    libpq-dev gettext wget curl gnupg git \
    && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
    && rm -rf /var/lib/apt/lists/*
# Download and install Typst for the correct architecture
RUN ARCH=$(dpkg --print-architecture) && \
    if [ "$ARCH" = "amd64" ]; then \
        TYPST_ARCH="x86_64-unknown-linux-musl"; \
    elif [ "$ARCH" = "arm64" ]; then \
        TYPST_ARCH="aarch64-unknown-linux-musl"; \
    else \
        echo "Unsupported architecture: $ARCH"; \
        exit 1; \
    fi && \
    wget -O typst.tar.xz https://github.com/typst/typst/releases/download/v${TYPST_VERSION}/typst-${TYPST_ARCH}.tar.xz && \
    tar -xf typst.tar.xz && \
    mv typst-${TYPST_ARCH}/typst /usr/local/bin/typst && \
    chmod +x /usr/local/bin/typst && \
    rm -rf typst.tar.xz typst-${TYPST_ARCH}
# use pipenv to manage virtualenv
RUN python -m venv /venv
RUN pip install pipenv
COPY Pipfile Pipfile.lock ./
RUN pipenv install --system --categories "packages dev-packages"
COPY . /app
RUN python3 /app/install_plugins.py
HEALTHCHECK \
    --interval=10s \
    --timeout=5s \
    --start-period=10s \
    --retries=48 \
    CMD ["/app/scripts/healthcheck.sh"]
WORKDIR /app
```
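The architecture switch in the Dockerfile comes down to building a release URL from the Typst version and the `dpkg` architecture name. As a minimal sketch, the same mapping can be expressed in Python; the helper `typst_release_url` and the `TYPST_TARGETS` table are hypothetical illustrations, not part of care:

```python
# Hypothetical helper mirroring the Dockerfile's architecture switch.
# Maps `dpkg --print-architecture` names to Typst release target triples.
TYPST_TARGETS = {
    "amd64": "x86_64-unknown-linux-musl",
    "arm64": "aarch64-unknown-linux-musl",
}


def typst_release_url(version: str, dpkg_arch: str) -> str:
    """Return the release tarball URL for a Typst version and dpkg architecture."""
    try:
        target = TYPST_TARGETS[dpkg_arch]
    except KeyError:
        # Like the Dockerfile, fail loudly on architectures without a build.
        raise ValueError(f"Unsupported architecture: {dpkg_arch}") from None
    return (
        "https://github.com/typst/typst/releases/download/"
        f"v{version}/typst-{target}.tar.xz"
    )


print(typst_release_url("0.11.0", "amd64"))
# prints https://github.com/typst/typst/releases/download/v0.11.0/typst-x86_64-unknown-linux-musl.tar.xz
```

Keeping the mapping in a single table means supporting another architecture only requires adding one entry.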

### Step 2: Update Helper Functions and Create Typst Wrapper

First of all, we have to create a wrapper function that lets the `typst` binary compile our template. The updated function: [compile_typ](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/care/facility/utils/reports/discharge_summary.py#L202-L240)

```python
import logging
import subprocess

from django.template.loader import render_to_string

logger = logging.getLogger(__name__)


def compile_typ(output_file, data):
    """Render the Typst template with `data` and compile it to `output_file`.

    Returns True on success and False on failure.
    """
    try:
        # Render the Django template into Typst source code.
        content = render_to_string("reports/example.typ", context=data)
        # "-" makes Typst read the source from stdin.
        subprocess.run(
            ["typst", "compile", "-", output_file],
            input=content.encode("utf-8"),
            capture_output=True,
            check=True,
        )
        logger.info(
            f"Successfully compiled summary pdf for {data['consultation'].external_id}"
        )
        return True
    except subprocess.CalledProcessError as e:
        # Typst writes its diagnostics to stderr, not stdout.
        logger.error(
            f"Error compiling summary pdf for {data['consultation'].external_id}: "
            f"{e.stderr.decode('utf-8', errors='replace')}"
        )
        return False
    except Exception as e:
        logger.error(f"Unexpected error compiling summary pdf: {e}")
        return False
```

After creating the wrapper, we can update our helper functions. Updated helper function: [generate_discharge_summary_pdf](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/care/facility/utils/reports/discharge_summary.py#L242-L251)

```python
# Uses `logger` and `compile_typ` defined above.
def generate_discharge_summary_pdf(data, file):
    external_id = data["consultation"].external_id
    logger.info(f"Generating Discharge Summary pdf for {external_id}")
    # Only log success if compile_typ actually produced the PDF.
    if compile_typ(output_file=file.name, data=data):
        logger.info(f"Successfully Generated Discharge Summary pdf for {external_id}")
    else:
        logger.error(f"Failed to generate Discharge Summary pdf for {external_id}")
```
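The core of `compile_typ` is piping rendered source to `typst compile -` over stdin and reading diagnostics from stderr on failure. The sketch below isolates that pattern in a Django-free function; the name `compile_source` and its injectable `command` parameter are assumptions made so the pattern can be exercised without Typst installed:

```python
import subprocess
from typing import Sequence


def compile_source(
    source: str,
    output_file: str,
    command: Sequence[str] = ("typst", "compile"),
) -> tuple[bool, str]:
    """Pipe `source` to `<command> - <output_file>` and report the outcome.

    Returns (success, error_text). `command` is a hypothetical parameter so
    the pattern can be tested with a stand-in program.
    """
    try:
        subprocess.run(
            [*command, "-", output_file],  # "-" means "read the source from stdin"
            input=source.encode("utf-8"),
            capture_output=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except FileNotFoundError as e:
        # The compiler binary is not on PATH.
        return False, str(e)
    except subprocess.CalledProcessError as e:
        # Compiler diagnostics are written to stderr, not stdout.
        return False, e.stderr.decode("utf-8", errors="replace")
    return True, ""
```

Returning the error text instead of only logging it makes it easy for callers (or tests) to surface Typst's diagnostics directly.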

### Step 3: Create Static Template for Reports

Progress on the static report template can be seen here: [Report Template](https://typst.app/project/rBahq8CbsixUHYwri4KtmO).

That document now contains only the individual components used in the template.

### Step 4: Integrate Typst with Django Templates in Our Project
Rewrote the previous HTML/CSS template in `Typst`. The template can be found at [patient_discharge_summary_pdf_template.typ](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/care/templates/reports/patient_discharge_summary_pdf_template.typ).

### Step 5: Create Tests
The tests use Typst to render the report as `PNG` images and compare them using the `Pillow` library. Reference PNG images of the PDF are stored in the `care/facility/tests/sample_reports` folder and compared with the newly generated PNGs: if they are identical the test passes, otherwise it fails.
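A minimal sketch of such a comparison with Pillow, assuming exact pixel equality is the pass criterion; `pngs_identical` and `diff_bbox` are hypothetical helper names, not the functions used in `test_pdf_generation.py`:

```python
from PIL import Image, ImageChops


def _load_rgb(path: str) -> Image.Image:
    # Convert to RGB so images saved in different modes compare pixel by pixel
    # (any alpha channel is dropped).
    with Image.open(path) as img:
        return img.convert("RGB")


def pngs_identical(path_a: str, path_b: str) -> bool:
    """Return True if both images have the same size and identical RGB pixels."""
    a, b = _load_rgb(path_a), _load_rgb(path_b)
    return a.size == b.size and a.tobytes() == b.tobytes()


def diff_bbox(path_a: str, path_b: str):
    """Return the (left, upper, right, lower) box of differing pixels, or None."""
    a, b = _load_rgb(path_a), _load_rgb(path_b)
    if a.size != b.size:
        raise ValueError(f"Size mismatch: {a.size} vs {b.size}")
    # Identical pixels give a zero difference, so getbbox() returns None.
    return ImageChops.difference(a, b).getbbox()
```

`diff_bbox` can help locate where a failing page differs before reaching for an external image diff tool.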


To update the sample `PNG` files, add the code below to the [test_compile_typ](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/care/facility/tests/test_pdf_generation.py#L41-L100) function, after line 59.
```python
subprocess.run(
    ["typst", "compile", "-", sample_file_path, "--format", "png"],
    # ... (remaining arguments collapsed in the diff view)
    cwd="/",
)
```
To investigate failures, remove the [finally block](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/care/facility/tests/test_pdf_generation.py#L89-L100) from [test_compile_typ](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/care/facility/tests/test_pdf_generation.py#L41-L100). The generated `test_output{n}.png` files are then kept in the `care/facility/tests/sample_reports` folder, where you can inspect the differences with an image diff tool.

If we later add more data to the test function and the number of pages increases, the [number_of_pngs_generated](https://github.com/ohcnetwork/care/blob/92f717eb15120cd5d73504acc52630d33dedba19/care/facility/tests/test_pdf_generation.py#L68) value must also be updated to match the number of pages in the generated PDF.
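To find the new value, one option is to count the page images Typst writes for an output path like `test_output{n}.png`. This is a minimal sketch assuming that naming scheme; `generated_page_numbers` and `PAGE_FILE` are hypothetical names, not part of the test suite:

```python
import re
from pathlib import Path

# Hypothetical pattern for files produced from an output path such as
# "test_output{n}.png", where Typst substitutes the page number for {n}.
PAGE_FILE = re.compile(r"^test_output(\d+)\.png$")


def generated_page_numbers(folder: str) -> list[int]:
    """Return the sorted page numbers of test_output{n}.png files in `folder`."""
    pages = []
    for path in Path(folder).iterdir():
        match = PAGE_FILE.match(path.name)
        if match:
            # int() also accepts zero-padded numbers such as "01".
            pages.append(int(match.group(1)))
    return sorted(pages)
```

With this helper, the check becomes `len(generated_page_numbers(folder)) == number_of_pngs_generated`.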

### Step 6: Remove Previous Dependencies (Chromium and django-hardcopy)
Updated all functions that relied on the older dependencies to use the new implementation, and removed `django-hardcopy` from the Pipfile and `chromium` from the Dockerfile.

### Step 7: Update Production Files
Updated `prod.Dockerfile` to remove the older dependencies and add the new ones.


# Updates in the template are listed below

### Others

- Removed the `Symptoms` and `Diagnosis (ICD-11)` tables and the `Health Status at admission` section
- Removed the `Daily Round` section
- Created three new `templatetags`: one to format prescriptions, one to convert text to sentence case (`format_to_sentence_case`), and one to handle empty data
- Added conditions to update field names according to admission status
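As an illustration of what the sentence-case and empty-data helpers might do, here is a minimal plain-Python sketch. The behaviour is an assumption, and `field_or_placeholder` and its `"-"` default are hypothetical names rather than care's actual tags. In a Django app, such functions would be registered in a `templatetags` module with `register = template.Library()` and the `@register.filter` decorator:

```python
def format_to_sentence_case(value):
    """Upper-case the first character and lower-case the rest.

    Non-string values (e.g. None) are returned unchanged. Note that acronyms
    are lower-cased too ("ICU" becomes "Icu").
    """
    if not isinstance(value, str):
        return value
    return value.capitalize()


def field_or_placeholder(value, placeholder="-"):
    """Return `placeholder` for None, blank strings, or empty collections."""
    if value is None:
        return placeholder
    if isinstance(value, str):
        return value if value.strip() else placeholder
    if isinstance(value, (list, tuple, dict, set)) and not value:
        return placeholder
    # Falsy but meaningful values such as 0 are kept as-is.
    return value
```

Treating `0` as real data rather than "empty" avoids hiding legitimate numeric readings in the report.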




## Results
- Typst generates PDFs 8-15 times faster than the previous browser-based approach
- The container size decreased by 30%
- Typst is 10-15% more memory efficient
- The overhead of browser-based rendering is eliminated, making the process more efficient and scalable

## Links
- **Link to PR** - https://github.com/ohcnetwork/care/pull/2132
- **Link to Care repo** - https://github.com/ohcnetwork/care
