This repository provides a serverless solution for converting PDFs to images using AWS Lambda and Poppler, an open-source PDF processing library.
The project packages the Poppler utility pdftocairo into an AWS Lambda layer so that a Lambda function can convert PDF files into images. Shipping the binary as a layer keeps the function package small and lets other functions reuse the same layer.
- Docker (for building the Lambda layer)
- AWS CLI and AWS SAM CLI (for deployment)
- Node.js
To build the Poppler layer for ARM64 architecture:
docker build --platform=linux/arm64 --build-arg TARGET_PLATFORM=arm64 -f ./layers/poppler/Dockerfile -t poppler-lambda-layer-arm ./layers/poppler
docker run --rm --platform=linux/arm64 -v "$(pwd)/layers:/workspace" poppler-lambda-layer-arm
For x86_64 architecture, adjust the commands as follows:
- Platform: linux/amd64
- Target Platform: amd64
- Image Name: poppler-lambda-layer-x86-64
- Set up a new SAM project.
- Define the Lambda function and layer in template.yaml.
- Deploy the application:
sam build
sam deploy --guided
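The function code that template.yaml points at is not shown here. As a rough illustration only, a Node.js handler might look like the sketch below. It assumes the layer places the binary at /opt/bin/pdftocairo (Lambda mounts layers under /opt, but the exact path depends on how the layer is built), that the request arrives as an API Gateway proxy event whose body is {"data": "<base64 PDF>"}, and that the PNG is returned under a "data" field. The names PDFTOCAIRO_PATH, buildArgs, and extractPdf are hypothetical and do not come from this repository.

```javascript
// Hypothetical handler sketch; binary path and response shape are assumptions.
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

const execFileAsync = promisify(execFile);

// Assumption: the layer unpacks pdftocairo to /opt/bin. Override via env var if not.
const PDFTOCAIRO = process.env.PDFTOCAIRO_PATH || '/opt/bin/pdftocairo';

// Arguments that render only page 1 as a PNG written to <outputPrefix>.png.
function buildArgs(inputPath, outputPrefix) {
  return ['-png', '-singlefile', '-f', '1', '-l', '1', inputPath, outputPrefix];
}

// Decode the PDF from a proxy event whose JSON body is {"data": "<base64>"}.
function extractPdf(event) {
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, 'base64').toString('utf8')
    : event.body;
  const { data } = JSON.parse(raw || '{}');
  if (typeof data !== 'string' || data.length === 0) {
    throw new Error('Request body must contain a base64 "data" field');
  }
  return Buffer.from(data, 'base64');
}

async function handler(event) {
  let pdf;
  try {
    pdf = extractPdf(event);
  } catch (err) {
    return { statusCode: 400, body: JSON.stringify({ error: err.message }) };
  }

  // Work in a scratch directory under the system temp dir (/tmp on Lambda).
  const workDir = await fs.mkdtemp(path.join(os.tmpdir(), 'pdf-'));
  try {
    const inputPath = path.join(workDir, 'input.pdf');
    const outputPrefix = path.join(workDir, 'page');
    await fs.writeFile(inputPath, pdf);
    await execFileAsync(PDFTOCAIRO, buildArgs(inputPath, outputPrefix));
    const png = await fs.readFile(`${outputPrefix}.png`);
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ data: png.toString('base64') }),
    };
  } finally {
    // Remove scratch files so warm invocations do not fill up temp storage.
    await fs.rm(workDir, { recursive: true, force: true });
  }
}

module.exports = { handler, buildArgs, extractPdf };
```

Using execFile rather than a shell string avoids quoting problems with file paths, and -singlefile makes pdftocairo write exactly one output file with a predictable name.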
To test, send a POST request to the API endpoint with a base64-encoded PDF. The function returns the first page of the PDF as a base64-encoded PNG image.
curl -s -X POST "YOUR_ENDPOINT_HERE" \
-H "Content-Type: application/json" \
-d "{\"data\":\"$(base64 < sample.pdf | tr -d '\n')\"}" \
| sed -n 's/
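The same request can also be sent from Node.js, which avoids shell quoting. The sketch below assumes Node.js 18 or later (for the built-in fetch) and, as a guess, that the response body is JSON with the PNG under a "data" field. The actual response shape is not documented here, so adjust decodePng to match what the function returns. The function names are illustrative.

```javascript
// Hypothetical client sketch: node client.js <endpoint> <input.pdf> [output.png]
const fs = require('node:fs/promises');

// Build the same JSON body the curl example sends: {"data": "<base64 PDF>"}.
function buildRequestBody(pdfBuffer) {
  return JSON.stringify({ data: pdfBuffer.toString('base64') });
}

// Assumption: the response is JSON with the base64 PNG in a "data" field.
function decodePng(responseText) {
  const { data } = JSON.parse(responseText);
  if (typeof data !== 'string') {
    throw new Error('Response has no "data" field');
  }
  return Buffer.from(data, 'base64');
}

async function convert(endpoint, pdfPath, outPath) {
  const pdf = await fs.readFile(pdfPath);
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: buildRequestBody(pdf),
  });
  const text = await res.text();
  if (!res.ok) {
    throw new Error(`Request failed with ${res.status}: ${text}`);
  }
  await fs.writeFile(outPath, decodePng(text));
  return outPath;
}

// Only run when both an endpoint and a PDF path are given on the command line.
const [endpoint, pdfPath, outPath = 'page1.png'] = process.argv.slice(2);
if (endpoint && pdfPath) {
  convert(endpoint, pdfPath, outPath)
    .then((file) => console.log(`Wrote ${file}`))
    .catch((err) => {
      console.error(err.message);
      process.exitCode = 1;
    });
}
```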
This setup allows PDF-to-image conversion in a serverless environment, making it scalable and reusable for document processing tasks.