Export PDF To Excel using AI
This application shows to to use OpenAI or Anthropic Vision API to export a PDF file to Excel.
This application will convert PDF to Excel by using the following steps:
- Convert PDF to JPG using Ghostscript
- Let user select table(s) to export to Excel
- Extract the selected images and resize them to confirm to the vison API (2,000px max for OpenAI and 1,092px max for Anthropic)
- Use OpenAI or Anthropic Vision API to convert image to HTML
- Convert HTML files to one Excel file using VBA
Using the code
- Get Anthropic API key https://console.anthropic.com/settings/keys
- Get OpenAI API key https://platform.openai.com/settings/profile?tab=api-keys
- Download and Install Ghostscript
The code uses HttpClient to post JSON to OpenAI (https://api.openai.com/v1/chat/completions) and Anthropic (https://api.anthropic.com/v1/messages) endpoints. It encodes the image file using base64 encoding.
The application will resize large images before sending them to Vision API. So be carful not to select very large tables because the image quality will suffer and the AI will start to hallucinate.
I attempted to OCR the image (using tesseract-ocr) and send it along with the request but it only confused the AI...
Next step would be to try to export the PDF to a database this bridging the unstructured and structured data boundary!