Parla (api & database)

This is the API and database for the exploratory project Parla. It is not production ready. We are currently exploring whether we can make the parliamentary documentation that the Abgeordnetenhaus of Berlin publishes as open data (https://www.parlament-berlin.de/dokumente/open-data) more accessible by embedding all of the data and searching it with vector similarity search. The project is heavily based on this example from the Supabase community. It is built with Fastify and deployed to render.com using Docker.
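Vector similarity search works like this: every document gets an embedding vector, and a query is answered by ranking documents by how close their vectors are to the query's vector. The TypeScript sketch below illustrates the idea with brute-force cosine similarity in memory. The `Doc` shape and the toy 3-dimensional vectors are hypothetical. This is not the project's implementation. Judging from the index section further down, the project does the search inside Postgres with pgvector, whose `<=>` operator returns cosine distance (1 minus cosine similarity).

```typescript
// Illustration only: brute-force cosine-similarity search over in-memory
// embeddings. The Doc shape and the toy 3-dimensional vectors are hypothetical.

type Doc = { id: string; embedding: number[] };
type Hit = { id: string; score: number };

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat zero vectors as dissimilar to everything instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and keep the k best matches.
function topK(query: number[], docs: Doc[], k: number): Hit[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const docs: Doc[] = [
  { id: "a", embedding: [1, 0, 0] },
  { id: "b", embedding: [0.9, 0.1, 0] },
  { id: "c", embedding: [0, 1, 0] },
];

console.log(topK([1, 0, 0], docs, 2).map((hit) => hit.id)); // [ 'a', 'b' ]
```

Brute force costs time linear in the number of documents. Approximate indices such as pgvector's IVFFlat give up some recall in exchange for speed on large tables.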

Prerequisites

Required Environment Variables

See .envrc.sample for the required environment variables.

Hint: we use direnv to manage environment variables during development. See https://direnv.net/

Development

Install dependencies:

npm ci

Set up environment variables:

cp .envrc.sample .envrc

Adjust the variables in .envrc to your needs and load them:

direnv allow

Start a local Supabase database:

npx supabase start

Run the API:

npm run dev

The API is now running (by default on http://127.0.0.1:8080).

Deployment

We currently deploy with Docker on render.com:

  • Go to render.com
  • Allow Render to access your GitHub repository
  • Create a new web service (type: Docker)
  • Populate the environment variables
  • Deploy

Periodically regenerate indices

The indices on the processed_document_chunks and processed_document_summaries tables need to be regenerated when new data arrives. The reason is that pgvector recommends choosing the lists parameter of an IVFFlat index according to the number of rows in the table (see https://github.com/pgvector/pgvector). We use the pg_cron extension for this (https://github.com/citusdata/pg_cron). To schedule the regeneration, we create two jobs that call functions defined in the API and database definition (https://github.com/technologiestiftung/parla-api). These jobs run for a long time, so we execute each one in a session wrapped in BEGIN and COMMIT, with statement_timeout set to a high value (in our case 600,000 ms = 10 min).

-- Runs daily at 05:30: regenerate the embedding indices for summaries
select cron.schedule (
    'regenerate_embedding_indices_for_summaries',
    '30 5 * * *',
    $$ BEGIN; SET statement_timeout = '600000'; select * from regenerate_embedding_indices_for_summaries(); COMMIT; $$
);

-- Runs daily at 04:30: regenerate the embedding indices for chunks
select cron.schedule (
    'regenerate_embedding_indices_for_chunks',
    '30 4 * * *',
    $$ BEGIN; SET statement_timeout = '600000'; select * from regenerate_embedding_indices_for_chunks(); COMMIT; $$
);
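The pgvector README gives starting values for IVFFlat. For lists it suggests rows / 1000 up to 1M rows and sqrt(rows) beyond that. For the query-time probes setting it suggests sqrt(lists). The TypeScript sketch below applies that heuristic to show how lists grows with the data. The function names, the `embedding` column name, and the `vector_cosine_ops` operator class are assumptions for illustration. They are not taken from regenerate_embedding_indices_for_chunks or regenerate_embedding_indices_for_summaries.

```typescript
// Sketch of pgvector's starting-point heuristic for IVFFlat parameters.
// The thresholds follow the pgvector README; all names here are hypothetical.

// Suggested `lists` value for an IVFFlat index over `rows` vectors.
function recommendedLists(rows: number): number {
  if (!Number.isInteger(rows) || rows < 0) {
    throw new RangeError("rows must be a non-negative integer");
  }
  // rows / 1000 up to 1M rows, sqrt(rows) above that; never below 1.
  const lists = rows <= 1000000 ? rows / 1000 : Math.sqrt(rows);
  return Math.max(1, Math.round(lists));
}

// Suggested `ivfflat.probes` value at query time: sqrt(lists).
function recommendedProbes(lists: number): number {
  return Math.max(1, Math.round(Math.sqrt(lists)));
}

// Hypothetical: the column `embedding` and the operator class
// `vector_cosine_ops` are assumptions, not taken from this repository.
function createIndexSql(table: string, rows: number): string {
  const lists = recommendedLists(rows);
  return `CREATE INDEX ON ${table} USING ivfflat (embedding vector_cosine_ops) WITH (lists = ${lists});`;
}

// Prints a statement ending in "WITH (lists = 250);"
console.log(createIndexSql("processed_document_chunks", 250000));
```

Because lists depends on the row count, a daily rebuild keeps the parameter close to the recommended value as new documents arrive.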

Feedback Feature

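To seed the initial feedback types and tags, you can use the snippet below. The negative tags are German and mean "answer incorrect or misleading in content", "there was an error", "answer not detailed enough", and "documents not relevant".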

INSERT INTO feedbacks (kind, tag)
		values('positive', NULL), ('negative', 'Antwort inhaltlich falsch oder missverständlich'), ('negative', 'Es gab einen Fehler'), ('negative', 'Antwort nicht ausführlich genug'), ('negative', 'Dokumente unpassend');

The same snippet is also included in supabase/seed.sql.

Tests

npm t

Contributing

Before you create a pull request, open an issue so we can discuss your changes.

Contributors

Thanks goes to these wonderful people (emoji key):

Fabian Morón Zirfas

💻 🚇 🎨 📖
Jonas Jaszkowic

💻 🤔 📖
Ingo Hinterding

📆 💻 🤔

This project follows the all-contributors specification. Contributions of any kind welcome!

Credits

Made by

A project by

Supported by

Related Projects