The project is installed using Hatch, which needs to be installed first.
Configuration settings are loaded from a .env file in the root directory. The following settings can be set:
IGDB.CLIENT_ID
- IGDB API client identifier
IGDB.CLIENT_SECRET
- IGDB API client secret
GEMINI.API_KEY
- Gemini API key
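As a quick sanity check before running any commands, the sketch below reports which of the settings listed above are not yet set. It assumes the .env file uses plain KEY=VALUE lines with the setting names written exactly as above; that format, and the helper names parse_env and unset_settings, are illustrative assumptions and not part of the project.

```python
from pathlib import Path

# Setting names taken from the list above.
KNOWN_SETTINGS = ("IGDB.CLIENT_ID", "IGDB.CLIENT_SECRET", "GEMINI.API_KEY")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def unset_settings(text):
    """Return the known setting names that are absent or empty."""
    settings = parse_env(text)
    return [name for name in KNOWN_SETTINGS if not settings.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("Unset settings:", unset_settings(text) or "none")
```

Run from the root directory, this prints the names that still need a value. Whether every setting is needed depends on the commands used, so treat the result as a hint only.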
Run the following command to get help:
hatch run lcls --help
hatch run lcls books extract
- Extract all solved books threads into data/books/solved.json
hatch run lcls games extract
- Extract all solved games threads into data/games/solved.json
hatch run lcls movies extract
- Extract all solved movies threads into data/movies/solved.json
hatch run lcls books query-gemini
- Use Gemini to process all solved book requests.
hatch run lcls books stats
- Show basic statistics for the books data-set
hatch run lcls games stats
- Show basic statistics for the games data-set
hatch run lcls movies stats
- Show basic statistics for the movies data-set
hatch run lcls games search --search-mode [default|exact] {NAME}
- Search IGDB by name. --search-mode can be used to force exact matches.
All data is in the data directory, split into sub-folders by data-set. In the sub-folders, the annotation data is in the .csv files. The solved.json files contain all threads that have been marked as solved. The gemini.json files contain the responses generated by the Google Gemini LLM.
Each solved.json file is JSON-formatted and contains a list of objects. Each object has the following fields:
- thread_id: The unique id for this thread
- request: The original request text
- prompt: The request text embedded in the standard LLM prompt
- solved: Boolean to indicate that the thread has a solution
- confirmed: Boolean to indicate that the thread has been marked as confirmed
- title: The title of the book / game / movie
- author: The author of the book (only in books/solved.json)
- years: The release years of the game (only in games/solved.json)
- year: The release year of the movie (only in movies/solved.json)
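The following is a minimal sketch of how a solved.json file could be read and summarised using only the field names listed above. The helper names load_threads and summarise are hypothetical and not part of the lcls command-line tool.

```python
import json
from pathlib import Path


def load_threads(path):
    """Load a solved.json file, which holds a JSON list of thread objects."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarise(threads):
    """Count the threads and how many are flagged as solved or confirmed."""
    return {
        "total": len(threads),
        "solved": sum(1 for thread in threads if thread.get("solved")),
        "confirmed": sum(1 for thread in threads if thread.get("confirmed")),
    }
```

For example, summarise(load_threads("data/books/solved.json")) returns the three counts for the books data-set, provided the extract command has already written that file.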
Each gemini.json file is JSON-formatted and contains a list of objects. Each object has the following fields:
- thread_id: The unique id of the thread that this set of responses is for
- results: A list of up to three sub-lists, each sub-list containing objects with the following fields:
  - answer: The LLM-provided answer
  - explanation: The LLM-provided explanation
  - confidence: The LLM-provided confidence
  - title: The title of the book / game / movie
  - author: The author of the book (only in books/gemini.json)
If fewer than three sub-lists are present, then Gemini was not able to generate valid JSON, even when allowed up to 10 attempts.
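To find the threads affected by this, a sketch along the following lines could be used. It relies only on the thread_id and results fields described above; the function name incomplete_threads and the sample data in the usage note are hypothetical.

```python
import json

# Each thread is expected to have up to three sub-lists of results.
EXPECTED_SUBLISTS = 3


def incomplete_threads(entries):
    """Return the thread_ids whose results hold fewer than three sub-lists."""
    return [
        entry["thread_id"]
        for entry in entries
        if len(entry.get("results", [])) < EXPECTED_SUBLISTS
    ]
```

Passing it the list parsed from a gemini.json file, for example json.load(open("data/books/gemini.json", encoding="utf-8")), returns the ids of the threads for which at least one Gemini response is missing.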
The project uses pre-commit and Ruff to ensure code-style consistency. Run
pre-commit install
to ensure style checks are run before committing changes.