update readme
noah-art3mis committed May 7, 2024
1 parent e554642 commit ae8b182
Showing 3 changed files with 60 additions and 6 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -6,12 +6,13 @@ Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.

## How to use

-1. Set the models in `eval_models.py`, prompts in `eval_prompts.py` and variables in `eval_variables.py`.
-1. Set grading style in `grading.py`. This is important.
-- "binary": is either right or wrong
-- "qualitative": ask claude (to be implemented)
-1. Run `python eval.py`.
-1. Logs from the run will be in `output/<datetime>.yaml`.
+1. Setup: `python -m venv venv`, `source venv/bin/activate`, `pip install -r requirements.txt`
+1. Set the models in `eval_models.py`, prompts in `eval_prompts.py` and variables in `eval_variables.py`. See section on parameters.
+1. (Not implemented) Set grading style in `main.py`.
+- "binary": is either right or wrong
+- "qualitative": ask claude
+1. Run `python eval.py`.
+1. Logs from the run will be in `output/<datetime>.yaml`.

## Parameters

@@ -50,6 +51,7 @@ Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.
## TODO

- add tests
- add qualitative eval
- add [asyncio](https://github.com/ollama/ollama-python?tab=readme-ov-file#async-client)
- add details on which answers tend to be wrong. summary expected

41 changes: 41 additions & 0 deletions outputs/20240507130128.yaml
@@ -82,3 +82,44 @@
grade: 1
time_elapsed: 23.76
error: null
- !!python/object:utils.my_types.Result
id: 99cf15b1478c49f7869003875d0ef862
model: llama3
prompt_id: menciona_0
variable_id: pensão
expected:
- <<B>>
response: 'Analisando o texto, não há menção à necessidade de comprar remédios ou
itens de saúde (opção a). Além disso, não há menção a um evento imprevisto e excepcional
que tenha levado à busca de benefício (opção b).
Portanto, a resposta é: <<C>> (DEFERIDO)'
grade: 0
time_elapsed: 22.19
error: null
- !!python/object:utils.my_types.Result
id: 7448b305db81441d865e0ed12ae01f56
model: llama3
prompt_id: menciona_0
variable_id: despesas_essenciais
expected:
- <<C>>
response: 'Analisando o texto, posso dizer que:
* Não há menção a remédios ou itens de saúde, portanto não há necessidade de comprar
remédios ou itens de saúde (opção c).
* O texto não menciona um evento imprevisto e excepcional que tenha levado à busca
de benefício (opção b).
* O texto se refere à situação financeira difícil da família monoparental, que
solicita auxílio por vulnerabilidade, mas não há menção a saúde ou eventos imprevistos
(opção c).
Portanto, minha resposta é: <<C>> (DEFERIDO).'
grade: 1
time_elapsed: 31.24
error: null
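
The two logged results above are consistent with a "binary" grade that compares the answer tag in the response (such as `<<C>>`) against the `expected` list. The commit does not show the grading code, so the following is only a minimal sketch under that assumption; the function name `grade_binary`, the tag pattern, and the choice to use the last tag in the response are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical pattern: a single letter wrapped in double angle brackets, e.g. <<C>>.
TAG_PATTERN = re.compile(r"<<([A-Za-z])>>")


def grade_binary(response: str, expected: list[str]) -> int:
    """Return 1 if the last <<X>> tag in the response is in `expected`, else 0.

    Assumption: the model states its final answer last, so earlier tags
    (if any) are treated as intermediate reasoning and ignored.
    """
    tags = TAG_PATTERN.findall(response)
    if not tags:
        # No answer tag at all counts as a wrong answer.
        return 0
    final_tag = f"<<{tags[-1].upper()}>>"
    return int(final_tag in expected)


if __name__ == "__main__":
    # Shortened versions of the two responses logged above.
    print(grade_binary("Portanto, a resposta é: <<C>> (DEFERIDO)", ["<<B>>"]))
    print(grade_binary("Portanto, minha resposta é: <<C>> (DEFERIDO).", ["<<C>>"]))
```

Taking the last tag rather than any tag is a design choice: a response that mentions several options while reasoning would otherwise be graded as correct by accident.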
11 changes: 11 additions & 0 deletions requirements.txt
@@ -0,0 +1,11 @@
anyio==4.3.0
certifi==2024.2.2
exceptiongroup==1.2.1
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
ollama==0.1.9
PyYAML==6.0.1
sniffio==1.3.1
typing_extensions==4.11.0
