update readme

noah-art3mis · May 7, 2024 · ae8b182
1 parent e554642
commit ae8b182

Showing 3 changed files with 60 additions and 6 deletions.
diff --git a/README.md b/README.md
@@ -6,12 +6,13 @@ Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.
 
 ## How to use
 
-1.  Set the models in `eval_models.py`, prompts in `eval_prompts.py` and variables in `eval_variables.py`.
-1.  Set grading style in `grading.py`. This is important.
-    -   "binary": is either right or wrong
-    -   "qualitative": ask claude (to be implemented)
-1.  Run `python eval.py`.
-1.  Logs from the run will be in `output/<datetime>.yaml`.
+1. Set up the environment: `python -m venv venv`, `source venv/bin/activate`, `pip install -r requirements.txt`.
+1. Set the models in `eval_models.py`, the prompts in `eval_prompts.py`, and the variables in `eval_variables.py`. See the Parameters section.
+1. (Not implemented) Set the grading style in `main.py`.
+    - "binary": the answer is either right or wrong
+    - "qualitative": ask Claude
+1. Run `python eval.py`.
+1. Logs from the run will be in `outputs/<datetime>.yaml`.
 
 ## Parameters
 
@@ -50,6 +51,7 @@ Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.
 ## TODO
 
 -   add tests
+-   add qualitative eval
 -   add [asyncio](https://github.com/ollama/ollama-python?tab=readme-ov-file#async-client)
 -   add details on which answers tend to be wrong. summary expected
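
The last TODO item (details on which answers tend to be wrong) could be approached with a small tally over the logged results. The following is only a sketch: `summarize_failures` is a hypothetical helper, and it assumes the results have already been loaded as plain dictionaries with the `model`, `variable_id`, and `grade` fields that appear in the YAML logs. The repository itself stores `Result` objects, so some adaptation would be needed.

```python
from collections import Counter


def summarize_failures(results):
    """Count failed results (grade == 0) per (model, variable_id) pair.

    Hypothetical helper: assumes each result is a dict with the same
    field names that appear in the YAML logs.
    """
    failures = Counter()
    for result in results:
        if result["grade"] == 0:
            failures[(result["model"], result["variable_id"])] += 1
    return failures


# Two records taken from the log added in this commit, reduced to the
# fields the summary needs.
results = [
    {"model": "llama3", "prompt_id": "menciona_0",
     "variable_id": "pensão", "grade": 0},
    {"model": "llama3", "prompt_id": "menciona_0",
     "variable_id": "despesas_essenciais", "grade": 1},
]

for (model, variable_id), count in summarize_failures(results).most_common():
    print(f"{model} / {variable_id}: {count} wrong")
```

Grouping by `(model, variable_id)` is one possible choice; grouping by `prompt_id` instead would show which prompts are weak rather than which inputs are hard.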
 

diff --git a/outputs/20240507130128.yaml b/outputs/20240507130128.yaml
@@ -82,3 +82,44 @@
   grade: 1
   time_elapsed: 23.76
   error: null
+- !!python/object:utils.my_types.Result
+  id: 99cf15b1478c49f7869003875d0ef862
+  model: llama3
+  prompt_id: menciona_0
+  variable_id: pensão
+  expected:
+  - <<B>>
+  response: 'Analyzing the text, there is no mention of a need to buy medicine or
+    health items (option a). Furthermore, there is no mention of an unforeseen and
+    exceptional event that led to the request for the benefit (option b).
+
+
+    Therefore, the answer is: <<C>> (DEFERIDO)'
+  grade: 0
+  time_elapsed: 22.19
+  error: null
+- !!python/object:utils.my_types.Result
+  id: 7448b305db81441d865e0ed12ae01f56
+  model: llama3
+  prompt_id: menciona_0
+  variable_id: despesas_essenciais
+  expected:
+  - <<C>>
+  response: 'Analyzing the text, I can say that:
+
+
+    * There is no mention of medicine or health items, so there is no need to buy
+    medicine or health items (option c).
+
+    * The text does not mention an unforeseen and exceptional event that led to the
+    request for the benefit (option b).
+
+    * The text refers to the difficult financial situation of the single-parent family,
+    which requests assistance due to vulnerability, but there is no mention of health
+    or unforeseen events (option c).
+
+
+    Therefore, my answer is: <<C>> (DEFERIDO).'
+  grade: 1
+  time_elapsed: 31.24
+  error: null
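
The two records above are consistent with a "binary" grade: the response is scored 1 when its `<<X>>` tag matches an entry in `expected`, and 0 otherwise. The repository's actual grading code is not part of this diff, so the following is only a sketch of how such a check might work. `extract_answer` and `grade_binary` are hypothetical names, and taking the last tag in the response is an assumption (models often discuss other options before stating the final answer).

```python
import re
from typing import List, Optional

# Matches answer tags such as <<B>> or <<C>>.
ANSWER_TAG = re.compile(r"<<([A-Za-z])>>")


def extract_answer(response: str) -> Optional[str]:
    """Return the last <<X>> tag found in the response, or None if absent."""
    letters = ANSWER_TAG.findall(response)
    if not letters:
        return None
    return f"<<{letters[-1].upper()}>>"


def grade_binary(response: str, expected: List[str]) -> int:
    """Return 1 if the extracted tag is one of the expected tags, else 0."""
    answer = extract_answer(response)
    return int(answer is not None and answer in expected)


# Final lines of the two logged responses, with their expected answers.
print(grade_binary("Portanto, a resposta é: <<C>> (DEFERIDO)", ["<<B>>"]))
print(grade_binary("Portanto, minha resposta é: <<C>> (DEFERIDO).", ["<<C>>"]))
```

A response with no tag at all is graded 0 here; a real implementation might prefer to record that case in the `error` field instead.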
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,11 @@
+anyio==4.3.0
+certifi==2024.2.2
+exceptiongroup==1.2.1
+h11==0.14.0
+httpcore==1.0.5
+httpx==0.27.0
+idna==3.7
+ollama==0.1.9
+PyYAML==6.0.1
+sniffio==1.3.1
+typing_extensions==4.11.0