update readme

noah-art3mis · May 7, 2024 · ab0bd53
1 parent 23f4ef0
commit ab0bd53
Showing 3 changed files with 35 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 ## Crucible
 
-Prompt evaluation package ("evals"). Can handle multiple models, prompts and variables at the same time.
+Prompt evaluation package ("evals"). Test multiple models, prompts and variables.
 
 Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.
 
@@ -10,33 +10,32 @@ Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.
 1.  Set grading style in `grading.py`. This is important.
     -   "binary": is either right or wrong
     -   "qualitative": ask claude (to be implemented)
-    -   "out_of_ten": loses points if wrong options are present in answer
 1.  Run `python eval.py`.
 1.  Logs from the run will be in `output/<datetime>.yaml`.
 
 ## Parameters
 
-    -   `model`
-        id (str): name as understood by ollama. you might need to download it first
+-   `model`
+    id (str): name as understood by ollama. you might need to download it first
 
                 Model("llama3")
 
-    -   `prompt`
-        id (str): name of the test case
-        slots (str): name of snippet to be inserted in prompt
-        content (str): actual prompt
+-   `prompt`
+    id (str): name of the test case
+    slots (str): name of snippet to be inserted in prompt
+    content (str): actual prompt
 
                 Prompt(
                     id="test_3",
                     slots="{variable}",
                     content="""Sua tarefa é analisar e responder se o texto a seguir menciona a necessidade de comprar remédios ou itens de saúde. Aqui está o texto:\n\n###\n\n{variable}\n\n###\n\n\nPrimeiro, analise cuidadosamente o texto em um rascunho. Depois, responda: a solicitação citada menciona a necessidade de comprar remédios ou itens de saúde? Responda "<<SIM>>" ou "<<NÃO>>".""",
                 )
 
-    -   `variable`
-        id (str): name of the test case
-        content (str): text of snippet to be inserted in prompt
-        expected (str list): values that would be considered correct
-        options (str list): all values that the response could take
+-   `variable`
+    id (str): name of the test case
+    content (str): text of snippet to be inserted in prompt
+    expected (str list): values that would be considered correct
+    options (str list): all values that the response could take
 
                 Variable(
                     id="despesas_essenciais",

diff --git a/main.py b/main.py
@@ -20,7 +20,7 @@
 PROMPTS = prompts
 VARIABLES = variables
 TEMPERATURE = 0.0
-GRADING_TYPE = "out_of_ten"
+GRADING_TYPE = "binary"
 
 
 def main():

diff --git a/outputs/20240507130128.yaml b/outputs/20240507130128.yaml
@@ -0,0 +1,22 @@
+- !!python/object:utils.my_types.Result
+  id: 420eedbe4c17403792cba9e9ef287676
+  model: llama3
+  prompt_id: menciona_0
+  variable_id: insegurança_alimentar
+  expected:
+  - <<C>>
+  response: 'Analisando o texto, posso dizer que:
+
+
+    * Não há menção a remédios ou itens de saúde, portanto não é necessário comprar
+    nada relacionado à saúde.
+
+    * A situação descrita é uma situação de insegurança alimentar e falta de fonte
+    de renda, o que pode ser considerado um evento imprevisto e excepcional. No entanto,
+    a busca por benefício nesse caso é para obter ajuda financeira e não relacionada
+    à saúde.
+
+    * Portanto, minha resposta é: <<C>> (DEFERIDO)'
+  grade: 1
+  time_elapsed: 34.49
+  error: null
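
Note on the "binary" grading style this commit switches to: the README describes it only as "is either right or wrong", and the logged result above has `grade: 1` for a response containing the expected marker `<<C>>`. The sketch below is one way such a check could work, assuming a response counts as correct when any `expected` value appears in it as a substring. The function name `grade_binary` is hypothetical and not taken from the repository's `grading.py`.

```python
def grade_binary(response: str, expected: list[str]) -> int:
    """Return 1 if any expected marker occurs in the response, else 0.

    Assumption: matching is a plain, case-sensitive substring test.
    The repository's real grading logic may differ.
    """
    return int(any(marker in response for marker in expected))


if __name__ == "__main__":
    # Tail of the response logged in outputs/20240507130128.yaml.
    logged_response = "* Portanto, minha resposta é: <<C>> (DEFERIDO)"
    print(grade_binary(logged_response, ["<<C>>"]))
    print(grade_binary("<<NÃO>>", ["<<SIM>>"]))
```

A substring test of this kind tolerates the free-form reasoning that precedes the final marker, but it would also accept a response that mentions the expected marker while settling on a different answer.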