Skip to content

Commit

Permalink
update readme
Browse files Browse the repository at this point in the history
  • Loading branch information
noah-art3mis committed May 7, 2024
1 parent 23f4ef0 commit ab0bd53
Show file tree
Hide file tree
Showing 3 changed files with 35 additions and 14 deletions.
25 changes: 12 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
## Crucible

Prompt evaluation package ("evals"). Can handle multiple models, prompts and variables at the same time.
Prompt evaluation package ("evals"). Test multiple models, prompts and variables.

Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.

Expand All @@ -10,33 +10,32 @@ Uses [ollama](https://github.com/ollama/ollama-python) to run LLMs locally.
1. Set grading style in `grading.py`. This is important.
- "binary": is either right or wrong
- "qualitative": ask claude (to be implemented)
- "out_of_ten": loses points if wrong options are present in answer
1. Run `python eval.py`.
1. Logs from the run will be in `output/<datetime>.yaml`.

## Parameters

- `model`
id (str): name as understood by ollama. you might need to download it first
- `model`
id (str): name as understood by ollama. you might need to download it first

Model("llama3")

- `prompt`
id (str): name of the test case
slots (str): name of snippet to be inserted in prompt
content (str): actual prompt
- `prompt`
id (str): name of the test case
slots (str): name of snippet to be inserted in prompt
content (str): actual prompt

Prompt(
id="test_3",
slots="{variable}",
content="""Sua tarefa é analisar e responder se o texto a seguir menciona a necessidade de comprar remédios ou itens de saúde. Aqui está o texto:\n\n###\n\n{variable}\n\n###\n\n\nPrimeiro, analise cuidadosamente o texto em um rascunho. Depois, responda: a solicitação citada menciona a necessidade de comprar remédios ou itens de saúde? Responda "<<SIM>>" ou "<<NÃO>>".""",
)

- `variable`
id (str): name of the test case
content (str): text of snippet to be inserted in prompt
expected (str list): values that would be considered correct
options (str list): all values that the response could take
- `variable`
id (str): name of the test case
content (str): text of snippet to be inserted in prompt
expected (str list): values that would be considered correct
options (str list): all values that the response could take

Variable(
id="despesas_essenciais",
Expand Down
2 changes: 1 addition & 1 deletion main.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
PROMPTS = prompts
VARIABLES = variables
TEMPERATURE = 0.0
GRADING_TYPE = "out_of_ten"
GRADING_TYPE = "binary"


def main():
Expand Down
22 changes: 22 additions & 0 deletions outputs/20240507130128.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
- !!python/object:utils.my_types.Result
id: 420eedbe4c17403792cba9e9ef287676
model: llama3
prompt_id: menciona_0
variable_id: insegurança_alimentar
expected:
- <<C>>
response: 'Analisando o texto, posso dizer que:
* Não há menção a remédios ou itens de saúde, portanto não é necessário comprar
nada relacionado à saúde.
* A situação descrita é uma situação de insegurança alimentar e falta de fonte
de renda, o que pode ser considerado um evento imprevisto e excepcional. No entanto,
a busca por benefício nesse caso é para obter ajuda financeira e não relacionada
à saúde.
* Portanto, minha resposta é: <<C>> (DEFERIDO)'
grade: 1
time_elapsed: 34.49
error: null

0 comments on commit ab0bd53

Please sign in to comment.