Add testing example #105

Merged
39 changes: 39 additions & 0 deletions .github/workflows/autoblocks-testing.yml
@@ -0,0 +1,39 @@
name: Autoblocks Testing

on:
  push: # Run on every push.
  schedule: # Run every day at ~7:17am PST.
    - cron: '17 15 * * *'

jobs:
  py:
    runs-on: ubuntu-latest

    defaults:
      run:
        shell: bash
        working-directory: Python/testing-sdk

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install poetry
        run: curl -sSL https://install.python-poetry.org | python3 -

      - name: Check pyproject.toml & poetry.lock are in sync
        run: poetry lock --check

      - name: Install dependencies
        run: poetry install

      - name: Run Autoblocks tests
        run: npx autoblocks testing exec -- poetry run start
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          AUTOBLOCKS_API_KEY: ${{ secrets.AUTOBLOCKS_API_KEY }}
115 changes: 115 additions & 0 deletions Python/testing-sdk/README.md
@@ -0,0 +1,115 @@
<!-- banner start -->
<p align="center">
<img src="https://app.autoblocks.ai/images/logo.png" width="300px">
</p>

<p align="center">
📚
<a href="https://docs.autoblocks.ai/">Documentation</a>
&nbsp;
&nbsp;
🖥️
<a href="https://app.autoblocks.ai/">Application</a>
&nbsp;
&nbsp;
🏠
<a href="https://www.autoblocks.ai/">Home</a>
</p>
<!-- banner end -->

## Setup

### Install [`poetry`](https://python-poetry.org/)

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

### Install [`npm`](https://docs.npmjs.com/about-npm)

> **_NOTE:_** You might already have this installed. Check with `npm -v`.

If you don't have `node` or `npm` installed, we recommend using `nvm` to install them:

#### Install [`nvm`](https://github.com/nvm-sh/nvm)

```bash
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
```

#### Install `node` and `npm`

```bash
nvm install node
```

#### Set the default version when starting a new shell

```bash
nvm alias default node
```

### Install dependencies

```bash
poetry install
```

## Run Autoblocks tests

### Set your Autoblocks API key

Retrieve your **local testing API key** from the [settings page](https://app.autoblocks.ai/settings/api-keys) and set it as an environment variable:

```bash
export AUTOBLOCKS_API_KEY=...
```

### Set your OpenAI API key

```bash
export OPENAI_API_KEY=...
```

### Run the tests

```bash
npx autoblocks testing exec -m "my first run" -- poetry run start
```

You should see something like:

<img width="1668" alt="Screenshot 2024-02-22 at 1 23 50 PM" src="https://github.com/autoblocksai/autoblocks-examples/assets/7498009/ded25aa8-1439-432d-86a6-7254b27b970b">

You can click on the links next to each test name to dig into more details.
You can also find all of your tests on the testing homepage in the [Autoblocks application](https://app.autoblocks.ai/testing/local).

## GitHub Actions setup

A starter workflow was added in [`.github/workflows/autoblocks-testing.yml`](./.github/workflows/autoblocks-testing.yml).
This workflow runs the tests on every push to the repository and also
on a daily schedule.

## Repo structure

```
my_project/
  run.py              <-- imports all test suites from test_suites/ and runs them
  evaluators/         <-- all common evaluators are implemented here
    some_shared_evaluator1.py
    some_shared_evaluator2.py
  tasks/              <-- all "tasks" are implemented here
    task1.py
    task2.py
  test_suites/        <-- tests for each task
    task1/
      __init__.py     <-- implements the runner for task1
      evaluators.py   <-- evaluators used only for task1
      test_cases.py   <-- contains test cases for task1
    task2/
      __init__.py     <-- implements the runner for task2
      evaluators.py   <-- evaluators used only for task2
      test_cases.py   <-- contains test cases for task2
```
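For orientation, each `test_suites/<task>/__init__.py` ends up looking roughly like the sketch below: a test case dataclass plus a `run()` that hands the test cases, evaluators, and task function to the SDK. This is a minimal sketch, not the repo's actual code; the `run_test_suite` import path and parameter names are assumptions based on the Autoblocks testing SDK, and `task1` is a hypothetical task module.

```python
import dataclasses
from typing import List

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.run import run_test_suite  # assumed import path

from my_project.tasks import task1  # hypothetical task module with a run(notes) entry point


@dataclasses.dataclass
class Task1TestCase(BaseTestCase):
    notes: str
    expected_substrings: List[str]

    def hash(self) -> str:
        # Assumed: the SDK uses this value to uniquely identify the test case.
        return self.notes


def run():
    run_test_suite(
        id="task1",  # parameter names here are assumptions; check the SDK docs
        test_cases=[
            Task1TestCase(
                notes="Declaration of Independence, 1776.",
                expected_substrings=["1776"],
            ),
        ],
        evaluators=[],  # e.g. subclasses of the shared evaluators in my_project/evaluators/
        fn=lambda test_case: task1.run(test_case.notes),
    )
```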
61 changes: 61 additions & 0 deletions Python/testing-sdk/my_project/evaluators/has_substrings.py
@@ -0,0 +1,61 @@
import abc
from typing import Any
from typing import List
from typing import Optional

from autoblocks.testing.models import BaseEvaluator
from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold


class BaseHasSubstrings(BaseEvaluator, abc.ABC):
    id = "has-substrings"

    """
    Can be overridden by subclassing and changing this threshold. For example:

    class MyEvaluator(BaseHasSubstrings):
        threshold = None

    run_test_suite(
        ...
        evaluators=[
            MyEvaluator()
        ],
        ...
    )
    """
    threshold: Optional[Threshold] = Threshold(gte=1)

    @abc.abstractmethod
    def expected_substrings(self, test_case: BaseTestCase) -> List[str]:
        """
        Required to be implemented by the subclass.

        In most cases this will just return the field from the test case that contains the expected substrings,
        but since it's a method, it can be used to calculate the expected substrings in a more complex way
        if appropriate.

        For example:

        class MyEvaluator(BaseHasSubstrings):
            def expected_substrings(self, test_case: MyTestCase) -> List[str]:
                return test_case.expected_substrings
        """
        ...

    def output_as_str(self, output: Any) -> str:
        """
        Can be overridden by the subclass to change how the output is converted to a string.
        """
        return str(output)

    def evaluate(self, test_case: BaseTestCase, output: Any) -> Evaluation:
        expected_substrings = self.expected_substrings(test_case)
        output_as_str = self.output_as_str(output)

        for substring in expected_substrings:
            if substring not in output_as_str:
                return Evaluation(score=0, threshold=self.threshold)
        return Evaluation(score=1, threshold=self.threshold)
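When a task returns structured output rather than a plain string (the flashcard task later in this PR returns a list of dataclasses, for example), `output_as_str` is the hook to flatten it before the substring check runs. A hypothetical subclass, assuming the test case carries an `expected_substrings` field:

```python
from typing import Any
from typing import List

from my_project.evaluators.has_substrings import BaseHasSubstrings


class FlashcardsHaveSubstrings(BaseHasSubstrings):
    def expected_substrings(self, test_case: Any) -> List[str]:
        # Hypothetical: assumes the test case defines an `expected_substrings` field.
        return test_case.expected_substrings

    def output_as_str(self, output: Any) -> str:
        # Join the front and back of every generated card so the substring
        # check sees all of the generated text at once.
        return " ".join(f"{card.front} {card.back}" for card in output)
```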
47 changes: 47 additions & 0 deletions Python/testing-sdk/my_project/evaluators/is_valid_json.py
@@ -0,0 +1,47 @@
import json
from typing import Any
from typing import Optional

from autoblocks.testing.models import BaseEvaluator
from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold


class IsValidJson(BaseEvaluator):
    id = "is-valid-json"

    """
    Can be overridden by subclassing and changing this threshold. For example:

    class MyEvaluator(IsValidJson):
        threshold = None

    run_test_suite(
        ...
        evaluators=[
            MyEvaluator()
        ],
        ...
    )
    """
    threshold: Optional[Threshold] = Threshold(gte=1)

    def output_as_str(self, output: Any) -> str:
        """
        Can be overridden by the subclass to change how the output is converted to a string.

        For example:

        class MyEvaluator(IsValidJson):
            def output_as_str(self, output: SomeCustomOutputType) -> str:
                return output.as_json()
        """
        return str(output)

    def evaluate(self, test_case: BaseTestCase, output: Any) -> Evaluation:
        try:
            json.loads(self.output_as_str(output))
            return Evaluation(score=1, threshold=self.threshold)
        except json.JSONDecodeError:
            return Evaluation(score=0, threshold=self.threshold)
11 changes: 11 additions & 0 deletions Python/testing-sdk/my_project/run.py
@@ -0,0 +1,11 @@
from my_project.test_suites import flashcard_generator
from my_project.test_suites import study_guide_outline


def run():
    # Autoblocks handles running these tests asynchronously behind the scenes
    # in a dedicated event loop, so there is no need to add any concurrency
    # here or use asyncio.run() or similar. Simply call each test suite's
    # run() function and Autoblocks will handle the rest.
    flashcard_generator.run()
    study_guide_outline.run()
121 changes: 121 additions & 0 deletions Python/testing-sdk/my_project/tasks/flashcard_generator.py
@@ -0,0 +1,121 @@
import dataclasses
import json
from typing import List

from openai import OpenAI
from openai.types.chat.completion_create_params import ResponseFormat

openai_client = OpenAI()


@dataclasses.dataclass()
class Flashcard:
    front: str
    back: str


system_prompt = """Given a user's notes, generate flashcards that will allow the user to study those notes.

Your first task is to identify the facts or key points in the notes.
Then, create a flashcard for each fact or key point.
The front of the flashcard should be a question, and the back of the flashcard should be the answer to that question.
Each flashcard should be supported by content from the notes.
Ignore the tone of the notes and always make the flashcards in a professional tone.
Ignore any subjective commentary in the notes and only focus on the facts or key points.
Return the results as JSON in the below format:

```
{
  "cards": [
    {
      "front": "What is the capital of France?",
      "back": "Paris"
    },
    {
      "front": "Who painted the Mona Lisa?",
      "back": "Leonardo da Vinci"
    }
  ]
}

Only return JSON in your response, nothing else. Do not include the backticks.

Example:

Notes:

'''
Am. History Notes 🇺🇸
Beginnings & Stuff
Columbus 1492, "found" America but actually not the first.
Native Americans were here first, tons of diff cultures.
Colonies & Things
13 Colonies cuz Brits wanted $ and land.
Taxation w/o Representation = Colonists mad at British taxes, no say in gov.
Boston Tea Party = Tea in the harbor, major protest.
Revolution Time
Declaration of Independence, 1776, basically "we're breaking up with you, Britain".
George Washington = First pres, war hero.
Moving West
Manifest Destiny = Idea that the US was supposed to own all land coast to coast.
Louisiana Purchase, 1803, Thomas Jefferson bought a ton of land from France.
'''

Flashcards:

{
  "cards": [
    {
      "front": "Who was the first president of the United States?",
      "back": "George Washington"
    },
    {
      "front": "What was the idea that the US was supposed to own all land coast to coast?",
      "back": "Manifest Destiny"
    },
    {
      "front": "What was the year of the Louisiana Purchase?",
      "back": "1803"
    }
  ]
}
"""

user_prompt = """Notes:

'''
{notes}
'''

Flashcards:"""


def gen_flashcards_from_notes(notes: str) -> List[Flashcard]:
    """
    Generates flashcards based on a user's notes.
    """
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=0.0,
        response_format=ResponseFormat(type="json_object"),
        messages=[
            dict(
                role="system",
                content=system_prompt,
            ),
            dict(
                role="user",
                content=user_prompt.format(notes=notes),
            ),
        ],
    )
    raw_content = response.choices[0].message.content.strip()
    parsed_content = json.loads(raw_content)
    return [
        Flashcard(
            front=card["front"],
            back=card["back"],
        )
        for card in parsed_content["cards"]
    ]
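A quick way to exercise this task outside the test harness, assuming the project is installed with `poetry install` and `OPENAI_API_KEY` is exported:

```python
from my_project.tasks.flashcard_generator import gen_flashcards_from_notes

notes = """
Columbus 1492, "found" America but actually not the first.
Declaration of Independence, 1776.
Louisiana Purchase, 1803, Thomas Jefferson bought a ton of land from France.
"""

# Each returned Flashcard has a front (question) and back (answer).
for card in gen_flashcards_from_notes(notes):
    print(f"Q: {card.front}")
    print(f"A: {card.back}")
```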