From cfd4b84775d67d8902f1cdce6e16b1eac063ab45 Mon Sep 17 00:00:00 2001
From: Abhinav N <114095553+Abhinav-Naikawadi@users.noreply.github.com>
Date: Tue, 6 Aug 2024 13:34:28 -0700
Subject: [PATCH] Create task docs update (#39)

* Update docs for task creation

* remove task type

* merge enrichments and fields

* fallback value docs (#40)

* Revamp create task docs

---------

Co-authored-by: Dhruva Bansal
Co-authored-by: Nihit
---
 docs/python-sdk.md | 55 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/docs/python-sdk.md b/docs/python-sdk.md
index fd80edd..dd7e90c 100644
--- a/docs/python-sdk.md
+++ b/docs/python-sdk.md
@@ -367,7 +367,7 @@ These functions let you retrieve information about labeling tasks defined within
 
 ### Define a new Task
 
-You can create a new task programmatically within a given project as follows:
+You can create a new task programmatically within a given project using the `create_task` function:
 
 ```python
 import refuel
@@ -381,33 +381,64 @@ refuel_client = refuel.init(**options)
 
 refuel_client.create_task(
     task='',
-    task_type='',
     dataset='',
-    input_columns=['col 1', 'col 2' ...],
-    context = '...',
-    fields = [{'name': '...', 'guidelines': ...}],
-    model = '...'
+    context = 'You are an expert at analyzing sentiment of an online review about a business...',
+    fields = [
+        {
+            'name': '...',
+            'type': '...',
+            'guidelines': '...',
+            'labels': [...],
+            'input_columns': [...],
+            'fallback_value': '...'
+        },
+        ...
+    ],
+    model = 'GPT-4 Turbo'
 )
 ```
 
-- `task_type` is one of: `classification`, `multilabel_classification` or `attribute_extraction`
-- `input_columns` is the subset of columns from the dataset that will be used as input for LLM
-- `fields` is a list of dictionaries. Each dictionary contains a fixed set of keys: `name` (name of the LLM label field as it will be appear in the exported dataset), `guidelines` (labeling guidelines for the LLM) and `labels` (list of valid labels, this field is only required for classification type tasks)
-- `model` is an optional parameter to select the LLM that will be used for this task. If not specified, we will use the default LLM set for your team. Here is the list of LLMs currently supported (use the model name as the parameter value):
+Some details about the various parameters you see in the function signature above:
+
+| Parameter | Is Required | Default Value | Comments |
+| :------------- | :-----------| :-------------| :------- |
+| `task` | Yes | None | Name of the new task you're creating |
+| `dataset` | Yes | None | Dataset (in Refuel) for which you are defining this task |
+| `context` | Yes | None | Context is a high level description of the problem domain and the dataset that the LLM will be working with. It typically starts with something like 'You are an expert at ...' |
+| `fields` | Yes | None | This is a list of dictionaries. Each entry in this list defines an output field generated in the task. See below for details about the schema of each field |
+| `model` | No | team default | LLM that will be used for this task. If not specified, we will use the default LLM set for your team, e.g. GPT-4 Turbo |
+
+
+Next, let's take a look at the schema of each entry in the `fields` list above:
+
+| Parameter | Is Required | Default Value | Comments |
+| :------------- | :-----------| :-------------| :------- |
+| `name` | Yes | None | Name of the output field, e.g. `llm_predicted_sentiment` |
+| `type` | Yes | None | Type of output field. This is one of: [`classification`, `multilabel_classification`, `attribute_extraction`, `webpage_transform`, `web_search`] |
+| `guidelines` | Yes | None | Output guidelines for the LLM for this field. Note that if the field type is a `web_search` type, the guidelines will be simply the query template |
+| `labels` | Yes (for classification field types) | None | list of valid labels, this field is only required for classification type tasks |
+| `input_columns` | Yes | None | Columns from the dataset to use as input when passing a "row" in the dataset to the LLM.|
+| `ground_truth_column`| No | None | A column in the dataset that contains ground truth value for this field, if one exists. Note this is an optional parameter. |
+| `fallback_value` | No | None | A fallback/default value that the LLM should generate for this field if a row cannot be processed successfully |
+
+Finally, here is the list of LLMs currently supported (use the model name as the parameter value):
 
 | Provider | Name |
 | :---------- | :---|
 | OpenAI | GPT-4 Turbo |
+| OpenAI | GPT-4o |
+| OpenAI | GPT-4o mini |
 | OpenAI | GPT-4 |
 | OpenAI | GPT-3.5 Turbo |
-| OpenAI | GPT-3.5 Turbo (16K) |
+| Anthropic | Claude 3.5 (Sonnet) |
 | Anthropic | Claude 3 (Opus) |
-| Anthropic | Claude 3 (Sonnet) |
 | Anthropic | Claude 3 (Haiku) |
 | Google | Gemini 1.5 (Pro) |
 | Mistral | Mistral Small |
 | Mistral | Mistral Large |
+| Refuel | Refuel LLM-2 |
+| Refuel | Refuel LLM-2-small |
 
 ### Get Tasks
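
Note on the `fields` schema introduced by this patch: the sketch below shows one way a single `fields` entry could be checked against the documented keys before it is passed to `create_task`. It is an illustration only. `validate_field`, the constant names, and the `review_text` column are hypothetical and not part of the Refuel SDK, and reading "classification field types" as `classification` plus `multilabel_classification` is an assumption.

```python
# Hypothetical helper, not part of the Refuel SDK: checks one `fields` entry
# against the keys documented in the patch above.
FIELD_TYPES = {
    "classification",
    "multilabel_classification",
    "attribute_extraction",
    "webpage_transform",
    "web_search",
}
REQUIRED_KEYS = ("name", "type", "guidelines", "input_columns")
OPTIONAL_KEYS = ("labels", "ground_truth_column", "fallback_value")
# Assumption: "classification field types" covers these two types.
LABELLED_TYPES = {"classification", "multilabel_classification"}


def validate_field(field):
    """Return a list of problems found in one `fields` entry (empty if none)."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in field
    ]
    unknown = sorted(set(field) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS))
    problems += [f"unknown key: {key}" for key in unknown]
    field_type = field.get("type")
    if field_type is not None and field_type not in FIELD_TYPES:
        problems.append(f"unsupported type: {field_type}")
    if field_type in LABELLED_TYPES and not field.get("labels"):
        problems.append("labels are required for classification field types")
    return problems


# Example entry for a sentiment task; the column name is made up.
sentiment_field = {
    "name": "llm_predicted_sentiment",
    "type": "classification",
    "guidelines": "Classify the sentiment of the review.",
    "labels": ["positive", "negative", "neutral"],
    "input_columns": ["review_text"],
    "fallback_value": "neutral",
}

print(validate_field(sentiment_field))
```

A list that passes this check could then be supplied as the `fields` argument in the `create_task` call shown in the patch. The server may enforce additional rules that this sketch does not know about.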