Sends an AI request to supported Large Language Models (LLMs) and returns extracted data as a set of key-value pairs. For this request, you need to define a schema yourself.
Here is an example of how to define an element for the extraction schema:
CreateAiExtractStructuredFields(
    type=FieldType.DATE,
    key="lease_start_date",
    prompt="Lease Start Date in YYYY-MM-DD format",
)
The CreateAiExtractStructuredFields class has the following parameters:
- key: str, the key of the extracted field
- description: Optional[str] = None, a description of the field
- display_name: Optional[str] = None, a display name for the field
- prompt: Optional[str] = None, a prompt for the field
- type: Optional[str] = None, the type of the field
- options: Optional[List[CreateAiExtractStructuredFieldsOptionsField]] = None, options for the field
Each schema element serves two purposes:
- It defines the output key and format for the extracted data
- It provides a prompt for the AI model to understand what you are looking for
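To make these two roles concrete, here is a minimal sketch that uses a plain dataclass as a stand-in for the SDK class. `FieldSketch` and `to_payload` are hypothetical names used only for illustration, `options` is simplified to a list of strings, and the resulting dictionary is an assumption about what a field definition boils down to, not the SDK's actual serialization.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class FieldSketch:
    """Hypothetical stand-in mirroring the parameters listed above (not the SDK class)."""

    key: str
    description: Optional[str] = None
    display_name: Optional[str] = None
    prompt: Optional[str] = None
    type: Optional[str] = None
    options: Optional[List[str]] = None  # simplified: the SDK uses option objects


def to_payload(field: FieldSketch) -> dict:
    """Keep only the parameters that were actually set."""
    return {name: value for name, value in asdict(field).items() if value is not None}


lease_start = FieldSketch(
    key="lease_start_date",
    type="date",
    prompt="Lease Start Date in YYYY-MM-DD format",
)

payload = to_payload(lease_start)
print(payload)
```

In this sketch, `key` is the name under which the value would appear in the extracted data, while `prompt` carries the instruction for the model, including the expected format.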
You can define multiple fields for the schema and then use it in the AI request:
# Documents
items = [AiItemBase(id=file.id, type="file") for file in hab_files]
item = items[0]
# Build the structured prompt
fields: List[CreateAiExtractStructuredFields] = []
fields.append(
    CreateAiExtractStructuredFields(
        type=FieldType.STRING, key="property_id", prompt="Property id"
    )
)
fields.append(
    CreateAiExtractStructuredFields(
        type=FieldType.STRING,
        key="property_type",
        prompt="Property type",
        options=[
            CreateAiExtractStructuredFieldsOptionsField(key="HAB-1"),
            CreateAiExtractStructuredFieldsOptionsField(key="HAB-2"),
            CreateAiExtractStructuredFieldsOptionsField(key="HAB-3"),
        ],
    )
)
# ... many more fields ...
# Extract data from single document
ai_response: AiExtractResponse = client.ai.create_ai_extract_structured(
    items=[item], fields=fields
)
print_ai_response("Extract data from document", ai_response)
In my case, the above example returned:
Prompt: Extract data from document
Answer:
{'property_address': 'Schiaparelli Plaza Property', 'number_of_bedrooms': 1, 'lessee_email': '[email protected]', 'lessee_name': 'Marie Tharp', 'agreement_date': '2024-04-24', 'lease_start_date': '2024-05-01', 'lease_end_date': '2027-04-30', 'monthly_rent': 3125, 'property_type': 'HAB-1', 'lessor_name': 'Schiaparelli plaza', 'agreement_term': 3, 'property_id': 'HAB-1-01'}
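Because these values are produced by a model, it can be worth checking them before using them downstream. The sketch below assumes the answer is available as a plain dictionary like the one printed above (only a subset of the keys is copied here). The `normalize` helper and the constant names are hypothetical and not part of the workshop code.

```python
from datetime import date

# Subset of the answer printed above.
answer = {
    "agreement_date": "2024-04-24",
    "lease_start_date": "2024-05-01",
    "lease_end_date": "2027-04-30",
    "monthly_rent": 3125,
    "property_type": "HAB-1",
    "agreement_term": 3,
    "property_id": "HAB-1-01",
}

# Options declared for property_type in the schema.
ALLOWED_PROPERTY_TYPES = {"HAB-1", "HAB-2", "HAB-3"}
DATE_KEYS = ("agreement_date", "lease_start_date", "lease_end_date")


def normalize(raw: dict) -> dict:
    """Parse date strings and check property_type against the declared options."""
    result = dict(raw)
    for key in DATE_KEYS:
        if key in result:
            # fromisoformat raises ValueError if the value is not a valid ISO date
            result[key] = date.fromisoformat(result[key])
    if result.get("property_type") not in ALLOWED_PROPERTY_TYPES:
        raise ValueError(f"Unexpected property_type: {result.get('property_type')!r}")
    return result


lease = normalize(answer)
print(lease["lease_start_date"], lease["lease_end_date"])
```

Converting the date strings to `date` objects makes it straightforward to compare or subtract them later, and the options check catches a value outside the list declared in the schema.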
Go ahead and execute the script src/box_ai_extract_structured.py to see the data extraction process.
Take a look at the workshop script to see how to extract data from multiple documents.
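As a rough idea of what a multi-document run could look like, the sketch below calls an extraction function once per file id and collects the results. This is not necessarily how the workshop script does it. `extract_from_each` and `fake_extract` are hypothetical names, and `fake_extract` only stands in for the real Box call so that the example runs without a client.

```python
from typing import Any, Callable, Dict, Iterable


def extract_from_each(
    item_ids: Iterable[str],
    extract_one: Callable[[str], Any],
) -> Dict[str, Any]:
    """Run one extraction per file id and collect the results by id."""
    results: Dict[str, Any] = {}
    for item_id in item_ids:
        try:
            results[item_id] = extract_one(item_id)
        except Exception as error:  # keep going if a single document fails
            results[item_id] = {"error": str(error)}
    return results


# Hypothetical stand-in so the sketch runs without a Box client. With the SDK,
# extract_one would wrap the call shown earlier, for example:
#   client.ai.create_ai_extract_structured(
#       items=[AiItemBase(id=item_id, type="file")], fields=fields
#   )
def fake_extract(item_id: str) -> dict:
    return {"property_id": f"HAB-1-{item_id}"}


all_results = extract_from_each(["01", "02"], fake_extract)
print(all_results)
```

Passing the extraction call in as a function keeps the loop independent of the client, so the same schema can be reused across documents and a failure on one file does not stop the rest.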