Update INSTRUCTOR-Embedders examples with refactored Document class #63

Merged
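
The field rename this PR applies to the examples (`text` to `content`, `metadata` to `meta`) can be sketched as a small migration helper for code that still builds documents from old-style keyword arguments. This is only a minimal sketch under stated assumptions: `StandInDocument`, `RENAMED_FIELDS`, and `migrate_document_kwargs` are hypothetical names introduced for illustration and are not part of the library or of this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical rename map mirroring the field changes shown in this diff.
RENAMED_FIELDS = {"text": "content", "metadata": "meta"}


@dataclass
class StandInDocument:
    """Illustrative stand-in only; NOT the library's Document class."""

    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def migrate_document_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of old_kwargs with old field names mapped to the new ones."""
    return {RENAMED_FIELDS.get(key, key): value for key, value in old_kwargs.items()}


# Old-style keyword arguments, as used in the examples before this change.
old_style = {
    "text": "Disturbed sleep is associated with mood disorders.",
    "metadata": {"pubid": "25,451,441", "long_answer": "yes"},
}

doc = StandInDocument(**migrate_document_kwargs(old_style))
print(doc.content)
print(doc.meta)
```

Keys that are not in the rename map pass through unchanged, so the helper is safe to apply to arguments that already use the new names.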
26 changes: 15 additions & 11 deletions integrations/instructor-embedder.md
@@ -88,6 +88,10 @@ text_embedder = InstructorTextEmbedder(
     model_name_or_path="hkunlp/instructor-base", instruction=instruction,
     device="cpu"
 )
+text_embedder.warm_up()
+result = text_embedder.run(text)
+print(f"Embedding: {result['embedding']}")
+print(f"Embedding Dimension: {len(result['embedding'])}")
 ```
 
 ### Using the Document Embedder
@@ -111,30 +115,30 @@ doc_embedder.warm_up()
 # Text taken from PubMed QA Dataset (https://huggingface.co/datasets/pubmed_qa)
 document_list = [
     Document(
-        text="Oxidative stress generated within inflammatory joints can produce autoimmune phenomena and joint destruction. Radical species with oxidative activity, including reactive nitrogen species, represent mediators of inflammation and cartilage damage.",
-        metadata={
+        content="Oxidative stress generated within inflammatory joints can produce autoimmune phenomena and joint destruction. Radical species with oxidative activity, including reactive nitrogen species, represent mediators of inflammation and cartilage damage.",
+        meta={
             "pubid": "25,445,628",
             "long_answer": "yes",
         },
     ),
     Document(
-        text="Plasma levels of pancreatic polypeptide (PP) rise upon food intake. Although other pancreatic islet hormones, such as insulin and glucagon, have been extensively investigated, PP secretion and actions are still poorly understood.",
-        metadata={
+        content="Plasma levels of pancreatic polypeptide (PP) rise upon food intake. Although other pancreatic islet hormones, such as insulin and glucagon, have been extensively investigated, PP secretion and actions are still poorly understood.",
+        meta={
             "pubid": "25,445,712",
             "long_answer": "yes",
         },
     ),
     Document(
-        text="Disturbed sleep is associated with mood disorders. Both depression and insomnia may increase the risk of disability retirement. The longitudinal links among insomnia, depression and work incapacity are poorly known.",
-        metadata={
+        content="Disturbed sleep is associated with mood disorders. Both depression and insomnia may increase the risk of disability retirement. The longitudinal links among insomnia, depression and work incapacity are poorly known.",
+        meta={
             "pubid": "25,451,441",
             "long_answer": "yes",
         },
     ),
 ]
 
 result = doc_embedder.run(document_list)
-print(f"Document Text: {result['documents'][0].text}")
+print(f"Document Text: {result['documents'][0].content}")
 print(f"Document Embedding: {result['documents'][0].embedding}")
 print(f"Embedding Dimension: {len(result['documents'][0].embedding)}")
 ```
@@ -187,8 +191,8 @@ dataset = load_dataset("xsum", split="train")
 # Create Document objects from the dataset and add them to the document store using the indexing pipeline
 docs = [
     Document(
-        text=doc["document"],
-        metadata={
+        content=doc["document"],
+        meta={
             "summary": doc["summary"],
             "doc_id": doc["id"],
         },
@@ -236,8 +240,8 @@ results = query_pipeline.run(

 # Print information about retrieved documents
 for doc in results["Retriever"]["documents"]:
-    print(f"Text:\n{doc.text[:150]}...\n")
-    print(f"Metadata: {doc.metadata}")
+    print(f"Text:\n{doc.content[:150]}...\n")
+    print(f"Metadata: {doc.meta}")
     print(f"Score: {doc.score}")
     print("-" * 10 + "\n")
 ```