---
description: Paragraph-wise classification with parallelism
---
Let’s start by just classifying whether the first paragraph answers the question. To do this, we’ll use a new agent method, `classify`. It takes a prompt and a list of choices, and returns a probability for each choice and, for some agent implementations, an explanation.
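Before using it on papers, here is a minimal standalone sketch of what a `classify` call returns (the example prompt and the probabilities in the comment are made up for illustration):

```python
from ice.recipe import recipe


async def classify_example():
    # classify returns a pair: a mapping from each choice to its
    # probability, plus an agent-dependent explanation (often unused).
    choice_probs, explanation = await recipe.agent().classify(
        prompt="Question: Is water wet? Say Yes or No.\nAnswer:",
        choices=(" Yes", " No"),
    )
    return choice_probs  # e.g. {" Yes": 0.83, " No": 0.17}


recipe.main(classify_example)
```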
Our single-paragraph classifier looks like this:
{% code title="paper_qa_class.py" %}
from fvalues import F
from ice.paper import Paper
from ice.paper import Paragraph
from ice.recipe import recipe
def make_prompt(paragraph: Paragraph, question: str) -> str:
return F(
f"""
Here is a paragraph from a research paper: "{paragraph}"
Question: Does this paragraph answer the question '{question}'? Say Yes or No.
Answer:"""
).strip()
async def classify_paragraph(paragraph: Paragraph, question: str) -> float:
choice_probs, _ = await recipe.agent().classify(
prompt=make_prompt(paragraph, question),
choices=(" Yes", " No"),
)
return choice_probs.get(" Yes", 0.0)
async def answer_for_paper(paper: Paper, question: str):
paragraph = paper.paragraphs[0]
return await classify_paragraph(paragraph, question)
recipe.main(answer_for_paper)
{% endcode %}
Save it and run it on a paper:
{% code overflow="wrap" %}
```shell
python paper_qa_class.py --paper papers/keenan-2018.pdf --question "What was the study population?"
```
{% endcode %}
You should see a result like this:
```
0.024985359096987403
```
The trace looks simple:
Execution trace (view online)
According to the model, the first paragraph is unlikely to answer the question: it puts the probability of "Yes" at only about 2.5%.
To find the most relevant paragraphs, we map the paragraph classifier over all paragraphs and keep the ones with the highest probabilities. For mapping, we use the utility `map_async`, which runs the language model calls in parallel:
{% code title="paper_qa_classes.py" %}
from fvalues import F
from ice.paper import Paper
from ice.paper import Paragraph
from ice.recipe import recipe
from ice.utils import map_async
def make_prompt(paragraph: Paragraph, question: str) -> str:
return F(
f"""Here is a paragraph from a research paper: "{paragraph}"
Question: Does this paragraph answer the question '{question}'? Say Yes or No.
Answer:"""
)
async def classify_paragraph(paragraph: Paragraph, question: str) -> float:
choice_probs, _ = await recipe.agent().classify(
prompt=make_prompt(paragraph, question),
choices=(" Yes", " No"),
)
return choice_probs.get(" Yes", 0.0)
async def answer_for_paper(
paper: Paper, question: str = "What was the study population?"
):
probs = await map_async(
paper.paragraphs, lambda par: classify_paragraph(par, question)
)
return probs
recipe.main(answer_for_paper)
{% endcode %}
You will now see a list of probabilities, one for each paragraph:
```
[
    0.024381454349145293,
    0.24823367447526778,
    0.21119208211186247,
    0.07488850139282821,
    0.16529937276656714,
    0.46596912974665494,
    0.09871877171479271,
    ...
    0.06523843237521842,
    0.041946178310281246,
    0.03264635093381785,
    0.023112249840077093,
    0.0018325902029144858,
    0.15962813987814772
]
```
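One detail the next step relies on: `map_async` returns its results in the same order as the inputs, so the i-th probability corresponds to the i-th paragraph. Here is a minimal standalone sketch of that behavior (the `double` helper is made up for illustration):

```python
from ice.recipe import recipe
from ice.utils import map_async


async def double(x: int) -> int:
    # Stand-in for an expensive async call like classify_paragraph.
    return x * 2


async def map_example():
    # The calls run concurrently, and results come back in input order,
    # so results[i] corresponds to the i-th input item.
    return await map_async([1, 2, 3], double)  # [2, 4, 6]


recipe.main(map_example)
```

The ranking code below depends on this ordering when it zips the probabilities back together with `paper.paragraphs`.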
Now all we need to do is sort the paragraphs by probability and return the ones most likely to answer the question:
{% code title="paper_qa_ranker.py" %}
from fvalues import F
from ice.paper import Paper
from ice.paper import Paragraph
from ice.recipe import recipe
from ice.utils import map_async
def make_classification_prompt(paragraph: Paragraph, question: str) -> str:
return F(
f"""Here is a paragraph from a research paper: "{paragraph}"
Question: Does this paragraph answer the question '{question}'? Say Yes or No.
Answer:"""
)
async def classify_paragraph(paragraph: Paragraph, question: str) -> float:
choice_probs, _ = await recipe.agent().classify(
prompt=make_classification_prompt(paragraph, question),
choices=(" Yes", " No"),
)
return choice_probs.get(" Yes", 0)
async def answer_for_paper(
paper: Paper, question: str, top_n: int = 3
) -> list[Paragraph]:
probs = await map_async(
paper.paragraphs, lambda par: classify_paragraph(par, question)
)
sorted_pairs = sorted(
zip(paper.paragraphs, probs), key=lambda x: x[1], reverse=True
)
return [par for par, prob in sorted_pairs[:top_n]]
recipe.main(answer_for_paper)
{% endcode %}
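Sorting all the pairs is perfectly fine at paper scale; as an aside, the standard library's `heapq.nlargest` can do the same top-`n` selection without a full sort (a sketch, not part of the tutorial files):

```python
import heapq

from ice.paper import Paragraph


def top_paragraphs(
    paragraphs: list[Paragraph], probs: list[float], top_n: int = 3
) -> list[Paragraph]:
    # nlargest picks the top_n (paragraph, prob) pairs by probability
    # without sorting the entire list.
    pairs = heapq.nlargest(top_n, zip(paragraphs, probs), key=lambda pair: pair[1])
    return [par for par, _ in pairs]
```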
Running the same command again…
{% code overflow="wrap" %}
```shell
python paper_qa_ranker.py --paper papers/keenan-2018.pdf --question "What was the study population?"
```
{% endcode %}
…we indeed get paragraphs that answer the question about the study population!
{% code overflow="wrap" %}
```
[
    Paragraph(sentences=['A total of 1624 communities were eligible for inclusion in the trial on the basis of the most recent census (Fig. 1 ).', 'A random selection of 1533 communities were included in the current trial, and the remaining 91 were enrolled in smaller parallel trials at each site, in which additional microbiologic, anthropometric, and adverse-event data were collected.', 'In Niger, 1 community declined to participate and 20 were excluded because of census inaccuracies.', 'No randomization units were lost to follow-up after the initial census.'], sections=[Section(title='Participating Communities', number=None)], section_type='main'),
    ...
]
```
{% endcode %}
The trace:
Execution trace (view online)