Figure out how to get the database to properly recognise author papers #5
Comments
Tried prepending the article's title and authors to each chunk, but it didn't help at all.
Detailed example
A few chunks for example (all three chunks taken from the same paper authored by Julie Iskander):
But when querying:
python query_data.py "Can you give me the title of any papers authored by Julie Iskander please"
The corresponding prompt with the "relevant" chunks:
And Ollama (Mistral) answers:
When querying the database for papers authored by Julie Iskander, the Chroma DB similarity search failed to notice that "Julie Iskander" was in the prepended author list. Changing the surrounding text didn't change anything either, e.g. printing a dict rather than a sentence didn't really help. Alternatively, I could probably use metadata filtering instead: https://python.langchain.com/v0.2/docs/integrations/vectorstores/chroma/#filtering-on-metadata
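To make the metadata-filtering idea concrete, here is a minimal sketch in plain Python, so it runs without Chroma or an embedding model. Everything in it is an assumption for illustration: the `Chunk` class, the `overlap_score` stand-in for embedding similarity, the `search` helper, and the two hypothetical chunks. With Chroma the equivalent would presumably be the `filter` argument to the similarity search described in the linked docs, not this code.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical container: chunk text plus metadata kept outside the text.
    text: str
    metadata: dict = field(default_factory=dict)


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for embedding similarity: share of query words found in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def search(chunks, query, k=3, author=None):
    """Rank chunks by the toy score, optionally keeping only chunks by `author`."""
    if author is not None:
        # Metadata filter: applied before ranking, independent of the chunk text.
        chunks = [c for c in chunks if author in c.metadata.get("authors", [])]
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c.text), reverse=True)
    return ranked[:k]


# Hypothetical data: the relevant chunk never mentions its author in the text.
chunks = [
    Chunk("body text from the middle of a paper with no author names",
          {"title": "Paper A (hypothetical)", "authors": ["Julie Iskander"]}),
    Chunk("papers authored by many people are common",
          {"title": "Paper B (hypothetical)", "authors": ["Someone Else"]}),
]

query = "papers authored by Julie Iskander"
unfiltered = search(chunks, query, k=1)
filtered = search(chunks, query, k=1, author="Julie Iskander")
print(unfiltered[0].metadata["title"])
print(filtered[0].metadata["title"])
```

The point of the sketch is that the filter does not rely on the similarity score at all: the unfiltered search ranks the wrong paper first because its text happens to share words with the query, while the filtered search can only return chunks whose metadata lists the author.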
I wanted to try other models that might perform better with publication documents (e.g. This led me to try and use Looking at the MTEB leaderboard, I tried Prompt:
Response:
Unlike using
Ok, I've now understood that to use HuggingFaceEmbeddings, the models have to have a
Currently, the database doesn't seem to retrieve papers based on their author(s), e.g.:
python query_data.py "Do you know of any papers authored by Edward Yang?"
Which is partially right, as it's one of the papers included in the dataset. Interestingly,
data/1-s2.0-S0266352X20300379-main-1.pdf
(my other paper included in the dataset) was thought to be more relevant but was not mentioned by the LLM, probably because the database returned a chunk from later in the paper.
Another example:
python query_data.py "Do you know of any papers authored by Michael Milton?"
Need to figure out how to get the database to return/recognise author information.
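One possible direction, sketched under assumptions: detect author names in the query before hitting the database, and only then decide whether to restrict retrieval to that author's chunks. The `find_authors_in_query` helper and the `known_authors` list are hypothetical; in practice the list would have to be collected from the papers at ingestion time.

```python
def find_authors_in_query(query, known_authors):
    """Return the known author names that appear in the query (case-insensitive)."""
    lowered = query.lower()
    return [name for name in known_authors if name.lower() in lowered]


# Hypothetical author list, e.g. gathered while loading the PDFs.
known_authors = ["Julie Iskander", "Edward Yang", "Michael Milton"]

query = "Do you know of any papers authored by Edward Yang?"
matches = find_authors_in_query(query, known_authors)
print(matches)

# If `matches` is non-empty, retrieval could be restricted to chunks by those
# authors; otherwise fall back to the plain similarity search.
```

This is plain substring matching, so it would miss initials, misspellings, or reordered names; it is only meant to show where an author-aware step could sit in front of the similarity search.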