Expose fetch_and_sort_papers() parameters in literature_review() #15

Open
sanjaynagi opened this issue Apr 16, 2023 · 6 comments

@sanjaynagi (Contributor)

I think it would be great to be able to configure the following in the literature_review() function (rough sketch of a possible signature below the list):

  • keyword combinations
  • year_range
  • top_n papers
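A rough sketch of what the signature could look like; the parameter names and defaults here are just my guesses, not the current AutoResearcher API:

```python
# Hypothetical signature sketch; parameter names and defaults are placeholders,
# not the current AutoResearcher API.
def literature_review(
    research_question,
    keyword_combinations=None,  # user-supplied search queries; skip keyword generation if given
    year_range=None,            # e.g. (2015, 2023), forwarded to the paper search
    top_n=20,                   # number of papers to keep after fetching and sorting
):
    ...
```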
@sanjaynagi (Contributor, Author)

Feel free to assign it to me, @eimenhmdt, though I'm sure you can handle that... :P

@eimenhmdt (Owner)

Thanks for the suggestions! What do you mean by keyword combinations?

Regarding the top_n papers, we need to be careful not to create a context that's too large, given the token limits of GPT-3.5. What are your thoughts on this?

@sanjaynagi (Contributor, Author)

I was thinking that, if users wanted to, they could pass their own keyword_combinations, which would be passed to Semantic Scholar, avoiding the first call to the OpenAI API :)
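Roughly what I have in mind, as a self-contained sketch; the helper names and bodies below are stand-ins, not the actual AutoResearcher internals:

```python
def generate_keyword_combinations(research_question):
    # stand-in for the step that currently calls the OpenAI API
    return [research_question]

def fetch_and_sort_papers(keyword_combinations, top_n=20):
    # stand-in for the existing Semantic Scholar search-and-sort step
    return []

def literature_review(research_question, keyword_combinations=None, top_n=20):
    if keyword_combinations is None:
        # current behaviour: generate search queries via the OpenAI API
        keyword_combinations = generate_keyword_combinations(research_question)
    # user-supplied combinations go straight to the Semantic Scholar search,
    # so the first OpenAI call is skipped entirely
    return fetch_and_sort_papers(keyword_combinations, top_n=top_n)
```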

That's true! Have you any idea what that context would look like at the moment in terms of number of papers? If I get time today I'll try running the algorithm with quite a few and see. We would probably want a limit anyway, so a user doesn't accidentally choose 1000 top papers or something and burn through their tokens.

@eimenhmdt (Owner)

Re keywords: I think that's a good idea. Many researchers probably already know very well which keyword combinations they want to use. If you want, it would be really cool if you could build this. :)

Re context: It depends on the model you use. Currently, AutoResearcher uses GPT-3.5 Turbo by default, but it's also possible to use GPT-4. If we want to extract useful and sufficient information from each paper, I think 20–25 papers (roughly double that for GPT-4) should be the maximum in the context. That said, I could also imagine a refined algorithm that circumvents these limitations, e.g. adding additional steps, using a vector store, etc.
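If we do expose top_n, a simple cap along these lines could stop users from burning through tokens; the numbers are just the rough limits from above, and the model names are illustrative:

```python
# Illustrative cap on top_n per model; the limits are the rough numbers
# discussed above, not measured values.
MAX_PAPERS = {"gpt-3.5-turbo": 25, "gpt-4": 50}

def clamp_top_n(top_n, model="gpt-3.5-turbo"):
    limit = MAX_PAPERS.get(model, 25)
    if top_n > limit:
        print(f"top_n={top_n} exceeds the rough context budget for {model}; using {limit} instead")
        return limit
    return top_n
```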

@sanjaynagi (Contributor, Author)

Interesting!

Sure, I've got my PhD viva in a few days, but I'll tackle it after that.

@eimenhmdt (Owner)

Oh, cool. Best of luck for your viva!!
