Replies: 3 comments 1 reply
-
I don't feel there is a clear solution on how to do this well. Would be great to hear from other people about what has and hasn't worked for them. |
Beta Was this translation helpful? Give feedback.
-
Hi, the co-pilot feature I'm currently developing processes multiple queries to gather extensive information on a topic from the web. We use SearxNG, a meta-search engine, which returns snippets based on keyword searches. For all processed queries, we likely have comprehensive data on the topic. This data is further cleaned and reranked using similarity searches and certain thresholds. If you're specifically referring to scraping entire web pages, I've already explored this approach. Although I managed to achieve the lowest possible latency, the process of performing multiple queries, visiting pages, extracting information, and reranking it was not only time-consuming but also computationally expensive. I'll convert this into a discussion rather than an issue for now. |
Beta Was this translation helpful? Give feedback.
-
If you're able to solve this problem, it would put your project at the centre of open source web agents for sure. It's a problem we've been grappling with for AutoGPT and do have a solution, just wondering if a project focused specifically on web retrieval will be able to solve it even better. SearxNG is pretty fantastic! ❤️ |
Beta Was this translation helpful? Give feedback.
-
Is your feature request related to a problem? Please describe.
Currently, the question-answering system relies on the context provided by the search results alone. This can sometimes lead to incomplete or less accurate answers, as the context may not be comprehensive enough.
Describe the solution you'd like
Implement a feature that allows the system to open the web pages from the search results and process their content to enhance the context used for answering questions. This would involve:
Describe alternatives you've considered
Additional context
Implementing this feature would greatly improve the question-answering capabilities of the system by providing a richer context derived from the actual content of the web pages.
Beta Was this translation helpful? Give feedback.
All reactions