Add autogenerated messages #518
Aaaand the Lockfile looks broken - I'll dive into it
Hi @sorgfresser , Thanks for the PR. I've got to say, I have pretty mixed feelings about this. On the one hand I'm a bit worried about hastening the time when all actual content on the internet is drowned out by LLM garbage - https://marketoonist.com/2023/03/ai-written-ai-read.html . On the other hand, the market for apartments is already a trash market full of trash and flathunter exists to make that problem worse. I'll review the code anyway, and if we get it in a good shape I'm happy to merge it.
The code looks good - thanks for this. Some changes I would like to see from my side, and please take a look at CI to see what pyright and pylint want.
@@ -31,7 +31,7 @@ def launch_flat_hunt(config, heartbeat: Heartbeat):

     wait_during_period(time_from, time_till)

-    hunter = Hunter(config, id_watch)
+    hunter = OpenAIHunter(config, id_watch)
Why do you want to create an OpenAIHunter here, instead of just adding generate_text to the filters for the default Hunter?
Hi! Sorry it's been a while...
It's not just about generate_text; it's mostly about crawl_expose_details, which is needed to get the description of an exposé. The description in turn is needed to extract the features worth mentioning in an application (at least that was my idea).
I do agree that changing the Hunter is silly, and maybe the whole Hunter itself, but I did not like the idea of crawling details every time.
As another idea: I could introduce some additional logic in generate_text() for this. I was thinking something like
    def generate_text(self):
        """Add processor to generate text, if enabled"""
        if self.config.openai_enabled():
            # The text generator needs the expose description, so make sure
            # the details crawler is in the chain before adding it.
            if not any(isinstance(processor, CrawlExposeDetails) for processor in self.processors):
                self.crawl_expose_details()
            self.processors.append(OpenAIProcessor(self.config))
        return self
How about this to prevent the need for a new hunter?
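To make the proposal above easier to try out in isolation, here is a minimal, self-contained sketch. CrawlExposeDetails, OpenAIProcessor, and openai_enabled() are the names used in the comment; StubConfig and ChainBuilder are hypothetical stand-ins invented for this example and are not flathunter's actual classes.

```python
class CrawlExposeDetails:
    """Stand-in for the processor that loads expose detail pages."""


class OpenAIProcessor:
    """Stand-in for the processor that generates the application text."""

    def __init__(self, config):
        self.config = config


class StubConfig:
    """Hypothetical config exposing only the flag used in this sketch."""

    def __init__(self, enabled):
        self._enabled = enabled

    def openai_enabled(self):
        return self._enabled


class ChainBuilder:
    """Hypothetical builder that collects processors in order."""

    def __init__(self, config):
        self.config = config
        self.processors = []

    def crawl_expose_details(self):
        self.processors.append(CrawlExposeDetails())
        return self

    def generate_text(self):
        """Add the text generator, pulling in the details crawler if missing."""
        if self.config.openai_enabled():
            if not any(isinstance(p, CrawlExposeDetails) for p in self.processors):
                self.crawl_expose_details()
            self.processors.append(OpenAIProcessor(self.config))
        return self


def names(builder):
    """Return the class names of the collected processors, in order."""
    return [type(p).__name__ for p in builder.processors]


enabled = ChainBuilder(StubConfig(True)).generate_text()
already_crawling = ChainBuilder(StubConfig(True)).crawl_expose_details().generate_text()
disabled = ChainBuilder(StubConfig(False)).generate_text()
```

With the flag on, the details crawler is added exactly once, whether or not the caller had already requested it; with the flag off, the chain stays empty, so nobody pays for detail fetches they did not ask for.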
@@ -181,12 +181,26 @@ def get_page(self, search_url, driver=None, page_no=None):

    def get_expose_details(self, expose):
        """Loads additional details for an expose by processing the expose detail URL"""
        soup = self.get_soup_from_url(expose['url'])
        driver = self.get_driver()
        if driver is not None:
So do I understand from this that you need to open a second tab to load the expose to get the details? Is there a reason to do this here instead of in the OpenAIProcessor? If we have to do it here, can we have a switch to skip this if the OpenAI feature is disabled? Otherwise everybody not using OpenAI will make a bunch of unnecessary fetches.
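One possible shape for such a switch, as a rough sketch only: the function name with_details, the openai_enabled parameter, and the fetch_details callable are all hypothetical and chosen for illustration; they are not part of flathunter.

```python
from typing import Callable


def with_details(expose: dict, openai_enabled: bool,
                 fetch_details: Callable[[str], dict]) -> dict:
    """Enrich an expose with detail-page data only when OpenAI is enabled."""
    if not openai_enabled:
        # Feature is off: return the expose untouched and make no request.
        return expose
    return {**expose, **fetch_details(expose["url"])}


calls = []


def fake_fetch(url: str) -> dict:
    """Fake detail fetcher that records each URL it is asked to load."""
    calls.append(url)
    return {"description": "Bright two-room flat", "lessor": "Example GmbH"}


expose = {"id": 1, "url": "https://example.invalid/expose/1"}
skipped = with_details(expose, False, fake_fetch)
enriched = with_details(expose, True, fake_fetch)
```

Passing the fetcher in as a callable keeps the switch independent of whether the details come from a plain request or from a browser driver.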
Actually, the whole get_expose_details is broken for me with immobilienscout if one does not add a driver. The initial request returns "Checking that you're not a robot", and the final page content is only returned once this check has finished. But I can simply cherry-pick this into a different pull request.
from flathunter.processor import ProcessorChain
from flathunter.exceptions import BotBlockedException, UserDeactivatedException

class OpenAIHunter(Hunter):
This looks fine but per my other comment I think I would prefer just to change the default Hunter.
messages = [
    {
        "role": "system",
        "content": f"You are helping in generating an application for an exposé. For this, you will be provided with a dictionary which contains information on the kind of flat you are applying for and a prewritten text. The application is supposed to be written in {self.language}. Fill in the blanks, marked by [] in the text with the information from the dictionary. Do not directly copy text (except room information or similar things) from the dictionary - instead, paraphrase it."
Just wondering if it makes sense also to have a German option here - presumably a lot of people want to generate their texts in German.
Interesting idea! I'm currently running this with a German template and it works just fine - nonetheless, I can add an option for German if you'd like me to.
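If a German option were added, one way to do it might be a small lookup of system prompts keyed by language code, falling back to English. This is only a sketch: SYSTEM_PROMPTS, build_system_message, the language codes, and the shortened prompt texts (including the German wording) are assumptions made for illustration, not what the PR implements.

```python
# Hypothetical, shortened prompts; the German text is an illustrative translation.
SYSTEM_PROMPTS = {
    "en": (
        "You are helping in generating an application for an exposé. "
        "Fill in the blanks, marked by [] in the text, with the information "
        "from the dictionary."
    ),
    "de": (
        "Du hilfst dabei, eine Bewerbung für ein Exposé zu schreiben. "
        "Fülle die mit [] markierten Lücken im Text mit den Informationen "
        "aus dem Dictionary."
    ),
}


def build_system_message(language: str) -> dict:
    """Build the system message for a language code, defaulting to English."""
    key = language.strip().lower()
    content = SYSTEM_PROMPTS.get(key, SYSTEM_PROMPTS["en"])
    return {"role": "system", "content": content}


german = build_system_message("DE")
fallback = build_system_message("fr")
```

The English fallback means an unknown or misspelled language setting still produces a usable prompt instead of raising an error.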
Adds OpenAI integration to automatically generate messages.
Right now, this only works for Immobilienscout as it requires get_expose_details to also return some kind of description and the lessor. I simply added the functionality for OpenAI for personal use but thought others might be interested.
Certainly, there are a lot of improvements that should be made before merging.
Please point them out to me so I can edit whatever should be rewritten!
Features:
crawl_expose_details
)