-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scrapy price monitor update #14
base: master
Are you sure you want to change the base?
Scrapy price monitor update #14
Conversation
|
||
class CollectionHelper: | ||
"""Adapter to make interacting with scraping collection easier""" | ||
def __init__(self, proj_id, collection_name, api_key=None, create=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only problem I see with this is that it will force people to use our Collections, I'm not sure if it's needed, maybe we can just leave it out? It will make code easier to use by newbies. If someone new to Scrapy will start using it without apikey he will get errors, he might not know what to do to tix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project is intended to run on scrapy cloud, which will default to its enviornment apikey when api_key is None. The alternative would require setting up persistance to keep some sort of database which is far more complex.
for name, urls in products.items(): | ||
for url in urls: | ||
if self.name in url: | ||
now = datetime.now().strftime('%Y/%m/%d %H:%M:%S') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just use isoformat here
datetime.now().isoformat()
'2021-11-19T09:30:27.341132'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually why is there need to pass item in meta? just for showing how to pass in meta? product name is already available on product page, retailer is static and now can also be taken from product page.
Output is somewhat confusing
{'product_name': 'soumission', 'retailer': 'books.toscrape.com', 'when': '2021/11/19 09:23:19', 'url': 'https://books.toscrape.com/catalogue/soumission_998/index.html', 'name': 'Soumission', 'price': 50.1}
so there is product name twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible that the name on the website may not exactly be the name on the json. The json name seems to be more of a label to indicate what product the urls should be linking to.
SHUB_PROJ_ID = os.getenv('SHUB_JOBKEY', '999999').split('/')[0] | ||
|
||
ITEM_PIPELINES = { | ||
'price_monitor.pipelines.CollectionStoragePipeline': 400, | ||
} | ||
|
||
# settings for Amazon SES email service | ||
AWS_ACCESS_KEY = os.getenv('$AWS_ACCESS_KEY') | ||
AWS_SECRET_KEY = os.getenv('$AWS_SECRET_KEY') | ||
EMAIL_ALERT_FROM = 'Price Monitor <[email protected]>' | ||
EMAIL_ALERT_TO = ['[email protected]'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you tried if e-mail sending actually works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I don't personally have an AWS email service set up for this so I wasn't able to test if this part still works.
In my opinion we could skip e-mail sending, as this is extra configuration step and it's good to keep things easy for newcomers. We can just print somethign in bin/monitor without sending e-mail and add comment asking users to configure somethign, they can add slack integration, e-mail whaterver. |
I'm not certain if the intended audience of the post are total newcomers. One of the main selling points of the blog appears to be the email step so that the user gets passive notifications and I'm not sure if it's still a valuable sample if it defaults to a log you have to manually check every 30 minutes. I wonder if maybe a disclaimer explaining that this notification script is using an external service that can be changed and the user could write an alternative would be better? Update: I updated the alert step to seperate the ASN relevant code to make it easier for a user to update and/or replace this content for their preferred alert approach. |
Updating the scrapy price monitor example project as part of ecommerce content review.
Some notable changes from the original:
Old version is still present in the _scrapy_price_monitor_OLD directory.