Update reader.py to strip HTML tags from content; update example.py #3
HTML tags were present in content.

The changes made are to reader.py:
- `content` field
- `extracted_text` assignment to use `strip=True`
- `RedditContent` creation to use these new stripped values

These changes will remove HTML tags and extra whitespace from both the `content` and `extracted_text` fields. The `get_text(strip=True)` method removes all HTML tags and strips leading and trailing whitespace from each text fragment.

With these modifications, the output should no longer contain HTML tags in the `content` and `extracted_text` fields.

Update example.py
Replaced `datetime.utcnow()` with `datetime.now(timezone.utc)`, as `datetime.utcnow()` is deprecated (since Python 3.12) and scheduled for removal from `datetime`.

Changes:
#Old:
#New:
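The original old/new snippets for `example.py` are not preserved here. As a minimal sketch of the replacement described above (the variable name `aware_now` is an assumption, not taken from the PR), the difference looks roughly like this:

```python
from datetime import datetime, timezone

# Old (deprecated since Python 3.12): returns a naive datetime, tzinfo is None.
# naive_now = datetime.utcnow()

# New: returns a timezone-aware datetime whose tzinfo is UTC.
# `aware_now` is a hypothetical name used only for this illustration.
aware_now = datetime.now(timezone.utc)

print(aware_now.tzinfo)       # UTC
print(aware_now.isoformat())  # ends with "+00:00"
```

One thing to watch for: the new value is timezone-aware, so comparing or subtracting it against a naive datetime raises `TypeError`. Any code that mixes the two may need the same update.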
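For the `reader.py` change, the effect of `get_text(strip=True)` can be illustrated without the PR's actual code. The sketch below is a standard-library stand-in built on `html.parser`, not the library call used in `reader.py`; `TextExtractor` and `strip_html` are hypothetical names introduced only for this example. It mimics the behaviour described above: tags are dropped and each text fragment is stripped before joining.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fragments = []

    def handle_data(self, data):
        # Strip each fragment and skip fragments that are only whitespace.
        stripped = data.strip()
        if stripped:
            self.fragments.append(stripped)


def strip_html(raw_html, separator=""):
    """Return the text of raw_html with tags removed (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return separator.join(parser.fragments)


print(strip_html("<p>  Hello <b>world</b>  </p>"))                 # Helloworld
print(strip_html("<p>  Hello <b>world</b>  </p>", separator=" "))  # Hello world
```

Because every fragment is stripped before joining, adjacent fragments run together unless a separator is supplied. If the output of `reader.py` shows words fused across former tag boundaries, passing a separator is one option to consider.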