Move more parsing of strings and html server-side #575
Changes from 7 commits
480f3b0
d68ff5e
b9af04d
e2dfb97
ddec724
e067568
e22c9bf
30140f8
8189ad2
e598a58
b322724
@@ -0,0 +1,25 @@
import {toLaxTitleCase} from '@frogpond/titlecase'

import _jsdom from 'jsdom'
const {JSDOM} = _jsdom

export {encode, decode} from 'html-entities'

// Html

export function parseHtml(string) {
  return JSDOM.fragment(string).textContent.trim()
}

export function innerTextWithSpaces(elem) {
  return JSDOM.fragment(elem).textContent.split(/\s+/u).join(' ').trim()
}

export function removeHtmlWithRegex(str) {
  return str.replace(/<[^>]*>/gu, ' ')
}

export function fastGetTrimmedText(str) {
  return removeHtmlWithRegex(str).replace(/\s+/gu, ' ').trim()
}

Since we have JSDOM here and aren't resource constrained, I'd like to remove this fn in favor of a JSDOM-based solution.

Thanks for mentioning this. We will go with the
@@ -1,3 +1,4 @@
export {get} from './http'
export * from './cache'
export * from './url'
export * from './html'
How did we handle these here before adding this module? Does JSDOM handle this for us automatically when we call textContent?
Great point. Looks like JSDOM handles decoding the entities properly for us. I've created a Repl to show the differences between `fastGetTrimmedText` and JSDOM's `textContent`.
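The Repl itself is not reproduced here, so the following is only a rough sketch of the regex half of that comparison. It copies the two regex helpers from the diff (with the `export` keywords dropped so it runs as a plain script) and feeds them a hypothetical sample string that does not come from the PR. The point it illustrates is that stripping tags with a regex leaves entities such as `&amp;` encoded, which is the case where a separate decode step would be needed.

```javascript
// Regex-based helpers, copied from the diff with `export` removed.
function removeHtmlWithRegex(str) {
  // Replace every tag with a space so adjacent text nodes do not run together.
  return str.replace(/<[^>]*>/gu, ' ')
}

function fastGetTrimmedText(str) {
  // Strip tags, collapse runs of whitespace, and trim the ends.
  return removeHtmlWithRegex(str).replace(/\s+/gu, ' ').trim()
}

// Hypothetical sample input, chosen only to exercise entities.
const sample = '<p>Fish &amp; chips</p>\n<p>caf&eacute;</p>'

const regexResult = fastGetTrimmedText(sample)

// The tags are gone, but the entities are still encoded.
console.log(regexResult) // prints "Fish &amp; chips caf&eacute;"
```

With the regex approach, the output would still have to pass through something like the `decode` function that the module re-exports from `html-entities`. Per the comment above, reading `textContent` from a `JSDOM.fragment` handles that decoding already.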