You can see an example of how to get that here: #287 (comment)
Basically, you get the parent node via a selector and then iterate over the selection returned by Contents(). It is the only method that selects all types of nodes (text and comments, not just HTML elements), so you can look for the text nodes and extract their text. The HTML you have there is somewhat unusual: spans usually hold some text, but here they are empty (apart from a CSS class) and the text sits in between them. Still, sometimes we have to work with broken HTML, so that may be your case.
For example:
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

const data = `
<html>
<body>
<p>
<span class="fas fa-user"></span> Published By <a href="mailto:[email protected]"> University Relations Directorate<!--Henry Amoah--></a> | <span class="fas fa-calendar"> </span> Monday October 3, 2022 | <span class="fas fa-clock"></span> 2:27 pm
</p>
</body>
</html>
`

func main() {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	// Contents() returns every child node of <p>, including text and comment nodes.
	doc.Find("p").Contents().Each(func(i int, s *goquery.Selection) {
		// Keep only the text nodes that are direct children of <p>.
		if goquery.NodeName(s) == "#text" {
			fmt.Printf(">>> (%d) >>> %s\n", i, s.Text())
		}
	})
}
This would print the following. Note that it doesn't get "University Relations Directorate", as that is not a "free text" node of the <p> element; it is text inside the <a> element:
>>> (0) >>>
>>> (2) >>> Published By
>>> (4) >>> |
>>> (6) >>> Monday October 3, 2022 |
>>> (8) >>> 2:27 pm
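The raw text nodes still carry surrounding whitespace and the "|" separators, so some cleanup is usually needed before using them. Here is a minimal sketch of one way to do that, using only the standard library. The cleanTextNodes helper is hypothetical (it is not part of goquery), and the input slice is assumed to hold the s.Text() values collected from the text nodes in the example above:

```go
package main

import (
	"fmt"
	"strings"
)

// cleanTextNodes is a hypothetical helper, not a goquery function.
// It trims surrounding whitespace and "|" separators from each raw
// text-node string and drops the entries that end up empty.
func cleanTextNodes(raw []string) []string {
	var out []string
	for _, r := range raw {
		t := strings.TrimSpace(r)   // remove leading/trailing whitespace and newlines
		t = strings.Trim(t, "|")    // remove leading/trailing separators
		t = strings.TrimSpace(t)    // remove whitespace left next to a separator
		if t != "" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Assumed input: the text of nodes 0, 2, 4, 6 and 8 from the example above.
	raw := []string{"\n", " Published By ", " | ", " Monday October 3, 2022 | ", " 2:27 pm\n"}
	for _, t := range cleanTextNodes(raw) {
		fmt.Printf("%q\n", t)
	}
}
```

With that input, the helper keeps three values: "Published By", "Monday October 3, 2022" and "2:27 pm". In real code you would append s.Text() to the slice inside the Each callback instead of hard-coding it.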
<span class="fas fa-user"></span> Published By <a href="mailto:[email protected]"> University Relations Directorate<!--Henry Amoah--></a> | <span class="fas fa-calendar"> </span> Monday October 3, 2022 | <span class="fas fa-clock"></span> 2:27 pm
How do I go about writing it?
Originally posted by @davidAg9 in #298 (comment)