Move node attributes under their own key in JSON displayer #109

Open
wants to merge 2 commits into base: master
53 changes: 26 additions & 27 deletions README.md
@@ -281,18 +281,23 @@ $ cat robots.html | pup 'div#p-namespaces a'
 $ cat robots.html | pup 'div#p-namespaces a json{}'
 [
  {
-  "accesskey": "c",
-  "href": "/wiki/Robots_exclusion_standard",
+  "attrs": {
+   "accesskey": "c",
+   "href": "/wiki/Robots_exclusion_standard",
+   "title": "View the content page [c]"
+  },
   "tag": "a",
-  "text": "Article",
-  "title": "View the content page [c]"
+  "text": "Article"
  },
  {
-  "accesskey": "t",
-  "href": "/wiki/Talk:Robots_exclusion_standard",
+  "attrs": {
+   "accesskey": "t",
+   "href": "/wiki/Talk:Robots_exclusion_standard",
+   "rel": "discussion",
+   "title": "Discussion about the content page [t]"
+  },
   "tag": "a",
-  "text": "Talk",
-  "title": "Discussion about the content page [t]"
+  "text": "Talk"
  }
 ]
 ```
@@ -303,33 +308,27 @@ Use the `-i` / `--indent` flag to control the intent level.
 $ cat robots.html | pup -i 4 'div#p-namespaces a json{}'
 [
     {
-        "accesskey": "c",
-        "href": "/wiki/Robots_exclusion_standard",
+        "attrs": {
+            "accesskey": "c",
+            "href": "/wiki/Robots_exclusion_standard",
+            "title": "View the content page [c]"
+        },
         "tag": "a",
-        "text": "Article",
-        "title": "View the content page [c]"
+        "text": "Article"
     },
     {
-        "accesskey": "t",
-        "href": "/wiki/Talk:Robots_exclusion_standard",
+        "attrs": {
+            "accesskey": "t",
+            "href": "/wiki/Talk:Robots_exclusion_standard",
+            "rel": "discussion",
+            "title": "Discussion about the content page [t]"
+        },
         "tag": "a",
-        "text": "Talk",
-        "title": "Discussion about the content page [t]"
+        "text": "Talk"
     }
 ]
 ```
 
-If the selectors only return one element the results will be printed as a JSON
-object, not a list.
-
-```bash
-$ cat robots.html | pup --indent 4 'title json{}'
-{
-    "tag": "title",
-    "text": "Robots exclusion standard - Wikipedia, the free encyclopedia"
-}
-```
-
 Because there is no universal standard for converting HTML/XML to JSON, a
 method has been chosen which hopefully fits. The goal is simply to get the
 output of pup into a more consumable format.
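
Note for downstream consumers: with this change, attribute values move from the top level of each object into a nested `attrs` object, so code that read `href` directly needs to look one level deeper. The following is a minimal sketch of decoding the new shape in Go. The `Element` type and `parseElements` helper are hypothetical names chosen for illustration and are not part of pup; the sketch also assumes the list form of the output shown in the README example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Element models the fields visible in the README example.
// This type is illustrative only; it is not defined by pup.
type Element struct {
	Tag   string            `json:"tag"`
	Text  string            `json:"text"`
	Attrs map[string]string `json:"attrs"` // nil when the node has no attributes
}

// parseElements decodes the list form of the json{} output.
func parseElements(data []byte) ([]Element, error) {
	var elems []Element
	err := json.Unmarshal(data, &elems)
	return elems, err
}

func main() {
	sample := []byte(`[{"attrs":{"href":"/wiki/Robots_exclusion_standard"},"tag":"a","text":"Article"}]`)
	elems, err := parseElements(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range elems {
		// Attributes are now read through the nested map.
		fmt.Println(e.Text, e.Attrs["href"])
	}
}
```

Indexing a nil map in Go returns the zero value, so `e.Attrs["href"]` is safe even for elements that carry no `attrs` key.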
6 changes: 4 additions & 2 deletions display.go
@@ -264,13 +264,15 @@ type JSONDisplayer struct{}
 func jsonify(node *html.Node) map[string]interface{} {
 	vals := map[string]interface{}{}
 	if len(node.Attr) > 0 {
+		attrs := map[string]interface{}{}
 		for _, attr := range node.Attr {
 			if pupEscapeHTML {
-				vals[attr.Key] = html.EscapeString(attr.Val)
+				attrs[attr.Key] = html.EscapeString(attr.Val)
 			} else {
-				vals[attr.Key] = attr.Val
+				attrs[attr.Key] = attr.Val
 			}
 		}
+		vals["attrs"] = attrs
 	}
 	vals["tag"] = node.DataAtom.String()
 	children := []interface{}{}
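
One consequence of the hunk above is worth spelling out. In the flat layout, `vals["tag"]` is assigned after the attribute loop, so an HTML attribute that happens to be named `tag` would be overwritten by the element's tag name. Nesting the attributes avoids that collision. The sketch below reproduces the patched logic in isolation so it can be run without pup. The `Attr` type and the `jsonifyAttrs` function are hypothetical stand-ins for `html.Node.Attr` and the relevant part of `jsonify`, and the standard library's `html.EscapeString` is used here in place of whichever `html` package `display.go` imports.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
)

// Attr is a hypothetical stand-in for the Key/Val pairs in node.Attr.
type Attr struct {
	Key, Val string
}

// jsonifyAttrs mirrors the patched loop: attributes are collected in their
// own map, which is stored under "attrs" only when attributes exist.
func jsonifyAttrs(tag string, attrs []Attr, escape bool) map[string]interface{} {
	vals := map[string]interface{}{}
	if len(attrs) > 0 {
		nested := map[string]interface{}{}
		for _, a := range attrs {
			if escape {
				nested[a.Key] = html.EscapeString(a.Val)
			} else {
				nested[a.Key] = a.Val
			}
		}
		vals["attrs"] = nested
	}
	vals["tag"] = tag
	return vals
}

func main() {
	// An attribute literally named "tag" survives alongside the element's tag.
	node := jsonifyAttrs("a", []Attr{{Key: "href", Val: "/wiki/Talk"}, {Key: "tag", Val: "custom"}}, false)
	out, err := json.Marshal(node)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// prints {"attrs":{"href":"/wiki/Talk","tag":"custom"},"tag":"a"}
}
```

`json.Marshal` sorts map keys, which is why `attrs` precedes `tag` in the output, matching the key order in the README examples.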
4 changes: 2 additions & 2 deletions tests/expected_output.txt
@@ -10,7 +10,7 @@ a92e50c09cd56970625ac3b74efbddb83b2731bb table li
 66950e746590d7f4e9cfe3d1adef42cd0addcf1d table li:last-of-type
 0a37d612cd4c67a42bd147b1edc5a1128456b017 table a[title="The Practice of Programming"]
 0d3918d54f868f13110262ffbb88cbb0b083057d table a[title="The Practice of Programming"] text{}
-ecb542a30fc75c71a0c6380692cbbc4266ccbce4 json{}
+199188dc8f1522426a628e41d96264bffb8beb0f json{}
 95ef88ded9dab22ee3206cca47b9c3a376274bda text{}
 e4f7358fbb7bb1748a296fa2a7e815fa7de0a08b .after-portlet
 da39a3ee5e6b4b0d3255bfef95601890afd80709 .after
@@ -34,7 +34,7 @@ d314e83b059bb876b0e5ee76aa92d54987961f9a .navbox-list li:nth-last-child(1)
 613bf65ac4042b6ee0a7a47f08732fdbe1b5b06b #toc
 da39a3ee5e6b4b0d3255bfef95601890afd80709 #toc li + a
 da39a3ee5e6b4b0d3255bfef95601890afd80709 #toc li + a text{}
-97d170e1550eee4afc0af065b78cda302a97674c #toc li + a json{}
+cd0d4cc32346750408f7d4f5e78ec9a6e5b79a0d #toc li + a json{}
 da39a3ee5e6b4b0d3255bfef95601890afd80709 #toc li + a + span
 da39a3ee5e6b4b0d3255bfef95601890afd80709 #toc li + span
 da39a3ee5e6b4b0d3255bfef95601890afd80709 #toc li > li
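
A reading aid for the expected-output file: each line pairs a 40-hex-digit digest with a selector, and only the two `json{}` entries change in this PR. The recurring value `da39a3ee5e6b4b0d3255bfef95601890afd80709` is the SHA-1 of empty input, which suggests those selectors produce no output. The sketch below assumes the first column is a hex SHA-1 of pup's output for the selector; how the test harness actually computes it is not shown in this diff, and the `digest` helper is a hypothetical name.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// digest returns the hex-encoded SHA-1 of output, the assumed format of
// the first column in tests/expected_output.txt.
func digest(output []byte) string {
	return fmt.Sprintf("%x", sha1.Sum(output))
}

func main() {
	// Empty output hashes to the value that recurs throughout the file.
	fmt.Println(digest(nil)) // da39a3ee5e6b4b0d3255bfef95601890afd80709
}
```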