Handling entering/exiting objects #95
I do not quite understand the reason for "exiting." It seems the code you are working with works well (minus one issue with beerId being a string, not a number). Is there additional processing that you would like to do after the fact that … If you have a reason to keep your "cursor" at the parent level, so to speak, you can always use a more complex path within the j* functions, i.e. in place of your …
EDIT: I also posted an answer on the SO post, since this functionality solves the problem there as well.
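For comparison, here is a minimal sketch of the two approaches described above, run on a trimmed-down version of the review document from this thread (only `beer/name` and two `review/timeStruct` fields are kept, for brevity). It assumes a tidyjson version in which `jnumber()` accepts a multi-element path, as used in the reprex further down; both pipelines should end up with the same `hour` and `min` values.

```r
library(tidyjson)
library(dplyr)  # provides the %>% pipe

# Trimmed-down review document with one nested object
json <- '{"beer/name": "Sausa Weizen", "review/timeStruct": {"hour": 20, "min": 57}}'

# Option 1: spread the parent-level value first, then enter the nested object.
# Columns created before enter_object() are carried along, but the JSON
# "cursor" now points at review/timeStruct and cannot be moved back up.
entered <- json %>%
  spread_values(beer_name = jstring("beer/name")) %>%
  enter_object("review/timeStruct") %>%
  spread_values(hour = jnumber("hour"), min = jnumber("min"))

# Option 2: stay at the parent level and give jnumber() the full path
pathed <- json %>%
  spread_values(
    beer_name = jstring("beer/name"),
    hour = jnumber("review/timeStruct", "hour"),
    min = jnumber("review/timeStruct", "min")
  )
```

Option 2 never moves the cursor, so further parent-level keys can still be added to the same `spread_values()` call.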
Thanks for the reply on this post; I appreciate it. I think the issue I had when working with the API was that, when dealing with a nested object in my data, it seemed I needed to use the enter_object call at the very end of my pipe sequence to parse it out. That definitely works, but I think it makes the API a little odd to work with at times. I think an exit_object would be beneficial, so that you could do …
I'll definitely try to test the above code and see whether it is easier to work with. I think it would be beneficial to add an example of this to your vignette, and I'm 100% OK with you using this one if you want to.
Let me see if I can get a better reprex of this, as it's been a while since I posted it, and I want to make sure I provide an MWE for you.
I definitely agree that enter_object() can be a little strange to work with, in that its behavior is not reversible. The reprex is much appreciated, though, as it would help to pin down exactly what functionality is missing. It may also help clarify where development effort is best spent. Much of the package follows a similar pattern of irreversible behavior (e.g. gather_array, append_values), so it is important to think through a handful of examples to see what sort of change serves best. Most examples I have seen are solved by the jstring('path1', 'path2') functionality, which perhaps warrants a way to make that functionality quicker to type, if anything.
So after looking through your code, I think it would help to clarify in the documentation when users should be "entering" an object vs. simply using the approach you stated above. The vignette implies that you should be able to enter objects with ease, which is very true, but I think it needs to be reframed with a better example to demonstrate the concept illustrated above. Originally, I was operating under the assumption that you should enter the object explicitly because it is nested, and that the only real way to parse my data was to use enter_object at the end of my pipe sequence. I'd love to help with refactoring the vignette; what do you think of adding this as an example? It only has one line of data, but it illustrates the point:
pacman::p_load(magrittr, tidyjson, dplyr)
json <- '{"review/appearance": 2.5, "beer/style": "Hefeweizen", "review/palate": 1.5, "review/taste": 1.5, "beer/name": "Sausa Weizen", "review/timeUnix": 1234817823, "beer/ABV": 5.0, "beer/beerId": "47986", "beer/brewerId": "10325", "review/timeStruct": {"isdst": 0, "mday": 16, "hour": 20, "min": 57, "sec": 3, "mon": 2, "year": 2009, "yday": 47, "wday": 0}, "review/overall": 1.5, "review/text": "A lot of foam. But a lot.\\tIn the smell some banana, and then lactic and tart. Not a good start.\\tQuite dark orange in color, with a lively carbonation (now visible, under the foam).\\tAgain tending to lactic sourness.\\tSame for the taste. With some yeast and banana.", "user/profileName": "stcules", "review/aroma": 2.0}'
# the tabs are written as \\t so the R string contains the JSON escape \t
# rather than a literal tab, which JSON does not allow inside strings
# note: no enter_object() is needed; nested keys are reached by passing a
# two-element path, e.g. jnumber("review/timeStruct", "isdst")
clean <- json %>%
  spread_values(
    review_appearance = jnumber("review/appearance"),
    beer_style = jstring("beer/style"),
    review_palate = jnumber("review/palate"),
    review_taste = jnumber("review/taste"),
    beer_name = jstring("beer/name"),
    review_time = jnumber("review/timeUnix"),
    beer_ABV = jnumber("beer/ABV"),
    beer_beerid = jstring("beer/beerId"),
    beer_breweryid = jstring("beer/brewerId"),
    review_overall = jnumber("review/overall"),
    review_text = jstring("review/text"),
    profile_name = jstring("user/profileName"),
    review_aroma = jnumber("review/aroma"),
    isdst = jnumber("review/timeStruct", "isdst"),
    mday = jnumber("review/timeStruct", "mday"),
    hour = jnumber("review/timeStruct", "hour"),
    min = jnumber("review/timeStruct", "min"),
    sec = jnumber("review/timeStruct", "sec"),
    mon = jnumber("review/timeStruct", "mon"),
    year = jnumber("review/timeStruct", "year"),
    yday = jnumber("review/timeStruct", "yday"),
    wday = jnumber("review/timeStruct", "wday")
  )
dplyr::glimpse(clean)
#> Observations: 1
#> Variables: 23
#> $ document.id <int> 1
#> $ review_appearance <dbl> 2.5
#> $ beer_style <chr> "Hefeweizen"
#> $ review_palate <dbl> 1.5
#> $ review_taste <dbl> 1.5
#> $ beer_name <chr> "Sausa Weizen"
#> $ review_time <dbl> 1234817823
#> $ beer_ABV <dbl> 5
#> $ beer_beerid <chr> "47986"
#> $ beer_breweryid <chr> "10325"
#> $ review_overall <dbl> 1.5
#> $ review_text <chr> "A lot of foam. But a lot.\tIn the smell som...
#> $ profile_name <chr> "stcules"
#> $ review_aroma <dbl> 2
#> $ isdst <dbl> 0
#> $ mday <dbl> 16
#> $ hour <dbl> 20
#> $ min <dbl> 57
#> $ sec <dbl> 3
#> $ mon <dbl> 2
#> $ year <dbl> 2009
#> $ yday <dbl> 47
#> $ wday <dbl> 0
I definitely agree that ensuring this sort of behavior is documented and easily accessible is a great call, and the vignette is a good place for it. For efficiency, I also think this is a worthwhile construct to consider: auto-generating the column names with spread_all() and then correcting them as needed. I am not sure this functionality is on the CRAN version yet, but you can acquire it using …
json <- "{\"review/appearance\": 2.5, \"beer/style\": \"Hefeweizen\", \"review/palate\": 1.5, \"review/taste\": 1.5, \"beer/name\": \"Sausa Weizen\", \"review/timeUnix\": 1234817823, \"beer/ABV\": 5.0, \"beer/beerId\": \"47986\", \"beer/brewerId\": \"10325\", \"review/timeStruct\": {\"isdst\": 0, \"mday\": 16, \"hour\": 20, \"min\": 57, \"sec\": 3, \"mon\": 2, \"year\": 2009, \"yday\": 47, \"wday\": 0}, \"review/overall\": 1.5, \"review/text\": \"A lot of foam. But a lot.\\tIn the smell some banana, and then lactic and tart. Not a good start.\\tQuite dark orange in color, with a lively carbonation (now visible, under the foam).\\tAgain tending to lactic sourness.\\tSame for the taste. With some yeast and banana.\", \"user/profileName\": \"stcules\", \"review/aroma\": 2.0}"
d <- json %>% spread_all()
## Strip the 'review/timeStruct.' prefix from the nested columns
## (presumes, without checking, that the shortened names stay unique)
names(d) <- stringr::str_replace(names(d), "^review/timeStruct\\.", "")
dplyr::glimpse(d)
#> Observations: 1
#> Variables: 23
#> $ document.id <int> 1
#> $ review/appearance <dbl> 2.5
#> $ beer/style <chr> "Hefeweizen"
#> $ review/palate <dbl> 1.5
#> $ review/taste <dbl> 1.5
#> $ beer/name <chr> "Sausa Weizen"
#> $ review/timeUnix <dbl> 1234817823
#> $ beer/ABV <dbl> 5
#> $ beer/beerId <chr> "47986"
#> $ beer/brewerId <chr> "10325"
#> $ review/overall <dbl> 1.5
#> $ review/text <chr> "A lot of foam. But a lot.\tIn the smell som...
#> $ user/profileName <chr> "stcules"
#> $ review/aroma <dbl> 2
#> $ isdst <dbl> 0
#> $ mday <dbl> 16
#> $ hour <dbl> 20
#> $ min <dbl> 57
#> $ sec <dbl> 3
#> $ mon <dbl> 2
#> $ year <dbl> 2009
#> $ yday <dbl> 47
#> $ wday <dbl> 0
@colearendt Yeah, to be honest, I'm not really sure how good an example this is, but I just want to make it understandable so others can use the API with ease and not have to resort to esoteric methods such as the following (which I think could work): What are your thoughts on using purrr to parse the data instead of relying on lots of specialized functions? In other words, using purrr to accelerate the core "verbs".
json <- "{\"review/appearance\": 2.5, \"beer/style\": \"Hefeweizen\", \"review/palate\": 1.5, \"review/taste\": 1.5, \"beer/name\": \"Sausa Weizen\", \"review/timeUnix\": 1234817823, \"beer/ABV\": 5.0, \"beer/beerId\": \"47986\", \"beer/brewerId\": \"10325\", \"review/timeStruct\": {\"isdst\": 0, \"mday\": 16, \"hour\": 20, \"min\": 57, \"sec\": 3, \"mon\": 2, \"year\": 2009, \"yday\": 47, \"wday\": 0}, \"review/overall\": 1.5, \"review/text\": \"A lot of foam. But a lot.\\tIn the smell some banana, and then lactic and tart. Not a good start.\\tQuite dark orange in color, with a lively carbonation (now visible, under the foam).\\tAgain tending to lactic sourness.\\tSame for the taste. With some yeast and banana.\", \"user/profileName\": \"stcules\", \"review/aroma\": 2.0}"
library(tidyjson)
library(purrr)
# spread_values() needs at least one name = j*() spec to extract anything
map(json, ~ spread_values(.x, beer_name = jstring("beer/name"), beer_style = jstring("beer/style")))
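A minimal sketch, assuming current tidyjson behaviour: the verbs are already vectorised over documents, so a character vector of JSON strings comes back as one row per document, keyed by `document.id`. The second review below is a made-up document for illustration. For the simple case, this suggests an explicit `purrr::map()` step may not be needed.

```r
library(tidyjson)
library(dplyr)  # provides the %>% pipe

# Two documents; the second one is hypothetical
docs <- c('{"beer/name": "Sausa Weizen", "review/overall": 1.5}',
          '{"beer/name": "Another Beer", "review/overall": 4.0}')

# One spread_values() call handles every document in the vector
reviews <- docs %>%
  spread_values(
    beer_name = jstring("beer/name"),
    review_overall = jnumber("review/overall")
  )
```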
@pgensler Very interesting thought. It seems to me that the expectation would be for …
library(tidyjson)
## using spread_all
"{\"a\": 1, \"b\": 2, \"c\": 3}" %>% spread_all()
#> # A tbl_json: 1 x 4 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id a b c
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 "{\"a\":1,\"b\":2,\"c..." 1 1 2 3
## using spread_values (same output)
"{\"a\": 1, \"b\": 2, \"c\": 3}" %>% spread_values(a = jnumber(a), b = jnumber(b),
c = jnumber(c))
#> # A tbl_json: 1 x 4 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id a b c
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 "{\"a\":1,\"b\":2,\"c..." 1 1 2 3
## using spread_values (with bad input)
"{}" %>% spread_values(a = jnumber(a), b = jnumber(b), c = jnumber(c))
#> # A tbl_json: 1 x 4 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id a b c
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 {} 1 NA NA NA
## using spread_all (with bad input)
"{}" %>% spread_all()
#> # A tbl_json: 1 x 1 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id
#> <chr> <int>
#> 1 {} 1
I think you make a good point, though, that … If you haven't installed and explored the development version of …
Interesting example from this SO post, where some way of dealing with multiple arrays in parallel would be helpful to have. The workaround, splitting into separate objects and then combining with a …
raw_json <- "{
\"ShipmentID\" : \"0031632569\",
\"ShipmentType\" : \"Cross-border\",
\"ShipmentStatus\" : \"Final\",
\"PartyInfo\" : [
{
\"Type\" : \"Consignee\",
\"Code\" : \"0590000001\",
\"Name\" : \"HP Inc. C\\/O XPOLogistics\",
\"Address\": {
\"AddressLine\" : [
\"4000 Technology Court\"
]
},
\"City\" : {
\"CityName\" : \"Sandston\",
\"CityCode\" : [
{
\"value\" : \"USSAX\",
\"Qualifier\" : \"UN\"
}
],
\"State\" : \"VA\",
\"CountryCode\" : \"US\",
\"CountryName\" : \"United States\"
}
}
]
}"
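A minimal sketch of the split-and-combine workaround, on a trimmed-down version of the shipment above. It assumes the `AddressLine` and `CityCode` arrays are parallel (element i of one belongs with element i of the other), so the two branches are joined on the shared keys plus the array position. `plain()` is a hypothetical helper defined here, not part of tidyjson, that turns a `tbl_json` into an ordinary data frame before base `merge()`.

```r
library(tidyjson)
library(dplyr)  # provides the %>% pipe

# Trimmed-down shipment: one party containing two arrays
raw_json <- '{"ShipmentID": "0031632569",
  "PartyInfo": [{"Type": "Consignee",
    "Address": {"AddressLine": ["4000 Technology Court"]},
    "City": {"CityName": "Sandston",
             "CityCode": [{"value": "USSAX", "Qualifier": "UN"}]}}]}'

# Shared keys, spread before descending into either array
parties <- raw_json %>%
  spread_values(shipment_id = jstring("ShipmentID")) %>%
  enter_object("PartyInfo") %>%
  gather_array("party_index") %>%
  spread_values(party_type = jstring("Type"))

# Branch 1: the AddressLine array
addresses <- parties %>%
  enter_object("Address") %>%
  enter_object("AddressLine") %>%
  gather_array("item_index") %>%
  append_values_string("address_line")

# Branch 2: the CityCode array
city_codes <- parties %>%
  enter_object("City") %>%
  enter_object("CityCode") %>%
  gather_array("item_index") %>%
  spread_values(city_code = jstring("value"), qualifier = jstring("Qualifier"))

# Hypothetical helper: plain data frame, minus any ..JSON bookkeeping column
plain <- function(x) {
  x <- as.data.frame(x)
  x[, setdiff(names(x), "..JSON"), drop = FALSE]
}

# Pair array elements by position within each party
keys <- c("document.id", "shipment_id", "party_index", "party_type", "item_index")
result <- merge(plain(addresses), plain(city_codes), by = keys)
```

Joining on `item_index` is what keeps the two arrays aligned; if they are not parallel, dropping it from `keys` gives every combination within a party instead.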
Hello,
First off, I would like to thank you for making such a great package; I have truly loved using this tool to work with JSON data. I am trying to parse some JSON data, and I'm running into an issue where I would like to be able to "exit" an object. See this code as an example …
So far, this is the code I have managed to use to extract my data. Is there a way to exit the object I am trying to parse? Please let me know.
I know this SO question asks about it; not sure if you have seen it:
http://stackoverflow.com/questions/35198991/tidyjson-is-there-an-exit-object-equivalent/39829902#39829902
The only way I know to do this would be to parse the data as normal up until that group, then separately parse that particular object (review/timeStruct in this case), and then append the two together. Thanks for all your hard work in putting this package together!
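A minimal sketch of that split-and-append idea, assuming a trimmed-down version of the review document: the parent-level fields and the `review/timeStruct` object are parsed in two separate pipelines and joined back on `document.id`. `plain()` is a hypothetical helper (not part of tidyjson) that converts each result to an ordinary data frame before base `merge()`.

```r
library(tidyjson)
library(dplyr)  # provides the %>% pipe

json <- '{"beer/name": "Sausa Weizen", "review/overall": 1.5,
          "review/timeStruct": {"mday": 16, "mon": 2, "year": 2009}}'

# Part 1: parent-level fields
main <- json %>%
  spread_values(beer_name = jstring("beer/name"),
                review_overall = jnumber("review/overall"))

# Part 2: the nested object, parsed on its own
time_struct <- json %>%
  enter_object("review/timeStruct") %>%
  spread_values(mday = jnumber("mday"), mon = jnumber("mon"), year = jnumber("year"))

# Hypothetical helper: plain data frame, minus any ..JSON bookkeeping column
plain <- function(x) {
  x <- as.data.frame(x)
  x[, setdiff(names(x), "..JSON"), drop = FALSE]
}

# Append the two parts side by side, one row per document
combined <- merge(plain(main), plain(time_struct), by = "document.id")
```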