Add files via upload
michalovadek authored Oct 23, 2020
1 parent b043911 commit 3239c0e
Showing 5 changed files with 38 additions and 6 deletions.
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
+# eurlex 0.3.3
+
+## Minor changes
+
+- hotfix for critical bug in xml parsing that scrambled column with legal basis where this was requested
+
# eurlex 0.3.2

## Major changes
3 changes: 1 addition & 2 deletions R/elx_make_query.R
@@ -155,8 +155,7 @@ elx_make_query <- function(resource_type = c("directive","regulation","decision"
?type=<http://publications.europa.eu/resource/authority/resource-type/DEC_IMPL>||
?type=<http://publications.europa.eu/resource/authority/resource-type/DEC_DEL>||
?type=<http://publications.europa.eu/resource/authority/resource-type/DEC_FRAMW>||
-?type=<http://publications.europa.eu/resource/authority/resource-type/JOINT_DEC>||
-?type=<http://publications.europa.eu/resource/authority/resource-type/DEC_NC>)", sep = " ")
+?type=<http://publications.europa.eu/resource/authority/resource-type/JOINT_DEC>)", sep = " ")
}

if (resource_type == "caselaw"){
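
Reviewer note on the hunk above: it removes `DEC_NC` from the alternation of resource types used for `"decision"`. As a minimal sketch, not the package's implementation, the same alternation could be generated from a vector of codes so that dropping a type is a one-element change. `build_type_filter` is a hypothetical helper, the `FILTER(...)` wrapper is an assumption about the surrounding query text, and only codes visible in the diff are used.

```r
# Hypothetical helper, not part of eurlex: builds a SPARQL FILTER clause
# from Publications Office resource-type codes.
build_type_filter <- function(codes) {
  # Base URI copied from the diff above.
  base_uri <- "http://publications.europa.eu/resource/authority/resource-type/"
  terms <- paste0("?type=<", base_uri, codes, ">")
  # Assumption: the alternatives are OR-ed inside a single FILTER(...).
  paste0("FILTER(", paste(terms, collapse = "||\n"), ")")
}

# Codes visible in the hunk after this commit (DEC_NC no longer listed).
decision_codes <- c("DEC_IMPL", "DEC_DEL", "DEC_FRAMW", "JOINT_DEC")

cat(build_type_filter(decision_codes), "\n")
```

Keeping the codes in a vector would also make the trailing `||` versus `)` bookkeeping automatic, which is what this hunk had to adjust by hand.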
13 changes: 10 additions & 3 deletions R/elx_parse_xml.R
@@ -19,9 +19,16 @@ elx_parse_xml <- function(sparql_response = ""){
res_cols <- res_binding %>%
xml2::xml_attr("name")

-out <- data.frame(res_cols, res_text, stringsAsFactors = FALSE) %>%
-dplyr::group_by(res_cols) %>%
-dplyr::mutate(triplet = dplyr::row_number()) %>%
+unique(res_cols)
+
+out <- dplyr::tibble(res_cols, res_text) %>%
+dplyr::mutate(is_work = ifelse(res_cols=="work", T, NA)) %>%
+dplyr::group_by(is_work) %>%
+dplyr::mutate(triplet = dplyr::row_number(),
+triplet = ifelse(is_work==T, triplet, NA)) %>%
+dplyr::ungroup() %>%
+tidyr::fill(triplet) %>%
+dplyr::select(-.data$is_work) %>%
tidyr::pivot_wider(names_from = res_cols, values_from = res_text) %>%
dplyr::select(-.data$triplet)

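Reviewer note on the hunk above: the old code numbered rows within each binding name, so a result that lacked an optional binding shifted every later value of that column up by one row. The new code starts a new result at each `work` binding instead. The following is a rough sketch on invented data, assuming every result begins with a `work` binding; the binding names `celex` and `lbs` and all values are made up, and `cumsum()` is used here as a compact stand-in for the `is_work`/`fill()` steps, so this is not the committed code.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for parsed SPARQL bindings; the second result has no "lbs".
res_cols <- c("work", "celex", "lbs", "work", "celex", "work", "celex", "lbs")
res_text <- c("w1", "c1", "l1", "w2", "c2", "w3", "c3", "l3")

# Old approach: number rows within each binding name.
old <- tibble(res_cols, res_text) %>%
  group_by(res_cols) %>%
  mutate(triplet = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = res_cols, values_from = res_text) %>%
  select(-triplet)

# New approach (simplified): every "work" binding opens a new result.
new <- tibble(res_cols, res_text) %>%
  mutate(triplet = cumsum(res_cols == "work")) %>%
  pivot_wider(names_from = res_cols, values_from = res_text) %>%
  select(-triplet)

print(old)  # "l3" ends up on the row for w2
print(new)  # "l3" stays on the row for w3; w2 has NA
```

This is the scrambling of the legal basis column that the NEWS entry for 0.3.3 refers to.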
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ For the moment, it is recommended to retrieve metadata one variable at a time. F
2. `dates <- elx_make_query("directive", include_date_transpos = TRUE) %>% elx_run_query()`
3. `ids %>% dplyr::left_join(lbs) %>% dplyr::left_join(dates)`

-rather than `elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE)`. The reason is that observations with missing data on any variable are currently dropped entirely when cumulating variable requests. By separating the calls, you are able to at least identify the missing data.
+rather than `elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE)`. The reason is that rows with missing data on any variable are currently dropped entirely when cumulating variable requests. By separating the calls, you are able to identify the missing data, while retaining data from other columns.

One of the main contributions of the SPARQL requests is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries are useful also for webscraping (with the `rvest` package), the function `elx_fetch_data()` enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URI). The function currently enables downloading the title and the full text of a document in all available languages.

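Reviewer note on the README hunk above: the difference between the combined request and the separate calls can be illustrated offline. This is only a sketch with invented data frames standing in for the results of `elx_run_query()`; the column names `lbs` and `date_transpos` are assumptions, and the inner join merely emulates the "rows with missing data are dropped" behaviour described in the text.

```r
library(dplyr)

# Invented stand-ins for three separate query results.
ids   <- tibble(work = c("w1", "w2", "w3"))
lbs   <- tibble(work = c("w1", "w3"), lbs = c("l1", "l3"))
dates <- tibble(work = c("w1", "w2"),
                date_transpos = c("2001-01-01", "2002-02-02"))

# Separate calls joined afterwards: every document is kept, gaps show as NA.
joined <- ids %>%
  left_join(lbs, by = "work") %>%
  left_join(dates, by = "work")

# Emulation of a combined request: only documents complete on all variables.
complete_only <- inner_join(lbs, dates, by = "work")

print(joined)
print(complete_only)
```

With the left joins, the missing legal basis for `w2` and the missing date for `w3` remain visible as `NA`, whereas the emulated combined request keeps only `w1`.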
20 changes: 20 additions & 0 deletions eurlex.Rproj
@@ -0,0 +1,20 @@
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: Sweave
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
+
+BuildType: Package
+PackageUseDevtools: Yes
+PackageInstallArgs: --no-multiarch --with-keep.source
