Commit
Merge pull request #246 from frictionlessdata/vignettes
Move Data Package compatibility to vignettes
peterdesmet authored Aug 27, 2024
2 parents f9185cb + 5e46bed commit 0520f14
Showing 28 changed files with 876 additions and 495 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -24,6 +24,8 @@
# produced vignettes
vignettes/*.html
vignettes/*.pdf
vignettes/*.R
inst/doc

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
3 changes: 2 additions & 1 deletion NEWS.md
@@ -5,7 +5,8 @@
* `add_resource()` now allows to replace an existing resource (#227).
* `read_resource()` now returns error if both `path` and `data` are provided (#143).
* `write_package()` no longer writes to `"."` by default, since this is not allowed by CRAN policies. The user needs to explicitly define a directory (#205).
* `write_package()` now writes incoming `null` values back to `NULL` in `datapackage.json`, rather than empty an empty lists. Properties that are assigned `NA` and `NULL` by the user, remain being written as `null` and removed respectively (#203).
* `write_package()` now writes incoming `null` values back to `NULL` in `datapackage.json`, rather than empty lists. Properties that are assigned `NA` and `NULL` by the user, remain being written as `null` and removed respectively (#203).
* New vignettes `vignette("data-package")`, `vignette("data-resource")`, `vignette("table-dialect")` and `vignette("table-schema")` describe how frictionless implements the Data Package standard. The (verbose) function documentation of `read_resource()` and `create_schema()` has been moved to these vignettes, improving readability and maintenance (#208, #246).
* The included dataset `example_package` is removed in favour of the function `example_package()`. This function allows to reproducibly provide a _local Data Package_, while before it needed to be a remote package. The `observations` resource was also changed from a remote to a local resource - allowing the entire example Data Package to be read locally - and from CSV to TSV - allowing to test for dialect. Examples and tests were updated (#114, #253).

## Changes for developers
18 changes: 10 additions & 8 deletions R/add_resource.R
@@ -1,19 +1,22 @@
#' Add a Data Resource
#'
#' Adds a [Data Resource](https://specs.frictionlessdata.io/data-resource/) to a
#' Data Package.
#' Adds a Data Resource to a Data Package.
#' The resource will be a [Tabular Data Resource](
#' https://specs.frictionlessdata.io/tabular-data-resource/).
#' The resource name can only contain lowercase alphanumeric characters plus
#' `.`, `-` and `_`.
#'
#' See `vignette("data-resource")` (and to a lesser extent
#' `vignette("table-dialect")`) to learn how this function implements the
#' Data Package standard.
#'
#' @inheritParams read_resource
#' @param data Data to attach, either a data frame or path(s) to CSV file(s):
#' - Data frame: attached to the resource as `data` and written to a CSV file
#' when using [write_package()].
#' - One or more paths to CSV file(s) as a character (vector): added to the
#' resource as `path`.
#' The **last file will be read** with [readr::read_delim()] to create or
#' The last file will be read with [readr::read_delim()] to create or
#' compare with `schema` and to set `format`, `mediatype` and `encoding`.
#' The other files are ignored, but are expected to have the same structure
#' and properties.
@@ -24,11 +27,10 @@
#' resource with the same name.
#' @param delim Single character used to separate the fields in the CSV file(s),
#' e.g. `\t` for tab delimited file.
#' Will be set as `delimiter` in the resource [CSV
#' dialect](https://specs.frictionlessdata.io/csv-dialect/#specification), so
#' read functions know how to read the file(s).
#' @param ... Additional [metadata
#' properties](https://specs.frictionlessdata.io/data-resource/#metadata-properties)
#' Will be set as `delimiter` in the resource Table Dialect, so read functions
#' know how to read the file(s).
#' @param ... Additional [metadata properties](
#' https://docs.ropensci.org/frictionless/articles/data-resource.html#properties-implementation)
#' to add to the resource, e.g. `title = "My title", validated = FALSE`.
#' These are not verified against specifications and are ignored by
#' [read_resource()].
12 changes: 6 additions & 6 deletions R/create_package.R
@@ -1,18 +1,18 @@
#' Create a Data Package
#'
#' Initiates a [Data Package](https://specs.frictionlessdata.io/data-package/)
#' object, either from scratch or from an existing list.
#' Initiates a Data Package object, either from scratch or from an existing
#' list.
#' This Data Package object is a list with the following characteristics:
#' - A `datapackage` subclass.
#' - All properties of the original `descriptor`.
#' - A [`resources`](
#' https://specs.frictionlessdata.io/data-package/#required-properties)
#' property, set to an empty list if undefined.
#' - A `resources` property, set to an empty list if undefined.
#' - A `directory` property, set to `"."` for the current directory if
#' undefined.
#' It is used as the base path to access resources with [read_resource()].
#'
#' The function will run [check_package()] on the created package to make sure
#' See `vignette("data-package")` to learn how this function implements the
#' Data Package standard.
#' [check_package()] is automatically called on the created package to make sure
#' it is valid.
#'
#' @param descriptor List to be made into a Data Package object.
51 changes: 5 additions & 46 deletions R/create_schema.R
@@ -1,56 +1,15 @@
#' Create a Table Schema for a data frame
#'
#' Creates a [Table Schema](https://specs.frictionlessdata.io/table-schema/) for
#' a data frame, listing all column names and types as field names and
#' (converted) types.
#' Creates a Table Schema for a data frame, listing all column names and types
#' as field names and (converted) types.
#'
#' See `vignette("table-schema")` to learn how this function implements the
#' Data Package standard.
#'
#' @param data A data frame.
#' @return List describing a Table Schema.
#' @family create functions
#' @export
#' @section Table schema properties:
#' The Table Schema will be created from the data frame columns:
#'
#' - `name`: contains the column name.
#' - `title`: not set.
#' - `description`: not set.
#' - `type`: contains the converted column type (see further).
#' - `format`: not set and can thus be considered `default`.
#' This is also the case for dates, times and datetimes, since
#' [readr::write_csv()] used by [write_package()] will format those to ISO8601
#' which is considered the default.
#' Datetimes in local or non-UTC timezones will be converted to UTC before
#' writing.
#' - `constraints`: not set, except for factors (see further).
#' - `missingValues`: not set.
#' [write_package()] will use the default `""` for missing values.
#' - `primaryKey`: not set.
#' - `foreignKeys`: not set.
#'
#' ## Field types
#'
#' The column type will determine the field `type`, as follows:
#'
#' - `character` as
#' [string](https://specs.frictionlessdata.io/table-schema/#string).
#' - `Date` as [date](https://specs.frictionlessdata.io/table-schema/#date).
#' - `difftime` as
#' [number](https://specs.frictionlessdata.io/table-schema/#number).
#' - `factor` as
#' [string](https://specs.frictionlessdata.io/table-schema/#string) with
#' factor levels as `enum`.
#' - [hms::hms()] as
#' [time](https://specs.frictionlessdata.io/table-schema/#time).
#' - `integer` as
#' [integer](https://specs.frictionlessdata.io/table-schema/#integer).
#' - `logical` as
#' [boolean](https://specs.frictionlessdata.io/table-schema/#boolean).
#' - `numeric` as
#' [number](https://specs.frictionlessdata.io/table-schema/#number).
#' - `POSIXct`/`POSIXlt` as
#' [datetime](https://specs.frictionlessdata.io/table-schema/#datetime).
#' - Any other type as
#' [any](https://specs.frictionlessdata.io/table-schema/#any).
#' @examples
#' # Create a data frame
#' df <- data.frame(
9 changes: 5 additions & 4 deletions R/get_schema.R
@@ -1,12 +1,13 @@
#' Get the Table Schema of a Data Resource
#'
#' Returns the [Table Schema](https://specs.frictionlessdata.io/table-schema/)
#' of a Data Resource (in a Data Package), i.e. the content of its `schema`
#' property, describing the resource's fields, data types, relationships, and
#' missing values.
#' Returns the Table Schema of a Data Resource (in a Data Package), i.e. the
#' content of its `schema` property, describing the resource's fields, data
#' types, relationships, and missing values.
#' The resource must be a [Tabular Data Resource](
#' https://specs.frictionlessdata.io/tabular-data-resource/).
#'
#' See `vignette("table-schema")` to learn more about Table Schema.
#'
#' @inheritParams read_resource
#' @return List describing a Table Schema.
#' @family accessor functions
3 changes: 3 additions & 0 deletions R/read_package.R
@@ -4,6 +4,9 @@
#' https://specs.frictionlessdata.io/data-package/#descriptor) file that
#' describes the Data Package metadata and its Data Resources.
#'
#' See `vignette("data-package")` to learn how this function implements the
#' Data Package standard.
#'
#' @param file Path or URL to a `datapackage.json` file.
#' @return A Data Package object, see [create_package()].
#' @family read functions
Expand Down
167 changes: 6 additions & 161 deletions R/read_resource.R
Original file line number Diff line number Diff line change
@@ -1,15 +1,18 @@
#' Read data from a Data Resource into a tibble data frame
#'
#' Reads data from a [Data Resource](
#' https://specs.frictionlessdata.io/data-resource/) (in a Data Package) into a
#' tibble (a Tidyverse data frame).
#' Reads data from a Data Resource (in a Data Package) into a tibble (a
#' Tidyverse data frame).
#' The resource must be a [Tabular Data Resource](
#' https://specs.frictionlessdata.io/tabular-data-resource/).
#' The function uses [readr::read_delim()] to read CSV files, passing the
#' resource properties `path`, CSV dialect, column names, data types, etc.
#' Column names are taken from the provided Table Schema (`schema`), not from
#' the header in the CSV file(s).
#'
#' See `vignette("data-resource")`, `vignette("table-dialect")` and
#' `vignette("table-schema")` to learn how this function implements the
#' Data Package standard.
#'
#' @param package Data Package object, as returned by [read_package()] or
#' [create_package()].
#' @param resource_name Name of the Data Resource.
@@ -22,164 +25,6 @@
#' frame.
#' @family read functions
#' @export
#' @section Resource properties:
#' The [Data Resource properties](
#' https://specs.frictionlessdata.io/data-resource/) are handled as follows:
#'
#' ## Path
#'
#' [`path`](https://specs.frictionlessdata.io/data-resource/#data-location) is
#' required.
#' It can be a local path or URL, which must resolve.
#' Absolute path (`/`) and relative parent path (`../`) are forbidden to avoid
#' security vulnerabilities.
#'
#' When multiple paths are provided (`"path": [ "myfile1.csv", "myfile2.csv"]`)
#' then data are merged into a single data frame, in the order in which the
#' paths are listed.
#'
#' ## Data
#'
#' If `path` is not present, the function will attempt to read data from the
#' `data` property.
#' **`schema` will be ignored**.
#'
#' ## Name
#'
#' `name` is [required](https://specs.frictionlessdata.io/data-resource/#name).
#' It is used to find the resource with `name` = `resource_name`.
#'
#' ## Profile
#'
#' `profile` is [required](
#' https://specs.frictionlessdata.io/tabular-data-resource/#specification) to
#' have the value `tabular-data-resource`.
#'
#' ## File encoding
#'
#' `encoding` (e.g. `windows-1252`) is [required](
#' https://specs.frictionlessdata.io/data-resource/#optional-properties) if the
#' resource file(s) is not encoded as UTF-8.
#' The returned data frame will always be UTF-8.
#'
#' ## CSV Dialect
#'
#' `dialect` properties are [required](
#' https://specs.frictionlessdata.io/csv-dialect/#specification) if the resource
#' file(s) deviate from the default CSV settings (see below).
#' It can either be a JSON object or a path or URL referencing a JSON object.
#' Only deviating properties need to be specified, e.g. a tab delimited file
#' without a header row needs:
#' ```json
#' "dialect": {"delimiter": "\t", "header": "false"}
#' ```
#'
#' These are the CSV dialect properties.
#' Some are ignored by the function:
#' - `delimiter`: default `,`.
#' - `lineTerminator`: ignored, line terminator characters `LF` and `CRLF` are
#' interpreted automatically by [readr::read_delim()], while `CR` (used by
#' Classic Mac OS, final release 2001) is not supported.
#' - `doubleQuote`: default `true`.
#' - `quoteChar`: default `"`.
#' - `escapeChar`: anything but `\` is ignored and it will set `doubleQuote` to
#' `false` as these fields are mutually exclusive.
#' You can thus not escape with `\"` and `""` in the same file.
#' - `nullSequence`: ignored, use `missingValues`.
#' - `skipInitialSpace`: default `false`.
#' - `header`: default `true`.
#' - `commentChar`: not set by default.
#' - `caseSensitiveHeader`: ignored, header is not used for column names, see
#' Schema.
#' - `csvddfVersion`: ignored.
#'
#' ## File compression
#'
#' Resource file(s) with `path` ending in `.gz`, `.bz2`, `.xz`, or `.zip` are
#' automatically decompressed using default [readr::read_delim()]
#' functionality.
#' Only `.gz` files can be read directly from URL `path`s.
#' Only the extension in `path` can be used to indicate compression type,
#' the `compression` property is [ignored](
#' https://specs.frictionlessdata.io/patterns/#specification-3).
#'
#' ## Ignored resource properties
#'
#' - `title`
#' - `description`
#' - `format`
#' - `mediatype`
#' - `bytes`
#' - `hash`
#' - `sources`
#' - `licenses`
#' @section Table schema properties:
#' `schema` is required and must follow the [Table Schema](
#' https://specs.frictionlessdata.io/table-schema/) specification.
#' It can either be a JSON object or a path or URL referencing a JSON object.
#'
#' - Field `name`s are used as column headers.
#' - Field `type`s are used as column types (see further).
#' - [`missingValues`](
#' https://specs.frictionlessdata.io/table-schema/#missing-values) are used to
#' interpret as `NA`, with `""` as default.
#'
#' ## Field types
#'
#' Field `type` is used to set the column type, as follows:
#'
#' - [string](https://specs.frictionlessdata.io/table-schema/#string) as
#' `character`; or `factor` when `enum` is present.
#' `format` is ignored.
#' - [number](https://specs.frictionlessdata.io/table-schema/#number) as
#' `double`; or `factor` when `enum` is present.
#' Use `bareNumber: false` to ignore whitespace and non-numeric characters.
#' `decimalChar` (`.` by default) and `groupChar` (undefined by default) can
#' be defined, but the most occurring value will be used as a global value for
#' all number fields of that resource.
#' - [integer](https://specs.frictionlessdata.io/table-schema/#integer) as
#' `double` (not integer, to avoid issues with big numbers); or `factor` when
#' `enum` is present.
#' Use `bareNumber: false` to ignore whitespace and non-numeric characters.
#' - [boolean](https://specs.frictionlessdata.io/table-schema/#boolean) as
#' `logical`.
#' Non-default `trueValues/falseValues` are not supported.
#' - [object](https://specs.frictionlessdata.io/table-schema/#object) as
#' `character`.
#' - [array](https://specs.frictionlessdata.io/table-schema/#array) as
#' `character`.
#' - [date](https://specs.frictionlessdata.io/table-schema/#date) as `date`.
#' Supports `format`, with values `default` (ISO date), `any` (guess `ymd`)
#' and [Python/C strptime](
#' https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior)
#' patterns, such as `%a, %d %B %Y` for `Sat, 23 November 2013`.
#' `%x` is `%m/%d/%y`.
#' `%j`, `%U`, `%w` and `%W` are not supported.
#' - [time](https://specs.frictionlessdata.io/table-schema/#time) as
#' [hms::hms()].
#' Supports `format`, with values `default` (ISO time), `any` (guess `hms`)
#' and [Python/C strptime](
#' https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior)
#' patterns, such as `%I%p%M:%S.%f%z` for `8AM30:00.300+0200`.
#' - [datetime](https://specs.frictionlessdata.io/table-schema/#datetime) as
#' `POSIXct`.
#' Supports `format`, with values `default` (ISO datetime), `any`
#' (ISO datetime) and the same patterns as for `date` and `time`.
#' `%c` is not supported.
#' - [year](https://specs.frictionlessdata.io/table-schema/#year) as `date`,
#' with `01` for month and day.
#' - [yearmonth](https://specs.frictionlessdata.io/table-schema/#yearmonth) as
#' `date`, with `01` for day.
#' - [duration](https://specs.frictionlessdata.io/table-schema/#duration) as
#' `character`.
#' Can be parsed afterwards with [lubridate::duration()].
#' - [geopoint](https://specs.frictionlessdata.io/table-schema/#geopoint) as
#' `character`.
#' - [geojson](https://specs.frictionlessdata.io/table-schema/#geojson) as
#' `character`.
#' - [any](https://specs.frictionlessdata.io/table-schema/#any) as `character`.
#' - Any other value is not allowed.
#' - Type is guessed if not provided.
#' @examples
#' # Read a datapackage.json file
#' package <- read_package(
4 changes: 2 additions & 2 deletions R/remove_resource.R
@@ -1,7 +1,7 @@
#' Remove a Data Resource
#'
#' Removes a [Data Resource](https://specs.frictionlessdata.io/data-resource/)
#' from a Data Package, i.e. it removes one of the described `resources`.
#' Removes a Data Resource from a Data Package, i.e. it removes one of the
#' described `resources`.
#'
#' @inheritParams read_resource
#' @return `package` with one fewer resource.
2 changes: 1 addition & 1 deletion R/utils.R
@@ -19,7 +19,7 @@ unique_sorted <- function(x) {
#' Clean list
#'
#' Removes all elements from a list that meet a criterion function, e.g.
#' `is.null(x)` for empty elements.
#' [is.null()] for empty elements.
#' Removal can be recursive to guarantee elements are removed at any level.
#' Function is copied and adapted from `rlist::list.clean()` (MIT licensed), to
#' avoid requiring full `rlist` dependency.
2 changes: 1 addition & 1 deletion R/write_package.R
@@ -10,7 +10,7 @@
#' location of file(s).
#' - Resource `path` has only URL(s): resource stays as is.
#' - Resource has inline `data` originally: resource stays as is.
#' - Resource has inline `data` as result of adding data with `add_resource()`:
#' - Resource has inline `data` as result of adding data with [add_resource()]:
#' data are written to a CSV file using [readr::write_csv()], `path` points to
#' location of file, `data` property is removed.
#' Use `compress = TRUE` to gzip those CSV files.
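
As a rough illustration of the behaviour documented in this last hunk (inline `data` added with `add_resource()` is written to a CSV file by `write_package()`), the sketch below chains the package's exported functions. The data frame, the resource name `observations` and the output directory are made up for the example, and it assumes a version of frictionless that includes this commit, where `write_package()` needs an explicit directory.

```r
library(frictionless)

# Hypothetical data frame, made up for this example
df <- data.frame(
  id = c(1L, 2L),
  taxon = factor(c("Larus fuscus", "Larus argentatus")),
  observed = as.Date(c("2024-08-01", "2024-08-02"))
)

# Derive a Table Schema from the column types and list the field types
schema <- create_schema(df)
vapply(schema$fields, function(field) field$type, character(1))

# Attach the data frame as a resource; it is kept inline as `data`
package <- create_package()
package <- add_resource(package, "observations", data = df, schema = schema)

# Write to an explicitly chosen directory (no default is assumed here)
directory <- file.path(tempdir(), "my_package")
write_package(package, directory)
list.files(directory)
```

How each column type maps to a field type is described in `vignette("table-schema")`, so inspecting `schema$fields` is a quick way to check the conversion before writing.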