
Commit

Clarified docs for save/readObject, especially around '...'.
Options should now be prefixed by the case-sensitive class name (for saving) and
by the object type (for reading); this is what is used for dispatch and so is
guaranteed to be non-conflicting across methods.
LTLA committed Feb 22, 2024
1 parent 2643e0f commit 7efe119
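The prefixing rule in the commit message can be illustrated with a minimal base-R sketch. The two functions below, `save_outer()` and `save_inner()`, are hypothetical stand-ins for the `saveObject` methods of two classes (they are not part of alabaster.base). Each one claims only the options carrying its own class-name prefix and forwards everything else through `...`, as the updated docs ask.

```r
# Hypothetical stand-ins for two saveObject methods; not the alabaster.base API.
# Each function only claims options carrying its own class-name prefix.
save_inner <- function(x, Inner.compress = FALSE, ...) {
  # Options meant for other methods arrive in '...' and are ignored here.
  list(class = "Inner", compress = Inner.compress)
}

save_outer <- function(x, Outer.compress = TRUE, ...) {
  # Forward '...' unchanged so that Inner.* options reach the child method.
  child <- save_inner(x$child, ...)
  list(class = "Outer", compress = Outer.compress, child = child)
}

res <- save_outer(list(child = 1), Outer.compress = FALSE, Inner.compress = TRUE)
str(res)
```

If both functions instead declared an unprefixed `compress` argument, the outer one would capture the caller's value and the two settings could not be controlled independently.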
Showing 8 changed files with 76 additions and 50 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: alabaster.base
Title: Save Bioconductor Objects To File
-Version: 1.3.19
-Date: 2024-01-24
+Version: 1.3.20
+Date: 2024-02-21
Authors@R: person("Aaron", "Lun", role=c("aut", "cre"), email="[email protected]")
License: MIT + file LICENSE
Description:
@@ -30,7 +30,7 @@ LinkingTo:
Rhdf5lib
VignetteBuilder: knitr
SystemRequirements: C++17, GNU make
-RoxygenNote: 7.3.0
+RoxygenNote: 7.3.1
biocViews:
DataRepresentation,
DataImport
31 changes: 20 additions & 11 deletions R/AllGenerics.R
@@ -1,18 +1,18 @@
#' Save objects to disk
#'
-#' Generic to stage assorted R objects into appropriate on-disk representations.
+#' Generic to save assorted R objects into appropriate on-disk representations.
#' More methods may be defined by other packages to extend the \pkg{alabaster.base} framework to new classes.
#'
#' @param x A Bioconductor object of the specified class.
#' @param path String containing the path to a directory in which to save \code{x}.
-#' @param ... Further arguments to pass to specific methods.
+#' @param ... Additional named arguments to pass to specific methods.
#'
#' @return
#' \code{path} is created and populated with files containing the contents of \code{x}.
#' \code{NULL} should be invisibly returned.
#'
-#' @details
-#' Methods for the \code{stageObject} generic should create a directory at \code{path} in which the contents of \code{x} are to be saved.
+#' @section Comments for extension developers:
+#' Methods for the \code{saveObject} generic should create a directory at \code{path} in which the contents of \code{x} are to be saved.
#' The files may consist of any format, though language-agnostic formats like HDF5, CSV, JSON are preferred.
#' For more complex objects, multiple files and subdirectories may be created within \code{path}.
#' The only strict requirements are:
@@ -24,13 +24,22 @@
#' These are reserved for applications, e.g., to build manifests or to store additional metadata.
#' }
#'
-#' Developers of \code{stageObject} methods may wish to save \dQuote{child} components of \code{x} with a subdirectory of \code{path}.
-#' In such cases, developers should call \code{\link{altSaveObject}} on each child component, rather than calling \link{saveObject} directly.
+#' Callers can pass optional parameters to specific \code{saveObject} methods via \code{...}.
+#' Any options recognized by a method should be prefixed by the name of the class used in the method's signature,
+#' e.g., any options for \code{\link{saveObject,DataFrame-method}} should start with \code{DataFrame.}.
+#' This scoping avoids conflicts between otherwise identically-named options of different methods.
+#'
+#' When developing \code{saveObject} methods of complex objects, a simple approach is to decompose \code{x} into its \dQuote{child} components.
+#' Each component can then be saved into a subdirectory of \code{path}, leveraging the existing \code{saveObject} methods for the component classes.
+#' In such cases, extension developers should actually call \code{\link{altSaveObject}} on each child component, rather than calling \link{saveObject} directly.
#' This ensures that any application-level overrides of the saving functions are respected.
+#' It is expected that each method will forward \code{...} (possibly after modification) to any internal \code{\link{altSaveObject}} calls.
#'
-#' If a method makes use of additional arguments, it should be scoped by the name of the class for each method, e.g., \code{list.format}, \code{dataframe.include.nested}.
-#' This avoids problems with conflicts in the interpretation of identically named arguments between different methods.
-#' It is expected that arguments in \code{...} are forwarded to internal \code{\link{altSaveObject}} calls.
+#' @section Comments for application developers:
+#' Application developers can override \code{saveObject} by specifying a custom function in \code{\link{altSaveObject}}.
+#' This can be used to point to a different function to handle the saving process for each class.
+#' The custom function can be as simple as a wrapper around \code{saveObject} with some additional actions (e.g., to save more metadata),
+#' or may be as complex as a full-fledged generic with its own methods for class-specific customizations.
#'
#' @author Aaron Lun
#' @examples
@@ -49,11 +58,11 @@
#' @importFrom jsonlite fromJSON
setGeneric("saveObject", function(x, path, ...) {
if (file.exists(path)) {
-stop("cannot stage ", class(x)[1], " at existing path '", path, "'")
+stop("cannot save ", class(x)[1], " at existing path '", path, "'")
}

# Need to search here to pick up any subclasses that might have better
-# stageObject methods in yet-to-be-loaded packages.
+# saveObject methods in yet-to-be-loaded packages.
if (.search_methods(x)) {
fun <- selectMethod("saveObject", class(x)[1], optional=TRUE)
if (!is.null(fun)) {
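The extension-developer pattern in the new `saveObject` docs (save each child through `altSaveObject` and forward `...`) is what lets an application-level override reach the children. The base-R sketch below mimics that indirection under stated assumptions: `alt_save()`, `set_alt_save()` and `save_pair()` are hypothetical names, the `OBJECT` file written here is a simplification, and this is not how alabaster.base implements `altSaveObject`.

```r
# Hypothetical mimic of the altSaveObject indirection; not the alabaster.base code.
.state <- new.env()

# Default saver: a simplified stand-in for saveObject that writes the class
# name into an OBJECT file (a simplification of the real file format).
.state$saver <- function(x, path, ...) {
  dir.create(path, recursive = TRUE)
  writeLines(class(x)[1], file.path(path, "OBJECT"))
  invisible(NULL)
}

# Callers always go through alt_save(); set_alt_save() installs an override
# and returns the previous function so that the override can delegate to it.
alt_save <- function(x, path, ...) .state$saver(x, path, ...)
set_alt_save <- function(fun) {
  old <- .state$saver
  .state$saver <- fun
  invisible(old)
}

# A composite "method": each child is saved via alt_save(), forwarding '...'.
save_pair <- function(x, path, ...) {
  dir.create(path)
  alt_save(x$first, file.path(path, "first"), ...)
  alt_save(x$second, file.path(path, "second"), ...)
  invisible(NULL)
}

# Application-level override: record each child directory, then delegate.
calls <- character(0)
default_saver <- set_alt_save(function(x, path, ...) {
  calls <<- c(calls, basename(path))
  default_saver(x, path, ...)
})

tmp <- tempfile()
save_pair(list(first = 1L, second = "a"), tmp)
print(calls)
```

Because `save_pair()` never calls the default saver directly, the override sees both child directories; a method that bypassed `alt_save()` would silently skip it.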
8 changes: 4 additions & 4 deletions R/readBaseList.R
@@ -5,7 +5,7 @@
#'
#' @param path String containing a path to a directory, itself created with the list method for \code{\link{stageObject}}.
#' @param metadata Named list containing metadata for the object, see \code{\link{readObjectFile}} for details.
-#' @param list.parallel Whether to perform reading and parsing in parallel for greater speed.
+#' @param simple_list.parallel Whether to perform reading and parsing in parallel for greater speed.
#' Only relevant for lists stored in the JSON format.
#' @param ... Further arguments to be passed to \code{\link{altReadObject}} for complex child objects.
#'
@@ -27,14 +27,14 @@
#'
#' @export
#' @aliases loadBaseList
-readBaseList <- function(path, metadata, list.parallel=TRUE, ...) {
+readBaseList <- function(path, metadata, simple_list.parallel=TRUE, ...) {
all.children <- list()
child.path <- file.path(path, "other_contents")
if (file.exists(child.path)) {
all.dirs <- list.files(child.path)
all.children <- vector("list", length(all.dirs))
for (n in all.dirs) {
-all.children[[as.integer(n) + 1L]] <- altReadObject(file.path(child.path, n), list.parallel=list.parallel, ...)
+all.children[[as.integer(n) + 1L]] <- altReadObject(file.path(child.path, n), simple_list.parallel=simple_list.parallel, ...)
}
}

@@ -44,7 +44,7 @@ readBaseList <- function(path, metadata, list.parallel=TRUE, ...) {
output <- load_list_hdf5(lpath, "simple_list", all.children)
} else {
lpath <- file.path(path, "list_contents.json.gz")
-output <- load_list_json(lpath, all.children, list.parallel)
+output <- load_list_json(lpath, all.children, simple_list.parallel)
}

output
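The `readBaseList` change above both consumes `simple_list.parallel` and passes it on, together with `...`, to `altReadObject` for each child, so nested lists are read with the same setting. A hypothetical in-memory sketch of that threading follows: nested R lists with a `type` field stand in for directories, `read_any()` stands in for `altReadObject`, and no files or alabaster.base functions are involved.

```r
# Hypothetical in-memory mimic: each node has a 'type' and optional children.
# read_any() plays the role of altReadObject, dispatching on the type string.
read_any <- function(node, ...) {
  switch(node$type,
    simple_list = read_simple_list(node, ...),
    atomic_vector = list(value = node$value, parallel = NA)
  )
}

# The option is prefixed by the object type ("simple_list"), not the R class.
read_simple_list <- function(node, simple_list.parallel = TRUE, ...) {
  # Pass our own option down explicitly, plus anything else in '...'.
  kids <- lapply(node$children, read_any,
                 simple_list.parallel = simple_list.parallel, ...)
  list(children = kids, parallel = simple_list.parallel)
}

tree <- list(type = "simple_list", children = list(
  list(type = "atomic_vector", value = 1),
  list(type = "simple_list", children = list())
))
out <- read_any(tree, simple_list.parallel = FALSE)
str(out)
```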
22 changes: 12 additions & 10 deletions R/readObject.R
@@ -25,24 +25,26 @@
#' Developers of alabaster extensions can add extra functions to this registry, usually in the \code{\link{.onLoad}} function of their packages.
#' Alternatively, extension developers can request the addition of their packages to default registry.
#'
-#' If a loading function makes use of additional arguments, it should be scoped by the name of the class for each method, e.g., \code{list.parallel}, \code{dataframe.include.nested}.
+#' If a loading function makes use of additional arguments in \code{...},
+#' those arguments should be prefixed by the name of the object type for each method, e.g., \code{simple_list.parallel}.
#' This avoids problems with conflicts in the interpretation of identically named arguments between different functions.
-#' It is expected that arguments in \code{...} are forwarded to internal \code{\link{altReadObject}} calls.
+#' Unlike the \code{...} arguments in \code{\link{saveObject}}, we prefix by the object type instead of the output class, as the former is used for dispatch here.
#'
-#' When writing alabaster extensions, developers may need to load child objects inside the loading functions for their classes.
-#' In such cases, developers should use \code{\link{altReadObject}} rather than calling \code{readObject} directly.
+#' When writing loading functions for complex classes, extension developers may need to load child objects to compose the output object.
+#' In such cases, developers should use \code{\link{altReadObject}} on the child subdirectories, rather than calling \code{readObject} directly.
#' This ensures that any application-level overrides of the loading functions are respected.
-#' Once in memory, the child objects can then be assembled into more complex objects by the developer's loading function.
+#' It is also expected that arguments in \code{...} are forwarded to internal \code{\link{altReadObject}} calls.
#'
-#' Developers can manually control \code{\link{readObject}} dispatch by supplying a \code{metadata} list where \code{metadata$type} is set to the desired object type.
-#' This pattern is commonly used inside the loading function for a subclass, to construct the base class first before adding the subclass-specific components.
-#' In practice, base construction should be done using \code{\link{altReadObject}} so as to respect application-specific overrides.
+#' Developers can manually control \code{readObject} dispatch by supplying a \code{metadata} list where \code{metadata$type} is set to the desired object type.
+#' This pattern is commonly used inside the loading function for a subclass -
+#' an instance of the base class is first constructed by an internal \code{readObject} call with the modified \code{metadata$type}, after which the subclass-specific slots are added.
+#' (In practice, base construction should be done using \code{\link{altReadObject}} so as to respect application-specific overrides.)
#'
#' @section Comments for application developers:
-#' Application developers can override the behavior of \code{readObject} by specifying a custom function in \code{\link{altReadObject}}.
+#' Application developers can override \code{readObject} by specifying a custom function in \code{\link{altReadObject}}.
#' This can be used to point to a different registry of reading functions, to perform pre- or post-reading actions, etc.
#' If customization is type-specific, the custom \code{altReadObject} function can read the type from the \code{OBJECT} file to determine the most appropriate course of action;
-#' this type information may then be passed to the \code{type} argument of \code{readObject} to avoid a redundant read from the same file.
+#' the \code{OBJECT} metadata can then be passed to the \code{metadata} argument of any internal \code{readObject} calls to avoid a redundant read from the same file.
#'
#' @author Aaron Lun
#' @examples
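The subclass pattern described in the `readObject` docs (re-dispatch with a modified `metadata$type` to build the base instance, then add the subclass-specific parts) can be sketched with a toy registry. Everything here is hypothetical: `registry`, `read_dispatch()`, the `base_thing`/`sub_thing` types and the `extra` field are illustrative stand-ins, and in alabaster.base the re-dispatch would go through `altReadObject` so that application overrides are respected.

```r
# Toy registry mapping object types to loading functions (hypothetical).
registry <- new.env()

read_dispatch <- function(path, metadata, ...) {
  # The caller supplies 'metadata', so the type is never re-read from disk.
  fun <- get(metadata$type, envir = registry, inherits = FALSE)
  fun(path, metadata, ...)
}

registry$base_thing <- function(path, metadata, ...) {
  list(path = path, kind = "base")
}

registry$sub_thing <- function(path, metadata, ...) {
  # Build the base-class instance first by overriding the type...
  base_meta <- metadata
  base_meta$type <- "base_thing"
  obj <- read_dispatch(path, base_meta, ...)
  # ...then add the subclass-specific components.
  obj$kind <- "sub"
  obj$extra <- metadata$extra
  obj
}

res <- read_dispatch("some/dir", list(type = "sub_thing", extra = 42))
str(res)
```

Passing the already-parsed `metadata` down, rather than a path to re-read, mirrors the advice above about avoiding a redundant read of the `OBJECT` file.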
4 changes: 2 additions & 2 deletions man/readBaseList.Rd

22 changes: 12 additions & 10 deletions man/readObject.Rd

31 changes: 22 additions & 9 deletions man/saveObject.Rd

2 changes: 1 addition & 1 deletion man/stageDataFrame.Rd
