release 2.7.4 on CRAN

strengejacke · Aug 5, 2018 · 880cdc2
1 parent 33b31a2
commit 880cdc2
Showing 72 changed files with 136 additions and 688 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -2,8 +2,8 @@ Package: sjmisc
 Type: Package
 Encoding: UTF-8
 Title: Data and Variable Transformation Functions
-Version: 2.7.3.9000
-Date: 2018-08-02
+Version: 2.7.4
+Date: 2018-08-04
 Authors@R: person("Daniel", "Lüdecke", role = c("aut", "cre"), email = "[email protected]", comment = c(ORCID = "0000-0002-8895-3206"))
 Maintainer: Daniel Lüdecke <[email protected]>
 Description: Collection of miscellaneous utility functions, supporting data 
@@ -17,9 +17,9 @@ Depends:
     stats,
     utils
 Imports:
-    broom (>= 0.5.0),
+    broom (>= 0.4.5),
     crayon,
-    dplyr (>= 0.7.1),
+    dplyr,
     haven (>= 1.1.2),
     magrittr,
     pillar,
@@ -28,15 +28,15 @@ Imports:
     sjlabelled (>= 1.0.12),
     stringdist (>= 0.9.4),
     stringr (>= 1.2.0),
-    tibble (>= 1.4.1),
-    tidyr (>= 0.7.0),
+    tibble,
+    tidyr,
     tidyselect
 Suggests:
     ggplot2,
     graphics,
     mice,
-    sjPlot (>= 2.4.0),
-    sjstats (>= 0.13.0),
+    sjPlot,
+    sjstats,
     knitr,
     rmarkdown,
     testthat

diff --git a/NEWS.md b/NEWS.md
@@ -207,189 +207,3 @@ The recoding and transformation functions get scoped variants, allowing to selec
 * `add_columns()` and `replace_columns()` crashed R when no data frame was specified in `...`-ellipses argument.
 * `descr()` and `frq()` used wrong variable labels when processing grouped data frames for specific situations, where the grouping variable had no sequences values.
 * `descr()` did not work for large data frames, because internally, because `psych::describe()` switched to fast mode by default then (removing columns from the output).
-
-# sjmisc 2.4.0
-
-## General
-
-* Argument `value` in `set_na()` is deprecated. Please use `na` instead.
-* Argument `recodes` in `rec()` is deprecated. Please use `rec` instead.
-* Argument `lab` in `set_label()` is deprecated. Please use `label` instead.
-* Argument `value` in `add_labels()` and `replace_labels()` is deprecated. Please use `labels` instead.
-* Argument `value` in `ref_lvl()` is deprecated. Please use `lvl` instead.
-
-## New functions
-
-* `row_sums()` as wrapper of `rowSums()` to compute row sums, but works within pipe-workflow and with select-helpers for variables, and always returns a tibble..
-* `row_means()` as wrapper of `sjstats::mean_n()` to compute row means, but works within pipe-workflow and with select-helpers for variables, and always returns a tibble..
-* `%nin%` as complement to `%in%`.
-
-## Changes to functions
-
-* Functions `rec()`, `dicho()`, `center()`, `std()`, `recode_to()` and `group_var()` get an `append`-argument, to optionally return the original data including the transformed variables as new columns.
-* `var_labels()` and `var_rename()` now give a warning if specified variables to rename or relabel do not exist in the data frame. Non-matching variables are ignored.
-* If `model.term` does not exist in models, `spread_coef()` now prints the name of non-existing coefficients.
-* `find_var()` gets a `fuzzy`-argument to enable fuzzy-matching for search pattern.
-
-## Bug fixes
-
-* `remove_empty_cols()` returned an empty data frame, when input data frame had no empty columns.
-* `remove_empty_rows()` returned an empty data frame, when input data frame had no empty rows.
-* `add_columns()` and `replace_columns()` in some cases coerced data frames of class `data.frame` with only one column into a vector, which gave an error when binding columns.
-* Argument `part.dist.match` in `str_pos()` caused an error when being larger than 0.
-
-# sjmisc 2.3.1
-
-## General
-
-* Re-exports `magrittr::%>%` (Bob Rudis said more packages should do so).
-
-## New functions
-
-* `replace_columns()` to replace variables in one data frame with variables from other data frames.
-
-## Changes to functions
-
-* `descr()` gets a `max.length`-argument to shorten variable labels in the output to a specific number of chars.
-* `descr()` now also reports the percentage of missing values.
-* `set_na()` no longer gives a warning when trying to replace values with `NA` for vectors that completely contained `NA`s.
-
-## Bug fixes
-
-* `descr()` now also works on single vectors as data argument.
-* Fixed bugs with `write_*()`-functions.
-
-# sjmisc 2.3.0
-
-## General
-
-* Added package-vignettes.
-* Functions were largely revised to work seamlessly within the tidyverse. This may break existing code, but in the long run, consistent api-design makes working with the package more intuitive. See `vignette("design_philosophy", "sjmisc")` for more details.
-* `as_labelled()` only converts vectors into `labelled`-class if vector has label attributes. This ensures that data can be properly saved into other formats, e.g. with `write_spss()`.
-* The `write_*()`-functions have been revised and should now save data frame with any common classes of vectors (labelled, factor, numeric, atomic...).
-
-## New functions
-
-* `center()` and `std()` are moving from package `sjstats` to `sjmisc`.
-* `add_columns()` to bind columns of first data frame at the end of all data frames.
-
-## Changes to functions
-
-* `frq()` now has the same argument-structure as `flat_table()`.
-* Following functions now follow a consistent tidyverse-approach, with the data being the first argument, followed by variable names: `add_labels()`, `replace_labels()`, `remove_labels()`, `count_na()`, `rec()`, `dicho()`, `split_var()`, `drop_labels()`, `fill_labels()`, `group_var()`, `group_labels()`, `ref_lvl()`, `recode_to()`, `replace_na()`, `set_na()` and `set_labels()`.
-* `get_values()` now sorts returned values by default, to be consistent with `get_labels()`.
-* `spread_coef()` gets arguments `se` and `p.val`, to define whether standard errors and p-values should be included in the return value or not.
-
-## Bug fixes
-
-* `merge_df()` did not copy label attributes for data frame with identical variables (that were row-bound).
-* `to_value()` did not work for character vectors of class labelled, with empty values (which typically have no value label).
-
-# sjmisc 2.2.1
-
-## Bug fixes
-
-* The `sort.frq` did not work `frq()`.
-
-# sjmisc 2.2.0
-
-## New functions
-
-* `zap_inf()` to "clean" vectors from `NaN` and infinite values.
-* `descr()` to provide basic descriptive statistics (similar to `describe()` in the psych-package), but including variable labels and usable in pipe-workflows. Also works with grouped data frames.
-
-## Changes to functions
-
-* `rec()`, `split_var()` and `dicho()` get an argument `suffix`, to append a suffix to variable (column) names, if applied on a data frame.
-* Value labels in `rec()` can now directly be assigned inside the `recodes`-syntax (see 'Details' in `?rec`).
-* `find_var()` gets a `as.df`-argument, to return a data frame with matching variables, instead of their column indices only.
-* `find_var()` gets a `as.varlab`-argument, to return a "summary" data frame with column number, variable name and variable label.
-* `flat_table()` now also accepts grouped data frames.
-* `flat_table()` gets a `show.values`-argument, to add values to associated labels in output.
-* `frq()` now also accepts grouped data frames.
-* `frq()` gets a `weight.by`-argument to weight frequencies.
-* `set_na()` can now also find values by their value labels and replace them with NA.
-* `set_na()` now removes unused value labels from values that have been replaced with NA.
-* The `as.tag`-argument in `set_na()` now defaults to `FALSE`.
-* `get_labels()` now always returns labels in sorted order of the associated values.
-* `get_labels()` gets a `drop.unused`-argument, to automatically drop labels from values that don't occur in the vector.
-* For a named vector as `labels`-argument, `set_labels()` now always sorts labels in sorted order of the associated values.
-* `is_empty()` gets a `first.only`-argument, to evaluate either first or all elements of a character vector.
-
-## Bug fixes
-
-* `set_na()` did not work on vectors of class `Date` when argument `as.tag = TRUE`.
-* `flat_table()` did not show values that had no value labels. Now all categories are shown in the frequency table.
-* `rec()` did not properly copy labels of tagged NA values when not all recoded values appeared in the vector.
-* `frq()` did not show correct values, when value labels of a vector were not sorted according their values.
-* `set_labels()` did not set labels properly for ordered factors.
-* `remove_labels()` returned NA-values for value labels (instead of no value labels) when the last value label of a vector was removed.
-
-
-# sjmisc 2.1.0
-
-## New functions
-
-* `find_var()` to find variables in data frames by name or label.
-* `var_labels()` as "tidyversed" alternative to `set_label()` to set variable labels.
-* `var_rename()` to rename variables.
-
-## Changes to functions
-
-* Following functions now get an ellipses-argument `...`, to apply function only to selected variables, but return the complete data frame (thus, overwriting existing variables in a data frame, if requested): `to_factor()`, `to_value()`, `to_label()`, `to_character()`, `to_dummy()`, `zap_labels()`, `zap_unlabelled()`, `zap_na_tags()`.
-
-## Bug fixes
-
-* Fixed bug with copying attributes from tibbles for `merge_df()`.
-* Fixed wrong argument-description in docs of `frq()`.
-
-# sjmisc 2.0.1
-
-## General
-
-* Removed package `coin` from Imports.
-
-## New functions
-
-* `count_na()` to print a frequency table of tagged NA values.
-
-## Changes to functions
-
-* `set_na()` gets a `drop.levels` argument to keep or drop factor levels of values that have been replaced with NA.
-* `set_na()` gets a `as.tag` argument to set NA values as regular or tagged NA.
-
-
-# sjmisc 2.0.0
-
-## General
-
-* **sjmisc** now supports _tagged_ `NA` values, a new structure for labelled missing values introduced by the [haven-package](https://cran.r-project.org/package=haven). This means that functions or arguments that are no longer useful, have been removed while other functions dealing with NA values have been largely revised.
-* All statistical functions have been removed and are now in a separate package, [sjstats](https://cran.r-project.org/package=sjstats).
-* Removed some S3-methods for `labelled`-class, as these are now provided by the haven-package.
-* Functions no longer check input for type `matrix`, to avoid conflicts with scaled vectors (that were recognized as matrix and hence treated as data frame).
-* `table(*, exclude = NULL)` was changed to `table(*, useNA = "always")`, because of planned changes in upcoming R version 3.4.
-* More functions (like `trim()` or `frq()`) now also have data frame- or list-methods.
-
-## New functions
-
-* `zap_na_tags()` to turn tagged NA values into regular NA values.
-* `spread_coef()` to spread coefficients of multiple fitted models in nested data frames into columns.
-* `merge_imputations()` to find the most likely imputed value for a missing value.
-* `flat_table()` to print flat (proportional) tables of labelled variables.
-* Added `to_character()` method.
-* `big_mark()` to format large numbers with big marks.
-* `empty_cols()` and `empty_rows()` to find variables or observations with exclusively NA values in a data frame.
-* `remove_empty_cols()` and `remove_empty_rows()` to remove variables or observations with exclusively NA values from a data frame.
-
-## Changes to functions
-* `str_contains()` gets a `switch` argument to switch the role of `x` and `pattern`.
-* `word_wrap()` coerces vectors to character if necessary.
-* `to_label()` gets a `var.label` and `drop.levels` argument, and now preserves variable labels by default.
-* Argument `def.value` in `get_label()` now also applies to data frame arguments.
-* If factor levels are numeric and factor has value labels, these are used in `to_value()` by default.
-* `to_factor()` no longer generates `NA` or `NaN`-levels when converting input into factors.
-
-## Bug fixes
-* `rec()` did not recode values, when these were the first element of a multi-line string of the `recodes` argument.
-* `is_empty()` returned `NA` instead of `TRUE` for empty character vectors.
-* Fixed bug with erroneous assignment of value labels to subset data when using `copy_labels()` ([#20](https://github.com/strengejacke/sjmisc/issues/20))
diff --git a/R/frq.R b/R/frq.R
@@ -1,4 +1,4 @@
-#' @title Frequencies of labelled variables
+#' @title Frequency table of labelled variables
 #' @name frq
 #'
 #' @description This function returns a frequency table of labelled vectors, as data frame.
@@ -7,9 +7,9 @@
 #'   according to their frequencies or not. Default is \code{"none"}, so
 #'   categories are not sorted by frequency. Use \code{"asc"} or
 #'   \code{"desc"} for sorting categories ascending or descending order.
-#' @param weight.by Name of variable in \code{x} that indicated the vector of
-#'   weights that will be applied to weight all  observations. Default is
-#'   \code{NULL}, so no weights are used.
+#' @param weight.by Bare name, or name as string, of a variable in \code{x}
+#'   that indicates the vector of weights, which will be applied to weight all
+#'   observations. Default is \code{NULL}, so no weights are used.
 #' @param auto.grp Numeric value, indicating the minimum amount of unique
 #'   values in a variable, at which automatic grouping into smaller  units
 #'   is done (see \code{\link{group_var}}). Default value for \code{auto.group}
@@ -36,7 +36,7 @@
 #'       The \code{print()}-method adds a table header with information on the
 #'       variable label, variable type, total and valid N, and mean and
 #'       standard deviations. Mean and SD are \emph{always} printed, even for
-#'       categorical vriables (factors) or character vectors. In this case,
+#'       categorical variables (factors) or character vectors. In this case,
 #'       values are coerced into numeric vector to calculate the summary
 #'       statistics.
 #'
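
Review note: the revised `weight.by` text says the weights variable may be given as a bare name or as a string. The sketch below shows one way such an argument could be resolved in base R. It is not the `frq()` implementation; `weighted_counts()` and the example data are hypothetical and only illustrate the idea.

```r
# Hypothetical helper (not part of sjmisc): weighted category counts where
# `weight.by` may be a bare column name, a string, or NULL (no weights).
weighted_counts <- function(data, var, weight.by = NULL) {
  # capture the argument unevaluated, so a bare name is not looked up
  w_expr <- substitute(weight.by)

  if (is.null(w_expr)) {
    # no weights requested: every observation counts once
    w <- rep(1, nrow(data))
  } else {
    # a string is used as is, a bare name is turned into a string
    w_name <- if (is.character(w_expr)) w_expr else deparse(w_expr)
    w <- data[[w_name]]
  }

  # sum of weights per category of `var` (given as a string here)
  tapply(w, data[[var]], sum)
}

# made-up example data
dat <- data.frame(
  grp = c("a", "b", "a", "b", "a"),
  wgt = c(1, 2, 0.5, 1, 2.5)
)

weighted_counts(dat, "grp")                     # a = 3, b = 2
weighted_counts(dat, "grp", weight.by = wgt)    # a = 4, b = 3
weighted_counts(dat, "grp", weight.by = "wgt")  # same as the bare name
```

Capturing the argument with `substitute()` is what allows both spellings; the trade-off is that a variable holding the column name cannot be passed through without extra handling.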

diff --git a/R/row_sums.R b/R/row_sums.R
@@ -1,12 +1,10 @@
 #' @title Row sums and means for data frames
 #' @name row_sums
 #'
-#' @description \code{row_sums()} simply wraps \code{\link{rowSums}}, while
-#'              \code{row_means()} simply wraps \code{\link[sjstats]{mean_n}},
-#'              however, the argument-structure of both functions is designed
-#'              to work nicely within a pipe-workflow and allows select-helpers
-#'              for selecting variables and the return value is always a tibble
-#'              (with one variable).
+#' @description \code{row_sums()} and \code{row_means()} compute row sums or means
+#'    for at least \code{n} valid values per row. The functions are designed
+#'    to work nicely within a pipe-workflow and allow select-helpers
+#'    for selecting variables.
 #'
 #' @param n May either be
 #'          \itemize{
@@ -20,17 +18,17 @@
 #' @inheritParams rec
 #'
 #' @return For \code{row_sums()}, a tibble with a new variable: the row sums from
-#'         \code{x}; for \code{row_means()}, a tibble with a new variable: the row
-#'         means from \code{x}. If \code{append = FALSE}, only the new variable
-#'         with row sums resp. row means is returned. \code{total_mean()} returns
-#'         the mean of all values from all specified columns in a data frame.
+#'    \code{x}; for \code{row_means()}, a tibble with a new variable: the row
+#'    means from \code{x}. If \code{append = FALSE}, only the new variable
+#'    with row sums resp. row means is returned. \code{total_mean()} returns
+#'    the mean of all values from all specified columns in a data frame.
 #'
 #' @details For \code{n}, must be a numeric value from \code{0} to \code{ncol(x)}. If
-#'          a \emph{row} in \code{x} has at least \code{n} non-missing values, the
-#'          row mean or sum is returned. If \code{n} is a non-integer value from 0 to 1,
-#'          \code{n} is considered to indicate the proportion of necessary non-missing
-#'          values per row. E.g., if \code{n = .75}, a row must have at least \code{ncol(x) * n}
-#'          non-missing values for the row mean or sum to be calculated. See 'Examples'.
+#'    a \emph{row} in \code{x} has at least \code{n} non-missing values, the
+#'    row mean or sum is returned. If \code{n} is a non-integer value from 0 to 1,
+#'    \code{n} is considered to indicate the proportion of necessary non-missing
+#'    values per row. E.g., if \code{n = .75}, a row must have at least \code{ncol(x) * n}
+#'    non-missing values for the row mean or sum to be calculated. See 'Examples'.
 #'
 #' @examples
 #' data(efc)
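
Review note: the `n` rule in the details section (a whole number is the minimum count of non-missing values per row, a value between 0 and 1 is a proportion of `ncol(x)`) can be illustrated with a small base R sketch. This is not the package code; `row_means_sketch()` and the example data are hypothetical, and how the package rounds `ncol(x) * n` is not assumed here.

```r
# Hypothetical helper (not part of sjmisc): row means that are only
# returned when a row has at least `n` non-missing values.
row_means_sketch <- function(x, n) {
  m <- as.matrix(x)

  # a non-integer n between 0 and 1 is read as a proportion of columns
  if (n > 0 && n < 1) n <- ncol(m) * n

  valid <- rowSums(!is.na(m))                # non-missing values per row
  means <- unname(rowMeans(m, na.rm = TRUE)) # mean of the valid values
  means[valid < n] <- NA                     # too few valid values: NA
  means
}

# made-up example data with 2, 4, 1 and 3 valid values per row
dat <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(NA, 2, NA, 5),
  c = c(NA, 4, NA, NA),
  d = c(2, 3, 7, 8)
)

row_means_sketch(dat, n = 4)     # NA, 2.75, NA, NA
row_means_sketch(dat, n = 0.75)  # NA, 2.75, NA, about 5.67
```

With `n = 0.75` and four columns, a row needs at least three valid values, so the fourth row now gets a mean while the first and third still return `NA`.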

diff --git a/docs/CODE_OF_CONDUCT.html b/docs/CODE_OF_CONDUCT.html
diff --git a/docs/CONTRIBUTING.html b/docs/CONTRIBUTING.html
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
diff --git a/docs/articles/design_philosophy.html b/docs/articles/design_philosophy.html
diff --git a/docs/articles/exploringdatasets.html b/docs/articles/exploringdatasets.html
diff --git a/docs/articles/index.html b/docs/articles/index.html
diff --git a/docs/authors.html b/docs/authors.html