HSDS

The goal of HSDS is to make all the data sets from the book “A Handbook of Small Data Sets” (1994) by David J. Hand et al. available in R. These data sets are particularly useful for demonstrating functions or statistical tests, and for teaching statistics and R.

All data sets are already available individually in this repo: https://github.com/JedStephens/Handbook-of-Small-Data-Sets/tree/master. However, they are not immediately usable in R and are undocumented. This package aims to solve both problems by providing clean, documented data sets.

Do you like this package and want to support me? “Buy Me A Coffee”

Installation

You can install the development version of HSDS from GitHub (this requires the devtools package) like so:

devtools::install_github("ABohynDOE/HSDS")

Available data sets

The book contains more than 500 data sets. For the moment, only some of them are available in the package. They are summarized in the table below, along with their names, titles, structure (rows × columns), and the types of variables they contain.

| Name | Title | Structure | Variables |
|------|-------|-----------|-----------|
| | Germinating seeds | 48 × 3 | factor(2), numeric(1) |
| | Guessing lengths | 113 × 3 | character(1), numeric(2) |
| | Darwin’s cross-fertilized and self-fertilized plants | 30 × 3 | factor(1), integer(1), numeric(1) |
| | Intervals between cars on the M1 motorway | 41 × 2 | character(2) |
| | Tearing factor for paper | 20 × 2 | numeric(2) |
| | Abrasion loss | 30 × 3 | numeric(3) |
| | Mortality and water hardness | 61 × 5 | factor(1), numeric(4) |
| | Tensile strength of cement | 21 × 2 | numeric(2) |
| | Weight gain in rats | 40 × 3 | factor(2), numeric(1) |
| | Weight of chickens | 24 × 3 | factor(2), numeric(1) |
| | Flicker frequency | 27 × 4 | factor(3), numeric(1) |
| | Effect of ammonium chloride on yield | 32 × 5 | factor(4), numeric(1) |
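The Structure and Variables entries can be reproduced for any data frame with a few lines of base R. The following is only a sketch: `stand_in` is a hypothetical data frame with made-up values (it is not one of the package's data sets), and the names `structure_label`, `type_counts`, and `variables_label` are introduced here for illustration.

```r
# Hypothetical stand-in data frame; the values are made up for illustration
stand_in <- data.frame(
  water = factor(rep(c("w1", "w2"), each = 4)),
  box   = factor(rep(c("b1", "b2"), times = 4)),
  seeds = c(10, 12, 9, 11, 5, 7, 6, 8)
)

# Structure: number of rows and columns, joined by a multiplication sign
structure_label <- paste(dim(stand_in), collapse = " \u00d7 ")

# Variables: count the columns of each class (first class of each column only)
type_counts <- table(
  vapply(stand_in, function(col) class(col)[1], character(1))
)
variables_label <- paste0(
  names(type_counts), "(", as.vector(type_counts), ")",
  collapse = ", "
)

print(structure_label)
print(variables_label)
```

Replacing `stand_in` with a data set loaded from the package should give the corresponding row of the table, assuming the table was generated along similar lines.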

Example

This is a basic example which shows you how to use a data set to make a nice plot:

library(HSDS)
library(ggplot2)

ggplot(germin, aes(x = water, y = seeds, color = box)) +
  geom_boxplot(na.rm = TRUE) +
  theme_bw()

Contributing

We are far from the 500 data sets, so any help is welcome! If you want to contribute, all raw data sets are already present in the repo (at data-raw/data-files), so feel free to clean one or more of them. If you do so, please respect the following guidelines:

  • data sets should be named after the data structure index of the book (available here)

  • all variables in the data set should be labelled (using the labelled package for example)

  • data sets should be documented using the text from the book
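The guidelines above can be sketched roughly as follows. This is a hedged illustration, not the package's actual workflow: the raw text, the object name `example_data`, its columns, and its labels are all hypothetical, and real files in data-raw/data-files will have different layouts. Labels are set through the `"label"` attribute in base R, which is the attribute the labelled package reads and writes.

```r
# Hypothetical raw file contents, made up for illustration
raw_text <- "group weight
A 12.1
A 13.4
B 10.8
B 11.9"

# Read the whitespace-separated values into a data frame
example_data <- read.table(text = raw_text, header = TRUE,
                           stringsAsFactors = FALSE)

# Convert the grouping column to a factor
example_data$group <- factor(example_data$group)

# Label every variable; labelled::var_label() uses the same "label" attribute
attr(example_data$group, "label") <- "Treatment group (hypothetical)"
attr(example_data$weight, "label") <- "Weight in grams (hypothetical)"

str(example_data)

# In a package, the cleaned object is typically saved with:
# usethis::use_data(example_data, overwrite = TRUE)

# Roxygen2 documentation for the data set would go in an R/ file, e.g.:
#' Example data set (hypothetical)
#'
#' @format A data frame with 4 rows and 2 variables:
#' \describe{
#'   \item{group}{Treatment group}
#'   \item{weight}{Weight in grams}
#' }
#' @source Description text taken from the book.
"example_data"
```

The object name would follow the book's data structure index rather than `example_data`, and the documentation text would come from the book's description of the data set.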