Skip to content

Commit

Permalink
adding slopegrap
Browse files Browse the repository at this point in the history
  • Loading branch information
lilyclements committed Jun 12, 2024
1 parent ec9dc2e commit d2dabb3
Show file tree
Hide file tree
Showing 4 changed files with 334 additions and 17 deletions.
20 changes: 3 additions & 17 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,25 +1,11 @@
Package: instatExtras
Title: A set of functions used in R-Instat
Version: 0.1.0
Authors@R:
person("Danny", "Parsons"),
person("Stephen", "Kogo"),
person("Steven", "Ndung'u"),
Authors@R: c(person("Danny", "Parsons", role = "aut"),
person("Lily", "Clements", "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-8864-0552")),
person("Shadrack", "Kibet"),
person("Bethan", "Clarke"),
person("Lazarus", "Kioko Muthenya"),
person("Maxwell", "Fundi"),
person("Antoine", "Ntalumeso"),
person("Haward", "Ketoyo"),
person("Ivan", "Luvisia"),
person("Stephen", "Lloyd"),
person("Patrick", "Munyoki"),
person("Anastasia", "Mbithe"),
person("Vitalis", "Kwambai"),
person("Derrick", "Agorhom"),
person("Sophie", "Malla Tatchum")
person("Antoine", "Ntalumeso", role = "aut"),
person("Stephen", "Lloyd", role = "aut"))
Description: This package contains a set of functions used in R-Instat. This includes functions written specifically for R-Instat, as well functions written elsewhere that are modified for use in R-Instat.
License: LGPL (>= 3)
Encoding: UTF-8
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ export(plot_mrl)
export(plot_multiple_threshold)
export(record_graph)
export(slope)
export(slopegraph)
export(slopegraph_theme)
export(spei_input)
export(spei_output)
Expand Down
205 changes: 205 additions & 0 deletions R/slopegraph.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
#' Plot a Slopegraph a la Tufte
#'
#' This function and documentation are taken from the `"newggslopegraph"` function in the `CGPfunctions` R package.
#' There are slight modifications for use in R-Instat.
#' This function creates a "slopegraph" as conceptualised by Edward Tufte. Slopegraphs are minimalist
#' and efficient presentations of your data that can simultaneously convey the relative rankings,
#' the actual numeric values, and the changes and directionality of the data over time.
#' Takes a dataframe as input, with three named columns being used to draw the plot.
#' Makes the required adjustments to the ggplot2 parameters and returns the plot.
#'
#' @param data a dataframe or an object that can be coerced to a dataframe.
#' Basic error checking is performed, to include ensuring that the named columns
#' exist in the dataframe. See the \code{\link{newcancer}} dataset for an example of
#' how the dataframe should be organized.
#' @param x a column inside the dataframe that will be plotted on the x axis.
#' Traditionally this is some measure of time. The function accepts a column of class
#' ordered, factor or character. NOTE if your variable is currently a "date" class
#' you must convert before using the function with \code{as.character(variablename)}.
#' @param y a column inside the dataframe that will be plotted on the y axis.
#' Traditionally this is some measure such as a percentage. Currently the function
#' accepts a column of type integer or numeric. The slopegraph will be most effective
#' when the y variables are not too disparate.
#' @param colour a column inside the dataframe that will be used to group and
#' distinguish different y variables.
#' @param data_label an optional column inside the dataframe that will be used
#' as the label for the data points plotted. Can be complex strings and
#' have `NA` values but must be of class `chr`. By default `y` is
#' converted to `chr` and used.
#' @param y_text_size Optionally the font size for the Y axis labels to be displayed.
#' y_text_size = 3 is the default must be a numeric.
#' @param line_thickness Optionally the thickness of the plotted lines that
#' connect the data points. LineThickness = 1 is the default must be a numeric.
#' @param data_text_size Optionally the font size of the plotted data points. DataTextSize = 2.5
#' is the default must be a numeric.
#' @param data_text_colour Optionally the font color of the plotted data points. `"black"`
#' is the default can be either `colors()` or hex value e.g. "#FF00FF".
#' @param data_label_padding Optionally the amount of space between the plotted
#' data point numbers and the label "box". By default very small = 0.05 to
#' avoid overlap. Must be a numeric. Too large a value will risk "hiding"
#' datapoints.
#' @param data_label_line_size Optionally how wide a line to plot around the data
#' label box. By default = 0 to have no visible border line around the
#' label. Must be a numeric.
#' @param data_label_fill_colour Optionally the fill color or background of the
#' plotted data points. `"white"` is the default can be any of the `colors()`
#' or hex value e.g. "#FF00FF".
#' @param line_colour Optionally the color of the plotted lines. By default it will use
#' the ggplot2 color palette for coloring by \code{colour}. The user may override
#' with \bold{one} valid color of their choice e.g. "black" (see colors() for choices)
#' \bold{OR}
#' they may provide a vector of colors such as c("gray", "red", "green", "gray", "blue")
#' \bold{OR} a named vector like c("Green" = "gray", "Liberal" = "red", "NDP" = "green",
#' "Others" = "gray", "PC" = "blue"). Any input must be character, and the length
#' of a vector \bold{should} equal the number of levels in \code{colour}. If the
#' user does not provide enough colors they will be recycled.
#' @param reverse_x_axis logical, set this value to \code{TRUE} if you want
#' to reverse the **factor levels** on the x scale.
#' @param remove_missing logical, by default set to \code{TRUE} so that if any \code{y}
#' is missing \bold{all rows} for that \code{colour} are removed. If set to \code{FALSE} then
#' the function will try to remove and graph what data it does have. \bold{N.B.} missing values
#' for \code{x} and \code{colour} are never permitted and will generate a fatal error with
#' a warning.
#'
#'
#' @return a plot of type ggplot to the default plot device
#' @export
#'
#' @author Chuck Powell
#' @seealso \code{\link{newcancer}} and \code{\link{newgdp}}
#' @references Based on: Edward Tufte, Beautiful Evidence (2006), pages 174-176.
#' This function and documentation are taken from the `"newggslopegraph"` function in the `CGPfunctions` R package.
#' Full credit on this function goes to the authors of the `CGPfunctions` R package.
#' @examples
#'
#' data <- data.frame(
#' Year = rep(c("2020", "2021"), each = 3),
#' Value = c(10, 20, 15, 12, 25, 18),
#' Group = rep(c("A", "B", "C"), times = 2)
#' )
#' # Use the slopegraph function
#' slopegraph(data, x = Year, y = Value, colour = Group)
slopegraph <- function(data, x, y, colour, data_label = NULL,
y_text_size = 3,
line_thickness = 1, line_colour = "ByGroup",
data_text_size = 2.5, data_text_colour = "black", data_label_padding = 0.05,
data_label_line_size = 0, data_label_fill_colour = "white",
reverse_x_axis = FALSE,
remove_missing = TRUE){


if (length(match.call()) <= 4) {
stop("Not enough arguments passed requires a dataframe, plus at least three variables")
}
argList <- as.list(match.call()[-1])
if (!methods::hasArg(data)) {
stop("You didn't specify a dataframe to use", call. = FALSE)
}
Nx <- deparse(substitute(x))
Ny <- deparse(substitute(y))
Ncolour <- deparse(substitute(colour))
if (is.null(argList$data_label)) {
Ndata_label <- deparse(substitute(y))
data_label <- argList$y
}
else {
Ndata_label <- deparse(substitute(data_label))
}
Ndata <- argList$data
if (!is(data, "data.frame")) {
stop(paste0("'", Ndata, "' does not appear to be a data frame"))
}
if (!Nx %in% names(data)) {
stop(paste0("'", Nx, "' is not the name of a variable in the dataframe"),
call. = FALSE)
}
if (anyNA(data[[Nx]])) {
stop(paste0("'", Nx, "' can not have missing data please remove those rows"),
call. = FALSE)
}
if (!Ny %in% names(data)) {
stop(paste0("'", Ny, "' is not the name of a variable in the dataframe"),
call. = FALSE)
}
if (!Ncolour %in% names(data)) {
stop(paste0("'", Ncolour, "' is not the name of a variable in the dataframe"),
call. = FALSE)
}
if (!Ndata_label %in% names(data)) {
stop(paste0("'", Ndata_label, "' is not the name of a variable in the dataframe"),
call. = FALSE)
}
if (anyNA(data[[Ncolour]])) {
stop(paste0("'", Ncolour, "' can not have missing data please remove those rows"),
call. = FALSE)
}
if (!class(data[[Ny]]) %in% c("integer", "numeric")) {
stop(paste0("Variable '",
Ny, "' needs to be numeric"), call. = FALSE)
}
if (!"ordered" %in% class(data[[Nx]])) {
if (!"character" %in% class(data[[Nx]])) {
if ("factor" %in% class(data[[Nx]])) {
message(paste0("\nConverting '", Nx,
"' to an ordered factor\n"))
data[[Nx]] <- factor(data[[Nx]],
ordered = TRUE)
}
else {
stop(paste0("Variable '",
Nx, "' needs to be of class character, factor or ordered"),
call. = FALSE)
}
}
}
data_label <- rlang::enquo(data_label)
if (reverse_x_axis) {
data[[Nx]] <- forcats::fct_rev(data[[Nx]])
}
NumbOfLevels <- nlevels(factor(data[[Nx]]))
if (length(line_colour) > 1) {
if (length(line_colour) < length(unique(data[[Ncolour]]))) {
message(paste0("\nGiven ", length(line_colour),
" colours. Recycling colours because there are ",
length(unique(data[[Ncolour]])), " ",
Ncolour, "s\n"))
line_colour <- rep(line_colour, length.out = length(unique(data[[Ncolour]])))
}
LineGeom <- list(ggplot2::geom_line(ggplot2::aes(colour = {{colour}}), size = line_thickness),
scale_colour_manual(values = line_colour))
}
else {
if (line_colour == "ByGroup") {
LineGeom <- list(ggplot2::geom_line(ggplot2::aes(colour = {{colour}},
alpha = 1), size = line_thickness))
}
else {
LineGeom <- list(ggplot2::geom_line(ggplot2::aes_(), size = line_thickness,
colour = line_colour))
}
}
if (anyNA(data[[Ny]])) {
if (remove_missing) {
data <- data %>% group_by({{colour}}) %>%
dplyr::filter(!anyNA({{y}})) %>% droplevels()
}
else {
data <- data %>% dplyr::filter(!is.na({{y}}))
}
}
data %>% ggplot2::ggplot(ggplot2::aes(group = {{colour}}, y = {{y}}, x = {{x}})) +
LineGeom +
# note: this may conflict with other label in R, in which case we need to rewrite this
ggrepel::geom_text_repel(data = . %>% dplyr::filter({{x}} == min({{x}})), ggplot2::aes(label = {{colour}}),
hjust = "left", box.padding = 0.1, point.padding = 0.1,
segment.colour = "gray", segment.alpha = 0.6, fontface = "bold",
size = y_text_size, nudge_x = -1.95, direction = "y",
force = 0.5, max.iter = 3000) +
ggrepel::geom_text_repel(data = . %>% dplyr::filter({{x}} == max({{x}})), ggplot2::aes(label = {{colour}}),
hjust = "right", box.padding = 0.1, point.padding = 0.1,
segment.colour = "gray", segment.alpha = 0.6, fontface = "bold",
size = y_text_size, nudge_x = 1.95, direction = "y",
force = 0.5, max.iter = 3000) +
ggplot2::geom_label(ggplot2::aes_string(label = Ndata_label), size = data_text_size, label.padding = ggplot2::unit(data_label_padding, "lines"),
label.size = data_label_line_size, colour = data_text_colour, fill = data_label_fill_colour)
}
125 changes: 125 additions & 0 deletions man/slopegraph.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit d2dabb3

Please sign in to comment.