ADD: added highlights and links to various places

chris-greening · Jan 28, 2023 · ea3a2b1
1 parent 1beba32
Showing 1 changed file with 23 additions and 25 deletions.
diff --git a/posts/Performing multiple joins in R using dplyr and purrr/README.md b/posts/Performing multiple joins in R using dplyr and purrr/README.md
@@ -1,8 +1,8 @@
 # Joining multiple datasets on the same column in R using dplyr and purrr
 
-Joining multiple datasets on the same column is a common pattern in data preparation
+Joining **multiple datasets** on the same column is a common pattern in data preparation
 
-So let's jump in and explore how we can leverage R and the tidyverse to join an arbitrary number of datasets on a shared column with elegant, readable code!
+So let's jump in and explore how we can **leverage R** and the **tidyverse** to join an arbitrary number of datasets on a shared column with **elegant, readable code**!
 
 ## Table of Contents 
 - [Installing prerequisite packages](#installing-prerequisite-packages)
@@ -20,9 +20,9 @@ So let's jump in and explore how we can leverage R and the tidyverse to join an
 ## Installing prerequisite packages
 <a src="#installing-prerequisite-packages"></a>
 
-In this tutorial we'll be using dplyr and purrr from the popular tidyverse collection of packages
+In this tutorial we'll be using [`dplyr`](https://dplyr.tidyverse.org/) and [`purrr`](https://purrr.tidyverse.org/) from the popular [tidyverse](https://www.tidyverse.org/) collection of packages
 
-The following line of code will install them on your machine if they aren't already:
+The following line of code will **install** them on your machine if they aren't already:
 
 ```R
 install.packages(c("dplyr", "purrr"))
@@ -33,7 +33,7 @@ install.packages(c("dplyr", "purrr"))
 ## Examining our sample datasets
 <a src="#examining-our-sample-datasets"></a>
 
-For the following examples, we'll be using real-world agricultural data sourced via Eurostat containing the number of specific livestock animals (`swine`, `bovine`, `sheep`, and `goats`) in a `country` during a given `year`
+For the following examples, we'll be using **real-world agricultural data** sourced via [Eurostat](https://ec.europa.eu/eurostat) containing the number of specific livestock animals (`swine`, `bovine`, `sheep`, and `goats`) in a `country` during a given `year`
 
 For example, here is the `goats` dataset
 ```R
@@ -54,7 +54,7 @@ For example, here is the `goats` dataset
 # … with 1,312 more rows
 ```
 
-Our goal is to join these datasets by `country` and `year` into a single `livestock.data` variable containing all the animals like so:
+Our goal is to **join these datasets** by `country` and `year` into a single `livestock.data` variable containing all the animals like so:
 
 ```R
 > livestock.data
@@ -79,7 +79,7 @@ Our goal is to join these datasets by `country` and `year` into a single `livest
 ## Using dplyr::full_join to manually join two datasets at a time
 <a src="#using-dplyr"></a>
 
-Let's start with a naive approach and manually join our datasets one-by-one on the `country` and `year` columns
+Let's start with a naive approach and **manually** join our datasets one-by-one on the `country` and `year` columns
 
 ```R
 by = c("country", "year")
@@ -89,27 +89,27 @@ livestock.data <- dplyr::full_join(livestock.data, sheep, by=by)
 ```
 
 The above code accomplishes the exercise by:
-1. Manually stepping through each animal
-2. applying a function that takes two arguments (in this case `dplyr::full_join`)
-3. and chaining the output of one step (`livestock.data`) as the input for the next step
+1. **Manually** stepping through each animal
+2. **applying a function** that takes two arguments (in this case `dplyr::full_join`)
+3. and chaining the **output** of one step (`livestock.data`) as the **input** for the next step
 
-While this might work for four datasets, what if we had 100 datasets? 1000 datasets? _n datasets?!_ Suddenly not a great solution! 
+While this might work for four datasets, what if we had 100 datasets? 1000 datasets? _n datasets?!_ Suddenly **not** a great solution! 
 
-Let's investigate how we can improve, automate, and scale this
+Let's investigate how we can **improve, automate, and scale** this
 
 ---
 
 ## Understanding the reduce operation
 <a src="#understanding-the-reduce-operation"></a> 
 
-The reduce operation is a technique that combines all the elements of an array (i.e. an array containing our individual livestock datasets) into a single value (i.e. the final joined table).   
+The **reduce** operation is a technique that **combines** all the elements of an **array** (i.e. an array containing our individual livestock datasets) into a **single value** (i.e. the final joined table).   
 
 The reduce operation accomplishes this by:
-1. Looping over an array
-2. applying a function that takes two arguments (such as `dplyr::full_join`) 
-3. and chaining the output of one step as the input for the next step
+1. **Looping** over an array
+2. **applying a function** that takes two arguments (such as `dplyr::full_join`) 
+3. and chaining the **output** of one step as the **input** for the next step
 
-Sound familiar? This is exactly what we just performed manually in the previous section except this time we'll be leveraging R to do it for us! 
+Sound familiar? This is *exactly* what we just performed manually in the previous section except this time we'll be **leveraging R** to do it for us! 
 
 So let's see in practice how we can apply the reduce operation to elegantly join our `livestock.data`
 
@@ -118,9 +118,9 @@ So let's see in practice how we can apply the reduce operation to elegantly join
 ## Leveraging purrr::reduce to join multiple datasets
 <a src="#leveraging-purrr-reduce"></a>
 
-`purrr` is a package that enhances R's functional programming toolkit for working with functions and vectors (i.e. reducing, mapping, filtering, etc.)
+`purrr` is a package that enhances R's **functional programming toolkit** for working with functions and vectors (e.g. reducing, mapping, filtering, etc.)
 
-In this case, we're going to use `purrr::reduce` in conjunction with `dplyr::full_join` to join all of our datasets in one line of concise, readable code
+In this case, we're going to use `purrr::reduce` in conjunction with `dplyr::full_join` to join all of our datasets in one line of **concise, readable code**
 
 ```R
 livestock.data <- purrr::reduce(
@@ -131,23 +131,21 @@ livestock.data <- purrr::reduce(
 )
 ```
 
-And that's it! We've joined all of our datasets in what's essentially a single line of code
+And that's it! We accomplished this by:
 
-We accomplished this by:
-
-1. Looping over a list of our livestock
+1. **Looping** over a list of our livestock
 ```R
 list(bovine, goats, swine, sheep)
 ```
-2. applying `dplyr::full_join` which takes two arguments 
+2. **applying** `dplyr::full_join` which takes two arguments 
 
 ```R
 function(left, right) {
   dplyr::full_join(left, right, by=c("country", "year"))
 }
 ```
 
-3. and chaining the output of one step as the input for the next step
+3. and chaining the **output** of one step as the **input** for the next step
 
 ![Image showing the different datasets joining together in a hierarchical chain that starts with bovine and goats joining into livestock.data, livestock.data joining with swine, and livestock.data finally joining with sheep](media/join_image.PNG)
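
To see the chaining in action without installing anything, here is a minimal sketch of the same reduce pattern in base R. It is not the code from the post: it swaps in base R's `Reduce` for `purrr::reduce` and `merge(..., all = TRUE)` for `dplyr::full_join`. The tiny data frames and the `join_two` helper are hypothetical stand-ins invented for illustration, not the Eurostat data.

```r
# Toy stand-ins for the livestock tables (all values are made up for illustration)
bovine <- data.frame(country = c("AT", "BE"), year = c(2020, 2020), bovine = c(10, 20))
goats  <- data.frame(country = c("AT", "FR"), year = c(2020, 2020), goats = c(1, 3))
swine  <- data.frame(country = c("BE", "FR"), year = c(2020, 2020), swine = c(5, 7))
sheep  <- data.frame(country = "AT", year = 2020, sheep = 4)

# Hypothetical helper: a two-argument function that performs a full outer join
# on the shared key columns (base R analogue of dplyr::full_join)
join_two <- function(left, right) {
  merge(left, right, by = c("country", "year"), all = TRUE)
}

# Reduce folds the list left to right, feeding each result into the next join:
# join_two(join_two(join_two(bovine, goats), swine), sheep)
livestock.data <- Reduce(join_two, list(bovine, goats, swine, sheep))

# The manual one-by-one chain, for comparison with the reduced result
manual <- join_two(join_two(join_two(bovine, goats), swine), sheep)

print(livestock.data)
print(identical(livestock.data, manual))
```

Because the join function only ever sees two tables at a time, adding a fifth or a hundredth dataset only means adding it to the list; the joining logic itself stays the same.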