title | description | prev | next | type | id |
---|---|---|---|---|---|
Chapter 6: Data visualisation with R | This chapter gives you an introduction to the R graphics system and teaches you how to get started with the ggplot2 package for drawing all kinds of plots. | /chapter5 | /chapter7 | chapter | 6 |
Which of the following is a contributed R package?
That's correct! ggplot2 was developed by Hadley Wickham as part of his PhD.
No, this is a core package so it's already installed.
No, this is a core package and loads automatically when you launch R.
No, this is a core package and loads automatically when you launch R.
Which R package actually renders the graphics in R?
No, that's incorrect.
No, that's incorrect.
No, that's incorrect.
Yes, that's right!
Remember that there are two primary graphics models in R: base graphics and grid graphics. Which one does ggplot2 use?
No, that's incorrect.
Yes, that's right! Well done!
For the following questions we are going to use the `BudgetFood` data from the `Ecdat` package, which contains the budget share of food for Spanish households. You can load the dataset and see the structure of the data below.
```r
library(Ecdat)
str(BudgetFood)
## 'data.frame': 23972 obs. of 6 variables:
## $ wfood : num 0.468 0.313 0.376 0.44 0.404 ...
## $ totexp: num 1290941 1277978 845852 527698 1103220 ...
## $ age : num 43 40 28 60 37 35 40 68 43 51 ...
## $ size : num 5 3 3 1 5 4 4 2 9 7 ...
## $ town : num 2 2 2 2 2 2 2 2 2 2 ...
## $ sex : Factor w/ 2 levels "man","woman": 1 1 1 2 1 1 1 2 1 1 ...
```
The variables are described below:
- `wfood`: percentage of total expenditure which the household has spent on food
- `totexp`: total expenditure of the household
- `age`: age of the reference person in the household
- `size`: size of the household
- `town`: size of the town where the household is located, categorized into 5 groups: 1 for small towns, 5 for big ones
- `sex`: sex of the reference person (man, woman)
Try to recreate the scatter plot below. Is there anything unusual that you notice about the plot?
Hint: Scatter plots are created using `geom_point`.
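The target plot is not reproduced here, so the following is only a minimal sketch of `geom_point`. The choice of `totexp` on the x-axis and `wfood` on the y-axis is an assumption; swap in whichever variables the target plot uses.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# Assumption: the x and y variables are a guess at the target plot
p_scatter <- ggplot(BudgetFood, aes(x = totexp, y = wfood)) +
  geom_point()  # one point per household

p_scatter
```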
You may have noticed that the previous plot suffers from overplotting in the bottom-left region. This time, let's create a hex plot like the one below.
Hint: Hex plots are created using `geom_hex`.
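As a rough sketch, a hex plot replaces the individual points with hexagonal bins whose fill shows how many observations fall in each bin. The variables are again an assumption, and rendering `geom_hex` requires the `hexbin` package to be installed.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# Assumption: same (guessed) variables as a scatter plot of totexp vs wfood
p_hex <- ggplot(BudgetFood, aes(x = totexp, y = wfood)) +
  geom_hex()  # counts observations per hexagon; needs the hexbin package

p_hex
```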
This time let's look at some other variables in the `BudgetFood` data. Let's check the distribution of age by sex using a boxplot like the one below.
Hint: The five number summary is often graphically depicted by a boxplot.
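A minimal sketch of a boxplot of `age` by `sex` could look like this. Any styling in the target plot is not known, so none is attempted here.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# One box per level of sex, summarising the distribution of age
p_box <- ggplot(BudgetFood, aes(x = sex, y = age)) +
  geom_boxplot()

p_box
```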
Next, study the distribution of the total expenditure of the household by the size of the town, using a violin plot like the one below.
Hint: What do you notice about the x-axis scale? Do you perhaps need to convert `town`?
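Because `town` is stored as a number, ggplot2 treats it as continuous and draws a single violin. One possible approach, sketched below, is to convert it to a factor inside `aes()` so that each town size gets its own violin.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# factor(town) turns the numeric codes 1-5 into discrete groups
p_violin <- ggplot(BudgetFood, aes(x = factor(town), y = totexp)) +
  geom_violin() +
  labs(x = "town")  # tidy the axis label, which would otherwise read "factor(town)"

p_violin
```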
Let's have a look at the distribution of the percentage of total expenditure which the household has spent on food (`wfood`). We'll adjust the bin width of the histogram to 0.001. Do you notice the peak at 0?
Hint: this plot is called a histogram.
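A minimal sketch of the histogram described above, using the stated bin width of 0.001:

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# binwidth is in the units of wfood, so 0.001 gives very narrow bins
p_hist <- ggplot(BudgetFood, aes(x = wfood)) +
  geom_histogram(binwidth = 0.001)

p_hist
```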
Finally, let's also draw a density plot but display the y-axis as counts instead of density.
Hint: This is a density plot, but the y-axis is showing counts. What does the `after_stat` function do?
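`after_stat()` lets an aesthetic refer to a variable that the stat computes, rather than to a column of the data. The density stat computes both `density` and `count`, so a sketch along these lines puts counts on the y-axis. The choice of `wfood` is an assumption carried over from the histogram exercise.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# Assumption: the variable plotted is wfood
# after_stat(count) uses the count computed by the density stat (density * n)
p_density <- ggplot(BudgetFood, aes(x = wfood)) +
  geom_density(aes(y = after_stat(count)))

p_density
```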
We are again going to use the `BudgetFood` data from the `Ecdat` package to make the plots.
The plot below shows the histogram with the density plot overlaid on top of it. Try to recreate this figure.
Hint: This is a density plot, but the y-axis is showing counts. What does the `after_stat` function do?
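One way to overlay a density curve on a histogram is sketched below. The variable (`wfood`), the bin width `bw` and the colours are all assumptions. The density stat's `count` is the density times the number of observations, so it is multiplied by the bin width to put the curve on the same scale as the histogram bars.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

bw <- 0.01  # hypothetical bin width; the target plot may use another value

p_overlay <- ggplot(BudgetFood, aes(x = wfood)) +
  geom_histogram(binwidth = bw, fill = "grey70") +
  # rescale the density count so it matches the per-bin histogram counts
  geom_density(aes(y = after_stat(count * bw)), colour = "red")

p_overlay
```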
In the next target plot, we are going to look at the distribution of `age` by `sex`. There are not many observations where the `sex` information is missing, so let's focus on the `sex` levels "man" and "woman".
Hint: Use the `width` argument to change the width of the boxplot.
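A sketch of this, assuming the missing values are dropped with `subset()` before plotting. The name `budget_known_sex` and the width of 0.3 are hypothetical choices, not values taken from the target plot.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# Keep only households where sex is recorded
budget_known_sex <- subset(BudgetFood, !is.na(sex))

p_narrow_box <- ggplot(budget_known_sex, aes(x = sex, y = age)) +
  geom_boxplot(width = 0.3)  # hypothetical width; smaller values give narrower boxes

p_narrow_box
```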
In the next target plot, we want to draw a scatter plot of the size of the household against the total expenditure of the household. Then we want to overlay this with a loess curve in red. You may not have seen how to fit a curve with `ggplot` yet, so check out the hint if you don't know.
Hint: `geom_smooth` can draw curves from a fitted model.
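A possible sketch is below. It assumes household size goes on the x-axis, and it turns off the confidence band with `se = FALSE`, which the target plot may or may not do. Fitting loess to a dataset of this size can be slow, and because `size` takes few distinct values the fit may emit warnings.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

# Assumption: size on the x-axis, totexp on the y-axis
p_smooth <- ggplot(BudgetFood, aes(x = size, y = totexp)) +
  geom_point() +
  # method = "loess" requests a local regression fit; colour sets the line colour
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red")

p_smooth
```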
In the next plot, try drawing a bar plot of the number of observations for each sex, with the total count written on top of each bar.
Hint: Change the stat of the geometry used to write the count, so that it computes the total count to be used on the y-axis.
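The idea in the hint can be sketched as follows: `geom_text` normally uses the identity stat, so switching it to `stat = "count"` makes it compute the same counts as `geom_bar`, which can then be used as the label. Dropping rows with missing `sex` and the `vjust` value are assumptions.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

budget_known_sex <- subset(BudgetFood, !is.na(sex))  # assumption: NA dropped

p_bar <- ggplot(budget_known_sex, aes(x = sex)) +
  geom_bar() +
  # stat = "count" computes the bar heights; the label shows that computed count
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.5)

p_bar
```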
The next one might be tricky. You want to draw a bar plot showing the average total expenditure by sex and town, where the top blue bars show the average for women and the bottom red bars show the average for men.
Hint: The legend shows the `fill` with the labels "woman" and "man". In this case, should `fill` be an attribute or an aesthetic?
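The sketch below only illustrates the distinction raised in the hint and does not attempt to reproduce the layout or colours of the target plot. Using `stat_summary` with `fun = mean` to compute the averages, and dodged bars, are assumptions.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

budget_known_sex <- subset(BudgetFood, !is.na(sex))  # assumption: NA dropped

# Aesthetic: fill is mapped to a variable inside aes(), so a legend is drawn
p_fill_aes <- ggplot(budget_known_sex,
                     aes(x = factor(town), y = totexp, fill = sex)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge")

# Attribute: fill is a constant set outside aes(), so there is no legend
p_fill_attr <- ggplot(budget_known_sex, aes(x = factor(town), y = totexp)) +
  stat_summary(fun = mean, geom = "bar", fill = "steelblue")

p_fill_aes
```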
In the plot below, we show the age distribution for men and women. Change the colours so that the violin for man is coloured "violet" and the violin for woman is coloured "royalblue".
Hint: Recall that scales are controlled by functions beginning with `scale_`. The second "word" in the scale function is the aesthetic.
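A sketch under the assumption that the colours are applied through the `fill` aesthetic; if the target plot colours the outline instead, the matching function would be `scale_colour_manual`.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

budget_known_sex <- subset(BudgetFood, !is.na(sex))  # assumption: NA dropped

p_violin_col <- ggplot(budget_known_sex, aes(x = sex, y = age, fill = sex)) +
  geom_violin() +
  # scale_ + aesthetic (fill) + type (manual); names match the factor levels
  scale_fill_manual(values = c(man = "violet", woman = "royalblue"))

p_violin_col
```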
In the next plot, let's have a look at the distribution of the total household expenditure by sex using histograms. Since the total household expenditure is skewed to the right, apply a log (base 10) transformation to the scale and change the colours to match the target plot below.
Hint: Recall that scales are controlled by functions beginning with `scale_`. The second "word" in the scale function is the aesthetic.
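A rough sketch combining a position scale with a fill scale. The colours, the number of bins and the use of `fill` are all placeholders, since the target plot is not shown here.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

budget_known_sex <- subset(BudgetFood, !is.na(sex))  # assumption: NA dropped

p_log_hist <- ggplot(budget_known_sex, aes(x = totexp, fill = sex)) +
  geom_histogram(bins = 30) +  # hypothetical number of bins
  scale_x_log10() +            # log base 10 transformation of the x scale
  # hypothetical colours; replace with those of the target plot
  scale_fill_manual(values = c(man = "orange", woman = "purple"))

p_log_hist
```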
In the following section, we won't give you many hints! Try to reach the target plots by looking at what is drawn in each target plot and completing the code needed to get there.
Hint: Which variable is the data faceted by? If you want to see the grey shadow of the overall distribution in each panel, remember that the variable needs to be removed from the layer data.
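The "grey shadow" technique can be sketched as below, assuming the panels are faceted by `sex` and show a histogram of `age`. A layer whose data lacks the faceting variable is repeated in every panel, so it acts as a common background.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

budget_known_sex <- subset(BudgetFood, !is.na(sex))  # assumption: NA dropped

# Copy of the data without the faceting variable, used for the background layer
shadow <- budget_known_sex[, setdiff(names(budget_known_sex), "sex")]

p_shadow <- ggplot(budget_known_sex, aes(x = age)) +
  geom_histogram(data = shadow, fill = "grey80", binwidth = 5) +  # all households
  geom_histogram(aes(fill = sex), binwidth = 5) +                 # panel subset
  facet_wrap(~sex)

p_shadow
```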
Hint: A different style of facet is required to arrange the subplots in both x and y directions.
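`facet_grid` lays panels out in rows and columns defined by two variables. The sketch assumes `sex` defines the rows and `town` the columns, and that the panels show a histogram of `age`; the target plot may differ.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

budget_known_sex <- subset(BudgetFood, !is.na(sex))  # assumption: NA dropped

p_grid <- ggplot(budget_known_sex, aes(x = age)) +
  geom_histogram(binwidth = 5) +
  facet_grid(sex ~ town)  # rows ~ columns

p_grid
```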
Hint: The patchwork package allows you to arbitrarily combine multiple separate plots.
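With patchwork loaded, ggplot objects can be combined with operators such as `+` (side by side) and `/` (stacked). The two plots combined below are arbitrary examples, not the ones in the target.

```r
library(ggplot2)
library(patchwork)
library(Ecdat)  # provides the BudgetFood data

# Two arbitrary example plots
p_age <- ggplot(BudgetFood, aes(x = age)) + geom_histogram(binwidth = 5)
p_exp <- ggplot(BudgetFood, aes(x = factor(town), y = totexp)) + geom_violin()

combined <- p_age + p_exp  # side by side
stacked  <- p_age / p_exp  # one above the other

combined
```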
Well done getting to the last practice exercise!
The next plot you're going to draw is the plot below -- or make the plot as pretty as you can!
Hint: Theme options for ggplot2 are customised in the theme() function. You specify how a particular target (say the text) is shown using the appropriate `element_*()` function.
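A small sketch of how `theme()` pairs a target with an `element_*()` function. The plot and every styling choice here are arbitrary examples, meant only to show the pattern.

```r
library(ggplot2)
library(Ecdat)  # provides the BudgetFood data

p_themed <- ggplot(BudgetFood, aes(x = factor(town), y = age)) +
  geom_boxplot() +
  labs(title = "Age by town size", x = "town") +
  theme(
    plot.title = element_text(face = "bold", size = 16),  # text target
    panel.background = element_rect(fill = "white"),      # rectangle target
    panel.grid.major = element_line(colour = "grey90"),   # line target
    panel.grid.minor = element_blank()                    # remove the element
  )

p_themed
```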
- ggplot2: elegant graphics for data analysis
- R Graphics Cookbook
- The R for Data Science Slack community has a channel on `ggplot2`
- Download the `ggplot2` cheat sheet
- Ask or search questions at StackOverflow or RStudio Community with the `ggplot2` tag.
- The R Graph Gallery
- Fundamentals of Data Visualization -- see code for this book here.