---
title: 'Chapter 6: Data visualisation with R'
description: 'This chapter will give you an introduction to the R graphics system and teach you how to get started with using the ggplot2 package for drawing all kinds of plots.'
prev: /chapter5
next: /chapter7
type: chapter
id: 6
---

Which of the following is a contributed R package?

That's correct! ggplot2 was developed by Hadley Wickham as part of his PhD.

No, this is a core package so it's already installed.

No, this is a core package and loads automatically when you launch R.

No, this is a core package and loads automatically when you launch R.

Which R package actually renders the graphics in R?

No, that's incorrect.

No, that's incorrect.

No, that's incorrect.

Yes, that's right!

Remember that there are two primary graphics models in R: base graphics and grid graphics. Which one does ggplot2 use?

No, that's incorrect.

Yes, that's right! Well done!

For the following questions we are going to use the BudgetFood data from the Ecdat package, which contains the budget share of food for Spanish households. You can load the dataset and see the structure of the data below.

```r
library(Ecdat)
str(BudgetFood)
## 'data.frame':    23972 obs. of  6 variables:
##  $ wfood : num  0.468 0.313 0.376 0.44 0.404 ...
##  $ totexp: num  1290941 1277978 845852 527698 1103220 ...
##  $ age   : num  43 40 28 60 37 35 40 68 43 51 ...
##  $ size  : num  5 3 3 1 5 4 4 2 9 7 ...
##  $ town  : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ sex   : Factor w/ 2 levels "man","woman": 1 1 1 2 1 1 1 2 1 1 ...
```

The variables are described below:

  • wfood: percentage of total expenditure which the household has spent on food
  • totexp: total expenditure of the household
  • age: age of reference person in the household
  • size: size of the household
  • town: size of the town where the household is located, categorised into 5 groups: 1 for small towns, 5 for big ones
  • sex: sex of reference person (man, woman)

Try to recreate the scatter plot below. Is there anything unusual that you notice about the plot?

Hint: The scatter plots are created using geom_point.

You may have noticed that there is overplotting in the bottom left region of the previous plot. This time, let's try to create a hex plot like the one below.

Hint: The hex plots are created using geom_hex.

This time let's have a look at some other variables in the BudgetFood data. Let's check the distribution of age by sex using a boxplot like the one below.

Hint: The five number summary is often graphically depicted by a boxplot.

Next, study the distribution of the total expenditure of the household by size of the town using a violin plot like the one below.

Hint: What do you notice about the x-axis scale? Do you perhaps need to convert town to a factor?
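To illustrate the idea behind this hint without giving the answer away, here is a minimal sketch on the built-in mtcars data (an assumption for illustration, not the exercise data). The column cyl is stored as a number, so wrapping it in factor() makes the x-axis discrete and gives one violin per level.

```r
library(ggplot2)

# mtcars is a stand-in dataset; cyl is numeric, much like town in BudgetFood.
# factor(cyl) turns the x aesthetic into a discrete variable,
# so geom_violin draws a separate violin for each cylinder count.
p_factor <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_violin()

p_factor
```

The same conversion can be done once in the data instead of inside aes(), which keeps the plotting code shorter if the variable is reused.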

Let's have a look at the distribution of the percentage of total expenditure which the household has spent on food (wfood). We'll adjust the bin width of the histogram to 0.001. Do you notice the peak at 0?

Hint: This plot is called a histogram.

Finally, let's also draw a density plot but display the y-axis as counts instead of density.

Hint: This is a density plot but the y-axis is showing counts. What does the after_stat function do?
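As a rough illustration of after_stat, the sketch below uses the built-in faithful data (an assumption, chosen so the exercise itself is left for you). stat_density computes several variables, including density and count, and after_stat() lets an aesthetic refer to one of those computed variables instead of a column in the data.

```r
library(ggplot2)

# faithful is a stand-in dataset for illustration only.
# By default geom_density maps y to the computed `density`;
# here y is mapped to the computed `count` instead.
p_density <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(aes(y = after_stat(count)))

p_density
```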

We are again going to use the BudgetFood data from the Ecdat package to make the plots.

The plot below shows the histogram with the density plot overlaid on top of it. Try to recreate this figure.

Hint: This is a density plot but the y-axis is showing counts. What does the after_stat function do?

In the next target plot, we are going to look at the distribution of the age by sex. There are not that many observations where sex information is missing so let's focus on sex levels for "man" and "woman".

Hint: Use the width argument to change the width of the boxplot.

In the next target plot, we want to draw a scatter plot of the size of the household against the total expenditure of the household. Then we want to overlay this with a loess curve in red. You may not have seen how to fit a curve in ggplot2 yet, so check out the hint if you don't know.

Hint: geom_smooth can draw curves from a fitted model.
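If you have not used geom_smooth before, the following sketch shows its general shape on the built-in mtcars data (a stand-in, not the exercise data). The method and formula arguments are spelled out here to avoid the message ggplot2 prints when it picks them automatically.

```r
library(ggplot2)

# Scatter plot with a loess curve drawn on top.
# colour = "red" is set outside aes(), so it is a fixed attribute
# of the layer rather than a mapping to a variable.
p_smooth <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, colour = "red")

p_smooth
```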

In the next plot, try drawing a bar plot of the counts by sex, with the total count written on top of each bar.

Hint: Change the stat of the geom used to write the count so that it computes the total count to be used on the y-axis.
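One possible way to read this hint is sketched below on the built-in mtcars data (an assumption for illustration). geom_text normally uses the identity stat, so switching it to stat = "count" makes it compute the same counts as geom_bar, and after_stat(count) supplies them as the label.

```r
library(ggplot2)

# geom_bar counts rows per category by default.
# geom_text is told to use the same "count" stat, so its y position
# and its label both come from the computed count.
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)

p_count
```

The vjust value only nudges the text above the bar; any small negative number would do.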

The next one might be tricky. You want to draw a bar plot showing the average total expenditure by sex and town, where the top blue bars show the average for women and the bottom red bars show the average for men.

Hint: The legend shows the fill with a label "woman" and "man". In this case should fill be an attribute or aesthetic?

In the plot below, we show the age distribution for men and women. Change the colours so that the violin plot for man is coloured "violet" and the one for woman "royalblue".

Hint: Recall that scales are controlled by functions beginning with scale_. The second "word" in the scale function is the aesthetic.
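The naming pattern in this hint can be sketched as follows, using the built-in mtcars data and assuming the aesthetic being changed is fill (the target plot may use colour instead, in which case the function would begin with scale_colour_).

```r
library(ggplot2)

# scale_<aesthetic>_<type>: here the aesthetic is fill and the type is manual.
# The names in `values` match the levels of the mapped variable.
p_fill <- ggplot(mtcars, aes(x = factor(am), y = mpg, fill = factor(am))) +
  geom_violin() +
  scale_fill_manual(values = c("0" = "violet", "1" = "royalblue"))

p_fill
```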

In the next plot, let's have a look at the distribution of the total household expenditure by sex using histograms. Since the total household expenditure is skewed to the right, apply a log base 10 transformation to the scale and change the colours to match the target plot below.

Hint: Recall that scales are controlled by functions beginning with scale_. The second "word" in the scale function is the aesthetic.
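For position aesthetics the same naming pattern applies: the second word is x or y. The sketch below, on the built-in mtcars data (a stand-in), applies a log base 10 transformation to the x scale of a histogram. The bins value is an arbitrary choice for this small dataset.

```r
library(ggplot2)

# scale_x_log10() transforms the x scale before the histogram is binned,
# so the bins are equally wide on the log scale.
p_log <- ggplot(mtcars, aes(x = disp, fill = factor(am))) +
  geom_histogram(bins = 10) +
  scale_x_log10()

p_log
```

A fill scale such as scale_fill_manual can be added to the same plot to control the colours.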

In the following section, we won't give you many hints! Study each target plot and complete the code needed to reproduce it.

Hint: Which variable is the data facetted by? If you want to see the grey shadow of the overall distribution in each panel, remember that the facetting variable needs to be removed from that layer's data.
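The grey shadow trick can be sketched on the built-in mtcars data (an assumption for illustration). When a layer's data lacks the facetting variable, ggplot2 repeats that layer in every panel, which is what produces the background distribution.

```r
library(ggplot2)

# First layer: data without the facetting variable (cyl),
# so it is drawn in full in every panel as a grey background.
# Second layer: the plot data, split across panels by cyl.
p_shadow <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(data = subset(mtcars, select = -cyl),
                 fill = "grey80", bins = 10) +
  geom_histogram(fill = "steelblue", bins = 10) +
  facet_wrap(~cyl)

p_shadow
```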

Hint: A different style of facet is required to arrange the subplots in both x and y directions.
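As a minimal sketch of facetting in two directions, again on the built-in mtcars data (a stand-in), facet_grid takes a formula of the form rows ~ columns and lays the panels out as a grid.

```r
library(ggplot2)

# am defines the rows of the grid and cyl defines the columns,
# giving 2 x 3 = 6 panels.
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)

p_grid
```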

Hint: The patchwork package allows you to arbitrarily combine multiple separate plots.
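A small sketch of how patchwork combines plots is shown below, assuming the patchwork package is installed and using the built-in mtcars data for illustration. The `+` operator places plots next to each other and `/` stacks them.

```r
library(ggplot2)
library(patchwork)

# Two independent plots built from a stand-in dataset.
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

combined <- p1 + p2  # side by side
stacked <- p1 / p2   # one above the other

combined
```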

Well done getting to the last practice exercise!

The next plot you're going to draw is the plot below -- or make the plot as pretty as you can!

Hint: Theme options for ggplot2 are customised in the theme() function. You specify how a particular target (say the text) is shown using the appropriate element_*() function.
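To show the pattern in this hint, here is a hedged sketch on the built-in mtcars data. The particular settings are arbitrary choices for illustration, not the target plot: each argument of theme() names a target, and the matching element_*() function describes how it should look.

```r
library(ggplot2)

p_theme <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight") +
  theme(
    text = element_text(size = 14),                   # all text elements
    plot.title = element_text(face = "bold"),         # title only
    panel.grid.minor = element_blank(),               # remove minor grid lines
    panel.background = element_rect(fill = "white")   # panel fill
  )

p_theme
```

Text targets take element_text(), lines take element_line(), rectangles take element_rect(), and element_blank() removes a target entirely.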