-
Notifications
You must be signed in to change notification settings - Fork 4
/
preface.Rnw
39 lines (23 loc) · 8.99 KB
/
preface.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
\chapter*{Preface}
\begin{VF}
Suppose that you want to teach the `cat' concept to a very young child. Do you explain that a cat is a relatively small, primarily carnivorous mammal with retractible claws, a distinctive sonic output, etc.? I'll bet not. You probably show the kid a lot of different cats, saying `kitty' each time, until it gets the idea. To put it more generally, generalisations are best made by abstraction from experience.
\VA{R. P. Boas}{\emph{Can we make mathematics intelligible?}, 1981}\nocite{Boas1981}
\end{VF}
\noindent
Why did I choose ``\emph{Learn R: As a Language}'' as the title? This book is based on exploration and practice that aims at teaching how to express various operations on data using the \Rlang language. It focuses on the language, rather than on specific types of data analysis, exposes the reader to current usage, and does not spare the quirks of the language. When we use our native language in everyday life, we do not think about grammar rules or sentence structure, except for the trickier or unfamiliar situations. My aim is for this book to help readers learn to use \Rlang in this same way, i.e., to become fluent in \Rlang. The book is structured around the elements of natural languages like English with chapter titles that highlight the parallels between them and the \Rlang language.
\emph{Learn R: As a Language} is different from other books about \Rlang in that it emphasises the learning of the language itself, rather than how to use it to address specific data analysis tasks. My aim has been to enable readers to use \Rlang to implement original solutions to the data analysis and data visualisation tasks they encounter. The use of quantitative methods and data analysis has become more frequent in fields with limited long-term tradition in their use, like humanities, or, the complexity of the methods used has dramatically increased, like in Biology. Such trends can be expected to continue in the future.
Currently, many students of biological and environmental sciences learn to use \Rlang in courses about statistics or data analysis. However, this is frequently not in enough depth to effectively use \Rlang in scripts for automating data analyses or to ensure their reproducibility. There are also many researchers in various fields who are already familiar with statistical principles and willing to switch from other software to \Rlang. \emph{Learn R: As a Language} is written with these readers in mind to serve both as a textbook and as a reference.
A language is a system of communication. Basic concepts and operations are based on abstractions that are shared across programming languages and relevant to programs of all sizes and complexities; these abstractions are explained in the book together with their implementation in the \Rlang language. Other abstractions and programming concepts, outside the scope of this book, are relevant to large and complex pieces of software meant to be widely distributed. In other words, \emph{Learn R: As a Language} aims at teaching and supporting \emph{programming in the small}: the use of \Rlang to automate the drudgery of data manipulation, including the different steps spanning from data input and exploration to the production of publication-ready illustrations and reproducible data-based reports.
Using a language actively is the most efficient way of learning it. By using it, I mean actually reading, writing, and running scripts or programs. \emph{Learn R: As a Language} supports learning the \Rlang language in a way comparable to how children learn to speak: they work out what the rules are, simply by listening to people speak and trying to utter what they want to tell their parents. Of course, small children also receive guidance through feedback, but they are not taught a prescriptive set of rules like when learning a second language at school. Instead of listening, readers will read and run code, and instead of speaking, readers will write and try to run \Rlang-language code on a computer. I do provide explanations and guidance, as understanding how \Rlang works greatly helps with its use. However, the approach I encourage in this book is for readers to play with the numerous examples and to create variations upon them, to find out by themselves the patterns behind the \Rlang language. Instead of parents being the sounding board for the first utterances of readers new to \Rlang, the computer will play this role. Although working through the examples in \emph{Learn R: As a Language} in a group of peers or in class is beneficial, the book is designed to be useful also in the absence of such support.
\paragraph*{Changes in the second edition:} I edited the text from the first edition to correct all errors and outdated examples or explanations known to me. This revised second edition reflects changes in \Rlang and the contributed packages used in the book. Very little of the code from the first edition had stopped working but deprecations meant that a few examples triggered messages or warnings, and will eventually fail. Recent ($>$\,4.0.0) versions of \Rlang have significant enhancements, including the new pipe operator described and used in this second edition. Packages have also evolved, acquiring new features like a new approach to flipping plots in \pkgname{ggplot2}.
I have aimed at making the book more accessible to readers with no previous experience in computer programming. Feedback from readers and reviewers highlighted a few gaps in the content and some difficult-to-follow explanations. I revised the text, in some cases changing the sequence of presentation. I added diagrams to illustrate the structure of different types of objects and flowcharts to describe how program constructs work. I added tables listing groups of related functions. New sections cover character string operations, and details of data wrangling in \Rlang. Some of the most frequently asked questions about \Rlang are answered in the text and separately indexed. All exercises or ``playgrounds'' are numbered to facilitate their use as class work and the sharing of model answers. As the first edition has been frequently found useful as a reference, I expanded the already thorough indexing and added more cross-references connecting related sections across the whole book.
An additional change is in my view about packages \pkgnameNI{dplyr} and \pkgnameNI{tidyr}, part of the \pkgname{tidyverse}. I have come to think that the rate of development of these two packages can make them difficult for users for whom data analysis is just one aspect of their occupation. As these packages are widely used, I emphasise more than in the first edition the differences between functions and classes from packages \pkgnameNI{dplyr} and \pkgnameNI{tidyr} and equivalent ones from base \Rlang. I added a section on working with dates and times using the \pkgnameNI{lubridate} package. I updated and reorganised the chapter describing package \pkgname{ggplot2} and some of its extensions.
In numbers, the page count has increased by 27\%, the number of figures from eight to twenty-six plus nine in-text diagrams, and tables from none to seven. As for the design, text boxes have been replaced by call-outs marked with marginal bars. In addition, starting from version 2.0.0, the \pkgname{learnrbook} package supports the first and second editions of the book. It contains data, scripts, and all the code examples from both editions. It also helps with the installation of all the packages used in the book. The website at \url{https://www.learnr-book.info/} provides updated open-access content.
\section*{Acknowledgements}
I thank Jaakko Heinonen for introducing me in the late 1990s to the then new \Rlang. Along the way many experts have answered my questions in usenet and more recently in \stackoverflow. I wish to warmly thank members of my research group, students, collaborators, authors of books, and people I have met online or at conferences. They have made it possible for me to write this book. I am specially indebted to Dan Yavorsky, Tarja Lehto, Titta Kotilainen, Tautvydas Zalnierius, Fang Wang, Yan Yan, Neha Rai, Markus Laurel, Brett Cooper, Viivi Lindholm, Mat\v{e}j Rzehulka, Zuzana Svarna, colleagues, students, and anonymous reviewers for many very helpful comments on the draft manuscript and/or the first edition. Rob Calver, editor of both editions, provided advice and encouragement with great patience. Paul Boyd, Shashi Kumar, Ashraf Reza, Vaishali Singh, Lara Spieker, and Sherry Thomas efficiently helped with different aspects of this project.
The writing of this second edition was helped by a six-month sabbatical granted by the Faculty of Biological and Environmental Sciences of the University of Helsinki, Finland. I thank Prof.\ Kurt Fagerstedt for his support.
In many ways this text owes more to people who are not authors than to myself. However, as I am the one who has written \emph{Learn R: As a Language} and decided what to include and exclude, I take full responsibility for any errors and inaccuracies.
\vspace{1ex}
{\raggedleft \emph{Pedro J. Aphalo}\\
\raggedleft \emph{Helsinki, \today}
}