forked from TrevorFrench/R-for-Data-Analysis
# Data Structures
In computer science, a data structure is a way of organizing and storing data so that it can be accessed and modified. Six basic data structures are commonly used in R:
- **Vectors** - Vectors contain ordered data of a single type.
- **Lists** - Lists are ordered collections of objects, which can be of different types.
- **Matrices** - A matrix is a two-dimensional array where the data is all of the same type.
- **Factors** - Factors are used to designate levels within categorical data.
- **Data Frames** - A data frame contains two-dimensional data where the data can have different types.
- **Arrays** - Arrays hold data of a single type and can have more than two dimensions (n-dimensional).
## Vectors
We can create a vector by using the "c" function to combine multiple values. In the following example, we will combine four separate numbers into a single vector and then print the resulting vector to see what it looks like.
```{r}
x <- c(1, 3, 3, 7)
print(x)
```
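Because a vector holds a single type, R converts mixed inputs to a common type. The following sketch illustrates that coercion along with basic indexing; the variable name `mixed` is only an illustrative choice.

```{r}
# Combining a number, a string, and a logical coerces every element to character
mixed <- c(1, "a", TRUE)
print(class(mixed))

# Elements are accessed by position, and positions start at 1
x <- c(1, 3, 3, 7)
print(x[4])
print(length(x))
```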
## Lists
Lists are collections of objects. Unlike the elements of a vector, each element of a list can be a different data type. In the following example, we'll use the "list" function to create a list containing two character strings and one numeric vector.
```{r}
first_name <- "John"
last_name <- "Smith"
favorite_numbers <- c(1, 3, 3, 7)
person <- list(first_name, last_name, favorite_numbers)
print(person)
```
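List elements can also be given names, which makes them easier to retrieve. The sketch below is one possible variation on the example above; the element names are illustrative choices rather than anything required by R.

```{r}
# Name each element when building the list
person <- list(
  first_name = "John",
  last_name = "Smith",
  favorite_numbers = c(1, 3, 3, 7)
)

# Retrieve an element by name with $ or with [[ ]]
print(person$first_name)
print(person[["favorite_numbers"]])

# Each element keeps its own type
print(class(person$last_name))
print(class(person$favorite_numbers))
```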
## Matrices
A matrix is a two-dimensional array where the data is all of the same type. In the following example, we'll create a matrix with three rows and four columns.
```{r}
x <- matrix(
  c(1, 3, 3, 7, 1, 3, 3, 7, 1, 3, 3, 7)
  , nrow = 3
  , ncol = 4
  , byrow = TRUE)
print(x)
```
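The "byrow" argument controls how the values fill the matrix. As a rough illustration, the sketch below builds the same values both ways and reads one cell from each; the names `values`, `by_row`, and `by_col` are illustrative.

```{r}
values <- c(1, 3, 3, 7, 1, 3, 3, 7, 1, 3, 3, 7)

# Fill row by row, as in the example above
by_row <- matrix(values, nrow = 3, ncol = 4, byrow = TRUE)

# Fill column by column, which is the default when byrow is not given
by_col <- matrix(values, nrow = 3, ncol = 4)

# dim() returns the number of rows and columns
print(dim(by_row))

# Cells are indexed as [row, column]
print(by_row[2, 4])
print(by_col[2, 4])
```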
## Factors
Factors are used to designate levels within categorical data. In the following example, we'll apply the "factor" function to a vector of color names; printing the result shows the values along with the distinct "levels" they contain.
```{r}
x <- c("Red", "Blue", "Red", "Yellow", "Yellow")
colors <- factor(x)
print(colors)
```
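To look at the levels more directly, a few base R functions can help. The sketch below assumes the same color values as above and shows the levels, the integer codes stored behind them, and a count per level.

```{r}
colors <- factor(c("Red", "Blue", "Red", "Yellow", "Yellow"))

# The distinct levels, sorted alphabetically by default
print(levels(colors))

# Internally, each value is stored as an integer code pointing at a level
print(as.integer(colors))

# Count how many times each level occurs
print(table(colors))
```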
## Data Frames
A data frame contains two-dimensional data. Unlike a matrix, each column of a data frame can hold a different type of data, but all values within a single column must be of the same type.
The following example will create a data frame with two rows and two columns.
```{r}
people <- c("John", "Jane")
id <- c(1, 2)
df <- data.frame(id = id, person = people)
print(df)
```
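The per-column types can be checked directly. The following sketch rebuilds the same data frame and inspects it; `stringsAsFactors = FALSE` is set explicitly here as an assumption, so that the "person" column stays a character column on R versions older than 4.0 as well.

```{r}
df <- data.frame(
  id = c(1, 2),
  person = c("John", "Jane"),
  stringsAsFactors = FALSE
)

# Number of rows and columns
print(dim(df))

# Each column has its own type
print(sapply(df, class))

# Access a column with $, or filter rows with a condition
print(df$person)
print(df[df$id == 2, ])
```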
## Arrays
Arrays are objects that can have more than two dimensions, which is why they are sometimes described as "n-dimensional". The dimensions of the following example are 1 x 4 x 3. You'll see that the data consist of one row and four columns, repeated across three slices along the third dimension.
```{r}
x <- array(
  c(1, 3, 3, 7, 1, 3, 3, 7, 1, 3, 3, 7)
  , dim = c(1, 4, 3))
print(x)
```
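Array elements are indexed with one position per dimension. As a small sketch using the same 1 x 4 x 3 array, the code below checks the dimensions, reads a single element, and extracts one slice.

```{r}
x <- array(c(1, 3, 3, 7, 1, 3, 3, 7, 1, 3, 3, 7), dim = c(1, 4, 3))

# The size of each dimension
print(dim(x))

# Row 1, column 4, slice 2
print(x[1, 4, 2])

# Leaving an index blank selects everything along that dimension,
# so this returns the whole third slice
print(x[, , 3])
```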
## Resources
- W3Schools: <https://www.w3schools.com/r/r_vectors.asp>