-
Notifications
You must be signed in to change notification settings - Fork 6
/
VectorHierarchy.html
92 lines (92 loc) · 8.09 KB
/
VectorHierarchy.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<title></title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #40a070; }
code > span.fl { color: #40a070; }
code > span.ch { color: #4070a0; }
code > span.st { color: #4070a0; }
code > span.co { color: #60a0b0; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #06287e; }
code > span.er { color: #ff0000; font-weight: bold; }
</style>
<link rel="stylesheet" href="Class.css" type="text/css" />
</head>
<body>
<h1 id="hierarchies-of-vector-types-in-r">Hierarchies of Vector Types in R</h1>
<p>In data analysis, we typically have a collection of observations. We rarely have just a single value. This is one of the reasons why R is a vector-based "language". We deal with samples and subsamples.</p>
<p>We very rarely have a sample of size 1. So R doesn't bother with single values, i.e. scalars. We have vectors. A scalar is a vector of length 1. So 1 is a number.</p>
<p>We may have a single value for each observational unit. We combine these into a single collection of values.</p>
<p>There are two basic types of collections in R. vectors and lists (although we can create lists with a call to vector()!) These are ordered collections of elements.</p>
<p><strong>Vectors</strong> are homogeneous in that all the elements must have the same type and R will make this happen automatically by coercion.</p>
<p><strong>Lists</strong> can be heterogeneous, i.e. each element can have a different type.</p>
<p>There are 4 fundamental vector types that people encounter most often. These are, in order, + logical - <code>TRUE</code>, <code>FALSE</code> + integer - 1L, 2L, as.integer(4) + numeric - 1, 1.0, 2.3443242e4, pi + character - "1", "1.0", "abc"</p>
<p>There is also complex for representing complex numbers (real + imaginary part).</p>
<p>Why do we consider these vector types ordered? Basically, information from one can be converted without loss of information from the previous level to the next level. We can convert a logical value of <code>TRUE</code> to an integer value of 1 and <code>FALSE</code> to an integer 0. Similary, for example, we can convert an integer from 1L to 1.0, a numeric. So a + logical maps to an integer, e.g., <code>c( TRUE, 2L)</code> becomese <code>c(1L, 2L)</code> + integer maps to a numeric, e.g., <code>c(2L, 3.1)</code> becomes <code>c(2, 3.1)</code>. + a numeric maps to a character</p>
<p>(Recall 3 is actually numeric, not an integer and we need to use 3L.)</p>
<p>Importantly, we can convert each of these back to "lower" types. There are functions for this - as.logical(), as.integer(), as.numeric(), as.character():</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">as.logical</span>(<span class="dv">2</span>)
[<span class="dv">1</span>] <span class="ot">TRUE</span></code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">as.integer</span>(<span class="st">"2"</span>)
[<span class="dv">1</span>] <span class="dv">2</span></code></pre>
<p>But we may lose information.</p>
<p>After these 4 fundamental types, we have the factor type. This is built on top of an integer</p>
<h1 id="exercise">Exercise</h1>
<p>What is the class of</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">10</span>, <span class="dv">20</span>, <span class="dv">40</span>)</code></pre>
<p>?</p>
<p>And</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">c</span>(<span class="ot">TRUE</span>, 20L, <span class="st">"abc"</span>)</code></pre>
<p>And</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">c</span>(<span class="ot">TRUE</span>, <span class="dv">20</span>)</code></pre>
<h1 id="names-of-vector-elements">Names of Vector Elements</h1>
<p>Vectors are ordered collections of elements.</p>
<p>The elements may have names. We can then refer to the elements by name rather or position.</p>
<p>The names() function returns a character vector of the names, or <code>NULL</code>. We can set the names() with</p>
<pre><code> names(x) = c("a", "b", "c")</code></pre>
<h1 id="attributes">Attributes</h1>
<p>names() are treated as attributes of an object. We can have arbitrary attributes on an object. names is special, as is dim, class, row.names, and other important foundational attributes.</p>
<p>One can get the attributes of an object with</p>
<pre><code>attributes(mtcars)</code></pre>
<p>We can get an individual attribute with attr()</p>
<pre><code>attr(mtcars, "row.names")</code></pre>
<p>We can also set an attribute with attr()</p>
<pre><code>attr(mtcars, "origin") = "myData.csv"</code></pre>
<p>Then we can query it with</p>
<pre><code>attr(mtcars, "origin") </code></pre>
<p>The function str() shows attributes.</p>
<p>The function structure() is also convenient for creating objects and associating one or more attributes on the object.</p>
<pre><code>x = structure(1:26, names = LETTERS, class = c("alphabet"))</code></pre>
<h1 id="data-frame">Data Frame</h1>
<p>While the order may not be important for statistical analysis, it is often important for data analysis, e.g. to match observations in two vectors where the i-th element in one vector corresponds to the same observational unit in the other vector.</p>
<p>We can keep measurements from the same observational units in two "parallel" vectors.</p>
<p>Alternatively, we can combine two or more 'parallel' vectors in a data.frame.</p>
<p>A data.frame is a list. Check the typeof() of a data.frame. The elements of the list are the columns. But a data.frame has extra properties than a list. Every element of the list has to have the same length as all the others. e.g.,</p>
<pre class="sourceCode r"><code class="sourceCode r">x =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">a =</span> <span class="dv">1</span>:<span class="dv">10</span>, <span class="dt">b =</span> <span class="dv">1</span>:<span class="dv">10</span>)
<span class="dv">71</span>><span class="st"> </span>x$y =<span class="st"> </span><span class="dv">1</span>:<span class="dv">100</span>
Error in <span class="st">`</span><span class="dt">$<-.data.frame</span><span class="st">`</span>(<span class="st">`</span><span class="dt">*tmp*</span><span class="st">`</span>, <span class="st">"y"</span>, <span class="dt">value =</span> <span class="dv">1</span>:<span class="dv">100</span>) :<span class="st"> </span>
<span class="st"> </span>replacement has <span class="dv">100</span> rows, data has <span class="dv">10</span></code></pre>
<p>But, via the recycling rule,</p>
<pre class="sourceCode r"><code class="sourceCode r">x$y =<span class="st"> </span><span class="dv">1</span></code></pre>
<p>works fine. 1 is repeated nrow(x) times.</p>
<p>A data frame is guaranteed to be 2-dimensional - it has a dim(). The names() are the names of the elements. This comes from being a list. We can subset using the list operations, e.g. df[["y"]], df$y.</p>
<p>We can also use 2-dimensional subsetting</p>
<pre class="sourceCode r"><code class="sourceCode r">df[ i, j]</code></pre>
<p><a href="Subsetting2D.html">See 2-D Subsetting</a>.</p>
</body>
</html>