-
Notifications
You must be signed in to change notification settings - Fork 31
/
Copy pathReevesDogeUSD.Rmd
134 lines (97 loc) · 3.68 KB
/
ReevesDogeUSD.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
---
title: "ReevesDOGE_USD"
author: "Sam Reeves"
date: "4/11/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
We're going to showcase some features of various packages from the Tidyverse!
Here we have data concerning the DogeCoin/USD exchange rates according to Yahoo Finance from 2014 to roughly the present. Using magrittr, dplyr, lubridate, and ggplot2, this should be a snap. Let's get started.
### Load the data.
```{r}
rates <- read.csv("https://raw.githubusercontent.com/TheWerefriend/data607/master/tidyverseAssignment/DOGE-USD.csv")
colnames(rates)
```
### Check to see if close and adjusted close are the same.
```{r}
tests <- ifelse(rates$Close==rates$Adj.Close, "Identical", "Distinct")
"Distinct" %in% tests
```
This result makes sense because there are no possible corporate actions like stock splits which can cause the closing value in a day to be adjusted.
### Remove Adj.Close.
```{r message=FALSE}
library(magrittr)
library(dplyr)
rates <- rates %>%
select(!Adj.Close)
colnames(rates)
```
From the dplyr package, we can use the ! operator to select the complement of a list of variables by name.
### Convert the Date column to a date, and everything else to numeric values.
```{r message=FALSE}
library(lubridate)
rates$Date <- ymd(rates$Date)
```
Now, all the other values are stored as factors... But, something strange happens when we try to convert them directly to numeric values:
```{r}
rates$Open[[1]]
as.numeric(rates$Open[[1]])
```
Apparently, 0.000293 is the 190th factor level in the column! Since there are some null values in the data (days where major changes happened with the DOGE network, I assume), we must throw throw those observations out.
```{r warning=FALSE}
rates[, c(2:6)] <- rates %>%
select(!Date) %>%
sapply(as.character) %>%
sapply(as.numeric)
```
```{r}
rates <- na.omit(rates)
anyNA(rates)
```
Transmute() works exactly the same as mutate(), except that it alters an existing variable instead of creating a new one.
### Graph the prices and volume of Doge.
```{r}
library(ggplot2)
a <- ggplot(rates, aes(x = rates$Date)) +
geom_line(aes(y = rates$Open), color = "steelblue") +
geom_line(aes(y = rates$High), color = "green") +
geom_line(aes(y = rates$Low), color = "red") +
geom_line(aes(y = rates$High), color = "purple")
b <- ggplot(rates, aes(x = rates$Date)) +
geom_line(aes(y = rates$Volume), color = "black")
a + labs(x = "Time", y = "Price in USD")
b + labs(x = "Time", y = "Volume")
```
### TidyVerse Extend -- Richard Zheng (github: schoolkidrich)
```{r}
# remove outliers using group_by() and filter()
rates2 = rates %>%
group_by(Close)%>%
filter(between(Close, mean(Close) - (3 * sd(Close)),
mean(Close) - (3 * sd(Close))))
# scatter plot with line of best fit
rates2 %>%
ggplot(aes(x = Volume, y = Close))+geom_point()+geom_smooth(method = 'lm', formula = y~x)
# linear model predicting Close price by Volume
m1 = lm(Close~log(Volume),rates2)
summary(m1)
# histogram of residuals show mostly normal distribution
m1 %>%
resid()%>%
hist()
# predicted values
rates$predict_close = predict(m1,rates)
# predicted close prices show much less volatility than what really happened (removing outliers caused my model to be innacurate)
rates%>%
ggplot(aes(x=Date,y=predict_close))+geom_line(color = 'red')+geom_line(aes(x = Date,y=Close),color='blue')
# model 2 without removing outliers and using open price
m2 = lm(Close~Open+Volume,rates)
summary(m2)
# predictions by model
rates$predict_close2 = predict(m2)
# m2 seems to fit the data well
rates%>%
ggplot(aes(x=Date))+geom_line(aes(y = Close), color='blue')+geom_line(aes(y = predict_close2), color='red')
```