-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathOnline Community Feature.Rmd
168 lines (115 loc) · 4.97 KB
/
Online Community Feature.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
---
output:
pdf_document: default
html_document: default
---
Load Libraries
```{r}
rm(list=ls())
library(readxl)
library(ggplot2)
library(Hmisc)
library(MASS)
## library(caret)
library(regclass)
library(ISLR)
library(boot)
library(vcd)
library(pROC)
library (ROCR)
```
After loading the libraries, load the excel file containing data on Defaults
```{r}
##set your working directory
#setwd("/path/to/my/directory")
library("readxl")
##setwd("")
mydata = read_excel("Data2.xlsx")
summary(mydata) #summary of the data
```
You could also use
```{r}
str(mydata)
```
What do you notice? Looks like the Variables default and student are *character* and not *numeric*. Let's look at the values they take.
```{r}
head(mydata)
```
head alone might not display _all possible values_. Let's find all the possible values
```{r}
levels(as.factor(mydata$`Joined?`))
levels(as.factor(mydata$`Churned at 3 months after launch of the online community`))
```
Looks like both have only two levels and are Yes/No variables. Let's convert all No to 0 and Yes to 1. To do so, we create two new variables in the same mydata table. The two variables are labelled default2 and student2.
```{r}
mydata$churn<- ifelse(mydata$`Churned at 3 months after launch of the online community` == "No",0,1)
mydata$join <- ifelse(mydata$`Joined?` == "No",0,1)
```
Confirm the change by looking at the data again. And get the summary.
```{r}
head(mydata)
levels(as.factor(mydata$churn))
levels(as.factor(mydata$join)) #if you wanted to recheck the levels
```
```{r}
summary(mydata)
```
Now let's code the model. We want to predict default. The independent variables in our dataset are 'student=Yes/No', 'balance' and 'income'. The default variable is discrete and has only two levels: {0,1}. Unlike in a regression, where the dependent variable is continuous, default being discrete does not lend itself to standard regression analysis.
```{r}
#Logistic Regression Model Estimation
mylogit<-glm(`Churned at 3 months after launch of the online community`~ `Average Spend Last 3 months of Life with the firm`+ `Joined?`+ `Customer Age with Firm at time of launching the online community` ,data=mydata,family=binomial(link="logit"))
```
Let's output the results
```{r}
#coefficients
summary(mylogit)
```
While the p-values indicate confidence, we could also output the 2.5% and 97.5% confindence intervals
```{r}
#Confidence Intervals
Confidence=confint(mylogit)
Confidence
```
Another way to interpret the parameter estimates would be to compute the Odds Ratios
```{r}
#Odds Ratio Calculation, including confidence intervals
oddsr=round(exp(cbind(OddsRatio=coef(mylogit),confint(mylogit))),4)
oddsr
##CLV Calculation
## we calculate L -> the xepected lifetime of a consumer
churn_yes_joined <- data.frame(subset(mydata, mydata$`Churned at 3 months after launch of the online community`== 1 & mydata$`Joined?`== 1 ))
churn_total_joined <- data.frame(subset(mydata, mydata$`Joined?`== 1 ))
nr <-nrow(churn_yes_joined)
dr <- nrow(churn_total_joined)
churn_of_those_who_joined <- nr/ dr
churn_of_those_who_joined
r <- 1- churn_of_those_who_joined ## retention rate of those who joined
r
L <- 1/ (1-r) ## avg expected customer life time
m <- mean(churn_total_joined$Average.Spend.Last.3.months.of.Life.with.the.firm*3)*0.50
CLV <- m*L ## CLV of customers who joined
CLV
churn_yes_not_joined <- data.frame(subset(mydata, mydata$`Churned at 3 months after launch of the online community`== 1 & mydata$`Joined?`== 0 ))
churn_total_not_joined <- data.frame(subset(mydata, mydata$`Joined?`== 0 ))
nr_nj <-nrow(churn_yes_not_joined)
dr_nj <- nrow(churn_total_not_joined)
churn_of_those_who_didnt_join <- nr_nj/ dr_nj
churn_of_those_who_didnt_join
r_nj <- 1- churn_of_those_who_didnt_join ## retention rate of those who joined
L_nj <- 1/ (1-r_nj) ## avg expected customer life time
m_nj <- mean(churn_total_not_joined$Average.Spend.Last.3.months.of.Life.with.the.firm*3) *0.50
CLV_nj <- m_nj*L_nj ## CLV of customers who joined
CLV_nj
r_nj
###### CLV forevery customer
churn_yes <- data.frame(subset(mydata, mydata$`Churned at 3 months after launch of the online community`== 1 ))
churn <- nrow(churn_yes)/ nrow(mydata)
R <- 1 - churn
L <- 1/(1-R)
mydata$CLV <- ((mydata$`Average Spend Last 3 months of Life with the firm`*0.50) * L)
CLV_Joined <- lm(mydata$CLV ~ mydata$`Joined?`)
summary (CLV_Joined)
```
```{r}
# The average revenue and CLV increased with decreased retention, where the increase in CLV can be attributed to the growth in revenue. However, in the long run as the examination period increases, the CLV being very sensitive to retention rate could be negatively affected by the low retention value. In order to further validate this claim, we recommend the management to run an A/B test of the app on a small sample size and check if the online community causes the change in retention.
```