<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Woof Woof NYC</title>
<!-- Load CSS libraries -->
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.1/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-F3w7mX95PdgyTmZZMECAngseQB83DfGTowi0iMjiWaeVhAn4FJkqJByhZMI3AhiU" crossorigin="anonymous">
<!-- Style sheet -->
<link rel="stylesheet" href="css/style.css">
<!-- GOOGLE FONTS -->
<link href="https://fonts.cdnfonts.com/css/roboto" rel="stylesheet">
<style>
#map-circle{
width: 400px;height: 400px;position:absolute; border-width:5px;
-webkit-border-radius: 200px;-moz-border-radius: 200px;
border-radius: 200px;border-color: green;margin-left:-100px;margin-top:-95px;
}
</style>
</head>
<body>
<!--HEADER-->
<div class="header section">
<div class="row">
<div class="col-sm-10"></div>
<div class="col-sm-1"> <a href="https://kuomaje.cargo.site/Technology" target="_blank"> kk </a></div>
<div class="col-sm-1"> <a href="https://liu-yunsong.com"> ys </a></div>
</div>
</div>
<!--TITLE-->
<hr class="solid">
<div class="section">
<div class="row">
<div class="col-sm-8 title"> Visualizing Dogs in New York </div>
<div class="col-sm-2 title-desc"> Data Science </div>
<div class="col-sm-1 title-desc"> Final </div>
<div class="col-sm-1 title-desc"> Dec 2022 </div>
</div>
</div>
<!--TITLE DESCRIPTION-->
<div class="section">
<div class="row">
<div class="col-sm-8 title-label"> Title </div>
<div class="col-sm-2 title-desc-label"> Category </div>
<div class="col-sm-1 title-desc-label"> Assignment </div>
<div class="col-sm-1 title-desc-label"> Date </div>
</div>
</div>
<hr class="solid">
<!--SECTION ZERO---------------------------------------------------------------------------------------------------------------------------------->
<!--`````LANDING PAGE---------------------------------------------------------------------------------------------------------------------------->
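<script type="text/javascript">
// Fade the landing layers out once the page is scrolled past 500px, and back in above it.
// Written without jQuery: jQuery is only loaded at the end of <body>, so `$` is not
// defined yet when this inline script runs.
window.addEventListener("scroll", function () {
var opacity = window.scrollY > 500 ? "0" : "1";
document.querySelectorAll(".open-page, .wide").forEach(function (el) {
el.style.opacity = opacity;
});
});
</script>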
<div class="open-page">
<div id="fur-container"></div>
<div id="text_container">
<span id="text1"></span>
<span id="text2"></span>
</div>
<!-- The SVG filter used to create the merging effect -->
<svg id="filters">
<defs>
<filter id="threshold">
<!-- Basically just a threshold effect -
pixels with a high enough opacity are set to full opacity
and all other pixels are set to completely transparent. -->
<feColorMatrix in="SourceGraphic"
type="matrix"
values="1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 255 -140" />
</filter>
</defs>
</svg>
</div>
<!--SECTION ONE---------------------------------------------------------------------------------------------------------------------------------->
<!--`````INTRODUCTION---------------------------------------------------------------------------------------------------------------------------->
<div class="sectionStatement">
<h2>The project aims to analyze the relationship between <span class="bolded">social</span> and
<span class="bolded">spatial</span> factors and provide insight into the intangible
and physical organization of the city
through the perspective of dogs and their daily routines. It will explore socio-economic
indicators such as <span class="underline"> property values, neighborhood quality, proximity to open spaces
like parks, and walkability </span> of the urban environment to visualize the city's spatial
characteristics.
</h2>
</div>
<div class="sectionImg">
<br><br><br><br><br><img src="data/dog3.png"><br><br><br><br><br>
</div>
<br><br><br><br><br>
<div class="sectionStatement">
<h3>it all started from here...</h3>
<h4>
Our project is motivated by the prevalence of certain dog breeds in various neighborhoods in Cambridge and Boston,
and we seek to explore the correlation between dogs and socioeconomic factors such as <span class="bolded">income, work-life balance,
household size, and neighborhood quality</span>. While we would ideally like to examine individual dog and owner characteristics,
ethical data inquiry practices prevent us from doing so, and data limitations make it impractical. Instead, we will use zip
code as a common feature in our data collection and analysis. To conduct our investigation, we have chosen New York City as
our research site due to the availability of comprehensive data on dogs, people, and neighborhoods, as well as its socioeconomic
stratification and spatial segregation. Additionally, New York City is renowned as a bustling metropolis, making it an ideal
location to investigate the relationship between dogs and urban living.
</h4>
</div>
<div class="sectionImg">
<br><br><br><br><br><img src="data/dogwalk.png"><br><br><br><br><br>
</div>
<br><br><br><br><br>
<div class="sectionStatement">
<h3>about the time series...</h3>
<h4>
The time series plot reveals three patterns in dog registrations.
First, before 2020 the number of registrations is relatively stable, with a slight upward
trend. Second, there are seasonal fluctuations within each year:
registrations rise until September and then decline.
Finally, after the COVID-19 lockdown began in March 2020, registrations
grew at a much faster rate than before. This may reflect people seeking
comfort and companionship during social isolation.
</h4>
</div>
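<!--
A minimal sketch of the monthly aggregation behind a time series like the one described above.
This is not the code in js/timeSeries.js; the helper name countByMonth, the `date` field and
the sample rows are assumptions made up for illustration.

```javascript
// Hypothetical helper: count registrations per calendar month.
// `records` is assumed to be an array of objects with an ISO date string in `date`.
function countByMonth(records) {
  const counts = new Map();
  for (const record of records) {
    const month = record.date.slice(0, 7); // keep the "YYYY-MM" prefix
    counts.set(month, (counts.get(month) || 0) + 1);
  }
  // "YYYY-MM" strings sort chronologically when compared as text.
  return Array.from(counts, ([month, count]) => ({ month, count }))
    .sort((a, b) => a.month.localeCompare(b.month));
}

// Made-up sample rows, for illustration only.
const sampleRegistrations = [
  { date: "2020-03-14" },
  { date: "2020-03-02" },
  { date: "2020-04-21" },
  { date: "2019-09-30" },
];
const monthly = countByMonth(sampleRegistrations);
console.log(monthly);
```

The resulting array of { month, count } objects is the kind of shape a d3 line chart can
consume directly; filtering `records` by borough before counting would give one series per
radio option.
-->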
<div class="section">
<div class="container" data-toggle="buttons" onchange="displayRadioValue()">
<label><input class="option-input radio" type="radio" name="borough" value="All" checked>All</label>
<label><input class="option-input radio" type="radio" name="borough" value="Manhattan">Manhattan</label>
<label><input class="option-input radio" type="radio" name="borough" value="Staten Island">Staten Island</label>
<label><input class="option-input radio" type="radio" name="borough" value="Bronx">Bronx</label>
<label><input class="option-input radio" type="radio" name="borough" value="Queens">Queens</label>
<label><input class="option-input radio" type="radio" name="borough" value="Brooklyn">Brooklyn</label>
</div>
</div>
<div style="height:50vh">
<div id="viz1_1" style="height: 100%; width:100%;"></div>
</div>
<br><br><br><br><br>
<!--SECTION TWO---------------------------------------------------------------------------------------------------------------------------------->
<!--`````PANDEMIC DOG BOOM----------------------------------------------------------------------------------------------------------------------->
<div class="sectionStatement">
<h3>what happened to the dog population...</h3>
<h4>
An initial analysis of the data shows that Manhattan,
Brooklyn, and Queens have the most registered dogs, whereas the Bronx
and Staten Island have notably fewer. We then compared the number
of registered dogs to the population of each borough and found that although
the Bronx has far fewer registered dogs than Manhattan, its population is only
about 12% smaller (1,471,160 in
the Bronx versus 1,664,727 in Manhattan).
</h4>
</div>
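<!--
A quick arithmetic check of the population comparison above, using only the two figures
quoted in the text. The size of the gap depends on the baseline, so the sketch computes it
both against Manhattan and against the combined population of the two boroughs. The function
name relativeDifference is hypothetical.

```javascript
// Borough populations as quoted in the text.
const bronxPopulation = 1471160;
const manhattanPopulation = 1664727;

// Relative shortfall of `value` measured against `baseline`.
function relativeDifference(value, baseline) {
  return (baseline - value) / baseline;
}

// Gap relative to Manhattan alone.
const vsManhattan = relativeDifference(bronxPopulation, manhattanPopulation);
// Gap relative to the two boroughs combined.
const vsCombined =
  (manhattanPopulation - bronxPopulation) / (manhattanPopulation + bronxPopulation);

console.log((vsManhattan * 100).toFixed(1) + "% smaller than Manhattan");
console.log((vsCombined * 100).toFixed(1) + "% of the combined population");
```

Either way the gap in population is modest compared with the gap in registered dogs, which
is the point the paragraph makes.
-->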
<!--<div class="section">-->
<!-- <div class="container" data-toggle="buttons" onchange="displayRadioValue_packing()">-->
<!-- <label><input class="option-input radio" type="radio" name="borough" value="All" checked>All</label>-->
<!-- <label><input class="option-input radio" type="radio" name="borough" value="Manhattan">Manhattan</label>-->
<!-- <label><input class="option-input radio" type="radio" name="borough" value="Staten Island">Staten Island</label>-->
<!-- <label><input class="option-input radio" type="radio" name="borough" value="Bronx">Bronx</label>-->
<!-- <label><input class="option-input radio" type="radio" name="borough" value="Queens">Queens</label>-->
<!-- <label><input class="option-input radio" type="radio" name="borough" value="Brooklyn">Brooklyn</label>-->
<!-- </div>-->
<!--</div>-->
<div style="height:90vh">
<div id="viz2_1" style="height: 100%; width:100%;"></div>
</div>
<br><br><br><br><br>
<div class="sectionStatement">
<h3>what are the correlations...</h3>
<h4>To visualize how our datasets correlate with the dog data,
we grouped the variables into categories and compared each group with the number of registered dogs.
The correlation matrices let us quickly survey the datasets and identify the main factors
associated with the dog population. Factors such as <span class="bolded">park availability,
commute time, income, and household size</span> have the strongest correlation with the number of registered dogs.
<br><br>
Our datasets are organized into the following categories:
<br>- Age, race, sex, population, income, poverty
<br>- Citizenship, language
<br>- Occupation, worker status, time to go to work, commute time, college
<br>- Housing, vehicle, household, park
</h4>
</div>
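<!--
A minimal sketch of how one cell of a correlation matrix could be computed. The text does not
say which coefficient was used; Pearson correlation is assumed here, and the function name
pearson and the sample values are made up for illustration.

```javascript
// Pearson correlation coefficient between two equal-length numeric arrays.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("pearson needs two equal-length arrays with at least 2 values");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i += 1) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  // The 1/n factors cancel, so the raw sums can be used directly.
  return cov / Math.sqrt(varX * varY);
}

// Made-up values per ZIP code, for illustration only.
const parkAcres = [1, 2, 3, 4];
const dogCounts = [10, 20, 30, 40];
const r = pearson(parkAcres, dogCounts);
console.log(r);
```

Repeating this for every pair of columns in a category gives a matrix like the ones shown in
the images that follow. A coefficient near 1 or -1 indicates a strong linear association, not
a causal effect.
-->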
<br><br><br><br><br>
<div class="sectionImg">
<div class="row">
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_age.png"> Age </div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_race.png"> Race </div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_income.png"> Income </div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_poverty.png"> Poverty </div>
</div>
<div class="row">
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_occupation.png"> Occupation </div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_worker.png"> Worker </div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_commuteTime.png"> Commute Time</div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_college.png"> College </div>
</div>
<div class="row">
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_housing.png"> Housing</div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_household.png"> Household </div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_vehicle.png"> Vehicle</div>
<div class="col-sm-3 desc"> <img src="data/correlation/correlation_col_park.png"> Park </div>
</div>
</div>
<br><br><br><br><br>
<!--SECTION THREE------------------------------------------------------------------------------------------------------------------------------>
<!--`````COVID--------------------------------------------------------------------------------------------------------------------------------->
<div class="sectionStatement">
<h3>looking at the map...</h3>
<h4>
Upon examining the map, we have discovered that there is not a direct correlation
between the population of a given zip code region and the number of registered dogs
in that area. For instance, although <span class="underline"> Brooklyn is more densely populated than some other
areas, it has fewer registered dogs</span>. This finding is not unexpected, given that zoning
and land use, as well as various socioeconomic factors, can differ across neighborhoods.
We plan to further explore these factors in our analysis. On the other hand, we have
found that the Manhattan region has a higher number of registered dogs, particularly
in neighborhoods near Central Park. It is worth noting that these areas tend to have
higher rent prices and shorter commute times.
The feature importances from our decision tree and gradient boosting models highlight several features that are
highly relevant to the dog count in each zip code: park acres, park count, population, and dog number.
</h4>
</div>
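<!--
A minimal sketch of normalizing dog counts by population, which is one way to compare zip
codes of very different sizes as discussed above. This is not the code in js/mapping.js; the
helper name dogsPerThousand, the field names and the sample rows are assumptions made up for
illustration.

```javascript
// Hypothetical helper: registered dogs per 1,000 residents for each zip code,
// sorted from the highest rate to the lowest.
function dogsPerThousand(rows) {
  return rows
    .filter((row) => row.population > 0) // skip zip codes with no residents
    .map((row) => ({ zip: row.zip, rate: (row.dogs * 1000) / row.population }))
    .sort((a, b) => b.rate - a.rate);
}

// Made-up sample rows, for illustration only.
const sampleZips = [
  { zip: "A", dogs: 50, population: 10000 },
  { zip: "B", dogs: 30, population: 2000 },
  { zip: "C", dogs: 0, population: 0 },
];
const ranked = dogsPerThousand(sampleZips);
console.log(ranked);
```

In the sample, zip "A" has more dogs in absolute terms but zip "B" has the higher rate, which
is the kind of difference a raw-count map can hide.
-->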
<div style="height:100vh">
<div class="section">
</div>
<div class="section" >
<div class="container" data-toggle="buttons" onchange="displayDogValue()">
<label><input class="option-input radio" type="radio" name="dog-human-pop" value="All" checked>All</label>
<label><input class="option-input radio" type="radio" name="dog-human-pop" value="Manhattan">Manhattan</label>
<label><input class="option-input radio" type="radio" name="dog-human-pop" value="Staten Island">Staten Island</label>
<label><input class="option-input radio" type="radio" name="dog-human-pop" value="Bronx">Bronx</label>
<label><input class="option-input radio" type="radio" name="dog-human-pop" value="Queens">Queens</label>
<label><input class="option-input radio" type="radio" name="dog-human-pop" value="Brooklyn">Brooklyn</label>
</div>
</div>
<div>
<div id="tool-fixed" style="padding-left: 120px;position:absolute;"></div>
<div id="map" style="height: 70%; width:100%; position:absolute; z-index: 100"></div>
<div id="tool" style="width: 300px; margin:auto;">
<!-- <circle id="circle" style="color: white;position:absolute"></circle>-->
</div>
</div>
</div>
<!--SECTION FOUR------------------------------------------------------------------------------------------------------------------------------->
<!--`````MAPS---------------------------------------------------------------------------------------------------------------------------------->
<div class="sectionStatement">
<h3>what are the factors...</h3>
<h4>
We have identified several factors that are strongly correlated with the number of registered dogs
in different neighborhoods of New York City. These factors include park acres, commute time, and median income.
<br><br><span class="bolded">Park Acres:</span>
Parks are popular destinations for dog owners and their pets. As a result, areas with more parkland
tend to have higher numbers of registered dogs. Additionally, parks are often located near residential
areas and less frequently found in industrial areas.
<br><br><span class="bolded">Commute Time:</span>
Workers in the Manhattan region have shorter commutes compared to workers in other boroughs, with an
average commute time of 32 minutes. Our analysis has shown that the number of registered dogs is
significantly higher in the area around Manhattan. This is likely due to the higher density of jobs
in Manhattan, as well as its association with higher socioeconomic status.
<br><br><span class="bolded">Median Income:</span>
Owning a dog can be expensive, with costs ranging from food and medical expenses to emergency care and
property damage. Therefore, it is not surprising that there is a strong correlation between median income
and dog ownership. On average, workers in Manhattan have a median yearly income of $107,962, significantly
higher than workers in other boroughs. The Bronx has the lowest median income of all the boroughs, which may
explain why it has significantly fewer registered dogs than its population would suggest.
</h4>
</div>
<div class="sectionImg">
<br><br><br><br><br>
<a href="html/map_counts.html" target="_blank"> <img src="data/map/map_counts2.png"> </a>
<br><br><div class="desc"> Dog Population </div><br>
</div>
<div class="sectionImg">
<div class="row">
<div class="col-sm-4 desc"> <a href="html/map_commute.html" target="_blank"><img src="data/map/map_commute2.png"></a> Commute Time </div>
<div class="col-sm-4 desc"> <a href="html/map_pop.html" target="_blank"><img src="data/map/map_pop2.png"></a> Population </div>
<div class="col-sm-4 desc"> <a href="html/map_rent.html" target="_blank"><img src="data/map/map_rent2.png"></a> Rent </div>
</div>
</div>
<br><br><br><div class="desc"> Click on the image to learn more </div><br>
<br><br><br><br><br>
<div class="sectionStatement">
<h3>Data Science...</h3>
<h4>
Our project aims to predict the number of dogs in each zip code from various socio-economic factors.
The response variable is the dog count, while the predictor variables are drawn from the data overview
in each notebook.
<br><br><span class="bolded">To create our model, we followed these steps:</span>
<br>We started with a linear regression baseline and added
lasso regularization to remove trivial features. Using cross-validation, we determined that
a degree of 1 was optimal for polynomial regression. We used grid search to find the best
max_depth for a decision tree regressor and plotted its feature importance. We then tried a bagging
regressor and gradient boosting, optimizing the latter with grid search. Our best model nested
the optimized gradient boosting regressor within an AdaBoost regressor, achieving a test score of 0.769.
<!-- <br>1. We started with a baseline model using linear regression without penalization.-->
<!-- <br>2. We then added lasso regularization to identify trivial features, which we removed in the next step.-->
<!-- <br>3. To determine the best degree for polynomial regression, we used cross-validation and found that a degree of 1 was optimal.-->
<!-- <br>4. We used grid search cross-validation to find the optimal max_depth for a decision tree regressor and plotted feature importance for further EDA.-->
<!-- <br>5. We used bagging regressor with the decision tree from the previous step as the base estimator.-->
<!-- <br>6. We tried gradient boosting with an initial setup.-->
<!-- <br>7. We used grid search cross-validation with gradient boosting to find the optimal values for n_estimators, learning rate, subsample, and max_depth.-->
<!-- <br>8. Our best model nested the optimal Gradient Boost Regressor within Ada Boost Regressor, which achieved a test accuracy of 0.769.-->
<!-- -->
</h4>
</div>
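<!--
A minimal sketch of grid search with k-fold cross-validation, the tuning procedure named in
the steps above. It is not the project's notebook code. A one-feature ridge regression through
the origin stands in for the real regressors, R^2 is assumed as the score, and every name
below (rSquared, fitRidge, crossValidate, gridSearch) and all sample data are made up for
illustration.

```javascript
// Coefficient of determination: 1 means a perfect fit, 0 matches predicting the mean.
function rSquared(actual, predicted) {
  const mean = actual.reduce((a, b) => a + b, 0) / actual.length;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < actual.length; i += 1) {
    ssRes += (actual[i] - predicted[i]) ** 2;
    ssTot += (actual[i] - mean) ** 2;
  }
  return 1 - ssRes / ssTot;
}

// Toy stand-in model: y = w * x with an L2 penalty `lambda` on w.
function fitRidge(xs, ys, lambda) {
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < xs.length; i += 1) {
    sxy += xs[i] * ys[i];
    sxx += xs[i] * xs[i];
  }
  const w = sxy / (sxx + lambda);
  return (x) => w * x;
}

// Mean R^2 over k folds; row i is held out in fold i % k.
function crossValidate(xs, ys, lambda, k) {
  let total = 0;
  for (let fold = 0; fold < k; fold += 1) {
    const trainX = [], trainY = [], testX = [], testY = [];
    xs.forEach((x, i) => {
      if (i % k === fold) { testX.push(x); testY.push(ys[i]); }
      else { trainX.push(x); trainY.push(ys[i]); }
    });
    const predict = fitRidge(trainX, trainY, lambda);
    total += rSquared(testY, testX.map(predict));
  }
  return total / k;
}

// Try every candidate value and keep the one with the best cross-validated score.
function gridSearch(xs, ys, lambdas, k) {
  let best = { lambda: null, score: -Infinity };
  for (const lambda of lambdas) {
    const score = crossValidate(xs, ys, lambda, k);
    if (score > best.score) best = { lambda, score };
  }
  return best;
}

// Made-up data following y = 2x exactly, so no penalty should win.
const sampleX = [1, 2, 3, 4, 5, 6];
const sampleY = sampleX.map((x) => 2 * x);
const best = gridSearch(sampleX, sampleY, [0, 1, 10], 3);
console.log(best);
```

The same loop structure applies when the grid covers several parameters at once, such as
max_depth, n_estimators, learning rate and subsample: each combination is scored on held-out
folds and the best-scoring one is refit on the full training set.
-->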
<br><br><br><br><br><br><br><br><br><br>
<!-- embedding JS libraries -->
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.1/dist/js/bootstrap.bundle.min.js" integrity="sha384-/bQdsTh/da6pkI1MST/rWKFNjaCP5gBSY4sEBT38Q/9RBh9AH40zEOg7Hlq2THRZ" crossorigin="anonymous"></script>
<!-- d3 -->
<!--<script src="https://d3js.org/d3.v7.min.js"></script>-->
<script src='js/d3.min.js'></script>
<script src='js/d3.v7.min.js'></script>
<!-- own js files -->
<!--<script src="js/link.js"></script>-->
<script src="js/packing.js"></script>
<script src="js/timeSeries.js"></script>
<!-- landing -->
<script src="https://code.jquery.com/jquery-3.2.1.js"></script>
<script src='js/three.min.js'></script>
<script src='https://assets.codepen.io/127738/simplex-noise.js'></script>
<script src='js/gsap.min.js'></script>
<script src='js/d32.min.js' crossorigin="anonymous"></script>
<!--<script src='https://cdnjs.cloudflare.com/ajax/libs/d3/4.2.2/d3.min.js'></script>-->
<!-- landing: own js -->
<script src="js/text_script.js"></script>
<script src="js/script.js"></script>
<!--mapping: own js-->
<script src="js/mapping.js"> </script>
</body>
</html>