generated from statsim/port-template
index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Generate synthetic data in the browser | StatSim Gen</title>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css">
<link type="text/css" rel="stylesheet" href="https://statsim.com/port/css/port.css" media="screen"/>
<link rel="icon" type="image/png" href="https://statsim.com/app/images/favicon-32x32.png" sizes="32x32">
<link rel="icon" type="image/png" href="https://statsim.com/app/images/favicon-16x16.png" sizes="16x16">
<link type="text/css" rel="stylesheet" href="https://statsim.com/assets/common.css" media="screen"/>
<style>
a { color: #1ba310 }
.status-bar { background: #f5f5f5 }
th { font-size: 11px; font-weight: 400; color: #999 }
.btn, .port-btn { background: #1ba310 }
.btn:hover, .port-btn:hover { background: #17850d }
.port-btn { background-image: linear-gradient(141deg, #67c036 0%, #1ba310 75%) }
.spinner-green, .spinner-green-only { border-color: #1ba310 }
/*
.file-field .btn { background: #BBB }
.file-field .btn:hover { background: #AAA }
*/
</style>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-7770107-2"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-7770107-2');
</script>
</head>
<body>
<div class="status-bar">
<div class="container">
<div class="row">
<div class="col s12" style="font-size: 14px;">
<div id="menu"></div>
<a href="https://statsim.com/">StatSim</a> → <b>Gen</b>
<span class="version">Version: 0.0.2</span>
</div>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="col m12">
<h1>Generate synthetic data in the browser</h1>
<h2>Synthetic datasets for machine learning experiments</h2>
</div>
</div>
<div class="row">
<div id="port-container"></div>
</div>
<div class="row">
<div class="col m12">
<h3>Supported datasets</h3>
<table>
<thead>
<tr>
<th>Dataset</th>
<th>Type</th>
<th>Variables</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr><td><b>Friedman 1</b></td><td>Regression</td><td>10 + 1</td><td>y = 10 * sin(Pi * x1 * x2) + 20 * (x3 - 0.5) ** 2 + 10 * x4 + 5 * x5 + <i>e</i></td></tr>
<tr><td><b>Friedman 2</b></td><td>Regression</td><td>4 + 1</td><td>y = sqrt(x1 ** 2 + (x2 * x3 - 1 / (x2 * x4)) ** 2) + <i>e</i></td></tr>
<tr><td><b>Friedman 3</b></td><td>Regression</td><td>4 + 1</td><td>y = atan((x2 * x3 - 1 / (x2 * x4)) / x1) + <i>e</i></td></tr>
<tr><td><b>Peak</b></td><td>Regression</td><td>10 + 1</td><td>Peak Benchmark Problem. From: <a href="https://cran.r-project.org/web/packages/mlbench/mlbench.pdf">mlbench</a></td></tr>
<tr><td><b>Hastie</b></td><td>Classification</td><td>10 + 1</td><td>Binary classification problem used in Hastie et al.</td></tr>
<tr><td><b>Moons</b></td><td>Classification</td><td>2 + 1</td><td>Two interleaving half circles</td></tr>
<tr><td><b>Spirals</b></td><td>Classification</td><td>2 + 1</td><td>Two entangled spirals</td></tr>
<tr><td><b>Ringnorm</b></td><td>Classification</td><td>10 + 1</td><td><a href="http://docs.salford-systems.com/BIAS_VARIANCE_ARCING.pdf">Breiman, L. (1996). Bias, variance, and arcing classifiers</a></td></tr>
</tbody>
</table>
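<!--
Editor's sketch: the Friedman rows of the "Supported datasets" table can be
written out as plain functions. This is only an illustration, not mkdata's
actual code or API. The function names, the uniform [0, 1) feature range, the
default noise scale, and the Box-Muller noise draw are all assumptions made
for the example.

```javascript
// Hypothetical helpers (not mkdata's API): noise-free Friedman targets.
function friedman1(x) {
  // x holds 10 features; only the first five affect y.
  const [x1, x2, x3, x4, x5] = x;
  return 10 * Math.sin(Math.PI * x1 * x2) + 20 * (x3 - 0.5) ** 2 + 10 * x4 + 5 * x5;
}

function friedman2(x) {
  const [x1, x2, x3, x4] = x;
  return Math.sqrt(x1 ** 2 + (x2 * x3 - 1 / (x2 * x4)) ** 2);
}

function friedman3(x) {
  // Standard Friedman 3 form: the whole difference is divided by x1.
  const [x1, x2, x3, x4] = x;
  return Math.atan((x2 * x3 - 1 / (x2 * x4)) / x1);
}

// Draw n Friedman 1 rows: 10 features plus the noisy target y (10 + 1 columns).
// Assumption: features are uniform on [0, 1) and e is Gaussian with sd noiseSd.
function sampleFriedman1(n, noiseSd = 1, rand = Math.random) {
  const rows = [];
  for (let i = 0; i < n; i += 1) {
    const x = Array.from({ length: 10 }, () => rand());
    // Box-Muller transform for one standard normal draw.
    const e = Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());
    rows.push([...x, friedman1(x) + noiseSd * e]);
  }
  return rows;
}
```

Calling sampleFriedman1(1000) would give 1000 rows of 11 numbers each, which
matches the "10 + 1" entry in the Variables column.
-->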
<br><br>
</div>
<div class="row features">
<div class="col m4 feature">
<h3>
Unlimited data size
</h3>
<p>
In the real world, data collection is almost always an expensive and complex process. Artificial data is an easier and faster alternative for testing statistical and machine learning methods. As long as you have enough RAM and disk space, you can generate any number of records.
</p>
</div>
<div class="col m4 feature">
<h3>
Known generating functions
</h3>
<p>
In many practical cases, observations are noisy, and the data generating function is not fully known. That makes model evaluation harder. Luckily, synthetic datasets have transparent rules and procedures under the hood. StatSim Gen uses <a href="https://github.com/zemlyansky/mkdata">mkdata</a>, an open-source library that has all its code publicly available on GitHub.
</p>
</div>
<div class="col m4 feature">
<h3>
Save results as CSV files
</h3>
<p>
The comma-separated values (CSV) format is probably the most popular format for storing tabular data. Most data processing libraries and programs support it. Just save the results as a CSV file and then load it into another app. You can preview or profile CSV files using <a href="https://statsim.com/preview/">StatSim Preview</a> and <a href="https://statsim.com/profile/">StatSim Profile</a>, our free web apps. Or fit an XGBoost model in <a href="https://statsim.com/fit/">StatSim Fit</a>.
</p>
</div>
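<!--
Editor's sketch: the "Save results as CSV files" feature comes down to turning
rows of values into comma-separated text. The helper below is a minimal
illustration, not StatSim Gen's actual export code. The name toCSV and the
column names x1, x2, y are hypothetical.

```javascript
// Hypothetical helper (not StatSim Gen's export code): rows to CSV text.
function toCSV(header, rows) {
  // Quote a field only if it contains a comma, a double quote, or a line break,
  // doubling any embedded double quotes.
  const escape = (value) => {
    const s = String(value);
    return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  return [header, ...rows].map((row) => row.map(escape).join(',')).join('\n');
}

const csv = toCSV(['x1', 'x2', 'y'], [[0.1, 0.2, 1], [0.3, 0.4, 0]]);
// csv === 'x1,x2,y\n0.1,0.2,1\n0.3,0.4,0'
```

In a browser, a string like this could be wrapped in a Blob and offered as a
file download. Purely numeric datasets never trigger the quoting branch, but
it keeps the helper safe for text columns.
-->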
</div>
<div class="row">
<small>
If you enjoyed the app, star us on GitHub. To report errors, create an issue.<br>
</small>
<p>
<a class="github-button" href="https://github.com/statsim/gen" data-icon="octicon-star" data-show-count="true" aria-label="Star statsim/gen on GitHub">Star</a>
<a class="github-button" href="https://github.com/statsim/gen/issues" data-icon="octicon-issue-opened" data-show-count="true" aria-label="Issue statsim/gen on GitHub">Issue</a>
</p>
</div>
</div>
</div>
<script async defer src="https://buttons.github.io/buttons.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
<script src="https://statsim.com/port/dist/port.js"></script>
<script src="https://statsim.com/assets/common.js"></script>
<script>
var port = new Port({
portContainer: document.getElementById('port-container'),
schema: {
"model": {
"type": "function",
"autorun": false,
"worker": true,
"url": "gen.js",
"name": "gen",
},
"inputs": [
{ "type": "select", "name": "Dataset type", "default": "Friedman 1", "options": [
"Friedman 1",
"Friedman 2",
"Friedman 3",
"Peak",
"Hastie",
"Moons",
"Spirals",
"Ringnorm"
]},
{ "type": "select", "name": "File format", "default": "CSV", "options": ["CSV", "JSON"]},
{ "type": "int", "name": "Number of samples", "default": 1000}
],
"outputs": [
{ "type": "file", "name": "file" },
]
}
})
</script>
</body>
</html>