<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Dataset Handling
</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="description" content="Deep.Net machine learning framework"/>
<meta name="author" content="Deep.Net developers"/>
<script src="https://code.jquery.com/jquery-1.8.0.js"></script>
<script src="https://code.jquery.com/ui/1.8.23/jquery-ui.js"></script>
<script src="https://netdna.bootstrapcdn.com/twitter-bootstrap/2.2.1/js/bootstrap.min.js"></script>
<link href="https://netdna.bootstrapcdn.com/twitter-bootstrap/2.2.1/css/bootstrap-combined.min.css" rel="stylesheet"/>
<link type="text/css" rel="stylesheet" href="http://www.deepml.net/content/style.css" />
<script type="text/javascript" src="http://www.deepml.net/content/tips.js"></script>
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" async
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
</head>
<body>
<div class="container">
<div class="masthead">
<ul class="nav nav-pills pull-right">
<li><a href="http://github.com/DeepMLNet/DeepNet">github page</a></li>
</ul>
<h3 class="muted"><a href="http://www.deepml.net/index.html">Deep.Net</a></h3>
</div>
<hr />
<div class="row">
<div class="span9" id="main">
<h1><a name="Dataset-Handling" class="anchor" href="#Dataset-Handling">Dataset Handling</a></h1>
<p>Deep.Net provides the generic type <code>Dataset&lt;'S&gt;</code> for handling datasets used in machine learning.
Its samples are values of a user-defined record type <code>'S</code> whose fields are of type <code>ArrayNDT</code>.
The following features are provided:</p>
<ul>
<li>data storage on host and CUDA GPU</li>
<li>indexed sample access</li>
<li>sample range access</li>
<li>mini-batch sequencing (with optional padding of last batch)</li>
<li>partitioning into training, validation and test set</li>
<li>loading from and saving to disk</li>
</ul>
<p>We are going to introduce it using a simple, synthetic dataset.</p>
<h2><a name="Creating-a-dataset" class="anchor" href="#Creating-a-dataset">Creating a dataset</a></h2>
<p>In most cases you are going to load a dataset by parsing some text or binary files.
However, since this is application-specific, we do not cover it here and instead generate a synthetic dataset on the fly using the hyperbolic functions sinh and cosh.</p>
<h3><a name="Defining-the-sample-type" class="anchor" href="#Defining-the-sample-type">Defining the sample type</a></h3>
<p>Our sample type consists of two fields: a scalar <span class="math">\(x\)</span> and a vector <span class="math">\(\mathbf{v}\)</span>.
This corresponds to the following record type:</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
<span class="l">6: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">open</span> <span class="i">ArrayNDNS</span>
<span class="k">type</span> <span onmouseout="hideTip(event, 'fs1', 1)" onmouseover="showTip(event, 'fs1', 1)" class="t">MySampleType</span> <span class="o">=</span> {
<span onmouseout="hideTip(event, 'fs2', 2)" onmouseover="showTip(event, 'fs2', 2)" class="i">X</span><span class="o">:</span> <span class="i">ArrayNDT</span><span class="o"><</span><span onmouseout="hideTip(event, 'fs3', 3)" onmouseover="showTip(event, 'fs3', 3)" class="i">single</span><span class="o">></span>
<span onmouseout="hideTip(event, 'fs4', 4)" onmouseover="showTip(event, 'fs4', 4)" class="i">V</span><span class="o">:</span> <span class="i">ArrayNDT</span><span class="o"><</span><span onmouseout="hideTip(event, 'fs3', 5)" onmouseover="showTip(event, 'fs3', 5)" class="i">single</span><span class="o">></span>
}
</code></pre></td>
</tr>
</table>
<p>We use the element type <code>single</code> (32-bit floating point) because single-precision arithmetic is fast on the GPU.</p>
<h3><a name="Generating-some-samples" class="anchor" href="#Generating-some-samples">Generating some samples</a></h3>
<p>Next, let us generate some samples.
The scalar <span class="math">\(x\)</span> shall be sampled randomly from a uniform distribution on the interval <span class="math">\([-2, 2]\)</span>.
The vector <span class="math">\(\mathbf{v}\)</span> shall be computed from <span class="math">\(x\)</span> as</p>
<p><span class="math">\[\mathbf{v}(x) = \left( \begin{matrix} \mathrm{sinh} \, x \\ \mathrm{cosh} \, x \end{matrix} \right)\]</span></p>
<p>We can implement that using the following code.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
<span class="l">6: </span>
<span class="l">7: </span>
<span class="l">8: </span>
<span class="l">9: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs5', 6)" onmouseover="showTip(event, 'fs5', 6)" class="f">generateSamples</span> <span onmouseout="hideTip(event, 'fs6', 7)" onmouseover="showTip(event, 'fs6', 7)" class="i">cnt</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs7', 8)" onmouseover="showTip(event, 'fs7', 8)" class="i">seq</span> {
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs8', 9)" onmouseover="showTip(event, 'fs8', 9)" class="i">rng</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs9', 10)" onmouseover="showTip(event, 'fs9', 10)" class="i">System</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs10', 11)" onmouseover="showTip(event, 'fs10', 11)" class="t">Random</span> (<span class="n">100</span>)
<span class="k">for</span> <span onmouseout="hideTip(event, 'fs11', 12)" onmouseover="showTip(event, 'fs11', 12)" class="i">n</span> <span class="o">=</span> <span class="n">0</span> <span class="k">to</span> <span onmouseout="hideTip(event, 'fs6', 13)" onmouseover="showTip(event, 'fs6', 13)" class="i">cnt</span> <span class="o">-</span> <span class="n">1</span> <span class="k">do</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs12', 14)" onmouseover="showTip(event, 'fs12', 14)" class="i">x</span> <span class="o">=</span> <span class="n">2.</span> <span class="o">*</span> (<span onmouseout="hideTip(event, 'fs8', 15)" onmouseover="showTip(event, 'fs8', 15)" class="i">rng</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs13', 16)" onmouseover="showTip(event, 'fs13', 16)" class="f">NextDouble</span> () <span class="o">-</span> <span class="n">0.5</span>) <span class="o">*</span> <span class="n">2.</span> <span class="o">|></span> <span onmouseout="hideTip(event, 'fs3', 17)" onmouseover="showTip(event, 'fs3', 17)" class="f">single</span>
<span class="k">yield</span> {
<span onmouseout="hideTip(event, 'fs2', 18)" onmouseover="showTip(event, 'fs2', 18)" class="i">X</span> <span class="o">=</span> <span class="i">ArrayNDHost</span><span class="o">.</span><span class="i">scalar</span> <span onmouseout="hideTip(event, 'fs12', 19)" onmouseover="showTip(event, 'fs12', 19)" class="i">x</span>
<span onmouseout="hideTip(event, 'fs4', 20)" onmouseover="showTip(event, 'fs4', 20)" class="i">V</span> <span class="o">=</span> <span class="i">ArrayNDHost</span><span class="o">.</span><span class="i">ofList</span> [<span onmouseout="hideTip(event, 'fs14', 21)" onmouseover="showTip(event, 'fs14', 21)" class="i">sinh</span> <span onmouseout="hideTip(event, 'fs12', 22)" onmouseover="showTip(event, 'fs12', 22)" class="i">x</span>; <span onmouseout="hideTip(event, 'fs15', 23)" onmouseover="showTip(event, 'fs15', 23)" class="i">cosh</span> <span onmouseout="hideTip(event, 'fs12', 24)" onmouseover="showTip(event, 'fs12', 24)" class="i">x</span>]
}
}
</code></pre></td>
</tr>
</table>
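<p>The expression used for <span class="math">\(x\)</span> in <code>generateSamples</code> rescales a uniform draw <span class="math">\(u \in [0, 1)\)</span> from <code>NextDouble</code>: <span class="math">\(2 (u - 0.5) \cdot 2 = 4u - 2\)</span>, which lies in <span class="math">\([-2, 2)\)</span>; for example, <span class="math">\(u = 0\)</span> maps to <span class="math">\(-2\)</span> and <span class="math">\(u = 0.5\)</span> maps to <span class="math">\(0\)</span>. Since <span class="math">\(\mathrm{cosh} \, x \geq 1\)</span>, the second component of <span class="math">\(\mathbf{v}\)</span> is always at least 1. The short sketch below checks the mapping and the formula with plain F# <code>float</code> values; the names <code>toInterval</code> and <code>v</code> are illustrative only and are not part of Deep.Net.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp">// Illustrative helpers only; they are not part of Deep.Net.
// Rescale a uniform draw u in [0, 1) to [-2, 2), as generateSamples does.
let toInterval (u: float) = 2.0 * (u - 0.5) * 2.0

// v(x) = (sinh x, cosh x) as a plain array.
let v (x: float) = [| sinh x; cosh x |]

printfn "%g %g" (toInterval 0.0) (toInterval 0.5)
printfn "%A" (v 0.0)
</code></pre></td>
</tr>
</table>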
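<p>The <code>generateSamples</code> function lazily produces <code>cnt</code> samples. Since the random number generator is seeded with a fixed value (100), the same samples are generated on every run.
We can test it as follows.</p>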
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs16', 25)" onmouseover="showTip(event, 'fs16', 25)" class="i">smpls</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs5', 26)" onmouseover="showTip(event, 'fs5', 26)" class="f">generateSamples</span> <span class="n">100</span> <span class="o">|></span> <span onmouseout="hideTip(event, 'fs17', 27)" onmouseover="showTip(event, 'fs17', 27)" class="t">List</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs18', 28)" onmouseover="showTip(event, 'fs18', 28)" class="f">ofSeq</span>
<span class="k">for</span> <span onmouseout="hideTip(event, 'fs19', 29)" onmouseover="showTip(event, 'fs19', 29)" class="i">idx</span>, <span onmouseout="hideTip(event, 'fs20', 30)" onmouseover="showTip(event, 'fs20', 30)" class="i">smpl</span> <span class="k">in</span> <span onmouseout="hideTip(event, 'fs17', 31)" onmouseover="showTip(event, 'fs17', 31)" class="t">List</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs21', 32)" onmouseover="showTip(event, 'fs21', 32)" class="f">indexed</span> <span onmouseout="hideTip(event, 'fs16', 33)" onmouseover="showTip(event, 'fs16', 33)" class="i">smpls</span> <span class="k">do</span>
<span onmouseout="hideTip(event, 'fs22', 34)" onmouseover="showTip(event, 'fs22', 34)" class="f">printfn</span> <span class="s">"Sample </span><span class="pf">%3d</span><span class="s">: X=</span><span class="pf">%A</span><span class="s"> V=</span><span class="pf">%A</span><span class="s">"</span> <span class="i">idx</span> <span onmouseout="hideTip(event, 'fs20', 35)" onmouseover="showTip(event, 'fs20', 35)" class="i">smpl</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs2', 36)" onmouseover="showTip(event, 'fs2', 36)" class="i">X</span> <span onmouseout="hideTip(event, 'fs20', 37)" onmouseover="showTip(event, 'fs20', 37)" class="i">smpl</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs4', 38)" onmouseover="showTip(event, 'fs4', 38)" class="i">V</span>
</code></pre></td>
</tr>
</table>
<p>This prints</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
<span class="l">6: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Sample</span> <span class="n">0</span><span class="o">:</span> <span class="i">X</span><span class="o">=</span> <span class="n">1.8751</span> <span class="i">V</span><span class="o">=</span>[ <span class="n">3.1841</span> <span class="n">3.3374</span>]
<span class="i">Sample</span> <span class="n">1</span><span class="o">:</span> <span class="i">X</span><span class="o">=</span> <span class="o">-</span><span class="n">1.3633</span> <span class="i">V</span><span class="o">=</span>[ <span class="o">-</span><span class="n">1.8265</span> <span class="n">2.0824</span>]
<span class="i">Sample</span> <span class="n">2</span><span class="o">:</span> <span class="i">X</span><span class="o">=</span> <span class="n">0.6673</span> <span class="i">V</span><span class="o">=</span>[ <span class="n">0.7179</span> <span class="n">1.2310</span>]
<span class="i">Sample</span> <span class="n">3</span><span class="o">:</span> <span class="i">X</span><span class="o">=</span> <span class="n">1.6098</span> <span class="i">V</span><span class="o">=</span>[ <span class="n">2.4010</span> <span class="n">2.6009</span>]
<span class="o">..</span><span class="o">.</span>
<span class="i">Sample</span> <span class="n">99</span><span class="o">:</span> <span class="i">X</span><span class="o">=</span> <span class="o">-</span><span class="n">0.1610</span> <span class="i">V</span><span class="o">=</span>[ <span class="o">-</span><span class="n">0.1617</span> <span class="n">1.0130</span>]
</code></pre></td>
</tr>
</table>
<p>Now that we have some data, we can create a dataset.</p>
<h3><a name="Instantiating-the-dataset-type" class="anchor" href="#Instantiating-the-dataset-type">Instantiating the dataset type</a></h3>
<p>There are two ways to construct a dataset.</p>
<ol>
<li>The static method <code>Dataset&lt;'S&gt;.FromSamples</code> takes a sequence of samples of type <code>'S</code> and constructs a dataset from them.</li>
<li>The <code>Dataset<'S></code> constructor takes a list of ArrayNDTs corresponding to the fields of the record type 'S. The first dimension of each passed array must correspond to the sample index.</li>
</ol>
<p>Since we already have a sequence of samples, we use the first method.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">open</span> <span class="i">Datasets</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs23', 39)" onmouseover="showTip(event, 'fs23', 39)" class="i">ds</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs16', 40)" onmouseover="showTip(event, 'fs16', 40)" class="i">smpls</span> <span class="o">|></span> <span class="i">Dataset</span><span class="o">.</span><span class="i">FromSamples</span>
</code></pre></td>
</tr>
</table>
<h2><a name="Accessing-single-and-multiple-elements" class="anchor" href="#Accessing-single-and-multiple-elements">Accessing single and multiple elements</a></h2>
<p>The dataset type supports the indexing and <a href="https://blogs.msdn.microsoft.com/chrsmith/2008/12/09/f-zen-array-slices/">slicing</a> operations to access samples.</p>
<p>Accessing a single sample with the indexing operator returns a record of the same type as the samples passed to <code>Dataset.FromSamples</code>.
Indices are zero-based, so to print the third sample (index 2) we write</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs24', 41)" onmouseover="showTip(event, 'fs24', 41)" class="i">smpl2</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs23', 42)" onmouseover="showTip(event, 'fs23', 42)" class="i">ds</span><span class="o">.</span>[<span class="n">2</span>]
<span onmouseout="hideTip(event, 'fs22', 43)" onmouseover="showTip(event, 'fs22', 43)" class="f">printfn</span> <span class="s">"Sample 3: X=</span><span class="pf">%A</span><span class="s"> V=</span><span class="pf">%A</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs24', 44)" onmouseover="showTip(event, 'fs24', 44)" class="i">smpl2</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs2', 45)" onmouseover="showTip(event, 'fs2', 45)" class="i">X</span> <span onmouseout="hideTip(event, 'fs24', 46)" onmouseover="showTip(event, 'fs24', 46)" class="i">smpl2</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs4', 47)" onmouseover="showTip(event, 'fs4', 47)" class="i">V</span>
</code></pre></td>
</tr>
</table>
<p>and get the output</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Sample</span> <span class="n">3</span><span class="o">:</span> <span class="i">X</span><span class="o">=</span> <span class="n">0.6673</span> <span class="i">V</span><span class="o">=</span>[ <span class="n">0.7179</span> <span class="n">1.2310</span>]
</code></pre></td>
</tr>
</table>
<p>When accessing multiple samples using the slicing operator, the returned value is of the same sample record type, but each contained tensor has one additional leading (leftmost) dimension corresponding to the sample index.
For example, we can get a record containing the first three samples using the following code.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs25', 48)" onmouseover="showTip(event, 'fs25', 48)" class="i">smpl0to2</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs23', 49)" onmouseover="showTip(event, 'fs23', 49)" class="i">ds</span><span class="o">.</span>[<span class="n">0..</span><span class="n">2</span>]
<span onmouseout="hideTip(event, 'fs22', 50)" onmouseover="showTip(event, 'fs22', 50)" class="f">printfn</span> <span class="s">"Samples 0,1,2:</span><span class="e">\n</span><span class="s">X=</span><span class="pf">%A</span><span class="s"></span><span class="e">\n</span><span class="s">V=</span><span class="e">\n</span><span class="s"></span><span class="pf">%A</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs25', 51)" onmouseover="showTip(event, 'fs25', 51)" class="i">smpl0to2</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs2', 52)" onmouseover="showTip(event, 'fs2', 52)" class="i">X</span> <span onmouseout="hideTip(event, 'fs25', 53)" onmouseover="showTip(event, 'fs25', 53)" class="i">smpl0to2</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs4', 54)" onmouseover="showTip(event, 'fs4', 54)" class="i">V</span>
</code></pre></td>
</tr>
</table>
<p>This prints</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
<span class="l">6: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Samples</span> <span class="n">0</span>,<span class="n">1</span>,<span class="n">2</span><span class="o">:</span>
<span class="i">X</span><span class="o">=</span>[ <span class="n">1.8751</span> <span class="o">-</span><span class="n">1.3633</span> <span class="n">0.6673</span>]
<span class="i">V</span><span class="o">=</span>
[[ <span class="n">3.1841</span> <span class="n">3.3374</span>]
[ <span class="o">-</span><span class="n">1.8265</span> <span class="n">2.0824</span>]
[ <span class="n">0.7179</span> <span class="n">1.2310</span>]]
</code></pre></td>
</tr>
</table>
<p>Hence every tensor in the sample record increases in rank by one: the scalar <code>X</code> became a vector and the vector <code>V</code> became a matrix in which each row corresponds to a sample.</p>
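<p>The change in shape can be mimicked with ordinary .NET arrays, which stand in for <code>ArrayNDT</code> here purely for illustration; the sketch does not use the Deep.Net API. It takes the first four samples printed earlier, selects samples 0 to 2 with an F# slice (F# slices include both bounds, so <code>0..2</code> selects three samples) and stacks the per-sample values: the scalars <code>X</code> form a vector of length 3, and the vectors <code>V</code> form a 3-by-2 matrix whose rows are samples, mirroring the output above.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp">// Plain .NET arrays stand in for ArrayNDT; this is not the Deep.Net API.
// X and V of samples 0 to 3 as printed above.
let xs = [| 1.8751f; -1.3633f; 0.6673f; 1.6098f |]
let vs = [| [| 3.1841f; 3.3374f |]
            [| -1.8265f; 2.0824f |]
            [| 0.7179f; 1.2310f |]
            [| 2.4010f; 2.6009f |] |]

// Slicing with 0..2 includes both bounds, i.e. three samples.
let x0to2 = xs.[0..2]           // scalars stack into a vector of length 3
let v0to2 = array2D vs.[0..2]   // vectors stack into a 3-by-2 matrix

printfn "shape of X: [%d]" x0to2.Length                                          // shape of X: [3]
printfn "shape of V: [%d; %d]" (Array2D.length1 v0to2) (Array2D.length2 v0to2)  // shape of V: [3; 2]
</code></pre></td>
</tr>
</table>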
<h2><a name="Iterating-over-the-dataset" class="anchor" href="#Iterating-over-the-dataset">Iterating over the dataset</a></h2>
<p>You can also iterate over the samples of the dataset directly.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">for</span> <span onmouseout="hideTip(event, 'fs26', 55)" onmouseover="showTip(event, 'fs26', 55)" class="i">smpl</span> <span class="k">in</span> <span onmouseout="hideTip(event, 'fs23', 56)" onmouseover="showTip(event, 'fs23', 56)" class="i">ds</span> <span class="k">do</span>
<span onmouseout="hideTip(event, 'fs22', 57)" onmouseover="showTip(event, 'fs22', 57)" class="f">printfn</span> <span class="s">"Sample: </span><span class="pf">%A</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs26', 58)" onmouseover="showTip(event, 'fs26', 58)" class="i">smpl</span>
</code></pre></td>
</tr>
</table>
<p>This prints</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l"> 1: </span>
<span class="l"> 2: </span>
<span class="l"> 3: </span>
<span class="l"> 4: </span>
<span class="l"> 5: </span>
<span class="l"> 6: </span>
<span class="l"> 7: </span>
<span class="l"> 8: </span>
<span class="l"> 9: </span>
<span class="l">10: </span>
<span class="l">11: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Sample</span><span class="o">:</span> {<span class="i">X</span> <span class="o">=</span> <span class="n">1.8751</span>;
<span class="i">V</span> <span class="o">=</span> [ <span class="n">3.1841</span> <span class="n">3.3374</span>];}
<span class="i">Sample</span><span class="o">:</span> {<span class="i">X</span> <span class="o">=</span> <span class="o">-</span><span class="n">1.3633</span>;
<span class="i">V</span> <span class="o">=</span> [ <span class="o">-</span><span class="n">1.8265</span> <span class="n">2.0824</span>];}
<span class="i">Sample</span><span class="o">:</span> {<span class="i">X</span> <span class="o">=</span> <span class="n">0.6673</span>;
<span class="i">V</span> <span class="o">=</span> [ <span class="n">0.7179</span> <span class="n">1.2310</span>];}
<span class="i">Sample</span><span class="o">:</span> {<span class="i">X</span> <span class="o">=</span> <span class="n">1.6098</span>;
<span class="i">V</span> <span class="o">=</span> [ <span class="n">2.4010</span> <span class="n">2.6009</span>];}
<span class="o">..</span><span class="o">.</span>
<span class="i">Sample</span><span class="o">:</span> {<span class="i">X</span> <span class="o">=</span> <span class="o">-</span><span class="n">0.1610</span>;
<span class="i">V</span> <span class="o">=</span> [ <span class="o">-</span><span class="n">0.1617</span> <span class="n">1.0130</span>];}
</code></pre></td>
</tr>
</table>
<h2><a name="Mini-batches" class="anchor" href="#Mini-batches">Mini-batches</a></h2>
<p>The <code>ds.Batches</code> method returns a sequence of mini-batches from the dataset.
Its single argument specifies the number of samples in each batch.
Each mini-batch is a sample record with an additional leading dimension, just like the result of slicing.
If the total number of samples in the dataset is not a multiple of the batch size, the last batch contains fewer samples.</p>
<p>The following code prints the sizes of the obtained mini-batches.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">for</span> <span onmouseout="hideTip(event, 'fs19', 59)" onmouseover="showTip(event, 'fs19', 59)" class="i">idx</span>, <span onmouseout="hideTip(event, 'fs27', 60)" onmouseover="showTip(event, 'fs27', 60)" class="i">batch</span> <span class="k">in</span> <span onmouseout="hideTip(event, 'fs28', 61)" onmouseover="showTip(event, 'fs28', 61)" class="t">Seq</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs29', 62)" onmouseover="showTip(event, 'fs29', 62)" class="f">indexed</span> (<span onmouseout="hideTip(event, 'fs23', 63)" onmouseover="showTip(event, 'fs23', 63)" class="i">ds</span><span class="o">.</span><span class="i">Batches</span> <span class="n">30</span>) <span class="k">do</span>
<span onmouseout="hideTip(event, 'fs22', 64)" onmouseover="showTip(event, 'fs22', 64)" class="f">printfn</span> <span class="s">"Batch </span><span class="pf">%d</span><span class="s">: shape of X: </span><span class="pf">%A</span><span class="s"> shape of V: </span><span class="pf">%A</span><span class="s">"</span>
<span class="i">idx</span> <span onmouseout="hideTip(event, 'fs27', 65)" onmouseover="showTip(event, 'fs27', 65)" class="i">batch</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs2', 66)" onmouseover="showTip(event, 'fs2', 66)" class="i">X</span><span class="o">.</span><span class="i">Shape</span> <span onmouseout="hideTip(event, 'fs27', 67)" onmouseover="showTip(event, 'fs27', 67)" class="i">batch</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs4', 68)" onmouseover="showTip(event, 'fs4', 68)" class="i">V</span><span class="o">.</span><span class="i">Shape</span>
</code></pre></td>
</tr>
</table>
<p>This outputs</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Batch</span> <span class="n">0</span><span class="o">:</span> <span class="i">shape</span> <span class="k">of</span> <span class="i">X</span><span class="o">:</span> [<span class="n">30</span>] <span class="i">shape</span> <span class="k">of</span> <span class="i">V</span><span class="o">:</span> [<span class="n">30</span>; <span class="n">2</span>]
<span class="i">Batch</span> <span class="n">1</span><span class="o">:</span> <span class="i">shape</span> <span class="k">of</span> <span class="i">X</span><span class="o">:</span> [<span class="n">30</span>] <span class="i">shape</span> <span class="k">of</span> <span class="i">V</span><span class="o">:</span> [<span class="n">30</span>; <span class="n">2</span>]
<span class="i">Batch</span> <span class="n">2</span><span class="o">:</span> <span class="i">shape</span> <span class="k">of</span> <span class="i">X</span><span class="o">:</span> [<span class="n">30</span>] <span class="i">shape</span> <span class="k">of</span> <span class="i">V</span><span class="o">:</span> [<span class="n">30</span>; <span class="n">2</span>]
<span class="i">Batch</span> <span class="n">3</span><span class="o">:</span> <span class="i">shape</span> <span class="k">of</span> <span class="i">X</span><span class="o">:</span> [<span class="n">10</span>] <span class="i">shape</span> <span class="k">of</span> <span class="i">V</span><span class="o">:</span> [<span class="n">10</span>; <span class="n">2</span>]
</code></pre></td>
</tr>
</table>
<p>If you need the last batch to be padded to the specified batch size, use the <code>ds.PaddedBatches</code> method instead.</p>
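<p>The batch sizes follow from simple arithmetic: with <span class="math">\(N\)</span> samples and batch size <span class="math">\(B\)</span> there are <span class="math">\(\lceil N/B \rceil\)</span> batches, and the last one holds <span class="math">\(N - B\,(\lceil N/B \rceil - 1)\)</span> samples unless it is padded; for <span class="math">\(N = 100\)</span> and <span class="math">\(B = 30\)</span> this gives 4 batches with 10 samples in the last. The sketch below reproduces the sizes printed above. <code>batchSizes</code> and <code>paddedBatchSizes</code> are hypothetical helpers, not Deep.Net functions, and the padded variant assumes that padding only extends the last batch to the full batch size.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical helpers, not part of Deep.Net: the sizes of consecutive
// mini-batches when nSamples are cut into chunks of batchSize.
let batchSizes (nSamples: int) (batchSize: int) =
    [ for start in 0 .. batchSize .. nSamples - 1 do
        // the last chunk may hold fewer than batchSize samples
        yield min batchSize (nSamples - start) ]

// Assumed padding behaviour: every batch, including the last, has batchSize samples.
let paddedBatchSizes (nSamples: int) (batchSize: int) =
    [ for _ in 0 .. batchSize .. nSamples - 1 do
        yield batchSize ]

printfn "%A" (batchSizes 100 30)        // [30; 30; 30; 10]
printfn "%A" (paddedBatchSizes 100 30)  // [30; 30; 30; 30]
</code></pre></td>
</tr>
</table>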
<h2><a name="Partitioning" class="anchor" href="#Partitioning">Partitioning</a></h2>
<p>It is often necessary to split a dataset into partitions.</p>
<p>The <code>ds.Partition</code> method takes a list of ratios and returns a list of new datasets obtained by splitting the dataset according to these ratios.
Partitioning is sequential: samples are taken from the beginning of the dataset until the first partition has the requested number of samples,
then the samples for the second partition are taken, and so on.</p>
<p>The following example splits our dataset into three partitions of ratios <span class="math">\(1/2\)</span>, <span class="math">\(1/4\)</span> and <span class="math">\(1/4\)</span>.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs30', 69)" onmouseover="showTip(event, 'fs30', 69)" class="i">partitions</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs23', 70)" onmouseover="showTip(event, 'fs23', 70)" class="i">ds</span><span class="o">.</span><span class="i">Partition</span> [<span class="n">0.5</span>; <span class="n">0.25</span>; <span class="n">0.25</span>]
<span class="k">for</span> <span onmouseout="hideTip(event, 'fs19', 71)" onmouseover="showTip(event, 'fs19', 71)" class="i">idx</span>, <span onmouseout="hideTip(event, 'fs31', 72)" onmouseover="showTip(event, 'fs31', 72)" class="i">p</span> <span class="k">in</span> <span onmouseout="hideTip(event, 'fs17', 73)" onmouseover="showTip(event, 'fs17', 73)" class="t">List</span><span class="o">.</span><span onmouseout="hideTip(event, 'fs21', 74)" onmouseover="showTip(event, 'fs21', 74)" class="f">indexed</span> <span onmouseout="hideTip(event, 'fs30', 75)" onmouseover="showTip(event, 'fs30', 75)" class="i">partitions</span> <span class="k">do</span>
<span onmouseout="hideTip(event, 'fs22', 76)" onmouseover="showTip(event, 'fs22', 76)" class="f">printfn</span> <span class="s">"Partition </span><span class="pf">%d</span><span class="s"> has </span><span class="pf">%d</span><span class="s"> samples."</span> <span class="i">idx</span> <span onmouseout="hideTip(event, 'fs31', 77)" onmouseover="showTip(event, 'fs31', 77)" class="i">p</span><span class="o">.</span><span class="i">NSamples</span>
</code></pre></td>
</tr>
</table>
<p>This prints</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Partition</span> <span class="n">0</span> <span class="i">has</span> <span class="n">50</span> <span class="i">samples</span><span class="o">.</span>
<span class="i">Partition</span> <span class="n">1</span> <span class="i">has</span> <span class="n">25</span> <span class="i">samples</span><span class="o">.</span>
<span class="i">Partition</span> <span class="n">2</span> <span class="i">has</span> <span class="n">25</span> <span class="i">samples</span><span class="o">.</span>
</code></pre></td>
</tr>
</table>
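<p>The sequential partitioning described above amounts to converting the ratios into sample counts and assigning consecutive index ranges. The sketch below uses a hypothetical helper, <code>partitionRanges</code>, that is not taken from Deep.Net; in particular, normalizing by the sum of the ratios, rounding each count to the nearest integer and giving the remainder to the last partition are assumptions about how leftover samples are handled. For 100 samples and the ratios used above it yields the same 50/25/25 split, as the inclusive index ranges 0 to 49, 50 to 74 and 75 to 99.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical helper, not Deep.Net's implementation: split nSamples
// sequentially according to ratios and return inclusive index ranges.
let partitionRanges (nSamples: int) (ratios: float list) =
    let total = List.sum ratios
    let leading = List.take (ratios.Length - 1) ratios
    // Assumption: counts are rounded; the last partition takes the remainder.
    let counts =
        [ for r in leading do yield int (round (float nSamples * r / total)) ]
    let counts = counts @ [ nSamples - List.sum counts ]
    // Start index of each partition: 0, then the running sum of counts.
    let starts = List.scan (+) 0 counts
    [ for i in 0 .. counts.Length - 1 do
        yield (starts.[i], starts.[i] + counts.[i] - 1) ]

printfn "%A" (partitionRanges 100 [0.5; 0.25; 0.25])
</code></pre></td>
</tr>
</table>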
<h3><a name="Training-validation-and-test-splits" class="anchor" href="#Training-validation-and-test-splits">Training, validation and test splits</a></h3>
<p>In machine learning it is common practice to split the dataset into a training, validation and test dataset.
Deep.Net provides the <code>TrnValTst<'S></code> type for that purpose.
It is a record type with the fields <code>Trn</code>, <code>Val</code> and <code>Tst</code> of type <code>Dataset<'S></code>.
It can be constructed from an existing dataset using the <code>TrnValTst.Of</code> function.</p>
<p>The following code demonstrates its use with ratios of <span class="math">\(0.7\)</span>, <span class="math">\(0.15\)</span> and <span class="math">\(0.15\)</span> for the training, validation and test sets, respectively.
The ratios are optional; if they are omitted, ratios of <span class="math">\(0.8\)</span>, <span class="math">\(0.1\)</span> and <span class="math">\(0.1\)</span> are used.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs32', 78)" onmouseover="showTip(event, 'fs32', 78)" class="i">dsp</span> <span class="o">=</span> <span class="i">TrnValTst</span><span class="o">.</span><span class="i">Of</span> (<span onmouseout="hideTip(event, 'fs23', 79)" onmouseover="showTip(event, 'fs23', 79)" class="i">ds</span>, <span class="n">0.7</span>, <span class="n">0.15</span>, <span class="n">0.15</span>)
<span onmouseout="hideTip(event, 'fs22', 80)" onmouseover="showTip(event, 'fs22', 80)" class="f">printfn</span> <span class="s">"Training set size: </span><span class="pf">%d</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs32', 81)" onmouseover="showTip(event, 'fs32', 81)" class="i">dsp</span><span class="o">.</span><span class="i">Trn</span><span class="o">.</span><span class="i">NSamples</span>
<span onmouseout="hideTip(event, 'fs22', 82)" onmouseover="showTip(event, 'fs22', 82)" class="f">printfn</span> <span class="s">"Validation set size: </span><span class="pf">%d</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs32', 83)" onmouseover="showTip(event, 'fs32', 83)" class="i">dsp</span><span class="o">.</span><span class="i">Val</span><span class="o">.</span><span class="i">NSamples</span>
<span onmouseout="hideTip(event, 'fs22', 84)" onmouseover="showTip(event, 'fs22', 84)" class="f">printfn</span> <span class="s">"Test set size: </span><span class="pf">%d</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs32', 85)" onmouseover="showTip(event, 'fs32', 85)" class="i">dsp</span><span class="o">.</span><span class="i">Tst</span><span class="o">.</span><span class="i">NSamples</span>
</code></pre></td>
</tr>
</table>
<p>This prints</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Training</span> <span onmouseout="hideTip(event, 'fs33', 86)" onmouseover="showTip(event, 'fs33', 86)" class="i">set</span> <span class="i">size</span><span class="o">:</span> <span class="n">70</span>
<span class="i">Validation</span> <span onmouseout="hideTip(event, 'fs33', 87)" onmouseover="showTip(event, 'fs33', 87)" class="i">set</span> <span class="i">size</span><span class="o">:</span> <span class="n">15</span>
<span class="i">Test</span> <span onmouseout="hideTip(event, 'fs33', 88)" onmouseover="showTip(event, 'fs33', 88)" class="i">set</span> <span class="i">size</span><span class="o">:</span> <span class="n">15</span>
</code></pre></td>
</tr>
</table>
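<p>Since <code>TrnValTst</code> is essentially a record of three datasets, its behaviour can be mimicked on plain F# lists. The sketch below is hypothetical: the type <code>TrnValTstSketch</code> and the functions <code>splitWith</code> and <code>split</code> are not part of Deep.Net. It assumes that each split consists of consecutive, unshuffled samples, that sizes are rounded and that the test set receives the remainder. With 100 samples it reproduces the sizes printed above, and <code>split</code> applies the default ratios of 0.8, 0.1 and 0.1.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical stand-in for TrnValTst&lt;'S&gt;: a record holding the three splits.
type TrnValTstSketch&lt;'S&gt; =
    { Trn: 'S list   // training samples
      Val: 'S list   // validation samples
      Tst: 'S list } // test samples

// Splits the samples using explicit training, validation and test ratios.
// Assumption: sizes are rounded and the test set takes the remainder.
let splitWith (trnRatio: float, valRatio: float, tstRatio: float) (samples: 'S list) =
    let n = List.length samples
    let total = trnRatio + valRatio + tstRatio
    let nTrn = int (round (trnRatio / total * float n))
    let nVal = int (round (valRatio / total * float n))
    let trn, rest = List.splitAt nTrn samples
    let vl, tst = List.splitAt nVal rest
    { Trn = trn; Val = vl; Tst = tst }

// Falls back to the default ratios 0.8, 0.1 and 0.1.
let split (samples: 'S list) = splitWith (0.8, 0.1, 0.1) samples

let dspSketch = splitWith (0.7, 0.15, 0.15) [1 .. 100]
printfn "Training set size: %d" dspSketch.Trn.Length
printfn "Validation set size: %d" dspSketch.Val.Length
printfn "Test set size: %d" dspSketch.Tst.Length
</code></pre></td>
</tr>
</table>
<p>Returning a record rather than a tuple keeps the three splits addressable by name, which mirrors the <code>Trn</code>, <code>Val</code> and <code>Tst</code> fields described above.</p>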
<h2><a name="Data-transfer" class="anchor" href="#Data-transfer">Data transfer</a></h2>
<p>The <code>ds.ToCuda</code> and <code>ds.ToHost</code> methods copy the dataset to the CUDA GPU or to the host respectively.
The <code>TrnValTst</code> type provides the same methods.</p>
<h2><a name="Disk-storage" class="anchor" href="#Disk-storage">Disk storage</a></h2>
<p>Use the <code>ds.Save</code> method to save a dataset to disk using the HDF5 format.
The <code>Dataset<'S>.Load</code> function loads a saved dataset.
The <code>TrnValTst</code> type provides the same methods.</p>
<h1><a name="Dataset-loaders" class="anchor" href="#Dataset-loaders">Dataset loaders</a></h1>
<p>Currently Deep.Net provides the following loaders for common datasets.</p>
<ul>
<li><strong>MNIST</strong>. Use the <code>Mnist.load</code> function. It takes two parameters: the first is the path to the directory containing the MNIST files (<code>t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz</code>), and the second is the fraction of the MNIST training set to use for validation (for example 0.166 if you want approximately 50 000 training samples and 10 000 validation samples). The sample type <code>MnistT</code> contains two fields: <code>Img</code> for the flattened images and <code>Lbl</code> for the labels in one-hot encoding.</li>
</ul>
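<p>To make the sample layout more concrete, the following sketch shows what a one-hot encoded label looks like, how long a flattened 28 × 28 MNIST image is, and how a validation ratio of 0.166 translates into sample counts. It does not call Deep.Net. The helper <code>oneHot</code> is hypothetical, the <code>float32</code> element type is an assumption, and the split arithmetic assumes that the ratio is applied to the 60 000 images of the MNIST training set.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical helper: one-hot encodes a class label as a vector of length
// nClasses with a single 1 at the label's index (float32 is an assumption).
let oneHot (nClasses: int) (label: int) : float32[] =
    Array.init nClasses (fun c -&gt; if c = label then 1.0f else 0.0f)

// MNIST images are 28 x 28 pixels, so a flattened image has 784 entries.
let flattenedLength = 28 * 28

// Assumption: the validation ratio is applied to the 60 000 MNIST training images.
let nTrainImages = 60000
let valRatio = 0.166
let nVal = int (round (valRatio * float nTrainImages))
let nTrn = nTrainImages - nVal

printfn "Label 3 as one-hot vector: %A" (oneHot 10 3)
printfn "Flattened image length: %d" flattenedLength
printfn "Training samples: %d, validation samples: %d" nTrn nVal
</code></pre></td>
</tr>
</table>
<p>Under these assumptions the ratio 0.166 gives 9 960 validation and 50 040 training samples, close to the 10 000 / 50 000 split mentioned in the list above.</p>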
<h1><a name="Summary" class="anchor" href="#Summary">Summary</a></h1>
<p>The <code>Dataset<'S></code> type provides a convenient way to work with datasets.
Type-safety is provided by preserving the user-specified sample type <code>'S</code> when accessing individual or multiple samples.
The dataset handler is used by the <a href="training.html">generic training function</a>.</p>
<div class="tip" id="fs1">type MySampleType =<br />  {X: obj;<br />   V: obj;}<br /><br />Full name: Dataset.MySampleType</div>
<div class="tip" id="fs2">MySampleType.X: obj</div>
<div class="tip" id="fs3">Multiple items<br />val single : value:'T -> single (requires member op_Explicit)<br /><br />Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.single<br /><br />--------------------<br />type single = System.Single<br /><br />Full name: Microsoft.FSharp.Core.single</div>
<div class="tip" id="fs4">MySampleType.V: obj</div>
<div class="tip" id="fs5">val generateSamples : cnt:int -> seq<MySampleType><br /><br />Full name: Dataset.generateSamples</div>
<div class="tip" id="fs6">val cnt : int</div>
<div class="tip" id="fs7">Multiple items<br />val seq : sequence:seq<'T> -> seq<'T><br /><br />Full name: Microsoft.FSharp.Core.Operators.seq<br /><br />--------------------<br />type seq<'T> = System.Collections.Generic.IEnumerable<'T><br /><br />Full name: Microsoft.FSharp.Collections.seq<_></div>
<div class="tip" id="fs8">val rng : System.Random</div>
<div class="tip" id="fs9">namespace System</div>
<div class="tip" id="fs10">Multiple items<br />type Random =<br />  new : unit -> Random + 1 overload<br />  member Next : unit -> int + 2 overloads<br />  member NextBytes : buffer:byte[] -> unit<br />  member NextDouble : unit -> float<br /><br />Full name: System.Random<br /><br />--------------------<br />System.Random() : unit<br />System.Random(Seed: int) : unit</div>
<div class="tip" id="fs11">val n : int</div>
<div class="tip" id="fs12">val x : single</div>
<div class="tip" id="fs13">System.Random.NextDouble() : float</div>
<div class="tip" id="fs14">val sinh : value:'T -> 'T (requires member Sinh)<br /><br />Full name: Microsoft.FSharp.Core.Operators.sinh</div>
<div class="tip" id="fs15">val cosh : value:'T -> 'T (requires member Cosh)<br /><br />Full name: Microsoft.FSharp.Core.Operators.cosh</div>
<div class="tip" id="fs16">val smpls : MySampleType list<br /><br />Full name: Dataset.smpls</div>
<div class="tip" id="fs17">Multiple items<br />module List<br /><br />from Microsoft.FSharp.Collections<br /><br />--------------------<br />type List<'T> =<br />  | ( [] )<br />  | ( :: ) of Head: 'T * Tail: 'T list<br />  interface IEnumerable<br />  interface IEnumerable<'T><br />  member GetSlice : startIndex:int option * endIndex:int option -> 'T list<br />  member Head : 'T<br />  member IsEmpty : bool<br />  member Item : index:int -> 'T with get<br />  member Length : int<br />  member Tail : 'T list<br />  static member Cons : head:'T * tail:'T list -> 'T list<br />  static member Empty : 'T list<br /><br />Full name: Microsoft.FSharp.Collections.List<_></div>
<div class="tip" id="fs18">val ofSeq : source:seq<'T> -> 'T list<br /><br />Full name: Microsoft.FSharp.Collections.List.ofSeq</div>
<div class="tip" id="fs19">val idx : int</div>
<div class="tip" id="fs20">val smpl : MySampleType</div>
<div class="tip" id="fs21">val indexed : list:'T list -> (int * 'T) list<br /><br />Full name: Microsoft.FSharp.Collections.List.indexed</div>
<div class="tip" id="fs22">val printfn : format:Printf.TextWriterFormat<'T> -> 'T<br /><br />Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn</div>
<div class="tip" id="fs23">val ds : seq<obj><br /><br />Full name: Dataset.ds</div>
<div class="tip" id="fs24">val smpl2 : MySampleType<br /><br />Full name: Dataset.smpl2</div>
<div class="tip" id="fs25">val smpl0to2 : MySampleType<br /><br />Full name: Dataset.smpl0to2</div>
<div class="tip" id="fs26">val smpl : obj</div>
<div class="tip" id="fs27">val batch : MySampleType</div>
<div class="tip" id="fs28">module Seq<br /><br />from Microsoft.FSharp.Collections</div>
<div class="tip" id="fs29">val indexed : source:seq<'T> -> seq<int * 'T><br /><br />Full name: Microsoft.FSharp.Collections.Seq.indexed</div>
<div class="tip" id="fs30">val partitions : obj list<br /><br />Full name: Dataset.partitions</div>
<div class="tip" id="fs31">val p : obj</div>
<div class="tip" id="fs32">val dsp : obj<br /><br />Full name: Dataset.dsp</div>
<div class="tip" id="fs33">val set : elements:seq<'T> -> Set<'T> (requires comparison)<br /><br />Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.set</div>
</div>
<div class="span3">
<ul class="nav nav-list" id="menu" style="margin-top: 20px;">
<li class="nav-header">Deep.Net</li>
<li><a href="http://www.deepml.net/index.html">Home page</a></li>
<li class="divider"></li>
<li><a href="http://nuget.org/packages/DeepNet">Get Library via NuGet</a></li>
<li><a href="http://github.com/DeepMLNet/DeepNet">Source Code on GitHub</a></li>
<li><a href="http://www.deepml.net/release-notes.html">Release Notes</a></li>
<li class="nav-header">Basics</li>
<li><a href="http://www.deepml.net/tensor.html">Working with Tensors</a></li>
<li><a href="http://www.deepml.net/model.html">Model Definition</a></li>
<li><a href="http://www.deepml.net/components.html">Model Components</a></li>
<li><a href="http://www.deepml.net/dataset.html">Dataset Handling</a></li>
<li><a href="http://www.deepml.net/training.html">Training</a></li>
<li class="nav-header">Advanced</li>
<li><a href="http://www.deepml.net/diff.html">Automatic Differentiation</a></li>
<li class="nav-header">Documentation</li>
<li><a href="http://www.deepml.net/reference/index.html">API Reference</a></li>
</ul>
</div>
</div>
</div>
<a href="http://github.com/DeepMLNet/DeepNet"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://s3.amazonaws.com/github/ribbons/forkme_right_gray_6d6d6d.png" alt="Fork me on GitHub"/></a>
</body>
</html>