<!DOCTYPE html>
<html>
<head>
<title>Dataset Project -- How to ...</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="https://caltechlibrary.github.io/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu" title="link to Caltech Library Homepage"><img src="https://caltechlibrary.github.io/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="../">README</a></li>
<li><a href="../user-manual.html">User Manual</a></li>
<li><a href="../docs/">Documentation</a></li>
<li><a href="./">How To</a></li>
<li><a href="../libdataset/">Libdataset</a></li>
<li><a href="../about.html">About</a></li>
<li><a href="../search.html">Search</a></li>
<li><a href="https://github.com/caltechlibrary/dataset">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="getting-started-with-dataset">Getting started with dataset</h1>
<p><em>dataset</em> is designed to easily manage collections of JSON
documents. A JSON object is associated with a unique key you provide. If
you are using the default storage engine the objects themselves are
stored on disc in a folder inside the collection folder. If you are
using a SQL storage engine they are stored in a column of a table of the
collection in your SQL database.</p>
<p>The collection folder contains a JSON object document called
<em>collection.json</em>. This file stores operational metadata about
the collection. If the collection is using a pairtree then a
<em>keymap.json</em> file will include the association of keys with
paths to their objects. When a collection is initialized a minimal
codemeta.json file will be created describing the collection. This can be
updated to a full codemeta.json file by following the guidelines and
practices described at the <a href="https://codemeta.github.io">codemeta</a>
website.</p>
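<p>Because <em>collection.json</em> is an ordinary JSON object document,
a Go program can peek at it with the standard library alone. The sketch
below is only an illustration. It assumes a collection folder such as the
<em>friends.ds</em> collection created in the next section, and
<code>metadataNames</code> is a hypothetical helper, not part of the
dataset package. It makes no assumption about which attributes the file
holds and simply lists the top level attribute names.</p>
<pre class="golang"><code>    package main

    import (
        "encoding/json"
        "fmt"
        "os"
        "path/filepath"
        "sort"
    )

    // metadataNames is a hypothetical helper (not part of dataset).
    // It returns the sorted top level attribute names found in the
    // collection.json file inside the collection folder.
    func metadataNames(collectionName string) ([]string, error) {
        src, err := os.ReadFile(filepath.Join(collectionName, "collection.json"))
        if err != nil {
            return nil, err
        }
        metadata := map[string]interface{}{}
        if err := json.Unmarshal(src, &metadata); err != nil {
            return nil, err
        }
        names := []string{}
        for name := range metadata {
            names = append(names, name)
        }
        sort.Strings(names)
        return names, nil
    }

    func main() {
        // "friends.ds" is assumed to exist already
        names, err := metadataNames("friends.ds")
        if err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
            return
        }
        for _, name := range names {
            fmt.Println(name)
        }
    }</code></pre>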
<p><em>dataset</em> comes in several flavors: a command line program
called <em>dataset</em>, a web service called <em>datasetd</em> and the
Go language package used to build other programs.</p>
<p>This tutorial covers both the command line program and the Go package.
The command line is great for simple setup, while the Go package allows you
to build other programs that use dataset collections for content
persistence.</p>
<h2 id="create-a-collection-with-init">Create a collection with
init</h2>
<p>To create a collection you use the init verb. In the following
examples you will see how to do this with both the command line tool
<em>dataset</em> as well as the Go package of the same name.</p>
<p>Let's create a collection called <em>friends.ds</em>. At the command
line type the following.</p>
<div class="sourceCode" id="cb1"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> init friends.ds</span></code></pre></div>
<p>Notice that after you type this and press enter you see an "OK"
response. If there had been an error then you would have seen an error
message instead.</p>
<p>Working in Go is similar. We use the <code>dataset.Init()</code> func
to create our new collection. We can import the “dataset” package using
the import line <code>"github.com/caltechlibrary/dataset"</code>. Here’s
a general code sketch.</p>
<pre class="golang"><code>    package main

    import (
        // import the packages your program needs ...
        "fmt"
        "os"

        // import dataset
        "github.com/caltechlibrary/dataset"
    )

    func main() {
        // This creates the collection "friends.ds"
        collectionName := "friends.ds"
        // "c" is a handle to the collection. The second parameter is
        // left empty, as in the Init() call later in this tutorial.
        c, err := dataset.Init(collectionName, "")
        if err != nil {
            fmt.Fprintf(os.Stderr, "Something went wrong, %s\n", err)
            os.Exit(1)
        }
        defer c.Close() // Remember to close your collection
        fmt.Printf("Created %q, ready to use\n", collectionName)
    }</code></pre>
<p>In this Go example, if the error is nil a statement is written to
standard out saying the collection was created. Otherwise the error is
written to standard error and the program exits.</p>
<h3 id="removing-friends.ds">removing friends.ds</h3>
<p>There is no dataset verb to remove a collection. A collection is just
a folder with some files in it. You can delete the collection by
throwing the folder in the trash (Mac OS X and Windows) or using a
recursive remove in the Unix shell.</p>
<pre class="shell"><code> rm -fR friends.ds</code></pre>
<p>Or using <code>os.RemoveAll()</code> in Go programs.</p>
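<pre><code>    if _, err := os.Stat(collectionName); err == nil {
        // The folder exists, remove it and everything inside it
        if err := os.RemoveAll(collectionName); err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
        }
    }</code></pre>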
<h2 id="create-read-update-and-delete">create, read, update and
delete</h2>
<p>As with many systems that store information, dataset provides the
basic operations of creating, reading, updating and deleting. In the
following section we will work with the <em>friends.ds</em> collection
we created previously.</p>
<p>I have some friends who are characters in <a
href="https://zbs.org">ZBS</a> radio plays. I am going to create and
save some of their info in our collection called <em>friends.ds</em>. I
am going to store their name and email address so I can contact them.
Their names are Little Frieda, Mojo Sam and Jack Flanders.</p>
<div class="sourceCode" id="cb5"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> create friends.ds frieda <span class="dt">\</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="st">'{"name":"Little Frieda","email":"[email protected]"}'</span></span></code></pre></div>
<p>Notice the "OK". Just like <em>init</em> the <em>create</em> verb
returns a status. "OK" means everything is good, otherwise an error is
shown.</p>
<p>Doing the same thing in Go looks like this. Note we have to
explicitly <code>Open()</code> the collection to get a collection object
then call <code>Create()</code> on the opened collection.
<code>defer</code> makes it easy for us to remember to close the
collection when we’re done.</p>
<pre class="golang"><code>    package main

    import (
        "fmt"
        "os"

        "github.com/caltechlibrary/dataset"
    )

    func main() {
        c, err := dataset.Open("friends.ds")
        if err != nil {
            fmt.Fprintf(os.Stderr, "something went wrong, %s\n", err)
            os.Exit(1)
        }
        defer c.Close() // Don't forget to close the collection
        id := "frieda"
        m := map[string]interface{}{
            "id":    id,
            "name":  "Little Frieda",
            "email": "[email protected]",
        }
        // Create adds a map[string]interface{} to the collection.
        if err := c.Create(id, m); err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
            os.Exit(1)
        }
        fmt.Println("OK")
    }</code></pre>
<p>Go supports easy translation of struct types into JSON encoded byte
slices. We can use that to store the JSON representation of any Go
type in the collection using <code>CreateObject()</code>.</p>
<pre class="golang"><code>    package main

    import (
        "fmt"
        "os"

        "github.com/caltechlibrary/dataset"
    )

    type Record struct {
        ID    string `json:"id"`
        Name  string `json:"name,omitempty"`
        EMail string `json:"email,omitempty"`
    }

    func main() {
        // Open the collection, CreateObject() is called on its handle
        c, err := dataset.Open("friends.ds")
        if err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
            os.Exit(1)
        }
        defer c.Close()
        obj := &Record{
            ID:    "frieda",
            Name:  "Little Frieda",
            EMail: "[email protected]",
        }
        if err := c.CreateObject(obj.ID, obj); err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
            os.Exit(1)
        }
        fmt.Println("OK")
    }</code></pre>
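<p>The struct tags on <code>Record</code> decide what the stored JSON
looks like. As a minimal sketch using only the Go standard library (no
dataset collection is involved, and <code>toJSON</code> is a hypothetical
helper written for this illustration), here is how the tags map field
names to attribute names and how <code>omitempty</code> drops an empty
field.</p>
<pre class="golang"><code>    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // Record mirrors the struct used in this tutorial.
    type Record struct {
        ID    string `json:"id"`
        Name  string `json:"name,omitempty"`
        EMail string `json:"email,omitempty"`
    }

    // toJSON is a hypothetical helper, it encodes a Record as JSON source.
    func toJSON(r Record) (string, error) {
        src, err := json.Marshal(r)
        if err != nil {
            return "", err
        }
        return string(src), nil
    }

    func main() {
        // EMail is left empty, so "omitempty" leaves it out of the JSON.
        src, err := toJSON(Record{ID: "frieda", Name: "Little Frieda"})
        if err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
            return
        }
        fmt.Println(src) // prints {"id":"frieda","name":"Little Frieda"}
    }</code></pre>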
<p>On the command line create requires us to provide a collection name,
a key (e.g. "frieda") and JSON markup to store the JSON object. We can
provide that either through the command line or by reading in a file or
standard input.</p>
<p>command line --</p>
<div class="sourceCode" id="cb8"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a> <span class="fu">cat</span> <span class="op"><<EOT</span> <span class="op">></span>mojo.json</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="st"> {</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="st"> "id": "mojo",</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="st"> "name": "Mojo Sam, the Yudoo Man", </span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="st"> "email": "[email protected]"</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a><span class="st"> }</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a><span class="st"> EOT</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="st"> cat mojo.json | dataset create friends.ds "mojo"</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a><span class="st"> cat <<EOT >jack.json</span></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a><span class="st"> {</span></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a><span class="st"> "id": "jack",</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a><span class="st"> "name": "Jack Flanders", </span></span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a><span class="st"> "email": "[email protected]"</span></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a><span class="st"> }</span></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a><span class="st"> EOT</span></span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a><span class="st"> dataset create -i jack.json friends.ds "jack"</span></span></code></pre></div>
<p>In Go we can loop through records easily and add them --</p>
<pre class="golang"><code>    // Open the collection
    c, err := dataset.Open("friends.ds")
    if err != nil {
        // ... handle error
    }
    defer c.Close() // Don't forget to close the collection
    // Create some new records
    newRecords := []Record{
        Record{
            ID:    "mojo",
            Name:  "Mojo Sam",
            EMail: "[email protected]",
        },
        Record{
            ID:    "jack",
            Name:  "Jack Flanders",
            EMail: "[email protected]",
        },
    }
    // Save the new records into the collection
    for _, record := range newRecords {
        if err := c.CreateObject(record.ID, record); err != nil {
            fmt.Fprintf(os.Stderr,
                "something went wrong adding %q, %s\n", record.ID, err)
        }
    }</code></pre>
<h3 id="read">read</h3>
<p>We have three records in our <em>friends.ds</em> collection —
"frieda", "mojo", and "jack". Let's see what they look like with the
<em>read</em> verb.</p>
<p>command line --</p>
<div class="sourceCode" id="cb10"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> read friends.ds frieda</span></code></pre></div>
<p>On the command line you can easily redirect the results to a file for
later modification. Let's do this for each of the records we have
created so far.</p>
<div class="sourceCode" id="cb11"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> read <span class="at">-p</span> friends.ds frieda <span class="op">></span>frieda-profile.json</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> read <span class="at">-p</span> friends.ds mojo <span class="op">></span>mojo-profile.json</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> read <span class="at">-p</span> friends.ds jack <span class="op">></span>jack-profile.json</span></code></pre></div>
<p>Working in Go is similar but rather than write out our JSON
structures to a file we're going to keep them in memory as an array of
record structs before converting to JSON and writing it out.</p>
<p>In Go --</p>
<pre class="golang"><code>    // Open our collection
    c, err := dataset.Open("friends.ds")
    if err != nil {
        fmt.Fprintf(os.Stderr, "%s\n", err)
        os.Exit(1)
    }
    defer c.Close() // remember to close the collection
    // build our list of keys
    keys := []string{"frieda", "mojo", "jack"}
    records := []*Record{}
    // loop through the list and read each record into memory
    for _, key := range keys {
        obj := &Record{}
        // obj is already a pointer, so it is passed as is
        if err := c.ReadObject(key, obj); err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
            os.Exit(1)
        }
        records = append(records, obj)
    }
    // convert the records to indented JSON and write it to standard out
    src, err := json.MarshalIndent(records, "", "    ")
    if err != nil {
        fmt.Fprintf(os.Stderr, "%s\n", err)
        os.Exit(1)
    }
    fmt.Printf("%s\n", src)</code></pre>
<h3 id="update">update</h3>
<p>Next we can modify the profiles (the *.json files for the command
line version). We're going to add a key/value pair for "catch_phrase"
associated with each JSON object in <em>friends.ds</em>. For Little
Frieda edit frieda-profile.json to look like --</p>
<div class="sourceCode" id="cb13"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"_Key"</span><span class="fu">:</span> <span class="st">"frieda"</span><span class="fu">,</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"email"</span><span class="fu">:</span> <span class="st">"[email protected]"</span><span class="fu">,</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"name"</span><span class="fu">:</span> <span class="st">"Little Frieda"</span><span class="fu">,</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"catch_phrase"</span><span class="fu">:</span> <span class="st">"Woweee Zoweee"</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span></code></pre></div>
<p>For Mojo's mojo-profile.json --</p>
<div class="sourceCode" id="cb14"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"_Key"</span><span class="fu">:</span> <span class="st">"mojo"</span><span class="fu">,</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"email"</span><span class="fu">:</span> <span class="st">"[email protected]"</span><span class="fu">,</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"name"</span><span class="fu">:</span> <span class="st">"Mojo Sam, the Yudoo Man"</span><span class="fu">,</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"catch_phrase"</span><span class="fu">:</span> <span class="st">"Feet Don't Fail Me Now!"</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span></code></pre></div>
<p>And Jack's jack-profile.json --</p>
<div class="sourceCode" id="cb15"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"_Key"</span><span class="fu">:</span> <span class="st">"jack"</span><span class="fu">,</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"email"</span><span class="fu">:</span> <span class="st">"[email protected]"</span><span class="fu">,</span></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"name"</span><span class="fu">:</span> <span class="st">"Jack Flanders"</span><span class="fu">,</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"catch_phrase"</span><span class="fu">:</span> <span class="st">"What is coming at you is coming from you"</span></span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span></code></pre></div>
<p>On the command line we can read in the updated JSON objects and save
the results in the collection with the <em>update</em> verb. Like with
<em>init</em> and <em>create</em> the <em>update</em> verb will return
an “OK” or error message. Let's update each of our JSON objects.</p>
<div class="sourceCode" id="cb16"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> update friends.ds frieda frieda-profile.json</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> update friends.ds mojo mojo-profile.json</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> update friends.ds jack jack-profile.json</span></code></pre></div>
<p><strong>TIP</strong>: By providing a filename ending in “.json” the
dataset command knows to read the JSON object from disc. If the object
had started with a "{" and ended with a "}" it would assume you were
using an explicit JSON expression.</p>
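<p>The rule in this tip can be sketched in a few lines of Go. This is
only an illustration of the rule as described above, it is not the
dataset command's actual implementation, and <code>sourceKind</code> is
a hypothetical name.</p>
<pre class="golang"><code>    package main

    import (
        "fmt"
        "strings"
    )

    // sourceKind is a hypothetical helper (not part of dataset). It
    // guesses whether an argument is an explicit JSON expression or
    // the name of a JSON file.
    func sourceKind(arg string) string {
        trimmed := strings.TrimSpace(arg)
        switch {
        case strings.HasPrefix(trimmed, "{") && strings.HasSuffix(trimmed, "}"):
            return "JSON expression"
        case strings.HasSuffix(trimmed, ".json"):
            return "JSON file"
        default:
            return "unknown"
        }
    }

    func main() {
        args := []string{
            "frieda-profile.json",
            `{"name":"Little Frieda"}`,
        }
        for _, arg := range args {
            fmt.Printf("%s -> %s\n", arg, sourceKind(arg))
        }
    }</code></pre>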
<p>In Go we can work with each of the records as
<code>map[string]interface{}</code> variables. We read each record as in
our previous <em>Read</em> example, add our “catch_phrase” attribute, then
<em>Update</em> each record.</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
if err != nil {
// ... handle errors
}
defer c.Close()
// Read our three profiles
friedaProfile := map[string]interface{}{}
if err := c.Read("frieda", friedaProfile); err != nil {
// ... handle error
}
mojoProfile := map[string]interface{}{}
if err := c.Read("mojo", mojoProfile); err != nil {
// ... handle error
}
jackProfile := map[string]interface{}{}
if err := c.Read("jack", jackProfile); err != nil {
// ... handle error
}
// Add our catch phrases
friedaProfile["catch_phrase"] = "Wowee Zowee"
mojoProfile["catch_phrase"] = "Feet Don't Fail Me Now!"
jackProfile["catch_phrase"] = "What is coming at you is coming from you"
// Update our records
if err := c.Update("frieda", friedaProfile); err != nil {
// ... handle error
}
if err := c.Update("mojo", mojoProfile); err != nil {
// ... handle error
}
if err := c.Update("jack", jackProfile); err != nil {
// ... handle error
}</code></pre>
<p>A better approach would be to use a Go struct to hold the
profile records. This would ensure that the mapping of attribute names
is handled consistently.</p>
<pre class="golang"><code>    package main

    import (
        "github.com/caltechlibrary/dataset"
    )

    type Profile struct {
        // Key reuses the "id" attribute name from the Record struct above
        Key         string `json:"id"`
        Name        string `json:"name"`
        EMail       string `json:"email,omitempty"`
        CatchPhrase string `json:"catch_phrase,omitempty"`
    }

    func main() {
        // Load our minimal records, i.e. name and email
        records := map[string]*Profile{
            "frieda": &Profile{
                Key:   "frieda",
                EMail: "[email protected]",
                Name:  "Little Frieda",
            },
            "mojo": &Profile{
                Key:   "mojo",
                EMail: "[email protected]",
                Name:  "Mojo Sam, the Yudoo Man",
            },
            "jack": &Profile{
                Key:   "jack",
                EMail: "[email protected]",
                Name:  "Jack Flanders",
            },
        }
        // Create the collection and add our records
        c, err := dataset.Init("friends.ds", "")
        if err != nil {
            // ... handle error
        }
        defer c.Close()
        for key, record := range records {
            if err := c.CreateObject(key, record); err != nil {
                // ... handle error
            }
        }
        // Add our catch phrases
        records["frieda"].CatchPhrase = "Wowee Zowee"
        records["mojo"].CatchPhrase = "Feet Don't Fail Me Now!"
        records["jack"].CatchPhrase =
            "What is coming at you is coming from you"
        // Update our records
        for key, record := range records {
            if err := c.UpdateObject(key, record); err != nil {
                // ... handle error
            }
        }
    }</code></pre>
<h3 id="delete">delete</h3>
<p>Eventually you might want to remove a JSON object from the
collection. Let's remove Jack Flanders' record for now.</p>
<p>command line --</p>
<div class="sourceCode" id="cb19"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> delete friends.ds jack</span></code></pre></div>
<p>Notice the “OK”. In this case it means we've successfully deleted the
JSON object from the collection.</p>
<p>And perhaps as you've already guessed, working in Go looks like --</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
if err != nil {
// ... handle error
}
defer c.Close()
if err := c.Delete("jack"); err != nil {
fmt.Fprintf(os.Stderr, "%s\n", err)
os.Exit(1)
}
fmt.Println("OK")
os.Exit(0)</code></pre>
<h2 id="keys-and-count">keys and count</h2>
<p>Eventually you have lots of objects in your collection. You are not
going to be able to remember all the keys. dataset provides a
<em>keys</em> function for getting a list of keys as well as a
<em>count</em> to give you a total number of keys.</p>
<p>Now that we've deleted a few things let's see how many keys are in
<em>friends.ds</em>. We can do that with the <em>count</em> verb.</p>
<p>Command line --</p>
<div class="sourceCode" id="cb21"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> count friends.ds</span></code></pre></div>
<p>In Go --</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
if err != nil {
// ... handle error
}
defer c.Close()
cnt := c.Length() // NOTE: this is an int64 value
fmt.Printf("Total Records Now: %d\n", cnt)</code></pre>
<p>Likewise we can get a list of the keys with the <em>keys</em>
verb.</p>
<div class="sourceCode" id="cb23"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> keys friends.ds</span></code></pre></div>
<p>If you are following along in Go then you can just save the keys to a
variable called keys.</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
if err != nil {
// ... handle error
}
defer c.Close()
keys, err := c.Keys()
if err != nil {
// ... handle error
}
fmt.Printf("%s\n", strings.Join(keys, "\n"))</code></pre>
<h2 id="data-frames">Data frames</h2>
<p>JSON objects are tree-like. This structure can be inconvenient for
some types of analysis like tabulation, comparing values or generating
summary reports. Many languages support the concept of a "data frame",
meaning a list of objects, possibly with associated metadata about how
the list was created. This becomes a convenient way to process data.
Frames can easily be transformed.</p>
<h3 id="the-frame">the frame</h3>
<p>dataset also comes with a <em>frame</em> verb. A <em>frame</em> is an
ordered list of objects based on a set of keys and metadata about how the
values for the objects were mapped from the collection’s JSON documents.
It is similar to the "data frames" concepts in languages like Julia,
Matlab, Octave, Python and R.</p>
<p>To define a frame we only need two pieces of information, a list of
keys in the collection to be framed and a list of dot notated paths to
map into a set of labels for the object in the frame.</p>
<div class="sourceCode" id="cb25"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frame-create <span class="at">-i</span><span class="op">=</span>friends.keys friends.ds <span class="dt">\</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"name-and-email"</span> <span class="dt">\</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> .name=name .email=email <span class="dt">\</span></span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> .catch_phrase=catch_phrase</span></code></pre></div>
<p>In Go it would look like</p>
<div class="sourceCode" id="cb26"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a> c, err :<span class="op">=</span> dataset.Open(<span class="st">"friends.ds"</span>)</span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a> <span class="op">//</span> ... handle error</span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a> defer c.Close()</span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a> verbose :<span class="op">=</span> true</span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a> keys, err :<span class="op">=</span> c.Keys() <span class="op">//</span> ... handle error</span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a> dotPaths :<span class="op">=</span> []string{ <span class="st">".name"</span>, <span class="st">".email"</span>, <span class="st">".catch_phrase"</span> }</span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a> labels :<span class="op">=</span> []string{ <span class="st">"name"</span>, <span class="st">"email"</span>, <span class="st">"catch_phrase"</span> }</span>
<span id="cb26-9"><a href="#cb26-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> _, err :<span class="op">=</span> c.FrameCreate(<span class="st">"name-and-email"</span>, </span>
<span id="cb26-10"><a href="#cb26-10" aria-hidden="true" tabindex="-1"></a> keys, dotPaths, labels, verbose)<span class="op">;</span> err <span class="op">!=</span> nil {</span>
<span id="cb26-11"><a href="#cb26-11" aria-hidden="true" tabindex="-1"></a> <span class="op">//</span> ... handle error</span>
<span id="cb26-12"><a href="#cb26-12" aria-hidden="true" tabindex="-1"></a> }</span></code></pre></div>
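<p>To make the idea of dot paths and labels concrete, here is a small
sketch that does not use dataset at all. It assumes the objects are
already in memory and it only handles top level dot paths such as
<code>.name</code>. The <code>mapObjects</code> function is a
hypothetical helper written for this illustration, it is not how the
dataset package builds its frames.</p>
<pre class="golang"><code>    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
    )

    // mapObjects is a hypothetical helper (not the dataset API). For each
    // key, in order, it copies the value found at each top level dot path
    // into a new object under the label paired with that dot path.
    func mapObjects(collection map[string]map[string]interface{},
        keys, dotPaths, labels []string) []map[string]interface{} {
        if len(dotPaths) != len(labels) {
            return nil // each dot path needs exactly one label
        }
        objects := []map[string]interface{}{}
        for _, key := range keys {
            src, ok := collection[key]
            if !ok {
                continue // skip keys that are not in the collection
            }
            obj := map[string]interface{}{}
            for i, dotPath := range dotPaths {
                attr := strings.TrimPrefix(dotPath, ".")
                if val, found := src[attr]; found {
                    obj[labels[i]] = val
                }
            }
            objects = append(objects, obj)
        }
        return objects
    }

    func main() {
        collection := map[string]map[string]interface{}{
            "frieda": {"name": "Little Frieda", "catch_phrase": "Wowee Zowee"},
            "mojo":   {"name": "Mojo Sam", "catch_phrase": "Feet Don't Fail Me Now!"},
        }
        objects := mapObjects(collection,
            []string{"frieda", "mojo"},
            []string{".name", ".catch_phrase"},
            []string{"name", "catch_phrase"})
        src, _ := json.MarshalIndent(objects, "", "    ")
        fmt.Printf("%s\n", src)
    }</code></pre>
<p>The order of the resulting list follows the order of the keys, which
is why a frame is described as an ordered list of objects.</p>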
<p>To read the frame back, including its metadata, in Go it'd look like --</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
// ... handle error
defer c.Close()
frm, err := c.FrameRead("name-and-email")
// ... handle error
src, err := json.MarshalIndent(frm, "", " ")
// ... handle error
fmt.Printf("%s\n", src)</code></pre>
<p>Looking at the resulting JSON object you see other attributes beyond
the object list of the frame. These are created to simplify some of
dataset's more complex interactions.</p>
<p>Most of the time you don't want the metadata, so we have a way of
just retrieving the object list.</p>
<div class="sourceCode" id="cb28"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frame-objects friends.ds <span class="st">"name-and-email"</span></span></code></pre></div>
<p>Or in Go --</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
// ... handle error
defer c.Close()
objects, err := c.FrameObjects("name-and-email")
// ... handle error
src, err := json.MarshalIndent(objects, "", " ")
// ... handle error
fmt.Printf("%s\n", src)</code></pre>
<p>Let's add back the Jack record we deleted a few sections ago and
“reframe” our “name-and-email” frame.</p>
<div class="sourceCode" id="cb30"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a> <span class="co"># Adding back Jack</span></span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> create <span class="at">-i</span> jack-profile.json friends.ds jack</span>
<span id="cb30-3"><a href="#cb30-3" aria-hidden="true" tabindex="-1"></a> <span class="co"># Save all the keys in the collection</span></span>
<span id="cb30-4"><a href="#cb30-4" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> keys friends.ds <span class="op">></span>friends.keys</span>
<span id="cb30-5"><a href="#cb30-5" aria-hidden="true" tabindex="-1"></a> <span class="co"># Now reframe "name-and-email" with the updated friends.keys</span></span>
<span id="cb30-6"><a href="#cb30-6" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> reframe <span class="at">-i</span><span class="op">=</span>friends.keys friends.ds <span class="st">"name-and-email"</span> </span>
<span id="cb30-7"><a href="#cb30-7" aria-hidden="true" tabindex="-1"></a> <span class="co"># Now let's take a look at the frame's objects</span></span>
<span id="cb30-8"><a href="#cb30-8" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frame-objects friends.ds <span class="st">"name-and-email"</span></span></code></pre></div>
<p>Let's try the same thing in Go --</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
// ... handle error
defer c.Close()
if err := c.CreateObject("jack", jackProfile); err != nil {
// ... handle error
}
keys, err := c.Keys()
if err != nil {
// ... handle error
}
if err := c.Reframe("name-and-email", keys); err != nil {
// ... handle error
}
objects, err := c.FrameObjects("name-and-email")
// ... handle error
src, err := json.MarshalIndent(objects, "", " ")
// ... handle error
fmt.Printf("%s\n", src)</code></pre>
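<p>To make the idea of reframing more concrete, here is a minimal, self-contained sketch of what the operation does conceptually. It is not the dataset package's implementation: the <code>frame</code> type, the <code>reframe</code> function, the in-memory collection and the sample records are all hypothetical stand-ins chosen for illustration.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// frame is a hypothetical, simplified stand-in for a dataset frame.
type frame struct {
	Keys    []string
	Objects []map[string]interface{}
}

// reframe replaces the frame's key list with keys and rebuilds the
// object list from the collection, copying only the listed fields.
// Keys that are not present in the collection are skipped.
func reframe(f *frame, collection map[string]map[string]interface{}, keys, fields []string) {
	f.Keys = nil
	f.Objects = nil
	for _, key := range keys {
		record, ok := collection[key]
		if !ok {
			continue // key not in the collection, nothing to copy
		}
		obj := map[string]interface{}{}
		for _, field := range fields {
			if val, ok := record[field]; ok {
				obj[field] = val
			}
		}
		f.Keys = append(f.Keys, key)
		f.Objects = append(f.Objects, obj)
	}
}

func main() {
	// Made-up sample data, held in memory instead of friends.ds.
	collection := map[string]map[string]interface{}{
		"frieda": {"name": "Frieda", "email": "frieda@example.org"},
		"jack":   {"name": "Jack", "email": "jack@example.org"},
	}
	f := &frame{}
	reframe(f, collection, []string{"frieda", "jack"}, []string{"name", "email"})
	src, err := json.MarshalIndent(f.Objects, "", "  ")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s\n", src)
}
```

<p>The point of the sketch is that the new key list drives the result: once Jack's key is in the list handed to reframe, his object shows up in the frame again.</p>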
<p>We can list the frames in the collection using the <em>frames</em>
verb.</p>
<div class="sourceCode" id="cb32"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frames friends.ds</span></code></pre></div>
<p>In Go --</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
// ... handle error
defer c.Close()
frameNames := c.Frames()
fmt.Printf("%s\n", strings.Join(frameNames, "\n"))</code></pre>
<p>In our frame we previously defined three columns. Looking at the
JSON representation of the frame, we also see a "labels" attribute.
Labels are used when exporting and synchronizing content between a CSV
file, a Google Sheet and a collection (labels become column names).</p>
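<p>As a rough illustration of labels becoming column names, the sketch below writes a CSV header from a list of labels and one row per frame object. It assumes the frame's objects are keyed by label. The <code>toRows</code> helper and the sample records are hypothetical and only the Go standard library is used, so this is not how the dataset tool itself exports data.</p>

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

// toRows turns frame objects into CSV rows. The labels supply the
// header row and, in the same order, the attribute names to read from
// each object. A missing attribute becomes an empty cell.
func toRows(labels []string, objects []map[string]interface{}) [][]string {
	rows := [][]string{labels}
	for _, obj := range objects {
		row := make([]string, len(labels))
		for i, label := range labels {
			if val, ok := obj[label]; ok {
				row[i] = fmt.Sprint(val)
			}
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	// Made-up labels and objects standing in for a frame's content.
	labels := []string{"name", "email", "catch_phrase"}
	objects := []map[string]interface{}{
		{"name": "Jack", "email": "jack@example.org", "catch_phrase": "Hello there"},
		{"name": "Frieda", "email": "frieda@example.org"},
	}
	w := csv.NewWriter(os.Stdout)
	if err := w.WriteAll(toRows(labels, objects)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```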
<p>Labels are the target attribute names. They are set when the frame
is defined and persist as long as the frame exists. The order of
the columns reflects the order of the pairs defining the dot paths and
labels. In our previous examples we provided the order of the columns
for the frame "name-and-email" as the <code>.name</code>,
<code>.email</code>, <code>.catch_phrase</code> dot paths. If we want to
have the labels "ID", "Display Name", "EMail", and "Catch Phrase" we
need to define our frame that way.</p>
<div class="sourceCode" id="cb34"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frame-keys friends.ds <span class="op">></span>keys.json </span>
<span id="cb34-2"><a href="#cb34-2" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frame-delete friends.ds <span class="st">"name-and-email"</span></span>
<span id="cb34-3"><a href="#cb34-3" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frame <span class="at">-i</span> keys.json friends.ds <span class="st">"name-and-email"</span> <span class="dt">\</span></span>
<span id="cb34-4"><a href="#cb34-4" aria-hidden="true" tabindex="-1"></a> <span class="st">"._Key=ID"</span> <span class="st">".name=Display Name"</span> <span class="dt">\</span></span>
<span id="cb34-5"><a href="#cb34-5" aria-hidden="true" tabindex="-1"></a> <span class="st">".email=EMail"</span> <span class="st">".catch_phrase=Catch Phrase"</span></span></code></pre></div>
<p>In Go it might look like</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
// ... handle error
defer c.Close()
verbose := true
keys, err := c.FrameKeys("name-and-email")
// ... handle error
frm, err := c.FrameRead("name-and-email")
// ... handle error
// Retrieve our dot paths and labels then append
// the additional path and label
dotPaths := frm.DotPaths
dotPaths = append(dotPaths, ".catch_phrase")
labels := frm.Labels
labels = append(labels, "catch_phrase")
err = c.FrameDelete("name-and-email")
// ... handle error
err = c.Frame("name-and-email", keys, dotPaths, labels, verbose)
if err != nil {
// ... handle error
}</code></pre>
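<p>The command line form pairs each dot path with its label as <code>dotpath=label</code>, while the Go form passes two parallel slices. The sketch below shows one way such pairs could be split while preserving their order. The <code>splitPairs</code> helper is hypothetical, and the fallback for a pair without a label (the dot path minus its leading dot) is an assumption, not documented dataset behavior.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// splitPairs splits "dotpath=label" pairs into two parallel slices,
// keeping the order in which the pairs were given. For a pair without
// "=", the label falls back to the dot path minus its leading dot
// (an assumption made for this sketch).
func splitPairs(pairs []string) (dotPaths []string, labels []string) {
	for _, pair := range pairs {
		parts := strings.SplitN(pair, "=", 2)
		dotPath := parts[0]
		label := strings.TrimPrefix(dotPath, ".")
		if len(parts) == 2 {
			label = parts[1]
		}
		dotPaths = append(dotPaths, dotPath)
		labels = append(labels, label)
	}
	return dotPaths, labels
}

func main() {
	dotPaths, labels := splitPairs([]string{
		"._Key=ID", ".name=Display Name",
		".email=EMail", ".catch_phrase=Catch Phrase",
	})
	// Column order follows the order of the pairs.
	for i := range dotPaths {
		fmt.Printf("%s -> %s\n", dotPaths[i], labels[i])
	}
}
```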
<p>Finally, the last thing we need to be able to do is delete a frame.
Deleting a frame works much like deleting a JSON record.</p>
<div class="sourceCode" id="cb36"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a> <span class="ex">dataset</span> frame-delete friends.ds <span class="st">"name-and-email"</span></span></code></pre></div>
<p>Or in Go --</p>
<pre class="golang"><code> c, err := dataset.Open("friends.ds")
// ... handle error
defer c.Close()
err = c.FrameDelete("name-and-email")
// ... handle error</code></pre>
<p><strong>TIP</strong>: Frames, like collections, have a number of
operations. Here's the list:</p>
<ol type="1">
<li><p><em>frame</em> will let you define a frame</p></li>
<li><p><em>frame-def</em> will let you read back a frame’s
definition</p></li>
<li><p><em>frame-objects</em> returns the frame's object list</p></li>
<li><p><em>frame-keys</em> returns the frame's key list</p></li>
<li><p><em>frames</em> will list the frames defined in the
collection</p></li>
<li><p><em>frame-delete</em> will remove the frame from the
collection</p></li>
<li><p><em>refresh</em> will let you refresh the objects in a frame from
the current state of the collection; it will prune any existing objects in
the frame if they no longer exist.</p></li>
<li><p><em>reframe</em> will take a new list of keys from the collection,
recreating (replacing) the objects in the data frame based on the new
list of keys</p></li>
</ol>
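<p>The difference between <em>refresh</em> and <em>reframe</em> is easy to miss, so here is a small sketch of the refresh side as described in the list above: the frame keeps its own key list, re-reads each object from the collection and prunes keys whose records are gone. The <code>frame</code> type, the <code>refresh</code> function and the sample data are hypothetical and do not reflect the dataset package's internals.</p>

```go
package main

import "fmt"

// frame is a hypothetical, simplified stand-in for a dataset frame.
type frame struct {
	Keys    []string
	Objects []map[string]string
}

// refresh re-reads every object for the frame's existing keys from the
// collection and prunes keys whose records no longer exist. Unlike a
// reframe, it does not accept a new key list.
func refresh(f *frame, collection map[string]map[string]string) {
	kept := make([]string, 0, len(f.Keys))
	objects := make([]map[string]string, 0, len(f.Keys))
	for _, key := range f.Keys {
		record, ok := collection[key]
		if !ok {
			continue // record was deleted from the collection, prune it
		}
		kept = append(kept, key)
		objects = append(objects, record)
	}
	f.Keys, f.Objects = kept, objects
}

func main() {
	// Made-up state: "jack" is still in the frame's key list but has
	// been deleted from the collection.
	collection := map[string]map[string]string{
		"frieda": {"name": "Frieda"},
	}
	f := &frame{Keys: []string{"frieda", "jack"}}
	refresh(f, collection)
	fmt.Println(f.Keys)
}
```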
</section>
<footer>
<span>© 2022 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span><a href="mailto:[email protected]">Email Us</a></span>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
</footer>
</body>
</html>