<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Unix Tutorial</title>
<!-- Bootstrap -->
<link href="css/bootstrap.min.css" rel="stylesheet">
<link href="css/style.css" rel="stylesheet">
</head>
<!-- Begin Body -->
<body id="page-top" data-spy="scroll" data-target='.sidebar'>
<!-- Navigation -->
<nav class="navbar navbar-custom navbar-fixed-top" role="navigation">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse"
data-target=".navbar-main-collapse">
<i class="fa fa-bars"></i>
</button>
<a class="navbar-brand page-scroll" href="#">
<span id="page-title">Practical Unix</span>
</a>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse navbar-right navbar-main-collapse">
<ul class="nav navbar-nav">
<li><a href="index.html">Data Manipulation</a></li>
<li class="active"><a href="commands.html">Unix Basics</a></li>
<li><a href="https://github.com/tebarkley/unix-tutorial"
target="_blank" class="gh-link"><img
src='images/GitHub-Mark-32px.png' alt='github'></a></li>
</ul>
</div>
<!-- /.navbar-collapse -->
</div>
<!-- /.container -->
</nav>
<header>
<div class="container">
<div class="row">
</div>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-md-3">
<div class="sidebar-nav">
<div class="navbar navbar-default sidebar" role="navigation">
<ul class="nav nav-tabs sidenav" role="navigation">
<li><a href="#using" class="list-group-item">Using UNIX
</a></li>
<li><a href="#navigation" class="list-group-item">Navigating the File
Directory
</a></li>
<li><a href="#permissions" class="list-group-item">Permissions
</a></li>
<li><a href="#remote" class="list-group-item">Connecting to Remote Machines
</a></li>
<li><a href="#basic" class="list-group-item">Basic Data Manipulation
</a></li>
<li><a href="#advanced" class="list-group-item">Advanced Data Manipulation
</a></li>
<li><a href="#resources" class="list-group-item">Learn More
</a></li>
</ul>
</div>
</div>
</div>
<div class="col-md-9">
<section>
<!-- ############################### Introduction ###############################-->
<p class='intro-text'><br>This is a very basic introduction to UNIX, designed for the UC
Berkeley School of Information's Data Mining and Analytics class. It is
focused on commands to help you obtain and investigate data sets. This page reviews basic commands. Click over to the Data Manipulation page to practice applying these commands to analyze a data set of Yelp reviews.</p>
<hr class="col-md-12">
</section>
<section id='using'>
<!-- ############################### Basic Concepts ###############################-->
<h2>Using UNIX</h2>
<h4>Accessing the shell</h4>
<p>On a Mac computer, the shell can be accessed through the default application called <b>Terminal</b>. </p>
<p>On Windows, running UNIX commands directly on your machine requires an additional tool. The most commonly used is <a href="https://cygwin.com/install.html">Cygwin</a>. Cygwin offers many optional components that you may or may not want to install; see <a href="https://www.youtube.com/watch?v=TjxEH_tr7e0">this tutorial</a> or others for installation guidance. Alternatively, if you have installed Git on your Windows machine, you probably also have Git Bash, which you can use to practice the commands in this tutorial.</p>
<p>Finally, instead of running the commands locally, you can log in to a remote machine with a UNIX-based operating system and work with your data there. If you are an I School student, you can use your I School computing account to log in to one of the I School Linux servers. If you aren't an I School student but are enrolled in an I School class, ask your instructors for an I School computing account.</p>
<h4>Basic Command Structure</h4>
<p>The general form of a UNIX command is:</p>
<pre> <code>&lt;command&gt; [-option(s)] [arguments]</code></pre>
<p>A command is executed by pressing the <i>Enter</i> key. In general (with some exceptions):</p>
<ul>
<li>Command names are lowercase.</li>
<li>Options are preceded by a <code>-</code>.</li>
<li>If an option takes a list of values, separate the values with commas and no spaces; if the list contains spaces, enclose the whole list in double quotes (").</li>
<li>Options precede other arguments on the command line.</li>
<li>Multiple single-letter options can usually be combined behind one <code>-</code>. For instance,
<code>ls -a -l -R</code> is equivalent to <code>ls -alR</code>.</li>
<li>The order of the options usually does not matter.</li>
<li>The order of arguments may matter.</li>
</ul>
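<p>As a small illustration of combined options, here is a sketch assuming a POSIX shell. The directory name <i>demo</i> and the file names are hypothetical, and <code>mkdir -p</code> and <code>touch</code> (which creates an empty file) are used only to set up the example.</p>
<pre> <code># Set up a scratch directory with one visible and one hidden file
mkdir -p demo
touch demo/notes.txt demo/.hidden
# -a includes hidden files and -l uses the long format;
# these two commands produce the same output
ls -a -l demo
ls -al demo</code></pre>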
<h4>Standard Input and Output Redirection</h4>
<p>UNIX processes, such as the commands we are practicing here, start with three open files (streams): standard input, standard output, and standard error. For our purposes, we'll focus on standard output; you can read more about standard input, output, and error <a href='http://sc.tamu.edu/help/general/unix/redirection.html'>here</a>.</p>
<p>By default, the output of a command is printed to the screen. We'll use the following operators to redirect standard output:</p>
<ul>
<li><code>&gt;</code> to redirect output to a file (creating a new file, or overwriting an existing one)</li>
<li><code>&gt;&gt;</code> to append output to the end of a file</li>
<li><code>|</code> (a "pipe") to send output to another command as its input</li>
</ul>
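<p>The three operators can be tried out with a few lines like the following, a sketch assuming a POSIX shell; <i>out.txt</i> is a hypothetical file name and <code>printf</code> is used only to generate some text.</p>
<pre> <code># > creates out.txt (or overwrites it) with two lines
printf 'one\ntwo\n' > out.txt
# >> appends a third line instead of overwriting
echo three >> out.txt
# | sends the output of cat to wc, which counts the lines: 3
cat out.txt | wc -l</code></pre>
<p>Because <code>&gt;</code> overwrites, running these lines a second time still leaves exactly three lines in <i>out.txt</i>.</p>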
<h4 id='man'>Getting help</h4>
<p>The <code>man</code> command lets you view UNIX manual pages within the
shell. For example, typing <code>man cat</code> displays the manual page
for the <code>cat</code> command.</p>
<img src='images/man-cat.jpg' alt='man-cat' id='man-cat'>
<p id='cat-caption'><b><i>Just think of this guy when you're not sure how to
use a UNIX command!</i></b></p>
<pre> <code>man cat</code></pre>
<samp>
<p class="p1">CAT(1)
BSD General Commands Manual
CAT(1)</p>
<p class="p3">NAME</p>
<p class="p1"> <span class="s1">cat</span> --
concatenate and print files</p>
<p class="p3">SYNOPSIS</p>
<p class="p1"> <span class="s1">cat</span> [<span
class="s1">-benstuv</span>] [<span class="s2">file</span> <span
class="s2">...</span>]</p>
<p class="p3">DESCRIPTION</p>
<p class="p1"> The <span class="s1">cat</span> utility
reads files sequentially, writing them to the standard out-</p>
<p class="p1"> put. The <span
class="s2">file</span> operands are processed in command-line order.
If <span class="s2">file</span> is a</p>
<p class="p1"> single dash (`-') or absent, <span
class="s1">cat</span> reads from the standard input. If <span
class="s2">file</span> is</p>
<p class="p1"> a UNIX domain socket, <span class="s1">cat</span>
connects to it and then reads it until EOF. This</p>
<p class="p1"> complements the UNIX domain binding
capability available in inetd(8).</p>
<p class="p1"> </p>
<p class="p2"> The options are as follows:</p>
<p class="p1"> <span class="s1">-b</span>
Number the non-blank output lines, starting at 1.</p>
<p class="p1"> <span class="s1">-e</span>
Display non-printing characters (see the <span
class="s1">-v</span> option), and display a</p>
<p class="p1"> dollar sign
(`$') at the end of each line.</p>
</samp>
<br><br>
<p>The following keystrokes are used to navigate through the
<code>man</code> pages:</p>
<ul>
<li><i>spacebar</i> or <i>f</i> to page down</li>
<li><i>b</i> to page up</li>
<li><i>/searchterm[Enter]</i> to search forward for the next occurrence of searchterm (press <i>n</i> to jump to the following match)</li>
<li><i>q</i> to quit the man page and return to the command prompt</li>
</ul>
<h4>Text Editors</h4>
<p>You can create and edit text files directly from your terminal. <code>vi</code> and <code>emacs</code> are two commonly used, feature-rich text editors. There are also simpler editors like <code>pico</code>, which is available on the I School server. To use a text editor, type the name of the editor followed by the filename, for example <code>pico example.txt</code>. The editor then opens your file for editing; if the file does not exist yet, it is created when you save.</p>
</section>
<hr class="col-md-12">
<!-- ############################### Navigating the File Directory ###############################-->
<section id='navigation'>
<h2>Navigating the File Directory</h2>
<p>This section shows commands to navigate your directory folders and view,
create, move, and remove directories and files. Remember to use <code>man
&lt;command&gt;</code> to see full documentation for any particular
command.</p>
<ul>
<h4>Seeing where you are in the file system</h4>
<li><code>pwd</code> prints the path to the current directory.</li>
<li><code>$HOME</code> is an environment variable that stores the location
of your home directory.
</li>
<p class='sub-point'>The <code>~</code> is a shortcut for
<code>$HOME</code>.</p>
<p class='sub-point'>The <code>echo</code> command simply displays text to
standard output. Thus, <code>echo $HOME</code> displays the location of
your home directory.</p>
<li><code>ls</code> lists the contents of a directory.</li>
<p class='sub-point'><code>ls -l</code> lists file and directory details
(date, size, permissions).</p>
<p class='sub-point'><code>ls -a</code> lists all files and directories
(including hidden files).</p>
<h4>Changing directories</h4>
<li><code>cd &lt;directory path&gt;</code> changes your working directory.
</li>
<p class='sub-point'><code>cd ..</code> moves up a level to the parent
directory.</p>
<h4>Creating and removing directories</h4>
<li><code>mkdir &lt;directory name&gt;</code> makes a <b>new</b> directory.
</li>
<li><code>rmdir &lt;directory name&gt;</code> deletes an <b>empty</b>
directory.
</li>
<li><code>rm &lt;file&gt;</code> deletes a file. Deleted files do not go to a trash folder, so there is no undo.</li>
<p class='sub-point'><code>rm -R &lt;directory name&gt;</code> deletes a
directory and all of its contents, recursively.</p>
<h4>Copying, moving, and renaming files</h4>
<li><code>cp &lt;file1&gt; &lt;file2&gt;</code> copies file1 and calls the copy
file2.
</li>
<p class='sub-point'><code>cp &lt;file&gt; .</code> copies the specified
file to the current directory.</p>
<li><code>mv &lt;file1&gt; &lt;file2&gt;</code> renames file1 as file2. If file2 is a path in a
different directory (or the name of an existing directory), the file is moved there.
</li>
<h4>Inspecting files</h4>
<li><code>cat &lt;file&gt;</code> displays the contents of a file.</li>
<li><code>less &lt;file&gt;</code> displays a file one page at a time.
Navigate in the <a href="#man">same way</a> as a <code>man</code> page.
</li>
<li><code>wc &lt;file&gt;</code> counts the lines, words, and bytes in
a file.
</li>
<p class='sub-point'><code>wc -c</code> returns just the number of <b>bytes</b>
in the file (for plain ASCII text this equals the number of characters; <code>wc -m</code> counts characters).</p>
<p class='sub-point'><code>wc -w</code> returns just the number of
<b>words</b> in the file.</p>
<p class='sub-point'><code>wc -l</code> returns just the number of
<b>lines</b> in the file.</p>
<h4>Compressing files</h4>
<li><code>zip &lt;archive.zip&gt; &lt;file(s)&gt;</code> compresses files into a zip archive, and <code>unzip &lt;archive.zip&gt;</code> extracts the files from it.
</li>
</ul>
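<p>The commands above can be combined into a short session like the one below. This is a sketch assuming a POSIX shell; the names <i>project</i>, <i>data</i>, <i>sample.txt</i>, and <i>backup.txt</i> are hypothetical, and <code>mkdir -p</code> (which also creates missing parent directories and does not complain if the directory already exists) goes slightly beyond the options listed above.</p>
<pre> <code># Create a nested directory and move into it
mkdir -p project/data
cd project
pwd                              # prints a path ending in /project
# Create a small file with 2 lines and 5 words
printf 'a b c\nd e\n' > data/sample.txt
cp data/sample.txt backup.txt    # copy: backup.txt is a new file
mv backup.txt data/              # move backup.txt into the data directory
ls data                          # lists backup.txt and sample.txt
wc -l data/sample.txt            # 2 lines
wc -w data/sample.txt            # 5 words
cd ..                            # move back up to the parent directory</code></pre>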
</section>
<hr class="col-md-12">
<!-- ############################### Permissions ###############################-->
<section id='permissions'>
<h2>Permissions</h2>
<p>This section explains how file and directory permissions are represented
and how to change them.</p>
<h4>Viewing permissions</h4>
<p>When you type the command <code>ls -l</code> (list long), the first column
specifies the current permissions of all (non-hidden) files and directories
located in your current directory. It may look like this:</p>
<pre> <code>drwxr-xr-x</code></pre>
<p>The first character specifies the file type: <i>-</i> for an ordinary file, <i>d</i> for a directory, and <i>l</i> for a symbolic link.</p>
<p>The next nine characters are in groups of three, each representing the permissions for one of UNIX's three permission-tier categories: <b>user</b> (the owner of the file, listed in the 3rd field of <code>ls -l</code>), <b>group</b> (the group owner of the file, listed in the 4th field of <code>ls -l</code>), and <b>other</b> (everybody else).</p>
<p>The first character of each group of three is either an <b>r</b> or a <b>-</b> to indicate whether that category of user has <b>read</b> permission. For an ordinary file, read permission means that category of user can view the file's contents. For a directory, read permission means that the list of filenames stored in the directory is viewable (i.e., that category of user can use the <code>ls</code> command on that directory).</p>
<p>The second character of each group of three is either a <b>w</b> or a <b>-</b> to indicate whether that category of user has <b>write</b> permission. For an ordinary file, write permission means that category of user can edit the file. For a directory, write permission means that category of user can create or delete files within the directory.</p>
<p>The third character of each group of three is either an <b>x</b> or a <b>-</b> to indicate whether that category of user has <b>execute</b> permission. For an ordinary file, execute permission means that category of user can run the file as a program. For a directory, execute permission means that category of user can 'pass through' the directory (i.e., that category of user can use the <code>cd</code> command to navigate through the directory).</p>
<p>Thus, the code snippet shown at the beginning of this section refers to a directory for which the user (owner) category has read, write, and execute privileges while the group and other categories have only read and execute privileges.</p>
<img src='images/permissions.png' alt='Permissions' id='permissions-img'>
<br>
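<p>To reproduce the <code>drwxr-xr-x</code> example yourself, you could run something like the following sketch, assuming a POSIX shell. The directory name <i>permdemo</i> is hypothetical, and the sketch uses two forms not covered in this section: the <code>=</code> operator of <code>chmod</code>, which sets permissions exactly, and the <code>-d</code> option of <code>ls</code>, which describes a directory itself rather than its contents.</p>
<pre> <code>mkdir -p permdemo
# User: read, write, execute; group and other: read and execute
chmod u=rwx,go=rx permdemo
# The first column of the output should read drwxr-xr-x
ls -ld permdemo</code></pre>
<p>Some systems print one extra character (such as <code>@</code> or <code>+</code>) right after the nine permission characters to flag extended attributes or access control lists.</p>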
<h4>Changing permissions</h4>
<p>The permissions of a file can be changed with the <code>chmod</code> command, which lets you add or remove permissions for the different permission tiers. For example, to add write permission for the group category to a file called example.txt:</p>
<pre> <code>chmod g+w example.txt</code></pre>
<p>To then add write permissions for the other category:</p>
<pre> <code>chmod o+w example.txt</code></pre>
<p>To then remove write and execute permissions for the group and other categories:</p>
<pre> <code>chmod go-xw example.txt</code></pre>
<p>The above examples illustrate how the <code>chmod</code> command works: <code>chmod</code> is followed by a sequence of characters denoting the <b>user categories</b> for which to change permissions (<code>u</code> for user, <code>g</code> for group, and <code>o</code> for other, in any combination, or <code>a</code> for all, which is the same as <code>ugo</code>), the <b>action</b> to take (<code>+</code> to add permissions or <code>-</code> to remove permissions), and the <b>permission(s)</b> to change (<code>r</code> for read, <code>w</code> for write, and/or <code>x</code> for execute). This sequence of characters is followed by the name of the file to which the permission changes are applied.</p>
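<p>The three <code>chmod</code> commands above can be checked step by step with <code>ls -l</code>. This sketch assumes a POSIX shell; it uses <code>touch</code> to create <i>example.txt</i> and the <code>=</code> operator (not described above) to start from a known <code>-rw-r--r--</code> state, since a new file's initial permissions depend on your system settings.</p>
<pre> <code>touch example.txt
chmod u=rw,go=r example.txt   # -rw-r--r--  (known starting point)
chmod g+w example.txt         # -rw-rw-r--
chmod o+w example.txt         # -rw-rw-rw-
chmod go-xw example.txt       # -rw-r--r--  (removing x, which was not set, is harmless)
ls -l example.txt</code></pre>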
</section>
<hr class="col-md-12">
<!-- ############################### Connecting to Remote Machines ###############################-->
<section id='remote'>
<h2>Connecting to Remote Machines</h2>
<p>This section shows commands to connect to remote UNIX machines using
<code>ssh</code> and securely transfer files using <code>scp</code>.</p>
<h4>ssh</h4>
<p>The <code>ssh</code> command lets you securely connect to a remote UNIX
machine. You can use this, for example, to connect remotely to the I School server. If your I School computing account user name is <i>user1</i>, you could log in to the I School UNIX machine like so:</p>
<pre> <code>ssh user1@ischool.berkeley.edu</code></pre>
<p>After being prompted for your password, you would be taken to your home directory on the I School server.</p>
<p>If you use Windows and don't have a program like Cygwin that lets you run UNIX commands locally, you can download <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html">PuTTY</a>, which lets you ssh into UNIX servers. The screenshot below shows an example of using ssh to log in to the I School server.</p>
<img src='images/putty.PNG' alt='putty' id='putty'>
<h4>scp</h4>
<p>The <code>scp</code> command securely transfers files to and
from a remote machine. With <code>scp</code>, the remote location is specified as <i>username@hostname:path</i> (the <i>username@</i> part can be omitted if it matches your local user name).</p>
<p>To copy a file called <i>example.txt</i> from your current local directory to a directory called <i>data</i> that's located within your home directory on hostname <i>ischool.berkeley.edu</i>:</p>
<pre> <code>scp example.txt user1@ischool.berkeley.edu:data/</code></pre>
<p>To copy a file called <i>example.txt</i> in the <i>data</i> directory of your home directory on hostname <i>ischool.berkeley.edu</i> into your current local directory:</p>
<pre> <code>scp user1@ischool.berkeley.edu:data/example.txt .</code></pre>
<p>In the above case, the <code>.</code> stands for your current directory.</p>
<p>The <code>-r</code> option of <code>scp</code> copies an entire directory tree. To copy your entire remote home directory to your current local directory:</p>
<pre> <code>scp -r user1@ischool.berkeley.edu:. .</code></pre>
<h4>wget</h4>
<p>The <code>wget</code> command lets you download files from the web. The following command downloads a zip file from example.com:</p>
<pre> <code>wget http://example.com/example.zip</code></pre>
</section>
<hr class="col-md-12">
<!-- ############################### Basic Filters ###############################-->
<section id='basic'>
<h2>Basic Data Manipulation</h2>
<p>This section shows some simple commands that can be used to subset data
for output to a file or for input into another command.</p>
<ul>
<li><p>The <code>head</code> command lets you view the first <i>n</i> lines of a
file. By default, <code>head</code> displays the first 10 lines. Use the
<code>-n</code> option to specify the number of lines. In files with a
header, <code>head -n 1 &lt;file&gt;</code> displays the header.</p></li>
<li><p>The <code>tail</code> command lets you view the last <i>n</i> lines of a
file. Like <code>head</code>, it displays 10 lines by default. Use the
<code>-n</code> option to specify the number of lines. With <code>tail</code>, it is
also possible to specify the line at which to begin displaying the file, using
a <i>+</i>: <code>tail -n +2 &lt;file&gt;</code> displays a file from the second
line onward. In files with a header, this command displays every line
except the header.</p></li>
<li><p>The <code>sort</code> command lets you sort lines of a text file.</p></li>
<p class='sub-point'><code>sort -b &lt;file&gt;</code> ignores leading blanks</p>
<p class='sub-point'><code>sort -f &lt;file&gt;</code> ignores the case of letters (folds lower case to upper case characters)</p>
<p class='sub-point'><code>sort -n &lt;file&gt;</code> compares lines by numerical value (so 10 sorts after 9)</p>
<p class='sub-point'><code>sort -r &lt;file&gt;</code> reverses the sorting order</p>
<p class='sub-point'><code>sort -u &lt;file&gt;</code> removes duplicate lines</p>
<p class='sub-point'><code>sort -k n &lt;file&gt;</code> sorts on a key that starts at the nth field (use <code>-k n,n</code> to sort on the nth field only)</p>
<p class='sub-point'><code>sort -t<i>delim</i> &lt;file&gt;</code> specifies the field delimiter as <i>delim</i></p>
<li><p>The <code>cut</code> command lets you extract data from a file by column or field. It can be used in three forms:</p></li>
<p class='sub-point'><code>cut -b &lt;list&gt; &lt;file&gt;</code>, where the list after the <code>-b</code> option specifies byte positions</p>
<p class='sub-point'><code>cut -c &lt;list&gt; &lt;file&gt;</code>, where the list after the <code>-c</code> option specifies character positions</p>
<p class='sub-point'><code>cut -f &lt;list&gt; -d &lt;delim&gt; &lt;file&gt;</code>, where the list after the <code>-f</code> option specifies field positions (column numbers) and the argument after <code>-d</code> is the field delimiter (a tab if <code>-d</code> is omitted).</p>
<li><p>The <code>uniq</code> command lets you find unique (or duplicated) lines
in a file. It only compares adjacent lines, so the input should be sorted first; typically, output from <code>sort</code> is piped to <code>uniq</code>.</p></li>
<p class='sub-point'><code>uniq -u &lt;file&gt;</code> prints only lines that are not repeated</p>
<p class='sub-point'><code>uniq -d &lt;file&gt;</code> prints one copy of each duplicated line</p>
<p class='sub-point'><code>uniq -c &lt;file&gt;</code> prefixes each line with the number of times it occurs</p>
<li><p>The <code>tr</code> command lets you do character substitution. The format of the <code>tr</code> command is <code>tr -option(s) 'to_replace' 'replace_with'</code>, where to_replace and replace_with are sets of characters, normally of the same length: each character in to_replace is replaced by the character in the same position in replace_with. Unlike the other commands in this section, <code>tr</code> does not take a filename as an argument; it reads only standard input, so it can take input from a pipe, or you can redirect input from a file with <code>tr -option(s) 'to_replace' 'replace_with' &lt; filename</code>.</p></li>
<p class='sub-point'><code>tr -d 'char(s)'</code> deletes every occurrence of the specified characters</p>
<p class='sub-point'><code>tr -s 'char'</code> reduces each run of the specified character to a single occurrence. This is useful for eliminating redundant spaces, with <code>tr -s ' '</code>.</p>
</ul>
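<p>Here is a sketch of how these commands chain together with pipes, assuming a POSIX shell. The file <i>people.csv</i> and its contents are made up for illustration; <code>sort -rn</code> combines the <code>-r</code> and <code>-n</code> options described above, and <code>tr 'a-z' 'A-Z'</code> uses character ranges.</p>
<pre> <code># Hypothetical CSV file with a header row
printf 'name,city\nann,Oakland\nbob,Berkeley\ncal,Oakland\ndee,Berkeley\neve,Oakland\n' > people.csv

head -n 1 people.csv                # the header line: name,city
tail -n +2 people.csv | wc -l       # number of data rows: 5

# People per city: skip the header, keep field 2, sort so that equal values
# are adjacent, count them with uniq -c, then sort by count, largest first.
# Prints "3 Oakland" and then "2 Berkeley" (uniq -c pads counts with spaces)
tail -n +2 people.csv | cut -d, -f2 | sort | uniq -c | sort -rn

# Convert lower-case letters to upper case: CITY, OAKLAND, BERKELEY, ...
cut -d, -f2 people.csv | tr 'a-z' 'A-Z'

# Squeeze runs of spaces so that cut can split on a single space: prints b
echo 'a   b    c' | tr -s ' ' | cut -d' ' -f2</code></pre>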
</section>
<hr class="col-md-12">
<!-- ############################### Advanced Filters ###############################-->
<section id='advanced'>
<h2>Advanced Data Manipulation</h2>
<p>This section covers more advanced commands for manipulating data. It
starts with an overview of <i>regular expressions</i>, which the commands
in this section rely on heavily. It then covers the <code>grep</code>,
<code>sed</code>, and <code>awk</code> commands.</p>
<h3>Regular Expressions</h3>
<p>The commands in this section take advantage of regular expressions to do
advanced pattern matching. Here, we list some of the essential regular
expressions to know. If you are new to regular expressions or want to learn
more, please visit some of the suggested resources.</p>
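<p><b>Representing single characters</b></p>
<ul>
<li><code>.</code> matches any single character</li>
<li><code>\d</code> matches a digit</li>
<li><code>\D</code> matches a non-digit</li>
<li><code>\s</code> matches whitespace</li>
<li><code>\S</code> matches non-whitespace</li>
<li><code>a</code> matches a</li>
<li><code>[abc]</code> matches any character a, b, or c</li>
<li><code>[a-c]</code> matches any character a, b, or c</li>
<li><code>[a-zA-Z]</code> matches any character a-z or A-Z</li>
<li><code>[a-zA-Z0-9]</code> matches any character a-z or A-Z or any digit
</li>
<li><code>[^abc]</code> matches any character except a, b, or c</li>
</ul>
<p>The shorthands <code>\d</code>, <code>\D</code>, <code>\s</code>, and <code>\S</code> come from Perl-style regular expressions and are not supported everywhere: standard <code>grep</code>, <code>sed</code>, and <code>awk</code> do not recognize <code>\d</code> (use <code>[0-9]</code> instead), and support for <code>\s</code> and <code>\S</code> varies. GNU <code>grep -P</code> accepts the Perl-style forms.</p>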
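<p><b>Specifying Sequences</b></p>
<ul>
<li><code>a*</code> matches 0 or more occurrences of a</li>
<li><code>a?</code> matches 0 or 1 occurrences of a</li>
<li><code>a+</code> matches 1 or more occurrences of a</li>
<li><code>a{x}</code> matches x occurrences of a</li>
<li><code>a{x,}</code> matches at least x occurrences of a</li>
<li><code>a{x,y}</code> matches between x and y occurrences of a</li>
</ul>
<p>The <code>?</code>, <code>+</code>, and <code>{}</code> forms belong to <i>extended</i> regular expressions; with <code>grep</code> and <code>sed</code>, enable them with the <code>-E</code> option (for example, <code>grep -E 'a+'</code>).</p>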
<p><b>Specifying Locations</b></p>
<ul>
<li><code>^abc</code> matches abc at the beginning of the line</li>
<li><code>abc$</code> matches abc at the end of the line</li>
</ul>
<p>It is important to note that regular expressions make use of <i>‘metacharacters’</i>
like <code>*</code>, <code>?</code>, <code>+</code>, etc. To match one of these
characters literally, for example to find lines that end with a
question mark, escape it with a backslash
like so: <code>\?$</code> (as in <code>grep -E '\?$'</code>).</p>
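<p>The following sketch tries a few of these patterns with <code>grep</code>, which is covered next. It assumes a <code>grep</code> that supports the <code>-E</code> option (extended regular expressions, needed for <code>?</code>, <code>+</code>, and <code>{}</code>), and the file <i>lines.txt</i> is hypothetical.</p>
<pre> <code>printf 'What?\nhello\n2024 sales\nwhy not?\n' > lines.txt
grep -E '\?$' lines.txt         # ends with a literal ?: What? and why not?
grep -E '^[0-9]{4}' lines.txt   # starts with four digits: 2024 sales
grep -E '^h.l+o$' lines.txt     # h, any character, one or more l, then o: hello</code></pre>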
<!-- ############################### GREP ###############################-->
<h3>Grep</h3>
<p>The <code>grep</code> command searches for patterns within files and
prints the lines that contain a match. The format of the grep command is:</p>
<pre> <code>grep [-option(s)] 'pattern' [filename(s)]</code></pre>
<p>Some useful grep options include:</p>
<ul>
<li><code>-i</code> to ignore the case of the pattern when searching in the
file
</li>
<li><code>-v</code> to select lines that don't match the pattern</li>
<li><code>-w</code> to select only lines where the match is a whole word
(rather than a part of the word)
</li>
<li><code>-f</code> to read in patterns from a file, one pattern per line.
</li>
</ul>
<p>The pattern given to grep can be a simple string or a more complex regular
expression. To be safe, always put it in single quotes so that the shell
does not interpret and expand any metacharacters it contains.</p>
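<p>A sketch of the options above, assuming a POSIX shell; <i>fruit.txt</i> and <i>patterns.txt</i> are hypothetical files created with <code>printf</code>.</p>
<pre> <code>printf 'Apple pie\napple\npineapple\nbanana\n' > fruit.txt
printf 'banana\npie\n' > patterns.txt

grep 'apple' fruit.txt           # apple, pineapple (matching is case-sensitive)
grep -i 'apple' fruit.txt        # Apple pie, apple, pineapple
grep -i -w 'apple' fruit.txt     # Apple pie, apple (not pineapple)
grep -v 'apple' fruit.txt        # Apple pie, banana (no lower-case "apple")
grep -f patterns.txt fruit.txt   # Apple pie, banana (match "pie" or "banana")</code></pre>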
<!-- ############################### SED ###############################-->
<h3>Sed</h3>
<p><code>sed</code> stands for stream editor. It can do a lot, but we’ll
focus on its use for substitution. The format of the sed command is:</p>
<pre> <code>sed [-option(s)] ['address action'] [filename(s)]</code></pre>
<p>The <b>address</b> specifies a line number or a range of line numbers:
<code>1,7</code> would select lines 1-7. An address can also be a regular expression between slashes, such as <code>/abc/</code>; with no address, the action applies to every line. The <b>actions</b> include
deleting, printing, appending, and replacing text.</p>
<p>For substitution, the ‘address action’ part of the sed command takes the
following form: <i>address</i><code>s/</code><i>toreplace</i><code>/</code><i>replacewith</i><code>/</code><i>flags</i>, where <code>s</code> is the substitute command.
<i>toreplace</i> can be a regular expression. An important flag to know is
<code>g</code>, which replaces all occurrences of <i>toreplace</i> with <i>replacewith</i>.
Without the <code>g</code> flag, sed only replaces the first occurrence on each line.</p>
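<p>A sketch of substitution with and without the <code>g</code> flag, plus a line-range address, a regular-expression address (<code>/dog/</code>), and the <code>d</code> (delete) action. It assumes a POSIX <code>sed</code>, and <i>pets.txt</i> is a hypothetical file. None of these commands modify the file: <code>sed</code> writes the edited text to standard output.</p>
<pre> <code>printf 'cat cat\ndog cat\nbird\n' > pets.txt

sed 's/cat/CAT/' pets.txt        # CAT cat / dog CAT / bird  (first match on each line)
sed 's/cat/CAT/g' pets.txt       # CAT CAT / dog CAT / bird  (every match)
sed '2,3s/cat/CAT/g' pets.txt    # cat cat / dog CAT / bird  (lines 2-3 only)
sed '/dog/s/cat/CAT/' pets.txt   # cat cat / dog CAT / bird  (only lines containing dog)
sed '1,2d' pets.txt              # bird  (lines 1-2 deleted)</code></pre>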
<!-- ############################### AWK ###############################-->
<h3>Awk</h3>
<p><code>awk</code> is a programming language useful for processing tabular
data. The format for an <code>awk</code> command is:</p>
<pre> <code>awk [-option(s)] ['selection_criteria {action}'] [filename(s)]</code></pre>
<p>Use the <code>-F</code>
option to specify the field delimiter for the file. The default delimiter
for <code>awk</code> is contiguous spaces and tabs.</p>
<p>The <b>selection_criteria</b> are like the addressing in <code>sed</code>,
but can take advantage of <code>awk</code>'s built-in variables. Built-in
variables include:</p>
<ul>
<li><code>$1</code> for the first field of the current line (<code>$2</code> for the second field, and so on)</li>
<li><code>$0</code> for the entire current line</li>
<li><code>NR</code> for the record (line) number</li>
</ul>
<p>The <b>selection_criteria</b> can use operators to subset
the data. Operators include:</p>
<ul>
<li><code>&lt;</code>, <code>&lt;=</code>, <code>&gt;</code>, <code>&gt;=</code> for
less than, less than or equal to, greater than, and greater than or equal
to
</li>
<li><code>==</code> for equal to</li>
<li><code>!=</code> for not equal to</li>
<li><code>~</code> to match a regular expression</li>
<li><code>!~</code> to select everything that does not match a regular expression</li>
<li><code>&amp;&amp;</code> for logical and</li>
<li><code>||</code> for logical or</li>
<li><code>!</code> for logical not</li>
</ul>
<p>To use regular expressions in an <code>awk</code>
<b>selection_criteria</b>, enclose them in forward slashes. The command below
prints the lines of a CSV file whose first column begins
with ‘words’ or ‘Words’:</p>
<pre> <code>awk -F, '$1 ~ /^[Ww]ords/ { print }' test.csv</code></pre>
<p>For our tutorial purposes, the action will be a print statement. In more
advanced contexts, it can be an awk program.</p>
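<p>A sketch of selection criteria and actions on a small comma-separated file, assuming a POSIX <code>awk</code>; <i>scores.csv</i> is hypothetical. The last command goes beyond a plain print statement: it keeps a running total in a variable (<code>sum</code>, a name chosen for this example) and prints the average in an <code>END</code> block, which runs after the last line has been read.</p>
<pre> <code>printf 'name,score\nann,90\nbob,78\ncal,84\n' > scores.csv

# Skip the header (NR > 1) and print the name of everyone scoring 80 or more: ann, cal
awk -F, 'NR > 1 && $2 >= 80 { print $1 }' scores.csv

# Print whole lines whose first field starts with a or b: ann,90 and bob,78
awk -F, '$1 ~ /^[ab]/ { print $0 }' scores.csv

# Average score over the data rows: (90 + 78 + 84) / 3 = 84
awk -F, 'NR > 1 { sum += $2 } END { print sum / (NR - 1) }' scores.csv</code></pre>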
</section>
<hr class="col-md-12">
<!-- ############################### Learn More ###############################-->
<section id='resources'>
<h2>Learn More</h2>
<p>There is a lot more to UNIX than is explained on this short page. We
encourage you to check out the following resources to learn more:</p>
<ul>
<li><a
href="http://people.ischool.berkeley.edu/~kevin/unix-tutorial/toc.html">I
School Unix Tutorial</a></li>
<li><a
href="http://www.ee.surrey.ac.uk/Teaching/Unix/">UNIX Tutorial for Beginners</a></li>
<li><a
href="http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=PracticalUnix">
Practical Unix videos from Stanford</a></li>
<li><a href="http://www.gregreda.com/2013/07/15/unix-commands-for-data-science/">Useful UNIX Commands for Data Science</a></li>
<li><a
href="http://regexone.com/lesson/0">
Tutorial on Regular Expressions</a></li>
</ul>
</section>
<hr class="col-md-12">
</div>
</div>
</div>
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/bootstrap.min.js"></script>
<script src="js/script.js"></script>
</body>
</html>