<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Unix Tutorial</title>

  <!-- Bootstrap -->
  <link href="css/bootstrap.min.css" rel="stylesheet">
  <link href="css/style.css" rel="stylesheet">

</head>
<!-- Navigation -->
<nav class="navbar navbar-custom navbar-fixed-top" role="navigation">
  <div class="container-fluid">
    <div class="navbar-header">
      <button type="button" class="navbar-toggle" data-toggle="collapse"
              data-target=".navbar-main-collapse">
        <i class="fa fa-bars"></i>
      </button>
      <a class="navbar-brand page-scroll" href="#">
        <span id="page-title">Practical Unix</span>
      </a>
    </div>

    <!-- Collect the nav links, forms, and other content for toggling -->
    <div class="collapse navbar-collapse navbar-right navbar-main-collapse">

      <ul class="nav navbar-nav">
        <li><a href="index.html">Data Manipulation</a></li>
        <li class="active"><a href="commands.html">Unix Basics</a></li>
        <li><a href="https://github.com/tebarkley/unix-tutorial"
               target="_blank" class="gh-link"><img
            src='images/GitHub-Mark-32px.png' alt='github'></a></li>
      </ul>
    </div>
    <!-- /.navbar-collapse -->
  </div>
  <!-- /.container -->
</nav>

<header>
  <div class="container">
    <div class="row">
    </div>
  </div>
</header>

<!-- Begin Body -->

<body id="page-top" data-spy="scroll" data-target='.sidebar'>

<div class="container">
<div class="row">
<div class="col-md-3">
  <div class="sidebar-nav">
    <div class="navbar navbar-default sidebar" role="navigation">
      <ul class="nav nav-tabs sidenav" role="navigation">
        <li><a href="#using" class="list-group-item">Using UNIX
        </a></li>
        <li><a href="#navigation" class="list-group-item">Navigating the File
          Directory
        </a></li>
        <li><a href="#permissions" class="list-group-item">Permissions
        </a></li>
        <li><a href="#remote" class="list-group-item">Connecting to Remote Machines
        </a></li>
        <li><a href="#basic" class="list-group-item">Basic Data Manipulation
        </a></li>
        <li><a href="#advanced" class="list-group-item">Advanced Data Manipulation
        </a></li>
        <li><a href="#resources" class="list-group-item">Learn More
        </a></li>
      </ul>
    </div>
  </div>
</div>
<div class="col-md-9">
<section>
  <!-- ############################### Introduction ###############################-->
  <p class='intro-text'><br>This is a very basic introduction to UNIX, designed for the UC
    Berkeley School of Information's Data Mining and Analytics class. It is
    focused on commands to help you obtain and investigate data sets. This page reviews basic commands. Click over to the Data Manipulation page to practice applying these commands to analyze a data set of Yelp reviews.</p>
  <hr class="col-md-12">
</section>

<section id='using'>
  <!-- ############################### Basic Concepts ###############################-->
  <h2>Using UNIX</h2>

  <h4>Accessing the shell</h4>

  <p>On a Mac computer, the shell can be accessed through the default application called <b>Terminal</b>. </p>
  <p>On Windows, you need to install a tool to run UNIX commands directly on your machine. The most commonly used is <a href="https://cygwin.com/install.html">Cygwin</a>. Cygwin has a lot of functionality that you may or may not want to install; <a href="https://www.youtube.com/watch?v=TjxEH_tr7e0">this tutorial</a> (or others like it) can walk you through the installation. Alternatively, if you have installed Git on your Windows machine, you probably have Git Bash, which you can use to practice the commands in this tutorial. Finally, instead of running the commands locally, you could log in to a remote machine with a UNIX-based operating system and work with your data there. If you are an I School student, you can use your I School computing account to log in to one of the I School Linux servers. If you aren't an I School student but are enrolled in an I School class, ask your instructors for an I School computing account.</p>

  <h4>Basic Command Structure</h4>

  <p>The general form of a UNIX command is:</p>

  <pre> <code>&lt;command&gt; [-option(s)] [arguments]</code></pre>
  
  <p>A command is executed by hitting the <i>Enter</i> key. In general (with some exceptions):
      <ul>
        <li>Command names are always lowercase</li>
        <li>Options are preceded by a <code>-</code></li>
        <li>If an option takes more than one argument then they should be separated by commas with no spaces, or if spaces are used the string should be included in double quotes (").</li>
        <li>Options precede other arguments on the command line</li>
        <li>Multiple options can usually be strung together after a single <code>-</code>. For instance, 
          <code>ls -a -l -R</code> is equivalent to <code>ls -alR</code>.</li>
        <li>The order of the options does not matter</li>
        <li>The order of arguments may be important</li>
      </ul>  
  </p>

  <h4>Standard Input and Output Redirection</h4>

  <p>UNIX processes, such as the commands we are practicing here, open three files: standard input, standard output, and standard error. For our purposes, we'll focus on standard output: you can read more about standard input, output, and error <a href='http://sc.tamu.edu/help/general/unix/redirection.html'>here</a>.</p>

  <p> By default, the output of a command gets printed to the screen. We'll use the following to redirect standard output:
    <ul>
      <li><code>&gt;</code> to redirect output to a file (creating a new file, or overwriting an existing file)</li>
      <li><code>&gt;&gt;</code> to append output to a file</li>
      <li><code>|</code> to pipe output into another command</li>
    </ul></p>
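  <p>The three operators above can be tried out with a short sketch like the one below. It assumes a throwaway directory created with <code>mktemp -d</code>; the file name <code>notes.txt</code> and the variable names are made up for illustration.</p>

```shell
# Work in a throwaway directory so nothing important is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# > creates notes.txt (or overwrites it if it already exists).
echo "first line" > notes.txt

# >> appends to the end of the file instead of overwriting it.
echo "second line" >> notes.txt

# | sends the output of cat to wc -l, which counts the lines it receives.
line_count=$(cat notes.txt | wc -l)
echo "notes.txt has $line_count lines"
```

  <p>Running the first <code>echo</code> line a second time would replace the contents of the file, while repeating the second one would keep adding lines to it.</p>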

  <h4 id='man'>Getting help</h4>

  <p>The <code>man</code> command lets you view UNIX manual pages within the
    shell. For example, typing <code>man cat</code> will display the online
    manual for the <code>cat</code> command.</p>

  <img src='images/man-cat.jpg' alt='man-cat' id='man-cat'>

  <p id='cat-caption'><b><i>Just think of this guy when you're not sure how to
    use a UNIX command!</i></b></p>


  <pre> <code>man cat</code></pre>
  <samp>
    <p class="p1">CAT(1)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
      &nbsp; &nbsp; BSD General Commands Manual &nbsp; &nbsp; &nbsp; &nbsp;
      &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; CAT(1)</p>

    <p class="p3">NAME</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; <span class="s1">cat</span> --
      concatenate and print files</p>

    <p class="p3">SYNOPSIS</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; <span class="s1">cat</span> [<span
        class="s1">-benstuv</span>] [<span class="s2">file</span> <span
        class="s2">...</span>]</p>

    <p class="p3">DESCRIPTION</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; The <span class="s1">cat</span> utility
      reads files sequentially, writing them to the standard out-</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; put.&nbsp; The <span
        class="s2">file</span> operands are processed in command-line order.&nbsp;
      If <span class="s2">file</span> is a</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; single dash (`-&#39;) or absent, <span
        class="s1">cat</span> reads from the standard input.&nbsp; If <span
        class="s2">file</span> is</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; a UNIX domain socket, <span class="s1">cat</span>
      connects to it and then reads it until EOF.&nbsp; This</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; complements the UNIX domain binding
      capability available in inetd(8).</p>

    <p class="p1">&nbsp;</p>

    <p class="p2">&nbsp; &nbsp; &nbsp;The options are as follows:</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; <span class="s1">-b</span>&nbsp; &nbsp;
      &nbsp; Number the non-blank output lines, starting at 1.</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; <span class="s1">-e</span>&nbsp; &nbsp;
      &nbsp; Display non-printing characters (see the <span
          class="s1">-v</span> option), and display a</p>

    <p class="p1">&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; dollar sign
      (`$&#39;) at the end of each line.</p>
  </samp>
  <br><br>

  <p>The following key strokes are used to navigate through the
    <code>man</code> pages:
  <ul>
    <li><i>spacebar</i> or <i>f</i> to page down</li>
    <li><i>b</i> to page up</li>
    <li><i>/searchterm[Enter]</i> to jump to the next occurrence of searchterm</li>
    <li><i>q</i> to quit the man page and return to the command prompt</li>
  </ul>
  </p>


  <h4>Text Editors</h4>

  <p>You can create and edit text files directly from your terminal. <code>vi</code> and <code>emacs</code> are two commonly used text editors with a lot of rich functionality. There are also simpler editors like <code>pico</code>, which is available on the I School server. To use a text editor, type in the name of the editor followed by the filename, for example: <code>pico example.txt</code>. The editor will then display your file for editing.</p>
</section>

<hr class="col-md-12">
<!-- ############################### Navigating the File Directory ###############################-->
<section id='navigation'>
  <h2>Navigating the File Directory</h2>

  <p>This section shows commands to navigate your directory folders and view,
    create, move and remove directories and files. Remember to use <code>man
      &lt;command&gt;</code> to see full documentation for any particular
    command.</p>
  <ul>
    <h4>Seeing where you are in the file system</h4>
    <li><code>pwd</code> prints the path to the current directory.</li>

    <li><code>$HOME</code> is an environment variable that stores the location
      of your home directory.
    </li>
    <p class='sub-point'>The <code>~</code> is a shortcut for
      <code>$HOME</code>.</p>

    <p class='sub-point'>The <code>echo</code> command simply displays text to
      standard output. Thus, <code>echo $HOME</code> displays the location of
      your home directory.</p>

    <li><code>ls</code> lists the contents of a directory.</li>
    <p class='sub-point'><code>ls -l</code> lists file and directory details
      (date, size, permissions).</p>

    <p class='sub-point'><code>ls -a</code> lists all files and directories
      (including hidden files).</p>


    <h4>Changing directories</h4>
    <li><code>cd &lt;directory path&gt;</code> changes your working directory.
    </li>
    <p class='sub-point'><code>cd ..</code> moves up a level to the parent
      directory.</p>

    <h4>Creating and removing directories</h4>
    <li><code>mkdir &lt;directory name&gt;</code> makes a <b>new</b> directory.
    </li>

    <li><code>rmdir &lt;directory name&gt;</code> deletes an <b>empty</b>
      directory.
    </li>

    <li><code>rm &lt;file&gt;</code> deletes a file.</li>
    <p class='sub-point'><code>rm -R &lt;directory name&gt;</code> deletes a
      directory and all of its contents, recursively.</p>

    <h4>Copying, moving, and renaming files</h4>
    <li><code>cp &lt;file1&gt; &lt;file2&gt;</code> copies file1 and calls it
      file2.
    </li>
    <p class='sub-point'><code>cp &lt;file&gt; .</code> copies the specified
      file to the current directory.</p>

    <li><code>mv &lt;file1&gt; &lt;file2&gt;</code> renames file1 as file2. If
      different paths are indicated, then this command moves the file.
    </li>


    <h4>Inspecting files</h4>
    <li><code>cat &lt;file&gt;</code> displays a file.</li>
    <li><code>less &lt;file&gt;</code> displays a file one page at a time.
      Navigate in the <a href="#man">same way</a> as a <code>man</code> file.
    </li>

    <li><code>wc &lt;file&gt;</code> counts the lines, words, and bytes in
      a file.
    </li>
    <p class='sub-point'><code>wc -c</code> returns just the number of <b>bytes</b>
      in the file (the same as the number of characters for plain ASCII text; use <code>wc -m</code> to count characters).</p>

    <p class='sub-point'><code>wc -w</code> returns just the number of
      <b>words</b> in the file.</p>

    <p class='sub-point'><code>wc -l</code> returns just the number of
      <b>lines</b> in the file.</p>


    <h4>Compressing files</h4>
    <li><code>zip &lt;archive.zip&gt; &lt;file(s)&gt;</code> compresses the listed files into a zip archive, and <code>unzip &lt;archive.zip&gt;</code> extracts
      them again.
    </li>
  </ul>
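  <p>As a rough walk-through of the commands in this section, the sketch below creates a scratch directory, makes and renames a file, counts its words, and cleans up. The directory and file names (<code>project</code>, <code>a.txt</code>, <code>notes.txt</code>) are hypothetical, and <code>mktemp -d</code> is used only to get a safe place to experiment.</p>

```shell
# Start in a throwaway directory (mktemp -d creates one and prints its path).
workdir=$(mktemp -d)
cd "$workdir"
pwd                              # print the path of the current directory

mkdir project                    # make a new directory
cd project                       # move into it
echo "hello unix world" > a.txt  # create a small file
cp a.txt b.txt                   # copy a.txt to b.txt
mv b.txt notes.txt               # rename b.txt to notes.txt
ls -la                           # list everything, including hidden entries

# Count the words in notes.txt by piping its contents to wc -w.
word_count=$(cat notes.txt | wc -w)
echo "notes.txt contains $word_count words"

cd ..                            # go back up to the parent directory
rm -R project                    # delete project and everything in it
```

  <p>Be careful with <code>rm -R</code> outside of a scratch directory: deleted files do not go to a trash folder.</p>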
</section>

<hr class="col-md-12">

<!-- ############################### Permissions ###############################-->
<section id='permissions'>
  <h2>Permissions</h2>

  <p>This section explains how file and directory permissions are represented
    and how to change them.</p>

  <h4>Viewing permissions</h4>

  <p>When you type the command <code>ls -l</code> (list long), the first column
    specifies the current permissions of all (non-hidden) files and directories
    located in your current directory. It may look like this:</p>

  <pre> <code>drwxr-xr-x</code></pre>
  <p>The first character specifies the file type: <i>-</i> for an ordinary file, <i>d</i> for a directory, and <i>l</i> for a symbolic link.</p>

  <p>The next nine characters are in groups of three, each representing the permissions for one of UNIX's three permission-tier categories: <b>user</b> (the owner of the file, listed in the 3rd field of <code>ls -l</code>), <b>group</b> (the group owner of the file, listed in the 4th field of <code>ls -l</code>), and <b>other</b> (everybody else).</p>

  <p>The first character of each group of three is either an <b>r</b> or a <b>-</b> to indicate whether that category of user has <b>read</b> permission over the file. For an ordinary file, read permission means that category of user can view and open the file. For a directory, read permission means that the list of filenames stored in the directory is viewable (i.e., that category of user can use the <code>ls</code> command on that directory).</p>

  <p>The second character of each group of three is either a <b>w</b> or a <b>-</b> to indicate whether that category of user has <b>write</b> permission over the file. For an ordinary file, write permission means that category of user can edit the file. For a directory, write permission means that the category of user can create or delete files within the directory.</p>

  <p>The third character of each group of three is either an <b>x</b> or a <b>-</b> to indicate whether that category of user has <b>execute</b> permission over the file. For an ordinary file, execute permission means that the category of user can run the file as code. For a directory, execute permission means that the category of user can 'pass through' the directory (i.e., that category of user can use the <code>cd</code> command to navigate through the directory).</p>

  <p>Thus, the code snippet shown at the beginning of this section refers to a directory for which the user (owner) category has read, write, and execute privileges while the group and other categories have only read and execute privileges.</p>

  <img src='images/permissions.png' alt='Permissions' id='permissions-img'>
  <br>
  <h4>Changing permissions</h4>

  <p>The permissions of a file can be changed with the <code>chmod</code> command, which lets you add or remove permissions for the different permission tiers. For example, to add write permission for the group category to a file called example.txt:</p>

  <pre> <code>chmod g+w example.txt</code></pre>

  <p>To then add write permissions for the other category:</p>

  <pre> <code>chmod o+w example.txt</code></pre>

  <p>To then remove write and execute permissions for the group and other categories:</p>
  <pre> <code>chmod go-xw example.txt</code></pre>

  <p>The above examples illustrate how the <code>chmod</code> command works: <code>chmod</code> is followed by a sequence of characters denoting the <b>user categories</b> for which to change permissions (<code>u</code> for user, <code>g</code> for group, <code>o</code> for other, or <code>a</code> for all, which is the same as <code>ugo</code>), the <b>action</b> to take (<code>+</code> to add permissions or <code>-</code> to remove permissions), and the <b>permission(s)</b> to change (<code>r</code> for read, <code>w</code> for write, and/or <code>x</code> for execute). This sequence of characters is followed by the name of the file to which the permission changes are applied.</p>
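  <p>To watch the permission string change, the three <code>chmod</code> commands above can be replayed on a scratch file. This is only a sketch: it assumes a throwaway directory from <code>mktemp -d</code>, first puts <code>example.txt</code> into a known starting state, and uses <code>cut -c1-10</code> (covered under Basic Data Manipulation) to keep just the first ten characters of the <code>ls -l</code> output. The variable names are made up for illustration.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "sample" > example.txt

# Put the file in a known state: read/write for user, read for group and other.
chmod a-rwx example.txt
chmod u+rw example.txt
chmod go+r example.txt
perms_start=$(ls -l example.txt | cut -c1-10)

# Add write permission for group, then for other.
chmod g+w example.txt
chmod o+w example.txt
perms_added=$(ls -l example.txt | cut -c1-10)

# Remove write and execute permission for group and other.
chmod go-xw example.txt
perms_removed=$(ls -l example.txt | cut -c1-10)

echo "$perms_start -> $perms_added -> $perms_removed"
```

  <p>Removing a permission that was never set, as with <code>x</code> in the last command, is harmless.</p>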

</section>

<hr class="col-md-12">


<!-- ############################### Connecting to Remote Machines ###############################-->
<section id='remote'>
  <h2>Connecting to Remote Machines</h2>

  <p>This section shows commands to connect to remote UNIX machines using
    <code>ssh</code> and securely transfer files using <code>scp</code>.</p>

  <h4>ssh</h4>

  <p>The <code>ssh</code> command lets you securely connect to a remote UNIX
    machine. For example, you can use it to connect remotely to the I School server. If your I School computing account user name is <i>user1</i>, you could log in to the I School UNIX machine like so:</p>

  <pre> <code>ssh user1@ischool.berkeley.edu</code></pre>

  <p>After being prompted for your password, you would be taken to your home directory on the I School server.</p>

  <p>If you use Windows and don't have a program like Cygwin that lets you run UNIX commands locally, you can download <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html">PuTTY</a>, which will let you ssh into UNIX servers. The screenshot below shows an example of using ssh to log in to the I School server.</p>

  <img src='images/putty.PNG' alt='putty' id='putty'>

  <h4>scp</h4>

  <p>The <code>scp</code> command securely transfers files to and
    from a remote machine. With <code>scp</code>, the remote location is specified as <i>user@hostname:path</i>, where a relative path is resolved from your home directory on the remote machine.</p>

  <p>To copy a file called <i>example.txt</i> from your current local directory to a directory called <i>data</i> that's located within your home directory on hostname <i>ischool.berkeley.edu</i>:</p>

  <pre> <code>scp example.txt user1@ischool.berkeley.edu:data/</code></pre>

  <p>To copy a file called <i>example.txt</i> in the <i>data</i> directory of your home directory on hostname <i>ischool.berkeley.edu</i> into your current local directory:</p>

  <pre> <code>scp user1@ischool.berkeley.edu:data/example.txt .</code></pre>

  <p>In the above case, the <code>.</code> stands for your current directory.</p>

  <p>The <code>-r</code> option of <code>scp</code> copies an entire directory tree. To copy your entire remote home directory to your current local directory:</p>

  <pre> <code>scp -r user1@ischool.berkeley.edu:. .</code></pre>

  <h4>wget</h4>

  <p>The <code>wget</code> command lets you download files from the web. The following command downloads a zip file from example.com:</p>

  <pre> <code>wget http://example.com/example.zip</code></pre>

</section>

<hr class="col-md-12">
<!-- ############################### Basic Filters ###############################-->
<section id='basic'>
  <h2>Basic Data Manipulation</h2>

  <p>This section shows some simple commands that can be used to subset data
    for output to a file or for input into another command.</p>

  
  <ul>
  <li><p>The <code>head</code> command lets you view the first <i>n</i> lines of a
    file. By default, <code>head</code> displays the first 10 lines. Use the
    <code>-n</code> option to specify the number of lines. In files with a
    header, <code>head -n 1 &lt;file&gt;</code> will display the header.</p></li>

  <li><p>The <code>tail</code> command lets you view the last <i>n</i> lines of a
    file. Like <code>head</code>, it displays the last 10 lines by default. Use the
    <code>-n</code> option to specify the number of lines. With <code>tail</code>, it is
    also possible to specify the line at which to begin displaying the file, by putting
    a <i>+</i> before the number: <code>tail -n +2 &lt;file&gt;</code> displays a file from the second
    line onward. In files with a header, this command displays every line
    except the header.</p></li>

  <li><p>The <code>sort</code> command lets you sort lines of a text file.</p></li>
    <p class='sub-point'><code>sort -b &lt;file&gt;</code> ignores leading blanks</p>
    <p class='sub-point'><code>sort -f &lt;file&gt;</code> ignores case of letters (folds lower case to upper case characters)</p>
    <p class='sub-point'><code>sort -n &lt;file&gt;</code> compares according to numerical value rather than character by character</p>
    <p class='sub-point'><code>sort -r &lt;file&gt;</code> reverses the sorting order</p>
    <p class='sub-point'><code>sort -u &lt;file&gt;</code> removes repeated lines</p>
    <p class='sub-point'><code>sort -k n &lt;file&gt;</code> sorts on the key that starts at the nth field and runs to the end of the line; use <code>-k n,n</code> to sort on the nth field alone</p>
    <p class='sub-point'><code>sort -t<i>delim</i> &lt;file&gt;</code> specifies the field delimiter as <i>delim</i></p>


  <li><p>The <code>cut</code> command lets you extract data from a file by column or field. This command can be executed in three forms:</p></li>
    <p class='sub-point'><code>cut -b &lt;list&gt; &lt;file&gt;</code>, where the list after the <code>-b</code> option specifies byte position</p>
    <p class='sub-point'><code>cut -c &lt;list&gt; &lt;file&gt;</code>, where the list after the <code>-c</code> option specifies character position</p>
    <p class='sub-point'><code>cut -f &lt;list&gt; -d &lt;delim&gt; &lt;file&gt;</code>, where the list after the <code>-f</code> option specifies field position (column number) and the argument after <code>-d</code> is the file delimiter.</p>


  <li><p>The <code>uniq</code> command lets you find unique (or duplicated) lines
    in a file. It requires input to be sorted, so typically output from <code>sort</code> is piped to <code>uniq</code>.</p></li>
      <p class='sub-point'><code>uniq -u &lt;file&gt;</code> selects non-repeating lines</p>
      <p class='sub-point'><code>uniq -d &lt;file&gt;</code> selects one copy of duplicated lines</p>
      <p class='sub-point'><code>uniq -c &lt;file&gt;</code> prepends each line with a count of the number of times it occurs</p>

  <li><p>The <code>tr</code> command lets you do character substitution. The format of the <code>tr</code> command is <code>tr [-option(s)] 'to_replace' 'replace_with'</code>, where the to_replace and replace_with expressions are character sequences of the same length; each character in to_replace is replaced by the character at the same position in replace_with. Unlike the other commands in this section, <code>tr</code> does not take a filename as an argument. It reads standard input, so it can take input from a pipe, or you can redirect input from a file by doing <code>tr [-option(s)] 'to_replace' 'replace_with' &lt; filename</code>.</p></li>
      <p class='sub-point'><code>tr -d 'char(s)'</code> deletes characters matching the specified sequence</p>
      <p class='sub-point'><code>tr -s 'char'</code> reduces repeating consecutive occurrences of the specified character to a single character. This is useful for eliminating redundant spaces, by doing <code>tr -s ' '</code></p>
</ul>
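  <p>The commands in this section are most useful when chained together with pipes. The sketch below shows one way to do that on a tiny, made-up CSV file; the file name <code>reviews.csv</code>, its columns, and its values are hypothetical sample data, not the data set used elsewhere in this tutorial.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: a header row plus five comma-separated records.
printf '%s\n' 'name,city,stars' \
  'Cafe Uno,Berkeley,4' \
  'Thai House,Oakland,5' \
  'Bagel Shop,Berkeley,3' \
  'Taco Stand,Oakland,4' \
  'Noodle Bar,Berkeley,2' > reviews.csv

# head keeps the header; tail -n +2 keeps everything except the header.
header=$(head -n 1 reviews.csv)
tail -n +2 reviews.csv > body.csv

# cut pulls out the city column, sort groups identical cities together,
# uniq -c counts each group, and tr -s squeezes the padding spaces.
city_counts=$(cut -d , -f 2 body.csv | sort | uniq -c | tr -s ' ')

# Sort numerically on the third field only, highest first, and keep the top row.
top_rated=$(sort -t , -k 3,3 -n -r body.csv | head -n 1)

# tr maps each lowercase letter to the matching uppercase letter.
shouting=$(echo "$header" | tr 'a-z' 'A-Z')

echo "$city_counts"
echo "$top_rated"
echo "$shouting"
```

  <p>Note that <code>sort</code> comes before <code>uniq -c</code> in the pipeline, since <code>uniq</code> only compares neighboring lines.</p>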
</section>

<hr class="col-md-12">
<!-- ############################### Advanced Filters ###############################-->
<section id='advanced'>
  <h2>Advanced Data Manipulation</h2>

  <p>This section covers more advanced commands for manipulating data. It
    starts with an overview of <i>regular expressions</i>, which the commands
    in this section make heavy use of. It then goes over the <code>grep</code>,
    <code>sed</code>, and <code>awk</code> commands.</p>

  <h3>Regular Expressions</h3>

  <p>The commands in this section take advantage of regular expressions to do
    advanced pattern matching. Here, we list some of the essential regular
    expressions to know. If you are new to regular expressions or want to learn
    more, please visit some of the suggested resources.</p>

  <p><b>Representing single characters</b>
  <ul>
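    <li><code>.</code> matches any single character</li>
    <li><code>\d</code> matches a digit (a Perl-style shorthand; in <code>grep</code>, <code>sed</code>, and <code>awk</code>, use <code>[0-9]</code> instead)</li>
    <li><code>\D</code> matches a non-digit (with those commands, use <code>[^0-9]</code>)</li>
    <li><code>\s</code> matches whitespace (portable form: <code>[[:space:]]</code>)</li>
    <li><code>\S</code> matches non-whitespace (portable form: <code>[^[:space:]]</code>)</li>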
    <li><code>a</code> matches a</li>
    <li><code>[abc]</code> matches any character a, b, or c</li>
    <li><code>[a-c]</code> matches any character a, b, or c</li>
    <li><code>[a-zA-Z]</code> matches any character a-z or A-Z</li>
    <li><code>[a-zA-Z0-9]</code> matches any character a-z or A-Z or number
    </li>
    <li><code>[^abc]</code> matches any character not a, b, or c</li>
  </ul>
  </p>

  <p><b>Specifying Sequences</b>
  <ul>
    <li><code>a*</code> matches 0 or more occurrences of a</li>
    <li><code>a?</code> matches 0 or 1 occurrences of a</li>
    <li><code>a+</code> matches 1 or more occurrences of a</li>
    <li><code>a{x}</code> matches exactly x occurrences of a</li>
    <li><code>a{x,}</code> matches at least x occurrences of a</li>
    <li><code>a{x,y}</code> matches between x and y occurrences of a</li>
  </ul>
  </p>

  <p><b>Specifying Locations</b>
  <ul>
    <li><code>^abc</code> matches abc at the beginning of the line</li>
    <li><code>abc$</code> matches abc at the end of the line</li>
  </ul>
  </p>

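  <p>It is important to note that regular expressions make use of <i>‘metacharacters’</i>
    like <code>*</code>, <code>?</code>, <code>+</code>, etc. To match a
    metacharacter literally, for example to find lines that end with a
    question mark, escape it with a backslash
    like so: <code>\?$</code>. Also note that <code>grep</code> and <code>sed</code> use <i>basic</i> regular expressions by default, in which <code>?</code>, <code>+</code>, and <code>{}</code> are ordinary characters; run them with the <code>-E</code> option to get the extended syntax listed here.</p>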
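  <p>A few of the patterns above can be tried out with <code>grep -E</code>, where <code>-E</code> turns on the extended syntax (<code>?</code>, <code>+</code>, <code>{}</code>) and <code>grep</code> itself is covered in the next section. This is a minimal sketch; the file <code>reviews.txt</code> and its contents are made-up sample data.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data, one short line per review.
printf '%s\n' 'Is it open late?' 'Great food' 'great service!!' 'Call 555-1234' > reviews.txt

# \?$ : a literal question mark at the end of the line.
questions=$(grep -E '\?$' reviews.txt)

# ^[Gg]reat : lines that begin with "Great" or "great".
great_count=$(grep -E '^[Gg]reat' reviews.txt | wc -l)

# [0-9]{3}-[0-9]{4} : three digits, a dash, then four digits.
phone_line=$(grep -E '[0-9]{3}-[0-9]{4}' reviews.txt)

echo "$questions"
echo "$great_count"
echo "$phone_line"
```

  <p>The patterns are wrapped in single quotes so that the shell passes them to <code>grep</code> untouched.</p>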

  <!-- ############################### GREP ###############################-->
  <h3>Grep</h3>

  <p>The <code>grep</code> command searches for patterns within files, and
    returns lines that have a match. The format of the grep command is:</p>

  <pre> <code>grep [-option(s)] ['pattern'] [filename(s)]</code></pre>

  <p>Some useful grep options include:
  <ul>
    <li><code>-i</code> to ignore the case of the pattern when searching in the
      file
    </li>
    <li><code>-v</code> to select lines that don't match the pattern</li>
    <li><code>-w</code> to select only lines where the match is a whole word
      (rather than a part of the word)
    </li>
    <li><code>-f</code> to read in patterns from a file, one pattern per line.
    </li>
  </ul>
  </p>

  <p>The pattern given to grep can be a simple string or a more complex regular
    expression. To be safe, it should always be put in quotes so that the shell
    does not interpret and replace the metacharacters.</p>
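  <p>The options listed above can be compared side by side on a small file. The sketch below is illustrative only: <code>menu.txt</code>, <code>patterns.txt</code>, and their contents are hypothetical, and <code>wc -l</code> is used simply to count the matching lines.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data and a file holding one pattern per line.
printf '%s\n' 'Pizza was great' 'Two pizzas please' 'No pizza today' 'Burgers only' > menu.txt
printf '%s\n' 'Burgers' 'great' > patterns.txt

# -i ignores case, so "Pizza", "pizzas", and "pizza" all match.
any_case=$(grep -i 'pizza' menu.txt | wc -l)

# -w additionally requires a whole-word match, which rules out "pizzas".
whole_word=$(grep -i -w 'pizza' menu.txt | wc -l)

# -v inverts the match: keep only the lines that do not mention pizza.
no_pizza=$(grep -i -v 'pizza' menu.txt)

# -f reads the patterns from patterns.txt; a line matching any of them is kept.
from_file=$(grep -f patterns.txt menu.txt | wc -l)

echo "$any_case $whole_word $from_file"
echo "$no_pizza"
```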

  <!-- ############################### SED ###############################-->
  <h3>Sed</h3>

  <p><code>Sed</code> stands for stream editor. It can do a lot, but we’ll
    focus on its usage in substitution. The format of the sed command is:</p>
  <pre> <code>sed [-option(s)] ['address action'] [filename(s)]</code></pre>

  <p>The <b>address</b> specifies a line number or a range of line numbers:
    <code>1,7</code> would select lines 1-7. If no address is given, the action applies to every line. The <b>actions</b> include
    deleting, printing, appending, and replacing text.</p>

  <p>For substitution, the ‘address action’ part of the sed command takes the
    following form: <i>[address]s/toreplace/replacewith/flags</i>, where <code>s</code> is the substitute action.
    <i>toreplace</i> can be a regular expression. An important flag to know is
    <code>g</code>, which replaces all occurrences of <i>toreplace</i> with <i>replacewith</i>.
    Without the <code>g</code> flag, sed only replaces the first occurrence on each line.</p>
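  <p>The effect of the <code>g</code> flag and of a line-range address can be seen in a short sketch like this one. The file <code>reviews.txt</code> and its contents are invented for the example; <code>sed</code> writes the edited text to standard output and leaves the file itself unchanged.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: three short lines.
printf '%s\n' 'good food, good price' 'good service' 'bad parking' > reviews.txt

# No g flag: only the first "good" on each line is replaced.
first_only=$(sed 's/good/great/' reviews.txt | head -n 1)

# With the g flag: every "good" on each line is replaced.
every_match=$(sed 's/good/great/g' reviews.txt | head -n 1)

# The address 2,3 restricts the substitution to lines 2 through 3.
lines_2_to_3=$(sed '2,3s/good/fine/' reviews.txt)

echo "$first_only"
echo "$every_match"
echo "$lines_2_to_3"
```

  <p>To keep the result, redirect the output to a new file rather than to the file being read.</p>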

  <!-- ############################### AWK ###############################-->
  <h3>Awk</h3>

  <p><code>Awk</code> is a programming language useful for processing tabular
    data. The format for an <code>awk</code> command is:</p>
  <pre> <code>awk [-option(s)] ['selection_criteria {action}'] [filename(s)]</code></pre>
    <p>Use the <code>-F</code>
    option to specify the field delimiter for the file. The default delimiter
    for <code>awk</code> is contiguous spaces and tabs.</p>

  <p>The <b>selection_criteria</b> are like the addressing in <code>sed</code>,
    but can take advantage of <code>awk</code>'s built-in variables. Built-in
    variables include:
  <ul>
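    <li><code>$1</code> for the first field (column) of the current line, <code>$2</code> for the second field, and so on</li>
    <li><code>$0</code> for the entire current line</li>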
    <li><code>NR</code> for record number</li>
  </ul>
  </p>

  <p>The <b>selection_criteria</b> can take advantage of operators to subset
    the data. Operators include:
  <ul>
    <li><code>&lt;</code>, <code>&lt;=</code>, <code>&gt;</code>, <code>&gt;=</code> for
      less than, less than or equal to, greater than, and greater than or equal
      to
    </li>
    <li><code>==</code> for equal to</li>
    <li><code>!=</code> for not equal to</li>
    <li><code>~</code> to match a regular expression</li>
    <li><code>!~</code> to match everything but a regular expression</li>
    <li><code>&amp;&amp;</code> for logical and</li>
    <li><code>||</code> for logical or</li>
    <li><code>!</code> for logical not</li>
  </ul>
  </p>

  <p>To use regular expressions in an <code>awk</code>
    <b>selection_criteria</b>, enclose them in forward slashes. The
    command below prints out lines from a csv file for which the first column begins
    with ‘words’ or ‘Words’:</p>

  <pre> <code>awk -F, '$1 ~ /^[Ww]ords/ { print }' test.csv</code></pre>

  <p>For our tutorial purposes, the action will be a print statement. In more
    advanced contexts, it can be an awk program.</p>
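  <p>Selection criteria and print actions can be combined as in the following sketch. The CSV file, its column layout, and the variable names are hypothetical; the point is only to show fields (<code>$1</code>, <code>$2</code>, <code>$3</code>), the whole line (<code>$0</code>), and the record number (<code>NR</code>) in use.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: a header row plus three records.
printf '%s\n' 'name,city,stars' \
  'Cafe Uno,Berkeley,4' \
  'Thai House,Oakland,5' \
  'Bagel Shop,Berkeley,3' > reviews.csv

# NR > 1 skips the header; $3 >= 4 keeps rows whose third field is at least 4.
high_rated=$(awk -F, 'NR > 1 && $3 >= 4 { print $1 }' reviews.csv)

# $2 ~ /^Oak/ keeps rows whose second field starts with "Oak"; $0 is the whole line.
oakland=$(awk -F, '$2 ~ /^Oak/ { print $0 }' reviews.csv)

# NR == 1 selects only the first record; print its second field.
second_header=$(awk -F, 'NR == 1 { print $2 }' reviews.csv)

echo "$high_rated"
echo "$oakland"
echo "$second_header"
```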
</section>

<hr class="col-md-12">

<!-- ############################### Learn More ###############################-->
<section id='resources'>
  <h2>Learn More</h2>

  <p>There is a lot more to UNIX than is explained on this short page. We
    encourage you to check out the following resources to learn more:</p>

  <p>
  <ul>
    <li><a
        href="http://people.ischool.berkeley.edu/~kevin/unix-tutorial/toc.html">I
      School Unix Tutorial</a></li>
    <li><a
        href="http://www.ee.surrey.ac.uk/Teaching/Unix/">UNIX Tutorial for Beginners</a></li>
    <li><a
        href="http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=PracticalUnix">
      Practical Unix videos from Stanford</a></li>
    <li><a href="http://www.gregreda.com/2013/07/15/unix-commands-for-data-science/">Useful UNIX Commands for Data Science</a></li>
    <li><a
        href="http://regexone.com/lesson/0">
      Tutorial on Regular Expressions</a></li>

  </ul>
  </p>
</section>

<hr class="col-md-12">

</div>
</div>
</div>


<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script
    src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/bootstrap.min.js"></script>
<script src="js/script.js"></script>

</body>
</html>