command | description |
---|---|
man <command> | manual page for the command, e.g. man ls to get the man page for ls |
cd | change directory |
cp | copy files |
curl | transfer a URL |
cut | cut out selected portions of each line of a file |
date | returns the current date and time |
grep | finds the lines that contain a pattern |
head | prints the top few lines to the terminal window |
ls | list directory contents |
mkdir | make a directory |
mv | move files |
pwd | return working directory name |
rm | remove, or delete, files and directories. Use caution: it is easy to delete more than you want. |
scp | remote secure copy |
sort | sorts the lines |
ssh | remote login |
tail | prints the last few lines to the terminal window |
uniq | prints the unique lines |
wc | counts the number of lines, characters and words |
~ | shortcut for your home directory |
The files you need later in this review are in our GitHub repository. There will be directions on how to retrieve them.
Let's go to a directory with a lot of files in it and list those files.
cd /bin/
ls
What's the difference between these two commands?
Try them both!!
ls -l
ls -lt
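One way to see what -t changes is to try it on files whose modification times you control. This is a minimal sketch in a throwaway directory; the directory and file names are made up for illustration and are not part of the course files.

```shell
# Work in a temporary directory so nothing important is touched
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create two empty files with explicit modification times (format: YYYYMMDDhhmm)
touch -t 202001010000 a_older.txt
touch -t 202101010000 b_newer.txt

# Default order: sorted by name
ls
# With -t: sorted by modification time, newest first
ls -t
```

Adding -l to either command shows the same order with the long listing (permissions, size, date).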
Pipes
You can string more than one command together with a pipe |, such that the standard output of the first command is 'piped' into the standard input of the second command.
Try it!!
ls -lt | head
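Pipes work with any command that reads standard input, not just head. Here is a small sketch with made-up input, so you can see exactly what goes in and what comes out; the variable name first is only for illustration.

```shell
# printf writes three lines to standard output; sort reads them from standard input
printf 'pear\napple\nfig\n' | sort

# Pipes can be chained: sort the lines, then keep only the first one
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
echo "$first"
```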
Semicolons
You can string more than one command together by putting a semicolon ; after each command. The commands run sequentially, but the output of one command is not passed to the next.
Try it!!
date ; sleep 2 ; date
If you want to know more about sleep, type man sleep.
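To make the contrast between a pipe and a semicolon concrete, here is a short sketch using echo and wc with a made-up input string.

```shell
# Pipe: the output of echo becomes the input of wc, so only the word count is printed
echo "one two three" | wc -w

# Semicolon: each command runs on its own and prints its own output
echo "one two three" ; echo "finished"
```

The first command prints a count of 3. The second prints both lines, because nothing is passed between the two commands.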
Download a file.
Change directory to your home directory; you most likely have permission to write there. Now use curl
to download a file. If curl is not available on your system, wget is an alternative.
cd ~
curl -O https://raw.githubusercontent.com/prog4biol/pfb2024/master/files/cuffdiff.txt
Note: -O is the capital letter O, not the number zero (0).
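If you are not sure which download tool your system has, you can ask the shell. The sketch below only picks a command and prints it; it does not download anything. The variable names url and downloader are made up for illustration.

```shell
url="https://raw.githubusercontent.com/prog4biol/pfb2024/master/files/cuffdiff.txt"

# command -v prints the path of a program if it is installed, and fails otherwise.
# > /dev/null 2>&1 hides that output, because we only care about success or failure.
if command -v curl > /dev/null 2>&1; then
    downloader="curl -O"
elif command -v wget > /dev/null 2>&1; then
    downloader="wget"
else
    downloader=""
fi

echo "download command: ${downloader:-none found} $url"
```

Both curl -O and plain wget save the file under its remote name, here cuffdiff.txt, in the current directory.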
Redirect STDOUT
You can redirect the output of a command into a file with >. If the file already exists, it is overwritten.
cd ~
grep Chr7 cuffdiff.txt > fav_chr_cuffdiff.txt
Append STDOUT to the end of a file that already exists
You can append the output of a command to the end of a file with >>.
grep Chr9 cuffdiff.txt >> fav_chr_cuffdiff.txt
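The difference between > and >> is easiest to see by counting lines after each step. This is a minimal sketch on a tiny made-up file (toy.txt is hypothetical, not the real cuffdiff.txt), run in a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A tiny tab-separated input file with made-up gene names
printf 'geneA\tChr7\ngeneB\tChr9\ngeneC\tChr7\n' > toy.txt

# > creates the file, or overwrites it if it already exists
grep Chr7 toy.txt > fav.txt
wc -l < fav.txt      # 2 lines

# >> adds to the end of the existing file
grep Chr9 toy.txt >> fav.txt
wc -l < fav.txt      # 3 lines

# Using > again replaces the previous contents
grep Chr9 toy.txt > fav.txt
wc -l < fav.txt      # 1 line
```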
Redirect STDERR
You can redirect STDERR to a file.
Let's review what STDERR actually is.
cat blablabla.txt
The file blablabla.txt does not exist, so we get
cat: blablabla.txt: No such file or directory
printed to the terminal. This message is labeled by the operating system as an error message or STDERR.
STDERR is a labeled type of output that we can redirect with 2>
cat blablabla.txt 2> errors.txt
We can redirect the error messages, a.k.a. STDERR, to a new file with any name we want.
What happens when you try to redirect STDOUT?
cat blablabla.txt > errors.txt
cat: blablabla.txt: No such file or directory
still gets printed to the screen because we only redirect STDOUT to our file. There is no STDOUT in this case and our file will be empty. How would you verify this?
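One possible way to check that the file is empty, sketched in a temporary directory so it does not overwrite an errors.txt you already have:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Redirect only STDOUT; the error message still goes to the terminal
cat blablabla.txt > errors.txt || echo "cat reported an error"

# wc -c reports the size in bytes; an empty file has 0
wc -c < errors.txt

# [ -s FILE ] is true only if the file exists and is not empty
if [ -s errors.txt ]; then
    echo "errors.txt has content"
else
    echo "errors.txt is empty"
fi
```

ls -l errors.txt shows the same information: the size column reads 0.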
Redirect STDOUT and STDERR
You can redirect both STDOUT and STDERR to two separate files in one command.
# just print it to the terminal first
cat fav_chr_cuffdiff.txt blablabla.file
# redirect to two files, STDOUT to out.txt, STDERR to err.txt
cat fav_chr_cuffdiff.txt blablabla.file 1> out.txt 2> err.txt
# this does the same, do you see the difference?
cat fav_chr_cuffdiff.txt blablabla.file > out.txt 2> err.txt
Examine the contents of out.txt and err.txt. REMEMBER: you can use your text editor to open and look at the contents of any file. A text file can have any extension you want. It does not have to end in .txt; it can be .fa, .fasta, .nt, .trash, .anything, or .nothing.
You can also redirect both STDOUT and STDERR to the same file.
cat fav_chr_cuffdiff.txt blablabla.file &> all_out_err.txt
Check out what is in all_out_err.txt.
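The &> shorthand is a bash feature. If you ever work in a different sh-style shell, the longer form > file 2>&1 should do the same job. A minimal sketch with made-up file names (exists.txt and missing.file are hypothetical):

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "some real content" > exists.txt

# > sends STDOUT to the file; 2>&1 then sends STDERR to the same place as STDOUT.
# The order matters: 2>&1 must come after the > redirection.
cat exists.txt missing.file > all_out_err.txt 2>&1 || echo "cat reported an error"

# The file now holds both the content of exists.txt and the error message
cat all_out_err.txt
```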
- Log into your machine.
- What is the full path to your home directory?
- Go up one directory.
  - How many files does it contain?
  - How many directories?
- Make a directory called PFB_problemsets in your home directory.
- Navigate into this new directory called PFB_problemsets. Verify that you are in the correct directory by using pwd.
- Use curl -O to copy https://raw.githubusercontent.com/prog4biol/pfb2024/master/files/sequences.nt.fa from the web into your problemsets directory. If curl is not available on your system, use wget as an alternative. (Note: -O is the letter O, not the number zero 0.)
- Without using a text editor, use unix commands to find these qualities for the file sequences.nt.fa. This file can be found here: https://raw.githubusercontent.com/prog4biol/pfb2024/master/files/sequences.nt.fa
  - How many lines does this file contain?
  - How many characters? (Hint: check out the options of wc)
  - What is the first line of this file? (Hint: read the man page of head)
  - What are the last 3 lines? (Hint: read the man page of tail)
  - How many sequences are in the file? (Hint: use grep) (Note: The start of a sequence is indicated by a > character.)
- Rename sequences.nt.fa to cancer_genes.fasta. (Hint: read the man page for mv)
- If you haven't already, copy/download this remote file, cuffdiff.txt, to your problemsets directory using curl -O (the letter O, not the number zero 0). Here is the URL you can use: https://raw.githubusercontent.com/prog4biol/pfb2024/master/files/cuffdiff.txt
- Do the following to cuffdiff.txt. The descriptions of each column in the file are in the table below.
  - Look at the first few lines of the file.
  - Sort the file by log fold change 'log2(fold_change)', from highest to lowest, and save in a new file in your directory called sorted.cuffdiff.out.
  - Sort the file (log fold change highest to lowest), then print out only the first 100 lines. Save in a file called top100.sorted.cuffdiff.out.
  - Sort the file by log fold change, print out the top 100, print only the first column (Hint: read the man page for cut). This will be a list of the top 100 genes with the largest change in expression. Make sure your list is sorted by gene name and is unique. Save this curated list in a file called differentially.expressed.genes.txt.
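If you want to practice the file-inspection commands before running them on the real data, here is a sketch on a tiny made-up FASTA file (toy.fa is hypothetical). You can check each answer by eye.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A made-up FASTA file with two sequences
printf '>seq1\nACGT\nGGCC\n>seq2\nTTAA\n' > toy.fa

wc -l < toy.fa        # number of lines
wc -c < toy.fa        # number of characters (bytes)
head -n 1 toy.fa      # first line
tail -n 3 toy.fa      # last three lines

# Quote the > so the shell does not treat it as a redirect
grep -c '>' toy.fa    # number of lines containing >
```

Be careful with the quotes in the grep command. Without them, the shell reads > as a redirect and would overwrite toy.fa.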
Cuffdiff file format
Column number | Column name | Example | Description |
---|---|---|---|
1 | Tested id | XLOC_000001 | A unique identifier describing the transcript, gene, primary transcript, or CDS being tested |
2 | Gene id | XLOC_000001 | The identifier of the gene associated with the transcript, gene, primary transcript, or CDS being tested |
3 | gene | Lypla1 | The gene_name(s) or gene_id(s) being tested |
4 | locus | chr1:4797771-4835363 | Genomic coordinates for easy browsing to the genes or transcripts being tested. |
5 | sample 1 | Liver | Label (or number if no labels provided) of the first sample being tested |
6 | sample 2 | Brain | Label (or number if no labels provided) of the second sample being tested |
7 | Test status | NOTEST | Can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing. |
8 | FPKMx | 8.01089 | FPKM of the gene in sample x |
9 | FPKMy | 8.551545 | FPKM of the gene in sample y |
10 | log2(FPKMy/FPKMx) | 0.06531 | The (base 2) log of the fold change y/x |
11 | test stat | 0.860902 | The value of the test statistic used to compute significance of the observed change in FPKM |
12 | p value | 0.389292 | The uncorrected p-value of the test statistic |
13 | q value | 0.985216 | The FDR-adjusted p-value of the test statistic |
14 | significant | no | Can be either "yes" or "no", depending on whether p is greater than the FDR after Benjamini-Hochberg correction for multiple-testing |
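Sorting a tab-separated table by one numeric column combines several of the commands from this review. The sketch below uses a tiny made-up table (toy.tsv, with columns id, gene, score) rather than the real cuffdiff file, so the column numbers here are assumptions for illustration only.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up tab-separated table: id, gene, score
printf 'id1\tgeneB\t2.5\nid2\tgeneA\t-1.0\nid3\tgeneB\t10.1\nid4\tgeneC\t0.3\n' > toy.tsv

# Store a literal tab character to use as the field separator
tab=$(printf '\t')

# -t sets the field separator, -k3,3 sorts on column 3 only,
# n compares as numbers, r reverses the order (highest first).
# Then: keep the top 3 rows, keep column 2, sort by name, drop duplicates.
sort -t "$tab" -k3,3nr toy.tsv | head -n 3 | cut -f 2 | sort | uniq > top_genes.txt
cat top_genes.txt
```

uniq only removes duplicate lines that are next to each other, which is why the list is sorted before it is passed to uniq. If a numeric column contains values such as inf or scientific notation, the n option may not order them as expected; GNU sort offers g (general numeric) for that case.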