Notes:
-
I've been using these commands for file editing and other miscellaneous tasks.
-
Some commands may be repeated. I use this repository to store everything I currently use and may need again in the future.
sed -e 's/,/\t/g' input > output
column -t filename > output
cut -d$'\t' -f 1-10 NAMEOFFILE.txt > NAMEOFFILE2.txt
sed '1h;1d;$!H;$!d;G' nameoffile
sed -e "s/^M//" filename > newfilename
OBS: To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key then press V and M in succession.
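When typing a literal ^M is inconvenient (inside a script, for example), a similar cleanup can be sketched with tr. Unlike the sed command above, which removes only the first ^M on each line, tr deletes every carriage return. The file names and contents below are made up for the example.

```shell
# Build a small file with Windows (CRLF) line endings in a temporary directory.
tmpdir=$(mktemp -d)
printf 'id1 10\r\nid2 20\r\n' > "$tmpdir/dos.txt"

# tr -d '\r' deletes every carriage return, with no need to type ^M.
tr -d '\r' < "$tmpdir/dos.txt" > "$tmpdir/unix.txt"

# Count the carriage returns left in the result (tr -cd keeps only '\r').
cr_left=$(tr -cd '\r' < "$tmpdir/unix.txt" | wc -c | tr -d ' ')
echo "carriage returns left: $cr_left"
```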
sed 's/^.....//' input>output
sed 's/.$//'
sed 's/ /\t/g' input>output
awk '{print NF}' input>output
cat HEADERCFILE FILEWITHNOHEADER > output
sort -k1 input > output
awk '{print $2,$1,$3}' input>output
paste input1 input2 > output
less -S nameoffile
sed 1d input > output
cut -f1 -d" " input>output
cut -d " " -f 2- input>output
awk 'FNR>=1 && FNR<=11' input>output
echo "$(tail -n +2 input)" > output
or
sed -i '1d' input
nohup ./command_that_will_take_forever_to_complete &
grep -vwF -f id1.txt id2.txt > id3.txt
- id3.txt contains the animals that are not in common between the two files, that is, animals that have a phenotype but no genotype.
OBS: prints the lines of the second file (id2.txt) that do not contain, as a whole word, any of the strings listed in the first file (id1.txt). The output is every value/animal/sentence of the second file that is absent from the first. In the example above, I wanted to know which animals had a phenotype but no genotype (fewer animals were genotyped than phenotyped). I only wanted to work with phenotyped animals that had a genotype, so this list identifies the individuals without genomic information that have to be excluded.
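A small worked example of the grep command above, assuming id1.txt holds the IDs of genotyped animals and id2.txt holds phenotype records. The IDs and values are invented for illustration.

```shell
tmpdir=$(mktemp -d)

# id1.txt: genotyped animals (one ID per line).
printf 'A1\nA3\n' > "$tmpdir/id1.txt"
# id2.txt: phenotyped animals (ID followed by a made-up phenotype).
printf 'A1 3.5\nA2 4.1\nA3 2.9\nA4 5.0\n' > "$tmpdir/id2.txt"

# -F: fixed strings, -w: whole-word match, -v: keep the lines that do NOT match,
# -f: read the strings to search for from id1.txt.
grep -vwF -f "$tmpdir/id1.txt" "$tmpdir/id2.txt" > "$tmpdir/id3.txt"

cat "$tmpdir/id3.txt"
# A2 4.1
# A4 5.0
```

Because -w matches a whole word anywhere on the line, an ID that also happens to appear in another column would hide that line too. For multi-column files it may be safer to compare only the ID column.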
join -1 1 -2 1 <(sort -k1 input1) <(sort -k1 input2) > output
chmod u+x nameofexecutable
awk '{print $0, "Fernando"}' input>output
wc -l nameoffile
awk '{if($9>=1.0)print;}' input> output
grep 'pattern' file1
head -n NUMBEROFLINES nameoffile
tail -n NUMBEROFLINES nameoffile
join -1 1 -2 1 <(sort -k1 id_GENO.txt) <(sort -k1 P120_Genotipados.txt) > P120_FINAL.txt
OBS: id_GENO.txt is a file with the IDs of the genotyped animals. In this case, the output is a phenotype file restricted to animals that have genotypes.
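A minimal sketch of the join above on invented data. The sorted copies are written to temporary files instead of using <( ) so that the example also runs in a plain sh. The file contents are assumptions, not the real files.

```shell
tmpdir=$(mktemp -d)

# IDs of genotyped animals (unsorted on purpose).
printf 'A3\nA1\n' > "$tmpdir/id_GENO.txt"
# Phenotype file: ID and a made-up phenotype value.
printf 'A2 260\nA3 245\nA1 250\n' > "$tmpdir/P120_Genotipados.txt"

# join needs both inputs sorted on the join field.
sort -k1,1 "$tmpdir/id_GENO.txt" > "$tmpdir/geno.sorted"
sort -k1,1 "$tmpdir/P120_Genotipados.txt" > "$tmpdir/pheno.sorted"

# Keep only the phenotype records whose ID is also in the genotype list.
join -1 1 -2 1 "$tmpdir/geno.sorted" "$tmpdir/pheno.sorted" > "$tmpdir/P120_FINAL.txt"

cat "$tmpdir/P120_FINAL.txt"
# A1 250
# A3 245
```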
expand -t 1 filename > output
sed 's~NA~0~g' pheno5.txt >pheno6.txt
tar -zcvf archive.tar.gz directory/
cp -r /work/course2022/week2/day10/ .
cat file3 >> output_file2
paste -d " " input1 input2 > output
sort -k startfield,endfield filename > output
sort -nk startfield,endfield filename > output
sort -nrk startfield,endfield filename > output
sort -k1,1 -k2,2 filename > output
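41. Merge two files by column 1 but suppress the joined output lines, printing only the lines of the first file that have no match in the second. Good for finding who is missing from one of the files. Both files must already be sorted on column 1.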
join -v1 phenotypes.txt pedigree.txt > output
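As an illustration of -v1, the sketch below uses two tiny, already sorted files with invented contents. Only the phenotype record without a pedigree entry is printed.

```shell
tmpdir=$(mktemp -d)

# Both files are already sorted on column 1, as join requires.
printf 'A1 250\nA2 260\nA3 245\n' > "$tmpdir/phenotypes.txt"
printf 'A1 S1 D1\nA3 S2 D2\n' > "$tmpdir/pedigree.txt"

# -v1: print only the lines of file 1 that cannot be paired with file 2.
join -v1 "$tmpdir/phenotypes.txt" "$tmpdir/pedigree.txt" > "$tmpdir/output"

cat "$tmpdir/output"
# A2 260
```

Using -v2 instead would list the pedigree entries that have no phenotype.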
grep -v PATTERN filename
sed -e 's/UGA/SA/g' pedigree.txt > SA_pedigree.txt
OBS: replaces UGA with SA in the pedigree file.
sed -e '24s/UGA/SA/' pedigree.txt > SA_pedigree.txt
OBS: same replacement, but applied only to line 24.
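One detail worth checking on a toy file: without the g flag, a line-addressed substitution changes only the first match on that line. The sketch below uses line 2 instead of line 24 and an invented three-line pedigree.

```shell
tmpdir=$(mktemp -d)
printf 'UGA1 UGA7 UGA8\nUGA2 UGA7 UGA9\nUGA3 UGA7 UGA8\n' > "$tmpdir/pedigree.txt"

# Replace only the first UGA on line 2, then show that line.
first_only=$(sed -e '2s/UGA/SA/' "$tmpdir/pedigree.txt" | sed -n '2p')
# Replace every UGA on line 2 (g flag), then show that line.
all_on_line=$(sed -e '2s/UGA/SA/g' "$tmpdir/pedigree.txt" | sed -n '2p')

echo "$first_only"    # SA2 UGA7 UGA9
echo "$all_on_line"   # SA2 SA7 SA9
```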
sed -e '/pattern to match/d' file > output
awk '{print $1,$NF}' filename>output
awk '{print $0}' filename>output
awk '{if ($2==2) print $1}' filename>output
awk '{if (NR>1000) print $3, $4}' filename>output
awk '{if (NR==1) print length ($2)}' filename
awk '{print $2}' filename |sort| uniq -c >output
Example: how many progeny each animal has. In this case, the sire was in column 2.
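A small sketch of the progeny count, assuming a pedigree laid out as animal, sire, dam (the records are invented). uniq -c pads its counts with spaces, so a final awk step is added here only to print them as "sire count".

```shell
tmpdir=$(mktemp -d)
printf 'A1 S1 D1\nA2 S1 D2\nA3 S2 D1\nA4 S1 D3\n' > "$tmpdir/pedigree.txt"

# Extract the sire column, sort it so equal IDs are adjacent, count each ID,
# then swap the two columns to get "sire number_of_progeny".
counts=$(awk '{print $2}' "$tmpdir/pedigree.txt" | sort | uniq -c | awk '{print $2, $1}')

echo "$counts"
# S1 3
# S2 1
```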
In this code, the reference allele is A.
- Missing genotypes (coded as 5):
awk '{print substr ($2, 7, 1)}' genot_pic.txt | awk '$1==5' | wc -l
- AA (coded as 2):
awk '{print substr ($2, 7, 1)}' genot_pic.txt | awk '$1==2' | wc -l
- Aa (coded as 1):
awk '{print substr ($2, 7, 1)}' genot_pic.txt | awk '$1==1' | wc -l
- aa (coded as 0):
awk '{print substr ($2, 7, 1)}' genot_pic.txt | awk '$1==0' | wc -l
awk '{print substr ($2, 7, 1)}' genot_pic.txt | awk '{sum=sum+$1} END {print sum/(2*NR)}'
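The last command above divides the sum of the genotype codes by 2*NR, so any missing genotype (coded as 5) is counted as if it were five copies of A. A possible variant that skips missing genotypes is sketched below. The genotype strings are invented, and the SNP of interest is assumed to be at position 7 of column 2, as in the commands above.

```shell
tmpdir=$(mktemp -d)
# Column 1: animal ID, column 2: genotype string (position 7 is the SNP of interest).
printf 'A1 20112025\nA2 10210111\nA3 01120050\nA4 22011012\nA5 11202100\n' > "$tmpdir/genot_pic.txt"

# Frequency of allele A using only non-missing genotypes (codes 0, 1, 2).
freq=$(awk '{ g = substr($2, 7, 1) }
            g != "5" { sum += g; n++ }
            END { if (n > 0) print sum / (2 * n) }' "$tmpdir/genot_pic.txt")

# Same calculation without excluding the missing code, for comparison.
naive=$(awk '{ sum += substr($2, 7, 1) } END { print sum / (2 * NR) }' "$tmpdir/genot_pic.txt")

echo "excluding missing: $freq"   # 0.5
echo "including missing: $naive"  # 0.9
```

In this toy data the genotypes at position 7 are 2, 1, 5, 1 and 0, so the four non-missing animals carry 4 copies of A out of 8 alleles.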
scp serveraddress:work/ads-guest37/.day12/pheno.dat /pathinyourlocaldirectory
awk '{sum+=$5} END { print "Average = ",sum/NR}' nameoffile
- Note: In this case, the column I wanted the average of was column 5, represented as $5.
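If the file has a header line, NR includes it in the denominator and the average comes out too low. A hedged sketch of a header-aware version, using an invented five-column file:

```shell
tmpdir=$(mktemp -d)
printf 'id sire dam sex weight\nA1 S1 D1 1 10\nA2 S1 D2 2 20\nA3 S2 D1 1 30\n' > "$tmpdir/pheno.txt"

# Skip line 1 and divide by the number of data lines actually summed.
avg=$(awk 'NR > 1 { sum += $5; n++ } END { if (n > 0) print sum / n }' "$tmpdir/pheno.txt")

# Original form: the header counts as a record with value 0.
with_header=$(awk '{ sum += $5 } END { print sum / NR }' "$tmpdir/pheno.txt")

echo "header skipped: $avg"           # 20
echo "header counted: $with_header"   # 15
```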
head -n1 nameoffile | tr , '\n' > output
expand -t 1 input>output
scp -P 22 file sabrina@server_number:~/
awk -F '\t' -v OFS='\t' '{ $(NF+1) = NUMBERYOUWANT; print }' infile >outfile
60. For BLUPF90 solutions file: Skip the header and then sort, first by trait, then by effect, then by level
awk 'NR>1' solutions | sort -k1,1n -k2,2n -k3,3n > solutions.sorted
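To see why each key is sorted numerically, here is a sketch on a made-up file with the columns trait, effect, level, solution. The header text and the values are assumptions for illustration, not real BLUPF90 output.

```shell
tmpdir=$(mktemp -d)
printf 'trait effect level solution\n1 2 3 0.5\n1 1 10 0.4\n2 1 1 0.7\n1 1 2 0.1\n1 2 1 0.2\n' > "$tmpdir/solutions"

# Drop the header, then sort numerically by trait, effect and level.
awk 'NR>1' "$tmpdir/solutions" | sort -k1,1n -k2,2n -k3,3n > "$tmpdir/solutions.sorted"

cat "$tmpdir/solutions.sorted"
# 1 1 2 0.1
# 1 1 10 0.4
# 1 2 1 0.2
# 1 2 3 0.5
# 2 1 1 0.7
```

With the n modifier, level 10 is placed after level 2. A plain text sort would put "10" before "2".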
awk 'NR>1{print $COLUMN_NUMBER}' FILE | sort -u
find directory/path/here -type f -printf '%TY-%Tm-%Td %TH:%TM %p\n' | awk '$2 == "12:14" {print $3}' | xargs rm
sed 's/^[^0-9]*//' snp_file.dat > snp_cleaned.dat
awk '{print length($0); exit}' file
awk '!seen[$1]++' input > output
- Ex: You want the second field (the genotypes) to start at the same position (e.g., column 10 or a specific width).
awk '{printf "%-10s%s\n", $1, $2}' input_file.txt > fixed_format_file.txt
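A quick check of the fixed-width format on invented data. Note that %-10s pads the ID to 10 characters, so the genotypes start at character 11. An ID longer than 10 characters would not be truncated and would shift the second field.

```shell
tmpdir=$(mktemp -d)
printf 'A1 2011\nAnimal12 1020\n' > "$tmpdir/input_file.txt"

# Left-justify the ID in a 10-character field, then print the genotypes.
awk '{printf "%-10s%s\n", $1, $2}' "$tmpdir/input_file.txt" > "$tmpdir/fixed_format_file.txt"

# Everything from character 11 onward should be the genotype string.
genotypes=$(cut -c11- "$tmpdir/fixed_format_file.txt")

echo "$genotypes"
# 2011
# 1020
```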