#!/bin/sh
# How this script should behave:
#
# INPUT: Paths to one or more fasta sequence files
#
# OUTPUT: For each file, it should write a line with the number of sequences
# in the file, a space, and then the file NAME (NOT the path!). After those
# lines, it should write a final line with the total number of sequences
# across all files.
#
# EXAMPLE: In the same directory as this script, you should find an example
# fasta file named 'example-seqs1.fasta', which contains:
#
# >RMB3263_Cyrtodactylus_philippinicus_Negros
# CGGGCCCATACCCCGAAAATGTTGGTATAAACCCCTTCCTATACTAATAAACCCCATTATTTGATCACTATTACTAAC
#
# >CWL052_Cyrtodactylus_philippinicus_Negros
# CGGGCCCATACCCCGAAAATGTTGGTATAAACCCCTTCCTATACTAATAAACCCCATTATTTGATCACTATTACTAAC
#
# If you run this script on this fasta file, you want to get the
# following output:
#
# $ sh count-fasta-seqs.sh example-seqs1.fasta
# 2 example-seqs1.fasta
# 2
#
# There should be another example fasta file named
# 'example-seqs2.fasta', which contains:
#
# >RMB7155_Sphenomorphus_arborens_Negros
# ATGAACCCCATTATAACCTCCCTCATTTTATCAAGCCTGGCCCTTGGAACCGTAATCACACTAACAAGCTACCACTGA
#
# >RMB7156_Sphenomorphus_arborens_Negros
# ATGAACCCCATTATAACCTCCCTCATTTTATCAAGCCTGGCCCTTGGAACCGTAATCACACTAACAAGCTACCACTGA
#
# >RMB7163_Sphenomorphus_arborens_Negros
# ATGAACCCCATTATAACCTCCCTCATTTTATCAAGCCTGGCCCTTGGAACCGTAATCACACTAACAAGCTACCACTGA
#
# If you run this script on BOTH fasta files, you want to get the
# following output:
#
# $ sh count-fasta-seqs.sh example-seqs1.fasta example-seqs2.fasta
# 2 example-seqs1.fasta
# 3 example-seqs2.fasta
# 5
#
#
# Your goal is to work collaboratively with about 3 other people to edit this
# script until it passes all the tests that have been written for it. That is,
# you should be able to run:
#
# $ sh run_tests.sh
#
# and see 'All tests passed!' at the bottom of the output.
#
# To do this, one member of your group should fork this repository and add the
# other members as collaborators, so that all members of the team can pull
# changes from, and push changes to, the collaborative remote repository on
# GitHub.
#
# HINTS
# The first thing you need to be able to do is access the paths to the fasta
# files that were 'given to' this script. The variable "$@" will be very useful
# for this. Let's take a look at what it gives us:
echo "$@"
# How are you going to work with each file path?
# HINT: for loop (remember "for do done"?)
#
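# A minimal sketch of the "for do done" pattern mentioned above. The function
# name 'list_paths_sketch' and the variable 'fasta_path' are hypothetical (they
# are not part of the assignment). The function is only defined here, never
# called, so it does not change what this script prints.
list_paths_sketch() {
    # "$@" expands to one word per argument, so paths with spaces stay intact.
    for fasta_path in "$@"
    do
        echo "path: $fasta_path"
    done
}
# If you call it yourself, for example:
#
# $ list_paths_sketch a.fasta "dir with spaces/b.fasta"
#
# it should print one 'path: ...' line per argument.
#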
# To get the name of each file from the path, check out the command 'basename':
#
# $ man basename
#
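# As a hedged illustration of 'basename': the helper below (the name
# 'file_name_sketch' is hypothetical) strips the leading directories from a
# path. It is defined but never called, so it adds no output to this script.
file_name_sketch() {
    # 'basename' prints the last component of the path it is given.
    basename "$1"
}
# For example, 'file_name_sketch data/example-seqs1.fasta' should print
# 'example-seqs1.fasta'.
#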
# To count the number of sequences in each file, I recommend you check out
# 'grep' and 'wc':
#
# $ man grep
# $ man wc
#
# WARNING about 'grep': ALWAYS quote the string that you are trying to find!
# For example, do:
#
# $ grep "string I want to find" file-i-want-to-find-it-in.txt
# **NOT**
# $ grep string I want to find file-i-want-to-find-it-in.txt # DON'T DO THIS!
#
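# One possible way to combine 'grep' and 'wc', sketched under the assumption
# that every sequence in a fasta file has exactly one header line starting
# with '>'. The name 'count_seqs_sketch' is hypothetical, and the function is
# defined but never called here.
count_seqs_sketch() {
    # Keep only the header lines, count them, and strip the padding spaces
    # that some versions of 'wc' put in front of the number.
    grep "^>" "$1" | wc -l | tr -d ' '
}
# Note that the pattern is quoted, as the warning above recommends. An unquoted
# '>' would be treated by the shell as an output redirection.
#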
# To keep a tally of the total number of sequences across all files, 'expr'
# might be useful:
#
# $ man expr
#
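# A small sketch of keeping a tally with 'expr'. The name 'add_counts_sketch'
# is hypothetical, and the function is defined but never called here.
add_counts_sketch() {
    # 'expr' needs the operands and the operator as separate arguments, so the
    # spaces around '+' matter.
    expr "$1" + "$2"
}
# A running total could then be updated with something like:
#
# $ total=$(add_counts_sketch "$total" "$count")
#
# Be aware that 'expr' returns a non-zero exit status when the result is 0,
# even though it still prints '0'.
#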
# Good luck!
#
# ADD YOUR CODE BELOW: