Commit

remove protein from filter stitle list
- Since we added "protein" as an uninformative word, it makes sense to remove it from the filter lists. Otherwise it would be both treated as an uninformative word (minimally scored) and filtered out.
coeit committed Jul 28, 2022
1 parent 1955fe8 commit 85e147e
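
To illustrate the conflict described in the commit message, here is a minimal Python sketch. It assumes the filter regexes are applied by deleting every match from a subject title (stitle); the actual filtering code is not part of this commit, so `apply_filters` and the example title are hypothetical. Only three patterns from the lists are used.

```python
import re

# Three patterns taken from the filter lists in this diff; the real lists are longer.
FILTERS_BEFORE = [r"(?i)\bcontig\b", r"(?i)\brelated\b", r"(?i)\bprotein\b"]
FILTERS_AFTER = [r"(?i)\bcontig\b", r"(?i)\brelated\b"]


def apply_filters(stitle, patterns):
    """Hypothetical helper: delete every regex match, then collapse whitespace."""
    for pattern in patterns:
        stitle = re.sub(pattern, "", stitle)
    return " ".join(stitle.split())


stitle = "Heat shock protein related"  # made-up example title
before = apply_filters(stitle, FILTERS_BEFORE)
after = apply_filters(stitle, FILTERS_AFTER)
print(before)  # "protein" is stripped before any scoring could see it
print(after)   # "protein" survives and can be scored as an uninformative word
```

Under this assumption, once the `(?i)\bprotein\b` filter is gone, the word reaches the scoring step, where the uninformative-word list can give it a minimal score.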
Showing 3 changed files with 1 addition and 4 deletions.
1 change: 0 additions & 1 deletion misc/filter_stitle_regexs.txt
@@ -17,5 +17,4 @@ IPR.*
(?i)\bcontig\b
(?i)\brelated\b
(?i)\bremark\b
-(?i)\bprotein\b
(?i)\b\w?orf(\w?|\d+)\b
1 change: 0 additions & 1 deletion misc/filter_stitle_regexs_NCBI_NR.txt
@@ -16,5 +16,4 @@ IPR.*
(?i)\bcontig\b
(?i)\brelated\b
(?i)\bremark\b
-(?i)\bprotein\b
(?i)\b\w?orf(\w?|\d+)\b
3 changes: 1 addition & 2 deletions misc/filter_stitle_regexs_UniRef.txt
@@ -3,7 +3,7 @@
(?i)^H0.*protein
(?i)contains.*
IPR.*
-\w{2,}\d{1,2}[gGmMcC]\d+(\.\d+)*
+\w{2,}\d{1,2}[gGmMcC]\d+(\.\d+)*[a-zA-Z]?
\b\[.*
^(\s|/|\(|\)|-|\+|\*|,|;|\.|:|\||\d)+$
(?i)\bunknown\b
@@ -16,5 +16,4 @@ IPR.*
(?i)\bcontig\b
(?i)\brelated\b
(?i)\bremark\b
-(?i)\bprotein\b
(?i)\b\w?orf(\w?|\d+)\b
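
The UniRef file also extends the locus-identifier pattern with an optional trailing letter (`[a-zA-Z]?`). The sketch below compares the old and new patterns in Python, again assuming matches are deleted from the title. The identifier `AT1G01010a` is a made-up illustration, not taken from the commit.

```python
import re

OLD = r"\w{2,}\d{1,2}[gGmMcC]\d+(\.\d+)*"
NEW = r"\w{2,}\d{1,2}[gGmMcC]\d+(\.\d+)*[a-zA-Z]?"

title = "AT1G01010a protein"  # hypothetical identifier with a letter suffix

old_result = re.sub(OLD, "", title)
new_result = re.sub(NEW, "", title)
print(repr(old_result))  # the suffix letter is left behind as a stray token
print(repr(new_result))  # the whole identifier, suffix included, is removed
```

With the old pattern the match stops before the suffix letter, so a single-letter fragment remains in the title; the new pattern consumes it together with the identifier.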
