This repository has been archived by the owner on Feb 16, 2019. It is now read-only.
Besides the changes we discussed in replaceOrgWithAbbrev.py, other files use organism names in their output or input.
In src/makeCoreClusterAnalysisTree.py, both the input and the output use sanitized organism names:
"The input MUST be a Newick file with organism IDs REPLACED with their names"
"WARNING: Organism name %s in the database was not found in the provided tree. It will be deleted!!\n" %(collist[ii]))
The description and the header comment in this file conflict about the function of the script:
src/db_getBlastResultsBetweenSpecificGenes.py
description = "Given list of genes to match, returns a list of BLAST results between genes in the list only"
Provide a list of organisms to match [can match any portion of the organism so if you give it just "mazei" it will return to you a list of Methanosarcina mazei]
I think this comes from duplication between these scripts:
src/db_getBlastResultsBetweenSpecificGenes.py src/db_getBlastResultsBetweenSpecificOrganisms.py
Other scripts to check for use of the organism name or ID:
db_findClustersByOrganismList.py
db_getOrganismsInClusterRun.py
db_getOrganismsInCluster.py
db_addOrganismNameToTable.py
db_bidirectionalBestHits.py
db_TBlastN_wrapper.py
We discussed keeping the library functions, but another way to find the dependencies is to see what calls these library functions:
lib/TreeFuncs.py: '''Parse a node name into an organism ID.
lib/ClusterFuncs.py: Given an organism name, return the ID for that organism name.
lib/CoreGeneFunctions.py: The return object is a list of (runid, clusterid, organism) tuples sorted by run ID then by cluster ID.'''
lib/CoreGeneFunctions.py:def findGenesByOrganismList(orglist
lib/CoreGeneFunctions.py: The organisms in "orglist" are considered the "ingroup"
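One way to carry out the "see what calls these library functions" check is a small scan over the source tree. The following is only a rough sketch, not something from the repository: `find_callers` is a hypothetical helper, and it assumes callers are plain .py files that invoke the function by name (it would miss aliased imports and dynamic calls).

```python
import re
from pathlib import Path


def find_callers(root, function_names):
    """Map each function name to the .py files under root that call it.

    Assumption: a "call" is the bare name followed by "(" on a line that
    is not itself a "def" line. Aliased or dynamic calls are not detected.
    """
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\s*\(")
        for name in function_names
    }
    callers = {name: [] for name in function_names}
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(errors="replace").splitlines()
        for name, pattern in patterns.items():
            for line in lines:
                # Skip the definition itself; we only want call sites.
                if line.lstrip().startswith("def "):
                    continue
                if pattern.search(line):
                    callers[name].append(str(path))
                    break  # one hit per file is enough
    return callers


if __name__ == "__main__":
    # findGenesByOrganismList is the function named above; "." is a placeholder root.
    for name, files in find_callers(".", ["findGenesByOrganismList"]).items():
        print(name, files)
```

Running this from the repository root would list every script that needs a second look for that function; other function names from lib/ can be added to the list in the same way.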
This is a mess but I'll take this suggestion based on our discussion. Then we will just have one script that converts to user-readable IDs at the end, correct? (Also, when we're building figures we should have the option to do that automatically since they're made to be looked at and not computed on)
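As a rough illustration of what a single "convert to user-readable IDs at the end" step could look like, here is a minimal sketch. It is not the existing replaceOrgWithAbbrev.py: the function names, the ID format (taxon-style IDs such as "83333.1"), the example mapping, and the sanitizing rule (non-alphanumeric runs become underscores) are all assumptions made for the example.

```python
import re


def sanitize(name):
    """Assumed sanitizing rule: collapse non-alphanumeric runs to "_"."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def replace_ids_with_names(text, id_to_name):
    """Replace whole-token organism IDs in text with sanitized names.

    IDs are matched only when not embedded in a longer token, so
    "83333.1" is left alone inside "83333.10".
    """
    if not id_to_name:
        return text
    alternatives = "|".join(re.escape(org_id) for org_id in id_to_name)
    pattern = re.compile(r"(?<![\w.])(?:" + alternatives + r")(?![\w.])")
    return pattern.sub(lambda m: sanitize(id_to_name[m.group(0)]), text)


if __name__ == "__main__":
    # Illustrative mapping and Newick string, not taken from the database.
    names = {
        "83333.1": "Escherichia coli K-12",
        "192952.1": "Methanosarcina mazei Go1",
    }
    tree = "(83333.1:0.1,(192952.1:0.2,83333.10:0.3):0.05);"
    print(replace_ids_with_names(tree, names))
```

Keeping IDs everywhere else and applying a step like this only to final output (including figures) would mean the intermediate scripts never need to parse organism names.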
That sounds good to me. I'll commit my multi-format parser /lib/ function if it is not already in the pull request.
James H
On 05/29/2013 02:52 PM, mattb112885 wrote:
This is a mess but I'll take this suggestion based on our discussion.
Then we will just have one script that converts to user-readable IDs
at the end, correct? (Also, when we're building figures we should have
the option to do that automatically since they're made to be looked at
and not computed on)