Unable to build clustering annotation from the command line #207

ycaspi257 · 2024-09-05T04:43:53Z

Hello,
I was trying to build an Autoannotate clustering from the command line using a command:
autoannotate annotate-clusterBoosted clusterAlgorithm=MCL labelColumn=EnrichmentMap::GS_DESCR maxWords=3 network=current edgeWeightColumn=name

However, I get an error message:
Cannot invoke "org.baderlab.autoannotate.internal.model.AnnotationSetBuilder.getClusters()" because "this.builder" is null

Clustering using the Cytoscape AutoAnnotate menu works just fine. Only the command line produces the error message. In addition, if I increase the similarity cutoff of the network so that fewer edges are formed, clustering works perfectly well from both the command line and the Cytoscape AutoAnnotate menu.

What can be the source of the problem?

Best,
Yaron Caspi

mikekucera · 2024-09-05T16:08:39Z

What version of AutoAnnotate are you using?

Can you please send me your framework-cytoscape.log file, found in the <user-home>/CytoscapeConfiguration/3 folder? That should contain the entire exception trace. And if possible, please send me your session file.

Thanks!

ycaspi257 · 2024-09-06T04:54:03Z

Dear Mike,
Thank you very much for your prompt reply. The files you requested are attached. I am using AutoAnnotate v1.4.1 with Cytoscape 3.10.2 and Java 10.0.12 on Ubuntu 20.04. You can see the problem, e.g., in the network "Left_Hemisphere_fMRI_NQ-EF". The command I was using is:
autoannotate annotate-clusterBoosted clusterAlgorithm=MCL labelColumn=EnrichmentMap::GS_DESCR maxWords=3 network=current

Looking forward to your further help.

Best,
Yaron Caspi

BTW, it was very hard, or even impossible, to find in the documentation which values other than MCL are accepted for clusterAlgorithm.

mikekucera · 2024-09-06T19:13:18Z

Hi, It looks like GitHub didn't attach your files. Can you please send them to me directly at [email protected]. Thanks.

mikekucera · 2024-09-11T14:18:02Z

Hi, there are two things that should help here...

1. Try updating AutoAnnotate to the latest version (currently 1.5.1). I don't get the same error with the latest version.
2. You must use a numeric column for the edgeWeightColumn attribute. Using the 'name' column, which has type String, causes an error in clusterMaker. Try edgeWeightColumn=EnrichmentMap::similarity_coefficient
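
The numeric-column requirement can be checked in R before the command is sent. This is a minimal sketch, assuming the edge table is already available as a data frame (for instance fetched through RCy3); the toy table and the helper name build_annotate_command are illustrative and not part of AutoAnnotate or RCy3:

```r
# Toy stand-in for an edge table; in a real session this could be fetched
# with RCy3 (an assumption, this step is not shown in the thread).
edge_table <- data.frame(
  name = c("A (overlap) B", "B (overlap) C"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
edge_table[["EnrichmentMap::similarity_coefficient"]] <- c(0.42, 0.61)

# Hypothetical helper: reject missing or non-numeric edge weight columns,
# then build the command string with the parameters from the original report.
build_annotate_command <- function(edge_table, weight_column) {
  if (!weight_column %in% names(edge_table)) {
    stop("Column not found in edge table: ", weight_column)
  }
  if (!is.numeric(edge_table[[weight_column]])) {
    stop("edgeWeightColumn must be numeric, but '", weight_column,
         "' has type ", class(edge_table[[weight_column]])[1])
  }
  paste0(
    "autoannotate annotate-clusterBoosted clusterAlgorithm=MCL ",
    "labelColumn=EnrichmentMap::GS_DESCR maxWords=3 network=current ",
    "edgeWeightColumn=", weight_column
  )
}

cmd <- build_annotate_command(edge_table, "EnrichmentMap::similarity_coefficient")
print(cmd)
```

Calling build_annotate_command(edge_table, "name") stops with an error instead of building a command that clusterMaker would reject. The resulting string can then be sent with RCy3's commandsGET(cmd).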

ycaspi257 · 2024-09-12T12:32:45Z

Dear Mike,
Thank you so much. After updating to version 1.5.1, it indeed seems to work.

Two more unrelated questions:
A. Is there a simple command to get the list of clusters and the number of nodes each includes (like the menu item used to export clusters to a file)?
B. Is there a way to add words to the "excluded words" list permanently? I mean, is there a file or something similar that I can edit to add several words for good?

Best,
Yaron

risserlin · 2024-09-12T12:54:24Z

Hi Yaron,
I know you are running commands but are you running this through R or python?

If you are running commands through R or Python: with regard to your first question, there isn't a simple command to get the info, but what I usually do after autoannotating the network is get the node table (I use RCy3 from R and its getTableColumns function):
default_node_table <- getTableColumns(table = "node", network = network_suid)

With that table you can use the column __mclCluster to get the number of nodes in each cluster and their names.
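
Counting nodes per cluster from that table takes a couple of lines of base R. The sketch below uses a toy data frame in place of the real getTableColumns() result; the node names and cluster numbers are made up, and treating NA as "not in any cluster" is an assumption:

```r
# Toy stand-in for the table returned by getTableColumns(table = "node", ...).
default_node_table <- data.frame(
  name = c("GS_A", "GS_B", "GS_C", "GS_D", "GS_E"),
  stringsAsFactors = FALSE
)
default_node_table[["__mclCluster"]] <- c(1, 1, 2, NA, 2)

# Number of nodes in each cluster (table() drops NA values by default).
cluster_sizes <- table(default_node_table[["__mclCluster"]])

# Node names grouped by cluster id (split() also drops the NA group).
cluster_members <- split(default_node_table$name,
                         default_node_table[["__mclCluster"]])

print(cluster_sizes)
print(cluster_members)
```

With the real node table, the same two calls give the cluster sizes and the member names without any extra packages.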

With regard to adding words to the exclusion list permanently: in WordCloud there is a mechanism to add words to the list, and I believe that it gets stored and reloaded, but I prefer to run the following command prior to annotating:
wordcloud ignore add value="wordtoignore" network=SUID:1234

Embedded in one of my R workflows I have:
# add the set of words to ignore
words2ignore <- c("pid", 1:10)
responses <- lapply(words2ignore, function(x) {
  # one 'wordcloud ignore add' command per word; inner quotes are escaped
  wordcloud2_url <- paste("wordcloud ignore add value=\"", x, "\" ",
                          "network=SUID:", network_suid, sep = "")
  commandsGET(wordcloud2_url)
})

Thanks,
Ruth

ycaspi257 · 2024-09-12T13:08:30Z

Dear Ruth,
Thank you so much. I use R. When doing it manually (at least for AutoAnnotate), I did not find a mechanism to store the excluded words. This is why I thought there might be an excluded-words file somewhere that I could just edit. I was mainly interested in adding excluded words to the AutoAnnotate clustering algorithm and not to WordCloud (to get the cluster labels to fit my purposes).

Thanks again.
Best,
Yaron

risserlin · 2024-09-12T13:25:39Z

Hi Yaron,
Autoannotate uses wordcloud to compute the labels so if you want to exclude words you have to make the change in word cloud.
There is a file in the WordCloud jar (which you can find in your CytoscapeConfiguration/3/apps/installed directory) called FlaggedWords.txt that you can add words to.

You would need to run the following commands to do it. (This is very hacky, sorry)

mv WordCloud-v3.1.4.jar WordCloud-v3.1.4.zip

create a FlaggedWords.txt file which looks like this:
kegg
reactome
react
biocarta
go
nci
msigdb
my_new_word1
my_new_word2

And then run:
zip -u WordCloud-v3.1.4.zip FlaggedWords.txt

mv WordCloud-v3.1.4.zip WordCloud-v3.1.4.jar

Alternatively, depending on the words, you can ask @mikekucera to add them to the distribution, but words are often very specific to the dataset or data sources you are using, so we try to avoid that.

Thanks,
Ruth

ycaspi257 · 2024-09-12T13:33:45Z

Dear Ruth,
Thanks again. I will follow these instructions. I was mainly referring to the Gene Ontology collection tags in the pathway names, namely GOCC, GOMF and GOBP. When working with GSEA, GSEA adds these to the node names. Hence, when doing the clustering, there is a bias toward these words in the cluster names. It might be reasonable to exclude these words (or give an option to exclude these and similar words that GSEA adds) in future distributions, since they are relatively general and not specific.

Best,
Yaron

risserlin · 2024-09-12T13:46:06Z

Hi Yaron,
Which geneset files are you using? Are you using the one supplied by GSEA? (word cloud weights the words based on occurrence in the network so if GOBP and GOMF are everywhere they shouldn't be coming up in the cluster tag). I don't see them coming up in my networks but I use the baderlab genesets and not the ones supplied with GSEA so I am curious if there is an issue.
Thanks,
Ruth

mikekucera · 2024-09-12T17:12:44Z

There is no global list of excluded words you can edit. The only way to do it is to modify the default list of words stored in the app jar, as Ruth suggested. Excluded words are saved in the session file and can only be set on a per-network basis. If you are using R, then the easiest thing to do is to have a series of commands of the form wordcloud ignore add value="wordtoignore" network=current in your script before the command that creates the annotations.

ycaspi257 · 2024-09-13T16:26:58Z

Dear Ruth,
I am using C5.all.v2024.1.Hs.symbols.gmt, which is distributed with GSEA. That results in EnrichmentMap GS_DESCR values like https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_ELECTRON_TRANSPORT_CHAIN and EnrichmentMap node names like GOBP_ELECTRON_TRANSPORT_CHAIN. AutoAnnotate then picks these up, so the labels include words such as GOBP. Naturally, these can be removed with a Python/R script, but doing it manually is cumbersome.

Best,
Yaron
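
The scripted clean-up mentioned above could look roughly like the following in R. It is only a sketch: the function name strip_go_prefix is made up, the prefix list covers just the three collections named in this thread, the example names are illustrative, and where to apply it (for example to the gene set names before building the EnrichmentMap) is left open:

```r
# Hypothetical helper: drop a leading GOBP_/GOCC_/GOMF_ collection tag
# from GSEA-style gene set names; other names are returned unchanged.
strip_go_prefix <- function(gene_set_names) {
  sub("^GO(BP|CC|MF)_", "", gene_set_names)
}

example_names <- c("GOBP_ELECTRON_TRANSPORT_CHAIN",
                   "GOCC_MITOCHONDRIAL_MATRIX",
                   "KEGG_RIBOSOME")
print(strip_go_prefix(example_names))
```

Because the pattern is anchored at the start of the name, a tag appearing later in a name is left alone.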

risserlin · 2024-09-13T17:14:30Z

Hi Yaron,
OK, that makes sense. I forgot that this is the way GSEA structures their gmt files. EM and AA are optimized for our gmt files, which structure the name and description a little differently. I would recommend switching to them if you can. They are updated on a monthly basis, so they are more up to date than the ones released by GSEA - https://download.baderlab.org/EM_Genesets/current_release/ - (info here - https://baderlab.org/GeneSets)
Only caveat is they are only available for Human, Mouse, Rat and Woodchuck.
Thanks,
Ruth

mikekucera self-assigned this Sep 5, 2024

mikekucera added a commit that referenced this issue Oct 7, 2024

Warn about using non-numeric column for 'edgeWeightColumn'

c5cd896

refs #207
