This repository contains a computational project on recognizing so-called off-sample images in imaging mass spectrometry data. The project is carried out by the Alexandrov team at EMBL Heidelberg. We used public data from METASPACE to create a gold standard set of ion images, and developed and evaluated several methods for recognizing off-sample ion images.
For more information, please see our recent paper Ovchinnikova et al. (2020) BMC Bioinformatics.
Team:
- Katja Ovchinnikova: biclustering and molecular co-localization method development, gold standard preparation
- Vitaly Kovalev: deep learning method development
- Lachlan Stuart: development of the TagOff web app
- Theodore Alexandrov: supervision, gold standard preparation
We used public datasets from METASPACE, a community-populated knowledge base of metabolite images. Please see the Acknowledgements section, which lists the contributors of the data used.
TagOff was rapidly prototyped using the METASPACE codebase as a foundation, allowing its back-end, image display and annotation filtering to be reused. The TagOff-specific changes can be found in this commit range.
It can be run by starting the METASPACE webapp,
then navigating to http://localhost:8999/#/imageclassifier?db=HMDB-v4&user=your_name&max=10000&ds=2016-12-07_07h59m24s.
The querystring of the URL encodes the filter criteria used to select the annotations.
New criteria can be created and copied from the Annotations page of METASPACE.
Two other parameters exist: max and user. The max parameter limits the number of annotations shown, and the user parameter accepts a name that is added to the image labels, allowing multiple people to independently label the same image.
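The URL described above can also be assembled programmatically. The following is a minimal sketch, assuming the filter criteria are passed as plain key/value pairs after the `?`, exactly as in the example URL. The helper name `build_tagoff_url` is hypothetical and is not part of the METASPACE codebase.

```python
from urllib.parse import urlencode

# Address of a locally running METASPACE webapp, taken from the example URL above.
BASE_URL = "http://localhost:8999/#/imageclassifier"


def build_tagoff_url(filters, user, max_annotations=10000, base_url=BASE_URL):
    """Return a TagOff URL (hypothetical helper, not part of METASPACE).

    filters: dict of annotation filter criteria, e.g. {"db": ..., "ds": ...}
    user: name that is stored with each image label
    max_annotations: upper limit on the number of annotations shown
    """
    params = dict(filters)           # copy so the caller's dict is not modified
    params["user"] = user
    params["max"] = max_annotations
    return f"{base_url}?{urlencode(params)}"


url = build_tagoff_url(
    {"db": "HMDB-v4", "ds": "2016-12-07_07h59m24s"}, user="your_name"
)
print(url)
```

Calling the helper with a different `user` value for each annotator gives every person their own link to the same set of annotations.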
After annotations have been made, the data can be exported with:
sqlite3 -header -csv ./metaspace/webapp/imageclassification.sqlite "select * from imageclassifications" > ./metaspace/webapp/dist/results.csv
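If the `sqlite3` command-line tool is not available, the same export can be done from Python. This is a rough equivalent of the command above, not part of the TagOff code; the function name `export_table_to_csv` is an assumption, and only the table name `imageclassifications` comes from the command itself.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, csv_path, table="imageclassifications"):
    """Write every row of `table` to `csv_path`, preceded by a header row.

    Returns the number of data rows written.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [column[0] for column in cursor.description]
        count = 0
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            for row in cursor:
                writer.writerow(row)
                count += 1
        return count
    finally:
        conn.close()


# Example call with the paths used in the command above:
# export_table_to_csv(
#     "./metaspace/webapp/imageclassification.sqlite",
#     "./metaspace/webapp/dist/results.csv",
# )
```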
Copy and paste the commands below into a terminal.
To download and unpack an archive with the images:
wget -qO - https://github.com/metaspace2020/offsample/releases/download/0.2/GS.tar.gz | tar -xvz
To download and unpack an archive with the images grouped by predicted class, together with the predicted probabilities:
wget -qO - https://github.com/metaspace2020/offsample/releases/download/0.2/gs_predictions.tar.gz | tar -xvz
wget -qO - https://s3-eu-west-1.amazonaws.com/sm-off-sample/pixel-annot-export-v0.10.tar.gz | tar -xvz
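On systems without `wget`, the archives above can be fetched and unpacked with the Python standard library. The sketch below mirrors the `wget -qO - URL | tar -xz` pipeline by streaming the download straight into the extractor. The function names are illustrative, and the archives are assumed to be trusted.

```python
import tarfile
import urllib.request


def unpack_tar_gz(fileobj, dest="."):
    """Extract a gzip-compressed tar stream read from `fileobj` into `dest`."""
    # Mode "r|gz" reads the archive as a forward-only stream, like tar on a pipe.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data" rejects
            # unsafe members such as absolute paths.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)


def download_and_unpack(url, dest="."):
    """Download `url` and unpack it on the fly."""
    with urllib.request.urlopen(url) as response:
        unpack_tar_gz(response, dest)


# Example call (performs a network download, so it is left commented out):
# download_and_unpack(
#     "https://github.com/metaspace2020/offsample/releases/download/0.2/GS.tar.gz"
# )
```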
We trained convolutional neural networks (CNNs) using the fastai and PyTorch libraries. The best performance was achieved with a ResNet-50 CNN pretrained on ImageNet. More details here
The model was wrapped into a web service and deployed as part of METASPACE. The service implementation is available on GitHub.
We generated DHB matrix clusters following Keller and Li (2000). This resulted in 353 molecular formulas, available here.
We are planning to integrate the best methods into https://metaspace2020.eu.
Would you like to cite this project in a scientific publication? Please cite Ovchinnikova et al. (2020) BMC Bioinformatics.
We thank the contributors of all public data to METASPACE and particularly those whose data was selected for the gold standard: Sarah Aboulmagd, Michael Becker, Dhaka Bhandari, Mark Bokhart, Berin Boughton, Shane Ellis, Mathieu Gaudin, Erin Gemperline, Cristina Gonzalez Lopez, Richard Goodwin, Anne Mette Handler, Bram Heijs, Sophie Jacobsen, Christian Janfelt, Emrys Jones, Patrik Kadesch, Pegah Khamehgir-Silz, Mario Kompauer, Lingjun Li, Manuel Liebeke, Michael Linscheid, James McKenzie, David Muddiman, Andrew Palmer, József Pánczél, Marina Reuter, Livia S. Eberlin, Veronika Saharuka, Marta Sans, Julian Schneemann, Kumar Sharma, Bernhard Spengler, Nicole Strittmatter, Zoltan Takats, Dusan Velickovic, Eric Weaver, Guanshi Zhang. This work was supported by funding from the EU Horizon 2020 project METASPACE (No. 634402), the NIH NIDDK project KPMP, and the ERC Consolidator project METACELL (No. 773089).
Unless specified otherwise in file headers or LICENSE files present in subdirectories, all files in this repository are licensed under the Apache 2.0 license.