A Go toolkit for finding hot topics in biological and medical field.
Living in an era where findings in science and technology come out every single day, how to catch hot topics in the field of interest from tons of literature becomes an important problem. Also, when stepping into a new research area, we usually need to read some related papers, including classical ones and latest ones, to get a sense of what is it and what contributions can we make to this field. But carefully reading papers one by one requires much time and energy and sometimes we even don't need to know all the information in papers, maaybe some key words are sufficient for us to start.
To address this problem, here I present BioHits, a Go toolkit that can find hot topics for us automatically with just a few lines of code by scraping PubMed iteratures, processing words in title and abstract, and doing some statistical summary.
It originally is a course project for 02601: Programming for Scientists, but I will continue making improvements and updates after finishing this course.
BioHits has two dependencies: colly and wordclouds, which had been zipped into the tar file
tar file
|
└───github.com
│ │
│ └───gocolly
| └───colly
│ │
│ └───psykhi
| └───wordclouds
│
└───BioHits
| |
| └───fonts
| └───stopwords
| │ go.mod
| │ go.sum
| | ...
|
└───test
| │ go.mod
| │ go.sum
| │ main.go
| │ test.exe
Because BioHits is built as a package, it is better if we test it by importing it and its functions in another program.
- Important note 1 : this package needs
go module
. In order to turn ongo module
, try runset GO111MODULE=on
orexport GO111MODULE on
in your terminal. - Important note 2 : because this package needs
go module
, if you run into an error saysCan't find BioHits in xxx directory
, try only copy the BioHits folder toGOROOT/src
(usego env
command to find yourGOROOT
dir). - Important note 3 : DO NOT enter in
BioHits
folder, enter intest
folder instead, and run the program by running./test.exe
(default parameters).
- There are 6 main functions implemented in BioHits, which are all used in test file
main.go
. - There are 12 parameters in total, whose type, description, default values have been listed in
main.go
. Parameters can be changed using-flag value
, such as./test.exe -keyWords gene,cancer,kidney
.
By default, there are three outputs, searchOutput.txt
, wordCloud.png
and topwords.txt
.
searchOutput.txt
: it contains the PMID, title and abstract of papers.wordCloud.png
: this is the word cloud image generated using words extracted from title and abstract of papers.topwords.txt
: this txt file contains top words returned by BioHits.
Xin Wang, [email protected], Carnegie Mellon University
If you have any question, please post an issue at here
Copyright © 2021. Xin Wang. All Rights Reserved