Instructions Creators: Xiaoru Dong, Linh Hoang
Preparation date: 2018-12-14, last updated 2019-04-18
Manuscript working title: Machine classification of inclusion criteria from Cochrane systematic reviews
Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider
These instructions describe the steps needed to replicate the results in the manuscript.
-
Programming Language: Python (version 3)
-
Please make sure that you have the following programs on your machine in order to run the scripts:
- Python 3: https://www.python.org/downloads/
- Jupyter Notebook: http://jupyter.org/install
-
The Python scripts are used to generate features and to create the Weka input files corresponding to the three feature extraction and selection approaches that we implemented in this study:
- Features generated by the bag of words feature extraction strategy.
- Features selected by the information gain feature selection strategy.
- Features selected by a manual analysis feature selection strategy.
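The bag of words strategy turns each inclusion criterion into a binary word-presence vector. A minimal sketch of the idea (the sample criteria below are invented for illustration; this is not the study's actual script):

```python
def bag_of_words(texts):
    """Build a sorted vocabulary and a binary presence vector per text."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    rows = [[1 if w in set(t.lower().split()) else 0 for w in vocab]
            for t in texts]
    return vocab, rows

criteria = ["Randomized controlled trials", "Controlled clinical trials"]
vocab, rows = bag_of_words(criteria)
# vocab → ['clinical', 'controlled', 'randomized', 'trials']
# rows  → [[0, 1, 1, 1], [1, 1, 0, 1]]
```

Each row becomes one instance in the Weka input file, with the words as attributes.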
-
Approach 1: Bag of words feature extraction
Step 1: Download the script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/bag_of_words_feature_extraction.ipynb
-
Step 2: Download the input file “Inclusion_Criteria_Annotation.csv” (one of the study’s data files), which is available at: https://doi.org/10.13012/B2IDB-5958960_V2. Note where you store the file.
-
Step 3: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.
-
Step 4: Run the script to get two output files: "AllWords.csv" and "AllWords_weka_input.arff"
-
Step 5: Use the "AllWords_weka_input.arff" file as the input to run the classification model in Weka (for how to run a classification model in Weka, please read the Weka section below).
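The .arff outputs above are in Weka's plain-text ARFF input format. A minimal sketch of how such a file can be written, with invented feature names and class labels (the actual scripts' column layout may differ):

```python
import os
import tempfile

def write_arff(path, relation, features, labeled_rows, classes):
    """Write a minimal binary-feature ARFF file in the layout Weka expects."""
    with open(path, "w") as f:
        f.write("@relation %s\n\n" % relation)
        for name in features:                      # one binary attribute per word
            f.write("@attribute %s {0,1}\n" % name)
        f.write("@attribute class {%s}\n\n@data\n" % ",".join(classes))
        for values, label in labeled_rows:         # one data row per instance
            f.write(",".join(str(v) for v in values) + "," + label + "\n")

path = os.path.join(tempfile.gettempdir(), "demo_weka_input.arff")
write_arff(path, "inclusion_criteria", ["randomized", "trials"],
           [([1, 1], "included"), ([0, 1], "excluded")],
           ["included", "excluded"])
```

The relation name, feature names, and class values here are hypothetical; the scripts generate them from the annotation data.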
-
Approach 2: Information gain feature selection
Step 1: Download the first script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/generate_no_redundant_Weka_input.ipynb
-
Step 2: Use the input file “Inclusion_Criteria_Annotation.csv” (downloaded in Step 2 of the bag of words approach above). Note where you store the file.
-
Step 3: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.
-
Step 4: Run the script to get two output files: "AllWord_Noredundant.csv" and "AllWord_Noredundant_weka_input.arff"
-
Step 5: Use the "AllWord_Noredundant_weka_input.arff" file as input to run information gain in Weka (for how to run information gain in Weka, please read the Weka section below). After running information gain in Weka, save the list of informative words ("InformativeWords") that Weka outputs to the same folder.
-
Step 6: Download the second script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/information_gain_feature_selection.ipynb
-
Step 7: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.
-
Step 8: Run the script to get two output files: "WordsSelectedByInformationGain.csv" and "WordsSelectedByInformationGain_weka_input.arff"
-
Step 9: Use the "WordsSelectedByInformationGain_weka_input.arff" file as input to run the classification model in Weka (for how to run a classification model in Weka, please read the Weka section below).
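Weka's InfoGainAttributeEval ranks each feature by its information gain with respect to the class. The underlying computation can be sketched as follows (the example feature values and labels are invented):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Information gain of one binary feature with respect to the class labels."""
    gain = entropy(labels)
    n = len(labels)
    for v in set(feature):                       # partition instances by feature value
        sub = [lab for f, lab in zip(feature, labels) if f == v]
        gain -= len(sub) / n * entropy(sub)      # subtract weighted subset entropy
    return gain

labels = ["included", "included", "excluded", "excluded"]
# A word that perfectly separates the classes has maximal gain:
print(info_gain([1, 1, 0, 0], labels))  # → 1.0
# A word unrelated to the classes has zero gain:
print(info_gain([1, 0, 1, 0], labels))  # → 0.0
```

Features with gain above a threshold are the "informative words" that the second script then turns into the reduced Weka input file.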
-
Approach 3: Manual analysis feature selection
Step 1: Download the script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/manual_analysis_feature_selection.ipynb
-
Step 2: Download the input file “WordsSelectedByManualAnalysis.csv” (one of the study’s data files), which is available at: https://doi.org/10.13012/B2IDB-8659314_V1. Note where you store the file.
-
Step 3: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.
-
Step 4: Run the script to get one output file: "WordSelectedbyManualAnalysis_weka_input.arff"
-
Step 5: Use the "WordSelectedbyManualAnalysis_weka_input.arff" file as input to run the classification model in Weka (for how to run a classification model in Weka, please read the Weka section below).
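Conceptually, this approach keeps only the feature columns whose words appear in the manually curated list ("WordsSelectedByManualAnalysis.csv"). A minimal sketch, with an invented vocabulary and word list:

```python
def select_features(vocab, rows, selected_words):
    """Keep only the columns whose feature name is in the selected word list."""
    keep = [i for i, w in enumerate(vocab) if w in set(selected_words)]
    return [vocab[i] for i in keep], [[row[i] for i in keep] for row in rows]

vocab = ["clinical", "controlled", "randomized", "trials"]
rows = [[0, 1, 1, 1], [1, 1, 0, 1]]
new_vocab, new_rows = select_features(vocab, rows, ["randomized", "trials"])
# new_vocab → ['randomized', 'trials']
# new_rows  → [[1, 1], [0, 1]]
```

The reduced matrix is then written out as the ARFF input for Weka, as in the other approaches.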
-
Running classifiers and information gain in Weka:
- Please make sure that you have Weka on your machine in order to run the classifiers: https://www.cs.waikato.ac.nz/ml/weka/downloading.html
- Step 1: Open Weka on your machine, select “Explorer” mode.
- Step 2: On the “Preprocess” tab:
--> Click “Open file” and select the Weka input file that you want to run classification on. For example: if you want to train a classifier with all features, select the “AllWords_weka_input.arff” Weka input file as shown in the screenshot below.
--> Click “All” to choose all of the words and use them as features to train the classifier as shown in the screenshot below.
- Step 3: On the “Classify” tab:
--> Click “Choose” to select the algorithm that you want to run. NOTE: We used three algorithms: Random Forest, J48 and Naïve Bayes. For example: if you want to run a classifier using the “Random Forest” algorithm, select RandomForest as shown in the screenshot below:
--> Click “Percentage split” in the “Test options” panel and set it to 90% (i.e., 90% of the data set is used for training and 10% for testing).
--> Click “More Options...” and set the seed to 3.
--> Click “Start” to run the classifier:
- Step 4: Get the classifier results. Three measurements were reported in our manuscript: Precision, Recall and F-Measure as shown in the screenshot below.
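The three measurements reported (Precision, Recall, F-Measure) are standard per-class quantities computed from the confusion matrix. A sketch of the computation, with invented labels (Weka reports these per class and weighted-averaged):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class, from paired label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["included", "included", "excluded", "excluded"]
y_pred = ["included", "excluded", "included", "excluded"]
print(precision_recall_f1(y_true, y_pred, "included"))  # → (0.5, 0.5, 0.5)
```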
- We also used Weka to run Information Gain feature selection. To do so:
--> On the “Preprocess” tab: Click “Open file” and select the Weka input file AllWord_Noredundant_weka_input.arff
--> On the “Select attributes” tab:
Click “Choose” and select “InfoGainAttributeEval” as shown in the screenshot below.
Click “Start” to run the Information Gain feature selection.
--> Weka generated a list of informative words selected by the Information Gain feature selection strategy. We then used the Python script (above) to generate the data file “WordsSelectedByInformationGain.csv” and the Weka input file “WordsSelectedByInformationGain_weka_input.arff” accordingly.
For any questions about these instructions, please contact:
Linh Hoang - [email protected].