code for Machine classification of inclusion criteria from Cochrane systematic reviews

infoqualitylab/InclusionCriteria

InclusionCriteria

Instructions Creators: Xiaoru Dong, Linh Hoang
Preparation date: 2018-12-14, last updated 2019-04-18
Manuscript working title: Machine classification of inclusion criteria from Cochrane systematic reviews
Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider

INSTRUCTIONS

Description:

These instructions describe the steps needed to replicate the results in the manuscript.

1. Python program:

  • Programming Language: Python (version 3.0)

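  • Please make sure that the programs used in the steps below are installed on your machine before running the scripts: Jupyter Notebook (to open and run the .ipynb scripts) and Weka (to run information gain and the classifiers).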

  • The Python scripts are used to generate features and to create the Weka input files corresponding to the 3 feature extraction and selection approaches that we implemented in this study:

    • Features generated by the bag of words feature extraction strategy.
    • Features selected by the information gain feature selection strategy.
    • Features selected by a manual analysis feature selection strategy.

Python script to generate features by the bag of words feature extraction approach:

  • Step 1: Download the script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/bag_of_words_feature_extraction.ipynb

  • Step 2: Download the input file “Inclusion_Criteria_Annotation.csv” (one of the study’s data files), which is available at: https://doi.org/10.13012/B2IDB-5958960_V2. Note where you store the file.

  • Step 3: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.

  • Step 4: Run the script to get two output files: "AllWords.csv" and "AllWords_weka_input.arff"

  • Step 5: Use "AllWords_weka_input.arff" as the input file for the classification model in Weka (see the classification instructions in the Weka section below).
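
The bag of words step above can be illustrated with a short sketch. This is not the repository's notebook; it is a minimal standard-library illustration that assumes each criterion is a plain sentence with one class label. The function names (`tokenize`, `bag_of_words`, `to_arff`), the `word_` attribute prefix, and the example sentences and labels are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(sentences):
    """Return (vocabulary, rows); each row holds one count per vocabulary word."""
    counts = [Counter(tokenize(s)) for s in sentences]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


def to_arff(relation, vocabulary, rows, labels):
    """Render the count matrix as ARFF text with a nominal class attribute."""
    classes = sorted(set(labels))
    lines = [f"@relation {relation}", ""]
    # The "word_" prefix (an assumption of this sketch) keeps a word such as
    # "class" from colliding with the class attribute name.
    lines += [f"@attribute word_{word} numeric" for word in vocabulary]
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for row, label in zip(rows, labels):
        lines.append(",".join(map(str, row)) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical example data, not taken from the study's data files.
example_sentences = ["Adults with asthma", "Randomized controlled trials of adults"]
example_labels = ["participants", "study_design"]
example_vocabulary, example_rows = bag_of_words(example_sentences)
print(to_arff("AllWords", example_vocabulary, example_rows, example_labels))
```

The printed text has the general shape of a Weka input file: one numeric attribute per word, a nominal class attribute, and one data row per sentence.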

Python script to generate features by the information gain feature selection approach:

  • Step 1: Download the first script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/generate_no_redundant_Weka_input.ipynb

  • Step 3: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.

  • Step 4: Run the script to get two output files: "AllWord_Noredundant.csv" and "AllWord_Noredundant_weka_input.arff"

  • Step 5: Use "AllWord_Noredundant_weka_input.arff" as the input file for information gain in Weka (see the information gain instructions in the Weka section below). After running information gain in Weka, save the "InformativeWords" list from Weka to the same folder.

  • Step 6: Download the second script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/information_gain_feature_selection.ipynb

  • Step 7: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.

  • Step 8: Run the script to get two output files: "WordsSelectedByInformationGain.csv" and "WordsSelectedByInformationGain_weka_input.arff"

  • Step 9: Use "WordsSelectedByInformationGain_weka_input.arff" as the input file for the classification model in Weka (see the classification instructions in the Weka section below).
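
Once Weka has produced the list of informative words, the remaining work presumably amounts to keeping only those words as features. The sketch below illustrates that idea on a small count matrix. It is an assumption about the general technique, not the code in the repository's notebook, and the function name `select_features` and the example data are hypothetical.

```python
def select_features(vocabulary, rows, selected_words):
    """Keep only the columns whose word appears in selected_words."""
    wanted = set(selected_words)
    keep = [i for i, word in enumerate(vocabulary) if word in wanted]
    new_vocabulary = [vocabulary[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_vocabulary, new_rows


# Hypothetical example: four word columns reduced to the two selected words.
full_vocabulary = ["adults", "asthma", "of", "trials"]
full_rows = [[1, 1, 0, 0], [1, 0, 1, 1]]
print(select_features(full_vocabulary, full_rows, ["trials", "asthma"]))
```

The column order of the original vocabulary is preserved, so the reduced matrix stays aligned with the reduced word list.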

Python script to generate features by the manual analysis feature selection approach:

  • Step 1: Download the script: https://github.com/infoqualitylab/InclusionCriteria/blob/master/manual_analysis_feature_selection.ipynb

  • Step 2: Download the input file “WordsSelectedByManualAnalysis.csv” (one of the study’s data files), which is available at: https://doi.org/10.13012/B2IDB-8659314_V1. Note where you store the file.

  • Step 3: Open the script in Jupyter Notebook. Change the “path” variable in the script to the path of your own folder where you stored the input file.

  • Step 4: Run the script to get one output file: "WordSelectedbyManualAnalysis_weka_input.arff"

  • Step 5: Use "WordSelectedbyManualAnalysis_weka_input.arff" as the input file for the classification model in Weka (see the classification instructions in the Weka section below).

2. Weka program:

Run classification in Weka:

  • Step 1: Open Weka on your machine, select “Explorer” mode.
  • Step 2: On the “Preprocess” tab:
    --> Click “Open file” and select the Weka input file that you want to classify. For example, to train a classifier with all features, select the “AllWords_weka_input.arff” Weka input file (screenshot 1).
    [Screenshot 1]
    --> Click “All” to select all of the words and use them as features to train the classifier (screenshot 2).
    [Screenshot 2]
  • Step 3: On the “Classify” tab:
    --> Click “Choose” to select the algorithm that you want to run. NOTE: We used three algorithms: Random Forest, J48 and Naïve Bayes. For example, to run a classifier using the Random Forest algorithm, select “RandomForest” (screenshot 3).
    [Screenshot 3]
    --> Select “Percentage split” under “Test options” and enter 90 (this uses 90% of the data set for training and the remaining 10% for testing).
    --> Click “More Options...” and set the seed to 3.
    --> Click “Start” to run the classifier (screenshot 4).
    [Screenshot 4]
  • Step 4: Get the classifier results. Three measurements were reported in our manuscript: Precision, Recall and F-Measure (screenshot 5).
    [Screenshot 5]
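
For readers who want to see what the percentage split and the three reported measurements mean, here is a minimal standard-library sketch. It is an illustration only: Python's shuffle does not reproduce Weka's random split even with the same seed, so it will not regenerate the manuscript's numbers. The function names `percentage_split` and `precision_recall_f` are hypothetical, and the measurements are computed for a single class treated as positive.

```python
import random


def percentage_split(instances, train_percent=90, seed=3):
    """Shuffle with a fixed seed, then split into training and test parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f(actual, predicted, positive):
    """Return (precision, recall, F-measure) for one class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: 100 instances split 90/10, and a tiny prediction check.
train, test = percentage_split(range(100))
print(len(train), len(test))
print(precision_recall_f(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a"))
```

Fixing the seed makes the split repeatable from run to run, which is the same reason the instructions set a seed in Weka.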

Run information gain in Weka:

  • We also used Weka to run Information Gain feature selection. To do so:
    --> On the “Preprocess” tab: Click “Open file” and select the Weka input file “AllWord_Noredundant_weka_input.arff”.
    --> On the “Select attributes” tab:
    Click “Choose” and select “InfoGainAttributeEval” (screenshot 6).
    [Screenshot 6]
    Click “Start” to run the Information Gain feature selection.
    --> Weka generates a list of informative words selected by the Information Gain feature selection strategy. We then used the second Python script (information_gain_feature_selection.ipynb, above) to generate the data file “WordsSelectedByInformationGain.csv” and the Weka input file “WordsSelectedByInformationGain_weka_input.arff” accordingly.
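
As background on what information gain measures, the sketch below computes it for a single discrete feature: the entropy of the class labels minus the expected entropy after splitting on the feature's values. This is a textbook formulation written for illustration, not Weka's implementation (Weka also handles numeric attributes, which this sketch does not), and the function names and example data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


# Hypothetical example: word presence (1/0) against two class labels.
example_classes = ["a", "a", "b", "b"]
print(information_gain([1, 1, 0, 0], example_classes))  # word separates the classes
print(information_gain([1, 0, 1, 0], example_classes))  # word carries no class signal
```

A word that appears only in one class gets a high score, while a word spread evenly across classes scores zero, which is why ranking by this value surfaces informative words.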

For any questions about these instructions, please contact:
Linh Hoang - [email protected].
