MLWIC2 can be used to automatically classify camera trap images or to train new models for image classification. It contains two pre-trained models: the `species_model` identifies 58 species and empty images, and the `empty_animal` model distinguishes between images with animals and those that are empty. MLWIC2 also contains Shiny apps for running the functions. These can be accessed using `runShiny`. In the steps below, you can see Shiny options for some steps. This indicates that you can run these steps with Shiny apps by running the function provided. Note that when you are using Shiny apps to select directories and files, you can only navigate using the top half of the screen, and you must scroll to the bottom of the window to find the `Select` button.
If you have issues, please submit them to the issues tab and do not email the authors of this package with questions. This way everyone can learn from the issue.
You need to have Anaconda Navigator installed, along with Python 3.7 (Python 3.6 or 3.5 will work just as well). If you are using a Windows computer, you will likely need to install Rtools if you don't already have it installed.
# install devtools if you don't have it
if (!require('devtools')) install.packages('devtools')
# check error messages and ensure that devtools installed properly.
# install MLWIC2 from github
devtools::install_github("mikeyEcology/MLWIC2")
# This line might prompt you to update some packages. It would be wise to make these updates.
# load this package
library(MLWIC2)
When running `install_github`, some users will get an error about Rcpp or Rlang. This is due to the update to R version 4. If you have issues, update R to the latest version and re-install these R packages.
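One way to check whether your computer is likely to hit this error is sketched below. It uses only base R, nothing in it is specific to MLWIC2, and the commented-out install line is a suggestion rather than a required step.

```r
# Check whether R itself is at version 4 or later
r_is_v4 <- getRversion() >= "4.0.0"

# Check which of the two packages named in the error are installed
pkgs <- c("Rcpp", "rlang")
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

message("R version: ", as.character(getRversion()))
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# To re-install both packages after updating R (needs internet access):
# install.packages(pkgs)
```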
You only need to run steps 2-3 the first time you use this package on a computer. If you have already run MLWIC on your computer, you can skip step 2.
- `python_loc` is the location of Python on your computer. On Macs, it is often in the default location; you can determine the location by opening a terminal window and typing `which python`. In Windows you can open your command prompt and type `where python`.
- If you already have a conda environment called "r-reticulate" with Python packages installed, you can specify `r_reticulate = TRUE`; if you don't know what this means, leave this argument as the default by not specifying it.
- The `setup` function installs several necessary Python packages. Running this function will take a few minutes. You may see some errors when you run `setup`; you can ignore these. If there are problems with the installation, they will become apparent when you run `classify`.
- If you want to use a graphics processing unit (GPU), set `gpu=TRUE` in this function. Using a GPU is not necessary to run MLWIC2, and if you are using a trained model to classify images it will not have a major effect, but if you are training a model, a GPU will result in much faster training; see more details here.
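As a rough sketch of this step, the code below looks up Python from within R and passes its directory to `setup`. `Sys.which` is base R. The `run_setup` switch is a hypothetical guard added here so that nothing is installed until you opt in, and giving `python_loc` as a directory with a trailing slash is an assumption based on the `classify` example in this README.

```r
# Find the python executable on the PATH (returns "" if none is found)
python_path <- unname(Sys.which("python"))

# Assumption: python_loc is the directory that contains python,
# with a trailing slash, e.g. "/anaconda2/bin/"
python_loc <- if (nzchar(python_path)) {
  paste0(dirname(python_path), "/")
} else {
  NA_character_  # not found: set by hand from `which python` or `where python`
}
print(python_loc)

# Hypothetical guard: change to TRUE once python_loc looks right
run_setup <- FALSE
if (run_setup && !is.na(python_loc)) {
  MLWIC2::setup(python_loc = python_loc)  # add gpu = TRUE to use a GPU
}
```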
Step 3: Download the MLWIC2_helper_files folder from this link.
- Unzip the folder and then store it in a location where you can find it on your computer (e.g., Desktop). Note the location, as you will specify this as `model_dir` when you run the functions `classify`, `make_output`, and `train`.
- (optional) If you want to check md5sums for this file, the value should be `4f3d57ea4d17055cac5df3591f87bbb3`.
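The optional checksum comparison can be done from R. This is a minimal sketch using `tools::md5sum`, which ships with R; the helper name `md5_matches` and the download location and file name are assumptions, so adjust them to wherever you saved the file.

```r
# Hypothetical helper: TRUE if the file's md5 checksum matches the expected value
md5_matches <- function(path, expected) {
  actual <- unname(tools::md5sum(path))  # NA if the file does not exist
  !is.na(actual) && identical(tolower(actual), tolower(expected))
}

expected_md5 <- "4f3d57ea4d17055cac5df3591f87bbb3"

# Assumption: the download was saved under this name on your Desktop
helper_zip <- "~/Desktop/MLWIC2_helper_files.zip"
if (file.exists(helper_zip)) {
  print(md5_matches(helper_zip, expected_md5))
}
```

If the result is `FALSE`, the download may be incomplete and is worth repeating.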
Before running models on your own data, I recommend that you try running the example provided.
- Option 1: If you have labels for your images and you want to test the model on your images (set `images_classified=TRUE`), you need to have an `input_file` csv that has at least two columns: one of these must be "filename" and the other must be "class_ID". `class_ID` is a column containing a number for the label for each species. If you're using the "species_model", you can find the class_ID associated with each species in this table and put them in this column.
- Option 2: This is the same as option 1, except instead of having a column `class_ID` containing the number associated with each species, you have a column called `class` containing your classifications as words (e.g., "dog", "cattle", or "empty"). The function will find the appropriate `class_ID` associated with these words (`class_ID`s can be found in this table).
- Option 3: If you do not have your images classified, but you have all of the filenames for the images you want to classify, you can have an `input_file` csv with a column called "filename" and whatever other columns you would like.
- Option 4: MLWIC2 can find the filenames of all of your images and create your input file. For this option, you need to specify your `path_prefix`, which is the parent directory of your images. If you have images stored in sub-folders within this directory, specify `recursive=TRUE`; if not, you can specify `recursive=FALSE`. You also need to specify the `suffixes` (e.g., ".jpg") for your filenames so that MLWIC2 knows what types of files to look for. By default (if you don't specify anything), it will look for ".JPG" and ".jpg".
- Option 5: If you are planning to train a model, you will want training and testing sets of images. This function will set up these files also; see `?make_input` for more details.
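To make the input file layout concrete, here is a small base R sketch that builds an Option 2 style csv by hand. It does not call `make_input`. The directory, image names, and labels are all made up for illustration, and the `list.files` call only imitates the file search described in Option 4.

```r
# Work in a temporary directory so the sketch is safe to run anywhere
path_prefix <- file.path(tempdir(), "images")
dir.create(file.path(path_prefix, "subdirectory1"),
           recursive = TRUE, showWarnings = FALSE)

# Create two empty placeholder files standing in for real images
file.create(file.path(path_prefix, "img001.jpg"))
file.create(file.path(path_prefix, "subdirectory1", "img002.JPG"))

# Similar in spirit to Option 4: find image files under path_prefix.
# Filenames are relative to path_prefix, so subdirectories are kept.
filenames <- list.files(path_prefix, pattern = "\\.(jpg|JPG)$", recursive = TRUE)

# Option 2 layout: a "filename" column plus a "class" column of words
input <- data.frame(
  filename = filenames,
  class = c("dog", "empty"),  # made-up labels for the two placeholder files
  stringsAsFactors = FALSE
)

input_file <- file.path(tempdir(), "image_labels.csv")
write.csv(input, input_file, row.names = FALSE)
print(read.csv(input_file))
```

For Option 1, the `class` column would instead be a `class_ID` column holding the numbers from the class table.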
- `path_prefix` is the absolute path where your images are stored.
  - You can have image files in subdirectories within your `path_prefix`, but this must be reflected in your `data_info` file. For example, if you have a file located at `.../images/subdirectory1/imagefile.jpg`, and your `path_prefix=.../images/`, your filename for this image in your `data_info` file would be `subdirectory1/imagefile.jpg`.
- `data_info` is the absolute path to where your input file is stored. Check your output from `make_input`.
- `model_dir` is the absolute path to where you stored the MLWIC2_helper_files folder in step 3.
- `log_dir` is the absolute path to the model you want to use. If you are using the built in models, it is either "species_model" or "empty_animal". If you trained a model with MLWIC2, this would be what you specified as your `log_dir_train`.
- `os` is your operating system type. If you are using MS Windows, set `os="Windows"`; otherwise, you can ignore this argument.
- `num_classes` is the number of species or groups of species in the model. If you are using the species_model, `num_classes=1000`; if you're using the empty_animal model, `num_classes=2`. If you trained your own model, this is the number that you specified.
- `top_n` is the number of classes that the model will provide guesses for. E.g., if `top_n=5`, the output will include the top 5 classes that the model thinks are in the image (and the confidences that are associated with these guesses).
- `num_cores` is the number of cores on your computer that you want to use. Running `parallel::detectCores()` will tell you how many cores you have on your computer. Depending on how long you intend to run the model, you might not want to use all of your cores. For example, you could specify `num_cores = parallel::detectCores() - 2` so that you would keep two cores available for other processes.
- See `?classify` for more options.
If you are having trouble finding your absolute paths, you can use the Shiny option `MLWIC2::runShiny('classify')` and select your files/directories from a drop down menu. Your paths will be printed on the screen so that next time you can run directly in the R console if you prefer (this is a good way to begin learning how to code).
- If you are using the example images, the command would look something like this (modified based on your computer-specific paths).
classify(path_prefix = "/Users/mikeytabak/Desktop/images", # path to where your images are stored
data_info = "/Users/mikeytabak/Desktop/image_labels.csv", # path to csv containing file names and labels
model_dir = "/Users/mikeytabak/Desktop/MLWIC2_helper_files", # path to the helper files that you downloaded in step 3, including the name of this directory (i.e., `MLWIC2_helper_files`)
python_loc = "/anaconda2/bin/", # location of python on your computer
save_predictions = "model_predictions.txt", # how you want to name the raw output file
make_output = TRUE, # if TRUE, this will produce a csv with a more friendly output
output_name = "MLWIC2_output.csv", # if make_output==TRUE, this will be the name of your friendly output file
num_cores = 4 # the number of cores you want to use on your computer. Try running parallel::detectCores() to see what you have available. You might want to use something like parallel::detectCores()-1 so that you have a core left on your machine for accomplishing other tasks.
)
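For comparison, here is a sketch of how the arguments might be put together for the `empty_animal` model, using only arguments described in this README. The paths are placeholders, `top_n = 2` is an illustrative choice, and `run_classify` is a hypothetical guard so that the block can be read and run without starting a job.

```r
# Leave two cores free, but always use at least one
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L
num_cores <- max(1L, n_cores - 2L)

# Placeholder paths: replace with the absolute paths on your computer
classify_args <- list(
  path_prefix = "/path/to/images",
  data_info = "/path/to/image_labels.csv",
  model_dir = "/path/to/MLWIC2_helper_files",
  python_loc = "/anaconda2/bin/",
  log_dir = "empty_animal",  # built-in model separating animal from empty
  num_classes = 2,           # the empty_animal model has two classes
  top_n = 2,                 # illustrative: report both classes
  num_cores = num_cores,
  make_output = TRUE,
  output_name = "MLWIC2_output.csv"
)

# Only Windows users need to set os
if (.Platform$OS.type == "windows") classify_args$os <- "Windows"

# Hypothetical guard: change to TRUE once the paths are real
run_classify <- FALSE
if (run_classify) {
  do.call(MLWIC2::classify, classify_args)
}
```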
- The `write_metadata` function uses Exiftool software. Exiftool is a command line tool, and `write_metadata` is a wrapper that will run the software to create metadata categories and fill them with the output of `classify`. If you want to use this function you will need to first install Exiftool following the directions here.
- `output_file` is the path to and file name of your output file from classify (`model_dir` + `/` + `output_name`). Unless you deviated from the default settings, this file should be located in your `MLWIC2_helper_files` folder.
- `model_type` is either the "species_model" or the "empty_animal" model.
- You might need to specify your `exiftool_loc` if you are running on a Windows computer. This is the path to your exiftool installation.
- Here is how I would run this function given my example call to classify above.
write_metadata(output_file="/Users/mikeytabak/Desktop/MLWIC2_helper_files/MLWIC2_output.csv", # note that if you look at the classify command above, this is the [model_dir]/[output_name]
model_type="species_model", # the type of model I used for classify
exiftool_loc="/usr/local/bin", # location where exiftool is stored, you might not need to specify this.
show_sys_output = FALSE
)
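If you are unsure what to pass for `output_file` or `exiftool_loc`, both can be worked out in base R. This is a sketch only: the paths are the example paths used above, and whether `Sys.which` finds Exiftool depends on how it was installed on your computer.

```r
model_dir <- "/Users/mikeytabak/Desktop/MLWIC2_helper_files"
output_name <- "MLWIC2_output.csv"

# output_file is model_dir + "/" + output_name
output_file <- file.path(model_dir, output_name)
print(output_file)

# Look for exiftool on the PATH; "" means R could not find it
exiftool_path <- unname(Sys.which("exiftool"))
if (nzchar(exiftool_path)) {
  exiftool_loc <- dirname(exiftool_path)  # directory form, as in the example
  print(exiftool_loc)
} else {
  message("exiftool not found on the PATH; set exiftool_loc by hand")
}
```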
If you aren't satisfied with the accuracy of the built-in models, you can train your own model using your images. The parameters will be similar to those for `classify`, but you will want to specify some more options based on how you want to train the model.
- `path_prefix` is the absolute path where your images are stored.
- `data_info` is the absolute path to where your input file is stored. Check your output from `make_input`.
- `model_dir` is the absolute path to where you stored the MLWIC2_helper_files folder in step 3.
- `num_classes` is the number of species (or groups of species) you want the model to recognize.
- `architecture` is the DNN architecture. The options are c("alexnet", "densenet", "googlenet", "nin", "resnet", "vgg"). I recommend starting with "resnet" and setting `depth=18`. If you get poor accuracy with this, "densenet" is another good option.
- `depth` is the number of layers in the DNN. If you are using resnet, the options are c(18, 34, 50, 101, 152). If you are using densenet, the options are c(121, 161, 169, 201). Otherwise, the depth will be automatically set for you.
- `batch_size` is the number of images simultaneously passed to the model for training. It must be a multiple of 16. Smaller numbers will train models that are more accurate, but it will take longer to train. The default is 128.
- `log_dir_train` is the directory where you will store the model information. This is what you will specify in the `log_dir` option of the `classify` function. You will want to use unique names if you are training multiple models on your computer; otherwise they will be overwritten.
- `retrain`: If TRUE, the model you train will be a retraining of the model you specify in `retrain_from`. If FALSE, you are starting training from scratch. Retraining will be faster but training from scratch will be more flexible.
- `retrain_from` is the name of the directory from which you want to retrain the model. If you are retraining from the species model, you would set `retrain_from="species_model"`. If you need to stop training (e.g., you have to turn off your computer), you can set `retrain_from` to what you set as your `log_dir_train` and set your `num_epochs` to the total number you want minus the number that have completed.
- `num_epochs` is the number of epochs you want to use for training. The default is 55 and this is recommended for training a full model. But if you need to start and stop training, you can decrease this number.
- You can read about more options by typing `?train` into the console.
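The constraints above (allowed depths, `batch_size` as a multiple of 16, and the epoch arithmetic for resuming) can be checked before starting a long training run. The sketch below is base R only. `check_train_args` and `remaining_epochs` are hypothetical helpers, not MLWIC2 functions, and the model name and the count of 20 completed epochs are made-up examples.

```r
# Depths listed above for the architectures that let you choose one
allowed_depths <- list(
  resnet = c(18, 34, 50, 101, 152),
  densenet = c(121, 161, 169, 201)
)
architectures <- c("alexnet", "densenet", "googlenet", "nin", "resnet", "vgg")

# Hypothetical helper: stop early if the settings break a documented rule
check_train_args <- function(architecture, depth = NULL, batch_size = 128) {
  stopifnot(architecture %in% architectures)
  stopifnot(batch_size > 0, batch_size %% 16 == 0)
  if (architecture %in% names(allowed_depths)) {
    stopifnot(!is.null(depth), depth %in% allowed_depths[[architecture]])
  }
  invisible(TRUE)
}

# Hypothetical helper: epochs still to run after an interrupted training job
remaining_epochs <- function(total_epochs, completed_epochs) {
  stopifnot(completed_epochs >= 0, completed_epochs <= total_epochs)
  total_epochs - completed_epochs
}

# The recommended starting point passes the checks
check_train_args("resnet", depth = 18, batch_size = 128)

# Example: 55 epochs wanted in total, 20 finished before stopping
print(remaining_epochs(55, 20))

# Settings for resuming; pass these to train() along with your other arguments
resume_args <- list(
  retrain = TRUE,
  retrain_from = "my_model",  # made-up name, previously used as log_dir_train
  num_epochs = remaining_epochs(55, 20)
)
```

Running the checks first means a typo in `depth` or `batch_size` fails immediately instead of partway into training.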
If you use this package in a publication, please cite our manuscript:
Tabak, M. A., Norouzzadeh, M. S., Wolfson, D. W., Newton, E. J., Boughton, R. K., Ivan, J. S., … Miller, R. S. (2020-In Press). Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2. BioRxiv, 2020.03.18.997700. doi:10.1101/2020.03.18.997700
@article{tabakImprovingAccessibilityTransferability2020,
title = {Improving the Accessibility and Transferability of Machine Learning Algorithms for Identification of Animals in Camera Trap Images: {{MLWIC2}}},
shorttitle = {Improving the Accessibility and Transferability of Machine Learning Algorithms for Identification of Animals in Camera Trap Images},
author = {Tabak, Michael A. and Norouzzadeh, Mohammad S. and Wolfson, David W. and Newton, Erica J. and Boughton, Raoul K. and Ivan, Jacob S. and Odell, Eric A. and Newkirk, Eric S. and Conrey, Reesa Y. and Stenglein, Jennifer and Iannarilli, Fabiola and Erb, John and Brook, Ryan K. and Davis, Amy J. and Lewis, Jesse and Walsh, Daniel P. and Beasley, James C. and VerCauteren, Kurt C. and Clune, Jeff and Miller, Ryan S.},
year = {2020},
month = mar,
pages = {2020.03.18.997700},
publisher = {{Cold Spring Harbor Laboratory}},
doi = {10.1101/2020.03.18.997700},
journal = {bioRxiv},
language = {en}
}
Disclaimer: MLWIC2 is free software that comes with no warranty. We recommend that you test the software's ability to classify images in your dataset and do not assume that the reported accuracy will be found on your images. The authors of this paper are not responsible for any decisions or interpretations that are made as a result of using MLWIC2.