Reproducibility #37

Merged
merged 21 commits into from
Oct 5, 2023

Ben-Drucker

See Issue #36 for details.

@Ben-Drucker Ben-Drucker linked an issue Jun 8, 2023 that may be closed by this pull request
@Ben-Drucker Ben-Drucker force-pushed the ben-reproducibility branch from c36dfbe to be77774 Compare June 8, 2023 20:44
@Ben-Drucker Ben-Drucker closed this Aug 7, 2023
@Ben-Drucker Ben-Drucker reopened this Aug 7, 2023
@Ben-Drucker Ben-Drucker changed the title Reproducibility [not working] Reproducibility Aug 7, 2023
@Ben-Drucker Ben-Drucker self-assigned this Aug 7, 2023
Comment on lines 232 to +233
res <- parLapply(cl = multiproc_cl, X = 1:K, fun = fn)
# stopCluster(multiproc_cl) # replaced with on.exit()
#

Formatting cleanup
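
The commented-out line above mentions replacing an explicit `stopCluster()` call with `on.exit()`. A minimal sketch of that pattern follows. The wrapper name `with_cluster` and its arguments are hypothetical and are not taken from the package; only `makeCluster`, `parLapply`, and `stopCluster` from the `parallel` package are assumed.

```r
library(parallel)

# Hypothetical wrapper: start a cluster, run a task over 1:3, always clean up.
with_cluster <- function(cores, task) {
  cl <- makeCluster(cores)
  # Registered immediately after creation, so the workers are shut down
  # whether the function returns normally or exits with an error.
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl = cl, X = 1:3, fun = task)
}

ok <- with_cluster(2, function(i) i^2)

# A failing task still triggers the on.exit() handler before the error propagates.
failed <- try(with_cluster(2, function(i) stop("boom")), silent = TRUE)
```

With an explicit `stopCluster()` at the end of the function, an error inside `parLapply()` would skip the cleanup and leave worker processes running.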

Comment on lines -197 to +216
- y=dSet[!i,response])
+ y=dSet[!i,response], seed = seed)

@Ben-Drucker Ben-Drucker Aug 7, 2023

Pass the seed to the downstream FUN.

Comment on lines +1 to +2
linters: linters_with_defaults(cyclocomp_linter = NULL, commented_code_linter = NULL, object_name_linter = NULL, line_length_linter = line_length_linter(length = 90L))
encoding: "UTF-8"

Boilerplate static analysis (lintr) setup.

Comment on lines -159 to +160
- sel.alg=c("varSelRF","Boruta","top"), ...){
+ sel.alg=c("varSelRF","Boruta","top"), cores=NULL,
+ seed=0, ...){

Added cores and seed arguments.

Comment on lines -178 to +179
- Boruta = function(x,y) select_features_Boruta(x,y,...),
+ Boruta = function(x,y, ...) select_features_Boruta(x,y,...),

Enables seed to be passed to select_features_Boruta.
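
To illustrate why adding `...` to the inner function's signature matters, here is a small sketch. `select_demo` and `make_selectors` are hypothetical stand-ins and not part of MSnSet.utils. The point is that an inner function without its own `...` can only forward the dots captured from the enclosing function, so a `seed` supplied at call time has nowhere to go.

```r
# Hypothetical selector that accepts a seed, standing in for a real
# feature-selection function.
select_demo <- function(x, y, seed = 0) {
  set.seed(seed)
  sample(colnames(x), 2)
}

make_selectors <- function(...) {
  list(
    # `...` here refers to the dots of make_selectors(), fixed at creation time
    outer_dots = function(x, y) select_demo(x, y, ...),
    # `...` here refers to whatever the caller passes when invoking the selector
    own_dots   = function(x, y, ...) select_demo(x, y, ...)
  )
}

sel <- make_selectors()
x <- matrix(0, nrow = 3, ncol = 5, dimnames = list(NULL, letters[1:5]))

a <- sel$own_dots(x, NULL, seed = 1)
b <- sel$own_dots(x, NULL, seed = 1)
identical(a, b)

# The older form rejects a call-time seed with an "unused argument" error.
res_old <- try(sel$outer_dots(x, NULL, seed = 1), silent = TRUE)
```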

top = select_features_top)

# do K-fold split here
if(is.null(K))
K <- nrow(dSet)
num_rep <- ceiling(nrow(dSet)/K)
set.seed(seed)

Set the seed before cv_idx is generated so that the fold assignment is reproducible.
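
A minimal sketch of seeding a fold assignment, for illustration only. How the package actually builds `cv_idx` is not shown in this thread, so the `sample(rep(...))` construction and the helper name `make_cv_idx` below are assumptions; only the `num_rep` formula and the `set.seed(seed)` call come from the diff.

```r
# Hypothetical helper: assign each of n samples to one of K folds.
make_cv_idx <- function(n, K, seed = 0) {
  set.seed(seed)                       # fixes the shuffle performed by sample()
  num_rep <- ceiling(n / K)            # same formula as in the diff above
  sample(rep(seq_len(K), num_rep))[seq_len(n)]
}

idx_a <- make_cv_idx(n = 20, K = 5, seed = 0)
idx_b <- make_cv_idx(n = 20, K = 5, seed = 0)
identical(idx_a, idx_b)  # TRUE: same seed, same fold assignment
```

Note that `set.seed()` changes the global RNG state of the calling session, which is worth keeping in mind if the caller relies on its own seed afterwards.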

R/rf_modeling.R Outdated
Comment on lines 212 to 214
RNGkind("L'Ecuyer-CMRG")
set.seed(i)
seed <- i

Set the random number generator kind and the random seed inside the worker function, using the index (i) it receives as the seed.
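
The following sketch shows the idea of seeding inside the worker function, under the assumption that each task only needs its own index to be reproducible. `fold_worker` and `run_folds` are hypothetical names, and `runif(1)` stands in for whatever random work a real fold would do.

```r
library(parallel)

# Hypothetical worker: seeds its own RNG from the task index, then does
# some random work.
fold_worker <- function(i) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(i)
  runif(1)
}

# Hypothetical driver: runs K tasks on a small cluster.
run_folds <- function(K, cores = 2) {
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)
  unlist(parLapply(cl = cl, X = seq_len(K), fun = fold_worker))
}

res1 <- run_folds(K = 4)
res2 <- run_folds(K = 4)
identical(res1, res2)  # TRUE: each task's draw depends only on its index
```

Because the seed is tied to the task index rather than to the worker process, the result does not depend on which worker handles which task. `parallel::clusterSetRNGStream()` is an alternative that seeds per worker instead of per task.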

Comment on lines +200 to +206
clusterEvalQ(multiproc_cl, invisible(suppressWarnings({
  Sys.setenv(`_R_S3_METHOD_REGISTRATION_NOTE_OVERWRITES_` = "false")
  suppressWarnings(suppressPackageStartupMessages({
    library("MSnID")
    library("Biobase")
  }))
})))

Enable quiet evaluation of library loads

R/rf_modeling.R Outdated
Comment on lines 188 to 198
if(is.null(cores)){
  # Default: leave one core free for the rest of the system
  cores <- max(1, detectCores() - 1)
}
stopifnot(1 <= cores)
if(cores > detectCores()){
  # Build a single-line warning message
  msg <- paste("The number of specified processes is greater than",
               "the number of cores available on this computer.",
               "This may lead to high computational overhead.")
  warning(msg)
}
multiproc_cl <- makeCluster(cores, outfile = "")

Automatically determine the number of cores when none is specified.
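
For illustration, the same logic can be wrapped in a small function that is easy to test without starting a cluster. `resolve_cores` and its `available` parameter are hypothetical and not part of the package. The handling of `NA` is an added assumption, based on `detectCores()` being documented to return `NA` when the count is unknown.

```r
library(parallel)

# Hypothetical helper: pick a worker count, defaulting to "all but one" core.
resolve_cores <- function(cores = NULL, available = detectCores()) {
  if (is.na(available)) available <- 1L   # detectCores() may return NA
  if (is.null(cores)) {
    cores <- max(1, available - 1)
  }
  stopifnot(cores >= 1)
  if (cores > available) {
    warning("Requested ", cores, " processes but only ",
            available, " cores are available.")
  }
  cores
}

resolve_cores(NULL, available = 8)  # 7
resolve_cores(NULL, available = 1)  # 1
resolve_cores(3, available = 8)     # 3
```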

@Ben-Drucker Ben-Drucker marked this pull request as ready for review August 7, 2023 23:46
@Ben-Drucker

To try it out, use:

remotes::install_github("PNNL-Comp-Mass-Spec/MSnSet.utils", ref = "ben-reproducibility", force = TRUE)

@vladpetyuk vladpetyuk merged commit 6d0d6ee into master Oct 5, 2023
3 checks passed
@vladpetyuk vladpetyuk deleted the ben-reproducibility branch October 5, 2023 03:39
Successfully merging this pull request may close these issues.

Random seeds seem to have no effect
2 participants