Improve_efficiency #78

Merged
merged 82 commits into from
Nov 1, 2024
82 commits
d89e165
first attempt of improve speed
randrescastaneda Sep 19, 2024
333bd5b
first attempt of improving is_id
randrescastaneda Sep 19, 2024
0f0b7d5
build super big data to test efficiency in possible_ids
randrescastaneda Sep 19, 2024
baa6bd6
break down by function init
randrescastaneda Sep 20, 2024
1bd7aee
update
randrescastaneda Sep 20, 2024
209f272
update format
randrescastaneda Sep 20, 2024
fb9dff9
add some testing
randrescastaneda Sep 20, 2024
f222a78
passed all tests
randrescastaneda Sep 20, 2024
a9d99d9
document
randrescastaneda Sep 20, 2024
ab45f0b
fix documentation
randrescastaneda Sep 20, 2024
a633ba4
add old function
randrescastaneda Sep 23, 2024
b3fe098
small changes
randrescastaneda Sep 24, 2024
8992048
small changes
RossanaTat Sep 24, 2024
a61102c
add vars argument
RossanaTat Sep 24, 2024
0ee172b
fix documentation
RossanaTat Sep 24, 2024
c6a8ab7
fix msg
RossanaTat Sep 24, 2024
2cb1362
fix error with single var
RossanaTat Sep 24, 2024
451b025
adding tests
RossanaTat Sep 24, 2024
35bd6e7
adding tests
RossanaTat Sep 24, 2024
fc29864
adding test/identify probs
RossanaTat Sep 25, 2024
694aae4
fix relationship between include and vars
RossanaTat Sep 25, 2024
8a5775f
small adds
RossanaTat Sep 25, 2024
2faa4a8
add more tests and fix issue with vars-include
RossanaTat Sep 26, 2024
1dddea5
more tests
RossanaTat Sep 26, 2024
e086a07
small fix
RossanaTat Sep 26, 2024
3d5c28b
try again fix vars arg
RossanaTat Sep 26, 2024
8ffdd2e
fix filter by name -duplicates issue
RossanaTat Sep 27, 2024
1bf176a
fix again filtering and add tests
RossanaTat Sep 27, 2024
be8f101
add checked vars attempt one
RossanaTat Sep 27, 2024
4db75b8
fixes
RossanaTat Sep 30, 2024
151d7f4
fix issue with storing in joynenv and adding attribute of checked_ids
RossanaTat Sep 30, 2024
a6fee8b
fix get_all error
RossanaTat Oct 1, 2024
ae49f2d
another issue of get_all ??
RossanaTat Oct 1, 2024
5d85558
fix issue with indexing inside the loop
RossanaTat Oct 1, 2024
fdbd3d7
add tests
RossanaTat Oct 1, 2024
883bd46
test max comb size
RossanaTat Oct 1, 2024
44c92e9
clean code
RossanaTat Oct 2, 2024
c84dfff
debugging store checked vars
RossanaTat Oct 2, 2024
1dd5e13
add tests
RossanaTat Oct 2, 2024
97ce951
fix store checked vars again
RossanaTat Oct 3, 2024
40930d6
update docuemntation
RossanaTat Oct 3, 2024
aa46744
aux fun to store and return ids attempt one
RossanaTat Oct 3, 2024
ed5cc75
implement it and update docuemntation
RossanaTat Oct 3, 2024
c870803
documentation again
RossanaTat Oct 3, 2024
668dccf
update tests
RossanaTat Oct 3, 2024
63d01d0
clean some code
RossanaTat Oct 3, 2024
8d8c3d5
fix error in filter by name
RossanaTat Oct 3, 2024
b0f8b7c
fix tests
RossanaTat Oct 3, 2024
f8d3004
update ret list
RossanaTat Oct 4, 2024
6bb9117
update tests on include and exclude plus store checked vars
RossanaTat Oct 4, 2024
fe5c15f
more tests
RossanaTat Oct 4, 2024
c38d614
update tests and ensure they pass
RossanaTat Oct 4, 2024
2e26d95
clean code
RossanaTat Oct 4, 2024
0b74d2c
attempt one, two and three
RossanaTat Oct 8, 2024
9eb2519
fix issue with duplicate vars
RossanaTat Oct 8, 2024
8b2d8fb
try again, fixing the issue of more rows than n_rows
RossanaTat Oct 9, 2024
1be3522
calculate max val insisde the fun instead
RossanaTat Oct 9, 2024
35236b2
fix issue with single id
RossanaTat Oct 9, 2024
6eca3d5
fix again when n_ids is 1
RossanaTat Oct 9, 2024
cb9ad26
add tests
RossanaTat Oct 9, 2024
e28e4d7
add more tests on create ids
RossanaTat Oct 9, 2024
26415ac
first cleanup of code
RossanaTat Oct 9, 2024
041ab52
tests
RossanaTat Oct 9, 2024
d689569
clean code and add documentation
RossanaTat Oct 9, 2024
bb22bd2
ensure tests pass
RossanaTat Oct 9, 2024
050d3ac
remove lapply
RossanaTat Oct 10, 2024
b65d12b
add condition for when rows is larger -not always!
RossanaTat Oct 10, 2024
57b4dbd
documentation for checked ids
RossanaTat Oct 11, 2024
75b48cd
doc again
RossanaTat Oct 11, 2024
f2c29c6
small fix
RossanaTat Oct 15, 2024
b899960
improve is_id and freq_table
randrescastaneda Oct 17, 2024
9f003c0
Merge pull request #76 from randrescastaneda:fix_is_id
RossanaTat Oct 22, 2024
1f655cc
merge fix is id
RossanaTat Oct 22, 2024
80d3878
print freq table only if verbose
RossanaTat Oct 23, 2024
28b0ed0
Merge branch 'improve_efficiency' into create_ids solving conflicts
randrescastaneda Oct 30, 2024
28e9f45
Merge pull request #75 from randrescastaneda/create_ids
randrescastaneda Oct 30, 2024
eafc27f
fix error in vignette
RossanaTat Oct 30, 2024
3a4ad09
Merge branch 'improve_efficiency' of https://github.com/randrescastan…
RossanaTat Oct 30, 2024
4d79020
clean code: remove old possible ids
RossanaTat Oct 30, 2024
1b1b375
fix warning
RossanaTat Oct 30, 2024
23d8eef
addressing warnings and notes
RossanaTat Oct 31, 2024
4fa5e98
Merge branch 'DEV' into improve_efficiency
RossanaTat Nov 1, 2024
35 changes: 24 additions & 11 deletions R/freq_table.R
@@ -12,6 +12,7 @@ if (getRversion() >= '2.15.1')
 #' @param byvar character: name of variable to tabulate. Use Standard evaluation.
 #' @param digits numeric: number of decimal places to display. Default is 1.
 #' @param na.rm logical: report NA values in frequencies. Default is FALSE.
+#' @param freq_var_name character: name for frequency variable. Default is "n"
 #'
 #' @return data.table with frequencies.
 #' @export
@@ -26,33 +27,45 @@ if (getRversion() >= '2.15.1')
 freq_table <- function(x,
                        byvar,
                        digits = 1,
-                       na.rm = FALSE) {
+                       na.rm = FALSE,
+                       freq_var_name = "n") {

   x_name <- as.character(substitute(x))
   if (!is.data.frame(x)) {
     cli::cli_abort("Argument {.arg x} ({.field {x_name}}) must be a data frame")
   }
+  if (isFALSE(is.data.table(x))) {
+    x <- qDT(x)
+  }


-  fq <- qtab(x[[byvar]], na.exclude = na.rm)
-  ft <- data.frame(joyn = names(fq),
-                   n = as.numeric(fq))
+  fq <- qtab(x[, ..byvar], na.exclude = na.rm, dnn = byvar)
+
+  ft <- fq |>
+    as.data.table() |>
+    setnames("N", "n") |>
+    # filter zeros
+    fsubset(n > 0)

   N <- fsum(ft$n)
   ft <- ft |>
     ftransform(percent = paste0(round(n / N * 100, digits), "%"))

   # add row with totals
-  ft <- rowbind(ft, data.table(joyn = "total",
-                               n = N,
-                               percent = "100%")) |>
-    # filter zeros
-    fsubset(n > 0)
+  total_row <- rep("total", length(byvar)) |>
+    as.list() |>
+    as.data.table() |>
+    setnames(new = byvar) |>
+    ftransform(n = N,
+               percent = "100%")

-  setrename(ft, joyn = byvar, .nse = FALSE)
+  ft <- rowbind(ft, total_row)
+  setrename(ft,
+            n = freq_var_name,
+            .nse = FALSE)
 }



 #' Report frequencies from attributes in report var
 #'
 #' @param x dataframe from [joyn_workhorse]
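The reworked freq_table() tabulates several byvar columns at once, drops empty cells, appends a "total" row, and renames the count column to freq_var_name. The following is a minimal base-R sketch of that flow, not the joyn implementation: it swaps table() in for qtab() and plain data frames in for data.table, and the function name freq_sketch is hypothetical.

```r
# Hypothetical base-R analogue of the new freq_table() logic (assumption:
# table() stands in for collapse::qtab(), data.frame for data.table).
freq_sketch <- function(x, byvar, digits = 1, freq_var_name = "n") {
  stopifnot(is.data.frame(x), all(byvar %in% names(x)))

  # Cross-tabulate all `byvar` columns together; one row per cell
  ft <- as.data.frame(table(as.data.frame(x)[byvar]),
                      responseName     = "n",
                      stringsAsFactors = FALSE)

  # Keep only combinations that actually occur
  ft <- ft[ft$n > 0, , drop = FALSE]

  N <- sum(ft$n)
  ft$percent <- paste0(round(ft$n / N * 100, digits), "%")

  # One "total" label per grouping column, then the overall count
  total_row <- as.data.frame(
    setNames(as.list(rep("total", length(byvar))), byvar),
    stringsAsFactors = FALSE
  )
  total_row$n       <- N
  total_row$percent <- "100%"

  ft <- rbind(ft, total_row)
  names(ft)[names(ft) == "n"] <- freq_var_name
  rownames(ft) <- NULL
  ft
}

x   <- data.frame(a = c(1, 1, 2), b = c("u", "u", "v"))
res <- freq_sketch(x, byvar = c("a", "b"), freq_var_name = "copies")
print(res)
```

Because the grouping columns are converted to character, the "total" label can sit in the same columns as the group values, which mirrors how the diff builds total_row from rep("total", length(byvar)).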
48 changes: 25 additions & 23 deletions R/is_id.R
@@ -35,40 +35,42 @@ if (getRversion() >= '2.15.1')
 #' is_id(y1, by = "id")
 is_id <- function(dt,
                   by,
-                  verbose = getOption("joyn.verbose"),
+                  verbose = getOption("joyn.verbose", default = FALSE),
                   return_report = FALSE) {

-  # make sure it is data.table
-  if (!(is.data.table(dt))) {
+  # Ensure dt is a data.table
+  if (!is.data.table(dt)) {
     dt <- as.data.table(dt)
-  } else {
-    dt <- data.table::copy(dt)
   }

-  # count
-  m <- dt[, .(copies =.N), by = mget(by)]
-  is_id <- m[, mean(copies)] == 1
+  # Check for duplicates
+  is_id <- !(anyDuplicated(dt, by = by) > 0)

   if (verbose) {
-
-    cli::cli_h3("Duplicates in terms of {.code {by}}")
-
-    d <- freq_table(m, "copies")
-    print(d[])
-
-    cli::cli_rule(right = "End of {.field is_id()} report")
-
+    if (is_id) {
+      cli::cli_alert_success("No duplicates found by {.code {by}}")
+    } else {
+      cli::cli_alert_warning("Duplicates found by: {.code {by}}")
+    }
   }

-  if (isFALSE(return_report)) {
-
-    return(is_id)
+  if (return_report) {
+    # Return the duplicated rows if requested
+    if (verbose) cli::cli_h3("Duplicates in terms of {.code {by}}")

-  } else {
+    d <- freq_table(x = dt,
+                    byvar = by,
+                    freq_var_name = "copies")

-    return(m)
+    if (verbose) {
+      d |>
+        fsubset(copies > 1) |>
+        print()
+    }

+    if (verbose) cli::cli_rule(right = "End of {.field is_id()} report")
+    return(invisible(d))
+  } else {
+    return(is_id)
   }

 }
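The is_id() change replaces a per-group count (mean(copies) == 1) with a direct duplicate check, and gives verbose a fallback so that an unset option no longer yields NULL inside if (). Here is a minimal base-R sketch of both ideas, assuming plain data frames. The diff itself uses the data.table method anyDuplicated(dt, by = by); the names is_id_sketch, old_way and the option "sketch.verbose" are hypothetical.

```r
# Hypothetical base-R sketch; not the joyn implementation.
is_id_sketch <- function(df, by,
                         verbose = getOption("sketch.verbose", default = FALSE)) {
  stopifnot(is.data.frame(df), all(by %in% names(df)))
  keys <- as.data.frame(df)[by]

  # anyDuplicated() returns 0 when no key combination repeats
  id <- anyDuplicated(keys) == 0

  if (verbose) message(if (id) "No duplicates found" else "Duplicates found")
  id
}

# Older approach for comparison: count rows per key combination,
# then check that every combination appears exactly once
old_way <- function(df, by) {
  counts <- table(interaction(as.data.frame(df)[by], drop = TRUE))
  mean(as.vector(counts)) == 1
}

df <- data.frame(id = 1:4, g = c("a", "a", "b", "b"))

is_id_sketch(df, "id")          # TRUE: every id appears once
is_id_sketch(df, "g")           # FALSE: "a" and "b" repeat
is_id_sketch(df, c("id", "g"))  # TRUE: the pair is unique

# Without a default, an unset option is NULL and `if (NULL)` fails
unset <- getOption("sketch.verbose")
outcome <- tryCatch(if (unset) "on" else "off",
                    error = function(e) "error")
```

Both approaches agree on the answer. The duplicate check simply skips building the intermediate count table, which is only needed when a report is requested.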
