maximum of column with missing #50

Open
sprmnt21 opened this issue Apr 3, 2022 · 3 comments

sprmnt21 commented Apr 3, 2022

While following some examples from the tutorial, I got outputs that differ from those shown in the documentation.

julia> ds = Dataset(g = [2, 1, 1, 2, 2],
                    x1_int = [0, 0, 1, missing, 2],
                    x2_int = [3, 2, 1, 3, -2],
                    x1_float = [1.2, missing, -1.0, 2.3, 10],
                    x2_float = [missing, missing, 3.0, missing, missing],
                    x3_float = [missing, missing, -1.4, 3.0, -100.0])
5×6 Dataset
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float
     │ identity  identity  identity  identity   identity   identity
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?
─────┼───────────────────────────────────────────────────────────────
   1 │        2         0         3        1.2  missing    missing
   2 │        1         0         2  missing    missing    missing
   3 │        1         1         1       -1.0        3.0       -1.4
   4 │        2   missing         3        2.3  missing          3.0
   5 │        2         2        -2       10.0  missing       -100.0

julia> groupby!(ds, 1)
5×6 Grouped Dataset with 2 groups
Grouped by: g
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float  
     │ identity  identity  identity  identity   identity   identity  
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?  
─────┼───────────────────────────────────────────────────────────────
   1 │        1         0         2  missing    missing    missing   
   2 │        1         1         1       -1.0        3.0       -1.4
   3 │        2         0         3        1.2  missing    missing   
   4 │        2   missing         3        2.3  missing          3.0
   5 │        2         2        -2       10.0  missing       -100.0

julia> modify(ds, r"int" => x -> x .- maximum(x))
5×6 Grouped Dataset with 2 groups
Grouped by: g
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float  
     │ identity  identity  identity  identity   identity   identity  
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?  
─────┼───────────────────────────────────────────────────────────────
   1 │        1        -1         0  missing    missing    missing   
   2 │        1         0        -1       -1.0        3.0       -1.4
   3 │        2   missing         0        1.2  missing    missing
   4 │        2   missing         0        2.3  missing          3.0
   5 │        2   missing        -5       10.0  missing       -100.0

julia> combine(ds, :x1_int => x -> maximum(x))
2×2 Dataset
 Row │ g         function_x1_int 
     │ identity  identity
     │ Int64?    Int64?
─────┼───────────────────────────
   1 │        1                1
   2 │        2          missing

The behavior does not appear to be specific to grouping; the same thing happens on the ungrouped dataset:

julia> ungroup!(ds)
5×6 Sorted Dataset
 Sorted by: g
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float  
     │ identity  identity  identity  identity   identity   identity
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?
─────┼───────────────────────────────────────────────────────────────
   1 │        1         0         2  missing    missing    missing
   2 │        1         1         1       -1.0        3.0       -1.4
   3 │        2         0         3        1.2  missing    missing
   4 │        2   missing         3        2.3  missing          3.0
   5 │        2         2        -2       10.0  missing       -100.0

julia> combine(ds, :x1_int => x -> maximum(x))
1×1 Dataset
 Row │ function_x1_int 
     │ identity
     │ Int64?
─────┼─────────────────
   1 │         missing

My status

(v1.7) pkg> status
      Status `C:\Users\sprmn\.julia\v1.7\Project.toml`
  [8be319e6] Chain v0.4.10
  [35d6a980] ColorSchemes v3.17.1
  [5ae59095] Colors v0.12.8
  [f7bf1975] Impute v0.6.8
  [5c01b14b] InMemoryDatasets v0.6.10
  [8197267c] IntervalSets v0.6.0
  [c8e1da08] IterTools v1.4.0
  [08abe8d2] PrettyTables v1.3.1
  [2913bbd2] StatsBase v0.33.16
  [bd369af6] Tables v1.7.0

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

@sl-solution
Owner

The maximum function returns missing when any value in the column is missing. Use IMD.maximum instead of maximum to skip missing values automatically.
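
For readers following along, here is a minimal sketch of the Base behavior described above, using only the Julia standard library. The vector holds the x1_int values of group g = 2 from the report; the variable names are illustrative only.

```julia
# x1_int values for group g = 2 in the report above
x = [0, missing, 2]

# Base.maximum propagates missing: one missing entry makes the result missing
a = maximum(x)

# skipmissing wraps the vector in an iterator that drops missing entries first
b = maximum(skipmissing(x))

println(a)
println(b)
```

Going by the comment above, the corresponding call in the original example would presumably be combine(ds, :x1_int => x -> IMD.maximum(x)); that line was not run for this sketch, so treat it as an assumption and check it against your installed version of InMemoryDatasets.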

@sprmnt21
Author

sprmnt21 commented Apr 3, 2022

Then the issue is in the documentation at https://docs.juliahub.com/InMemoryDatasets/cS87e/0.4.0/man/grouping/ which, as I notice only now, covers an old version of IMD.

@sl-solution
Owner

I see. Previously we were overriding the Base functions; however, that has been fixed since v0.6.10.
