Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setDT should check if any column is POSIXlt #4800

Open
iago-pssjd opened this issue Nov 4, 2020 · 17 comments · May be fixed by #6658
Open

setDT should check if any column is POSIXlt #4800

iago-pssjd opened this issue Nov 4, 2020 · 17 comments · May be fixed by #6658
Milestone

Comments

@iago-pssjd
Copy link
Contributor

iago-pssjd commented Nov 4, 2020

I was doing setDT() of a dataframe (or tibble) with a variable POSIXlt. It seems not to be any problem. Even I can do str() of that object. It does not show anything strange. Then errors started to appear when I tried to manipulate the object or simply do head().

As I have found the problem, I would ask/suggest why do not produce the error when running setDT or, otherwise show a warning?, which would tell about the problem that there are POSIXlt variables and they are not supported. Possibly these variables could be removed or coerced to POSIXct or another better format for data.table and the warning would tell about that.

Update: Minimal example with session info:

> library(tibble)
> library(data.table)
> now <- as.POSIXlt(Sys.time())
> x <- as.data.frame(tibble(now))
> mdt <- data.table(id=1:3, d=strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S"))
Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
  POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.
> # previous is fine; but next...
> setDT(x)
> head(x)
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent
> str(x)
Classesdata.tableand 'data.frame':	11 obs. of  1 variable:
 $ now: POSIXlt, format: "2020-11-04 10:24:47"
 - attr(*, ".internal.selfref")=<externalptr> 
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.0 tibble_3.0.3     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5          rstudioapi_0.11     magrittr_1.5.0.9000 tidyselect_1.1.0    munsell_0.5.0      
 [6] colorspace_1.4-1    R6_2.4.1            rlang_0.4.7         fansi_0.4.1         plyr_1.8.6         
[11] dplyr_1.0.2         tools_4.0.3         grid_4.0.3          gtable_0.3.0        cli_2.0.2          
[16] ellipsis_0.3.1      assertthat_0.2.1    lifecycle_0.2.0     crayon_1.3.4        purrr_0.3.4        
[21] ggplot2_3.3.2       vctrs_0.3.4         glue_1.4.2          compiler_4.0.3      pillar_1.4.6       
[26] generics_0.0.2      scales_1.1.1        pkgconfig_2.0.3    

Any case, thank you for this great package.

@jangorecki
Copy link
Member

@iago-pssjd Thank you for your report. What would be even more better is to include minimal example (so we can just copy copy to observe issue you are describing) and your session info (to check if you might be using older version of DT and issue might have been fixed already).

@iago-pssjd
Copy link
Contributor Author

iago-pssjd commented Nov 4, 2020

Thank you @jangorecki . I updated the issue and removed the reference to data.table() which i toke just after look at the SO question, but I didn't check it. Now I see that this function does what I asked for, but setDT yet not.

@jangorecki
Copy link
Member

Thank you.
We can define an action point for this issue to add check for POSIXlt inside setDT.

@jangorecki jangorecki changed the title Print a warning/... when creating data.table with POSIXlt setDT should check if any column is POSIXlt Nov 4, 2020
@shrektan
Copy link
Member

It looks like a problem of print.data.table() to me and not setDT(). print.data.table() should work with a data.table object that contains POSIXlt vectors.

In my opinion, generally, we should not change the vector value of the object by setDT(), unless there're very strong reasons.

@iago-pssjd
Copy link
Contributor Author

@shrektan is not only a problem of print.data.table(), since if you want to do other operations on the object, it gives an error too (let me remember which specific operations, but I believe I was trying to create some variable, and I believe it had nothing to do with the time variable, bot sure also if using by = ...)

@shrektan

This comment has been minimized.

@jangorecki
Copy link
Member

jangorecki commented Nov 11, 2020

If we change POSIXlt to POSIXct it won't be by reference anymore so I think the best is to raise error. Other operations will also warn on POSIXlt input.

library(data.table)
x <- data.table(g=Sys.time(), a=1)
x[["g"]] <- as.POSIXlt(Sys.time())
str(x)
#> Classes 'data.table' and 'data.frame':   11 obs. of  2 variables:
#>  $ g: POSIXlt, format: "2020-11-12 02:01:56"
#>  $ a: num 1
#>  - attr(*, ".internal.selfref")=<externalptr>
x[, sum(a), g]
#> Error in `[.data.table`(x, , sum(a), g): column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

@shrektan
Copy link
Member

shrektan commented Nov 11, 2020

@jangorecki The example is not suitable as the warning is thrown from data.table(). However, I did some test (and edited your example) and realized that your conclusion is correct, the POSIXlt is indeed special - it's a named list - so it's supposed to not work with data.table other operations well.

@iago-pssjd you're right, POSIXlt is indeed different and we should take action in setDT().

@iago-pssjd
Copy link
Contributor Author

iago-pssjd commented Feb 26, 2021

Just an addition on my initial message and example that I found now. In there, I did

> x <- as.data.frame(tibble(now))
> mdt <- data.table(id=1:3, d=strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S"))
Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
  POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.

But if I do then:

str(as.data.table(x))
Classesdata.tableand 'data.frame':	11 obs. of  1 variable:
 $ now: POSIXlt, format: "2021-02-26 16:07:19"
 - attr(*, ".internal.selfref")=<externalptr> 

So as.data.table.list converts POSIXlt column to POSIXct, but the S3 method for class data.frame doesn't.

@ben-schwen
Copy link
Member

ben-schwen commented Sep 9, 2024

I guess we could just reuse setdt_nrows from setDT for list input from #5427? WDYT @MichaelChirico, if so, I can fill the PR later

@iago-pssjd iago-pssjd added this to the 1.17.0 milestone Nov 19, 2024
@MichaelChirico
Copy link
Member

I think since #6299, we start to get nicer behavior of POSIXlt columns (?).

Anyway, throwing a warning() from setDT() (e.g. encouraging as.data.table() instead) looks appropriate. I think an error is too strong.

@iago-pssjd
Copy link
Contributor Author

Just tested example in initial post, after data.table::update_dev_pkg():

> library(tibble)
> library(data.table)
> now <- as.POSIXlt(Sys.time())
> x <- as.data.frame(tibble(now))
> mdt <- data.table(id=1:3, d=strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S"))
Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
  POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.
> # previous is fine; but next...
> str(as.data.table(x))
Classes 'data.table' and 'data.frame':	11 obs. of  1 variable:
 $ now: POSIXlt, format: Error en format.POSIXlt(object): a valid "POSIXlt" object has names
> setDT(x)
> head(x)
Error en format.POSIXlt(x, ...): 
  a valid "POSIXlt" object is a list of at least 9 elements

@MichaelChirico
Copy link
Member

I guess we could just reuse setdt_nrows from setDT for list input from #5427? WDYT @MichaelChirico, if so, I can fill the PR later

Ah, I guess I see what you mean now. We already error for list input to setDT():

setDT(list(as.POSIXlt(Sys.time())))
# Error in setDT(list(as.POSIXlt(Sys.time()))) : 
#   Column 1 has class 'POSIXlt'....

error(_("Column %d has class 'POSIXlt'. Please convert it to POSIXct (using as.POSIXct) and run setDT() again. We do not recommend the use of POSIXlt at all because it uses 40 bytes to store one date."), i+1);

I still suspect erroring is not the best choice, so I propose moving to warning() in both cases (list or data.frame input). User will not make it far with POSIXlt columns, but they can at least have agency on the fix instead of having setDT() blocked.

Found the error has been there since #646.

@MichaelChirico
Copy link
Member

Also related: #5981

@MichaelChirico
Copy link
Member

MichaelChirico commented Dec 11, 2024

OK I have some progress in branch setdt-posixlt:

https://github.com/Rdatatable/data.table/tree/setdt-posixlt

After some effort, I think it's really too much work to get setDT() to support POSIXlt input vs. the benefit.

So rather than warn() for list input, I will move to error for data.frame input. I don't think this will be a back-compatibility issue because I can't imagine anything useful/remotely functional has come out of setDT(data.frame(<POSIXlt>)).

@MichaelChirico
Copy link
Member

Hmm just discovered another inconsistency of setDT() on list vs. data.frame input:

M = list(matrix(1:6,ncol=2L))
setDT(M)[]
#       V1    V2
#    <int> <int>
# 1:     1     4
# 2:     2     5
# 3:     3     6

DF = data.frame(row.names=letters[1:3])
DF$M = matrix(1:6, ncol=2L)
setDT(DF) # issues warning
DF
# Error in dimnames(x) <- dn : 
#   length of 'dimnames' [1] not equal to array extent

IMO the behavior of setDT(<list>) here is a bit problematic...

@MichaelChirico MichaelChirico modified the milestones: 1.17.0, 1.18.0 Dec 12, 2024
@iago-pssjd
Copy link
Contributor Author

@MichaelChirico

I can't imagine anything useful/remotely functional has come out of setDT(data.frame(<POSIXlt>))

Actually, note that the structure setDT(data.frame(<POSIXlt>)) would work, since data.frame makes the transformation from POSIXlt to POSIXct. It is due to this that I used the library tibble in the example/

now <- as.POSIXlt(Sys.time())
str(data.frame(now))
'data.frame':	1 obs. of  1 variable:
 $ now: POSIXct, format: "2024-12-12 09:02:40"
str(data.frame(x=now))
'data.frame':	1 obs. of  1 variable:
 $ x: POSIXct, format: "2024-12-12 09:02:40"
str(as.data.frame(now))
'data.frame':	1 obs. of  1 variable:
 $ now: POSIXct, format: "2024-12-12 09:02:40"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants