Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast data.table to matrix conversion in C [Depends: #4196] #4144
base: master
Are you sure you want to change the base?
Fast data.table to matrix conversion in C [Depends: #4196] #4144
Changes from 109 commits
a0c09dc
a614743
a7a22a0
27fa71c
0d6742a
89e4239
e101b0e
dff21b5
a7118b7
4456f26
7b4891a
d226099
cc08c5d
92c1d54
002fd76
ee4e6af
02310aa
a9c9926
5c9f526
7cbf38e
224a8ab
ab3ed57
946f38e
297b607
2dd90d9
504ee7c
d4c2465
4d3669e
b367f23
b711fb0
d41cb0e
1f566f4
f58029d
893cbba
e63df58
1f7f94e
1f50c13
caeb706
f5fdfb3
a0b760e
49fc1b3
cc954bc
a75705a
a1ffe65
24af9fd
9ceea1b
64334f8
475c24d
d08e67d
7b60f79
815c4b5
d2ec0f0
5a81e37
9a68b7c
447d3e7
b4bdf9f
6294407
589b7c0
8aa261e
f0aafae
ee625f3
e90db9c
8aa779f
685385e
dbd9331
0bad2ca
da74463
2f853b9
3481a93
5038ebc
bdcea33
32ba105
2f12067
746d287
131e2d2
87ac4c9
9768918
4d87e9f
c8e6aed
75f6b57
defc74e
9f64ddb
5dbc33c
4067cd4
20ce5fd
a963766
b842b65
80199fd
8c2cbd3
f4ecb46
a948c7f
ec698bb
b7cbaab
47e79be
2be89d6
0f14c06
f1ce5cd
ab0453c
5d516d7
af02985
d9a5fc8
590363d
5381580
f713224
e92c6d0
9047f64
c4405dd
6cf1553
bef9fa1
ab048ad
18c45c4
1a1e40a
eee7197
37f8c42
d040581
c351389
69cb3b0
f618c64
2da2d76
109bcf8
44232b1
114c710
bdf66fb
f5dd790
1ef7be9
3de02cc
c43daec
0a136fc
9f3b8eb
8164cc1
5d59d23
99dead0
856a8c1
9397963
c3ba221
6a1343d
70cdd14
bd5ca7a
3a384c0
5d82c5f
07ae873
cc0d412
8198f46
2ab6122
c3bd7ac
a839c40
06597bf
56c1ec0
e23ddbe
186d687
60614d5
a3ba3c6
7e361d0
e3a9e41
30017a8
93144c1
f0e99c3
e73f143
47a1331
07e6f00
51ecbc0
a00d0a7
10f6d03
f880b27
0452272
4498083
c81b367
2d57084
a39d771
005c228
9dfd669
ffb7310
4dfe9a5
f44ecdb
4fa0c53
7c7758a
6cffa32
ce9e211
e98cf28
82de007
d1fec57
57ee386
94d8bf5
fca5ab9
eec32e1
c501db2
3e88266
a7fac5a
37cbea9
c65e113
9a9b0df
80f4bd4
cd5bb04
79f8899
7b20cea
a284589
93563a3
7236060
f5c5b46
dd81334
2d30f8a
61fcc0a
6a5cb04
7668701
8519f31
fdea714
53846b8
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the input data.table has many columns these two calls are quite expensive in the context of the pre column checks, as these attributes are not used for any other column checks.
It would be nice if we could somehow avoid these, but I can't see how, as they're absolutely necessary for avoiding problems with memcpy in the C code (as the memcpy call assumes the number of elements to copy is .N - if this is not true bad things can happen).
As far as I can tell, the only way these checks will fail is if a data.table has been created through
structure(list(...), class=c("data.table", "data.frame"))
asdata.table()
,as.data.table()
, andsetDT()
will error if trying to create malformed columns like this.Is there a way we can check for a valid externalptr in
attributes(dt)[[".internal.selfref"]]
to avoid these expensive checks on valid data.tables?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have found the internal
selfrefok()
function. Unfortunatelyselfrefok()
does not guarantee the data.table is not malformed, assetDT()
will allow these columns and thenselfrefok()
will be1L
. E.g. as in test2074.07
. This is particularly problematic if one of the columns has length < .N.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems quite wasteful to allocate memory for a new column whenever a column has to be coerced to a different type/class, only to then have to copy that memory again into the new matrix in the C code. For the atomic types at least, we may just be able to set a target type, pass that to the C code, then do the coercion there at copy-time. This would slow down the copy loop in the C code as you'd have to check and coerce each column - one way around this would be to pass the vectors of unique column types and individual column types to write a nested for loop iterating along columns of each type separately to avoid slowdowns due to branch prediction. Perhaps some memory profiling would be useful here to see if this is worth doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assign.c:memrecycle
does that; new feature 29 in 1.12.4 refers to zero-copy type coercion.