Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbindlist(l, use.names=TRUE) handle different encodings for column names #5453

Merged
merged 14 commits into from
Dec 3, 2024

Conversation

ben-schwen
Copy link
Member

@ben-schwen ben-schwen commented Aug 31, 2022

Closes #5452

As also now document in the code, our rbindlist mapping follows a 3 pass approach:
1st pass - gather unique column names
2nd pass - count duplicates
3rd pass - create final column mapping

We actually do not change the underlying encodings here, but convert the read value for comparisons which seems safer since we do not want to alter the input tables.

@codecov
Copy link

codecov bot commented Aug 31, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.61%. Comparing base (014dafb) to head (45bc9ff).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5453      +/-   ##
==========================================
- Coverage   98.61%   98.61%   -0.01%     
==========================================
  Files          79       79              
  Lines       14536    14534       -2     
==========================================
- Hits        14335    14333       -2     
  Misses        201      201              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented May 17, 2024

Comparison Plot

Generated via commit f85272b

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 4 minutes and 54 seconds
Installing different package versions 7 minutes and 52 seconds
Running and plotting the test cases 2 minutes and 21 seconds

@MichaelChirico
Copy link
Member

Could you write out in the PR description what are the three call sites being updated? Is there any possibility of using a helper instead to avoid relying on updating 3 places to maintain similar changes in the future? Also please ensure there are tests for each of the 3 call sites (I guess codecov would be telling the truth here, but it's currently broken).

IIUC, UTF-8 was chosen as the "winning" encoding, it seems like even if none of the inputs are UTF-8. If that's right, I think it is fine, though it might be a breaking change, right? Please describe the behavior more explicitly in the NEWS item, and perhaps in ?rbindlist, and possibly write some new tests.

@MichaelChirico MichaelChirico added this to the 1.17.0 milestone Sep 4, 2024
NEWS.md Outdated Show resolved Hide resolved
@MichaelChirico
Copy link
Member

Saw my earlier comment. At least I'm consistent :)

@ben-schwen
Copy link
Member Author

Updated the description and also the code comments!

@MichaelChirico MichaelChirico merged commit f05893e into master Dec 3, 2024
6 of 7 checks passed
@MichaelChirico MichaelChirico deleted the rbindlist_encodings branch December 3, 2024 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rbind cannot handle different name encodings
2 participants