-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Precise text file parsing #4081
Merged
Merged
Changes from all commits
Commits
Show all changes
18 commits
Select commit
Hold shift + click to select a range
15ceb54
New build option: USE_PRECISE_TEXT_PARSER.
cyfdecyf 063cfdf
Add benchmark for CSVParser with Atof and AtofPrecise.
cyfdecyf 0830ca8
Fix lint complaint.
cyfdecyf 0453f8e
Fix typo in open result error message.
cyfdecyf 9416169
Revert "Fix lint complaint."
cyfdecyf b1d2fd4
Revert "Add benchmark for CSVParser with Atof and AtofPrecise."
cyfdecyf 494cdc3
Use AtofPrecise in Common::__StringToTHelper.
cyfdecyf e1b0e92
[option] precise_float_parser: precise float number parsing for text …
cyfdecyf bec4479
Remove USE_PRECISE_TEXT_PARSER compile option.
cyfdecyf cc71045
test: add test for Common::AtofPrecise.
cyfdecyf 9d6fb02
test: remove ChunkedArrayTest with 0 length.
cyfdecyf d20a65c
fix lint, add copyright.
cyfdecyf 1fb062d
Revert "test: remove ChunkedArrayTest with 0 length."
cyfdecyf 95d0874
Use LightGBM::Common::Sign
cyfdecyf 9c30899
save precise_float_parser in model file.
cyfdecyf 4a658e0
Fix error checking in AtofPrecise. Add more test cases.
cyfdecyf f789cfe
Remove test case that can't pass under macOS.
cyfdecyf 131f8ae
Apply suggestions from code review
cyfdecyf File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean we can replace
LightGBM/include/LightGBM/utils/common.h
Lines 1079 to 1102 in 1d2f3e1
which was added in #3942 with a call to this
AtofPrecise
? Do they behave exactly the same?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on floating number. So I write a test program https://gist.github.com/cyfdecyf/63f4e7339bbe5a5a23474fda66375742
The only difference is that
strtod
would not return negative NaN. So theAtof
function in this gist handles this special case.Please help take a look at the gist and check if there's any problem. I'll update
Common::Atof
and replace the change added in #3492.Refer to: 1 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The special handling for -NaN in gist (revision 5) the has one problem though: it's incorrect to handle input like " -nan" (note beginning space). But I'm wondering if there's need for the special handling of this case? From this mailling list thread, it seems like different C library have different treatment for parsing "-nan".
I suggest just leave
Common::AtofPrecise
as is and don't add special handling for "-nan" and the like.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code introduced in #3942 is replaced by
AtofPrecise
in commit 498090d.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! LGTM.