-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix off by one error in dbase::File #75
Conversation
tests/test_file.rs
Outdated
file.set_options(reading); | ||
|
||
let expected_trim_end = StationRecord { | ||
name: format!("{}", " Franconia-Springfield",), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something isn't right here: this shouldn't have any whitespace at the front. Let me check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the record length in the header of the stations.dbf file doesn't include the header. It is 1016, and each field has length 254 (4 fields). So I wonder if whether the deletion flag is included in the record length depends on the dbf version? I'm not sure. Reading the .dbf file using R gives no errors, so I doubt it's corrupt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a separate PR that leaves the tests as-is, and uses the field lengths to calculate the total record size. This should work regardless of whether or not the record size from the header includes the deletion flag #76.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I open this file in other software I see the leading spaces for the fields that it's not completeky unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You solution is better in the sense that if the file is malformed like the stations.dbf seems to be, the reading would be correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found when testing that each subsequent row has one more leading space than the previous row, which means the off by one issue still occurs for the stations file.
What software did you use to open it? I tried the dbfread library in python and foreign in R and both read the file without leading spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used LibreOffice, and it has the same behaviour as you described, each row has one more leading space than the previous.
The python lib and R could very well both be trimming the beginning of the field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried again with python and R with a modified version if stations.dbf
to test the trimming. Python's dbfread
reads the file without issue, but R's foreign
(which uses the shapefile c library under the hood) does not.
If foreign
and libreoffice don't work with that file, I don't see any reason why dbase-rs
should have to handle the edge case. I think the best solution would be to use the record length from the file header for both Reader
and File
, rather than having to recalculate it based on the field lengths. In my opinion this would be cleaner, but it's up to you.
However, given the stations.dbf
file is used for tests and such, we could easily just change the record length in that file from F803
(1016) to F903
(1017) to make it valid and keep the tests as they were. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a few more changes that makes this pr more correct.
Also I aggree that the problem is that the file is not valid, and so the code was made so that it would read the file ""correctly"".
The file is probably not correct because I probably made it by trimming the original file, using dbase-rs, but if dbase-rs was wrong in the first place then the resulting trimmed file is not correct.
So in this PR I also added the original and correct file so that tests are now correct and on a valid file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
de94c80
to
15e99e7
Compare
dbase::File calculated the position of the record/field assuming the record size by using the header's info which includes the implicit deletion flag size, and dbase::File added the deletion flag size again resulting of bad readings.
69e1bc8
to
c8fe23b
Compare
dbase::File calculated the position of the record/field assuming the record size by using the header's info which includes the implicit deletion flag size, and dbase::File added the deletion flag size again resulting of bad readings.