Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix off by one error in dbase::File #75

Merged
merged 1 commit into from
Oct 28, 2023
Merged

Conversation

tmontaigu
Copy link
Owner

dbase::File calculated the position of the record/field assuming the record size by using the header's info which includes the implicit deletion flag size, and dbase::File added the deletion flag size again resulting of bad readings.

file.set_options(reading);

let expected_trim_end = StationRecord {
name: format!("{}", " Franconia-Springfield",),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something isn't right here: this shouldn't have any whitespace at the front. Let me check

Copy link
Contributor

@casperhart casperhart Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the record length in the header of the stations.dbf file doesn't include the header. It is 1016, and each field has length 254 (4 fields). So I wonder if whether the deletion flag is included in the record length depends on the dbf version? I'm not sure. Reading the .dbf file using R gives no errors, so I doubt it's corrupt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a separate PR that leaves the tests as-is, and uses the field lengths to calculate the total record size. This should work regardless of whether or not the record size from the header includes the deletion flag #76.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I open this file in other software I see the leading spaces for the fields that it's not completeky unexpected

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You solution is better in the sense that if the file is malformed like the stations.dbf seems to be, the reading would be correct

Copy link
Contributor

@casperhart casperhart Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found when testing that each subsequent row has one more leading space than the previous row, which means the off by one issue still occurs for the stations file.
What software did you use to open it? I tried the dbfread library in python and foreign in R and both read the file without leading spaces.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used LibreOffice, and it has the same behaviour as you described, each row has one more leading space than the previous.

The python lib and R could very well both be trimming the beginning of the field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried again with python and R with a modified version if stations.dbf to test the trimming. Python's dbfread reads the file without issue, but R's foreign (which uses the shapefile c library under the hood) does not.

If foreign and libreoffice don't work with that file, I don't see any reason why dbase-rs should have to handle the edge case. I think the best solution would be to use the record length from the file header for both Reader and File, rather than having to recalculate it based on the field lengths. In my opinion this would be cleaner, but it's up to you.

However, given the stations.dbf file is used for tests and such, we could easily just change the record length in that file from F803 (1016) to F903 (1017) to make it valid and keep the tests as they were. What do you think?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a few more changes that makes this pr more correct.

Also I aggree that the problem is that the file is not valid, and so the code was made so that it would read the file ""correctly"".

The file is probably not correct because I probably made it by trimming the original file, using dbase-rs, but if dbase-rs was wrong in the first place then the resulting trimmed file is not correct.

So in this PR I also added the original and correct file so that tests are now correct and on a valid file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@tmontaigu tmontaigu force-pushed the fix-off-by-one-file branch from de94c80 to 15e99e7 Compare October 27, 2023 20:50
dbase::File calculated the position of the record/field
assuming the record size by using the header's info
which includes the implicit deletion flag size, and dbase::File
added the deletion flag size again resulting of bad readings.
@tmontaigu tmontaigu force-pushed the fix-off-by-one-file branch from 69e1bc8 to c8fe23b Compare October 27, 2023 21:01
@tmontaigu tmontaigu merged commit c8fe23b into master Oct 28, 2023
4 checks passed
@tmontaigu tmontaigu deleted the fix-off-by-one-file branch November 1, 2023 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants