IBCDFO Python Test Data #130

Open · jared321 opened this issue Feb 27, 2024 · 3 comments

jared321 (Collaborator) commented Feb 27, 2024

@jmlarson1 Can I include the following test data files in the IBCDFO Python package so that users can run tests/examples without having to fetch them each time with wget? This would also keep people's folders clean, because the files wouldn't be left uncompressed and lingering around.

  • C_for_benchmark_probs.csv
  • D_for_benchmark_probs.csv
  • Q_z_and_b_for_benchmark_problems_normalized_subset.mat

Also, according to setup.py, the package should include the *.txt files from pounders/tests/benchmark_results. However, I don't see these being used anywhere. Should they be excluded from the package and that folder deleted?

jared321 self-assigned this Feb 27, 2024
jmlarson1 (Collaborator) commented Feb 27, 2024

I don't know the best protocol for csv and mat data files in a GitHub repository.

I do not know why package_data lists the txt files. Perhaps that part of setup.py can be dropped?

jared321 (Collaborator, Author) commented:

I will drop the txt files from setup.py.
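
Something like this is what I have in mind for the package_data entry (just a sketch; the exact subpackage path is a guess, and whether we ship the data at all is still the open question):

```python
from setuptools import setup, find_packages

setup(
    name="ibcdfo",        # placeholder metadata for the sketch
    version="0.0.0",
    packages=find_packages(),
    package_data={
        # The unused "*.txt" glob from pounders/tests/benchmark_results is gone;
        # if we decide to ship the test data, it is listed explicitly instead.
        # "ibcdfo.pounders.tests" is my guess at where the files would live.
        "ibcdfo.pounders.tests": [
            "C_for_benchmark_probs.csv",
            "D_for_benchmark_probs.csv",
            "Q_z_and_b_for_benchmark_problems_normalized_subset.mat",
        ],
    },
)
```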

Here are some criteria that I would think about in such situations:

  • A repository is fundamentally for version controlling different files. Could these data files change over time and, if so, should we keep a record of those changes?
  • Are the tests that use these files matched to the data files that they load? If so, it might be good to always package them together.
  • Are the files massive? If so, maybe think twice, or put them in a different repository that manages large files, so that cloning the repo doesn't take a lot of disk space or time. If large files won't change that often, maybe this is less of a concern.
  • If you put the files in the package, can you hide them behind function calls in the package's interface that provide programmatic access to the data (see the sketch after this list)? If so, it might be easier to write and maintain the code that uses the data and to decouple applications from packages (or two packages, in the case of IBCDFO using BenDFO data).
  • Will putting the data in the package make it easier, quicker, and less error-prone for others to use and access the data (and mean that I have less documentation to write and maintain)?

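For the programmatic-access point, a loader along these lines could hide the packaged files behind one function (a rough sketch; the ibcdfo.pounders.tests.data subpackage and the use of numpy/scipy here are my assumptions, not the current layout):

```python
from importlib import resources  # Python >= 3.9

import numpy as np
from scipy.io import loadmat


def load_benchmark_data():
    """Return (C, D, Q) for the benchmark problems from files shipped in the package."""
    data_pkg = resources.files("ibcdfo.pounders.tests.data")  # assumed subpackage
    with resources.as_file(data_pkg / "C_for_benchmark_probs.csv") as path:
        C = np.loadtxt(path, delimiter=",")
    with resources.as_file(data_pkg / "D_for_benchmark_probs.csv") as path:
        D = np.loadtxt(path, delimiter=",")
    mat_name = "Q_z_and_b_for_benchmark_problems_normalized_subset.mat"
    with resources.as_file(data_pkg / mat_name) as path:
        Q = loadmat(str(path))
    return C, D, Q
```

Tests and examples would then call load_benchmark_data() and never deal with paths, wget, or unzipping.
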
For this case, the csv files are not large at all, but the mat file is much larger than all the other files in the repo. Do you want POptUS users to have to have wget installed? If they don't know how to install it, how would you help them get their system set up so that the tests can download the data?

I sometimes have to work for long periods of time without internet access. I would be annoyed if I couldn't continue testing without access. You have at least set it up so that I can continue working as long as I have the zip file before I go offline.
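
As a middle ground, the tests could fetch and cache the files with the standard library instead of wget. A minimal sketch, assuming a placeholder URL and a cache directory under the user's home (neither exists today):

```python
import urllib.request
from pathlib import Path

DATA_URL = "https://example.com/bendfo/data/"        # placeholder, not the real location
CACHE_DIR = Path.home() / ".cache" / "ibcdfo_test_data"


def fetch(filename):
    """Return a local path to filename, downloading it only if it is not already cached."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    target = CACHE_DIR / filename
    if not target.exists():
        urllib.request.urlretrieve(DATA_URL + filename, str(target))
    return target
```

Once the files are cached, everything keeps working offline.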

If you would like, I can learn about git-lfs to see if it can help us without requiring too much overhead for us or complicating the lives of POptUS users.

jared321 (Collaborator, Author) commented:

There is another wrinkle to this. Would the same data files be used in the test suite of each implementation of a method (i.e., Python, MATLAB, Julia, R, and Rust)?
