Fix chunking issues in sum_AMEL and reduce_damages #83

Merged
merged 21 commits into from
Jul 6, 2023


JMGilbert
Contributor

No description provided.

@codecov

codecov bot commented May 11, 2023

Codecov Report

Merging #83 (43b7843) into dscim-v0.4.0 (152ae4f) will increase coverage by 0.21%.
The diff coverage is 91.30%.

```diff
@@               Coverage Diff                @@
##           dscim-v0.4.0      #83      +/-   ##
================================================
+ Coverage         67.99%   68.21%   +0.21%
================================================
  Files                17       17
  Lines              1859     1878      +19
================================================
+ Hits               1264     1281      +17
- Misses              595      597       +2
```

| Impacted Files | Coverage Δ |
|---|---|
| src/dscim/preprocessing/preprocessing.py | 71.81% <80.00%> (-0.30%) ⬇️ |
| src/dscim/preprocessing/input_damages.py | 88.72% <94.44%> (+0.35%) ⬆️ |


JMGilbert marked this pull request as ready for review on June 12, 2023 15:50

```python
    save_path : str
        Path to save concatenated file in .zarr format
    """
    paths = glob.glob(f"{damage_dir}/{basename}*")
```
Member


I usually prefer to explicitly create a list of filenames to open, in case there are extra data files or anything like that. Maybe that's handled in a data check later?


```python
for v in list(data.coords.keys()):
    if data.coords[v].dtype == object:
        data.coords[v] = data.coords[v].astype("unicode")
```
Member


Might as well handle this in a unit test to add the coverage and avoid the warning.
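A minimal sketch of such a unit test, assuming xarray and numpy are installed. `coerce_object_to_unicode` is a hypothetical stand-in for the coordinate loop above and the matching data-variable loop in the same diff; it casts with `astype(str)`, which NumPy treats as the same fixed-width unicode dtype as `"unicode"`. As a design choice, the sketch rebuilds coordinates with `assign_coords` and iterates only `data_vars`, so index coordinates are never reassigned through item assignment.

```python
import numpy as np
import xarray as xr


def coerce_object_to_unicode(data: xr.Dataset) -> xr.Dataset:
    """Return a copy of ``data`` with object-dtype coordinates and data
    variables cast to fixed-width unicode (hypothetical helper, not dscim API)."""
    out = data.copy()
    for name in list(out.coords):
        if out[name].dtype == object:
            # Rebuild the coordinate from its dims and string-cast values.
            out = out.assign_coords(
                {name: (out[name].dims, out[name].values.astype(str))}
            )
    for name in list(out.data_vars):
        if out[name].dtype == object:
            out[name] = out[name].astype(str)
    return out


def test_coerce_object_to_unicode():
    ds = xr.Dataset(
        {"label": ("region", np.array(["x", "y"], dtype=object))},
        coords={"region": np.array(["a", "b"], dtype=object)},
    )
    out = coerce_object_to_unicode(ds)
    # Both the coordinate and the data variable end up as unicode strings.
    assert out["region"].dtype.kind == "U"
    assert out["label"].dtype.kind == "U"
    # Values are unchanged; only the dtype differs.
    assert list(out["region"].values) == ["a", "b"]
    assert list(out["label"].values) == ["x", "y"]
```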

```python
        data.coords[v] = data.coords[v].astype("unicode")
for v in list(data.variables.keys()):
    if data[v].dtype == object:
        data[v] = data[v].astype("unicode")
```
Member


Same as the comment above.

```python
        "ssp": 1,
    }
else:
    chunkies = {
```
Member


Please add this to the unit tests.

```python
    .rename(var)
    .chunk(
        {
            "batch": 15,
```
Member


Seeing this dictionary of chunks repeated many times confirms that we should generalize at least a little bit; perhaps define a global chunkies dictionary and eventually move it into a config. This can be done in a later PR.
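A minimal sketch of the "global chunkies" idea, under stated assumptions: one module-level default dictionary plus a small helper that restricts it to the dimensions a given dataset actually has, with optional per-call overrides standing in for a future config. `DEFAULT_CHUNKS` and `chunks_for` are hypothetical names, and combining `"batch": 15` and `"ssp": 1` into one dictionary is an assumption; in the diff excerpts they appear in different chunk dictionaries, and the rest of the real entries are not shown.

```python
# Hypothetical shared default. Only "batch": 15 and "ssp": 1 appear in the
# diff excerpts; the full set of dimensions and sizes is not shown in this PR.
DEFAULT_CHUNKS = {"batch": 15, "ssp": 1}


def chunks_for(dims, overrides=None):
    """Return the shared chunk sizes restricted to ``dims``.

    ``overrides`` lets a call site (or, later, a config file) replace
    individual entries without redefining the whole dictionary.
    """
    merged = {**DEFAULT_CHUNKS, **(overrides or {})}
    # Keep only dimensions that are present, so callers can pass the result
    # straight to .chunk() without listing unknown dimension names.
    return {dim: size for dim, size in merged.items() if dim in dims}


# Example call site (ds is an xarray Dataset): ds.chunk(chunks_for(ds.dims))
```

Filtering to the dimensions actually present matters because recent xarray versions reject `.chunk()` keys that are not dimensions of the dataset, and it lets every call site share the single dictionary.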

@kemccusker
Member

We decided to defer the additional test coverage and the generalization of chunk sizes to later PRs.

kemccusker merged commit d9bdae3 into dscim-v0.4.0 on Jul 6, 2023
kemccusker deleted the dscim-v0.4.0_fixes branch on July 6, 2023 17:53
3 participants