Precompute fixed-length string dtypes #20

eric-czech · 2020-08-05T10:02:29Z

This also updates the test data to contain longer allele and sample ID strings, to ensure that the inferred data types are correct.

The function _max_str_len should probably be a central utility, but we can address that in https://github.com/pystatgen/sgkit/issues/90.
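For readers unfamiliar with the idea, here is a minimal sketch of precomputing a fixed-length string dtype from the longest value in a column. It is not the `_max_str_len` implementation from this PR: the names `max_str_len` and `fixed_length_dtype` are hypothetical, and the sketch assumes the values fit in memory as plain Python strings.

```python
from typing import Iterable


def max_str_len(values: Iterable[str]) -> int:
    """Return the length of the longest string in values (0 if empty)."""
    return max((len(v) for v in values), default=0)


def fixed_length_dtype(values: Iterable[str]) -> str:
    """Build a NumPy-style fixed-length unicode dtype code such as 'U6'.

    Hypothetical helper for illustration only; not part of sgkit-plink.
    """
    # Use a width of at least 1 so that empty input still yields a
    # well-defined fixed-width dtype code.
    return f"U{max(max_str_len(values), 1)}"


# Alleles like those in the updated test data: one is longer than a single base.
alleles = ["C", "A", "CGCGCG", "G"]
allele_dtype = fixed_length_dtype(alleles)
print(allele_dtype)
```

The resulting code can be passed as the `dtype` argument when building a NumPy array, so the width is set from the full column rather than inferred from a sample that may miss the longest value.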

tomwhite

Looks good - just a small request to document how the test data was generated.

tomwhite · 2020-08-07T08:46:30Z

sgkit_plink/tests/data/plink_sim_10s_100v_10pmiss.bim

-1	1:7:A:C	0.0	7	C	A
-1	1:8:A:C	0.0	8	C	A
-1	1:9:A:C	0.0	9	C	A
+1	1:1:G:CGCGCG	0.0	1	CGCGCG	G


Can you document how you generated this file?

Hm, how about this:

- I'll add some notes for now saying this was created using software in a separate environment (hail)
- Create an issue to document this properly and include the associated code that created it
- Wait to see where the multi-repo conversation goes (https://github.com/pystatgen/sgkit/issues/65#issuecomment-670049733)
- Add this to the validation folder if we merge repos since it is essentially the same process as the one I used in the REGENIE and HWE PRs

That sound good? I'd rather not bootstrap another validation-like concept with all the associated CI changes if I can help it.

@tomwhite does @eric-czech's proposal sound good?

Yes (sorry forgot to reply) - sounds great!

> I'll add some notes for now saying this was created using software in a separate environment (hail)

467d4bd#diff-33e9afa89862f603b25f9c5abf5ef334R6

eric-czech · 2020-08-20T19:29:19Z

@tomwhite can I get a sign off on those updates when you get a chance?

eric-czech added 2 commits August 5, 2020 05:52

Precompute string lengths sgkit-dev#12

0bb862d

Reformat

bfbeedb

eric-czech requested a review from tomwhite August 5, 2020 12:21

tomwhite reviewed Aug 7, 2020

eric-czech mentioned this pull request Aug 20, 2020

Document data generation for unit test #29

Open

Add note on test data generation

467d4bd

tomwhite approved these changes Aug 24, 2020

eric-czech merged commit f62fc70 into sgkit-dev:master Aug 24, 2020

eric-czech commented Aug 5, 2020

tomwhite left a comment

tomwhite Aug 7, 2020

eric-czech Aug 7, 2020

hammer Aug 12, 2020

tomwhite Aug 13, 2020

eric-czech Aug 20, 2020

eric-czech Aug 20, 2020

eric-czech commented Aug 20, 2020
