There appear to be some issues when parsing well_id, particularly in the embedding.parquet files from sources 1, 2 and 7 of the JUMP dataset. The well_id listed in the index corresponds to the previous "segment" of the key.
I used this filtering to check it:
Source 15 of the JUMP dataset seems to have a different key structure from the other sources, which leads to a number of parsing errors (which are not flagged as parsing errors in the is_parsing_error column of the index):
The dataset_id is 'jump' and not 'cpg0016-jump'
There are 460 unique 'plate_id' values from that source, but only 183 of those follow the expected structure.
import polars as pl

# `index` is the LazyFrame over the index, loaded beforehand.
# There are 460 plate_ids for source 15 in JUMP, are there really 460 plates?
# Also, plate_id varies in structure!
df = (
    index
    .filter(pl.col("dataset_id").eq("jump"))
    .filter(pl.col("source_id").eq("source_15"))
    .unique(subset=["plate_id"])
    .select(pl.col(["plate_id"]))
    .collect(streaming=True)
)
df
# For source 15 in JUMP, are there really 460 plates?
# There are 183 unique ones matching the regex for the plate name structure
df = (
    index
    .filter(pl.col("dataset_id").eq("jump"))
    .filter(pl.col("source_id").eq("source_15"))
    .filter(pl.col("plate_id").str.contains("^PE(P|C)[0-9]{8}$"))
    .select(pl.col(["plate_id"]).unique().sort())
    .collect(streaming=True)
)
df
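To see which plate_id values fail the pattern, rather than only counting the ones that pass, a small standalone check like the one below might help. It is only a sketch: it uses Python's standard `re` module instead of the index, the helper name `split_plate_ids` is made up for illustration, and the sample values are invented, not taken from the real index. The regex is the one from the filter above.

```python
import re

# Expected plate name structure, copied from the filter above.
PLATE_ID_PATTERN = re.compile(r"^PE(P|C)[0-9]{8}$")


def split_plate_ids(plate_ids):
    """Return (matching, non_matching) as sorted lists of unique plate IDs."""
    unique_ids = set(plate_ids)
    matching = sorted(p for p in unique_ids if PLATE_ID_PATTERN.fullmatch(p))
    non_matching = sorted(unique_ids - set(matching))
    return matching, non_matching


# Made-up values for illustration only; not taken from the real index.
sample = ["PEP00000001", "PEC00000002", "PEP00000001", "images", "PEP0000001"]
matching, non_matching = split_plate_ids(sample)
print(matching)
print(non_matching)
```

On the real index, the same split should be possible by negating the `str.contains` filter with `~`, which would list the plate_id values that do not follow the expected structure.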