Replacing metadata in rows #2954
hyanwong started this conversation in Show and tell
Actually, the most efficient way to change multiple rows is probably to append the new metadata values onto the end of the column and then reorder it using the code in #2953:

import tskit
import numpy as np


def change_ragged_order(ragged_arr, ragged_offset, new_order):
    # returns a tuple of a new ragged_col in a new order and a new offset array
    # one (start, end) pair per row of the ragged column
    ranges = np.array([ragged_offset[:-1], ragged_offset[1:]]).T
    new_ranges = ranges[new_order, :]
    # indexes into ragged_arr for every non-empty row, in the new order
    idx = [np.arange(l, r, dtype=ragged_offset.dtype) for l, r in new_ranges if l != r]
    select = [] if len(idx) == 0 else np.concatenate(idx)
    return ragged_arr[select], np.insert(np.cumsum(np.diff(new_ranges, axis=1)), 0, 0)


def change_metadata(new_md_dict, table):
    # new_md_dict maps row index -> new metadata for that row
    if table.metadata_schema.schema is not None:
        # encode into a new dict so that the caller's dict is left unchanged
        new_md_dict = {
            k: table.metadata_schema.validate_and_encode_row(v)
            for k, v in new_md_dict.items()
        }
    data = [table.metadata]
    # add a list of new byte arrays, then concat
    data += [np.array(bytearray(v), dtype=table.metadata.dtype) for v in new_md_dict.values()]
    # offsets for the appended rows, continuing on from the existing offsets
    tmp_offset = np.cumsum([len(d) for d in data], dtype=table.metadata_offset.dtype)[1:]
    tmp_offset = np.concatenate((table.metadata_offset, tmp_offset))
    tmp_md = np.concatenate(data)
    # point each replaced row at its appended copy; other rows keep their own data
    new_row_ids = np.arange(len(new_md_dict)) + table.num_rows
    idx = np.arange(table.num_rows)
    idx[list(new_md_dict.keys())] = new_row_ids
    d = table.asdict()
    d["metadata"], d["metadata_offset"] = change_ragged_order(tmp_md, tmp_offset, idx)
    table.set_columns(**d)

Test:
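To make the append-then-reorder idea concrete, here is a rough NumPy-only sketch on a toy ragged column. It does not use tskit. The helpers `pack`, `unpack` and `replace_rows` are hypothetical names invented for this illustration, and the int64 offsets are a simplifying assumption rather than the dtype tskit uses.

```python
import numpy as np


def pack(rows):
    """Pack a list of bytes objects into (data, offset) ragged arrays."""
    data = np.frombuffer(b"".join(rows), dtype=np.int8)
    offset = np.zeros(len(rows) + 1, dtype=np.int64)
    offset[1:] = np.cumsum([len(r) for r in rows])
    return data, offset


def unpack(data, offset):
    """Split (data, offset) ragged arrays back into a list of bytes objects."""
    return [data[offset[i]:offset[i + 1]].tobytes() for i in range(len(offset) - 1)]


def replace_rows(data, offset, new_rows):
    """Replace rows given as {row_index: bytes} without touching the others."""
    num_rows = len(offset) - 1
    # Step 1: append the replacement values after the existing data.
    new_data, new_offset = pack(list(new_rows.values()))
    tmp_data = np.concatenate((data, new_data))
    tmp_offset = np.concatenate((offset, offset[-1] + new_offset[1:]))
    # Step 2: each replaced row now points at its appended copy.
    order = np.arange(num_rows)
    order[list(new_rows.keys())] = num_rows + np.arange(len(new_rows))
    # Step 3: gather the byte ranges in that order and rebuild the offsets.
    starts, ends = tmp_offset[:-1][order], tmp_offset[1:][order]
    pieces = [tmp_data[s:e] for s, e in zip(starts, ends)]
    out_data = np.concatenate(pieces) if pieces else tmp_data[:0]
    out_offset = np.concatenate(([0], np.cumsum(ends - starts)))
    return out_data, out_offset


# Toy column with three rows, the middle one empty.
data, offset = pack([b"ab", b"", b"cde"])
new_data, new_offset = replace_rows(data, offset, {0: b"xyz", 2: b"q"})
result = unpack(new_data, new_offset)
```

Only the replaced rows are ever handled as individual values; the untouched rows are copied as raw byte ranges, which is why the existing metadata never needs to be decoded or re-validated.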
Note that you can change individual row metadata by simple assignment. I had forgotten this!

https://tskit.dev/tutorials/tables_and_editing.html#minor-edits

tables.individuals[1] = tables.individuals[1].replace(metadata={"name": "Robert"})
Following on from #2953, I wanted to be able to change metadata in an individual row (or rows) without having to validate/reencode the existing metadata. I think this does it:
Here's a test.
A useful improvement to this would be to be able to replace multiple rows at once (maybe the input could be a dict mapping {index: md}).