Hi, I've implemented the sampling method from the paper for my own use.
Hope it's helpful for you too.

    import random
    from collections import Counter


    def sample_ner_data_struct_shot(samples, count_fn, k=1, random_state=None):
        """
        Sample or select a subset of samples with k using the sampling method
        from https://arxiv.org/abs/2010.02405

        Args:
            samples: list
            count_fn: input a sample, return a dict of {entity_type: count}
            k: number of entity instances for each entity type
        Returns:
            indices of the selected samples
            entity count of the selected samples
        """
        # count entities
        count = {}  # total count
        samples_count = []  # count for each sample
        for sample in samples:
            sample_count = count_fn(sample)
            samples_count.append(sample_count)
            for e_type, e_count in sample_count.items():
                count[e_type] = count.get(e_type, 0) + e_count

        # sort by entity count, iterate from the infrequent entity to the
        # frequent and sample
        entity_types = sorted(count.keys(), key=lambda t: count[t])
        selected_ids = set()
        selected_count = {t: 0 for t in entity_types}
        random.seed(random_state)
        for entity_type in entity_types:
            while selected_count[entity_type] < k:
                samples_with_e = [
                    i for i in range(len(samples))
                    if entity_type in samples_count[i] and i not in selected_ids
                ]
                if not samples_with_e:
                    # fewer than k instances of this type are available;
                    # stop here instead of failing in random.choice
                    break
                sample_id = random.choice(samples_with_e)
                selected_ids.add(sample_id)
                # update selected_count
                for e_type, e_count in samples_count[sample_id].items():
                    selected_count[e_type] += e_count
        return list(selected_ids), selected_count


    def count_entity_(sample):
        return Counter([slot['label'] for slot in sample['slots']])
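
If you want to sanity-check a selection, something like the sketch below might help. It assumes each sample looks like `{'slots': [{'label': ...}, ...]}`, which is what `count_entity_` reads. `covers_k_shot` and the toy data are made up for illustration and are not from the paper; the helper only checks that every entity type has at least `k` instances among the selected indices.

```python
from collections import Counter


def count_entity_(sample):
    # Same counting rule as count_entity_ in the snippet: one count per slot label.
    return Counter(slot["label"] for slot in sample["slots"])


def covers_k_shot(samples, selected_ids, count_fn, k=1):
    """Hypothetical helper: True if every entity type seen in `samples`
    occurs at least k times among the selected samples."""
    # Collect every entity type that appears anywhere in the data.
    all_types = set()
    for sample in samples:
        all_types.update(count_fn(sample))
    # Sum the entity counts over the selected samples only.
    selected = Counter()
    for i in selected_ids:
        selected.update(count_fn(samples[i]))
    return all(selected[t] >= k for t in all_types)


# Toy data, invented for this example.
toy = [
    {"slots": [{"label": "PER"}, {"label": "LOC"}]},
    {"slots": [{"label": "PER"}]},
    {"slots": [{"label": "ORG"}]},
]

print(covers_k_shot(toy, [0, 2], count_entity_, k=1))  # True
print(covers_k_shot(toy, [0, 1], count_entity_, k=1))  # False, ORG is missing
```

The indices returned by `sample_ner_data_struct_shot` can be passed as `selected_ids` to confirm the minimum count per type was reached.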