
map.average_precision killed when using large sample size #90

Open
AMCalejandro opened this issue Mar 10, 2025 · 0 comments
Hi,

I was using copairs to run a phenotypic activity assessment based on mAP.

I was doing this using cpg0014 extracted features averaged per well. Below is the shape of the data:

>>> feats_meta[1].shape
(9216, 23)

I believe the issue I am experiencing could be solved by adding more resources to my VM, but you might want to handle the scenario where compute resources are limited and it is still desirable to complete the job.

Memory available

               total        used        free      shared  buff/cache   available
Mem:            14Gi       1.7Gi        12Gi       1.0Mi       300Mi        12Gi
Swap:             0B          0B          0B

CPU info

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
    CPU family:          6
    Model:               63
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4599.99

Running the code below, the process gets killed when calling map.average_precision on the whole dataset. Subsampling solves the issue and the job completes.


from copairs import map
from copairs.matching import assign_reference_index

df_metadata = feats_meta[1]
feats = feats_meta[0]

reference_col = "Metadata_reference_index"
df_metadata_activity = assign_reference_index(
    df_metadata,
    "Metadata_broad_id == 'None'",  # condition to get reference profiles (neg controls)
    reference_col=reference_col,
    default_value=-1,
)

# positive pairs are replicates of the same treatment
pos_sameby = ["Metadata_broad_id", reference_col]
pos_diffby = []
neg_sameby = []
# negative pairs are replicates of different treatments
neg_diffby = ["Metadata_broad_id", reference_col]


metadata = df_metadata_activity
profiles = feats.values

activity_ap = map.average_precision(
    metadata, profiles, pos_sameby, pos_diffby, neg_sameby, neg_diffby
)

activity_ap = activity_ap.query("Metadata_broad_id != 'None'")  # remove DMSO
activity_ap.to_csv("output/mAP/mAP.csv", index=False)

activity_map = map.mean_average_precision(
    activity_ap, pos_sameby, null_size=1000000, threshold=0.05, seed=0
)
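For reference, the subsampling workaround looks roughly like this. This is a minimal sketch using synthetic stand-in data (the real `feats_meta` is not reproduced here); `frac` and the seeds are arbitrary illustration values, not the ones I actually used:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for feats_meta[1] (metadata) and feats_meta[0] (features).
rng = np.random.default_rng(0)
n_wells = 9216
df_metadata = pd.DataFrame(
    {"Metadata_broad_id": rng.choice(["None", "BRD-A", "BRD-B"], size=n_wells)}
)
feats = pd.DataFrame(rng.standard_normal((n_wells, 16)))

# Subsample a fraction of wells so the pairwise computation fits in memory.
frac = 0.25  # arbitrary; tune to available RAM
idx = df_metadata.sample(frac=frac, random_state=0).index

# Keep metadata rows and feature rows aligned by subsetting with the same index.
df_metadata_sub = df_metadata.loc[idx].reset_index(drop=True)
profiles_sub = feats.loc[idx].to_numpy()

assert len(df_metadata_sub) == profiles_sub.shape[0]
```

With `df_metadata_sub` and `profiles_sub` in place of the full metadata and profiles, the same map.average_precision call above completes within the 14 GiB available on this VM. It may also be worth checking whether the installed copairs version exposes a batching or chunking option for average_precision, but I have not verified that.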