
map.average_precision killed when using large sample size #90

Open
AMCalejandro opened this issue Mar 10, 2025 · 0 comments
Hi,

I was using copairs to run a phenotypic activity assessment based on mAP.

I was doing this using cpg0014 extracted features averaged per well. Below is the shape of the data:

>>> feats_meta[1].shape
(9216, 23)

I believe the issue I am experiencing could be solved by adding more resources to my VM, but you might want to handle the scenario where compute resources are limited and it is still desirable to complete the job.

Memory available

               total        used        free      shared  buff/cache   available
Mem:            14Gi       1.7Gi        12Gi       1.0Mi       300Mi        12Gi
Swap:             0B          0B          0B

CPU info

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
    CPU family:          6
    Model:               63
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4599.99

Running the code below, the process gets killed when calling map.average_precision on the whole dataset. Subsampling solves the issue and the job completes.


from copairs import map
from copairs.matching import assign_reference_index

df_metadata = feats_meta[1]
feats = feats_meta[0]

reference_col = "Metadata_reference_index"
df_metadata_activity = assign_reference_index(
    df_metadata,
    "Metadata_broad_id == 'None'",  # condition to get reference profiles (neg controls)
    reference_col=reference_col,
    default_value=-1,
)

# positive pairs are replicates of the same treatment
pos_sameby = ["Metadata_broad_id", reference_col]
pos_diffby = []
neg_sameby = []
# negative pairs are replicates of different treatments
neg_diffby = ["Metadata_broad_id", reference_col]


metadata = df_metadata_activity
profiles = feats.values

activity_ap = map.average_precision(
    metadata, profiles, pos_sameby, pos_diffby, neg_sameby, neg_diffby
)

activity_ap = activity_ap.query("Metadata_broad_id != 'None'")  # remove DMSO
activity_ap.to_csv("output/mAP/mAP.csv", index=False)

activity_map = map.mean_average_precision(
    activity_ap, pos_sameby, null_size=1000000, threshold=0.05, seed=0
)
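For reference, the subsampling workaround looks roughly like this. This is a minimal sketch using synthetic stand-in data (the real `feats_meta` is not reproduced here); `frac` and the seeds are arbitrary illustration values, not the ones I actually used:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for feats_meta[1] (metadata) and feats_meta[0] (features).
rng = np.random.default_rng(0)
n_wells = 9216
df_metadata = pd.DataFrame(
    {"Metadata_broad_id": rng.choice(["None", "BRD-A", "BRD-B"], size=n_wells)}
)
feats = pd.DataFrame(rng.standard_normal((n_wells, 16)))

# Subsample a fraction of wells so the pairwise computation fits in memory.
frac = 0.25  # arbitrary; tune to available RAM
idx = df_metadata.sample(frac=frac, random_state=0).index

# Keep metadata rows and feature rows aligned by subsetting with the same index.
df_metadata_sub = df_metadata.loc[idx].reset_index(drop=True)
profiles_sub = feats.loc[idx].to_numpy()

assert len(df_metadata_sub) == profiles_sub.shape[0]
```

With `df_metadata_sub` and `profiles_sub` in place of the full metadata and profiles, the same map.average_precision call above completes within the 14 GiB available on this VM. It may also be worth checking whether the installed copairs version exposes a batching or chunking option for average_precision, but I have not verified that.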