MPNST sample updates AND PDX code addition #251

sgosline · 2024-11-22T00:23:12Z

This is a pretty large merge as my branch is a bit outdated, so @jjacobson95 please confirm that the dataset build functionality works! (I didn't try build_all.py, only build_dataset.py.)

This PR closes #146, which has been open far too long. The PDX data required a new curve fitting statistic, which was part of the delay. There was a second untracked issue about the mpnst samples not matching entirely, but that has also been resolved.

jjacobson95 · 2024-11-22T17:27:47Z

Does this branch replace the mpnst build with mpnstPDX or does it intend for both to be run? Trying to figure out how to resolve these conflicts.

jjacobson95 · 2024-11-22T17:33:20Z

It looks like it's built to be a separate dataset from mpnst. Is this the plan?

sgosline · 2024-11-22T18:12:12Z

I updated the mpnst dataset (drug response in PDX MT) and added the mpnstpdx (drug response in vivo PDX) dataset.

jjacobson95 · 2024-11-22T18:55:05Z

Is the samples file duplicated between the two datasets?

Trying to figure out how build/mpnstPDX/build_samples.sh should work.

jjacobson95 · 2024-11-22T18:58:19Z

Same question with the drugs file. There is no build_drugs.sh, should I make this just a duplicate of the mpnst_drugs.tsv file as well?

sgosline · 2024-11-22T19:07:32Z

The sample file should be copied, not regenerated, because the original samples are the same. I'm not sure what happened to the drug file; it has been added now.

jjacobson95 · 2024-11-22T19:53:40Z

The mpnst_copy_number.csv file is missing improve_sample_id values for some rows.

When running the validation code, I'm getting this error:

linkml-validate --schema schema/coderdata.yaml --target-class "Copy Number" local/mpnst_copy_number.csv
...
[ERROR] [local/mpnst_copy_number.csv/345134] 'improve_sample_id' is a required property in /
[ERROR] [local/mpnst_copy_number.csv/345135] 'improve_sample_id' is a required property in /
[ERROR] [local/mpnst_copy_number.csv/345136] 'improve_sample_id' is a required property in /

Tail of mpnst_copy_number in this branch:

57135,0.00695501817407717,deep del,MPNST PDX MT,NF Data Portal,
57054,0.00695501817407717,deep del,MPNST PDX MT,NF Data Portal,
57055,0.00695501817407717,deep del,MPNST PDX MT,NF Data Portal,
9085,0.00695501817407717,deep del,MPNST PDX MT,NF Data Portal,
253175,0.00695501817407717,deep del,MPNST PDX MT,NF Data Portal,

Tail of mpnst_copy_number in build 0.1.40:

57135,0.00732396189266279,deep del,MPNST PDX MT,NF Data Portal,4270
57054,0.00732396189266279,deep del,MPNST PDX MT,NF Data Portal,4270
57055,0.00732396189266279,deep del,MPNST PDX MT,NF Data Portal,4270
253175,0.00732396189266279,deep del,MPNST PDX MT,NF Data Portal,4270
9085,0.00732396189266279,deep del,MPNST PDX MT,NF Data Portal,4270

sgosline · 2024-11-23T00:35:50Z

OK, the latest commit should fix this. It's a change to the sample generation I added.

jjacobson95 · 2024-11-23T17:19:57Z

Running into several issues as this branch is 71 commits behind main. I don't have time today to work on these, but in a merged branch (drop_drugs) I am running into the following issues for MPNST (not PDX).

Drugs file:

Traceback (most recent call last):
  File "/app/build_drug_desc.py", line 97, in <module>
    main()
  File "/app/build_drug_desc.py", line 87, in main
    id_morg = ids.rename({"canSMILES":'smile'},axis=1).merge(morgs)[['improve_drug_id','structural_descriptor','descriptor_value']]
  File "/opt/venv/lib/python3.10/site-packages/pandas/core/frame.py", line 9843, in merge
    return merge(
  File "/opt/venv/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 148, in merge
    op = _MergeOperation(
  File "/opt/venv/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 719, in __init__
    self.left_on, self.right_on = self._validate_left_right_on(left_on, right_on)
  File "/opt/venv/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 1500, in _validate_left_right_on
    raise MergeError(
pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
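
The MergeError above means that, after the rename, `ids` and `morgs` share no column names, so pandas has nothing to join on. A minimal sketch of a guard that could run just before the merge follows. `shared_columns` and `require_join_key` are hypothetical helpers, not part of build_drug_desc.py, and the column lists in the usage lines are made up for illustration:

```python
def shared_columns(left_cols, right_cols):
    """Return the column names present in both tables, in left-table order."""
    right = set(right_cols)
    return [c for c in left_cols if c in right]


def require_join_key(left_cols, right_cols, key):
    """Raise a readable error if `key` is missing from either table."""
    missing = [side for side, cols in (("left", left_cols), ("right", right_cols))
               if key not in set(cols)]
    if missing:
        raise ValueError(
            f"join key {key!r} missing from {' and '.join(missing)} table; "
            f"shared columns: {shared_columns(left_cols, right_cols)}"
        )


# Hypothetical column lists; in the real script these would be
# list(ids.columns) after the rename and list(morgs.columns).
ids_cols = ["improve_drug_id", "chem_name", "smile"]
morgs_cols = ["structural_descriptor", "descriptor_value"]
print(shared_columns(ids_cols, morgs_cols))  # prints []
```

An empty result here would point at the table whose columns are not what the script expects (for example an empty or differently named descriptor table), which is an assumption to verify rather than a confirmed cause.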

Experiments file:

Joining with `by = join_by(chem_name)`
Error in `left_join()`:
! Can't join `x$chem_name` with `y$chem_name` due to incompatible types.
ℹ `x$chem_name` is a <character>.
ℹ `y$chem_name` is a <logical>.
Backtrace:
     ▆
  1. ├─base::subset(...)
  2. ├─dplyr::distinct(...)
  3. ├─dplyr::select(...)
  4. ├─dplyr::mutate(left_join(alldrugs, drug_map), time_unit = "hours")
  5. ├─dplyr::left_join(alldrugs, drug_map)
  6. ├─dplyr:::left_join.data.frame(alldrugs, drug_map)
  7. │ └─dplyr:::join_mutate(...)
  8. │   └─dplyr:::join_cast_common(x_key, y_key, vars, error_call = error_call)
  9. │     ├─rlang::try_fetch(...)
 10. │     │ └─base::withCallingHandlers(...)
 11. │     └─vctrs::vec_ptype2(x, y, x_arg = "", y_arg = "", call = error_call)
 12. ├─vctrs (local) `<fn>`()
 13. │ └─vctrs::vec_default_ptype2(...)
 14. │   ├─base::withRestarts(...)
 15. │   │ └─base (local) withOneRestart(expr, restarts[[1L]])
 16. │   │   └─base (local) doWithOneRestart(return(expr), restart)
 17. │   └─vctrs::stop_incompatible_type(...)
 18. │     └─vctrs:::stop_incompatible(...)
 19. │       └─vctrs:::stop_vctrs(...)
 20. │         └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
 21. │           └─rlang:::signal_abort(cnd, .file)
 22. │             └─base::signalCondition(cnd)
 23. └─rlang (local) `<fn>`(`<vctrs__2>`)
 24.   └─handlers[[1L]](cnd)
 25.     └─dplyr:::rethrow_error_join_incompatible_type(cnd, vars, error_call)
 26.       └─dplyr:::stop_join(...)
 27.         └─dplyr:::stop_dplyr(...)
 28.           └─rlang::abort(...)

jjacobson95 · 2024-11-23T17:22:18Z

As these are preventing the full build, I'm going to exclude them to get the updated drug and samples numbers for the others for AACR. When these are working, I'll rebuild again with them included.

sgosline · 2024-11-24T20:24:52Z

So is the bug in the pdx_code branch or the drop_drugs branch? I can work on it today.

jjacobson95 · 2024-11-24T21:01:46Z

Sorry, this is a bit complicated. It is currently on the drop_drugs branch, and I think it exists because some files don't match how the build_all / build_dataset process currently works. The updated build process is detailed in the mpnst-readme-update branch.

If you could just get the sample / drug numbers for these (for AACR), I'd be happy to align this to the build process when I return after Thanksgiving.

Just a side note: I am still working on getting the numbers for the others. The GDC-client was updated, which broke HCMI yesterday (undocumented bug), so I'm seeing what that'll take to fix.

sgosline · 2024-11-24T21:03:42Z

Yeah, the numbers are unchanged since the paper, so we can just use those.

jjacobson95 · 2024-11-24T21:15:23Z

Sounds great

sgosline · 2024-11-27T02:31:41Z

I am adding another commit; I now have things working. Hope they work for you.

mPnst and mpnstpdx code now build.

jjacobson95 · 2024-12-02T21:39:46Z

Looks like this is pretty close to working. No error messages to report during the build; however, the validator is failing for mpnst_transcriptomics, mpnst_proteomics, mpnst_copy_number, and mpnst_mutations.

All of the error messages where it fails are the same:
'improve_sample_id' is a required property.

It indicates that there are thousands of rows with missing improve_sample_id values.

For example, for mpnst_transcriptomics.csv:

Header (improve_sample_ids are present)

entrez_id,transcriptomics,improve_sample_id,source,study
729759,15.555831,22,NF Data Portal,MPNST PDX MT
401934,0.103466,22,NF Data Portal,MPNST PDX MT
388581,3.487198,22,NF Data Portal,MPNST PDX MT
388581,4.795471,22,NF Data Portal,MPNST PDX MT
388581,28.787805,22,NF Data Portal,MPNST PDX MT
388581,7.684035,22,NF Data Portal,MPNST PDX MT
388581,2.763487,22,NF Data Portal,MPNST PDX MT
80772,5.464398,22,NF Data Portal,MPNST PDX MT
80772,1.631856,22,NF Data Portal,MPNST PDX MT

Tail (improve_sample_ids are not present)

4513,766.053199,,NF Data Portal,MPNST PDX MT
4509,113.918447,,NF Data Portal,MPNST PDX MT
4508,134.885977,,NF Data Portal,MPNST PDX MT
4514,1387.07963,,NF Data Portal,MPNST PDX MT
4537,350.462059,,NF Data Portal,MPNST PDX MT
4539,212.951003,,NF Data Portal,MPNST PDX MT
4538,143.134393,,NF Data Portal,MPNST PDX MT
4540,50.31931,,NF Data Portal,MPNST PDX MT
4541,9.996729,,NF Data Portal,MPNST PDX MT
4519,90.065835,,NF Data Portal,MPNST PDX MT
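
A quick way to quantify the gap before running the full validator is to count rows whose `improve_sample_id` field is empty. This is a minimal sketch using only the standard library; the function name `count_missing` is hypothetical, and the inline sample is the header plus a few rows copied from the output above:

```python
import csv
import io


def count_missing(lines, column="improve_sample_id"):
    """Return (rows_missing_value, total_rows) for `column` in CSV text lines."""
    reader = csv.DictReader(lines)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"{column!r} not found in header {reader.fieldnames}")
    missing = total = 0
    for row in reader:
        total += 1
        # Short rows yield None for absent fields, so treat None as empty.
        if not (row[column] or "").strip():
            missing += 1
    return missing, total


sample = io.StringIO(
    "entrez_id,transcriptomics,improve_sample_id,source,study\n"
    "729759,15.555831,22,NF Data Portal,MPNST PDX MT\n"
    "401934,0.103466,22,NF Data Portal,MPNST PDX MT\n"
    "4513,766.053199,,NF Data Portal,MPNST PDX MT\n"
    "4509,113.918447,,NF Data Portal,MPNST PDX MT\n"
)
print(count_missing(sample))  # prints (2, 4)
```

For a real build output, an open file handle can be passed instead, e.g. `with open("local/mpnst_copy_number.csv", newline="") as fh: count_missing(fh)`, assuming the file has a header row containing `improve_sample_id`.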

sgosline · 2024-12-03T00:29:17Z

I can't seem to repro this error locally. It all works on my end, using this branch.

jjacobson95 · 2024-12-04T21:24:58Z

After clearing the cache and rebuilding, I am still ending up with missing improve_sample_ids.

This is my run command:

# ensure branch is up to date
git pull

# confirm the current branch is drop_drugs
git branch

# remove all docker caches and images
docker stop $(docker ps -q)
docker rm $(docker ps -aq)
docker rmi $(docker images -q) -f
docker builder prune -a --force
docker system prune -a --volumes -f

# ensure nothing is in local to interfere or affect the build
rm -r local

# run build command
python build/build_dataset.py --dataset mpnst --build

I'll go ahead and git clone to test a totally fresh repo as well. Maybe there is some artifact that could be causing this.

*Update: On a clean repo, this issue is still present for me. I am using a Mac with an M1 chip.

sgosline added 4 commits November 20, 2024 18:04

continued updatest to MT dta and pDX data

45842dd

added docker and code for PDX data build

ea9be4a

Moved over files from main to enable build test

51adfeb

got single dataset build to work with script

c4f0db8

sgosline requested a review from jjacobson95 November 22, 2024 00:23

updated

1680f41

fixed #251

8a0fe22

sgosline added a commit that referenced this pull request Nov 27, 2024

addresses the issues in #251

d0dc5ed

mPnst and mpnstpdx code now build.

jjacobson95 closed this in 8a0fe22 Dec 6, 2024

jjacobson95 merged commit df14101 into main Dec 6, 2024
