MPNST sample updates AND PDX code addition #251
Conversation
Does this branch replace the mpnst build with mpnstPDX, or does it intend for both to be run? Trying to figure out how to resolve these conflicts.
It looks like it's built to be a separate dataset from mpnst, is that the plan?
I updated the mpnst dataset (drug response in PDX MT) and added the mpnstpdx (drug response in vivo PDX) dataset.
Is the samples file duplicated between the two datasets? Trying to figure out how build/mpnstPDX/build_samples.sh should work.
Same question for the drugs file. There is no build_drugs.sh; should I make this just a duplicate of the mpnst_drugs.tsv file as well?
The sample file should be copied, not regenerated, because the original samples are the same. I'm not sure what happened to the drug file; it has been added now.
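(For reference, a minimal sketch in Python of what the PDX samples/drugs step could amount to under that assumption: reuse the existing MPNST files rather than regenerating them. The source/destination paths and the mpnst_samples.csv name are guesses; only mpnst_drugs.tsv is named in this thread.)

```python
# Hypothetical sketch only: copy the existing MPNST samples/drugs files into the
# PDX dataset's output instead of regenerating them. Paths and mpnst_samples.csv
# are assumed names, not confirmed repository paths.
import shutil
from pathlib import Path

src = Path("build/mpnst")      # assumed location of the original MPNST outputs
dst = Path("build/mpnstPDX")   # assumed output directory for the PDX dataset
dst.mkdir(parents=True, exist_ok=True)

for name in ("mpnst_samples.csv", "mpnst_drugs.tsv"):
    shutil.copy(src / name, dst / name)  # copy, don't regenerate
```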
The mpnst_copy_number.csv file is missing improve_sample_id values for some rows. When running the validation code, I'm getting this error:
Tail of mpnst_copy_number in this branch:
Tail of mpnst_copy_number in build 0.1.40:
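(For anyone trying to reproduce this: a quick pandas check along these lines, which is not the project's actual validator, shows the symptom by printing the tail of the file and counting rows with an empty improve_sample_id. The file and column names are taken from the error above.)

```python
import pandas as pd

df = pd.read_csv("mpnst_copy_number.csv")
print(df.tail())  # the affected rows show up at the end of the file

n_missing = df["improve_sample_id"].isna().sum()
print(f"rows missing improve_sample_id: {n_missing} of {len(df)}")
```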
Ok, the latest commit should fix this. It's a change to the sample generation that I added.
Running into several issues as this branch is 71 commits behind main. I don't have time today to work on these, but in a merged branch (drop_drugs) I am running into the following issues for MPNST (not PDX). Drugs file:
Experiments file:
As these are preventing the full build, I'm going to exclude them to get the updated drug and sample numbers for the others for AACR. When these are working, I'll rebuild with them included.
So is the bug in the pdx_code branch, or the drop_drugs branch? I can work on it today.
Sorry, this is a bit complicated. It is currently on the drop_drugs branch, and I think it exists because some files don't match how the build_all / build_dataset process currently works. This updated build process is detailed in the mpnst-readme-update branch. If you could just get the sample / drug numbers for these (for AACR), I'd be happy to align this with the build process when I return after Thanksgiving. Just a side note: I am still working on getting the numbers for the others; the GDC-client was updated, which broke HCMI yesterday (undocumented bug), so I'm seeing what that'll take to fix.
Yeah, the numbers are unchanged since the paper, so we can just use those.
Sounds great.
I am adding another commit now that I have things working. Hope they work for you.
mpnst and mpnstpdx code now build.
Looks like this is pretty close to working. No error messages to report during the build; however, the validator is failing for mpnst_transcriptomics, mpnst_proteomics, mpnst_copy_number, and mpnst_mutations. All of the error messages where it fails are the same: they indicate that there are thousands of rows with missing improve_sample_id values. For example, for mpnst_transcriptomics.csv: Header (improve_sample_ids are present)
Tail (improve_sample_ids are not present)
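(A rough way to spot-check all four flagged files at once, again just a sketch rather than the validator itself; the .csv extension is confirmed above only for transcriptomics and copy number, and is assumed for the other two.)

```python
import pandas as pd

# Files the validator flagged; assumed to sit in the current build output directory.
files = [
    "mpnst_transcriptomics.csv",
    "mpnst_proteomics.csv",
    "mpnst_copy_number.csv",
    "mpnst_mutations.csv",
]
for f in files:
    df = pd.read_csv(f)
    n_missing = df["improve_sample_id"].isna().sum()
    print(f"{f}: {n_missing} of {len(df)} rows missing improve_sample_id")
```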
I can't seem to repro this error locally. It all works on my end, using this branch.
After clearing the cache and rebuilding, I am still ending up with missing improve_sample_ids. This is my run command:
I'll go ahead and… *Update: On a clean repo, this issue is still present for me. I am using a Mac with an M1 chip.
This is a pretty large merge as my branch is a bit outdated, so @jjacobson95 please confirm that the dataset build functionality works! (I didn't try build_all.py, only build_dataset.py.) This PR closes #146, which has been open far too long. The PDX data required a new curve-fitting statistic, which was part of the delay. There was a second untracked issue about the mpnst samples not matching entirely, but that has also been resolved.