remove drugs from CTRP dataset #241

Open
sgosline opened this issue Oct 29, 2024 · 7 comments · May be fixed by #252

@sgosline
Member

remove these drugs:
BRD-K03911514
BRD-K07442505
BRD-K13185470
BRD-K16130065
BRD-K20514654
BRD-K27188169
BRD-K55473186
YL54
BRD-K58730230
BRD-K79669418
BRD-K99584050

@sgosline sgosline self-assigned this Oct 29, 2024
@sgosline sgosline assigned jjacobson95 and unassigned sgosline Nov 13, 2024
@sgosline sgosline moved this to In progress in CoderData Nov 13, 2024
@sgosline
Member Author

@jjacobson95 why don't you go ahead and add these before starting the build process?

@jjacobson95
Collaborator

jjacobson95 commented Nov 13, 2024

Are these in the current data? I'm not finding them in the broad_sanger drugs file in the 0.1.40 build.

To reproduce, run the following:


Bash

coderdata download --prefix broad_sanger_drugs.tsv.gz
gunzip broad_sanger_drugs.tsv.gz

Python

import pandas as pd

all_drugs = pd.read_csv("broad_sanger_drugs.tsv", sep="\t")

brd_list = [
    "BRD-K03911514",
    "BRD-K07442505",
    "BRD-K13185470",
    "BRD-K16130065",
    "BRD-K20514654",
    "BRD-K27188169",
    "BRD-K55473186",
    "YL54",
    "BRD-K58730230",
    "BRD-K79669418",
    "BRD-K99584050"
]

# filter by items in the list above (all_drugs is left intact for the next check)
exact_matches = all_drugs[all_drugs.chem_name.isin(brd_list)]
print(exact_matches) # empty

# check for lowercase versions of drugs in list
brd_list_lower = [item.lower() for item in brd_list]
lower_matches = all_drugs[all_drugs.chem_name.isin(brd_list_lower)]
print(lower_matches) # reported as also empty

Resulting empty DataFrame:
[Screenshot, 2024-11-13, 1:28:57 PM]

@jjacobson95
Collaborator

jjacobson95 commented Nov 13, 2024

I will still build in code to remove them if they exist, just wondering where they are coming from here.
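
For reference, a minimal sketch of what such a guard could look like. It assumes the drug table has a `chem_name` column, as in the snippet above; the function name `drop_listed_drugs` and the toy frame are hypothetical and are not taken from the actual pipeline code. The match is case-insensitive, and if none of the names are present the table comes back unchanged.

```python
import pandas as pd

# Names copied from the list at the top of this issue.
DRUGS_TO_DROP = [
    "BRD-K03911514", "BRD-K07442505", "BRD-K13185470", "BRD-K16130065",
    "BRD-K20514654", "BRD-K27188169", "BRD-K55473186", "YL54",
    "BRD-K58730230", "BRD-K79669418", "BRD-K99584050",
]


def drop_listed_drugs(drugs, names=DRUGS_TO_DROP, column="chem_name"):
    """Return a copy of `drugs` without rows whose `column` matches a listed name.

    Hypothetical helper: matching ignores case, and a name that is not
    present in the table is simply skipped.
    """
    wanted = [name.lower() for name in names]
    mask = drugs[column].astype(str).str.lower().isin(wanted)
    return drugs.loc[~mask].reset_index(drop=True)


# Toy frame for illustration only; not real CTRP data.
example = pd.DataFrame(
    {
        "chem_name": ["brd-k03911514", "YL54", "imatinib"],
        "toy_id": ["A", "B", "C"],
    }
)
filtered = drop_listed_drugs(example)
print(filtered)
```

Because the filter is a no-op when nothing matches, it is safe to run on builds where these drugs never appear.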

@sgosline
Member Author

Maybe they drop out in the PubChem or structure mapping. We can just close this if they are not there.

@jjacobson95
Collaborator

They may have been missing because the previous build was incomplete. I already added the removal step to the pipeline in the drop_drugs branch, so it shouldn't hurt to keep it.

@sgosline
Member Author

So can we close this?

@sgosline sgosline moved this from In progress to Done in CoderData Nov 22, 2024
@jjacobson95 jjacobson95 linked a pull request Nov 22, 2024 that will close this issue
@jjacobson95
Collaborator

Once we merge PR #252, this will close.

Projects
Status: Done
2 participants