Select subset of the data #265

Closed
1 task
gcroci2 opened this issue Jul 10, 2024 · 2 comments · Fixed by #269
Labels: enhancement (New feature or request)

Comments

gcroci2 (Contributor) commented Jul 10, 2024

After generating the data by following the quickstart, I need to select a subset of it for testing purposes:

import pickle

with open('output/npl.pkl', 'rb') as f:
    data = pickle.load(f)
bgcs, gcfs, spectra, mfs, strains, links = data

# select subset for bgcs, gcfs, spectra, mfs
bgcs_selected = bgcs[0:10]
gcfs_selected = gcfs[0:10]
spectra_selected = spectra[0:10]
mfs_selected = mfs[0:10]
# select subset for strains: keep only the first 10
strains_list = list(strains)
strains_set = set(strains_list[0:10])
strains.filter(strains_set)
  • Would it be useful to also allow filtering the strains by index, as we do for the rest?
  • For the links, I couldn't figure out a way to select a subset, and they are of course the heaviest part of the .pkl file. I tried the following, but it raises TypeError: LinkGraph.add_link() takes 3 positional arguments but 4 were given:
from nplinker.scoring import LinkGraph

selected_links = links.links[0:100]

lg_selected = LinkGraph()
for i in range(len(selected_links)):
    lg_selected.add_link(selected_links[i][0], selected_links[i][1], selected_links[i][2]['metcalf'])
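
The error message suggests that add_link accepts only the two node arguments positionally (plus self), so the score presumably has to be passed as a keyword argument. Below is a minimal sketch of that calling pattern. DemoLinkGraph and its add_link(u, v, **data) signature are hypothetical stand-ins inferred from the error text, not the real nplinker class, and the link tuples are made-up sample data.

```python
class DemoLinkGraph:
    """Hypothetical stand-in; NOT the real nplinker.scoring.LinkGraph."""

    def __init__(self):
        self.links = []  # list of (u, v, data) tuples

    def add_link(self, u, v, **data):
        # Only u and v are positional; scores arrive as keyword arguments.
        self.links.append((u, v, data))


# Made-up sample links in (u, v, data) form.
source_links = [
    ("gcf_1", "spec_1", {"metcalf": 3.2}),
    ("gcf_2", "spec_7", {"metcalf": 1.5}),
]

lg_selected = DemoLinkGraph()
for u, v, data in source_links:
    # Passing data["metcalf"] as a third positional argument would raise
    # a TypeError like the one quoted above; a keyword argument does not.
    lg_selected.add_link(u, v, metcalf=data["metcalf"])
```

If the real add_link follows the same shape, the loop in the attempt above would only need the score passed as metcalf=... instead of positionally.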

Action points

  • Implement a filter method for LinkGraph, e.g. lg1 = links.filter(gcfs, spectra)
@github-project-automation github-project-automation bot moved this to Backlog in dev Jul 10, 2024
@CunliangGeng CunliangGeng moved this from Backlog to Ready in dev Jul 11, 2024
CunliangGeng (Member) commented

> Would it be useful to also allow filtering the strains by index, as we do for the rest?

It's possible. But why do you need to select a subset of strains? Note that the bgcs_selected you got might reference strains that are not present in the strain subset.
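
The mismatch described here can be illustrated with a small sketch. DemoBGC and its strain attribute are hypothetical stand-ins, not the nplinker classes, and the identifiers are made up. Slicing the BGCs and the strains independently can leave a selected BGC pointing at a strain outside the strain subset, whereas deriving the strain subset from the selected BGCs keeps the two consistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoBGC:
    """Hypothetical stand-in for a BGC that records its strain."""

    id: str
    strain: str


bgcs = [
    DemoBGC("bgc_1", "strain_C"),
    DemoBGC("bgc_2", "strain_A"),
    DemoBGC("bgc_3", "strain_B"),
]
strains = ["strain_A", "strain_B", "strain_C"]

# Independent index-based subsets, as in the snippet from the issue.
bgcs_selected = bgcs[0:2]
strains_by_index = set(strains[0:2])

# Selected BGCs whose strain did not make it into the strain subset.
orphaned = [b.id for b in bgcs_selected if b.strain not in strains_by_index]

# Alternative: derive the strain subset from the selected BGCs.
strains_from_bgcs = {b.strain for b in bgcs_selected}
```

Here bgc_1 ends up in orphaned because its strain falls outside the first two strains, while strains_from_bgcs covers every selected BGC by construction.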

gcroci2 (Contributor, Author) commented Jul 11, 2024

We decided to add a filtering method to the LinkGraph class that filters its nodes based on the selected entities passed as input.
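
As a rough illustration of what such node-based filtering could look like, here is a minimal sketch on plain (u, v, data) tuples. The function filter_links, its parameters, and the sample data are hypothetical, and the semantics shown (keep a link only when both endpoints are in the selections) are one possible choice, not necessarily what the merged implementation does.

```python
def filter_links(links, gcfs, spectra):
    """Keep links that connect a selected GCF with a selected spectrum."""
    gcf_set, spectrum_set = set(gcfs), set(spectra)
    return [
        (u, v, data)
        for u, v, data in links
        # Links are treated as undirected, so accept either endpoint order.
        if (u in gcf_set and v in spectrum_set)
        or (v in gcf_set and u in spectrum_set)
    ]


# Made-up sample links in (u, v, data) form.
links = [
    ("gcf_1", "spec_1", {"metcalf": 3.2}),
    ("gcf_1", "spec_2", {"metcalf": 0.4}),
    ("spec_3", "gcf_2", {"metcalf": 2.1}),
    ("gcf_3", "spec_1", {"metcalf": 1.0}),
]

selected = filter_links(links, ["gcf_1", "gcf_2"], ["spec_1", "spec_3"])
```

With this choice, a link is dropped as soon as either endpoint falls outside the selection, which keeps the filtered result consistent with the subsets of GCFs and spectra chosen earlier.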

@CunliangGeng CunliangGeng added the enhancement New feature or request label Jul 11, 2024
@CunliangGeng CunliangGeng moved this from Ready to In progress in dev Jul 12, 2024
@CunliangGeng CunliangGeng linked a pull request Jul 12, 2024 that will close this issue
@CunliangGeng CunliangGeng moved this from In progress to In review in dev Jul 15, 2024
@github-project-automation github-project-automation bot moved this from In review to Done in dev Jul 15, 2024