Request: respect original SMILES index in fp-idx #2

Open
tantrev opened this issue Dec 7, 2018 · 5 comments

Comments


tantrev commented Dec 7, 2018

It would be really nice if the fp-idx tools retained the same index number as their original SMILES input. I've run into a problem where the index of skipped SMILES strings is ignored, which consequently messes up the indexing between fp-idx's downstream .fps and .idx files. This would be especially useful for getting the original SMILES string back from similarity searches.
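To illustrate the request, here is a minimal sketch (not fp-idx's actual code) of carrying the original input position alongside each record, so that skipped SMILES do not shift later indices. The names `index_valid_smiles` and `toy_is_valid` are hypothetical, and the validity check is a toy stand-in for a real SMILES parser.

```python
from typing import Callable, Iterable, List, Tuple


def index_valid_smiles(
    lines: Iterable[str],
    is_valid: Callable[[str], bool],
) -> List[Tuple[int, str]]:
    """Return (original_index, smiles) pairs, skipping unusable lines.

    original_index is the 0-based position in the input, so it still
    points at the right line even when earlier records were skipped.
    """
    kept = []
    for original_index, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # blank line: skipped, but its index is still consumed
        smiles = fields[0]
        if not is_valid(smiles):
            continue  # rejected record: later indices are unaffected
        kept.append((original_index, smiles))
    return kept


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in for a real parser: checks balanced parentheses only."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

With this approach, a hit reported against a stored index can be looked up directly in the original SMILES file, because the stored number is the input position rather than a count of successfully parsed records.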

Owner

johnmay commented Dec 7, 2018

Hi, I wrote this code as part of a blog post mainly to show what a good baseline looks like. At the time there were some talks/papers coming out with some complex algorithms that claimed to be useful only because the existing implementations were bad.

Feel free to submit patches, but if you're mainly just using it, I would recommend trying out: http://chemfp.com/.

Author

tantrev commented Dec 7, 2018

Thank you for the quick reply! I'm impressed - your code is really nice, especially for a blog post. If multi-threading were added to this beauty, it'd be really close to a full-fledged solution. :P

Yeah, the problem with Chemfp is that it basically requires OpenEye's software for fast fingerprint generation (the RDKit and OpenBabel implementations look slow as molasses) and the free version of Chemfp doesn't handle that many molecules (the paid version also seems to be a little iffy as to whether it can really support more than 300 million molecules).

Your code actually supports most of my needs and seems to be the best free offering around. I'll see if I can figure out this indexing issue too - I was just wondering if it was something easy for you to fix.

Owner

johnmay commented Dec 7, 2018

You can use CDK to generate an FPS and use it with ChemFP.
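As a rough sketch of what such an FPS file could look like, the snippet below writes fingerprints as text, using the original SMILES line index as the identifier column. It assumes the FPS layout as I understand it (a `#FPS1` first line, `#key=value` header lines, then one hex-encoded fingerprint, a tab, and an identifier per line, with bit 0 as the lowest bit of the first byte); please check the chemfp documentation before relying on it. The fingerprints here are plain sets of on-bit positions rather than CDK output, and `fps_line` and `write_fps` are hypothetical helper names.

```python
from typing import Iterable, Set, TextIO, Tuple


def fps_line(on_bits: Set[int], num_bits: int, identifier: str) -> str:
    """Hex-encode a fingerprint and append a tab-separated identifier."""
    data = bytearray((num_bits + 7) // 8)
    for bit in on_bits:
        if not 0 <= bit < num_bits:
            raise ValueError(f"bit {bit} is outside 0..{num_bits - 1}")
        # Assumed bit order: bit 0 is the least significant bit of byte 0
        data[bit // 8] |= 1 << (bit % 8)
    return f"{data.hex()}\t{identifier}"


def write_fps(
    out: TextIO,
    records: Iterable[Tuple[int, Set[int]]],
    num_bits: int,
) -> None:
    """Write (original_index, on_bits) records as FPS-style text."""
    out.write("#FPS1\n")
    out.write(f"#num_bits={num_bits}\n")
    for original_index, on_bits in records:
        # The original input position is used as the record identifier
        out.write(fps_line(on_bits, num_bits, str(original_index)) + "\n")
```

Because the identifier is the original line number, a search hit maps straight back to the source SMILES even if some input records were skipped during fingerprint generation.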

Owner

johnmay commented Dec 7, 2018

BTW this is my paid version: https://www.nextmovesoftware.com/arthor.html

Author

tantrev commented Dec 7, 2018

Thank you! Yeah, maybe the ChemFP road with CDK might be viable too - it's just annoying to have to write custom software after paying a licensing fee. You definitely have the best solution with Arthor - it's just going to take a bit of work for my lowly "research assistant" self to raise the necessary funds. :P
