Request: respect original SMILES index in fp-idx #2
Comments
Hi, I wrote this code as part of a blog post mainly to show what a good baseline looks like. At the time there were some talks/papers coming out with some complex algorithms that claimed to be useful only because the existing implementations were bad. Feel free to submit patches but if you're mainly just using it I would recommend trying out: http://chemfp.com/.
Thank you for the quick reply! I'm impressed - your code is really nice, especially for a blog post. If multi-threading were added to this beauty, it'd be really close to a full-fledged solution. :P Yeah, the problem with Chemfp is that it basically requires OpenEye's software for fast fingerprint generation (the RDKit and OpenBabel implementations look slow as molasses), and the free version of Chemfp doesn't handle that many molecules (it's also a little unclear whether the paid version can really support more than 300 million molecules). Your code actually supports most of my needs and seems to be the best free offering around. I'll see if I can figure out this indexing issue too - I was just wondering if it was something easy for you to fix.
You can use CDK to generate an FPS and use it with ChemFP.
BTW this is my paid version: https://www.nextmovesoftware.com/arthor.html
Thank you! Yeah, maybe the ChemFP road with CDK might be viable too - it's just annoying to have to write custom software after paying a licensing fee. You definitely have the best solution with Arthor - it's just going to take a bit of work for my lowly "research assistant" self to raise the necessary funds. :P
It would be really nice if the fp-idx tools retained the same index number as their original SMILES input. I've run into a problem where the index of skipped SMILES strings is ignored, which throws off the indexing between the original input and fp-idx's downstream .fps and .idx files. Keeping the original index would be especially useful for getting the original SMILES string back from similarity searches.
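To illustrate the behaviour I'm asking for, here is a rough Python sketch. It is not fp-idx's actual code, and I'm not assuming anything about its file formats. The names `index_smiles` and `toy_is_valid` are made up for this example, and the validity check is only a stand-in for a real SMILES parser. The point is just that a skipped line still consumes its index, so every kept record carries the position it had in the original input.

```python
from typing import Callable, Iterable, Iterator, Tuple


def index_smiles(
    lines: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Iterator[Tuple[int, str]]:
    """Yield (original_index, smiles) for every record that is kept.

    Hypothetical helper: the index counts every input line, including
    the ones that are skipped, so it always points back into the input.
    """
    for original_index, line in enumerate(lines):
        fields = line.split()
        smiles = fields[0] if fields else ""
        if not smiles or not is_valid(smiles):
            continue  # skipped, but its index has already been consumed
        yield original_index, smiles


def toy_is_valid(smiles: str) -> bool:
    """Stand-in for a real parser: only checks that parentheses balance."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Two of the five input lines are unusable (unbalanced branch, empty line).
smiles_input = ["CCO", "C(C", "c1ccccc1", "", "CC(=O)O"]
kept = list(index_smiles(smiles_input, toy_is_valid))
original_indices = [idx for idx, _ in kept]
```

With the original index stored next to each fingerprint, a search hit can be mapped straight back to its input line, e.g. `smiles_input[original_indices[hit]]`, without re-counting the skipped records.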