I want to start by saying that I've found this package to be very convenient and useful!
My only issue is the time complexity of the `name2taxids` function. It does a linear search through the `db.names` dictionary (of type `Dict{Int, String}`) and accumulates all IDs that match the name. Each query therefore costs O(n) in the number of entries, which gets slow for larger datasets.
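For illustration, a lookup of this kind looks roughly like the sketch below. `linear_name2taxids` is a hypothetical name, and this is not the package's actual implementation. It only shows why each call has to visit every entry of a `taxid => name` dictionary.

```julia
# Hypothetical sketch (not the package's code): linear-scan lookup over a
# taxid => name dictionary. Every call visits all n entries, so each query is O(n).
function linear_name2taxids(id_to_name::Dict{Int, String}, query::AbstractString)
    taxids = Int[]
    for (taxid, name) in id_to_name
        name == query && push!(taxids, taxid)
    end
    return sort!(taxids)  # Dict iteration order is unspecified; sort for stable output
end

taxid_names = Dict(9606 => "Homo sapiens", 562 => "Escherichia coli", 1 => "root")
linear_name2taxids(taxid_names, "Homo sapiens")  # returns [9606]
```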
I found that you can essentially invert the `db.names` dictionary to get a name => taxids dictionary (of type `Dict{String, Vector{Int}}`). Building it can take a couple of seconds, but that one-time cost is small compared with the minutes or even hours that repeated linear searches can take when you have many queries.
I reckon a function for creating such a dictionary would be nice to have. It's rather trivial to do manually, but it requires accessing internals that are not user-facing.
This is what I've been doing:
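```julia
# Invert db.names (taxid => name) into name => taxids.
# A single name can belong to several taxids, hence the Vector{Int} values.
name_to_taxids = Dict{String, Vector{Int}}()
for (taxid, name) in db.names
    push!(get!(name_to_taxids, name, Int[]), taxid)
end
```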
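A helper along those lines might look like the sketch below. `build_name_index` and `lookup_taxids` are hypothetical names, not part of the package. The only assumption is that the input has the same `Dict{Int, String}` shape as `db.names`, so with the package you would pass `db.names` directly.

```julia
# Hypothetical helper (not part of the package): invert a taxid => name dictionary
# into name => taxids. Building is O(n) once; each lookup is then an average O(1) hash lookup.
function build_name_index(id_to_name::Dict{Int, String})
    index = Dict{String, Vector{Int}}()
    sizehint!(index, length(id_to_name))
    for (taxid, name) in id_to_name
        # The do-function form of get! only allocates a new vector for unseen names
        push!(get!(() -> Int[], index, name), taxid)
    end
    foreach(sort!, values(index))  # Dict iteration order is unspecified; sort for stable output
    return index
end

# Return all taxids for `name`, or an empty vector if the name is unknown.
lookup_taxids(index::Dict{String, Vector{Int}}, name::AbstractString) =
    get(index, name, Int[])

taxid_names = Dict(9606 => "Homo sapiens", 562 => "Escherichia coli", 1 => "root")
name_index = build_name_index(taxid_names)
lookup_taxids(name_index, "Escherichia coli")  # returns [562]
```

Returning an empty vector for unknown names keeps the result type stable (`Vector{Int}`), which mirrors what a linear scan that finds no match would produce.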
Cheers!
Thank you for the kind comment and valuable feedback!
I had also thought that the `name2taxids` function could be improved, and I think your ideas are very good.
I'm very busy at the moment and have little time to devote to development, but I definitely want to improve on this point.
Of course, opinions on more detailed implementations or PRs are welcome!