-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Alignment] Meta-issue #314
Comments
A few things come to my mind scanning over all the new issues: In general there seem to be three dimensions of configuration right now:
To be honest, I think that's a bit much. Why again did global VS local have to be separate configs and not config elements? We could reduce the complexity of the configuration by one dimension if users could just do Myers. Why is it a different align_config? That's seems strange, because myers is not orthogonal to local or global, it's just a different algorithm for doing either, or not? Ideally, I would like to not expose this at all and just pick myers implementation in the background for all pairwise alignments where the config allows this and we expect myers to be faster. If this cannot be reasonably assumed (what are the corner cases?), I would rather implement the switch at the top-level, i.e. have Concerning naming: as we discussed before, we don't use literal algorithm names (or author names) anywhere else in the library which is good, because it's technical detail. We don't want our users to have to know who Myers is, that's why I proposed "pairwise_bitvector" above, but maybe you can think of something better? Also concerning the actual function-call being We could name the functions: seqan3::align_pairwise();
seqan3::align_pairwise_bitvector();
seqan3::align_multi(); So it would just be pulling "align" into the function name. Note that this also turns the function name into a verb again. What do you think? |
One more thing concerning the delegate interface: // Delegate interface.
align::pairwise(input_rng, cfg) | on_hit([](auto & res) { std::cout << res.score << "\n" << res.alignment | view::to_cigar; }); If it is implemented as above, the delegate can just work on the range, and therefore the delegate cannot be executed in the thread that computes the result, but only work serially. I would either put the on_hit back into the config (and make the function not return a range in that case), or not implement the interface at all right now (if it doesn't provide a benefit over the range interface, it's not worth it, I think. |
They don't necessarily, but given that there will be more implementations, such as xdrop or split break point, and each of them having different constraints, e.g. not all config_elements can be used arbitrarily with any configuration. On this level, we can a) document well that it is a different algorithm, and b) better check on the constraints. Also consider, that usually you don't switch between global or local within the application, but have one of them executed.
The problem with the pick and choose is, that some configurations would automatically switch the runtime behaviours in the background, which in case of the Myers' is tremendous. It might be possible to integrate a configuration that suggests to the user that the most efficient implementation is used if applicable (nicely, if it would expand on SIMD implementations as well - I will think about it), but I at least want to have a specific configuration to give the specific algorithm. IMO since Myers' is extremely limited it does not make sense to have them as a configuration element.
I don't think another top-level function is fitting in this case. I do agree of giving it another name but I think, bitvector is also insignificant. I could imagine a cfg as
I in general agree, that's why added the other possibility as a different strategy so we can discuss this. I at least didn't want to have something like, Well, at the end we need to find a proper design that is less confusing for the people to find things, or where they expect things to have. So for me putting the configurations and such into align namespace makes it clearer to look for the corresponding sources and elements. An aligned_sequence_concept is used in different places, and thus not bound to the namespace seqan3::align I guess. I would use the same argument for the high-level invocation functions.
This is exactly something I need to discuss with @marehr. But honestly, I can't say anything right now about the limitations/usage scenarios so this needs to be further specified. It is not on the immediate todo anyway. We need to learn more from the push_mi design and the executor design and see which direction the standard is going. |
@rrahn I added seqan/seqan#355 as a design consideration. cc: @smehringer, @h-2 |
@rrahn and I already agreed on doing it the "right way" 😉 |
@rrahn Is xdrop already implemented in seqan3? |
This is a meta-issue collecting tasks which are transformed into single issues:
The implementation is blocked by #295General
align
seems to be the best fit for the namespace in order to distinguish the configurations from other modules. Accordingly, I suggest to call the final invocation functionsalign_pairwise(...)
andalign_multiple(...)
to distinguish between pairwise and multiple alignment, which differ in the input (2 sets vs 1 set of sequences). If both function remain in the namespace ofalign
, I would suggest only the shorter versionalign::pairwise(...)
andalign::multiple(...)
.align
[Setup and document namespace]align
#318Things to make better in seqan3 than in seqan2
Required Configurations:
align_cfg::global
[Implement]align::global_config
#319align_cfg::local
[Add align::local_config #321]align_cfg::edit
[Add]align::myers_config
#322align_cfg::semiglobal
[Add]align::myers_config
#322Optional Configuration Elements
align_cfg::gap_
Add configuration element for affine gaps #323]align_cfg::scoring
[Add configuration adaptor]; Depends on [[Meta-Issue] Scoring Scheme Module #327]with_scoring
#326align_cfg::result
align_cfg::free_ends
align_cfg::band
Execution
Finalise myers-alignment code and merge #259]on_hit
execution functorTest functionality
alignment_score_matrix(_banded)
to have a matrix representation of unbanded and banded alignments.alignment_trace_matrix(_banded)
to have a matrix representation for traces of unbanded and banded alignments.Parallelization
This is splitted into epics: seqan/product_backlog#54 and seqan/product_backlog#57
Examples:
The text was updated successfully, but these errors were encountered: