Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data param allows selection of the same dataset twice #19803

Open
pavanvidem opened this issue Mar 13, 2025 · 16 comments
Open

Data param allows selection of the same dataset twice #19803

pavanvidem opened this issue Mar 13, 2025 · 16 comments

Comments

@pavanvidem
Copy link
Member

pavanvidem commented Mar 13, 2025

Describe the bug
The dropdown dataset selection (not the column select) allows the same dataset to be selected twice for a single parameter. It happens only when a user double clicks on the dataset while selecting. I guess this should not be allowed. If I remember correctly, when a dataset was already selected from the dropdown, it would disappear from the dropdown. But in the latest version, the selected datasets still appear in the dropdown.

  • The first screenshot shows that the dataset 4 was selected twice.
  • The second screenshot shows that the dataset 5 was selected as count file but still shown in the dropdown.

Screenshots

Image


Image

Galaxy Version and/or server at which you observed the bug
Galaxy Version: {"version_major":"24.2","version_minor":"2.dev0"} on usegalaxy.eu

Browser and Operating System
Operating Systems: Ubuntu, Windows
Browser: Chrome, Firefox

Galaxy History with error
We observed this yesterday during a workshop. We could find it because luckily DESeq2 was complaining about duplicate rows, which should not be the case. After a bit of investigation, it turned out that the problem was of course, not the inputs but duplicate file selection. Here is the history with the error reproduced multiple times: https://usegalaxy.eu/u/videmp/h/deseq2-duplicate-genes-error-investigation

To Reproduce
Steps to reproduce the behavior:

  1. Take a successful run
  2. Unselect all the count files for one of the factor levels
  3. While selecting again, double click on of the count files
  4. Run the tool.
  5. If it is error state, check one of the output dataset details to see the duplicated count file which was double clicked.
@dadrasarmin
Copy link

dadrasarmin commented Mar 13, 2025

I could produce a similar "mistake" with another tool (Falco). I believe the problem is the drop down menu UI as mentioned above. With Falco, if I change the Raw read data from your current history from single dataset to multiple datasets, I can click multiple times (2 or more) on just one input file, and then the job will be submitted 2 times instead of 1 time.
Link to the history for reproducing.

This behavior was also observed with another tool during the workshop called plotCorrelation. If you click on an input multiple times, it will be added multiple times to the plot that is the result of this task.

@pavanvidem
Copy link
Member Author

thanks @dadrasarmin that's a better example for reproducing!

@mvdbeek
Copy link
Member

mvdbeek commented Mar 17, 2025

Of note, it is valid to add the same dataset multiple times in a multiple="true" input, say if you want to provide shared controls or plot something twice, that should continue to be possible.

I struggled to reproduce your examples, for me the missing step is that once a dataset is selected, click outside of the select field, click into the select field again and type to engage the filtering. At that point I can click on the same dataset again. It won't be shown in the list of selected datasets which is definitely a bug, but it is in the tool state and the submitted payload contains the dataset twice (which I don't think is a bug).

@ElectronicBlueberry
Copy link
Member

Of note, it is valid to add the same dataset multiple times in a multiple="true" input, say if you want to provide shared controls or plot something twice, that should continue to be possible.

Do you have a specific example for this? If this is intended we need to think about a better way of communicating this in the UI.

@bgruening
Copy link
Member

Of note, it is valid to add the same dataset multiple times in a multiple="true" input, say if you want to provide shared controls or plot something twice, that should continue to be possible.

While technically true, I'm pretty sure this has not been the case before. Datasets disappeared from the select list when already added, which for the majority of tools is what you want and how we can prevent errors.
I'm wondering since when this happened and if we should offer an additional annotation to the tools, to allow_duplicats="true".

If this is not a good suggestion maybe we should grey out the already selected files and sort them down in the list?

Whatever we are doing, can we please start documentation for the select box and list our rationale for all the features and reasons? This will make it easier for future implementations, etc.

@pavanvidem
Copy link
Member Author

pavanvidem commented Mar 17, 2025

Ok, I will try to explain this better :) Thanks to @dadrasarmin for finding another example.

If it is a multi-select feature, then user should be able to select multple (also more than twice) times. But the UI allows only selection of any file maximally 2 times regardless of the number of times clicked. To me, it seems to be a UI bug than a multi-select feature.

Here is concat tool run which does not have an input multiple="true". If I click multiple times on the file, it selects twice and run the tool twice. In the column-select mode you can see that there are 2 files selected but still shows only 1 file as selected. I guess it should not be possible for a param that does not have multiple="true", or?

Image

@dadrasarmin
Copy link

As a biologist, I would expect similar behaviour from two different wrapping of "Concatenating" as well (this and this), because there is no explanation of double click feature for these tools. However, the first tool submits two jobs of the file as its input and the second tool concatenate the file to itself and returns just one output. As mentioned by Pavan above, the difference is the way wrapper defines the parameter (multiple="true") but the user will probably does not check the code and, in my view, some kind of documentation is needed.

@mvdbeek
Copy link
Member

mvdbeek commented Mar 17, 2025

it should not be possible for a param that does not have multiple="true", or?

yes, that should definitely not be possible.

Do you have a specific example for this? I

Image

Say A is a control for B and C. Anyway, it is true that you couldn't select this in the user interface outside of workflows or the API. I would appreciate a fix that makes it clear what is selected, since at this point you can click on re-run and get into this situation.

@ahmedhamidawan
Copy link
Member

ahmedhamidawan commented Mar 18, 2025

Could this be somewhat related to ecbacaf ?

Edit: Actually no, this is a case of being able to click a value twice manually, whereas this commit fixed the case where somehow handleIncoming receives duplicate datasets...

@mvdbeek
Copy link
Member

mvdbeek commented Mar 26, 2025

I concede any change is a good change, I would be very happy if someone could take this on.

@davelopez
Copy link
Contributor

I'll take a look, since I'm already investigating #19813 and might need to touch the FormSelect component (if that is where the issue is)

@davelopez davelopez self-assigned this Mar 26, 2025
@davelopez
Copy link
Contributor

I couldn't reproduce this in the current 24.2 branch, so I checked out some older commits and could easily reproduce it again.

So I can confirm that #19837 already fixed this 🎉
Thanks @ahmedhamidawan and @jdavcs!

@mvdbeek
Copy link
Member

mvdbeek commented Mar 26, 2025

Still broken, sorry.

@mvdbeek mvdbeek reopened this Mar 26, 2025
@mvdbeek
Copy link
Member

mvdbeek commented Mar 26, 2025

Drag a dataset that is already there into the batch field. There's something fundamentally not correct here:

Image

Same would happen if you click re-run on a job that ran with duplicates

@davelopez
Copy link
Contributor

Ouch! I didn't try drag&drop just the way it was reported here. I'll keep investigating then 😞

@mvdbeek
Copy link
Member

mvdbeek commented Mar 26, 2025

I don't think it was part of the ticket before, but the general issue being that we don't enforce unique inputs I realised this is the easiest way to reproduce.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants