[Meta] Gather issues observed with ML-bot and find ways to improve it #256
@softvision-raul-bucata: We've observed that the ML-bot changes the milestone to ml-autoclosed for what seem to be valid issues. If the ML-bot needs adjustment, we can manually triage the issues that seem valid and move them to the relevant milestone until the adjustment is made.
@ksy36 reply: Right now the criterion for an issue to be considered "valid" is whether it is moved to the "needsdiagnosis" milestone. So the ML model takes into account the content of such issues (Domain, Tested different browser, UA, Description, etc.) and, based on that, classifies incoming issues as valid or invalid. As we have a lot more "invalid" issues than "valid" ones, and that number grows over time, I think this shift is bound to happen. I've been looking at ml-autoclosed issues from time to time and moving some of them to needstriage/needsdiagnosis for the model to learn from. But a lot of issues that seem valid are not reproducible, so even if an issue is reopened, it does not necessarily contribute to the "valid" pool. We can try to improve the current rate. I've experimented with training a new model with these changes on the issues that you sent, and it seems to make the predictions more accurate:
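For illustration, here is a minimal sketch of the kind of valid/invalid text classifier described above, assuming scikit-learn. The field names and the imbalance handling are assumptions made for the sketch, not the actual webcompat/bugbug pipeline.

```python
# Minimal sketch of a valid/invalid issue classifier, assuming scikit-learn.
# The real webcompat ML model differs; the field names below are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def issue_text(issue: dict) -> str:
    # Concatenate the fields mentioned above: domain, "tested different browser", UA, description.
    return " ".join(
        issue.get(key, "")
        for key in ("domain", "tested_other_browser", "user_agent", "description")
    )

def train(issues: list[dict], labels: list[int]):
    # labels: 1 if the issue was moved to "needsdiagnosis" (valid), 0 otherwise.
    model = make_pipeline(
        TfidfVectorizer(min_df=2),
        # class_weight="balanced" counteracts the much larger "invalid" pool.
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    model.fit([issue_text(i) for i in issues], labels)
    return model
```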
So yeah, that would be awesome, if you have some time. If you notice an issue that seems reasonable and potentially valid, you could move it to the "accepted" milestone, as you did with the issues on that list. It doesn't need to be done on a daily or weekly basis, maybe every other week or every two weeks. I'll make these two changes to the model and will keep reopening issues that seem potentially valid as well. I could also experiment with the model further to make it more accurate. One thing worth mentioning: if we receive a lot of duplicates for a certain issue, they never end up in needsdiagnosis. At some point there are so many duplicates that the model considers a lot of future duplicates "invalid", as the weight is so much more on the "invalid" side and there is only one "valid" issue (imgur.com is an example of that). While the duplicates are technically "valid", they are not actionable, so the fact that they're automatically closed works for us.
Maybe we could add an additional column to track whether an issue ended up in an actionable milestone (needsdiagnosis or moved). I tried to add it to the spreadsheet, but I don't have edit access. I think this is the only such issue as of today: And these three might be valid, but we can't test them, as a special account is needed:
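As a rough sketch of how that column could be filled automatically (assuming the public GitHub REST API and the `requests` library; the issue numbers below are placeholders):

```python
# Sketch: check whether a web-bugs issue ended up in an actionable milestone
# (needsdiagnosis or moved) via the GitHub REST API. Issue numbers are placeholders.
import requests

ACTIONABLE = {"needsdiagnosis", "moved"}
REPO = "webcompat/web-bugs"

def is_actionable(issue_number: int, token: str | None = None) -> bool:
    headers = {"Authorization": f"token {token}"} if token else {}
    resp = requests.get(
        f"https://api.github.com/repos/{REPO}/issues/{issue_number}",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    milestone = resp.json().get("milestone") or {}
    return milestone.get("title") in ACTIONABLE

for number in (100001, 100002):  # placeholder issue numbers
    print(number, is_actionable(number))
```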
I've updated the list with the created date and status.
I've opened webcompat/webcompat.com#3685 to temporarily add automatic labelling to gather statistics:
I'll add the same labels to the issues that are currently on the list once the change is deployed. So we won't need to manually update the list once all the labels are added, as this data will be on GitHub and in the Elasticsearch DB.
Deployed webcompat/webcompat.com#3685, so it's adding the labels now. Also added the labels to the issues from the list: https://github.com/webcompat/web-bugs/issues?q=is%3Aissue+is%3Aopen+label%3Abugbug-reopened
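For reference, a small sketch of how the label data could be pulled back out for statistics, assuming the GitHub search API and `requests`. Only `bugbug-reopened` is confirmed by the query above; any other label name would have to match whatever webcompat/webcompat.com#3685 actually adds.

```python
# Sketch: count web-bugs issues carrying a given statistics label via the GitHub search API.
import requests

def count_labelled(label: str, token: str | None = None) -> int:
    headers = {"Authorization": f"token {token}"} if token else {}
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": f"repo:webcompat/web-bugs label:{label}"},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["total_count"]

print("reopened:", count_labelled("bugbug-reopened"))
```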
I've built a graph in Kibana to visualize issues closed as invalid. Despite the fact that there can be a lot of issues that looked valid, the percentage of "true" valid issues is quite low, with 2.52% being the highest. So "true" valid issues are what we should pay attention to, and around 1-2% missed issues is expected. To get accurate results we need to keep reopening closed issues, so I've been doing that for the past week and will continue to do so until we have 3-4 weeks of data. It could be that the first improvement to the model was enough to lower the number of "true" valid issues that are closed as invalid.
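The same percentage could be computed outside Kibana with a direct query against Elasticsearch's `_count` endpoint. This is only a sketch: the host, index name, and field/label values below are assumptions about how the issue data is stored, not the real schema.

```python
# Sketch: percentage of "true" valid issues among ML-closed ones, via Elasticsearch _count.
# The ES host, index name, and field/label values are assumptions, not the real webcompat schema.
import requests

ES = "http://localhost:9200"   # assumed Elasticsearch endpoint
INDEX = "webcompat-issues"     # assumed index name

def count(field: str, value: str) -> int:
    query = {"query": {"term": {field: value}}}
    resp = requests.post(f"{ES}/{INDEX}/_count", json=query, timeout=30)
    resp.raise_for_status()
    return resp.json()["count"]

autoclosed = count("milestone", "ml-autoclosed")   # issues the bot closed (assumed field)
true_valid = count("labels", "bugbug-reopened")    # issues reopened after auto-close (assumed field)
if autoclosed:
    print(f"missed rate: {100 * true_valid / autoclosed:.2f}%")
```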
That's really cool. |
An update here: I've been reopening issues for the past 6 weeks, and the percentage of missed issues is within the expected range, with 1.82% being the highest. We could potentially increase the confidence threshold (from 97% to 98 or 99%), which might improve accuracy a bit, but it would increase the number of issues that need to be triaged manually. With the current rate of missed issues, I think we have the optimal balance. There are also a few things that I can experiment with:
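A minimal sketch of the confidence-threshold trade-off described above, assuming the classifier's probability of "invalid" is available per issue; the 97-99% values come from the comment, everything else is illustrative.

```python
# Sketch of the auto-close gate: only close when the model is at least THRESHOLD
# confident the issue is invalid; otherwise leave it for manual triage.
THRESHOLD = 0.97  # raising this to 0.98/0.99 closes fewer valid issues, but manual triage grows

def route(p_invalid: float) -> str:
    """p_invalid: model probability (0..1) that the issue is invalid."""
    if p_invalid >= THRESHOLD:
        return "close and set milestone to ml-autoclosed"
    return "leave open for manual triage (needstriage)"

# Toy example: a higher threshold sends borderline issues to humans instead of closing them.
for p in (0.995, 0.975, 0.90):
    print(p, "->", route(p))
```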
This task was created to gather the issues observed with the ML-bot's behavior and to come up with ways to improve it.