[Hold for Payment 2024-09-10][$2000][Tracking] ProposalPolice™ 👮🏼 #35108

Closed
roryabraham opened this issue Jan 24, 2024 · 68 comments
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Task Weekly KSv2

@roryabraham
Contributor

roryabraham commented Jan 24, 2024

Slack context: https://expensify.slack.com/archives/C01GTK53T8Q/p1706120841703079

Proposal

Develop a GitHub 🤖 using GPT-4 capabilities to standardize and monitor proposals.

Strategy

The purpose of the Expensify Open-Source Community is to provide a global community of all the best React Native developers in the world streamlined access to contribute to the Expensify product. In order to attract and retain talented developers, the Expensify Open-Source Community must be inclusive, fair, and provide equitable opportunity to all community members, including newcomers.

Problem

There is a tendency among the open-source community to perform what I call speedrun proposals – a contributor will post an incomplete or otherwise low-quality proposal very quickly in order to be the first proposal, then subsequently edit it multiple times. Sometimes these edits significantly alter the RCA or solution and/or derive from later proposals without clear acknowledgement.
When the C+ or Expensify engineer goes to review those proposals, they will tend to choose the first proposal of sufficient quality that solves the problem – and in the case of edited proposals it takes some extra sleuthing to understand the true timeline of proposals.
In short, this practice of speedrun proposals inhibits open collaboration amongst contributors and makes it harder for reviewers to fairly evaluate proposals against each other.

Solution

Create an open-source GitHub bot (tentatively named the proposal-police™ 👮🏼🚨) that helps to enforce the documented best practices for proposals, and aids the reviewers in maintaining a clearer timeline of proposals.

More specifically, the proposal-police™ would leverage GPT-4 to:

  • Enforce the use of the proposal template by scanning new issue comments. If a proposal does not follow the template, the bot will comment, tagging the submitter and providing a link to the proposal template to encourage standardization.
  • Monitor issue edits of proposals, classifying them as either minor or substantive. When a substantive change is detected that alters the initial problem statement, RCA, or proposed solution, the bot will append a suffix to the comment like: Last edited: timestamp, making the timeline clearer to proposal reviewers.

In conclusion, the proposal-police™ will help promote higher quality proposals and a more informed and trust-based review process 🤝, improving baseline fairness and allowing community members and Expensify employees to better focus on getting shit :done:!

Current Issue Owner: @ikevin127
@roryabraham roryabraham added Weekly KSv2 Planning Changes still in the thought process NewFeature Something to build that is a new item. labels Jan 24, 2024

melvin-bot bot commented Jan 24, 2024

Current assignee @mallenexpensify is eligible for the NewFeature assigner, not assigning anyone new.

@roryabraham
Contributor Author

Credit to @ikevin127 for the proposal. Comment for assignment 😉

@marcochavezf
Contributor

marcochavezf commented Jan 24, 2024

Happy to assist with the review, co-assigning myself

@marcochavezf marcochavezf self-assigned this Jan 24, 2024
@ikevin127
Contributor

Comment ? Cheers!

@mallenexpensify
Contributor

Edited by proposal-police: This proposal was added at 2024-01-24 13:15:24 UTC

@roryabraham @marcochavezf what are your thoughts about starting with an MVP that adds the above to a comment that's been edited? Then we (all) can review how it's working and consider additional improvements.

@roryabraham
Contributor Author

roryabraham commented Jan 24, 2024

@mallenexpensify are you basically suggesting that for the MVP we remove this piece?

Enforce the use of the proposal template by scanning new issue comments. If a proposal does not follow the template, the bot will comment, tagging the submitter and providing a link to the proposal template to encourage standardization

I don't want the edited by proposal-police to appear above any comment that's edited because that will cause false positives and discourage people from fixing typos or adding minor clarifications. That's where ChatGPT comes in to help enforce Rule #2 - only substantive edits need to be called out.

@ikevin127
Contributor

ikevin127 commented Jan 25, 2024

(POC) Bot source: https://github.com/ikevin127/proposal-police

That's where ChatGPT comes in to help enforce Rule #2 - only substantive edits need to be called out.

For reference these are the current (POC) assistant instructions:

Instructions

You are a GitHub bot using GPT-4 to monitor and enforce proposal comments within a GitHub repo’s issues.

PROPOSAL TEMPLATE:

Proposal (mandatory line)

Please re-state the problem that we are trying to solve in this issue. - (mandatory line)

{user content here}

What is the root cause of that problem? - (mandatory line)

{user content here}

What changes do you think we should make in order to solve the problem? - (mandatory line)

{user content here}

What alternative solutions did you explore? (Optional) - (optional line)

{optional user content here}

IMPORTANT NOTES ON THE PROPOSAL TEMPLATE:

  • the "###" are optional, it can be just one #, two ## or 3 ### but these are OPTIONAL and the proposal should still be classified as VALID with different levels of markdown bold or none;
  • besides the "#" mentioned above, also adding emojis in between the bold markdown notation and the mandatory lines should still be classified as VALID with different levels of markdown bold or none; example: ## 🤖 Proposal - should be valid;
  • the last proposal optional line (What alternative solutions did you explore? (Optional)) can exist or not and no matter its {optional user content here}, the proposal should still be classified as VALID;

BOT ACTIONS:

  1. NEW COMMENTS: For each new comment, check if it’s a proposal by verifying the presence of mandatory lines in the proposal template - user content is allowed here. If all mandatory lines are present, respond with "NO_ACTION" (without the quotes of course). If any mandatory line is missing, only respond with the following template: "{user} Your proposal will be dismissed because you did not follow the proposal template." (without the quotes of course), it is PARAMOUNT that this is the exact text and there's no other prose in the response. If the comment does not contain (## Proposal), also respond with "NO_ACTION" (without the quotes of course).

CHANGES CLASSIFICATION:

When comparing an initial proposal (unedited) with the latest edit of a proposal comment, only consider the following ‘CHANGES’ CLASSIFICATIONS:

a. MINOR: These will be small differences like correcting typos, adding permalinks, videos, screenshots to either the first, second or third proposal template mandatory lines or adding the (Optional) alternative - all these without considerable changes to the RCA aka (### What is the root cause of that problem?) or SOLUTION aka (### What changes do you think we should make in order to solve the problem?) initial text.

b. SUBSTANTIAL: With focus on the RCA / SOLUTION sections, these will be big differences on the RCA and / or SOLUTION sections (either one of them, or both of them) - meaning if initially the proposal’s RCA / SOLUTION text was mentioning a certain root cause or suggesting a certain solution and the latest edit is mentioning a completely different RCA and / or considerable solution changes.

  2. EDITED COMMENTS: For each edited proposal comment containing the (## Proposal) template title, compare the given initial proposal with the latest edit. If changes are MINOR, respond only with "NO_ACTION" (without the quotes of course), it is PARAMOUNT that there's no other prose in the response. If changes are SUBSTANTIAL, it is PARAMOUNT to respond ONLY with the following template: "[EDIT_COMMENT] Edited by proposal-police: This proposal was edited at {updated_timestamp}." - also it is PARAMOUNT that there's no other prose in the response! No SUBSTANTIAL text response, ONLY the mentioned template between the quotes.

IMPORTANT NOTE: It is paramount to ALWAYS respond with the exact text provided in quotes (""). If neither case 1 nor 2 apply, only respond with "[NO_ACTION] - {describe the edge case here}." - it is PARAMOUNT that there's no other prose in the response.
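
For illustration only, the NEW COMMENTS rule in the instructions above can be approximated with a plain string check. This is a minimal Python sketch, not the bot's code: the POC leaves this decision to the GPT-4 assistant, and `normalize` and `check_new_comment` are hypothetical names. It assumes `user` is already the mention string (for example `@someone`).

```python
import re

# Mandatory template lines, copied from the PROPOSAL TEMPLATE section above.
MANDATORY_LINES = [
    "Proposal",
    "Please re-state the problem that we are trying to solve in this issue.",
    "What is the root cause of that problem?",
    "What changes do you think we should make in order to solve the problem?",
]


def normalize(line: str) -> str:
    """Strip leading '#' markers, emojis and spaces, which the notes treat as optional."""
    return re.sub(r"^[^A-Za-z]+", "", line).strip()


def check_new_comment(body: str, user: str) -> str:
    """Return NO_ACTION or the dismissal message for a new comment (hypothetical helper)."""
    lines = {normalize(line) for line in body.splitlines()}
    if "Proposal" not in lines:
        # Not a proposal at all, so it is treated as a regular comment.
        return "NO_ACTION"
    if all(required in lines for required in MANDATORY_LINES):
        return "NO_ACTION"
    return f"{user} Your proposal will be dismissed because you did not follow the proposal template."
```

A check like this cannot judge the quality of the user content, which is one reason the thread relies on a model for the edit classification.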

Feel free to take apart the paragraphs where needed in order to change the AI's behaviour / response.

Currently on the AI side we return templates of either "NO_ACTION" (when proposal template was followed or edit changes are minor) or cases like "{user} Your proposal will be dismissed because you did not follow the proposal template." that include variables which we replace within the bot's code, variables we get from the github context depending on the triggered webhook (new comment, edited comment, etc).
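
A rough sketch of how the bot side could turn the assistant's reply into an action, assuming the reply formats listed in the instructions above. The function name `resolve_action` and the returned dictionary shape are hypothetical and not taken from the POC repo.

```python
def resolve_action(ai_response: str, user: str, updated_timestamp: str) -> dict:
    """Map the assistant's templated reply to a bot action (hypothetical helper)."""
    text = ai_response.strip()

    # "NO_ACTION" and the edge-case form "[NO_ACTION] - ..." both mean: do nothing.
    if text.startswith("NO_ACTION") or text.startswith("[NO_ACTION]"):
        return {"action": "none"}

    # Substantial edit: drop the marker and fill in the timestamp variable.
    if text.startswith("[EDIT_COMMENT]"):
        note = text[len("[EDIT_COMMENT]"):].strip()
        return {
            "action": "edit_comment",
            "text": note.replace("{updated_timestamp}", updated_timestamp),
        }

    # Anything else is treated as the template-violation reply.
    return {"action": "create_comment", "text": text.replace("{user}", user)}
```

Keeping the variables out of the model's hands means the user mention and timestamp always come from the webhook payload rather than from generated text.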

@mallenexpensify
Contributor

I don't want the edited by proposal-police to appear above any comment that's edited because that will cause false positives and discourage people from fixing typos or adding minor clarifications

How does ChatGPT inclusion help keep the "edited by proposal-police" note from being above the comment? I might be missing or not understanding a step or detail.

@ikevin127
Contributor

ikevin127 commented Jan 28, 2024

Hey y'all 👋

I created the expensify-proposal-testing repo for testing purposes.

Sent you all collaborator invitations to the repo so you can create issues and play around.
Already created an issue here where I posted a few comments to test it out - feel free to use it or just create your own 👈

Note: make sure to add the Help Wanted label to new issues, otherwise the proposal-police won't act.

Currently we're using this format Edited by proposal-police: This proposal was added at 2024-01-24 13:15:24 UTC for when a proposal was edited substantially.

@mallenexpensify
Contributor

Thanks @ikevin127 ,

Edited by proposal-police: This proposal was added at 2024-01-29 12:58:07 UTC.

Does it make sense to have the above read

Edited by proposal-police: This proposal was edited at 2024-01-29 12:58:07 UTC.

The main problem we're wanting to solve is making the timestamp for edits easily noticeable for everyone, right?

@ikevin127
Contributor

ikevin127 commented Jan 31, 2024

True - for those proposals where the previous vs current edit has substantial changes in the RCA and solution sections 🫣

I updated the bot according to #35108 (comment) requirements 🚀

This is how it looks after the update - basically when the proposal-police edits the comment, it will mention the updated_at as timestamp of when the last significant edit was posted by the Contributor.
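
A minimal sketch of building that note from the comment's `updated_at` value. It assumes the timestamp arrives as an ISO 8601 UTC string such as `2024-01-29T12:58:07Z`, which is how the GitHub API formats it; `format_edit_note` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def format_edit_note(updated_at: str) -> str:
    """Build the edit note from a GitHub-style ISO 8601 UTC timestamp."""
    parsed = datetime.strptime(updated_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    stamp = parsed.strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"Edited by proposal-police: This proposal was edited at {stamp}."
```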

cc @mallenexpensify

@melvin-bot melvin-bot bot added the Overdue label Feb 8, 2024
@marcochavezf
Contributor

Tested here but I suppose the GH bot is not running, correct? Also, what do you suggest as the next steps to implement it in the App repo?

@melvin-bot melvin-bot bot removed the Overdue label Feb 8, 2024
@ikevin127
Contributor

@marcochavezf Hey 👋

It is running, has been for the past few days - what happened (I checked the logs) is that with all your 3 comments the gpt assistant responded with NO_ACTION 😅

You can check out some of my past comments on that issue and try different scenarios like:

  • wrong proposal template (I know you tried this in your last comment, but it looks like it just categorized it as a regular comment)
  • correct template but skeleton proposal, then edit it by adding substantial changes in RCA / solution

Obviously for those you tried where no action was taken, we'll need more fine-tuning if we all agree that it should've done something 😁

As for the next steps to implement it within the App repo:

Note

To be taken with a grain of 🧂
Feel free to add / change / subtract ♻️

  1. 🛠️ Setup stage
     • create the assistant on Expensify owned OpenAI account
     • github bot server setup / deploy pipeline to facilitate iterating / fine-tuning for the team
  2. 📝 Pre-release stage
     • decide which bot capabilities we want to go with on first release: only go with the 'wrong proposal template check' or also include the 'edit post in case of substantial changes' even though it might require some extra fine-tuning
     • decide the conditions in which the bot should run, like for example: if issue is open and has Help Wanted label or any other
     • agree on initial testing period, making sure we have a plan in case things go south like disabling the bot
  3. 🎉 Release stage
     • shipit and watch the action
     • gather some feedback from the community
     • decide what to do next: fine-tuning or other improvements / ideas

@melvin-bot melvin-bot bot added the Overdue label Feb 19, 2024
@marcochavezf
Contributor

Hi @ikevin127, apologies for the late reply. The plan sounds good, thanks! A few thoughts:

create the assistant on Expensify owned OpenAI account

I think we can do it programmatically and just pass the API key as ENV param from the GH action, no?

decide which bot capabilities we want to go with on first release: only go with the 'wrong proposal template check' or also include the 'edit post in case of substantial changes' even though it might require some extra fine-tuning

It would be great to go with both. What would you need to fine-tune the solution?

decide the conditions in which the bot should run, like for example: if issue is open and has Help Wanted label or any other

I think when the label Help Wanted is added and when someone posts in the GH issue

agree on initial testing period, making sure we have a plan in case things go south like disabling the bot

One idea is to add a label Beta ProposalPolice where the bot can only run if also that label is added.
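
A sketch of that gating, combining the conditions mentioned so far: the issue is open, has Help Wanted, and has the proposed Beta ProposalPolice label. It assumes the `issue` object from a GitHub webhook payload, which carries `state` and a `labels` list of objects with a `name` field; `should_run` and `REQUIRED_LABELS` are hypothetical names.

```python
REQUIRED_LABELS = {"Help Wanted", "Beta ProposalPolice"}


def should_run(issue: dict) -> bool:
    """Run the bot only on open issues that carry every required label."""
    if issue.get("state") != "open":
        return False
    names = {label["name"] for label in issue.get("labels", [])}
    return REQUIRED_LABELS.issubset(names)
```

Removing the beta label from an issue, or from `REQUIRED_LABELS` once the beta ends, would then act as the off switch.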

@melvin-bot melvin-bot bot removed the Overdue label Feb 19, 2024
@ikevin127
Contributor

@marcochavezf No worries 😁 Even though this is a Weekly issue, I wasn't expecting a reply every week.

I think we can do it programmatically and just pass the API key as ENV param from the GH action, no?

Unfortunately, OpenAI only allows using assistants within the same org (account) therefore we will need to:

  • create the assistant on Expensify's OpenAI account by copy & pasting the already fine-tuned instructions that we have
  • get the newly created assistant ID and API key from Expensify's OpenAI account
  • pass these 2 to the GH bot via env variables and we're done w/ the setup
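
A minimal sketch of reading those two values on the bot side. The variable names `OPENAI_API_KEY` and `OPENAI_ASSISTANT_ID` and the helper `load_config` are assumptions for illustration; the thread does not say what the bot actually calls them.

```python
import os
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the assistant ID and API key from the environment, failing fast if either is unset."""
    required = ("OPENAI_API_KEY", "OPENAI_ASSISTANT_ID")  # hypothetical variable names
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"api_key": env["OPENAI_API_KEY"], "assistant_id": env["OPENAI_ASSISTANT_ID"]}
```

Failing at startup keeps a misconfigured deploy from silently skipping every proposal.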

It would be great to go with both. What would you need to fine-tune the solution?

Both sounds good then 🎉

About fine-tuning: I was talking more in terms of general small adjustments to the assistant's instructions, covering the specifics of what we consider minor / major proposal edits and so on, so that the bot triggers the right action / response.

But since we can create a new label like Beta ProposalPolice this is more than enough to safely fine-tune the assistant while in beta.

One idea is to add a label Beta ProposalPolice where the bot can only run if also that label is added.

I think this would be great, this way we can safely test it out and fine-tune it live!

This being said, once we have completed the setup, installed the bot in the E/App repo and added the new Beta ProposalPolice label - we're good to go! 🚀

@melvin-bot melvin-bot bot added the Overdue label Feb 28, 2024
@marcochavezf
Contributor

Sounds good, thanks @ikevin127! Besides the assistantID and the OpenAI API key, do you think you can create a PR with the bot? Or would we need to set up something else first?

@melvin-bot melvin-bot bot removed the Overdue label Feb 28, 2024
@ikevin127
Contributor

@marcochavezf Sure, here's the proposal-police repo, it's public. How do you think I should use the existing repo to create a PR, and where?

@mallenexpensify
Contributor


melvin-bot bot commented Aug 6, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 9.0.16-8 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-08-13. 🎊

For reference, here are some details about the assignees on this issue:

  • @ikevin127 requires payment (Needs manual offer from BZ)

@melvin-bot melvin-bot bot added Daily KSv2 Weekly KSv2 and removed Weekly KSv2 Daily KSv2 labels Aug 7, 2024
@melvin-bot melvin-bot bot changed the title [HOLD for payment 2024-08-13] [HOLD for payment 2024-08-07] [Tracking] ProposalPolice™ 👮🏼 to [HOLD for payment 2024-08-14] [HOLD for payment 2024-08-13] [HOLD for payment 2024-08-07] [Tracking] ProposalPolice™ 👮🏼 Aug 7, 2024

melvin-bot bot commented Aug 7, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 9.0.17-2 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-08-14. 🎊

For reference, here are some details about the assignees on this issue:

  • @ikevin127 requires payment (Needs manual offer from BZ)


melvin-bot bot commented Aug 14, 2024

Payment Summary

Upwork Job

  • ROLE: @ikevin127 paid $(AMOUNT) via Upwork (LINK)

BugZero Checklist (@mallenexpensify)

  • I have verified the correct assignees and roles are listed above and updated the necessary manual offers
  • I have verified that there are no duplicate or incorrect contracts on Upwork for this job (https://www.upwork.com/ab/applicants//hired)
  • I have paid out the Upwork contracts or cancelled the ones that are incorrect
  • I have verified the payment summary above is correct

@mallenexpensify mallenexpensify changed the title [HOLD for payment 2024-08-14] [HOLD for payment 2024-08-13] [HOLD for payment 2024-08-07] [Tracking] ProposalPolice™ 👮🏼 to [Tracking] ProposalPolice™ 👮🏼 Aug 14, 2024
@melvin-bot melvin-bot bot added Reviewing Has a PR in review Weekly KSv2 and removed Weekly KSv2 labels Aug 15, 2024
@roryabraham
Contributor Author

I did a lot of reviews for this one so I'm co-assigning myself

@roryabraham roryabraham self-assigned this Aug 23, 2024
@mallenexpensify
Contributor

@ikevin127 , what's status here? Anything else we're waiting on?

@ikevin127
Contributor

@mallenexpensify Hey, as mentioned in this expensify comment:

🟢 Overall the bot's activity looks really good, it's doing its job as instructed and polices proposals for new comments (checking for proposal template match) and for significantly edited proposals.

The latest discussion regarding additional work / code changes on the proposal police is this Slack comment about a date issue: on a small percentage of proposals that had significant changes in edits, the date output by the proposal police bot is wrong. I plan to work on this whenever I get up to date with current PRs / issues.

^ This is not a disruptive issue in any way: if the wrong date (the same 2023 date) is posted by the bot when a proposal is edited significantly, the C+ can click on the comment's edit history, look at the edit before the bot's edit and get the correct date - which I'm sure most C+ already do because of the UTC format we chose for the bot's output date.

Other than that one date-related issue, which happens every now and then and which I plan to address in the near future, I think we're good to move forward with the final checklist of requirements before we can proceed to payment.

@mallenexpensify
Contributor

Thanks @ikevin127

the final checklist of requirements before we can proceed to payment.

What are the requirements we need to ✅ ? For payment, I think we discussed finishing the project first, then discussing payment. If so, what do you propose for the amount? (let's only focus on this core project and not any future work for deciding the payment amount).

@ikevin127
Contributor

@mallenexpensify By requirements I was referring to things like waiting a bit after proposal police is live to give it some time and show success before discussing payment - I think that happened and we can move forward here.

♻️ Summary

⏳ Timeline of events

🟢 Present day: Overall the bot's activity looks good, it's doing its job as instructed and polices proposals for new comments (checking for proposal template match) and for significantly edited proposals.

🤖 Some recent examples of Proposal Police in action

CASE 1 - Proposal template match check

CASE 2 - Proposal edited w/ significant changes

💰 Payment amount

Given the amount of work involved to bring this from a 📓 P/S state to 🔴 live and considering the somewhat standard amount paid to C+ for completing projects, I think $2000 would be a fair amount as payment for this project.

@mallenexpensify mallenexpensify changed the title [Tracking] ProposalPolice™ 👮🏼 to [$2000][Tracking] ProposalPolice™ 👮🏼 Sep 7, 2024

melvin-bot bot commented Sep 7, 2024

⚠️ Could not update price automatically because there is no linked Upwork Job ID. The BZ team member will need to update the price manually in Upwork.

@mallenexpensify
Contributor

Thanks for the thorough breakdown @ikevin127 , $2000 seems fair. I've created a job, please accept
https://www.upwork.com/jobs/~021832212739812288245

I'll wait til Tuesday to pay, in case @marcochavezf or others have feedback

@mallenexpensify mallenexpensify changed the title [$2000][Tracking] ProposalPolice™ 👮🏼 to [Hold for Payment 2024-09-10][$2000][Tracking] ProposalPolice™ 👮🏼 Sep 7, 2024
@ikevin127
Contributor

Sure, thanks Matt! 💯

@ikevin127
Contributor

@mallenexpensify Is everything set for payment, or are we waiting on more feedback?

@mallenexpensify mallenexpensify removed the Reviewing Has a PR in review label Sep 11, 2024
@mallenexpensify
Contributor

Contributor: @ikevin127 paid $2000 via Upwork

Do we need any regression tests for this? Guessing not cuz it's only in GH.
For additional work/tweaks/improvements, what's fair? @ikevin127 maybe you work on any small stuff for free the next 30 days? Then, after that, we discuss payment, per issue, for future improvements/fixes? (and.. if any not-small stuff comes up in 30 days, we can discuss potential payment there too)

@ikevin127
Contributor

@mallenexpensify ✅ Correct, no regression tests needed here. Regarding small stuff like tweaks / improvements there's no need for payments in the next 30 days or ever. In case some major work / changes are required (doubt) then the standard issue rate works for me 👍

🗒️ To do

  1. I have it on my list to refactor the proposal updated date issue since every now and then it defaults to a 2023 date, this being a small tweak - won't require any payment.
  2. Probably we want to change the response regarding the proposal template check to be more friendly - again won't require payment because it's as simple as updating the AI instructions on OpenAI dashboard.

@mallenexpensify
Contributor

Thanks, sounds good and makes sense. Closing.


6 participants