Outreachy: "MapKnitter.org Spam Management System" Project Q&A #1745

PeculiarE · 2022-06-02T22:58:24Z

This is the discussion thread for questions on my Outreachy project (#1744).

PeculiarE · 2022-06-02T23:02:50Z

Hi @jywarren! I plan on starting work on the first couple of tasks on the list at #1744 but to do that, I need clarifications on the following, please.

(1) So @grvsachdeva has already done some work on these tasks in the PR #1030. And the project goals did mention building on existing work. So I'd like to clarify if this existing PR will be merged or am I supposed to just simply incorporate some elements from it into my own?

(2) When adding the status column to the maps (as seen in #1030), all maps will get a default status of 1 (i.e. published) including maps which have been archived using the old spam/archive method. This means we would then have two conditions for fetching map(s): their status(published/spammed) and whether or not they've been archived.

Should we then:

a. Make every archived map automatically a spammed map when doing migrations, i.e. for rows whose archived value is 1, assign the status of 0 instead of the default 1 to the status column. This way, our fetching endpoints will only fetch based on a singular criteria: the status column

b. Leave as is. Give all maps a default of 1 and fetch based on status as well as check if the map is archived.

I believe these are the 2 questions I need answered first before I get started. Other qs will definitely crop up as I move from one task to the next on the list.

Lastly, even though the design implementation won't be happening now, I would like to clarify if the proposed design is okay or we need to make a few tweaks to it before implementation later down the line.

Thank you!

jywarren · 2022-06-02T23:32:00Z

(1) So @grvsachdeva has already done some work on these tasks in the PR #1030. And the project goals did mention building on existing work. So I'd like to clarify if this existing PR will be merged or am I supposed to just simply incorporate some elements from it into my own?

I would be OK with either route. Do you have a feeling for which you think makes more sense, based on the work Gaurav did, or would you like me or another mentor to look through the PR and see how far along it is, and whether we think there's a best way forward?

(2) When adding the status column to the maps (as seen in #1030), all maps will get a default status of 1 (i.e. published) including maps which have been archived using the old spam/archive method. This means we would then have two conditions for fetching map(s): their status(published/spammed) and whether or not they've been archived.

Should we then:

a. Make every archived map automatically a spammed map when doing migrations, i.e. for rows whose archived value is 1, assign the status of 0 instead of the default 1 to the status column. This way, our fetching endpoints will only fetch based on a singular criteria: the status column

b. Leave as is. Give all maps a default of 1 and fetch based on status as well as check if the map is archived.

I believe these are the 2 questions I need answered first before I get started. Other qs will definitely crop up as I move from one task to the next on the list.

I think we should make spammed/archived maps status 0, as essentially all of them are spam. And in any case, we aren't deleting them -- since it's non-destructive, it's OK to just reclassify them under the new system.

Lastly, even though the design implementation won't be happening now, I would like to clarify if the proposed design is okay or we need to make a few tweaks to it before implementation later down the line.

I think the design is good as it is! Since spam moderators will be using both PublicLab.org and MapKnitter's spam dashboards, we want both the style/layout and functionality to be very similar, or they could get confused and make mistakes. So unless you see an opportunity for a major improvement, or elimination of a serious ambiguity or bug, we should stick as close as we can to the PublicLab.org design. That said, if you have extra time towards the end of your project, we could think about making improvements across both systems in synchrony!

Thank you for the excellent questions!!

PeculiarE · 2022-06-03T10:30:05Z

I would be OK with either route. Do you have a feeling for which you think makes more sense, based on the work Gaurav did, or would you like me or another mentor to look through the PR and see how far along it is, and whether we think there's a best way forward?

I believe I'd prefer incorporate stuffs from the PR as I work down the planning list. While there are some functions in there that are pretty straightforward and I can understand what they do and where they are to be used, I'm not so sure of others. But they might begin to make sense as I progress through the project. And I believe in that case, it's better to come to those conclusions myself rather than merging them all at once without being sure what each function does and how it ties to the project goals and tasks.

I think we should make spammed/archived maps status 0, as essentially all of them are spam. And in any case, we aren't deleting them -- since it's non-destructive, it's OK to just reclassify them under the new system.

Alrighty! Thank you. This further buttresses my point about not merging the PR straight up as the migrations there do not take this new addition into consideration.

However, this leads to another very important question, I think. My thought process is still garbled on this one, been turning my head on it for a couple of hours now. So, I'm just gonna lay it all out and hopefully, we can decide on a clear-cut process:

The policy on PublicLab (which we fully intend to incorporate in this project) is that spamming content automatically bans the owner of the content. If we spam maps with the archived status, we should also ban the author of those maps as well during the migrations.

But does this make sense from a business/community perspective? Banning authors now whose maps were archived maybe 2 years ago.... I'm not sure if this is okay. What if the author is no longer a spammer and now uses MapKnitter as it should be used?

And in a sorta roundabout way, we have to consider that one of the project goals is to spam all maps of a banned user. So do we also spam all maps of that banned user?

In summary, if we give an archived map a status of 0 (i.e. spammed), should we ban the user, and then spam all maps belonging to that user? Is this overkill, or is there a better way of handling this?

I think the design is good as it is! Since spam moderators will be using both PublicLab.org and MapKnitter's spam dashboards, we want both the style/layout and functionality to be very similar, or they could get confused and make mistakes. So unless you see an opportunity for a major improvement, or elimination of a serious ambiguity or bug, we should stick as close as we can to the PublicLab.org design. That said, if you have extra time towards the end of your project, we could think about making improvements across both systems in synchrony!

Cool, thank you @jywarren. This makes a lot of sense.

jywarren · 2022-06-04T05:59:40Z

I think you're right that although, as a matter of policy, we usually ban authors, we don't have to in this retrospective way. Although I want to note that banning is reversible and we usually tell people how to appeal in the message they see upon logging in, so it's not a permanent state. But I still agree with you, let's just spam the maps, not the users for now.

One additional advantage of taking pieces from the PR instead of merging the whole thing is that we could look for more opportunities to break it into pieces. Some parts could probably be done before, even if those parts don't actually get activated in the live site without their companion pieces. But if we write tests to protect them, they are still ok to merge in their own earlier PR, simplifying the review process and lowering the stakes of publishing to the live site. For example, we could decide to deal with old maps (and the migration) in it's own PR, without yet worrying about code that actually interacts with the migration. That's just one example.

It might be good to start thinking about how your tasks break into PRs and which if any might be dependent on others. But don't worry too much about knowing every step ahead of time. You can also just keep it in mind, while picking out smaller tasks that can be done in a PR, and it'll naturally filter out the big ones that'll take more coordination in a big PR later.

Great questions!

PeculiarE · 2022-06-04T12:54:44Z

Thank you @jywarren! I do agree with you on the importance of writing tests to help protect against any breaking changes.

I'm used to solving a task in one huge PR, and breaking one task into several related PRs is going to be a major (but very doable) shift in my coding style. I'm excited (and a wee bit anxious) to delve into this and learn a new way of doing things 😅

PeculiarE · 2022-06-08T10:20:58Z

Hi @jywarren, @cesswairimu, @TildaDares....I'm not so conversant with tests in the Rails framework so I'm not sure what the error in the image below means.

I believe that all the tests pass (because of the 0 failures) but then codecov throws an error about not finding a repo or a owner and I'm not quite sure what this means and how to debug it. Is this error related to the codebase or just specific to codecov?

Thank you!

cesswairimu · 2022-06-09T06:49:19Z

Hi @PeculiarE you are right, for codecov, we normally use it on pull requests to check for test coverage and its weird that the reports are appearing locally I will look into that but you can ignore those for now, thanks

PeculiarE · 2022-06-09T09:47:48Z

Alright! Thank you 😄 @cesswairimu

PeculiarE · 2022-06-14T10:10:28Z

Hi @jywarren....so I've been working on the single spam and batch spam endpoints for the past few days. Usually, the policy at PublicLab.org is that spamming content bans the author, but since we have anonymous authors on MapKnitter, what I implemented checks if the owner of the map is anonymous, and if yes, the only action to be carried out will be the spamming of the map; banning of the author is skipped.

Does this pass muster? Or is there a better way of going about maps with anonymous authors?

jywarren · 2022-06-14T12:58:01Z

what I implemented checks if the owner of the map is anonymous, and if yes, the only action to be carried out will be the spamming of the map; banning of the author is skipped.

I think that sounds great. Thanks!

PeculiarE · 2022-06-19T23:17:20Z

Hi @jywarren, so I have a PR up at #1762 on spamming maps (either singly or in batches).

Code Climate is requesting for a bit of refactoring to reduce complexity. While I work on that, I'd love to get a bit of clarity on some questions:

1. In the project goals, we have:

Display a list of any maps a banned PublicLab.org user has (if any) when they are banned

Offer a button to spam all maps by that user in one click, like mapknitter.org/spam/user/

Now, when we spam a map or several maps, we ban the author(s) (as long as they aren't anonymous). Based on the project goals above, is the spamming cycle supposed to run like this:

spam a map -> automatically ban the author -> spam all maps belonging to that author

Or does this feature of spamming all maps belonging to a banned author only apply to when the moderator manually clicks on ban user(s)?

2. In the general project discussion thread at #10754, you mentioned that advertisement on a map or any of its images was a major spam flag. So, this got me thinking: when spamming a map, we ban the main author (the user who created the map), but shouldn't we also ban co-authors (users who uploaded images on the map) if any?

jywarren · 2022-06-21T06:26:00Z

Hi @PeculiarE - sounds super, this is great to see.

to your questions, for 1) each of these takes some careful consideration. I think it's likely that an author posting spam will mean their previous maps are also spam. But it's not 100% the case. I think it's OK to spam all maps belonging to that author, as long as we a) clearly tell the admin that this happened, and b) link to an easy place to review what those maps were and let the admin undo the action if need be. This could just be that in the alert we show, we name the maps and link to them, and that we ensure that an "unspam" button is on each map page we link to. OR, maybe it's easier to just link to a sorted list of recently spammed maps? But if it's an old map, maybe it won't appear in a list of spammed maps, since we probably don't track the timestamp of when we spammed it? What do you think? I think maybe easier to link to each map and put a button on those pages (visible to mods only) which lets us undo the action? I believe there is a per-author page with a list of maps too. So that's a 3rd option perhaps - show map state there, and a button to republish the map or something? Isn't it supposed to be something like http://mapknitter.org/profile/warren ?

Hmm, co-authors... well, let's see -- looking over some spam maps, do we see that multiple authors is a common scenario? I might guess that spammers don't work in synchrony, or that a single spammer doesn't make multiple accounts... that would be my guess. So I think it's unnecessary... what do you think?

PeculiarE · 2022-06-24T11:54:33Z

Hi! @jywarren

1. I was thinking through your responses and had already started mapping a feasible solution based on these ideas when I realised a flaw(or at least something I think will be a major design/logic flaw) with an undo action:

If a user is banned and all their maps automatically spammed, providing the admin the option of 'undoing' this auto-spam (i.e. unspamming one or more of the spammed maps) poses a problem. Because unspamming a map automatically unbans the author, the admin's original action is also reversed/undone.

Based on this, I propose the automatic spamming of a banned user's maps be limited to only actions not induced/started by an admin/mod(e.g. scheduled calls from MapKnitter to PublicLab's API to check if an author has been banned.) In this case, it makes sense to automatically spam the maps of such users

But in situations where the admin is directly banning an author or spamming a map (and consequently banning the author), the admin should be the one to decide if they then want to spam some, all, or even none of the maps belonging to the banned author.

I wrote about how we could achieve this in my proposal in this particular section. A couple of adjustments will have to be made to my proposed ideas there, but in summary, it will work like this:

When a single author is banned either directly or via the spamming of their map: The admin will be taken to the profile page of the author(e.g. http://mapknitter.org/profile/warren) and they will have the option to spam a single map or batch-spam multiple maps of the author.

When multiple authors are banned at once either directly or via the spamming of their maps: While we can list the links to all the authors' profile pages on the alert, the alert message could become extremely long and cluttered in cases where we have a sizable number of banned authors.
I believe a better idea will be to link to a page (visible only to admins/mods) showing authors banned recently (within the past 30mins? 1hour? What do you think is an appropriate time range?). Clicking on each author on the page will then take us to their profile page where we can then spam a single map or batch-spam multiple maps.

To achieve this, we will need to create a timestamp column on the users table that is only updated when the user status changes (maybe something like status_updated_at). I dunno if we need to do same for the maps table. I don't see the need right now, but who knows? We just might need it in the nearest future. What do you think?

One last thing: In the alert message to the admin, we can provide the link to the profile page (for a single author) or in the case of multiple authors, the recently banned page. The admin will have to click on the provided link to visit these pages. An alternative will be to automatically redirect them to these pages. The alert message will inform them they will be redirected to where they can spam the maps of the banned author(s) if they so choose. I'm in favour of the redirection option as it ensures the admin visits the page just like we intended to review all the maps belonging to that author. Would you prefer us just providing the clickable link instead, and then it's up to the admin/mod to decide to visit the page or not?

2. It is very possible for a single spammer to make multiple accounts. For example, these maps look like one person created them but with different accounts. But you're right when you say the chances of both spam accounts authoring the same map is pretty low. So yeah, multiple-author banning is unnecessary. Thank you 😄

jywarren · 2022-06-28T12:28:43Z

Hi, @PeculiarE i appreciate your careful thinking on this! I'm OK with this extra page showing recently spammed maps, but yes you are right it will require a new column. I think we can not for the maps table for the time being.

Actually, does changing the user status automatically change user.updated_at? Could we sort by that?

Could we also have a button on that page that just says "spam all maps by author" -- imagine if the user is named "freeCarWashesDotCom" and you are really quite sure. It could save some time!

PeculiarE · 2022-06-29T17:52:44Z

Hiya @cesswairimu, @TildaDares, @jywarren

While working on the PR at #1772, I checked the plots2 repo to see how the publish action executes over there. And I noticed the 'Post spent so and so time in moderation' for first-time posts.

I'm not so sure the reason why this message is displayed to the moderator, and I'd appreciate some clarification.

Also, Is it something we want implemented on MapKnitter as well? Along with the extra directions within the alert?

Please see the lines of code being referred to here.

Secondly, the concept of a 'moderated' user (i.e. a user with the status of 5) is still vague to me. I'm not sure what preempts the decision to moderate the user and what are the defined limitations of a moderated user. I did read somewhere (can't remember the exact issue or pull request) that moderated users are 'almost' the same as banned users. And so I'm not sure....if a user is moderated already, can they still be banned? Or does banning only happen to users with status of 1?

jywarren · 2022-07-05T21:49:43Z

Hi @PeculiarE - sorry for my slow reply, i was traveling back to the US and missed some things.

That message was supposed to give the moderators a sense of whether the moderation process is introducing too much delay for newcomers - like, do we feel bad if they had to wait 2 days, vs. 2 hours? I think if we have the data, we might as well show it for MK as well.

I would not worry about duplicating the moderated user state. We have used it only once ever. It's designed so that if someone breaks the code of conduct, we may ban them, but not necessarily delete their contributions (that's left to the discretion of the moderator). Because they may have been banned for conduct unrelated to their work (as opposed to being banned for posting a commercial advertisement unrelated to Public Lab themes). Does that make sense? It's a very rare occurrence. So I don't think its necessary to do this in MK.

PeculiarE · 2022-07-06T16:09:37Z

No worries @jywarren 😄 .....glad you arrived back home safely.

We do have the data on MK, thank you. Would include it in my next PR or make an FTO for it. I think we can skip the 'Now reach out to the new community member; thank them, just say hello....' part of the alert message present in the Public Lab version because:

map creators are not necessarily new members to the community (logged-in users are coming from Public Lab)
Non-logged-in users are anonymous and we can't even reach them
What do you think?

Thank you for your detailed explanation on the concept of a moderated user. Wasn't planning on implementing same on MK though. I asked because of a question constantly buzzing in my brain: "if a user is moderated, can he or she still be banned?" And based on your explanations, the answer to that will be yes. Because the moderated state might be due to violating our code of conduct policy and not necessarily for posting spam content.

Thank you!

PeculiarE · 2022-07-11T17:15:51Z

That message was supposed to give the moderators a sense of whether the moderation process is introducing too much delay for newcomers - like, do we feel bad if they had to wait 2 days, vs. 2 hours? I think if we have the data, we might as well show it for MK as well.

Based on our recent conversation at #1786 and pending confirmation from the community reps, I do believe this is redundant, and there is no need to show this data anymore.

jywarren · 2022-10-11T08:08:50Z

Sounds great! Yes, some coding styles follow what's called a "single purpose rule" that each function does only one thing. Our style extends this to PRs as much as possible, so that the task of the reviewer is also simpler. When possible, we like there to be a test included, which plays a role of explaining what the code is supposed to do, or how it's supposed to work.

…

On Sat, Jun 4, 2022, 6:54 PM PeculiarE ***@***.***> wrote: Thank you @jywarren <https://github.com/jywarren>! I do agree with you on the importance of writing tests to help protect against any breaking changes. I'm used to solving a task in one huge PR, and breaking one task into several related PRs is going to be a major (but very doable) shift in my coding style. I'm excited (and a wee bit anxious) to delve into this and learn a new way of doing things 😅 — Reply to this email directly, view it on GitHub <#1745 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAF6J5YKI6VIPFAZ3VXK3DVNNGZ5ANCNFSM5XWQR6UQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

PeculiarE mentioned this issue Jun 2, 2022

Outreachy: "MapKnitter.org Spam Management System" Project Planning #1744

Open

35 tasks

PeculiarE added discussion outreachy labels Jun 2, 2022

PeculiarE added the spam label Jun 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Outreachy: "MapKnitter.org Spam Management System" Project Q&A #1745

Outreachy: "MapKnitter.org Spam Management System" Project Q&A #1745

PeculiarE commented Jun 2, 2022

PeculiarE commented Jun 2, 2022 •

edited

Loading

jywarren commented Jun 2, 2022

PeculiarE commented Jun 3, 2022

jywarren commented Jun 4, 2022

PeculiarE commented Jun 4, 2022

PeculiarE commented Jun 8, 2022

cesswairimu commented Jun 9, 2022

PeculiarE commented Jun 9, 2022

PeculiarE commented Jun 14, 2022

jywarren commented Jun 14, 2022

PeculiarE commented Jun 19, 2022 •

edited

Loading

jywarren commented Jun 21, 2022

PeculiarE commented Jun 24, 2022

jywarren commented Jun 28, 2022

PeculiarE commented Jun 29, 2022

jywarren commented Jul 5, 2022 •

edited

Loading

PeculiarE commented Jul 6, 2022

PeculiarE commented Jul 11, 2022

jywarren commented Oct 11, 2022 via email

Outreachy: "MapKnitter.org Spam Management System" Project Q&A #1745

Outreachy: "MapKnitter.org Spam Management System" Project Q&A #1745

Comments

PeculiarE commented Jun 2, 2022

PeculiarE commented Jun 2, 2022 • edited Loading

jywarren commented Jun 2, 2022

PeculiarE commented Jun 3, 2022

jywarren commented Jun 4, 2022

PeculiarE commented Jun 4, 2022

PeculiarE commented Jun 8, 2022

cesswairimu commented Jun 9, 2022

PeculiarE commented Jun 9, 2022

PeculiarE commented Jun 14, 2022

jywarren commented Jun 14, 2022

PeculiarE commented Jun 19, 2022 • edited Loading

jywarren commented Jun 21, 2022

PeculiarE commented Jun 24, 2022

jywarren commented Jun 28, 2022

PeculiarE commented Jun 29, 2022

jywarren commented Jul 5, 2022 • edited Loading

PeculiarE commented Jul 6, 2022

PeculiarE commented Jul 11, 2022

jywarren commented Oct 11, 2022 via email

PeculiarE commented Jun 2, 2022 •

edited

Loading

PeculiarE commented Jun 19, 2022 •

edited

Loading

jywarren commented Jul 5, 2022 •

edited

Loading