Profanity Filter #237

Open
jharpster opened this issue Dec 12, 2017 · 5 comments

Comments

@jharpster

Expand the profanity filters and make them multi-lingual.

Brief Description

The existing word list is inadequate to address more than the simplest profanities.

What is the motivation / use case for this feature?

Create more robust vandalism detection

What is the expected behaviour?

Consider incorporating a broader list of profanities from this list.

@mvexel

mvexel commented Feb 15, 2021

I notice that the connected MapRoulette Challenge has a very high number of tasks marked as Not an Issue (false positive). As the MapRoulette superuser I am getting some complaints about the tasks in this Challenge. I would recommend that we disable this MapRoulette Challenge until the quality of the filter can be improved. Thanks.

[Screenshot, 2021-02-15 11:59 AM]

@matkoniecz

matkoniecz commented Feb 16, 2021

One of the glaring issues is that the MapRoulette Challenge does not list what is supposed to be a profanity.

So I have no idea whether it is an outright bug, pattern matching of English profanities against text in other languages, or something else.

Looking at it, I am unable to spot what caused it to be reported, and I am not sure which English profanity matched here. I have not seen a single valid report in Poland.

[Screenshot: screen02]

@willemarcel
Collaborator

@mvexel Thanks for the feedback. I have stopped updating this challenge.

@matkoniecz I'll evaluate the possibility of improving or disabling the profanity filter in the next few days.

@bugdebugger

@willemarcel

I too stumbled on this problem on MapRoulette:

  1. You can't tell why the node was tagged with the profanity tag
  2. Most tasks are resolved as "Not an issue"
  3. I did quite a few of them myself. All were false positives

So I went digging and figured the following things out.
Some of this is probably obvious if you are familiar with OSM and the code around it. I wasn't 😄

  • The tagging probably comes from mapbox/osm-compare --> profanity comparator
  • It uses word lists (forked from LDNOOBW 3 years ago and never updated)
    • The lists are of questionable quality 😑
  • In general, the tag values (e.g. name) are checked against the word lists for multiple languages (default: en / es / de / fr / ru / zh)
    • It only does the right thing for tags with a language suffix, e.g. name:es is checked only against the Spanish word list
    • A related issue on mapbox/osm-compare didn't go anywhere
  • It's easy to change the comparator to return all flagged words plus the locale in which they are offensive, instead of true/false. But I don't know how that fits into the rest of the tech stack
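
The last point, returning the flagged words and their locale rather than a bare true/false, might look roughly like the Python sketch below. This is not the osm-compare code (which is JavaScript); the function name `find_flagged_words`, the `WORD_LISTS` mapping with its placeholder entries, and the choice of whole-token matching are all assumptions made for illustration.

```python
import re

# Hypothetical, tiny stand-ins for the per-locale word lists; the real lists
# are far longer. The two entries mirror examples mentioned in this thread.
WORD_LISTS = {
    "fr": {"peter"},
    "zh": {"13"},
}

# Default locales as described above for tags without a language suffix.
DEFAULT_LOCALES = ("en", "es", "de", "fr", "ru", "zh")


def find_flagged_words(tags, word_lists=WORD_LISTS, default_locales=DEFAULT_LOCALES):
    """Return (tag key, word, locale) triples instead of a bare boolean."""
    hits = []
    for key, value in tags.items():
        base, _, suffix = key.partition(":")
        if base != "name":
            continue
        # A language suffix (name:es) restricts the check to that locale;
        # a plain "name" tag is checked against every default locale.
        locales = (suffix,) if suffix else default_locales
        # Assumption: compare whole lowercase tokens, not substrings.
        tokens = re.findall(r"\w+", value.lower())
        for locale in locales:
            listed = word_lists.get(locale, set())
            for token in tokens:
                if token in listed:
                    hits.append((key, token, locale))
    return hits
```

With these placeholder lists, `find_flagged_words({"name": "Peter 13"})` reports `peter` for `fr` and `13` for `zh`, while `find_flagged_words({"name:es": "Peter 13"})` reports nothing. A reviewer on MapRoulette could then at least see which word and which locale triggered the task.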

Then I checked the word-lists in all languages I understand

  • some lists are OK and only contain actual rude profanities or very vulgar expressions 😳
  • but some of the lists also contain completely normal words, e.g. first names, names of vegetables, numbers (!!)
    • These words could be used (mostly in spoken language) to mean something related to e.g. reproductive organs, intercourse, etc. But some are just ridiculous

The many false positives are caused by the combination of the above findings.

Some examples of what currently happens

  • Every node/way/line everywhere with a name tag whose value contains the number 13 or the name Peter will be flagged as profanity. It is no coincidence that the screenshot from @matkoniecz shows something with "13A" in the name tag
    • 13 is in the Chinese word list (ZH)
    • Peter is in the French word list (FR)
  • Every chapel in Italy will be flagged as profanity
    • As cappella (Italian for chapel) is on the Italian word list
  • Meanwhile the English word list doesn't include "John", "Johnson", "Willie", "Willy" or even "Prick" etc. 😜
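
To make the "13A" case concrete, here is a small Python sketch contrasting two matching strategies. Which one osm-compare actually uses is not established in this thread, so both functions and the one-entry word list are illustrative assumptions.

```python
import re

# Hypothetical one-entry list standing in for the ZH word list discussed above.
ZH_WORDS = {"13"}


def substring_hits(value, words):
    """Flag a word if it appears anywhere inside the value."""
    lowered = value.lower()
    return sorted(w for w in words if w in lowered)


def token_hits(value, words):
    """Flag a word only if it equals a whole token of the value."""
    tokens = set(re.findall(r"\w+", value.lower()))
    return sorted(tokens & words)
```

Under these assumptions, `substring_hits("13A", ZH_WORDS)` flags `13`, while `token_hits("13A", ZH_WORDS)` flags nothing. Both still flag a name like "Route 13", so stricter matching narrows the false positives but cannot compensate for entries that should not be on a list in the first place.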

@matkoniecz

> Thanks for the feedback. I have stopped updating this challenge.

Would it be possible to take it down completely or archive?

https://maproulette.org/browse/challenges?query=profanity

[Screenshot: screen02]

It would save people using MR the time spent manually marking 2800 entries as invalid.
