Profanity Filter #237

jharpster · 2017-12-12T18:06:36Z

Expand the profanity filters and make them multi-lingual.

Brief Description

The existing word list is inadequate to address more than the simplest profanities.

What is the motivation / use case for this feature?

Create more robust vandalism detection

What is the expected behaviour ?

Consider incorporating a broader list of profanities from this list.

mvexel · 2021-02-15T19:00:09Z

I notice that the connected MapRoulette Challenge has a very high number of tasks marked as Not an Issue (false positive). As the MapRoulette superuser I am getting some complaints about the tasks in this Challenge. I would recommend that we disable this MapRoulette Challenge until the quality of the filter can be improved. Thanks.

matkoniecz · 2021-02-16T15:32:30Z

One of the glaring issues is that the MapRoulette Challenge does not list what is supposed to be the profanity.

So I have no idea whether it is a complete bug, pattern matching of English profanities against text in other languages, or something else.

Looking at it, I am unable to spot what caused it to be reported; I am not sure which English profanity matched here. I have not seen a single valid report in Poland.

willemarcel · 2021-02-18T19:23:04Z

@mvexel Thanks for the feedback. I have stopped updating this challenge.

@matkoniecz I'll evaluate the possibility of improving or disabling the profanity filter in the next few days.

bugdebugger · 2021-02-21T18:30:02Z

@willemarcel

I too stumbled on this problem on MapRoulette:

- You can't tell why the node was tagged with the profanity tag
- Most tasks are resolved as "Not an issue"
- I did quite a few of them myself. All were false positives

So I went digging and figured the following things out.
Some of this is probably obvious if you are familiar with OSM and the code around it. I wasn't 😄

- The tagging probably comes from mapbox/osm-compare --> profanity comparator
- It uses word lists (forked from LDNOOBW 3 years ago and never updated)
  - The lists are of questionable quality 😑
- In general the tag values (e.g. name) are checked against the word lists for multiple languages (default: en / es / de / fr / ru / zh)
  - Only tags with a language suffix are handled correctly, e.g. name:es is only checked against the Spanish word list
  - A related issue on mapbox/osm-compare didn't go anywhere
- It's easy to change the comparator to return all flagged words + the locale in which they are offending instead of true/false. But I don't know how that fits into the rest of the tech stack
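To illustrate the last point, here is a minimal Python sketch of a check that reports each flagged word together with its locale instead of a bare true/false. It is not the osm-compare code (which is JavaScript). The names `find_flagged_words` and `Flag` are hypothetical, and the sketch assumes that only `name` and `name:<lang>` tags are checked, that matching is on whole lowercase tokens, and that a suffixed tag without a matching list is skipped.

```python
import re
from typing import Dict, List, NamedTuple, Sequence, Set

# Default locales as listed in this thread.
DEFAULT_LOCALES = ("en", "es", "de", "fr", "ru", "zh")


class Flag(NamedTuple):
    """Hypothetical result record: which tag, which word, which word list."""
    tag: str
    word: str
    locale: str


def find_flagged_words(
    tags: Dict[str, str],
    word_lists: Dict[str, Set[str]],
    default_locales: Sequence[str] = DEFAULT_LOCALES,
) -> List[Flag]:
    """Return every (tag, word, locale) hit instead of a bare boolean."""
    flags: List[Flag] = []
    for key, value in tags.items():
        base, _, suffix = key.partition(":")
        if base != "name":
            continue  # assumption: only name tags are checked
        if suffix:
            # Suffixed tag (e.g. name:es): check only that language's list.
            # Assumption: skip the tag if no list exists for the suffix.
            locales = (suffix,) if suffix in word_lists else ()
        else:
            # Plain "name": checked against every default locale.
            locales = tuple(default_locales)
        # Assumption: whole-token, case-insensitive matching.
        tokens = set(re.findall(r"\w+", value.lower()))
        for locale in locales:
            for word in sorted(tokens & word_lists.get(locale, set())):
                flags.append(Flag(key, word, locale))
    return flags


# Tiny placeholder lists built from the entries mentioned in this thread.
example_lists = {"fr": {"peter"}, "zh": {"13"}, "es": set()}
print(find_flagged_words({"name": "Peter 13"}, example_lists))
print(find_flagged_words({"name:es": "Peter 13"}, example_lists))
```

With output like this, a MapRoulette task could show which word matched and from which list, which is exactly the information reviewers are missing today.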

Then I checked the word-lists in all languages I understand

- Some lists are OK and only contain actual rude profanities or very vulgar expressions 😳
- But some of the lists also contain completely normal words, e.g. first names, names of vegetables, numbers (!!)
  - These words could be used (mostly in spoken language) to mean something related to e.g. reproductive organs, intercourse, etc. But some are just ridiculous

The many false positives are caused by the combination of the above findings.

Some examples of what currently happens

- Every node/way/line everywhere with a name tag where the value contains the number 13 or the name Peter will be flagged as profanity. Not by chance, the screenshot from @matkoniecz shows something with "13A" in the name tag
  - 13 is in the Chinese word list (ZH)
  - Peter is in the French word list (FR)
- Every chapel in Italy will be flagged as profanity
  - cappella (Italian for chapel) is on the Italian word list
- Meanwhile the English word list doesn't include "John", "Johnson", "Willie", "Willy" or even "Prick" etc. 😜
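Entries like these could be caught before they ever reach the filter. The following is a rough Python sketch of a word-list audit that flags purely numeric entries and entries found in a set of ordinary words. The function `audit_word_list` and the `common_words` set are hypothetical; a real audit would need a proper per-language list of names and everyday vocabulary.

```python
from typing import Iterable, List, Set, Tuple


def audit_word_list(
    entries: Iterable[str], common_words: Set[str]
) -> List[Tuple[str, str]]:
    """Return (entry, reason) pairs for entries that look like ordinary words."""
    suspicious: List[Tuple[str, str]] = []
    for entry in entries:
        # Normalize case and drop surrounding punctuation, so "13." becomes "13".
        core = entry.strip().lower().strip(".,;:!?")
        if core.isdigit():
            suspicious.append((entry, "numeric"))
        elif core in common_words:
            suspicious.append((entry, "common word"))
    return suspicious


# Placeholder data based on the examples in this thread.
common = {"peter", "cappella"}
for entry, reason in audit_word_list(["13.", "Peter", "cappella"], common):
    print(f"{entry!r}: {reason}")
```

Anything the audit reports would still need a human decision, since some ordinary words really are used as profanity in context.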

matkoniecz · 2021-10-04T11:23:01Z

> Thanks for the feedback. I have stopped updating this challenge.

Would it be possible to take it down completely or archive?

https://maproulette.org/browse/challenges?query=profanity

It would save people using MR the time of manually marking 2800 entries as invalid.

tridip1931 added discussion feedback labels Jan 11, 2018

tridip1931 added [Skill] Research [Skill] Accessibility [Type] Enhancement [Status] Need more info and removed discussion labels Jan 20, 2018

This was referenced Feb 23, 2021

- Profanity tag ... where? (Whom did the mapper offend?) mapbox/osm-compare#209 (Open)
- Remove entry "13." from zh.json mapbox/osm-compare#222 (Merged)
