Initial commit
NotaInutilis committed Sep 30, 2023
1 parent a83e3e4 commit 23ef269
Showing 22 changed files with 322,401 additions and 0 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/update.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
name: Generate blocklists

on:
  push:
    branches:
      - master
    paths:
      - 'sources/**.txt'
  workflow_dispatch:

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Run update
        run: ./scripts/update.sh
      - name: Commit and push
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "GitHub Action"
          git add -A
          if git commit -m "Update blocklists"; then
            git push
          fi
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.DS_Store
75 changes: 75 additions & 0 deletions README.md
@@ -0,0 +1,75 @@
# Super SEO Spam Suppressor

Super SEO Spam Suppressor (SSSS[^SSSS]) is a domain blocklist of sites that abuse SEO tactics to spam web searches with advertisements, empty content (monetized with ads) and malware (disguised as ads). It is best used with uBlacklist or Search Engine Spam Blocker.

[^SSSS]: It's a Gridman reference. I'm spelling it out because it's also the name of a skin disease: don't go looking for SSSS on image search.

As of now, present day, time of writing (2023), Google is a terminally enshittified mess, a mere husk of the wonderful discovery tool it was yesterday.
Do you want to learn about *thing*?
How about **buying** *thing* and **consuming** *thing* instead?
Its drive to commercialize our every online interaction also has consequences for other, much more user-friendly search engines such as DuckDuckGo, whose indexers crawl through shit optimized for Google's terrible algorithm.
This list, like any good adblocking tool, is an attempt to escape this never-ending capitalist coercion and attention theft.
All of the tech giants play this game, so consider also using a social media blocklist.

## Browser extensions

### uBlacklist syntax

[Blocklist in uBlacklist format](https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/ublacklist.txt) to use with [uBlacklist](https://github.com/iorate/ublacklist). It removes blocked sites from search engine results.
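Each line of this list is a uBlacklist match pattern covering a blocked domain and all of its subdomains. For a blocked domain (using `example.com` as a placeholder), the generated entry looks like:

```
*://*.example.com/*
```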

[Click here to subscribe.](https://iorate.github.io/ublacklist/subscribe?name=Super%20SEO%20Spam%20Suppressor&url=https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/ublacklist.txt)

### Domains list

[Domains list format](https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/domains.txt) to use with [Search Engine Spam Blocker](https://github.com/no-cmyk/Search-Engine-Spam-Blocker). It removes blocked sites from search engine results.

### AdBlock Plus syntax

[Blocklist in AdBlock format](https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/adblock.txt) to use with an adblocker ([uBlock Origin](https://ublockorigin.com), [Adguard](https://adguard.com)…) or Adguard Home. It uses a [strict blocking rule](https://github.com/gorhill/uBlock/wiki/Strict-blocking) to block access to those sites on your browser.
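Each generated rule uses the strict-blocking syntax, which covers the domain and all of its subdomains (shown here with the placeholder `example.com`):

```
||example.com^
```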

[Click here to subscribe.](https://subscribe.adblockplus.org/?location=https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/adblock.txt&title=Super%20SEO%20Spam%20Suppressor)

## Hosts format

[Blocklist in Hosts format](https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/hosts.txt) to use in a [hosts](https://en.wikipedia.org/wiki/Hosts_(file)) file or PiHole.

[IPV6 version](https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/hosts.txt.ipv6).
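For each blocked domain, the hosts list null-routes both the bare domain and its `www.` subdomain; the IPv6 version uses `::1` in place of `0.0.0.0` (placeholder `example.com`):

```
0.0.0.0 example.com
0.0.0.0 www.example.com
```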

Known issue: Firefox's DNS over HTTPS option bypasses the computer's hosts file ruleset ([Mozilla bug 1453207](https://bugzilla.mozilla.org/show_bug.cgi?id=1453207)).

## Dnsmasq format

[Blocklist in Dnsmasq format](https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/master/dnsmasq.txt) to use with the [Dnsmasq](https://thekelleys.org.uk/dnsmasq/doc.html) DNS server software.
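Each entry uses Dnsmasq's `address` directive with no IP address, which makes the server answer NXDOMAIN for the domain and all of its subdomains (placeholder `example.com`):

```
address=/example.com/
```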

## How to contribute

Clone this repository and add new domains to the appropriate `.txt` files in the `sources` folder. If you do not want to categorize them, just put them in the `sources/default.txt` file and they will be blocked.

> For the `https://www.example.com` website, add `example.com` to the `sources/default.txt` file.

Then, when you push your changes to the `sources` folder, GitHub Actions should kick in and automatically generate new versions of the blocklists. Should you want to generate them yourself, you can run the `scripts/update.sh` script (prerequisites: Bash, Python).

Finally, make a pull request: we'll review and approve it within a few days.

### Categorization

Blocked sites are organized using subfolders and `.txt` files in the `sources` folder. Use markdown (`.md`) files and comments (`#`) to add more information and references.
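A source file in this layout might look like the following sketch (hypothetical entries; the update script later normalizes full URLs into bare domains):

```
# Reason for blocking, with reference links
https://www.example.com/
example.org
```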

### How to contribute (easy mode)

If you have no idea how Git works, you can still contribute! Just [open an issue](https://github.com/NotaInutilis/Super-SEO-Spam-Suppressor/issues) with the URLs you would like to add to the list, grouping them by language and categories if possible. We'll check and add them shortly.

## Credits

This blocklist is left in the public domain.

This blocklist borrows:
- the blocklist generation code and readme that I co-wrote for rimu's [No-QAnon](https://github.com/rimu/no-qanon) blocklist, which is distributed under the [anti-fascist licence](https://github.com/NotaInutilis/Super-SEO-Spam-Suppressor/blob/master/LICENSE.txt).
- the full domain blocklist of quenhus' [uBlock-Origin-dev-filter](https://github.com/quenhus/uBlock-Origin-dev-filter), which is in [the public domain (unlicence)](https://github.com/quenhus/uBlock-Origin-dev-filter/blob/main/LICENSE).
- the full domain blocklist of no-cmyk's [Search Engine Spam Blocklist](https://github.com/no-cmyk/Search-Engine-Spam-Blocklist), which has no licence.
- the full domain blocklist of franga2000's [aliexpress-fake-sites](https://github.com/franga2000/aliexpress-fake-sites), which has no licence.
- a few entries from DandelionSprout's "Ad Removal List for Unusual Ads" on the [adfilt](https://github.com/DandelionSprout/adfilt) blocklist repository, which is distributed under the [Dandelicence](https://github.com/DandelionSprout/adfilt/blob/master/LICENSE.md).

## Other useful lists

[Jmdugan Blocklists](https://github.com/jmdugan/blocklists/tree/master/corporations): consider blocking social media and big tech corporations.
6 changes: 6 additions & 0 deletions headers/adblock.txt
@@ -0,0 +1,6 @@
[uBlock Origin]
! Title: Super SEO Spam Suppressor
! Homepage: https://github.com/NotaInutilis/Super-SEO-Spam-Suppressor
! Author: NotaInutilis
! Expires: 4 days
! Description: A domain blocklist of sites that abuse SEO tactics to spam web searches with advertisements, empty content (monetized with ads) and malware (disguised as ads).
10 changes: 10 additions & 0 deletions scripts/domains_to_adblock.py
@@ -0,0 +1,10 @@
# This script converts domains.txt into a format used by ad blocking browser extensions.
# Usage:
# python domains_to_adblock.py > adblock.txt

with open("domains.txt") as text_file:
    lines = text_file.readlines()

for line in lines:
    print('||' + line.strip() + '^')
9 changes: 9 additions & 0 deletions scripts/domains_to_dnsmasq.py
@@ -0,0 +1,9 @@
# This script converts domains.txt into dnsmasq's blocking syntax.
# Usage:
# python domains_to_dnsmasq.py > dnsmasq.txt

with open("domains.txt") as text_file:
    lines = text_file.readlines()

for line in lines:
    print('address=/' + line.strip() + '/')
11 changes: 11 additions & 0 deletions scripts/domains_to_hosts.py
@@ -0,0 +1,11 @@
# This script converts domains.txt into a hosts file format.
# Usage:
# python domains_to_hosts.py > hosts.txt

with open("domains.txt") as text_file:
    lines = text_file.readlines()

for line in lines:
    domain = line.strip()
    print('0.0.0.0 ' + domain)
    print('0.0.0.0 www.' + domain)
11 changes: 11 additions & 0 deletions scripts/domains_to_hosts_ipv6.py
@@ -0,0 +1,11 @@
# This script converts domains.txt into a hosts file format for IPv6.
# Usage:
# python domains_to_hosts_ipv6.py > hosts.txt.ipv6

with open("domains.txt") as text_file:
    lines = text_file.readlines()

for line in lines:
    domain = line.strip()
    print('::1 ' + domain)
    print('::1 www.' + domain)
10 changes: 10 additions & 0 deletions scripts/domains_to_ublacklist.py
@@ -0,0 +1,10 @@
# This script converts domains.txt into a match pattern format used by the uBlacklist browser extension.
# Usage:
# python domains_to_ublacklist.py > ublacklist.txt

with open("domains.txt") as text_file:
    lines = text_file.readlines()

for line in lines:
    print('*://*.' + line.strip() + '/*')
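The converter scripts above all follow the same pattern: strip each line down to a bare domain, then wrap it in a format-specific template. A consolidated sketch of that pattern (hypothetical helper functions, not files in this repository):

```python
# Hypothetical consolidation of the converter scripts above:
# each output format wraps a bare domain in a different template.

def to_ublacklist(domain: str) -> str:
    # uBlacklist match pattern: any scheme, any subdomain, any path.
    return '*://*.' + domain + '/*'

def to_adblock(domain: str) -> str:
    # AdBlock strict-blocking rule.
    return '||' + domain + '^'

def to_dnsmasq(domain: str) -> str:
    # Dnsmasq address directive with no IP: answers NXDOMAIN.
    return 'address=/' + domain + '/'

def to_hosts(domain: str, ip: str = '0.0.0.0') -> list[str]:
    # Hosts entries for the bare domain and its www. subdomain;
    # pass ip='::1' for the IPv6 variant.
    return [ip + ' ' + domain, ip + ' www.' + domain]

if __name__ == '__main__':
    print(to_adblock('example.com'))  # ||example.com^
```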
40 changes: 40 additions & 0 deletions scripts/update.sh
@@ -0,0 +1,40 @@
#!/usr/bin/env bash

# Use this script to generate all the blocklists using the `.txt` files in the `sources` folder.
# e.g.
# ./scripts/update.sh

# Cleanup sources:
## Normalize URLs into domains: lowercase everything, then remove leading spaces, the protocol (`x://`), `www.` subdomains, everything after `/`, and leave only one space before `#`. Comments are kept intact.
find ./sources -type f -name "*.txt" -exec sed -ri 'h; s/[^#]*//1; x; s/#.*//; s/.*/\L&/; s/^[[:space:]]*//i; s/^.*:\/\///i; s/^www\.//i; s/\/[^[:space:]]*//i; s/[[:space:]].*$/ /i; G; s/(.*)\n/\1/' {} \;
## Remove duplicate domains from each source file (keeps repeated comments and empty lines for organization).
find ./sources -type f -name "*.txt" -exec bash -c '
    awk "(\$0 ~ /^[[:space:]]*#/ || NF == 0 || !seen[\$0]++)" "$0" > "$0_temp.txt";
    mv "$0_temp.txt" "$0";
' {} \;

# Combine all sources into a domains list.
find ./sources -type f -name "*.txt" -exec cat {} \; > domains.txt

# Cleanup the domain list:
## Remove comments, inline comments, spaces and empty lines.
sed -i '/^#/d; s/#.*//; s/ //g; /^ *$/d' domains.txt
## Sort and remove duplicates.
sort -u domains.txt > domains_temp.txt
mv domains_temp.txt domains.txt

# Generate blocklists:
## From the domain list.
python scripts/domains_to_hosts.py > hosts.txt
python scripts/domains_to_hosts_ipv6.py > hosts.txt.ipv6
python scripts/domains_to_dnsmasq.py > dnsmasq.txt

## For browser extensions.
python scripts/domains_to_adblock.py > adblock_temp.txt
cp ./headers/adblock.txt adblock.txt
cat adblock_temp.txt >> adblock.txt
rm adblock_temp.txt
python scripts/domains_to_ublacklist.py > ublacklist_temp.txt
cp ./headers/adblock.txt ublacklist.txt # Currently using the same adblock header until uBlacklist implements its own header. https://github.com/iorate/ublacklist/issues/351
cat ublacklist_temp.txt >> ublacklist.txt
rm ublacklist_temp.txt
2 changes: 2 additions & 0 deletions sources/Content farms/Automated content.txt
@@ -0,0 +1,2 @@
# It's all pure content: empty, soulless, probably AI-generated.
https://www.criticalhit.net/
Empty file.
10 changes: 10 additions & 0 deletions sources/Content farms/README.md
@@ -0,0 +1,10 @@
Content farms are websites full of pure content: articles for the article throne.
They're looking for search engine clicks and views on their ads.
The boring articles go on for way too long, with unnecessary padding forced in every sentence because the writer is paid by the word/character.
They're currently being replaced by AIs.

Buying guides are what you're being fed whenever you're looking for any kind of information about appliances.
They mainly write product comparisons based on web store listings, which they paraphrase with a dash of SEO.
There are no original pictures nor actual testing of the machine involved.
Their goal is to send users to an online store via affiliate links and earn a commission on each sale.
They're often published by advertisement companies while proudly touting their "independence".
3 changes: 3 additions & 0 deletions sources/Malware/Malvertisement external forwarding.txt
@@ -0,0 +1,3 @@
# Uses an external forwarding service to redirect to a string of websites which trigger advertisement, badware and malware adblock filter lists.

sudrtestt.ru/*
24 changes: 24 additions & 0 deletions sources/Malware/Malvertisement redirection.txt
@@ -0,0 +1,24 @@
# Accessing it via search results launches a series of redirections to websites which trigger adblock filter lists for advertisement, badware and malware.
# Some of these domains redirect to Google when accessed directly.

lanounou-animaux.fr/*
goodwillaccompagnement.fr/*
ballenro.fr/*
efm49.fr/*
lesrandounaires.fr/*
3dmo.fr/*
ecb-piscines.fr/*
fattodicanapa.it/*
owczarekniemiecki24.pl/*
nalac.fr/*
linodesigns.de/*
gm-pm-personal-consulting.de/*
pedrobahon.es/*
isotec-institut.de/*
aug-dus.de/*
schmuck-emotionen.de/*
berpack.de/*
q14interieur.nl/*
mwk-foerderungen.de/*
carecleaningservice.de/*
skateservice.de/*
6 changes: 6 additions & 0 deletions sources/Malware/Name squatting software.txt
@@ -0,0 +1,6 @@
# Domain name squatting of already existing software which often does not have an official site.
# Outranks the real developers in search results and offers a mirror of the software file instead of a redirection.
# Such disingenuous behavior presents a malware risk.

https://mindthegapps.com/
https://magiskzip.com
12 changes: 12 additions & 0 deletions sources/Phishing/Fake stores.txt
@@ -0,0 +1,12 @@
# Fake online stores that overtake the real ones. Both scam and phishing at the same time.

# Fake French branch of the Keen shoes brand
keen-chaussure.fr
keen-france.fr
keenshoesfrance.com
keenfrance.fr
keen-fr.com
keen-chaussures.com
keenfrancefr.com
keenfr.com
keen-chaussures.fr
5 changes: 5 additions & 0 deletions sources/_imported/Ad Removal List for Unusual Ads.txt
@@ -0,0 +1,5 @@
# Blatantly bot-generated fraudsters from https://github.com/DandelionSprout/adfilt/blob/master/AdRemovalListForUnusualAds.txt
e-rabattkode.no
hotdeals.com
coupert.com
rabattkodendin.com
4 changes: 4 additions & 0 deletions sources/_imported/README.md
@@ -0,0 +1,4 @@
https://github.com/franga2000/aliexpress-fake-sites/blob/main/domains.txt
https://github.com/no-cmyk/Search-Engine-Spam-Blocklist
Blatantly bot-generated fraudsters from https://github.com/DandelionSprout/adfilt/blob/master/AdRemovalListForUnusualAds.txt
https://raw.githubusercontent.com/quenhus/uBlock-Origin-dev-filter/main/dist/other_format/domains/global.txt