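Collect and maintain open-source blacklist and whitelist data: domains, IPs, URLs, etc.
Mainly collects the domains found in the rules of ad-blocking software (AdGuard, Adblock) and proxy software (Clash, Shadowsocks)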
- adguard
- easylist
- firebog
- fancyss # censorship circumvention ("科学上网")
- gfwlist
- https://www.github.com/blackmatrix7/ios_rule_script
- https://www.github.com/LM-Firefly/Rules
- https://www.github.com/Hackl0us/SS-Rule-Snippet
- https://www.github.com/ACL4SSR/ACL4SSR
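These rule sources use several different line formats. The sketch below extracts domains from them: Adblock/AdGuard network rules (`||example.com^`), hosts-style lines (`0.0.0.0 example.com`), Clash `DOMAIN`/`DOMAIN-SUFFIX` rules, and base64-encoded gfwlist entries. It is written under simplifying assumptions. Exception rules (`@@`), cosmetic rules, regex rules and `DOMAIN-KEYWORD` are skipped. The domain check is a loose regex with no IDN/punycode support. The function names are hypothetical and do not come from this project.

```python
import base64
import re
from typing import Iterable, List, Optional, Set
from urllib.parse import urlsplit

# Simplified hostname check: dot-separated [a-z0-9-] labels ending in an
# alphabetic TLD. IDN/punycode and single-label names are not handled.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$")
CLASH_DOMAIN_TYPES = {"DOMAIN", "DOMAIN-SUFFIX"}
HOSTS_SINKS = {"0.0.0.0", "127.0.0.1"}


def _clean(candidate: Optional[str]) -> Optional[str]:
    """Normalize a candidate host; return it only if it looks like a domain."""
    if not candidate:
        return None
    candidate = candidate.strip().lower().strip(".")
    return candidate if DOMAIN_RE.match(candidate) else None


def extract_domain(line: str) -> Optional[str]:
    """Return the domain targeted by one rule line, or None if there is none."""
    line = line.strip()
    # Comments ("!", "#"), list headers such as "[AutoProxy 0.2.1]" and
    # "@@" exception rules are ignored in this sketch.
    if not line or line.startswith(("!", "#", "[", "@@")):
        return None
    if line.startswith("- "):  # item of a Clash YAML "payload:" list
        line = line[2:].strip().strip("'\"")
    parts = [p.strip() for p in line.split(",")]
    if len(parts) >= 2 and parts[0].upper() in CLASH_DOMAIN_TYPES:
        return _clean(parts[1])  # DOMAIN-SUFFIX,google.com,Proxy
    fields = line.split()
    if len(fields) == 2 and fields[0] in HOSTS_SINKS:
        return _clean(fields[1])  # 0.0.0.0 ads.example.com
    if line.startswith("||"):
        # ||ads.example.com^$third-party -> ads.example.com
        return _clean(re.split(r"[\^/$*|:]", line[2:], maxsplit=1)[0])
    if line.startswith("|http"):
        return _clean(urlsplit(line[1:]).hostname)  # |http://example.com/path
    return _clean(line)  # plain domain, or ".example.com" as in gfwlist


def extract_domains(lines: Iterable[str]) -> Set[str]:
    """Deduplicated set of domains found in a rule list."""
    return {d for d in map(extract_domain, lines) if d}


def decode_gfwlist(raw: bytes) -> List[str]:
    """gfwlist.txt is base64-encoded; decode it into rule lines."""
    return base64.b64decode(raw).decode("utf-8").splitlines()
```

Returning a set deduplicates domains that appear in more than one list. Lines in a format the sketch does not recognize simply yield nothing, so they do not produce false entries.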
Mainly collects common, highly active domains. Reference sources:
- http://www.queryadmin.com/1566/download-csv-top-1-million-websites-popularity/
- https://hackertarget.com/top-million-site-list-download/
- Alexa: https://s3.amazonaws.com/alexa-static/top-1m.csv.zip [service discontinued in May 2022]
- Cisco Umbrella: https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip
- Majestic: https://downloads.majestic.com/majestic_million.csv
- Statvoo: https://statvoo.com/dl/top-1million-sites.csv.zip
- Tranco: https://tranco-list.s3.amazonaws.com/top-1m.csv.zip
- DomCop: https://www.domcop.com/files/top/top10milliondomains.csv.zip
- BuiltWith: https://builtwith.com/dl/builtwith-top1m.zip
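Most of these downloads are ranked CSV files, some of them zipped. The sketch below reads one file and keeps its top N domains. It then shows one possible way to combine lists: keep a domain only if it appears in at least `min_lists` of them. Several things here are assumptions, so check each file's header and column order before use. The sketch assumes `rank,domain` rows with no header, as in `top-1m.csv`. For files that have a header and put the domain in another column (Majestic's CSV, for example), pass `has_header=True` and the matching `domain_column`. The consensus rule and both function names are hypothetical and are not this project's documented method.

```python
import csv
import io
import zipfile
from collections import Counter
from typing import AbstractSet, Iterable, List


def read_ranked_domains(data: bytes, top_n: int = 100_000, domain_column: int = 1,
                        has_header: bool = False) -> List[str]:
    """Return the first top_n domains from a ranked CSV, plain or zipped."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Assumes the archive holds a single ranking CSV, like top-1m.csv.zip.
            member = next(n for n in zf.namelist() if n.endswith(".csv"))
            data = zf.read(member)
    reader = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    if has_header:
        next(reader, None)  # skip the header row
    domains: List[str] = []
    for row in reader:
        if len(domains) >= top_n:
            break
        if len(row) > domain_column:
            domains.append(row[domain_column].strip().lower())
    return domains


def consensus(lists: Iterable[Iterable[str]], min_lists: int = 2) -> AbstractSet[str]:
    """Keep domains that appear in at least min_lists of the input lists."""
    counts: Counter = Counter()
    for domains in lists:
        counts.update(set(domains))  # count each domain at most once per list
    return {d for d, c in counts.items() if c >= min_lists}
```

Requiring agreement between several independent rankings is one way to reduce the noise any single list contributes. The threshold is a tuning choice, not a value taken from the sources above.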
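Use a trusted website (the Sichuan Provincial Government website) as the starting page:
- Recursively crawl all URLs on each page together with their titles # limit the crawl depth; filter out the domains of common major sites
- Extract the domains and common pinyin abbreviations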
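The crawl step above can be sketched as a depth-limited breadth-first search. It records each page's `<title>`, skips links into a set of blocked major-site domains, and finally collects the hostnames it visited. The sketch makes several assumptions. `fetch` is a hypothetical callable supplied by the caller that returns HTML text or `None`. The blocked-domain set is illustrative. The pinyin-abbreviation step is left out. Only the standard library is used (`html.parser`, `urllib.parse`).

```python
from collections import deque
from html.parser import HTMLParser
from typing import AbstractSet, Callable, Dict, Iterable, List, Optional, Set
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkTitleParser(HTMLParser):
    """Collects <a href> values and the text of <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_blocked(host: str, blocked: AbstractSet[str]) -> bool:
    """True if host is a blocked domain or one of its subdomains."""
    return any(host == b or host.endswith("." + b) for b in blocked)


def crawl(start_url: str, fetch: Callable[[str], Optional[str]], max_depth: int = 2,
          blocked_domains: AbstractSet[str] = frozenset()) -> Dict[str, str]:
    """Breadth-first crawl; returns {url: title} for every page fetched."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    titles: Dict[str, str] = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)  # hypothetical fetcher: HTML text, or None on failure
        if html is None:
            continue
        parser = LinkTitleParser()
        parser.feed(html)
        parser.close()
        titles[url] = parser.title.strip()
        if depth >= max_depth:  # depth limit: do not follow links any further
            continue
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            parts = urlsplit(link)
            if parts.scheme not in ("http", "https") or not parts.hostname:
                continue
            if is_blocked(parts.hostname, blocked_domains) or link in seen:
                continue  # skip major sites and pages already queued
            seen.add(link)
            queue.append((link, depth + 1))
    return titles


def domains_of(urls: Iterable[str]) -> Set[str]:
    """Hostnames of the crawled URLs."""
    return {h for h in (urlsplit(u).hostname for u in urls) if h}
```

Injecting `fetch` keeps the traversal logic testable offline. A real fetcher would wrap an HTTP client and would also need timeouts, charset detection and politeness delays.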