= Ciro Santilli's projects
= Projects
{synonym}
Major projects can be seen at: <the most important projects done by Ciro Santilli>{full}.
A summary of minor projects is given at: <Ciro Santilli's minor projects>.
This section is a dump for everything else, keeping the sacred first sections at the top of the homepage clean.
= OurBigBook
{c}
{parent=Ciro Santilli's projects}
= OurBigBook Project
{synonym}
{title2}
https://docs.ourbigbook.com/
\Image[https://raw.githubusercontent.com/ourbigbook/ourbigbook/master/logo.svg]
{title=Logo of the <OurBigBook Project>}
= OurBigBook Markup
{c}
{parent=OurBigBook}
{tag=Lightweight markup language}
{tag=Personal knowledge base}
The <markup language> of <OurBigBook.com>.
Also used on <Ciro Santilli's website> as a <static website> via the <OurBigBook CLI>.
The one <markup language> to rule them all?
Documentation at: https://docs.ourbigbook.com[].
= OurBigBook CLI
{c}
{parent=OurBigBook}
{tag=Static site generator}
Official <Command-line interface> to convert a directory of <OurBigBook Markup> files into a <static website>. See also: https://cirosantilli.com/ourbigbook/ourbigbook-cli
= OurBigBook Library
{c}
{parent=OurBigBook}
Base <JavaScript> library that implements the <OurBigBook Markup>. Used by both:
* <OurBigBook CLI>
* <OurBigBook Web>
= OurBigBook Web
{c}
{parent=OurBigBook}
The website system that runs <OurBigBook.com>. For further information see:
* <OurBigBook.com>: rationale
* https://cirosantilli.com/ourbigbook/ourbigbook-web[]: project documentation
Relies on the <OurBigBook Library> to compile <OurBigBook Markup>.
\Include[ourbigbook-com]
= OurBigBook feature
{c}
{parent=OurBigBook}
= OurBigBook topic feature
{c}
{parent=OurBigBook feature}
= OurBigBook topics feature
{synonym}
More info at: https://docs.ourbigbook.com#ourbigbook-web-topics
= OurBigBook dynamic tree
{c}
{parent=OurBigBook feature}
More info at: https://docs.ourbigbook.com/ourbigbook-web-dynamic-article-tree
= x86 bare metal examples
{c}
{parent=Ciro Santilli's projects}
{splitSuffix}
https://github.com/cirosantilli/x86-bare-metal-examples
As mentioned at <Linux Kernel Module Cheat>{full}, this should be merged into that other project.
= Ciro Santilli's naughty projects
{c}
{parent=Ciro Santilli's projects}
If <Ciro Santilli> weren't a <Ciro Santilli's campaign for freedom of speech in China>[natural-born activist], he could have made an excellent <intelligence analyst>! See also: <Being naughty and creative are correlated>{full}.
* <Stack Overflow Vote Fraud Script>
* <GitHub> makes Ciro feel especially naughty:
  * <All GitHub Commit Emails>: he extracted (almost) all Git commit emails from <GitHub> with <Google BigQuery>
  * https://github.com/cirosantilli/test-many-commits-1m/[A repository with 1 million commits]: likely the https://www.quora.com/Which-GitHub-repo-has-the-most-commits/answer/Ciro-SantilliI[live repo with the most commits as of 2017]
  * https://stackoverflow.com/questions/20099235/who-is-the-user-with-the-longest-streak-on-github/27742165#27742165[A 100-year GitHub streak], likely the longest ever while streaks still existed. It consumed too many <server> resources, however, which led GitHub admins to https://web.archive.org/web/20151021135921/https://github.com/cirosantilli/[manually turn off his contribution history].
  * https://github.com/cirosantilli/test-octopus-100k[A repository with a 100k-commit Git octopus merge]. Now that is a true https://softwareengineering.stackexchange.com/questions/314215/can-a-git-commit-have-more-than-2-parents/377903#377903[Cthulhu merge].
  * https://github.com/isaacs/github/issues/1718[A 500 error caused by infinite AsciiDoc header cross-reference recursion]: that was fun while it lasted
Outside this website:
* https://cirosantilli.com/china-dictatorship/zhihu-censorship-of-hao-haidong
= All GitHub Commit Emails
{c}
{parent=Ciro Santilli's naughty projects}
{tag=Open-source intelligence}
{tag=Ciro Santilli's data projects}
https://github.com/cirosantilli/all-github-commit-emails
In this project <Ciro Santilli> extracted (almost) all Git commit emails from <GitHub> with <Google BigQuery>! The repo was later taken down by <GitHub>. Newbs, censoring publicly available data!
Ciro also created a beautifully named variant with one email per commit: https://github.com/cirosantilli/imagine-all-the-people[]. True art. It also broke this "what's my first commit" tracker: https://twitter.com/NachoSoto/status/1761873362706698469
\Image[https://raw.githubusercontent.com/cirosantilli/media/master/GitHub_Archive_Google_bigquery_PushEvent_email_highlight.png]
{height=810}
{title=<#GitHub Archive> query showing hashed emails}
{description=It was <Ciro Santilli> who made them hash the emails: they were not hashed before he published them.}
\Image[https://raw.githubusercontent.com/cirosantilli/media/master/All_GitHub_commit_emails_repo_screenshot_before_takedown_archive_is.png]
{height=768}
{title=<All GitHub Commit Emails> repo before takedown}
{description=Screenshot from <archive.is>.}
= Facebook profile face dump
{c}
{parent=Ciro Santilli's naughty projects}
{tag=Ciro Santilli's data projects}
In 2016 Ciro made a script that downloaded <Facebook> profile pictures.
This was possible at the time without any login, since profile picture access was not authenticated, by using a 2010 profile ID dump originally announced at: https://blog.skullsecurity.org/2010/return-of-the-facebook-snatchers
The profile ID dump was distributed as a <BitTorrent> named `fbdata.torrent`, about 2.8 GB and mostly compressed. Running:
``
find . -type f | xargs sha256sum | sha256sum
``
on Ubuntu 20.04 gives:
``
2c9a739c9c5495e38ebab81fc67411b7c6562f139dcb8619901a3f01230efdd5
``
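The same kind of whole-directory fingerprint can be sketched in Python, e.g. where a Unix shell is not available. This is only an approximation of the pipeline above under stated assumptions: the helper names `sha256_file` and `tree_checksum` are hypothetical, files are visited in sorted path order rather than the directory order `find` uses (so the final digest can differ from the one above when the two orders differ), and filenames that `sha256sum` would escape are not handled. Like `find -type f`, it skips symlinks.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of one file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_checksum(root="."):
    """Hash a sha256sum-style listing of every regular file under root.

    Assumption: paths are sorted, whereas `find` emits them in directory
    order, so the digest only matches the shell pipeline when both orders
    agree. Filenames that sha256sum escapes (backslash, newline) are not
    handled.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # `find -type f` does not report symlinks, so skip them here too.
            if os.path.isfile(full) and not os.path.islink(full):
                paths.append(full)
    h = hashlib.sha256()
    for p in sorted(paths):
        # Same bytes sha256sum prints per file: digest, two spaces, path, newline.
        h.update(sha256_file(p).encode("ascii") + b"  " + os.fsencode(p) + b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    print(tree_checksum("."))
```

Sorting makes the result independent of directory order, which the `find` pipeline is not: two copies of the same dump on different file systems can list their files in a different order and thus produce different digests.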
This dump was widely reported, e.g. on <Hacker News> at: https://news.ycombinator.com/item?id=1554558[].
At some point, however, Facebook finally started requiring access tokens to view public profile pictures, making further collection impossible. For example, as of 2021, https://developers.facebook.com/docs/graph-api/reference/v9.0/user/picture[] mentions:
\Q[Querying a User ID (UID) now requires an access token.]
This is also mentioned e.g. at: https://stackoverflow.com/questions/11442442/get-user-profile-picture-by-id[]. This major privacy flaw was therefore finally addressed at some point, making it impossible to reproduce this project.
Ciro downloaded 10 thousand of those pictures and extracted the faces with: https://stackoverflow.com/questions/13211745/detect-face-then-autocrop-pictures/37501314#37501314
He then created a single video by joining the 10 thousand cropped faces, which could be uploaded e.g. to <YouTube>. Ciro later decided it was better to make the video private, however, as sooner or later he would lose his account over it.
<Companies> like <YouTube> blocking this kind of content is the type of thing that makes companies take longer to fix such gaping privacy issues; it is a bit like <security through obscurity>. A video makes the privacy issue very clear to everyone. But people prefer to hide and look away, and then the 99% of people who know nothing about tech get their privacy busted by actual criminals or government spies and never learn about it.
But now that Facebook finally fixed it, it's fine, no need for the video anymore.
= Ciro Santilli's data projects
{parent=Ciro Santilli's projects}
<Ciro Santilli> has enjoyed doing projects dealing with lots of data! They often overlap heavily with <Ciro Santilli's naughty projects>, but not always!
= Wikipedia CatTree
{c}
{parent=Ciro Santilli's data projects}
{splitSuffix}
{tag=Ciro Santilli's minor projects}
This mini-project walks the category hierarchy of the <Wikipedia dumps> and dumps it in various simple formats, <HTML> being the most interesting!
* <HTML> dumps: https://cirosantilli.com/wikipedia-cattree/
* methodology: https://stackoverflow.com/questions/17432254/wikipedia-category-hierarchy-from-dumps/77313490#77313490
Scripts used:
* \a[wikipedia/import-sqlite.sh]
* \a[wikipedia/sqlite_preorder.py]
* \a[wikipedia/wikipedia-cattree.sh]
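The scripts listed are the actual implementation. As a rough, hypothetical illustration of the idea only, the following Python sketch does a depth-first preorder walk over a category graph stored in SQLite and renders it as nested HTML lists. The `edges(parent, child)` table and the functions `build_demo_db`, `preorder` and `to_html` are made up for this example and do not reflect the real Wikipedia dump schema or the project's code. The walk keeps a set of visited categories because Wikipedia category links can form cycles.

```python
import html
import sqlite3


def build_demo_db():
    """Hypothetical schema: one row per (parent category, child category) edge."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
    con.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        [
            ("Mathematics", "Algebra"),
            ("Mathematics", "Geometry"),
            ("Algebra", "Linear algebra"),
            ("Linear algebra", "Mathematics"),  # category graphs can contain cycles
        ],
    )
    return con


def preorder(con, root, max_depth=None):
    """Yield (depth, category) in depth-first preorder, each category at most once."""
    seen = set()
    stack = [(0, root)]
    while stack:
        depth, cat = stack.pop()
        if cat in seen:
            continue
        seen.add(cat)
        yield depth, cat
        if max_depth is not None and depth >= max_depth:
            continue
        children = [row[0] for row in con.execute(
            "SELECT child FROM edges WHERE parent = ? ORDER BY child", (cat,))]
        # Push in reverse so children come out in alphabetical order.
        for child in reversed(children):
            if child not in seen:
                stack.append((depth + 1, child))


def to_html(rows):
    """Render preorder (depth, name) rows as nested ul/li elements."""
    out = []
    prev = -1
    for depth, name in rows:
        if depth > prev:
            out.append("<ul>" * (depth - prev))
        else:
            # Close the previous item plus one nested list per level climbed.
            out.append("</li>" + "</ul></li>" * (prev - depth))
        out.append("<li>" + html.escape(name))
        prev = depth
    if prev >= 0:
        out.append("</li>" + "</ul></li>" * prev + "</ul>")
    return "".join(out)


if __name__ == "__main__":
    print(to_html(preorder(build_demo_db(), "Mathematics")))
```

An explicit stack rather than recursion keeps deep category chains from hitting Python's recursion limit, and `max_depth` caps how far below the root the walk goes.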
\Image[https://raw.githubusercontent.com/cirosantilli/media/master//Wikipedia_CatTree.png]
{title=<Mathematics> dump of <Wikipedia CatTree>}
{source=https://cirosantilli.com/wikipedia-cattree/Mathematics}
\Include[ciro-santilli-s-open-source-contributions]{parent=Ciro Santilli's projects}