-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
92 lines (79 loc) · 3.53 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
Working, with predicting regexstrs:
SMBC
Frankenstein Superstar
Evil Inc
Girl Genius
Wayward Sons
Sheldon
Cyanide & Happiness
xkcd
Order of the Stick
Full Frontal Nerdity
Darths & Droids
You Suck
Nnewts
Misfile
Hark! A vagrant
Perry Bible Fellowship
Wondermark
Working, with regexstrs that guess but don't predict URLs:
Wonderalla
Reptilis Rex
Dresden Codak
Lady Sabre
Basic Instructions
Scenes from a Multiverse
The Abominable Charles Christopher
Penny Arcade
PvP
Partially Clips
Three Panel Soul
Working, with regexstrs that predict the title:
None.
Do we even need a regexstr of the title, then? Some used to be useful before
we had sequence matching (e.g. SMBC, Sheldon).
Working, with unguessable feed:
Help Desk
Tragedy Series
RSS feed issues:
Doonesbury has a non-Doonesbury-related slate feed
No feed for Dilbert (but there's a link to it).
Monster of the week has sequentially-incrementing, but non-guessable,
IDs. Currently the title is almost guessable, but it's guessed wrong.
Do we need the title regexstr, really?
No RSS for dinosaur comics.
2D Goggles has sequences mis-identified.
Wizard School has an RSS feed for Wizard School only (as opposed to
all meetmyminion.com comics), but it's not the default. You want
http://feeds.feedburner.com/MinionComicsWizardSchool
No feed for Least I Could Do
No feed for oglaf.
No feed for sinfest.
Gunnerkrigg Court has titles that are always the same, and links to the front page.
No feed for El Goonish Shive. (And need a separate one for sketchbook in any
case.)
Sluggy only has one entry in the feed, and links to the front page. So we
ignore it.
All of the posts in the Trenches are currently news posts.
Tragedy series has essentially unguessable IDs misidentified as IDs.
Non-RSS websites:
Doonesbury: confused by useless RSS feed. Link title is 'Prev', href is /strip/archive/yyyy/mm/dd, today's link exists.
Dilbert: comic found correctly as largest image, link, href is /yyyy-mm-dd/, class is 'STR_Prev_PNG_Fix', today's link exists.
Dinosaur comics: doesn't find image for some reason; link, title is 'Previous comic', rel is 'prev', links to /index.php?comic=2169, today's link exists.
Full frontal nerdity: image is /ffn/strips/yyyy-mm-dd.jpg; link to index.php?date=yyyy-mm-dd, contains image back.jpg, today's link exists (well, 6th March as the comic's somewhat on hiatus).
Darths and droids: wrong RSS feed. Link body is '<PREVIOUS', links to /episodes/nnnn.html, today's link exists.
Oglaf: doesn't find image (possibly because requires a click-through). Link is id pvs, class nav_ro, contains an image set by a CSS rule, target is /fireofgod/1 - horrible.
Sinfest: image is /comikaze/comics/yyyy-mm-dd.gif. Link is to /archive_page.php?comicID=nnnn, src images/prev_a.gif, alt Previous. Today's link exists.
Gunnerkrigg Court: link to /archive_page.php?comicID=nnnn, contains img src prev_a.jpg, alt 'Previous'. Today's page exists.
Sluggy: useless RSS feed. Link to /comics/archives/daily/yymmdd, contains image class "ui-icon-seek-prev", text "Prev.". Today's link redirects to front page.
El Goonish Shive: finds correct image, /comics/yyyy/mm/yyyymmdd_....png. Link to /?date=yyyy-mm-dd, contains image src /templates/home/arrow_prev.gif. Today's link exists.
Project Apollo: link to abnnn.html, contains image, alt 'Previous Page', src layout/button-back.png. Page doesn't exist. Site fucked.
So, interesting things:
* link title Prev or Previous
* class contains Prev
* rel contains prev
* image back.jpg or prev_a.gif or arrow_prev.gif
* image alt Previous
* link text contains PREVIOUS or Prev
General TODO:
* Get alt text