<!DOCTYPE html>
<html>
<head>
<title>The Archival Acid Test</title>
<base href="http://acid.matkelly.com/" />
<style type="text/css">
/* TODO: Potentially add some CSS3 features that might not prompt crawlers (e.g., Heritrix) to fetch content by reference */
body {height: 150%;}
div#relativeImageFromCSS {background-image: url('img.png');}
div#absoluteImageFromCSS {background-image: url('http://thisdomain.com/img.png');}
div#externalImageFromCSS {background-image: url('http://anotherSite/img.png');}
div {width: 100px;}
img, canvas {width: 10px; height: 10px; margin: 0 1px 1px 0; padding: 0; display: block; float: left;}
iframe {padding: 0;}
h1 {margin-bottom: 0; padding-bottom: 0;}
h2, h3, h4, p {margin: 0; padding: 0; clear: both;}
h2 {margin-top: 1.0em;}
ul {margin: 0.5em auto; padding-top: 0; list-style-type: none;}
div {height: 10px; border: 1px solid black; padding: 1px 0 1px 1px;}
div#test1 {width: 55px;}
div#scriptParent {width: 77px;}
div#html5Parent {width: 22px;}
p#test2iHeightAssurance {margin-top: 2000px;}
p {margin-bottom: 1.0em;}
li span {font-style: italic;}
</style>
<script type="text/javascript" src="arrayBuffer.js"></script>
</head>
<body>
<h1>The Archival Acid Test</h1>
<p>This page was set up to test the capabilities and shortcomings of archival web crawlers. More information about the rationale and individual tests can be found <a href="#moreInfo">below</a>. For questions/comments contact <a href="mailto:[email protected]?subject=Archive Acid test comment">Mat Kelly</a>.</p>
<h2>The Tests</h2>
<h3>The Basics (5 tests)</h3>
<div id="test1">
<!--Test 1a, Local Image, relative --><img src="pixel.png" title="test1a" />
<!--Test 1b, Local Image, absolute --><img src="http://acid.matkelly.com/pixel.png" title="test1b" />
<!--Test 1c, Remote Image, absolute --><img src="http://www.cs.odu.edu/~mkelly/acid/pixel.png" title="test1c" />
<!--Test 1d, Inline Content, Encoded Image--><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJ
bWFnZVJlYWR5ccllPAAAAyBpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdp
bj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6
eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMC1jMDYwIDYxLjEz
NDc3NywgMjAxMC8wMi8xMi0xNzozMjowMCAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJo
dHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlw
dGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAv
IiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RS
ZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpD
cmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNSBXaW5kb3dzIiB4bXBNTTpJbnN0YW5jZUlE
PSJ4bXAuaWlkOjZDOTUzNTREMUVCRDExRTJBRjM2RkI1NjAwQzQzQTFGIiB4bXBNTTpEb2N1bWVu
dElEPSJ4bXAuZGlkOjZDOTUzNTRFMUVCRDExRTJBRjM2RkI1NjAwQzQzQTFGIj4gPHhtcE1NOkRl
cml2ZWRGcm9tIHN0UmVmOmluc3RhbmNlSUQ9InhtcC5paWQ6NkM5NTM1NEIxRUJEMTFFMkFGMzZG
QjU2MDBDNDNBMUYiIHN0UmVmOmRvY3VtZW50SUQ9InhtcC5kaWQ6NkM5NTM1NEMxRUJEMTFFMkFG
MzZGQjU2MDBDNDNBMUYiLz4gPC9yZGY6RGVzY3JpcHRpb24+IDwvcmRmOlJERj4gPC94OnhtcG1l
dGE+IDw/eHBhY2tldCBlbmQ9InIiPz5pvSZLAAAADklEQVR42mJmYPgPEGAAAQ8BA8CY0HcAAAAA
SUVORK5CYII=" title="test1d" />
<!--Test 1e, Scheme-less resource --><img src="//acid.matkelly.com/pixel.png" title="test1e" />
<!--TODO: Test 1f, Hot-linking -->
</div>
<h3>JavaScript (7 tests)</h3>
<div id="scriptParent">
<!--Test 2a, Script, local --><script type="text/javascript" src="localScript.js" id="localScript"></script>
<!--Test 2b, Script, remote --><script type="text/javascript" src="http://www.cs.odu.edu/~mkelly/acid/externalScript.js" id="externalScript"></script>
<script type="text/javascript">
//Test 2c, Script inline, DOM Manipulation
document.addEventListener('DOMContentLoaded',function(){
var inlineScriptImage = new Image();
inlineScriptImage.src = "pixel.png";
var scriptParent = document.getElementById('scriptParent');
scriptParent.appendChild(inlineScriptImage);
});
//Test 2d, AJAX image replacement of content that should be in the archive
document.addEventListener('DOMContentLoaded',function(){
//Test 2d setup
var test2DImageRed = new Image();
test2DImageRed.src = "red.png";
test2DImageRed.id = "test2d";
var scriptParent = document.getElementById('scriptParent');
scriptParent.appendChild(test2DImageRed);
//Test 2d init AJAX
var xhr = new XMLHttpRequest();
xhr.open('GET','http://acid.matkelly.com/pixel.png',true);
xhr.responseType = 'arraybuffer';
xhr.onload = function(e) {
document.getElementById('test2d').src = "data:image/png;base64,"+base64ArrayBuffer(e.currentTarget.response);
};
xhr.send();
});
//Test 2e, AJAX requests with content that should be included in the archive, test for false positive
// e.g. Same Origin Policy
document.addEventListener('DOMContentLoaded',function(){
//Test 2e setup
var test2EImageBlue = new Image();
test2EImageBlue.src = "pixel.png";
test2EImageBlue.id = "test2e";
var scriptParent = document.getElementById('scriptParent');
scriptParent.appendChild(test2EImageBlue);
//Test 2e init AJAX
var xhr = new XMLHttpRequest();
xhr.open('GET','http://acid.matkelly.com/pixel.png',true);
try{
xhr.responseType = 'arraybuffer'; //response type of synchronous request should not be changeable, INVALID_ACCESS_ERR: DOM Exception 15
xhr.onload = function(e) {
document.getElementById('test2e').src = "data:image/png;base64,"+base64ArrayBuffer(e.currentTarget.response);
};
xhr.send();
}catch(err){
//console.log(err); //correct path
}
});
// TODO Test 2f: some code only served to certain user agents
// Test 2g: code that manipulates DOM after a certain delay (test the synchronicity of tools)
document.addEventListener('DOMContentLoaded',function(){
//Test 2g setup
var test2GImageRed = new Image();
test2GImageRed.src = "red.png";
test2GImageRed.id = "test2g";
var scriptParent = document.getElementById('scriptParent');
scriptParent.appendChild(test2GImageRed);
//change image after 2 seconds
setTimeout(function(){
document.getElementById('test2g').src = "pixel.png";
},2000);
});
// Test 2h: code that manipulates DOM after a certain delay (test the synchronicity of tools)
document.addEventListener('DOMContentLoaded',function(){
//Test 2h setup
//var test13ImageBlue = new Image();
var test2HIframeBlue = document.createElement('iframe');
test2HIframeBlue.src = "pixel.png";
test2HIframeBlue.id = "test2h";
var scriptParent = document.getElementById('scriptParent');
//scriptParent.appendChild(test13ImageBlue);
//scriptParent.appendChild(test13IframeBlue);
setTimeout(function(){
//document.write('<scr'+'ipt type="text/javascript" src="dynamicallyIncludedScript.js"></sc'+'ript>');
},2100);
});
// Test 2i: code that loads content only after user interaction (tests for interaction-reliant loading of resource)
var test2iExecuted = false;
document.addEventListener('DOMContentLoaded',function(){
var test2IImageRed = new Image();
test2IImageRed.src = "red.png";
test2IImageRed.id = "test2i";
test2IImageRed.title = "2i";
var scriptParent = document.getElementById('scriptParent');
scriptParent.appendChild(test2IImageRed);
window.onscroll = function(oEvent){
if(test2iExecuted){return;}
test2iExecuted = true;
document.getElementById('test2i').src = "pixel_2i.png";
};
});
</script>
</div>
<h3>HTML5 Features (2 tests)</h3>
<div id="html5Parent">
<script type="text/javascript" id="test3Ascript">
//TEST 3a: HTML5 Canvas drawing
document.addEventListener('DOMContentLoaded',function(){
//Test 3a setup
var test3Acanvas = document.createElement('canvas');
test3Acanvas.width = "10";
test3Acanvas.height = "10";
test3Acanvas.id = "test3a";
var scriptParent = document.getElementById('html5Parent');
scriptParent.replaceChild(test3Acanvas,document.getElementById('test3Ascript'));
//Test 3a, HTML5 Canvas Draw
var canvas = document.getElementById('test3a');
var context = canvas.getContext("2d");
context.fillStyle = "#0000FF";
context.fillRect(0,0,10,10);
});
//Test 3b: LocalStorage
document.addEventListener('DOMContentLoaded',function(){
//Test3b setup
var test3BImageRed = new Image();
test3BImageRed.src = "red.png";
test3BImageRed.id = "test3b";
var scriptParent = document.getElementById('html5Parent');
scriptParent.appendChild(test3BImageRed);
localStorage.setItem("test3bSrc", "pixel.png");
document.getElementById('test3b').src = localStorage.getItem('test3bSrc');
});
</script>
</div>
<!--<script type="text/javascript">
document.body.append('img');
document.write("<scrip" + "t src='dynamicallyIncludedScript.js'>"+ " "+"</sc"+"ript>");
</script>
</div>
<h3>HTML5 Features</h3>
<div>
<iframe id="externalWebpage"></iframe>
<iframe id="xssAllowed"></iframe>
<iframe id="xssDisallowed"></iframe>
</div>
<h3>Embedded Objects</h3>
<div>
<object><param src="includedAudio.mp3"></object>
<embed src="anotherIncluedAudioTrack.mp3"></embed>
<object src="http://youtube.com/someYouTubeVideo.flv"></object>
</div>-->
<h2 id="moreInfo">More Information</h2>
<h3>The Motivation</h3>
<p>The purpose of this web page is to test the capability of web crawlers intended for archiving (e.g., Heritrix) and potentially their corresponding replay systems (e.g., Wayback).</p>
<h3>Tests' Rationales</h3>
<p>Tell an archival crawler to capture this page. Replay the capture in an archival replay system. Any non-blue square means that some aesthetic or functional aspect of the live page was not preserved in the archive.</p>
<h4>The Basics</h4>
<ul>
<li>1a - Local image, relative to the test</li>
<li>1b - Local image, absolute URI</li>
<li>1c - Remote image, absolute</li>
<li>1d - Inline content, encoded image</li>
<li>1e - Scheme-less resource</li>
<li>1f - Hotlinking <span>(not implemented)</span></li>
</ul>
<h4>JavaScript</h4>
<ul>
<li>2a - Script, local</li>
<li>2b - Script, remote</li>
<li>2c - Script inline, DOM manipulation</li>
<li>2d - Ajax image replacement of content that should be in the archive</li>
<li>2e - Ajax requests with content that should be included in the archive, test for false positive (e.g., same origin policy)</li>
<li>2f - Code only served to certain user agents (e.g., mobile) <span>(not implemented)</span></li>
<li>2g - Code that manipulates DOM after a certain delay (test the synchronicity of the tools)</li>
<li>2h - <span>Vacant</span></li>
<li>2i - Code that loads content only after user interaction (tests for interaction-reliant loading of a resource)</li>
</ul>
<h4>HTML5 Features</h4>
<ul>
<li>3a - HTML5 Canvas Drawing</li>
<li>3b - LocalStorage</li>
<li>3c - External Webpage <span>(not implemented)</span></li>
<li>3d - XSS Allowed <span>(not implemented)</span></li>
<li>3e - XSS Disallowed <span>(not implemented)</span></li>
<li>3f - Embedded Objects <span>(not implemented)</span></li>
</ul>
<p id="test2iHeightAssurance"> </p>
</body>
</html>