title | author | date | ORCID |
---|---|---|---|
newspapers_al-quds: read me | Till Grallert | 2018-03-28 10:57:56 +0300 | orcid.org/0000-0002-5739-8094 |
This repository contains bibliographic metadata for the newspaper al-Quds published by Jirjī Ḥabīb Ḥanāniyā in Jerusalem between 1908 and 1914. The Center for Palestine Studies at Columbia University scanned issues 1 to 391 and put them online. Currently these issues can only be accessed through their issue number and nested sub pages. I therefore produced machine-actionable bibliographic metadata including volume and issue numbers, as well as dates in all three calendars mentioned in the paper's masthead.
NOTE: As of late 2021, the facsimiles can no longer be reached. They were originally hosted on Google Drive, and all links are now broken.
This repository contains a single TEI XML file containing one `<biblStruct>` for each issue. The file is generated by automatic iteration, making use of this code, and then manually validated against the digital facsimiles.
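The iteration step could look roughly like the following minimal Python sketch. It is not the repository's actual code: the function names `make_bibl_struct` and `generate_issues`, the fixed publication interval, and the start date in the usage line are hypothetical placeholders, and only a single ISO date is written, not the three calendars of the masthead.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def make_bibl_struct(title, volume, issue, pub_date):
    """Build one <biblStruct> for a single issue (hypothetical helper)."""
    bibl = ET.Element(f"{{{TEI_NS}}}biblStruct")
    monogr = ET.SubElement(bibl, f"{{{TEI_NS}}}monogr")
    ET.SubElement(monogr, f"{{{TEI_NS}}}title", {"level": "j"}).text = title
    imprint = ET.SubElement(monogr, f"{{{TEI_NS}}}imprint")
    ET.SubElement(imprint, f"{{{TEI_NS}}}date", {"when": pub_date.isoformat()})
    # Volume and issue numbers are recorded as <biblScope> elements.
    for unit, value in (("volume", volume), ("issue", issue)):
        ET.SubElement(
            monogr,
            f"{{{TEI_NS}}}biblScope",
            {"unit": unit, "from": str(value), "to": str(value)},
        )
    return bibl


def generate_issues(title, volume, first_issue, last_issue, start_date, interval_days):
    """Yield one <biblStruct> per issue, assuming a fixed interval between issues."""
    current = start_date
    for issue in range(first_issue, last_issue + 1):
        yield make_bibl_struct(title, volume, issue, current)
        current += timedelta(days=interval_days)


# Usage with placeholder values (NOT the real publication dates):
for bibl in generate_issues("al-Quds", 1, 1, 3, date(1900, 1, 1), 7):
    print(ET.tostring(bibl, encoding="unicode"))
```

Because the real schedule was irregular, a run like this would have to be split into several ranges, each with its own start date and interval taken from the facsimiles.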
The TEI is then automatically converted to MODS XML for integration into reference management software such as Zotero.
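As an illustration of what such a conversion involves, here is a minimal Python sketch that maps one `<biblStruct>` to a bare-bones MODS record. It is an assumption-laden stand-in, not the conversion used in this repository: the function name `tei_to_mods` is hypothetical, and the mapping covers only title, date, and volume/issue numbers.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
MODS = "http://www.loc.gov/mods/v3"
NS = {"tei": TEI}


def tei_to_mods(bibl):
    """Map a TEI <biblStruct> element to a minimal <mods> element (hypothetical helper)."""
    mods = ET.Element(f"{{{MODS}}}mods")

    title = bibl.find("tei:monogr/tei:title", NS)
    if title is not None:
        title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS}}}title").text = title.text

    pub_date = bibl.find("tei:monogr/tei:imprint/tei:date[@when]", NS)
    if pub_date is not None:
        origin = ET.SubElement(mods, f"{{{MODS}}}originInfo")
        issued = ET.SubElement(origin, f"{{{MODS}}}dateIssued", {"encoding": "w3cdtf"})
        issued.text = pub_date.get("when")

    scopes = bibl.findall("tei:monogr/tei:biblScope", NS)
    if scopes:
        part = ET.SubElement(mods, f"{{{MODS}}}part")
        for scope in scopes:
            # TEI @unit ("volume", "issue") becomes the MODS detail @type.
            detail = ET.SubElement(part, f"{{{MODS}}}detail", {"type": scope.get("unit", "")})
            ET.SubElement(detail, f"{{{MODS}}}number").text = scope.get("from")
    return mods


# Usage with a made-up record (placeholder date, not real data):
sample = ET.fromstring(
    '<biblStruct xmlns="http://www.tei-c.org/ns/1.0"><monogr>'
    '<title level="j">al-Quds</title>'
    '<imprint><date when="1900-01-01"/></imprint>'
    '<biblScope unit="issue" from="1" to="1"/>'
    "</monogr></biblStruct>"
)
print(ET.tostring(tei_to_mods(sample), encoding="unicode"))
```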
Since the publication schedule of al-Quds was rather irregular, I had to check a large number of facsimiles for their publication dates in order to adjust the input parameters for the algorithm generating the metadata. In doing so, I came across a large number of missing issues, sub-pages that display only "Hello world", and incomplete scans. I have listed these errors below. Note that the list of files with missing pages will inevitably grow, since I have not gone through the individual issues (and might never do so).
- errors:
    - Missing scans (some of these pages show "Hello world"):
    - Cut-off scans with illegible columns:
    - Missing pages:
    - URLs with different patterns: