Commit

Merge pull request #51 from jwmullan/main
Updated NEH and Mellon Logos across site
mabarber92 authored Oct 10, 2024
2 parents d1ec4a2 + 41adf4f commit b622de2
Showing 8 changed files with 145 additions and 74 deletions.
4 changes: 2 additions & 2 deletions _config.yml
@@ -111,10 +111,10 @@ sidebar_sponsors: true
 sponsors:
   mellon:
     link: "https://mellon.org/"
-    image: "/assets/images/main-images/mellon-logo.jpeg"
+    image: "/assets/images/main-images/mellon-logo-2.jpg"
   neh:
     link: "https://www.neh.gov/pages/NSF/media/image1.png"
-    image: "/assets/images/main-images/neh.jpg"
+    image: "/assets/images/main-images/neh-logo-2.jpg"
   nsf:
     link: "https://www.nsf.gov/"
     image: "/assets/images/pages/NSF/media/image1.png"
180 changes: 120 additions & 60 deletions _projects/ACDC.md
@@ -3,65 +3,125 @@ title: "*Automatic Collation for Diversifying Corpora* (*ACDC*)"
layout: page
banner: /assets/images/main-images/Isfahan_Lotfollah_mosque_ceiling_symmetric_narrow_border.png
excerpt: "The *Automatic Collation for Diversifying Corpora* (*ACDC*) project, funded by a Level III Digital Humanities Advanced Grant from the National Endowment for the Humanities, aims to significantly improve the accuracy of handwritten text recognition (HTR) for Arabic-script manuscripts. Our team will develop a collation tool to automatically create large amounts of training data from existing digital texts and manuscript images without time-consuming human annotation of individual manuscripts."
image: /assets/images/pages/ACDC/media/image1.jpg
image: /assets/images/pages/ACDC/media/neh-logo-2.jpg
---

![](/assets/images/pages/ACDC/media/image1.jpg){: width="50%"}{: .align-right}

The *Automatic Collation for Diversifying Corpora* (*ACDC*) project, funded by a Level III Digital Humanities Advanced Grant from the National Endowment for the Humanities, aims to significantly improve the accuracy of handwritten text recognition (HTR) for Arabic-script manuscripts. Our team will develop a collation tool to automatically create large amounts of training data from existing digital texts and manuscript images without time-consuming human annotation of individual manuscripts.
The *ACDC* project will accomplish this task by extending the capabilities of the text alignment tool [passim](https://github.com/dasmiq/passim) and the OCR/HTR engine [Kraken](http://kraken.re/master/index.html) to align poor initial HTR transcriptions of diverse manuscript exemplars with existing digital texts in order to automatically produce training data in a "distantly supervised" manner.
The *ACDC* tool's acceleration of the training data production process will mark an important step towards the creation of the generalizable Arabic and Persian HTR models required for the digital transcription of large-scale Persian and Arabic manuscript collections.
## Primary Project Personnel
**Jonathan Parkes Allen**
Mellon Post-Doctoral Fellow, [Roshan Institute for Persian Studies](https://sllc.umd.edu/fields/persian), University of Maryland, College Park; Acting Assistant Director, *OpenITI AOCP* project
**Matthew Thomas Miller**
Assistant Professor of Persian Literature & Digital Humanities, Roshan Institute for Persian Studies, University of Maryland, College Park; Director, [Roshan Initiative in Persian Digital Humanities](https://sllc.umd.edu/fields/persian/roshan-institute/digital-humanities); Affiliate, [Maryland Institute for Technology in the Humanities](https://mith.umd.edu/)
**David Smith**
Associate Professor, [Khoury College of Computer Sciences](https://www.khoury.northeastern.edu/), Northeastern University; Founding Member, [NULab for Texts, Maps, and Networks](https://cssh.northeastern.edu/nulab/)
**Alejandro Toselli**
Associate Research Scientist, [Khoury College of Computer Sciences](https://www.khoury.northeastern.edu/), Northeastern University
**Si Wu**
Doctoral Candidate, [Khoury College of Computer Sciences](https://www.khoury.northeastern.edu/), Northeastern University
## Advisory Board
**Carl Ernst**
William R. Kenan, Jr. Distinguished University Professor, University of North Carolina, Chapel Hill; Co-Director, [UNC Center for Middle East and Islamic Studies](https://mideast.unc.edu/)
**Adi Keinan-Schoonbaert**
Digital Curator, Asian and African Collections, British Library
**Evyn Kropf**
Librarian for Middle Eastern & North African Studies and Religious Studies, University of Michigan; Curator, [Islamic Manuscripts Collection](https://www.lib.umich.edu/collections/collecting-areas/special-collections-and-archives/islamic-manuscripts), University of Michigan
**Sarah Bowen Savant**
Professor of History, [Institute for the Study of Muslim Civilisations](https://www.aku.edu/ismc/Pages/home.aspx), Aga Khan University, London; Principal Investigator, [KITAB project](https://kitab-project.org/)
**Sabine Schmidtke**
Professor of Islamic Intellectual History, [School of Historical Studies](https://www.ias.edu/hs), Institute for Advanced Study; Principal Investigator, [The Zaydi Manuscript Tradition (*ZMT*) project](https://www.ias.edu/digital-scholarship/zaydi_manuscript_tradition)
**Columba Stewart**
Executive Director, [Hill Museum & Manuscript Library](https://hmml.org/); Professor of Theology, Saint John's University
**Daniel Stoekl Ben Ezra**
Directeur d'Études, École Pratique des Hautes Études (EPHE), Paris, Section des Sciences historiques et philologiques; Principal Investigator, [eScripta project](https://escripta.hypotheses.org/)
![](/assets/images/pages/ACDC/media/neh-logo-2.jpg){: width="50%"}{: .align-right}

The *Automatic Collation for Diversifying Corpora* (*ACDC*) project, funded by a Level III Digital Humanities Advanced Grant from the National Endowment for the Humanities, aims to significantly improve the accuracy of handwritten text recognition (HTR) for Arabic-script manuscripts. Our team will develop a collation tool to automatically create large amounts of training data from existing digital texts and manuscript images without time-consuming human annotation of individual manuscripts.



The *ACDC* project will accomplish this task by extending the capabilities of the text alignment tool [passim](https://github.com/dasmiq/passim) and the OCR/HTR engine [Kraken](http://kraken.re/master/index.html) to align poor initial HTR transcriptions of diverse manuscript exemplars with existing digital texts in order to automatically produce training data in a "distantly supervised" manner.



The *ACDC* tool's acceleration of the training data production process will mark an important step towards the creation of the generalizable Arabic and Persian HTR models required for the digital transcription of large-scale Persian and Arabic manuscript collections.


## Primary Project Personnel



**Jonathan Parkes Allen**



Mellon Post-Doctoral Fellow, [Roshan Institute for Persian Studies](https://sllc.umd.edu/fields/persian), University of Maryland, College Park; Acting Assistant Director, *OpenITI AOCP* project



**Matthew Thomas Miller**



Assistant Professor of Persian Literature & Digital Humanities, Roshan Institute for Persian Studies, University of Maryland, College Park; Director, [Roshan Initiative in Persian Digital Humanities](https://sllc.umd.edu/fields/persian/roshan-institute/digital-humanities); Affiliate, [Maryland Institute for Technology in the Humanities](https://mith.umd.edu/)



**David Smith**



Associate Professor, [Khoury College of Computer Sciences](https://www.khoury.northeastern.edu/), Northeastern University; Founding Member, [NULab for Texts, Maps, and Networks](https://cssh.northeastern.edu/nulab/)



**Alejandro Toselli**



Associate Research Scientist, [Khoury College of Computer Sciences](https://www.khoury.northeastern.edu/), Northeastern University



**Si Wu**



Doctoral Candidate, [Khoury College of Computer Sciences](https://www.khoury.northeastern.edu/), Northeastern University


## Advisory Board



**Carl Ernst**



William R. Kenan, Jr. Distinguished University Professor, University of North Carolina, Chapel Hill; Co-Director, [UNC Center for Middle East and Islamic Studies](https://mideast.unc.edu/)



**Adi Keinan-Schoonbaert**



Digital Curator, Asian and African Collections, British Library



**Evyn Kropf**



Librarian for Middle Eastern & North African Studies and Religious Studies, University of Michigan; Curator, [Islamic Manuscripts Collection](https://www.lib.umich.edu/collections/collecting-areas/special-collections-and-archives/islamic-manuscripts), University of Michigan



**Sarah Bowen Savant**



Professor of History, [Institute for the Study of Muslim Civilisations](https://www.aku.edu/ismc/Pages/home.aspx), Aga Khan University, London; Principal Investigator, [KITAB project](https://kitab-project.org/)



**Sabine Schmidtke**



Professor of Islamic Intellectual History, [School of Historical Studies](https://www.ias.edu/hs), Institute for Advanced Study; Principal Investigator, [The Zaydi Manuscript Tradition (*ZMT*) project](https://www.ias.edu/digital-scholarship/zaydi_manuscript_tradition)



**Columba Stewart**



Executive Director, [Hill Museum & Manuscript Library](https://hmml.org/); Professor of Theology, Saint John's University



**Daniel Stoekl Ben Ezra**



Directeur d'Études, École Pratique des Hautes Études (EPHE), Paris, Section des Sciences historiques et philologiques; Principal Investigator, [eScripta project](https://escripta.hypotheses.org/)






35 changes: 23 additions & 12 deletions acknowledgements.md
@@ -5,24 +5,35 @@ banner: /assets/images/main-images/Isfahan_Lotfollah_mosque_ceiling_symmetric_na
---

The Open Islamicate Texts Initiative (OpenITI) has received generous funding from a number of sources which we gratefully acknowledge below.
<br>

-----------------------------------
# *Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project* (*OpenITI AOCP*)
<br>

-----------------------------------

# *Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project* (*OpenITI AOCP*)



![](/assets/images/pages/Acknowledgements/media/image1.png){: width="35%"}{: .align-right}
![](/assets/images/pages/Acknowledgements/media/mellon-logo-2.jpg){: width="35%"}{: .align-right}



The *Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project* (*OpenITI AOCP*) is funded through grants from the [Public Knowledge](https://mellon.org/programs/public-knowledge/) program of [The Andrew W. Mellon Foundation](https://mellon.org/).

The *Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project* (*OpenITI AOCP*) is funded through grants from the [Public Knowledge](https://mellon.org/programs/public-knowledge/) program of [The Andrew W. Mellon Foundation](https://mellon.org/).
<br>
<br>

----------------------------------
# *Automatic Collation for Diversifying Corpora* (*ACDC*)
----------------------------------

# *Automatic Collation for Diversifying Corpora* (*ACDC*)



![](/assets/images/pages/Acknowledgements/media/neh-logo-2.jpg){: width="35%"}{: .align-right}




The *Automatic Collation for Diversifying Corpora* (*ACDC*) project is funded through a Level III Digital Humanities Advancement grant from the [National Endowment for the Humanities](https://www.neh.gov/).


![](/assets/images/pages/Acknowledgements/media/image2.jpg){: width="35%"}{: .align-right}
The *Automatic Collation for Diversifying Corpora* (*ACDC*) project is funded through a Level III Digital Humanities Advancement grant from the [National Endowment for the Humanities](https://www.neh.gov/).
Binary file added assets/images/main-images/mellon-logo-2.jpg
Binary file added assets/images/main-images/neh-logo-2.jpg
Binary file added assets/images/pages/ACDC/media/neh-logo-2.jpg
