Training on Voynich Manuscript #681

Open
bi3mw opened this issue Jan 24, 2025 · 3 comments

bi3mw commented Jan 24, 2025

Hello,
I would like to train on the so-called Voynich Manuscript. My current problem is as follows.

I get this error message when I start the training:
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

Here is my XML file (example):

    <PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
      <Metadata />
      <Page imageFilename="000006.png">
        <TextRegion id="block_1" custom="structure {type:region_type;}">
          <Coords points="2,34 2,72 897,72 897,34" />
          <TextLine id="line_1">
            <Baseline points="2,40 177,36 385,34 544,34 745,36" />
            <TextEquiv>
              <Unicode>ysheees chetchy teodar otcheol tockhy</Unicode>
            </TextEquiv>
            <Word>
              <Unicode>ysheees</Unicode>
            </Word>
            <Word>
              <Unicode>chetchy</Unicode>
            </Word>
            <Word>
              <Unicode>teodar</Unicode>
            </Word>
            <Word>
              <Unicode>otcheol</Unicode>
            </Word>
            <Word>
              <Unicode>tockhy</Unicode>
            </Word>
          </TextLine>
        </TextRegion>
      </Page>
    </PcGts>

bi3mw (Author) commented Feb 1, 2025

Is there no solution to this problem?

mittagessen (Owner) commented Feb 1, 2025 via email

bi3mw (Author) commented Feb 2, 2025

Thanks for the feedback. I will think carefully about the comments.

I have now succeeded in creating well-formed XML files. Note that the Page tag needs to look like this, for example:

    <Page imageFilename="000000.png" imageWidth="1426" imageHeight="78">

Without the imageWidth and imageHeight attributes, Ketos aborts the training with an error message.
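
For anyone hitting the same error, here is a minimal sketch of how imageWidth and imageHeight could be filled in automatically. It assumes the images are PNG files, that imageFilename is relative to a known directory, and that the XML uses the 2019-07-15 PAGE namespace shown above. The helper names `png_size` and `add_image_size` are made up for this illustration and are not part of Ketos.

```python
import os
import struct
import xml.etree.ElementTree as ET

# Namespaces taken from the PcGts root element posted above.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # Width and height are two big-endian 32-bit integers at offset 16.
    return struct.unpack(">II", header[16:24])


def add_image_size(xml_path, image_dir="."):
    """Set imageWidth/imageHeight on every Page element of one PAGE XML file."""
    # Keep the default namespace and the xsi prefix readable on output.
    ET.register_namespace("", PAGE_NS)
    ET.register_namespace("xsi", XSI_NS)
    tree = ET.parse(xml_path)
    for page in tree.getroot().iter(f"{{{PAGE_NS}}}Page"):
        filename = page.get("imageFilename")
        if filename is None:
            continue  # nothing to measure without an image reference
        width, height = png_size(os.path.join(image_dir, filename))
        page.set("imageWidth", str(width))
        page.set("imageHeight", str(height))
    # Overwrites the file in place; work on a copy if unsure.
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
```

A call such as `add_image_size("000000.xml", image_dir="images")` (both paths hypothetical) rewrites the file in place. Reading the size from the PNG header keeps the sketch free of third-party dependencies; for other image formats an imaging library would be needed instead.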

My question is whether Coords points must be specified inside each Word element, like this:

    <Word>
      <Unicode>tchodar</Unicode>
      <Coords points="1,26 172,26 172,49 1,49" />
    </Word>
    <Word>

Or is this information unnecessary?
