This repository has been archived by the owner on Nov 16, 2020. It is now read-only.

Problem with DocumentSelectionDescriptor version of runCitLabHtr #4

Open
jscrane opened this issue Jun 13, 2018 · 3 comments

jscrane commented Jun 13, 2018

I'm having a problem with this API (the other one, which takes the page range as a string, works fine).

Looks like the server isn't processing the page range properly.

If you can see it, job 346078 shows this. Digging into the "jobDataProps" I can see the following XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<documentSelectionDescriptor>
    <docId>27808</docId>
    <pageList>
        <pages>
            <pageId>1</pageId>
        </pages>
        <pages>
            <pageId>2</pageId>
        </pages>
        <pages>
            <pageId>3</pageId>
        </pages>
        <pages>
            <pageId>4</pageId>
        </pages>
        <pages>
            <pageId>5</pageId>
        </pages>
        <pages>
            <pageId>6</pageId>
        </pages>
        <pages>
            <pageId>7</pageId>
        </pages>
        <pages>
            <pageId>8</pageId>
        </pages>
        <pages>
            <pageId>9</pageId>
        </pages>
    </pageList>
</documentSelectionDescriptor>

In the GUI the pages column is blank for this job, and it finished immediately.

It's no big deal, as I can use the other form for now. Just FYI.

kahlep self-assigned this Jun 13, 2018

kahlep commented Jun 13, 2018

The descriptor object expects page IDs, i.e. the values returned by TrpPage#getPageId, not page numbers (pageNr).

Of course, the job should report errors when it is passed page IDs that do not belong to the document. At the moment it silently ignores them; I will fix this.
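
To make the pageNr/pageId distinction concrete, here is a minimal sketch of translating page numbers into page IDs before building the descriptor. The `Page` class is a hypothetical stand-in for TrpPage (only `getPageId` is named in this thread; `getPageNr` and the lookup helper are assumptions), and the pairing of page numbers with the IDs below is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class PageIdLookup {
    // Hypothetical stand-in for TrpPage, reduced to the two values discussed here.
    static final class Page {
        private final int pageNr;
        private final int pageId;

        Page(int pageNr, int pageId) {
            this.pageNr = pageNr;
            this.pageId = pageId;
        }

        int getPageNr() { return pageNr; }
        int getPageId() { return pageId; }
    }

    // Translate page numbers into page IDs, failing loudly on an unknown number
    // instead of silently dropping it.
    static List<Integer> pageIdsFor(List<Page> pages, List<Integer> pageNrs) {
        List<Integer> ids = new ArrayList<>();
        for (int nr : pageNrs) {
            Page match = null;
            for (Page p : pages) {
                if (p.getPageNr() == nr) {
                    match = p;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("No page with number " + nr);
            }
            ids.add(match.getPageId());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Illustrative data only: the number-to-ID pairing is assumed.
        List<Page> pages = List.of(
                new Page(1, 798149), new Page(2, 798150), new Page(3, 798151));
        System.out.println(pageIdsFor(pages, List.of(1, 3))); // prints [798149, 798151]
    }
}
```

The resulting IDs are what would go into the `<pageId>` elements of the descriptor, rather than 1, 2, 3.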


jscrane commented Jun 15, 2018

OK, I've tried using the pageId, as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<documentSelectionDescriptor>
    <docId>27808</docId>
    <pageList>
        <pages>
            <pageId>798149</pageId>
        </pages>
        <pages>
            <pageId>798150</pageId>
        </pages>
        <pages>
            <pageId>798151</pageId>
        </pages>
    </pageList>
</documentSelectionDescriptor>

And this time, the server says:

java.lang.NullPointerException
              at eu.transkribus.appserver.logic.jobs.standard.CITlabHtrJob.runHtr(CITlabHtrJob.java:202)
              at eu.transkribus.appserver.logic.jobs.standard.CITlabHtrJob.doProcess(CITlabHtrJob.java:140)
              at eu.transkribus.appserver.logic.jobs.abstractjobs.ATrpJobRunnable.run(ATrpJobRunnable.java:112)
              at eu.transkribus.appserver.logic.JobProcessStarter.run(JobProcessStarter.java:113)
              at eu.transkribus.appserver.logic.JobProcessStarter.main(JobProcessStarter.java:212)

(The modelId was 133, and the jobId was 347160.)

Apologies if I've done something stupid.


kahlep commented Jun 18, 2018

I found the missing null check and added it. Thanks for reporting.
However, an updated HTR module cannot be deployed quickly.
If you want to stick with the descriptor in the meantime (it initially used an int for the transcript ID), you would have to set a value <= 0 via PageDescriptor#setTsId on each page to run the HTR on the current version of that page.
This is equivalent to using the page String.