Treat CONTENT attribute not only from String elements (but others as well) #52

Closed
wrznr opened this issue Apr 27, 2021 · 2 comments · Fixed by #56
@wrznr

wrznr commented Apr 27, 2021

Currently, only CONTENT attributes from String are evaluated and realized in the resulting TEI. But ALTO has some other elements which may carry this attribute, most notably HYP.

@wrznr wrznr added the bug Something isn't working label Apr 27, 2021
@wrznr wrznr self-assigned this Apr 27, 2021
@bertsky

bertsky commented Dec 3, 2021

Isn't it also debatable whether it is correct to just join all TextLine/String like so?

return " ".join(element.get("CONTENT") for element in line.xpath("./alto:String", namespaces=ns))

(I would expect that the white-space joiner only be applied where there is an SP interspersed. But there might be different conventions in the field, like having no SP at all, i.e. implicit white-space, as in PAGE-XML.)
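To make the two conventions concrete, here is a minimal sketch, not the project's actual code. It uses the standard library's xml.etree.ElementTree rather than lxml, and the function name line_text, the implicit_space flag and the ALTO v4 namespace are assumptions chosen for illustration. White-space is emitted only where an SP element occurs. For files without any SP, the flag optionally falls back to implicit white-space between String elements.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace; other ALTO versions use a different URI.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def line_text(line, implicit_space=False):
    """Join the CONTENT of one TextLine (hypothetical helper).

    A space is emitted only where an SP element occurs. If the line has
    no SP at all and implicit_space is set, String elements are joined
    with a single space instead.
    """
    has_sp = line.find(f"{{{ALTO_NS}}}SP") is not None
    parts = []
    for child in line:
        tag = child.tag.rsplit("}", 1)[-1]  # drop the namespace prefix
        if tag == "SP":
            parts.append(" ")
        elif tag in ("String", "HYP"):
            # Fallback: implicit white-space before every non-initial String
            if implicit_space and not has_sp and tag == "String" and parts:
                parts.append(" ")
            parts.append(child.get("CONTENT", ""))
    return "".join(parts)


SAMPLE = f"""<TextLine xmlns="{ALTO_NS}">
  <String CONTENT="Ein"/><SP/><String CONTENT="Bei"/><HYP CONTENT="-"/>
</TextLine>"""

NO_SP = f"""<TextLine xmlns="{ALTO_NS}">
  <String CONTENT="Ein"/><String CONTENT="Bei"/><HYP CONTENT="-"/>
</TextLine>"""

print(line_text(ET.fromstring(SAMPLE)))                      # Ein Bei-
print(line_text(ET.fromstring(NO_SP), implicit_space=True))  # Ein Bei-
```

Iterating over all children in document order (instead of selecting only String) also picks up CONTENT from HYP, which is what this issue asks for.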

@bertsky bertsky mentioned this issue Dec 3, 2021
@bertsky

bertsky commented Dec 6, 2021

Regarding HYP itself, I'm not sure anymore whether printing @CONTENT verbatim is correct: Basisformat states that only hyphen-minus should be allowed.

Maybe make that a config parameter? (We could technically have lots of these; related to #26)
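A rough sketch of what such a parameter could look like, purely as an illustration: the function name render_hyp, the parameter hyp_mode and its two values are hypothetical and do not exist in the project.

```python
import xml.etree.ElementTree as ET


def render_hyp(hyp, hyp_mode="verbatim"):
    """Return the text to emit for an ALTO HYP element (hypothetical helper).

    hyp_mode="verbatim"      keeps whatever @CONTENT holds
    hyp_mode="hyphen-minus"  always emits U+002D, regardless of @CONTENT
    """
    if hyp_mode == "verbatim":
        return hyp.get("CONTENT", "")
    if hyp_mode == "hyphen-minus":
        return "-"
    raise ValueError(f"unknown hyp_mode: {hyp_mode!r}")


# U+2E17 (double oblique hyphen) stands in for a non-ASCII hyphen in OCR output
hyp = ET.fromstring('<HYP CONTENT="\u2e17"/>')
print(render_hyp(hyp))                           # ⸗
print(render_hyp(hyp, hyp_mode="hyphen-minus"))  # -
```

Keeping "verbatim" as the default would preserve the current output, with the normalized form as an opt-in.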

@wrznr wrznr closed this as completed in #56 Dec 20, 2021