Missing content (only titles) in Escher Tageblatt #137
In the newly ingested data, there is also a significant number of CIs with [...]. These may correspond to incorrect page region segments, originating from bad OLR and/or errors in the conversion process. We need to investigate further, including by comparing with the 'old' ingested data. I noticed that sometimes the title is equal to the full text (I cannot count this in Solr, though). The breakdown is available via this Solr query and below.
While checking samples for the status of the title in rebuilt:
These content items only have a title (the "t" property) in rebuilt, but no full text. This seems to affect many more articles than the ones I sampled and looked at from these issues:
https://impresso-project.ch/app/article/tageblatt-1936-12-24-a-i0030
https://impresso-project.ch/app/article/tageblatt-1936-04-03-a-i0050
https://impresso-project.ch/app/article/tageblatt-1936-10-21-a-i0031
https://impresso-project.ch/app/article/tageblatt-1936-02-01-a-i0033
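
To get a count beyond the sampled articles, a check along these lines could be run over the rebuilt data. This is only a minimal sketch, assuming the rebuilt content items are available as JSON Lines; the `"t"` property comes from the observation above, while the full-text key (`"ft"` here), the `"id"` key, the sample records, and the function name `find_title_only` are hypothetical and would need to be adapted to the actual rebuilt schema.

```python
import json
from typing import Iterable, List


def find_title_only(lines: Iterable[str], text_key: str = "ft") -> List[str]:
    """Return the ids of content items that have a title but no full text.

    `text_key` is an assumed name for the full-text property; only "t"
    (the title) is confirmed by the issue description.
    """
    flagged = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSON Lines input
        item = json.loads(line)
        title = (item.get("t") or "").strip()
        text = (item.get(text_key) or "").strip()
        if title and not text:
            flagged.append(item.get("id", "<no id>"))
    return flagged


# Made-up sample records, for illustration only.
sample = [
    '{"id": "example-a", "t": "Some title", "ft": ""}',
    '{"id": "example-b", "t": "Other title", "ft": "Body text"}',
    '{"id": "example-c", "t": "Title only"}',
]
print(find_title_only(sample))  # ['example-a', 'example-c']
```

Treating a missing key and an empty string the same way keeps the check independent of how the conversion step represents absent text.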