Fall 2017 publication thread #11

ctschroeder · 2017-10-30T18:12:21Z

Please use this thread to track our Fall 2017 corpora publication process.

Data freeze: November 9, 2017

Corpora to publish + reviewers:

- ap (Carrie & Beth)
- pseudo-theophilus (new, Beth)
- johannes.canons (new, Carrie) - ready!
- victor (martyrdoms) (new, Beth)
- dirt (new, Carrie)

Reviewers, please be sure to:

- review http://wiki.copticscriptorium.org/doku.php?id=checklist_for_publishing_corpora
- CHECK the issues in your corpus dev repository to see if there are global issues or edits to previously published corpora that need to be made
- CHECK global issues & edit as necessary (listed here for convenience):
  - lemmatization #10
  - Regularize POS tag for Christos tagger-part-of-speech#6
- make sure no docs in your corpus show red spreadsheet or metadata validation errors in Gitdox

ctschroeder · 2017-11-07T15:26:43Z

@amir-zeldes Apa Johannes document FA 29-30 is done EXCEPT for some additional information (folio #s) needed for the idno metadatum. You should be able to test the TEI converter on it, though. Please see my thread here gucorpling/gitdox#54 about the converter, first. Thanks!!
ETA: it's now ready. Got the info I needed from Diliana.

ctschroeder · 2017-11-08T16:32:55Z

@amir-zeldes AP and Apa Johannes are DONE, except for two AP sayings where we are still waiting on answers to queries; those sayings are from outside contributors and are marked "review."

There are a TON of AP. I edited a few that were already published but needed edits. I updated versioning and committed. However, this means that we have some AP in SGML format, some in Excel, and some in both. Amir, let me know if you have questions about these. I think the rule of thumb is: if there is an SGML file in the gitdox folder, use that; only use an Excel file if there is no SGML in gitdox. Unless you want me to go through and systematically commit every AP to GitHub in the gitdox folder. Let me know!

amir-zeldes · 2017-11-08T17:43:50Z

Oh, no worries, I'm not going to export from Excel or SGML files - it'll all happen directly from GitDox based on document status (published/to_publish). If you could quickly verify that the statuses are correct, I can attempt the first conversion. I'll have a look at Johannes first maybe.
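To illustrate the status-based selection described here, a minimal Python sketch. The record layout (`name`, `status`) and the document names are assumptions made for illustration, not GitDox's actual data model; only the status values come from this thread.

```python
# Statuses that mark a document as ready for export, per the comment above.
PUBLISHABLE = {"published", "to_publish"}


def select_for_export(docs):
    """Return the names of documents whose status marks them for export."""
    return [doc["name"] for doc in docs if doc["status"] in PUBLISHABLE]


# Hypothetical records; real documents live in GitDox, not in a list like this.
docs = [
    {"name": "doc_a", "status": "to_publish"},
    {"name": "doc_b", "status": "review"},
    {"name": "doc_c", "status": "published"},
]

print(select_for_export(docs))
```

Under this sketch, a document still marked "review" is simply skipped, which is why verifying the statuses matters before the first conversion.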

ctschroeder · 2017-11-08T18:11:47Z

The statuses for the AP and Johannes are correct. I am waiting for responses to those two AP tho.

amir-zeldes · 2017-11-09T15:11:51Z

Should I convert AP without those two then or wait?

ctschroeder · 2017-11-09T16:10:28Z

There is only one left. Greg emailed last night to say he will add the translation today. I will look for it when I get on the train. —c

amir-zeldes · 2017-11-09T16:31:12Z

Ok, I'll hold off on ap. I'm getting on a plane to Pittsburgh but will check mail again in the evening.

ctschroeder · 2017-11-10T00:01:02Z

Dirt is ready!! OMG this is a lot of material. Also @amir-zeldes I saw your email about TEI but could not get to it with everything else. Will get to it in the morning.

amir-zeldes · 2017-11-10T04:52:48Z

OK, shenoute.dirt is now in ANNIS as well, accessible with your logins. Let me know if everything looks OK (it only had the same issues of pb_xml:id and the TEI column, which I removed)

ctschroeder · 2017-11-10T14:17:12Z

re dirt:
@cluckmarq if you have a login to ANNIS you can see your Dirt file online.

@amir-zeldes a couple of things:
Document metadata
- filename is "dirt" in ANNIS (?)
- license is showing the HTML code, not the link

Text & annotation
- what are we missing in the annotations so that the linguistic analysis view looks wrong? translation spans?
- likewise, do we need a paragraph span?

Otherwise I think Dirt looks good.

ctschroeder · 2017-11-10T14:29:13Z

Re johannes.canons:

Document metadata
- As with Dirt, the document name is off; it reads "johannes". I think something is not converting correctly from the document name in Gitdox.
- license & source_info likewise display HTML code, not links

Text and annotation
- @eplatte is that first N in line 1 supposed to look like that, with a line under it or something?
- As with Dirt: what are we missing in the annotations so that the linguistic analysis view looks wrong? Translation spans?
- It would be nice if the stylesheet could show chapters or verses, or break by verses. Do we need p or translation spans the same size as chapter/verse?
- I know we do not want to have 500 different stylesheets. Since we are moving to versification in all corpora as we (re)publish (except perhaps AP, but we could add them there, too), perhaps we can decide on a versification stylesheet and just have that one, migrating corpora to it as we (re)publish?

Thanks! Let me know if we should annotate anything differently based on this conversation.

amir-zeldes · 2017-11-10T16:55:18Z

OK, I'm on the doc name and license issues. I figured out the naming problem, which is a bug in the TreeTagger module in SNP. It looks fixable, but might have repercussions I don't understand. I opened an issue here:

korpling/treetagger-emf-api#1

Basically, in stripping off the extension of the filename, it just removes everything after the first dot. The quickest fix is to not have dots in filenames, but ultimately (after the release) I'd like to see this fixed. I think not putting corpus.NAME as the name is consistent with our older corpora though, and it's kind of redundant. Would everyone be OK with the document being called: GL71-74 and the corpus name being shenoute.dirt? I think that's actually cleaner than shenoute.dirt > shenoute.dirt.GL71-74.
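A minimal sketch of the difference between the two stripping strategies, written in Python for illustration rather than taken from the TreeTagger module's own code. The filename is hypothetical.

```python
import os


def strip_after_first_dot(filename):
    """Mimic the behaviour described above: drop everything after the FIRST dot."""
    return filename.split(".", 1)[0]


def strip_extension(filename):
    """Drop only the final extension, so dots inside the name survive."""
    return os.path.splitext(filename)[0]


name = "johannes.canons.example.xml"  # hypothetical filename

print(strip_after_first_dot(name))  # johannes
print(strip_extension(name))        # johannes.canons.example
```

With a dot-free name such as GL71-74, both functions return the name unchanged, which is why avoiding dots sidesteps the problem.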

amir-zeldes · 2017-11-10T16:56:54Z

The hyperlink issue also seems to be an internal SNP thing due to our new workflow. Again, no quick solution, but I can simply 'un-escape' the &gt; etc. manually in the ANNIS files for this release. But in the future, it's a problem we'll need to solve.

Let me know about the document names, and if we're OK with GL71-74 etc., then I can reimport everything with those 2 problems fixed.
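For the hyperlink issue, a minimal Python sketch of what un-escaping a metadata value could look like. The license value and URL are made up for illustration, and this is not how the ANNIS files were actually patched for the release.

```python
import html


def unescape_metadata(value):
    """Turn escaped entities such as &lt; and &gt; back into literal characters."""
    return html.unescape(value)


# Hypothetical escaped metadata value, as it might appear after conversion.
escaped = "&lt;a href=&quot;https://example.org/license&quot;&gt;license&lt;/a&gt;"

print(unescape_metadata(escaped))
```

A value with no escaped entities passes through unchanged, so running this over every metadata field would be safe under these assumptions.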

ctschroeder · 2017-11-10T17:33:27Z

Thank you for investigating. I would really rather not change document names because we have so many corpora with dots in the names, and some of these docs go free floating like in the TEI archive (which now someone is using). Is there any other way around this problem?

amir-zeldes · 2017-11-10T19:03:51Z

We could wait for the SNP problem to be solved, but that would delay the release... or we could manually change them back in all output documents, which is a bit irritating.

But I'm not sure we should want to: our previous corpora don't do this (so documents in Eagerness are called GL29 etc.), and the folder (or corpus in ANNIS, repo) uniquely identifies the corpus anyway. Did you want to retroactively change names in all corpora to include the doc name?

eplatte · 2017-11-10T23:14:38Z

Johannes:
I think what's happening with the initial ⲛs in Johannes is a visualization problem. The punctuation and the ⲛ are two separate characters that are combining in the visualization for some reason. I'm happy to try to fix it in GitDox if anyone has any suggestions, but those are the correct characters according to Diliana's transcription.
Also, I know the lines in Johannes look odd, but they're all right!
As far as the linguistic analysis view, this document has p spans that are the same as verses, but no translation. Again, let me know if there's something I can fix.

ctschroeder · 2017-11-11T17:23:09Z

Victor should be done. I am not sure the layer names are correct, but they are understandable.

Also Amir, I know you're busy, but when you get a chance, let us know what to do to change the spans so the normalized and analytical visualizations are more sensible.

amir-zeldes · 2017-11-12T17:02:03Z

Sure, norm responds to p to make paragraph breaks (may never be inside a bound group) and analytical responds to translation. The simplest solution for analytic without adding a variant stylesheet is to add translation="..." in the relevant spans. Otherwise, we can have a variant stylesheet that responds to another annotation (technically no problem, it just potentially causes confusion if we ever mix up stylesheets or actually add translations).

ctschroeder · 2017-11-13T04:25:49Z

Hi. I have added translation spans so the analytic views should look ok now.
For Victor, Dirt, Johannes: can we have a chapter view visualization that is basically the normalized view, but instead of the text being broken up by p span it is broken up by chapter, and the verses are also included. So something like:
(ch 1) (1) here is some coptic. (2) here is some more coptic. (3) we sure love coptic.
(ch 2) (1) here is a new chapter of coptic. (2) boatloads of coptic
I'm open to suggestions.
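The requested layout can be sketched in a few lines of Python. This is only an illustration of the output format, not an ANNIS stylesheet; the (chapter, verse, text) triples are an assumed input shape.

```python
from itertools import groupby


def render_chapter_view(verses):
    """Render (chapter, verse, text) triples as one line per chapter.

    Assumes the triples are already in document order, so that all verses
    of a chapter are adjacent.
    """
    lines = []
    for chapter, group in groupby(verses, key=lambda v: v[0]):
        body = " ".join(f"({verse}) {text}" for _, verse, text in group)
        lines.append(f"(ch {chapter}) {body}")
    return "\n".join(lines)


sample = [
    (1, 1, "here is some coptic."),
    (1, 2, "here is some more coptic."),
    (1, 3, "we sure love coptic."),
    (2, 1, "here is a new chapter of coptic."),
    (2, 2, "boatloads of coptic"),
]

print(render_chapter_view(sample))
```

Run on the sample, this prints the two lines given in the example above, one per chapter with verse numbers inline.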

ctschroeder · 2017-11-13T04:29:11Z

Sorry: Victor, Dirt, Johannes, Ps-Theophilus. Since you haven't published a test run of Ps-theophilus, maybe you could try the visualization with that one first?
My goal is to have versification for everything so we would gradually be moving over all the texts to that view.
Also can you keep the manuscript # in the visualization in square brackets.

amir-zeldes · 2017-11-15T02:01:48Z

OK, theo, dirt, johannes and victor are now online and ready for inspection in ANNIS. TEI also converts no problem except for a modeling issue Carrie and I are discussing, but basically they all validate, suggesting there are no issues with the underlying annotations at this point.

AP is sadly riddled with little things, so I'll plug away at that next; the other ones would be ready for a complete release on my end.

ctschroeder · 2017-11-15T03:55:05Z

Can I help with AP?

amir-zeldes · 2017-11-15T18:30:57Z

Thanks, no, I need to fix errors and re-run SNP each time, so you can't (if I find something systematic I'll let you know).

What I need from you and @eplatte is just a green light for the other corpora that are in ANNIS right now. If they check out, we just need AP to release!

eplatte · 2017-11-15T18:33:12Z

I'm at Reed today, but I will check during my lunch break. Beth

amir-zeldes · 2017-11-15T18:50:57Z

Thanks! No need to overdo it though, it can all wait!

ctschroeder · 2018-01-09T20:51:08Z

PUBLISHED! yay

ctschroeder added the publish label Oct 30, 2017

ctschroeder added this to the Fall 2017 milestone Oct 30, 2017

ctschroeder assigned amir-zeldes, ctschroeder and eplatte Oct 30, 2017

ctschroeder closed this as completed Jan 9, 2018
