Commit

fix description, corresp, language #196
matyaskopp committed Aug 15, 2023
1 parent 9a9123c commit 37cd30d
Showing 2 changed files with 19 additions and 2 deletions.
20 changes: 18 additions & 2 deletions src/parlaMint/parczech2parlamint.xsl
@@ -69,8 +69,8 @@

<xsl:template match="//tei:encodingDesc/tei:projectDesc">
<xsl:copy>
-<p xml:lang="cs"><ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref></p>
-<p xml:lang="en"><ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a project that aims to (1) create a multilingual set of comparable corpora of parliamentary proceedings uniformly encoded according to the <ref target="https://github.com/clarin-eric/parla-clarin">Parla-CLARIN recommendations</ref> and covering the COVID-19 pandemic from November 2019 as well as the earlier period from 2015 to serve as a reference corpus; (2) process the corpora linguistically to add Universal Dependencies syntactic structures and Named Entity annotation; (3) make the corpora available through concordancers and Parlameter; and (4) build use cases in Political Sciences and Digital Humanities based on the corpus data.</p>
+<p xml:lang="cs"><ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> je projekt, jehož cílem je (1) vytvořit vícejazyčný soubor srovnatelných korpusů parlamentních jednání jednotně kódovaných podle <ref target="https://clarin-eric.github.io/ParlaMint/">kritérií ParlaMint</ref> pokrývajících období od roku 2015 do poloviny roku 2022; (2) přidat do korpusu jazykové anotace a strojově je přeložit do angličtiny; (3) zpřístupnit korpus prostřednictvím vyhledávacích nástrojů; a (4) představit příklady využití korpusu v politických vědách a digitálních humanitních vědách.</p>
+<p xml:lang="en"><ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a project that aims to (1) create a multilingual set of comparable corpora of parliamentary proceedings uniformly encoded according to the <ref target="https://clarin-eric.github.io/ParlaMint/">ParlaMint encoding guidelines</ref>, covering the period from 2015 to mid-2022; (2) add linguistic annotations to the corpora and machine-translate them to English; (3) make the corpora available through concordancers; and (4) build use cases in Political Sciences and Digital Humanities based on the corpus data.</p>
</xsl:copy>
</xsl:template>

@@ -89,6 +89,17 @@
</xsl:if>
</xsl:template>

+<xsl:template match="//tei:fileDesc/tei:titleStmt/tei:meeting">
+<xsl:copy>
+<xsl:apply-templates select="@*"/>
+<xsl:if test="not(@corresp)">
+<xsl:attribute name="corresp">#parliament.PSP</xsl:attribute>
+<xsl:message>WARN: adding meeting/@corresp: #parliament.PSP</xsl:message>
+</xsl:if>
+<xsl:apply-templates/>
+</xsl:copy>
+</xsl:template>
+
<xsl:template match="//tei:fileDesc/tei:titleStmt/tei:funder[1]">
<xsl:element name="funder">
<orgName xml:lang="en">CLARIN research infrastructure</orgName>
@@ -97,6 +108,11 @@
<xsl:copy-of select="."/>
</xsl:template>

+<xsl:template match="//tei:setting/tei:name[@type='country']/text()">
+<xsl:text>Česká republika</xsl:text>
+</xsl:template>
+
+
<xsl:template match="//tei:fileDesc/tei:editionStmt/tei:edition">
<xsl:element name="edition">2.0</xsl:element>
</xsl:template>
1 change: 1 addition & 0 deletions src/run_parczech2parlamint.sh
@@ -211,6 +211,7 @@ create_parlaMint() {
--pf \
--delete "//_:tagUsage[@occurs='0']" \
"{}"
+find $OUT_DIR/ -type f -name "*.xml" | xargs -I {} sed -i '/^ *$/d' {}
}
create_parlaMint "$INPUT_RAW_DIR" "$OUTPUT_RAW_DIR" "$RENAME_LOG.raw" -t "" 0
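
The `find | xargs sed` line added to `create_parlaMint()` strips blank lines from every generated XML file. A minimal sketch of its effect follows, assuming GNU `sed` (BSD `sed` needs `-i ''`). The scratch directory and `sample.xml` are hypothetical stand-ins, and `OUT_DIR` is quoted here only so the sketch tolerates paths with spaces.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for $OUT_DIR from the script.
OUT_DIR=$(mktemp -d)

# Hypothetical sample file: one empty line and one line holding only spaces.
printf '<TEI>\n\n  <text>\n   \n  </text>\n</TEI>\n' > "$OUT_DIR/sample.xml"

# Same sed expression as in the commit: delete every line that is empty
# or consists solely of space characters, editing each file in place.
find "$OUT_DIR/" -type f -name "*.xml" | xargs -I {} sed -i '/^ *$/d' {}

# Show the result: only the four non-blank lines remain.
cat "$OUT_DIR/sample.xml"
```

The pattern `/^ *$/` matches spaces only, so a line containing just tabs would survive; `'/^[[:space:]]*$/d'` would cover that case if it ever matters.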