fixings
profile/html
.gitignore
ParlaMint-template.ana.xml
ParlaMint-template.xml
README.md
chars-summ.pl
chars.pl
check-links.xsl
classlisize.py
coaloppo-tsv2xml.xsl
copy.xsl
corpus2sample.xsl
dirify-parlamint.pl
finalize-parlamint.pl
fixchars.pl
join-verts.pl
list-affiliation-org-role-pairs.xsl
list-element-attribute.xsl
list-links.xsl
list-metadatatext.xsl
ministers-tei2tsv.xsl
ministers-tsv2tei.xsl
orientations-tsv2tei.xsl
pack-parlamint.pl
parlamint-add-common-content.xsl
parlamint-coaloppo.xsl
parlamint-factorize-teiHeader.xsl
parlamint-parties.xsl
parlamint-tei2text.xsl
parlamint-tei2vert.pl
parlamint-xml2vert.pl
parlamint2conllu.pl
parlamint2conllu.xsl
parlamint2final.xsl
parlamint2meta.xsl
parlamint2root.xsl
parlamint2tbl-data.xsl
parlamint2tbl-meta.xsl
parlamint2tbl-overview.xsl
parlamint2xmlvert.xsl
parlamintp-tei2text.pl
parlamintp-tei2vert.pl
parlamintp2conllu.pl
parties-tei2tsv.xsl
polish-xml.pl
validate-parlamint-particDesc.xsl
validate-parlamint.pl
validate-parlamint.xsl
validatedir-parlamint.pl
vert2chronotsv.pl

README.md

ParlaMint scripts

This directory contains scripts for validating ParlaMint corpora and converting them to other formats. Most scripts explain how to run them in comments at the start of the script. Usage examples are also given in the repository Makefile.

Validation

validate-parlamint.pl: Perl script that runs all the validation scripts below
validate-parlamint.xsl: checks for common encoding or metadata mistakes
check-links.xsl: checks that all IDs that are referred to actually exist
parlamint2root.xsl: not strictly validation (although the result can be used for it); builds the ParlaMint corpus root files ParlaMint.xml and ParlaMint.ana.xml from the root files of the individual corpora.
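
To illustrate the kind of check that check-links.xsl is described as performing, here is a minimal Python sketch. It is not the XSLT script itself: it only looks at local references (values starting with `#`) inside a single document, and the list of reference attributes (`who`, `ana`, `corresp`, `ref`) as well as the function name `find_dangling_refs` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_dangling_refs(xml_text, ref_attrs=("who", "ana", "corresp", "ref")):
    """Return sorted local references (#id) that match no xml:id in the document.

    ref_attrs is an assumed list of pointer attributes, chosen for illustration.
    """
    root = ET.fromstring(xml_text)
    defined = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    dangling = set()
    for el in root.iter():
        for attr in ref_attrs:
            # Pointer attributes may hold several whitespace-separated values.
            for token in el.get(attr, "").split():
                if token.startswith("#") and token[1:] not in defined:
                    dangling.add(token)
    return sorted(dangling)


# Hypothetical miniature document: #SpeakerB and #regular are never defined.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="SpeakerA"/>
  <u who="#SpeakerA" ana="#regular">Hello.</u>
  <u who="#SpeakerB">Hi.</u>
</TEI>"""

print(find_dangling_refs(SAMPLE))
```

A real corpus also contains references that point into other files or use prefixes, which this sketch deliberately ignores.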

Conversion

parlamint-tei2text.xsl: transforms a ParlaMint corpus component file to plain text
parlamint2conllu.pl: runs the parlamint2conllu XSLT script and then runs the UD validator on the resulting files. Note that this directory is assumed to contain the (gitignored) UD validator, which is installed with git clone [email protected]:UniversalDependencies/tools.git
parlamint2conllu.xsl: converts a linguistically annotated TEI corpus component to CoNLL-U format. It expects the TEI root corpus file as the value of the $meta parameter.
parlamint2xmlvert.xsl: converts a linguistically annotated TEI corpus component to vertical format for the CQP line of concordancers. It expects the TEI root corpus file as the value of the hdr parameter. Note that the produced file is still XML; to convert it to "proper" vertical format, use parlamint-xml2vert.pl.
corpus2sample.xsl: takes a root corpus file as input and writes a sample to the output directory, which is specified via the $outDir parameter. The script retains the first and last component files of the corpus, and the first and last $Range utterances in each of them.
classlisize.py: takes a 'plain text' ParlaMint TEI component file as input, runs the classla-stanfordnlp pipeline for linguistic processing on it, and outputs the linguistically annotated TEI file.
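
The sampling rule described for corpus2sample.xsl (first and last component file, then first and last $Range utterances in each) can be sketched in Python as follows. This is only an illustration of the selection logic, not the XSLT implementation: the function names, the dictionary representation of a corpus, and the choice to keep every item when the first and last ranges would overlap are all assumptions.

```python
def first_and_last(items, n):
    """Keep the first n and the last n items, in their original order.

    Assumption: if the two ranges would overlap, all items are kept once.
    """
    if n <= 0:
        return []
    if len(items) <= 2 * n:
        return list(items)
    return list(items[:n]) + list(items[-n:])


def sample_corpus(components, utterance_range):
    """Select a sample from a corpus.

    components: mapping of component file name -> list of utterances,
    with both the files and the utterances in corpus order (hypothetical
    stand-in for the TEI root file and its component files).
    """
    kept_files = first_and_last(list(components), 1)
    return {
        name: first_and_last(components[name], utterance_range)
        for name in kept_files
    }


# Hypothetical corpus with three component files.
corpus = {
    "comp-01.xml": ["u1", "u2", "u3", "u4", "u5"],
    "comp-02.xml": ["u6", "u7"],
    "comp-03.xml": ["u8", "u9", "u10"],
}

print(sample_corpus(corpus, 2))
```

With a range of 2, the middle file is dropped, the first file loses its middle utterance, and the last file is kept whole because it has fewer than four utterances.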
