Sample data should be pushed to the data branch of the ParlaMint repository, directly into the parliament folder (Data/ParlaMint-XX), in a flat structure of files.
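Before committing, it can help to confirm that the parliament folder really is flat. The following is a minimal sketch, assuming a `find` that supports `-mindepth` (GNU and BSD do); the function name `nested_xml_files` is hypothetical and not part of the ParlaMint tooling.

```shell
# Hypothetical helper: list XML files that sit in subfolders of a parliament
# folder instead of directly inside it. Prints nothing when the layout is flat.
nested_xml_files() {
    find "$1" -mindepth 2 -type f -name '*.xml'
}
```

For example, `nested_xml_files Data/ParlaMint-XX` should print nothing for a flat sample. Note that files in a working folder such as Data/ParlaMint-XX/add-common-content/ would also be listed.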
- Create a GitHub account if you don't have one.
- Fork the ParlaMint repository into your organization or private account.
- Open a terminal, navigate to the folder where you want the local clone of the ParlaMint repository to be placed, and clone your fork:
# replace <USER-ORG> with your GitHub user or organization name
git clone [email protected]:<USER-ORG>/ParlaMint.git
- Set the data branch in your repository to be synchronized with the data branch in the ParlaMint repository:
cd ParlaMint
git remote add upstream https://github.com/clarin-eric/ParlaMint.git
git fetch upstream
git checkout -b data upstream/data
git push -u origin data
- Check that you are on the data branch:
git status
# switch to the data branch:
git checkout data
- Update your local git repository from your remote repository:
git pull
- Add new data to your local git repository:
# replace XX with your country code
git add Data/ParlaMint-XX/*.xml
git commit -m 'XX' Data/ParlaMint-XX/ParlaMint-XX*.xml
- Add common content (tagUsages, word extents, version):
- edit files and save in the Data/ParlaMint-XX/add-common-content/ParlaMint-XX/ folder:
make add-common-content-XX
- check if modified files are ok
- replace the Data/ParlaMint-XX/*.xml files with the Data/ParlaMint-XX/add-common-content/ParlaMint-XX/ content
- commit changes:
git commit -m 'XX add common content' Data/ParlaMint-XX/ParlaMint-XX*.xml
- Push data to your fork:
git push
- Update your repository with new content from the ParlaMint repository:
- create a pull request (replace USER-ORG with your GitHub user or organization name): https://github.com/USER-ORG/ParlaMint/compare/data...clarin-eric:data
- check changes
- merge pull request
- Update the ParlaMint repository with the data in your repository:
- create a pull request: https://github.com/clarin-eric/ParlaMint/compare/data...USER-ORG:data
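The two compare URLs above differ only in where the user or organization name goes. As a small sketch, the hypothetical helpers below (their names are not part of the repository) build both URLs from that name, following the URL patterns given in the steps:

```shell
# Hypothetical helper: URL for a pull request that brings clarin-eric:data
# into the data branch of your fork.
pr_url_sync_fork() {
    printf 'https://github.com/%s/ParlaMint/compare/data...clarin-eric:data\n' "$1"
}

# Hypothetical helper: URL for a pull request that offers the data branch
# of your fork to clarin-eric/ParlaMint.
pr_url_to_upstream() {
    printf 'https://github.com/clarin-eric/ParlaMint/compare/data...%s:data\n' "$1"
}
```

Calling `pr_url_to_upstream my-org` prints the second URL with `my-org` filled in, which can then be opened in a browser.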
You can check whether all prerequisites are installed with the command:
make check-prereq
If all checks succeed, the output is:
Saxon: OK
Jing: OK
UD tools: OK
INFO: Maximum java heap size (saxon needs 5-times more than the size of processed xml file)
1.80469 GB
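The INFO line gives a rule of thumb: Saxon needs about five times the size of the processed XML file as heap. The sketch below turns that rule into a rough per-file estimate; the function name is hypothetical, and the factor of 5 is taken from the message above rather than measured.

```shell
# Hypothetical helper: estimate the Java heap (in bytes) Saxon may need for
# one XML file, using the "5 times the file size" rule of thumb.
saxon_heap_estimate_bytes() {
    echo $(( $(wc -c < "$1") * 5 ))
}
```

Comparing the estimate for your largest file with the reported maximum heap size shows whether the default heap is likely to be sufficient.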
Saxon is expected to be at this location in your system: /usr/share/java/saxon.jar
You need superuser privileges to do this.
# download saxon file into /opt folder
sudo wget 'https://search.maven.org/remotecontent?filepath=net/sf/saxon/Saxon-HE/10.6/Saxon-HE-10.6.jar' -O /opt/saxon.jar
# create a symbolic link to the correct location
sudo ln -s /opt/saxon.jar /usr/share/java/saxon.jar
Important note: the Jing archive below also contains Saxon, but that version of Saxon does not support all the features that are needed.
Jing is expected to be at this location in your system: /usr/share/java/jing.jar
You need superuser privileges to do this.
# download jing into tmp folder
wget https://github.com/relaxng/jing-trang/releases/download/V20181222/jing-20181222.zip -O /tmp/jing-20181222.zip
# extract jing into /opt (the pattern is quoted so that unzip, not the shell, expands it)
sudo unzip /tmp/jing-20181222.zip 'jing-20181222/bin/*' -d /opt
# create a symbolic link to the correct location
sudo ln -s /opt/jing-20181222/bin/jing.jar /usr/share/java/jing.jar
rm /tmp/jing-20181222.zip
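After both installations, a quick look at the two expected locations can confirm that the symbolic links resolve. This is a minimal sketch with a hypothetical helper name; it is not the check that make check-prereq performs, only a file-existence test.

```shell
# Hypothetical helper, not part of the ParlaMint Makefile: report whether a
# jar file exists at the given path. The -f test follows symbolic links, so a
# dangling link is reported as missing.
check_jar() {
    if [ -f "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
    fi
}

check_jar /usr/share/java/saxon.jar
check_jar /usr/share/java/jing.jar
```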
- Change directory to the Scripts folder:
cd Scripts
- Clone UD tools repository:
git clone https://github.com/UniversalDependencies/tools.git
- Install Python regex library:
pip3 install --user regex
Running make help in the repository root folder prints a list of make targets with descriptions.