Multiformat support (#32)
jerCarre authored Jul 12, 2022
1 parent 891b563 commit f5117ff
Showing 8 changed files with 126 additions and 45 deletions.
30 changes: 24 additions & 6 deletions .github/workflows/tmp.yml
@@ -21,7 +21,7 @@ jobs:
with:
name: test_en.md
path: ${{steps.translate.outputs.generated_file}}
TranslateZH:
TranslateRstEN:
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -30,12 +30,30 @@ jobs:
id: translate
uses: ./
with:
input_file: "example/test_fr.md"
output_file: "example/test_zh.md"
output_lang: "ZH"
input_file: "example/test_fr.rst"
output_file: "example/EN-US/"
output_lang: "EN-US"
deepl_free_token: "${{ secrets.TOKEN }}"
- name: Publish
uses: actions/upload-artifact@v3
with:
name: test_en.rst
path: ${{steps.translate.outputs.generated_file}}
TranslateDocxEN:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Translate
id: translate
uses: ./
with:
input_file: "example/test_fr.docx"
output_file: "example/EN-US/"
output_lang: "EN-US"
deepl_free_token: "${{ secrets.TOKEN }}"
- name: Publish
uses: actions/upload-artifact@v3
with:
name: test_zh.md
path: ${{steps.translate.outputs.generated_file}}
name: test_en.docx
path: ${{steps.translate.outputs.generated_file}}
10 changes: 7 additions & 3 deletions docs/EN-US/README.md
@@ -1,9 +1,13 @@
---
lang: EN-US
generator: pandoc
lang: fr
title: Deepl Free Action
viewport: width=device-width, initial-scale=1.0, user-scalable=yes
---

This github action allows to translate a document in a Github repo. It is based on the free version of the tool [DeepL](https://www.deepl.com)
# Deepl Free Action

This github action allows to translate a document in a Github repo. Supported file formats are: md, rst, docx, pptx, html, pdf or txt. It is based on the free version of the tool [DeepL](https://www.deepl.com)

> This documentation is initially written in [French](/FR/) and then automatically translated into [English](/EN-US/) and [Chinese](/ZH/).
@@ -19,7 +23,7 @@ You must first:

You must fill in the following parameters:

- `input_file` : the markdown file to translate.
- `input_file` : the file to translate.
- `output_file` : the destination file containing the translation. You can only specify a folder (must end with `/` ), in this case the name of the generated file will be the same as `input_file` .
- `output_lang` : the translation language (see [Deepl API](https://www.deepl.com/fr/docs-api/translating-documents/uploading/))
- `deepl_free_token` : your Deepl token
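
The folder-or-file rule for `output_file` can be sketched in shell as follows. This is a minimal illustration, assuming a helper named `resolve_output` (hypothetical, not part of the action); it applies the same trailing-slash test that `scripts/translate.sh` performs on `OUTPUT`.

```shell
#!/bin/bash
# Hypothetical helper (not in the action): mirrors the folder-or-file rule.
resolve_output() {
  local input=$1 output=$2
  if [[ "$output" == */ ]]; then
    # A folder was given: reuse the base name of the input file.
    printf '%s\n' "${output}${input##*/}"
  else
    # A file was given: use it unchanged.
    printf '%s\n' "$output"
  fi
}

resolve_output "example/test_fr.rst" "example/EN-US/"     # example/EN-US/test_fr.rst
resolve_output "example/test_fr.md" "example/test_zh.md"  # example/test_zh.md
```

With the values used by the `TranslateRstEN` job, the translated file would land at `example/EN-US/test_fr.rst`.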
4 changes: 2 additions & 2 deletions docs/FR/README.md
@@ -3,7 +3,7 @@ lang: fr
title: Deepl Free Action
---

Cette action github permet de traduire un document au sein d'un repo Github.
Cette action github permet de traduire un document au sein d'un repo Github. Les formats de fichier supportés sont : md, rst, docx, pptx, html, pdf ou txt.
Elle est basée sur la version gratuite de l'outil [DeepL](https://www.deepl.com)

> Cette documentation est initialement écrite en [français](/FR/) puis traduite automatiquement en [anglais](/EN-US/) et en [chinois](/ZH/).
@@ -20,7 +20,7 @@ Vous devez au préalable :

Vous devez renseigner les paramètres suivants :

* `input_file` : le fichier markdown à traduire.
* `input_file` : le fichier à traduire.
* `output_file` : le fichier destination contenant la traduction. Vous pouvez seulement indiquer un dossier (doit finir par `/`), dans ce cas le nom du fichier généré sera le même que `input_file`.
* `output_lang` : la langue de traduction (voir [Deepl API](https://www.deepl.com/fr/docs-api/translating-documents/uploading/))
* `deepl_free_token` : votre token Deepl
14 changes: 9 additions & 5 deletions docs/ZH/README.md
@@ -1,11 +1,15 @@
---
lang: ZH
title: Deepl Free Action
generator: pandoc
lang: fr
title: 自由行动
viewport: width=device-width, initial-scale=1.0, user-scalable=yes
---

这个github动作允许你翻译一个文件在一个 Github。它是基于免费版本的工具[DeepL](https://www.deepl.com)
# 自由行动

> 该文件最初以[法语](/FR/)编写,然后自动翻译成[英文](/EN-US/)[中文](/ZH/)
这个github动作允许你翻译一个文件在一个 repo。支持的文件格式有:MD、RST、DOCX、PPTX。 html、pdf或txt。它是基于免费版本的工具[DeepL](https://www.deepl.com)

> 该文件最初是用[法语](/FR/)编写的,然后自动翻译成[英语](/EN-US/)[中文](/ZH/)
## 先决条件

@@ -19,7 +23,7 @@ title: Deepl Free Action

你必须填写以下参数。

- `input_file`要翻译的markdown文件
- `input_file`要翻译的文件
- `output_file` : 目的地文件,包含 翻译。你只能指定一个文件夹(必须以 `/` ),在这种情况下,生成的文件的名称将与 `input_file`
- `output_lang` : 翻译语言(见[Deepl API](https://www.deepl.com/fr/docs-api/translating-documents/uploading/))
- `deepl_free_token` : 你的Deepl令牌
Binary file added example/test_fr.docx
Binary file not shown.
3 changes: 3 additions & 0 deletions example/test_fr.md
@@ -7,3 +7,6 @@ lang: fr
[GitHub](https://www.github.com) est un service web d'hébergement et de gestion de développement de logiciels, utilisant le logiciel de gestion de versions Git. Ce site est développé en Ruby on Rails et Erlang par *Chris Wanstrath, PJ Hyett et Tom Preston-Werner*.
GitHub propose des comptes professionnels payants, ainsi que des comptes gratuits pour les projets de logiciels libres. Le site assure également un contrôle d'accès et des fonctionnalités destinées à la collaboration comme le suivi des bugs, les demandes de fonctionnalités, la gestion de tâches et un wiki pour chaque projet.

```shell
echo "Bonjour Github"
```
9 changes: 9 additions & 0 deletions example/test_fr.rst
@@ -0,0 +1,9 @@
# GitHub

[GitHub] est un service web d'hébergement et de gestion de développement de logiciels, utilisant le logiciel de gestion de versions Git. Ce site est développé en Ruby on Rails et Erlang par *Chris Wanstrath, PJ Hyett et Tom Preston-Werner*. GitHub propose des comptes professionnels payants, ainsi que des comptes gratuits pour les projets de logiciels libres. Le site assure également un contrôle d'accès et des fonctionnalités destinées à la collaboration comme le suivi des bugs, les demandes de fonctionnalités, la gestion de tâches et un wiki pour chaque projet.

``` shell
echo "Bonjour Github"
```

[GitHub]: https://www.github.com
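
The shell block in these example files exercises the step that keeps code out of translation. As a minimal sketch, assuming a hypothetical filter named `protect_code` that reads HTML on standard input, the two `sed` substitutions used in `scripts/translate.sh` can be tried on their own:

```shell
#!/bin/bash
# Hypothetical filter (not in the action): tags code elements with
# translate="no", using the same two substitutions as scripts/translate.sh.
protect_code() {
  sed -e 's/<code>/<code translate="no">/g' \
      -e 's/<div class="sourceCode"/<div class="sourceCode" translate="no"/g'
}

printf '%s\n' '<p>Run <code>echo "Bonjour Github"</code></p>' | protect_code
# <p>Run <code translate="no">echo "Bonjour Github"</code></p>
```

The sketch filters a stream instead of editing a file in place, which keeps it independent of the temporary files the real script creates.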
101 changes: 72 additions & 29 deletions scripts/translate.sh
@@ -1,10 +1,13 @@
#!/bin/bash
set -e
# set -xe

OUTPUT=""
INPUT=""
TARGET_LANG=""

declare -A ConversionExtensionArray=( [md]=html [markdown]=html [rst]=html [html]=html [docx]=docx [pptx]=pptx [pdf]=pdf [txt]=txt )

# gen UUID
UUID=$(cat /proc/sys/kernel/random/uuid)

@@ -16,6 +19,57 @@ display_usage() {
echo " <source_file>: the file to translate"
}

convert() {
input_file=$1
output_file=$2

meta_out_option=""
if [ "$#" -eq 3 ]; then
meta_out_option="--metadata-file=$3"
fi

input_extension=${input_file##*.}
output_extension=${output_file##*.}

if [ "$input_extension" != "$output_extension" ]; then

# before conversion actions
pandoc_output_options=""
case ${output_extension,,} in
md|markdown)
pandoc_output_options="-s --wrap=none -t markdown-header_attributes --markdown-headings=atx "${meta_out_option}
;;
rst)
pandoc_output_options="-s --wrap=none "${meta_out_option}
;;
*)
pandoc_output_options="-s "${meta_out_option}
;;
esac

# conversion
pandoc $pandoc_output_options ${input_file} -o ${output_file}

# after conversion actions
case ${output_extension,,} in
md|markdown)
sed -i '/^:::/d' ${output_file}
sed -i 's/^``` {.sourceCode .\([a-z]*\).*}/``` \1/g' ${output_file}
sed -i 's/{translate="no"}/ /g' ${output_file}
;;
html|htm)
sed -i 's/<code>/<code translate="no">/ g' ${output_file}
sed -i 's/<div class="sourceCode"/<div class="sourceCode" translate="no"/g' ${output_file}
;;
*)
;;
esac

else
cp $input_file $output_file > /dev/null
fi
}

# check args : input, target_lang, output
if (($# == 5)); then
while [[ $# -gt 0 ]]; do
@@ -45,6 +99,14 @@ else
exit 1
fi

INPUT_EXTENSION=${INPUT##*.}

# check input extension support
if [ ! -v "ConversionExtensionArray[$INPUT_EXTENSION]" ]; then
echo "file extension not supported"
exit 1
fi

# check output is folder or file
[[ "${OUTPUT}" == */ ]] && OUTPUT="${OUTPUT}${INPUT##*/}" || OUTPUT="${OUTPUT}"

@@ -64,18 +126,18 @@ fi
/extractmeta.sh $INPUT -o /tmp/${UUID}.meta.json
SOURCE_LANG=$(cat "/tmp/${UUID}.meta.json" | jq -r 'with_entries(.key |= ascii_downcase ).lang')

# transform input to HTML
pandoc -t html $INPUT -o /tmp/${UUID}.html
# edit original meta to insert/update target lang
jq .lang='"'${TARGET_LANG}'"' /tmp/${UUID}.meta.json > /tmp/${UUID}.meta_out.json

# skip code blocks from translation
sed -i 's/<code>/<code translate="no">/ g' /tmp/${UUID}.html
sed -i 's/<div class="sourceCode"/<div class="sourceCode" translate="no"/g' /tmp/${UUID}.html
# transform input to deepl available format
CONVERSION_EXTENSION=${ConversionExtensionArray[${INPUT_EXTENSION,,}]}
convert $INPUT /tmp/${UUID}.${CONVERSION_EXTENSION}

# ask for translation
if [ -z "$SOURCE_LANG" ]; then
curl -fsSL -X POST ${DEEPL_FREE_URL}/document -F "file=@/tmp/${UUID}.html" -F "auth_key=$DEEPL_FREE_AUTH_TOKEN" -F "target_lang=${TARGET_LANG}" -o /tmp/${UUID}.response.json
if [ "${SOURCE_LANG^^}" == "NULL" ]; then
curl --silent -fSL -X POST ${DEEPL_FREE_URL}/document -F "file=@/tmp/${UUID}.${CONVERSION_EXTENSION}" -F "auth_key=$DEEPL_FREE_AUTH_TOKEN" -F "target_lang=${TARGET_LANG}" -o /tmp/${UUID}.response.json
else
curl -fsSL -X POST ${DEEPL_FREE_URL}/document -F "file=@/tmp/${UUID}.html" -F "auth_key=$DEEPL_FREE_AUTH_TOKEN" -F "target_lang=${TARGET_LANG}" -F "source_lang=${SOURCE_LANG^^}" -o /tmp/${UUID}.response.json
curl --silent -fSL -X POST ${DEEPL_FREE_URL}/document -F "file=@/tmp/${UUID}.${CONVERSION_EXTENSION}" -F "auth_key=$DEEPL_FREE_AUTH_TOKEN" -F "target_lang=${TARGET_LANG}" -F "source_lang=${SOURCE_LANG^^}" -o /tmp/${UUID}.response.json
fi

DOC_ID=$(cat /tmp/${UUID}.response.json | jq -r '.document_id')
@@ -100,31 +162,12 @@ do
done

# get translated document
curl -fsSL ${DEEPL_FREE_URL}/document/$DOC_ID/result -d auth_key=$DEEPL_FREE_AUTH_TOKEN -d document_key=$DOC_KEY -o /tmp/${UUID}.result.html
curl --silent -fSL ${DEEPL_FREE_URL}/document/$DOC_ID/result -d auth_key=$DEEPL_FREE_AUTH_TOKEN -d document_key=$DOC_KEY -o /tmp/${UUID}.result.${CONVERSION_EXTENSION}

# convert to output
OUTPUT_EXTENSION=${OUTPUT##*.}

# edit original meta to insert/update target lang
jq .lang='"'${TARGET_LANG}'"' /tmp/${UUID}.meta.json > /tmp/${UUID}.meta_out.json

# define pandoc options
PANDOC_OUTPUT_OPTIONS="-s --metadata-file=/tmp/${UUID}.meta_out.json --wrap=none"

if [ "${OUTPUT_EXTENSION^^}" = "MD" ]; then

# add extra options
PANDOC_OUTPUT_OPTIONS="${PANDOC_OUTPUT_OPTIONS} -t markdown-header_attributes --markdown-headings=atx"

pandoc $PANDOC_OUTPUT_OPTIONS /tmp/${UUID}.result.html -o /tmp/${UUID}.ouput.$OUTPUT_EXTENSION

# clean output markdown : remove ::: , modify code block header
sed -i '/^:::/d' /tmp/${UUID}.ouput.$OUTPUT_EXTENSION
sed -i 's/^``` {.sourceCode .\([a-z]*\).*}/``` \1/g' /tmp/${UUID}.ouput.$OUTPUT_EXTENSION
sed -i 's/{translate="no"}/ /g' /tmp/${UUID}.ouput.$OUTPUT_EXTENSION
else
pandoc $PANDOC_OUTPUT_OPTIONS /tmp/${UUID}.result.html -o /tmp/${UUID}.ouput.$OUTPUT_EXTENSION
fi
convert /tmp/${UUID}.result.${CONVERSION_EXTENSION} /tmp/${UUID}.ouput.$OUTPUT_EXTENSION /tmp/${UUID}.meta_out.json

# publish output file
mkdir -p ${OUTPUT%/*} > /dev/null
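
The extension handling introduced by this commit can be tried in isolation. The sketch below reuses the `ConversionExtensionArray` mapping from the script and wraps the support check in a hypothetical helper named `upload_format` (not part of the action). Unlike the script, it lowercases the extension before both the check and the lookup; that is a choice made for this sketch. It assumes a bash recent enough to support `-v` on array elements (4.3 or later, as far as I know).

```shell
#!/bin/bash
# Same mapping as scripts/translate.sh: input extension -> format uploaded to DeepL.
declare -A ConversionExtensionArray=( [md]=html [markdown]=html [rst]=html [html]=html [docx]=docx [pptx]=pptx [pdf]=pdf [txt]=txt )

# Hypothetical helper: print the upload format, or fail if the extension is unsupported.
upload_format() {
  local file=$1
  local ext=${file##*.}
  ext=${ext,,}  # normalize case before the check and the lookup
  if [[ ! -v "ConversionExtensionArray[$ext]" ]]; then
    echo "file extension not supported" >&2
    return 1
  fi
  printf '%s\n' "${ConversionExtensionArray[$ext]}"
}

upload_format "example/test_fr.rst"   # html
upload_format "example/test_fr.DOCX"  # docx
upload_format "notes.odt" || echo "rejected"
```

Text-like formats (`md`, `rst`) map to `html`, so they go through a pandoc conversion before upload, while `docx`, `pptx`, `pdf` and `txt` map to themselves and are copied as-is by `convert`.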