ChatGPT API SRT Subtitle Translator

ChatGPT has demonstrated its capability as a robust translator, handling not only common languages but also unconventional forms of writing such as emojis and word scrambling. However, it does not always produce deterministic output or adhere to line-to-line correlation, which can disrupt the timing of subtitles, even with precise instructions and the model temperature parameter set to 0.

This utility uses the OpenAI ChatGPT API to translate text, with a specific focus on line-based translation, especially for SRT subtitles. It optimizes token usage by stripping SRT overhead and grouping text into batches, enabling translations of arbitrary length without excessive token consumption while ensuring a one-to-one match between input and output lines.
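
As a rough illustration of that idea, here is a minimal sketch (not the project's actual implementation) of stripping SRT overhead down to numbered lines before translation and restoring it afterwards:

```javascript
// Minimal sketch: strip SRT overhead (indices and timestamps) before
// translation, then restore it around the translated lines.
function parseSrt(srt) {
  // Split into cues on blank lines; each cue: index, timing, text.
  return srt.trim().split(/\r?\n\r?\n/).map(block => {
    const [index, timing, ...text] = block.split(/\r?\n/);
    return { index, timing, text: text.join(' ') };
  });
}

function toNumberedLines(cues) {
  // Only the text reaches the model, prefixed with line numbers
  // so output lines can be matched back to their cues.
  return cues.map((c, i) => `${i + 1}. ${c.text}`).join('\n');
}

function restoreSrt(cues, translatedLines) {
  // Reattach the original SRT indices and timestamps to the translated text.
  return cues
    .map((c, i) => `${c.index}\n${c.timing}\n${translatedLines[i]}`)
    .join('\n\n');
}

const srt = `1
00:00:00,000 --> 00:00:02,000
γŠγ―γ‚ˆγ†γ”γ–γ„γΎγ™γ€‚

2
00:00:02,000 --> 00:00:05,000
γŠε…ƒζ°—γ§γ™γ‹οΌŸ`;

const cues = parseSrt(srt);
const prompt = toNumberedLines(cues);
const restored = restoreSrt(cues, ['Good morning.', 'How are you?']);
```

Because only the numbered text lines are sent to the model, the per-request token cost no longer includes indices and timestamps.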

Features

  • Web User Interface (Web UI) and Command Line Interface (CLI)
  • New: Structured output support for more concise results, available in the Web UI and in the CLI with --experimental-structured-mode.
  • New: Prompt caching support: the system instruction and translation context, including the full context of translated data, are packaged to work well with prompt caching. Enabled with --experimental-use-full-context (CLI only).
  • Line-based batching: avoids the per-request token limit, reduces overhead token wastage, and maintains translation context to a certain extent
  • Moderation pre-check with the free OpenAI Moderation endpoint: prevents token wastage when the model is highly likely to refuse to translate
  • Streaming process output
  • Request per minute (RPM) rate limits
  • Progress resumption (CLI Only)
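
The RPM rate limit amounts to spacing requests evenly across a minute. A minimal sketch of one way to do this (not the project's actual implementation):

```javascript
// Minimal sketch of a requests-per-minute limiter: space requests at
// least 60000 / rpm milliseconds apart.
function createRpmLimiter(rpm) {
  const intervalMs = 60000 / rpm;
  let nextAllowed = 0; // earliest timestamp the next request may fire
  return async function schedule() {
    const now = Date.now();
    const waitMs = Math.max(0, nextAllowed - now);
    nextAllowed = Math.max(now, nextAllowed) + intervalMs;
    if (waitMs > 0) await new Promise(resolve => setTimeout(resolve, waitMs));
    return waitMs; // how long this request was delayed
  };
}
```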

Setup

Reference: https://github.com/openai/openai-quickstart-node#setup

  • Node.js version >= 16.13.0 is required. This README assumes a bash shell environment
  • Clone this repository and navigate into the directory
    git clone https://github.com/Cerlancism/chatgpt-subtitle-translator && cd chatgpt-subtitle-translator
  • Install the requirements
    npm install
  • Give executable permission
    chmod +x cli/translator.mjs
  • Copy .env.example to .env
    cp .env.example .env
  • Add your API key to the newly created .env file

CLI

cli/translator.mjs --help

Usage: translator [options]

Translation tool based on ChatGPT API

Options:

  • --from <language>
    Source language (default: "")

  • --to <language>
    Target language (default: "English")

  • -i, --input <file>
    Input source text with the content of this file, in .srt format or plain text

  • -o, --output <file>
    Output file name, defaults to be based on input file name

  • -p, --plain-text <text>
    Input source text with this plain text argument

  • -s, --system-instruction <instruction>
    Override the prompt system instruction template Translate ${from} to ${to} with this plain text, ignoring --from and --to options

  • --initial-prompts <prompts>
    Initial prompts for the translation in JSON (default: "[]")

  • --no-use-moderator
    Don't use the OpenAI API Moderation endpoint

  • --moderation-model
    (default: "omni-moderation-latest") https://platform.openai.com/docs/models/moderation

  • --no-prefix-number
    Don't prefix lines with numerical indices

  • --no-line-matching
    Don't enforce one-to-one line quantity matching between input and output

  • -l, --history-prompt-length <length>
    Length of prompt history to retain for next request batch (default: 10)

  • -b, --batch-sizes <sizes>
    Batch sizes in increasing order for translation prompt slices, as a JSON array (default: "[10,100]")

    The number of lines to include in each translation prompt, provided they are estimated to fit within the token limit. In case of mismatched output line quantities, this number is decreased step by step according to the values in the array, ultimately reaching one.

    Larger batch sizes generally lead to more efficient token utilization and potentially better contextual translation. However, mismatched output line quantities or exceeding the token limit will cause token wastage, requiring resubmission of the batch with a smaller batch size.
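
    The step-down behaviour described above can be sketched as follows; this is a simplified illustration, and translateBatch is a hypothetical stand-in for the API call, which may return a mismatched number of lines:

```javascript
// Simplified sketch of batch size step-down: try the largest size first;
// on a line-count mismatch, retry the same lines with the next smaller
// size, ultimately falling back to one line per request.
async function translateWithFallback(lines, batchSizes, translateBatch) {
  const sizes = [...batchSizes].sort((a, b) => b - a); // e.g. [100, 10]
  const output = [];
  let i = 0;
  while (i < lines.length) {
    let translated = null;
    for (const size of [...sizes, 1]) {
      const batch = lines.slice(i, i + size);
      const result = await translateBatch(batch);
      if (result.length === batch.length) {
        translated = result;
        i += batch.length;
        break;
      }
      // Mismatched line count: the batch's tokens are wasted and it is
      // resubmitted with the next smaller size.
    }
    if (translated === null) throw new Error('Mismatch even line-by-line');
    output.push(...translated);
  }
  return output;
}
```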

  • --experimental-structured-mode [mode]
    Enable structured response (default: array; choices: array, object)

    • --experimental-structured-mode array Structures the input and output into a plain array format. This option is more concise than the base mode, though it uses slightly more tokens per batch.
    • --experimental-structured-mode object Structures both the input and output into a dynamically generated object schema based on input values. This option is even more concise and uses fewer tokens, but requires smaller batch sizes and can be slow and unreliable. Due to its unreliability, it may lead to more resubmission retries, potentially wasting more tokens in the process.
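
    A dynamically generated object schema of this kind might look like the following sketch (illustrative only, not the exact schema the tool emits):

```javascript
// Illustrative sketch: build a JSON Schema for structured output where
// each input line number becomes a required property holding its translation.
function buildLineSchema(lines) {
  const properties = {};
  lines.forEach((_, i) => {
    properties[String(i + 1)] = { type: 'string' };
  });
  return {
    type: 'object',
    properties,
    required: Object.keys(properties),
    additionalProperties: false,
  };
}

const schema = buildLineSchema(['γŠγ―γ‚ˆγ†γ”γ–γ„γΎγ™γ€‚', 'γŠε…ƒζ°—γ§γ™γ‹οΌŸ']);
// Every line number is a required key, so a conforming response cannot
// drop or merge lines -- at the cost of a stricter, slower generation.
```
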
  • --experimental-use-full-context
    Include the full context of translated data to work well with prompt caching.

    The translated lines, kept as user and assistant message pairs, are sliced as defined by --history-prompt-length (by default --history-prompt-length 10). It is recommended to set this to the largest batch size (by default --batch-sizes "[10,100]"): --history-prompt-length 100.

    Enabling this risks running into the model's context window limit (typically 128K tokens), but that should be sufficient for most cases.
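
    The packaging idea can be sketched as building an append-only message prefix, which is what makes prompt caching effective (a simplified illustration, not the project's actual implementation):

```javascript
// Simplified sketch: keep the system instruction and all prior
// (source, translation) exchanges as a stable, append-only prefix so
// the API can reuse its cached prompt across batches.
function buildMessages(systemInstruction, history, nextBatch) {
  return [
    { role: 'system', content: systemInstruction },
    // Each history entry is a previously translated exchange.
    ...history.flatMap(h => [
      { role: 'user', content: h.source },
      { role: 'assistant', content: h.translation },
    ]),
    // Only this final message changes between requests.
    { role: 'user', content: nextBatch },
  ];
}
```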

  • --log-level <level>
    Log level (default: debug, choices: trace, debug, info, warn, error, silent)

  • --silent
    Same as --log-level silent

  • --quiet
    Same as --log-level silent

Additional Options for GPT:

Examples

Plain text

cli/translator.mjs --plain-text "δ½ ε₯½"

Standard Output

Hello.

Emojis

cli/translator.mjs --stream --to "Emojis" --temperature 0 --plain-text "$(curl 'https://api.chucknorris.io/jokes/0ECUwLDTTYSaeFCq6YMa5A' | jq .value)"

Input Argument

Chuck Norris can walk with the animals, talk with the animals; grunt and squeak and squawk with the animals... and the animals, without fail, always say 'yessir Mr. Norris'.

Standard Output

πŸ‘¨β€πŸ¦°πŸ’ͺπŸšΆβ€β™‚οΈπŸ¦œπŸ’πŸ˜πŸ…πŸ†πŸŽπŸ–πŸ„πŸ‘πŸ¦πŸŠπŸ’πŸπŸΏοΈπŸ‡πŸΏοΈβ—οΈπŸŒ³πŸ’¬πŸ˜²πŸ‘‰πŸ€΅πŸ‘¨β€πŸ¦°πŸ‘Š=πŸ•πŸ‘πŸπŸ¦ŒπŸ˜πŸ¦πŸ¦πŸ¦§πŸ¦“πŸ…πŸ¦ŒπŸ¦ŒπŸ¦ŒπŸ†πŸ¦πŸ˜πŸ˜πŸ—πŸ¦“=πŸ‘πŸ€΅.

Scrambling

cli/translator.mjs --stream --system-instruction "Scramble characters of words while only keeping the start and end letter" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Chuck Norris can walk with the animals, talk with the animals;"

Standard Output

Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;

Unscrambling

cli/translator.mjs --stream --system-instruction "Unscramble characters back to English" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;"

Standard Output

Chuck Norris can walk with the animals, talk with the animals;

Plain text file

cli/translator.mjs --stream --temperature 0 --input test/data/test_cn.txt

Input file: test/data/test_cn.txt

δ½ ε₯½γ€‚
ζ‹œζ‹œοΌ

Standard Output

Hello.  
Goodbye!

SRT file

cli/translator.mjs --stream --temperature 0 --input test/data/test_ja_small.srt

Input file: test/data/test_ja_small.srt

1
00:00:00,000 --> 00:00:02,000
γŠγ―γ‚ˆγ†γ”γ–γ„γΎγ™γ€‚

2
00:00:02,000 --> 00:00:05,000
γŠε…ƒζ°—γ§γ™γ‹οΌŸ

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今ζ—₯γ―ε€©ζ°—γŒγ„γ„γ§γ™γ­γ€‚

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい倩気です。

Output file: test/data/test_ja_small.srt.out_English.srt

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.

How it works

Token Reductions

System Instruction (Tokens: 5):

Translate Japanese to English

Input (Tokens: 164):

1
00:00:00,000 --> 00:00:02,000
γŠγ―γ‚ˆγ†γ”γ–γ„γΎγ™γ€‚

2
00:00:02,000 --> 00:00:05,000
γŠε…ƒζ°—γ§γ™γ‹οΌŸ

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今ζ—₯γ―ε€©ζ°—γŒγ„γ„γ§γ™γ­γ€‚

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい倩気です。

Prompt (Tokens: 83):

1. γŠγ―γ‚ˆγ†γ”γ–γ„γΎγ™γ€‚
2. γŠε…ƒζ°—γ§γ™γ‹οΌŸ
3. はい、元気です。
4. 今ζ—₯γ―ε€©ζ°—γŒγ„γ„γ§γ™γ­γ€‚
5. はい、とてもいい倩気です。

Transform (Tokens: 46):

1. Good morning.
2. How are you?
3. Yes, I'm doing well.
4. The weather is nice today, isn't it?
5. Yes, it's very nice weather.

Output (Tokens: 130):

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.

Results

TODO: More analysis

5 SRT lines:
test/data/test_ja_small.srt

  • None (plain SRT text as input and output):
    Tokens: 299
  • No batching: SRT stripping, but one line per prompt with system instruction overhead, including up to 10 entries of historical prompt context:
    Tokens: 362
  • SRT stripping and line batching of 2:
    Tokens: 276

30 SRT lines:
test/data/test_ja.srt

  • None (plain SRT text as input and output):
    Tokens: 1625
  • No batching: SRT stripping, but one line per prompt with system instruction overhead, including up to 10 entries of historical prompt context:
    Tokens: 6719
  • SRT stripping and line batching of [5, 10], including up to 10 entries of historical prompt context:
    Tokens: 1036