1
00:00:07,312 --> 00:00:09,993
Hello.
2
00:00:09,994 --> 00:00:11,227
Where are you right now?
3
00:00:11,228 --> 00:00:13,360
Right now I am on my way
to South Dakota.
4
00:00:13,361 --> 00:00:16,093
Gonna do a little camping,
do a little fishing.
5
00:00:16,094 --> 00:00:17,426
Good for you, Colter.
But the result.srt has problems:
wrong order
empty lines are replaced with (dalam bahasa Inggris), i.e. "(in English)"
unknown text is appended to some lines
1
00:00:07,312 --> 00:00:09,993
Hei, apa yang kau lakukan?
(dalam bahasa Inggris) <-- this should be an empty line
2 (satu) <-- the '(satu)' should not exist
00:00:09,994 --> 00:00:11,227
Di mana kau sekarang?
(dalam bahasa Inggris) ....
3 Pemberantasan Korupsi <-- this also should not exist
00:00:11,228 --> 00:00:13,360
Saat ini aku sedang dalam perjalanan
ke Dakota Selatan.
(dalam bahasa Inggris) ...
4
00:00:13,361 --> 00:00:16,093
Akan pergi berkemah sedikit,
lakukan sedikit memancing.
(dalam bahasa Inggris) ...
5
00:00:16,094 --> 00:00:17,426
Bagus untukmu, Colter.
(dalam bahasa Inggris) ...
I had a similar need. The issue of course boils down to EasyTranslate requiring that every line in the input file be translatable.
The attached patch changes this: a line that contains only numbers and/or non-alphabetical characters is not translated; it is set aside and printed back out during the output phase. There may be a cleaner way, but it appears that whatever is added to the PyTorch Dataset structure has to be compatible with accelerator.prepare(), so as a workaround a collate_fn wrapper separates out any non-tokenized items.
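To illustrate the idea only (this is not the patch itself, and it leaves out the Dataset and collate_fn plumbing), here is a minimal sketch in plain Python. The helper names `split_lines` and `merge_lines` are hypothetical, and "translatable" is assumed to mean "contains at least one letter":

```python
import re

# Assumption: a line is worth translating only if it contains at least one letter.
# [^\W\d_] matches any Unicode letter (a word character that is not a digit or "_").
HAS_LETTER = re.compile(r"[^\W\d_]")


def split_lines(lines):
    """Separate lines to translate from lines that must pass through unchanged."""
    to_translate = []  # list of (original index, text)
    passthrough = {}   # original index -> untouched line
    for index, line in enumerate(lines):
        if HAS_LETTER.search(line):
            to_translate.append((index, line))
        else:
            passthrough[index] = line
    return to_translate, passthrough


def merge_lines(translated, passthrough, total):
    """Rebuild the file in its original order from both groups."""
    output = [None] * total
    for index, text in translated:
        output[index] = text
    for index, text in passthrough.items():
        output[index] = text
    return output


srt_lines = ["1", "00:00:07,312 --> 00:00:09,993", "Hello.", ""]
to_translate, passthrough = split_lines(srt_lines)
# Stand-in for the real model call: only "Hello." would reach the translator.
translated = [(index, "Halo.") for index, _ in to_translate]
result = merge_lines(translated, passthrough, len(srt_lines))
```

Because cue numbers, timestamps, and blank lines never reach the model, they cannot be reordered or "translated" into extra text.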
IMO, optimally the project could be reworked so that it is easier to call iteratively while parsing a file from a separate utility. As a smaller change, a parameter could be added to translate.py that specifies a regex selecting which lines to translate. Either way, I didn't have the motivation to attempt a cleaner solution, so I didn't open a PR, but I do use this patch to translate SRT files, so maybe it helps you.
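A rough sketch of what the regex-selection idea might look like follows. Nothing here exists in EasyTranslate: `translate_selected` and the `translate_batch` callable are hypothetical names, and the callable stands in for whatever actually runs the model.

```python
import re
from typing import Callable, List


def translate_selected(
    lines: List[str],
    select_pattern: str,
    translate_batch: Callable[[List[str]], List[str]],
) -> List[str]:
    """Translate only the lines matching select_pattern; copy the rest as-is."""
    selector = re.compile(select_pattern)
    picked = [i for i, line in enumerate(lines) if selector.search(line)]
    translated = translate_batch([lines[i] for i in picked])
    if len(translated) != len(picked):
        raise ValueError("translator returned a different number of lines")
    output = list(lines)  # start from a copy so unselected lines keep their place
    for i, text in zip(picked, translated):
        output[i] = text
    return output


def fake_translate(batch: List[str]) -> List[str]:
    # Placeholder translator for demonstration: it just upper-cases the text.
    return [text.upper() for text in batch]


cue = ["5", "00:00:16,094 --> 00:00:17,426", "Good for you, Colter.", ""]
out = translate_selected(cue, r"[A-Za-z]", fake_translate)
```

With the pattern `[A-Za-z]`, only the dialogue line is sent to the translator; the cue number, timestamp, and blank separator are copied through in their original positions.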