Unable to fine-tune in linux #103

Open
Sunlightshadow opened this issue Aug 24, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Sunlightshadow

Sunlightshadow commented Aug 24, 2024

Hi, and first of all, I am grateful to you for publishing a Linux version of Opus-Cat ❤️.
I am using version 1.3.0.0 of Opus-Cat on Fedora 40.
I can download and use models with Opus-Cat. However, I am not able to do the fine tuning.
According to the log, there seems to be a wrong specification for the path of Marian. It tries to access /home/user/marian-dev/src/common/cli_wrapper.cpp:208, but the location does not exist.
In addition, the app cannot access a specific library although I have set the permissions.

Here is the log:

2024-08-24 21:54:58.254 +02:00 [INF] Opening OPUS-CAT MT Engine window
2024-08-24 21:54:58.313 +02:00 [INF] Starting OPUS-CAT MT Engine
2024-08-24 21:54:58.517 +02:00 [INF] Started HTTP API at http://+:8500. This API can be accessed from remote computers, if the firewall has been configured to allow it.
2024-08-24 21:55:20.888 +02:00 [INF] Fine-tuning a new model with model tag xx_ from base model opus-2021-02-22.
2024-08-24 21:56:03.039 +02:00 [INF] Starting batch translator for model eng-deu_opus-2021-02-22.
2024-08-24 21:56:03.044 +02:00 [INF] [2024-08-24 21:56:03] Error: Cannot convert values for the option: log
2024-08-24 21:56:03.044 +02:00 [INF] [2024-08-24 21:56:03] Error: Aborted from void marian::cli::CLIWrapper::updateConfig(const YAML::Node&, marian::cli::OptionPriority, const string&) in /home/user/marian-dev/src/common/cli_wrapper.cpp:208
2024-08-24 21:56:03.044 +02:00 [INF]
2024-08-24 21:56:03.044 +02:00 [INF] [CALL STACK]
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885e7a71d] + 0x20d71d
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885eb1dcf] + 0x244dcf
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885e98204] + 0x22b204
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885d76561] + 0x109561
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885d4c215] + 0xdf215
2024-08-24 21:56:03.044 +02:00 [INF] [0x7f19c5239088] + 0x2a088
2024-08-24 21:56:03.044 +02:00 [INF] [0x7f19c523914b] __libc_start_main + 0x8b
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885d6f805] + 0x102805
2024-08-24 21:56:03.044 +02:00 [INF]
2024-08-24 21:56:03.105 +02:00 [INF] Batch translation process for model eng-deu_opus-2021-02-22 exited. Processing output.
2024-08-24 21:56:03.106 +02:00 [INF] python3-linux-3.8.13-x86_64/bin/python3: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory

I am happy to help you test the program. Thank you very much 🙏!

Edit:

I was able to fix the Python library error. In OpusCatMtEngine.sh, you have to change
LD_LIBRARY_PATH=$LD_LIBRARY_PATH./python3-linux-3.8.13-x86_64/lib/
to
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./python3-linux-3.8.13-x86_64/lib/
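
The missing colon matters because LD_LIBRARY_PATH is a colon-separated list: without the separator, the new directory is glued onto the last existing entry and neither path is searched. Here is a minimal Python sketch of the effect, assuming the variable already holds a value. The `/usr/local/lib` value and the helper name `append_to_path_list` are hypothetical and only serve the illustration.

```python
SEP = ":"  # separator used by LD_LIBRARY_PATH on Linux


def append_to_path_list(current, new_dir):
    """Append new_dir to a colon-separated path list such as LD_LIBRARY_PATH."""
    # Drop empty entries so an unset or empty variable leaves no stray separator
    entries = [entry for entry in current.split(SEP) if entry]
    entries.append(new_dir)
    return SEP.join(entries)


lib_dir = "./python3-linux-3.8.13-x86_64/lib/"
existing = "/usr/local/lib"  # hypothetical value already present in the variable

# Plain concatenation, as in the original script line: both paths fuse into one
broken = existing + lib_dir
# Concatenation with a separator, as in the corrected script line
fixed = append_to_path_list(existing, lib_dir)

print(broken)
print(fixed)
```

The helper also covers the case where the variable is empty, in which the result is just the new directory.
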
But the Marian-related error still persists:

2024-08-25 05:08:03.198 +02:00 [INF] Opening OPUS-CAT MT Engine window
2024-08-25 05:08:03.252 +02:00 [INF] Starting OPUS-CAT MT Engine
2024-08-25 05:08:03.452 +02:00 [INF] Started HTTP API at http://+:8500. This API can be accessed from remote computers, if the firewall has been configured to allow it.
2024-08-25 05:08:43.864 +02:00 [INF] Fine-tuning a new model with model tag xx from base model opus-2021-02-22.
2024-08-25 05:09:25.590 +02:00 [INF] Starting batch translator for model eng-deu_opus-2021-02-22.
2024-08-25 05:09:25.602 +02:00 [INF] [2024-08-25 05:09:25] Error: Cannot convert values for the option: log
2024-08-25 05:09:25.602 +02:00 [INF] [2024-08-25 05:09:25] Error: Aborted from void marian::cli::CLIWrapper::updateConfig(const YAML::Node&, marian::cli::OptionPriority, const string&) in /home/user/marian-dev/src/common/cli_wrapper.cpp:208
2024-08-25 05:09:25.602 +02:00 [INF]
2024-08-25 05:09:25.602 +02:00 [INF] [CALL STACK]
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e7924371d] + 0x20d71d
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e7927adcf] + 0x244dcf
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e79261204] + 0x22b204
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e7913f561] + 0x109561
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e79115215] + 0xdf215
2024-08-25 05:09:25.602 +02:00 [INF] [0x7f7a22727088] + 0x2a088
2024-08-25 05:09:25.602 +02:00 [INF] [0x7f7a2272714b] __libc_start_main + 0x8b
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e79138805] + 0x102805
2024-08-25 05:09:25.602 +02:00 [INF]
2024-08-25 05:09:25.670 +02:00 [INF] Batch translation process for model eng-deu_opus-2021-02-22 exited. Processing output.
2024-08-25 05:09:25.783 +02:00 [INF] Traceback (most recent call last):
2024-08-25 05:09:25.783 +02:00 [INF] File "./Marian/validate.py", line 54, in <module>
2024-08-25 05:09:25.783 +02:00 [INF] system_ood_sents, system_indomain_sents = extract_lines_and_split(system_output_path,system_seg_method)
2024-08-25 05:09:25.783 +02:00 [INF] File "./Marian/validate.py", line 32, in extract_lines_and_split
2024-08-25 05:09:25.783 +02:00 [INF] with open(sent_file_path,'rt', encoding='utf-8') as sent_file:
2024-08-25 05:09:25.783 +02:00 [INF] FileNotFoundError: [Errno 2] No such file or directory: '/home/xxx/Programme/OpusCatMTEngine_v1.3.0_linux-x64/opuscat/models/eng-deu/opus-2021-02-22_xx/valid.0.txt'

Edit 2:
I am able to fine-tune a model when using the "Test!Do not use" version from the releases page, together with my change to OpusCatMtEngine.sh from above! :)

@TommiNieminen
Collaborator

TommiNieminen commented Aug 26, 2024

Hi, thanks for your report and testing.

I think the main error from which everything else followed was probably this: Cannot convert values for the option: log

In the directory of the model that is being fine-tuned (you can access it with the Open model button), there is a file called batch.yml that contains the config for batch translation with Marian (edit: the batch.yml file is in the base model directory). That file has a value called log, which should contain the path to the log file that is written during batch translation. That value is corrupt for some reason. Does your user name contain spaces, by any chance? That's a common cause of path problems.
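
One way to check the log value without opening the file by hand is a small script like the one below. It is only a sketch under assumptions: it treats batch.yml as a flat file with one `key: value` pair per unindented line and does not use a real YAML parser, and the function names are made up for this example. It reports whether the value is present and whether its parent directory exists; it cannot tell what exactly Marian considers corrupt.

```python
import sys
from pathlib import Path


def read_top_level_value(yaml_path, key):
    """Return the raw string value of an unindented `key: value` line, or None."""
    for line in Path(yaml_path).read_text(encoding="utf-8").splitlines():
        # Matching on "key:" keeps `log` distinct from keys such as `log-level`
        if line.startswith(key + ":"):
            return line[len(key) + 1:].strip().strip("'\"")
    return None


def describe_log_setting(yaml_path):
    """Summarise the state of the `log` value in a batch.yml-style file."""
    value = read_top_level_value(yaml_path, "log")
    if not value:
        return "log is missing or empty"
    parent = Path(value).expanduser().parent
    if not parent.is_dir():
        return f"log points into a missing directory: {value}"
    return f"log looks usable: {value}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_log.py /path/to/base-model/batch.yml
    print(describe_log_setting(sys.argv[1]))
```
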

If you got it working with the test version, I must have fixed the log problem at some point, but the library problem might still occur. I'll release a new version shortly, with some bug fixes, that might solve the problem.

@TommiNieminen
Collaborator

I've now released a new version of the cross-platform MT engine: https://github.com/Helsinki-NLP/OPUS-CAT/releases/tag/engine_v1.3.1beta

I'd be interested in knowing if you encounter any of the above problems with this version. I've tested that the Linux version works with both WSL in Windows (on two separate machines), and also on a fresh Ubuntu virtual machine. However, there still might be system-specific problems.

@Sunlightshadow
Author

Sunlightshadow commented Aug 27, 2024

Greetings :)

Does your user name contain spaces by any chance, that's a common reason for path problems?

I don't think so. It's suni so it should be no problem from there.
When I fine-tuned I had the following error once but was able to restart it. Here is the part from the model log:

[2024-08-25 09:32:51] Ep. 1 : Up. 400 : Sen. 12,297 : Cost 1.27919793 * 4,907 after 185,532 : Time 6.48s : 756.87 words/s
[2024-08-25 09:32:59] Translating validation set...
[2024-08-25 09:32:59] Error: Segmentation fault
[2024-08-25 09:32:59] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t*, void*)> in /home/user/marian-dev/src/common/logging.cpp:134
[2024-08-25 09:33:28] [marian] Marian v1.9.56 2be8344f 2023-12-19 18:39:32 +0000
[2024-08-25 09:33:28] [marian] Running on suni-pc as process 160797 with command line:
[2024-08-25 09:33:28] [marian] Marian/marian --config /home/suni/.local/share/opuscat/models/eng-deu/opus-2021-02-22_eso/customize.yml --log-level=info
[2024-08-25 09:33:28] [config] after: 0e
[2024-08-25 09:33:28] [config] after-batches: 0
[2024-08-25 09:33:28] [config] after-epochs: 1
[2024-08-25 09:33:28] [config] all-caps-every: 0

I also found some issues with the GUI. When I press the "Open model directory" button, or any other button that should open my file browser, the program crashes because it expects nautilus, the GNOME file manager. I don't have it installed because I use KDE Plasma, whose file manager is Dolphin.

Unhandled exception. System.ComponentModel.Win32Exception (2): An error occurred trying to start process 'nautilus' with working directory '/home/suni/Programme/Opus-CatTest'. No such file or directory
at System.Diagnostics.Process.ForkAndExecProcess(ProcessStartInfo startInfo, String resolvedFilename, String[] argv, String[] envp, String cwd, Boolean setCredentials, UInt32 userId, UInt32 groupId, UInt32[] groups, Int32& stdinFd, Int32& stdoutFd, Int32& stderrFd, Boolean usesTerminal, Boolean throwOnNoExec)
at System.Diagnostics.Process.StartCore(ProcessStartInfo startInfo)
at System.Diagnostics.Process.Start(ProcessStartInfo startInfo)
at OpusCatMtEngine.LocalModelListView.btnOpenModelDir_Click(Object sender, RoutedEventArgs se) in D:\Users\niemi\source\repos\OPUS-CAT\AvaloniaApplication1\UI\LocalModelListView.axaml.cs:line 39
at Avalonia.Interactivity.EventRoute.RaiseEventImpl(RoutedEventArgs e)
at Avalonia.Interactivity.Interactive.RaiseEvent(RoutedEventArgs e)
at Avalonia.Controls.Button.OnClick()
at Avalonia.Controls.Button.OnPointerReleased(PointerReleasedEventArgs e)
at Avalonia.Reactive.LightweightObservableBase`1.PublishNext(T value)
at Avalonia.Interactivity.EventRoute.RaiseEventImpl(RoutedEventArgs e)
at Avalonia.Interactivity.Interactive.RaiseEvent(RoutedEventArgs e)
at Avalonia.Input.MouseDevice.MouseUp(IMouseDevice device, UInt64 timestamp, IInputRoot root, Point p, PointerPointProperties props, KeyModifiers inputModifiers, IInputElement hitTest)
at Avalonia.Input.MouseDevice.ProcessRawEvent(RawPointerEventArgs e)
at Avalonia.Threading.Dispatcher.Send(SendOrPostCallback action, Object arg, Nullable`1 priority)
at Avalonia.Controls.TopLevel.HandleInput(RawInputEventArgs e)
at Avalonia.ManualRawEventGrouperDispatchQueue.DispatchNext()
at Avalonia.X11.X11PlatformThreading.RunLoop(CancellationToken cancellationToken)
at Avalonia.Threading.DispatcherFrame.Run(IControlledDispatcherImpl impl)
at Avalonia.Threading.Dispatcher.PushFrame(DispatcherFrame frame)
at Avalonia.Threading.Dispatcher.MainLoop(CancellationToken cancellationToken)
at Avalonia.Controls.ApplicationLifetimes.ClassicDesktopStyleApplicationLifetime.Start(String[] args)
at Avalonia.ClassicDesktopStyleApplicationLifetimeExtensions.StartWithClassicDesktopLifetime(AppBuilder builder, String[] args, Action`1 lifetimeBuilder)
at OpusCatMtEngine.Program.Main(String[] args) in D:\Users\niemi\source\repos\OPUS-CAT\AvaloniaApplication1\Program.cs:line 12
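
The trace shows the process launch for 'nautilus' failing with an unhandled exception. A possible way around this is to look for an available launcher first and to treat a failed launch as a non-fatal result. The engine itself is a .NET application, so the Python below is only an illustration of the logic and not the project's code. The candidate list and the function names are assumptions made for this sketch.

```python
import shutil
import subprocess

# Candidate launchers, most generic first; this list is an assumption
CANDIDATES = ["xdg-open", "dolphin", "nautilus"]


def pick_opener(candidates=CANDIDATES, which=shutil.which):
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name):
            return name
    return None


def open_directory(path):
    """Open path in a file manager; return False instead of raising on failure."""
    opener = pick_opener()
    if opener is None:
        return False
    try:
        # Popen does not wait for the file manager to exit
        subprocess.Popen([opener, path])
    except OSError:
        # The launcher vanished or could not be started; report it, do not crash
        return False
    return True
```

Putting xdg-open first lets the desktop environment choose its own file manager, so a KDE Plasma session would get Dolphin without the application naming it.
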

I will download the new release, fine-tune again, and see what I find.

Edit:

Hi, the fine-tuning of the model went perfectly. This time I had no errors, and the machine translations of the model are also much better than those of the other test version. Good work. However, the problem with the file manager still exists. The text file in the settings is opened without any problems.

Thank you very much for your work! Let me know if you want me to test anything else for you on Linux. I'll let you know if there are any problems.

Edit 2:
I noticed some strange behaviour of the models when translating. If I give a model priority, I can use it once or twice, and then the checkmark disappears from its checkbox. After that, Opus-Cat only uses the other model, which I had downloaded. Now that I have deleted the downloaded model, the remaining model is accepted without any problems.

TommiNieminen added the bug label Nov 1, 2024