Parallel Ribasim core tests fail non-deterministically on MacOS #825

Closed
SouthEndMusic opened this issue Nov 24, 2023 · 6 comments
Labels
test Relates to unit testing

Comments

@SouthEndMusic
Collaborator

SouthEndMusic commented Nov 24, 2023

Only this failure remains: #825 (comment)

Specifically @testitem "Allocation solve" and @testitem "Expand logic_mapping". They pass when running the tests sequentially (runtests(Ribasim) in runtests.jl) but sometimes fail when running the tests in parallel (runtests(Ribasim; nworkers = min(4, Sys.CPU_THREADS ÷ 2), nworker_threads = 2) in runtests.jl).

https://docs.juliahub.com/General/ReTestItems/stable/autodocs/#ReTestItems.runtests
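
To compare the two modes described above without editing runtests.jl each time, the keyword arguments can be chosen in one place. This is only a minimal sketch: the helper name `runtests_kwargs` and the environment variable in the comment are hypothetical, and only the `nworkers` / `nworker_threads` values come from the issue text.

```julia
# Hypothetical helper (not part of Ribasim or ReTestItems): returns the keyword
# arguments to splat into ReTestItems.runtests for the two modes in this issue.
function runtests_kwargs(; parallel::Bool)
    if parallel
        # Values quoted in the issue for the parallel run.
        return (; nworkers = min(4, Sys.CPU_THREADS ÷ 2), nworker_threads = 2)
    else
        # No keyword arguments: the sequential call runtests(Ribasim).
        return (;)
    end
end

# Intended use inside runtests.jl (assumes Ribasim and ReTestItems are loaded;
# the environment variable name is an assumption):
#   parallel = get(ENV, "RIBASIM_TEST_PARALLEL", "true") == "true"
#   runtests(Ribasim; runtests_kwargs(; parallel)...)
```

Keeping the toggle outside the test files makes it easier to rerun the same commit in both modes when chasing a failure that only shows up intermittently.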

@SouthEndMusic SouthEndMusic added the test and bug labels Nov 24, 2023
@github-project-automation github-project-automation bot moved this to To do in Ribasim Nov 24, 2023
@visr
Member

visr commented Nov 24, 2023

Perhaps it is unrelated to parallel execution after all, since there are also random failures now that we have switched to sequential. Or perhaps both issues occur.

Seen locally (now updated to 4.0, see #827):

Test Failed at D:\Ribasim\core\test\allocation_test.jl:22
  Expression: F[(NodeID(1), NodeID(2))] ≈ 4.5
   Evaluated: 4.0 ≈ 4.5

But after changing this, on CI we get the opposite: https://github.com/Deltares/Ribasim/actions/runs/6982110719/job/19000606007?pr=827#step:9:444

Test Failed at /home/runner/work/Ribasim/Ribasim/core/test/allocation_test.jl:22
  Expression: F[(NodeID(1), NodeID(2))] ≈ 4.0
   Evaluated: 4.5 ≈ 4.0

A different issue, on main CI, also non-deterministic: https://github.com/Deltares/Ribasim/actions/runs/6981404082/job/18998462984?pr=823#step:9:540

Test Failed at /home/runner/work/Ribasim/Ribasim/core/test/utils_test.jl:173
  Expression: Ribasim.expand_logic_mapping(logic_mapping)
    Expected: Multiple control states found for DiscreteControl node #1 for truth state `TTF`: foo, bar.
  No exception thrown

@visr visr moved this from To do to Sprint backlog in Ribasim Nov 24, 2023
@visr
Member

visr commented Nov 24, 2023

In 8ef0361 I temporarily work around / disable these two tests and re-enable parallel tests, to get CI functional.

Proper fixes in #828 (comment)

@visr
Member

visr commented Nov 24, 2023

Here is one non-deterministic issue that still happens sometimes and is related to the switch to ReTestItems: https://github.com/Deltares/Ribasim/actions/runs/6983243135/job/19004024801#step:9:443

ERROR: LoadError: IOError: mkdir("/var/folders/3s"; mode=0o777): permission denied (EACCES)
Stacktrace:
  [1] uv_error
    @ Base ./libuv.jl:100 [inlined]
  [2] mkdir(path::String; mode::UInt16)
    @ Base.Filesystem ./file.jl:185
  [3] mkdir
    @ Base.Filesystem ./file.jl:177 [inlined]
  [4] mkpath(path::String; mode::UInt16)
    @ Base.Filesystem ./file.jl:241
  [5] mkpath(path::String; mode::UInt16) (repeats 3 times)
    @ Base.Filesystem ./file.jl:239
  [6] mkpath
    @ ./file.jl:235 [inlined]
  [7] runtests(shouldrun::typeof(ReTestItems.default_shouldrun), paths::String; nworkers::Int64, nworker_threads::Int64, worker_init_expr::Expr, testitem_timeout::Float64, retries::Int64, memory_threshold::Float64, debug::Int64, name::Nothing, tags::Nothing, report::Bool, logs::Symbol, verbose_results::Bool, test_end_expr::Expr)
    @ ReTestItems ~/.julia/packages/ReTestItems/HZCMZ/src/ReTestItems.jl:234
  [8] runtests(shouldrun::Function, pkg::Module; kw::@Kwargs{nworkers::Int64, nworker_threads::Int64})
    @ ReTestItems ~/.julia/packages/ReTestItems/HZCMZ/src/ReTestItems.jl:196
  [9] runtests
    @ ReTestItems ~/.julia/packages/ReTestItems/HZCMZ/src/ReTestItems.jl:193 [inlined]
 [10] #runtests#38
    @ ReTestItems ~/.julia/packages/ReTestItems/HZCMZ/src/ReTestItems.jl:192 [inlined]

This looks like it could be a ReTestItems issue: if it does parallel mkdir and the folder name is "/var/folders/3s", that looks like a recipe for race conditions.
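
One way to probe the race hypothesis locally is to call mkpath on the same nested directory from several tasks at once and collect any errors. This is a rough sketch, not a reproduction of the EACCES failure above: the function name `concurrent_mkpath` and the directory names are hypothetical, and it only exercises Base.mkpath, not whatever ReTestItems does internally. Start Julia with more than one thread (for example `julia -t 4`) so the tasks can actually run concurrently.

```julia
# Hypothetical probe: several tasks create the same nested directory at once.
function concurrent_mkpath(root::AbstractString; ntasks::Int = 8)
    target = joinpath(root, "a", "b", "c")
    # Spawn all tasks first so they can run concurrently on available threads.
    tasks = map(1:ntasks) do _
        Threads.@spawn mkpath(target)
    end
    errors = Any[]
    for t in tasks
        try
            fetch(t)  # rethrows (wrapped) if the task failed
        catch err
            push!(errors, err)
        end
    end
    return target, errors
end

# Run inside a temporary directory that is removed afterwards.
mktempdir() do root
    target, errors = concurrent_mkpath(root)
    println("directory exists: ", isdir(target), ", errors: ", length(errors))
end
```

If this never reports errors on the affected macOS runners, a plain mkpath race becomes a less likely explanation and the permissions of the temporary directory itself would be the next thing to check.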

@visr visr changed the title from "Bug: parallel Ribasim core tests fail non-deterministically" to "Parallel Ribasim core tests fail non-deterministically on MacOS" Nov 29, 2023
Hofer-Julian pushed a commit that referenced this issue Dec 6, 2023
Until we have time to look into #825, I feel like it is better to just
disable MacOS for core CI. With Linux and Windows we still have decent
OS coverage.
@Hofer-Julian
Contributor

@visr Now that #872 has been merged, should we close this issue?

@Hofer-Julian
Contributor

As discussed with @visr, we keep it open for now. I will put it back into "To do".

@Hofer-Julian Hofer-Julian moved this from Sprint backlog to To do in Ribasim Dec 7, 2023
@Hofer-Julian Hofer-Julian moved this from To do to Paused in Ribasim Jan 25, 2024
@SnippenE

Cannot fix because we couldn't find a solution.

@github-project-automation github-project-automation bot moved this from Paused to ✅ Done in Ribasim Feb 22, 2024
@SnippenE SnippenE closed this as not planned Feb 22, 2024