GC crash while allocating GAP object #1283

Closed
fingolfin opened this issue Apr 27, 2022 · 14 comments
Labels
bug Something isn't working

Comments

@fingolfin
Member

A test of PR #1279 by @HereAround failed with a crash in Julia nightly. Looking at the logs, it crashes during GC while allocating a new GAP string object:

signal (11): Segmentation fault
in expression starting at /home/runner/work/Oscar.jl/Oscar.jl/src/imports.jl:10
maybe_collect at /cache/build/default-amdci4-2/julialang/julia-master/src/gc.c:895 [inlined]
jl_gc_pool_alloc_inner at /cache/build/default-amdci4-2/julialang/julia-master/src/gc.c:1240 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/default-amdci4-2/julialang/julia-master/src/gc.c:1299 [inlined]
jl_gc_alloc_ at /cache/build/default-amdci4-2/julialang/julia-master/src/julia_internal.h:371 [inlined]
jl_gc_alloc at /cache/build/default-amdci4-2/julialang/julia-master/src/gc.c:3365
NewBag at /workspace/srcdir/gap/src/julia_gc.c:1091
NEW_STRING at /workspace/srcdir/gap/src/stringobj.c:420
MakeStringWithLen at /workspace/srcdir/gap/src/stringobj.h:336 [inlined]
MakeString at /workspace/srcdir/gap/src/stringobj.h:343 [inlined]
MakeImmString at /workspace/srcdir/gap/src/stringobj.h:348 [inlined]
InitKernel at /workspace/srcdir/gap/src/opers.cc:3676
ModulesInitKernel at /workspace/srcdir/gap/src/modules.c:947
InitializeGap at /workspace/srcdir/gap/src/gap.c:1476
GAP_Initialize at /workspace/srcdir/gap/src/libgap-api.c:62
initialize at /home/runner/.julia/packages/GAP/3l2vi/src/GAP.jl:99
#4 at /home/runner/.julia/packages/GAP/3l2vi/src/GAP.jl:261 [inlined]

Right now I am still pursuing various ideas about what might be behind it. I mainly record it here so that there is a reference we can point people to if this happens again.

@fingolfin fingolfin added the bug Something isn't working label Apr 27, 2022
@benlorenz
Member

Maybe we need new binaries again. Bisect points to JuliaLang/julia#42302:

# first bad commit: [c7ea7098ae62e1171541804eeeec9779ad440d27] Add threadpool support to runtime

This PR seems to have modified a few internal data structures.

@fingolfin
Member Author

Crash also happens for PR #1281 in Julia nightly, both for Linux and for macOS.

One recently merged Julia PR affects a part of the GC that was made specifically for us, but I don't see how it could be related.

Another idea is that some recent Julia PRs changed the layout of jl_task_t and some other structs, which would require another update to libjulia_jll and GAP_jll. Alas, I made a quick check of the GAP kernel C code interacting with Julia and found nothing so far that would support this idea.

@benlorenz
Member

Maybe we need new binaries again. Bisect points to JuliaLang/julia#42302:

# first bad commit: [c7ea7098ae62e1171541804eeeec9779ad440d27] Add threadpool support to runtime

This PR seems to have modified a few internal data structures.

To be more precise, that PR added a new threadpoolid to _jl_task_t.
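To make the concern concrete, here is a minimal sketch of how inserting a member into a struct breaks a binary that was compiled against the old layout. The two struct types, the `added` member, and `stale_read_ptls` are hypothetical stand-ins, not Julia's real definitions. A pointer-sized member is used so that the shift cannot be absorbed by padding; whether a real field such as threadpoolid moves later members depends on the surrounding layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout that an already-built client library was compiled against. */
typedef struct {
    void *next;
    void *ptls;   /* member the client reads directly */
} task_old_t;

/* Hypothetical newer layout: one member inserted before ptls. */
typedef struct {
    void *next;
    void *added;  /* new member, shifts everything declared after it */
    void *ptls;
} task_new_t;

/* Reads "ptls" the way a stale binary would: it applies the OLD offset
 * to an object that really has the NEW layout. */
void *stale_read_ptls(const task_new_t *t)
{
    void *value;
    assert(t != NULL);
    memcpy(&value,
           (const unsigned char *)t + offsetof(task_old_t, ptls),
           sizeof value);
    return value;
}
```

In this sketch the stale read returns the contents of `added` rather than `ptls`, so the caller ends up with an unrelated pointer. Passing such a value on to an allocator is one plausible way to get a segfault far away from the actual mistake.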

@fingolfin
Member Author

@benlorenz yeah, I had my eyes on that one, too. But I do not yet understand how it would affect us; I tried hard to get rid of all accesses to the internals of Julia structs. (I wish Julia split julia.h so that it only includes opaque definitions for things like jl_value_t, plus structs that are "guaranteed" not to change, with the "real" definitions in a separate internal header. That would make it so much easier to avoid these gotchas. See also JuliaLang/julia#36903.)

Anyway: that patch also added global variables before jl_n_threads. I'll play with it a bit.

@fingolfin
Member Author

OK, we access member ptls from jl_task_t which really has shifted. I've prepared a patch for GAP which replaces this by a function call that should stay safe, and also hacked up a fake_julia.h that has just enough definitions to compile GAP's julia_gc.c, but with all structs opaque. I found only one other such reference (however, in code that is never executed when using GAP.jl; I've removed that code now).
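The general pattern behind this fix (opaque structs plus an accessor function) can be sketched as follows. This is only an illustration under assumed names: `rt_task_t`, `rt_task_get_ptls`, and `client_current_ptls` are hypothetical and are not the functions used in the actual GAP patch or in Julia's API.

```c
#include <assert.h>
#include <stddef.h>

/* ---- What a hypothetical public header would expose ---- */
typedef struct rt_task rt_task_t;            /* opaque: no members visible */
void *rt_task_get_ptls(const rt_task_t *t);  /* accessor with a stable signature */

/* ---- Client code: sees only the declarations above ---- */
void *client_current_ptls(const rt_task_t *t)
{
    assert(t != NULL);
    /* Writing t->ptls here would not compile, because the type is incomplete.
     * The client therefore cannot bake a member offset into its binary. */
    return rt_task_get_ptls(t);
}

/* ---- Runtime side: owns the layout and may change it freely ---- */
struct rt_task {
    void *next;
    signed char threadpoolid;  /* a newly added member does not affect clients */
    void *ptls;
};

void *rt_task_get_ptls(const rt_task_t *t)
{
    return t->ptls;
}
```

The trade-off is one extra function call per access. In exchange, the member offset lives only inside the runtime library, so adding or reordering members no longer requires rebuilding the client.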

I'll update GAP then GAP_jll -- no need to update libjulia_jll for this just now

@kpamnany

Sorry about that folks. Please be aware that there will likely be further changes to jl_task_t in the coming weeks, and probably to the TLS state as well.

@fingolfin
Member Author

With JuliaPackaging/Yggdrasil#4844 the GAP kernel code no longer directly accesses members of any structs in julia.h, with the exception of jl_taggedvalue_t, which can't really be avoided.

Next I'll investigate the code in JuliaInterface; there we are using JL_GC_PUSH1, which has also changed in incompatible ways in the past, but I really don't see a way around that.

@fingolfin
Member Author

GAP_jll v400.1192.2+0 is now in the registry and should fix this, so I am closing. Of course reopen if it happens again even with the new version...

@benlorenz
Member

benlorenz commented Apr 28, 2022

This seems to have broken the tests for julia 1.7 on macos with a similar backtrace: https://github.com/oscar-system/Oscar.jl/runs/6208242212?check_suite_focus=true
I restarted just this one job after the same job had failed here:
https://github.com/oscar-system/Oscar.jl/runs/6208037217?check_suite_focus=true

signal (11): Segmentation fault: 11
in expression starting at /Users/runner/work/Oscar.jl/Oscar.jl/src/imports.jl:10
jl_gc_pool_alloc at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_gc_alloc_typed at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
NewBag at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
NEW_STRING at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
_ZL10InitKernelPK9init_info at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
ModulesInitKernel at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
InitializeGap at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
GAP_Initialize at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
initialize at /Users/runner/.julia/packages/GAP/3l2vi/src/GAP.jl:99
#4 at /Users/runner/.julia/packages/GAP/3l2vi/src/GAP.jl:261 [inlined]
withenv at ./env.jl:172
unknown function (ip: 0x109648a26)
__init__ at /Users/runner/.julia/packages/GAP/3l2vi/src/GAP.jl:260
unknown function (ip: 0x10964105f)
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_module_run_initializer at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_init_restored_modules at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
_include_from_serialized at ./loading.jl:768
_require_search_from_serialized at ./loading.jl:854
_require at ./loading.jl:1097
require at ./loading.jl:1013
require at ./loading.jl:997
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval_import_path at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_in at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval at ./boot.jl:373 [inlined]
include_string at ./loading.jl:1196
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
_include at ./loading.jl:1253
include at ./Base.jl:418
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_f__call_latest at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
include at /Users/runner/work/Oscar.jl/Oscar.jl/src/Oscar.jl:20
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
do_call at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval_body at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_interpret_toplevel_thunk at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_in at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval at ./boot.jl:373 [inlined]
include_string at ./loading.jl:1196
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
_include at ./loading.jl:1253
include at ./Base.jl:418 [inlined]
include_package_for_output at ./loading.jl:1318
jfptr_include_package_for_output_32000.clone_1 at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/sys.dylib (unknown line)
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
do_call at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval_body at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_interpret_toplevel_thunk at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_in at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval at ./boot.jl:373 [inlined]
eval at ./client.jl:453
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
do_call at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval_body at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_interpret_toplevel_thunk at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_in at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval at ./boot.jl:373 [inlined]
exec_options at ./client.jl:268
_start at ./client.jl:495
jfptr__start_24435.clone_1 at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/sys.dylib (unknown line)
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
true_main at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_repl_entrypoint at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
Allocations: 768754 (Pool: 768435; Big: 319); GC: 2

signal (11): Segmentation fault: 11
in expression starting at /Users/runner/work/Oscar.jl/Oscar.jl/src/imports.jl:10
LookupSymbol at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
GVarName at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
GAP_ValueGlobalVariable at /Users/runner/.julia/artifacts/7ec715fbf8a5f089eedbb4d9713ebb83bbbc67b6/lib/libgap.8.dylib (unknown line)
_ValueGlobalVariable at /Users/runner/.julia/packages/GAP/3l2vi/src/ccalls.jl:95 [inlined]
getproperty at /Users/runner/.julia/packages/GAP/3l2vi/src/globals.jl:39
#1 at /Users/runner/.julia/packages/GAP/3l2vi/src/GAP.jl:233
unknown function (ip: 0x1096494af)
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
_atexit at ./initdefs.jl:350
jfptr__atexit_33738.clone_1 at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/sys.dylib (unknown line)
jl_apply_generic at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_atexit_hook at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_exit at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_exit_thread0_cb at /Users/runner/hostedtoolcache/julia/1.7.2/x64/lib/julia/libjulia-internal.1.7.dylib (unknown line)
Allocations: 768754 (Pool: 768435; Big: 319); GC: 2
ERROR: LoadError: Failed to precompile Oscar [f1435218-dba5-11e9-1e4d-f1a5fab5fc13] to /Users/runner/.julia/compiled/v1.7/Oscar/jl_eKClY6.

cc: @lkastner

@benlorenz benlorenz reopened this Apr 28, 2022
@fingolfin fingolfin changed the title from "GC crash while marking GAP weakref" to "GC crash while allocating GAP object" Apr 28, 2022
@fingolfin
Member Author

Huh, in Julia 1.7? That's really strange. I have no clue at all what could cause that (sigh). But of course I'll take a look.

@benlorenz
Member

I can reproduce this on munk with the aarch julia 1.7 as well and it started with the new binaries:

(Oscar) pkg> add GAP_jll@400.1192.1
   Resolving package versions...
   Installed GAP_jll ─ v400.1192.1+0
  Downloaded artifact: GAP
    Updating `~/Oscar.jl/Project.toml`
  [5cd7a574] + GAP_jll v400.1192.1+0
    Updating `~/Oscar.jl/Manifest.toml`
  [5cd7a574] ↓ GAP_jll v400.1192.2+0 ⇒ v400.1192.1+0
Precompiling project...
  4 dependencies successfully precompiled in 19 seconds (60 already precompiled)

(Oscar) pkg> add GAP_jll@400.1192.2
   Resolving package versions...
    Updating `~/Oscar.jl/Project.toml`
  [5cd7a574] ↑ GAP_jll v400.1192.1+0 ⇒ v400.1192.2+0
    Updating `~/Oscar.jl/Manifest.toml`
  [5cd7a574] ↑ GAP_jll v400.1192.1+0 ⇒ v400.1192.2+0
Precompiling project...
  ✗ Oscar
  3 dependencies successfully precompiled in 3 seconds (60 already precompiled)
  1 dependency errored. To see a full report either run `import Pkg; Pkg.precompile()` or load the package

@fingolfin
Member Author

I now have a theory about what's going on.

@fingolfin
Member Author

Should be fixed by JuliaPackaging/Yggdrasil#4850

@benlorenz
Member

Seems fixed now, thanks! All green here https://github.com/oscar-system/Oscar.jl/actions/runs/2244067770 after a complete re-run.
