JIT compiled frame symbols are missing #226

Open
jeaye opened this issue Mar 19, 2025 · 24 comments
Labels
enhancement (New feature or request), resolved in next release (Resolved in dev)

Comments

@jeaye

jeaye commented Mar 19, 2025

Jeremy! This project is great. I'm trying to add it to jank right now, as the fallback exception handler.

I have a fun twist for you. jank is a native Clojure dialect on LLVM, so I'm generating LLVM IR and then giving it to LLVM to JIT compile. It looks like cpptrace is unable to find the symbols for any JIT compiled functions.

However, given the exact same executable and flags, gdb can see the symbols. I've asked an LLVM developer why, and I've provided the info below.

Details

CMake

-- Cpptrace auto config: Using libgcc unwind for unwinding
-- Cpptrace auto config: Using libdwarf for symbols
-- Cpptrace auto config: Using cxxabi for demangling

OS

Linux x86_64

Executable

❯ file build/jank
build/jank: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, with debug_info, not stripped

What gdb/lldb are doing

I asked Lang Hames (the main LLVM JIT guy) on the LLVM Discord about this and here's what he said.

gdb and lldb are getting the dwarf through https://sourceware.org/gdb/current/onlinedocs/gdb.html/JIT-Interface.html. That interface is pretty specialized: ORC adds a "debug object" (an object file containing debug info, and with the section addresses in the header updated to reflect their loaded addresses in the executing process) to a linked list headed by a global symbol, then it calls a special registration function. This is where it gets weird: The body of the registration function is a no-op, but GDB and LLDB know to set a breakpoint on that function if they see it in a process, and they use a breakpoint handler to react to the call, reading any new objects from the global linked list of debug objects.
If cpptrace has an API that we can use to register dynamically loaded DWARF then I think we'd be better off adding a custom plugin to call that API.
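
For reference, the interface described above comes down to a handful of C declarations. The sketch below mirrors the layout given in the GDB documentation, but it uses a local `demo_descriptor` instead of the real `__jit_debug_descriptor` global so that it runs without a JIT. The names `demo_descriptor`, `demo_register`, and `count_entries` are hypothetical, and prepending new entries at the head is just one possible ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout as documented for the GDB JIT interface.
enum jit_actions_t : std::uint32_t { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN };

struct jit_code_entry {
    jit_code_entry* next_entry;
    jit_code_entry* prev_entry;
    const char* symfile_addr;    // start of the in-memory object file
    std::uint64_t symfile_size;  // size of that object file in bytes
};

struct jit_descriptor {
    std::uint32_t version;           // 1 in the documented interface
    std::uint32_t action_flag;       // holds a jit_actions_t value
    jit_code_entry* relevant_entry;  // entry affected by the latest action
    jit_code_entry* first_entry;     // head of the linked list
};

// Stand-in for __jit_debug_descriptor (hypothetical, local to this sketch).
static jit_descriptor demo_descriptor{1, JIT_NOACTION, nullptr, nullptr};

// Roughly what a JIT does when registering an object: link the entry in and
// flag the action. A real JIT would then call __jit_debug_register_code(),
// the empty function debuggers set a breakpoint on.
inline void demo_register(jit_code_entry& entry) {
    assert(entry.symfile_addr != nullptr);
    entry.prev_entry = nullptr;
    entry.next_entry = demo_descriptor.first_entry;
    if (entry.next_entry != nullptr) {
        entry.next_entry->prev_entry = &entry;
    }
    demo_descriptor.first_entry = &entry;
    demo_descriptor.relevant_entry = &entry;
    demo_descriptor.action_flag = JIT_REGISTER_FN;
}

// What an in-process reader can do without a breakpoint: walk the list.
inline std::size_t count_entries(const jit_descriptor& d) {
    std::size_t n = 0;
    for (const jit_code_entry* e = d.first_entry; e != nullptr; e = e->next_entry) {
        ++n;
    }
    return n;
}
```

An in-process library cannot rely on the breakpoint trick, so walking `first_entry` on demand is the part of this interface it can actually use.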

What do you think? I'm hoping to get this working for both Linux and macOS (x86_64 and aarch64).

@jeremy-rifkin
Owner

Hi, I’d be happy to support JIT code. I briefly looked into this once before when someone else asked about it but I didn’t have time to dive very far into it. I’d be happy to accept a PR if you’re interested or I might be able to take a look next week.

jeremy-rifkin added the enhancement (New feature or request) label Mar 20, 2025
@jeremy-rifkin
Owner

As a side note, I’m a huge fan of Jank :)

@jeaye
Author

jeaye commented Mar 20, 2025

> As a side note, I’m a huge fan of Jank :)

My man! Not much overlap of C++ and Clojure folks. :)

> I’d be happy to accept a PR if you’re interested or I might be able to take a look next week.

If this is something you're open to tackling, I'd rather leave this one to you. I'm happy to help test and I can also provide some guidance for how you can test this with some actual JIT compiled code.

If you ultimately don't have the resources, I can look into getting someone from the jank community to help out.

@jeremy-rifkin
Owner

I'm definitely happy to tackle this and try to get it working for Jank. It would probably be easiest for me to test this workflow with a local build / instrumentation of Jank so I don't have to set up a simple LLVM JIT myself. If you have any pointers for a good way to test this I'd be interested, otherwise I can explore :)

@jeaye
Author

jeaye commented Mar 20, 2025

You bet. I have merged cpptrace into main now. So, if you can compile jank from source, you should be able to then iterate on the cpptrace submodule.

The build instructions are here: https://github.com/jank-lang/jank/blob/main/compiler+runtime/doc/build.md

In short, please make sure you're on x86_64, either Linux or macOS. LLVM has an issue with exception unwinding across JIT frames on aarch64 right now. 😦 After installing your build deps, just run this (in compiler+runtime):

./bin/configure -GNinja -DCMAKE_BUILD_TYPE=Debug
./bin/compile

Then, to replicate the missing symbol issue, create a test.jank file:

(ns test)

(defn foo []
  (throw "meow"))

(defn -main [& args]
  (foo))
(-main)

And run it with ./build/jank run test.jank. It'll throw and you'll see a missing frame for -main and foo.

Image

You can verify that gdb sees it with gdb --args ./build/jank run test.jank; then enter catch throw, then run, and then backtrace once it stops at the throw. Here are the symbols I see:

Image

The cpptrace submodule is in compiler+runtime/third-party/cpptrace. If you hop into there, make whatever changes you want, you can push them by just adding the appropriate remote for your upstream repo. For example:

cd third-party/cpptrace
git remote add upstream https://github.com/jeremy-rifkin/cpptrace
git checkout -b jit-symbols
# Make your changes...
git push upstream jit-symbols

Whenever you make changes to cpptrace, you can run ./bin/compile again for jank. You can also run ./bin/watch ./bin/compile, which will re-compile whenever any jank source changes OR when you press space in the terminal.

Happy to help, if you have any issues, since you're doing me a solid here.

@jeremy-rifkin
Owner

Thanks! I'll give this a try once able!

@jeremy-rifkin
Owner

jeremy-rifkin commented Mar 22, 2025

I gave this a shot locally but I seem to be running into errors

In file included from /home/rifkin/jank/compiler+runtime/include/cpp/jank/type.hpp:59:
/home/rifkin/jank/compiler+runtime/include/cpp/jank/native_persistent_string.hpp:71:15: error: constexpr constructor never produces a constant expression [-Winvalid-constexpr]
   71 |     constexpr native_persistent_string() noexcept
      |               ^~~~~~~~~~~~~~~~~~~~~~~~
/home/rifkin/jank/compiler+runtime/include/cpp/jank/native_persistent_string.hpp:71:15: note: non-constexpr constructor 'storage' cannot be used in a constant expression
/home/rifkin/jank/compiler+runtime/include/cpp/jank/native_persistent_string.hpp:734:12: note: declared here
  734 |     struct storage : allocator_type
      |            ^
/home/rifkin/jank/compiler+runtime/include/cpp/jank/native_persistent_string.hpp:76:15: error: constexpr constructor never produces a constant expression [-Winvalid-constexpr]
   76 |     constexpr native_persistent_string(native_persistent_string const &s) noexcept
      |               ^~~~~~~~~~~~~~~~~~~~~~~~
...

Here's what I did:

git clone --recurse-submodules https://github.com/jank-lang/jank.git
cd jank/compiler+runtime
git checkout 80d723de2c891891858831c932d63b0849fbba07
CC=/usr/lib/llvm-19/bin/clang CXX=/usr/lib/llvm-19/bin/clang++ ./bin/configure -GNinja -DCMAKE_BUILD_TYPE=Debug
./bin/compile

I picked 80d723de2c891891858831c932d63b0849fbba07 since it was a green commit on CI

Have you seen this before?

@jeaye
Author

jeaye commented Mar 22, 2025

Yep, I've seen this one. It generally happens when the user doesn't have a proper clang/llvm setup. Where did you get that clang/llvm 19?

@jeremy-rifkin
Owner

I did this, but it's definitely possible I have something messed up on the machine I'm testing with

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 19 all

Logs did indicate the right clang appeared to be used

@jeaye
Author

jeaye commented Mar 22, 2025

Is this on an old distro, like Ubuntu 22.04 or something?

@jeremy-rifkin
Owner

jeremy-rifkin commented Mar 22, 2025

Yes indeed, I'm on ubuntu 22.04. Is 24 needed? (I'd be surprised if that were the case)

@jeaye
Author

jeaye commented Mar 22, 2025

I think that's the issue. Likely due to the gcc libs installed from 22.04, namely libstdc++, since that's what clang will be using. Try this (right from your jank directory -- distrobox is amazing):

distrobox create jank-ubuntu --image ubuntu:24.10
distrobox enter jank-ubuntu
sudo apt-get install -y curl git git-lfs zip build-essential entr libssl-dev libdouble-conversion-dev pkg-config ninja-build cmake zlib1g-dev libffi-dev clang libclang-dev llvm llvm-dev libzip-dev libbz2-dev doctest-dev gcc g++ libgc-dev
export CC=clang; export CXX=clang++
./bin/configure -GNinja -DCMAKE_BUILD_TYPE=Debug
./bin/compile

I just did these exact commands, on my machine, and compiled cleanly.

> (I'd be surprised if that were the case)

Clang is just the compiler. The standard lib package (which comes from gcc) you're using is 3 years old, back when C++20 features were still very new. So it's understandable when we run into C++20 issues on such an old distro.
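
A quick way to check which standard library clang actually picked up is to print the library's own version macros. This is a minimal sketch, not part of jank or cpptrace; `stdlib_description` is a hypothetical helper name. `_GLIBCXX_RELEASE` and `__GLIBCXX__` are libstdc++ macros and `_LIBCPP_VERSION` is the libc++ one.

```cpp
#include <string>

// Report which C++ standard library this translation unit was compiled against.
inline std::string stdlib_description() {
#if defined(_GLIBCXX_RELEASE)
    // Major release of libstdc++ (the macro exists in GCC 7 and later).
    return "libstdc++ " + std::to_string(_GLIBCXX_RELEASE);
#elif defined(__GLIBCXX__)
    // Older libstdc++ only exposes a release date stamp.
    return "libstdc++ dated " + std::to_string(__GLIBCXX__);
#elif defined(_LIBCPP_VERSION)
    return "libc++ " + std::to_string(_LIBCPP_VERSION);
#else
    return "unknown standard library";
#endif
}
```

Compiling this with the same CC/CXX settings used for the build and printing the result shows whether the toolchain is pairing a new clang with an older libstdc++.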

@jeremy-rifkin
Owner

Thanks! A different libstdc++ makes sense. I was able to get it to build with an ubuntu 24.10 container like you suggested.

I was also able to get the example to run and see the expected missing frames. Will start diving into the JIT interface stuff next.

@jeaye
Author

jeaye commented Mar 23, 2025

> Thanks! A different libstdc++ makes sense. I was able to get it to build with an ubuntu 24.10 container like you suggested.
>
> I was also able to get the example to run and see the expected missing frames. Will start diving into the JIT interface stuff next.

Niiice! Great job. Thanks again for taking the time for this, Jeremy. 🙂 I owe you one.

@jeremy-rifkin
Owner

jeremy-rifkin commented Mar 23, 2025

My pleasure, thanks for the interest in the library! :)

I didn't have a ton of time to look into this today but exploring things a bit:

I instrumented cpptrace to walk the __jit_debug_descriptor linked list and dump out the in-memory object files. I didn't find any debug symbols in these with dwarfdump but readelf showed they have .symtab sections. Just to confirm, is it expected jank isn't emitting any dwarf debug symbols (which for cpptrace would mainly be used for line numbers) and instead it's just function names from the symtab we'd expect to see? (this does seem to match what gdb is able to provide)


Typing up some notes, mainly for myself but also if you have any thoughts I'd be happy to hear:

Looking at the missing frame 5 which is a jit function

#4  in jank_throw at c_api.cpp:876
#5  
#6  in jank::runtime::obj::jit_function::call at jit_function.cpp:71

This frame has raw address 0x7f33aed44090

This seems to correspond to the .text section of the second object file to be registered via the jit interface

obj_1.o:        file format elf64-x86-64

Sections:
Idx Name            Size     VMA              Type
  0                 00000000 0000000000000000 
  1 .strtab         00000187 0000000000000000 
  2 .text           000000ac 00007f33aed44000 TEXT
  3 .rela.text      00000198 0000000000000000 
  4 .bss            00000018 0000000000000000 BSS
  5 .rodata.str1.1  00000043 00007f33aed42000 DATA
  6 .note.GNU-stack 00000000 0000000000000000 
  7 .eh_frame       00000078 00007f33aed42048 
  8 .rela.eh_frame  00000060 0000000000000000 
  9 .symtab         000001e0 0000000000000000 

0x7f33aed44090 - 0x7f33aed44000 is 0x90

$ readelf --symbols obj_1.o

Symbol table '.symtab' contains 20 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS test-test$test_f[...]
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 .text
     3: 0000000000000000     5 OBJECT  LOCAL  DEFAULT    5 .L__unnamed_1
     4: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    4 string_2025564121
     5: 0000000000000005    62 OBJECT  LOCAL  DEFAULT    5 .L__unnamed_2
     6: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 string_1142059953
     7: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    4 data_1437529847
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 .bss
     9: 0000000000000000    60 FUNC    GLOBAL DEFAULT    2 test_jank_global[...]
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND jank_string_create
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND jank_read_string
    12: 0000000000000040    62 FUNC    GLOBAL DEFAULT    2 test_repl_fn_4_0
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND jank_function_bu[...]
    14: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND jank_function_create
    15: 0000000000000080    17 FUNC    GLOBAL DEFAULT    2 test_foo_1_0
    16: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND jank_function_se[...]
    17: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND jank_set_meta
    18: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND jank_throw
    19: 00000000000000a0    12 FUNC    GLOBAL HIDDEN     2 __orc_init_func.

So this falls within the test_foo_1_0 symbol, which is what we expect. Awesome 😄
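
The lookup done by hand above can be sketched as a small function: subtract the .text load address from the frame address, then find the function symbol whose range contains the resulting offset. This is only an illustration using the values from the dumps; `symbol`, `resolve`, and `obj_1_functions` are hypothetical names, not cpptrace internals.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct symbol {
    std::uint64_t value;  // offset of the symbol within its section
    std::uint64_t size;   // size in bytes
    std::string name;
};

// Returns the name of the symbol whose [value, value + size) range contains
// the section-relative offset of pc, or an empty string when nothing matches.
inline std::string resolve(std::uint64_t pc, std::uint64_t text_base,
                           const std::vector<symbol>& symbols) {
    if (pc < text_base) {
        return "";
    }
    const std::uint64_t offset = pc - text_base;
    for (const auto& s : symbols) {
        if (offset >= s.value && offset - s.value < s.size) {
            return s.name;
        }
    }
    return "";
}

// FUNC entries from the .symtab dump; the truncated name is kept exactly as
// readelf printed it.
static const std::vector<symbol> obj_1_functions = {
    {0x00, 60, "test_jank_global[...]"},
    {0x40, 62, "test_repl_fn_4_0"},
    {0x80, 17, "test_foo_1_0"},
    {0xa0, 12, "__orc_init_func."},
};
```

With the .text VMA 0x7f33aed44000 from the section listing, `resolve(0x7f33aed44090, 0x7f33aed44000, obj_1_functions)` yields test_foo_1_0, matching the manual calculation.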

  • Options for the cpptrace interface
    1. Require registration via some cpptrace:: API in addition to the semi-standard JIT interface
      • This might be a hassle with llvm since llvm is doing the JIT interface part and I don't know if there's an easy way for you to get the in-memory object file handle
    2. Have cpptrace walk the __jit_debug_descriptor
      • This would likely result in an O(n) walk every time a collection of frames is resolved but
      • cpptrace can cache object files here (though maybe there's a risk of a memory location being reused if an in-memory object file is unloaded)
      • gdb/lldb avoid this by setting a breakpoint on __jit_debug_register_code and only walking the linked list when it is hit
        • I probably don't want to try to instrument cpptrace to do a breakpoint of its own, and that might interfere with a debugger
        • It might be better to require a function like cpptrace::load_jit_objects() to be called any time jit stuff changes so that cpptrace can avoid a possibly-fickle reliance on caching by memory location.
  • Accessing __jit_debug_descriptor comes down to two options:
    • extern struct jit_descriptor __jit_debug_descriptor; works great for accessing the jit code entries, but, since not all programs will have this symbol it would require some compile-time configuration to turn on/off jit mode. And then if linking against jit mode cpptrace this symbol would have to be defined.
    • cpptrace could try to dlsym it, but this might have problems of its own since it'd require the symbol to be a dynamic export

Path forward

  • cpptrace does its own elf / mach-o parsing for symbol table stuff; these parsers will need to be extended to handle in-memory object files in addition to reading from disk
    • (this may work well with another goal I've had in mind which is having the elf/mach-o parsers use mmap)
  • look into libdwarf stuff (and fortunately libdwarf does have support for in-memory object files)
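
As an illustration of what handling in-memory object files involves at the very first step, the sketch below classifies a buffer by its magic bytes instead of opening a file. `object_format` and `detect_format` are hypothetical names. The ELF magic and EI_CLASS values and the 64-bit Mach-O magic 0xfeedfacf are the standard ones, and the Mach-O check assumes the object uses the host's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class object_format { unknown, elf32, elf64, macho64 };

// Classify an in-memory object file by inspecting its first bytes.
inline object_format detect_format(const char* data, std::size_t size) {
    // ELF: bytes 0..3 are 0x7f 'E' 'L' 'F'; byte 4 (EI_CLASS) is 1 for
    // 32-bit objects and 2 for 64-bit objects.
    if (size >= 5 && std::memcmp(data, "\x7f" "ELF", 4) == 0) {
        if (data[4] == 1) {
            return object_format::elf32;
        }
        if (data[4] == 2) {
            return object_format::elf64;
        }
        return object_format::unknown;
    }
    // 64-bit Mach-O: the first 4 bytes hold MH_MAGIC_64 (0xfeedfacf).
    if (size >= 4) {
        std::uint32_t magic = 0;
        std::memcpy(&magic, data, sizeof magic);
        if (magic == 0xfeedfacfu) {
            return object_format::macho64;
        }
    }
    return object_format::unknown;
}
```

The same pointer-and-size pair is what a JIT code entry provides (`symfile_addr`, `symfile_size`), so a parser written against a buffer would also work on an mmap'd file.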

If jank isn't emitting dwarf symbols here, what I plan on doing is just setting up a preliminary interface for cpptrace and basic support for scanning in-memory symbol tables, and then dwarf support can hopefully be an easy addition later.

TL;DR: I think this may be easier than I feared and I'm hoping to have something this weekend

@jeaye
Author

jeaye commented Mar 23, 2025

Woah, great work! That was a quick first pass. You're right that jank isn't generating debug info alongside the LLVM IR yet. We should have that done in a month or two. jank-lang/jank#242

I'm totally cool with just symbol support for now. Would really appreciate getting the debug info in there, once jank supports it, though. Those frames are going to be the ones which matter most, since it'll be the user's jank code, while the rest are mainly part of the jank compiler/runtime.

In terms of the way forward, ultimately I have little input to give, so long as it works for jank. If you'd like, though, I can either connect you with Lang Hames, the main LLVM JIT guy, or try to get his input on this ticket. For example, that might allow us to answer the "I don't know if there's an easy way for you to get the in-memory object file handle" bit, among others. If you'd like to reach out to him, this is the LLVM Discord: https://discord.gg/xS7Z362 There's a #jit channel in there, where I know @lhames would be happy to respond. If you don't feel comfortable with that, I'll ping him to see if he can comment here.

@jeremy-rifkin
Owner

Thanks for confirming!

The debug info will definitely be important. I can maybe try to wire support through but I won't have an easy way to test that it's really working. I might be able to throw something together with the kaleidoscope jit example.

I'll ask on #jit about in-memory object file access!

@jeremy-rifkin
Owner

I've gotten the basic foundation working on linux. Over the next couple of days I'll work on robustness, various cleanup, and macos support.

@jeaye
Author

jeaye commented Mar 26, 2025

Niiiiice!

Image

Will this work with the existing LLVM 19 binaries, or does it involve LLVM changes for 20 or later?

@jeremy-rifkin
Owner

Currently I’m not relying on any special LLVM interfaces, just the pseudo-standard gdb JIT interface. I expect this to work for LLVM 20, 19, older, and later, but I haven’t tested :)

@jeremy-rifkin
Owner

jeremy-rifkin commented Mar 29, 2025

I've merged preliminary support for linux and mac to the dev branch. I haven't been able to test on mac yet but linux works. The API I settled on is two parts.

The core interface that allows registering / unregistering in-memory object files is:

namespace cpptrace {
    void register_jit_object(const char*, std::size_t);
    void unregister_jit_object(const char*);
    void clear_all_jit_objects();
}

Then there's a helper in <cpptrace/gdb_jit.hpp> that will walk the list of __jit_debug_descriptor entries and register from there

namespace cpptrace {
    void register_jit_objects_from_gdb_jit_interface();
}

This will need to be called once the JIT object from LLVM is fully prepared for execution and has been added to __jit_debug_descriptor. I'm hoping there's a sensible place in Jank where this can be done; please let me know if this isn't the case and I can come up with something different. (In testing I just threw the call in before the trace generation, but this would naturally be undesirable for performance.)

I hope this works for you, please let me know if there's anything else I can do on my end that would be helpful!

@jeaye
Author

jeaye commented Mar 31, 2025

Thanks, Jeremy! This is exciting.

So, when we're loading modules, using REPL-based interactive sessions, etc, we're adding new objects to LLVM all of the time. Is the intended behavior to call register_jit_objects_from_gdb_jit_interface once after each new module is added? This may mean calling it hundreds of times. That works for me, if it's designed to operate like that. Thinking about it raised a red flag for me, though, since this seems like it could easily become an O(n^2) operation, if register_jit_objects_from_gdb_jit_interface needs to go through all objects every time it's called.

Just to be clear, the register_jit_object, unregister_jit_object, etc fns are not for jank to call, right?
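
To make the quadratic concern concrete: if every refresh re-registers all n objects and one refresh happens per added module, the total work is 1 + 2 + ... + n = n(n+1)/2 registrations. One way to soften this, sketched below purely as an illustration and not as cpptrace's actual implementation, is to remember which entries were already seen so that the expensive step (parsing the object) runs once per object. `incremental_registrar` is a hypothetical name, and `register_fn` stands in for something like cpptrace::register_jit_object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Same layout as the entry type in the GDB JIT interface.
struct jit_code_entry {
    jit_code_entry* next_entry;
    jit_code_entry* prev_entry;
    const char* symfile_addr;
    std::uint64_t symfile_size;
};

// Remembers which object files were already handed to the tracer so repeated
// refreshes only register the new ones.
class incremental_registrar {
public:
    // Walks the list starting at head and calls register_fn(addr, size) for
    // every entry not seen before. Returns how many entries were new.
    template<typename F>
    std::size_t refresh(const jit_code_entry* head, F&& register_fn) {
        std::size_t added = 0;
        for (const jit_code_entry* e = head; e != nullptr; e = e->next_entry) {
            assert(e->symfile_addr != nullptr);
            if (seen_.insert(e->symfile_addr).second) {
                register_fn(e->symfile_addr, static_cast<std::size_t>(e->symfile_size));
                ++added;
            }
        }
        return added;
    }

private:
    std::unordered_set<const char*> seen_;
};
```

Each refresh still visits every entry, but a visit is a hash lookup rather than a parse. If the JIT is known to prepend new entries, the loop could stop at the first already-seen entry; that ordering is an assumption to verify. Keying on the address also carries the reuse risk mentioned earlier in the thread if objects are ever unloaded.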

@jeremy-rifkin
Owner

Gotcha, that sort of REPL behavior will throw a wrench into things. I take it LLVM objects are only added in REPL mode, not removed? While cpptrace's bookkeeping here should be fast, this would indeed have concerning time complexity implications.

I had in mind Jank using the gdb jit utility as opposed to register_jit_object and unregister_jit_object. While the latter two are better in a lot of ways, they would require having pointers to the in-memory object files and that seems tricky to get from LLVM, but @lhames seemed to have some ideas.

I will give this some more thought. I'm thinking of some janky (😄) approaches but I'd like to find something more robust.

Could you point me to a place in Jank where LLVM passes are registered?

@jeaye
Author

jeaye commented Mar 31, 2025

You're right that we're not currently removing any LLVM modules. We may try to in the future, but it's not something we should worry about for now.

When we create an LLVM module, we register some optimization passes. I suspect that's where we could hook in some other passes, though I'm not very familiar with the APIs (and the code is riddled with TODOs as a result). That's here: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/src/cpp/jank/codegen/llvm_processor.cpp#L65-L68

Once we codegen the whole module, we then give it to LLVM. LLVM's LLJIT has a fn specifically for adding IR modules, which we call here: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/src/cpp/jank/jit/processor.cpp#L172

There's another case, which is interesting. jank loads object files from disk, when loading modules which have already been precompiled. For that, we use another LLJIT fn specifically for loading object files: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/src/cpp/jank/jit/processor.cpp#L151

In both of these cases, we'll want the symbols registered. The two fns we're calling are defined within LLVM here, in case that's helpful: https://github.com/llvm/llvm-project/blob/main/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp#L920-L928

I don't want you to feel left out in the rain to solve this on your own, so if I can help in any way, let me know. Happy to chat more in real-time, too, if that can help.


2 participants