Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax BALC[32] to 16-bit variant using trampolines #2

Open
wants to merge 13 commits into
base: nmips/gold_v7
Choose a base branch
from

Conversation

djtodoro
Copy link
Collaborator

@djtodoro djtodoro commented Aug 10, 2023

Enabled by default, to disable please use --no-relax-balc-trampolines.

The patch on the LLVM side: MediaTek-Labs/llvm-project#25

@farazs-github
Copy link
Member

Just for my reference, what kind of integration testing has this patch seen?

@djtodoro
Copy link
Collaborator Author

Just for my reference, what kind of integration testing has this patch seen?

  • check-all passes
  • built llvm-test-suite (no regression nor improvements)

@farazs-github
Copy link
Member

No changes in llvm-test-suite sounds suspect. This was with the accompanying llvm patch and -Os?

@djtodoro
Copy link
Collaborator Author

Well, I agree. I will mark this as DRAFT until I get back with the feedback.

@djtodoro djtodoro marked this pull request as draft August 13, 2023 12:16
@farazs-github
Copy link
Member

Well, if you are revising the patch, 2 points:

  1. We discussed that the placeholder relocation would be added to those calls that we do not want to be optimized. Since most of the code is expected to be compiled with -Os and have the optimization enabled, it would be more efficient to mark the ones we want to skip.
  2. Lets decide on a relocation name instead of NONE ... something to indicate that it is a non-stub/non-tramp call, something like R_NANOMIPS_NOTRAMP or R_NANOMIPS_DIRECT_CALL. The relocation number will be 77. I don't mind reviewing with R_NANOMIPS_NONE, but I am pretty sure we won't merge the patch without this change.

@djtodoro djtodoro marked this pull request as ready for review August 15, 2023 14:01
@djtodoro djtodoro force-pushed the syrmia/gold_v7_blac_stubs branch from 5a7f170 to ef560c3 Compare August 15, 2023 14:05
Copy link
Member

@farazs-github farazs-github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you evaluate this against some simple test cases, before resubmitting. I found it to be broken in the most obvious way, by inspecting a few random tests from llvm-test-suites (gcc-loops, sradKernel). Calls that get contracted in the Trampoline phase get expanded back in a later Expansion phase, except the ones where you are inserting the trampoline with an extra BC.

Here is a simple and obvious test:
.text
.linkrelax
.module pcrel
.globl test
.type test,@function
.ent test
test:
jrc $ra
.rept 512
nop32
.endm
.end test

.globl main
.type main,@function
.ent main

main:
balc test
nop
balc test
nop
balc test
nop
balc test
nop
balc test
nop
.end main

@farazs-github farazs-github changed the base branch from mtk/gold_v7 to nmips/gold_v7 September 27, 2023 04:17
gold/nanomips.cc Outdated
@@ -7297,6 +7653,10 @@ Target_nanomips<size, big_endian>::scan_reloc_section_for_transform(
i, prelocs))
continue;

unsigned int notramp_reloc =
has_notramp_reloc<size, big_endian>(r_offset, reloc_count, i, prelocs);
bool has_balc_stub2 = notramp_reloc == false;
Copy link
Member

@farazs-github farazs-github Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Parenthesis around the comparison expression.

  2. From my understanding, please confirm if this is right:
    Input: balc[32] with NOTRAMP -> no_tramp_reloc == true -> has_balc_stub2 == false
    Input: balc[32] without NOTRAMP -> no_tramp_reloc == false -> has_balc_stub2 == true
    Input: any other instruction without NOTRAMP -> no_tramp-reloc == false -> has_balc_stub2 == true??
    The question is, why is has_balc_stub2 true at all for instructions that are not 32-bit calls? At the least, the variable name is misnamed!

gold/nanomips.cc Outdated
@@ -7297,6 +7653,10 @@ Target_nanomips<size, big_endian>::scan_reloc_section_for_transform(
i, prelocs))
continue;

unsigned int notramp_reloc =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect type? Why not bool?

gold/nanomips.cc Outdated
{
unsigned int r_type = elfcpp::elf_r_type<size>(reloc.get_r_info());

if ((r_type != elfcpp::R_NANOMIPS_PC25_S1 || !has_balc_stub2)
Copy link
Member

@farazs-github farazs-github Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary nesting. Would be better to put all parts of the || expression at the same level.

Also note the comment about the name `has_balc_stub2' below. I'd expect has_balc_stub2 to imply that it is a 32-bit call instruction and already has the appropriate relocation and any checking to that effect would simply be in an assert( ).

@motokultivator motokultivator force-pushed the syrmia/gold_v7_blac_stubs branch from c617dee to 2800738 Compare January 10, 2024 19:52
AndrijaSyrmia and others added 9 commits January 22, 2024 13:02
gold/
	* nanomips-insn.def: Add new transformation.
	* nanomips.cc (Nanomips_expand_insn::type): Expand ADDIU[GP.B]
	to ADDIU[GP.W] if the offset is word-aligned and fits 21 bits.
	* testsuite/Makefile.am: Add new tests.
	* testsuite/Makefile.in: Re-generate.
	* testsuite/nanomips_gprel_out_of_range_small.s: New source.
	* testsuite/nanomips_gprel_out_of_range_small.t: New script.
	* testsuite/nanomips_gprel_out_of_range.sh: Check new test
	output.
Enabled by default, to disable use
  --no-relax-balc-trampolines
@AndrijaSyrmia AndrijaSyrmia force-pushed the syrmia/gold_v7_blac_stubs branch from feee56f to b0ad7a8 Compare February 13, 2024 10:44
@AndrijaSyrmia AndrijaSyrmia force-pushed the syrmia/gold_v7_blac_stubs branch from b0ad7a8 to 9894d97 Compare February 20, 2024 14:41
farazs-github pushed a commit that referenced this pull request Apr 28, 2024
When running test-case gdb.server/connect-with-no-symbol-file.exp on
aarch64-linux (specifically, an opensuse leap 15.5 container on a
fedora asahi 39 system), I run into:
...
(gdb) detach^M
Detaching from program: target:connect-with-no-symbol-file, process 185104^M
Ending remote debugging.^M
terminate called after throwing an instance of 'gdb_exception_error'^M
...

The detailed backtrace of the corefile is:
...
 (gdb) bt
 #0  0x0000ffff75504f54 in raise () from /lib64/libpthread.so.0
 #1  0x00000000007a86b4 in handle_fatal_signal (sig=6)
     at gdb/event-top.c:926
 #2  <signal handler called>
 #3  0x0000ffff74b977b4 in raise () from /lib64/libc.so.6
 #4  0x0000ffff74b98c18 in abort () from /lib64/libc.so.6
 #5  0x0000ffff74ea26f4 in __gnu_cxx::__verbose_terminate_handler() ()
    from /usr/lib64/libstdc++.so.6
 #6  0x0000ffff74ea011c in ?? () from /usr/lib64/libstdc++.so.6
 #7  0x0000ffff74ea0180 in std::terminate() () from /usr/lib64/libstdc++.so.6
 #8  0x0000ffff74ea0464 in __cxa_throw () from /usr/lib64/libstdc++.so.6
 bminor#9  0x0000000001548870 in throw_it (reason=RETURN_ERROR,
     error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...)
     at gdbsupport/common-exceptions.cc:203
 bminor#10 0x0000000001548920 in throw_verror (error=TARGET_CLOSE_ERROR,
     fmt=0x16c7810 "Remote connection closed", ap=...)
     at gdbsupport/common-exceptions.cc:211
 bminor#11 0x0000000001548a00 in throw_error (error=TARGET_CLOSE_ERROR,
     fmt=0x16c7810 "Remote connection closed")
     at gdbsupport/common-exceptions.cc:226
 bminor#12 0x0000000000ac8f2c in remote_target::readchar (this=0x233d3d90, timeout=2)
     at gdb/remote.c:9856
 bminor#13 0x0000000000ac9f04 in remote_target::getpkt (this=0x233d3d90,
     buf=0x233d40a8, forever=false, is_notif=0x0) at gdb/remote.c:10326
 bminor#14 0x0000000000acf3d0 in remote_target::remote_hostio_send_command
     (this=0x233d3d90, command_bytes=13, which_packet=17,
     remote_errno=0xfffff1a3cf38, attachment=0xfffff1a3ce88,
     attachment_len=0xfffff1a3ce90) at gdb/remote.c:12567
 bminor#15 0x0000000000ad03bc in remote_target::fileio_fstat (this=0x233d3d90, fd=3,
     st=0xfffff1a3d020, remote_errno=0xfffff1a3cf38)
     at gdb/remote.c:12979
 #16 0x0000000000c39878 in target_fileio_fstat (fd=0, sb=0xfffff1a3d020,
     target_errno=0xfffff1a3cf38) at gdb/target.c:3315
 #17 0x00000000007eee5c in target_fileio_stream::stat (this=0x233d4400,
     abfd=0x2323fc40, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:467
 #18 0x00000000007f012c in <lambda(bfd*, void*, stat*)>::operator()(bfd *,
     void *, stat *) const (__closure=0x0, abfd=0x2323fc40, stream=0x233d4400,
     sb=0xfffff1a3d020) at gdb/gdb_bfd.c:955
 #19 0x00000000007f015c in <lambda(bfd*, void*, stat*)>::_FUN(bfd *, void *,
     stat *) () at gdb/gdb_bfd.c:956
 #20 0x0000000000f9b838 in opncls_bstat (abfd=0x2323fc40, sb=0xfffff1a3d020)
     at bfd/opncls.c:665
 #21 0x0000000000f90adc in bfd_stat (abfd=0x2323fc40, statbuf=0xfffff1a3d020)
     at bfd/bfdio.c:431
 #22 0x000000000065fe20 in reopen_exec_file () at gdb/corefile.c:52
 #23 0x0000000000c3a3e8 in generic_mourn_inferior ()
     at gdb/target.c:3642
 #24 0x0000000000abf3f0 in remote_unpush_target (target=0x233d3d90)
     at gdb/remote.c:6067
 #25 0x0000000000aca8b0 in remote_target::mourn_inferior (this=0x233d3d90)
     at gdb/remote.c:10587
 #26 0x0000000000c387cc in target_mourn_inferior (
     ptid=<error reading variable: Cannot access memory at address 0x2d310>)
     at gdb/target.c:2738
 #27 0x0000000000abfff0 in remote_target::remote_detach_1 (this=0x233d3d90,
     inf=0x22fce540, from_tty=1) at gdb/remote.c:6421
 #28 0x0000000000ac0094 in remote_target::detach (this=0x233d3d90,
     inf=0x22fce540, from_tty=1) at gdb/remote.c:6436
 #29 0x0000000000c37c3c in target_detach (inf=0x22fce540, from_tty=1)
     at gdb/target.c:2526
 #30 0x0000000000860424 in detach_command (args=0x0, from_tty=1)
    at gdb/infcmd.c:2817
 #31 0x000000000060b594 in do_simple_func (args=0x0, from_tty=1, c=0x231431a0)
     at gdb/cli/cli-decode.c:94
 #32 0x00000000006108c8 in cmd_func (cmd=0x231431a0, args=0x0, from_tty=1)
     at gdb/cli/cli-decode.c:2741
 #33 0x0000000000c65a94 in execute_command (p=0x232e52f6 "", from_tty=1)
     at gdb/top.c:570
 #34 0x00000000007a7d2c in command_handler (command=0x232e52f0 "")
     at gdb/event-top.c:566
 #35 0x00000000007a8290 in command_line_handler (rl=...)
     at gdb/event-top.c:802
 #36 0x0000000000c9092c in tui_command_line_handler (rl=...)
     at gdb/tui/tui-interp.c:103
 #37 0x00000000007a750c in gdb_rl_callback_handler (rl=0x23385330 "detach")
     at gdb/event-top.c:258
 #38 0x0000000000d910f4 in rl_callback_read_char ()
     at readline/readline/callback.c:290
 #39 0x00000000007a7338 in gdb_rl_callback_read_char_wrapper_noexcept ()
     at gdb/event-top.c:194
 #40 0x00000000007a73f0 in gdb_rl_callback_read_char_wrapper
     (client_data=0x22fbf640) at gdb/event-top.c:233
 #41 0x0000000000cbee1c in stdin_event_handler (error=0, client_data=0x22fbf640)
     at gdb/ui.c:154
 #42 0x000000000154ed60 in handle_file_event (file_ptr=0x232be730, ready_mask=1)
     at gdbsupport/event-loop.cc:572
 #43 0x000000000154f21c in gdb_wait_for_event (block=1)
     at gdbsupport/event-loop.cc:693
 #44 0x000000000154dec4 in gdb_do_one_event (mstimeout=-1)
    at gdbsupport/event-loop.cc:263
 #45 0x0000000000910f98 in start_event_loop () at gdb/main.c:400
 #46 0x0000000000911130 in captured_command_loop () at gdb/main.c:464
 #47 0x0000000000912b5c in captured_main (data=0xfffff1a3db58)
     at gdb/main.c:1338
 #48 0x0000000000912bf4 in gdb_main (args=0xfffff1a3db58)
     at gdb/main.c:1357
 #49 0x00000000004170f4 in main (argc=10, argv=0xfffff1a3dcc8)
     at gdb/gdb.c:38
 (gdb)
...

The abort happens because a c++ exception escapes to c code, specifically
opncls_bstat in bfd/opncls.c.  Compiling with -fexceptions works around this.

Fix this by catching the exception just before it escapes, in stat_trampoline
and likewise in few similar spot.

Add a new template catch_exceptions to do so in a consistent way.

Tested on aarch64-linux.

Approved-by: Pedro Alves <[email protected]>

PR remote/31577
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31577
farazs-github pushed a commit that referenced this pull request May 5, 2024
If threads are disabled, either by --disable-threading explicitely, or by
missing std::thread support, you get the following ASAN error when
loading symbols:

==7310==ERROR: AddressSanitizer: heap-use-after-free on address 0x614000002128 at pc 0x00000098794a bp 0x7ffe37e6af70 sp 0x7ffe37e6af68
READ of size 1 at 0x614000002128 thread T0
    #0 0x987949 in index_cache_store_context::store() const ../../gdb/dwarf2/index-cache.c:163
    #1 0x943467 in cooked_index_worker::write_to_cache(cooked_index const*, deferred_warnings*) const ../../gdb/dwarf2/cooked-index.c:601
    #2 0x1705e39 in std::function<void ()>::operator()() const /gcc/9/include/c++/9.2.0/bits/std_function.h:690
    #3 0x1705e39 in gdb::task_group::impl::~impl() ../../gdbsupport/task-group.cc:38

0x614000002128 is located 232 bytes inside of 408-byte region [0x614000002040,0x6140000021d8)
freed by thread T0 here:
    #0 0x7fd75ccf8ea5 in operator delete(void*, unsigned long) ../../.././libsanitizer/asan/asan_new_delete.cc:177
    #1 0x9462e5 in cooked_index::index_for_writing() ../../gdb/dwarf2/cooked-index.h:689
    #2 0x9462e5 in operator() ../../gdb/dwarf2/cooked-index.c:657
    #3 0x9462e5 in _M_invoke /gcc/9/include/c++/9.2.0/bits/std_function.h:300

It's happening because cooked_index_worker::wait always returns true in
this case, which tells cooked_index::wait it can delete the m_state
cooked_index_worker member, but cooked_index_worker::write_to_cache tries
to access it immediately afterwards.

Fixed by making cooked_index_worker::wait only return true if desired_state
is CACHE_DONE, same as if threading was enabled, so m_state will not be
prematurely deleted.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31694
Approved-By: Tom Tromey <[email protected]>
Fixed issues in align

Changed picking balc tramp candidates a bit

Reimplemented processing of R_NANOMIPS_NOTRAMP
@AndrijaSyrmia AndrijaSyrmia force-pushed the syrmia/gold_v7_blac_stubs branch from 9894d97 to f1b6d09 Compare May 28, 2024 15:36
farazs-github pushed a commit that referenced this pull request Jul 19, 2024
Similar to the x86_64 testcases, some .s files contain the corresponding
CFI directives.  This helps in validating the synthesized CFI by running
those tests with and without the --scfi=experimental command line
option.

GAS issues some diagnostics, enabled by default, with
--scfi=experimental.  The diagnostics have been added with an intent to
help user correct inadvertent errors in their hand-written asm.  An
error is issued when GAS finds that input asm is not amenable to
accurate CFI synthesis.  The existing scfi-diag-*.s tests in the
gas/testsuite/gas/scfi/x86_64 directory test some SCFI diagnostics
already:

      - (#1) "Warning: SCFI: Asymetrical register restore"
      - (#2) "Error: SCFI: usage of REG_FP as scratch not supported"
      - (#3) "Error: SCFI: unsupported stack manipulation pattern"
      - (#4) "Error: untraceable control flow for func 'XXX'"

In the newly added aarch64 testsuite, further tests for additional
diagnostics have been added:
 - scfi-diag-1.s in this patch highlights an aarch64-specific diagnostic:
   (#5) "Warning: SCFI: ignored probable save/restore op with reg offset"

Additionally, some testcases are added to showcase the (currently)
unsupported patterns, e.g., scfi-unsupported-1.s
        mov     x16, 4384
        sub     sp, sp, x16

gas/testsuite/:
	* gas/scfi/README: Update comment to include aarch64.
	* gas/scfi/aarch64/scfi-aarch64.exp: New file.
	* gas/scfi/aarch64/ginsn-arith-1.l: New test.
	* gas/scfi/aarch64/ginsn-arith-1.s: New test.
	* gas/scfi/aarch64/ginsn-cofi-1.l: New test.
	* gas/scfi/aarch64/ginsn-cofi-1.s: New test.
	* gas/scfi/aarch64/ginsn-ldst-1.l: New test.
	* gas/scfi/aarch64/ginsn-ldst-1.s: New test.
	* gas/scfi/aarch64/scfi-callee-saved-fp-1.d: New test.
	* gas/scfi/aarch64/scfi-callee-saved-fp-1.l: New test.
	* gas/scfi/aarch64/scfi-callee-saved-fp-1.s: New test.
	* gas/scfi/aarch64/scfi-callee-saved-fp-2.d: New test.
	* gas/scfi/aarch64/scfi-callee-saved-fp-2.l: New test.
	* gas/scfi/aarch64/scfi-callee-saved-fp-2.s: New test.
	* gas/scfi/aarch64/scfi-cb-1.d: New test.
	* gas/scfi/aarch64/scfi-cb-1.l: New test.
	* gas/scfi/aarch64/scfi-cb-1.s: New test.
	* gas/scfi/aarch64/scfi-cfg-1.d: New test.
	* gas/scfi/aarch64/scfi-cfg-1.l: New test.
	* gas/scfi/aarch64/scfi-cfg-1.s: New test.
	* gas/scfi/aarch64/scfi-cfg-2.d: New test.
	* gas/scfi/aarch64/scfi-cfg-2.l: New test.
	* gas/scfi/aarch64/scfi-cfg-2.s: New test.
	* gas/scfi/aarch64/scfi-cfg-3.d: New test.
	* gas/scfi/aarch64/scfi-cfg-3.l: New test.
	* gas/scfi/aarch64/scfi-cfg-3.s: New test.
	* gas/scfi/aarch64/scfi-cfg-4.l: New test.
	* gas/scfi/aarch64/scfi-cfg-4.s: New test.
	* gas/scfi/aarch64/scfi-cond-br-1.d: New test.
	* gas/scfi/aarch64/scfi-cond-br-1.l: New test.
	* gas/scfi/aarch64/scfi-cond-br-1.s: New test.
	* gas/scfi/aarch64/scfi-diag-1.l: New test.
	* gas/scfi/aarch64/scfi-diag-1.s: New test.
	* gas/scfi/aarch64/scfi-diag-2.l: New test.
	* gas/scfi/aarch64/scfi-diag-2.s: New test.
	* gas/scfi/aarch64/scfi-diag-3.l: New test.
	* gas/scfi/aarch64/scfi-diag-3.s: New test.
	* gas/scfi/aarch64/scfi-ldrp-1.d: New test.
	* gas/scfi/aarch64/scfi-ldrp-1.l: New test.
	* gas/scfi/aarch64/scfi-ldrp-1.s: New test.
	* gas/scfi/aarch64/scfi-ldrp-2.d: New test.
	* gas/scfi/aarch64/scfi-ldrp-2.l: New test.
	* gas/scfi/aarch64/scfi-ldrp-2.s: New test.
	* gas/scfi/aarch64/scfi-ldstnap-1.d: New test.
	* gas/scfi/aarch64/scfi-ldstnap-1.l: New test.
	* gas/scfi/aarch64/scfi-ldstnap-1.s: New test.
	* gas/scfi/aarch64/scfi-strp-1.d: New test.
	* gas/scfi/aarch64/scfi-strp-1.l: New test.
	* gas/scfi/aarch64/scfi-strp-1.s: New test.
	* gas/scfi/aarch64/scfi-strp-2.d: New test.
	* gas/scfi/aarch64/scfi-strp-2.l: New test.
	* gas/scfi/aarch64/scfi-strp-2.s: New test.
	* gas/scfi/aarch64/scfi-unsupported-1.l: New test.
	* gas/scfi/aarch64/scfi-unsupported-1.s: New test.
	* gas/scfi/aarch64/scfi-unsupported-2.l: New test.
	* gas/scfi/aarch64/scfi-unsupported-2.s: New test.
farazs-github pushed a commit that referenced this pull request Jul 24, 2024
On arm-linux, I run into:
...
PASS: gdb.ada/mi_task_arg.exp: mi runto task_switch.break_me
Expecting: ^(-stack-list-arguments 1[^M
]+)?(\^done,stack-args=\[frame={level="0",args=\[\]},frame={level="1",args=\[{name="<_task>",value="0x[0-9A-Fa-f]+"}(,{name="<_taskL>",value="[0-9]+"})?\]},frame={level="2",args=\[({name="self_id",value="(0x[0-9A-Fa-f]+|<optimized out>)"})?\]},.*[^M
]+[(]gdb[)] ^M
[ ]*)
-stack-list-arguments 1^M
^done,stack-args=[frame={level="0",args=[]},frame={level="1",args=[{name="<_task>",value="0x40bc48"}]},frame={level="2",args=[]}]^M
(gdb) ^M
FAIL: gdb.ada/mi_task_arg.exp: -stack-list-arguments 1 (unexpected output)
...

The problem is that the test-case expects a level 3 frame, but there is none.

This can be reproduced using cli bt:
...
 $ gdb -q -batch outputs/gdb.ada/mi_task_arg/task_switch \
   -ex "b task_switch.break_me" \
   -ex run \
   -ex bt
 Breakpoint 1 at 0x34b4: file task_switch.adb, line 57.

 Thread 3 "my_caller" hit Breakpoint 1, task_switch.break_me () \
   at task_switch.adb:57
 57	      null;
 #0  task_switch.break_me () at task_switch.adb:57
 #1  0x00403424 in task_switch.caller (<_task>=0x40bc48) at task_switch.adb:51
 #2  0xf7f95a08 in ?? () from /lib/arm-linux-gnueabihf/libgnarl-12.so
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)
...

The purpose of the test-case is printing the frame at level 1, so I don't
think we should bother about the presence of the frame at level 3.

Fix this by allowing the backtrace to stop at level 2.

Tested on arm-linux.

Approved-By: Luis Machado <[email protected]>
Approved-By: Andrew Burgess <[email protected]>
farazs-github pushed a commit that referenced this pull request Jul 31, 2024
Since commit b1da98a ("gdb: remove use of alloca in
new_macro_definition"), if cached_argv is empty, we call macro_bcache
with a nullptr data.  This ends up caught by UBSan deep down in the
bcache code:

    $ ./gdb -nx -q --data-directory=data-directory  /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp -readnow
    Reading symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp...
    Expanding full symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp...
    /home/smarchi/src/binutils-gdb/gdb/bcache.c:195:12: runtime error: null pointer passed as argument 2, which is declared to never be null

The backtrace:

    #1  0x00007ffff619a05d in __ubsan::__ubsan_handle_nonnull_arg_abort (Data=<optimized out>) at ../../../../src/libsanitizer/ubsan/ubsan_handlers.cpp:750
    #2  0x000055556337fba2 in gdb::bcache::insert (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.c:195
    #3  0x0000555564b49222 in gdb::bcache::insert<char const*, void> (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.h:158
    #4  0x0000555564b481fa in macro_bcache<char const*> (t=0x62100007ae70, addr=0x0, len=0) at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:117
    #5  0x0000555564b42b4a in new_macro_definition (t=0x62100007ae70, kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:573
    #6  0x0000555564b44674 in macro_define_internal (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:777
    #7  0x0000555564b44ae2 in macro_define_function (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:816
    #8  0x0000555563f62fc8 in parse_macro_definition (file=0x6210000ab9e0, line=469, body=0x62a00003af2a "__va_arg_pack() __builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:203

This can be reproduced by running gdb.base/macscp.exp.  Avoid calling
macro_bcache if the macro doesn't have any arguments.

Change-Id: I33b5a7c3b3a93d5adba98983fcaae9c8522c383d
farazs-github pushed a commit that referenced this pull request Aug 2, 2024
Some flavors of indirect call and jmp instructions were not being
handled earlier, leading to a GAS error (#1):
  (#1) "Error: SCFI: unhandled op 0xff may cause incorrect CFI"

Not handling jmp/call (direct or indirect) ops is an error (as shown
above) because SCFI needs an accurate CFG to synthesize CFI correctly.
Recall that the presence of indirect jmp/call, however, does make the
CFG ineligible for SCFI. In other words, generating the ginsns for them
now, will eventually cause SCFI to bail out later with an error (#2)
anyway:
  (#2) "Error: untraceable control flow for func 'XXX'"

The first error (#1) gives the impression of missing functionality in
GAS.  So, it seems cleaner to synthesize a GINSN_TYPE_JUMP /
GINSN_TYPE_CALL now in the backend, and let SCFI machinery complain with
the error as expected.

The handling for these indirect jmp/call instructions is similar, so
reuse the code by carving out a function for the same.

Adjust the testcase to include the now handled jmp/call instructions as
well.

gas/
	* config/tc-i386-ginsn.c (x86_ginsn_indirect_branch): New
	function.
	(x86_ginsn_new): Refactor out functionality to above.

gas/testsuite/
	* gas/scfi/x86_64/ginsn-cofi-1.l: Adjust the output.
	* gas/scfi/x86_64/ginsn-cofi-1.s: Add further varieties of
	jmp/call opcodes.
farazs-github pushed a commit that referenced this pull request Sep 5, 2024
With test-case gdb.dwarf2/dw2-lines.exp on arm-linux, I run into:
...
(gdb) break bar_label^M
Breakpoint 2 at 0x4004f6: file dw2-lines.c, line 29.^M
(gdb) continue^M
Continuing.^M
^M
Breakpoint 2, bar () at dw2-lines.c:29^M
29        foo (2);^M
(gdb) PASS: $exp: cv=2: cdw=32: lv=2: ldw=32: continue to breakpoint: foo \(1\)
...

The pass is incorrect because the continue lands at line 29 with "foo (2)"
instead of line line 27 with "foo (1)".

A minimal version is:
...
$ gdb -q -batch dw2-lines.cv-2-cdw-32-lv-2-ldw-32 -ex "b bar_label"
Breakpoint 1 at 0x4f6: file dw2-lines.c, line 29.
...
where:
...
000004ec <bar>:
 4ec:	b580      	push	{r7, lr}
 4ee:	af00      	add	r7, sp, #0

000004f0 <bar_label>:
 4f0:	2001      	movs	r0, #1
 4f2:	f7ff fff1 	bl	4d8 <foo>

000004f6 <bar_label_2>:
 4f6:	2002      	movs	r0, #2
 4f8:	f7ff ffee 	bl	4d8 <foo>
...

So, how does this happen?  In short:
- skip_prologue_sal calls arm_skip_prologue with pc == 0x4ec,
- thumb_analyze_prologue returns 0x4f2
  (overshooting by 1 insn, PR tdep/31981), and
- skip_prologue_sal decides that we're mid-line, and updates to 0x4f6.

However, this is a test-case about .debug_line info, so why didn't arm_skip_prologue
use the line info to skip the prologue?

The answer is that the line info starts at bar_label, not at bar.

Fixing that allows us to work around PR tdep/31981.

Likewise in gdb.dwarf2/dw2-line-number-zero.exp.

Instead, add a new test-case gdb.arch/skip-prologue.exp that is dedicated to
checking quality of architecture-specific prologue analysis, without being
written in an architecture-specific way.

If fails on arm-linux for both marm and mthumb:
...
FAIL: gdb.arch/skip-prologue.exp: f2: $bp_addr == $prologue_end_addr (skipped too much)
FAIL: gdb.arch/skip-prologue.exp: f4: $bp_addr == $prologue_end_addr (skipped too much)
...
and passes for:
- x86_64-linux for {m64,m32}x{-fno-PIE/-no-pie,-fPIE/-pie}
- aarch64-linux.

Tested on arm-linux.
farazs-github pushed a commit that referenced this pull request Sep 8, 2024
When GDB opens a core file, in 'core_target::build_file_mappings ()',
we collection information about the files that are mapped into the
core file, specifically, the build-id and the DT_SONAME attribute for
the file, which will be set for some shared libraries.

We then cache the DT_SONAME to build-id information on the core file
bfd object in the function set_cbfd_soname_build_id.

Later, when we are loading the shared libraries for the core file, we
can use the library's file name to look in the DT_SONAME to build-id
map, and, if we find a matching entry, we can use the build-id to
validate that we are loading the correct shared library.

This works OK, but has some limitations: not every shared library will
have a DT_SONAME attribute.  Though it is good practice to add such an
attribute, it's not required.  A library without this attribute will
not have its build-id checked, which can lead to GDB loading the wrong
shared library.

What I want to do in this commit is to improve GDB's ability to use
the build-ids extracted in core_target::build_file_mappings to both
validate the shared libraries being loaded, and then to use these
build-ids to potentially find (via debuginfod) the shared library.

To do this I propose making the following changes to GDB:

(1) Rather than just recording the DT_SONAME to build-id mapping in
set_cbfd_soname_build_id, we should also record, the full filename to
build-id mapping, and also the memory ranges to build-id mapping for
every memory range covered by every mapped file.

(2) Add a new callback solib_ops::find_solib_addr.  This callback
takes a solib object and returns an (optional) address within the
inferior that is part of this library.  We can use this address to
find a mapped file using the stored memory ranges which will increase
the cases in which a match can be found.

(3) Move the mapped file record keeping out of solib.c and into
corelow.c.  Future commits will make use of this information from
other parts of GDB.  This information was never solib specific, it
lived in the solib.c file because that was the only user of the data,
but really, the data is all about the core file, and should be stored
in core_target, other parts of GDB can then query this data as needed.

Now, when we load a shared library for a core file, we do the
following lookups:

  1. Is the exact filename of the shared library found in the filename
  to build-id map?  If so then use this build-id for validation.

  2. Find an address within the shared library using ::find_solib_addr
  and then look for an entry in the mapped address to build-id map.
  If an entry is found then use this build-id.

  3. Finally, look in the soname to build-id map.  If an entry is
  found then use this build-id.

The addition of step #2 here means that GDB is now far more likely to
find a suitable build-id for a shared library.  Having acquired a
build-id the existing code for using debuginfod to lookup a shared
library object can trigger more often.

On top of this, we also create a build-id to filename map.  This is
useful as often a shared library is implemented as a symbolic link to
the actual shared library file.  The mapped file information is stored
based on the actual, real file name, while the shared library
information holds the original symbolic link file name.

If when loading the shared library, we find the symbolic link has
disappeared, we can use the build-id to file name map to check if the
actual file is still around, if it is (and if the build-id matches)
then we can fall back to use that file.  This is another way in which
we can slightly increase the chances that GDB will find the required
files when loading a core file.

Adding all of the above required pretty much a full rewrite of the
existing set_cbfd_soname_build_id function and the corresponding
get_cbfd_soname_build_id function, so I have taken the opportunity to
move the information caching out of solib.c and into corelow.c where
it is now accessed through the function core_target_find_mapped_file.

At this point the benefit of this move is not entirely obvious, though
I don't think the new location is significantly worse than where it
was originally.  The benefit though is that the cached information is
no longer tied to the shared library loading code.

I already have a second set of patches (not in this series) that make
use of this caching from elsewhere in GDB.  I've not included those
patches in this series as this series is already pretty big, but even
if those follow up patches don't arrive, I think the new location is
just as good as the original location.

Rather that caching the information within the core file BFD via the
registry mechanism, the information used for the mapped file lookup is
now stored within the core_file target directly.
farazs-github pushed a commit that referenced this pull request Sep 9, 2024
The commit:

  commit c6b4867
  Date:   Thu Mar 30 19:21:22 2023 +0100

      gdb: parse pending breakpoint thread/task immediately

Introduce a use bug where the value of a temporary variable was being
used after it had gone out of scope.  This was picked up by the
address sanitizer and would result in this error:

  (gdb) maintenance selftest create_breakpoint_parse_arg_string
  Running selftest create_breakpoint_parse_arg_string.
  =================================================================
  ==2265825==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fbb08046511 at pc 0x000001632230 bp 0x7fff7c2fb770 sp 0x7fff7c2fb768
  READ of size 1 at 0x7fbb08046511 thread T0
      #0 0x163222f in create_breakpoint_parse_arg_string(char const*, std::unique_ptr<char, gdb::xfree_deleter<char> >*, int*, int*, int*, std::unique_ptr<char, gdb::xfree_deleter<char> >*, bool*) ../../src/gdb/break-cond-parse.c:496
      #1 0x1633026 in test ../../src/gdb/break-cond-parse.c:582
      #2 0x163391b in create_breakpoint_parse_arg_string_tests ../../src/gdb/break-cond-parse.c:649
      #3 0x12cfebc in void std::__invoke_impl<void, void (*&)()>(std::__invoke_other, void (*&)()) /usr/include/c++/13/bits/invoke.h:61
      #4 0x12cc8ee in std::enable_if<is_invocable_r_v<void, void (*&)()>, void>::type std::__invoke_r<void, void (*&)()>(void (*&)()) /usr/include/c++/13/bits/invoke.h:111
      #5 0x12c81e5 in std::_Function_handler<void (), void (*)()>::_M_invoke(std::_Any_data const&) /usr/include/c++/13/bits/std_function.h:290
      #6 0x18bb51d in std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591
      #7 0x4193ef9 in selftests::run_tests(gdb::array_view<char const* const>, bool) ../../src/gdbsupport/selftest.cc:100
      #8 0x21c2206 in maintenance_selftest ../../src/gdb/maint.c:1172
      ... etc ...

The problem was caused by three lines like this one:

  thread_info *thr
    = parse_thread_id (std::string (t.get_value ()).c_str (), &tmptok);

After parsing the thread-id TMPTOK would be left pointing into the
temporary string which had been created on this line.  When on the
next line we did this:

  gdb_assert (*tmptok == '\0');

The value of *TMPTOK is undefined.

Fix this by creating the std::string earlier in the scope.  Now the
contents of the string will remain valid when we check *TMPTOK.  The
address sanitizer issue is now resolved.
farazs-github pushed a commit that referenced this pull request Oct 18, 2024
When building gdb with gcc 12 and -fsanitize=threads while renabling
background dwarf reading by setting dwarf_synchronous to false, I run into:
...
(gdb) file amd64-watchpoint-downgrade
Reading symbols from amd64-watchpoint-downgrade...
(gdb) watch global_var
==================
WARNING: ThreadSanitizer: data race (pid=20124)
  Read of size 8 at 0x7b80000500d8 by main thread:
    #0 cooked_index_entry::full_name(obstack*, bool) const cooked-index.c:220
    #1 cooked_index::get_main_name(obstack*, language*) const cooked-index.c:735
    #2 cooked_index_worker::wait(cooked_state, bool) cooked-index.c:559
    #3 cooked_index::wait(cooked_state, bool) cooked-index.c:631
    #4 cooked_index_functions::wait(objfile*, bool) cooked-index.h:729
    #5 cooked_index_functions::compute_main_name(objfile*) cooked-index.h:806
    #6 objfile::compute_main_name() symfile-debug.c:461
    #7 find_main_name symtab.c:6503
    #8 main_language() symtab.c:6608
    bminor#9 set_initial_language_callback symfile.c:1634
    bminor#10 get_current_language() language.c:96
    ...

  Previous write of size 8 at 0x7b80000500d8 by thread T1:
    #0 cooked_index_shard::finalize(parent_map_map const*) \
         dwarf2/cooked-index.c:409
    #1 operator() cooked-index.c:663
    ...

  ...

SUMMARY: ThreadSanitizer: data race cooked-index.c:220 in \
  cooked_index_entry::full_name(obstack*, bool) const
==================
Hardware watchpoint 1: global_var
(gdb) PASS: gdb.arch/amd64-watchpoint-downgrade.exp: watch global_var
...

This was also reported in PR31715.

This is due do gcc PR110799 [1], generating wrong code with
-fhoist-adjacent-loads, and causing a false positive for
-fsanitize=threads.

Work around the gcc PR by forcing -fno-hoist-adjacent-loads for gcc <= 13
and -fsanitize=threads.

Tested in that same configuration on x86_64-linux.  Remaining ThreadSanitizer
problems are the ones reported in PR31626 (gdb.rust/dwindex.exp) and
PR32247 (gdb.trace/basic-libipa.exp).

PR gdb/31715
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31715

Tested-By: Bernd Edlinger <[email protected]>
Approved-By: Tom Tromey <[email protected]>

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
farazs-github pushed a commit that referenced this pull request Oct 29, 2024
When calling a function with double arguments, I get this asan error:

==7920==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x0053131ece38 at pc 0x7ff79697a68f bp 0x0053131ec790 sp 0x0053131ebf40
READ of size 16 at 0x0053131ece38 thread T0
    #0 0x7ff79697a68e in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long long), void const*, void const*, unsigned long long) C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:814
    #1 0x7ff79697aebd in memcmp C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:845
    #2 0x7ff79697aebd in memcmp C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:840
    #3 0x7ff7927e237f in regcache::raw_write(int, gdb::array_view<unsigned char const>) C:/gdb/src/gdb.git/gdb/regcache.c:874
    #4 0x7ff7927e3c85 in regcache::cooked_write(int, gdb::array_view<unsigned char const>) C:/gdb/src/gdb.git/gdb/regcache.c:914
    #5 0x7ff7927e5d89 in regcache::cooked_write(int, unsigned char const*) C:/gdb/src/gdb.git/gdb/regcache.c:933
    #6 0x7ff7911d5965 in amd64_windows_store_arg_in_reg C:/gdb/src/gdb.git/gdb/amd64-windows-tdep.c:216

Address 0x0053131ece38 is located in stack of thread T0 at offset 40 in frame
    #0 0x7ff7911d565f in amd64_windows_store_arg_in_reg C:/gdb/src/gdb.git/gdb/amd64-windows-tdep.c:208

  This frame has 4 object(s):
    [32, 40) 'buf' (line 211) <== Memory access at offset 40 overflows this variable

It's because the first 4 double arguments are passed via XMM registers,
and they need a buffer of 16 bytes, even if we only use 8 bytes of them.

Approved-By: Tom Tromey <[email protected]>
farazs-github pushed a commit that referenced this pull request Nov 1, 2024
On Windows gcore is not implemented, and if you try it, you get an
heap-use-after-free error:

(gdb) gcore C:/gdb/build64/gdb-git-python3/gdb/testsuite/outputs/gdb.base/gcore-buffer-overflow/gcore-buffer-overflow.test
warning: cannot close "=================================================================
==10108==ERROR: AddressSanitizer: heap-use-after-free on address 0x1259ea503110 at pc 0x7ff6806e3936 bp 0x0062e01ed990 sp 0x0062e01ed140
READ of size 111 at 0x1259ea503110 thread T0
    #0 0x7ff6806e3935 in strlen C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:391
    #1 0x7ff6807169c4 in __pformat_puts C:/gcc/src/mingw-w64-v12.0.0/mingw-w64-crt/stdio/mingw_pformat.c:558
    #2 0x7ff6807186c1 in __mingw_pformat C:/gcc/src/mingw-w64-v12.0.0/mingw-w64-crt/stdio/mingw_pformat.c:2514
    #3 0x7ff680713614 in __mingw_vsnprintf C:/gcc/src/mingw-w64-v12.0.0/mingw-w64-crt/stdio/mingw_vsnprintf.c:41
    #4 0x7ff67f34419f in vsnprintf(char*, unsigned long long, char const*, char*) C:/msys64/mingw64/x86_64-w64-mingw32/include/stdio.h:484
    #5 0x7ff67f34419f in string_vprintf[abi:cxx11](char const*, char*) C:/gdb/src/gdb.git/gdbsupport/common-utils.cc:106
    #6 0x7ff67b37b739 in cli_ui_out::do_message(ui_file_style const&, char const*, char*) C:/gdb/src/gdb.git/gdb/cli-out.c:227
    #7 0x7ff67ce3d030 in ui_out::call_do_message(ui_file_style const&, char const*, ...) C:/gdb/src/gdb.git/gdb/ui-out.c:571
    #8 0x7ff67ce4255a in ui_out::vmessage(ui_file_style const&, char const*, char*) C:/gdb/src/gdb.git/gdb/ui-out.c:740
    bminor#9 0x7ff67ce2c873 in ui_file::vprintf(char const*, char*) C:/gdb/src/gdb.git/gdb/ui-file.c:73
    bminor#10 0x7ff67ce7f83d in gdb_vprintf(ui_file*, char const*, char*) C:/gdb/src/gdb.git/gdb/utils.c:1881
    bminor#11 0x7ff67ce7f83d in vwarning(char const*, char*) C:/gdb/src/gdb.git/gdb/utils.c:181
    bminor#12 0x7ff67f3530eb in warning(char const*, ...) C:/gdb/src/gdb.git/gdbsupport/errors.cc:33
    bminor#13 0x7ff67baed27f in gdb_bfd_close_warning C:/gdb/src/gdb.git/gdb/gdb_bfd.c:437
    bminor#14 0x7ff67baed27f in gdb_bfd_close_or_warn C:/gdb/src/gdb.git/gdb/gdb_bfd.c:646
    bminor#15 0x7ff67baed27f in gdb_bfd_unref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.c:739
    #16 0x7ff68094b6f2 in gdb_bfd_ref_policy::decref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.h:82
    #17 0x7ff68094b6f2 in gdb::ref_ptr<bfd, gdb_bfd_ref_policy>::~ref_ptr() C:/gdb/src/gdb.git/gdbsupport/gdb_ref_ptr.h:91
    #18 0x7ff67badf4d2 in gcore_command C:/gdb/src/gdb.git/gdb/gcore.c:176

0x1259ea503110 is located 16 bytes inside of 4064-byte region [0x1259ea503100,0x1259ea5040e0)
freed by thread T0 here:
    #0 0x7ff6806b1687 in free C:/gcc/src/gcc-14.2.0/libsanitizer/asan/asan_malloc_win.cpp:90
    #1 0x7ff67f2ae807 in objalloc_free C:/gdb/src/gdb.git/libiberty/objalloc.c:187
    #2 0x7ff67d7f56e3 in _bfd_free_cached_info C:/gdb/src/gdb.git/bfd/opncls.c:247
    #3 0x7ff67d7f2782 in _bfd_delete_bfd C:/gdb/src/gdb.git/bfd/opncls.c:180
    #4 0x7ff67d7f5df9 in bfd_close_all_done C:/gdb/src/gdb.git/bfd/opncls.c:960
    #5 0x7ff67d7f62ec in bfd_close C:/gdb/src/gdb.git/bfd/opncls.c:925
    #6 0x7ff67baecd27 in gdb_bfd_close_or_warn C:/gdb/src/gdb.git/gdb/gdb_bfd.c:643
    #7 0x7ff67baecd27 in gdb_bfd_unref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.c:739
    #8 0x7ff68094b6f2 in gdb_bfd_ref_policy::decref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.h:82
    bminor#9 0x7ff68094b6f2 in gdb::ref_ptr<bfd, gdb_bfd_ref_policy>::~ref_ptr() C:/gdb/src/gdb.git/gdbsupport/gdb_ref_ptr.h:91
    bminor#10 0x7ff67badf4d2 in gcore_command C:/gdb/src/gdb.git/gdb/gcore.c:176

It happens because gdb_bfd_close_or_warn uses a bfd-internal name for
the failing-close warning, after the close is finished, and the name
already freed:

static int
gdb_bfd_close_or_warn (struct bfd *abfd)
{
  int ret;
  const char *name = bfd_get_filename (abfd);

  for (asection *sect : gdb_bfd_sections (abfd))
    free_one_bfd_section (sect);

  ret = bfd_close (abfd);

  if (!ret)
    gdb_bfd_close_warning (name,
			   bfd_errmsg (bfd_get_error ()));

  return ret;
}

Fixed by making a copy of the name for the warning.

Approved-By: Andrew Burgess <[email protected]>
farazs-github pushed a commit that referenced this pull request Dec 5, 2024
After the commit:

  commit b9de07a
  Date:   Thu Oct 10 11:37:34 2024 +0100

      gdb: fix handling of DW_AT_entry_pc of inlined subroutines

GDB's buildbot CI testing highlighted this assertion failure:

  (gdb) c
  Continuing.
  ../../binutils-gdb/gdb/block.h:203: internal-error: set_entry_pc: Assertion `start >= this->start () && start < this->end ()' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  ----- Backtrace -----
  FAIL: gdb.base/break-probes.exp: run til our library loads (GDB internal error)

This assertion was in the new function set_entry_pc and is asserting
that the default_entry_pc() value is within the blocks start/end
range.

The default_entry_pc() is the value GDB will use as the entry-pc if
the DWARF doesn't specifically override the entry-pc.  This value is
calculated as:

  1. The start address of the first sub-range within the block, if the
  block has more than 1 range, or

  2. The low address (from DW_AT_low_pc) for the block.

If the block only has a single range then this means the block was
defined with low/high pc attributes (case #2 above).  These low/high
pc values are what block::start() and block::end() return.  This means
that by definition, if the block is continuous, the above assert
cannot trigger as 'start', the default_entry_pc() would be equivalent
to block::start().

This means that, for the assert to trigger, the block must have
multiple ranges, and the first address of the first range is not
within the blocks low/high address range.  This seems wrong.

I inspected the state at the time the assert triggered and discovered
the block's start() address.  Then I removed the assert and restarted
GDB.  I was now able to inspect the blocks at the offending address:

  (gdb) maintenance info blocks 0x7ffff7dddaa4
  Blocks at 0x7ffff7dddaa4:
    from objfile: [(objfile *) 0x44a37f0] /lib64/ld-linux-x86-64.so.2
 [(block *) 0x46b30c0] 0x7ffff7ddd5a0..0x7ffff7dde8a6
    entry pc: 0x7ffff7ddd5a0
    is global block
    symbol count: 4
    is contiguous
  [(block *) 0x46b3020] 0x7ffff7ddd5a0..0x7ffff7dde8a6
    entry pc: 0x7ffff7ddd5a0
    is static block
    symbol count: 9
    is contiguous
  [(block *) 0x46b2f70] 0x7ffff7ddda00..0x7ffff7dddac3
    entry pc: 0x7ffff7ddda00
    function: __GI__dl_find_dso_for_object
    symbol count: 4
    is contiguous
  [(block *) 0x46b2e10] 0x7ffff7dddaa4..0x7ffff7dddac3
    entry pc: 0x7ffff7dddaa4
    inline function: __GI__dl_find_dso_for_object
    symbol count: 5
    is contiguous
  [(block *) 0x46b2a40] 0x7ffff7dddaa4..0x7ffff7dddac3
    entry pc: 0x7ffff7dddaa4
    symbol count: 1
    is contiguous
  [(block *) 0x46b2970] 0x7ffff7dddaa4..0x7ffff7dddac3
    entry pc: 0x7ffff7dddaa4
    symbol count: 2
    address ranges:
      0x7ffff7ddda0e..0x7ffff7ddda77
      0x7ffff7ddda90..0x7ffff7ddda96

I've left everything in for context, but the only really interesting
bit is the very last block, it's low/high range is:

  0x7ffff7dddaa4..0x7ffff7dddac3

but it has separate ranges:

  0x7ffff7ddda0e..0x7ffff7ddda77
  0x7ffff7ddda90..0x7ffff7ddda96

which are all outside the low/high range.  This is what triggers the
assert.  But why does that block exist at all?

What I believe is happening is that we're running into a bug in older
versions of GCC.  The buildbot failure was with an 8.5 gcc, and Tom de
Vries also reported seeing failures when using version 7 and 8 gcc,
but not with gcc 9 and onward.

Looking at the DWARF I can see that the problematic block is created
from this DIE:

  <4><15efb>: Abbrev Number: 83 (DW_TAG_lexical_block)
     <15efc>   DW_AT_abstract_origin: <0x15e9f>
     <15efe>   DW_AT_low_pc      : 0x7ffff7dddaa4
     <15f06>   DW_AT_high_pc     : 31

which links via DW_AT_abstract_origin to:

  <2><15e9f>: Abbrev Number: 80 (DW_TAG_lexical_block)
     <15ea0>   DW_AT_ranges      : 0x38e0
     <15ea4>   DW_AT_sibling     : <0x15eca>

And so we can see that <15efb> has got both low/high pc attributes and
a ranges attribute.

If I widen my checking to parents of DIE <15efb> then I see that they
also have DW_AT_abstract_origin, however, there is something
interesting going on, the parent DIEs are linking to a different DIE
tree than <15efb>.

What I believe is happening is this, we have an abstract instance
tree, this is rooted at a DW_AT_subprogram, and contains all the
blocks, variables, parameters, etc, that you would expect.  As this is
an abstract instance, then there are no low/high pc attributes, and no
ranges attributes in this tree.  This makes sense.

Now elsewhere we have a DW_TAG_subprogram (not
DW_TAG_inlined_subroutine) which links via
DW_AT_abstract_origin to the abstract DW_AT_subprogram.  This case is
documented in the DWARF 5 spec in section 3.3.8.3, and describes an
Out-of-Line Instance of an Inlined Subroutine.  Within this out of
line instance many of the DIE correctly link back, using
DW_AT_abstract_origin to the abstract instance tree.  This tree also
includes the DIE <15e9f>, which is where our problem DIE references.

Now, to really confuse things, within this out-of-line instance we
have a DW_TAG_inlined_subroutine, which is another instance of the
same abstract instance tree!  This would seem to indicate a recursive
call to the inline function, and the compiler, for some reason, needed
to instantiate an out of line instance of this function.

And it is within this nested, inlined subroutine, that the problem DIE
exists.  The problem DIE is referencing the corresponding DIE within
the out of line instance tree, but I am convinced this must be a (long
fixed) GCC bug, and that the problem DIE should be referencing the DIE
within the abstract instance tree.

I'm aware that the above is pretty confusing.  The actual DWARF would
be a around 200 lines long, so I'd like to avoid dumping it in here.
But here's my attempt at representing what's going on in a minimal
example.  The numbers down the side represent the section offset, not
the nesting level, and I've removed any attributes that are not
relevant:

  <1> DW_TAG_subprogram
  <2>   DW_TAG_lexical_block
  <3> DW_TAG_subprogram
        DW_AT_abstract_origin <1>
  <4>   DW_TAG_lexical_block
          DW_AT_ranges ...
  <5>   DW_TAG_inlined_subroutine
          DW_AT_abstract_origin <1>
  <6>     DW_TAG_lexical_block
            DW_AT_abstract_origin <4>
            DW_AT_low_pc ...
            DW_AT_high_pc ...

The lexical block at <6> is linking to <4> when it should be linking
to <2>.

There is one additional thing that we might wonder about, which is,
when calculating the low/high pc range for a block, why does GDB not
make use of the range information and expand the range beyond the
defined low/high values?

The answer to this is in dwarf_get_pc_bounds_ranges_or_highlow_pc in
dwarf/read.c.  This is where the low/high bounds are calculated.  What
we see is that GDB first checks for a low/high attribute pair, and if
that is present, this defines the address range for the block.  Only
if there is no DW_AT_low_pc do we check for the DW_AT_ranges, and use
that to define the extent of the block.  And this makes sense, section
3.5 of the DWARF-5 spec says:

  The lexical block entry may have either a DW_AT_low_pc and DW_AT_high_pc
  pair of attributes or a DW_AT_ranges attribute whose values encode the
  contiguous or non-contiguous address ranges, respectively, of the machine
  instructions generated for the lexical block...

Section 3.5 is specifically about lexical blocks, but the same
wording, about it being either low/high OR ranges is repeated for
other DW_TAG_ types.

So this explains why GDB doesn't use the ranges to expand the problem
blocks ranges; as the first DIE has low/high addresses, these are
used, and the ranges is not consulted.

It is only later in dwarf2_record_block_ranges that we create a range
based off the low/high pc, and then also process the ranges data, this
allows the problem block to exist with ranges that are outside the
low/high range.

To solve this I considered a number of options:

1. Prevent loading certain attributes from an abstract instance.

Section 3.3.8.1 of the DWARF-5 spec talks about which attributes are
appropriate to place in an abstract instance.  Any attribute that
might vary between instances should not appear in an abstract
instance.  DW_AT_ranges is included as an example in the
non-exhaustive list of attributes that should not appear in an
abstract instance.

Currently in dwarf2_attr (dwarf2/read.c), when we see a
DW_AT_abstract_origin attribute, we always follow this to try and find
the attribute we are looking for.  But we could change this function
so that we prevent this following for attributes that we know should
not be looked up in an abstract instance.  This would solve the
problem in this case by preventing us finding the DW_AT_ranges in the
incorrect abstract instance.

2. Filter the ranges.

Having established a blocks low/high address range in
dwarf_get_pc_bounds_ranges_or_highlow_pc, we could allow
dwarf2_record_block_ranges to parse the ranges, but we could reject
any range that extends outside the blocks defined start and end
addresses.

For well behaved DWARF where we have either low/high or ranges, then
the blocks start/end are defined from the range data, and so, by
definition, every range would be acceptable.

But in our problem case we would reject all of the invalid ranges.

This is my least favourite solution as it feels like rejecting the
ranges is tackling the problem too late on.

3. Don't try to parse ranges when we have low/high attributes.

This option involves updating dwarf2_record_block_ranges to match the
behaviour of dwarf_get_pc_bounds_ranges_or_highlow_pc, and, I believe,
to match the DWARF spec: don't try to read range data from
DW_AT_ranges if we have low/high pc attributes.

In our case this solves the issue because the problematic DIE has the
low/high attributes, and it then links to the wrong DIE which happens
to have DW_AT_ranges.  With this change in place we don't even look
for the DW_AT_ranges.

If the problem were reversed, and the initial DIE had DW_AT_ranges,
but the incorrectly referenced DIE had the low/high pc attributes,
we would pick up the wrong addresses, but this wouldn't trigger any
asserts.  The reason is that dwarf_get_pc_bounds_ranges_or_highlow_pc
would also find the low/high addresses from the incorrectly referenced
DIE, and so we would just end up with a block which had the wrong
address ranges, but the block would be self consistent, which is
different to the problem we hit here.

In the end, in this commit I went with solution #3, having
dwarf_get_pc_bounds_ranges_or_highlow_pc and
dwarf2_record_block_ranges be consistent seems sensible.  However, I
do wonder if in the future we might want to explore solution #1 as an
additional safety feature.

With this patch in place I'm able to run the gdb.base/break-probes.exp
without seeing the assert that CI testing highlighted.  I see no
regressions when testing on x86-64 GNU/Linux with gcc  9.3.1.

Note: the diff in this commit looks big, but it's really just me
indenting the code.

Approved-By: Tom Tromey <[email protected]>
farazs-github pushed a commit that referenced this pull request Dec 14, 2024
When building gdb with -fsanitize=thread and running test-case
gdb.base/bg-exec-sigint-bp-cond.exp, I run into:
...
==================^M
WARNING: ThreadSanitizer: signal handler spoils errno (pid=25422)^M
    #0 handler_wrapper gdb/posix-hdep.c:66^M
    #1 decltype ({parm#2}({parm#3}...)) gdb::handle_eintr<>() \
         gdbsupport/eintr.h:67^M
    #2 gdb::waitpid(int, int*, int) gdbsupport/eintr.h:78^M
    #3 run_under_shell gdb/cli/cli-cmds.c:926^M
...

Likewise in:
- tui_sigwinch_handler with test-case gdb.python/tui-window.exp, and
- handle_sighup with test-case gdb.base/quit-live.exp.

Fix this by saving the original errno, and restoring it before returning [1].

Tested on x86_64-linux.

Approved-By: Tom Tromey <[email protected]>

[1] https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html
farazs-github pushed a commit that referenced this pull request Dec 21, 2024
This commit adds support for a `gstack' command which Fedora has
been carrying for many years. gstack is a natural counterpart to
the gcore command. Whereas gcore dumps a core file, gstack prints
stack traces of a running process.

There are many improvements over Fedora's version of this script.
The dependency on procfs is gone; gstack will run anywhere gdb
runs. The only runtime dependencies are bash and awk.

The script includes suggestions from gdb/32325 to include
versioning and help. [If this approach to gdb/32325 is acceptable,
I could propagate the solution to gcore/gdb-add-index.]

I've rewritten the documentation, integrating it into the User Manual.
The manpage is now output using this one source.

Example run (on x86_64 Fedora 40)

$ gstack --help
Usage: gstack [-h|--help] [-v|--version] PID
Print a stack trace of a running program

  -h, --help         Print this message then exit.
  -v, --version      Print version information then exit.
$ gstack -v
GNU gstack (GDB) 16.0.50.20241119-git
$ gstack 12345678
Process 12345678 not found.
$ gstack $(pidof emacs)
Thread 6 (Thread 0x7fd5ec1c06c0 (LWP 2491423) "pool-spawner"):
#0  0x00007fd6015ca3dd in syscall () at /lib64/libc.so.6
#1  0x00007fd60b31eccd in g_cond_wait () at /lib64/libglib-2.0.so.0
#2  0x00007fd60b28a61b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3  0x00007fd60b2f1a03 in g_thread_pool_spawn_thread () at /lib64/libglib-2.0.so.0
#4  0x00007fd60b2f0813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007fd6015486d7 in start_thread () at /lib64/libc.so.6
#6  0x00007fd6015cc60c in clone3 () at /lib64/libc.so.6
#7  0x0000000000000000 in ??? ()

Thread 5 (Thread 0x7fd5eb9bf6c0 (LWP 2491424) "gmain"):
#0  0x00007fd6015be87d in poll () at /lib64/libc.so.6
#1  0x0000000000000001 in ??? ()
#2  0xffffffff00000001 in ??? ()
#3  0x0000000000000001 in ??? ()
#4  0x000000002104cfd0 in ??? ()
#5  0x00007fd5eb9be320 in ??? ()
#6  0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0

Thread 4 (Thread 0x7fd5eb1be6c0 (LWP 2491425) "gdbus"):
#0  0x00007fd6015be87d in poll () at /lib64/libc.so.6
#1  0x0000000020f9b558 in ??? ()
#2  0xffffffff00000003 in ??? ()
#3  0x0000000000000003 in ??? ()
#4  0x00007fd5d8000b90 in ??? ()
#5  0x00007fd5eb1bd320 in ??? ()
#6  0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0

Thread 3 (Thread 0x7fd5ea9bd6c0 (LWP 2491426) "emacs"):
#0  0x00007fd6015ca3dd in syscall () at /lib64/libc.so.6
#1  0x00007fd60b31eccd in g_cond_wait () at /lib64/libglib-2.0.so.0
#2  0x00007fd60b28a61b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3  0x00007fd60b28a67c in g_async_queue_pop () at /lib64/libglib-2.0.so.0
#4  0x00007fd603f4d0d9 in fc_thread_func () at /lib64/libpangoft2-1.0.so.0
#5  0x00007fd60b2f0813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#6  0x00007fd6015486d7 in start_thread () at /lib64/libc.so.6
#7  0x00007fd6015cc60c in clone3 () at /lib64/libc.so.6
#8  0x0000000000000000 in ??? ()

Thread 2 (Thread 0x7fd5e9e6d6c0 (LWP 2491427) "dconf worker"):
#0  0x00007fd6015be87d in poll () at /lib64/libc.so.6
#1  0x0000000000000001 in ??? ()
#2  0xffffffff00000001 in ??? ()
#3  0x0000000000000001 in ??? ()
#4  0x00007fd5cc000b90 in ??? ()
#5  0x00007fd5e9e6c320 in ??? ()
#6  0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0

Thread 1 (Thread 0x7fd5fcc45280 (LWP 2491417) "emacs"):
#0  0x00007fd6015c9197 in pselect () at /lib64/libc.so.6
#1  0x0000000000000000 in ??? ()

Since this is essentially a complete rewrite of the original
script and documentation, I've chosen to only keep a 2024 copyright date.

Reviewed-By: Eli Zaretskii <[email protected]>
Approved-By: Tom Tromey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants