
[April 24th 2025] Merge changes from upstream #3723

Open
wants to merge 10,000 commits into
base: master
from

Conversation

powerboat9
Collaborator

This should improve our situation with respect to downstreaming. Any merge should be done with the GitHub default merge method, rather than with a rebase-merge.

@powerboat9 powerboat9 requested a review from dkm April 9, 2025 21:50
@powerboat9
Collaborator Author

powerboat9 commented Apr 9, 2025

GitHub appears to be having issues displaying the diff, presumably because of the number of commits or files involved. I'm not sure how we'd fix that, so we might have to just work around it.

@powerboat9
Collaborator Author

Looks like one of the tests is failing -- any ideas?

jamborm and others added 27 commits April 14, 2025 14:39
This patch just introduces a form of dumping of widest ints that only
have zeros in the lowest 128 bits so that instead of printing
thousands of f's the output looks like:

       Bits: value = 0xffff, mask = all ones folled by 0xffffffffffffffffffffffffffff0000

and then makes sure we use the function not only to print bits but
also to print masks where values like these can also occur.
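The truncated form can be illustrated with a minimal sketch. This is not the code in ipa-cp.cc: `format_wide_value` and its limb-vector representation are hypothetical stand-ins for a widest int, and the wording of the prefix is chosen here for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not the GCC function): LIMBS holds a wide integer as
// 64-bit limbs, least significant limb first.
std::string format_wide_value(const std::vector<std::uint64_t> &limbs)
{
  assert(limbs.size() >= 2);

  // The truncated form applies when every limb above the lowest 128 bits
  // consists only of ones.
  bool upper_all_ones = limbs.size() > 2;
  for (std::size_t i = 2; i < limbs.size(); ++i)
    if (limbs[i] != UINT64_MAX)
      upper_all_ones = false;

  char buf[17];
  std::string out = upper_all_ones ? "all ones followed by 0x" : "0x";
  const std::size_t top = upper_all_ones ? 2 : limbs.size();

  // Print the remaining limbs, most significant first, 16 hex digits each.
  for (std::size_t i = top; i-- > 0;)
    {
      std::snprintf(buf, sizeof buf, "%016llx",
                    static_cast<unsigned long long>(limbs[i]));
      out += buf;
    }
  return out;
}
```

A 256-bit mask whose upper 128 bits are all ones is then printed with 32 hex digits instead of 64.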

gcc/ChangeLog:

2025-03-21  Martin Jambor  <[email protected]>

	* ipa-cp.cc (ipcp_print_widest_int): Also add a truncated form of
	dumping of widest ints which only have zeros in the lowest 128 bits.
	Update the comment.
	(ipcp_bits_lattice::print): Also dump the mask using
	ipcp_print_widest_int.
	(ipcp_store_vr_results): Likewise.
…18785)

This patch revisits the fix for PR 118785 and instead of deducing the
necessary operation type it just uses the value collected and streamed
by an earlier patch.  The main advantage is that we do not rely on
expr_type_first_operand_type_p enumerating all operations.

gcc/ChangeLog:

2025-03-20  Martin Jambor  <[email protected]>

	PR ipa/118785
	* ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Use the stored
	and streamed type of arithmetic pass-through functions.
…R118097)

This patch revisits the fix for PR 118097 and instead of deducing the
necessary operation type it just uses the value collected and streamed
by an earlier patch.

It is bigger than the ones for propagating value ranges and known bits
because we track constants both in parameters themselves and also in
memory they point to or within aggregates, we clone functions for them
and we do fancy things for some types of recursive calls.

In the case of constants in aggregates or passed by reference, the
situation should not change because the code creating jump functions
for them does not allow type-casts, unlike for the plain ones.
However, this patch changes how we handle them for the sake of
consistency and also so that we can try and eliminate this limitation
in the next stage 1.

gcc/ChangeLog:

2025-03-20  Martin Jambor  <[email protected]>

	PR ipa/118097
	* ipa-cp.cc (ipa_get_jf_arith_result): Require res_operand for
	anything except NOP_EXPR or ADDR_EXPR, document it and remove the code
	trying to deduce it.
	(ipa_value_from_jfunc): Use the stored and streamed type of arithmetic
	pass-through functions.
	(ipa_agg_value_from_jfunc): Use the stored and streamed type of
	arithmetic pass-through functions, convert to the type used to store
	the value if necessary.
	(get_val_across_arith_op): New parameter op_type, pass it to
	ipa_get_jf_arith_result.
	(propagate_vals_across_arith_jfunc): New parameter op_type, pass it to
	get_val_across_arith_op.
	(propagate_vals_across_pass_through): Use the stored and streamed type
	of arithmetic pass-through functions.
	(propagate_aggregate_lattice): Likewise.
	(push_agg_values_for_index_from_edge): Use the stored and streamed
	type of arithmetic pass-through functions, convert to the type used to
	store the value if necessary.
Don't use red-zone when there are no caller-saved registers with 32 GPRs
since 128-byte red-zone is too small for 31 GPRs.
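The arithmetic behind this can be spelled out in a small sketch. It assumes 8-byte general-purpose registers; the names `kRedZoneBytes`, `kGprBytes`, `save_area_bytes` and `fits_in_red_zone` are hypothetical and are not taken from i386.cc.

```cpp
#include <cassert>

constexpr int kRedZoneBytes = 128;  // red-zone size named in the commit message
constexpr int kGprBytes = 8;        // assumption: 64-bit registers

// Bytes needed to save SAVED_GPRS general-purpose registers.
constexpr int save_area_bytes(int saved_gprs)
{
  return saved_gprs * kGprBytes;
}

// True if the save area alone would still fit into the red zone.
bool fits_in_red_zone(int saved_gprs)
{
  assert(saved_gprs >= 0);
  return save_area_bytes(saved_gprs) <= kRedZoneBytes;
}

static_assert(save_area_bytes(31) == 248, "31 GPRs need 248 bytes");
static_assert(save_area_bytes(31) > kRedZoneBytes,
              "saving 31 GPRs overflows a 128-byte red zone");
```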

gcc/

	PR target/119784
	* config/i386/i386.cc (ix86_using_red_zone): Don't use red-zone
	with 32 GPRs and no caller-saved registers.

gcc/testsuite/

	PR target/119784
	* gcc.target/i386/pr119784a.c: New test.
	* gcc.target/i386/pr119784b.c: Likewise.

Signed-off-by: H.J. Lu <[email protected]>
	* libgcobol.cc (__gg__float64_from_128): Mark literal as float128
	literal.
In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template.  This manifests as
misleading diagnostic context lines when issuing a satisfaction failure
error, e.g. for the below testcase, without this patch we emit:
  In substitution of '... void A<int>::f<U>() ... [with U = int]'
and with this patch we emit:
  In substitution of '... void A<int>::f<U>() ... [with U = char]'.

This patch fixes this by passing the original 'args' to push_tinst_level,
which ought to properly correspond to 't'.

	PR c++/99214

gcc/cp/ChangeLog:

	* constraint.cc (satisfy_declaration_constraints): Pass the
	original ARGS to push_tinst_level.

gcc/testsuite/ChangeLog:

	* g++.dg/concepts/diagnostic20.C: New test.

Reviewed-by: Jason Merrill <[email protected]>
This testcase was fixed by r15-3052-gc7b76a076cb2c6ded, but it failed
in a different fashion, and is a much older failure, than the
testcase added with r15-3052.

Pushed as obvious after a quick test.

	PR tree-optimization/118476

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr118476-1.c: New test.

Signed-off-by: Andrew Pinski <[email protected]>
This moves is_floating_point over to using FLOAT_TYPE_P instead
of manually checking. Note before it would return true for all
COMPLEX_TYPE but complex types' inner type could be integral.
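The difference for complex types can be shown with a toy model. This is only a sketch: `TypeCode`, `Type` and both functions are hypothetical stand-ins for GCC trees, and the real FLOAT_TYPE_P macro is defined in GCC's tree.h, not here.

```cpp
#include <cassert>

// Toy stand-in for a GCC type node; not the real tree representation.
enum class TypeCode { Integer, Real, Complex };

struct Type
{
  TypeCode code;
  const Type *inner;  // element type for Complex, otherwise nullptr
};

// Models the old manual check: every complex type counts as floating point.
bool is_floating_point_old(const Type &t)
{
  return t.code == TypeCode::Real || t.code == TypeCode::Complex;
}

// Models the intent of the new check: a complex type only counts if its
// inner type is itself a floating-point type.
bool is_floating_point_new(const Type &t)
{
  if (t.code == TypeCode::Real)
    return true;
  if (t.code == TypeCode::Complex)
    {
      assert(t.inner != nullptr);
      return t.inner->code == TypeCode::Real;
    }
  return false;
}
```

For a complex type with an integral inner type the two checks disagree, which is the case the commit message points out.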

Also fixes up the comment to be in more of the GNU style.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/rust/ChangeLog:

	* rust-gcc.cc (is_floating_point): Use FLOAT_TYPE_P
	instead of manually checking the type.

Signed-off-by: Andrew Pinski <[email protected]>
Just a simple cleanup of the code to use error_operand_p
instead of directly comparing against error_mark_node.

This also moves some code around when dealing with error_operand_p
just to be faster and/or tighten up the code slightly.

gcc/rust/ChangeLog:

	* rust-gcc.cc (Bvariable::get_tree): Use error_operand_p.
	(pointer_type): Likewise.
	(reference_type): Likewise.
	(immutable_type): Likewise.
	(function_type): Likewise.
	(function_type_variadic): Likewise.
	Cleanup the check for receiver.type first.
	(function_ptr_type): Use error_operand_p.
	(fill_in_fields): Likewise.
	(fill_in_array): Likewise.
	(named_type): Likewise.
	(type_size): Likewise.
	(type_alignment): Likewise.
	(type_field_alignment): Likewise.
	(type_field_offset): Likewise.
	(zero_expression): Likewise.
	(float_constant_expression): Likewise.
	(convert_expression): Likewise.
	(struct_field_expression): Likewise.
	(compound_expression): Likewise.
	(conditional_expression): Likewise.
	(negation_expression): Likewise.
	(arithmetic_or_logical_expression): Likewise.
	(arithmetic_or_logical_expression_checked): Likewise.
	(comparison_expression): Likewise.
	(lazy_boolean_expression): Likewise.
	(constructor_expression): Likewise.
	(array_constructor_expression): Likewise.
	(array_index_expression): Likewise.
	(call_expression): Likewise.
	(init_statement): Likewise.
	(assignment_statement): Likewise.
	(return_statement): Likewise.
	(exception_handler_statement): Likewise.
	(if_statement): Likewise.
	(compound_statement): Likewise.
	Tighten up the code, removing t variable.
	(statement_list): Use error_operand_p.
	(block): Likewise.
	(block_add_statements): Likewise.
	(convert_tree): Likewise.
	(global_variable): Likewise.
	(global_variable_set_init): Likewise.
	(local_variable): Likewise.
	(parameter_variable): Likewise.
	(static_chain_variable): Likewise.
	(temporary_variable): Likewise.
	(function): Likewise. Tighten up the code.
	(function_defer_statement): Use error_operand_p.
	(function_set_parameters): Use error_operand_p.
	(write_global_definitions): Use error_operand_p.
	Tighten up the code around the loop.

Signed-off-by: Andrew Pinski <[email protected]>
There are some places inside rust-gcc.cc which are candidates
to use range for instead of iterators directly. This changes
the locations I saw and makes the code slightly more readable.
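As a generic illustration of this kind of change (not code from rust-gcc.cc; the function names are hypothetical), the same loop is written below once with explicit iterators and once with a range-based for:

```cpp
#include <cassert>
#include <vector>

// Explicit iterator form.
int sum_with_iterators(const std::vector<int> &values)
{
  int total = 0;
  for (std::vector<int>::const_iterator it = values.begin();
       it != values.end(); ++it)
    total += *it;
  return total;
}

// Range-based for form: same traversal, less bookkeeping.
int sum_with_range_for(const std::vector<int> &values)
{
  int total = 0;
  for (int value : values)
    total += value;
  return total;
}

// Both forms visit the same elements in the same order.
void check_equivalent(const std::vector<int> &values)
{
  assert(sum_with_iterators(values) == sum_with_range_for(values));
}
```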

gcc/rust/ChangeLog:

	PR rust/119341
	* rust-gcc.cc (function_type): Use range fors.
	(function_type_variadic): Likewise.
	(fill_in_fields): Likewise.
	(statement_list): Likewise.
	(block): Likewise.
	(block_add_statements): Likewise.
	(function_set_parameters): Likewise.
	(write_global_definitions): Likewise.

Signed-off-by: Andrew Pinski <[email protected]>
Inside a BLOCK node, all of the variables of the scope/block
are chained together and that connects them to the block.
This just adds a comment to that effect as reading the code
it is not so obvious why they need to be chained together.

gcc/rust/ChangeLog:

	PR rust/119342
	* rust-gcc.cc (block): Add comment on why chaining
	the variables of the scope together.

Signed-off-by: Andrew Pinski <[email protected]>
…ation

gcc/rust/ChangeLog:

	* typecheck/rust-hir-type-check-expr.cc (is_default_fn): New.
	(emit_ambiguous_resolution_error): New.
	(handle_multiple_candidates): Properly handle multiple candidates in
	the case of specialization.
	(TypeCheckExpr::visit): Call `handle_multiple_candidates`.

gcc/testsuite/ChangeLog:

	* rust/execute/torture/min_specialization2.rs: New test.
	* rust/execute/torture/min_specialization3.rs: New test.
Instead, mark the visitor as dirty and wait for the next round of the fixed point to take care of
them. This avoids issues with module items being loaded while not being stripped yet.

gcc/rust/ChangeLog:

	* resolve/rust-toplevel-name-resolver-2.0.cc (TopLevel::visit): Return if module
	is unloaded.
gcc/rust/ChangeLog:

	* ast/rust-expr.h (class RangeExpr): Add empty outer attributes and allow getting them
	and setting them.
gcc/rust/ChangeLog:

	* ast/rust-ast.h (DelimTokenTree::get_locus): New function.
gcc/rust/ChangeLog:

	* rust-session-manager.cc (Session::compile_crate): Call the visitor later in the pipeline.
gcc/rust/ChangeLog:

	* expand/rust-macro-expand.cc (MacroExpander::match_n_matches): Do not
	insert fragments and substack fragments if the matcher failed.

gcc/testsuite/ChangeLog:

	* rust/compile/macros/mbe/macro-issue3708.rs: New test.
gcc/rust/ChangeLog:

	* expand/rust-macro-expand.cc (MacroExpander::expand_decl_macro): Call into
	TokenTreeDesugar.
	* expand/rust-token-tree-desugar.cc: New file.
	* expand/rust-token-tree-desugar.h: New file.
	* Make-lang.in: Compile them.

gcc/testsuite/ChangeLog:

	* rust/compile/macros/mbe/macro-issue3709-1.rs: New test.
	* rust/compile/macros/mbe/macro-issue3709-2.rs: New test.
gcc/rust/ChangeLog:

	* expand/rust-macro-builtins-format-args.cc (format_args_parse_arguments): Improve safety,
	allow extra commas after end of argument list.

gcc/testsuite/ChangeLog:

	* rust/compile/format_args_extra_comma.rs: New test.
gcc/rust/ChangeLog:

	* checks/errors/rust-const-checker.cc
	(ConstChecker::visit): Visit the enum items of enums.
	* resolve/rust-ast-resolve-item.cc
	(ResolveItem::visit): Resolve enum discriminants during nr1.0.

gcc/testsuite/ChangeLog:

	* rust/compile/enum_discriminant2.rs: New test.

Signed-off-by: Owen Avery <[email protected]>
Addresses PR#117869

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117869

gcc/ChangeLog:

	* doc/install.texi: Add requirements for building gccrs.
gcc/rust/ChangeLog:

	* expand/rust-macro-builtins.cc (MacroBuiltin::builtin_transcribers):
	Add entry for track_caller.
	* util/rust-attribute-values.h: add `TRACK_CALLER` attribute.
	* util/rust-attributes.cc: add `track_caller` attribute definition.

gcc/testsuite/ChangeLog:

	* rust/compile/track_caller.rs: New test.

Signed-off-by: Bhavesh Mandalapu <[email protected]>
gcc/rust/ChangeLog:

	* util/rust-attribute-values.h: Add missing attributes.
	* util/rust-attributes.cc: Likewise.
	* util/rust-attributes.h (enum CompilerPass): Mention adding something for const
	functions.
This causes an assertion failure when compiling core with nr2.0, but should
probably be improved. I'm not sure how this code enables built-in derive
macros to be resolved so this is a temporary fix.

gcc/rust/ChangeLog:

	* resolve/rust-early-name-resolver-2.0.cc (Early::visit_attributes): Remove assertion.
gcc/rust/ChangeLog:

	* util/rust-attribute-values.h: Add RUSTFMT value.
	* util/rust-attributes.cc: Define the attribute.
	* util/rust-attributes.h (enum CompilerPass): Add EXTERNAL variant.
	* expand/rust-macro-builtins.cc: Fix formatting.
gcc/rust/ChangeLog:

	* util/rust-lang-item.h: Add new manually_drop lang item.
	* util/rust-lang-item.cc: Likewise.
…p maybe_complain_about_tail_call [PR119718]

Andrew P. mentioned earlier he'd like to see in the dump files a note
whether it was a failed must tail call or not.
We already print that on the tailc/musttail pass side, because
print_gimple_stmt prints [must tail call] after the musttail calls.
The first hunk below does it for GENERIC CALL_EXPRs too (which is needed
for the expand diagnostics).  That isn't enough though, because the
error on it was done first and then CALL_EXPR_MUST_TAIL_CALL flag was
cleared, so the dump didn't have it anymore.  I've reordered the
dump printing with error, so that it works properly.

2025-04-14  Jakub Jelinek  <[email protected]>

	PR tree-optimization/119718
	* tree-pretty-print.cc (dump_generic_node) <case CALL_EXPR>: Dump
	also CALL_EXPR_MUST_TAIL_CALL flag.
	* calls.cc (maybe_complain_about_tail_call): Emit error about
	CALL_EXPR_MUST_TAIL_CALL only after emitting dump message, not before
	it.
Christophe Lyon and others added 28 commits April 23, 2025 13:15
r14-7202-gc8ec3e1327cb1e added vld1xN and vst1xN intrinsics and some
tests on arm, but didn't enable some existing tests.

Since these tests are shared with aarch64, this patch removes the
'dg-skip-if "unimplemented" { arm*-*-* }' directives and relies on the
advsimd-intrinsics.exp driver to define the appropriate flags and
dg-do-what action.  (A previous patch removed 'dg-do run', and this
patch removes 'dg-options "-O3"' which would override the options
computed by the test driver)

float16 intrinsics require the neon-fp16 FPU, which is possibly
enabled by advsimd-intrinsics.exp, so we include them unconditionally
on aarch64 or if fp16 is enabled on arm.

poly64 intrinsics would require crypto-neon-fp-armv8: the patch
enables the corresponding tests on aarch64 only, since for arm they
are already covered by other tests in gcc.target/arm/simd/.  For some
reason, poly64 tests were missing from x2 and x3 tests, so the patch
adds them as needed.

Tested on aarch64-linux-gnu (no change), arm-linux-gnueabihf (the
additional tests are executed) and various flavors of arm-none-eabi
(the additional tests are compiled-only on M-profile, executed on
A-profile).

	gcc/testsuite/
	PR target/71233
	* gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Enable on arm.
	* gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vld1x4.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vst1x2.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vst1x3.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vst1x4.c: Likewise.
	* config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Update.
	* config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Update.
	* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Update.
	* sv.po: Update.
This patch implements costing of truth_value exprs, i.e.
  a = b < c;
Those now seem to be the most common operations that go to the addss path,
except for int->fp and fp->int conversions.

For integer we use setcc; for FP there is CMccSS and variants, which set the
destination register as a mask (i.e. -1 on true and 0 on false).  Technically
these need res&1 to get 1 on true and 0 on false, but looking at examples
where this is used, it is common that the resulting code is optimized to avoid
the need for this (except for cases where the result is directly saved to memory).
For this reason I am accounting for only one sse_op (CMccSS) itself.
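The mask-versus-truth-value distinction can be sketched in plain C++. This only emulates the described behaviour with integers; it uses no SSE instructions, and `compare_mask_lt` and `truth_value` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Emulates the result of a scalar FP compare that writes a mask:
// all bits set (-1) when the comparison holds, 0 otherwise.
std::int32_t compare_mask_lt(float b, float c)
{
  return b < c ? -1 : 0;
}

// Converts the mask into the 0/1 value that "a = b < c" needs.
int truth_value(std::int32_t mask)
{
  assert(mask == 0 || mask == -1);
  return mask & 1;
}
```

When the mask is consumed directly, for example by a following select or branch, the extra `& 1` is not needed, which matches the reasoning for costing only the compare itself.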

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Cost truth_value
	exprs.
Currently enabling profile feedback regresses x264 and exchange. In both cases the root of the
issue is that ipa-cp cost model thinks cloning is not relevant when feedback is available
while it clones without feedback.

Consider:

__attribute__ ((used))
int a[1000];

__attribute__ ((noinline))
void
test2(int sz)
{
  for (int i = 0; i < sz; i++)
	  a[i]++;
  asm volatile (""::"m"(a));
}

__attribute__ ((noinline))
void
test1 (int sz)
{
  for (int i = 0; i < 1000; i++)
	  test2(sz);
}
int main()
{
	test1(1000);
	return 0;
}

Here we want to clone both test1 and test2 and specialize them for 1000, but
ipa-cp will not do that, since it will skip the call main->test1 as not hot,
because it is called just once both with and without profile feedback.
In this simple testcase even without profile feedback we will track that main
is called once.

I think the testcase shows that hotness of a call is not that relevant when
deciding whether we want to propagate constants across it.  ipa-cp with IPA
profile can compute an overall estimate of time saved (which is the existing
time benefit, i.e. the time saved per invocation of the function, multiplied
by the number of executions) and see if the result is big enough. An easy
check is to simply call maybe_hot_p on the resulting count.

So this patch makes ipa-cp consider all call sites except those known to be
unlikely executed (i.e. run 0 times in the train run or known to lead to
something bad) as interesting, which makes ipa-cp propagate across them, find
cloning candidates and feed them into good_cloning_opportunity_p.

For this I added cs_interesting_for_ipcp_p which also attempts to do the right
thing with partial training.

Now good_cloning_opportunity_p will currently return false, since it will
figure out that the call edge is not very frequent.
It already kind of knows that the frequency of the call instruction itself is
not too important, but instead of computing overall time saved, it tries to
compare it with param_ipa_cp_profile_count_base percentage of counts of call
edges.  I think this is not very relevant since estimated time saved per call
can be large.  So I dropped this logic and replaced it with simple use of
overall saved time.

Since ipa-cp is not dealing well with the cases where it hits the allowed unit
growth limit, we probably want to be more careful, so I keep existing metric
with this change.

So now we get:

Evaluating opportunities for test1/3.
 - considering value 1000 for param #0 sz (caller_count: 1)
     good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), overall time saved: 1 (adjusted)) -> evaluation: 0.12, threshold: 500
     not cloning: time saved is not hot
     good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 (precise), overall time saved: 129001 (adjusted)) -> evaluation: 6450.05, threshold: 500

The first call to good_cloning_opportunity_p considers the case where only
test1 is cloned. In this case time saved is 1 (for passing the value around)
and since it is called just once (count_sum) overall time saved is 1 which is
not considered hot and we also get a very low evaluation score.

In the second call we consider the cloning chain test1->test2.  In this case
time saved is large (129001) since test2 is invoked many times and the value
is used to control the loop.  We still know that the count is 1 but overall
time is 129001 which is already considered relevant and we clone.
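The two dump lines are numerically consistent with a simple model: overall time saved is the per-call time benefit times count_sum, and the evaluation is that value divided by the size. The sketch below only reproduces these numbers; it is not the heuristic in good_cloning_opportunity_p, which may apply further adjustments, and the function names and the direction of the threshold comparison are assumptions.

```cpp
#include <cassert>

// Simplified model inferred from the dump above, not GCC's implementation.
double evaluation(long time_saved_per_call, long count_sum, long size)
{
  assert(size > 0);
  const double overall_time_saved
    = static_cast<double>(time_saved_per_call) * count_sum;
  return overall_time_saved / size;
}

// Assumption: a candidate is accepted when the evaluation reaches the
// threshold printed in the dump.
bool passes_threshold(double eval, double threshold)
{
  return eval >= threshold;
}
```

With the dump's values, `evaluation(1, 1, 8)` gives 0.125 and `evaluation(129001, 1, 20)` gives about 6450.05; only the second reaches the threshold of 500.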

I also try to do something sensible in case we have calls both with
and without IPA profile (which can happen for comdats where profile got missing
or with LTO if some units were not trained).
Instead of checking whether sum of calls with known profile is nonzero, I keep
track if there are other calls and if so, also try the local heuristics that
is used without profile feedback.

The patch improves SPECint with -Ofast -fprofile-use by approx 1% by speeding
up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%).

We still get better runtimes without profile (86.4s for x264 and 93.8s for exchange).

The main problem I see is that ipa-cp has the global limit for growth of 10%
but does not consider the opportunities in priority order.  Consequently if the
limit is hit, randomly some clone opportunities are dropped in favour of
others.

I dumped unit size changes with -flto -Ofast build of SPEC2017. Without patch I get:

orig	new	growth
588677	605385	102.838229
4378	6037	137.894016
484650	494851	102.104818
4111	4111	100.000000
99953	103519	103.567677
106181	114889	108.201091
21389	21597	100.972462
24925	26746	107.305918
15308	23974	156.610922
27354	27906	102.017986
494	494	100.000000
4631	4631	100.000000
863216	872729	101.102042
126604	126604	100.000000
605138	627156	103.638509
4112	4112	100.000000
222006	231293	104.183220
2952	3384	114.634146
37584	39807	105.914751
4111	4111	100.000000
13226	13226	100.000000
4111	4111	100.000000
326215	337396	103.427494
25240	25433	100.764659
64644	65972	102.054328
127223	132300	103.990631
494	494	100.000000

Small units can grow up to 16000 instructions and other units are
large. So there is only one 156% growth hitting limits, which is exchange,
which has recursive cloning that is handled specially.

With profile feedback ipacp basically shuts itself off:

333815	333891	100.022767
2559	2974	116.217272
217576	217581	100.002298
2749	2749	100.000000
64652	64716	100.098992
68416	69707	101.886986
13171	13171	100.000000
11849	11849	100.000000
10519	16180	153.816903
15843	15843	100.000000
231	231	100.000000
3624	3624	100.000000
573385	573386	100.000174
97623	97623	100.000000
295673	295676	100.001015
2750	2750	100.000000
130723	130726	100.002295
2334	2334	100.000000
19313	19313	100.000000
2749	2749	100.000000
517331	517331	100.000000
6707	6707	100.000000
2749	2749	100.000000
193638	193638	100.000000
16425	16425	100.000000
47154	47154	100.000000
96422	96422	100.000000
231	231	100.000000

So we essentially clone only exchange and mcf (116%).
With patch and no FDO I get:

588677	605385	102.838229
4378	6037	137.894016
484519	494698	102.100846
4111	4111	100.000000
99953	103519	103.567677
106181	114889	108.201091
21389	22632	105.811398
24854	26620	107.105496
15308	23974	156.610922
27354	28039	102.504204
494	494	100.000000
4631	4631	100.000000
4631	4631	100.000000
126604	126630	100.020536
4112	4112	100.000000
222006	231293	104.183220
2952	3384	114.634146
37584	39807	105.914751
2760715	2835539	102.710312
4111	4111	100.000000
13226	13226	100.000000
4111	4111	100.000000
326215	337396	103.427494
25240	25433	100.764659
64644	65972	102.054328
127223	132300	103.990631
494	494	100.000000

which seems essentially same as without patch. However with FDO I get:
333815	350363	104.957237
2559	3345	130.715123
217469	220765	101.515618
485599	488772	100.653420
2749	2749	100.000000
64652	74265	114.868836
68416	87484	127.870674
13171	20656	156.829398
11792	11990	101.679104
10519	17028	161.878506
15843	16119	101.742094
231	231	100.000000
573336	573336	100.000000
97623	97623	100.000000
295497	296208	100.240612
2750	2750	100.000000
130723	133341	102.002708
2334	2334	100.000000
19313	19368	100.284782
2749	2749	100.000000
6707	6755	100.715670
2749	2749	100.000000
193638	194712	100.554643
16425	17377	105.796043
47154	47154	100.000000
96422	96422	100.000000
231	231	100.000000

So here we get 14% and 27% growth in x264 (two different binaries),
56% growth in Deepsjeng and 61% growth in Exchange, all of which are above
the 10% cutoff.

Bootstrapped/regtested x86_64-linux.

gcc/ChangeLog:

	* ipa-cp.cc (base_count): Remove.
	(struct caller_statistics): Rename n_hot_calls to n_interesting_calls;
	add called_without_ipa_profile.
	(init_caller_stats): Update.
	(cs_interesting_for_ipcp_p): New function.
	(gather_caller_stats): collect n_interesting_calls and
	called_without_profile.
	(ipcp_cloning_candidate_p): Use n_interesting_calls rather than hot.
	(good_cloning_opportunity_p): Rewrite heuristics when IPA profile is
	present
	(estimate_local_effects): Update.
	(value_topo_info::propagate_effects): Update.
	(compare_edge_profile_counts): Remove.
	(ipcp_propagate_stage): Do not collect base_count.
	(get_info_about_necessary_edges): Record whether function is called
	without profile.
	(decide_about_value): Update.
	(ipa_cp_cc_finalize): Do not initialize base_count.
	* profile-count.cc (profile_count::operator*): New.
	(profile_count::operator*=): New.
	* profile-count.h (profile_count::operator*): Declare
	(profile_count::operator*=): Declare.
	* params.opt: Remove ipa-cp-profile-count-base.
	* doc/invoke.texi: Likewise.
The test fails on pru-unknown-elf with:
   cc1plus: warning: '-fstack-protector' not supported for this target

Even though the compiled functions have the feature disabled using an
attribute, the command line option is still not supported by some targets.

Tested x86_64-pc-linux-gnu and ensured that g++.sum is the same with and
without this patch.

gcc/testsuite/ChangeLog:

	* g++.dg/no-stack-protector-attr-3.C: Require effective target
	fstack_protector.

Signed-off-by: Dimitar Dimitrov <[email protected]>
	* gcc.pot: Regenerate.
…an unbounded array

This patch detects constants ZType, RType, CType being passed to unbounded
arrays and generates an error message highlighting the formal and
actual parameters in error.

gcc/m2/ChangeLog:

	PR modula2/119914
	* gm2-compiler/M2Check.mod (checkConstMeta): Add check for
	Ztype, Rtype and Ctype and unbounded arrays.
	(IsZRCType): New procedure function.
	(isZRC): Add comment.
	* gm2-compiler/M2Quads.mod:
	* gm2-compiler/M2Range.mod (gdbinit): New procedure.
	(BreakWhenRangeCreated): Ditto.
	(CheckBreak): Ditto.
	(InitRange): Call CheckBreak.
	(Init): Add gdbhook and initialize interactive watch point.
	* gm2-compiler/SymbolTable.def (GetNthParamAnyClosest): New
	procedure function.
	* gm2-compiler/SymbolTable.mod (BreakSym): Remove constant.
	(BreakSym): Add Variable.
	(stop): Remove.
	(gdbhook): New procedure.
	(BreakWhenSymCreated): Ditto.
	(CheckBreak): Ditto.
	(NewSym): Call CheckBreak.
	(Init): Add gdbhook and initialize interactive watch point.
	(MakeProcedure): Replace guarded call to stop with CheckBreak.
	(GetNthParamChoice): New procedure function.
	(GetNthParamOrdered): Ditto.
	(GetNthParamAnyClosest): Ditto.
	(GetOuterModuleScope): Ditto.

gcc/testsuite/ChangeLog:

	PR modula2/119914
	* gm2/pim/fail/constintarraybyte.mod: New test.

Signed-off-by: Gaius Mulley <[email protected]>
…VF is 4/2.

Since the upper bits are already cleared by the comparison
instructions.
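A scalar emulation can show why an AND with 15 is redundant after a 4-lane compare. This sketch assumes the mask has one bit per lane; it uses no AVX-512 intrinsics, and `compare_eq_mask4` and `mask_low4` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Emulates a 4-lane compare into an 8-bit mask: bit i is set when lane i
// compares equal.  Bits 4..7 are never set.
std::uint8_t compare_eq_mask4(const std::array<int, 4> &a,
                              const std::array<int, 4> &b)
{
  std::uint8_t mask = 0;
  for (unsigned lane = 0; lane < 4; ++lane)
    if (a[lane] == b[lane])
      mask = static_cast<std::uint8_t>(mask | (1u << lane));
  return mask;
}

// The AND that becomes a no-op on such a mask.
std::uint8_t mask_low4(std::uint8_t mask)
{
  assert((mask & 0xf0) == 0);  // the compare already cleared the upper bits
  return static_cast<std::uint8_t>(mask & 15);
}
```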

gcc/ChangeLog:
	PR target/103750
	* config/i386/sse.md (*<avx512>_cmp<mode>3_and15): New define_insn.
	(*<avx512>_ucmp<mode>3_and15): Ditto.
	(*<avx512>_cmp<mode>3_and3): Ditto.
	(*avx512vl_ucmpv2di3_and3): Ditto.
	(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
	Change operands[3] predicate to <cmp_imm_predicate>.
	(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
	Ditto.
	(*<avx512>_cmp<mode>3): Add GET_MODE_NUNITS (<MODE>mode) >= 8
	to the condition.
	(*<avx512>_ucmp<mode>3): Ditto.
	(V48_AVX512VL_4): New mode iterator.
	(VI48_AVX512VL_4): Ditto.
	(V8_AVX512VL_2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512vl-pr103750-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-3.c: Adjust testcase.
	* gcc.target/i386/avx512f-vpcmpgtuq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpeqq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpequq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpgeq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpgeuq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpgtq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpgtuq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpleq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpleuq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpltq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpltuq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpneqq-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcmpnequq-1.c: Ditto.
… [PR119711]

As noted by Richi on a large testcase, there are unnecessary paddings
in some heavily used dwarf2out.{h,cc} structures on 64-bit hosts.

struct dw_val_node {
        enum dw_val_class          val_class;            /*     0     4 */

        /* XXX 4 bytes hole, try to pack */

        struct addr_table_entry *  val_entry;            /*     8     8 */
        union dw_val_struct_union  v;                    /*    16    16 */

        /* size: 32, cachelines: 1, members: 3 */
        /* sum members: 28, holes: 1, sum holes: 4 */
        /* last cacheline: 32 bytes */
};
struct dw_loc_descr_node {
        dw_loc_descr_ref           dw_loc_next;          /*     0     8 */
        enum dwarf_location_atom   dw_loc_opc:8;         /*     8: 0  4 */
        unsigned int               dtprel:1;             /*     8: 8  4 */
        unsigned int               frame_offset_rel:1;   /*     8: 9  4 */

        /* XXX 22 bits hole, try to pack */

        int                        dw_loc_addr;          /*    12     4 */
        struct dw_val_node         dw_loc_oprnd1;        /*    16    32 */
        struct dw_val_node         dw_loc_oprnd2;        /*    48    32 */

        /* size: 80, cachelines: 2, members: 7 */
        /* sum members: 76 */
        /* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 22 bits */
        /* last cacheline: 16 bytes */
};
struct dw_attr_struct {
        enum dwarf_attribute       dw_attr;              /*     0     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dw_val_node         dw_attr_val;          /*     8    32 */

        /* size: 40, cachelines: 1, members: 2 */
        /* sum members: 36, holes: 1, sum holes: 4 */
        /* last cacheline: 40 bytes */
};

The following patch is a (not very clean, admittedly) attempt to decrease
size of dw_loc_descr_node from 80 bytes to 72 and (more importantly)
dw_attr_struct from 40 bytes to 32 by moving the dw_attr member from
dw_attr_struct into dw_attr_val's padding and similarly move
dw_loc_opc/dtprel/frame_offset_rel members into dw_loc_oprnd1 padding
and dw_loc_addr into dw_loc_oprnd2 padding.
All we need to ensure is that nothing tries to copy whole dw_val_node
structs unless it is copied as part of whole dw_loc_descr_node or
dw_attr_struct copy.
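The size effect of moving a member into a nested struct's padding can be reproduced with simplified stand-ins. The structs below are hypothetical and much smaller than the real dwarf2out.h ones; exact sizes depend on the host ABI.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for dw_val_node: on a typical 64-bit host there is a
// 4-byte hole after val_class.
struct ValNode
{
  int val_class;
  void *val_entry;
  long long v[2];
};

// Before: the owner's own 4-byte member forces another 4 bytes of padding.
struct AttrBefore
{
  int attr;
  ValNode val;
};

// After: the owner's member lives in the slot that used to be padding.
struct PackedValNode
{
  int val_class;
  int owner_slot;  // hypothetical name for the borrowed padding
  void *val_entry;
  long long v[2];
};

struct AttrAfter
{
  PackedValNode val;
};

// Bytes saved per object by the repacking (0 if the ABI had no hole).
std::size_t bytes_saved()
{
  assert(sizeof(AttrAfter) <= sizeof(AttrBefore));
  return sizeof(AttrBefore) - sizeof(AttrAfter);
}
```

On a host with 4-byte int and 8-byte pointers this should give 40 and 32 bytes, mirroring the dw_attr_struct numbers above.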

To verify that wasn't the case, I've temporarily added a deleted copy ctor
to dw_val_node and then looked at all the errors/warnings caused by that,
and those were just from memcpy/memmove or structure assignments of whole
dw_loc_descr_node/dw_attr_struct.
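The verification trick generalizes: temporarily deleting the copy operations makes the compiler flag every whole-struct copy. A minimal sketch with a hypothetical `ValNode` (not the real dw_val_node), which also deletes copy assignment for good measure:

```cpp
#include <cassert>
#include <type_traits>

struct ValNode
{
  int val_class = 0;
  int borrowed_slot = 0;  // stands in for the member parked in the padding
  long payload = 0;

  ValNode() = default;
  // Temporarily deleted so that every whole-struct copy fails to compile.
  ValNode(const ValNode &) = delete;
  ValNode &operator=(const ValNode &) = delete;
};

static_assert(!std::is_copy_constructible<ValNode>::value,
              "whole-struct copies are rejected at compile time");
static_assert(!std::is_copy_assignable<ValNode>::value,
              "whole-struct assignments are rejected at compile time");

// Copies only the value members and leaves the borrowed slot of DST alone.
void copy_value(ValNode &dst, const ValNode &src)
{
  assert(&dst != &src);
  dst.val_class = src.val_class;
  dst.payload = src.payload;
}
```

Copying member by member, as `copy_value` does, keeps the destination's borrowed slot intact, which is the property the patch needs.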

2025-04-24  Jakub Jelinek  <[email protected]>

	PR debug/119711
	* dwarf2out.h (struct dw_val_node): Add u member.
	(struct dw_loc_descr_node): Remove dw_loc_opc, dtprel,
	frame_offset_rel and dw_loc_addr members.
	(dw_loc_opc, dw_loc_dtprel, dw_loc_frame_offset_rel, dw_loc_addr):
	Define.
	(struct dw_attr_struct): Remove dw_attr member.
	(dw_attr): Define.
	* dwarf2out.cc (loc_descr_equal_p_1): Use dw_loc_dtprel instead of
	dtprel.
	(output_loc_operands, new_addr_loc_descr, loc_checksum,
	loc_checksum_ordered): Likewise.
	(resolve_args_picking_1): Use dw_loc_frame_offset_rel instead of
	frame_offset_rel.
	(loc_list_from_tree_1): Likewise.
	(resolve_addr_in_expr): Use dw_loc_dtprel instead of dtprel.
	(copy_deref_exprloc): Copy val_class, val_entry and v members
	instead of whole dw_loc_oprnd1 and dw_loc_oprnd2.
	(optimize_string_length): Copy val_class, val_entry and v members
	instead of whole dw_attr_val.
	(hash_loc_operands): Use dw_loc_dtprel instead of dtprel.
	(compare_loc_operands, compare_locs): Likewise.
…arts with a directive

This bugfix for FormatStrings ensures that, for %x and %u, the procedure
function PerformFormatString uses Copy rather than Slice, which avoids
calling Slice with an upper bound of zero.  Oddly, the %d case already
had the correct code.

gcc/m2/ChangeLog:

	PR modula2/119915
	* gm2-libs/FormatStrings.mod (PerformFormatString): Handle
	the %u and %x format specifiers in a similar way to the %d
	specifier.  Avoid using Slice and use Copy instead.

gcc/testsuite/ChangeLog:

	PR modula2/119915
	* gm2/pimlib/run/pass/format2.mod: New test.

Signed-off-by: Gaius Mulley <[email protected]>
…der-for-locality

The handling of an explicit -flto-partition= and -fipa-reorder-for-locality
should be simpler.  No need to have a new default option.  We can use opts_set
to check if -flto-partition is explicitly set and use that information in the
error handling.
Remove -flto-partition=default and update accordingly.
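
As a minimal sketch of the idea, assuming a much simplified model of the option structures: option_values, option_set_flags and should_diagnose are hypothetical names, and only the split between "current value" and "was it given explicitly" mirrors the opts / opts_set pair mentioned above.

```cpp
#include <cassert>

enum partition_model { PARTITION_BALANCED, PARTITION_ONE, PARTITION_MAX };

// Hypothetical miniature of `opts`: the effective option values.
struct option_values {
  bool reorder_for_locality = false;
  partition_model lto_partition = PARTITION_BALANCED;  // ordinary default
};

// Hypothetical miniature of `opts_set`: which options the user spelled out.
struct option_set_flags {
  bool reorder_for_locality = false;
  bool lto_partition = false;
};

// Diagnose only when locality reordering is on AND the partitioning model was
// requested explicitly; no sentinel "default" enumerator is needed for that.
bool should_diagnose(const option_values &opts, const option_set_flags &set) {
  return opts.reorder_for_locality && set.lto_partition;
}
```

With this split the default can stay `balanced`, because "the user did not ask for anything" is recorded in the second structure rather than encoded as an extra enum value.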

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <[email protected]>

gcc/

	* common.opt (LTO_PARTITION_DEFAULT): Delete.
	(flto-partition=): Change default back to balanced.
	* flag-types.h (lto_partition_model): Remove LTO_PARTITION_DEFAULT.
	* opts.cc (validate_ipa_reorder_locality_lto_partition):
	Check opts_set->x_flag_lto_partition instead of LTO_PARTITION_DEFAULT.
	(finish_options): Remove handling of LTO_PARTITION_DEFAULT.

gcc/testsuite/

	* gcc.dg/completion-2.c: Remove check for default.
Add checks for nowait/depend and checks that the returned
CUDA, CUDA_DRIVER and HIP interop objects actually work.

While the CUDA/CUDA_DRIVER ones are only for Nvidia GPUs, HIP
works on both AMD and Nvidia GPUs; on Nvidia GPUs, it is a
very thin wrapper around CUDA.

For Fortran, only a HIP test has been added - using hipfort.

While libgomp.c-c++-common/interop-2.c always works - even without
GPU - and checks for depend / nowait, all others require that
runtime libraries are found at link (and execution) time:
For Nvidia GPUs, libcuda + libcudart or libcublas,
For AMD GPUs, libamdhip64 or libhipblas.

The header files and hipfort modules do not need to be present as a
fallback has been implemented, but if they are, they get used.

Due to the combinations, the basic 1x C/C++, 4x C and 1x Fortran tests
yield 1x C/C++, 14x C and 4 Fortran run-test files.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp (check_effective_target_openacc_cublas,
	check_effective_target_openacc_cudart): Update description as
	the check requires more.
	(check_effective_target_openacc_libcuda,
	check_effective_target_openacc_libcublas,
	check_effective_target_openacc_libcudart,
	check_effective_target_gomp_hip_header_amd,
	check_effective_target_gomp_hip_header_nvidia,
	check_effective_target_gomp_hipfort_module,
	check_effective_target_gomp_libamdhip64,
	check_effective_target_gomp_libhipblas): New.
	* testsuite/libgomp.c-c++-common/interop-2.c: New test.
	* testsuite/libgomp.c/interop-cublas-full.c: New test.
	* testsuite/libgomp.c/interop-cublas-libonly.c: New test.
	* testsuite/libgomp.c/interop-cuda-full.c: New test.
	* testsuite/libgomp.c/interop-cuda-libonly.c: New test.
	* testsuite/libgomp.c/interop-hip-amd-full.c: New test.
	* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hip-nvidia-full.c: New test.
	* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: New test.
	* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hip.h: New test.
	* testsuite/libgomp.c/interop-hipblas-amd-full.c: New test.
	* testsuite/libgomp.c/interop-hipblas-amd-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: New test.
	* testsuite/libgomp.c/interop-hipblas-nvidia-no-headers.c: New test.
	* testsuite/libgomp.c/interop-hipblas-nvidia-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hipblas.h: New test.
	* testsuite/libgomp.fortran/interop-hip-amd-full.F90: New test.
	* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: New test.
	* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: New test.
	* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: New test.
	* testsuite/libgomp.fortran/interop-hip.h: New test.
…or-locality

This ensures -fno-ipa-reorder-for-locality doesn't complain with an explicit
-flto-partition=.

Signed-off-by: Kyrylo Tkachov <[email protected]>

	* opts.cc (validate_ipa_reorder_locality_lto_partition): Check opts
	instead of opts_set for x_flag_ipa_reorder_for_locality.
	(finish_options): Update call site.
Aaron mentioned in the PR that N3124 was adopted late in C23 and that
$@` are now part of the basic character set.  The paper has been implemented
in GCC from what I can see, but we should allow $@` in raw string
delimiters for GNU23/2Y as well, as they are allowed for C++26, because
the delimiters can contain anything from the basic character set but space,
()\, tab, form-feed, newline and backspace.
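
The delimiter rule can be sketched as a small standalone checker. This is only an illustration, assuming ASCII: is_delimiter_char and is_valid_delimiter are hypothetical helpers, not the code in lex.cc, and the allow_dollar_at_backtick flag stands in for whatever language-mode test the lexer really uses.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: may `c` appear in a raw string delimiter?
// Whitespace, parentheses and backslash are rejected simply because they are
// not in any of the accepted sets below.
bool is_delimiter_char(char c, bool allow_dollar_at_backtick) {
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
      (c >= '0' && c <= '9'))
    return true;
  const std::string_view punctuation = "_{}[]#<>%:;.?*+-/^&|~!=,\"'";
  if (punctuation.find(c) != std::string_view::npos)
    return true;
  // The three characters added to the basic character set.
  const std::string_view extended = "$@`";
  return allow_dollar_at_backtick &&
         extended.find(c) != std::string_view::npos;
}

// Hypothetical helper: a delimiter is at most 16 characters long and consists
// only of permitted characters.
bool is_valid_delimiter(std::string_view delim, bool allow_dollar_at_backtick) {
  if (delim.size() > 16)
    return false;
  for (char c : delim)
    if (!is_delimiter_char(c, allow_dollar_at_backtick))
      return false;
  return true;
}
```

For example, `is_valid_delimiter("$@`", true)` accepts the delimiter while the same call with `false` rejects it, which is the behavioural difference the patch introduces for the newer C modes.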

2025-04-24  Jakub Jelinek  <[email protected]>

	PR c++/110343
	* lex.cc (lex_raw_string): For C allow $@` in raw string delimiters
	if CPP_OPTION (pfile, low_ucns) i.e. for C23 and later.

	* gcc.dg/raw-string-1.c: New test.
PR119610 is about incorrect CFI output for a stack probe when that
probe is not the initial allocation.  The main aarch64 stack probe
function, aarch64_allocate_and_probe_stack_space, implicitly assumed
that the incoming stack pointer pointed to the top of the frame,
and thus held the CFA.

aarch64_save_callee_saves and aarch64_restore_callee_saves use a
parameter called bytes_below_sp to track how far the stack pointer
is above the base of the static frame.  This patch does the same
thing for aarch64_allocate_and_probe_stack_space.

Also, I noticed that the SVE path was attaching the first CFA note
to the wrong instruction: it was attaching the note to the calculation
of the stack size, rather than to the r11<-sp copy.
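
The bookkeeping can be modelled with plain integers. This is a hedged sketch, assuming scalar frame sizes: cfa_offset_after is a hypothetical function, whereas the real code works on poly_int values and attaches CFA notes to RTL instructions. It only shows why the CFA offset has to be derived from bytes_below_sp rather than assumed to be zero on entry.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical scalar model.  bytes_below_sp starts at the full frame size
// (stack pointer at the top of the frame, where it equals the CFA) and is
// decremented by each allocation.  The CFA then sits
// frame_size - bytes_below_sp bytes above the current stack pointer.
std::int64_t cfa_offset_after(std::int64_t frame_size,
                              std::initializer_list<std::int64_t> allocations) {
  std::int64_t bytes_below_sp = frame_size;
  for (std::int64_t size : allocations)
    bytes_below_sp -= size;  // the stack pointer moved down by `size`
  return frame_size - bytes_below_sp;
}
```

With a 4608-byte frame, a probe that runs after an earlier 512-byte allocation starts at offset 512 from the CFA, not 0, which is the case the old code mishandled.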

gcc/
	PR target/119610
	* config/aarch64/aarch64.cc (aarch64_allocate_and_probe_stack_space):
	Add a bytes_below_sp parameter and use it to calculate the CFA
	offsets.  Attach the first SVE CFA note to the move into the
	associated temporary register.
	(aarch64_allocate_and_probe_stack_space): Update calls accordingly.
	Start out with bytes_below_sp set to the frame size and decrement
	it after each allocation.

gcc/testsuite/
	PR target/119610
	* g++.dg/torture/pr119610.C: New test.
	* g++.target/aarch64/sve/pr119610-sve.C: Likewise.
As a followup to the previous patch for 116954, there's no reason to do
anything in remove_contract_attributes if contracts aren't enabled.

	PR c++/116954

gcc/cp/ChangeLog:

	* contracts.cc (remove_contract_attributes): Return early if
	not enabled.
The existing test is currently testing std::vector. Adapt it for std::deque.

libstdc++-v3/ChangeLog:

	* testsuite/util/replacement_memory_operators.h: Adapt for -fno-exceptions
	context.
	* testsuite/23_containers/deque/capacity/shrink_to_fit.cc: Adapt test
	to check std::deque shrink_to_fit method.

Reviewed-by: Jonathan Wakely <[email protected]>
Reviewed-by: Tomasz Kaminski <[email protected]>
This is all about using AMD's HIP header files with
__HIP_PLATFORM_NVIDIA__ defined, i.e. HIP with Nvidia/CUDA; in that case,
HIP is a thin layer on top of CUDA.

First, the check_effective_target_gomp_hip_header_nvidia check failed;
to fix it, -Wno-deprecated-declarations was added - and likewise to the
two affected testcases that actually used the HIP headers on Nvidia.

With that, the HIP test was successful but the HIP-BLAS one showed two
issues:

* One seems to be related to include search paths as the HIP header uses
  #include "library_types.h" to include that CUDA header. Seemingly, it
  tried to include (again) the HIP header hip/library_types.h, not the
  CUDA one. I guess some tweaking of -isystem vs. -I could have
  prevented this, but the simpler workaround was to just explicitly
  include the CUDA one before the HIP header files.

* Once done, everything compiled but linking failed as the association
  between three HIP-BLAS functions and their CUDA-BLAS ones did not
  work. Solution: Just add three #define for mapping them.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp
	(check_effective_target_gomp_hip_header_nvidia): Compile with
	"-Wno-deprecated-declarations".
	* testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise.
	* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: Likewise.
	* testsuite/libgomp.c/interop-hipblas.h: Add workarounds
	when using the HIP headers with __HIP_PLATFORM_NVIDIA__.
The problem here is division by zero, since adjusted 0 > precise 0. Fixed by
using the right test.
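
A simplified model of the pitfall, assuming a hypothetical `count` type rather than GCC's real profile_count class: when a comparison treats an adjusted zero as greater than a precise zero, a "greater than zero" guard lets a zero divisor through, while a direct nonzero_p test does not.

```cpp
#include <cassert>
#include <cstdint>

enum quality { ADJUSTED, PRECISE };

// Hypothetical, heavily simplified stand-in for a profile count.
struct count {
  std::uint64_t value;
  quality q;
  bool nonzero_p() const { return value != 0; }
};

// Models the quirk from the commit message: an adjusted zero compares
// greater than a precise zero.
bool greater(const count &a, const count &b) {
  if (a.value != b.value)
    return a.value > b.value;
  return a.value == 0 && a.q == ADJUSTED && b.q == PRECISE;
}

// Guard written as a comparison against precise zero (the problematic form).
bool guard_with_compare(const count &divisor) {
  return greater(divisor, count{0, PRECISE});
}

// Guard written with nonzero_p (the form the fix switches to).
bool guard_with_nonzero(const count &divisor) { return divisor.nonzero_p(); }

// Scale `n` by num/den, returning 0 when the divisor is zero.
std::uint64_t safe_scale(std::uint64_t n, const count &num, const count &den) {
  return guard_with_nonzero(den) ? n * num.value / den.value : 0;
}
```

Here `guard_with_compare(count{0, ADJUSTED})` is true even though the value is zero, so only the nonzero_p form is safe to put in front of a division.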

gcc/ChangeLog:

	PR ipa/119924
	* ipa-cp.cc (update_counts_for_self_gen_clones): Use nonzero_p.
	(update_profiling_info): Likewise.
	(update_specialized_profile): Likewise.
…ers 0 or -1

gcc/ChangeLog:

	PR target/119919
	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Account
	correctly for cond_expr and min/max when one of the operands is 0 or -1.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr119919.c: New test.
This patch changes the exception processing logic for the calculation of
reference modifications and table subscripts to be more in accordance with
ISO specifications.

It also adjusts the processing of RETURN-CODE when calling routines that
have no CALL ... RETURNING phrase.

gcc/cobol

	* genapi.cc: (initialize_variable_internal): Change TRACE1 formatting.
	(create_and_call): Repair RETURN-CODE processing.
	(mh_source_is_group): Repair run-time IF type comparison.
	(psa_FldLiteralA): Change TRACE1 formatting.
	(parser_symbol_add): Eliminate unnecessary code.
	* genutil.cc: Eliminate SET_EXCEPTION_CODE macro.
	(get_data_offset_dest): Repair set_exception_code logic.
	(get_data_offset_source): Likewise.
	(get_binary_value): Likewise.
	(refer_refmod_length): Likewise.
	(refer_fill_depends): Likewise.
	(refer_offset_dest): Likewise.
	(refer_size_dest): Likewise.
	(refer_offset_source): Likewise.

gcc/testsuite

	* cobol.dg/group1/declarative_1.cob: Adjust for repaired exception logic.
This patch provides autoconf tests for each field used in wraptime.cc
referencing struct tm and struct timeval.
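
The guarding pattern looks roughly like the sketch below. The macro name HAVE_STRUCT_TM_TM_YEAR, the function get_year_or_fallback and the -1 fallback are assumptions made for illustration; the macros configure.ac really defines, and what wraptime.cc returns when a field is missing, may differ.

```cpp
#include <cassert>
#include <ctime>

// Assumption: in a real build this macro would come from config.h after the
// configure check succeeded.  It is defined here only so that the sketch is
// self-contained.
#ifndef HAVE_STRUCT_TM_TM_YEAR
#define HAVE_STRUCT_TM_TM_YEAR 1
#endif

// Hypothetical accessor: touch the field only when configure found it,
// otherwise return a fallback value.
int get_year_or_fallback(const std::tm *t) {
#if defined(HAVE_STRUCT_TM_TM_YEAR)
  return t != nullptr ? t->tm_year : -1;
#else
  (void) t;
  return -1;
#endif
}

// Small helper used to build an input value for the accessor.
std::tm tm_for_year(int year_since_1900) {
  std::tm t{};
  t.tm_year = year_since_1900;
  return t;
}
```

Checking each field separately keeps the wrapper compiling on targets where only some members of struct tm or struct timeval exist.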

libgm2/ChangeLog:

	PR modula2/115276
	* config.h.in: Regenerate.
	* configure: Regenerate.
	* configure.ac (AC_STRUCT_TIMEZONE): Add.
	(AC_CHECK_MEMBER): Test for struct tm.tm_year.
	(AC_CHECK_MEMBER): Test for struct tm.tm_mon.
	(AC_CHECK_MEMBER): Test for struct tm.tm_mday.
	(AC_CHECK_MEMBER): Test for struct tm.tm_hour.
	(AC_CHECK_MEMBER): Test for struct tm.tm_min.
	(AC_CHECK_MEMBER): Test for struct tm.tm_sec.
	(AC_CHECK_MEMBER): Test for struct tm.tm_year.
	(AC_CHECK_MEMBER): Test for struct tm.tm_yday.
	(AC_CHECK_MEMBER): Test for struct tm.tm_wday.
	(AC_CHECK_MEMBER): Test for struct tm.tm_isdst.
	(AC_CHECK_MEMBER): Test for struct timeval.tv_sec.
	(AC_CHECK_MEMBER): Test for struct timeval.tv_sec.
	(AC_CHECK_MEMBER): Test for struct timeval.tv_usec.
	* libm2iso/wraptime.cc (InitTimeval): Guard against lack of
	struct timeval and malloc.
	(InitTimezone): Guard against lack of struct tm.tm_zone
	and malloc.
	(KillTimezone): Ditto.
	(InitTimeval): Guard against lack of struct timeval
	and malloc.
	(KillTimeval): Guard against lack of malloc.
	(settimeofday): Guard against lack of struct tm.tm_zone.
	(GetFractions): Guard against lack of struct timeval.
	(localtime_r): Ditto.
	(GetYear): Guard against lack of struct tm.
	(GetMonth): Ditto.
	(GetDay): Ditto.
	(GetHour): Ditto.
	(GetMinute): Ditto.
	(GetSecond): Ditto.
	(GetSummerTime): Ditto.
	(GetDST): Guard against lack of struct timezone.
	(SetTimezone): Ditto.
	(SetTimeval): Guard against lack of struct tm.

Signed-off-by: Gaius Mulley <[email protected]>
This was approved in Wrocław as LWG 3899.

This avoids creating a new coroutine frame to co_yield the elements of
an lvalue generator.

libstdc++-v3/ChangeLog:

	* include/std/generator (generator::yield_value): Add overload
	taking lvalue element_of view, as per LWG 3899.
	* testsuite/24_iterators/range_generators/lwg3899.cc: New test.

Reviewed-by: Tomasz Kamiński <[email protected]>
Reviewed-by: Arsen Arsenović <[email protected]>
protobuf (and therefore firefox too) currently doesn't build on s390*-linux.
The problem is that it uses [[clang::musttail]] attribute heavily, and in
llvm (IMHO llvm bug) [[clang::musttail]] calls with 5+ arguments on
s390*-linux are silently accepted and result in a normal non-tail call.
In GCC we just reject those because the target hook refuses to tail call it
(IMHO the right behavior).
The reason, as s390_function_ok_for_sibcall attempts to explain, is that
the 5th argument (assuming normal <= wordsize integer or pointer
arguments, nothing that needs 2+ registers) is passed in %r6, which is not
call clobbered.  We therefore can't do a tail call when we'd have to change
the content of that register, because the caller would assume the %r6
content didn't change and use it again.
In the protobuf case though, the 5th argument is always passed through
from the caller to the musttail callee unmodified, so one can actually
emit just jg tail_called_function, or perhaps tweak some registers but
keep %r6 untouched, and in that case I think it is just fine to tail call
it (at least unless the stack slots used for 6+ arguments can't be modified
by the callee in the ABI and nothing checks for that).

So, the following patch checks for this special case, where the argument
which uses %r6 is passed in a single register and the value passed is the
default definition of an SSA_NAME of a PARM_DECL with the same
DECL_INCOMING_RTL.

It won't really work at -O0 but should work for -O1 and above, at least when
one doesn't really try to modify the parameter conditionally and hope it will
be optimized away in the end.
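
The two call shapes can be illustrated as follows. This is a sketch only: finish, forward_unchanged and forward_modified are hypothetical functions, and the [[clang::musttail]] attribute is left out (mentioned in comments instead) so that the snippet compiles with any compiler and on any target.

```cpp
#include <cassert>

// Hypothetical callee taking five word-sized arguments; per the description
// above, on s390*-linux the fifth one would be passed in %r6.
long finish(long a, long b, long c, long d, long ctx) {
  return a + b + c + d + ctx;
}

// The fifth parameter is forwarded unchanged, so the register holding it
// would not need to be rewritten.  This is the shape the patch is meant to
// accept; protobuf-style code would write `[[clang::musttail]] return ...`.
long forward_unchanged(long a, long b, long c, long d, long ctx) {
  return finish(a + 1, b, c, d, ctx);
}

// The fifth argument is a new value, so the call-saved register would have
// to change.  This shape would still be refused as a tail call.
long forward_modified(long a, long b, long c, long d, long ctx) {
  return finish(a, b, c, d, ctx + 1);
}
```

Both functions compute the same kind of result; the difference that matters for the target hook is only whether the value in the fifth argument slot is the incoming parameter itself.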

2025-04-24  Jakub Jelinek  <[email protected]>
	    Stefan Schulze Frielinghaus  <[email protected]>

	PR target/119873
	* config/s390/s390.cc (s390_call_saved_register_used): Don't return
	true if default definition of PARM_DECL SSA_NAME of the same register
	is passed in call saved register.
	(s390_function_ok_for_sibcall): Adjust comment.

	* gcc.target/s390/pr119873-1.c: New test.
	* gcc.target/s390/pr119873-2.c: New test.
	* gcc.target/s390/pr119873-3.c: New test.
	* gcc.target/s390/pr119873-4.c: New test.
@powerboat9 changed the title from "[April 9th 2025] Merge changes from upstream" to "[April 24th 2025] Merge changes from upstream" on Apr 24, 2025