[April 24th 2025] Merge changes from upstream #3723
powerboat9 wants to merge 10,000 commits into Rust-GCC:master from powerboat9:merge-3
GitHub appears to be having issues displaying the diff, presumably because of the number of commits or files involved. I'm not sure how we'd fix that, so we might have to just work around it.
Looks like one of the tests is failing -- any ideas?
This patch just introduces a form of dumping of widest ints that only have zeros in the lowest 128 bits so that instead of printing thousands of f's the output looks like: Bits: value = 0xffff, mask = all ones folled by 0xffffffffffffffffffffffffffff0000 and then makes sure we use the function not only to print bits but also to print masks where values like these can also occur. gcc/ChangeLog: 2025-03-21 Martin Jambor <[email protected]> * ipa-cp.cc (ipcp_print_widest_int): Also add a truncated form of dumping of widest ints which only have zeros in the lowest 128 bits. Update the comment. (ipcp_bits_lattice::print): Also dump the mask using ipcp_print_widest_int. (ipcp_store_vr_results): Likewise.
…18785) This patch revisits the fix for PR 118785 and instead of deducing the necessary operation type it just uses the value collected and streamed by an earlier patch. The main advantage is that we do not rely on expr_type_first_operand_type_p enumerating all operations. gcc/ChangeLog: 2025-03-20 Martin Jambor <[email protected]> PR ipa/118785 * ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Use the stored and streamed type of arithmetic pass-through functions.
…R118097) This patch revisits the fix for PR 118097 and instead of deducing the necessary operation type it just uses the value collected and streamed by an earlier patch. It is bigger than the ones for propagating value ranges and known bits because we track constants both in parameters themselves and also in memory they point to or within aggregates, we clone functions for them and we do fancy things for some types of recursive calls. In the case of constants in aggregates or passed by reference, the situation should not change because the code creating jump functions for them does not allow type-casts, unlike for the plain ones. However, this patch changes how we handle them for the sake of consistency and also so that we can try and eliminate this limitation in the next stage 1. gcc/ChangeLog: 2025-03-20 Martin Jambor <[email protected]> PR ipa/118097 * ipa-cp.cc (ipa_get_jf_arith_result): Require res_operand for anything except NOP_EXPR or ADDR_EXPR, document it and remove the code trying to deduce it. (ipa_value_from_jfunc): Use the stored and streamed type of arithmetic pass-through functions. (ipa_agg_value_from_jfunc): Use the stored and streamed type of arithmetic pass-through functions, convert to the type used to store the value if necessary. (get_val_across_arith_op): New parameter op_type, pass it to ipa_get_jf_arith_result. (propagate_vals_across_arith_jfunc): New parameter op_type, pass it to get_val_across_arith_op. (propagate_vals_across_pass_through): Use the stored and streamed type of arithmetic pass-through functions. (propagate_aggregate_lattice): Likewise. (push_agg_values_for_index_from_edge): Use the stored and streamed type of arithmetic pass-through functions, convert to the type used to store the value if necessary.
Don't use red-zone when there are no caller-saved registers with 32 GPRs since 128-byte red-zone is too small for 31 GPRs. gcc/ PR target/119784 * config/i386/i386.cc (ix86_using_red_zone): Don't use red-zone with 32 GPRs and no caller-saved registers. gcc/testsuite/ PR target/119784 * gcc.target/i386/pr119784a.c: New test. * gcc.target/i386/pr119784b.c: Likewise. Signed-off-by: H.J. Lu <[email protected]>
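The size argument in the message above can be checked with simple arithmetic. The following is a minimal sketch, assuming a 128-byte red zone and one 8-byte save slot per general-purpose register; `kRedZoneBytes`, `kGprSlotBytes` and `saves_fit_in_red_zone` are hypothetical names for illustration and are not taken from i386.cc.

```cpp
#include <cassert>
#include <cstddef>

// Assumptions (not from the patch): the red zone is 128 bytes and each
// saved general-purpose register needs one 8-byte slot.
constexpr std::size_t kRedZoneBytes = 128;
constexpr std::size_t kGprSlotBytes = 8;

// Hypothetical helper: true if saving `num_regs` registers fits entirely
// inside the red zone.
constexpr bool saves_fit_in_red_zone(std::size_t num_regs) {
  return num_regs * kGprSlotBytes <= kRedZoneBytes;
}

// 16 registers need exactly 128 bytes; 31 registers need 248 bytes.
static_assert(saves_fit_in_red_zone(16), "16 * 8 = 128 bytes fits");
static_assert(!saves_fit_in_red_zone(31), "31 * 8 = 248 bytes does not fit");
```

Under these assumptions, 31 saved registers need almost twice the space the red zone offers, which matches the reasoning given for disabling it.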
* libgcobol.cc (__gg__float64_from_128): Mark literal as float128 literal.
In the three-parameter version of satisfy_declaration_constraints, when 't' isn't the most general template, then 't' won't correspond with 'args' after we augment the latter via add_outermost_template_args, and so the instantiation context that we push via push_tinst_level isn't quite correct: 'args' is a complete set of template arguments, but 't' is not necessarily the most general template. This manifests as misleading diagnostic context lines when issuing a satisfaction failure error, e.g. for the testcase below, without this patch we emit: In substitution of '... void A<int>::f<U>() ... [with U = int]' and with this patch we emit: In substitution of '... void A<int>::f<U>() ... [with U = char]'. This patch fixes this by passing the original 'args' to push_tinst_level, which ought to properly correspond to 't'. PR c++/99214 gcc/cp/ChangeLog: * constraint.cc (satisfy_declaration_constraints): Pass the original ARGS to push_tinst_level. gcc/testsuite/ChangeLog: * g++.dg/concepts/diagnostic20.C: New test. Reviewed-by: Jason Merrill <[email protected]>
This testcase was fixed by r15-3052-gc7b76a076cb2c6ded but is a testcase that failed in a different fashion and a much older failure than the one added with r15-3052. Pushed as obvious after a quick test. PR tree-optimization/118476 gcc/testsuite/ChangeLog: * gcc.dg/torture/pr118476-1.c: New test. Signed-off-by: Andrew Pinski <[email protected]>
This moves is_floating_point over to using FLOAT_TYPE_P instead of manually checking. Note before it would return true for all COMPLEX_TYPE but complex types' inner type could be integral. Also fixes up the comment to be in more of the GNU style. Bootstrapped and tested on x86_64-linux-gnu. gcc/rust/ChangeLog: * rust-gcc.cc (is_floating_point): Use FLOAT_TYPE_P instead of manually checking the type. Signed-off-by: Andrew Pinski <[email protected]>
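The behavioural difference described above can be illustrated with a toy type model. This is only a sketch: `Code`, `Type` and the two functions are hypothetical stand-ins and do not use the real GCC tree API, and the sketch ignores any other cases the real FLOAT_TYPE_P macro handles.

```cpp
#include <cassert>

// Toy stand-ins for GCC tree codes; not the real tree API.
enum class Code { Integer, Real, Complex };

struct Type {
  Code code;
  const Type *inner;  // element type for Complex, otherwise nullptr
};

// Old-style check as described in the message: every complex type is
// treated as floating point, whatever its inner type is.
bool is_floating_point_old(const Type &t) {
  return t.code == Code::Real || t.code == Code::Complex;
}

// FLOAT_TYPE_P-style check: a complex type counts only when its inner
// type is a real (floating-point) type.
bool is_floating_point_new(const Type &t) {
  if (t.code == Code::Real)
    return true;
  return t.code == Code::Complex && t.inner != nullptr
         && t.inner->code == Code::Real;
}

// Sample types used to compare the two checks.
const Type kInt{Code::Integer, nullptr};
const Type kReal{Code::Real, nullptr};
const Type kComplexInt{Code::Complex, &kInt};
const Type kComplexReal{Code::Complex, &kReal};
```

In this model the two checks agree on everything except `kComplexInt`, the complex type with an integral inner type that the message calls out.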
Just a simple cleanup of the code to use error_operand_p instead of directly comparing against error_mark_node. This also moves some code around when dealing with error_operand_p, to be faster and/or tighten up the code slightly. gcc/rust/ChangeLog: * rust-gcc.cc (Bvariable::get_tree): Use error_operand_p. (pointer_type): Likewise. (reference_type): Likewise. (immutable_type): Likewise. (function_type): Likewise. (function_type_variadic): Likewise. Cleanup the check for receiver.type first. (function_ptr_type): Use error_operand_p. (fill_in_fields): Likewise. (fill_in_array): Likewise. (named_type): Likewise. (type_size): Likewise. (type_alignment): Likewise. (type_field_alignment): Likewise. (type_field_offset): Likewise. (zero_expression): Likewise. (float_constant_expression): Likewise. (convert_expression): Likewise. (struct_field_expression): Likewise. (compound_expression): Likewise. (conditional_expression): Likewise. (negation_expression): Likewise. (arithmetic_or_logical_expression): Likewise. (arithmetic_or_logical_expression_checked): Likewise. (comparison_expression): Likewise. (lazy_boolean_expression): Likewise. (constructor_expression): Likewise. (array_constructor_expression): Likewise. (array_index_expression): Likewise. (call_expression): Likewise. (init_statement): Likewise. (assignment_statement): Likewise. (return_statement): Likewise. (exception_handler_statement): Likewise. (if_statement): Likewise. (compound_statement): Likewise. Tighten up the code, removing t variable. (statement_list): Use error_operand_p. (block): Likewise. (block_add_statements): Likewise. (convert_tree): Likewise. (global_variable): Likewise. (global_variable_set_init): Likewise. (local_variable): Likewise. (parameter_variable): Likewise. (static_chain_variable): Likewise. (temporary_variable): Likewise. (function): Likewise. Tighten up the code. (function_defer_statement): Use error_operand_p. (function_set_parameters): Use error_operand_p. 
(write_global_definitions): Use error_operand_p. Tighten up the code around the loop. Signed-off-by: Andrew Pinski <[email protected]>
There are some places inside rust-gcc.cc which are candidates to use range for instead of iterators directly. This changes the locations I saw and makes the code slightly more readable. gcc/rust/ChangeLog: PR rust/119341 * rust-gcc.cc (function_type): Use range fors. (function_type_variadic): Likewise. (fill_in_fields): Likewise. (statement_list): Likewise. (block): Likewise. (block_add_statements): Likewise. (function_set_parameters): Likewise. (write_global_definitions): Likewise. Signed-off-by: Andrew Pinski <[email protected]>
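As an illustration of the kind of rewrite described above, here is a minimal sketch that contrasts an explicit iterator loop with a range-based for loop. The functions and the `std::vector<int>` payload are hypothetical and are not code from rust-gcc.cc.

```cpp
#include <cassert>
#include <vector>

// Explicit iterator loop, the style being replaced.
int sum_with_iterators(const std::vector<int> &values) {
  int total = 0;
  for (std::vector<int>::const_iterator it = values.begin();
       it != values.end(); ++it)
    total += *it;
  return total;
}

// Equivalent range-based for loop: same traversal, less bookkeeping.
int sum_with_range_for(const std::vector<int> &values) {
  int total = 0;
  for (int v : values)
    total += v;
  return total;
}
```

Both functions visit the elements in the same order, so a conversion like this should not change behaviour as long as the loop body does not modify the container.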
Inside a BLOCK node, all of the variables of the scope/block are chained together and that connects them to the block. This just adds a comment to that effect as reading the code it is not so obvious why they need to be chained together. gcc/rust/ChangeLog: PR rust/119342 * rust-gcc.cc (block): Add comment on why chaining the variables of the scope together. Signed-off-by: Andrew Pinski <[email protected]>
…ation gcc/rust/ChangeLog: * typecheck/rust-hir-type-check-expr.cc (is_default_fn): New. (emit_ambiguous_resolution_error): New. (handle_multiple_candidates): Properly handle multiple candidates in the case of specialization. (TypeCheckExpr::visit): Call `handle_multiple_candidates`. gcc/testsuite/ChangeLog: * rust/execute/torture/min_specialization2.rs: New test. * rust/execute/torture/min_specialization3.rs: New test.
Instead, mark the visitor as dirty and wait for the next round of the fixed point to take care of them. This avoids issues with module items being loaded while not being stripped yet. gcc/rust/ChangeLog: * resolve/rust-toplevel-name-resolver-2.0.cc (TopLevel::visit): Return if module is unloaded.
gcc/rust/ChangeLog: * ast/rust-expr.h (class RangeExpr): Add empty outer attributes and allow getting them and setting them.
gcc/rust/ChangeLog: * ast/rust-ast.h (DelimTokenTree::get_locus): New function.
gcc/rust/ChangeLog: * rust-session-manager.cc (Session::compile_crate): Call the visitor later in the pipeline.
gcc/rust/ChangeLog: * expand/rust-macro-expand.cc (MacroExpander::match_n_matches): Do not insert fragments and substack fragments if the matcher failed. gcc/testsuite/ChangeLog: * rust/compile/macros/mbe/macro-issue3708.rs: New test.
gcc/rust/ChangeLog: * expand/rust-macro-expand.cc (MacroExpander::expand_decl_macro): Call into TokenTreeDesugar. * expand/rust-token-tree-desugar.cc: New file. * expand/rust-token-tree-desugar.h: New file. * Make-lang.in: Compile them. gcc/testsuite/ChangeLog: * rust/compile/macros/mbe/macro-issue3709-1.rs: New test. * rust/compile/macros/mbe/macro-issue3709-2.rs: New test.
gcc/rust/ChangeLog: * expand/rust-macro-builtins-format-args.cc (format_args_parse_arguments): Improve safety, allow extra commas after end of argument list. gcc/testsuite/ChangeLog: * rust/compile/format_args_extra_comma.rs: New test.
gcc/rust/ChangeLog: * checks/errors/rust-const-checker.cc (ConstChecker::visit): Visit the enum items of enums. * resolve/rust-ast-resolve-item.cc (ResolveItem::visit): Resolve enum discriminants during nr1.0. gcc/testsuite/ChangeLog: * rust/compile/enum_discriminant2.rs: New test. Signed-off-by: Owen Avery <[email protected]>
Addresses PR#117869 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117869 gcc/ChangeLog: * doc/install.texi: Add requirements for building gccrs.
gcc/rust/ChangeLog: * expand/rust-macro-builtins.cc (MacroBuiltin::builtin_transcribers): Add entry for track_caller. * util/rust-attribute-values.h: add `TRACK_CALLER` attribute. * util/rust-attributes.cc: add `track_caller` attribute definition. gcc/testsuite/ChangeLog: * rust/compile/track_caller.rs: New test. Signed-off-by: Bhavesh Mandalapu <[email protected]>
gcc/rust/ChangeLog: * util/rust-attribute-values.h: Add missing attributes. * util/rust-attributes.cc: Likewise. * util/rust-attributes.h (enum CompilerPass): Mention adding something for const functions.
This causes an assertion failure when compiling core with nr2.0, but should probably be improved. I'm not sure how this code enables built-in derive macros to be resolved so this is a temporary fix. gcc/rust/ChangeLog: * resolve/rust-early-name-resolver-2.0.cc (Early::visit_attributes): Remove assertion.
gcc/rust/ChangeLog: * util/rust-attribute-values.h: Add RUSTFMT value. * util/rust-attributes.cc: Define the attribute. * util/rust-attributes.h (enum CompilerPass): Add EXTERNAL variant. * expand/rust-macro-builtins.cc: Fix formatting.
gcc/rust/ChangeLog: * util/rust-lang-item.h: Add new manually_drop lang item. * util/rust-lang-item.cc: Likewise.
…p maybe_complain_about_tail_call [PR119718] Andrew P. mentioned earlier he'd like to see in the dump files a note whether it was a failed must tail call or not. We already print that on the tailc/musttail pass side, because print_gimple_stmt prints [must tail call] after the musttail calls. The first hunk below does it for GENERIC CALL_EXPRs too (which is needed for the expand diagnostics). That isn't enough though, because the error on it was done first and then CALL_EXPR_MUST_TAIL_CALL flag was cleared, so the dump didn't have it anymore. I've reordered the dump printing with error, so that it works properly. 2025-04-14 Jakub Jelinek <[email protected]> PR tree-optimization/119718 * tree-pretty-print.cc (dump_generic_node) <case CALL_EXPR>: Dump also CALL_EXPR_MUST_TAIL_CALL flag. * calls.cc (maybe_complain_about_tail_call): Emit error about CALL_EXPR_MUST_TAIL_CALL only after emitting dump message, not before it.
r14-7202-gc8ec3e1327cb1e added vld1xN and vst1xN intrinsics and some tests on arm, but didn't enable some existing tests. Since these tests are shared with aarch64, this patch removes the 'dg-skip-if "unimplemented" { arm*-*-* }' directives and relies on the advsimd-intrinsics.exp driver to define the appropriate flags and dg-do-what action. (A previous patch removed 'dg-do run', and this patch removes 'dg-options "-O3"' which would override the options computed by the test driver) float16 intrinsics require the neon-fp16 FPU, which is possibly enabled by advsimd-intrinsics.exp, so we include them unconditionally on aarch64 or if fp16 is enabled on arm. poly64 intrinsics would require crypto-neon-fp-armv8: the patch enables the corresponding tests on aarch64 only, since for arm they are already covered by other tests in gcc.target/arm/simd/. For some reason, poly64 tests were missing from x2 and x3 tests, so the patch adds them as needed. Tested on aarch64-linux-gnu (no change), arm-linux-gnueabihf (the additional tests are executed) and various flavors of arm-none-eabi (the additional tests are compiled-only on M-profile, executed on A-profile). gcc/testsuite/ PR target/71233 * gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Enable on arm. * gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1x4.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst1x2.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst1x3.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst1x4.c: Likewise.
* config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Update. * config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Update. * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Update.
* sv.po: Update.
This patch implements costing of truth_value exprs, i.e. a = b < c; These now seem to be the most common operations that go to the addss path, except for int->fp and fp->int conversions. For integers we use setcc; for FP there is CMccSS and variants, which set the destination register as a mask (i.e. -1 on true and 0 on false). Technically these need res&1 to get 1 on true and 0 on false, but looking at examples where this is used, it is common that the resulting code is optimized to avoid the need for this (except for cases where the result is directly saved to memory). For this reason I am accounting for only one sse_op (CMccSS) itself. gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Cost truth_value exprs.
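The mask-versus-truth-value distinction mentioned above can be modelled in plain C++. This is a sketch only: `compare_lt_mask` and `mask_to_bool` are hypothetical names that imitate the described behaviour (all ones on true, zero on false) and do not use real SSE intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Models a scalar FP compare that writes a mask instead of a boolean:
// all bits set (-1) when a < b, all bits clear (0) otherwise.
std::int32_t compare_lt_mask(float a, float b) {
  return a < b ? -1 : 0;
}

// The extra "res & 1" step that turns the mask into a C truth value.
int mask_to_bool(std::int32_t mask) {
  return mask & 1;
}
```

For example, `mask_to_bool(compare_lt_mask(1.0f, 2.0f))` yields 1, while the raw mask is -1. When the consumer only tests the result for zero or uses it for selection, the `& 1` step is unnecessary, which is the case the costing in the message relies on.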
Currently enabling profile feedback regresses x264 and exchange. In both cases the root of the issue is that the ipa-cp cost model thinks cloning is not relevant when feedback is available, while it clones without feedback. Consider:

    __attribute__ ((used)) int a[1000];

    __attribute__ ((noinline))
    void
    test2 (int sz)
    {
      for (int i = 0; i < sz; i++)
        a[i]++;
      asm volatile (""::"m"(a));
    }

    __attribute__ ((noinline))
    void
    test1 (int sz)
    {
      for (int i = 0; i < 1000; i++)
        test2 (sz);
    }

    int
    main ()
    {
      test1 (1000);
      return 0;
    }

Here we want to clone both test1 and test2 and specialize them for 1000, but ipa-cp will not do that, since it will skip the call main->test1 as not hot, because it is called just once both with and without profile feedback. In this simple testcase even without profile feedback we will track that main is called once.

I think the testcase shows that hotness of a call is not that relevant when deciding whether we want to propagate constants across it. ipa-cp with IPA profile can compute an overall estimate of time saved (which is the existing time benefit, i.e. time saved per invocation of the function, multiplied by the number of executions) and see if the result is big enough. An easy check is to simply call maybe_hot_p on the resulting count.

So this patch makes ipa-cp consider all call sites except those known to be unlikely executed (i.e. run 0 times in the train run or known to lead to something bad) as interesting, which makes ipa-cp propagate across them, find cloning candidates and feed them into good_cloning_opportunity_p. For this I added cs_interesting_for_ipcp_p, which also attempts to do the right thing with partial training.

Now good_cloning_opportunity_p will currently return false, since it will figure out that the call edge is not very frequent. It already kind of knows that the frequency of the call instruction itself is not too important, but instead of computing overall time saved, it tries to compare it with param_ipa_cp_profile_count_base percentage of counts of call edges.
I think this is not very relevant since estimated time saved per call can be large. So I dropped this logic and replaced it with simple use of overall saved time. Since ipa-cp is not dealing well with the cases where it hits the allowed unit growth limit, we probably want to be more careful, so I keep the existing metric with this change. So now we get:

    Evaluating opportunities for test1/3.
     - considering value 1000 for param #0 sz (caller_count: 1)
         good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), overall time saved: 1 (adjusted)) -> evaluation: 0.12, threshold: 500
         not cloning: time saved is not hot
         good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 (precise), overall time saved: 129001 (adjusted)) -> evaluation: 6450.05, threshold: 500

The first call to good_cloning_opportunity_p considers the case where only test1 is cloned. In this case time saved is 1 (for passing the value around) and since it is called just once (count_sum) overall time saved is 1, which is not considered hot, and we also get a very low evaluation score. In the second call we consider cloning the chain test1->test2. In this case time saved is large (129001) since test2 is invoked many times and the value is used to control the loop. We still know that the count is 1 but overall time is 129001, which is already considered relevant, and we clone.

I also try to do something sensible in case we have calls both with and without IPA profile (which can happen for comdats where the profile went missing or with LTO if some units were not trained). Instead of checking whether the sum of calls with known profile is nonzero, I keep track of whether there are other calls and if so, also try the local heuristics that are used without profile feedback.

The patch improves SPECint with -Ofast -fprofile-use by approx 1% by speeding up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%). We still get better x264 runtime without profile (86.4s for x264 and 93.8s for exchange).
The main problem I see is that ipa-cp has the global limit for growth of 10% but does not consider the opportunities in priority order. Consequently if the limit is hit, some clone opportunities are randomly dropped in favour of others. I dumped unit size changes with a -flto -Ofast build of SPEC2017. Without patch I get:

       orig      new      growth
     588677   605385  102.838229
       4378     6037  137.894016
     484650   494851  102.104818
       4111     4111  100.000000
      99953   103519  103.567677
     106181   114889  108.201091
      21389    21597  100.972462
      24925    26746  107.305918
      15308    23974  156.610922
      27354    27906  102.017986
        494      494  100.000000
       4631     4631  100.000000
     863216   872729  101.102042
     126604   126604  100.000000
     605138   627156  103.638509
       4112     4112  100.000000
     222006   231293  104.183220
       2952     3384  114.634146
      37584    39807  105.914751
       4111     4111  100.000000
      13226    13226  100.000000
       4111     4111  100.000000
     326215   337396  103.427494
      25240    25433  100.764659
      64644    65972  102.054328
     127223   132300  103.990631
        494      494  100.000000

Small units can grow up to 16000 instructions and other units are large. So there is only one 156% growth hitting limits, which is exchange, which has recursive cloning that is handled specially.
With profile feedback ipa-cp basically shuts itself off:

     333815   333891  100.022767
       2559     2974  116.217272
     217576   217581  100.002298
       2749     2749  100.000000
      64652    64716  100.098992
      68416    69707  101.886986
      13171    13171  100.000000
      11849    11849  100.000000
      10519    16180  153.816903
      15843    15843  100.000000
        231      231  100.000000
       3624     3624  100.000000
     573385   573386  100.000174
      97623    97623  100.000000
     295673   295676  100.001015
       2750     2750  100.000000
     130723   130726  100.002295
       2334     2334  100.000000
      19313    19313  100.000000
       2749     2749  100.000000
     517331   517331  100.000000
       6707     6707  100.000000
       2749     2749  100.000000
     193638   193638  100.000000
      16425    16425  100.000000
      47154    47154  100.000000
      96422    96422  100.000000
        231      231  100.000000

So we essentially clone only exchange and mcf (116%). With patch and no FDO I get:

     588677   605385  102.838229
       4378     6037  137.894016
     484519   494698  102.100846
       4111     4111  100.000000
      99953   103519  103.567677
     106181   114889  108.201091
      21389    22632  105.811398
      24854    26620  107.105496
      15308    23974  156.610922
      27354    28039  102.504204
        494      494  100.000000
       4631     4631  100.000000
       4631     4631  100.000000
     126604   126630  100.020536
       4112     4112  100.000000
     222006   231293  104.183220
       2952     3384  114.634146
      37584    39807  105.914751
    2760715  2835539  102.710312
       4111     4111  100.000000
      13226    13226  100.000000
       4111     4111  100.000000
     326215   337396  103.427494
      25240    25433  100.764659
      64644    65972  102.054328
     127223   132300  103.990631
        494      494  100.000000

which seems essentially the same as without the patch.
However with FDO I get:

     333815   350363  104.957237
       2559     3345  130.715123
     217469   220765  101.515618
     485599   488772  100.653420
       2749     2749  100.000000
      64652    74265  114.868836
      68416    87484  127.870674
      13171    20656  156.829398
      11792    11990  101.679104
      10519    17028  161.878506
      15843    16119  101.742094
        231      231  100.000000
     573336   573336  100.000000
      97623    97623  100.000000
     295497   296208  100.240612
       2750     2750  100.000000
     130723   133341  102.002708
       2334     2334  100.000000
      19313    19368  100.284782
       2749     2749  100.000000
       6707     6755  100.715670
       2749     2749  100.000000
     193638   194712  100.554643
      16425    17377  105.796043
      47154    47154  100.000000
      96422    96422  100.000000
        231      231  100.000000

So here we get 114% and 127% growth in x264 (two different binaries), 56% growth in Deepsjeng and 61% growth in Exchange, which are all above the 10% cutoff.

Bootstrapped/regtested x86_64-linux.

gcc/ChangeLog:
* ipa-cp.cc (base_count): Remove.
(struct caller_statistics): Rename n_hot_calls to n_interesting_calls; add called_without_ipa_profile.
(init_caller_stats): Update.
(cs_interesting_for_ipcp_p): New function.
(gather_caller_stats): Collect n_interesting_calls and called_without_profile.
(ipcp_cloning_candidate_p): Use n_interesting_calls rather than hot.
(good_cloning_opportunity_p): Rewrite heuristics when IPA profile is present.
(estimate_local_effects): Update.
(value_topo_info::propagate_effects): Update.
(compare_edge_profile_counts): Remove.
(ipcp_propagate_stage): Do not collect base_count.
(get_info_about_necessary_edges): Record whether function is called without profile.
(decide_about_value): Update.
(ipa_cp_cc_finalize): Do not initialize base_count.
* profile-count.cc (profile_count::operator*): New.
(profile_count::operator*=): New.
* profile-count.h (profile_count::operator*): Declare.
(profile_count::operator*=): Declare.
* params.opt: Remove ipa-cp-profile-count-base.
* doc/invoke.texi: Likewise.
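The two good_cloning_opportunity_p dump lines quoted in the message above are numerically consistent with an evaluation equal to the overall time saved divided by the size cost (1 / 8 = 0.125 and 129001 / 20 = 6450.05). The sketch below encodes only that inferred relationship. It is an assumption read off the dump and not the actual ipa-cp.cc code, and the function names are hypothetical.

```cpp
#include <cassert>

// Assumption inferred from the dump: evaluation = overall time saved / size.
double evaluation(double overall_time_saved, double size_cost) {
  return overall_time_saved / size_cost;
}

// Hypothetical decision step: clone only when the evaluation reaches the
// threshold printed in the dump (500 in both quoted lines).
bool passes_threshold(double eval, double threshold) {
  return eval >= threshold;
}
```

With the dump's numbers, cloning test1 alone scores far below the threshold of 500, while cloning the chain test1->test2 scores well above it. The real heuristic also performs the separate "time saved is not hot" check shown in the dump, which this sketch leaves out.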
The test fails on pru-unknown-elf with: cc1plus: warning: '-fstack-protector' not supported for this target Even though the compiled functions have the feature disabled using an attribute, the command line option is still not supported by some targets. Tested x86_64-pc-linux-gnu and ensured that g++.sum is the same with and without this patch. gcc/testsuite/ChangeLog: * g++.dg/no-stack-protector-attr-3.C: Require effective target fstack_protector. Signed-off-by: Dimitar Dimitrov <[email protected]>
* gcc.pot: Regenerate.
…an unbounded array This patch detects constants ZType, RType, CType being passed to unbounded arrays and generates an error message highlighting the formal and actual parameters in error. gcc/m2/ChangeLog: PR modula2/119914 * gm2-compiler/M2Check.mod (checkConstMeta): Add check for Ztype, Rtype and Ctype and unbounded arrays. (IsZRCType): New procedure function. (isZRC): Add comment. * gm2-compiler/M2Quads.mod: * gm2-compiler/M2Range.mod (gdbinit): New procedure. (BreakWhenRangeCreated): Ditto. (CheckBreak): Ditto. (InitRange): Call CheckBreak. (Init): Add gdbhook and initialize interactive watch point. * gm2-compiler/SymbolTable.def (GetNthParamAnyClosest): New procedure function. * gm2-compiler/SymbolTable.mod (BreakSym): Remove constant. (BreakSym): Add Variable. (stop): Remove. (gdbhook): New procedure. (BreakWhenSymCreated): Ditto. (CheckBreak): Ditto. (NewSym): Call CheckBreak. (Init): Add gdbhook and initialize interactive watch point. (MakeProcedure): Replace guarded call to stop with CheckBreak. (GetNthParamChoice): New procedure function. (GetNthParamOrdered): Ditto. (GetNthParamAnyClosest): Ditto. (GetOuterModuleScope): Ditto. gcc/testsuite/ChangeLog: PR modula2/119914 * gm2/pim/fail/constintarraybyte.mod: New test. Signed-off-by: Gaius Mulley <[email protected]>
…VF is 4/2. Since the upper bits are already cleared by the comparison instructions. gcc/ChangeLog: PR target/103750 * config/i386/sse.md (*<avx512>_cmp<mode>3_and15): New define_insn. (*<avx512>_ucmp<mode>3_and15): Ditto. (*<avx512>_cmp<mode>3_and3): Ditto. (*avx512vl_ucmpv2di3_and3): Ditto. (*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>): Change operands[3] predicate to <cmp_imm_predicate>. (*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2): Ditto. (*<avx512>_cmp<mode>3): Add GET_MODE_NUNITS (<MODE>mode) >= 8 to the condition. (*<avx512>_ucmp<mode>3): Ditto. (V48_AVX512VL_4): New mode iterator. (VI48_AVX512VL_4): Ditto. (V8_AVX512VL_2): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512vl-pr103750-1.c: New test. * gcc.target/i386/avx512f-pr96891-3.c: Adjust testcase. * gcc.target/i386/avx512f-vpcmpgtuq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpeqq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpequq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpgeq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpgeuq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpgtq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpgtuq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpleq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpleuq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpltq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpltuq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpneqq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpnequq-1.c: Ditto.
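The reasoning in the message above (the upper mask bits are already clear after the comparison, so a following AND is redundant) can be modelled with a small sketch. It assumes a 4-lane comparison that sets one mask bit per lane. `compare_eq_mask4` and `and15` are hypothetical helpers and not real AVX-512 intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Models a 4-lane compare that produces a bit mask: bit i is set when
// a[i] == b[i]. Bits 4..7 are never set because there are only 4 lanes.
std::uint8_t compare_eq_mask4(const std::array<int, 4> &a,
                              const std::array<int, 4> &b) {
  std::uint8_t mask = 0;
  for (int i = 0; i < 4; ++i)
    if (a[i] == b[i])
      mask |= static_cast<std::uint8_t>(1u << i);
  return mask;
}

// The explicit "& 15" that follows the compare.
std::uint8_t and15(std::uint8_t mask) {
  return mask & 15;
}
```

Because the model never sets a bit above bit 3, `and15(m) == m` holds for every mask it can produce, so the AND can be dropped without changing the result. The same argument applies with `& 3` for a 2-lane comparison.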
… [PR119711] As noted by Richi on a large testcase, there are unnecessary paddings in some heavily used dwarf2out.{h,cc} structures on 64-bit hosts.

    struct dw_val_node {
      enum dw_val_class val_class;              /*     0     4 */
      /* XXX 4 bytes hole, try to pack */
      struct addr_table_entry * val_entry;      /*     8     8 */
      union dw_val_struct_union v;              /*    16    16 */

      /* size: 32, cachelines: 1, members: 3 */
      /* sum members: 28, holes: 1, sum holes: 4 */
      /* last cacheline: 32 bytes */
    };
    struct dw_loc_descr_node {
      dw_loc_descr_ref dw_loc_next;             /*     0     8 */
      enum dwarf_location_atom dw_loc_opc:8;    /*     8: 0  4 */
      unsigned int dtprel:1;                    /*     8: 8  4 */
      unsigned int frame_offset_rel:1;          /*     8: 9  4 */
      /* XXX 22 bits hole, try to pack */
      int dw_loc_addr;                          /*    12     4 */
      struct dw_val_node dw_loc_oprnd1;         /*    16    32 */
      struct dw_val_node dw_loc_oprnd2;         /*    48    32 */

      /* size: 80, cachelines: 2, members: 7 */
      /* sum members: 76 */
      /* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 22 bits */
      /* last cacheline: 16 bytes */
    };
    struct dw_attr_struct {
      enum dwarf_attribute dw_attr;             /*     0     4 */
      /* XXX 4 bytes hole, try to pack */
      struct dw_val_node dw_attr_val;           /*     8    32 */

      /* size: 40, cachelines: 1, members: 2 */
      /* sum members: 36, holes: 1, sum holes: 4 */
      /* last cacheline: 40 bytes */
    };

The following patch is a (not very clean, admittedly) attempt to decrease the size of dw_loc_descr_node from 80 bytes to 72 and (more importantly) dw_attr_struct from 40 bytes to 32 by moving the dw_attr member from dw_attr_struct into dw_attr_val's padding and similarly moving the dw_loc_opc/dtprel/frame_offset_rel members into dw_loc_oprnd1 padding and dw_loc_addr into dw_loc_oprnd2 padding. All we need to ensure is that nothing tries to copy whole dw_val_node structs unless it is copied as part of a whole dw_loc_descr_node or dw_attr_struct copy.
To verify that wasn't the case, I've temporarily added a deleted copy ctor to dw_val_node and then looked at all the errors/warnings caused by that, and those were just from memcpy/memmove or structure assignments of whole dw_loc_descr_node/dw_attr_struct. 2025-04-24 Jakub Jelinek <[email protected]> PR debug/119711 * dwarf2out.h (struct dw_val_node): Add u member. (struct dw_loc_descr_node): Remove dw_loc_opc, dtprel, frame_offset_rel and dw_loc_addr members. (dw_loc_opc, dw_loc_dtprel, dw_loc_frame_offset_rel, dw_loc_addr): Define. (struct dw_attr_struct): Remove dw_attr member. (dw_attr): Define. * dwarf2out.cc (loc_descr_equal_p_1): Use dw_loc_dtprel instead of dtprel. (output_loc_operands, new_addr_loc_descr, loc_checksum, loc_checksum_ordered): Likewise. (resolve_args_picking_1): Use dw_loc_frame_offset_rel instead of frame_offset_rel. (loc_list_from_tree_1): Likewise. (resolve_addr_in_expr): Use dw_loc_dtprel instead of dtprel. (copy_deref_exprloc): Copy val_class, val_entry and v members instead of whole dw_loc_oprnd1 and dw_loc_oprnd2. (optimize_string_length): Copy val_class, val_entry and v members instead of whole dw_attr_val. (hash_loc_operands): Use dw_loc_dtprel instead of dtprel. (compare_loc_operands, compare_locs): Likewise.
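The padding trick described in this commit message can be reproduced with simplified stand-in types. This is a minimal sketch: `ValNode`, `AttrBefore`, `ValNodeWithSlot` and `AttrAfter` are hypothetical structs that only imitate the shape of dw_val_node and dw_attr_struct, and the sizes in the comments assume a typical 64-bit ABI with 8-byte, 8-aligned pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for dw_val_node: a 4-byte tag followed by a pointer leaves a
// 4-byte hole on typical 64-bit hosts (size 32 there).
struct ValNode {
  std::uint32_t val_class;
  void *val_entry;
  std::uint64_t v[2];
};

// Layout before: the attribute code sits in front of the node and costs
// another padded slot (size 40 on typical 64-bit hosts).
struct AttrBefore {
  std::uint32_t attr;
  ValNode val;
};

// Layout after: the attribute code lives in what used to be the hole.
struct ValNodeWithSlot {
  std::uint32_t val_class;
  std::uint32_t attr;  // occupies the former padding
  void *val_entry;
  std::uint64_t v[2];
};

// Size 32 on typical 64-bit hosts.
struct AttrAfter {
  ValNodeWithSlot val;
};

// Bytes saved per attribute by the repacking on the current target.
constexpr std::size_t bytes_saved() {
  return sizeof(AttrBefore) - sizeof(AttrAfter);
}
```

The cost of this layout is the one the message points out: a field stored inside the node's padding is overwritten whenever the node is copied as a whole, so code has to copy the individual members instead.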
…arts with a directive This bugfix is for FormatStrings to ensure that in the case of %x, %u the procedure function PerformFormatString uses Copy rather than Slice to avoid the case on an upper bound of zero in Slice. Oddly the %d case had the correct code. gcc/m2/ChangeLog: PR modula2/119915 * gm2-libs/FormatStrings.mod (PerformFormatString): Handle the %u and %x format specifiers in a similar way to the %d specifier. Avoid using Slice and use Copy instead. gcc/testsuite/ChangeLog: PR modula2/119915 * gm2/pimlib/run/pass/format2.mod: New test. Signed-off-by: Gaius Mulley <[email protected]>
…der-for-locality The handling of an explicit -flto-partition= and -fipa-reorder-for-locality should be simpler. No need to have a new default option. We can use opts_set to check if -flto-partition is explicitly set and use that information in the error handling. Remove -flto-partition=default and update accordingly. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <[email protected]> gcc/ * common.opt (LTO_PARTITION_DEFAULT): Delete. (flto-partition=): Change default back to balanced. * flag-types.h (lto_partition_model): Remove LTO_PARTITION_DEFAULT. * opts.cc (validate_ipa_reorder_locality_lto_partition): Check opts_set->x_flag_lto_partition instead of LTO_PARTITION_DEFAULT. (finish_options): Remove handling of LTO_PARTITION_DEFAULT. gcc/testsuite/ * gcc.dg/completion-2.c: Remove check for default.
Add checks for nowait/depend and for checks that the returned CUDA, CUDA_DRIVER and HIP interop objects actually work. While the CUDA/CUDA_DRIVER ones are only for Nvidia GPUs, HIP works on both AMD and Nvidia GPUs; on Nvidia GPUs, it is a very thin wrapper around CUDA. For Fortran, only a HIP test has been added - using hipfort. While libgomp.c-c++-common/interop-2.c always works - even without GPU - and checks for depend / nowait, all others require that runtime libraries are found at link (and execution) time: For Nvidia GPUs, libcuda + libcudart or libcublas, For AMD GPUs, libamdhip64 or libhipblas. The header files and hipfort modules do not need to be present as a fallback has been implemented, but if they are, they get used. Due to the combinations, the basic 1x C/C++, 4x C and 1x Fortran tests yield 1x C/C++, 14x C and 4 Fortran run-test files. libgomp/ChangeLog: * testsuite/lib/libgomp.exp (check_effective_target_openacc_cublas, check_effective_target_openacc_cudart): Update description as the check requires more. (check_effective_target_openacc_libcuda, check_effective_target_openacc_libcublas, check_effective_target_openacc_libcudart, check_effective_target_gomp_hip_header_amd, check_effective_target_gomp_hip_header_nvidia, check_effective_target_gomp_hipfort_module, check_effective_target_gomp_libamdhip64, check_effective_target_gomp_libhipblas): New. * testsuite/libgomp.c-c++-common/interop-2.c: New test. * testsuite/libgomp.c/interop-cublas-full.c: New test. * testsuite/libgomp.c/interop-cublas-libonly.c: New test. * testsuite/libgomp.c/interop-cuda-full.c: New test. * testsuite/libgomp.c/interop-cuda-libonly.c: New test. * testsuite/libgomp.c/interop-hip-amd-full.c: New test. * testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: New test. * testsuite/libgomp.c/interop-hip-nvidia-full.c: New test. * testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: New test. * testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: New test. 
* testsuite/libgomp.c/interop-hip.h: New test. * testsuite/libgomp.c/interop-hipblas-amd-full.c: New test. * testsuite/libgomp.c/interop-hipblas-amd-no-hip-header.c: New test. * testsuite/libgomp.c/interop-hipblas-nvidia-full.c: New test. * testsuite/libgomp.c/interop-hipblas-nvidia-no-headers.c: New test. * testsuite/libgomp.c/interop-hipblas-nvidia-no-hip-header.c: New test. * testsuite/libgomp.c/interop-hipblas.h: New test. * testsuite/libgomp.fortran/interop-hip-amd-full.F90: New test. * testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: New test. * testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: New test. * testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: New test. * testsuite/libgomp.fortran/interop-hip.h: New test.
…or-locality This ensures -fno-ipa-reorder-for-locality doesn't complain with an explicit -flto-partition=. Signed-off-by: Kyrylo Tkachov <[email protected]> * opts.cc (validate_ipa_reorder_locality_lto_partition): Check opts instead of opts_set for x_flag_ipa_reorder_for_locality. (finish_options): Update call site.
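Taken together, the two -flto-partition commits rely on keeping the option values separate from a record of which options were given explicitly. A minimal sketch of that idea follows; the types, field names and diagnostic wording are hypothetical and only mirror the opts / opts_set split mentioned in the ChangeLog entries.

```cpp
#include <cassert>
#include <string>

enum class lto_partition { none, one, balanced, max };  // assumed subset

// Option values, including defaults.
struct options {
  lto_partition partition = lto_partition::balanced;
  bool reorder_for_locality = false;
};

// Parallel record: true means the user wrote the option explicitly.
struct options_set {
  bool partition = false;
  bool reorder_for_locality = false;
};

// Returns an empty string when the combination is fine, otherwise a
// diagnostic (the wording here is made up for the sketch).
std::string validate(const options &opts, const options_set &set) {
  // Use the *value* of the reorder flag, so an explicit negative form does
  // not trigger, and the *explicitness* of the partition option, so the
  // default partitioning never conflicts.
  if (opts.reorder_for_locality && set.partition)
    return "reorder-for-locality conflicts with an explicit partition option";
  return {};
}
```

With this split there is no need for a dedicated "default" enumerator: the default value stays an ordinary partition model and the explicitness lives in the second record.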
Aaron mentioned in the PR that late in C23 N3124 was adopted and $@` are now part of basic character set. The paper has been implemented in GCC from what I can see, but we should allow for GNU23/2Y $@` in raw string delimiters as well, like they are allowed for C++26, because the delimiters can contain anything from basic character set but space, ()\, tab, form-feed, newline and backspace. 2025-04-24 Jakub Jelinek <[email protected]> PR c++/110343 * lex.cc (lex_raw_string): For C allow $@` in raw string delimiters if CPP_OPTION (pfile, low_ucns) i.e. for C23 and later. * gcc.dg/raw-string-1.c: New test.
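The delimiter rule described above can be sketched as a character predicate. This is an illustration only, not the libcpp code: `raw_delim_char_ok` and its `extended_basic_set` parameter are hypothetical names, the flag stands for "language mode in which $, @ and ` belong to the basic character set", and the letter and digit range checks assume an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstring>

// Returns true if c may appear in a raw string delimiter.
bool raw_delim_char_ok(char c, bool extended_basic_set) {
  // Letters and digits (ASCII ranges assumed).
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
      (c >= '0' && c <= '9'))
    return true;
  // Basic-set punctuation except '(' ')' and backslash.
  if (c != '\0' && std::strchr("_{}[]#<>%:;.?*+-/^&|~!=,\"'", c) != nullptr)
    return true;
  // The three characters added to the basic set by the paper.
  if (extended_basic_set && (c == '$' || c == '@' || c == '`'))
    return true;
  // Space, parentheses, backslash and control characters are rejected.
  return false;
}
```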
PR119610 is about incorrect CFI output for a stack probe when that probe is not the initial allocation. The main aarch64 stack probe function, aarch64_allocate_and_probe_stack_space, implicitly assumed that the incoming stack pointer pointed to the top of the frame, and thus held the CFA. aarch64_save_callee_saves and aarch64_restore_callee_saves use a parameter called bytes_below_sp to track how far the stack pointer is above the base of the static frame. This patch does the same thing for aarch64_allocate_and_probe_stack_space. Also, I noticed that the SVE path was attaching the first CFA note to the wrong instruction: it was attaching the note to the calculation of the stack size, rather than to the r11<-sp copy. gcc/ PR target/119610 * config/aarch64/aarch64.cc (aarch64_allocate_and_probe_stack_space): Add a bytes_below_sp parameter and use it to calculate the CFA offsets. Attach the first SVE CFA note to the move into the associated temporary register. (aarch64_allocate_and_probe_stack_space): Update calls accordingly. Start out with bytes_per_sp set to the frame size and decrement it after each allocation. gcc/testsuite/ PR target/119610 * g++.dg/torture/pr119610.C: New test. * g++.target/aarch64/sve/pr119610-sve.C: Likewise.
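The bookkeeping idea can be illustrated with a toy model that has nothing target-specific in it. It assumes the CFA equals the stack pointer on entry and that the frame is allocated in several steps; `frame_state`, `allocate` and `cfa_offsets` are hypothetical names, not the aarch64 backend functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct frame_state {
  std::int64_t frame_size;      // total static frame size
  std::int64_t bytes_below_sp;  // how far SP currently is above the frame base
};

// Allocate size bytes and return the offset such that CFA == SP + offset
// after the allocation.
std::int64_t allocate(frame_state &f, std::int64_t size) {
  assert(size >= 0 && size <= f.bytes_below_sp);
  f.bytes_below_sp -= size;
  return f.frame_size - f.bytes_below_sp;
}

// Start with bytes_below_sp equal to the frame size and decrement it after
// each allocation, collecting the CFA offset to record at each step.
std::vector<std::int64_t> cfa_offsets(std::int64_t frame_size,
                                      const std::vector<std::int64_t> &allocs) {
  frame_state f{frame_size, frame_size};
  std::vector<std::int64_t> out;
  for (std::int64_t s : allocs)
    out.push_back(allocate(f, s));
  return out;
}
```

For a 96-byte frame allocated as 16 and then 80 bytes, the second note in this model carries offset 96, not 80; treating the incoming stack pointer as the CFA would give the smaller, wrong value.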
As a followup to the previous patch for 116954, there's no reason to do anything in remove_contract_attributes if contracts aren't enabled. PR c++/116954 gcc/cp/ChangeLog: * contracts.cc (remove_contract_attributes): Return early if not enabled.
The existing test is currently testing std::vector. Adapt it for std::deque. libstdc++-v3/ChangeLog: * testsuite/util/replacement_memory_operators.h: Adapt for -fno-exceptions context. * testsuite/23_containers/deque/capacity/shrink_to_fit.cc: Adapt test to check std::deque shrink_to_fit method. Reviewed-by: Jonathan Wakely <[email protected]> Reviewed-by: Tomasz Kaminski <[email protected]>
This is all about using the AMD's HIP header files with __HIP_PLATFORM_NVIDIA__ defined, i.e. HIP with Nvidia/CUDA; in that case, HIP is a thin layer on top of CUDA. First, the check_effective_target_gomp_hip_header_nvidia check failed; to fix it, -Wno-deprecated-declarations was added - and likewise to the two affected testcases that actually used the HIP headers on Nvidia. Doing so, the HIP tested was successful but the HIP-BLAS one showed two issues: * One seems to be related to include search paths as the HIP header uses #include "library_types.h" to include that CUDA header. Seemingly, it tried to included (again) the HIP header hip/library_types.h, not the CUDA one. I guess, some tweaking of -isystem vs. -I could have prevented this, but the simpler workaround was to just explicitly include the CUDA one before the HIP header files. * Once done, everything compiled but linking failed as the association between three HIP-BLAS functions and their CUDA-BLAS ones did not work. Solution: Just add three #define for mapping them. libgomp/ChangeLog: * testsuite/lib/libgomp.exp (check_effective_target_gomp_hip_header_nvidia): Compile with "-Wno-deprecated-declarations". * testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise. * testsuite/libgomp.c/interop-hipblas-nvidia-full.c: Likewise. * testsuite/libgomp.c/interop-hipblas.h: Add workarounds when using the HIP headers with __HIP_PLATFORM_NVIDIA__.
The problem here is a division by zero, since an adjusted 0 compares greater than a precise 0. Fixed by using the right test. gcc/ChangeLog: PR ipa/119924 * ipa-cp.cc (update_counts_for_self_gen_clones): Use nonzero_p. (update_profiling_info): Likewise. (update_specialized_profile): Likewise.
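A toy model may make the failure mode easier to see. It is not GCC's profile count class: `count`, `quality`, `greater_than_precise_zero` and `scale` are hypothetical, and the ordering is only an assumption that reproduces the quirk described in the commit message.

```cpp
#include <cassert>
#include <cstdint>

enum class quality { adjusted, precise };

struct count {
  std::uint64_t value;
  quality q;
  // The test that actually guards a division: is the value nonzero?
  bool nonzero_p() const { return value != 0; }
};

// Toy ordering in which a zero of lower quality is not equal to the precise
// zero and therefore compares greater than it.
bool greater_than_precise_zero(const count &c) {
  return c.value != 0 || c.q != quality::precise;
}

// Scale x by num/den. Guarding with nonzero_p avoids dividing by an adjusted
// zero; returning 0 in that case is an assumption made for this sketch.
std::uint64_t scale(std::uint64_t x, const count &num, const count &den) {
  if (!den.nonzero_p())
    return 0;
  return x * num.value / den.value;
}
```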
…ers 0 or -1 gcc/ChangeLog: PR target/119919 * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Account correctly cond_expr and min/max when one of operands is 0 or -1. gcc/testsuite/ChangeLog: * gcc.target/i386/pr119919.c: New test.
This patch changes the exception processing logic for the calculation of reference modifications and table subscripts to be more in accordance with ISO specifications. It also adjusts the processing of RETURN-CODE when calling routines that have no CALL ... RETURNING phrase. gcc/cobol * genapi.cc: (initialize_variable_internal): Change TRACE1 formatting. (create_and_call): Repair RETURN-CODE processing. (mh_source_is_group): Repair run-time IF type comparison. (psa_FldLiteralA): Change TRACE1 formatting. (parser_symbol_add): Eliminate unnecessary code. * genutil.cc: Eliminate SET_EXCEPTION_CODE macro. (get_data_offset_dest): Repair set_exception_code logic. (get_data_offset_source): Likewise. (get_binary_value): Likewise. (refer_refmod_length): Likewise. (refer_fill_depends): Likewise. (refer_offset_dest): Likewise. (refer_size_dest): Likewise. (refer_offset_source): Likewise. gcc/testsuite * cobol.dg/group1/declarative_1.cob: Adjust for repaired exception logic.
This patch provides autoconf tests for each field used in wraptime.cc referencing struct tm and struct timeval. libgm2/ChangeLog: PR modula2/115276 * config.h.in: Regenerate. * configure: Regenerate. * configure.ac (AC_STRUCT_TIMEZONE): Add. (AC_CHECK_MEMBER): Test for struct tm.tm_year. (AC_CHECK_MEMBER): Test for struct tm.tm_mon. (AC_CHECK_MEMBER): Test for struct tm.tm_mday. (AC_CHECK_MEMBER): Test for struct tm.tm_hour. (AC_CHECK_MEMBER): Test for struct tm.tm_min. (AC_CHECK_MEMBER): Test for struct tm.tm_sec. (AC_CHECK_MEMBER): Test for struct tm.tm_year. (AC_CHECK_MEMBER): Test for struct tm.tm_yday. (AC_CHECK_MEMBER): Test for struct tm.tm_wday. (AC_CHECK_MEMBER): Test for struct tm.tm_isdst. (AC_CHECK_MEMBER): Test for struct timeval.tv_sec. (AC_CHECK_MEMBER): Test for struct timeval.tv_sec. (AC_CHECK_MEMBER): Test for struct timeval.tv_usec. * libm2iso/wraptime.cc (InitTimeval): Guard against lack of struct timeval and malloc. (InitTimezone): Guard against lack of struct tm.tm_zone and malloc. (KillTimezone): Ditto. (InitTimeval): Guard against lack of struct timeval and malloc. (KillTimeval): Guard against lack of malloc. (settimeofday): Guard against lack of struct tm.tm_zone. (GetFractions): Guard against lack of struct timeval. (localtime_r): Ditto. (GetYear): Guard against lack of struct tm. (GetMonth): Ditto. (GetDay): Ditto. (GetHour): Ditto. (GetMinute): Ditto. (GetSecond): Ditto. (GetSummerTime): Ditto. (GetDST): Guard against lack of struct timezone. (SetTimezone): Ditto. (SetTimeval): Guard against lack of struct tm. Signed-off-by: Gaius Mulley <[email protected]>
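A guarded accessor of the kind this ChangeLog describes might look like the sketch below. The macro name `HAVE_STRUCT_TM_TM_YEAR` and the -1 fallback are assumptions made for illustration; the names actually defined by the configure checks and the fallbacks in wraptime.cc are not shown in the commit message.

```cpp
#include <cassert>
#include <ctime>

// Return the tm_year field when configure found it, otherwise a sentinel.
// HAVE_STRUCT_TM_TM_YEAR is a hypothetical configure-generated macro.
int get_year(const std::tm *t) {
#if defined(HAVE_STRUCT_TM_TM_YEAR)
  return t != nullptr ? t->tm_year : -1;
#else
  (void) t;   // field not known to exist on this host
  return -1;  // assumed fallback value
#endif
}
```

Each field gets its own guard, so a host missing only one member still gets the remaining accessors.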
This was approved in Wrocław as LWG 3899. This avoids creating a new coroutine frame to co_yield the elements of an lvalue generator. libstdc++-v3/ChangeLog: * include/std/generator (generator::yield_value): Add overload taking lvalue element_of view, as per LWG 3899. * testsuite/24_iterators/range_generators/lwg3899.cc: New test. Reviewed-by: Tomasz Kamiński <[email protected]> Reviewed-by: Arsen Arsenović <[email protected]>
protobuf (and therefore firefox too) currently doesn't build on s390*-linux. The problem is that it uses [[clang::musttail]] attribute heavily, and in llvm (IMHO llvm bug) [[clang::musttail]] calls with 5+ arguments on s390*-linux are silently accepted and result in a normal non-tail call. In GCC we just reject those because the target hook refuses to tail call it (IMHO the right behavior). Now, the reason why that happens is as s390_function_ok_for_sibcall attempts to explain, the 5th argument (assuming normal <= wordsize integer or pointer arguments, nothing that needs 2+ registers) is passed in %r6 which is not call clobbered, so we can't do tail call when we'd have to change content of that register and then caller would assume %r6 content didn't change and use it again. In the protobuf case though, the 5th argument is always passed through from the caller to the musttail callee unmodified, so one can actually emit just jg tail_called_function or perhaps tweak some registers but keep %r6 untouched, and in that case I think it is just fine to tail call it (at least unless the stack slots used for 6+ argument can't be modified by the callee in the ABI and nothing checks for that). So, the following patch checks for this special case, where the argument which uses %r6 is passed in a single register and it is passed default definition of SSA_NAME of a PARM_DECL with the same DECL_INCOMING_RTL. It won't really work at -O0 but should work for -O1 and above, at least when one doesn't really try to modify the parameter conditionally and hope it will be optimized away in the end. 2025-04-24 Jakub Jelinek <[email protected]> Stefan Schulze Frielinghaus <[email protected]> PR target/119873 * config/s390/s390.cc (s390_call_saved_register_used): Don't return true if default definition of PARM_DECL SSA_NAME of the same register is passed in call saved register. (s390_function_ok_for_sibcall): Adjust comment. * gcc.target/s390/pr119873-1.c: New test. 
* gcc.target/s390/pr119873-2.c: New test. * gcc.target/s390/pr119873-3.c: New test. * gcc.target/s390/pr119873-4.c: New test.
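The call shape discussed in this commit, with a fifth argument handed through unchanged to a musttail callee, can be sketched as follows. The function names are hypothetical, and the `MUSTTAIL` macro falls back to nothing on compilers that do not report support for the attribute, so the sketch says nothing about what any particular target accepts.

```cpp
#include <cassert>

// Use [[clang::musttail]] only where the compiler reports support for it.
#if defined(__has_cpp_attribute)
# if __has_cpp_attribute(clang::musttail)
#  define MUSTTAIL [[clang::musttail]]
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL
#endif

int finish(int a, int b, int c, int d, int ctx) {
  return a + b + c + d + ctx;
}

// The fifth parameter, ctx, is always passed through unmodified. On s390
// that argument would live in %r6, which is the case the patch accepts.
int step(int a, int b, int c, int d, int ctx) {
  if (a == 0) {
    MUSTTAIL return finish(a, b, c, d, ctx);
  }
  MUSTTAIL return step(a - 1, b + 1, c, d, ctx);
}
```

Calling `step(3, 0, 1, 2, 10)` walks down to `finish(0, 3, 1, 2, 10)`; only the first two arguments ever change between calls.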
This should improve our situation with respect to downstreaming. Any merge should be done with the github default merge method, rather than with a rebase-merge.