Regular for-next build test #1157

Open: wants to merge 10,000 commits into base: build

This pull request is big! We’re only showing the most recent 250 commits.

Commits on Dec 3, 2024

  1. iommufd: Fix out_fput in iommufd_fault_alloc()

    As fput() calls the file->f_op->release op, where fault obj and ictx are
    getting released, there is no need to release these two after fput() one
    more time, which would result in imbalanced refcounts:
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 48 PID: 2369 at lib/refcount.c:31 refcount_warn_saturate+0x60/0x230
      Call trace:
       refcount_warn_saturate+0x60/0x230 (P)
       refcount_warn_saturate+0x60/0x230 (L)
       iommufd_fault_fops_release+0x9c/0xe0 [iommufd]
      ...
      VFS: Close: file count is 0 (f_op=iommufd_fops [iommufd])
      WARNING: CPU: 48 PID: 2369 at fs/open.c:1507 filp_flush+0x3c/0xf0
      Call trace:
       filp_flush+0x3c/0xf0 (P)
       filp_flush+0x3c/0xf0 (L)
       __arm64_sys_close+0x34/0x98
      ...
      imbalanced put on file reference count
      WARNING: CPU: 48 PID: 2369 at fs/file.c:74 __file_ref_put+0x100/0x138
      Call trace:
       __file_ref_put+0x100/0x138 (P)
       __file_ref_put+0x100/0x138 (L)
       __fput_sync+0x4c/0xd0
    
    Drop those two lines to fix the warnings above.
    
    Cc: [email protected]
    Fixes: 07838f7 ("iommufd: Add iommufd fault object")
    Link: https://patch.msgid.link/r/b5651beb3a6b1adeef26fffac24607353bf67ba1.1733212723.git.nicolinc@nvidia.com
    Signed-off-by: Nicolin Chen <[email protected]>
    Reviewed-by: Yi Liu <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    nicolinc authored and jgunthorpe committed Dec 3, 2024
    af7f478
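
The imbalance described in this commit can be illustrated in user space. This is a minimal sketch, not the iommufd code: `struct obj`, `obj_put()`, `file_release()`, and `fput_sketch()` are hypothetical stand-ins for the fault object, its reference drop, the file's release op, and fput() on the last file reference.

```c
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the fault object. */
struct obj {
    int refs;          /* current reference count */
    bool underflowed;  /* set when a put arrives with the count already 0 */
};

/* Drop one reference; flag the imbalance instead of going negative. */
static void obj_put(struct obj *o)
{
    if (o->refs == 0) {
        o->underflowed = true;
        return;
    }
    o->refs--;
}

/* Stand-in for the file's release op: it already drops the reference. */
static void file_release(struct obj *o)
{
    obj_put(o);
}

/* Stand-in for fput() on the last file reference: it invokes release. */
static void fput_sketch(struct obj *o)
{
    file_release(o);
}

/* Buggy error path: puts the object again after fput_sketch(). */
static bool error_path_buggy(void)
{
    struct obj o = { .refs = 1, .underflowed = false };

    fput_sketch(&o);
    obj_put(&o);            /* redundant second put */
    return o.underflowed;
}

/* Fixed error path: fput_sketch() alone releases the object. */
static bool error_path_fixed(void)
{
    struct obj o = { .refs = 1, .underflowed = false };

    fput_sketch(&o);
    return o.underflowed;
}
```

In this model the buggy path reports an underflow and the fixed path does not, which mirrors why dropping the two extra release lines removes the warnings.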
  2. iommufd/selftest: Cover IOMMU_FAULT_QUEUE_ALLOC in iommufd_fail_nth

    This was missing in the series introducing the fault object. Thus, add it.
    
    Link: https://patch.msgid.link/r/d61b9b7f73276cc8f1aef9602bd35c486917506e.1733212723.git.nicolinc@nvidia.com
    Signed-off-by: Nicolin Chen <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    nicolinc authored and jgunthorpe committed Dec 3, 2024
    a8c9df2
  3. scripts/nsdeps: get 'make nsdeps' working again

    Since commit cdd30eb ("module: Convert symbol namespace to string
    literal"), when MODULE_IMPORT_NS() is missing, 'make nsdeps' inserts
    pointless code:
    
        MODULE_IMPORT_NS("ns");
    
    Here, "ns" is not a namespace, but the variable in the semantic patch.
    It must not be quoted. Instead, a string literal must be passed to
    Coccinelle.
    
    Fixes: cdd30eb ("module: Convert symbol namespace to string literal")
    Signed-off-by: Masahiro Yamada <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    masahir0y authored and torvalds committed Dec 3, 2024
    62aa6f2
  4. doc: module: revert misconversions for MODULE_IMPORT_NS()

    This reverts the misconversions introduced by commit cdd30eb
    ("module: Convert symbol namespace to string literal").
    
    The affected descriptions refer to MODULE_IMPORT_NS() tags in general,
    rather than suggesting the use of the empty string ("") as the
    namespace.
    
    Fixes: cdd30eb ("module: Convert symbol namespace to string literal")
    Signed-off-by: Masahiro Yamada <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    masahir0y authored and torvalds committed Dec 3, 2024
    3727b1a
  5. module: Convert default symbol namespace to string literal

    Commit cdd30eb ("module: Convert symbol namespace to string
    literal") only converted MODULE_IMPORT_NS() and EXPORT_SYMBOL_NS(),
    leaving DEFAULT_SYMBOL_NAMESPACE as a macro expansion.
    
    This commit converts DEFAULT_SYMBOL_NAMESPACE in the same way to avoid
    annoyance for the default namespace as well.
    
    Signed-off-by: Masahiro Yamada <[email protected]>
    Reviewed-by: Uwe Kleine-König <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    masahir0y authored and torvalds committed Dec 3, 2024
    ceb8bf2
  6. irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing
    
    Merely enabling compile-testing should not enable additional functionality.
    
    Fixes: 0be58e0 ("irqchip/stm32mp-exti: Allow building as module")
    Signed-off-by: Geert Uytterhoeven <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Link: https://lore.kernel.org/all/ef5ec063b23522058f92087e072419ea233acfe9.1733243115.git.geert+renesas@glider.be
    geertu authored and KAGA-KOKO committed Dec 3, 2024
    9151299
  7. arm64: mm: Fix zone_dma_limit calculation

    Commit ba0fb44 ("dma-mapping: replace zone_dma_bits by
    zone_dma_limit") and subsequent patches changed how zone_dma_limit is
    calculated to allow a reduced ZONE_DMA even when RAM starts above 4GB.
    Commit 122c234 ("arm64: mm: keep low RAM dma zone") further fixed
    this to ensure ZONE_DMA remains below U32_MAX if RAM starts below 4GB,
    especially on platforms that do not have IORT or DT description of the
    device DMA ranges. While zone boundaries calculation was fixed by the
    latter commit, zone_dma_limit, used to determine the GFP_DMA flag in the
    core code, was not updated. This results in excessive use of GFP_DMA and
    unnecessary ZONE_DMA allocations on some platforms.
    
    Update zone_dma_limit to match the actual upper bound of ZONE_DMA.
    
    Fixes: ba0fb44 ("dma-mapping: replace zone_dma_bits by zone_dma_limit")
    Cc: <[email protected]> # 6.12.x
    Reported-by: Yutang Jiang <[email protected]>
    Tested-by: Yutang Jiang <[email protected]>
    Signed-off-by: Yang Shi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [[email protected]: some tweaking of the commit log]
    Signed-off-by: Catalin Marinas <[email protected]>
    Yang Shi authored and ctmarinas committed Dec 3, 2024
    56a7087
  8. iommu/arm-smmu-v3: Improve uAPI comment for IOMMU_HW_INFO_TYPE_ARM_SMMUV3
    
    Be specific about what fields should be accessed in the idr result and
    give other guidance to the VMM on how it should generate the
    vIDR. Discussion on the list and review of the qemu implementation
    showed that this needs to be clearer and more detailed.
    
    Link: https://patch.msgid.link/r/[email protected]
    Reviewed-by: Kevin Tian <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    jgunthorpe committed Dec 3, 2024
    2ca704f
  9. arm64: patching: avoid early page_to_phys()

    When arm64 is configured with CONFIG_DEBUG_VIRTUAL=y, a warning is
    printed from the patching code because patch_map() calls
    page_to_phys() too early in boot, e.g.
    
    | ------------[ cut here ]------------
    | WARNING: CPU: 0 PID: 0 at arch/arm64/kernel/patching.c:45 patch_map.constprop.0+0x120/0xd00
    | CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-rc1-00002-ge1a5d6c6be55 #1
    | Hardware name: linux,dummy-virt (DT)
    | pstate: 800003c5 (Nzcv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    | pc : patch_map.constprop.0+0x120/0xd00
    | lr : patch_map.constprop.0+0x120/0xd00
    | sp : ffffa9bb312a79a0
    | x29: ffffa9bb312a79a0 x28: 0000000000000001 x27: 0000000000000001
    | x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000402e8
    | x23: ffffa9bb2c94c1c8 x22: ffffa9bb2c94c000 x21: ffffa9bb222e883c
    | x20: 0000000000000002 x19: ffffc1ffc100ba40 x18: ffffa9bb2cf0f21c
    | x17: 0000000000000006 x16: 0000000000000000 x15: 0000000000000004
    | x14: 1ffff5376625b4ac x13: ffff753766a67fb8 x12: ffff753766919cd1
    | x11: 0000000000000003 x10: 1ffff5376625b4c3 x9 : 1ffff5376625b4af
    | x8 : ffff753766254f0a x7 : 0000000041b58ab3 x6 : ffff753766254f18
    | x5 : ffffa9bb312d9bc0 x4 : 0000000000000000 x3 : ffffa9bb29bd90e4
    | x2 : 0000000000000002 x1 : ffffa9bb312d9bc0 x0 : 0000000000000000
    | Call trace:
    |  patch_map.constprop.0+0x120/0xd00 (P)
    |  patch_map.constprop.0+0x120/0xd00 (L)
    |  __aarch64_insn_write+0xa8/0x120
    |  aarch64_insn_patch_text_nosync+0x4c/0xb8
    |  arch_jump_label_transform_queue+0x7c/0x100
    |  jump_label_update+0x154/0x460
    |  static_key_enable_cpuslocked+0x1d8/0x280
    |  static_key_enable+0x2c/0x48
    |  early_randomize_kstack_offset+0x104/0x168
    |  do_early_param+0xe4/0x148
    |  parse_args+0x3a4/0x838
    |  parse_early_options+0x50/0x68
    |  parse_early_param+0x58/0xe0
    |  setup_arch+0x78/0x1f0
    |  start_kernel+0xa0/0x530
    |  __primary_switched+0x8c/0xa0
    | irq event stamp: 0
    | hardirqs last  enabled at (0): [<0000000000000000>] 0x0
    | hardirqs last disabled at (0): [<0000000000000000>] 0x0
    | softirqs last  enabled at (0): [<0000000000000000>] 0x0
    | softirqs last disabled at (0): [<0000000000000000>] 0x0
    | ---[ end trace 0000000000000000 ]---
    
    The warning has been produced since commit:
    
      3e25d5a ("asm-generic: add an optional pfn_valid check to page_to_phys")
    
    ... which added a pfn_valid() check into page_to_phys(), and at this
    point in boot pfn_valid() will always return false because the vmemmap
    has not yet been initialized and there are no valid mem_sections yet.
    
    Before that commit, the arithmetic performed by page_to_phys() would
    give the expected physical address, though it is somewhat dubious to use
    vmemmap addresses before the vmemmap has been initialized.
    
    Aside from kernel image addresses, all executable code should be
    allocated from execmem (where all allocations will fall within the
    vmalloc area), and so there's no need for the fallback case when
    CONFIG_EXECMEM=n.
    
    Simplify patch_map() accordingly, directly converting kernel image
    addresses and removing the redundant fallback case.
    
    Fixes: 3e25d5a ("asm-generic: add an optional pfn_valid check to page_to_phys")
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: Arnd Bergmann <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Cc: Mike Rapoport <[email protected]>
    Cc: Thomas Huth <[email protected]>
    Cc: Will Deacon <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    Mark Rutland authored and ctmarinas committed Dec 3, 2024
    8d09e2d
  10. drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails

    Calling the MMIO_GUARD hypercall from guests which have not been
    enrolled (e.g. because they are running without pvmfw) results in
    -EINVAL being returned. In this case, MMIO_GUARD is not active
    and so we can simply proceed with the normal ioremap() routine.
    
    Don't fail ioremap() if MMIO_GUARD fails; instead WARN_ON_ONCE()
    to highlight that the pvm environment is slightly wonky.
    
    Fixes: 0f12694 ("drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercall")
    Signed-off-by: Will Deacon <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    willdeacon authored and ctmarinas committed Dec 3, 2024
    d44679f
  11. MAINTAINERS: Add CCA and pKVM CoCO guest support to the ARM64 entry

    Commits 7999edc ("virt: arm-cca-guest: TSM_REPORT support for
    realm") and a06c3fa ("drivers/virt: pkvm: Add initial support for
    running as a protected guest") added arm64 guest-side support for
    running in CCA and pKVM confidential computing environments
    respectively.
    
    Unfortunately, these changes were not accompanied by a MAINTAINERS
    entry and so aren't automatically picked up by the get_maintainer.pl
    script. Since the initial support was merged via the arm64 tree, extend
    the ARM64 entry to cover the two new directories.
    
    Cc: Marc Zyngier <[email protected]>
    Cc: Oliver Upton <[email protected]>
    Cc: Suzuki K Poulose <[email protected]>
    Signed-off-by: Will Deacon <[email protected]>
    Acked-by: Suzuki K Poulose <[email protected]>
    Acked-by: Catalin Marinas <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    willdeacon authored and ctmarinas committed Dec 3, 2024
    9223059
  12. ice: fix PHY Clock Recovery availability check

    To check whether the PHY Clock Recovery mechanism is available for a
    device, the driver must verify that the given PHY is present in the
    netlist. However, the netlist node type used for the search is wrong,
    and the search context must also be specified.
    
    Modify the search function to allow specifying the context in the
    search.
    
    Use the PHY node type instead of CLOCK CONTROLLER type, also use proper
    search context which for PHY search is PORT, as defined in E810
    Datasheet [1] ('3.3.8.2.4 Node Part Number and Node Options (0x0003)' and
    'Table 3-105. Program Topology Device NVM Admin Command').
    
    [1] https://cdrdv2.intel.com/v1/dl/getContent/613875?explicitVersion=true
    
    Fixes: 91e43ca ("ice: fix linking when CONFIG_PTP_1588_CLOCK=n")
    Reviewed-by: Aleksandr Loktionov <[email protected]>
    Signed-off-by: Arkadiusz Kubalewski <[email protected]>
    Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    kubalewski authored and anguy11 committed Dec 3, 2024
    01fd68e
  13. ice: fix PHY timestamp extraction for ETH56G

    Fix incorrect PHY timestamp extraction for ETH56G.
    It's better to use FIELD_PREP() than manual shift.
    
    Fixes: 7cab44f ("ice: Introduce ETH56G PHY model for E825C products")
    Reviewed-by: Przemek Kitszel <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Signed-off-by: Przemyslaw Korba <[email protected]>
    Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    Przekorb authored and anguy11 committed Dec 3, 2024
    3214fae
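
The preference for FIELD_PREP() over a manual shift can be sketched in user space. The macros below are approximations of the kernel helpers written for this example, and the two masks are a hypothetical layout, not the ETH56G register format. `__builtin_ctzll()` is a GCC/Clang builtin and is an assumption about the toolchain.

```c
#include <stdint.h>

/* User-space approximations of the kernel's FIELD_PREP()/FIELD_GET().
 * __builtin_ctzll() returns the position of the mask's lowest set bit,
 * so the shift is derived from the mask rather than written by hand. */
#define SKETCH_FIELD_PREP(mask, val) \
    ((((uint64_t)(val)) << __builtin_ctzll(mask)) & (uint64_t)(mask))
#define SKETCH_FIELD_GET(mask, reg) \
    ((((uint64_t)(reg)) & (uint64_t)(mask)) >> __builtin_ctzll(mask))

/* Hypothetical field layout, not taken from the ice driver. */
#define SKETCH_TS_HIGH_MASK 0xFFFFFFFF00ULL  /* bits 39..8 */
#define SKETCH_TS_LOW_MASK  0x00000000FFULL  /* bits 7..0  */

/* Combine the two fields into one raw value. */
static uint64_t pack_timestamp(uint32_t high, uint8_t low)
{
    return SKETCH_FIELD_PREP(SKETCH_TS_HIGH_MASK, high) |
           SKETCH_FIELD_PREP(SKETCH_TS_LOW_MASK, low);
}

/* Extract the upper field from a raw value. */
static uint32_t timestamp_high(uint64_t raw)
{
    return (uint32_t)SKETCH_FIELD_GET(SKETCH_TS_HIGH_MASK, raw);
}

/* Extract the lower field from a raw value. */
static uint8_t timestamp_low(uint64_t raw)
{
    return (uint8_t)SKETCH_FIELD_GET(SKETCH_TS_LOW_MASK, raw);
}
```

Because the shift amount comes from the mask itself, changing a mask cannot leave a stale hand-written shift behind, which is the kind of mismatch a manual shift invites.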
  14. ice: Fix NULL pointer dereference in switchdev

    Commit 608a5c0 ("virtchnl: support queue rate limit and quanta
    size configuration") introduced new virtchnl ops:
    - get_qos_caps
    - cfg_q_bw
    - cfg_q_quanta
    
    New ops were added to ice_virtchnl_dflt_ops, in
    commit 0153077 ("ice: Support VF queue rate limit and quanta
    size configuration"), but not to the ice_virtchnl_repr_ops. Because
    of that, if we get one of those messages in switchdev mode we end up
    with NULL pointer dereference:
    
    [ 1199.794701] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [ 1199.794804] Workqueue: ice ice_service_task [ice]
    [ 1199.794878] RIP: 0010:0x0
    [ 1199.795027] Call Trace:
    [ 1199.795033]  <TASK>
    [ 1199.795039]  ? __die+0x20/0x70
    [ 1199.795051]  ? page_fault_oops+0x140/0x520
    [ 1199.795064]  ? exc_page_fault+0x7e/0x270
    [ 1199.795074]  ? asm_exc_page_fault+0x22/0x30
    [ 1199.795086]  ice_vc_process_vf_msg+0x6e5/0xd30 [ice]
    [ 1199.795165]  __ice_clean_ctrlq+0x734/0x9d0 [ice]
    [ 1199.795207]  ice_service_task+0xccf/0x12b0 [ice]
    [ 1199.795248]  process_one_work+0x21a/0x620
    [ 1199.795260]  worker_thread+0x18d/0x330
    [ 1199.795269]  ? __pfx_worker_thread+0x10/0x10
    [ 1199.795279]  kthread+0xec/0x120
    [ 1199.795288]  ? __pfx_kthread+0x10/0x10
    [ 1199.795296]  ret_from_fork+0x2d/0x50
    [ 1199.795305]  ? __pfx_kthread+0x10/0x10
    [ 1199.795312]  ret_from_fork_asm+0x1a/0x30
    [ 1199.795323]  </TASK>
    
    Fixes: 0153077 ("ice: Support VF queue rate limit and quanta size configuration")
    Reviewed-by: Przemek Kitszel <[email protected]>
    Reviewed-by: Michal Swiatkowski <[email protected]>
    Signed-off-by: Wojciech Drewek <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Tested-by: Sujai Buvaneswaran <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    WojDrew authored and anguy11 committed Dec 3, 2024
    9ee87d2
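
The failure mode here is a function-pointer table with an unfilled slot. The sketch below is a simplified model with hypothetical names (`struct vc_ops_sketch`, `cfg_q_bw_unsupported()`); the real ice ops tables have many more members, and the commit message does not say which handlers the patch installs, so the "after" table is only one possible way to close the gap.

```c
#include <stddef.h>

/* Hypothetical ops table; only two of the many callbacks are modelled. */
struct vc_ops_sketch {
    int (*get_caps)(int vf_id);
    int (*cfg_q_bw)(int vf_id);   /* op added by a later change */
};

static int get_caps_impl(int vf_id) { return vf_id; }
static int cfg_q_bw_impl(int vf_id) { return vf_id + 100; }

/* Assumed handler that rejects the request instead of crashing. */
static int cfg_q_bw_unsupported(int vf_id)
{
    (void)vf_id;
    return -1;
}

/* Default table: the new op was wired up here. */
static const struct vc_ops_sketch dflt_ops = {
    .get_caps = get_caps_impl,
    .cfg_q_bw = cfg_q_bw_impl,
};

/* Representor table before the fix: the new slot is implicitly NULL. */
static const struct vc_ops_sketch repr_ops_before = {
    .get_caps = get_caps_impl,
};

/* Representor table with every slot filled. */
static const struct vc_ops_sketch repr_ops_after = {
    .get_caps = get_caps_impl,
    .cfg_q_bw = cfg_q_bw_unsupported,
};

/* Returns 1 if dispatching cfg_q_bw through this table would call NULL. */
static int would_deref_null(const struct vc_ops_sketch *ops)
{
    return ops->cfg_q_bw == NULL;
}
```

Designated initializers zero any member that is not named, so a table that is not updated alongside its sibling silently gains a NULL callback.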
  15. ice: Fix VLAN pruning in switchdev mode

    In switchdev mode the uplink VSI should receive all unmatched packets,
    including VLANs. Therefore, VLAN pruning should be disabled if uplink is
    in switchdev mode. It is already being done in ice_eswitch_setup_env(),
    however the addition of ice_up() in commit 44ba608 ("ice: do
    switchdev slow-path Rx using PF VSI") caused VLAN pruning to be
    re-enabled after disabling it.
    
    Add a check to ice_set_vlan_filtering_features() to ensure VLAN
    filtering will not be enabled if uplink is in switchdev mode. Note that
    ice_is_eswitch_mode_switchdev() is being used instead of
    ice_is_switchdev_running(), as the latter would only return true after
    the whole switchdev setup completes.
    
    Fixes: 44ba608 ("ice: do switchdev slow-path Rx using PF VSI")
    Reviewed-by: Michal Swiatkowski <[email protected]>
    Signed-off-by: Marcin Szycik <[email protected]>
    Tested-by: Priya Singh <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Marcin Szycik authored and anguy11 committed Dec 3, 2024
    761e0be
  16. idpf: set completion tag for "empty" bufs associated with a packet

    Commit d9028db ("idpf: convert to libeth Tx buffer completion")
    inadvertently removed code that was necessary for the tx buffer cleaning
    routine to iterate over all buffers associated with a packet.
    
    When a frag is too large for a single data descriptor, it will be split
    across multiple data descriptors. This means the frag will span multiple
    buffers in the buffer ring in order to keep the descriptor and buffer
    ring indexes aligned. The buffer entries in the ring are technically
    empty and no cleaning actions need to be performed. These empty buffers
    can precede other frags associated with the same packet. I.e. a single
    packet on the buffer ring can look like:
    
    	buf[0]=skb0.frag0
    	buf[1]=skb0.frag1
    	buf[2]=empty
    	buf[3]=skb0.frag2
    
    The cleaning routine iterates through these buffers based on a matching
    completion tag. If the completion tag is not set for buf[2], the loop will
    end prematurely. skb0.frag2 will be left uncleaned and next_to_clean will
    be left pointing to the end of packet, which will break the cleaning logic
    for subsequent cleans. This consequently leads to tx timeouts.
    
    Assign the empty bufs the same completion tag for the packet to ensure
    the cleaning routine iterates over all of the buffers associated with
    the packet.
    
    Fixes: d9028db ("idpf: convert to libeth Tx buffer completion")
    Signed-off-by: Joshua Hay <[email protected]>
    Acked-by: Alexander Lobakin <[email protected]>
    Reviewed-by: Madhu chittim <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Tested-by: Krishneil Singh <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    jahay1 authored and anguy11 committed Dec 3, 2024
    4c69c77
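
The effect of an untagged "empty" buffer on a tag-matching loop can be sketched as follows. This is a simplified model, not the idpf cleaning routine: `struct tx_buf_sketch`, `clean_packet()`, and the tag value 7 are hypothetical, and the ring contents follow the four-entry layout given in the commit message.

```c
#include <stddef.h>

#define SKETCH_NO_TAG (-1)

/* Hypothetical Tx buffer entry; only the completion tag matters here. */
struct tx_buf_sketch {
    int compl_tag;
};

/* Walk the ring from `start` while the completion tag matches `tag`.
 * Returns the index of the first buffer that was not visited. */
static size_t clean_packet(const struct tx_buf_sketch *ring, size_t len,
                           size_t start, int tag)
{
    size_t i = start;

    while (i < len && ring[i].compl_tag == tag)
        i++;
    return i;
}

/* Ring from the commit message with the empty buf[2] left untagged. */
static size_t next_to_clean_untagged(void)
{
    const struct tx_buf_sketch ring[] = {
        { 7 }, { 7 }, { SKETCH_NO_TAG }, { 7 },
    };

    return clean_packet(ring, 4, 0, 7);
}

/* Same ring with the empty buf[2] given the packet's tag. */
static size_t next_to_clean_tagged(void)
{
    const struct tx_buf_sketch ring[] = {
        { 7 }, { 7 }, { 7 }, { 7 },
    };

    return clean_packet(ring, 4, 0, 7);
}
```

In the untagged ring the walk stops at index 2 and never reaches buf[3]; with the tag set on the empty entry it covers all four buffers.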
  17. ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5

    Commit 339f289 ("ixgbevf: Add support for new mailbox communication
    between PF and VF") added support for v1.5 of the PF to VF mailbox
    communication API. This commit mistakenly enabled IPSEC offload for API
    v1.5.
    
    No implementation of the v1.5 API has support for IPSEC offload. This
    offload is only supported by the Linux PF as mailbox API v1.4. In fact, the
    v1.5 API is not implemented in any Linux PF.
    
    Attempting to enable IPSEC offload on a PF which supports v1.5 API will not
    work. Only the Linux upstream ixgbe and ixgbevf support IPSEC offload, and
    only as part of the v1.4 API.
    
    Fix the ixgbevf Linux driver to stop attempting IPSEC offload when
    the mailbox API does not support it.
    
    The existing API design choice makes it difficult to support future API
    versions, as other non-Linux hosts do not implement IPSEC offload. If we
    add support for v1.5 to the Linux PF, then we lose support for IPSEC
    offload.
    
    A full solution likely requires a new mailbox API with a proper negotiation
    to check that IPSEC is actually supported by the host.
    
    Fixes: 339f289 ("ixgbevf: Add support for new mailbox communication between PF and VF")
    Signed-off-by: Jacob Keller <[email protected]>
    Reviewed-by: Przemek Kitszel <[email protected]>
    Tested-by: Rafal Romanowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    jacob-keller authored and anguy11 committed Dec 3, 2024
    d072531
  18. ixgbe: downgrade logging of unsupported VF API version to debug

    The ixgbe PF driver logs an info message when a VF attempts to negotiate an
    API version which it does not support:
    
      VF 0 requested invalid api version 6
    
    The ixgbevf driver attempts to load with mailbox API v1.5, which is
    required for best compatibility with other hosts such as the ESX VMWare PF.
    
    The Linux PF only supports API v1.4, and does not currently have support
    for the v1.5 API.
    
    The logged message can confuse users, as the v1.5 API is valid, but just
    happens to not currently be supported by the Linux PF.
    
    Downgrade the info message to a debug message, and fix the language to
    use 'unsupported' instead of 'invalid' to improve message clarity.
    
    Long term, we should investigate whether the improvements in the v1.5 API
    make sense for the Linux PF, and if so implement them properly. This may
    require yet another API version to resolve issues with negotiating IPSEC
    offload support.
    
    Fixes: 339f289 ("ixgbevf: Add support for new mailbox communication between PF and VF")
    Reported-by: Yifei Liu <[email protected]>
    Link: https://lore.kernel.org/intel-wired-lan/[email protected]/
    Signed-off-by: Jacob Keller <[email protected]>
    Reviewed-by: Przemek Kitszel <[email protected]>
    Tested-by: Rafal Romanowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    jacob-keller authored and anguy11 committed Dec 3, 2024
    15915b4
  19. ixgbe: Correct BASE-BX10 compliance code

    SFF-8472 (section 5.4 Transceiver Compliance Codes) defines bit 6 as
    BASE-BX10. Bit 6 means a value of 0x40 (decimal 64).
    
    The current value in the source code is 0x64, which appears to be a
    mix-up of hex and decimal values. A value of 0x64 (binary 01100100)
    incorrectly sets bit 2 (1000BASE-CX) and bit 5 (100BASE-FX) as well.
    
    Fixes: 1b43e0d ("ixgbe: Add 1000BASE-BX support")
    Signed-off-by: Tore Amundsen <[email protected]>
    Reviewed-by: Paul Menzel <[email protected]>
    Acked-by: Ernesto Castellotti <[email protected]>
    Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    toreamun authored and anguy11 committed Dec 3, 2024
    f72ce14
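
The hex/decimal mix-up can be checked with a few lines of arithmetic. The macro names below are made up for this sketch; the bit positions are the ones stated in the commit message.

```c
#include <stdint.h>

#define SKETCH_BIT(n) (1u << (n))

/* Bit positions named in the commit message (SFF-8472 compliance codes). */
#define SKETCH_1000BASE_CX  SKETCH_BIT(2)
#define SKETCH_100BASE_FX   SKETCH_BIT(5)
#define SKETCH_BASE_BX10    SKETCH_BIT(6)

/* Count the set bits in a one-byte compliance code. */
static unsigned int count_bits(uint8_t v)
{
    unsigned int n = 0;

    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}
```

Bit 6 alone is 0x40 (decimal 64), a single set bit. The old value 0x64 (decimal 100) has three set bits and equals bit 6 | bit 5 | bit 2.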
  20. igb: Fix potential invalid memory access in igb_init_module()

    pci_register_driver() can fail, and when it does the dca_notifier
    needs to be unregistered. Otherwise the dca_notifier can be called
    after igb fails to install, resulting in an invalid memory access.
    
    Fixes: bbd98fe ("igb: Fix DCA errors and do not use context index for 82576")
    Signed-off-by: Yuan Can <[email protected]>
    Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    Yuan Can authored and anguy11 committed Dec 3, 2024
    0566f83
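
The unwinding pattern behind this fix can be sketched in user space. All names here are hypothetical stand-ins, and the ordering (notifier registered first, driver second) is an assumption inferred from the commit message rather than copied from igb_init_module().

```c
#include <stdbool.h>

/* Tracks whether the stand-in notifier is currently registered. */
static bool notifier_registered;

static void register_notifier(void)   { notifier_registered = true; }
static void unregister_notifier(void) { notifier_registered = false; }

/* Stand-in for pci_register_driver(); `fail` forces the error path. */
static int register_driver(bool fail)
{
    return fail ? -1 : 0;
}

/* Init routine that undoes the notifier registration on failure. */
static int module_init_sketch(bool driver_fails)
{
    int ret;

    register_notifier();
    ret = register_driver(driver_fails);
    if (ret)
        unregister_notifier();  /* undo the earlier step before returning */
    return ret;
}
```

Without the unregister call on the error path, the notifier would stay registered even though initialization failed, which is the situation the commit message describes.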
  21. Merge tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
    
    Pull xfs fixes from Carlos Maiolino:
    
     - Use xchg() in xlog_cil_insert_pcp_aggregate()
    
     - Fix ABBA deadlock on a race between mount and log shutdown
    
     - Fix quota softlimit incoherency on delalloc
    
     - Fix sparse inode limits on runt AG
    
     - remove unknown compat feature checks in SB write validation
    
     - Eliminate a lockdep false positive
    
    * tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
      xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
      xfs: Use xchg() in xlog_cil_insert_pcp_aggregate()
      xfs: prevent mount and log shutdown race
      xfs: delalloc and quota softlimit timers are incoherent
      xfs: fix sparse inode limits on runt AG
      xfs: remove unknown compat feature check in superblock write validation
      xfs: eliminate lockdep false positives in xfs_attr_shortform_list
    torvalds committed Dec 3, 2024
    9141c5d
  22. Merge tag 'fs_for_v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
    
    Pull quota and udf fixes from Jan Kara:
     "Two small UDF fixes for better handling of corrupted filesystem and a
      quota fix to fix handling of filesystem freezing"
    
    * tag 'fs_for_v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
      udf: Verify inode link counts before performing rename
      udf: Skip parent dir link count update if corrupted
      quota: flush quota_release_work upon quota writeback
    torvalds committed Dec 3, 2024
    3d24694
  23. nvme-pci: remove two deallocate zeroes quirks

    The quirk was initially used as a signal to set the discard_zeroes_data
    queue limit because there were some use cases that relied on that
    behavior. The queue limit no longer exists as every user of it has been
    converted to use the write zeroes operation instead.
    
    The quirk now means to use a discard command as an alias to a write
    zeroes request. Two of the devices previously using the quirk support
    the write zeroes command directly, so these don't need or want to use
    discard when the desired operation is to write zeroes.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    keithbusch committed Dec 3, 2024
    b0de545
  24. Merge tag 'for-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
    
     - add lockdep annotations for io_uring/encoded read integration, inode
       lock is held when returning to userspace
    
     - properly reflect experimental config option to sysfs
    
     - handle NULL root in case the rescue mode accepts invalid/damaged tree
       roots (rescue=ibadroot)
    
     - regression fix of a deadlock between transaction and extent locks
    
     - fix pending bio accounting bug in encoded read ioctl
    
     - fix NOWAIT mode when checking references for NOCOW files
    
     - fix use-after-free in a rb-tree cleanup in ref-verify debugging tool
    
    * tag 'for-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: fix lockdep warnings on io_uring encoded reads
      btrfs: ref-verify: fix use-after-free after invalid ref action
      btrfs: add a sanity check for btrfs root in btrfs_search_slot()
      btrfs: don't loop for nowait writes when checking for cross references
      btrfs: sysfs: advertise experimental features only if CONFIG_BTRFS_EXPERIMENTAL=y
      btrfs: fix deadlock between transaction commits and extent locks
      btrfs: fix use-after-free in btrfs_encoded_read_endio()
    torvalds committed Dec 3, 2024
    feffde6
  25. btrfs: fix mount failure due to remount races

    [BUG]
    The following reproducer can cause btrfs mount to fail:
    
      dev="/dev/test/scratch1"
      mnt1="/mnt/test"
      mnt2="/mnt/scratch"
    
      mkfs.btrfs -f $dev
      mount $dev $mnt1
      btrfs subvolume create $mnt1/subvol1
      btrfs subvolume create $mnt1/subvol2
      umount $mnt1
    
      mount $dev $mnt1 -o subvol=subvol1
      while mount -o remount,ro $mnt1; do mount -o remount,rw $mnt1; done &
      bg=$!
    
      while mount $dev $mnt2 -o subvol=subvol2; do umount $mnt2; done
    
      kill $bg
      wait
      umount -R $mnt1
      umount -R $mnt2
    
    The script will fail with the following error:
    
      mount: /mnt/scratch: /dev/mapper/test-scratch1 already mounted on /mnt/test.
            dmesg(1) may have more information after failed mount system call.
      umount: /mnt/test: target is busy.
      umount: /mnt/scratch/: not mounted
    
    And there is no kernel error message.
    
    [CAUSE]
    During the btrfs mount, to support mounting different subvolumes with
    different RO/RW flags, we need to detect that and retry if needed:
    
      Retry with matching RO flags if the initial mount fails with -EBUSY.
    
    The problem is that during the retry we do not hold any super block lock
    (s_umount), so a concurrent remount can change the RO flags of the
    original fs super block.
    
    If that happens, the retry itself can fail with -EBUSY. This time any
    failure is treated as a final error, with no further retry, which causes
    the EBUSY mount failure above.
    
    [FIX]
    The current retry behavior is racy because we do not have a super block
    yet, and thus no way to hold s_umount to prevent the race with remount.
    
    Solve the root problem by allowing fc->sb_flags to mismatch
    sb->s_flags at btrfs_get_tree_super().
    
    Then, at the re-entry point btrfs_get_tree_subvol(), manually check
    fc->sb_flags against sb->s_flags. If it's a RO->RW mismatch, reconfigure
    with the s_umount lock held.
    
    Reported-by: Enno Gotthold <[email protected]>
    Reported-by: Fabian Vogt <[email protected]>
    [ Special thanks for the reproducer and early analysis pointing to btrfs. ]
    Fixes: f044b31 ("btrfs: handle the ro->rw transition for mounting different subvolumes")
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1231836
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Dec 3, 2024
    951a3f5
  26. btrfs: fix missing snapshot drew unlock when root is dead during swap…

    … activation
    
    When activating a swap file we acquire the root's snapshot drew lock and
    then check if the root is dead, failing and returning with -EPERM if it's
    dead but without unlocking the root's snapshot lock. Fix this by adding
    the missing unlock.
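
The error-path pattern described above can be sketched in userspace C. This is only an illustration: `toy_root`, `toy_lock()`, `toy_unlock()` and `toy_swap_activate()` are hypothetical names, and a plain counter stands in for the snapshot lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a root with a lock; not the btrfs structures. */
struct toy_root {
    int lock_depth;   /* number of outstanding lock holders */
    bool dead;        /* set once the root is being deleted */
};

void toy_lock(struct toy_root *root)
{
    root->lock_depth++;
}

void toy_unlock(struct toy_root *root)
{
    assert(root->lock_depth > 0);
    root->lock_depth--;
}

/*
 * Returns 0 with the lock still held (the caller releases it later),
 * or -EPERM with the lock already released when the root is dead.
 */
int toy_swap_activate(struct toy_root *root)
{
    toy_lock(root);
    if (root->dead) {
        toy_unlock(root);   /* the step the fix adds: unlock before bailing out */
        return -EPERM;
    }
    return 0;
}
```

Without the `toy_unlock()` call on the early return, `lock_depth` would stay at 1 after a failed activation, which is the leak the commit describes.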
    
    Fixes: 60021bd ("btrfs: prevent subvol with swapfile from being deleted")
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Dec 3, 2024
    9c803c4
  27. netfilter: nft_inner: incorrect percpu area handling under softirq

    Softirq can interrupt ongoing packet processing in process context that
    is walking over the percpu area that contains inner header offsets.
    
    Disable bh and perform three checks before restoring the percpu inner
    header offsets to validate that the percpu area is valid for this
    skbuff:
    
    1) If the NFT_PKTINFO_INNER_FULL flag is set on, then this skbuff
       has already been parsed before for inner header fetching to
       register.
    
    2) Validate that the percpu area refers to this skbuff using the
       skbuff pointer as a cookie. If there is a cookie mismatch, then
       this skbuff needs to be parsed again.
    
    3) Finally, validate if the percpu area refers to this tunnel type.
    
    Only after these three checks is the percpu area restored to an on-stack
    copy, and bh is enabled again.
    
    After inner header fetching, the on-stack copy is stored back to the
    percpu area.
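
The three checks can be sketched as a small userspace function. Everything here is an assumption made for illustration: `toy_inner_ctx`, `TOY_INNER_FULL` and `toy_inner_restore()` are hypothetical names, and the bh disable/enable that brackets the checks in the kernel has no counterpart in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INNER_FULL 0x1u  /* hypothetical flag: packet was fully parsed before */

struct toy_inner_ctx {       /* hypothetical shared (per-CPU style) scratch area */
    uintptr_t cookie;        /* identifies the packet that filled the area */
    int tunnel_type;
    unsigned int flags;
    int inner_offset;
};

/* Copy the shared area into `out` only if it provably belongs to this packet. */
bool toy_inner_restore(const struct toy_inner_ctx *shared, const void *pkt,
                       unsigned int pkt_flags, int tunnel_type,
                       struct toy_inner_ctx *out)
{
    assert(shared != NULL && out != NULL);

    if (!(pkt_flags & TOY_INNER_FULL))       /* 1) packet was never parsed */
        return false;
    if (shared->cookie != (uintptr_t)pkt)    /* 2) area was filled for another packet */
        return false;
    if (shared->tunnel_type != tunnel_type)  /* 3) area describes another tunnel type */
        return false;

    *out = *shared;                          /* work on a private copy from here on */
    return true;
}
```

A `false` return means the caller has to parse the packet again instead of trusting the shared area.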
    
    Fixes: 3a07327 ("netfilter: nft_inner: support for inner tunnel header matching")
    Reported-by: [email protected]
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    ummakynes committed Dec 3, 2024
    7b1d83d
  28. samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS

    Commit [0] breaks samples/bpf build:
    
        $ make M=samples/bpf
        ...
        make -C /path/to/kernel/samples/bpf/../../tools/lib/bpf \
         ...
         EXTRA_CFLAGS=" \
         ...
         -fsanitize=bounds \
         -I/path/to/kernel/usr/include \
         ...
        	/path/to/kernel/samples/bpf/libbpf/libbpf.a install_headers
          CC      /path/to/kernel/samples/bpf/libbpf/staticobjs/libbpf.o
        In file included from libbpf.c:29:
        /path/to/kernel/tools/include/linux/err.h:35:8: error: 'inline' can only appear on functions
           35 | static inline void * __must_check ERR_PTR(long error_)
              |        ^
    
    The error is caused by `objtree` variable changing definition from `.`
    (dot) to an absolute path:
    - The variable TPROGS_CFLAGS is constructed as follows:
      ...
      TPROGS_CFLAGS += -I$(objtree)/usr/include
    - It is passed as EXTRA_CFLAGS for libbpf compilation:
      $(LIBBPF): ...
        ...
    	$(MAKE) -C $(LIBBPF_SRC) RM='rm -rf' EXTRA_CFLAGS="$(TPROGS_CFLAGS)"
    - Before commit [0], the line passed to libbpf makefile was
      '-I./usr/include', where '.' referred to LIBBPF_SRC due to -C flag.
      The directory $(LIBBPF_SRC)/usr/include does not exist and thus
      was never resolved by C compiler.
    - After commit [0], the line passed to libbpf makefile became
      '-I<output-dir>/usr/include'; this directory exists and is resolved by
      the C compiler.
    - Both 'tools/include' and 'usr/include' define files err.h and types.h.
    - libbpf expects headers like 'linux/err.h' and 'linux/types.h'
      defined in 'tools/include', not 'usr/include', hence the compilation
      error.
    
    This commit removes unnecessary -I flags from libbpf compilation.
    (libbpf sets up the necessary includes at lib/bpf/Makefile:63).
    
    Changes v1 [1] -> v2:
    - dropped unnecessary replacement of KBUILD_OUTPUT with $(objtree)
      (Andrii)
    Changes v2 [2] -> v3:
    - make sure --sysroot option is set for libbpf's EXTRA_CFLAGS,
      if $(SYSROOT) is set (Stanislav)
    
    [0] commit 13b2548 ("kbuild: change working directory to external module directory with M=")
    [1] https://lore.kernel.org/bpf/[email protected]/
    [2] https://lore.kernel.org/bpf/[email protected]/
    
    Fixes: 13b2548 ("kbuild: change working directory to external module directory with M=")
    Signed-off-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Stanislav Fomichev <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    eddyz87 authored and anakryiko committed Dec 3, 2024
    5a6ea70
  29. bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again

    Commit 028ddca ("bcache: Remove unnecessary NULL point check in
    node allocations") leads to a NULL pointer dereference in cache_set_flush().
    
    1721         if (!IS_ERR_OR_NULL(c->root))
    1722                 list_add(&c->root->list, &c->btree_cache);
    
    From the above code in cache_set_flush(), if the earlier registration
    code fails before allocating c->root, c->root can still be NULL, as it
    was initialized. __bch_btree_node_alloc() never returns NULL, but
    c->root can be NULL at line 1721 above.
    
    This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this.
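
Why the NULL case slips through an IS_ERR()-only test can be shown with a userspace imitation of the error-pointer convention. The `toy_` helpers below are hypothetical re-implementations written for this sketch, not the kernel macros themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095  /* mirrors the kernel's MAX_ERRNO convention */

/* Encode a negative errno value as a pointer in the topmost 4095 addresses. */
void *toy_err_ptr(long error)
{
    assert(error < 0 && error >= -TOY_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True only for encoded errors; NULL is address 0 and therefore not an error. */
bool toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* True for encoded errors and for NULL, so the pointer is safe to use if false. */
bool toy_is_err_or_null(const void *ptr)
{
    return ptr == NULL || toy_is_err(ptr);
}
```

With these definitions `!toy_is_err(NULL)` is true, so a guard written that way would still dereference a never-allocated pointer, while `!toy_is_err_or_null(NULL)` is false.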
    
    Fixes: 028ddca ("bcache: Remove unnecessary NULL point check in node allocations")
    Signed-off-by: Liequan Che <[email protected]>
    Cc: [email protected]
    Cc: Zheng Wang <[email protected]>
    Reviewed-by: Mingzhe Zou <[email protected]>
    Signed-off-by: Coly Li <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Liequan Che authored and axboe committed Dec 3, 2024
    b2e382a
  30. clk: en7523: Fix wrong BUS clock for EN7581

    The documentation for EN7581 had a typo and still referenced the EN7523
    BUS base source frequency. This conflicted with a different page in
    the documentation that states that the BUS runs at 300MHz (600MHz source
    with divisor set to 2), and with the actual watchdog, which ticks at half
    the BUS clock (150MHz). This was verified with the watchdog by timing the
    seconds that the system takes to reboot (due to the watchdog) and by
    operating on different values of the BUS divisor.
    
    The correct values for source of BUS clock are 600MHz and 540MHz.
    
    This was also confirmed by Airoha.
    
    Cc: [email protected]
    Fixes: 66bc473 ("clk: en7523: Add EN7581 support")
    Signed-off-by: Christian Marangi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Acked-by: Lorenzo Bianconi <[email protected]>
    Signed-off-by: Stephen Boyd <[email protected]>
    Ansuel authored and bebarino committed Dec 3, 2024
    2eb75f8
  31. clk: en7523: Initialize num before accessing hws in en7523_register_c…

    …locks()
    
    With the new __counted_by annotation in clk_hw_onecell_data, the "num"
    struct member must be set before accessing the "hws" array. Failing to
    do so will trigger a runtime warning when enabling CONFIG_UBSAN_BOUNDS
    and CONFIG_FORTIFY_SOURCE.
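
The ordering requirement can be sketched with an ordinary count-plus-flexible-array structure. `toy_onecell` and its helpers are hypothetical names; the sketch leaves out the `__counted_by` attribute itself and uses an `assert()` to play the role of the bounds check that is derived from `num`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a "count + flexible array" structure. */
struct toy_onecell {
    unsigned int num;   /* number of valid entries in hws[] */
    int hws[];          /* the array whose bound is described by num */
};

struct toy_onecell *toy_onecell_alloc(unsigned int count)
{
    struct toy_onecell *data;

    data = calloc(1, sizeof(*data) + count * sizeof(data->hws[0]));
    if (!data)
        return NULL;

    data->num = count;             /* set the bound first ... */
    for (unsigned int i = 0; i < count; i++)
        data->hws[i] = -1;         /* ... then touch the array it describes */
    return data;
}

int toy_onecell_get(const struct toy_onecell *data, unsigned int i)
{
    assert(i < data->num);         /* the kind of check derived from num */
    return data->hws[i];
}
```

If `num` were still 0 when the loop ran, every index would be out of bounds as far as such a check is concerned, even though the memory was allocated.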
    
    Fixes: f316cdf ("clk: Annotate struct clk_hw_onecell_data with __counted_by")
    Signed-off-by: Haoyu Li <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Stephen Boyd <[email protected]>
    learjet5 authored and bebarino committed Dec 3, 2024
    52fd170
  32. Merge tag 'drm-misc-fixes-2024-11-21' of https://gitlab.freedesktop.o…

    …rg/drm/misc/kernel into drm-fixes
    
    Short summary of fixes pull:
    
    dma-fence:
    - Fix reference leak on fence-merge failure path
    - Simplify fence merging with kernel's sort()
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    airlied committed Dec 3, 2024
    8cc4d0f
  33. Merge tag 'drm-misc-fixes-2024-11-28' of https://gitlab.freedesktop.o…

    …rg/drm/misc/kernel into drm-fixes
    
    Short summary of fixes pull:
    
    dma-buf:
    - Fix dma_fence_array_signaled() to ensure forward progress
    
    dp_mst:
    - Fix MST sideband message body length check
    
    sti:
    - Add __iomem for mixer_dbg_mxn()'s parameter
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    airlied committed Dec 3, 2024
    defc06f
  34. drm/amdgpu: rework resume handling for display (v2)

    Split resume into a 3rd step to handle displays when DCC is
    enabled on DCN 4.0.1.  Move display after the buffer funcs
    have been re-enabled so that the GPU will do the move and
    properly set the DCC metadata for DCN.
    
    v2: fix fence irq resume ordering
    
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected] # 6.11.x
    alexdeucher committed Dec 3, 2024
    73dae65

Commits on Dec 4, 2024

  1. net: Make napi_hash_lock irq safe

    Make napi_hash_lock IRQ safe. It is used during the control path, and is
    taken and released in napi_hash_add and napi_hash_del, which will
    typically be called by calls to napi_enable and napi_disable.
    
    This change avoids a deadlock in pcnet32 (and any other drivers
    which follow the same pattern):
    
     CPU 0:
     pcnet32_open
        spin_lock_irqsave(&lp->lock, ...)
          napi_enable
            napi_hash_add <- before this executes, CPU 1 proceeds
              spin_lock(napi_hash_lock)
           [...]
        spin_unlock_irqrestore(&lp->lock, flags);
    
     CPU 1:
       pcnet32_close
         napi_disable
           napi_hash_del
             spin_lock(napi_hash_lock)
              < INTERRUPT >
                pcnet32_interrupt
                  spin_lock(lp->lock) <- DEADLOCK
    
    Changing the napi_hash_lock to be IRQ safe prevents the IRQ from firing
    on CPU 1 until napi_hash_lock is released, preventing the deadlock.
    
    Cc: [email protected]
    Fixes: 86e25f4 ("net: napi: Add napi_config")
    Reported-by: Guenter Roeck <[email protected]>
    Closes: https://lore.kernel.org/netdev/[email protected]/
    Suggested-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Joe Damato <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    jdamato-fsly authored and kuba-moo committed Dec 4, 2024
    cecc155
  2. Revert "udp: avoid calling sock_def_readable() if possible"

    This reverts commit 612b1c0. In a
    scenario with multiple threads blocking on recvfrom(), we need to call
    sock_def_readable() on every __udp_enqueue_schedule_skb(); otherwise the
    threads won't be woken up, as __skb_wait_for_more_packets() uses
    prepare_to_wait_exclusive().
    
    Link: https://bugzilla.redhat.com/2308477
    Fixes: 612b1c0 ("udp: avoid calling sock_def_readable() if possible")
    Signed-off-by: Fernando Fernandez Mancera <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    ffmancera authored and kuba-moo committed Dec 4, 2024
    3d501f5
  3. ethtool: Fix access to uninitialized fields in set RXNFC command

    The check for non-zero ring with RSS is only relevant for
    ETHTOOL_SRXCLSRLINS command, in other cases the check tries to access
    memory which was not initialized by the userspace tool. Only perform the
    check in case of ETHTOOL_SRXCLSRLINS.
    
    Without this patch, filter deletion (for example) could statistically
    result in a false error:
      # ethtool --config-ntuple eth3 delete 484
      rmgr: Cannot delete RX class rule: Invalid argument
      Cannot delete classification rule
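
The idea of gating the check on the command can be sketched as follows. The structure, the command enum and the `TOY_FLOW_RSS` bit are hypothetical simplifications chosen for this sketch and do not reproduce the ethtool UAPI.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLOW_RSS 0x20000000u   /* hypothetical "use RSS context" flag bit */

enum toy_rxnfc_cmd { TOY_RULE_INSERT, TOY_RULE_DELETE };

struct toy_rxnfc {
    enum toy_rxnfc_cmd cmd;
    uint32_t rule_index;    /* filled in for both commands */
    uint32_t flow_type;     /* only meaningful for TOY_RULE_INSERT */
    uint64_t ring_cookie;   /* only meaningful for TOY_RULE_INSERT */
};

/* Reject "RSS + non-zero ring" only for the command that actually sets them. */
int toy_validate_rxnfc(const struct toy_rxnfc *info, bool driver_opts_in)
{
    assert(info != NULL);

    if (info->cmd == TOY_RULE_INSERT &&        /* other commands leave the fields unset */
        (info->flow_type & TOY_FLOW_RSS) &&
        info->ring_cookie != 0 &&
        !driver_opts_in)
        return -EINVAL;
    return 0;
}
```

A delete request whose unused fields happen to contain leftover bits is accepted here, whereas an unconditional check would reject it depending on what those bits are.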
    
    Fixes: 9e43ad7 ("net: ethtool: only allow set_rxnfc with rss + ring_cookie if driver opts in")
    Link: https://lore.kernel.org/netdev/[email protected]/
    Reviewed-by: Dragos Tatulea <[email protected]>
    Reviewed-by: Tariq Toukan <[email protected]>
    Signed-off-by: Gal Pressman <[email protected]>
    Reviewed-by: Edward Cree <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    gal-pressman authored and kuba-moo committed Dec 4, 2024
    9407190
  4. net: sched: fix erspan_opt settings in cls_flower

    When matching erspan_opt in cls_flower, only the (version, dir, hwid)
    fields are relevant. However, in fl_set_erspan_opt() it initializes
    all bits of erspan_opt and its mask to 1. This inadvertently requires
    packets to match not only the (version, dir, hwid) fields but also the
    other fields that are unexpectedly set to 1.
    
    This patch resolves the issue by ensuring that only the (version, dir,
    hwid) fields are configured in fl_set_erspan_opt(), leaving the other
    fields to 0 in erspan_opt.
    
    Fixes: 79b1011 ("net: sched: allow flower to match erspan options")
    Reported-by: Shuang Li <[email protected]>
    Signed-off-by: Xin Long <[email protected]>
    Reviewed-by: Cong Wang <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    lxin authored and davem330 committed Dec 4, 2024
    2922078
  5. net: sched: fix ordering of qlen adjustment

    Changes to sch->q.qlen around qdisc_tree_reduce_backlog() need to happen
    _before_ a call to said function because otherwise it may fail to notify
    parent qdiscs when the child is about to become empty.
    
    Signed-off-by: Lion Ackermann <[email protected]>
    Acked-by: Toke Høiland-Jørgensen <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Lion Ackermann authored and davem330 committed Dec 4, 2024
    5eb7de8
  6. spi: intel: Add Panther Lake SPI controller support

    The Panther Lake SPI controllers are compatible with the Cannon Lake
    controllers. Add support for the following SPI controller device IDs:
     - H-series: 0xe323
     - P-series: 0xe423
     - U-series: 0xe423
    
    Signed-off-by: Aapo Vienamo <[email protected]>
    Signed-off-by: Mika Westerberg <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    tkln authored and broonie committed Dec 4, 2024
    ceb259e
  7. netfilter: ipset: Hold module reference while requesting a module

    User space may unload ip_set.ko while it is itself requesting a set type
    backend module, leading to a kernel crash. The race condition may be
    provoked by inserting an mdelay() right after the nfnl_unlock() call.
    
    Fixes: a7b4f98 ("netfilter: ipset: IP set core support")
    Signed-off-by: Phil Sutter <[email protected]>
    Acked-by: Jozsef Kadlecsik <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Phil Sutter authored and ummakynes committed Dec 4, 2024
    456f010
  8. tracing: Fix cmp_entries_dup() to respect sort() comparison rules

    The cmp_entries_dup() function used as the comparator for sort()
    violated the symmetry and transitivity properties required by the
    sorting algorithm. Specifically, it returned 1 whenever memcmp() was
    non-zero, which broke the following expectations:
    
    * Symmetry: If x < y, then y > x.
    * Transitivity: If x < y and y < z, then x < z.
    
    These violations could lead to incorrect sorting and failure to
    correctly identify duplicate elements.
    
    Fix the issue by directly returning the result of memcmp(), which
    adheres to the required comparison properties.
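
The difference between the two comparators can be sketched in userspace with `qsort()` standing in for the kernel's sort(). The key size and the `toy_` function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_KEY_SIZE 4  /* hypothetical fixed key length */

/* Broken: reports "greater" for every mismatch, so cmp(a, b) and cmp(b, a) are both 1. */
int toy_cmp_broken(const void *a, const void *b)
{
    return memcmp(a, b, TOY_KEY_SIZE) ? 1 : 0;
}

/* Consistent: memcmp() already gives an ordering over equal-length keys. */
int toy_cmp_fixed(const void *a, const void *b)
{
    return memcmp(a, b, TOY_KEY_SIZE);
}

/* Sort the keys, then count adjacent pairs that compare equal. */
size_t toy_count_dups(unsigned char (*keys)[TOY_KEY_SIZE], size_t n)
{
    size_t dups = 0;

    assert(keys != NULL || n == 0);
    qsort(keys, n, sizeof(keys[0]), toy_cmp_fixed);
    for (size_t i = 1; i < n; i++)
        if (toy_cmp_fixed(keys[i - 1], keys[i]) == 0)
            dups++;
    return dups;
}
```

Duplicate detection of this kind relies on equal keys ending up next to each other, which a sort routine is only expected to deliver when the comparator is consistent.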
    
    Cc: [email protected]
    Fixes: 08d43a5 ("tracing: Add lock-free tracing_map")
    Link: https://lore.kernel.org/[email protected]
    Signed-off-by: Kuan-Wei Chiu <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    visitorckw authored and rostedt committed Dec 4, 2024
    e63fbd5
  9. scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driv…

    …er load time
    
    Issue a Diag-Reset when the "Doorbell-In-Use" bit is set during the
    driver load/initialization.
    
    Signed-off-by: Ranjan Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Ranjan Kumar authored and martinkpetersen committed Dec 4, 2024
    3f5eb06
  10. scsi: mpt3sas: Update driver version to 51.100.00.00

    Update driver version to 51.100.00.00.
    
    Signed-off-by: Ranjan Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Ranjan Kumar authored and martinkpetersen committed Dec 4, 2024
    6050471
  11. bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc

    Inside mark_stack_slot_misc, we should not upgrade STACK_INVALID to
    STACK_MISC when allow_ptr_leaks is false, since invalid contents
    shouldn't be read unless the program has the relevant capabilities.
    The relaxation only makes sense when env->allow_ptr_leaks is true.
    
    However, such conversion in privileged mode becomes unnecessary, as
    invalid slots can be read without being upgraded to STACK_MISC.
    
    Currently, the condition is inverted (i.e. checking for true instead of
    false); simply remove it to restore correct behavior.
    
    Fixes: eaf18fe ("bpf: preserve STACK_ZERO slots on partial reg spills")
    Acked-by: Andrii Nakryiko <[email protected]>
    Reported-by: Tao Lyu <[email protected]>
    Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    kkdwivedi authored and Alexei Starovoitov committed Dec 4, 2024
    69772f5
  12. bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots

    When CAP_PERFMON and CAP_SYS_ADMIN (allow_ptr_leaks) are disabled, the
    verifier aims to reject partial overwrite on an 8-byte stack slot that
    contains a spilled pointer.
    
    However, in such a scenario, it rejects all partial stack overwrites as
    long as the targeted stack slot is a spilled register, because it does
    not check if the stack slot is a spilled pointer.
    
    Incomplete checks will result in the rejection of valid programs, which
    spill narrower scalar values onto scalar slots, as shown below.
    
    0: R1=ctx() R10=fp0
    ; asm volatile ( @ repro.bpf.c:679
    0: (7a) *(u64 *)(r10 -8) = 1          ; R10=fp0 fp-8_w=1
    1: (62) *(u32 *)(r10 -8) = 1
    attempt to corrupt spilled pointer on stack
    processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0.
    
    Fix this by expanding the check to not consider spilled scalar registers
    when rejecting the write into the stack.
    
    Previous discussion on this patch is at link [0].
    
      [0]: https://lore.kernel.org/bpf/[email protected]
    
    Fixes: ab125ed ("bpf: fix check for attempt to corrupt spilled pointer")
    Acked-by: Eduard Zingerman <[email protected]>
    Acked-by: Andrii Nakryiko <[email protected]>
    Signed-off-by: Tao Lyu <[email protected]>
    Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    lvtao-sec authored and Alexei Starovoitov committed Dec 4, 2024
    b0e6697
  13. selftests/bpf: Introduce __caps_unpriv annotation for tests

    Add a __caps_unpriv annotation so that tests requiring specific
    capabilities while dropping the rest can conveniently specify them
    during selftest declaration instead of munging with capabilities at
    runtime from the testing binary.
    
    While at it, let us convert test_verifier_mtu to use this new support
    instead.
    
    Since we do not want to include linux/capability.h, we only defined the
    four main capabilities BPF subsystem deals with in bpf_misc.h for use in
    tests. If the user passes a CAP_SYS_NICE or anything else that's not
    defined in the header, capability parsing code will return a warning.
    
    Also reject strtol returning 0. CAP_CHOWN = 0 but we'll never need to
    use it, and strtol doesn't set errno on failed conversion. Fail the test
    in such a case.
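
A minimal sketch of that parsing rule, assuming a decimal capability number is passed as text. `toy_parse_cap()` and the upper bound of 63 are hypothetical and are not taken from the selftest code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical parser: returns a capability number greater than 0, or -EINVAL.
 * strtol() yields 0 both for "0" and for text it cannot convert, and it is
 * not required to set errno in the latter case, so 0 is treated as a failure.
 */
int toy_parse_cap(const char *text)
{
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, NULL, 10);
    if (errno != 0 || value <= 0 || value > 63)   /* 63: assumed upper bound */
        return -EINVAL;
    return (int)value;
}
```

Passing a symbolic name that was never turned into a number, such as the literal text `CAP_SYS_NICE`, therefore fails instead of being silently read as capability 0.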
    
    The original diff for this idea is available at link [0].
    
      [0]: https://lore.kernel.org/bpf/[email protected]
    
    Signed-off-by: Eduard Zingerman <[email protected]>
    [ Kartikeya: rebase on bpf-next, add warn to parse_caps, convert test_verifier_mtu ]
    Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    eddyz87 authored and Alexei Starovoitov committed Dec 4, 2024
    adfdd9c
  14. selftests/bpf: Add test for reading from STACK_INVALID slots

    Ensure that when CAP_PERFMON is dropped, and the verifier sees
    allow_ptr_leaks as false, we are not permitted to read from a
    STACK_INVALID slot. Without the fix, the test will report unexpected
    success in loading.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    kkdwivedi authored and Alexei Starovoitov committed Dec 4, 2024
    f513c36
  15. selftests/bpf: Add test for narrow spill into 64-bit spilled scalar

    Add a test case to verify that without CAP_PERFMON, the test now
    succeeds instead of failing due to a verification error.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    kkdwivedi authored and Alexei Starovoitov committed Dec 4, 2024
    19b6dbc
  16. Merge branch 'fixes-for-stack-with-allow_ptr_leaks'

    Kumar Kartikeya Dwivedi says:
    
    ====================
    Fixes for stack with allow_ptr_leaks
    
    Two fixes for usability/correctness gaps when interacting with the stack
    without CAP_PERFMON (i.e. with allow_ptr_leaks = false). See the commits
    for details. I've verified that the tests fail when run without the fixes.
    
    Changelog:
    ----------
    v3 -> v4
    v3: https://lore.kernel.org/bpf/[email protected]
    
     * Address Andrii's comments
       * Fix bug papered over by missing CAP_NET_ADMIN in verifier_mtu
         test
       * Add warning when undefined CAP_ constant is specified, and fail
         test
       * Reorder annotations to be more clear
       * Verify that fixes fail without patches again
     * Add Acked-by from Andrii
    
    v2 -> v3
    v2: https://lore.kernel.org/bpf/[email protected]
    
     * Address comments from Eduard
       * Fix comment for mark_stack_slot_misc
       * We can simply always return early when stype == STACK_INVALID
       * Drop allow_ptr_leaks conditionals
       * Add Eduard's __caps_unpriv patch into the series
       * Convert test_verifier_mtu to use it
       * Move existing tests to __caps_unpriv annotation and verifier_spill_fill.c
       * Add Acked-by from Eduard
    
    v1 -> v2
    v1: https://lore.kernel.org/bpf/[email protected]
    
     * Fix CI errors in selftest by removing dependence on BPF_ST
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Alexei Starovoitov committed Dec 4, 2024
    e2cf913
  17. nvme-fabrics: handle zero MAXCMD without closing the connection

    The NVMe specification states that MAXCMD is mandatory
    for NVMe-over-Fabrics implementations. However, some NVMe/TCP
    and NVMe/FC arrays from major vendors have buggy firmware
    that reports MAXCMD as zero in the Identify Controller data structure.
    
    Currently, the implementation closes the connection in such cases,
    completely preventing the host from connecting to the target.
    
    Fix the issue by printing a clear error message about the firmware bug
    and allowing the connection to proceed. It assumes that the
    target supports a MAXCMD value of SQSIZE + 1. If any issues arise,
    the user can manually adjust SQSIZE to mitigate them.
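
The fallback described above can be sketched as a small helper. `toy_effective_maxcmd()` and its `quirk` out-parameter are hypothetical; the sketch only mirrors the "zero means assume SQSIZE + 1" rule from this message and leaves out the error message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the command limit to use for a controller.
 * *quirk is set when the reported value had to be replaced.
 */
uint32_t toy_effective_maxcmd(uint16_t reported_maxcmd, uint16_t sqsize,
                              bool *quirk)
{
    assert(quirk != NULL);

    *quirk = (reported_maxcmd == 0);    /* firmware left the mandatory field at zero */
    if (*quirk)
        return (uint32_t)sqsize + 1;    /* assumed limit instead of refusing to connect */
    return reported_maxcmd;
}
```

The addition is done in 32 bits so that the largest 16-bit `sqsize` does not wrap around to zero.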
    
    Fixes: 4999568 ("nvme-fabrics: check max outstanding commands")
    Signed-off-by: Maurizio Lombardi <[email protected]>
    Reviewed-by: Laurence Oberman <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    maurizio-lombardi authored and keithbusch committed Dec 4, 2024
    88c23a3
  18. nvmet: replace kmalloc + memset with kzalloc for data allocation

    cocci warnings: (new ones prefixed by >>)
    >> drivers/nvme/target/pr.c:831:8-15: WARNING: kzalloc should be used for data, instead of kmalloc/memset
    
    The pattern of using 'kmalloc' followed by 'memset' is replaced with
    'kzalloc', which is functionally equivalent to 'kmalloc' + 'memset',
    but more efficient. 'kzalloc' automatically zeroes the allocated
    memory, making it a faster and more streamlined solution.
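
A userspace analogue of the same cleanup, with `malloc()` + `memset()` and `calloc()` standing in for kmalloc() + memset() and kzalloc(). The `toy_report` structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_report {          /* hypothetical payload */
    int generation;
    char key[16];
};

/* Before: allocate, then clear the memory in a separate step. */
struct toy_report *toy_report_alloc_two_step(void)
{
    struct toy_report *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    memset(r, 0, sizeof(*r));
    return r;
}

/* After: a single call that hands back zeroed memory. */
struct toy_report *toy_report_alloc_zeroed(void)
{
    return calloc(1, sizeof(struct toy_report));
}

/* Helper for checking that every byte of the object is zero. */
int toy_report_is_clear(const struct toy_report *r)
{
    static const struct toy_report zero;   /* static storage starts out zeroed */

    assert(r != NULL);
    return memcmp(r, &zero, sizeof(zero)) == 0;
}
```

Both allocators return an all-zero object; the single-call form simply removes the chance of forgetting the clearing step or passing it the wrong size.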
    
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Reviewed-by: Kuan-Wei Chiu <[email protected]>
    Reviewed-by: Chaitanya Kulkarni <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Yu-Chun Lin <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    eleanorLYJ authored and keithbusch committed Dec 4, 2024
    41d826c
  19. scsi: mpi3mr: Synchronize access to ioctl data buffer

    The driver serializes ioctls through a mutex lock but access to the
    ioctl data buffer is not guarded by the mutex. This results in multiple
    user threads being able to write to the driver's ioctl buffer
    simultaneously.
    
    Protect the ioctl buffer with the ioctl mutex.
    
    Signed-off-by: Sumit Saxena <[email protected]>
    Signed-off-by: Ranjan Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Ranjan Kumar authored and martinkpetersen committed Dec 4, 2024
    367ac16
  20. scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs

    The driver, through the SAS transport, exposes a sysfs interface to
    enable/disable PHYs in a controller/expander setup.  When multiple PHYs
    are disabled and enabled in rapid succession, the persistent and current
    config pages related to SAS IO unit/SAS Expander pages could get
    corrupted.
    
    Use separate memory for each config request.
    
    Signed-off-by: Prayas Patel <[email protected]>
    Signed-off-by: Ranjan Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Ranjan Kumar authored and martinkpetersen committed Dec 4, 2024
    711201a
  21. scsi: mpi3mr: Start controller indexing from 0

    Instead of displaying the controller index starting from '1' make the
    driver display the controller index starting from '0'.
    
    Signed-off-by: Sumit Saxena <[email protected]>
    Signed-off-by: Ranjan Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Ranjan Kumar authored and martinkpetersen committed Dec 4, 2024
    0d32014
  22. scsi: mpi3mr: Handling of fault code for insufficient power

    Before retrying initialization, check and abort if the fault code
    indicates insufficient power. Also mark the controller as unrecoverable
    instead of issuing reset in the watch dog timer if the fault code
    indicates insufficient power.
    
    Signed-off-by: Prayas Patel <[email protected]>
    Signed-off-by: Ranjan Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Ranjan Kumar authored and martinkpetersen committed Dec 4, 2024
    fb6eb98
  23. scsi: mpi3mr: Update driver version to 8.12.0.3.50

    Update driver version to 8.12.0.3.50.
    
    Signed-off-by: Ranjan Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Ranjan Kumar authored and martinkpetersen committed Dec 4, 2024
    0deb37c
  24. nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary

    Only call into nvme_alloc_host_mem_single, which uses
    dma_alloc_noncontiguous, when there is a non-zero DMA merge boundary.
    Without this we'll call into dma_alloc_noncontiguous for devices using
    dma-direct, which can work fine as long as the preferred size is below the
    MAX_ORDER of the page allocator, but blows up with a warning if it is
    too large.
    
    Fixes: 63a5c7a ("nvme-pci: use dma_alloc_noncontigous if possible")
    Reported-by: Leon Romanovsky <[email protected]>
    Reported-by: Chaitanya Kumar Borah <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Leon Romanovsky <[email protected]>
    Tested-by: Chaitanya Kumar Borah <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    Christoph Hellwig authored and keithbusch committed Dec 4, 2024
    ad0cf42
  25. scsi: qla2xxx: Fix abort in bsg timeout

    Current abort of bsg on timeout prematurely clears the
    outstanding_cmds[]. Abort does not allow FW to return the IOCB/SRB. In
    addition, bsg_job_done() is not called to return the BSG (i.e. leak).
    
    Abort the outstanding bsg/SRB and wait for the completion. The
    completion IOCB will wake up the bsg_timeout thread. If abort is not
    successful, then driver will forcibly call bsg_job_done() and free the
    srb.
    
    Err Inject:
    
     - qaucli -z
     - assign CT Passthru IOCB's NportHandle with another initiator
       nport handle to trigger timeout.  Remote port will drop CT request.
     - bsg_job_done is properly called as part of cleanup
    
    kernel: qla2xxx [0000:21:00.1]-7012:7: qla2x00_process_ct : 286 : Error Inject.
    kernel: qla2xxx [0000:21:00.1]-7016:7: bsg rqst type: FC_BSG_HST_CT else type: 101 - loop-id=1 portid=fffffa.
    kernel: qla2xxx [0000:21:00.1]-70bb:7: qla24xx_bsg_timeout CMD timeout. bsg ptr ffff9971a42f0838 msgcode 80000004 vendor cmd fa010000
    kernel: qla2xxx [0000:21:00.1]-507c:7: Abort command issued - hdl=4b, type=5
    kernel: qla2xxx [0000:21:00.1]-5040:7: ELS-CT pass-through-ct pass-through error hdl=4b comp_status-status=0x5 error subcode 1=0x0 error subcode 2=0xaf882e80.
    kernel: qla2xxx [0000:21:00.1]-7009:7: qla2x00_bsg_job_done: sp hdl 4b, result=70000 bsg ptr ffff9971a42f0838
    kernel: qla2xxx [0000:21:00.1]-802c:7: Aborting bsg ffff9971a42f0838 sp=ffff99760b87ba80 handle=4b rval=0
    kernel: qla2xxx [0000:21:00.1]-708a:7: bsg abort success. bsg ffff9971a42f0838 sp=ffff99760b87ba80 handle=0x4b
    kernel: qla2xxx [0000:21:00.1]-7012:7: qla2x00_process_ct : 286 : Error Inject.
    kernel: qla2xxx [0000:21:00.1]-7016:7: bsg rqst type: FC_BSG_HST_CT else type: 101 - loop-id=1 portid=fffffa.
    kernel: qla2xxx [0000:21:00.1]-70bb:7: qla24xx_bsg_timeout CMD timeout. bsg ptr ffff9971a42f43b8 msgcode 80000004 vendor cmd fa010000
    kernel: qla2xxx [0000:21:00.1]-7012:7: qla_bsg_found : 2206 : Error Inject 2.
    kernel: qla2xxx [0000:21:00.1]-802c:7: Aborting bsg ffff9971a42f43b8 sp=ffff99762c304440 handle=5e rval=5
    kernel: qla2xxx [0000:21:00.1]-704f:7: bsg abort fail.  bsg=ffff9971a42f43b8 sp=ffff99762c304440 rval=5.
    kernel: qla2xxx [0000:21:00.1]-7051:7: qla_bsg_found bsg_job_done : bsg ffff9971a42f43b8 result 0xfffffffa sp ffff99762c304440.
    
    Cc: [email protected]
    Fixes: c449b41 ("scsi: qla2xxx: Use QP lock to search for bsg")
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Quinn Tran authored and martinkpetersen committed Dec 4, 2024
    c423263
  26. scsi: qla2xxx: Fix use after free on unload

    System crash is observed with stack trace warning of use after
    free. There are 2 signals to tell dpc_thread to terminate (UNLOADING
    flag and kthread_stop).
    
    If dpc_thread happens to be running when the UNLOADING flag is set and
    sees the flag, it exits and cleans up after itself. When kthread_stop
    is then called for final cleanup, it operates on the already cleaned-up
    thread, which causes the use after free.
    
    Remove UNLOADING signal to terminate dpc_thread.  Use the kthread_stop
    as the main signal to exit dpc_thread.
    
    [596663.812935] kernel BUG at mm/slub.c:294!
    [596663.812950] invalid opcode: 0000 [#1] SMP PTI
    [596663.812957] CPU: 13 PID: 1475935 Comm: rmmod Kdump: loaded Tainted: G          IOE    --------- -  - 4.18.0-240.el8.x86_64 #1
    [596663.812960] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/20/2012
    [596663.812974] RIP: 0010:__slab_free+0x17d/0x360
    
    ...
    [596663.813008] Call Trace:
    [596663.813022]  ? __dentry_kill+0x121/0x170
    [596663.813030]  ? _cond_resched+0x15/0x30
    [596663.813034]  ? _cond_resched+0x15/0x30
    [596663.813039]  ? wait_for_completion+0x35/0x190
    [596663.813048]  ? try_to_wake_up+0x63/0x540
    [596663.813055]  free_task+0x5a/0x60
    [596663.813061]  kthread_stop+0xf3/0x100
    [596663.813103]  qla2x00_remove_one+0x284/0x440 [qla2xxx]
    
    Cc: [email protected]
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Quinn Tran authored and martinkpetersen committed Dec 4, 2024
    07c903d
  27. nvme-tcp: fix the memleak while create new ctrl failed

    When creating a new ctrl fails, the tagset occupied by admin_q is
    not freed. Fix that.
    
    Fixes: fd1418d ("nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()")
    Signed-off-by: Chunguang.xu <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Hannes Reinecke <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    Chunguang.xu authored and keithbusch committed Dec 4, 2024
    fec55c2
  28. nvme-rdma: unquiesce admin_q before destroy it

    The kernel hangs while destroying admin_q when creating a ctrl fails,
    as in the following call trace:
    
    PID: 23644    TASK: ff2d52b40f439fc0  CPU: 2    COMMAND: "nvme"
     #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
     #1 [ff61d23de260fc08] schedule at ffffffff8323c014
     #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
     #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
     #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
     #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce
     #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced
     #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b
     #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362
     #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
        RIP: 00007fda7891d574  RSP: 00007ffe2ef06958  RFLAGS: 00000202
        RAX: ffffffffffffffda  RBX: 000055e8122a4d90  RCX: 00007fda7891d574
        RDX: 000000000000012b  RSI: 000055e8122a4d90  RDI: 0000000000000004
        RBP: 00007ffe2ef079c0   R8: 000000000000012b   R9: 000055e8122a4d90
        R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000004
        R13: 000055e8122923c0  R14: 000000000000012b  R15: 00007fda78a54500
        ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
    
    This is because we quiesced admin_q before cancelling requests but
    forgot to unquiesce it before destroying it. As a result we fail to
    drain the pending requests and hang in blk_mq_freeze_queue_wait()
    forever. Reuse nvme_rdma_teardown_admin_queue() to fix this issue and
    simplify the code.
    
    Fixes: 958dc1d ("nvme-rdma: add clean action for failed reconnection")
    Reported-by: Yingfu.zhou <[email protected]>
    Signed-off-by: Chunguang.xu <[email protected]>
    Signed-off-by: Yue.zhao <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Hannes Reinecke <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    Chunguang.xu authored and keithbusch committed Dec 4, 2024
    5858b68
  29. nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()

    Since admin_q is already quiesced in nvme_tcp_teardown_admin_queue(),
    there is no need to quiesce it again in nvme_tcp_teardown_io_queues().
    Drop the extra call to keep things simple.
    
    Signed-off-by: Chunguang.xu <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Hannes Reinecke <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    Chunguang.xu authored and keithbusch committed Dec 4, 2024
    fdc5664
  30. nvme-tcp: simplify nvme_tcp_teardown_io_queues()

    As nvme_tcp_teardown_io_queues() is the only caller of
    nvme_tcp_destroy_admin_queue(), merge the latter into
    nvme_tcp_teardown_io_queues() to simplify the code.
    
    Signed-off-by: Chunguang.xu <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Hannes Reinecke <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    Chunguang.xu authored and keithbusch committed Dec 4, 2024
    b4e12f5
  31. scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cnt

    Firmware supports multiple sg_cnt for request and response for CT
    commands, so remove the redundant check. The existing check requires
    sg_cnt for request and response to be the same. This is not required,
    as the driver and FW have code to handle multiple and different sg_cnt
    on request and response.
    
    Cc: [email protected]
    Signed-off-by: Saurav Kashyap <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Saurav Kashyap authored and martinkpetersen committed Dec 4, 2024
    833c70e
  32. scsi: qla2xxx: Fix NVMe and NPIV connect issue

    NVMe controller fails to send connect command due to failure to locate
    hw context buffer for NVMe queue 0 (blk_mq_hw_ctx, hctx_idx=0). The
    cause of the issue is that the NPIV host did not initialize the
    vha->irq_offset field.  This field is given to blk-mq
    (blk_mq_pci_map_queues) to help locate the beginning of IO Queues,
    which in turn helps locate NVMe queue 0.
    
    Initialize this field to allow NVMe to work properly with NPIV host.
    
     kernel: nvme nvme5: Connect command failed, errno: -18
     kernel: nvme nvme5: qid 0: secure concatenation is not supported
     kernel: nvme nvme5: NVME-FC{5}: create_assoc failed, assoc_id 2e9100 ret 401
     kernel: nvme nvme5: NVME-FC{5}: reset: Reconnect attempt failed (401)
     kernel: nvme nvme5: NVME-FC{5}: Reconnect attempt in 2 seconds
    
    Cc: [email protected]
    Fixes: f0783d4 ("scsi: qla2xxx: Use correct number of vectors for online CPUs")
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Quinn Tran authored and martinkpetersen committed Dec 4, 2024
    4812b77
  33. scsi: qla2xxx: Supported speed displayed incorrectly for VPorts

    The fc_function_template for vports was missing
    .show_host_supported_speeds, while the base port template already
    had it.
    
    Add .show_host_supported_speeds to the vport template as well.
    
    Cc: [email protected]
    Fixes: 2c3dfe3 ("[SCSI] qla2xxx: add support for NPIV")
    Signed-off-by: Anil Gurumurthy <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Anil Gurumurthy authored and martinkpetersen committed Dec 4, 2024
    e4e268f
  34. scsi: qla2xxx: Update version to 10.02.09.400-k

    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    njavali authored and martinkpetersen committed Dec 4, 2024
    35002a8
  35. scsi: ufs: core: sysfs: Prevent div by zero

    Prevent a division by 0 when monitoring is not enabled.
    
    Fixes: 1d8613a ("scsi: ufs: core: Introduce HBA performance monitor sysfs nodes")
    Cc: [email protected]
    Signed-off-by: Gwendal Grignou <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Can Guo <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    gwendalcr authored and martinkpetersen committed Dec 4, 2024
    eb48e9f
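The guard described in the ufs commit above amounts to checking the divisor before computing an average. A minimal sketch, assuming the sysfs node reports an average latency derived from a running total and a request count. `avg_latency_us` and its parameters are hypothetical names, not the driver's:

```python
def avg_latency_us(total_latency_us, nr_requests):
    """Average latency, reporting 0 when nothing has been counted yet."""
    # With monitoring disabled the request counter stays at zero, so the
    # division has to be skipped rather than attempted.
    if nr_requests == 0:
        return 0
    return total_latency_us // nr_requests
```

Reporting 0 for an empty sample is one possible choice. The commit message does not say which value the real sysfs node reports in that case.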
  36. scsi: sg: Fix slab-use-after-free read in sg_release()

    Fix a use-after-free bug in sg_release(), detected by syzbot with KASAN:
    
    BUG: KASAN: slab-use-after-free in lock_release+0x151/0xa30
    kernel/locking/lockdep.c:5838
    __mutex_unlock_slowpath+0xe2/0x750 kernel/locking/mutex.c:912
    sg_release+0x1f4/0x2e0 drivers/scsi/sg.c:407
    
    In sg_release(), the function kref_put(&sfp->f_ref, sg_remove_sfp) is
    called before releasing the open_rel_lock mutex. The kref_put() call may
    decrement the reference count of sfp to zero, triggering its cleanup
    through sg_remove_sfp(). This cleanup includes scheduling deferred work
    via sg_remove_sfp_usercontext(), which ultimately frees sfp.
    
    After kref_put(), sg_release() continues to unlock open_rel_lock and may
    reference sfp or sdp. If sfp has already been freed, this results in a
    slab-use-after-free error.
    
    Move the kref_put(&sfp->f_ref, sg_remove_sfp) call after unlocking the
    open_rel_lock mutex. This ensures:
    
     - No references to sfp or sdp occur after the reference count is
       decremented.
    
     - Cleanup functions such as sg_remove_sfp() and
       sg_remove_sfp_usercontext() can safely execute without impacting the
       mutex handling in sg_release().
    
    The fix has been tested and validated by syzbot. This patch closes the
    bug reported at the following syzkaller link and ensures proper
    sequencing of resource cleanup and mutex operations, eliminating the
    risk of use-after-free errors in sg_release().
    
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=7efb5850a17ba6ce098b
    Tested-by: [email protected]
    Fixes: cc833ac ("sg: O_EXCL and other lock handling")
    Signed-off-by: Suraj Sonawane <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Bart Van Assche <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    SurajSonawane2415 authored and martinkpetersen committed Dec 4, 2024
    f10593a
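The ordering fix in the sg_release() commit above can be illustrated with a small userspace model. This is not the sg driver: `Device`, `FileHandle`, `kref_put` and `release` are hypothetical stand-ins, and "freeing" is modeled by setting a flag. The sketch only shows the rule that the last reference is dropped after the mutex has been released.

```python
import threading


class Device:
    """Hypothetical model of the per-device state (sdp in the commit)."""

    def __init__(self):
        self.open_rel_lock = threading.Lock()
        self.open_cnt = 1


class FileHandle:
    """Hypothetical model of the per-open state (sfp in the commit)."""

    def __init__(self, dev):
        self.dev = dev
        self.refcount = 1
        self.freed = False


def kref_put(fh):
    """Drop one reference; the object is 'freed' when the count hits 0."""
    fh.refcount -= 1
    if fh.refcount == 0:
        fh.freed = True   # from here on, fh must not be touched
        fh.dev = None


def release(fh):
    dev = fh.dev
    with dev.open_rel_lock:
        # Everything that needs fh or the lock happens while fh is
        # still guaranteed to be alive.
        assert not fh.freed
        dev.open_cnt -= 1
    # The lock is already dropped; only now give up the reference.
    kref_put(fh)
```

In the buggy ordering, `kref_put()` would run inside the `with` block, and the unlock at the end of the block would then operate on state that may already be gone.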
  37. scsi: ufs: core: Add missing post notify for power mode change

    When the power mode change is successful but the power mode hasn't
    actually changed, the post notification was missed.  Similar to the
    approach with hibernate/clock scale/hce enable, having pre/post
    notifications in the same function will make it easier to maintain.
    
    Additionally, supplement the description of power parameters for the
    pwr_change_notify callback.
    
    Fixes: 7eb584d ("ufs: refactor configuring power mode")
    Cc: [email protected] #6.11.x
    Signed-off-by: Peter Wang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Bart Van Assche <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    ptr324 authored and martinkpetersen committed Dec 4, 2024
    7f45ed5
  38. scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
    
    This partially reverts commit 812fe64 ("scsi: storvsc: Handle
    additional SRB status values").
    
    HyperV does not support MAINTENANCE_IN resulting in FC passthrough
    returning the SRB_STATUS_DATA_OVERRUN value. Now that
    SRB_STATUS_DATA_OVERRUN is treated as an error, multipath ALUA paths go
    into a faulty state as multipath ALUA submits RTPG commands via
    MAINTENANCE_IN.
    
    [    3.215560] hv_storvsc 1d69d403-9692-4460-89f9-a8cbcc0f94f3:
    tag#230 cmd 0xa3 status: scsi 0x0 srb 0x12 hv 0xc0000001
    [    3.215572] scsi 1:0:0:32: alua: rtpg failed, result 458752
    
    Make MAINTENANCE_IN return success to avoid the error path as is
    currently done with INQUIRY and MODE_SENSE.
    
    Suggested-by: Michael Kelley <[email protected]>
    Signed-off-by: Cathy Avery <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Michael Kelley <[email protected]>
    Reviewed-by: Ewan D. Milne <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    caavery authored and martinkpetersen committed Dec 4, 2024
    b1aee7f
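The status handling described in the storvsc commit above can be sketched as a lookup on the SCSI opcode. This is a simplified model, not the driver's completion path: the opcode values are the standard SCSI ones (0xa3 and srb 0x12 also appear in the log above), `SRB_STATUS_SUCCESS = 0x01` is an assumption, and `effective_srb_status` is a hypothetical helper. The real driver may apply further conditions that the commit message does not spell out.

```python
# Standard SCSI opcodes
INQUIRY = 0x12
MODE_SENSE = 0x1A
MAINTENANCE_IN = 0xA3

SRB_STATUS_SUCCESS = 0x01        # assumed value, not stated in the commit
SRB_STATUS_DATA_OVERRUN = 0x12   # matches "srb 0x12" in the log above

# Commands for which a data overrun is treated as success.
TOLERATED_OPCODES = {INQUIRY, MODE_SENSE, MAINTENANCE_IN}


def effective_srb_status(opcode, srb_status):
    """Return the status the upper layers should see for this command."""
    if srb_status == SRB_STATUS_DATA_OVERRUN and opcode in TOLERATED_OPCODES:
        return SRB_STATUS_SUCCESS
    return srb_status
```

With this mapping an RTPG request sent via MAINTENANCE_IN no longer takes the error path, while other commands still report the overrun.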
  39. scsi: scsi_debug: Fix hrtimer support for ndelay

    Since commit 771f712 ("scsi: scsi_debug: Fix cmd duration
    calculation"), ns_from_boot value is only evaluated in schedule_resp()
    for polled requests.
    
    However, ns_from_boot is also required for hrtimer support for when
    ndelay is less than INCLUSIVE_TIMING_MAX_NS, so fix up the logic to
    decide when to evaluate ns_from_boot.
    
    Fixes: 771f712 ("scsi: scsi_debug: Fix cmd duration calculation")
    Signed-off-by: John Garry <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    johnpgarry authored and martinkpetersen committed Dec 4, 2024
    6918141
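The decision described in the scsi_debug commit above, when the timestamp has to be sampled, can be written as a single predicate. A minimal sketch: `needs_ns_from_boot` is a hypothetical helper, and the value given to `INCLUSIVE_TIMING_MAX_NS` here is a placeholder assumption rather than a number taken from the commit.

```python
INCLUSIVE_TIMING_MAX_NS = 1_000_000  # placeholder value (assumption)


def needs_ns_from_boot(polled, ndelay_ns):
    """True when the response path needs a boot-time timestamp."""
    # Polled requests always need it. So do hrtimer-based responses whose
    # delay is short enough that elapsed time is subtracted from it.
    return polled or (0 < ndelay_ns < INCLUSIVE_TIMING_MAX_NS)
```

Before the fix, only the first half of this condition was evaluated, so short non-polled delays ran without a valid timestamp.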
  40. Merge tag 'platform-drivers-x86-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
    
    Pull x86 platform driver fixes from Ilpo Järvinen:
    
     - asus-nb-wmi: Silence unknown event warning when charger is plugged in
    
     - asus-wmi: Handle return code variations during thermal policy writing
       graciously
    
     - samsung-laptop: Correct module description
    
    * tag 'platform-drivers-x86-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
      platform/x86: asus-nb-wmi: Ignore unknown event 0xCF
      platform/x86: asus-wmi: Ignore return value when writing thermal policy
      platform/x86: samsung-laptop: Match MODULE_DESCRIPTION() to functionality
    torvalds committed Dec 4, 2024
    0769a8f
  41. Merge tag 'loongarch-fixes-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
    
    Pull LoongArch fixes from Huacai Chen:
     "Fix bugs about EFI screen info, hugetlb pte clear and Lockdep-RCU
      splat in KVM, plus some trivial cleanups"
    
    * tag 'loongarch-fixes-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
      LoongArch: KVM: Protect kvm_io_bus_{read,write}() with SRCU
      LoongArch: KVM: Protect kvm_check_requests() with SRCU
      LoongArch: BPF: Adjust the parameter of emit_jirl()
      LoongArch: Add architecture specific huge_pte_clear()
      LoongArch/irq: Use seq_put_decimal_ull_width() for decimal values
      LoongArch: Fix reserving screen info memory for above-4G firmware
    torvalds committed Dec 4, 2024
    5076001
  42. fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX

    Avoid extra roundtrip
    
    Cc: [email protected]
    Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: Ralph Boehme <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    slowfranklin authored and Steve French committed Dec 4, 2024
    ca4b2c4
  43. x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
    
    Under some conditions, MONITOR wakeups on Lunar Lake processors
    can be lost, resulting in significant user-visible delays.
    
    Add Lunar Lake to X86_BUG_MONITOR so that wake_up_idle_cpu()
    always sends an IPI, avoiding this potential delay.
    
    Reported originally here:
    
    	https://bugzilla.kernel.org/show_bug.cgi?id=219364
    
    [ dhansen: tweak subject ]
    
    Signed-off-by: Len Brown <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Reviewed-by: Rafael J. Wysocki <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/all/a4aa8842a3c3bfdb7fe9807710eef159cbf0e705.1731463305.git.len.brown%40intel.com
    lenb authored and hansendc committed Dec 4, 2024
    c9a4b55
  44. netfilter: nft_set_hash: skip duplicated elements pending gc run

    rhashtable does not provide stable walk, duplicated elements are
    possible in case of resizing. I considered that checking for errors when
    calling rhashtable_walk_next() was sufficient to detect the resizing.
    However, rhashtable_walk_next() returns -EAGAIN only at the end of the
    iteration, which is too late, because a gc work containing duplicated
    elements could have been already scheduled for removal to the worker.
    
    Add a u32 gc worker sequence number per set, bump it on every workqueue
    run. Annotate gc worker sequence number on the expired element. Use it
    to skip those already seen in this gc workqueue run.
    
    Note that this new field is never reset in case gc transaction fails, so
    next gc worker run on the expired element overrides it. Wraparound of gc
    worker sequence number should not be an issue with stale gc worker
    sequence number in the element, that would just postpone the element
    removal in one gc run.
    
    Note that it is not possible to use flags to annotate that element is
    pending gc run to detect duplicates, given that gc transaction can be
    invalidated in case of update from the control plane, therefore, not
    allowing to clear such flag.
    
    On x86_64, pahole reports no changes in the size of nft_rhash_elem.
    
    Fixes: f6c383b ("netfilter: nf_tables: adapt set backend to use GC transaction API")
    Reported-by: Laurent Fasnacht <[email protected]>
    Tested-by: Laurent Fasnacht <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    ummakynes committed Dec 4, 2024
    7ffc748
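The sequence-number scheme described in the nft_set_hash commit above can be sketched in a few lines. This is a userspace model, not the netfilter code: `Elem`, `Set` and `gc_run` are hypothetical names, and the unstable rhashtable walk is modeled as a plain list that may contain the same element more than once.

```python
U32_MASK = 0xFFFFFFFF


class Elem:
    def __init__(self, key, expired):
        self.key = key
        self.expired = expired
        self.gc_seq = 0      # sequence number of the last gc run that saw it


class Set:
    def __init__(self):
        self.gc_seq = 0      # per-set gc worker sequence number (u32)


def gc_run(s, walk):
    """Collect expired elements once each, even if the walk repeats them."""
    # Bump the per-set sequence number on every run, wrapping like a u32.
    s.gc_seq = (s.gc_seq + 1) & U32_MASK
    queued = []
    for elem in walk:
        if not elem.expired:
            continue
        if elem.gc_seq == s.gc_seq:
            continue         # already queued during this run: a duplicate
        elem.gc_seq = s.gc_seq
        queued.append(elem.key)
    return queued
```

Because the annotation is never reset, an element whose gc transaction failed is simply picked up again by the next run, which carries a different sequence number.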
  45. fs/smb/client: Implement new SMB3 POSIX type

    Fixes special files against current Samba.
    
    On the Samba server:
    
    insgesamt 20
    131958 brw-r--r--  1 root  root  0, 0 15. Nov 12:04 blockdev
    131965 crw-r--r--  1 root  root  1, 1 15. Nov 12:04 chardev
    131966 prw-r--r--  1 samba samba    0 15. Nov 12:05 fifo
    131953 -rw-rwxrw-+ 2 samba samba    4 18. Nov 11:37 file
    131953 -rw-rwxrw-+ 2 samba samba    4 18. Nov 11:37 hardlink
    131957 lrwxrwxrwx  1 samba samba    4 15. Nov 12:03 symlink -> file
    131954 -rwxrwxr-x+ 1 samba samba    0 18. Nov 15:28 symlinkoversmb
    
    Before:
    
    ls: cannot access '/mnt/smb3unix/posix/blockdev': No data available
    ls: cannot access '/mnt/smb3unix/posix/chardev': No data available
    ls: cannot access '/mnt/smb3unix/posix/symlinkoversmb': No data available
    ls: cannot access '/mnt/smb3unix/posix/fifo': No data available
    ls: cannot access '/mnt/smb3unix/posix/symlink': No data available
    total 16
         ? -????????? ? ?    ?     ?            ? blockdev
         ? -????????? ? ?    ?     ?            ? chardev
         ? -????????? ? ?    ?     ?            ? fifo
    131953 -rw-rwxrw- 2 root samba 4 Nov 18 11:37 file
    131953 -rw-rwxrw- 2 root samba 4 Nov 18 11:37 hardlink
         ? -????????? ? ?    ?     ?            ? symlink
         ? -????????? ? ?    ?     ?            ? symlinkoversmb
    
    After:
    
    insgesamt 21
    131958 brw-r--r-- 1 root root  0, 0 15. Nov 12:04 blockdev
    131965 crw-r--r-- 1 root root  1, 1 15. Nov 12:04 chardev
    131966 prw-r--r-- 1 root samba    0 15. Nov 12:05 fifo
    131953 -rw-rwxrw- 2 root samba    4 18. Nov 11:37 file
    131953 -rw-rwxrw- 2 root samba    4 18. Nov 11:37 hardlink
    131957 lrwxrwxrwx 1 root samba    4 15. Nov 12:03 symlink -> file
    131954 lrwxrwxr-x 1 root samba   23 18. Nov 15:28 symlinkoversmb -> mnt/smb3unix/posix/file
    
    Cc: [email protected]
    Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: Ralph Boehme <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    slowfranklin authored and Steve French committed Dec 4, 2024
    6a832bc
  46. fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse points

    Spares an extra revalidation request
    
    Cc: [email protected]
    Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: Ralph Boehme <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    slowfranklin authored and Steve French committed Dec 4, 2024
    8cb0bc5

Commits on Dec 5, 2024

  1. ksmbd: align aux_payload_buf to avoid OOB reads in cryptographic operations
    
    The aux_payload_buf allocation in SMB2 read is performed without ensuring
    alignment, which could result in out-of-bounds (OOB) reads during
    cryptographic operations such as crypto_xor or ghash. This patch aligns
    the allocation of aux_payload_buf to prevent these issues.
    (Note that to add this patch to stable would require modifications due
    to recent patch "ksmbd: use __GFP_RETRY_MAYFAIL")
    
    Signed-off-by: Norbert Szetei <[email protected]>
    Acked-by: Namjae Jeon <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    nszetei authored and Steve French committed Dec 5, 2024
    06a0254
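The alignment idea in the ksmbd commit above comes down to rounding the allocation size up to a block boundary, so that routines which process data in fixed-size chunks never read past the end of the buffer. A minimal sketch of that rounding, assuming an 8-byte alignment (the commit message does not state the value) and using a hypothetical helper name `align_up`:

```python
def align_up(length, alignment=8):
    """Round length up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (length + alignment - 1) & ~(alignment - 1)
```

A read of 13 bytes would then get a 16-byte buffer, leaving the trailing bytes inside the allocation for block-wise operations to touch safely.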
  2. ipmr: tune the ipmr_can_free_table() checks.

    Eric reported a syzkaller-triggered splat caused by recent ipmr changes:
    
    WARNING: CPU: 2 PID: 6041 at net/ipv6/ip6mr.c:419
    ip6mr_free_table+0xbd/0x120 net/ipv6/ip6mr.c:419
    Modules linked in:
    CPU: 2 UID: 0 PID: 6041 Comm: syz-executor183 Not tainted
    6.12.0-syzkaller-10681-g65ae975e97d5 #0
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
    1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
    RIP: 0010:ip6mr_free_table+0xbd/0x120 net/ipv6/ip6mr.c:419
    Code: 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c
    02 00 75 58 49 83 bc 24 c0 0e 00 00 00 74 09 e8 44 ef a9 f7 90 <0f> 0b
    90 e8 3b ef a9 f7 48 8d 7b 38 e8 12 a3 96 f7 48 89 df be 0f
    RSP: 0018:ffffc90004267bd8 EFLAGS: 00010293
    RAX: 0000000000000000 RBX: ffff88803c710000 RCX: ffffffff89e4d844
    RDX: ffff88803c52c880 RSI: ffffffff89e4d87c RDI: ffff88803c578ec0
    RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: ffff88803c578000
    R13: ffff88803c710000 R14: ffff88803c710008 R15: dead000000000100
    FS: 00007f7a855ee6c0(0000) GS:ffff88806a800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f7a85689938 CR3: 000000003c492000 CR4: 0000000000352ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    <TASK>
    ip6mr_rules_exit+0x176/0x2d0 net/ipv6/ip6mr.c:283
    ip6mr_net_exit_batch+0x53/0xa0 net/ipv6/ip6mr.c:1388
    ops_exit_list+0x128/0x180 net/core/net_namespace.c:177
    setup_net+0x4fe/0x860 net/core/net_namespace.c:394
    copy_net_ns+0x2b4/0x6b0 net/core/net_namespace.c:500
    create_new_namespaces+0x3ea/0xad0 kernel/nsproxy.c:110
    unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
    ksys_unshare+0x45d/0xa40 kernel/fork.c:3334
    __do_sys_unshare kernel/fork.c:3405 [inline]
    __se_sys_unshare kernel/fork.c:3403 [inline]
    __x64_sys_unshare+0x31/0x40 kernel/fork.c:3403
    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
    do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
    entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f7a856332d9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48
    89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
    01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f7a855ee238 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
    RAX: ffffffffffffffda RBX: 00007f7a856bd308 RCX: 00007f7a856332d9
    RDX: 00007f7a8560f8c6 RSI: 0000000000000000 RDI: 0000000062040200
    RBP: 00007f7a856bd300 R08: 00007fff932160a7 R09: 00007f7a855ee6c0
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a856bd30c
    R13: 0000000000000000 R14: 00007fff93215fc0 R15: 00007fff932160a8
    </TASK>
    
    The root cause is a network namespace creation failing after successful
    initialization of the ipmr subsystem. Such a case is not currently
    matched by the ipmr_can_free_table() helper.
    
    New namespaces are zeroed on allocation and inserted into net ns list
    only after successful creation; when deleting an ipmr table, the list
    next pointer can be NULL only on netns initialization failure.
    
    Update the ipmr_can_free_table() checks leveraging such condition.
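
The condition described above can be modelled in userspace roughly as follows. This is only a sketch: struct toy_net, its refcount field, and both helper names are hypothetical stand-ins, and the actual kernel helper may combine its checks differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_list_head {
    struct toy_list_head *next, *prev;
};

/* Hypothetical, heavily simplified model of a network namespace. */
struct toy_net {
    struct toy_list_head list; /* linked into the netns list only after setup succeeds */
    int refcount;              /* assumed to be 0 once the namespace is being dismantled */
};

/* A zeroed namespace that was never linked still has a NULL next pointer. */
static bool toy_net_initialized(const struct toy_net *net)
{
    return net->list.next != NULL;
}

/*
 * Freeing a table is considered legitimate when the namespace is going
 * away, or when its creation failed before it was inserted into the list.
 */
static bool toy_can_free_table(const struct toy_net *net)
{
    assert(net != NULL);
    return net->refcount == 0 || !toy_net_initialized(net);
}
```

A namespace that is still zeroed (creation failed) passes the check even though it holds a reference, which is the case the syzkaller report hit.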
    
    Reported-by: Eric Dumazet <[email protected]>
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=6e8cb445d4b43d006e0c
    Fixes: 11b6e70 ("ipmr: add debug check for mr table cleanup")
    Signed-off-by: Paolo Abeni <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://patch.msgid.link/8bde975e21bbca9d9c27e36209b2dd4f1d7a3f00.1733212078.git.pabeni@redhat.com
    Signed-off-by: Jakub Kicinski <[email protected]>
    Paolo Abeni authored and kuba-moo committed Dec 5, 2024
    50b9420
  3. ethtool: Fix wrong mod state in case of verbose and no_mask bitset

    A bitset without mask in a _SET request means we want exactly the bits in
    the bitset to be set. This works correctly for compact format but when
    verbose format is parsed, ethnl_update_bitset32_verbose() only sets the
    bits present in the request bitset but does not clear the rest. The commit
    6699170 ("ethtool: fix application of verbose no_mask bitset") fixes
    this issue by clearing the whole target bitmap before we start iterating.
    That fix, however, broke the behavior of the mod variable: because the
    bitset is always cleared first, the old value always differs from the
    new value.
    
    Fix it by adding a new function to compare bitmaps and a temporary variable
    that saves the state of the old bitmap.
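
A minimal userspace sketch of this approach, assuming a fixed two-word bitmap: the old bitmap is saved, the target is cleared and rebuilt from the request, and mod is set only if the final result differs from the saved copy. The name toy_set_exact() and the TOY_WORDS size are hypothetical and do not come from the ethtool code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_WORDS 2 /* toy bitmap of 64 bits */

/*
 * Apply a "no mask" request: afterwards exactly the listed bits are set.
 * *mod is raised only when the resulting bitmap differs from the old one.
 */
static void toy_set_exact(uint32_t bitmap[TOY_WORDS], const unsigned int *bits,
                          size_t n_bits, bool *mod)
{
    uint32_t saved[TOY_WORDS];
    size_t i;

    memcpy(saved, bitmap, sizeof(saved)); /* temporary copy of the old state */
    memset(bitmap, 0, sizeof(saved));     /* clear the whole target first */

    for (i = 0; i < n_bits; i++) {
        assert(bits[i] < 32 * TOY_WORDS);
        bitmap[bits[i] / 32] |= UINT32_C(1) << (bits[i] % 32);
    }

    /* Compare whole bitmaps instead of tracking per-bit changes. */
    if (memcmp(saved, bitmap, sizeof(saved)) != 0)
        *mod = true;
}
```

Requesting the bits that are already set leaves mod untouched, whereas a per-bit comparison against the freshly cleared bitmap would report a change every time.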
    
    Fixes: 6699170 ("ethtool: fix application of verbose no_mask bitset")
    Signed-off-by: Kory Maincent <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kmaincent authored and kuba-moo committed Dec 5, 2024
    910c478
  4. mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4

    The driver is currently using an ACL key block that is not supported by
    Spectrum-4. This works because the driver is only using a single field
    from this key block which is located in the same offset in the
    equivalent Spectrum-4 key block.
    
    The issue was discovered when the firmware started rejecting the use of
    the unsupported key block. The change has been reverted to avoid
    breaking users that only update their firmware.
    
    Nonetheless, fix the issue by using the correct key block.
    
    Fixes: 07ff135 ("mlxsw: Introduce flex key elements for Spectrum-4")
    Signed-off-by: Ido Schimmel <[email protected]>
    Reviewed-by: Petr Machata <[email protected]>
    Signed-off-by: Petr Machata <[email protected]>
    Link: https://patch.msgid.link/35e72c97bdd3bc414fb8e4d747e5fb5d26c29658.1733237440.git.petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <[email protected]>
    idosch authored and kuba-moo committed Dec 5, 2024
    217bbf1
  5. geneve: do not assume mac header is set in geneve_xmit_skb()

    We should not assume mac header is set in output path.
    
    Use skb_eth_hdr() instead of eth_hdr() to fix the issue.
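
The difference between the two accessors can be sketched with a toy buffer model. struct toy_skb and the toy_* helpers below are hypothetical userspace stand-ins: one accessor depends on a recorded mac header offset, the other simply treats the start of the packet data as the Ethernet header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAC_UNSET UINT16_MAX /* marker for "no mac header recorded" */

/* Hypothetical, heavily simplified stand-in for a socket buffer. */
struct toy_skb {
    unsigned char *head; /* start of the buffer */
    unsigned char *data; /* start of the current packet data */
    uint16_t mac_header; /* offset from head, or TOY_MAC_UNSET */
};

static int toy_mac_header_was_set(const struct toy_skb *skb)
{
    return skb->mac_header != TOY_MAC_UNSET;
}

/*
 * Modelled on eth_hdr(): only meaningful when the mac header offset was
 * recorded. This toy returns NULL where the kernel would emit a warning.
 */
static unsigned char *toy_eth_hdr(const struct toy_skb *skb)
{
    assert(skb != NULL);
    if (!toy_mac_header_was_set(skb))
        return NULL;
    return skb->head + skb->mac_header;
}

/* Modelled on skb_eth_hdr(): the Ethernet header sits at the packet data. */
static unsigned char *toy_skb_eth_hdr(const struct toy_skb *skb)
{
    assert(skb != NULL);
    return skb->data;
}
```

On a transmit path where nobody recorded the mac header offset, only the data-based accessor yields a usable pointer.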
    
    syzbot reported the following:
    
     WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 skb_mac_header include/linux/skbuff.h:3052 [inline]
     WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 eth_hdr include/linux/if_ether.h:24 [inline]
     WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 geneve_xmit_skb drivers/net/geneve.c:898 [inline]
     WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 geneve_xmit+0x4c38/0x5730 drivers/net/geneve.c:1039
    Modules linked in:
    CPU: 0 UID: 0 PID: 11635 Comm: syz.4.1423 Not tainted 6.12.0-syzkaller-10296-gaaf20f870da0 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
     RIP: 0010:skb_mac_header include/linux/skbuff.h:3052 [inline]
     RIP: 0010:eth_hdr include/linux/if_ether.h:24 [inline]
     RIP: 0010:geneve_xmit_skb drivers/net/geneve.c:898 [inline]
     RIP: 0010:geneve_xmit+0x4c38/0x5730 drivers/net/geneve.c:1039
    Code: 21 c6 02 e9 35 d4 ff ff e8 a5 48 4c fb 90 0f 0b 90 e9 fd f5 ff ff e8 97 48 4c fb 90 0f 0b 90 e9 d8 f5 ff ff e8 89 48 4c fb 90 <0f> 0b 90 e9 41 e4 ff ff e8 7b 48 4c fb 90 0f 0b 90 e9 cd e7 ff ff
    RSP: 0018:ffffc90003b2f870 EFLAGS: 00010283
    RAX: 000000000000037a RBX: 000000000000ffff RCX: ffffc9000dc3d000
    RDX: 0000000000080000 RSI: ffffffff86428417 RDI: 0000000000000003
    RBP: ffffc90003b2f9f0 R08: 0000000000000003 R09: 000000000000ffff
    R10: 000000000000ffff R11: 0000000000000002 R12: ffff88806603c000
    R13: 0000000000000000 R14: ffff8880685b2780 R15: 0000000000000e23
    FS:  00007fdc2deed6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b30a1dff8 CR3: 0000000056b8c000 CR4: 00000000003526f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
      __netdev_start_xmit include/linux/netdevice.h:5002 [inline]
      netdev_start_xmit include/linux/netdevice.h:5011 [inline]
      __dev_direct_xmit+0x58a/0x720 net/core/dev.c:4490
      dev_direct_xmit include/linux/netdevice.h:3181 [inline]
      packet_xmit+0x1e4/0x360 net/packet/af_packet.c:285
      packet_snd net/packet/af_packet.c:3146 [inline]
      packet_sendmsg+0x2700/0x5660 net/packet/af_packet.c:3178
      sock_sendmsg_nosec net/socket.c:711 [inline]
      __sock_sendmsg net/socket.c:726 [inline]
      __sys_sendto+0x488/0x4f0 net/socket.c:2197
      __do_sys_sendto net/socket.c:2204 [inline]
      __se_sys_sendto net/socket.c:2200 [inline]
      __x64_sys_sendto+0xe0/0x1c0 net/socket.c:2200
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    Fixes: a025fb5 ("geneve: Allow configuration of DF behaviour")
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Stefano Brivio <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Eric Dumazet authored and kuba-moo committed Dec 5, 2024
    8588c99
  6. bnxt_en: refactor tpa_info alloc/free into helpers

    Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work
    on a single tpa_info in prep for queue API using them.
    
    There are 2 pairs of operations:
    
    * bnxt_alloc_one_tpa_info()
    * bnxt_free_one_tpa_info()
    
    These alloc/free the tpa_info array itself.
    
    * bnxt_alloc_one_tpa_info_data()
    * bnxt_free_one_tpa_info_data()
    
    These alloc/free the frags stored in tpa_info array.
    
    Reviewed-by: Somnath Kotur <[email protected]>
    Signed-off-by: David Wei <[email protected]>
    Reviewed-by: Michael Chan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    spikeh authored and kuba-moo committed Dec 5, 2024
    5883a3e
  7. bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()

    Refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() for
    allocating rx_agg_bmap.
    
    Reviewed-by: Somnath Kotur <[email protected]>
    Signed-off-by: David Wei <[email protected]>
    Reviewed-by: Michael Chan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    spikeh authored and kuba-moo committed Dec 5, 2024
    bf1782d
  8. bnxt_en: handle tpa_info in queue API implementation

    Commit 7ed816b ("eth: bnxt: use page pool for head frags") added a
    page pool for header frags, which may be distinct from the existing pool
    for the aggregation ring. Prior to this change, frags used in the TPA
    ring rx_tpa were allocated from system memory e.g. napi_alloc_frag()
    meaning their lifetimes were not associated with a page pool. They can
    be returned at any time and so the queue API did not alloc or free
    rx_tpa.
    
    But now frags come from a separate head_pool which may be different to
    page_pool. Without allocating and freeing rx_tpa, frags allocated from
    the old head_pool may be returned to a different new head_pool which
    causes a mismatch between the pp hold/release count.
    
    Fix this problem by properly freeing and allocating rx_tpa in the queue
    API implementation.
    
    Signed-off-by: David Wei <[email protected]>
    Reviewed-by: Michael Chan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    spikeh authored and kuba-moo committed Dec 5, 2024
    bd649c5
  9. Merge branch 'bnxt_en-support-header-page-pool-in-queue-api'

    David Wei says:
    
    ====================
    bnxt_en: support header page pool in queue API
    
    Commit 7ed816b ("eth: bnxt: use page pool for head frags") added a
    separate page pool for header frags. Now, frags are allocated from this
    header page pool e.g. rxr->tpa_info.data.
    
    The queue API did not properly handle rxr->tpa_info, so using the
    queue API to, e.g., reset a queue results in pages being returned
    to the incorrect page pool, causing inflight != 0 warnings.
    
    Fix this bug by properly allocating/freeing tpa_info and copying/freeing
    head_pool in the queue API implementation.
    
    The 1st patch is a prep patch that refactors helpers out to be used by
    the implementation patch later.
    
    The 2nd patch is a drive-by refactor. Happy to take it out and re-send
    to net-next if there are any objections.
    
    The 3rd patch is the implementation patch that will properly alloc/free
    rxr->tpa_info.
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Dec 5, 2024
    5f4d035
  10. net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout

    mlx5hws_definer_calc_layout() allocates a match template, which creates
    a compressed definer fc struct, but that struct is never deallocated.
    
    This commit fixes the leak.
    
    Fixes: 74a778b ("net/mlx5: HWS, added definers handling")
    Signed-off-by: Cosmin Ratiu <[email protected]>
    Reviewed-by: Yevgeny Kliteynik <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Cosmin Ratiu authored and kuba-moo committed Dec 5, 2024
    530b69a
  11. net/mlx5: HWS: Properly set bwc queue locks lock classes

    The commit cited in the Fixes tag forgot to set the lock classes of the
    bwc queue locks.
    
    Fixes: 9addffa ("net/mlx5: HWS, use lock classes for bwc locks")
    Signed-off-by: Cosmin Ratiu <[email protected]>
    Reviewed-by: Yevgeny Kliteynik <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Cosmin Ratiu authored and kuba-moo committed Dec 5, 2024
    10e0f0c
  12. net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
    
    If the IB device is already disabled when moving to switchdev mode,
    which can happen when working with LAG, rescan_drivers() must be called
    before leaving in order to add the ethernet representor auxiliary device.
    
    Fixes: ab85ebf ("net/mlx5: E-switch, refactor eswitch mode change")
    Signed-off-by: Patrisious Haddad <[email protected]>
    Reviewed-by: Mark Bloch <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    PatrisiousHaddad authored and kuba-moo committed Dec 5, 2024
    5f9b2bf
  13. net/mlx5: E-Switch, Fix switching to switchdev mode in MPV

    The cited commit is wrong for MPV mode. In MPV mode the IB device is
    shared between different core devices, so when both devices move to
    switchdev mode simultaneously, the IB device removal and re-addition
    can race with itself, causing unexpected behavior.
    
    In such case do rescan_drivers() only once in order to add the ethernet
    representor auxiliary device, and skip adding and removing IB devices.
    
    Fixes: ab85ebf ("net/mlx5: E-switch, refactor eswitch mode change")
    Signed-off-by: Patrisious Haddad <[email protected]>
    Reviewed-by: Mark Bloch <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    PatrisiousHaddad authored and kuba-moo committed Dec 5, 2024
    d04c81a
  14. net/mlx5e: SD, Use correct mdev to build channel param

    In a multi-PF netdev, each traffic channel creates its own resources
    against a specific PF.
    In the cited commit, where this support was added, the channel_param
    logic was mistakenly kept unchanged, so it always used the primary PF
    which is found at priv->mdev.
    In this patch we fix this by moving the logic to be per-channel, and
    passing the correct mdev instance.
    
    This bug happened to be usually harmless, as the resulting cparam
    structures would be the same for all channels, due to identical FW logic
    and decisions.
    However, in some use cases, like fwreset, this gets broken.
    
    This could lead to different symptoms. Example:
    Error cqe on cqn 0x428, ci 0x0, qn 0x10a9, opcode 0xe, syndrome 0x4,
    vendor syndrome 0x32
    
    Fixes: e4f9686 ("net/mlx5e: Let channels be SD-aware")
    Signed-off-by: Tariq Toukan <[email protected]>
    Reviewed-by: Lama Kayal <[email protected]>
    Reviewed-by: Gal Pressman <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Tariq Toukan authored and kuba-moo committed Dec 5, 2024
    31f114c
  15. net/mlx5e: Remove workaround to avoid syndrome for internal port

    Previously a workaround was added to avoid syndrome 0xcdb051. It is
    triggered when offloading a rule with tunnel encapsulation and
    forwarding to another table, but not matching on the internal port,
    in firmware steering mode. The original workaround skips the internal
    tunnel port logic, which is not correct as not all cases are
    considered. For example, if a vlan is configured on the uplink port,
    traffic can't pass because the vlan header is not added with this
    workaround. Besides, there is no such issue for software steering. So
    this patch removes the workaround and returns an error directly when
    trying to offload such a rule in firmware steering mode.
    
    Fixes: 06b4eac ("net/mlx5e: Don't offload internal port if filter device is out device")
    Signed-off-by: Jianbo Liu <[email protected]>
    Tested-by: Frode Nordahl <[email protected]>
    Reviewed-by: Chris Mi <[email protected]>
    Reviewed-by: Ariel Levkovich <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Jianbo Liu authored and kuba-moo committed Dec 5, 2024
    5085f86
  16. Merge branch 'mlx5-misc-fixes-2024-12-03'

    Tariq Toukan says:
    
    ====================
    mlx5 misc fixes 2024-12-03
    
    This patchset provides misc bug fixes from the team to the mlx5 core and
    Eth drivers.
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Dec 5, 2024
    1831729
  17. Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
    
    Tony Nguyen says:
    
    ====================
    Intel Wired LAN Driver Updates 2024-12-03 (ice, idpf, ixgbe, ixgbevf, igb)
    
    This series contains updates to ice, idpf, ixgbe, ixgbevf, and igb
    drivers.
    
    For ice:
    Arkadiusz corrects search for determining whether PHY clock recovery is
    supported on the device.
    
    Przemyslaw corrects mask used for PHY timestamps on ETH56G devices.
    
    Wojciech adds missing virtchnl ops which caused NULL pointer
    dereference.
    
    Marcin fixes VLAN filter settings for uplink VSI in switchdev mode.
    
    For idpf:
    Josh restores setting of completion tag for empty buffers.
    
    For ixgbevf:
    Jake removes incorrect initialization/support of IPSEC for mailbox
    version 1.5.
    
    For ixgbe:
    Jake rewords and downgrades misleading message when negotiation
    of VF mailbox version is not supported.
    
    Tore Amundsen corrects value for BASE-BX10 capability.
    
    For igb:
    Yuan Can adds proper teardown on failed pci_register_driver() call.
    
    * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
      igb: Fix potential invalid memory access in igb_init_module()
      ixgbe: Correct BASE-BX10 compliance code
      ixgbe: downgrade logging of unsupported VF API version to debug
      ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5
      idpf: set completion tag for "empty" bufs associated with a packet
      ice: Fix VLAN pruning in switchdev mode
      ice: Fix NULL pointer dereference in switchdev
      ice: fix PHY timestamp extraction for ETH56G
      ice: fix PHY Clock Recovery availability check
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Dec 5, 2024
    4615855
  18. audit: workaround a GCC bug triggered by task comm changes

    A build failure has been reported with the following details:
    
       In file included from include/linux/string.h:390,
                        from include/linux/bitmap.h:13,
                        from include/linux/cpumask.h:12,
                        from include/linux/smp.h:13,
                        from include/linux/lockdep.h:14,
                        from include/linux/spinlock.h:63,
                        from include/linux/wait.h:9,
                        from include/linux/wait_bit.h:8,
                        from include/linux/fs.h:6,
                        from kernel/auditsc.c:37:
       In function 'sized_strscpy',
           inlined from '__audit_ptrace' at kernel/auditsc.c:2732:2:
    >> include/linux/fortify-string.h:293:17:
       error: call to '__write_overflow' declared with attribute error:
       detected write beyond size of object (1st parameter)
         293 |                 __write_overflow();
             |                 ^~~~~~~~~~~~~~~~~~
       In function 'sized_strscpy',
           inlined from 'audit_signal_info_syscall' at kernel/auditsc.c:2759:3:
    >> include/linux/fortify-string.h:293:17:
       error: call to '__write_overflow' declared with attribute error:
       detected write beyond size of object (1st parameter)
         293 |                 __write_overflow();
             |                 ^~~~~~~~~~~~~~~~~~
    
    The issue appears to be a GCC bug, though the root cause remains
    unclear at this time. For now, let's implement a workaround.
    
    A bug report has also been filed with GCC [0].
    
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117912 [0]
    
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Reported-by: Steven Rostedt (Google) <[email protected]>
    Closes: https://lore.kernel.org/all/[email protected]/
    Reported-by: Zhuo, Qiuxu <[email protected]>
    Closes: https://lore.kernel.org/all/CY8PR11MB71348E568DBDA576F17DAFF389362@CY8PR11MB7134.namprd11.prod.outlook.com/
    Originally-by: Kees Cook <[email protected]>
    Link: https://lore.kernel.org/linux-hardening/202410171059.C2C395030@keescook/
    Signed-off-by: Yafang shao <[email protected]>
    Tested-by: Steven Rostedt (Google) <[email protected]>
    Tested-by: Yafang Shao <[email protected]>
    [PM: subject tweak, description line wrapping]
    Signed-off-by: Paul Moore <[email protected]>
    laoar authored and pcmoore committed Dec 5, 2024
    d938150
  19. vsock/test: fix failures due to wrong SO_RCVLOWAT parameter

    This happens on 64-bit big-endian machines.
    SO_RCVLOWAT requires an int parameter. However, instead of int, the test
    uses unsigned long in one place and size_t in another. Both are 8 bytes
    long on 64-bit machines. The kernel, having received the 8 bytes, doesn't
    test for the exact size of the parameter; it only cares that it's >=
    sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on
    a big-endian machine, contains 0. 0 doesn't trigger an error, so the
    SO_RCVLOWAT call returns success and the socket keeps the default
    SO_RCVLOWAT = 1. This results in vsock_test failures, while vsock_perf
    doesn't even notice that it has failed to change the value.
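
The byte-order effect can be reproduced on any host by laying out the 8-byte value explicitly instead of relying on the machine's own endianness. In this sketch the toy_* names are hypothetical, and toy_load_first32() merely imitates a reader that accepts any length >= 4 and looks only at the 4 lowest-addressed bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_order { TOY_LITTLE_ENDIAN, TOY_BIG_ENDIAN };

/* Lay out a 64-bit value in memory the way the given byte order would. */
static void toy_store64(unsigned char out[8], uint64_t v, enum toy_order order)
{
    int i;

    for (i = 0; i < 8; i++) {
        int shift = (order == TOY_BIG_ENDIAN) ? 56 - 8 * i : 8 * i;

        out[i] = (unsigned char)((v >> shift) & 0xff);
    }
}

/*
 * Imitate a reader that only checks len >= 4 and then interprets the
 * 4 lowest-addressed bytes as a 32-bit value in the same byte order.
 */
static uint32_t toy_load_first32(const unsigned char *buf, size_t len,
                                 enum toy_order order)
{
    uint32_t v = 0;
    int i;

    assert(len >= 4);
    for (i = 0; i < 4; i++) {
        int shift = (order == TOY_BIG_ENDIAN) ? 24 - 8 * i : 8 * i;

        v |= (uint32_t)buf[i] << shift;
    }
    return v;
}
```

Storing 128 as a big-endian 64-bit value and reading the first four bytes gives 0, while the little-endian layout gives 128, which matches why the problem stayed hidden on x86-64.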
    
    Fixes: b134633 ("vsock_test: POLLIN + SO_RCVLOWAT test")
    Fixes: 542e893 ("vsock/test: two tests to check credit update logic")
    Fixes: 8abbffd ("test/vsock: vsock_perf utility")
    Signed-off-by: Konstantin Shkolnyy <[email protected]>
    Reviewed-by: Stefano Garzarella <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Konstantin Shkolnyy authored and Paolo Abeni committed Dec 5, 2024
    7ce1c09
  20. vsock/test: fix parameter types in SO_VM_SOCKETS_* calls

    Change parameters of SO_VM_SOCKETS_* to unsigned long long as documented
    in vm_sockets.h, because the corresponding kernel code requires them
    to be at least 64-bit, no matter what architecture. Otherwise they are
    too small on 32-bit machines.
    
    Fixes: 5c33811 ("test/vsock: rework message bounds test")
    Fixes: 685a21c ("test/vsock: add big message test")
    Fixes: 542e893 ("vsock/test: two tests to check credit update logic")
    Fixes: 8abbffd ("test/vsock: vsock_perf utility")
    Signed-off-by: Konstantin Shkolnyy <[email protected]>
    Reviewed-by: Stefano Garzarella <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Konstantin Shkolnyy authored and Paolo Abeni committed Dec 5, 2024
    3f36ee2
  21. vsock/test: verify socket options after setting them

    Replace setsockopt() calls with calls to functions that follow
    setsockopt() with getsockopt() and check that the returned value and its
    size are the same as have been set. (Except in vsock_perf.)
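
A set-then-verify helper of this kind might look roughly like the sketch below, which uses only the standard setsockopt()/getsockopt() calls. The name set_and_verify_int_opt() is hypothetical and is not one of the helpers added to the vsock tests.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Set an int-sized socket option, read it back, and confirm that both the
 * returned size and the returned value match what was requested.
 * Returns 0 on success and -1 on any failure or mismatch.
 */
static int set_and_verify_int_opt(int fd, int level, int optname, int val)
{
    int chk = ~val; /* start from a value that differs from val */
    socklen_t len = sizeof(chk);

    assert(fd >= 0);

    if (setsockopt(fd, level, optname, &val, sizeof(val)) != 0)
        return -1;
    if (getsockopt(fd, level, optname, &chk, &len) != 0)
        return -1;
    if (len != sizeof(chk) || chk != val)
        return -1;

    return 0;
}
```

Reading the option back turns a silently ignored or truncated parameter into a visible failure, which a lone setsockopt() return code would not reveal.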
    
    Signed-off-by: Konstantin Shkolnyy <[email protected]>
    Reviewed-by: Stefano Garzarella <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Konstantin Shkolnyy authored and Paolo Abeni committed Dec 5, 2024
    86814d8
  22. Merge branch 'vsock-test-fix-wrong-setsockopt-parameters'

    Konstantin Shkolnyy says:
    
    ====================
    vsock/test: fix wrong setsockopt() parameters
    
    Parameters were created using wrong C types, which caused them to be of
    wrong size on some architectures, causing problems.
    
    The problem with SO_RCVLOWAT was found on s390 (big endian), while x86-64
    didn't show it. After the fix, all tests pass on s390.
    Then Stefano Garzarella pointed out that SO_VM_SOCKETS_* calls might have
    a similar problem, which turned out to be true, hence, the second patch.
    
    Changes for v8:
    - Fix whitespace warnings from "checkpatch.pl --strict"
    - Add maintainers to Cc:
    Changes for v7:
    - Rebase on top of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
    - Add the "net" tags to the subjects
    Changes for v6:
    - rework the patch #3 to avoid creating a new file for new functions,
    and exclude vsock_perf from calling the new functions.
    - add "Reviewed-by:" to the patch #2.
    Changes for v5:
    - in the patch #2 replace the introduced uint64_t with unsigned long long
    to match documentation
    - add a patch #3 that verifies every setsockopt() call.
    Changes for v4:
    - add "Reviewed-by:" to the first patch, and add a second patch fixing
    SO_VM_SOCKETS_* calls, which depends on the first one (hence, it's now
    a patch series.)
    Changes for v3:
    - fix the same problem in vsock_perf and update commit message
    Changes for v2:
    - add "Fixes:" lines to the commit message
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Paolo Abeni committed Dec 5, 2024
    0e21a47
  23. Merge tag 'nf-24-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
    
    Pablo Neira Ayuso says:
    
    ====================
    Netfilter fixes for net
    
    The following patchset contains Netfilter fixes for net:
    
    1) Fix esoteric undefined behaviour due to uninitialized stack access
       in ip_vs_protocol_init(), from Jinghao Jia.
    
    2) Fix iptables xt_LED slab-out-of-bounds due to incorrect sanitization
       of the led string identifier, reported by syzbot. Patch from
       Dmitry Antipov.
    
    3) Remove WARN_ON_ONCE reachable from userspace to check for the maximum
       cgroup level, nft_socket cgroup matching is restricted to 255 levels,
       but cgroups allow for INT_MAX levels by default. Reported by syzbot.
    
    4) Fix nft_inner incorrect use of percpu area to store tunnel parser
       context with softirqs, resulting in inconsistent inner header
       offsets that could lead to bogus rule mismatches, reported by syzbot.
    
    5) Grab module reference on ipset core while requesting set type modules,
       otherwise kernel crash is possible by removing ipset core module,
       patch from Phil Sutter.
    
    6) Fix possible double-free in nft_hash garbage collector due to unstable
       walk iterator that can provide the same element twice. Use a sequence
       number to skip expired/dead elements that have been already scheduled
       for removal. Based on patch from Laurent Fasnach
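
The sequence-number idea from item 6 can be sketched in userspace as follows. Everything here is hypothetical (struct toy_elem, the gc_seq field, toy_gc_walk()): each garbage-collection run uses a fresh nonzero sequence number, stamps every element it queues, and skips elements that already carry the current stamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical set element; gc_seq == 0 means "never queued". */
struct toy_elem {
    int key;
    bool expired;
    unsigned int gc_seq;
};

/*
 * Walk elems in the given visiting order, which may contain the same
 * index more than once (mimicking an unstable iterator). Each expired
 * element is queued at most once per run. Returns the number queued.
 */
static size_t toy_gc_walk(struct toy_elem *elems, const size_t *order,
                          size_t n_order, unsigned int seq, int *queued_keys)
{
    size_t i, n = 0;

    assert(seq != 0);
    for (i = 0; i < n_order; i++) {
        struct toy_elem *e = &elems[order[i]];

        if (!e->expired)
            continue;
        if (e->gc_seq == seq)
            continue; /* already scheduled for removal in this run */
        e->gc_seq = seq;
        queued_keys[n++] = e->key;
    }
    return n;
}
```

Even if the walk revisits an element, it is queued for removal only once, so it cannot be freed twice.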
    
    netfilter pull request 24-12-05
    
    * tag 'nf-24-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
      netfilter: nft_set_hash: skip duplicated elements pending gc run
      netfilter: ipset: Hold module reference while requesting a module
      netfilter: nft_inner: incorrect percpu area handling under softirq
      netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level
      netfilter: x_tables: fix LED ID check in led_tg_check()
      ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Paolo Abeni committed Dec 5, 2024
    7b998e0
  24. net: avoid potential UAF in default_operstate()

    syzbot reported a UAF in default_operstate() [1]
    
    The issue is a race between device and netns dismantles.
    
    After calling __rtnl_unlock() from netdev_run_todo(),
    we cannot assume the netns of each device is still alive.
    
    Make sure the device is not in NETREG_UNREGISTERED state,
    and add an ASSERT_RTNL() before the call to
    __dev_get_by_index().
    
    We might move this ASSERT_RTNL() into __dev_get_by_index()
    in the future.
    
    [1]
    
    BUG: KASAN: slab-use-after-free in __dev_get_by_index+0x5d/0x110 net/core/dev.c:852
    Read of size 8 at addr ffff888043eba1b0 by task syz.0.0/5339
    
    CPU: 0 UID: 0 PID: 5339 Comm: syz.0.0 Not tainted 6.12.0-syzkaller-10296-gaaf20f870da0 #0
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:94 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
      print_address_description mm/kasan/report.c:378 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:489
      kasan_report+0x143/0x180 mm/kasan/report.c:602
      __dev_get_by_index+0x5d/0x110 net/core/dev.c:852
      default_operstate net/core/link_watch.c:51 [inline]
      rfc2863_policy+0x224/0x300 net/core/link_watch.c:67
      linkwatch_do_dev+0x3e/0x170 net/core/link_watch.c:170
      netdev_run_todo+0x461/0x1000 net/core/dev.c:10894
      rtnl_unlock net/core/rtnetlink.c:152 [inline]
      rtnl_net_unlock include/linux/rtnetlink.h:133 [inline]
      rtnl_dellink+0x760/0x8d0 net/core/rtnetlink.c:3520
      rtnetlink_rcv_msg+0x791/0xcf0 net/core/rtnetlink.c:6911
      netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2541
      netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
      netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1347
      netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1891
      sock_sendmsg_nosec net/socket.c:711 [inline]
      __sock_sendmsg+0x221/0x270 net/socket.c:726
      ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583
      ___sys_sendmsg net/socket.c:2637 [inline]
      __sys_sendmsg+0x269/0x350 net/socket.c:2669
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f2a3cb80809
    Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f2a3d9cd058 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007f2a3cd45fa0 RCX: 00007f2a3cb80809
    RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000008
    RBP: 00007f2a3cbf393e R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000000 R14: 00007f2a3cd45fa0 R15: 00007ffd03bc65c8
     </TASK>
    
    Allocated by task 5339:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
      __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
      kasan_kmalloc include/linux/kasan.h:260 [inline]
      __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4314
      kmalloc_noprof include/linux/slab.h:901 [inline]
      kmalloc_array_noprof include/linux/slab.h:945 [inline]
      netdev_create_hash net/core/dev.c:11870 [inline]
      netdev_init+0x10c/0x250 net/core/dev.c:11890
      ops_init+0x31e/0x590 net/core/net_namespace.c:138
      setup_net+0x287/0x9e0 net/core/net_namespace.c:362
      copy_net_ns+0x33f/0x570 net/core/net_namespace.c:500
      create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110
      unshare_nsproxy_namespaces+0x124/0x180 kernel/nsproxy.c:228
      ksys_unshare+0x57d/0xa70 kernel/fork.c:3314
      __do_sys_unshare kernel/fork.c:3385 [inline]
      __se_sys_unshare kernel/fork.c:3383 [inline]
      __x64_sys_unshare+0x38/0x40 kernel/fork.c:3383
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    Freed by task 12:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
      poison_slab_object mm/kasan/common.c:247 [inline]
      __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
      kasan_slab_free include/linux/kasan.h:233 [inline]
      slab_free_hook mm/slub.c:2338 [inline]
      slab_free mm/slub.c:4598 [inline]
      kfree+0x196/0x420 mm/slub.c:4746
      netdev_exit+0x65/0xd0 net/core/dev.c:11992
      ops_exit_list net/core/net_namespace.c:172 [inline]
      cleanup_net+0x802/0xcc0 net/core/net_namespace.c:632
      process_one_work kernel/workqueue.c:3229 [inline]
      process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
      worker_thread+0x870/0xd30 kernel/workqueue.c:3391
      kthread+0x2f0/0x390 kernel/kthread.c:389
      ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
    The buggy address belongs to the object at ffff888043eba000
     which belongs to the cache kmalloc-2k of size 2048
    The buggy address is located 432 bytes inside of
     freed 2048-byte region [ffff888043eba000, ffff888043eba800)
    
    The buggy address belongs to the physical page:
    page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x43eb8
    head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
    flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
    page_type: f5(slab)
    raw: 04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
    head: 04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000
    head: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
    head: 04fff00000000003 ffffea00010fae01 ffffffffffffffff 0000000000000000
    head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as allocated
    page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5339, tgid 5338 (syz.0.0), ts 69674195892, free_ts 69663220888
      set_page_owner include/linux/page_owner.h:32 [inline]
      post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1556
      prep_new_page mm/page_alloc.c:1564 [inline]
      get_page_from_freelist+0x3649/0x3790 mm/page_alloc.c:3474
      __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4751
      alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
      alloc_slab_page+0x6a/0x140 mm/slub.c:2408
      allocate_slab+0x5a/0x2f0 mm/slub.c:2574
      new_slab mm/slub.c:2627 [inline]
      ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3815
      __slab_alloc+0x58/0xa0 mm/slub.c:3905
      __slab_alloc_node mm/slub.c:3980 [inline]
      slab_alloc_node mm/slub.c:4141 [inline]
      __do_kmalloc_node mm/slub.c:4282 [inline]
      __kmalloc_noprof+0x2e6/0x4c0 mm/slub.c:4295
      kmalloc_noprof include/linux/slab.h:905 [inline]
      sk_prot_alloc+0xe0/0x210 net/core/sock.c:2165
      sk_alloc+0x38/0x370 net/core/sock.c:2218
      __netlink_create+0x65/0x260 net/netlink/af_netlink.c:629
      __netlink_kernel_create+0x174/0x6f0 net/netlink/af_netlink.c:2015
      netlink_kernel_create include/linux/netlink.h:62 [inline]
      uevent_net_init+0xed/0x2d0 lib/kobject_uevent.c:783
      ops_init+0x31e/0x590 net/core/net_namespace.c:138
      setup_net+0x287/0x9e0 net/core/net_namespace.c:362
    page last free pid 1032 tgid 1032 stack trace:
      reset_page_owner include/linux/page_owner.h:25 [inline]
      free_pages_prepare mm/page_alloc.c:1127 [inline]
      free_unref_page+0xdf9/0x1140 mm/page_alloc.c:2657
      __slab_free+0x31b/0x3d0 mm/slub.c:4509
      qlink_free mm/kasan/quarantine.c:163 [inline]
      qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
      kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
      __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
      kasan_slab_alloc include/linux/kasan.h:250 [inline]
      slab_post_alloc_hook mm/slub.c:4104 [inline]
      slab_alloc_node mm/slub.c:4153 [inline]
      kmem_cache_alloc_node_noprof+0x1d9/0x380 mm/slub.c:4205
      __alloc_skb+0x1c3/0x440 net/core/skbuff.c:668
      alloc_skb include/linux/skbuff.h:1323 [inline]
      alloc_skb_with_frags+0xc3/0x820 net/core/skbuff.c:6612
      sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2881
      sock_alloc_send_skb include/net/sock.h:1797 [inline]
      mld_newpack+0x1c3/0xaf0 net/ipv6/mcast.c:1747
      add_grhead net/ipv6/mcast.c:1850 [inline]
      add_grec+0x1492/0x19a0 net/ipv6/mcast.c:1988
      mld_send_initial_cr+0x228/0x4b0 net/ipv6/mcast.c:2234
      ipv6_mc_dad_complete+0x88/0x490 net/ipv6/mcast.c:2245
      addrconf_dad_completed+0x712/0xcd0 net/ipv6/addrconf.c:4342
     addrconf_dad_work+0xdc2/0x16f0
      process_one_work kernel/workqueue.c:3229 [inline]
      process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
    
    Memory state around the buggy address:
     ffff888043eba080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff888043eba100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff888043eba180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                         ^
     ffff888043eba200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff888043eba280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    
    Fixes: 8c55fac ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down")
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Vladimir Oltean <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Eric Dumazet authored and Paolo Abeni committed Dec 5, 2024
    750e516
  25. net :mana :Request a V2 response version for MANA_QUERY_GF_STAT

    The currently requested response version (V1) for the MANA_QUERY_GF_STAT
    query results in the STATISTICS_FLAGS_TX_ERRORS_GDMA_ERROR value always
    being set to 0. To get the correct value for this counter, request
    response version V2.
    
    Cc: [email protected]
    Fixes: e1df520 ("net :mana :Add remaining GDMA stats for MANA to ethtool")
    Signed-off-by: Shradha Gupta <[email protected]>
    Reviewed-by: Haiyang Zhang <[email protected]>
    Link: https://patch.msgid.link/1733291300-12593-1-git-send-email-shradhagupta@linux.microsoft.com
    Signed-off-by: Paolo Abeni <[email protected]>
    Shradha Gupta authored and Paolo Abeni committed Dec 5, 2024
    31f1b55
  26. ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A

    HiSilicon HIP09A platforms use the same SMMU PMCG as HIP09 and thus
    suffer from the same erratum. List them in the PMCG platform
    information list without introducing a new SMMU PMCG Model.
    
    Update the silicon-errata.rst as well.
    
    Reviewed-by: Yicong Yang <[email protected]>
    Acked-by: Hanjun Guo <[email protected]>
    Signed-off-by: Qinxin Xia <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    Qinxin Xia authored and ctmarinas committed Dec 5, 2024
    c2b46ae
  27. arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDs
    
    Linux currently sets the TCR_EL1.AS bit unconditionally during CPU
    bring-up. On an 8-bit ASID CPU, this is RES0 and ignored, otherwise
    16-bit ASIDs are enabled. However, if running in a VM and the hypervisor
    reports 8-bit ASIDs (ID_AA64MMFR0_EL1.ASIDBits == 0) on a 16-bit ASIDs
    CPU, Linux uses bits 8 to 63 as a generation number for tracking old
    process ASIDs. The bottom 8 bits of this generation end up being written
    to TTBR1_EL1 and also used for the ASID-based TLBI operations as the
    upper 8 bits of the ASID. Following an ASID roll-over event we can have
    threads of the same application with the same 8-bit ASID but different
    generation numbers running on separate CPUs. Both TLB caching and the
    TLBI operations will end up using different actual 16-bit ASIDs for the
    same process.
    
    A similar scenario can happen in a big.LITTLE configuration if the boot
    CPU only uses 8-bit ASIDs while secondary CPUs have 16-bit ASIDs.
    
    Ensure that the ASID generation is only tracked by bits 16 and up,
    leaving bits 15:8 as 0 if the kernel uses 8-bit ASIDs. Note that
    clearing TCR_EL1.AS is not sufficient since the architecture requires
    that the top 8 bits of the ASID passed to TLBI instructions are 0 rather
    than ignored in such configuration.
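    
    The effect of moving the generation counter can be shown with a small
    userspace sketch. The helper names make_ctx_id() and hw_asid_field() are
    hypothetical; the sketch only assumes, as described above, that the low
    16 bits of the context ID reach the ASID field used by TTBR1_EL1 and the
    TLBI operations.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a context ID: an 8-bit ASID with the generation above gen_shift. */
static uint64_t make_ctx_id(uint64_t generation, uint8_t asid,
                            unsigned int gen_shift)
{
    assert(gen_shift >= 8 && gen_shift < 64);
    return (generation << gen_shift) | asid;
}

/* Value that lands in the 16-bit hardware ASID field in this model. */
static uint16_t hw_asid_field(uint64_t ctx_id)
{
    return (uint16_t)(ctx_id & 0xffffu);
}
```

    With gen_shift == 8, two generations of ASID 0x42 yield 0x0142 and
    0x0242, so the hardware would see two different 16-bit ASIDs. With
    gen_shift == 16, both yield 0x0042 and bits 15:8 stay zero.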
    
    Cc: [email protected]
    Cc: Will Deacon <[email protected]>
    Cc: Mark Rutland <[email protected]>
    Cc: Marc Zyngier <[email protected]>
    Cc: James Morse <[email protected]>
    Acked-by: Mark Rutland <[email protected]>
    Acked-by: Marc Zyngier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    ctmarinas committed Dec 5, 2024
    c0900d1
  28. arm64: mte: Fix copy_highpage() warning on hugetlb folios

    Commit 25c17c4 ("hugetlb: arm64: add mte support") improved the
    copy_highpage() function to update the tags in the destination hugetlb
    folio. However, when the source folio isn't tagged, the code takes the
    non-hugetlb path where try_page_mte_tagging() warns as the destination
    is a hugetlb folio:
    
      WARNING: CPU: 0 PID: 363 at arch/arm64/include/asm/mte.h:58 copy_highpage+0x1d4/0x2d8
      [...]
      pc : copy_highpage+0x1d4/0x2d8
      lr : copy_highpage+0x78/0x2d8
      [...]
      Call trace:
       copy_highpage+0x1d4/0x2d8 (P)
       copy_highpage+0x78/0x2d8 (L)
       copy_user_highpage+0x20/0x48
       copy_user_large_folio+0x1bc/0x268
       hugetlb_wp+0x190/0x860
       hugetlb_fault+0xa28/0xc10
       handle_mm_fault+0x2a0/0x2c0
       do_page_fault+0x12c/0x578
       do_mem_abort+0x4c/0xa8
       el0_da+0x44/0xb0
       el0t_64_sync_handler+0xc4/0x138
       el0t_64_sync+0x198/0x1a0
    
    Change the check for the tagged status of the source folio so that it
    does not fall through the non-hugetlb case. In addition, only perform
    the copy (for the full folio) if the source page is the folio head and
    warn if the destination folio is already tagged, for symmetry with the
    non-hugetlb case.
    
    Fixes: 25c17c4 ("hugetlb: arm64: add mte support")
    Reported-by: Sasha Levin <[email protected]>
    Cc: Yang Shi <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Will Deacon <[email protected]>
    Link: https://lore.kernel.org/r/Z0STR6VLt2MCalnY@sashalap
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    ctmarinas committed Dec 5, 2024
    cf3b16d
  29. jffs2: Fix rtime decompressor

    The fix for a memory corruption contained an off-by-one error and
    caused the compressor to fail in legitimate cases.
    
    Cc: Kinsey Moore <[email protected]>
    Cc: [email protected]
    Fixes: fe05155 ("jffs2: Prevent rtime decompress memory corruption")
    Signed-off-by: Richard Weinberger <[email protected]>
    richardweinberger committed Dec 5, 2024
    b29bf71
  30. spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
    
    The devm_clk_get_optional_enabled() function returns an error pointer
    (ERR_PTR()) on failure, so use IS_ERR() to check its return value.
    
    Verified on K3-J7200 EVM board, without clock node mentioned
    in the device tree.
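    
    The error-pointer convention behind this fix can be illustrated in
    userspace. The sketch below re-creates ERR_PTR()/PTR_ERR()/IS_ERR() as
    lowercase stand-ins; get_optional_clk() and probe() are hypothetical and
    are not the omap2-mcspi code. It assumes the usual behaviour of
    "optional" getters: NULL means the clock is absent, an error pointer
    means a real failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct clk { int enabled; };
static struct clk fake_clk = { 1 };

/* Hypothetical getter: 0 = clock found, 1 = clock absent, else failure. */
static struct clk *get_optional_clk(int scenario)
{
    if (scenario == 0)
        return &fake_clk;
    if (scenario == 1)
        return NULL;
    return err_ptr(-ENOMEM);
}

/* Probe-style caller: returns 0 on success or a negative errno. */
static int probe(int scenario)
{
    struct clk *clk = get_optional_clk(scenario);

    if (is_err(clk))      /* a NULL test alone cannot detect this case */
        return (int)ptr_err(clk);
    assert(clk == NULL || clk->enabled);
    return 0;             /* clk may be NULL here: the clock is optional */
}
```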
    
    Signed-off-by: Purushothama Siddaiah <[email protected]>
    Reviewed-by: Corey Minyard <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    psiddaiah authored and broonie committed Dec 5, 2024
    4c6ac54
  31. x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
    
    The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups
    respectively) assume that the root page table is actually an 8KiB allocation,
    with the userspace root immediately after the kernel root page table (so that
    the former can enforce NX on all the subordinate page tables, which are
    actually shared).
    
    However, users of the kernel_ident_mapping_init() code do not give it an 8KiB
    allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset()
    allocate only a single 4KiB page. The kexec code on x86_64 currently gets
    away with it purely by chance, because it allocates 8KiB for its "control
    code page" and then actually uses the first half for the PGD, then copies the
    actual trampoline code into the second half only after the identmap code has
    finished scribbling over it.
    
    Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as
    _PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is
    exclusively for leaf PTEs). This instructs __pti_set_user_pgtbl() not to
    write to the userspace 'shadow' PGD.
    
    Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the
    actual page tables; since __pti_set_user_pgtbl() returns the value to be
    written to the kernel page table, it could be filtered out. But there seems
    to be no benefit to actually doing so.
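    
    A heavily simplified userspace model of the opt-out bit follows. It is a
    sketch under stated assumptions, not the x86 code: the root table is an
    array of 64-bit entries, the bit position of PAGE_NOPTISHADOW is chosen
    for illustration only, and the real __pti_set_user_pgtbl() applies
    further conditions that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_PAGE 512                  /* 4KiB / 8-byte entries */
#define PAGE_NOPTISHADOW (UINT64_C(1) << 58)  /* illustrative bit position */

/*
 * Mirror val into the "user" half one page after the kernel half, unless
 * the entry opts out. Returns the value for the kernel-half entry.
 */
static uint64_t shadow_then_return(uint64_t *root, size_t idx, uint64_t val)
{
    assert(idx < ENTRIES_PER_PAGE);
    if (!(val & PAGE_NOPTISHADOW))
        root[idx + ENTRIES_PER_PAGE] = val;   /* touches the second 4KiB */
    return val;
}

static void set_root_entry(uint64_t *root, size_t idx, uint64_t val)
{
    root[idx] = shadow_then_return(root, idx, val);
}
```

    In this model an entry without the bit is written at idx and at
    idx + 512, which is only safe for an 8KiB root. An entry carrying the
    bit stays within the first 4KiB page.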
    
    Suggested-by: Dave Hansen <[email protected]>
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Cc: [email protected]
    Cc: Linus Torvalds <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Rik van Riel <[email protected]>
    dwmw2 authored and Ingo Molnar committed Dec 5, 2024
    d0ceea6
  32. ASoC: mediatek: mt8188-mt6359: Remove hardcoded dmic codec

    Remove hardcoded dmic codec from the UL_SRC dai link to avoid requiring
    a dmic codec to be present for the driver to probe, as not every
    MT8188-based platform might need a dmic codec. The codec can be assigned
    to the dai link through the dai-link property in Devicetree on the
    platforms where it is needed.
    
    No Devicetree currently relies on it so it is safe to remove without
    worrying about backward compatibility.
    
    Fixes: 9f08dcb ("ASoC: mediatek: mt8188-mt6359: support new board with nau88255")
    Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
    Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
    Link: https://patch.msgid.link/20241203-mt8188-6359-unhardcode-dmic-v1-1-346e3e5cbe6d@collabora.com
    Signed-off-by: Mark Brown <[email protected]>
    nfraprado authored and broonie committed Dec 5, 2024
    ec16a3c
  33. ALSA: hda/realtek: Fix spelling mistake "Firelfy" -> "Firefly"

    There is a spelling mistake in a literal string in the alc269_fixup_tbl
    quirk table. Fix it.
    
    Fixes: 0d08f0e ("ALSA: hda/realtek: fix micmute LEDs don't work on HP Laptops")
    Signed-off-by: Colin Ian King <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    ColinIanKing authored and tiwai committed Dec 5, 2024
    20c3b3e
  34. x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC

    The rework of possible CPUs management erroneously disabled SMP when the
    IO/APIC is disabled either by the 'noapic' command line parameter or during
    IO/APIC setup. SMP is possible without IO/APIC.
    
    Remove the ioapic_is_disabled conditions from the relevant possible CPU
    management code paths to restore the original behaviour.
    
    Fixes: 7c0edad ("x86/cpu/topology: Rework possible CPU management")
    Signed-off-by: Fernando Fernandez Mancera <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/all/[email protected]
    ffmancera authored and KAGA-KOKO committed Dec 5, 2024
    73da582
  35. drm/dp_mst: Fix resetting msg rx state after topology removal

    If the MST topology is removed during the reception of an MST down reply
    or MST up request sideband message, the
    drm_dp_mst_topology_mgr::up_req_recv/down_rep_recv states could be reset
    from one thread via drm_dp_mst_topology_mgr_set_mst(false), racing with
    the reading/parsing of the message from another thread via
    drm_dp_mst_handle_down_rep() or drm_dp_mst_handle_up_req(). The race is
    possible since the reader/parser doesn't hold any lock while accessing
    the reception state. This in turn can lead to a memory corruption in the
    reader/parser as described by commit bd2fcca ("drm/dp_mst: Fix MST
    sideband message body length check").
    
    Fix the above by resetting the message reception state if needed before
    reading/parsing a message. Another solution would be to hold the
    drm_dp_mst_topology_mgr::lock for the whole duration of the message
    reception/parsing in drm_dp_mst_handle_down_rep() and
    drm_dp_mst_handle_up_req(), however this would require a bigger change.
    Since the fix is also needed for stable, this patch opts for the simpler
    solution.
    
    Cc: Lyude Paul <[email protected]>
    Cc: <[email protected]>
    Fixes: 1d08261 ("drm/display/dp_mst: Fix down/up message handling after sink disconnect")
    Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13056
    Reviewed-by: Lyude Paul <[email protected]>
    Signed-off-by: Imre Deak <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    ideak committed Dec 5, 2024
    a6fa67d
  36. drm/dp_mst: Verify request type in the corresponding down message reply

    After receiving the response for an MST down request message, the
    response should be accepted/parsed only if the response type matches
    that of the request. Ensure this by checking if the request type code
    stored both in the request and the reply match, dropping the reply in
    case of a mismatch.
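    
    The matching rule can be sketched as follows. The structures, the enum
    values, and handle_down_reply() are hypothetical stand-ins, not the DRM
    types or the real request type codes; the sketch only shows a reply
    being dropped when its echoed request type differs from the pending
    request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative request type codes; the values are placeholders. */
enum req_type { REQ_LINK_ADDRESS = 1, REQ_CLEAR_PAYLOAD_TABLE = 2 };

/* Hypothetical, trimmed-down message types. */
struct down_request { uint8_t req_type; };
struct down_reply   { uint8_t req_type; };

/* Accept a reply only if it echoes the type of the pending request. */
static bool reply_matches_request(const struct down_request *req,
                                  const struct down_reply *rep)
{
    return req->req_type == rep->req_type;
}

/* Returns 0 if the reply is handed on for parsing, -1 if it is dropped. */
static int handle_down_reply(const struct down_request *pending,
                             const struct down_reply *rep)
{
    assert(pending && rep);
    if (!reply_matches_request(pending, rep))
        return -1; /* mismatch: drop instead of parsing with the wrong layout */
    return 0;
}
```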
    
    This fixes the topology detection for an MST hub, as described in the
    Closes link below, where the hub sends an incorrect reply message after
    a CLEAR_PAYLOAD_TABLE -> LINK_ADDRESS down request message sequence.
    
    Cc: Lyude Paul <[email protected]>
    Cc: <[email protected]>
    Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12804
    Reviewed-by: Lyude Paul <[email protected]>
    Signed-off-by: Imre Deak <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    ideak committed Dec 5, 2024
    4d49e77
  37. drm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()

    Simplify the error return path in drm_dp_mst_handle_down_rep(),
    preparing for the next patch.
    
    While at it use reset_msg_rx_state() instead of open-coding it.
    
    Cc: Lyude Paul <[email protected]>
    Reviewed-by: Lyude Paul <[email protected]>
    Signed-off-by: Imre Deak <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    ideak committed Dec 5, 2024
    b559b68
  38. drm/dp_mst: Fix down request message timeout handling

    If receiving a reply for an MST down request message times out, the
    thread receiving the reply in drm_dp_mst_handle_down_rep() could try to
    dereference the drm_dp_sideband_msg_tx txmsg request message after the
    thread waiting for the reply - calling drm_dp_mst_wait_tx_reply() - has
    timed out and freed txmsg, hence leading to a use-after-free in
    drm_dp_mst_handle_down_rep().
    
    Prevent the above by holding the drm_dp_mst_topology_mgr::qlock in
    drm_dp_mst_handle_down_rep() for the whole duration txmsg is looked up
    from the request list and dereferenced.
    
    v2: Fix unlocking mgr->qlock after verify_rx_request_type() fails.
    
    Cc: Lyude Paul <[email protected]>
    Reviewed-by: Lyude Paul <[email protected]>
    Signed-off-by: Imre Deak <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    ideak committed Dec 5, 2024
    3f61185
  39. drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
    
    While receiving an MST up request message from one thread in
    drm_dp_mst_handle_up_req(), the MST topology could be removed from
    another thread via drm_dp_mst_topology_mgr_set_mst(false), freeing
    mst_primary and setting drm_dp_mst_topology_mgr::mst_primary to NULL.
    This could lead to a NULL deref/use-after-free of mst_primary in
    drm_dp_mst_handle_up_req().
    
    Avoid the above by holding a reference for mst_primary in
    drm_dp_mst_handle_up_req() while it's used.
    
    v2: Fix kfreeing the request if getting an mst_primary reference fails.
    
    Cc: Lyude Paul <[email protected]>
    Reviewed-by: Lyude Paul <[email protected]> (v1)
    Signed-off-by: Imre Deak <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    ideak committed Dec 5, 2024
    e54b000
  40. drm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()
    
    After an out-of-memory error the reception state should be reset, so
    that the next attempt receiving a message doesn't fail (due to getting a
    start-of-message packet, while the reception state already has the
    start-of-message flag set).
    
    Cc: Lyude Paul <[email protected]>
    Reviewed-by: Lyude Paul <[email protected]>
    Signed-off-by: Imre Deak <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    ideak committed Dec 5, 2024
    2b245c9
  41. tracing: Fix archs that still call tracepoints without RCU watching

    Tracepoints require having RCU "watching" as they use RCU to do updates to
    the tracepoints. There are some cases that would call a tracepoint when
    RCU was not "watching". This was usually in the idle path where RCU has
    "shutdown". For the few locations that had tracepoints without RCU
    watching, there was a trace_*_rcuidle() variant that could be used. This
    used SRCU for protection.
    
    There are tracepoints that trace when interrupts and preemption are
    enabled and disabled. In some architectures, these tracepoints are called
    in a path where RCU is not watching. When x86 and arm64 removed these
    locations, it was incorrectly assumed that it would be safe to remove the
    trace_*_rcuidle() variant and also remove the SRCU logic, as it made the
    code more complex and harder to implement new tracepoint features (like
    faultable tracepoints and tracepoints in rust).
    
    Instead of bringing back the trace_*_rcuidle(), as it will not be trivial
    to do as new code has already been added depending on its removal, add a
    workaround to the one file that still requires it (trace_preemptirq.c). If
    the architecture does not define CONFIG_ARCH_WANTS_NO_INSTR, then check if
    the code is in the idle path, and if so, call ct_irq_enter/exit() which
    will enable RCU around the tracepoint.
    
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Mark Rutland <[email protected]>
    Link: https://lore.kernel.org/[email protected]
    Reported-by: Geert Uytterhoeven <[email protected]>
    Fixes: 48bcda6 ("tracing: Remove definition of trace_*_rcuidle()")
    Closes: https://lore.kernel.org/all/[email protected]/
    Acked-by: Paul E. McKenney <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Tested-by: Geert Uytterhoeven <[email protected]>
    Tested-by: Madhavan Srinivasan <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    rostedt committed Dec 5, 2024
    dc1b157
  42. drm/dp_mst: Use reset_msg_rx_state() instead of open coding it

    Use reset_msg_rx_state() in drm_dp_mst_handle_up_req() instead of
    open-coding it.
    
    Cc: Lyude Paul <[email protected]>
    Reviewed-by: Lyude Paul <[email protected]>
    Signed-off-by: Imre Deak <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    ideak committed Dec 5, 2024
    59ca0e1
  43. coco: virt: arm64: Do not enable cca guest driver by default

    As per the guidelines, new drivers may not be set to default on.
    An expert user can always select it.
    
    Reported-by: Dan Williams <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: Steven Price <[email protected]>
    Cc: Sami Mujawar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Suzuki K Poulose <[email protected]>
    Reviewed-by: Steven Price <[email protected]>
    Signed-off-by: Catalin Marinas <[email protected]>
    Suzuki K Poulose authored and ctmarinas committed Dec 5, 2024
    16d5306
  44. clocksource: Make negative motion detection more robust

    Guenter reported boot stalls on an emulated ARM 32-bit platform, which has a
    24-bit wide clocksource.
    
    It turns out that the calculated maximal idle time, which limits idle
    sleeps to prevent clocksource wrap arounds, is close to the point where the
    negative motion detection triggers.
    
      max_idle_ns:                    597268854 ns
      negative motion tripping point: 671088640 ns
    
    If the idle wakeup is delayed beyond that point, the clocksource
    advances far enough to trigger the negative motion detection. This
    prevents the clock from advancing and in the worst case the system stalls
    completely if the consecutive sleeps based on the stale clock are
    delayed as well.
    
    Cure this by calculating a more robust cut-off value for negative motion,
    which covers 87.5% of the actual clocksource counter width. Compare the
    delta against this value to catch negative motion. This is specifically for
    clock sources with a small counter width as their wrap around time is close
    to the half counter width. For clock sources with wide counters this is not
    a problem because the maximum idle time is far from the half counter width
    due to the math overflow protection constraints.
    
    For the case at hand this results in a tripping point of 1174405120ns.
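
    The two tripping points quoted above can be reproduced with a little
    arithmetic. The sketch below is not the kernel's implementation: it
    assumes a tick period of 80 ns, which is inferred from the figures in
    this message (671088640 ns divided by 2^23 ticks), and the helper names
    are hypothetical.

```c
#include <stdint.h>

/* Assumption: 80 ns per tick, inferred from 671088640 ns / 2^23 ticks. */
#define NS_PER_TICK 80ULL

/* Half of the counter range in ns: the old negative motion tripping point. */
uint64_t half_width_ns(unsigned int bits)
{
	return (1ULL << (bits - 1)) * NS_PER_TICK;
}

/* 87.5% (7/8) of the counter range in ns: the more robust cut-off. */
uint64_t cutoff_ns(unsigned int bits)
{
	return ((1ULL << bits) / 8 * 7) * NS_PER_TICK;
}
```

    For a 24-bit counter, half_width_ns(24) gives 671088640 and
    cutoff_ns(24) gives 1174405120, matching the values in this message.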
    
    Note that this cannot prevent issues when the delay exceeds the 87.5%
    margin, but that's not different from the previous unchecked version which
    allowed arbitrary time jumps.
    
    Systems with small counter width are prone to invalid results, but this
    problem is unlikely to be seen on real hardware. If such a system
    completely stalls for more than half a second, then there are other more
    urgent problems than the counter wrapping around.
    
    Fixes: c163e40 ("timekeeping: Always check for negative motion")
    Reported-by: Guenter Roeck <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
    Closes: https://lore.kernel.org/all/[email protected]
    KAGA-KOKO committed Dec 5, 2024
    76031d9
  45. virtio-blk: don't keep queue frozen during system suspend

    Commit 4ce6e2d ("virtio-blk: Ensure no requests in virtqueues before
    deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
    PM callbacks. The motivation is to drain in-flight IOs before suspending.
    
    The block layer's queue freeze looks very handy, but it can easily cause
    a deadlock: for example, any attempt to call into bio_queue_enter() may
    deadlock if the queue is frozen in the current context. All kinds of
    ->suspend() callbacks run in suspend context, so keeping the queue frozen
    for the whole suspend context isn't a good idea. Marek also reported a
    lockdep warning[1] caused by virtio-blk freezing the queue in
    virtblk_freeze().
    
    [1] https://lore.kernel.org/linux-block/[email protected]/
    
    Given that the motivation is to drain in-flight IOs, this can be done by
    calling freeze & unfreeze. Meanwhile, restore the previous behavior by
    keeping the queue quiesced during suspend.
    
    Cc: Yi Sun <[email protected]>
    Cc: Michael S. Tsirkin <[email protected]>
    Cc: Jason Wang <[email protected]>
    Cc: Stefan Hajnoczi <[email protected]>
    Cc: [email protected]
    Reported-by: Marek Szyprowski <[email protected]>
    Signed-off-by: Ming Lei <[email protected]>
    Acked-by: Stefan Hajnoczi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Ming Lei authored and axboe committed Dec 5, 2024
    7678abe
  46. Merge tag 'asoc-fix-v6.13-rc1' of https://git.kernel.org/pub/scm/linu…

    …x/kernel/git/broonie/sound into for-linus
    
    ASoC: Fixes for v6.13
    
    A few small fixes for v6.13, all system specific - the biggest thing is
    the fix for jack handling over suspend on some Intel laptops.
    tiwai committed Dec 5, 2024
    c34e9ab
  47. Merge tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme into…

    … block-6.13
    
    Pull NVMe fixes from Keith:
    
    "nvme fixes for Linux 6.13
    
     - Target fix using incorrect zero buffer (Nilay)
     - Device specific deallocate quirk fixes (Christoph, Keith)
     - Fabrics fix for handling max command target bugs (Maurizio)
     - Cocci fix usage for kzalloc (Yu-Chen)
     - DMA size fix for host memory buffer feature (Christoph)
     - Fabrics queue cleanup fixes (Chunguang)"
    
    * tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme:
      nvme-tcp: simplify nvme_tcp_teardown_io_queues()
      nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
      nvme-rdma: unquiesce admin_q before destroy it
      nvme-tcp: fix the memleak while create new ctrl failed
      nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary
      nvmet: replace kmalloc + memset with kzalloc for data allocation
      nvme-fabrics: handle zero MAXCMD without closing the connection
      nvme-pci: remove two deallocate zeroes quirks
      nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
      nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()
    axboe committed Dec 5, 2024
    d64fd5f
  48. arm64: cpufeature: Add GCS to cpucap_is_possible()

    Since system_supports_gcs() ends up referring to cpucap_is_possible(),
    teach the latter about GCS for consistency with similar features.
    
    Signed-off-by: Robin Murphy <[email protected]>
    Acked-by: Mark Rutland <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/416c7369fcdce4ebb2a8f12daae234507be27e38.1733406275.git.robin.murphy@arm.com
    Signed-off-by: Catalin Marinas <[email protected]>
    rmurphy-arm authored and ctmarinas committed Dec 5, 2024
    f00b53f
  49. drm/v3d: Enable Performance Counters before clearing them

    On the Raspberry Pi 5, performance counters are not being cleared
    when `v3d_perfmon_start()` is called, even though we write to the
    CLR register. As a result, their values accumulate until they
    overflow.
    
    The expected behavior is for performance counters to reset to zero
    at the start of a job. When the job finishes and the perfmon is
    stopped, the counters should accurately reflect the values for that
    specific job.
    
    To ensure this behavior, the performance counters are now enabled
    before being cleared. This allows the CLR register to function as
    intended, zeroing the counter values when the job begins.
    
    Fixes: 26a4dc2 ("drm/v3d: Expose performance counters to userspace")
    Signed-off-by: Maíra Canal <[email protected]>
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    mairacanal committed Dec 5, 2024
    c98b104
  50. arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL

    Currently tagged_addr_ctrl_set() doesn't initialize the temporary 'ctrl'
    variable, and a SETREGSET call with a length of zero will leave this
    uninitialized. Consequently tagged_addr_ctrl_set() will consume an
    arbitrary value, potentially leaking up to 64 bits of memory from the
    kernel stack. The read is limited to a specific slot on the stack, and
    the issue does not provide a write mechanism.
    
    As set_tagged_addr_ctrl() only accepts values where bits [63:4] are zero
    and rejects other values, a partial SETREGSET attempt will randomly
    succeed or fail depending on the uninitialized value, and the exposure
    is significantly limited.
    
    Fix this by initializing the temporary value before copying the regset
    from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
    NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
    value of the tagged address ctrl will be retained.
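
    The pattern of the fix can be illustrated outside the kernel. The
    following is a minimal userspace sketch, not the actual ptrace code:
    struct task_state, copyin() and regset_set() are hypothetical stand-ins,
    and copyin() only models the property that a short or zero-length write
    leaves the remaining bytes of the destination untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-task state the regset exposes. */
struct task_state {
	uint64_t ctrl;
};

/*
 * Copy at most `size` bytes of caller data over *dst. A short or
 * zero-length write leaves the remaining bytes of *dst untouched.
 */
static void copyin(void *dst, size_t size, const void *ubuf, size_t count)
{
	memcpy(dst, ubuf, count < size ? count : size);
}

/* Fixed pattern: seed the temporary from the current value first. */
int regset_set(struct task_state *task, const void *ubuf, size_t count)
{
	uint64_t ctrl = task->ctrl;	/* the initialization the fix adds */

	copyin(&ctrl, sizeof(ctrl), ubuf, count);
	task->ctrl = ctrl;
	return 0;
}
```

    With the temporary seeded this way, a zero-length write stores the
    existing value back unchanged instead of whatever happened to be on
    the stack.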
    
    The NT_ARM_TAGGED_ADDR_CTRL regset is only visible in the
    user_aarch64_view used by a native AArch64 task to manipulate another
    native AArch64 task. As get_tagged_addr_ctrl() only returns an error
    value when called for a compat task, tagged_addr_ctrl_get() and
    tagged_addr_ctrl_set() should never observe an error value from
    get_tagged_addr_ctrl(). Add a WARN_ON_ONCE() to both to indicate that
    such an error would be unexpected, and error handling is not missing in
    either case.
    
    Fixes: 2200aa7 ("arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset")
    Cc: <[email protected]> # 5.10.x
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: Will Deacon <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    Mark Rutland authored and ctmarinas committed Dec 5, 2024
    ca62d90
  51. Merge tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/l…

    …inux-watchdog
    
    Pull watchdog updates from Wim Van Sebroeck:
    
     - Add support for exynosautov920 SoC
    
     - Add support for Airoha EN7851 watchdog
    
     - Add support for MT6735 TOPRGU/WDT
    
     - Delete the cpu5wdt driver
    
     - Always print when registering watchdog fails
    
     - Several other small fixes and improvements
    
    * tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/linux-watchdog: (36 commits)
      watchdog: rti: of: honor timeout-sec property
      watchdog: s3c2410_wdt: add support for exynosautov920 SoC
      dt-bindings: watchdog: Document ExynosAutoV920 watchdog bindings
      watchdog: mediatek: Add support for MT6735 TOPRGU/WDT
      watchdog: mediatek: Make sure system reset gets asserted in mtk_wdt_restart()
      dt-bindings: watchdog: fsl-imx-wdt: Add missing 'big-endian' property
      dt-bindings: watchdog: Document Qualcomm QCS8300
      docs: ABI: Fix spelling mistake in pretimeout_avaialable_governors
      Revert "watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs"
      watchdog: rzg2l_wdt: Power on the watchdog domain in the restart handler
      watchdog: Switch back to struct platform_driver::remove()
      watchdog: it87_wdt: add PWRGD enable quirk for Qotom QCML04
      watchdog: da9063: Remove __maybe_unused notations
      watchdog: da9063: Do not use a global variable
      watchdog: Delete the cpu5wdt driver
      watchdog: Add support for Airoha EN7851 watchdog
      dt-bindings: watchdog: airoha: document watchdog for Airoha EN7581
      watchdog: sl28cpld_wdt: don't print out if registering watchdog fails
      watchdog: rza_wdt: don't print out if registering watchdog fails
      watchdog: rti_wdt: don't print out if registering watchdog fails
      ...
    torvalds committed Dec 5, 2024
    42d52ac
  52. arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR

    Currently fpmr_set() doesn't initialize the temporary 'fpmr' variable,
    and a SETREGSET call with a length of zero will leave this
    uninitialized. Consequently an arbitrary value will be written back to
    target->thread.uw.fpmr, potentially leaking up to 64 bits of memory from
    the kernel stack. The read is limited to a specific slot on the stack,
    and the issue does not provide a write mechanism.
    
    Fix this by initializing the temporary value before copying the regset
    from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
    NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
    contents of FPMR will be retained.
    
    Before this patch:
    
    | # ./fpmr-test
    | Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d
    | SETREGSET(nt=0x40e, len=8) wrote 8 bytes
    |
    | Attempting to read NT_ARM_FPMR::fpmr
    | GETREGSET(nt=0x40e, len=8) read 8 bytes
    | Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
    |
    | Attempting to write NT_ARM_FPMR (zero length)
    | SETREGSET(nt=0x40e, len=0) wrote 0 bytes
    |
    | Attempting to read NT_ARM_FPMR::fpmr
    | GETREGSET(nt=0x40e, len=8) read 8 bytes
    | Read NT_ARM_FPMR::fpmr = 0xffff800083963d50
    
    After this patch:
    
    | # ./fpmr-test
    | Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d
    | SETREGSET(nt=0x40e, len=8) wrote 8 bytes
    |
    | Attempting to read NT_ARM_FPMR::fpmr
    | GETREGSET(nt=0x40e, len=8) read 8 bytes
    | Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
    |
    | Attempting to write NT_ARM_FPMR (zero length)
    | SETREGSET(nt=0x40e, len=0) wrote 0 bytes
    |
    | Attempting to read NT_ARM_FPMR::fpmr
    | GETREGSET(nt=0x40e, len=8) read 8 bytes
    | Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
    
    Fixes: 4035c22 ("arm64/ptrace: Expose FPMR via ptrace")
    Cc: <[email protected]> # 6.9.x
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Will Deacon <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    Mark Rutland authored and ctmarinas committed Dec 5, 2024
    f5d7129
  53. arm64: ptrace: fix partial SETREGSET for NT_ARM_POE

    Currently poe_set() doesn't initialize the temporary 'ctrl' variable,
    and a SETREGSET call with a length of zero will leave this
    uninitialized. Consequently an arbitrary value will be written back to
    target->thread.por_el0, potentially leaking up to 64 bits of memory from
    the kernel stack. The read is limited to a specific slot on the stack,
    and the issue does not provide a write mechanism.
    
    Fix this by initializing the temporary value before copying the regset
    from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
    NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
    contents of POR_EL0 will be retained.
    
    Before this patch:
    
    | # ./poe-test
    | Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d
    | SETREGSET(nt=0x40f, len=8) wrote 8 bytes
    |
    | Attempting to read NT_ARM_POE::por_el0
    | GETREGSET(nt=0x40f, len=8) read 8 bytes
    | Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
    |
    | Attempting to write NT_ARM_POE (zero length)
    | SETREGSET(nt=0x40f, len=0) wrote 0 bytes
    |
    | Attempting to read NT_ARM_POE::por_el0
    | GETREGSET(nt=0x40f, len=8) read 8 bytes
    | Read NT_ARM_POE::por_el0 = 0xffff8000839c3d50
    
    After this patch:
    
    | # ./poe-test
    | Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d
    | SETREGSET(nt=0x40f, len=8) wrote 8 bytes
    |
    | Attempting to read NT_ARM_POE::por_el0
    | GETREGSET(nt=0x40f, len=8) read 8 bytes
    | Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
    |
    | Attempting to write NT_ARM_POE (zero length)
    | SETREGSET(nt=0x40f, len=0) wrote 0 bytes
    |
    | Attempting to read NT_ARM_POE::por_el0
    | GETREGSET(nt=0x40f, len=8) read 8 bytes
    | Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
    
    Fixes: 1751981 ("arm64/ptrace: add support for FEAT_POE")
    Cc: <[email protected]> # 6.12.x
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: Joey Gouly <[email protected]>
    Cc: Will Deacon <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    Mark Rutland authored and ctmarinas committed Dec 5, 2024
    594bfc4
  54. arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS

    Currently gcs_set() doesn't initialize the temporary 'user_gcs'
    variable, and a SETREGSET call with a length of 0, 8, or 16 will leave
    some portion of this uninitialized. Consequently some arbitrary
    uninitialized values may be written back to the relevant fields in task
    struct, potentially leaking up to 192 bits of memory from the kernel
    stack. The read is limited to a specific slot on the stack, and the
    issue does not provide a write mechanism.
    
    As gcs_set() rejects cases where user_gcs::features_enabled has bits set
    other than PR_SHADOW_STACK_SUPPORTED_STATUS_MASK, a SETREGSET call with
    a length of zero will randomly succeed or fail depending on the
    uninitialized value, and it isn't possible to leak the full 192 bits.
    With a length of 8 or 16, user_gcs::features_enabled can be initialized
    to an accepted value, making it practical to leak 128 or 64 bits.
    
    Fix this by initializing the temporary value before copying the regset
    from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
    NT_ARM_SYSTEM_CALL). In the case of a zero-length or partial write, the
    existing contents of the fields which are not written to will be
    retained.
    
    To ensure that the extraction and insertion of fields is consistent
    across the GETREGSET and SETREGSET calls, new task_gcs_to_user() and
    task_gcs_from_user() helpers are added, matching the style of
    pac_address_keys_to_user() and pac_address_keys_from_user().
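
    As a rough userspace illustration of how the helper pair keeps partial
    writes consistent, consider the sketch below. The user_gcs field names
    follow the test output in this message, but struct task_gcs, its fields
    and gcs_set_sketch() are hypothetical, and memcpy() merely stands in for
    copying `count` bytes from userspace.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field names follow the test output in this commit message. */
struct user_gcs {
	uint64_t features_enabled;
	uint64_t features_locked;
	uint64_t gcspr_el0;
};

/* Hypothetical task-side state; the real task struct fields differ. */
struct task_gcs {
	uint64_t mode;
	uint64_t locked;
	uint64_t gcspr;
};

void task_gcs_to_user(struct user_gcs *u, const struct task_gcs *t)
{
	u->features_enabled = t->mode;
	u->features_locked = t->locked;
	u->gcspr_el0 = t->gcspr;
}

void task_gcs_from_user(struct task_gcs *t, const struct user_gcs *u)
{
	t->mode = u->features_enabled;
	t->locked = u->features_locked;
	t->gcspr = u->gcspr_el0;
}

/* Simulated SETREGSET: only the first `count` bytes come from the caller. */
void gcs_set_sketch(struct task_gcs *t, const void *ubuf, size_t count)
{
	struct user_gcs user_gcs;
	size_t n = count < sizeof(user_gcs) ? count : sizeof(user_gcs);

	task_gcs_to_user(&user_gcs, t);	/* seed with the current values */
	memcpy(&user_gcs, ubuf, n);	/* overwrite only what was supplied */
	task_gcs_from_user(t, &user_gcs);
}
```

    An 8-byte write then updates only features_enabled, while
    features_locked and gcspr_el0 keep their previous values.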
    
    Before this patch:
    
    | # ./gcs-test
    | Attempting to write NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x0000000000000000,
    |     .gcspr_el0        = 0x900d900d900d900d,
    | }
    | SETREGSET(nt=0x410, len=24) wrote 24 bytes
    |
    | Attempting to read NT_ARM_GCS::user_gcs
    | GETREGSET(nt=0x410, len=24) read 24 bytes
    | Read NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x0000000000000000,
    |     .gcspr_el0        = 0x900d900d900d900d,
    | }
    |
    | Attempting partial write NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x1de7ec7edbadc0de,
    |     .gcspr_el0        = 0x1de7ec7edbadc0de,
    | }
    | SETREGSET(nt=0x410, len=8) wrote 8 bytes
    |
    | Attempting to read NT_ARM_GCS::user_gcs
    | GETREGSET(nt=0x410, len=24) read 24 bytes
    | Read NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x000000000093e780,
    |     .gcspr_el0        = 0xffff800083a63d50,
    | }
    
    After this patch:
    
    | # ./gcs-test
    | Attempting to write NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x0000000000000000,
    |     .gcspr_el0        = 0x900d900d900d900d,
    | }
    | SETREGSET(nt=0x410, len=24) wrote 24 bytes
    |
    | Attempting to read NT_ARM_GCS::user_gcs
    | GETREGSET(nt=0x410, len=24) read 24 bytes
    | Read NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x0000000000000000,
    |     .gcspr_el0        = 0x900d900d900d900d,
    | }
    |
    | Attempting partial write NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x1de7ec7edbadc0de,
    |     .gcspr_el0        = 0x1de7ec7edbadc0de,
    | }
    | SETREGSET(nt=0x410, len=8) wrote 8 bytes
    |
    | Attempting to read NT_ARM_GCS::user_gcs
    | GETREGSET(nt=0x410, len=24) read 24 bytes
    | Read NT_ARM_GCS::user_gcs = {
    |     .features_enabled = 0x0000000000000000,
    |     .features_locked  = 0x0000000000000000,
    |     .gcspr_el0        = 0x900d900d900d900d,
    | }
    
    Fixes: 7ec3b57 ("arm64/ptrace: Expose GCS via ptrace and core files")
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Will Deacon <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Catalin Marinas <[email protected]>
    Mark Rutland authored and ctmarinas committed Dec 5, 2024
    d60624f
  55. Merge tag 'hid-for-linus-2024120501' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/hid/hid
    
    Pull HID fixes from Benjamin Tissoires:
    
     - regression fix in suspend/resume for i2c-hid (Kenny Levinsen)
    
     - fix wacom driver assuming a name can not be null (WangYuli)
    
     - a couple of constify changes/fixes (Thomas Weißschuh)
    
     - a couple of selftests/hid fixes (Maximilian Heyne & Benjamin
       Tissoires)
    
    * tag 'hid-for-linus-2024120501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
      selftests/hid: fix kfunc inclusions with newer bpftool
      HID: bpf: drop unneeded casts discarding const
      HID: bpf: constify hid_ops
      selftests: hid: fix typo and exit code
      HID: wacom: fix when get product name maybe null pointer
      HID: i2c-hid: Revert to using power commands to wake on resume
    torvalds committed Dec 5, 2024
    2a770b4
  56. Merge tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/ker…

    …nel/git/trace/linux-trace
    
    Pull tracing fixes from Steven Rostedt:
    
     - Fix trace histogram sort function cmp_entries_dup()
    
       The sort function cmp_entries_dup() returns either 1 or 0, and not -1
       if parameter "a" is less than "b" by memcmp().
    
     - Fix archs that call trace_hardirqs_off() without RCU watching
    
       Both x86 and arm64 no longer call any tracepoints with RCU not
       watching. It was assumed that it was safe to get rid of
       trace_*_rcuidle() version of the tracepoint calls. This was needed to
       get rid of the SRCU protection and be able to implement features like
       faultable tracepoints and add rust tracepoints.
    
       Unfortunately, there were a few architectures that still relied on
       that logic. There's only one file that has tracepoints that are
       called without RCU watching. Add macro logic around the tracepoints
       for architectures that do not have CONFIG_ARCH_WANTS_NO_INSTR defined
       that checks whether the code is in the idle path (the only place RCU
       isn't watching) and enables RCU around the tracepoint call, but only
       if the tracepoint is enabled.
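
    For the first item, a sort comparison function has to return a negative,
    zero or positive value. The sketch below shows that shape using the C
    library's qsort() rather than the kernel's sort(); struct entry,
    KEY_SIZE and cmp_entries() are hypothetical and are not the tracing
    code itself.

```c
#include <stdlib.h>
#include <string.h>

#define KEY_SIZE 8	/* hypothetical fixed key size for this sketch */

struct entry {
	unsigned char key[KEY_SIZE];
};

/*
 * Three-way comparison: passing the memcmp() result through yields a
 * negative value when a < b, zero when equal and a positive value when
 * a > b. Returning only 0 or 1 would lose the "less than" case.
 */
int cmp_entries(const void *a, const void *b)
{
	const struct entry *ea = a;
	const struct entry *eb = b;

	return memcmp(ea->key, eb->key, KEY_SIZE);
}
```

    It can be passed directly to qsort(), for example
    qsort(v, n, sizeof(v[0]), cmp_entries).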
    
    * tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
      tracing: Fix archs that still call tracepoints without RCU watching
      tracing: Fix cmp_entries_dup() to respect sort() comparison rules
    torvalds committed Dec 5, 2024
    9d6a414
  57. Merge tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/netdev/net
    
    Pull networking fixes from Paolo Abeni:
     "Including fixes from can and netfilter.
    
      Current release - regressions:
    
       - rtnetlink: fix double call of rtnl_link_get_net_ifla()
    
       - tcp: populate XPS related fields of timewait sockets
    
       - ethtool: fix access to uninitialized fields in set RXNFC command
    
       - selinux: use sk_to_full_sk() in selinux_ip_output()
    
      Current release - new code bugs:
    
       - net: make napi_hash_lock irq safe
    
       - eth:
          - bnxt_en: support header page pool in queue API
          - ice: fix NULL pointer dereference in switchdev
    
      Previous releases - regressions:
    
       - core: fix icmp host relookup triggering ip_rt_bug
    
       - ipv6:
          - avoid possible NULL deref in modify_prefix_route()
          - release expired exception dst cached in socket
    
       - smc: fix LGR and link use-after-free issue
    
       - hsr: avoid potential out-of-bound access in fill_frame_info()
    
       - can: hi311x: fix potential use-after-free
    
       - eth: ice: fix VLAN pruning in switchdev mode
    
      Previous releases - always broken:
    
       - netfilter:
          - ipset: hold module reference while requesting a module
          - nft_inner: incorrect percpu area handling under softirq
    
       - can: j1939: fix skb reference counting
    
       - eth:
          - mlxsw: use correct key block on Spectrum-4
          - mlx5: fix memory leak in mlx5hws_definer_calc_layout"
    
    * tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
      net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
      net: avoid potential UAF in default_operstate()
      vsock/test: verify socket options after setting them
      vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
      vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
      net/mlx5e: Remove workaround to avoid syndrome for internal port
      net/mlx5e: SD, Use correct mdev to build channel param
      net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
      net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
      net/mlx5: HWS: Properly set bwc queue locks lock classes
      net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
      bnxt_en: handle tpa_info in queue API implementation
      bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
      bnxt_en: refactor tpa_info alloc/free into helpers
      geneve: do not assume mac header is set in geneve_xmit_skb()
      mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
      ethtool: Fix wrong mod state in case of verbose and no_mask bitset
      ipmr: tune the ipmr_can_free_table() checks.
      netfilter: nft_set_hash: skip duplicated elements pending gc run
      netfilter: ipset: Hold module reference while requesting a module
      ...
    torvalds committed Dec 5, 2024
    896d894
  58. Merge tag 'drm-xe-fixes-2024-12-04' of https://gitlab.freedesktop.org…

    …/drm/xe/kernel into drm-fixes
    
    Driver Changes:
    - Missing init value and 64-bit write-order check (Zhanjung)
    - Fix a memory allocation issue causing lockdep violation (John)
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Thomas Hellstrom <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/Z1BidZBFQOLjz__J@fedora
    airlied committed Dec 5, 2024
    915bac6
  59. Merge tag 'v6.13-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd

    Pull smb server fixes from Steve French:
    
     - Three fixes for potential out of bound accesses in read and write
       paths (e.g. when alternate data streams enabled)
    
     - GCC 15 build fix
    
    * tag 'v6.13-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
      ksmbd: align aux_payload_buf to avoid OOB reads in cryptographic operations
      ksmbd: fix Out-of-Bounds Write in ksmbd_vfs_stream_write
      ksmbd: fix Out-of-Bounds Read in ksmbd_vfs_stream_read
      smb: server: Fix building with GCC 15
    torvalds committed Dec 5, 2024
    f65289a
  60. Merge tag 'drm-misc-fixes-2024-12-05' of https://gitlab.freedesktop.o…

    …rg/drm/misc/kernel into drm-fixes
    
    drm-misc-fixes v6.13-rc2:
    - v3d performance counter fix.
    - A lot of DP-MST related fixes.
    
    Signed-off-by: Dave Airlie <[email protected]>
    From: Maarten Lankhorst <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    airlied committed Dec 5, 2024
    471f3a2
  61. Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/jgg/iommufd
    
    Pull iommufd fixes from Jason Gunthorpe:
     "One bug fix and some documentation updates:
    
       - Correct typos in comments
    
       - Elaborate a comment about how the uAPI works for
         IOMMU_HW_INFO_TYPE_ARM_SMMUV3
    
       - Fix a double free on error path and add test coverage for the bug"
    
    * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
      iommu/arm-smmu-v3: Improve uAPI comment for IOMMU_HW_INFO_TYPE_ARM_SMMUV3
      iommufd/selftest: Cover IOMMU_FAULT_QUEUE_ALLOC in iommufd_fail_nth
      iommufd: Fix out_fput in iommufd_fault_alloc()
      iommufd: Fix typos in kernel-doc comments
    torvalds committed Dec 5, 2024
    6a10386
  62. Merge tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/pcmoore/audit
    
    Pull audit build problem workaround from Paul Moore:
     "A minor audit patch that shuffles some code slightly to work around a
      GCC bug affecting a number of people.
    
      The GCC folks have been able to reproduce the problem and are
      discussing solutions (see the bug report link in the commit), but
      since the workaround is trivial let's do that in the kernel so we can
      unblock people who are hitting this"
    
    * tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
      audit: workaround a GCC bug triggered by task comm changes
    torvalds committed Dec 5, 2024
    b8f5221

Commits on Dec 6, 2024

  1. fs/proc/vmcore.c: fix warning when CONFIG_MMU=n

    >> fs/proc/vmcore.c:424:19: warning: 'mmap_vmcore_fault' defined but not used [-Wunused-function]
         424 | static vm_fault_t mmap_vmcore_fault(struct vm_fault *vmf)
    
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Cc: Qi Xi <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    akpm00 committed Dec 6, 2024
    def1379
  2. mm/gup: handle NULL pages in unpin_user_pages()

    The recent addition of "pofs" (pages or folios) handling to gup has a
    flaw: it assumes that unpin_user_pages() handles NULL pages in the pages**
    array.  That's not the case, as I discovered when I ran on a new
    configuration on my test machine.
    
    Fix this by skipping NULL pages in unpin_user_pages(), just like
    unpin_folios() already does.
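
The skip-NULL behaviour described above can be sketched in user space. This is only an illustration: `struct page`, `pin_count` and `unpin_pages_skip_null()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    int pin_count;
};

/*
 * Drop one pin from every non-NULL entry in pages[0..npages).
 * NULL slots are skipped instead of being dereferenced.
 * Returns the number of entries that were actually unpinned.
 */
size_t unpin_pages_skip_null(struct page **pages, size_t npages)
{
    size_t unpinned = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!pages[i])
            continue;               /* the case the fix adds */
        assert(pages[i]->pin_count > 0);
        pages[i]->pin_count--;
        unpinned++;
    }
    return unpinned;
}
```

Without the `continue`, the first NULL slot would be dereferenced, which matches the kind of NULL pointer crash shown in the trace below.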
    
    Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
    6.12, and running this:
    
        tools/testing/selftests/mm/gup_longterm
    
    ...I get the following crash:
    
    BUG: kernel NULL pointer dereference, address: 0000000000000008
    RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
    ...
    Call Trace:
     <TASK>
     ? __die_body+0x66/0xb0
     ? page_fault_oops+0x30c/0x3b0
     ? do_user_addr_fault+0x6c3/0x720
     ? irqentry_enter+0x34/0x60
     ? exc_page_fault+0x68/0x100
     ? asm_exc_page_fault+0x22/0x30
     ? sanity_check_pinned_pages+0x3a/0x2d0
     unpin_user_pages+0x24/0xe0
     check_and_migrate_movable_pages_or_folios+0x455/0x4b0
     __gup_longterm_locked+0x3bf/0x820
     ? mmap_read_lock_killable+0x12/0x50
     ? __pfx_mmap_read_lock_killable+0x10/0x10
     pin_user_pages+0x66/0xa0
     gup_test_ioctl+0x358/0xb20
     __se_sys_ioctl+0x6b/0xc0
     do_syscall_64+0x7b/0x150
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 94efde1 ("mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases")
    Signed-off-by: John Hubbard <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: Oscar Salvador <[email protected]>
    Cc: Vivek Kasireddy <[email protected]>
    Cc: Dave Airlie <[email protected]>
    Cc: Gerd Hoffmann <[email protected]>
    Cc: Matthew Wilcox <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Cc: Jason Gunthorpe <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: Arnd Bergmann <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: Dongwon Kim <[email protected]>
    Cc: Hugh Dickins <[email protected]>
    Cc: Junxiao Chang <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    johnhubbard authored and akpm00 committed Dec 6, 2024
    a1268be
  3. mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM
    
    We currently assume that there is at least one VMA in a MM, which isn't
    true.
    
    So we might end up having find_vma() return NULL, to then de-reference
    NULL.  So properly handle find_vma() returning NULL.
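
A minimal sketch of the pattern, assuming a simplified address space made of sorted regions. `struct region`, `find_region()` and `first_mapped_address()` are hypothetical names that only mimic a find_vma()-style lookup.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct region {
    unsigned long start, end;
};

/* Hypothetical stand-in for an MM: regions sorted by address. */
struct addr_space {
    const struct region *regions;
    size_t count;
};

/* First region whose end lies above addr, or NULL when there is none. */
const struct region *find_region(const struct addr_space *as,
                                 unsigned long addr)
{
    for (size_t i = 0; i < as->count; i++)
        if (as->regions[i].end > addr)
            return &as->regions[i];
    return NULL;
}

/* Lowest mapped address, or -1 when the address space is empty. */
long first_mapped_address(const struct addr_space *as)
{
    const struct region *r = find_region(as, 0);

    if (!r)                 /* an empty address space is a valid state */
        return -1;
    return (long)r->start;
}
```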
    
    This fixes the report:
    
    Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
    KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
    CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
    RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
    RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
    Code: ...
    RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
    RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
    RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
    R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
    R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
    FS:  00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
     __do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
     __se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
     __x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    [[email protected]: add unlikely()]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 3974388 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
    Signed-off-by: David Hildenbrand <[email protected]>
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/lkml/[email protected]/T/
    Reviewed-by: Liam R. Howlett <[email protected]>
    Reviewed-by: Christoph Lameter <[email protected]>
    Cc: Liam R. Howlett <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    davidhildenbrand authored and akpm00 committed Dec 6, 2024
    091c1dd
  4. kasan: make report_lock a raw spinlock

    If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not
    be locked when IRQs are disabled.  However, KASAN reports may be triggered
    in such contexts.  For example:
    
            char *s = kzalloc(1, GFP_KERNEL);
            kfree(s);
            local_irq_disable();
            char c = *s;  /* KASAN report here leads to spin_lock() */
            local_irq_enable();
    
    Make report_lock a raw spinlock to prevent rescheduling when
    PREEMPT_RT is enabled.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 342a932 ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>")
    Signed-off-by: Jared Kangas <[email protected]>
    Cc: Alexander Potapenko <[email protected]>
    Cc: Andrey Konovalov <[email protected]>
    Cc: Andrey Ryabinin <[email protected]>
    Cc: Dmitry Vyukov <[email protected]>
    Cc: Vincenzo Frascino <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    rh-jkangas authored and akpm00 committed Dec 6, 2024
    e30a036
  5. nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()

    Syzbot reported that when searching for records in a directory where the
    inode's i_size is corrupted and has a large value, memory access outside
    the folio/page range may occur, or a use-after-free bug may be detected if
    KASAN is enabled.
    
    This is because nilfs_last_byte(), which is called by nilfs_find_entry()
    and others to calculate the number of valid bytes of directory data in a
    page from i_size and the page index, loses the upper 32 bits of the 64-bit
    size information due to an inappropriate type of local variable to which
    the i_size value is assigned.
    
    This caused a large byte offset value due to underflow in the end address
    calculation in the calling nilfs_find_entry(), resulting in memory access
    that exceeds the folio/page size.
    
    Fix this issue by changing the type of the local variable causing the bit
    loss from "unsigned int" to "u64".  The return value of nilfs_last_byte()
    is also of type "unsigned int", but it is truncated so as not to exceed
    PAGE_SIZE and no bit loss occurs, so no change is required.
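
The bit loss can be reproduced in user space. The sketch below assumes 4 KiB pages; `last_byte_narrow()` and `last_byte_wide()` are hypothetical helpers modelled on the description above, not copies of the nilfs2 code.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                     /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* 32-bit local variable: the upper 32 bits of i_size are lost. */
unsigned int last_byte_narrow(uint64_t i_size, uint64_t page_nr)
{
    uint32_t last_byte = (uint32_t)i_size;

    last_byte -= (uint32_t)(page_nr << PAGE_SHIFT);
    if (last_byte > PAGE_SIZE)
        last_byte = PAGE_SIZE;
    return last_byte;
}

/* 64-bit local variable: the full size takes part in the calculation. */
unsigned int last_byte_wide(uint64_t i_size, uint64_t page_nr)
{
    uint64_t last_byte = i_size;

    last_byte -= page_nr << PAGE_SHIFT;
    if (last_byte > PAGE_SIZE)
        last_byte = PAGE_SIZE;
    return (unsigned int)last_byte;       /* at most PAGE_SIZE, so no loss */
}
```

For an i_size of 4 GiB + 8 and page 0, the narrow variant reports 8 valid bytes while the wide variant reports a full page. A caller that subtracts a record length from the too-small value can then underflow, as described above.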
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 2ba466d ("nilfs2: directory entry operations")
    Signed-off-by: Ryusuke Konishi <[email protected]>
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624
    Tested-by: [email protected]
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    konis authored and akpm00 committed Dec 6, 2024
    985ebec
  6. ocfs2: free inode when ocfs2_get_init_inode() fails

    syzbot is reporting busy inodes after unmount, for commit 9c89fe0
    ("ocfs2: Handle error from dquot_initialize()") forgot to call iput() when
    new_inode() succeeded and dquot_initialize() failed.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 9c89fe0 ("ocfs2: Handle error from dquot_initialize()")
    Signed-off-by: Tetsuo Handa <[email protected]>
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=0af00f6a2cba2058b5db
    Tested-by: [email protected]
    Reviewed-by: Joseph Qi <[email protected]>
    Cc: Mark Fasheh <[email protected]>
    Cc: Joel Becker <[email protected]>
    Cc: Junxiao Bi <[email protected]>
    Cc: Changwei Ge <[email protected]>
    Cc: Jun Piao <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Tetsuo Handa authored and akpm00 committed Dec 6, 2024
    965b5dd
  7. selftest: hugetlb_dio: fix test naming

    The string logged when a test passes or fails is used by the selftest
    framework to identify which test is being reported.  The hugetlb_dio test
    not only uses the same strings for every test that is run but it also uses
    different strings for test passes and failures which means that test
    automation is unable to follow what the test is doing at all.
    
    Pull the existing duplicated logging of the number of free huge pages
    before and after the test out of the conditional and replace that and the
    logging of the result with a single ksft_print_result() which incorporates
    the parameters passed into the test into the output.
    
    Link: https://lkml.kernel.org/r/20241127-kselftest-mm-hugetlb-dio-names-v1-1-22aab01bf550@kernel.org
    Fixes: fae1980 ("selftests: hugetlb_dio: fixup check for initial conditions to skip in the start")
    Signed-off-by: Mark Brown <[email protected]>
    Reviewed-by: Muhammad Usama Anjum <[email protected]>
    Cc: Donet Tom <[email protected]>
    Cc: Ritesh Harjani (IBM) <[email protected]>
    Cc: Shuah Khan <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    broonie authored and akpm00 committed Dec 6, 2024
    4ae132c
  8. selftests/damon: add _damon_sysfs.py to TEST_FILES

    When running selftests I encountered the following error message with
    some damon tests:
    
     # Traceback (most recent call last):
     #   File "[...]/damon/./damos_quota.py", line 7, in <module>
     #     import _damon_sysfs
     # ModuleNotFoundError: No module named '_damon_sysfs'
    
    Fix this by adding the _damon_sysfs.py file to TEST_FILES so that it
    will be available when running the respective damon selftests.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 306abb6 ("selftests/damon: implement a python module for test-purpose DAMON sysfs controls")
    Signed-off-by: Maximilian Heyne <[email protected]>
    Reviewed-by: SeongJae Park <[email protected]>
    Cc: Shuah Khan <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    heynemax authored and akpm00 committed Dec 6, 2024
    4a475c0
  9. Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
    
    This reverts commit 7c87758.
    
    Anders and Philippe have reported that recent kernels occasionally hang
    when used with NFS in readahead code.  The problem has been bisected to
    7c87758 ("readahead: properly shorten readahead when falling back to
    do_page_cache_ra()").  The cause of the problem is that ra->size can be
    shrunk by read_pages() call and subsequently we end up calling
    do_page_cache_ra() with negative (read huge positive) number of pages. 
    Let's revert 7c87758 for now until we can find a proper way for the
    logic in read_pages() and page_cache_ra_order() to coexist.  This can
    lead to reduced readahead throughput due to readahead window confusion but
    that's better than outright hangs.
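
The "negative (read huge positive)" page count is ordinary unsigned wraparound. The sketch below only illustrates that arithmetic; both helpers are hypothetical, and the clamped variant is not the kernel's fix (the commit simply reverts the earlier change).

```c
/* Pages still to read, computed naively: wraps when done > size. */
unsigned long pages_left_naive(unsigned long size, unsigned long done)
{
    return size - done;
}

/* Same computation, clamped so that a shrunken window yields zero. */
unsigned long pages_left_clamped(unsigned long size, unsigned long done)
{
    return done < size ? size - done : 0;
}
```

If the window size shrinks to 4 after 6 pages have already been handled, the naive helper returns a value just below ULONG_MAX, whereas the clamped one returns 0.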
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 7c87758 ("readahead: properly shorten readahead when falling back to do_page_cache_ra()")
    Reported-by: Anders Blomdell <[email protected]>
    Reported-by: Philippe Troin <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Tested-by: Philippe Troin <[email protected]>
    Cc: Matthew Wilcox <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    jankara authored and akpm00 committed Dec 6, 2024
    a220d6b
  10. mm: fix vrealloc()'s KASAN poisoning logic

    When vrealloc() reuses already allocated vmap_area, we need to re-annotate
    poisoned and unpoisoned portions of underlying memory according to the new
    size.
    
    Not doing so results in a KASAN splat, recorded at [1]: KASAN reports an
    issue where there is none.
    
    Note, hard-coding KASAN_VMALLOC_PROT_NORMAL might not be exactly correct,
    but KASAN flag logic is pretty involved and spread out throughout
    __vmalloc_node_range_noprof(), so I'm using the bare minimum flag here and
    leaving the rest to mm people to refactor this logic and reuse it here.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/bpf/[email protected]/ [1]
    Fixes: 3ddc2fe ("mm: vmalloc: implement vrealloc()")
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Cc: Alexei Starovoitov <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Cc: Uladzislau Rezki (Sony) <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    anakryiko authored and akpm00 committed Dec 6, 2024
    d699440
  11. mm: open-code PageTail in folio_flags() and const_folio_flags()

    It is unsafe to call PageTail() in dump_page() as page_is_fake_head() will
    almost certainly return true when called on a head page that is copied to
    the stack.  That will cause the VM_BUG_ON_PGFLAGS() in const_folio_flags()
    to trigger when it shouldn't.  Fortunately, we don't need to call
    PageTail() here; it's fine to have a pointer to a virtual alias of the
    page's flag word rather than the real page's flag word.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: fae7d83 ("mm: add __dump_folio()")
    Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Matthew Wilcox (Oracle) authored and akpm00 committed Dec 6, 2024
    4de22b2
  12. mm: open-code page_folio() in dump_page()

    page_folio() calls page_fixed_fake_head() which will misidentify this page
    as being a fake head and load off the end of 'precise'.  We may have a
    pointer to a fake head, but that's OK because it contains the right
    information for dump_page().
    
    gcc-15 is smart enough to catch this with -Warray-bounds:
    
    In function 'page_fixed_fake_head',
        inlined from '_compound_head' at ../include/linux/page-flags.h:251:24,
        inlined from '__dump_page' at ../mm/debug.c:123:11:
    ../include/asm-generic/rwonce.h:44:26: warning: array subscript 9 is outside
    +array bounds of 'struct page[1]' [-Warray-bounds=]
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: fae7d83 ("mm: add __dump_folio()")
    Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
    Reported-by: Kees Cook <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Matthew Wilcox (Oracle) authored and akpm00 committed Dec 6, 2024
    6a7de1b
  13. stackdepot: fix stack_depot_save_flags() in NMI context

    Per documentation, stack_depot_save_flags() was meant to be usable from
    NMI context if STACK_DEPOT_FLAG_CAN_ALLOC is unset.  However, it still
    would try to take the pool_lock in an attempt to save a stack trace in the
    current pool (if space is available).
    
    This could result in deadlock if an NMI is handled while pool_lock is
    already held.  To avoid deadlock, only try to take the lock in NMI context
    and give up if unsuccessful.
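
The try-then-give-up policy can be sketched with a C11 atomic flag standing in for the lock. Everything here is an assumption made for illustration: `pool_lock` is modelled as an `atomic_flag`, and `save_trace()`, `saved_traces` and the `in_nmi` parameter are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag pool_lock = ATOMIC_FLAG_INIT;   /* stand-in for the real lock */
int saved_traces;                           /* protected by pool_lock */

/* Take the lock only if it is free right now. */
bool pool_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&pool_lock,
                                              memory_order_acquire);
}

void pool_unlock(void)
{
    atomic_flag_clear_explicit(&pool_lock, memory_order_release);
}

/*
 * Save a trace. In NMI-like context we must not wait for the lock,
 * because the interrupted code may be the one holding it.
 */
bool save_trace(bool in_nmi)
{
    if (in_nmi) {
        if (!pool_trylock())
            return false;           /* give up rather than deadlock */
    } else {
        while (!pool_trylock())
            ;                       /* ordinary context may spin */
    }
    saved_traces++;
    pool_unlock();
    return true;
}
```

The NMI path loses a trace when the lock is busy, which is the price paid for never spinning on a lock that the interrupted context cannot release.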
    
    The documentation is fixed to clearly convey this.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 4434a56 ("stackdepot: make fast paths lock-less again")
    Signed-off-by: Marco Elver <[email protected]>
    Reported-by: Sebastian Andrzej Siewior <[email protected]>
    Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
    Cc: Alexander Potapenko <[email protected]>
    Cc: Andrey Konovalov <[email protected]>
    Cc: Dmitry Vyukov <[email protected]>
    Cc: Oscar Salvador <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    melver authored and akpm00 committed Dec 6, 2024
    031e04b
  14. ocfs2: update seq_file index in ocfs2_dlm_seq_next

    The following INFO level message was seen:
    
    seq_file: buggy .next function ocfs2_dlm_seq_next [ocfs2] did not
    update position index
    
    Fix:
    Update *pos (and so m->index) to make seq_read_iter happy, though the index
    itself makes no sense to ocfs2_dlm_seq_next.
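
A minimal sketch of a .next-style callback that keeps the position index moving even though it navigates by pointer. `struct item`, `pos_t` and `item_seq_next()` are hypothetical and much simpler than the seq_file interface.

```c
#include <stddef.h>

typedef long long pos_t;            /* stand-in for loff_t */

/* Hypothetical list element walked by the iterator. */
struct item {
    int value;
    struct item *next;
};

/*
 * Return the element after cur. The iterator does not need the index
 * to find its way, but it still advances *pos on every call so that
 * the caller can tell progress was made.
 */
struct item *item_seq_next(struct item *cur, pos_t *pos)
{
    ++*pos;
    return cur ? cur->next : NULL;
}
```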
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Wengang Wang <[email protected]>
    Reviewed-by: Joseph Qi <[email protected]>
    Cc: Mark Fasheh <[email protected]>
    Cc: Joel Becker <[email protected]>
    Cc: Junxiao Bi <[email protected]>
    Cc: Changwei Ge <[email protected]>
    Cc: Jun Piao <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Wengang-oracle authored and akpm00 committed Dec 6, 2024
    914eec5
  15. mm/codetag: swap tags when migrate pages

    Current solution to adjust codetag references during page migration is
    done in 3 steps:
    
    1. sets the codetag reference of the old page as empty (not pointing
       to any codetag);
    
    2. subtracts counters of the new page to compensate for its own
       allocation;
    
    3. sets codetag reference of the new page to point to the codetag of
       the old page.
    
    This does not work if CONFIG_MEM_ALLOC_PROFILING_DEBUG=n because
    set_codetag_empty() becomes NOOP.  Instead, let's simply swap codetag
    references so that the new page is referencing the old codetag and the old
    page is referencing the new codetag.  This way accounting stays valid and
    the logic makes more sense.
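
Why swapping keeps the accounting valid can be seen in a toy model. The types and helpers below (`struct codetag` with a single counter, `struct page_ref`, `tag_alloc()`, `tag_free()`, `tag_swap()`) are hypothetical simplifications, not the kernel's data structures.

```c
#include <stddef.h>

/* Hypothetical per-call-site counter. */
struct codetag {
    long live_pages;
};

/* Hypothetical per-page reference to the tag that is charged for it. */
struct page_ref {
    struct codetag *tag;
};

/* Charge a page to a tag. */
void tag_alloc(struct page_ref *p, struct codetag *t)
{
    p->tag = t;
    t->live_pages++;
}

/* Uncharge a page from whichever tag it currently references. */
void tag_free(struct page_ref *p)
{
    if (p->tag) {
        p->tag->live_pages--;
        p->tag = NULL;
    }
}

/* Migration step: exchange the two references, touch no counters. */
void tag_swap(struct page_ref *a, struct page_ref *b)
{
    struct codetag *tmp = a->tag;

    a->tag = b->tag;
    b->tag = tmp;
}
```

After the swap, freeing the old page uncharges the tag of the migration allocation, so the original call site still accounts for exactly one live page, now the new one.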
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: e0a955b ("mm/codetag: add pgalloc_tag_copy()")
    Signed-off-by: David Wang <[email protected]>
    Closes: https://lore.kernel.org/lkml/[email protected]/
    Acked-by: Suren Baghdasaryan <[email protected]>
    Suggested-by: Suren Baghdasaryan <[email protected]>
    Acked-by: Yu Zhao <[email protected]>
    Cc: Kent Overstreet <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    zq-david-wang authored and akpm00 committed Dec 6, 2024
    51f43d5
  16. mm: memcg: declare do_memsw_account inline

    In commit 66d60c4 ("mm: memcg: move legacy memcg event code into
    memcontrol-v1.c"), the static do_memsw_account() function was moved from a
    .c file to a .h file.  Unfortunately, the traditional inline keyword
    wasn't added.  If a file (e.g., a unit test) includes the .h file, but
    doesn't refer to do_memsw_account(), it will get a warning like:
    
    mm/memcontrol-v1.h:41:13: warning: unused function 'do_memsw_account' [-Wunused-function]
       41 | static bool do_memsw_account(void)
          |             ^~~~~~~~~~~~~~~~
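
For comparison, a header-style helper declared with the inline keyword might look like the sketch below. The function name and its parameter are hypothetical; only the `static inline` form matters here.

```c
#include <stdbool.h>

/*
 * Imagine this definition lives in a header included by several .c
 * files. With plain "static", every includer that never calls it can
 * trigger -Wunused-function. Compilers generally stay quiet about an
 * unused "static inline" function that comes from a header.
 */
static inline bool account_memsw(bool on_default_hierarchy)
{
    return !on_default_hierarchy;
}
```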
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 66d60c4 ("mm: memcg: move legacy memcg event code into memcontrol-v1.c")
    Signed-off-by: John Sperbeck <[email protected]>
    Acked-by: Roman Gushchin <[email protected]>
    Cc: Johannes Weiner <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Cc: Muchun Song <[email protected]>
    Cc: Shakeel Butt <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    John Sperbeck authored and akpm00 committed Dec 6, 2024
    89dd878
  17. mm: respect mmap hint address when aligning for THP

    Commit efa7df3 ("mm: align larger anonymous mappings on THP
    boundaries") updated __get_unmapped_area() to align the start address for
    the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
    
    It does this by effectively looking up a region that is of size,
    request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
    
    Commit 4ef9ad1 ("mm: huge_memory: don't force huge page alignment on
    32 bit") opted out of this for 32bit due to regressions in mmap base
    randomization.
    
    Commit d4148ae ("mm, mmap: limit THP alignment of anonymous mappings
    to PMD-aligned sizes") restricted this to only mmap sizes that are
    multiples of the PMD_SIZE due to reported regressions in some performance
    benchmarks -- which seemed mostly due to the reduced spatial locality of
    related mappings due to the forced PMD-alignment.
    
    Another unintended side effect has emerged: When a user specifies an mmap
    hint address, the THP alignment logic modifies the behavior, potentially
    ignoring the hint even if a sufficiently large gap exists at the requested
    hint location.
    
    Example Scenario:
    
    Consider the following simplified virtual address (VA) space:
    
        ...
    
        0x200000-0x400000 --- VMA A
        0x400000-0x600000 --- Hole
        0x600000-0x800000 --- VMA B
    
        ...
    
    A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
    
      - Before THP alignment: The requested region (size 0x200000) fits into
        the gap at 0x400000, so the hint is respected.
    
      - After alignment: The logic searches for a region of size
        0x400000 (len + PMD_SIZE) starting at 0x400000.
        This search fails due to the mapping at 0x600000 (VMA B), and the hint
        is ignored, falling back to arch_get_unmapped_area[_topdown]().
    
    In general the hint is effectively ignored, if there is any existing
    mapping in the below range:
    
         [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
    
    This changes the semantics of mmap hint from "Respect the hint if a
    sufficiently large gap exists at the requested location" to "Respect the
    hint only if an additional PMD-sized gap exists beyond the requested
    size".
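
The example scenario above can be checked with a small model. The sketch assumes PMD_SIZE is 0x200000 (2 MiB), as implied by the example; `struct vma` here is a bare address range, and `range_is_free()`, `hint_fits()` and `hint_fits_with_thp_padding()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>

#define PMD_SIZE 0x200000UL         /* assumption: 2 MiB, as in the example */

/* Simplified VMA: the half-open range [start, end). */
struct vma {
    unsigned long start, end;
};

/* True if [start, start + len) overlaps none of the given VMAs. */
bool range_is_free(const struct vma *vmas, size_t n,
                   unsigned long start, unsigned long len)
{
    unsigned long end = start + len;

    for (size_t i = 0; i < n; i++)
        if (vmas[i].start < end && start < vmas[i].end)
            return false;
    return true;
}

/* Original semantics: the hint is usable if len bytes fit there. */
bool hint_fits(const struct vma *vmas, size_t n,
               unsigned long hint, unsigned long len)
{
    return range_is_free(vmas, n, hint, len);
}

/* With THP alignment: an extra PMD_SIZE must also be free. */
bool hint_fits_with_thp_padding(const struct vma *vmas, size_t n,
                                unsigned long hint, unsigned long len)
{
    return range_is_free(vmas, n, hint, len + PMD_SIZE);
}
```

With VMA A at 0x200000-0x400000 and VMA B at 0x600000-0x800000, a request for hint=0x400000 and len=0x200000 fits under the first check and fails under the second, because the padded range runs into VMA B.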
    
    This has performance implications for allocators that allocate their heap
    using mmap but try to keep it "as contiguous as possible" by using the end
    of the existing heap as the address hint.  With the new behavior it's
    more likely to get a much less contiguous heap, adding extra fragmentation
    and performance overhead.
    
    To restore the expected behavior, don't use
    thp_get_unmapped_area_vmflags() when the user provided a hint address, for
    anonymous mappings.
    
    Note: As Yang Shi pointed out, the issue still remains for filesystems
    which are using thp_get_unmapped_area() for their get_unmapped_area() op.
    It is unclear what workloads will regress if we ignore THP alignment
    when the hint address is provided for such file backed mappings -- so this
    fix will be handled separately.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: efa7df3 ("mm: align larger anonymous mappings on THP boundaries")
    Signed-off-by: Kalesh Singh <[email protected]>
    Reviewed-by: Rik van Riel <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Reviewed-by: David Hildenbrand <[email protected]>
    Cc: Kefeng Wang <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Cc: Yang Shi <[email protected]>
    Cc: Rik van Riel <[email protected]>
    Cc: Ryan Roberts <[email protected]>
    Cc: Suren Baghdasaryan <[email protected]>
    Cc: Minchan Kim <[email protected]>
    Cc: Hans Boehm <[email protected]>
    Cc: Lokesh Gidra <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Kalesh Singh authored and akpm00 committed Dec 6, 2024
    249608e
  18. mm: correct typo in MMAP_STATE() macro

    We mistakenly refer to len rather than len_ here.  The only existing
    caller passes len to the len_ parameter so this has no impact on the code,
    but it is obviously incorrect to do this, so fix it.
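
This class of bug is easy to reproduce in isolation. In the sketch below, `struct span` and both macros are hypothetical and much smaller than MMAP_STATE(); the point is only that a macro body naming `len` instead of `len_` silently picks up whatever `len` is in scope at the call site.

```c
/* Hypothetical result type filled in by the macros below. */
struct span {
    unsigned long addr, end;
};

/* Buggy: the body uses the caller's "len", not the "len_" argument. */
#define SPAN_BUGGY(name, addr_, len_) \
    struct span name = { .addr = (addr_), .end = (addr_) + len }

/* Fixed: the body uses the macro parameter. */
#define SPAN_FIXED(name, addr_, len_) \
    struct span name = { .addr = (addr_), .end = (addr_) + (len_) }

unsigned long span_end_buggy(unsigned long addr, unsigned long size)
{
    unsigned long len = 1;      /* unrelated local that gets captured */
    SPAN_BUGGY(s, addr, size);

    (void)size;                 /* the buggy macro never looks at it */
    return s.end;
}

unsigned long span_end_fixed(unsigned long addr, unsigned long size)
{
    unsigned long len = 1;      /* same local, now ignored by the macro */
    SPAN_FIXED(s, addr, size);

    (void)len;
    return s.end;
}
```

When the caller's variable happens to be named `len` and is also what it passes, as in the only existing caller mentioned above, both macros give the same result, which is why the typo had no visible effect.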
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Lorenzo Stoakes <[email protected]>
    Reviewed-by: Liam R. Howlett <[email protected]>
    Reviewed-by: Wei Yang <[email protected]>
    Cc: Jann Horn <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    lorenzo-stoakes authored and akpm00 committed Dec 6, 2024
    cbb70e4
  19. scatterlist: fix incorrect func name in kernel-doc

    Fix a kernel-doc warning by making the kernel-doc function description
    match the function name:
    
    include/linux/scatterlist.h:323: warning: expecting prototype for sg_unmark_bus_address(). Prototype was for sg_dma_unmark_bus_address() instead
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 4239930 ("lib/scatterlist: add flag for indicating P2PDMA segments in an SGL")
    Signed-off-by: Randy Dunlap <[email protected]>
    Cc: Logan Gunthorpe <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    rddunlap authored and akpm00 committed Dec 6, 2024
    d89c8ec
  20. mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
    
    The folio can get freed + buddy-merged + reallocated in the meantime,
    resulting in us calling folio_test_locked() possibly on a tail page.
    
    This makes const_folio_flags VM_BUG_ON_PGFLAGS() when stumbling over the
    tail page.
    
    Could this result in other issues?  Doesn't look like it.  False positives
    and false negatives don't really matter, because this folio would get
    skipped either way when detecting that they have been reallocated in the
    meantime.
    
    Fix it by performing the folio_test_locked() check after grabbing a
    reference.  If this ever becomes a real problem, we could add a special
    helper that racily checks if the bit is set even on tail pages ...  but
    let's hope that's not required so we can just handle it cleaner: work on
    the folio after we hold a reference.
    
    Do we really need the folio_test_locked() check if we are going to trylock
    briefly after?  Well, we can at least avoid a xas_reload().
    
    It's a bit unclear which exact change introduced that issue.  Likely, ever
    since we made PG_locked obey the PF_NO_TAIL policy it could have been
    triggered in some way.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 48c935a ("page-flags: define PG_locked behavior on compound pages")
    Signed-off-by: David Hildenbrand <[email protected]>
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/lkml/[email protected]/
    Acked-by: Kirill A. Shutemov <[email protected]>
    Cc: "Matthew Wilcox (Oracle)" <[email protected]>
    Cc: Hillf Danton <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    davidhildenbrand authored and akpm00 committed Dec 6, 2024
    3203b3a
  21. lib: stackinit: hide never-taken branch from compiler

    The never-taken branch leads to an invalid bounds condition, which is by
    design. To avoid the unwanted warning from the compiler, hide the
    variable from the optimizer.
    
    ../lib/stackinit_kunit.c: In function 'do_nothing_u16_zero':
    ../lib/stackinit_kunit.c:51:49: error: array subscript 1 is outside array bounds of 'u16[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds=]
       51 | #define DO_NOTHING_RETURN_SCALAR(ptr)           *(ptr)
          |                                                 ^~~~~~
    ../lib/stackinit_kunit.c:219:24: note: in expansion of macro 'DO_NOTHING_RETURN_SCALAR'
      219 |                 return DO_NOTHING_RETURN_ ## which(ptr + 1);    \
          |                        ^~~~~~~~~~~~~~~~~~
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Kees Cook <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    kees authored and akpm00 committed Dec 6, 2024
    5c37936
  22. mm/damon: fix order of arguments in damos_before_apply tracepoint

    Since the order of the scheme_idx and target_idx arguments in TP_ARGS is
    reversed, they are stored in the trace record in reverse.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://patch.msgid.link/[email protected]
    Fixes: c603c63 ("mm/damon/core: add a tracepoint for damos apply target regions")
    Signed-off-by: Akinobu Mita <[email protected]>
    Signed-off-by: SeongJae Park <[email protected]>
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Steven Rostedt <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    mita authored and akpm00 committed Dec 6, 2024
    6535b86
  23. sched/numa: fix memory leak due to the overwritten vma->numab_state

    [Problem Description]
    When running the hackbench program of LTP, the following memory leak is
    reported by kmemleak.
    
      # /opt/ltp/testcases/bin/hackbench 20 thread 1000
      Running with 20*40 (== 800) tasks.
    
      # dmesg | grep kmemleak
      ...
      kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
    
      # cat /sys/kernel/debug/kmemleak
      unreferenced object 0xffff888cd8ca2c40 (size 64):
        comm "hackbench", pid 17142, jiffies 4299780315
        hex dump (first 32 bytes):
          ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00  .tI.....L.I.....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace (crc bff18fd4):
          [<ffffffff81419a89>] __kmalloc_cache_noprof+0x2f9/0x3f0
          [<ffffffff8113f715>] task_numa_work+0x725/0xa00
          [<ffffffff8110f878>] task_work_run+0x58/0x90
          [<ffffffff81ddd9f8>] syscall_exit_to_user_mode+0x1c8/0x1e0
          [<ffffffff81dd78d5>] do_syscall_64+0x85/0x150
          [<ffffffff81e0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
      ...
    
    This issue can be consistently reproduced on three different servers:
      * a 448-core server
      * a 256-core server
      * a 192-core server
    
    [Root Cause]
    Since multiple threads are created by the hackbench program (along with
    the command argument 'thread'), a shared vma might be accessed by two or
    more cores simultaneously. When two or more cores observe that
    vma->numab_state is NULL at the same time, vma->numab_state will be
    overwritten.
    
    Although the current code ensures that only one thread scans the VMAs
    in a single 'numa_scan_period', another thread may enter in the next
    'numa_scan_period' before the first thread has reached the numab_state
    allocation [1].
    
    Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
    cannot reproduce the issue. This was verified with 200+ test runs.
    
    [Solution]
    Use the cmpxchg atomic operation to ensure that only one thread executes
    the vma->numab_state assignment.
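A minimal userspace sketch of the publish-once pattern, assuming C11 atomics in place of the kernel's cmpxchg(). The `struct vma` and `struct numab_state` below are hypothetical stand-ins, and `vma_numab_state_get()` is an illustrative helper, not a kernel function.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct numab_state {
	unsigned long next_scan;
};

struct vma {
	_Atomic(struct numab_state *) numab_state;
};

/*
 * Allocate the state and publish it only if the slot is still NULL.
 * A thread that loses the race frees its own allocation instead of
 * overwriting (and thereby leaking) the winner's object.
 * Returns the published state, or NULL if the allocation failed.
 */
struct numab_state *vma_numab_state_get(struct vma *vma)
{
	struct numab_state *cur = atomic_load(&vma->numab_state);
	struct numab_state *new;

	if (cur)
		return cur;

	new = calloc(1, sizeof(*new));
	if (!new)
		return NULL;

	cur = NULL;
	if (atomic_compare_exchange_strong(&vma->numab_state, &cur, new))
		return new;	/* we won: our object is now published */

	free(new);		/* we lost: cur holds the winner's pointer */
	return cur;
}
```

With a plain "check for NULL, then assign" sequence, two threads can both pass the check and the second store silently drops the first allocation. The compare-and-exchange makes the check and the store a single step.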
    
    [1] https://lore.kernel.org/lkml/[email protected]/
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: ef6a22b ("sched/numa: apply the scan delay to every new vma")
    Signed-off-by: Adrian Huang <[email protected]>
    Reported-by: Jiwei Sun <[email protected]>
    Reviewed-by: Raghavendra K T <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Cc: Ben Segall <[email protected]>
    Cc: Dietmar Eggemann <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Juri Lelli <[email protected]>
    Cc: Mel Gorman <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Steven Rostedt <[email protected]>
    Cc: Valentin Schneider <[email protected]>
    Cc: Vincent Guittot <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Adrian Huang authored and akpm00 committed Dec 6, 2024
    5f1b64e
  24. iio: magnetometer: yas530: use signed integer type for clamp limits

    In the function yas537_measure() there is a clamp_val() with limits of
    -BIT(13) and BIT(13) - 1.  The input clamp value h[] is of type s32.  The
    BIT() is of type unsigned long integer due to its definition in
    include/vdso/bits.h.  The lower limit -BIT(13) is recognized as -8192 but
    expressed as an unsigned long integer.  The size of an unsigned long
    integer differs between 32-bit and 64-bit architectures.  Converting this
    to type s32 may lead to undesired behavior.
    
    Additionally, in the calculation lines h[0], h[1] and h[2] the unsigned
    long integer divisor BIT(13) causes an unsigned division, shifting the
    left-hand side of the equation back and forth, possibly ending up in large
    positive values instead of negative values on 32-bit architectures.
    
    To solve those two issues, declare a signed integer with a value of
    BIT(13).
    
    There is another omission in the clamp line: clamp_val() returns a value
    and it's going nowhere here.  Self-assign it to h[i] to make use of the
    clamp macro.
    
    Finally, replace the clamp_val() macro with clamp(), which is fine now
    that the limits are signed integers rather than unsigned long integers.
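The arithmetic can be illustrated with a small standalone sketch. It assumes `uint32_t` as a model of a 32-bit `unsigned long`, and the helper names `scale_unsigned`, `scale_signed` and `clamp_s32` are hypothetical; none of this is the driver's code.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <stdint.h>

/* Models BIT(13) on a 32-bit architecture, where unsigned long is 32 bits. */
#define BIT13_UNSIGNED ((uint32_t)1 << 13)

/* The divisor is unsigned, so x is converted to unsigned before dividing. */
int32_t scale_unsigned(int32_t x)
{
	return (int32_t)(x / BIT13_UNSIGNED);
}

/* A signed limit keeps the whole expression in signed arithmetic. */
int32_t scale_signed(int32_t x)
{
	const int32_t limit = 1 << 13;

	return x / limit;
}

/* Clamp helper: it returns the clamped value, so the caller must store it. */
int32_t clamp_s32(int32_t v, int32_t lo, int32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}
```

For an input of -16384, the unsigned variant yields 524286 while the signed variant yields -2. The clamp only takes effect when its result is assigned back, as in `h = clamp_s32(h, -8192, 8191);`.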
    
    Link: https://lkml.kernel.org/r/11609b2243c295d65ab4d47e78c239d61ad6be75.1732914810.git.jahau@rocketmail.com
    Fixes: 65f79b5 ("iio: magnetometer: yas530: Add YAS537 variant")
    Signed-off-by: Jakob Hauser <[email protected]>
    
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Reviewed-by: David Laight <[email protected]>
    Acked-by: Jonathan Cameron <[email protected]>
    Cc: Lars-Peter Clausen <[email protected]>
    Cc: Linus Walleij <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Jakko3 authored and akpm00 committed Dec 6, 2024
    f1ee548
  25. x86/kexec: Restore GDT on return from ::preserve_context kexec

    The restore_processor_state() function explicitly states that "the asm code
    that gets us here will have restored a usable GDT". That wasn't true in the
    case of returning from a ::preserve_context kexec. Make it so.
    
    Without this, the kernel was depending on the called function to reload a
    GDT which is appropriate for the kernel before returning.
    
    Test program:
    
     #include <unistd.h>
     #include <errno.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <linux/kexec.h>
     #include <linux/reboot.h>
     #include <sys/reboot.h>
     #include <sys/syscall.h>
    
     int main (void)
     {
    	struct kexec_segment segment = {};
    	unsigned char purgatory[] = {
    		0x66, 0xba, 0xf8, 0x03,	// mov $0x3f8, %dx
    		0xb0, 0x42,		// mov $0x42, %al
    		0xee,			// outb %al, (%dx)
    		0xc3,			// ret
    	};
    	int ret;
    
    	segment.buf = &purgatory;
    	segment.bufsz = sizeof(purgatory);
    	segment.mem = (void *)0x400000;
    	segment.memsz = 0x1000;
    	ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
    	if (ret) {
    		perror("kexec_load");
    		exit(1);
    	}
    
    	ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
    	if (ret) {
    		perror("kexec reboot");
    		exit(1);
    	}
    	printf("Success\n");
    	return 0;
     }
    
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    dwmw2 authored and Ingo Molnar committed Dec 6, 2024
    07fa619
  26. cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU

    Commit
    
      5944ce0 ("arch_topology: Build cacheinfo from primary CPU")
    
    adds functionality that architectures can use to optionally allocate and
    build cacheinfo early during boot. Commit
    
      6539cff ("cacheinfo: Add arch specific early level initializer")
    
    lets secondary CPUs correct (and reallocate memory) cacheinfo data if
    needed.
    
    If the early build functionality is not used and cacheinfo does not need
    correction, memory for cacheinfo is never allocated. x86 does not use
    the early build functionality. Consequently, during the cacheinfo CPU
    hotplug callback, last_level_cache_is_valid() attempts to dereference
    a NULL pointer:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000100
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not present page
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1
      RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
    
    Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback
    if not done earlier.
    
    Moreover, before determining the validity of the last-level cache info,
    ensure that it has been allocated. Simply checking for non-zero
    cache_leaves() is not sufficient, as some architectures (e.g., Intel
    processors) have non-zero cache_leaves() before allocation.
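The ordering of the two checks can be sketched as follows. This is a simplified model under stated assumptions: `cache_leaf`, `cpu_cache_model`, the `id_valid` field and `llc_is_valid()` are hypothetical and do not reproduce the kernel's structures or its validity test.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-leaf data. */
struct cache_leaf {
	unsigned int level;
	bool id_valid;
};

/* Hypothetical per-CPU cache description. */
struct cpu_cache_model {
	struct cache_leaf *info_list;	/* NULL until allocated */
	unsigned int num_leaves;	/* may be non-zero before allocation */
};

/*
 * The last leaf may only be inspected once the array exists. A non-zero
 * leaf count alone does not prove that info_list has been allocated.
 */
bool llc_is_valid(const struct cpu_cache_model *ci)
{
	if (!ci->info_list || !ci->num_leaves)
		return false;

	return ci->info_list[ci->num_leaves - 1].id_valid;
}
```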
    
    Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size().
    This function iterates over all online CPUs. However, a CPU may have come
    online recently, but its cacheinfo may not have been allocated yet.
    
    While here, remove an unnecessary indentation in allocate_cache_info().
    
      [ bp: Massage. ]
    
    Fixes: 6539cff ("cacheinfo: Add arch specific early level initializer")
    Signed-off-by: Ricardo Neri <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Reviewed-by: Radu Rendec <[email protected]>
    Reviewed-by: Nikolay Borisov <[email protected]>
    Reviewed-by: Andreas Herrmann <[email protected]>
    Reviewed-by: Sudeep Holla <[email protected]>
    Cc: [email protected] # 6.3+
    Link: https://lore.kernel.org/r/[email protected]
    ricardon authored and bp3tk0v committed Dec 6, 2024
    b3fce42
  27. x86/cacheinfo: Delete global num_cache_leaves

    Linux remembers cpu_cacheinfo::num_leaves per CPU, but x86 initializes all
    CPUs from the same global "num_cache_leaves".
    
    This is erroneous on systems such as Meteor Lake, where each CPU has a
    distinct num_leaves value. Delete the global "num_cache_leaves" and
    initialize num_leaves on each CPU.
    
    init_cache_level() no longer needs to set num_leaves. Also, it never had to
    set num_levels, as that is unnecessary on x86. Keep checking for zero cache
    leaves. Such a condition indicates a bug.
    
      [ bp: Cleanup. ]
    
    Signed-off-by: Ricardo Neri <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Cc: [email protected] # 6.3+
    Link: https://lore.kernel.org/r/[email protected]
    ricardon authored and bp3tk0v committed Dec 6, 2024
    9677be0
  28. btrfs: properly wait for writeback before buffered write

    [BUG]
    Before commit e820dbe ("btrfs: convert btrfs_buffered_write() to
    use folios"), function prepare_one_folio() always waited for folio
    writeback to finish before returning the folio.
    
    However, commit e820dbe ("btrfs: convert btrfs_buffered_write() to
    use folios") switched to FGP_STABLE to do the writeback wait.
    FGP_STABLE calls folio_wait_stable(), which only calls
    folio_wait_writeback() if the address space has AS_STABLE_WRITES, which
    is not set for btrfs inodes.
    
    This means we will not wait for the folio writeback at all.
    
    [CAUSE]
    The cause is that FGP_STABLE does not wait for writeback unconditionally,
    but only for address spaces with AS_STABLE_WRITES. Normally that flag is
    set when the super block has the SB_I_STABLE_WRITES flag.
    
    That super block flag is set when the block device has hardware digest
    support or has an internal checksum requirement.
    
    I'd argue btrfs should set that super block flag due to its default data
    checksum behavior, but it is not set yet, so the FGP_STABLE flag has no
    effect at all.
    
    (For NODATASUM inodes, we can skip the waiting in theory but that should
    be an optimization in the future.)
    
    This can lead to data checksum mismatches: we can modify the folio
    while it is still under writeback, which makes the contents differ
    from the contents at submission and checksum calculation.
    
    [FIX]
    Instead of fully relying on FGP_STABLE, manually do the folio writeback
    waiting, until we set the address space or super flag.
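The difference between the conditional and the unconditional wait can be modelled in a few lines. This is a toy model only: `folio_model` and the `model_*` helpers are hypothetical, "waiting" is reduced to clearing a flag, and the real page-cache functions are not called.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <stdbool.h>

/* Hypothetical model of the two bits of state that matter here. */
struct folio_model {
	bool under_writeback;	/* writeback still in flight */
	bool stable_writes;	/* models AS_STABLE_WRITES on the mapping */
};

/* Models an unconditional writeback wait. */
void model_wait_writeback(struct folio_model *f)
{
	f->under_writeback = false;
}

/* Models the stable-pages wait: a no-op unless the mapping asks for it. */
void model_wait_stable(struct folio_model *f)
{
	if (f->stable_writes)
		model_wait_writeback(f);
}

/* Models the fixed path: keep the stable wait, then wait explicitly. */
void model_prepare_folio(struct folio_model *f)
{
	model_wait_stable(f);
	model_wait_writeback(f);
}
```

In this model a folio with `stable_writes == false` is still under writeback after `model_wait_stable()`, which corresponds to the window in which the folio could be modified after its checksum was calculated.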
    
    Fixes: e820dbe ("btrfs: convert btrfs_buffered_write() to use folios")
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Dec 6, 2024
    c83d77e
  29. btrfs: handle bio_split() errors

    Commit e546fe1 ("block: Rework bio_split() return value") changed
    bio_split() so that it can return errors.
    
    Add error handling for it in btrfs_split_bio() and ultimately
    btrfs_submit_chunk(). As the bio is not submitted, the bio counter must
    be decremented to pair btrfs_bio_counter_inc_blocked().
    
    Reviewed-by: John Garry <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Dec 6, 2024
    c7c97ce
  30. btrfs: flush delalloc workers queue before stopping cleaner kthread during unmount

    During the unmount path, at close_ctree(), we first stop the cleaner
    kthread, using kthread_stop() which frees the associated task_struct, and
    then stop and destroy all the work queues. However after we stopped the
    cleaner we may still have a worker from the delalloc_workers queue running
    inode.c:submit_compressed_extents(), which calls btrfs_add_delayed_iput(),
    which in turn tries to wake up the cleaner kthread - which was already
    destroyed before, resulting in a use-after-free on the task_struct.
    
    Syzbot reported this with the following stack traces:
    
      BUG: KASAN: slab-use-after-free in __lock_acquire+0x78/0x2100 kernel/locking/lockdep.c:5089
      Read of size 8 at addr ffff8880259d2818 by task kworker/u8:3/52
    
      CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.13.0-rc1-syzkaller-00002-gcdd30ebb1b9f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
      Workqueue: btrfs-delalloc btrfs_work_helper
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:94 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
       print_address_description mm/kasan/report.c:378 [inline]
       print_report+0x169/0x550 mm/kasan/report.c:489
       kasan_report+0x143/0x180 mm/kasan/report.c:602
       __lock_acquire+0x78/0x2100 kernel/locking/lockdep.c:5089
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
       class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline]
       try_to_wake_up+0xc2/0x1470 kernel/sched/core.c:4205
       submit_compressed_extents+0xdf/0x16e0 fs/btrfs/inode.c:1615
       run_ordered_work fs/btrfs/async-thread.c:288 [inline]
       btrfs_work_helper+0x96f/0xc40 fs/btrfs/async-thread.c:324
       process_one_work kernel/workqueue.c:3229 [inline]
       process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310
       worker_thread+0x870/0xd30 kernel/workqueue.c:3391
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       </TASK>
    
      Allocated by task 2:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       unpoison_slab_object mm/kasan/common.c:319 [inline]
       __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345
       kasan_slab_alloc include/linux/kasan.h:250 [inline]
       slab_post_alloc_hook mm/slub.c:4104 [inline]
       slab_alloc_node mm/slub.c:4153 [inline]
       kmem_cache_alloc_node_noprof+0x1d9/0x380 mm/slub.c:4205
       alloc_task_struct_node kernel/fork.c:180 [inline]
       dup_task_struct+0x57/0x8c0 kernel/fork.c:1113
       copy_process+0x5d1/0x3d50 kernel/fork.c:2225
       kernel_clone+0x223/0x870 kernel/fork.c:2807
       kernel_thread+0x1bc/0x240 kernel/fork.c:2869
       create_kthread kernel/kthread.c:412 [inline]
       kthreadd+0x60d/0x810 kernel/kthread.c:767
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
      Freed by task 24:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
       poison_slab_object mm/kasan/common.c:247 [inline]
       __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
       kasan_slab_free include/linux/kasan.h:233 [inline]
       slab_free_hook mm/slub.c:2338 [inline]
       slab_free mm/slub.c:4598 [inline]
       kmem_cache_free+0x195/0x410 mm/slub.c:4700
       put_task_struct include/linux/sched/task.h:144 [inline]
       delayed_put_task_struct+0x125/0x300 kernel/exit.c:227
       rcu_do_batch kernel/rcu/tree.c:2567 [inline]
       rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
       handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:554
       run_ksoftirqd+0xca/0x130 kernel/softirq.c:943
       smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
      Last potentially related work creation:
       kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
       __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:544
       __call_rcu_common kernel/rcu/tree.c:3086 [inline]
       call_rcu+0x167/0xa70 kernel/rcu/tree.c:3190
       context_switch kernel/sched/core.c:5372 [inline]
       __schedule+0x1803/0x4be0 kernel/sched/core.c:6756
       __schedule_loop kernel/sched/core.c:6833 [inline]
       schedule+0x14b/0x320 kernel/sched/core.c:6848
       schedule_timeout+0xb0/0x290 kernel/time/sleep_timeout.c:75
       do_wait_for_common kernel/sched/completion.c:95 [inline]
       __wait_for_common kernel/sched/completion.c:116 [inline]
       wait_for_common kernel/sched/completion.c:127 [inline]
       wait_for_completion+0x355/0x620 kernel/sched/completion.c:148
       kthread_stop+0x19e/0x640 kernel/kthread.c:712
       close_ctree+0x524/0xd60 fs/btrfs/disk-io.c:4328
       generic_shutdown_super+0x139/0x2d0 fs/super.c:642
       kill_anon_super+0x3b/0x70 fs/super.c:1237
       btrfs_kill_super+0x41/0x50 fs/btrfs/super.c:2112
       deactivate_locked_super+0xc4/0x130 fs/super.c:473
       cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373
       task_work_run+0x24f/0x310 kernel/task_work.c:239
       ptrace_notify+0x2d2/0x380 kernel/signal.c:2503
       ptrace_report_syscall include/linux/ptrace.h:415 [inline]
       ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline]
       syscall_exit_work+0xc7/0x1d0 kernel/entry/common.c:173
       syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline]
       __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
       syscall_exit_to_user_mode+0x24a/0x340 kernel/entry/common.c:218
       do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
      The buggy address belongs to the object at ffff8880259d1e00
       which belongs to the cache task_struct of size 7424
      The buggy address is located 2584 bytes inside of
       freed 7424-byte region [ffff8880259d1e00, ffff8880259d3b00)
    
      The buggy address belongs to the physical page:
      page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x259d0
      head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      memcg:ffff88802f4b56c1
      flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
      page_type: f5(slab)
      raw: 00fff00000000040 ffff88801bafe500 dead000000000100 dead000000000122
      raw: 0000000000000000 0000000000040004 00000001f5000000 ffff88802f4b56c1
      head: 00fff00000000040 ffff88801bafe500 dead000000000100 dead000000000122
      head: 0000000000000000 0000000000040004 00000001f5000000 ffff88802f4b56c1
      head: 00fff00000000003 ffffea0000967401 ffffffffffffffff 0000000000000000
      head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 12, tgid 12 (kworker/u8:1), ts 7328037942, free_ts 0
       set_page_owner include/linux/page_owner.h:32 [inline]
       post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1556
       prep_new_page mm/page_alloc.c:1564 [inline]
       get_page_from_freelist+0x3651/0x37a0 mm/page_alloc.c:3474
       __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4751
       alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
       alloc_slab_page+0x6a/0x140 mm/slub.c:2408
       allocate_slab+0x5a/0x2f0 mm/slub.c:2574
       new_slab mm/slub.c:2627 [inline]
       ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3815
       __slab_alloc+0x58/0xa0 mm/slub.c:3905
       __slab_alloc_node mm/slub.c:3980 [inline]
       slab_alloc_node mm/slub.c:4141 [inline]
       kmem_cache_alloc_node_noprof+0x269/0x380 mm/slub.c:4205
       alloc_task_struct_node kernel/fork.c:180 [inline]
       dup_task_struct+0x57/0x8c0 kernel/fork.c:1113
       copy_process+0x5d1/0x3d50 kernel/fork.c:2225
       kernel_clone+0x223/0x870 kernel/fork.c:2807
       user_mode_thread+0x132/0x1a0 kernel/fork.c:2885
       call_usermodehelper_exec_work+0x5c/0x230 kernel/umh.c:171
       process_one_work kernel/workqueue.c:3229 [inline]
       process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310
       worker_thread+0x870/0xd30 kernel/workqueue.c:3391
      page_owner free stack trace missing
    
      Memory state around the buggy address:
       ffff8880259d2700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880259d2780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880259d2800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                  ^
       ffff8880259d2880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880259d2900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
    
    Fix this by flushing the delalloc workers queue before stopping the
    cleaner kthread.
    
    Reported-by: [email protected]
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Dec 6, 2024
    f10bef7
  31. smb3.1.1: fix posix mounts to older servers

    Some servers which implement the SMB3.1.1 POSIX extensions did not
    set the file type in the mode in the infolevel 100 response.
    With the recent changes for checking the file type via the mode field,
    this can cause the root directory to be reported incorrectly and
    mounts (e.g. to ksmbd) to fail.
    
    Fixes: 6a832bc ("fs/smb/client: Implement new SMB3 POSIX type")
    Cc: [email protected]
    Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
    Cc: Ralph Boehme <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Steve French committed Dec 6, 2024
    ddca502
  32. smb: client: fix potential race in cifs_put_tcon()

    The dfs_cache_refresh() delayed worker could race with cifs_put_tcon(), so
    make sure to call list_replace_init() on @tcon->dfs_ses_list after the
    kworker is cancelled or finished.
    
    Fixes: 4f42a8b ("smb: client: fix DFS interlink failover")
    Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    pcacjr authored and Steve French committed Dec 6, 2024
    c32b624
  33. blk-mq: register cpuhp callback after hctx is added to xarray table

    We need to retrieve 'hctx' from xarray table in the cpuhp callback, so the
    callback should be registered after this 'hctx' is added to xarray table.
    
    Cc: Reinette Chatre <[email protected]>
    Cc: Fenghua Yu <[email protected]>
    Cc: Peter Newman <[email protected]>
    Cc: Babu Moger <[email protected]>
    Cc: Luck Tony <[email protected]>
    Signed-off-by: Ming Lei <[email protected]>
    Tested-by: Tony Luck <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Ming Lei authored and axboe committed Dec 6, 2024
    4bf485a
  34. blk-mq: move cpuhp callback registering out of q->sysfs_lock

    Registering and unregistering a cpuhp callback requires the global CPU
    hotplug lock, which is used everywhere. Meanwhile, q->sysfs_lock is used
    almost everywhere in the block layer.
    
    It is easy to trigger a lockdep warning [1] by connecting the two locks.
    
    Fix the warning by moving blk-mq's cpuhp callback registering out of
    q->sysfs_lock. Add one dedicated global lock for covering registering &
    unregistering hctx's cpuhp, and it is safe to do so because hctx is
    guaranteed to be live if our request_queue is live.
    
    [1] https://lore.kernel.org/lkml/Z04pz3AlvI4o0Mr8@agluck-desk3/
    
    Cc: Reinette Chatre <[email protected]>
    Cc: Fenghua Yu <[email protected]>
    Cc: Peter Newman <[email protected]>
    Cc: Babu Moger <[email protected]>
    Reported-by: Luck Tony <[email protected]>
    Signed-off-by: Ming Lei <[email protected]>
    Tested-by: Tony Luck <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Ming Lei authored and axboe committed Dec 6, 2024
    22465bb
  35. bpf: Remove unnecessary check when updating LPM trie

    When "node->prefixlen == matchlen" is true, it means that the node is
    fully matched. If "node->prefixlen == key->prefixlen" is false, it means
    the prefix length of key is greater than the prefix length of node;
    otherwise, matchlen would not be equal to node->prefixlen. It also
    implies that the prefix length of node must be less than
    max_prefixlen.
    
    Therefore, "node->prefixlen == trie->max_prefixlen" will always be false
    when the check of "node->prefixlen == key->prefixlen" returns false.
    Remove this unnecessary comparison.
    
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Acked-by: Daniel Borkmann <[email protected]>
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    156c977
  36. bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem

    There is no need to call kfree(im_node) when updating element fails,
    because im_node must be NULL. Remove the unnecessary kfree() for
    im_node.
    
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Acked-by: Daniel Borkmann <[email protected]>
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    3d5611b
  37. bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie

    Add the currently missing handling for the BPF_EXIST and BPF_NOEXIST
    flags. These flags can be specified by users and are relevant since LPM
    trie supports exact matches during update.
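The flag semantics can be sketched as a standalone check. The constants below carry the same numeric values as the BPF UAPI flags BPF_ANY, BPF_NOEXIST and BPF_EXIST but are redefined with an `SK_` prefix so the sketch builds without kernel headers; `check_update_flags()` is a hypothetical helper, not the function added by this commit.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <errno.h>
#include <stdbool.h>

/* Same values as the UAPI update flags, redefined for a standalone build. */
enum {
	SK_BPF_ANY	= 0,	/* create a new element or update an existing one */
	SK_BPF_NOEXIST	= 1,	/* create a new element only if it does not exist */
	SK_BPF_EXIST	= 2,	/* update an existing element only */
};

/*
 * Decide whether an update may proceed, given whether the lookup found
 * an exact match for the key. Returns 0 or a negative errno value.
 */
int check_update_flags(bool exact_match_found, unsigned long flags)
{
	if (flags > SK_BPF_EXIST)
		return -EINVAL;
	if (exact_match_found && flags == SK_BPF_NOEXIST)
		return -EEXIST;
	if (!exact_match_found && flags == SK_BPF_EXIST)
		return -ENOENT;
	return 0;
}
```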
    
    Fixes: b95a5c4 ("bpf: add a longest prefix match trie map implementation")
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Acked-by: Daniel Borkmann <[email protected]>
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    eae6a07
  38. bpf: Handle in-place update for full LPM trie correctly

    When an LPM trie is full, in-place updates of existing elements
    incorrectly return -ENOSPC.
    
    Fix this by deferring the check of trie->n_entries. For new insertions,
    n_entries must not exceed max_entries. However, in-place updates are
    allowed even when the trie is full.
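The deferred capacity check can be sketched with a flat array standing in for the trie. This is not the LPM trie implementation: `small_map`, `MAX_ENTRIES` and `small_map_update()` are hypothetical, and only the order of the two steps (look for an existing element first, check capacity second) is the point.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <errno.h>
#include <stddef.h>

#define MAX_ENTRIES 2

struct entry {
	unsigned int key;
	int value;
};

/* Hypothetical fixed-capacity map; a flat array replaces the trie. */
struct small_map {
	struct entry slots[MAX_ENTRIES];
	size_t n_entries;
};

int small_map_update(struct small_map *m, unsigned int key, int value)
{
	size_t i;

	/* Look for an existing element first: replacing it needs no new slot. */
	for (i = 0; i < m->n_entries; i++) {
		if (m->slots[i].key == key) {
			m->slots[i].value = value;
			return 0;
		}
	}

	/* Only a genuinely new insertion is subject to the capacity check. */
	if (m->n_entries == MAX_ENTRIES)
		return -ENOSPC;

	m->slots[m->n_entries].key = key;
	m->slots[m->n_entries].value = value;
	m->n_entries++;
	return 0;
}
```

If the capacity check ran before the lookup, updating key 1 in a full map would fail with -ENOSPC even though no new slot is needed.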
    
    Fixes: b95a5c4 ("bpf: add a longest prefix match trie map implementation")
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    532d6b3
  39. bpf: Fix exact match conditions in trie_get_next_key()

    trie_get_next_key() uses node->prefixlen == key->prefixlen to identify
    an exact match. However, this is incorrect: when the target key
    doesn't fully match the found node (e.g., node->prefixlen != matchlen),
    the two may still have the same prefixlen. The function returns the
    expected result when the passed key exists in the trie. However, when a
    recently-deleted key or nonexistent key is passed to
    trie_get_next_key(), it may skip keys and return an incorrect result.
    
    Fix it by using node->prefixlen == matchlen to identify exact matches.
    When the condition is true after the search, it also implies
    node->prefixlen equals key->prefixlen, otherwise, the search would
    return NULL instead.
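The two conditions can be compared on 32-bit keys with a small sketch. `lpm_key`, `longest_match()`, `exact_by_prefixlen()` and `exact_by_matchlen()` are hypothetical names, the key data is a single 32-bit word rather than a byte array, and the trie search itself is left out, so the sketch only compares the two tests on a node and a key of equal prefix length.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key: prefix length in bits plus 32 bits of data. */
struct lpm_key {
	uint32_t prefixlen;
	uint32_t data;
};

/* Number of leading bits shared by node and key, capped at the shorter prefix. */
uint32_t longest_match(const struct lpm_key *node, const struct lpm_key *key)
{
	uint32_t limit = node->prefixlen < key->prefixlen ?
			 node->prefixlen : key->prefixlen;
	uint32_t diff = node->data ^ key->data;
	uint32_t len = 0;

	while (len < limit && !(diff & (UINT32_C(1) << (31 - len))))
		len++;

	return len;
}

/* Old condition: equal prefix lengths, regardless of the bits themselves. */
bool exact_by_prefixlen(const struct lpm_key *node, const struct lpm_key *key)
{
	return node->prefixlen == key->prefixlen;
}

/* New condition: every bit of the node's prefix matches the key. */
bool exact_by_matchlen(const struct lpm_key *node, const struct lpm_key *key)
{
	return node->prefixlen == longest_match(node, key);
}
```

For a node 192.168.0.0/16 (`0xC0A80000`) and a key 10.1.0.0/16 (`0x0A010000`), the old condition reports an exact match although not a single leading bit agrees, while the matchlen-based condition rejects it.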
    
    Fixes: b471f2f ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    27abc7b
  40. bpf: Switch to bpf mem allocator for LPM trie

    Multiple syzbot warnings have been reported. These warnings are mainly
    about the lock order between trie->lock and kmalloc()'s internal lock.
    See report [1] as an example:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
    ------------------------------------------------------
    syz.3.2069/15008 is trying to acquire lock:
    ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...
    
    but task is already holding lock:
    ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 (&trie->lock){-.-.}-{2:2}:
           __raw_spin_lock_irqsave
           _raw_spin_lock_irqsave+0x3a/0x60
           trie_delete_elem+0xb0/0x820
           ___bpf_prog_run+0x3e51/0xabd0
           __bpf_prog_run32+0xc1/0x100
           bpf_dispatcher_nop_func
           ......
           bpf_trace_run2+0x231/0x590
           __bpf_trace_contention_end+0xca/0x110
           trace_contention_end.constprop.0+0xea/0x170
           __pv_queued_spin_lock_slowpath+0x28e/0xcc0
           pv_queued_spin_lock_slowpath
           queued_spin_lock_slowpath
           queued_spin_lock
           do_raw_spin_lock+0x210/0x2c0
           __raw_spin_lock_irqsave
           _raw_spin_lock_irqsave+0x42/0x60
           __put_partials+0xc3/0x170
           qlink_free
           qlist_free_all+0x4e/0x140
           kasan_quarantine_reduce+0x192/0x1e0
           __kasan_slab_alloc+0x69/0x90
           kasan_slab_alloc
           slab_post_alloc_hook
           slab_alloc_node
           kmem_cache_alloc_node_noprof+0x153/0x310
           __alloc_skb+0x2b1/0x380
           ......
    
    -> #0 (&n->list_lock){-.-.}-{2:2}:
           check_prev_add
           check_prevs_add
           validate_chain
           __lock_acquire+0x2478/0x3b30
           lock_acquire
           lock_acquire+0x1b1/0x560
           __raw_spin_lock_irqsave
           _raw_spin_lock_irqsave+0x3a/0x60
           get_partial_node.part.0+0x20/0x350
           get_partial_node
           get_partial
           ___slab_alloc+0x65b/0x1870
           __slab_alloc.constprop.0+0x56/0xb0
           __slab_alloc_node
           slab_alloc_node
           __do_kmalloc_node
           __kmalloc_node_noprof+0x35c/0x440
           kmalloc_node_noprof
           bpf_map_kmalloc_node+0x98/0x4a0
           lpm_trie_node_alloc
           trie_update_elem+0x1ef/0xe00
           bpf_map_update_value+0x2c1/0x6c0
           map_update_elem+0x623/0x910
           __sys_bpf+0x90c/0x49a0
           ...
    
    other info that might help us debug this:
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock(&trie->lock);
                                   lock(&n->list_lock);
                                   lock(&trie->lock);
      lock(&n->list_lock);
    
     *** DEADLOCK ***
    
    [1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7
    
    A bpf program attached to trace_contention_end() triggers after
    acquiring &n->list_lock. The program invokes trie_delete_elem(), which
    then acquires trie->lock. However, it is possible that another
    process is invoking trie_update_elem(). trie_update_elem() will acquire
    trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke
    get_partial_node() and try to acquire &n->list_lock (not necessarily the
    same lock object). Therefore, lockdep warns about the circular locking
    dependency.
    
    Invoking kmalloc() before acquiring trie->lock could fix the warning.
    However, since BPF programs can be invoked from any context (e.g.,
    through kprobe/tracepoint/fentry), there may still be lock ordering
    problems for internal locks in kmalloc() or trie->lock itself.
    
    To eliminate these potential lock ordering problems with kmalloc()'s
    internal locks, replace kmalloc()/kfree()/kfree_rcu() with the equivalent
    BPF memory allocator APIs, which can be invoked in any context. The lock
    ordering problems with trie->lock (e.g., reentrance) will be handled
    separately.
    
    Three aspects of this change require explanation:
    
    1. Intermediate and leaf nodes are allocated from the same allocator.
    Since the value size of an LPM trie is usually small, using a single
    allocator reduces the memory overhead of the BPF memory allocator.
    
    2. Leaf nodes are allocated before disabling IRQs. This handles cases
    where leaf_size is large (e.g., > 4KB - 8) and updates require
    intermediate node allocation. If leaf nodes were allocated in the
    IRQ-disabled region, the free objects in the BPF memory allocator would
    not be refilled in time and the intermediate node allocation might fail.
    
    3. Paired migrate_{disable|enable}() calls for node alloc and free. The
    BPF memory allocator uses per-CPU structs internally, so these paired
    calls are necessary to guarantee correctness.
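    
    The "allocate before locking, free after unlocking" ordering from point 2 can be sketched as a userspace analogy. This is not kernel code: malloc()/free() and a pthread mutex stand in for the BPF memory allocator and trie->lock, and toy_map, toy_node and toy_update are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical list-based map used only to show the ordering. */
struct toy_node {
	struct toy_node *next;
	int key;
	int value;
};

struct toy_map {
	pthread_mutex_t lock;
	struct toy_node *head;
};

static int toy_update(struct toy_map *map, int key, int value)
{
	struct toy_node *new_node, *n, *unused = NULL;

	/* Allocate up front, while no lock is held. */
	new_node = malloc(sizeof(*new_node));
	if (!new_node)
		return -ENOMEM;
	new_node->key = key;
	new_node->value = value;
	new_node->next = NULL;

	pthread_mutex_lock(&map->lock);
	for (n = map->head; n; n = n->next)
		if (n->key == key)
			break;
	if (n) {
		n->value = value;	/* in-place update */
		unused = new_node;	/* defer the free until after unlock */
	} else {
		new_node->next = map->head;
		map->head = new_node;
	}
	pthread_mutex_unlock(&map->lock);

	free(unused);	/* free(NULL) is a no-op */
	return 0;
}
```

    The critical section never calls into the allocator, which is the property the commit aims for with respect to trie->lock.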
    
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    3d8dc43
  41. bpf: Use raw_spinlock_t for LPM trie

    After switching from kmalloc() to the bpf memory allocator, there will be
    no blocking operation during the update of LPM trie. Therefore, change
    trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in
    atomic context, even on RT kernels.
    
    The max value of prefixlen is 2048. Therefore, update or deletion
    operations will find the target after at most 2048 comparisons.
    In a test case that updates an element after 2048 comparisons on an
    8-CPU VM, the average and maximal times for such an update operation
    are about 210us and 900us, respectively.
    
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    6a5c63d
  42. selftests/bpf: Move test_lpm_map.c to map_tests

    Move test_lpm_map.c to map_tests/ to include LPM trie test cases in
    regular test_maps run. Most code remains unchanged, including the use of
    assert(). Only reduce n_lookups from 64K to 512, which decreases
    test_lpm_map runtime from 37s to 0.7s.
    
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    3e18f5f
  43. selftests/bpf: Add more test cases for LPM trie

    Add more test cases for LPM trie in test_maps:
    
    1) test_lpm_trie_update_flags
    It constructs various use cases for BPF_EXIST and BPF_NOEXIST and checks
    whether the return value of the update operation is as expected.
    
    2) test_lpm_trie_update_full_maps
    It tests update operations on a full LPM trie map. Adding a new node
    will fail, and overwriting the value of an existing node will succeed.
    
    3) test_lpm_trie_iterate_strs and test_lpm_trie_iterate_ints
    These two test cases check whether the iteration through get_next_key is
    sorted and as expected. They delete the minimal key after each iteration
    and check whether the next iteration returns the second minimal key. The
    only difference between them is that the former saves strings in the LPM
    trie and the latter saves integers. Without the fix of get_next_key,
    these two cases will fail as shown below:
      test_lpm_trie_iterate_strs(1091):FAIL:iterate #2 got abc exp abS
      test_lpm_trie_iterate_ints(1142):FAIL:iterate #1 got 0x2 exp 0x1
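    
    The flag semantics exercised by test_lpm_trie_update_flags can be summarized in a short userspace sketch. The toy_* names are hypothetical; the numeric values mirror the BPF UAPI constants BPF_ANY, BPF_NOEXIST and BPF_EXIST, and the return codes follow the usual BPF map update convention.

```c
#include <errno.h>
#include <stdbool.h>

/* Values mirror BPF_ANY (0), BPF_NOEXIST (1) and BPF_EXIST (2). */
enum toy_flags {
	TOY_ANY = 0,
	TOY_NOEXIST = 1,
	TOY_EXIST = 2,
};

/*
 * Decide whether an update may proceed, given whether the key is already
 * present. Returns 0 if allowed, -EEXIST or -ENOENT otherwise.
 */
static int toy_check_flags(bool key_exists, enum toy_flags flags)
{
	if (flags == TOY_NOEXIST && key_exists)
		return -EEXIST;	/* create-only, but the key is there */
	if (flags == TOY_EXIST && !key_exists)
		return -ENOENT;	/* update-only, but the key is missing */
	return 0;
}
```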
    
    Signed-off-by: Hou Tao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Hou Tao authored and Alexei Starovoitov committed Dec 6, 2024
    04d4ce9
  44. Merge branch 'fixes-for-lpm-trie'

    Hou Tao says:
    
    ====================
    This patch set fixes several issues for LPM trie. These issues were
    found while adding new test cases or were reported by syzbot.
    
    The patch set is structured as follows:
    
    Patch #1~#2 are clean-ups for lpm_trie_update_elem().
    Patch #3 handles BPF_EXIST and BPF_NOEXIST correctly for LPM trie.
    Patch #4 fixes the accounting of n_entries when doing in-place update.
    Patch #5 fixes the exact match condition in trie_get_next_key() and it
    may skip keys when the passed key is not found in the map.
    Patch #6~#7 switch from kmalloc() to bpf memory allocator for LPM trie
    to fix several lock order warnings reported by syzbot. It also enables
    raw_spinlock_t for LPM trie again. After these changes, the LPM trie will
    be closer to being usable in any context (though the reentrance check of
    trie->lock is still missing, but it is on my todo list).
    Patch #8: move test_lpm_map to map_tests to make it run regularly.
    Patch #9: add test cases for the issues fixed by patch #3~#5.
    
    Please see individual patches for more details. Comments are always
    welcome.
    
    Change Log:
    v3:
      * patch #2: remove the unnecessary NULL-init for im_node
      * patch #6: alloc the leaf node before disabling IRQ to lower
        the possibility of -ENOMEM when leaf_size is large; free
        these nodes outside the trie lock (Suggested by Alexei)
      * collect review and ack tags (Thanks to Toke & Daniel)
    
    v2: https://lore.kernel.org/bpf/[email protected]/
      * collect review tags (Thanks to Toke)
      * drop "Add bpf_mem_cache_is_mergeable() helper" patch
      * patch #3~#4: add fix tag
      * patch #4: rename the helper to trie_check_add_elem() and increase
        n_entries in it.
      * patch #6: use one bpf mem allocator and update commit message to
        clarify that using bpf mem allocator is more appropriate.
      * patch #7: update commit message to add the possible max running time
        for update operation.
      * patch #9: update commit message to specify the purpose of these test
        cases.
    
    v1: https://lore.kernel.org/bpf/[email protected]/
    ====================
    
    Link: https://lore.kernel.org/all/[email protected]/
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Alexei Starovoitov committed Dec 6, 2024
    509df67
  45. x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR…

    … fails
    
    When ensuring EFER.AUTOIBRS is set, WARN only on a negative return code
    from msr_set_bit(), as '1' is used to indicate the WRMSR was successful
    ('0' indicates the MSR bit was already set).
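    
    The return convention described above can be illustrated with a userspace sketch. toy_set_bit() and toy_warn_needed() are hypothetical stand-ins, not the kernel's msr_set_bit(); an out-of-range bit number is used here as an assumed failure case in place of a failing WRMSR.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Set one bit in *reg.
 * Returns 1 if the bit was written, 0 if it was already set,
 * and a negative errno value on failure.
 */
static int toy_set_bit(uint64_t *reg, unsigned int bit)
{
	if (bit > 63)
		return -EINVAL;		/* assumed stand-in for a failed write */
	if (*reg & (1ULL << bit))
		return 0;		/* nothing to do */
	*reg |= 1ULL << bit;
	return 1;			/* write happened and succeeded */
}

/* Warn only on failure; a plain "ret != 0" test would also fire on 1. */
static bool toy_warn_needed(int ret)
{
	return ret < 0;
}
```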
    
    Fixes: 8cc68c9 ("x86/CPU/AMD: Make sure EFER[AIBRSE] is set")
    Reported-by: Nathan Chancellor <[email protected]>
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Closes: https://lore.kernel.org/all/20241205220604.GA2054199@thelio-3990X
    sean-jc authored and Ingo Molnar committed Dec 6, 2024
    4920776
  46. Merge tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/ulfh/linux-pm
    
    Pull pmdomain fixes from Ulf Hansson:
     "Core:
       - Fix a couple of memory-leaks during genpd init/remove
    
      Providers:
       - imx: Adjust delay for gpcv2 to fix power up handshake
       - mediatek: Fix DT bindings by adding another nested power-domain
         layer"
    
    * tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
      pmdomain: imx: gpcv2: Adjust delay after power up handshake
      pmdomain: core: Fix error path in pm_genpd_init() when ida alloc fails
      pmdomain: core: Add missing put_device()
      dt-bindings: power: mediatek: Add another nested power-domain layer
    torvalds committed Dec 6, 2024
    fa4c221
  47. Merge tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kerne…

    …l/git/ulfh/mmc
    
    Pull MMC fixes from Ulf Hansson:
     "Core:
       - Further prevent card detect during shutdown
    
      Host drivers:
       - sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10
         tablet"
    
    * tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
      mmc: core: Further prevent card detect during shutdown
      mmc: sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10 tablet
    torvalds committed Dec 6, 2024
    35b7b33
  48. Merge tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/broonie/spi
    
    Pull spi fixes from Mark Brown:
     "A few small driver specific fixes and device ID updates for SPI.
    
      The Apple change flags the driver as being compatible with the core's
      GPIO chip select support, fixing support for some systems"
    
    * tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
      spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
      spi: intel: Add Panther Lake SPI controller support
      spi: apple: Set use_gpio_descriptors to true
      spi: mpc52xx: Add cancel_work_sync before module remove
    torvalds committed Dec 6, 2024
    b60500e
  49. Merge tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linu…

    …x/kernel/git/broonie/regmap
    
    Pull regmap fixes from Mark Brown:
     "A couple of small fixes, fixing an incorrect format specifier in a log
      message and adding missing cleanup of the devres data used to support
      dev_get_regmap() when a device is unregistered"
    
    * tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
      regmap: detach regmap from dev on regmap_exit
      regmap: Use correct format specifier for logging range errors
    torvalds committed Dec 6, 2024
    d9e15b2
  50. Merge tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/tiwai/sound
    
    Pull sound fixes from Takashi Iwai:
     "A collection of small fixes that have been gathered in the week.
    
       - Fix the missing XRUN handling in USB-audio low latency mode
    
       - Fix regression by the previous USB-audio hardening change
    
       - Clean up old SH sound driver to use the standard helpers
    
       - A few further fixes for MIDI 2.0 UMP handling
    
       - Various HD-audio and USB-audio quirks
    
       - Fix jack handling at PM on ASoC Intel AVS
    
       - Misc small fixes for ASoC SOF and Mediatek"
    
    * tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
      ALSA: hda/realtek: Fix spelling mistake "Firelfy" -> "Firefly"
      ASoC: mediatek: mt8188-mt6359: Remove hardcoded dmic codec
      ALSA: hda/realtek: fix micmute LEDs don't work on HP Laptops
      ALSA: usb-audio: Add extra PID for RME Digiface USB
      ALSA: usb-audio: Fix a DMA to stack memory bug
      ASoC: SOF: ipc3-topology: fix resource leaks in sof_ipc3_widget_setup_comp_dai()
      ALSA: hda/realtek: Add support for Samsung Galaxy Book3 360 (NP730QFG)
      ASoC: Intel: avs: da7219: Remove suspend_pre() and resume_post()
      ALSA: hda/tas2781: Fix error code tas2781_read_acpi()
      ALSA: hda/realtek: Enable mute and micmute LED on HP ProBook 430 G8
      ALSA: usb-audio: add mixer mapping for Corsair HS80
      ALSA: ump: Shut up truncated string warning
      ALSA: sh: Use standard helper for buffer accesses
      ALSA: usb-audio: Notify xrun for low-latency mode
      ALSA: hda/conexant: fix Z60MR100 startup pop issue
      ALSA: ump: Update legacy substream names upon FB info update
      ALSA: ump: Indicate the inactive group in legacy substream names
      ALSA: ump: Don't open legacy substream for an inactive group
      ALSA: seq: ump: Fix seq port updates per FB info notify
    torvalds committed Dec 6, 2024
    2b90dcd
  51. Merge tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/dr…

    …m/kernel
    
    Pull drm fixes from Dave Airlie:
     "Pretty quiet week which is probably expected after US holidays, the
      dma-fence and displayport MST message handling fixes make up the bulk
      of this, along with a couple of minor xe and other driver fixes.
    
      dma-fence:
       - Fix reference leak on fence-merge failure path
       - Simplify fence merging with kernel's sort()
       - Fix dma_fence_array_signaled() to ensure forward progress
    
      dp_mst:
       - Fix MST sideband message body length check
       - Fix a bunch of locking/state handling with DP MST msgs
    
      sti:
       - Add __iomem for mixer_dbg_mxn()'s parameter
    
      xe:
       - Missing init value and 64-bit write-order check
       - Fix a memory allocation issue causing lockdep violation
    
      v3d:
       - Performance counter fix"
    
    * tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel:
      drm/v3d: Enable Performance Counters before clearing them
      drm/dp_mst: Use reset_msg_rx_state() instead of open coding it
      drm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()
      drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
      drm/dp_mst: Fix down request message timeout handling
      drm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()
      drm/dp_mst: Verify request type in the corresponding down message reply
      drm/dp_mst: Fix resetting msg rx state after topology removal
      drm/xe: Move the coredump registration to the worker thread
      drm/xe/guc: Fix missing init value and add register order check
      drm/sti: Add __iomem for mixer_dbg_mxn's parameter
      drm/dp_mst: Fix MST sideband message body length check
      dma-buf: fix dma_fence_array_signaled v4
      dma-fence: Use kernel's sort for merging fences
      dma-fence: Fix reference leak on fence merge failure path
    torvalds committed Dec 6, 2024
    9a6e8c7
  52. Merge tag 'amd-drm-fixes-6.13-2024-12-04' of https://gitlab.freedeskt…

    …op.org/agd5f/linux into drm-fixes
    
    amd-drm-fixes-6.13-2024-12-04:
    
    amdgpu:
    - Jpeg work handler fix for VCN 1.0
    - HDP flush fixes
    - ACPI EDID sanity check
    - OLED panel backlight fix
    - DC YCbCr fix
    - DC Detile buffer size debugging
    - DC prefetch calculation fix
    - DC VTotal handling fix
    - DC HBlank fix
    - ISP fix
    - SR-IOV fix
    - Workload profile fixes
    - DCN 4.0.1 resume fix
    
    amdkfd:
    - GC 12.x fix
    - GC 9.4.x fix
    
    Signed-off-by: Simona Vetter <[email protected]>
    From: Alex Deucher <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    danvet committed Dec 6, 2024
    1995e7d
  53. Merge tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/dr…

    …m/kernel
    
    Pull more drm fixes from Simona Vetter:
     "Due to mailing list unreliability we missed the amdgpu pull, hence
      part two with that now included:
    
       - amdgpu: mostly display fixes + jpeg vcn 1.0, sriov, dcn4.0 resume
         fixes
    
       - amdkfd fixes"
    
    * tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel:
      drm/amdgpu: rework resume handling for display (v2)
      drm/amd/pm: fix and simplify workload handling
      Revert "drm/amd/pm: correct the workload setting"
      drm/amdgpu: fix sriov reinit late orders
      drm/amdgpu: Fix ISP hw init issue
      drm/amd/display: Add hblank borrowing support
      drm/amd/display: Limit VTotal range to max hw cap minus fp
      drm/amd/display: Correct prefetch calculation
      drm/amd/display: Add option to retrieve detile buffer size
      drm/amd/display: Add a left edge pixel if in YCbCr422 or YCbCr420 and odm
      drm/amdkfd: hard-code cacheline for gc943,gc944
      drm/amdkfd: add MEC version that supports no PCIe atomics for GFX12
      drm/amd/display: Fix programming backlight on OLED panels
      drm/amd: Sanity check the ACPI EDID
      drm/amdgpu/hdp7.0: do a posting read when flushing HDP
      drm/amdgpu/hdp6.0: do a posting read when flushing HDP
      drm/amdgpu/hdp5.2: do a posting read when flushing HDP
      drm/amdgpu/hdp5.0: do a posting read when flushing HDP
      drm/amdgpu/hdp4.0: do a posting read when flushing HDP
      drm/amdgpu/jpeg1.0: fix idle work handler
    torvalds committed Dec 6, 2024
    c7cde62
  54. Merge tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/rppt/memblock
    
    Pull memblock fixes from Mike Rapoport:
     "Restore check for node validity in arch_numa.
    
      The rework of NUMA initialization in arch_numa dropped a check that
      refused to accept configurations with invalid node IDs.
    
      Restore that check to ensure that when firmware passes invalid nodes,
      such configuration is rejected and kernel gracefully falls back to
      dummy NUMA"
    
    * tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
      arch_numa: Restore nid checks before registering a memblock with a node
      memblock: allow zero threshold in validate_numa_converage()
    torvalds committed Dec 6, 2024
    ddfc146
  55. Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/arm64/linux
    
    Pull arm64 fixes from Catalin Marinas:
     "Nothing major, some left-overs from the recent merging window (MTE,
      coco) and some newly found issues like the ptrace() ones.
    
       - MTE/hugetlbfs:
    
          - Set VM_MTE_ALLOWED in the arch code and remove it from the core
            code for hugetlbfs mappings
    
          - Fix copy_highpage() warning when the source is a huge page but
            not MTE tagged, taking the wrong small page path
    
       - drivers/virt/coco:
    
          - Add the pKVM and Arm CCA drivers under the arm64 maintainership
    
          - Fix the pkvm driver to fall back to ioremap() (and warn) if the
            MMIO_GUARD hypercall fails
    
          - Keep the Arm CCA driver default 'n' rather than 'm'
    
       - A series of fixes for the arm64 ptrace() implementation,
         potentially leading to the kernel consuming uninitialised stack
         variables when PTRACE_SETREGSET is invoked with a length of 0
    
       - Fix zone_dma_limit calculation when RAM starts below 4GB and
         ZONE_DMA is capped to this limit
    
       - Fix early boot warning with CONFIG_DEBUG_VIRTUAL=y triggered by a
         call to page_to_phys() (from patch_map()) which checks pfn_valid()
         before vmemmap has been set up
    
       - Do not clobber bits 15:8 of the ASID used for TTBR1_EL1 and TLBI
         ops when the kernel assumes 8-bit ASIDs but running under a
         hypervisor on a system that implements 16-bit ASIDs (found running
         Linux under Parallels on Apple M4)
    
       - ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A as it
         is using the same SMMU PMCG as HIP09 and suffers from the same
         errata
    
       - Add GCS to cpucap_is_possible(), missed in the recent merge"
    
    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
      arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS
      arm64: ptrace: fix partial SETREGSET for NT_ARM_POE
      arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR
      arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL
      arm64: cpufeature: Add GCS to cpucap_is_possible()
      coco: virt: arm64: Do not enable cca guest driver by default
      arm64: mte: Fix copy_highpage() warning on hugetlb folios
      arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDs
      ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
      MAINTAINERS: Add CCA and pKVM CoCO guest support to the ARM64 entry
      drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails
      arm64: patching: avoid early page_to_phys()
      arm64: mm: Fix zone_dma_limit calculation
      arm64: mte: set VM_MTE_ALLOWED for hugetlbfs at correct place
    torvalds committed Dec 6, 2024
    f3ddc43
  56. Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    …t/bpf/bpf
    
    Pull bpf fixes from Daniel Borkmann:
    
     - Fix several issues for BPF LPM trie map which were found by syzbot
       and during addition of new test cases (Hou Tao)
    
     - Fix a missing process_iter_arg register type check in the BPF
       verifier (Kumar Kartikeya Dwivedi, Tao Lyu)
    
     - Fix several correctness gaps in the BPF verifier when interacting
       with the BPF stack without CAP_PERFMON (Kumar Kartikeya Dwivedi,
       Eduard Zingerman, Tao Lyu)
    
     - Fix OOB BPF map writes when deleting elements for the case of xsk map
       as well as devmap (Maciej Fijalkowski)
    
     - Fix xsk sockets to always clear DMA mapping information when
       unmapping the pool (Larysa Zaremba)
    
     - Fix sk_mem_uncharge logic in tcp_bpf_sendmsg to only uncharge after
       sent bytes have been finalized (Zijian Zhang)
    
     - Fix BPF sockmap with vsocks which was missing a queue check in poll
       and sockmap cleanup on close (Michal Luczaj)
    
     - Fix tools infra to override makefile ARCH variable if defined but
       empty, which addresses cross-building tools. (Björn Töpel)
    
     - Fix two resolve_btfids build warnings on unresolved bpf_lsm symbols
       (Thomas Weißschuh)
    
     - Fix a NULL pointer dereference in bpftool (Amir Mohammadi)
    
     - Fix BPF selftests to check for CONFIG_PREEMPTION instead of
       CONFIG_PREEMPT (Sebastian Andrzej Siewior)
    
    * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (31 commits)
      selftests/bpf: Add more test cases for LPM trie
      selftests/bpf: Move test_lpm_map.c to map_tests
      bpf: Use raw_spinlock_t for LPM trie
      bpf: Switch to bpf mem allocator for LPM trie
      bpf: Fix exact match conditions in trie_get_next_key()
      bpf: Handle in-place update for full LPM trie correctly
      bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
      bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
      bpf: Remove unnecessary check when updating LPM trie
      selftests/bpf: Add test for narrow spill into 64-bit spilled scalar
      selftests/bpf: Add test for reading from STACK_INVALID slots
      selftests/bpf: Introduce __caps_unpriv annotation for tests
      bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots
      bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc
      samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS
      bpf: Zero index arg error string for dynptr and iter
      selftests/bpf: Add tests for iter arg check
      bpf: Ensure reg is PTR_TO_STACK in process_iter_arg
      tools: Override makefile ARCH variable if defined, but empty
      selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap
      ...
    torvalds committed Dec 6, 2024
    b5f2170

Commits on Dec 7, 2024

  1. headers/cleanup.h: Remove the if_not_guard() facility

    Linus noticed that the new if_not_guard() definition is fragile:
    
       "This macro generates actively wrong code if it happens to be inside an
        if-statement or a loop without a block.
    
        IOW, code like this:
    
          for (iterate-over-something)
              if_not_guard(a)
                  return -BUSY;
    
        looks like will build fine, but will generate completely incorrect code."
    
    The reason is that the __if_not_guard() macro is multi-statement: most
    kernel developers expect macros to be simple or at least compound
    statements, but __if_not_guard() is neither:
    
     #define __if_not_guard(_name, _id, args...)            \
            BUILD_BUG_ON(!__is_cond_ptr(_name));            \
            CLASS(_name, _id)(args);                        \
            if (!__guard_ptr(_name)(&_id))
    
    To add insult to injury, the placement of the BUILD_BUG_ON() line makes
    the macro appear to compile fine, but it will generate incorrect code
    as Linus reported, for example if used within iteration or conditional
    statements that will use the first statement of a macro as a loop body
    or conditional statement body.
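    
    The failure mode can be reproduced in userspace with a toy macro of the same shape, one ordinary statement followed by a dangling 'if'. TOY_IF_NOT_READY, toy_wait() and the checks counter are hypothetical and only mimic the structure of __if_not_guard(); they do not use the kernel's guard infrastructure.

```c
/* Counts how often the macro's first statement runs. */
static int checks;

/* Multi-statement macro: a plain statement, then an 'if' with no body. */
#define TOY_IF_NOT_READY(ready)	\
	checks++;		\
	if (!(ready))

/*
 * Looks as if each iteration checks 'ready' and bails out with -1.
 * After expansion only 'checks++;' is the loop body; the 'if' runs
 * once, after the loop has finished.
 */
static int toy_wait(int ready, int tries)
{
	checks = 0;
	for (int i = 0; i < tries; i++)
		TOY_IF_NOT_READY(ready)
			return -1;
	return 0;
}
```

    Calling toy_wait(0, 0) returns -1 even though the loop body never runs, which would not happen if the macro expanded to a single statement.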
    
    [ I'd also like to note that the original submission by David Lechner did
      not contain the BUILD_BUG_ON() line, so it was safer than what we ended
      up committing. Mea culpa. ]
    
    It doesn't appear to be possible to turn this macro into a robust
    single or compound statement that could be used in single statements,
    due to the necessity to define an auto scope variable with an open
    scope and the necessity of it having to expand to a partial 'if'
    statement with no body.
    
    Instead of trying to work around this fragility, just remove the
    construct before it gets used.
    
    Reported-by: Linus Torvalds <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: David Lechner <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Ingo Molnar committed Dec 7, 2024
    b4d83c8
  2. Merge tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/rw/ubifs
    
    Pull jffs2 fix from Richard Weinberger:
    
     - Fixup rtime compressor bounds checking
    
    * tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
      jffs2: Fix rtime decompressor
    torvalds committed Dec 7, 2024
    a6db2a5
  3. Merge tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux

    Pull io_uring fix from Jens Axboe:
     "A single fix for a parameter type which affects 32-bit"
    
    * tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux:
      io_uring: Change res2 parameter type in io_uring_cmd_done
    torvalds committed Dec 7, 2024
    aa0274d
  4. Merge tag 'block-6.13-20241207' of git://git.kernel.dk/linux

    Pull block fixes from Jens Axboe:
    
     - NVMe pull request via Keith:
          - Target fix using incorrect zero buffer (Nilay)
          - Device-specific deallocate quirk fixes (Christoph, Keith)
          - Fabrics fix for handling max command target bugs (Maurizio)
          - Cocci fix usage for kzalloc (Yu-Chen)
          - DMA size fix for host memory buffer feature (Christoph)
          - Fabrics queue cleanup fixes (Chunguang)
    
     - CPU hotplug ordering fixes
    
     - Add missing MODULE_DESCRIPTION for rnull
    
     - bcache error value fix
    
     - virtio-blk queue freeze fix
    
    * tag 'block-6.13-20241207' of git://git.kernel.dk/linux:
      blk-mq: move cpuhp callback registering out of q->sysfs_lock
      blk-mq: register cpuhp callback after hctx is added to xarray table
      virtio-blk: don't keep queue frozen during system suspend
      nvme-tcp: simplify nvme_tcp_teardown_io_queues()
      nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
      nvme-rdma: unquiesce admin_q before destroy it
      nvme-tcp: fix the memleak while create new ctrl failed
      nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary
      nvmet: replace kmalloc + memset with kzalloc for data allocation
      nvme-fabrics: handle zero MAXCMD without closing the connection
      bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again
      nvme-pci: remove two deallocate zeroes quirks
      block: rnull: add missing MODULE_DESCRIPTION
      nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
      nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()
    torvalds committed Dec 7, 2024
    7503345

Commits on Dec 8, 2024

  1. Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/g…

    …it/jejb/scsi
    
    Pull SCSI fixes from James Bottomley:
     "Large number of small fixes, all in drivers"
    
    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
      scsi: scsi_debug: Fix hrtimer support for ndelay
      scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
      scsi: ufs: core: Add missing post notify for power mode change
      scsi: sg: Fix slab-use-after-free read in sg_release()
      scsi: ufs: core: sysfs: Prevent div by zero
      scsi: qla2xxx: Update version to 10.02.09.400-k
      scsi: qla2xxx: Supported speed displayed incorrectly for VPorts
      scsi: qla2xxx: Fix NVMe and NPIV connect issue
      scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cnt
      scsi: qla2xxx: Fix use after free on unload
      scsi: qla2xxx: Fix abort in bsg timeout
      scsi: mpi3mr: Update driver version to 8.12.0.3.50
      scsi: mpi3mr: Handling of fault code for insufficient power
      scsi: mpi3mr: Start controller indexing from 0
      scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs
      scsi: mpi3mr: Synchronize access to ioctl data buffer
      scsi: mpt3sas: Update driver version to 51.100.00.00
      scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
      scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove()
      scsi: ufs: pltfrm: Drop PM runtime reference count after ufshcd_remove()
      ...
    torvalds committed Dec 8, 2024
    c94cd02
  2. Merge tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench…

    …/cifs-2.6
    
    Pull smb client fixes from Steve French:
    
     - DFS fix (for race with tree disconnect and dfs cache worker)
    
     - Four fixes for SMB3.1.1 posix extensions:
          - improve special file support e.g. to Samba, retrieving the file
            type earlier
          - reduce roundtrips (e.g. on ls -l, in some cases)
    
    * tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
      smb: client: fix potential race in cifs_put_tcon()
      smb3.1.1: fix posix mounts to older servers
      fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse points
      fs/smb/client: Implement new SMB3 POSIX type
      fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX
    torvalds committed Dec 8, 2024
    62b5a46
  3. modpost: Add .irqentry.text to OTHER_SECTIONS

    The compiler can fully inline the actual handler function of an interrupt
    entry into the .irqentry.text entry point. If such a function contains an
    access which has an exception table entry, modpost complains about a
    section mismatch:
    
      WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...
    
      The relocation at __ex_table+0x447c references section ".irqentry.text"
      which is not in the list of authorized sections.
    
    Add .irqentry.text to OTHER_SECTIONS to cure the issue.
    
    Reported-by: Sergey Senozhatsky <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: [email protected] # needed for linux-5.4-y
    Link: https://lore.kernel.org/all/[email protected]/
    Signed-off-by: Masahiro Yamada <[email protected]>
    KAGA-KOKO authored and masahir0y committed Dec 8, 2024
    7912405
  4. kbuild: deb-pkg: fix build error with O=

    Since commit 13b2548 ("kbuild: change working directory to external
    module directory with M="), the Debian package build fails if a relative
    path is specified with the O= option.
    
      $ make O=build bindeb-pkg
        [ snip ]
      dpkg-deb: building package 'linux-image-6.13.0-rc1' in '../linux-image-6.13.0-rc1_6.13.0-rc1-6_amd64.deb'.
      Rebuilding host programs with x86_64-linux-gnu-gcc...
      make[6]: Entering directory '/home/masahiro/linux/build'
      /home/masahiro/linux/Makefile:190: *** specified kernel directory "build" does not exist.  Stop.
    
    This occurs because the sub_make_done flag is cleared, even though the
    working directory is already in the output directory.
    
    Passing KBUILD_OUTPUT=. resolves the issue.
    
    Fixes: 13b2548 ("kbuild: change working directory to external module directory with M=")
    Reported-by: Charlie Jenkins <[email protected]>
    Closes: https://lore.kernel.org/all/Z1DnP-GJcfseyrM3@ghost/
    Tested-by: Charlie Jenkins <[email protected]>
    Reviewed-by: Charlie Jenkins <[email protected]>
    Signed-off-by: Masahiro Yamada <[email protected]>
    masahir0y committed Dec 8, 2024
    d8d326d
  5. tracing/eprobe: Fix to release eprobe when failed to add dyn_event

    Fix the eprobe event code to unregister the event call and release the
    eprobe when adding the dynamic event fails.
    
    Link: https://lore.kernel.org/all/173289886698.73724.1959899350183686006.stgit@devnote2/
    
    Fixes: 7491e2c ("tracing: Add a probe that attaches to trace events")
    Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
    mhiramat committed Dec 8, 2024
    494b332
  6. Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.o…

    …rg/pub/scm/linux/kernel/git/akpm/mm
    
    Pull misc fixes from Andrew Morton:
     "24 hotfixes.  17 are cc:stable.  15 are MM and 9 are non-MM.
    
      The usual bunch of singletons - please see the relevant changelogs for
      details"
    
    * tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits)
      iio: magnetometer: yas530: use signed integer type for clamp limits
      sched/numa: fix memory leak due to the overwritten vma->numab_state
      mm/damon: fix order of arguments in damos_before_apply tracepoint
      lib: stackinit: hide never-taken branch from compiler
      mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
      scatterlist: fix incorrect func name in kernel-doc
      mm: correct typo in MMAP_STATE() macro
      mm: respect mmap hint address when aligning for THP
      mm: memcg: declare do_memsw_account inline
      mm/codetag: swap tags when migrate pages
      ocfs2: update seq_file index in ocfs2_dlm_seq_next
      stackdepot: fix stack_depot_save_flags() in NMI context
      mm: open-code page_folio() in dump_page()
      mm: open-code PageTail in folio_flags() and const_folio_flags()
      mm: fix vrealloc()'s KASAN poisoning logic
      Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
      selftests/damon: add _damon_sysfs.py to TEST_FILES
      selftest: hugetlb_dio: fix test naming
      ocfs2: free inode when ocfs2_get_init_inode() fails
      nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
      ...
    torvalds committed Dec 8, 2024
    553c89e
  7. Merge tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull x86 fixes from Borislav Petkov:
    
     - Make sure the Automatic IBRS setting check on AMD does not falsely
       fire in the guest when it has been set already on the host
    
     - Make sure cacheinfo structures memory is allocated to address a boot
       NULL ptr dereference on Intel Meteor Lake which has different numbers
       of subleafs in its CPUID(4) leaf
    
     - Take care of the GDT restoring on the kexec path too, as expected by
       the kernel
    
     - Make sure SMP is not disabled when IO-APIC is disabled on the kernel
       cmdline
    
     - Add a PGD flag _PAGE_NOPTISHADOW to instruct machinery not to
       propagate changes to the kernelmode page tables, to the user portion,
       in PTI
    
     - Mark Intel Lunar Lake as affected by an issue where MONITOR wakeups
       can get lost and thus user-visible delays happen
    
     - Make sure PKRU is properly restored with XRSTOR on AMD after a PKRU
       write of 0 (WRPKRU) which will mark PKRU in its init state and thus
       lose the actual buffer
    
    * tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
      x86/cacheinfo: Delete global num_cache_leaves
      cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
      x86/kexec: Restore GDT on return from ::preserve_context kexec
      x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
      x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
      x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
      x86/pkeys: Ensure updated PKRU value is XRSTOR'd
      x86/pkeys: Change caller of update_pkru_in_sigframe()
    torvalds committed Dec 8, 2024
    8426226
  8. Merge tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/s…

    …cm/linux/kernel/git/tip/tip
    
    Pull timer fix from Borislav Petkov:
    
     - Handle the case where clocksources with small counter width can,
       in conjunction with overly long idle sleeps, falsely trigger the
       negative motion detection of clocksources
    
    * tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      clocksource: Make negative motion detection more robust
    torvalds committed Dec 8, 2024
    c25ca0c
  9. Merge tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull irq fixes from Borislav Petkov:
    
     - Fix a /proc/interrupts formatting regression
    
     - Have the BCM2836 interrupt controller enter power management states
       properly
    
     - Other fixlets
    
    * tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing
      genirq/proc: Add missing space separator back
      irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
      irqchip/gic-v3: Fix irq_complete_ack() comment
    torvalds committed Dec 8, 2024
    eadaac4
  10. Merge tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/masahiroy/linux-kbuild
    
    Pull Kbuild fixes from Masahiro Yamada:
    
     - Fix a section mismatch warning in modpost
    
     - Fix Debian package build error with the O= option
    
    * tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
      kbuild: deb-pkg: fix build error with O=
      modpost: Add .irqentry.text to OTHER_SECTIONS
    torvalds committed Dec 8, 2024
    0b6809a
  11. Linux 6.13-rc2

    torvalds committed Dec 8, 2024
    fac04ef

Commits on Dec 9, 2024

  1. futex: fix user access on powerpc

    The powerpc user access code is special, and unlike other architectures
    distinguishes between user access for reading and writing.
    
    And commit 43a43fa ("futex: improve user space accesses") messed
    that up.  It went undetected elsewhere, but caused ppc32 to fail early
    during boot, because the user access had been started with
    user_read_access_begin(), but then finished off with just a plain
    "user_access_end()".
    
    Note that the address-masking user access helpers don't even have that
    read-vs-write distinction, so if powerpc ever wants to do address
    masking tricks, we'll have to do some extra work for it.
    
    [ Make sure to also do it for the EFAULT case, as pointed out by
      Christophe Leroy ]
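
    The pairing rule can be pictured with a small userspace model. This is
    only a sketch under stated assumptions: enum window, the sim_*() helpers
    and get_value_sketch() are hypothetical stand-ins and do not reproduce
    the powerpc implementation. The model merely records which kind of
    access window is open and counts any end call that does not match it,
    including on the EFAULT path:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Which kind of user-access window is open in this toy model. */
enum window { WINDOW_NONE, WINDOW_READ, WINDOW_WRITE };

static enum window open_window = WINDOW_NONE;
static int mismatched_ends;     /* end calls that did not match the begin */

/* Hypothetical stand-ins for the begin/end helpers. */
void sim_access_begin(enum window kind)
{
        open_window = kind;
}

void sim_access_end(enum window kind)
{
        if (open_window != kind)
                mismatched_ends++;
        open_window = WINDOW_NONE;
}

int sim_mismatched_end_count(void)
{
        return mismatched_ends;
}

bool sim_window_closed(void)
{
        return open_window == WINDOW_NONE;
}

/*
 * Read one value. Every exit path, including the fault path, closes the
 * window with the end call that matches the read begin.
 */
int get_value_sketch(const int *uaddr, int *out)
{
        sim_access_begin(WINDOW_READ);
        if (uaddr == NULL)
                goto efault;
        *out = *uaddr;
        sim_access_end(WINDOW_READ);
        return 0;
efault:
        sim_access_end(WINDOW_READ);
        return -EFAULT;
}
```

    In this model, closing a read window with a different kind of end call
    is what gets counted as a mismatch.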
    
    Reported-by: Andreas Schwab <[email protected]>
    Cc: Christophe Leroy <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Signed-off-by: Linus Torvalds <[email protected]>
    torvalds committed Dec 9, 2024
    32913f3
  2. x86: Fix build regression with CONFIG_KEXEC_JUMP enabled

    Building 6.13-rc2 for x86_64 with gcc 14.2.1 fails with the error:
    
      ld: vmlinux.o: in function `virtual_mapped':
      linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'
    
    when CONFIG_KEXEC_JUMP is enabled.
    
    This was introduced by commit 07fa619 ("x86/kexec: Restore GDT on
    return from ::preserve_context kexec") which introduced a use of
    saved_context_gdt_desc without a declaration for it.
    
    Fix that by including asm/asm-offsets.h where saved_context_gdt_desc
    is defined (indirectly in include/generated/asm-offsets.h which
    asm/asm-offsets.h includes).
    
    Fixes: 07fa619 ("x86/kexec: Restore GDT on return from ::preserve_context kexec")
    Signed-off-by: Damien Le Moal <[email protected]>
    Acked-by: Borislav Petkov (AMD) <[email protected]>
    Acked-by: David Woodhouse <[email protected]>
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Signed-off-by: Linus Torvalds <[email protected]>
    damien-lemoal authored and torvalds committed Dec 9, 2024
    aeb6893
  3. Merge tag 'sched_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/sc…

    …m/linux/kernel/git/tip/tip
    
    Pull scheduler fixes from Borislav Petkov:
    
     - Remove wrong enqueueing of a task for a later wakeup when a task
       blocks on a RT mutex
    
     - Do not set up a new deadline entity on a boosted task as that has
       happened already
    
     - Update preempt= kernel command line param
    
     - Prevent needless softirqd wakeups in the idle task's context
    
     - Detect the case where the idle load balancer CPU becomes busy and
       avoid unnecessary load balancing invocation
    
     - Remove an unnecessary load balancing need_resched() call in
       nohz_csd_func()
    
     - Allow for raising of SCHED_SOFTIRQ softirq type on RT but retain the
       warning to catch any other cases
    
     - Remove a wrong warning when a cpuset update makes the task affinity
       no longer a subset of the cpuset
    
    * tag 'sched_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      locking: rtmutex: Fix wake_q logic in task_blocks_on_rt_mutex
      sched/deadline: Fix warning in migrate_enable for boosted tasks
      sched/core: Update kernel boot parameters for LAZY preempt.
      sched/core: Prevent wakeup of ksoftirqd during idle load balance
      sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy
      sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()
      softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel
      sched: fix warning in sched_setaffinity
      sched/deadline: Fix replenish_dl_new_period dl_server condition
    torvalds committed Dec 9, 2024
    df9e210
  4. Merge tag 'perf_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tip/tip
    
    Pull x86 perf fixes from Borislav Petkov:
    
     - Make sure the PEBS buffer is drained before reconfiguring the
       hardware
    
     - Add Arrow Lake U support
    
    * tag 'perf_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf/x86/intel/ds: Unconditionally drain PEBS DS when changing PEBS_DATA_CFG
      perf/x86/intel: Add Arrow Lake U support
    torvalds committed Dec 9, 2024
    e4c995f
  5. Merge tag 'locking_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/…

    …scm/linux/kernel/git/tip/tip
    
    Pull locking fixes from Borislav Petkov:
    
     - Remove if_not_guard() as it is generating incorrect code
    
     - Fix the initialization of the fake lockdep_map for the first locked
       ww_mutex
    
    * tag 'locking_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      headers/cleanup.h: Remove the if_not_guard() facility
      locking/ww_mutex: Fix ww_mutex dummy lockdep map selftest warnings
    torvalds committed Dec 9, 2024
    7cb1b46

Commits on Dec 11, 2024

  1. Merge tag 'probes-fixes-v6.13-rc1' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/trace/linux-trace
    
    Pull eprobes fix from Masami Hiramatsu:
    
     - release eprobe when failing to add dyn_event.
    
       This unregisters the event call and releases the eprobe when it fails
       to add a dynamic event. Found while cleaning up.
    
    * tag 'probes-fixes-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
      tracing/eprobe: Fix to release eprobe when failed to add dyn_event
    torvalds committed Dec 11, 2024
    1594c49
  2. Merge tag 'for-6.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
     "A few more fixes. Apart from the one liners and updated bio splitting
      error handling there's a fix for subvolume mount with different flags.
      This was known and fixed for some time but I've delayed it to give it
      more testing.
    
       - fix unbalanced locking when swapfile activation fails when the
         subvolume gets deleted in the meantime
    
       - add btrfs error handling after bio_split() calls that got error
         handling recently
    
       - during unmount, flush delalloc workers at the right time before the
         cleaner thread is shut down
    
       - fix regression in buffered write folio conversion, explicitly wait
         for writeback as FGP_STABLE flag is currently a no-op on btrfs
    
       - handle race in subvolume mount with different flags, the conversion
         to the new mount API did not handle the case where multiple
         subvolumes get mounted in parallel, which is a distro use case"
    
    * tag 'for-6.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: flush delalloc workers queue before stopping cleaner kthread during unmount
      btrfs: handle bio_split() errors
      btrfs: properly wait for writeback before buffered write
      btrfs: fix missing snapshot drew unlock when root is dead during swap activation
      btrfs: fix mount failure due to remount races
    torvalds committed Dec 11, 2024
    5a087a6
  3. Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux…

    …/kernel/git/clk/linux
    
    Pull clk fixes from Stephen Boyd:
     "Two reverts and two EN7581 driver fixes:
    
       - Revert the attempt to make CLK_GET_RATE_NOCACHE flag work in
         clk_set_rate() because it led to problems with the Qualcomm CPUFreq
         driver
    
       - Revert Amlogic reset driver back to the initial implementation.
         This broke probe of the audio subsystem on axg based platforms and
         also had compilation problems. We'll try again next time.
    
       - Fix a clk frequency and fix array bounds runtime checks in the
         Airoha EN7581 driver"
    
    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
      clk: en7523: Initialize num before accessing hws in en7523_register_clocks()
      clk: en7523: Fix wrong BUS clock for EN7581
      clk: amlogic: axg-audio: revert reset implementation
      Revert "clk: Fix invalid execution of clk_set_rate"
    torvalds committed Dec 11, 2024
    f92f474
  4. btrfs: simplify waiting for encoded read endios

    Simplify the I/O completion path for encoded reads by using a
    completion instead of a wait_queue.
    
    Furthermore use refcount_t instead of atomic_t for reference counting the
    private data.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Dec 11, 2024
    3472502
  5. btrfs: fix improper generation check in snapshot delete

    We have been using the following check
    
       if (generation <= root->root_key.offset)
    
    to make decisions about whether or not to visit a node during snapshot
    delete.  This is because for normal subvolumes this is set to 0, and for
    snapshots it's set to the creation generation.  The idea being that if
    the generation of the node is less than or equal to our creation
    generation then we don't need to visit that node, because it doesn't
    belong to us, we can simply drop our reference and move on.
    
    However reloc roots don't have their generation stored in
    root->root_key.offset, instead that is the objectid of their
    corresponding fs root.  This means we can incorrectly not walk into
    nodes that need to be dropped when deleting a reloc root.
    
    There are a variety of consequences to making the wrong choice in two
    distinct areas.
    
    visit_node_for_delete()
    
    1. False positive.  We think we are newer than the block when we really
       aren't.  We don't visit the node and drop our reference to the node
       and carry on.  This would result in leaked space.
    2. False negative.  We do decide to walk down into a block that we
       should have just dropped our reference to.  However this means that
       the child node will have refs > 1, so we will switch to
       UPDATE_BACKREF, and then the subsequent walk_down_proc() will notice
       that btrfs_header_owner(node) != root->root_key.objectid and it'll
       break out of the loop, and then walk_up_proc() will drop our reference,
       so this appears to be ok.
    
    do_walk_down()
    
    1. False positive.  We are in UPDATE_BACKREF and incorrectly decide that
       we are done and don't need to update the backref for our lower nodes.
       This is another case that simply won't happen with relocation, as we
       only have to do UPDATE_BACKREF if the node below us was shared and
       didn't have FULL_BACKREF set, and since we don't own that node
       because we're a reloc root we actually won't end up in this case.
    2. False negative.  Again this is tricky because as described above, we
       simply wouldn't be here from relocation, because we don't own any of
       the nodes because we never set btrfs_header_owner() to the reloc root
       objectid, and we always use FULL_BACKREF, we never actually need to
       set FULL_BACKREF on any children.
    
    Having spent a lot of time stressing relocation/snapshot delete recently
    I've not seen this pop in practice.  But this is objectively incorrect,
    so fix this to get the correct starting generation based on the root
    we're dropping to keep me from thinking there's a problem here.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    196fd52
  6. btrfs: move select_delayed_ref() and export it

    This helper is how we select the delayed ref to run once we've selected
    the delayed ref head.  I need this exported to add a unit test for
    delayed refs, and its more natural home is in delayed-ref.c.  Rename it
    to btrfs_select_delayed_ref and move it into delayed-ref.c.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    52ff715
  7. btrfs: selftests: add delayed ref self test cases

    The recent fix for a stupid mistake I made uncovered the fact that we
    don't have adequate testing in the delayed refs code, as it took a
    pretty extensive and long running stress test to uncover something that
    a unit test would have uncovered right away.
    
    Fix this by adding a delayed refs self test suite.  This will validate
    that the btrfs_ref transformation does the correct thing, that we do the
    correct thing when merging delayed refs, and that we get the delayed
    refs in the order that we expect.  These are all crucial to how the
    delayed refs operate.
    
    I introduced various bugs (including the original bug) into the delayed
    refs code to validate that these tests caught all of the shenanigans
    that I could think of.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    ee7cf5c
  8. btrfs: use kmemdup() in btrfs_uring_encoded_read()

    Use kmemdup() in btrfs_uring_encoded_read() rather than kmalloc() followed by
    memcpy().
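
    For illustration, the same pattern in userspace C: memdup_sketch() below
    is a hypothetical analogue built on malloc(), not the kernel helper, and
    it takes no allocation flags. It folds the allocate-then-copy pair into
    one call, which is the simplification this change makes:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of a "duplicate this buffer" helper: allocate len
 * bytes and copy src into them. Returns NULL if the allocation fails,
 * so callers need only one error check instead of an allocation check
 * followed by a separate memcpy().
 */
void *memdup_sketch(const void *src, size_t len)
{
        void *dst = malloc(len);

        if (dst)
                memcpy(dst, src, len);
        return dst;
}
```

    The caller owns the returned buffer and releases it with free().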
    
    Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Reported-by: kernel test robot <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Dec 11, 2024
    1a28705
  9. btrfs: use PTR_ERR() instead of PTR_ERR_OR_ZERO() for btrfs_get_extent()

    The function btrfs_get_extent() will only return an ERR_PTR() value or a
    valid extent map pointer. It will not return NULL.
    
    Thus the usage of PTR_ERR_OR_ZERO() inside submit_one_sector() is not
    needed, use plain PTR_ERR() instead, and that is the only usage of
    PTR_ERR_OR_ZERO() after btrfs_get_extent().
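
    The difference between the two helpers can be sketched in userspace.
    The lower-case helpers below are simplified models of the kernel's
    error-pointer helpers, and struct extent_sketch, get_extent_sketch()
    and submit_sketch() are hypothetical names for illustration. Once the
    caller has established that the pointer is an error pointer, the plain
    conversion is enough and the "or zero" branch can never be taken:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
        return (void *)(intptr_t)error;
}

/* Decode the value stored in the pointer. */
static inline long ptr_err(const void *ptr)
{
        return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the top MAX_ERRNO_SKETCH addresses. */
static inline bool is_err(const void *ptr)
{
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Returns the errno for an error pointer and 0 for anything else. */
static inline int ptr_err_or_zero(const void *ptr)
{
        return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}

struct extent_sketch {
        int start;
};

/* Hypothetical lookup: returns a valid pointer or an error pointer, never NULL. */
struct extent_sketch *get_extent_sketch(bool fail)
{
        static struct extent_sketch em = { .start = 0 };

        return fail ? err_ptr(-EIO) : &em;
}

int submit_sketch(bool fail)
{
        struct extent_sketch *em = get_extent_sketch(fail);

        /* Already known to be an error pointer here, so no "or zero" case. */
        if (is_err(em))
                return (int)ptr_err(em);
        return 0;
}
```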
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Dec 11, 2024
    806a37a
  10. btrfs: send: remove redundant assignments to variable ret

    The variable ret is being initialized to zero and also later re-assigned
    to zero. In both cases the assignment is redundant since the value is
    never read after the assignment and hence they can be removed.
    
    Signed-off-by: Colin Ian King <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    ColinIanKing authored and kdave committed Dec 11, 2024
    9b062de
  11. btrfs: handle FS_IOC_READ_VERITY_METADATA ioctl

    Commit 1460540 ("btrfs: initial fsverity support") introduced
    fs-verity support for btrfs, but didn't add support for
    FS_IOC_READ_VERITY_METADATA to directly query the Merkle tree,
    descriptor and signature blocks for fs-verity enabled files.
    
    Add the (trivial) implementation: we just need to wire it through to the
    fs-verity code, the same way as is done in the other two filesystems
    which support this ioctl (ext4, f2fs). The fs-verity code already has
    access to the required data.
    
    This is also safe to backport to older stable trees (5.15+) if needed.
    
    Signed-off-by: Allison Karlitskaya <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    allisonkarlitskaya authored and kdave committed Dec 11, 2024
    d089a53
  12. btrfs: factor out btrfs_return_free_space()

    Factor out a part of unpin_extent_range() that returns space back to the
    space info, prioritizing global block reserve.  Also, move the "len"
    variable into the loop to clarify we don't need to carry it beyond an
    iteration.
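
    The prioritization can be pictured with the simplified userspace model below. The structs and field names are hypothetical stand-ins, locking is omitted, and the real helper operates on btrfs' own block reserve and space info structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the global block reserve. */
struct reserve_sketch {
	uint64_t size;		/* target size of the reserve */
	uint64_t reserved;	/* bytes currently held */
	bool full;
};

/* Hypothetical, simplified stand-in for a space info. */
struct space_info_sketch {
	uint64_t bytes_may_use;	/* bytes accounted to reservations */
	uint64_t returned;	/* bytes handed back for general use */
};

static void return_free_space_sketch(struct space_info_sketch *sinfo,
				     struct reserve_sketch *global_rsv,
				     uint64_t len)
{
	/* Top up the global reserve first, but only up to its deficit. */
	if (!global_rsv->full) {
		uint64_t deficit = global_rsv->size - global_rsv->reserved;
		uint64_t to_add = len < deficit ? len : deficit;

		global_rsv->reserved += to_add;
		sinfo->bytes_may_use += to_add;
		if (global_rsv->reserved >= global_rsv->size)
			global_rsv->full = true;
		len -= to_add;
	}
	/* Whatever is left goes back to the space info. */
	sinfo->returned += len;
}
```

    Because len is consumed inside the helper, the caller has no reason to carry it across loop iterations.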
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Naohiro Aota <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    naota authored and kdave committed Dec 11, 2024
    ede9f4a
  13. btrfs: drop fs_info argument from btrfs_update_space_info_*()

    Since commit e1e577a ("btrfs: store fs_info in space_info"), we have
    the fs_info in a space_info. So, we can drop fs_info argument from
    btrfs_update_space_info_*. There is no behavior change.
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Naohiro Aota <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    naota authored and kdave committed Dec 11, 2024
    0b7f674
  14. btrfs: zoned: reclaim unused zone by zone resetting

    In zoned mode, a region that was used and then freed is still not reusable
    after the freeing. The underlying zone needs to be reset before reuse. Btrfs
    resets a zone when it removes a block group, and then a new block group is
    allocated on the zones to reuse them. But this is sometimes too late to keep
    up with the write side.
    
    This commit introduces a new space-info reclaim method ZONE_RESET. That will
    pick a block group from the unused list and reset its zone to reuse the
    zone_unusable space. It is faster than removing the block group and re-creating
    a new block group on the same zones.
    
    For the first implementation, the ZONE_RESET is only applied to a block group
    whose region is fully zone_unusable. Reclaiming partial zone_unusable block
    group could be implemented later.
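
    The selection rule described above can be sketched in userspace as follows. The struct and function names are hypothetical, and the actual zone reset and locking are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a block group on the unused list. */
struct bg_sketch {
	uint64_t length;	/* size of the block group */
	uint64_t used;		/* bytes still in use */
	uint64_t zone_unusable;	/* freed bytes that need a zone reset */
};

/* Pick the first unused block group whose whole region is zone_unusable. */
static struct bg_sketch *pick_reset_candidate(struct bg_sketch *unused, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (unused[i].used == 0 &&
		    unused[i].zone_unusable == unused[i].length)
			return &unused[i];
	}
	return NULL;	/* partially unusable groups are skipped for now */
}

/* After the zone reset, the zone_unusable bytes become free space again. */
static uint64_t reclaim_after_reset(struct bg_sketch *bg)
{
	uint64_t reclaimed = bg->zone_unusable;

	bg->zone_unusable = 0;
	return reclaimed;
}
```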
    
    Signed-off-by: Naohiro Aota <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    naota authored and kdave committed Dec 11, 2024
    65fb456
  15. btrfs: don't BUG_ON() in btrfs_drop_extents()

    btrfs_drop_extents() calls BUG_ON() in case the counter of to-be-deleted
    extents is greater than 0. But all of these code paths can handle errors,
    so there's no need to crash the kernel. Instead WARN() that the condition
    has been met and gracefully bail out.
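
    The general shape of such a conversion, sketched in userspace: WARN_ON_SKETCH() is a made-up stand-in for the kernel's WARN_ON(), finish_drop() is a hypothetical caller, and the -EINVAL return value is an assumption rather than something stated in this message.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report the condition and evaluate to it. */
#define WARN_ON_SKETCH(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s\n", #cond), 1) : 0)

/* Hypothetical caller: bail out with an error instead of crashing. */
static int finish_drop(int del_nr)
{
	if (WARN_ON_SKETCH(del_nr > 0))
		return -EINVAL;	/* assumed error code for this sketch */
	return 0;
}
```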
    
    Reviewed-by: Filipe Manana <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Dec 11, 2024
    afcc184
  16. btrfs: fix data race when accessing the inode's disk_i_size at btrfs_drop_extents()
    
    A data race occurs when the function `insert_ordered_extent_file_extent()`
    and the function `btrfs_inode_safe_disk_i_size_write()` are executed
    concurrently. `insert_ordered_extent_file_extent()` reads
    inode->disk_i_size without holding a lock, so the read can race with the
    write to inode->disk_i_size in `btrfs_inode_safe_disk_i_size_write()`,
    which in turn affects the value of `modify_tree`.
    
    The specific call stack that appears during testing is as follows:
    ============DATA_RACE============
     btrfs_drop_extents+0x89a/0xa060 [btrfs]
     insert_reserved_file_extent+0xb54/0x2960 [btrfs]
     insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs]
     btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs]
     btrfs_finish_ordered_io+0x37/0x60 [btrfs]
     finish_ordered_fn+0x3e/0x50 [btrfs]
     btrfs_work_helper+0x9c9/0x27a0 [btrfs]
     process_scheduled_works+0x716/0xf10
     worker_thread+0xb6a/0x1190
     kthread+0x292/0x330
     ret_from_fork+0x4d/0x80
     ret_from_fork_asm+0x1a/0x30
    ============OTHER_INFO============
     btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs]
     btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs]
     btrfs_finish_ordered_io+0x37/0x60 [btrfs]
     finish_ordered_fn+0x3e/0x50 [btrfs]
     btrfs_work_helper+0x9c9/0x27a0 [btrfs]
     process_scheduled_works+0x716/0xf10
     worker_thread+0xb6a/0x1190
     kthread+0x292/0x330
     ret_from_fork+0x4d/0x80
     ret_from_fork_asm+0x1a/0x30
    =================================
    
    The main purpose of the check of the inode's disk_i_size is to avoid
    taking write locks on a btree path when we have a write at or beyond
    eof, since in these cases we don't expect to find extent items in the
    root to drop. However if we end up taking write locks due to a data
    race on disk_i_size, everything is still correct, we only add extra
    lock contention on the tree in case there's concurrency from other tasks.
    If the race causes us to not take write locks when we actually need them,
    then everything is functionally correct as well, since if we find out we
    have extent items to drop and we took read locks (modify_tree set to 0),
    we release the path and retry again with write locks.
    
    Since this data race does not affect the correctness of the function,
    it is harmless. Use data_race() when checking inode->disk_i_size.
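
    The reasoning above can be modelled in userspace as below. The data_race() macro here is only a placeholder (in the kernel it marks an intentional race for KCSAN), and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace placeholder; the kernel macro annotates an intentional race. */
#define data_race(expr) (expr)

/* Hypothetical, simplified inode. */
struct inode_sketch {
	uint64_t disk_i_size;
};

/* Writes at or beyond EOF start with read locks (returns false). */
static bool want_write_locks(const struct inode_sketch *inode, uint64_t start)
{
	return start < data_race(inode->disk_i_size);
}

/*
 * Returns the number of search passes needed. A stale disk_i_size can
 * cost an extra pass or unneeded write locking, never a wrong result.
 */
static int drop_extents_passes(const struct inode_sketch *inode,
			       uint64_t start, bool has_extents_to_drop)
{
	bool modify_tree = want_write_locks(inode, start);

	if (!modify_tree && has_extents_to_drop)
		return 2;	/* release the path, retry with write locks */
	return 1;
}
```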
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Hao-ran Zheng <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Hao-ran Zheng authored and kdave committed Dec 11, 2024
    3a97149
  17. btrfs: use bio_is_zone_append() in the completion handler

    Otherwise it won't catch bios turned into regular writes by the block
    level zone write plugging. The additional test it adds is for emulated
    zone append.
    
    Fixes: 9b1ce7f ("block: Implement zone append emulation")
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: Damien Le Moal <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Christoph Hellwig authored and kdave committed Dec 11, 2024
    6d3a586
  18. btrfs: split bios to the fs sector size boundary

    Btrfs, like other file systems, can't really deal with I/O not aligned to
    its internal block size (which strangely is called sector size in
    btrfs, for historical reasons), but the block layer split helper doesn't
    even know about that.
    
    Round down the split boundary so that all I/Os are aligned.
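
    The rounding itself is simple arithmetic. A minimal sketch, assuming the sector size is a power of two and using a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Round a split point (in bytes) down to a multiple of the fs sector
 * size. Assumes sectorsize is a power of two.
 */
static uint64_t align_split_down(uint64_t split_bytes, uint32_t sectorsize)
{
	return split_bytes & ~((uint64_t)sectorsize - 1);
}
```

    With a 4096-byte sector size, a split point of 10000 bytes becomes 8192, so both halves of the bio stay sector aligned.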
    
    Fixes: d5e4377 ("btrfs: split zone append bios in btrfs_submit_bio")
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Damien Le Moal <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Christoph Hellwig authored and kdave committed Dec 11, 2024
    888ecc4
  19. btrfs: tree-checker: reject inline extent items with 0 ref count

    [BUG]
    There is a bug report in the mailing list where btrfs_run_delayed_refs()
    failed to drop the ref count for logical 25870311358464 num_bytes
    2113536.
    
    The involved leaf dump looks like this:
    
      item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50
        extent refs 1 gen 84178 flags 1
        ref#0: shared data backref parent 32399126528000 count 0 <<<
        ref#1: shared data backref parent 31808973717504 count 1
    
    Notice the count number is 0.
    
    [CAUSE]
    There is no concrete evidence yet, but considering 0 -> 1 is also a
    single bit flipped, it's possible that hardware memory bitflip is
    involved, causing the on-disk extent tree to be corrupted.
    
    [FIX]
    To prevent us reading such corrupted extent item, or writing such
    damaged extent item back to disk, enhance the handling of
    BTRFS_EXTENT_DATA_REF_KEY and BTRFS_SHARED_DATA_REF_KEY keys for both
    inlined and key items, to detect such 0 ref count and reject them.
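
    In spirit, the added check looks like the userspace sketch below. The function name and message text are hypothetical; -EUCLEAN is the error code the tree-checker generally uses for corrupted metadata.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, for hosts that lack the definition */
#endif

/* Reject a data backref whose ref count is zero. Hypothetical helper. */
static int check_data_ref_count(uint64_t bytenr, uint32_t count)
{
	if (count == 0) {
		fprintf(stderr,
			"invalid data ref count for extent %llu, should be non-zero\n",
			(unsigned long long)bytenr);
		return -EUCLEAN;
	}
	return 0;
}
```

    Applied to the leaf dump above, ref#0 (count 0) would be rejected while ref#1 (count 1) would pass.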
    
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Reported-by: Frankie Fisher <[email protected]>
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    adam900710 authored and kdave committed Dec 11, 2024
    89c1fc5
  20. btrfs: convert BUG_ON in btrfs_reloc_cow_block() to proper error handling
    
    This BUG_ON is meant to catch backref cache problems, but these can
    arise from either bugs in the backref cache or corruption in the extent
    tree.  Fix it to be a proper error.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    1c726e4
  21. btrfs: remove the changed list for backref cache

    Now that we're not updating the backref cache when we switch transids we
    can remove the changed list.
    
    We're going to keep the new_bytenr field because it serves as a good
    sanity check for the backref cache and relocation, and can prevent us
    from making extent tree corruption worse.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    d3f5003
  22. btrfs: add a comment for new_bytenr in backref_cache_node

    Add a comment for this field so we know what it is used for.  Previously
    we used it to update the backref cache, so people may mistakenly think
    it is useless, but in fact exists to make sure the backref cache makes
    sense.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    f868384
  23. btrfs: simplify loop in select_reloc_root()

    We have this set up as a loop, but in reality we will never walk back up
    the backref tree, if we do then it's a bug.  Get rid of the loop and
    handle the case where we have node->new_bytenr set at all.  Previous
    check was only if node->new_bytenr != root->node->start, but if it did
    then we would hit the WARN_ON() and walk back up the tree.
    
    Instead we want to just return error if ->new_bytenr is set, and then do
    the normal updating of the node for the reloc root and carry on.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    15eaf3b
  24. btrfs: remove clone_backref_node() from relocation

    Since we no longer maintain backref cache across transactions, and this
    is only called when we're creating the reloc root for a newly created
    snapshot in the transaction critical section, we will end up doing a
    bunch of work that will just get thrown away when we start the
    transaction in the relocation loop.  Delete this code as it no longer
    does anything for us.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    6f1978d
  25. btrfs: don't build backref tree for COW-only blocks

    We already determine the owner for any blocks we find when we're
    relocating, and for COW-only blocks (and the data reloc tree) we COW
    down to the block and call it good enough.  However we still build a
    whole backref tree for them, even though we're not going to use it, and
    then just don't put these blocks in the cache.
    
    Rework the code to check if the block belongs to a COW-only root or the
    data reloc root, and then just cow down to the block, skipping the
    backref cache generation.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    fb4fdba
  26. btrfs: do not handle non-shareable roots in backref cache

    Now that we handle relocation for non-shareable roots without using the
    backref cache, remove the ->cowonly field from the backref nodes and
    update the handling to throw an error.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    a405274
  27. btrfs: simplify btrfs_backref_release_cache()

    We rely on finding all our nodes on the various lists in the backref
    cache, when they are all also in the rbtree.  Instead just search
    through the rbtree and free everything.
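
    The idea of freeing everything from the tree alone can be illustrated with a plain binary tree in userspace. This is only a sketch: the node type and function are hypothetical, and kernel code would walk a real rbtree, for example with rbtree_postorder_for_each_entry_safe().

```c
#include <stdlib.h>

/* Hypothetical node; a real backref node embeds a struct rb_node. */
struct node_sketch {
	struct node_sketch *left;
	struct node_sketch *right;
};

/* Free every node reachable from root, children first. Returns the count. */
static size_t release_all(struct node_sketch *root)
{
	size_t freed;

	if (!root)
		return 0;
	freed = release_all(root->left) + release_all(root->right);
	free(root);
	return freed + 1;
}
```

    Since every node is reachable from the root, no auxiliary lists are needed to find them at teardown.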
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    c650914
  28. btrfs: remove the ->lowest and ->leaves members from struct btrfs_backref_node
    
    Before we were keeping all of our nodes on various lists in order to
    make sure everything got cleaned up correctly.  We used node->lowest to
    indicate that node->lower was linked into the cache->leaves list.  Now
    that we do cleanup based on the rb-tree both the list and the flag are
    useless, so delete them both.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    1e0d705
  29. btrfs: remove detached list from struct btrfs_backref_cache

    We don't ever look at this list, remove it.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    josefbacik authored and kdave committed Dec 11, 2024
    bdf0ad1
  30. btrfs: improve the warning and error message for btrfs_remove_qgroup()

    [WARNING]
    There are several reports that the recently introduced qgroup
    auto-removal triggers WARN_ON() for non-zero rfer/excl
    numbers, e.g.:
    
     ------------[ cut here ]------------
     WARNING: CPU: 67 PID: 2882 at fs/btrfs/qgroup.c:1854 btrfs_remove_qgroup+0x3df/0x450
     CPU: 67 UID: 0 PID: 2882 Comm: btrfs-cleaner Kdump: loaded Not tainted 6.11.6-300.fc41.x86_64 #1
     RIP: 0010:btrfs_remove_qgroup+0x3df/0x450
     Call Trace:
      <TASK>
      btrfs_qgroup_cleanup_dropped_subvolume+0x97/0xc0
      btrfs_drop_snapshot+0x44e/0xa80
      btrfs_clean_one_deleted_snapshot+0xc3/0x110
      cleaner_kthread+0xd8/0x130
      kthread+0xd2/0x100
      ret_from_fork+0x34/0x50
      ret_from_fork_asm+0x1a/0x30
      </TASK>
     ---[ end trace 0000000000000000 ]---
     BTRFS warning (device sda): to be deleted qgroup 0/319 has non-zero numbers, rfer 258478080 rfer_cmpr 258478080 excl 0 excl_cmpr 0
    
    [CAUSE]
    The root cause is still unclear, since if the qgroup is consistent, a
    fully dropped subvolume (with an extra transaction committed) should lead
    to all-zero numbers for the qgroup.
    
    My current guess is that the subvolume drop triggered the new subtree drop
    threshold and thus marked the qgroup inconsistent, then a rescan cleared it
    but some corner case is not properly handled during subvolume dropping.
    
    But at least for this particular case, since it's only the rfer/excl not
    properly reset to 0, and the qgroup is already marked inconsistent, there is
    nothing for end users to worry about.
    
    The user space tool utilizing qgroup would queue a rescan to handle
    everything, so the kernel warning is a little overkill.
    
    [ENHANCEMENT]
    Enhance the warning inside btrfs_remove_qgroup() by:
    
    - Only do WARN() if CONFIG_BTRFS_DEBUG is enabled
      As explained, the kernel can handle inconsistent qgroups by simply doing
      a rescan, so there is no need to bother end users.
    
    - Treat the reserved space leak the same as non-zero numbers
      By outputting the values and triggering a WARN() if it's a debug build.
      So far I haven't experienced any case related to reserved space so I
      hope we will never need to bother them.
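
    The two points above amount to the decision sketched below in userspace. The struct, the number of reservation slots, and the function names are hypothetical; the debug_build flag stands in for CONFIG_BTRFS_DEBUG.

```c
#include <stdbool.h>
#include <stdint.h>

#define RSV_TYPES_SKETCH 3	/* assumed number of reservation types */

/* Hypothetical, simplified qgroup counters. */
struct qgroup_sketch {
	uint64_t rfer, rfer_cmpr, excl, excl_cmpr;
	uint64_t rsv[RSV_TYPES_SKETCH];
};

/* Non-zero numbers and leaked reserved space are treated the same way. */
static bool qgroup_has_leftovers(const struct qgroup_sketch *qg)
{
	if (qg->rfer || qg->rfer_cmpr || qg->excl || qg->excl_cmpr)
		return true;
	for (int i = 0; i < RSV_TYPES_SKETCH; i++) {
		if (qg->rsv[i])
			return true;
	}
	return false;
}

/* Leftovers are always worth a message, but a WARN() only on debug builds. */
static bool should_warn(const struct qgroup_sketch *qg, bool debug_build)
{
	return debug_build && qgroup_has_leftovers(qg);
}
```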
    
    Fixes: 839d6ea ("btrfs: automatically remove the subvolume qgroup")
    Link: kdave/btrfs-progs#922
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Dec 11, 2024
    c61ffaa
  31. btrfs: open-code btrfs_copy_from_user()

    The function btrfs_copy_from_user() handles the folio dirtying for
    buffered write. The original design is to allow that function to handle
    multiple folios, but since commit c87c299 ("btrfs: make buffered
    write to copy one page a time") there is no need to support multiple
    folios.
    
    So here open-code btrfs_copy_from_user() to
    copy_folio_from_iter_atomic() and flush_dcache_folio() calls.
    
    The short-copy check and revert are still kept as-is.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Dec 11, 2024
    4139d74
  32. btrfs: output the reason for open_ctree() failure

    There is a recent ML report that mounting a large fs backed by a hardware
    RAID56 controller (with one device missing) took too much time, and
    systemd seems to have killed the mount attempt.
    
    In that case, the only error message is:
    
      BTRFS error (device sdj): open_ctree failed
    
    There is no indication of why the failure happened, making it very hard
    to diagnose.
    
    At least output the error number (in the particular case it should be
    -EINTR) to provide some clue.
    
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Reported-by: Christoph Anton Mitterer <[email protected]>
    Cc: [email protected]
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Dec 11, 2024
    f7e8118