Rst delete ci #1447

Closed
wants to merge 10,000 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Oct 9, 2024

  1. xfs: fix a typo

    Fix a typo in comments.
    
    Signed-off-by: Andrew Kreimer <[email protected]>
    Reviewed-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Carlos Maiolino <[email protected]>
    algonell authored and cmaiolino committed Oct 9, 2024
    77bfe1b
  2. net: ti: icssg-prueth: Fix race condition for VLAN table access

    The VLAN table is shared memory between the two ports/slices
    in an ICSSG cluster, which may lead to a race condition when the
    common code paths for both ports are executed on different CPUs.
    
    Fix the race condition by locking access to the shared memory.
    
    Fixes: 487f732 ("net: ti: icssg-prueth: Add helper functions to configure FDB")
    Signed-off-by: MD Danish Anwar <[email protected]>
    Reviewed-by: Roger Quadros <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    danish-ti authored and davem330 committed Oct 9, 2024
    ff8ee11
  3. btrfs: zoned: fix missing RCU locking in error message when loading zone info

    At btrfs_load_zone_info() we have an error path that dereferences
    the name of a device, which is an RCU string, without holding an RCU
    read lock, which is incorrect.
    
    Fix this by using btrfs_err_in_rcu() instead of btrfs_err().
    
    The problem is there since commit 08e11a3 ("btrfs: zoned: load zone's
    allocation offset"), back then at btrfs_load_block_group_zone_info() but
    then later on that code was factored out into the helper
    btrfs_load_zone_info() by commit 09a4672 ("btrfs: zoned: factor out
    per-zone logic from btrfs_load_block_group_zone_info").
    
    Fixes: 08e11a3 ("btrfs: zoned: load zone's allocation offset")
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Naohiro Aota <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 9, 2024
    fe4cd7e
  4. btrfs: fix clear_dirty and writeback ordering in submit_one_sector()

    This commit is a replay of commit 6252690 ("btrfs: fix invalid
    mapping of extent xarray state"). We need to call
    btrfs_folio_clear_dirty() before btrfs_set_range_writeback(), so that
    xarray DIRTY tag is cleared.
    
    The refactoring commit 8189197 ("btrfs: refactor
    __extent_writepage_io() to do sector-by-sector submission") reversed
    this order, causing the same hang. Fix the ordering now in
    submit_one_sector().
    
    Fixes: 8189197 ("btrfs: refactor __extent_writepage_io() to do sector-by-sector submission")
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Naohiro Aota <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    naota authored and kdave committed Oct 9, 2024
    e761be2
  5. net: phy: realtek: Fix MMD access on RTL8126A-integrated PHY

    All MMD reads return 0 for the RTL8126A-integrated PHY. Therefore phylib
    assumes it doesn't support EEE, which results in higher power consumption
    and a significantly higher chip temperature in my case.
    To fix this, split out the PHY driver for the RTL8126A-integrated PHY
    and set the read_mmd/write_mmd callbacks to read from vendor-specific
    registers.
    
    Fixes: 5befa37 ("net: phy: realtek: add support for RTL8126A-integrated 5Gbps PHY")
    Cc: [email protected]
    Signed-off-by: Heiner Kallweit <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    hkallweit authored and davem330 committed Oct 9, 2024
    a6ad589
  6. net: amd: mvme147: Fix probe banner message

    Currently this driver prints this line with what looks like
    a rogue format specifier when the device is probed:
    [    2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx
    
    Replace the printk() with netdev_info() and move it after the
    registration has completed so that it prints the name of the
    interface properly.
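
A minimal userspace sketch of why the banner order matters, assuming the interface name holds a template such as "eth%d" until registration substitutes an index. The struct, the helper names, and the shortened banner text are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a network device with a name field. */
struct sketch_netdev {
    char name[16];
};

/* Allocation leaves the name as an unexpanded template. */
static void sketch_alloc(struct sketch_netdev *dev)
{
    strcpy(dev->name, "eth%d");
}

/* Registration expands the template with the assigned index. */
static void sketch_register(struct sketch_netdev *dev, int index)
{
    char tmpl[sizeof(dev->name)];

    assert(index >= 0 && index < 1000);
    memcpy(tmpl, dev->name, sizeof(tmpl));
    snprintf(dev->name, sizeof(dev->name), tmpl, index);
}

/* Formats a probe banner using whatever name the device has right now. */
static void sketch_banner(const struct sketch_netdev *dev, char *out, size_t len)
{
    snprintf(out, len, "%s: MVME147 probed", dev->name);
}
```

Calling sketch_banner() before sketch_register() yields a banner that still contains the raw "eth%d" template, which matches the symptom described in the commit message; calling it afterwards yields the final name.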
    
    Signed-off-by: Daniel Palmer <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    fifteenhex authored and davem330 committed Oct 9, 2024
    82c5b53
  7. sctp: ensure sk_state is set to CLOSED if hashing fails in sctp_listen_start

    If hashing fails in sctp_listen_start(), the socket remains in the
    LISTENING state, even though it was not added to the hash table.
    This can lead to a scenario where a socket appears to be listening
    without actually being accessible.
    
    This patch ensures that if the hashing operation fails, the sk_state
    is set back to CLOSED before returning an error.
    
    Note that there is no need to undo the autobind operation if hashing
    fails, as the bind port can still be used for next listen() call on
    the same socket.
    
    Fixes: 76c6d98 ("sctp: add sock_reuseport for the sock in __sctp_hash_endpoint")
    Reported-by: Marcelo Ricardo Leitner <[email protected]>
    Signed-off-by: Xin Long <[email protected]>
    Acked-by: Marcelo Ricardo Leitner <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    lxin authored and davem330 committed Oct 9, 2024
    4d5c70e
  8. net: hns3/hns: Update the maintainer for the HNS3/HNS ethernet driver

    Yisen Zhuang left the company in September.
    Jian Shen will be responsible for maintaining the
    hns3/hns driver's code in the future,
    so add Jian Shen to the hns3/hns driver's maintainer list.
    
    Signed-off-by: Jijie Shao <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Jijie Shao authored and davem330 committed Oct 9, 2024
    983e35c
  9. ring-buffer: Do not have boot mapped buffers hook to CPU hotplug

    The boot mapped ring buffer has its buffer mapped at a fixed location
    found at boot up. It is not dynamic. It cannot grow or be expanded when
    new CPUs come online.
    
    Do not hook fixed memory mapped ring buffers to the CPU hotplug callback,
    otherwise it can cause a crash when it tries to add the buffer to the
    memory that is already fully occupied.
    
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Link: https://lore.kernel.org/[email protected]
    Fixes: be68d63 ("ring-buffer: Add ring_buffer_alloc_range()")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    rostedt committed Oct 9, 2024
    912da2c
  10. ata: libata: avoid superfluous disk spin down + spin up during hibernation

    A user reported that commit aa3998d ("ata: libata-scsi: Disable scsi
    device manage_system_start_stop") introduced a spin down + immediate spin
    up of the disk both when entering and when resuming from hibernation.
    This behavior was not there before, and causes an increased latency both
    when entering and when resuming from hibernation.
    
    Hibernation is done by three consecutive PM events, in the following order:
    1) PM_EVENT_FREEZE
    2) PM_EVENT_THAW
    3) PM_EVENT_HIBERNATE
    
    Commit aa3998d ("ata: libata-scsi: Disable scsi device
    manage_system_start_stop") modified ata_eh_handle_port_suspend() to call
    ata_dev_power_set_standby() (which spins down the disk), for both event
    PM_EVENT_FREEZE and event PM_EVENT_HIBERNATE.
    
    Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
    explicitly mentions that PM_EVENT_FREEZE does not have to put the device
    in a low-power state, and actually recommends not doing so. Thus, let's not
    spin down the disk on PM_EVENT_FREEZE. (The disk will instead be spun down
    during the subsequent PM_EVENT_HIBERNATE event.)
    
    This way, PM_EVENT_FREEZE will behave as it did before commit aa3998d
    ("ata: libata-scsi: Disable scsi device manage_system_start_stop"), while
    PM_EVENT_HIBERNATE will continue to spin down the disk.
    
    This will avoid the superfluous spin down + spin up when entering and
    resuming from hibernation, while still making sure that the disk is spun
    down before actually entering hibernation.
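
The intended behaviour can be modelled with a small userspace sketch. It assumes a simplified view in which each event either does or does not spin the disk down; the enum and function names are hypothetical and do not mirror the libata code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three PM events listed above. */
enum sketch_pm_event {
    SKETCH_EVENT_FREEZE,
    SKETCH_EVENT_THAW,
    SKETCH_EVENT_HIBERNATE
};

/* After the change, only the HIBERNATE event spins the disk down. */
static bool sketch_should_spin_down(enum sketch_pm_event ev)
{
    switch (ev) {
    case SKETCH_EVENT_HIBERNATE:
        return true;
    case SKETCH_EVENT_FREEZE:
    case SKETCH_EVENT_THAW:
        return false;
    }
    assert(!"unknown PM event");
    return false;
}

/* Counts how many spin-downs a sequence of events would trigger. */
static int sketch_count_spin_downs(const enum sketch_pm_event *seq, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (sketch_should_spin_down(seq[i]))
            count++;
    return count;
}
```

Feeding the FREEZE, THAW, HIBERNATE sequence through sketch_count_spin_downs() gives a single spin-down, which is the outcome the commit message aims for.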
    
    Cc: [email protected] # v6.6+
    Fixes: aa3998d ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
    Reviewed-by: Damien Le Moal <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Niklas Cassel <[email protected]>
    floatious committed Oct 9, 2024
    a38719e
  11. HID: amd_sfh: Switch to device-managed dmam_alloc_coherent()

    Using the device-managed version simplifies clean-up in the probe()
    error path.
    
    Additionally, the device-managed version ensures proper cleanup, which
    helps to resolve memory errors, page faults, btrfs going read-only, and
    btrfs disk corruption.
    
    Fixes: 4b2c53d ("SFH:Transport Driver to add support of AMD Sensor Fusion Hub (SFH)")
    Tested-by: Chris Hixon <[email protected]>
    Tested-by: Richard <[email protected]>
    Tested-by: Skyler <[email protected]>
    Reported-by: Chris Hixon <[email protected]>
    Closes: https://lore.kernel.org/all/[email protected]/
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331
    Signed-off-by: Basavaraj Natikar <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Basavaraj Natikar authored and Jiri Kosina committed Oct 9, 2024
    c56f9ec
  12. selftests: sched_ext: Add sched_ext as proper selftest target

    The sched_ext selftests are missing proper cross-compilation support, a
    proper target entry, and out-of-tree build support.
    
    When building the kselftest suite, e.g.:
    
      make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-  \
        TARGETS=sched_ext SKIP_TARGETS="" O=/output/foo \
        -C tools/testing/selftests install
    
    or:
    
      make ARCH=arm64 LLVM=1 TARGETS=sched_ext SKIP_TARGETS="" \
        O=/output/foo -C tools/testing/selftests install
    
    The expectation is that sched_ext is included, cross-built with the
    correct toolchain, and placed into /output/foo.
    
    In contrast to the BPF selftests, the sched_ext suite does not use
    bpftool at test run-time, so it is sufficient to build bpftool for the
    build host only.
    
    Add ARCH, CROSS_COMPILE, OUTPUT, and TARGETS support to the sched_ext
    selftest. Also, remove some variables that were unused by the
    Makefile.
    
    Signed-off-by: Björn Töpel <[email protected]>
    Reviewed-by: Shuah Khan <[email protected]>
    Acked-by: David Vernet <[email protected]>
    Tested-by: Mark Brown <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>
    bjorn-rivos authored and htejun committed Oct 9, 2024
    7941b83
  13. unicode: Don't special case ignorable code points

    We don't need to handle them separately. Instead, just let them
    decompose/casefold to themselves.
    
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    krisman committed Oct 9, 2024
    5c26d2f
  14. Merge tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

    Pull unicode fix from Gabriel Krisman Bertazi:
    
     - Handle code-points with the Ignorable property as regular character
       instead of treating them as an empty string (me)
    
    * tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
      unicode: Don't special case ignorable code points
    torvalds committed Oct 9, 2024
    ff9d409
  15. NFS: remove revoked delegation from server's delegation list

    After the delegation is returned to the NFS server remove it
    from the server's delegations list to reduce the time it takes
    to scan this list.
    
    Network trace captured while running the below script shows the
    time taken to service the CB_RECALL increases gradually due to
    the overhead of traversing the delegation list in
    nfs_delegation_find_inode_server.
    
    The NFS server in this test is a Solaris server which issues
    CB_RECALL when receiving the all-zero stateid in the SETATTR.
    
    mount=/mnt/data
    for i in $(seq 1 20)
    do
       echo $i
       mkdir $mount/testtarfile$i
       time  tar -C $mount/testtarfile$i -xf 5000_files.tar
    done
    
    Signed-off-by: Dai Ngo <[email protected]>
    Reviewed-by: Trond Myklebust <[email protected]>
    Signed-off-by: Anna Schumaker <[email protected]>
    daimngo authored and Anna Schumaker committed Oct 9, 2024
    7ef6010
  16. misc: sgi-gru: Don't disable preemption in GRU driver

    Disabling preemption in the GRU driver is unnecessary, and clashes with
    sleeping locks in several code paths.  Remove preempt_disable and
    preempt_enable from the GRU driver.
    
    Signed-off-by: Dimitri Sivanich <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Dimitri Sivanich authored and torvalds committed Oct 9, 2024
    b983b27
  17. bcachefs: do not use PF_MEMALLOC_NORECLAIM

    Patch series "remove PF_MEMALLOC_NORECLAIM" v3.
    
    
    This patch (of 2):
    
    bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
    inode with GFP_NOWAIT semantics while holding locks. If this
    allocation fails, it drops the locks and uses the GFP_NOFS allocation
    context.
    
    We would like to drop PF_MEMALLOC_NORECLAIM because it is really
    dangerous to use if the caller doesn't control the full call chain with
    this flag set. E.g. if any of the functions down the chain needed a
    GFP_NOFAIL request, PF_MEMALLOC_NORECLAIM would override this and
    cause an unexpected failure.
    
    While this is not the case here, the scoped gfp semantic is not really
    needed because we can easily push the allocation context down the
    chain without too much clutter.
    
    [[email protected]: fix kerneldoc warnings]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Michal Hocko <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Dave Chinner <[email protected]>
    Reviewed-by: Jan Kara <[email protected]> # For vfs changes
    Cc: Al Viro <[email protected]>
    Cc: Christian Brauner <[email protected]>
    Cc: James Morris <[email protected]>
    Cc: Kent Overstreet <[email protected]>
    Cc: Paul Moore <[email protected]>
    Cc: Serge E. Hallyn <[email protected]>
    Cc: Yafang Shao <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Michal Hocko authored and akpm00 committed Oct 9, 2024
    9897713
  18. Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"

    This reverts commit eab0af9.
    
    There is no existing user of those flags.  PF_MEMALLOC_NORECLAIM is
    dangerous because a nested allocation context can use GFP_NOFAIL, which
    could cause an unexpected failure.  Such code would be hard to maintain
    because it could be deeper in the call chain.
    
    PF_MEMALLOC_NORECLAIM was added even though it was pointed out [1] that
    such an allocation context is inherently unsafe if the context doesn't
    fully control all allocations called from this context.
    
    While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM
    is, it doesn't have any user, and as Matthew has pointed out we are
    running out of those flags, so it is better to reclaim it while it has no
    real users.
    
    [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Michal Hocko <[email protected]>
    Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Dave Chinner <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Cc: Al Viro <[email protected]>
    Cc: Christian Brauner <[email protected]>
    Cc: James Morris <[email protected]>
    Cc: Jan Kara <[email protected]>
    Cc: Kent Overstreet <[email protected]>
    Cc: Paul Moore <[email protected]>
    Cc: Serge E. Hallyn <[email protected]>
    Cc: Yafang Shao <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Michal Hocko authored and akpm00 committed Oct 9, 2024
    9a8da05
  19. kthread: unpark only parked kthread

    Calling into kthread unparking unconditionally is mostly harmless when
    the kthread is already unparked. The wake up is then simply ignored
    because the target is not in TASK_PARKED state.
    
    However if the kthread is per CPU, the wake up is preceded by a call
    to kthread_bind() which expects the task to be inactive and in
    TASK_PARKED state, which obviously isn't the case if it is unparked.
    
    As a result, calling kthread_stop() on an unparked per-cpu kthread
    triggers such a warning:
    
    	WARNING: CPU: 0 PID: 11 at kernel/kthread.c:525 __kthread_bind_mask kernel/kthread.c:525
    	 <TASK>
    	 kthread_stop+0x17a/0x630 kernel/kthread.c:707
    	 destroy_workqueue+0x136/0xc40 kernel/workqueue.c:5810
    	 wg_destruct+0x1e2/0x2e0 drivers/net/wireguard/device.c:257
    	 netdev_run_todo+0xe1a/0x1000 net/core/dev.c:10693
    	 default_device_exit_batch+0xa14/0xa90 net/core/dev.c:11769
    	 ops_exit_list net/core/net_namespace.c:178 [inline]
    	 cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:640
    	 process_one_work kernel/workqueue.c:3231 [inline]
    	 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
    	 worker_thread+0x86d/0xd70 kernel/workqueue.c:3393
    	 kthread+0x2f0/0x390 kernel/kthread.c:389
    	 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
    	 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    	 </TASK>
    
    Fix this by skipping unnecessary unparking while stopping a kthread.
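
A simplified userspace model of the guard, assuming a thread object that records whether it is parked and whether it is per-CPU. The struct and function names are hypothetical; the assert merely stands in for the warning shown in the trace above and is not how the kernel reports it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a kthread for illustration only. */
struct sketch_kthread {
    bool parked;     /* thread is currently parked */
    bool per_cpu;    /* thread is bound to a single CPU */
    int bind_calls;  /* how often the rebind step ran */
};

/* Unparking a per-CPU thread rebinds it, which requires a parked thread. */
static void sketch_unpark(struct sketch_kthread *k)
{
    if (k->per_cpu) {
        assert(k->parked); /* stands in for the kernel warning */
        k->bind_calls++;
    }
    k->parked = false;
}

/* Stopping only unparks when the thread is actually parked. */
static void sketch_stop(struct sketch_kthread *k)
{
    if (k->parked)
        sketch_unpark(k);
    /* the remaining stop logic is omitted in this sketch */
}
```

Stopping an already running per-CPU thread in this model performs no rebind at all, while stopping a parked one rebinds exactly once.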
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 5c25b5f ("workqueue: Tag bound workers with KTHREAD_IS_PER_CPU")
    Signed-off-by: Frederic Weisbecker <[email protected]>
    Reported-by: [email protected]
    Tested-by: [email protected]
    Suggested-by: Thomas Gleixner <[email protected]>
    Cc: Hillf Danton <[email protected]>
    Cc: Tejun Heo <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Frederic Weisbecker authored and akpm00 committed Oct 9, 2024
    214e01a
  20. device-dax: correct pgoff align in dax_set_mapping()

    pgoff should be aligned using ALIGN_DOWN() instead of ALIGN().  Otherwise,
    a vmf->address that is not aligned to fault_size is rounded up to the next
    alignment boundary, which can result in memory failure handling getting
    the wrong address.
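
The difference between rounding up and rounding down can be shown with a small userspace sketch. The macros below are stand-ins written for this example (they assume a power-of-two alignment) and the helper names are hypothetical; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins; both assume `a` is a power of two. */
#define SKETCH_ALIGN_UP(x, a)   (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))
#define SKETCH_ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

/* Start of the fault_size-sized block that contains addr. */
static uint64_t sketch_block_start(uint64_t addr, uint64_t fault_size)
{
    assert(fault_size != 0 && (fault_size & (fault_size - 1)) == 0);
    return SKETCH_ALIGN_DOWN(addr, fault_size);
}

/* Rounding up instead lands on the next block for unaligned addresses. */
static uint64_t sketch_rounded_up(uint64_t addr, uint64_t fault_size)
{
    assert(fault_size != 0 && (fault_size & (fault_size - 1)) == 0);
    return SKETCH_ALIGN_UP(addr, fault_size);
}
```

For an address of 0x201000 and a 2M (0x200000) fault size, rounding down gives 0x200000, the block that actually contains the address, while rounding up gives 0x400000, the following block. For an already aligned address both helpers return the same value, which is why the problem only shows up with unaligned addresses.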
    
    It's a subtle situation that can only be observed in
    page_mapped_in_vma() after the page fault has been handled by
    dev_dax_huge_fault.  Generally, there is little chance of running
    page_mapped_in_vma() on a dev-dax page, except when an error is
    specifically injected into the dax device to trigger an MCE (memory
    failure).  In that case, page_mapped_in_vma() is called to determine
    which task is accessing the failing address, and that task is eventually
    killed.
    
    We used a self-developed dax device (with 2M-aligned mappings) to
    inject errors at random addresses.  It turned out that an error injected
    at a non-2M-aligned address caused endless MCEs until panic, because
    page_mapped_in_vma() kept returning the wrong address and the task
    accessing the failing address was never killed properly:
    
    [ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3784.049006] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3784.448042] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3784.792026] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3785.162502] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3785.461116] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3785.764730] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3786.042128] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3786.464293] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3786.818090] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    [ 3787.085297] mce: Uncorrected hardware memory error in user-access at 200c9742380
    [ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page: Recovered
    
    It took us several weeks to pinpoint this problem, but we eventually
    used bpftrace to trace the page fault and mce address and successfully
    identified the issue.
    
    Joao added:
    
    : Likely we never reproduce in production because we always pin
    : device-dax regions in the region align they provide (Qemu does
    : similarly with prealloc in hugetlb/file backed memory).  I think this
    : bug requires that we touch *unpinned* device-dax regions unaligned to
    : the device-dax selected alignment (page size i.e.  4K/2M/1G)
    
    Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.1727421694.git.llfl@linux.alibaba.com
    Fixes: b9b5777 ("device-dax: use ALIGN() for determining pgoff")
    Signed-off-by: Kun(llfl) <[email protected]>
    Tested-by: JianXiong Zhao <[email protected]>
    Reviewed-by: Joao Martins <[email protected]>
    Cc: Dan Williams <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Kun(llfl) authored and akpm00 committed Oct 9, 2024
    7fcbd97
  21. selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test

    The hmm2 double_map test was failing due to an incorrect buffer->mirror
    size.  The buffer->mirror size was 6, while buffer->ptr size was 6 *
    PAGE_SIZE.  The test failed because the kernel's copy_to_user function was
    attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror.  Since the
    size of buffer->mirror was incorrect, copy_to_user failed.
    
    This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
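
A userspace sketch of the sizing mistake, assuming a 4096-byte page and a buffer with a data area and a mirror area. The struct, constant, and function names are hypothetical and are not taken from the selftest.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical buffer with a data area and a same-sized mirror. */
struct sketch_buffer {
    void *ptr;
    void *mirror;
    size_t size; /* size in bytes of both areas */
};

/* Both areas are sized in bytes (npages * page size), not in pages. */
static struct sketch_buffer *sketch_buffer_alloc(size_t npages)
{
    struct sketch_buffer *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    b->size = npages * SKETCH_PAGE_SIZE;
    b->ptr = malloc(b->size);
    b->mirror = malloc(b->size); /* the bug would be malloc(npages) here */
    if (!b->ptr || !b->mirror) {
        free(b->ptr);
        free(b->mirror);
        free(b);
        return NULL;
    }
    memset(b->ptr, 0xAB, b->size);
    return b;
}

/* Copies the whole data area into the mirror; safe only if sizes match. */
static void sketch_buffer_copy_to_mirror(struct sketch_buffer *b)
{
    assert(b && b->ptr && b->mirror);
    memcpy(b->mirror, b->ptr, b->size);
}

static void sketch_buffer_free(struct sketch_buffer *b)
{
    if (!b)
        return;
    free(b->ptr);
    free(b->mirror);
    free(b);
}
```

If the mirror were allocated with only npages bytes, the full-size copy would overrun it; in the selftest the equivalent kernel-side copy failed instead, as the results below show.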
    
    Test Result without this patch
    ==============================
     #  RUN           hmm2.hmm2_device_private.double_map ...
     # hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
     # double_map: Test terminated by assertion
     #          FAIL  hmm2.hmm2_device_private.double_map
     not ok 53 hmm2.hmm2_device_private.double_map
    
    Test Result with this patch
    ===========================
     #  RUN           hmm2.hmm2_device_private.double_map ...
     #            OK  hmm2.hmm2_device_private.double_map
     ok 53 hmm2.hmm2_device_private.double_map
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: fee9f6d ("mm/hmm/test: add selftests for HMM")
    Signed-off-by: Donet Tom <[email protected]>
    Reviewed-by: Muhammad Usama Anjum <[email protected]>
    Cc: Jérôme Glisse <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Przemek Kitszel <[email protected]>
    Cc: Ritesh Harjani (IBM) <[email protected]>
    Cc: Shuah Khan <[email protected]>
    Cc: Ralph Campbell <[email protected]>
    Cc: Jason Gunthorpe <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Donet Tom authored and akpm00 committed Oct 9, 2024
    76503e1
  22. fs/proc/kcore.c: allow translation of physical memory addresses

    When /proc/kcore is read, an attempt to read the first two pages results
    in a HW-specific page swap on s390, and other (so-called prefix) pages
    are accessed instead.  That leads to a wrong read.
    
    Allow architecture-specific translation of memory addresses using
    kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks, similarly
    to the /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks.
    That way an architecture can deal with specific physical memory ranges.
    
    Re-use the existing /dev/mem callback implementation on s390, which
    handles the described prefix pages swapping correctly.
    
    For other architectures the default callback is basically a NOP.  It is
    expected that the condition (vaddr == __va(__pa(vaddr))) always holds
    true for the KCORE_RAM memory type.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Alexander Gordeev <[email protected]>
    Suggested-by: Heiko Carstens <[email protected]>
    Cc: Vasily Gorbik <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Alexander Gordeev authored and akpm00 committed Oct 9, 2024
    3d5854d
  23. resource, kunit: fix user-after-free in resource_test_region_intersects()

    In resource_test_insert_resource(), the pointer is used in an error
    message after kfree().  This is a use-after-free.  To fix this, we need to
    call kunit_add_action_or_reset() to schedule memory freeing after usage.
    But kunit_add_action_or_reset() itself may fail and free the memory.  So
    its return value should be checked, and the test aborted on failure.
    Then, we found that other usage of kunit_add_action_or_reset() in
    resource_test_region_intersects() needs to be fixed too.  We fix all these
    use-after-free bugs in this patch.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 99185c1 ("resource, kunit: add test case for region_intersects()")
    Signed-off-by: "Huang, Ying" <[email protected]>
    Reported-by: Kees Bakker <[email protected]>
    Closes: https://lore.kernel.org/lkml/[email protected]/
    Cc: Dan Williams <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Bjorn Helgaas <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    yhuang-intel authored and akpm00 committed Oct 9, 2024
    0665d7a
  24. mm/huge_memory: check pmd_special() only after pmd_present()

    We should only check for pmd_special() after we made sure that we have a
    present PMD.  For example, if we have a migration PMD, pmd_special() might
    indicate that we have a special PMD although we really don't.
    
    This fixes confusing migration entries as PFN mappings, and not doing what
    we are supposed to do in the "is_swap_pmd()" case further down in the
    function -- including messing up COW, page table handling and accounting.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: bc02afb ("mm/fork: accept huge pfnmap entries")
    Signed-off-by: David Hildenbrand <[email protected]>
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/lkml/[email protected]/
    Reviewed-by: Peter Xu <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    davidhildenbrand authored and akpm00 committed Oct 9, 2024
    47fa301
  25. .mailmap: update Fangrui's email

    I'm leaving Google.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Fangrui Song <[email protected]>
    Acked-by: Nathan Chancellor <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    MaskRay authored and akpm00 committed Oct 9, 2024
    71e32fe
  26. secretmem: disable memfd_secret() if arch cannot set direct map

    Return -ENOSYS from memfd_secret() syscall if !can_set_direct_map().  This
    is the case for example on some arm64 configurations, where marking 4k
    PTEs in the direct map not present can only be done if the direct map is
    set up at 4k granularity in the first place (as ARM's break-before-make
    semantics do not easily allow breaking apart large/gigantic pages).
    
    More precisely, on arm64 systems with !can_set_direct_map(),
    set_direct_map_invalid_noflush() is a no-op, however it returns success
    (0) instead of an error.  This means that memfd_secret will seemingly
    "work" (e.g.  syscall succeeds, you can mmap the fd and fault in pages),
    but it does not actually achieve its goal of removing its memory from the
    direct map.
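
The gate described above can be modelled in user space roughly as follows. This is a sketch, not the kernel code: `model_memfd_secret()` and its boolean parameter are hypothetical, and the returned descriptor number is a placeholder.

```c
#include <errno.h>
#include <stdbool.h>

/* Models the entry check: when the direct map cannot be modified, fail
 * loudly with -ENOSYS instead of handing out an fd whose memory would
 * silently stay in the direct map. */
long model_memfd_secret(bool can_set_direct_map)
{
    if (!can_set_direct_map)
        return -ENOSYS;

    return 3; /* placeholder for a real file descriptor */
}
```

A caller sees an explicit error on the affected configurations and can fall back to another mechanism.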
    
    Note that with this patch, memfd_secret() will start erroring on systems
    where can_set_direct_map() returns false (arm64 with
    CONFIG_RODATA_FULL_DEFAULT_ENABLED=n, CONFIG_DEBUG_PAGEALLOC=n and
    CONFIG_KFENCE=n), but that still seems better than the current silent
    failure.  Since CONFIG_RODATA_FULL_DEFAULT_ENABLED defaults to 'y', most
    arm64 systems actually have a working memfd_secret() and aren't
    affected.
    
    From going through the iterations of the original memfd_secret patch
    series, it seems that disabling the syscall in these scenarios was the
    intended behavior [1] (preferred over having
    set_direct_map_invalid_noflush return an error as that would result in
    SIGBUSes at page-fault time), however the check for it got dropped between
    v16 [2] and v17 [3], when secretmem moved away from CMA allocations.
    
    [1]: https://lore.kernel.org/lkml/[email protected]/
    [2]: https://lore.kernel.org/lkml/[email protected]/#t
    [3]: https://lore.kernel.org/lkml/[email protected]/
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 1507f51 ("mm: introduce memfd_secret system call to create "secret" memory areas")
    Signed-off-by: Patrick Roy <[email protected]>
    Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
    Cc: Alexander Graf <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: James Gowans <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    roypat authored and akpm00 committed Oct 9, 2024
    532b53c
  27. CREDITS: sort alphabetically by name

    Re-sort a few misplaced entries in the CREDITS file.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Krzysztof Kozlowski <[email protected]>
    Cc: Arnd Bergmann <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    krzk authored and akpm00 committed Oct 9, 2024
    b181569
  28. mm: zswap: delete comments for "value" member of 'struct zswap_entry'.

    Made a minor edit in the comments for 'struct zswap_entry' to delete the
    description of the 'value' member that was deleted in commit 20a5532
    ("mm: remove code to handle same filled pages").
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Kanchana P Sridhar <[email protected]>
    Fixes: 20a5532 ("mm: remove code to handle same filled pages")
    Reviewed-by: Nhat Pham <[email protected]>
    Acked-by: Yosry Ahmed <[email protected]>
    Reviewed-by: Usama Arif <[email protected]>
    Cc: Chengming Zhou <[email protected]>
    Cc: Huang Ying <[email protected]>
    Cc: Johannes Weiner <[email protected]>
    Cc: Kanchana P Sridhar <[email protected]>
    Cc: Ryan Roberts <[email protected]>
    Cc: Wajdi Feghali <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    kparasur authored and akpm00 committed Oct 9, 2024
    aa5f0fa
  29. bcachefs: bcachefs_metadata_version_inode_has_child_snapshots

    There's an inherent race in taking a snapshot while an unlinked file is
    open, and then reattaching it in the child snapshot.
    
    In the interior snapshot node the file will appear unlinked, as though
    it should be deleted - it's not referenced by anything in that snapshot
    - but we can't delete it, because the file data is referenced by the
    child snapshot.
    
    This was being handled incorrectly with
    propagate_key_to_snapshot_leaves() - but that doesn't resolve the
    fundamental inconsistency of "this file looks like it should be deleted
    according to normal rules, but - ".
    
    To fix this, we need to fix the rule for when an inode is deleted. The
    previous rule, ignoring snapshots (there was no well-defined rule
    for snapshots), was:
      Unlinked, non open files are deleted, either at recovery time or
      during online fsck
    
    The new rule is:
      Unlinked, non open files, that do not exist in child snapshots, are
      deleted.
    
    To make this work transactionally, we add a new inode flag,
    BCH_INODE_has_child_snapshot; it overrides BCH_INODE_unlinked when
    considering whether to delete an inode, or put it on the deleted list.
    
    For transactional consistency, clearing it is handled by the inode trigger:
    when deleting an inode we check if there are parent inodes which can now
    have the BCH_INODE_has_child_snapshot flag cleared.
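
The old and new deletion rules can be written as two predicates. This is only a sketch: `struct model_inode` and its boolean fields are hypothetical stand-ins for the real inode flags and open state.

```c
#include <stdbool.h>

struct model_inode {
    bool unlinked;           /* models BCH_INODE_unlinked */
    bool has_child_snapshot; /* models BCH_INODE_has_child_snapshot */
    bool open;               /* still held open by someone */
};

/* Previous rule: unlinked, non-open files are deleted. */
bool may_delete_old_rule(const struct model_inode *inode)
{
    return inode->unlinked && !inode->open;
}

/* New rule: additionally, the inode must not exist in child snapshots. */
bool may_delete_new_rule(const struct model_inode *inode)
{
    return inode->unlinked && !inode->open && !inode->has_child_snapshot;
}
```

The two predicates differ only for an unlinked, closed inode that still exists in a child snapshot, which is the case the commit message describes.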
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 9, 2024
    9b23fdb
  30. bcachefs: Kill bch2_propagate_key_to_snapshot_leaves()

    Dead code now.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 9, 2024
    84878e8
  31. bcachefs: bch2_inode_or_descendents_is_open()

    fsck can now correctly check if inodes in interior snapshot nodes are
    open/in use.
    
    - Tweak the vfs inode rhashtable so that the subvolume ID isn't hashed,
      meaning inums in different subvolumes will hash to the same slot. Note
      that this is a hack, and will cause problems if anyone ever has the
      same file in many different snapshots open all at the same time.
    
    - Then check if any of those subvolumes is a descendent of the snapshot
      ID being checked
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 9, 2024
    9d86178
  32. bcachefs: Disk accounting device validation fixes

    - Fix failure to validate that accounting replicas entries point to
      valid devices: this wasn't a real bug since they'd be cleaned up by
      GC, but is still something we should know about
    
    - Fix failure to validate that dev_data_type entries point to valid
      devices: this does fix a real bug, since bch2_accounting_read() would
      then try to copy the counters to that device and pop an inconsistent
      error when the device didn't exist
    
    - Remove accounting entries that are zeroed or invalid: if we're not
      validating them we need to get rid of them: they might not exist in
    the superblock, so we need to trigger the superblock mark path
      when they're readded.
    
      This fixes the replication.ktest rereplicate test, which was failing
      with "superblock not marked for replicas..."
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 9, 2024
    19773ec
  33. bcachefs: add check for btree id against max in try read node

    Add check for read node's btree_id against BTREE_ID_NR_MAX in
    try_read_btree_node to prevent triggering EBUG_ON condition in
    bch2_btree_id_root[1].
    
    [1] https://syzkaller.appspot.com/bug?extid=cf7b2215b5d70600ec00
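
A user-space sketch of the pattern: validate an id read from disk before it reaches a lookup that asserts on its argument. The table size, the struct and the `model_` functions are hypothetical; `assert()` stands in for the kernel's debug assertion.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BTREE_ID_NR_MAX 8u /* hypothetical table size */

struct model_btree_root {
    unsigned int level;
};

static struct model_btree_root model_roots[MODEL_BTREE_ID_NR_MAX];

/* Models the lookup whose assertion was being triggered. */
struct model_btree_root *model_btree_id_root(unsigned int id)
{
    assert(id < MODEL_BTREE_ID_NR_MAX);
    return &model_roots[id];
}

/* Check the untrusted id first and skip the node when it is out of range. */
struct model_btree_root *model_try_read_node(unsigned int id_from_disk)
{
    if (id_from_disk >= MODEL_BTREE_ID_NR_MAX)
        return NULL;

    return model_btree_id_root(id_from_disk);
}
```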
    
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=cf7b2215b5d70600ec00
    Fixes: 4409b80 ("bcachefs: Repair pass for scanning for btree nodes")
    Signed-off-by: Piotr Zalewski <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    JungerBoyo authored and Kent Overstreet committed Oct 9, 2024
    0151d10
  34. bcachefs: Release transaction before wake up

    We will get this if we wake up first:
    
    Kernel panic - not syncing: btree_node_write_done leaked btree_trans
    
    since there are still transactions waiting for cycle detectors after
    BTREE_NODE_write_in_flight is cleared.
    
    Signed-off-by: Alan Huang <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    alanskind authored and Kent Overstreet committed Oct 9, 2024
    a154154
  35. bcachefs: Fix NULL pointer dereference in bch2_opt_to_text

    This patch adds a bounds check to the bch2_opt_to_text function to prevent
    NULL pointer dereferences when accessing the opt->choices array. This
    ensures that the index used is within valid bounds before dereferencing.
    The new version enhances the readability.
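
A generic sketch of a bounds-checked lookup into a choices table, assuming the caller knows the number of entries. The function name, its parameters and the fallback string are hypothetical and are not taken from the patch.

```c
#include <stddef.h>

/* Returns the label for idx, or a fixed marker when there is no table
 * or idx lies outside it, so the array is never read out of bounds. */
const char *model_choice_to_text(const char *const *choices,
                                 size_t nr_choices, size_t idx)
{
    if (choices == NULL || idx >= nr_choices)
        return "(invalid option)";

    return choices[idx];
}
```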
    
    Reported-and-tested-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=37186860aa7812b331d5
    Signed-off-by: Mohammed Anees <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    donutAnees authored and Kent Overstreet committed Oct 9, 2024
    a30f322
  36. bcachefs: Fix state lock involved deadlock

    We increased the write ref; if the fs then went RO, that would lead to
    a deadlock. It actually happens:
    
    00171 ========= TEST   generic/279
    00171
    00172 bcachefs (vdb): starting version 1.12: rebalance_work_acct_fix opts=nocow
    00172 bcachefs (vdb): recovering from clean shutdown, journal seq 35
    00172 bcachefs (vdb): accounting_read... done
    00172 bcachefs (vdb): alloc_read... done
    00172 bcachefs (vdb): stripes_read... done
    00172 bcachefs (vdb): snapshots_read... done
    00172 bcachefs (vdb): journal_replay... done
    00172 bcachefs (vdb): resume_logged_ops... done
    00172 bcachefs (vdb): going read-write
    00172 bcachefs (vdb): done starting filesystem
    00172 FSTYP         -- bcachefs
    00172 PLATFORM      -- Linux/aarch64 farm3-kvm 6.11.0-rc1-ktest-g3e290a0b8e34 #7030 SMP Tue Oct  8 14:15:12 UTC 2024
    00172 MKFS_OPTIONS  -- --nocow /dev/vdc
    00172 MOUNT_OPTIONS -- /dev/vdc /mnt/scratch
    00172
    00172 bcachefs (vdc): starting version 1.12: rebalance_work_acct_fix opts=nocow
    00172 bcachefs (vdc): initializing new filesystem
    00172 bcachefs (vdc): going read-write
    00172 bcachefs (vdc): marking superblocks
    00172 bcachefs (vdc): initializing freespace
    00172 bcachefs (vdc): done initializing freespace
    00172 bcachefs (vdc): reading snapshots table
    00172 bcachefs (vdc): reading snapshots done
    00172 bcachefs (vdc): done starting filesystem
    00173 bcachefs (vdc): shutting down
    00173 bcachefs (vdc): going read-only
    00173 bcachefs (vdc): finished waiting for writes to stop
    00173 bcachefs (vdc): flushing journal and stopping allocators, journal seq 4
    00173 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 6
    00173 bcachefs (vdc): shutdown complete, journal seq 7
    00173 bcachefs (vdc): marking filesystem clean
    00173 bcachefs (vdc): shutdown complete
    00173 bcachefs (vdb): shutting down
    00173 bcachefs (vdb): going read-only
    00361 INFO: task umount:6180 blocked for more than 122 seconds.
    00361 Not tainted 6.11.0-rc1-ktest-g3e290a0b8e34 #7030
    00361 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    00361 task:umount          state:D stack:0     pid:6180  tgid:6180  ppid:6176   flags:0x00000004
    00361 Call trace:
    00362 __switch_to (arch/arm64/kernel/process.c:556)
    00362 __schedule (kernel/sched/core.c:5191 kernel/sched/core.c:6529)
    00363 schedule (include/asm-generic/bitops/generic-non-atomic.h:128 include/linux/thread_info.h:192 include/linux/sched.h:2084 kernel/sched/core.c:6608 kernel/sched/core.c:6621)
    00365 bch2_fs_read_only (fs/bcachefs/super.c:346 (discriminator 41))
    00367 __bch2_fs_stop (fs/bcachefs/super.c:620)
    00368 bch2_put_super (fs/bcachefs/fs.c:1942)
    00369 generic_shutdown_super (include/linux/list.h:373 (discriminator 2) fs/super.c:650 (discriminator 2))
    00371 bch2_kill_sb (fs/bcachefs/fs.c:2170)
    00372 deactivate_locked_super (fs/super.c:434 fs/super.c:475)
    00373 deactivate_super (fs/super.c:508)
    00374 cleanup_mnt (fs/namespace.c:250 fs/namespace.c:1374)
    00376 __cleanup_mnt (fs/namespace.c:1381)
    00376 task_work_run (include/linux/sched.h:2024 kernel/task_work.c:224)
    00377 do_notify_resume (include/linux/resume_user_mode.h:50 arch/arm64/kernel/entry-common.c:151)
    00377 el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:171 arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
    00377 el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
    00378 el0t_64_sync (arch/arm64/kernel/entry.S:598)
    00378 INFO: task tee:6182 blocked for more than 122 seconds.
    00378 Not tainted 6.11.0-rc1-ktest-g3e290a0b8e34 #7030
    00378 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    00378 task:tee             state:D stack:0     pid:6182  tgid:6182  ppid:533    flags:0x00000004
    00378 Call trace:
    00378 __switch_to (arch/arm64/kernel/process.c:556)
    00378 __schedule (kernel/sched/core.c:5191 kernel/sched/core.c:6529)
    00378 schedule (include/asm-generic/bitops/generic-non-atomic.h:128 include/linux/thread_info.h:192 include/linux/sched.h:2084 kernel/sched/core.c:6608 kernel/sched/core.c:6621)
    00378 schedule_preempt_disabled (kernel/sched/core.c:6680)
    00379 rwsem_down_read_slowpath (kernel/locking/rwsem.c:1073 (discriminator 1))
    00379 down_read (kernel/locking/rwsem.c:1529)
    00381 bch2_gc_gens (fs/bcachefs/sb-members.h:77 fs/bcachefs/sb-members.h:88 fs/bcachefs/sb-members.h:128 fs/bcachefs/btree_gc.c:1240)
    00383 bch2_fs_store_inner (fs/bcachefs/sysfs.c:473)
    00385 bch2_fs_internal_store (fs/bcachefs/sysfs.c:417 fs/bcachefs/sysfs.c:580 fs/bcachefs/sysfs.c:576)
    00386 sysfs_kf_write (fs/sysfs/file.c:137)
    00387 kernfs_fop_write_iter (fs/kernfs/file.c:334)
    00389 vfs_write (fs/read_write.c:497 fs/read_write.c:590)
    00390 ksys_write (fs/read_write.c:643)
    00391 __arm64_sys_write (fs/read_write.c:652)
    00391 invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
    00392 do_el0_svc (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
    00392 el0_svc (arch/arm64/include/asm/irqflags.h:55 arch/arm64/include/asm/irqflags.h:76 arch/arm64/kernel/entry-common.c:165 arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
    00392 el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
    00392 el0t_64_sync (arch/arm64/kernel/entry.S:598)
    
    Signed-off-by: Alan Huang <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    alanskind authored and Kent Overstreet committed Oct 9, 2024
    9205d24
  37. closures: Add closure_wait_event_timeout()

    Add a closure version of wait_event_timeout(), with the same semantics.
    
    The closure version is useful because unlike wait_event(), it allows
    blocking code to run in the conditional expression.
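
For reference, the return convention of wait_event_timeout() can be modelled in user space as below. This sketch only illustrates the return values (0 on timeout with the condition still false, otherwise the remaining time, at least 1). It does not model closures or sleeping, and the `model_` names and the tick-based polling are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical demo condition: becomes true after a fixed number of polls. */
struct model_countdown {
    int polls_until_true;
};

bool model_countdown_cond(void *arg)
{
    struct model_countdown *c = arg;

    if (c->polls_until_true == 0)
        return true;
    c->polls_until_true--;
    return false;
}

/* Polls cond once per tick. Returns 0 if the timeout elapsed with cond
 * still false, otherwise the remaining ticks (at least 1). */
long model_wait_event_timeout(bool (*cond)(void *), void *arg,
                              long timeout_ticks)
{
    long remaining = timeout_ticks;

    while (!cond(arg)) {
        if (remaining == 0)
            return 0;
        remaining--; /* one tick passes */
    }
    return remaining ? remaining : 1;
}
```

A caller therefore treats a return value of 0 as "timed out" and any positive value as "condition met".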
    
    Cc: Coly Li <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 9, 2024
    04b670d
  38. bcachefs: Check if stuck in journal_res_get()

    Like how we already do when the allocator seems to be stuck, check if
    we're waiting too long for a journal reservation and print some debug
    info.
    
    This is specifically to track down
    koverstreet/bcachefs#656
    
    which is showing up in userspace where we don't have sysfs/debugfs to
    get the journal debug info.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 9, 2024
    a7e2dd5
  39. bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry

    inode_bit_waitqueue() is changing - this update clears the way for
    sched changes.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 9, 2024
    3b80552
  40. netfilter: xtables: avoid NFPROTO_UNSPEC where needed

    syzbot managed to call xt_cluster match via ebtables:
    
     WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780
     [..]
     ebt_do_table+0x174b/0x2a40
    
    Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet
    processing.  As this is only useful to restrict locally terminating
    TCP/UDP traffic, register this for ipv4 and ipv6 family only.
    
    Pablo points out that this is a general issue, direct users of the
    set/getsockopt interface can call into targets/matches that were only
    intended for use with ip(6)tables.
    
    Check all UNSPEC matches and targets for similar issues:
    
    - matches and targets are fine except if they assume skb_network_header()
      is valid -- this is only true when called from inet layer: ip(6) stack
      pulls the ip/ipv6 header into linear data area.
    - targets that return XT_CONTINUE or other xtables verdicts must be
      restricted too, they are incompatible with the ebtables traverser, e.g.
      EBT_CONTINUE is a completely different value than XT_CONTINUE.
    
    Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as
    they are provided for use by ip(6)tables.
    
    The MARK target is also used by arptables, so register for NFPROTO_ARP too.
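
The effect of registering for specific families instead of UNSPEC can be sketched with a small visibility predicate. The enum and function below are hypothetical models, under the assumption that an UNSPEC registration is reachable from every family while a family-specific one is reachable only from its own.

```c
#include <stdbool.h>

enum model_family {
    MODEL_UNSPEC,
    MODEL_IPV4,
    MODEL_IPV6,
    MODEL_ARP,
    MODEL_BRIDGE,
};

/* A match or target registered for MODEL_UNSPEC is visible to callers of
 * any family; otherwise the families have to agree. */
bool model_extension_visible(enum model_family registered,
                             enum model_family caller)
{
    return registered == MODEL_UNSPEC || registered == caller;
}
```

Under this model a bridge-family caller can reach an UNSPEC registration but not one made for IPv4 or IPv6 only.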
    
    While at it, bail out if connbytes fails to enable the corresponding
    conntrack family.
    
    This change passes the selftests in iptables.git.
    
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/netfilter-devel/[email protected]/
    Fixes: 0269ea4 ("netfilter: xtables: add cluster match")
    Signed-off-by: Florian Westphal <[email protected]>
    Co-developed-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Florian Westphal authored and ummakynes committed Oct 9, 2024
    0bfcb7b
  41. netfilter: fib: check correct rtable in vrf setups

    We need to init l3mdev unconditionally, else the main routing table is
    searched and an incorrect result is returned unless strict (iif keyword)
    matching is requested.
    
    Next patch adds a selftest for this.
    
    Fixes: 2a8a7c0 ("netfilter: nft_fib: Fix for rpath check with VRF devices")
    Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1761
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Florian Westphal authored and ummakynes committed Oct 9, 2024
    05ef705
  42. selftests: netfilter: conntrack_vrf.sh: add fib test case

    meta iifname veth0 ip daddr ... fib daddr oif
    
    ... is expected to return "dummy0" interface which is part of same vrf
    as veth0.
    
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Florian Westphal authored and ummakynes committed Oct 9, 2024
    c6a0862
  43. Merge tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
    
    Pull misc fixes from Andrew Morton:
     "12 hotfixes, 5 of which are c:stable. All singletons, about half of
      which are MM"
    
    * tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
      mm: zswap: delete comments for "value" member of 'struct zswap_entry'.
      CREDITS: sort alphabetically by name
      secretmem: disable memfd_secret() if arch cannot set direct map
      .mailmap: update Fangrui's email
      mm/huge_memory: check pmd_special() only after pmd_present()
      resource, kunit: fix user-after-free in resource_test_region_intersects()
      fs/proc/kcore.c: allow translation of physical memory addresses
      selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
      device-dax: correct pgoff align in dax_set_mapping()
      kthread: unpark only parked kthread
      Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"
      bcachefs: do not use PF_MEMALLOC_NORECLAIM
    torvalds committed Oct 9, 2024
    d3d1556

Commits on Oct 10, 2024

  1. net: ftgmac100: fixed not check status from fixed phy

    Add error handling for the call to fixed_phy_register().
    It may return an error, so its status needs to be checked.
    
    fixed_phy_register() also needs a device node to bind for mdio, so
    pass the MAC device node to it. This follows what
    of_phy_register_fixed_link() does.
    
    Fixes: e24a6c8 ("net: ftgmac100: Get link speed and duplex for NC-SI")
    Signed-off-by: Jacky Chou <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    aspeedJacky authored and kuba-moo committed Oct 10, 2024
    70a0da8
  2. ksmbd: fix user-after-free from session log off

    There is a race between smb2 session log off and smb2 session setup.
    It can cause a use-after-free in session log off.
    This adds session_lock when setting SMB2_SESSION_EXPIRED and a reference
    count to the session struct so that the session is not freed while it is
    being used.
    
    Cc: [email protected] # v5.15+
    Reported-by: [email protected] # ZDI-CAN-25282
    Signed-off-by: Namjae Jeon <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    namjaejeon authored and Steve French committed Oct 10, 2024
    7aa8804
  3. net: ibm: emac: mal: add dcr_unmap to _remove

    It's done in probe so it should be undone here.
    
    Fixes: 1d3bb99 ("Device tree aware EMAC driver")
    Signed-off-by: Rosen Penev <[email protected]>
    Reviewed-by: Breno Leitao <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    neheb authored and kuba-moo committed Oct 10, 2024
    080ddc2
  4. net: fec: don't save PTP state if PTP is unsupported

    Some platforms (such as i.MX25 and i.MX27) do not support PTP, so on
    these platforms fec_ptp_init() is not called and the related members
    in fep are not initialized. However, fec_ptp_save_state() is called
    unconditionally, which causes the kernel to panic. Therefore, add a
    condition so that fec_ptp_save_state() is not called if PTP is not
    supported.
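
The guard can be sketched in user space as below. The struct, its fields and the `model_` functions are hypothetical; `assert()` stands in for the crash that touching uninitialized PTP state would cause.

```c
#include <assert.h>
#include <stdbool.h>

struct model_fec {
    bool ptp_supported;   /* platform capability */
    bool ptp_initialized; /* set only when PTP init ran */
    int saved_states;     /* counts successful state saves */
};

/* Models a save routine that relies on PTP members being initialized. */
void model_ptp_save_state(struct model_fec *fep)
{
    assert(fep->ptp_initialized);
    fep->saved_states++;
}

/* Models the restart path: save PTP state only if PTP is supported. */
void model_restart(struct model_fec *fep)
{
    if (fep->ptp_supported)
        model_ptp_save_state(fep);
}
```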
    
    Fixes: a1477dc ("net: fec: Restart PPS after link state change")
    Reported-by: Guenter Roeck <[email protected]>
    Closes: https://lore.kernel.org/lkml/[email protected]/
    Signed-off-by: Wei Fang <[email protected]>
    Reviewed-by: Csókás, Bence <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Wei Fang authored and kuba-moo committed Oct 10, 2024
    6be0630
  5. net: dsa: refuse cross-chip mirroring operations

    In case of a tc mirred action from one switch to another, the behavior
    is not correct. We simply tell the source switch driver to program a
    mirroring entry towards mirror->to_local_port = to_dp->index, but it is
    not even guaranteed that the to_dp belongs to the same switch as dp.
    
    For proper cross-chip support, we would need to go through the
    cross-chip notifier layer in switch.c, program the entry on cascade
    ports, and introduce new, explicit API for cross-chip mirroring, given
    that intermediary switches should have introspection into the DSA tags
    passed through the cascade port (and not just program a port mirror on
    the entire cascade port). None of that exists today.
    
    Reject what is not implemented so that user space is not misled into
    thinking it works.
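
A sketch of the rejection, with hypothetical model structs in place of the DSA port and switch types. The choice of -EOPNOTSUPP as the error code is an assumption made for this example.

```c
#include <errno.h>

struct model_switch {
    int index;
};

struct model_port {
    const struct model_switch *ds; /* switch this port belongs to */
    int index;                     /* port number local to that switch */
};

/* Refuse mirroring when source and destination sit on different switches,
 * since a local port index is meaningless on another chip. */
int model_port_mirror_add(const struct model_port *dp,
                          const struct model_port *to_dp)
{
    if (dp->ds != to_dp->ds)
        return -EOPNOTSUPP;

    /* Same switch: the local port index can be programmed. */
    return 0;
}
```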
    
    Fixes: f50f212 ("net: dsa: Add plumbing for port mirroring")
    Signed-off-by: Vladimir Oltean <[email protected]>
    Reviewed-by: Andrew Lunn <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    vladimiroltean authored and kuba-moo committed Oct 10, 2024
    8c92436
  6. net: netconsole: fix wrong warning

    A warning is triggered when there is insufficient space in the buffer
    for userdata. However, this is not an issue since userdata will be sent
    in the next iteration.
    
    Current warning message:
    
        ------------[ cut here ]------------
         WARNING: CPU: 13 PID: 3013042 at drivers/net/netconsole.c:1122 write_ext_msg+0x3b6/0x3d0
          ? write_ext_msg+0x3b6/0x3d0
          console_flush_all+0x1e9/0x330
    
    The code incorrectly issues a warning when this_chunk is zero, which is
    a valid scenario. The warning should only be triggered when this_chunk
    is negative.
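
The change in the warning condition amounts to the difference between the two predicates below. The old condition is inferred from the description above, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Inferred old behaviour: zero remaining space also triggered the warning. */
bool model_should_warn_old(int this_chunk)
{
    return this_chunk <= 0;
}

/* Described new behaviour: only a negative chunk size is a bug. */
bool model_should_warn_new(int this_chunk)
{
    return this_chunk < 0;
}
```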
    
    Fixes: 1ec9daf ("net: netconsole: append userdata to fragmented netconsole messages")
    Signed-off-by: Breno Leitao <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    leitao authored and kuba-moo committed Oct 10, 2024
    d94785b
  7. mptcp: handle consistently DSS corruption

    A buggy peer implementation can send corrupted DSS options, consistently
    hitting a few warnings in the data path. Use DEBUG_NET assertions to
    avoid the splat on some builds, and handle the error consistently, dumping
    related MIBs and performing fallback and/or reset according to the
    subflow type.
    
    Fixes: 6771bfd ("mptcp: update mptcp ack sequence from work queue")
    Cc: [email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Paolo Abeni authored and kuba-moo committed Oct 10, 2024
    e32d262
  8. tcp: fix mptcp DSS corruption due to large pmtu xmit

    Syzkaller was able to trigger a DSS corruption:
    
      TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies.
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
      Modules linked in:
      CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191ac2a0 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
      RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
      Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff
      RSP: 0018:ffffc90000006db8 EFLAGS: 00010246
      RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00
      RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0
      RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8
      R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000
      R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5
      FS:  000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       move_skbs_to_msk net/mptcp/protocol.c:811 [inline]
       mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854
       subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490
       tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283
       tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237
       tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
       tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350
       ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
       ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
       NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
       NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
       __netif_receive_skb_one_core net/core/dev.c:5662 [inline]
       __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
       process_backlog+0x662/0x15b0 net/core/dev.c:6107
       __napi_poll+0xcb/0x490 net/core/dev.c:6771
       napi_poll net/core/dev.c:6840 [inline]
       net_rx_action+0x89b/0x1240 net/core/dev.c:6962
       handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
       do_softirq+0x11b/0x1e0 kernel/softirq.c:455
       </IRQ>
       <TASK>
       __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
       local_bh_enable include/linux/bottom_half.h:33 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
       __dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451
       dev_queue_xmit include/linux/netdevice.h:3094 [inline]
       neigh_hh_output include/net/neighbour.h:526 [inline]
       neigh_output include/net/neighbour.h:540 [inline]
       ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
       ip_local_out net/ipv4/ip_output.c:130 [inline]
       __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536
       __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
       tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
       tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline]
       tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752
       __tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015
       tcp_push_pending_frames include/net/tcp.h:2107 [inline]
       tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline]
       tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239
       tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
       sk_backlog_rcv include/net/sock.h:1113 [inline]
       __release_sock+0x214/0x350 net/core/sock.c:3072
       release_sock+0x61/0x1f0 net/core/sock.c:3626
       mptcp_push_release net/mptcp/protocol.c:1486 [inline]
       __mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625
       mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0x1a6/0x270 net/socket.c:745
       ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603
       ___sys_sendmsg net/socket.c:2657 [inline]
       __sys_sendmsg+0x2aa/0x390 net/socket.c:2686
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7fb06e9317f9
      Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005
      RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000
      R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406
      R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003
       </TASK>
    
    Additionally syzkaller provided a nice reproducer. The repro enables
    pmtu on the loopback device, leading to tcp_mtu_probe() generating
    very large probe packets.
    
    tcp_can_coalesce_send_queue_head() currently does not check for
    mptcp-level invariants, and allowed the creation of cross-DSS probes,
    leading to the mentioned corruption.
    
    Address the issue by teaching tcp_can_coalesce_send_queue_head() about
    mptcp using tcp_skb_can_collapse(), which also reduces code
    duplication.
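
    As a rough illustration of the invariant described above, the following
    Python toy model (not kernel code) refuses to build a probe out of queued
    segments that belong to different MPTCP DSS mappings. The Segment type,
    its dss_map field and both helper functions are hypothetical names chosen
    for this sketch; only the idea of consulting a per-pair "can collapse"
    check while walking the send queue is taken from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A queued segment; dss_map identifies its MPTCP DSS mapping, if any."""
    length: int
    dss_map: Optional[int] = None  # None models a plain TCP segment


def can_collapse(to: Segment, frm: Segment) -> bool:
    """Toy stand-in for the MPTCP-aware check: mappings must match."""
    return to.dss_map == frm.dss_map


def can_coalesce_queue_head(queue: List[Segment], probe_len: int) -> bool:
    """Return True only if probe_len bytes can be merged into one probe."""
    if not queue:
        return False
    head = queue[0]
    remaining = probe_len
    for seg in queue:
        if remaining <= 0:
            break
        if seg is not head and not can_collapse(head, seg):
            # Merging would create a probe spanning two DSS mappings.
            return False
        remaining -= seg.length
    return remaining <= 0
```

    With two 1000-byte segments from the same mapping, a 1500-byte probe is
    allowed; if the second segment carries a different mapping, the same
    probe is rejected, while an 800-byte probe that fits in the head segment
    is still fine.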
    
    Fixes: 8571248 ("tcp: coalesce/collapse must respect MPTCP extensions")
    Cc: [email protected]
    Reported-by: [email protected]
    Closes: multipath-tcp/mptcp_net-next#513
    Signed-off-by: Paolo Abeni <[email protected]>
    Acked-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Paolo Abeni authored and kuba-moo committed Oct 10, 2024
    4dabcdf
  9. mptcp: fallback when MPTCP opts are dropped after 1st data

    As reported by Christoph [1], before this patch, an MPTCP connection was
    wrongly reset when a host received a first data packet with MPTCP
    options after the 3wHS, but got the next ones without.
    
    According to the MPTCP v1 specs [2], a fallback should happen in this
    case, because the host didn't receive a DATA_ACK from the other peer,
    nor receive data for more than the initial window which implies a
    DATA_ACK being received by the other peer.
    
    The patch here re-uses the same logic as the one used in other places:
    by looking at allow_infinite_fallback, which is disabled at the creation
    of an additional subflow. It's not looking at the first DATA_ACK (or
    implying one received from the other side) as suggested by the RFC, but
    it is in continuation with what was already done, which is safer, and it
    fixes the reported issue. The next step, looking at this first DATA_ACK,
    is tracked in [4].
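
    The decision described above can be sketched as a small Python model.
    This is only an illustration under stated assumptions: the Connection
    class, its methods and the Action enum are hypothetical, and the only
    name taken from the commit message is allow_infinite_fallback.

```python
from enum import Enum


class Action(Enum):
    CONTINUE_MPTCP = "continue"
    FALLBACK_TO_TCP = "fallback"
    RESET = "reset"


class Connection:
    """Hypothetical stand-in for the receive side of an MPTCP connection."""

    def __init__(self) -> None:
        # Fallback is allowed while only the initial subflow exists.
        self.allow_infinite_fallback = True

    def add_subflow(self) -> None:
        # Creating an additional subflow disables infinite fallback.
        self.allow_infinite_fallback = False

    def on_data(self, has_mptcp_options: bool) -> Action:
        """Decide what to do with an incoming data packet."""
        if has_mptcp_options:
            return Action.CONTINUE_MPTCP
        if self.allow_infinite_fallback:
            return Action.FALLBACK_TO_TCP
        return Action.RESET
```

    In this model, a packet without MPTCP options leads to a fallback as
    long as no additional subflow has been created, and to a reset
    afterwards.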
    
    This patch has been validated using the following Packetdrill script:
    
       0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
    
      // 3WHS is OK
      +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
      +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
      +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
      +0 accept(3, ..., ...) = 4
    
      // Data from the client with valid MPTCP options (no DATA_ACK: normal)
      +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
      // From here, the MPTCP options will be dropped by a middlebox
      +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
    
      +0.1 read(4, ..., 500) = 500
      +0   write(4, ..., 100) = 100
    
      // The server replies with data, still thinking MPTCP is being used
      +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
      // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
      +0.1 <  . 501:501(0)   ack 101 win 2048
    
      +0.0 < P. 501:601(100) ack 101 win 2048
      // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
      +0.0 >  . 101:101(0)   ack 601
    
    Note that this script requires Packetdrill with MPTCP support, see [3].
    
    Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
    Cc: [email protected]
    Reported-by: Christoph Paasch <[email protected]>
    Closes: multipath-tcp/mptcp_net-next#518 [1]
    Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
    Link: https://github.com/multipath-tcp/packetdrill [3]
    Link: multipath-tcp/mptcp_net-next#519 [4]
    Reviewed-by: Paolo Abeni <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    matttbe authored and kuba-moo committed Oct 10, 2024
    119d51e
  10. mptcp: pm: do not remove closing subflows

    In a previous fix, the in-kernel path-manager has been modified not to
    retrigger the removal of a subflow if it was already closed, e.g. when
    the initial subflow is removed, but kept in the subflows list.
    
    To be complete, this fix should also skip the subflows that are in any
    closing state: mptcp_close_ssk() will initiate the closure, but the
    switch to the TCP_CLOSE state depends on the other peer.
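
    A minimal Python sketch of the filtering idea, assuming that "closing
    state" covers FIN_WAIT1, FIN_WAIT2, CLOSING and CLOSE (this set, the
    Subflow tuple and the function name are assumptions made for the
    example, not taken from the patch):

```python
from typing import List, NamedTuple

# Assumption: the TCP states treated as "closing or closed" in this sketch.
CLOSING_OR_CLOSED = frozenset({"FIN_WAIT1", "FIN_WAIT2", "CLOSING", "CLOSE"})


class Subflow(NamedTuple):
    name: str
    local_id: int
    state: str


def subflows_to_close(subflows: List[Subflow], rm_id: int) -> List[Subflow]:
    """Pick subflows matching rm_id that are not already closing or closed."""
    return [
        sf for sf in subflows
        if sf.local_id == rm_id and sf.state not in CLOSING_OR_CLOSED
    ]
```

    Only subflows that are still fully open are selected, so a subflow
    waiting for the peer to finish the close is not asked to close again.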
    
    Fixes: 58e1b66 ("mptcp: pm: do not remove already closed subflows")
    Cc: [email protected]
    Suggested-by: Paolo Abeni <[email protected]>
    Acked-by: Paolo Abeni <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    matttbe authored and kuba-moo committed Oct 10, 2024
    db0a37b
  11. Merge branch 'mptcp-misc-fixes-involving-fallback-to-tcp'

    Matthieu Baerts says:
    
    ====================
    mptcp: misc. fixes involving fallback to TCP
    
    - Patch 1: better handle DSS corruptions from a bugged peer: reducing
      warnings, doing a fallback or a reset depending on the subflow state.
      For >= v5.7.
    
    - Patch 2: fix DSS corruption due to large pmtu xmit, where MPTCP was
      not taken into account. For >= v5.6.
    
    - Patch 3: fallback when MPTCP opts are dropped after the first data
      packet, instead of resetting the connection. For >= v5.6.
    
    - Patch 4: also skip the removal of subflows that are in other closing
      states, a more complete version of a recent fix. For >= v5.10.
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Oct 10, 2024
    5151a35
  12. Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    …t/tnguy/net-queue
    
    Tony Nguyen says:
    
    ====================
    Intel Wired LAN Driver Updates 2024-10-08 (ice, i40e, igb, e1000e)
    
    This series contains updates to ice, i40e, igb, and e1000e drivers.
    
    For ice:
    
    Marcin allows driver to load, into safe mode, when DDP package is
    missing or corrupted and adjusts the netif_is_ice() check to
    account for when the device is in safe mode. He also fixes an
    out-of-bounds issue when MSI-X are increased for VFs.
    
    Wojciech clears FDB entries on reset to match the hardware state.
    
    For i40e:
    
    Aleksandr adds locking around MACVLAN filters to prevent memory leaks
    due to concurrency issues.
    
    For igb:
    
    Mohamed Khalfella adds a check to not attempt to bring up an already
    running interface on non-fatal PCIe errors.
    
    For e1000e:
    
    Vitaly changes board type for I219 to more closely match the hardware
    and stop PHY issues.
    
    * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
      e1000e: change I219 (19) devices to ADP
      igb: Do not bring the device up after non-fatal error
      i40e: Fix macvlan leak by synchronizing access to mac_filter_hash
      ice: Fix increasing MSI-X on VF
      ice: Flush FDB entries before reset
      ice: Fix netif_is_ice() in Safe Mode
      ice: Fix entering Safe Mode
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Oct 10, 2024
    a354733
  13. of: Fix unbalanced of node refcount and memory leaks

    Got the following report when running overlay_test:
    
    	OF: ERROR: memory leak, expected refcount 1 instead of 2,
    	of_node_get()/of_node_put() unbalanced - destroy cset entry:
    	attach overlay node            /kunit-test
    
    	OF: ERROR: memory leak before free overlay changeset,  /kunit-test
    
    In of_overlay_apply_kunit_cleanup(), the "np" should be associated with
    fake instead of test to call of_node_put(), so the node is put before
    the overlay is removed.
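
    The ordering problem can be illustrated with a toy Python model. It is
    one reading of the description above, not the test's code: Node, Scope
    and run() are hypothetical, the "inner" scope plays the role of the fake
    test and the "outer" scope the role of the real test.

```python
from typing import Callable, List


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 1  # reference held by the overlay changeset

    def get(self) -> None:
        self.refcount += 1

    def put(self) -> None:
        self.refcount -= 1


class Scope:
    """Runs deferred cleanup actions in reverse registration order."""

    def __init__(self) -> None:
        self._actions: List[Callable[[], None]] = []

    def defer(self, action: Callable[[], None]) -> None:
        self._actions.append(action)

    def cleanup(self) -> None:
        while self._actions:
            self._actions.pop()()


def run(put_in_inner_scope: bool) -> List[str]:
    """Return the errors reported when the overlay is removed."""
    errors: List[str] = []
    node = Node("/kunit-test")
    outer, inner = Scope(), Scope()

    def remove_overlay() -> None:
        if node.refcount != 1:
            errors.append(f"expected refcount 1 instead of {node.refcount}")

    inner.defer(remove_overlay)  # overlay goes away when the inner scope ends
    node.get()                   # the test takes its own reference
    (inner if put_in_inner_scope else outer).defer(node.put)

    inner.cleanup()
    outer.cleanup()
    return errors
```

    When the put is tied to the inner scope it runs before the overlay is
    removed and no error is reported; tied to the outer scope, the overlay
    removal sees a refcount of 2.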
    
    It also fixes the following memory leaks:
    
    	unreferenced object 0xffffff80c7d22800 (size 256):
    	  comm "kunit_try_catch", pid 236, jiffies 4294894764
    	  hex dump (first 32 bytes):
    	    d0 26 d4 c2 80 ff ff ff 00 00 00 00 00 00 00 00  .&..............
    	    60 19 75 c1 80 ff ff ff 00 00 00 00 00 00 00 00  `.u.............
    	  backtrace (crc ee0a471c):
    	    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    	    [<00000000c538ac7e>] __kmalloc_cache_noprof+0x26c/0x2f4
    	    [<00000000119f34f3>] __of_node_dup+0x4c/0x328
    	    [<00000000b212ca39>] build_changeset_next_level+0x2cc/0x4c0
    	    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    	    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    	    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    	    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    	    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    	    [<000000000b296be1>] kthread+0x2e8/0x374
    	    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
    	unreferenced object 0xffffff80c1751960 (size 16):
    	  comm "kunit_try_catch", pid 236, jiffies 4294894764
    	  hex dump (first 16 bytes):
    	    6b 75 6e 69 74 2d 74 65 73 74 00 c1 80 ff ff ff  kunit-test......
    	  backtrace (crc 18196259):
    	    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    	    [<0000000071006e2c>] __kmalloc_node_track_caller_noprof+0x300/0x3e0
    	    [<00000000b16ac6cb>] kstrdup+0x48/0x84
    	    [<0000000050e3373b>] __of_node_dup+0x60/0x328
    	    [<00000000b212ca39>] build_changeset_next_level+0x2cc/0x4c0
    	    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    	    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    	    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    	    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    	    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    	    [<000000000b296be1>] kthread+0x2e8/0x374
    	    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
    	unreferenced object 0xffffff80c2e96e00 (size 192):
    	  comm "kunit_try_catch", pid 236, jiffies 4294894764
    	  hex dump (first 32 bytes):
    	    80 19 75 c1 80 ff ff ff 0b 00 00 00 00 00 00 00  ..u.............
    	    a0 19 75 c1 80 ff ff ff 00 6f e9 c2 80 ff ff ff  ..u......o......
    	  backtrace (crc 1924cba4):
    	    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    	    [<00000000c538ac7e>] __kmalloc_cache_noprof+0x26c/0x2f4
    	    [<000000009fdd35ad>] __of_prop_dup+0x7c/0x2ec
    	    [<00000000aa4e0111>] add_changeset_property+0x548/0x9e0
    	    [<000000004777e25b>] build_changeset_next_level+0xd4/0x4c0
    	    [<00000000a9c93f8a>] build_changeset_next_level+0x3a8/0x4c0
    	    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    	    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    	    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    	    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    	    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    	    [<000000000b296be1>] kthread+0x2e8/0x374
    	    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
    	unreferenced object 0xffffff80c1751980 (size 16):
    	  comm "kunit_try_catch", pid 236, jiffies 4294894764
    	  hex dump (first 16 bytes):
    	    63 6f 6d 70 61 74 69 62 6c 65 00 c1 80 ff ff ff  compatible......
    	  backtrace (crc 42df3c87):
    	    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    	    [<0000000071006e2c>] __kmalloc_node_track_caller_noprof+0x300/0x3e0
    	    [<00000000b16ac6cb>] kstrdup+0x48/0x84
    	    [<00000000a8888fd8>] __of_prop_dup+0xb0/0x2ec
    	    [<00000000aa4e0111>] add_changeset_property+0x548/0x9e0
    	    [<000000004777e25b>] build_changeset_next_level+0xd4/0x4c0
    	    [<00000000a9c93f8a>] build_changeset_next_level+0x3a8/0x4c0
    	    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    	    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    	    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    	    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    	    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    	    [<000000000b296be1>] kthread+0x2e8/0x374
    	unreferenced object 0xffffff80c2e96f00 (size 192):
    	  comm "kunit_try_catch", pid 236, jiffies 4294894764
    	  hex dump (first 32 bytes):
    	    40 f7 bb c6 80 ff ff ff 0b 00 00 00 00 00 00 00  @...............
    	    c0 19 75 c1 80 ff ff ff 00 00 00 00 00 00 00 00  ..u.............
    	  backtrace (crc f2f57ea7):
    	    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    	    [<00000000c538ac7e>] __kmalloc_cache_noprof+0x26c/0x2f4
    	    [<000000009fdd35ad>] __of_prop_dup+0x7c/0x2ec
    	    [<00000000aa4e0111>] add_changeset_property+0x548/0x9e0
    	    [<000000004777e25b>] build_changeset_next_level+0xd4/0x4c0
    	    [<00000000a9c93f8a>] build_changeset_next_level+0x3a8/0x4c0
    	    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    	    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    	    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    	    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    	    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    	    [<000000000b296be1>] kthread+0x2e8/0x374
    	    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
    	......
    
    How to reproduce:
    	CONFIG_OF_OVERLAY_KUNIT_TEST=y, CONFIG_DEBUG_KMEMLEAK=y
    	and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, launch the kernel.
    
    Fixes: 5c9dd72 ("of: Add a KUnit test for overlays and test managed APIs")
    Reviewed-by: Stephen Boyd <[email protected]>
    Signed-off-by: Jinjie Ruan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Rob Herring (Arm) <[email protected]>
    Jinjie Ruan authored and robherring committed Oct 10, 2024
    b68694a
  14. drm/fbdev-dma: Only cleanup deferred I/O if necessary

    Commit 5a498d4 ("drm/fbdev-dma: Only install deferred I/O if
    necessary") initializes deferred I/O only if it is used.
    drm_fbdev_dma_fb_destroy() however calls fb_deferred_io_cleanup()
    unconditionally with struct fb_info.fbdefio == NULL. KASAN with the
    out-of-tree Apple silicon display driver posts the following warning
    from __flush_work() of a random struct work_struct instead of the
    expected NULL pointer derefs.
    
    [   22.053799] ------------[ cut here ]------------
    [   22.054832] WARNING: CPU: 2 PID: 1 at kernel/workqueue.c:4177 __flush_work+0x4d8/0x580
    [   22.056597] Modules linked in: uhid bnep uinput nls_ascii ip6_tables ip_tables i2c_dev loop fuse dm_multipath nfnetlink zram hid_magicmouse btrfs xor xor_neon brcmfmac_wcc raid6_pq hci_bcm4377 bluetooth brcmfmac hid_apple brcmutil nvmem_spmi_mfd simple_mfd_spmi dockchannel_hid cfg80211 joydev regmap_spmi nvme_apple ecdh_generic ecc macsmc_hid rfkill dwc3 appledrm snd_soc_macaudio macsmc_power nvme_core apple_isp phy_apple_atc apple_sart apple_rtkit_helper apple_dockchannel tps6598x macsmc_hwmon snd_soc_cs42l84 videobuf2_v4l2 spmi_apple_controller nvmem_apple_efuses videobuf2_dma_sg apple_z2 videobuf2_memops spi_nor panel_summit videobuf2_common asahi videodev pwm_apple apple_dcp snd_soc_apple_mca apple_admac spi_apple clk_apple_nco i2c_pasemi_platform snd_pcm_dmaengine mc i2c_pasemi_core mux_core ofpart adpdrm drm_dma_helper apple_dart apple_soc_cpufreq leds_pwm phram
    [   22.073768] CPU: 2 UID: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.11.2-asahi+ #asahi-dev
    [   22.075612] Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
    [   22.077032] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
    [   22.078567] pc : __flush_work+0x4d8/0x580
    [   22.079471] lr : __flush_work+0x54/0x580
    [   22.080345] sp : ffffc000836ef820
    [   22.081089] x29: ffffc000836ef880 x28: 0000000000000000 x27: ffff80002ddb7128
    [   22.082678] x26: dfffc00000000000 x25: 1ffff000096f0c57 x24: ffffc00082d3e358
    [   22.084263] x23: ffff80004b7862b8 x22: dfffc00000000000 x21: ffff80005aa1d470
    [   22.085855] x20: ffff80004b786000 x19: ffff80004b7862a0 x18: 0000000000000000
    [   22.087439] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000005
    [   22.089030] x14: 1ffff800106ddf0a x13: 0000000000000000 x12: 0000000000000000
    [   22.090618] x11: ffffb800106ddf0f x10: dfffc00000000000 x9 : 1ffff800106ddf0e
    [   22.092206] x8 : 0000000000000000 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000001
    [   22.093790] x5 : ffffc000836ef728 x4 : 0000000000000000 x3 : 0000000000000020
    [   22.095368] x2 : 0000000000000008 x1 : 00000000000000aa x0 : 0000000000000000
    [   22.096955] Call trace:
    [   22.097505]  __flush_work+0x4d8/0x580
    [   22.098330]  flush_delayed_work+0x80/0xb8
    [   22.099231]  fb_deferred_io_cleanup+0x3c/0x130
    [   22.100217]  drm_fbdev_dma_fb_destroy+0x6c/0xe0 [drm_dma_helper]
    [   22.101559]  unregister_framebuffer+0x210/0x2f0
    [   22.102575]  drm_fb_helper_unregister_info+0x48/0x60
    [   22.103683]  drm_fbdev_dma_client_unregister+0x4c/0x80 [drm_dma_helper]
    [   22.105147]  drm_client_dev_unregister+0x1cc/0x230
    [   22.106217]  drm_dev_unregister+0x58/0x570
    [   22.107125]  apple_drm_unbind+0x50/0x98 [appledrm]
    [   22.108199]  component_del+0x1f8/0x3a8
    [   22.109042]  dcp_platform_shutdown+0x24/0x38 [apple_dcp]
    [   22.110357]  platform_shutdown+0x70/0x90
    [   22.111219]  device_shutdown+0x368/0x4d8
    [   22.112095]  kernel_restart+0x6c/0x1d0
    [   22.112946]  __arm64_sys_reboot+0x1c8/0x328
    [   22.113868]  invoke_syscall+0x78/0x1a8
    [   22.114703]  do_el0_svc+0x124/0x1a0
    [   22.115498]  el0_svc+0x3c/0xe0
    [   22.116181]  el0t_64_sync_handler+0x70/0xc0
    [   22.117110]  el0t_64_sync+0x190/0x198
    [   22.117931] ---[ end trace 0000000000000000 ]---
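
    The shape of the fix, cleaning up deferred I/O only when it was
    installed, can be sketched in Python as follows. FbInfo,
    deferred_io_cleanup() and fb_destroy() are simplified, hypothetical
    stand-ins for the kernel structures and functions named above.

```python
from typing import Optional


class FbInfo:
    def __init__(self, fbdefio: Optional[object] = None) -> None:
        self.fbdefio = fbdefio
        self.cleaned_up = False


def deferred_io_cleanup(info: FbInfo) -> None:
    """Toy cleanup that is only valid when deferred I/O was installed."""
    if info.fbdefio is None:
        raise RuntimeError("deferred I/O was never initialised")
    info.cleaned_up = True


def fb_destroy(info: FbInfo) -> None:
    # Guard: only tear down deferred I/O if it was set up in the first place.
    if info.fbdefio is not None:
        deferred_io_cleanup(info)
```

    Calling fb_destroy() on an FbInfo without fbdefio is then a no-op
    instead of operating on state that was never initialised.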
    
    Signed-off-by: Janne Grunau <[email protected]>
    Fixes: 5a498d4 ("drm/fbdev-dma: Only install deferred I/O if necessary")
    Reviewed-by: Thomas Zimmermann <[email protected]>
    Reviewed-by: Linus Walleij <[email protected]>
    Signed-off-by: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/ZwLNuZL-8Gh5UUQb@robin
    jannau authored and Thomas Zimmermann committed Oct 10, 2024
    fcddc71
  15. ata: libata: Update MAINTAINERS file

    Modify the entry for the ahci_platform driver (LIBATA SATA
    AHCI PLATFORM devices support) in the MAINTAINERS file to remove Jens
    as maintainer. Also remove all references to Jens' block tree from the
    various LIBATA driver entries as the tree reference for these is defined
    by the LIBATA SUBSYSTEM entry.
    
    Signed-off-by: Damien Le Moal <[email protected]>
    Acked-by: Jens Axboe <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Niklas Cassel <[email protected]>
    damien-lemoal authored and floatious committed Oct 10, 2024
    0df4b9d
  16. crypto: api - Fix liveliness check in crypto_alg_tested

    As algorithm testing is carried out without holding the main crypto
    lock, it is always possible for the algorithm to go away during the
    test.
    
    So before crypto_alg_tested updates the status of the tested alg,
    it checks whether it's still on the list of all algorithms.  This
    is inaccurate because it may be off the main list but still on the
    list of algorithms to be removed.
    
    Updating the algorithm status is safe per se as the larval still
    holds a reference to it.  However, killing spawns of other algorithms
    that are of lower priority is clearly a deficiency as it adds
    unnecessary churn.
    
    Fix the test by checking whether the algorithm is dead.
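
    A toy Python model of why list membership is a weaker liveness test than
    a dead flag. It assumes that unregistering an algorithm marks it dead
    and moves it from the main list to a removal list; the Alg class, the
    linked_on field and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alg:
    name: str
    linked_on: Optional[str] = "main"  # "main", "remove", or None (unlinked)
    dead: bool = False


def unregister(alg: Alg) -> None:
    """Mark the algorithm dead and queue it on the removal list."""
    alg.dead = True
    alg.linked_on = "remove"


def live_by_list_membership(alg: Alg) -> bool:
    # Inaccurate: an algorithm queued for removal is still linked somewhere.
    return alg.linked_on is not None


def live_by_dead_flag(alg: Alg) -> bool:
    return not alg.dead
```

    For an algorithm that was unregistered while its test was running, the
    list-based check still reports it as live, while the flag-based check
    does not.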
    
    Signed-off-by: Herbert Xu <[email protected]>
    herbertx committed Oct 10, 2024
    b81e286
  17. crypto: testmgr - Hide ENOENT errors better

    The previous patch removed the ENOENT warning at the point of
    allocation, but the overall self-test warning is still there.
    
    Fix all of them by returning zero as the test result.  This is
    safe because if the algorithm has gone away, then it cannot be
    marked as tested.
    
    Fixes: 4eded6d ("crypto: testmgr - Hide ENOENT errors")
    Signed-off-by: Herbert Xu <[email protected]>
    herbertx committed Oct 10, 2024
    6318fbe
  18. crypto: marvell/cesa - Disable hash algorithms

    Disable cesa hash algorithms by lowering the priority because they
    appear to be broken when invoked in parallel.  This allows them to
    still be tested for debugging purposes.
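
    Why lowering the priority effectively disables an implementation can be
    shown with a small Python sketch of priority-based selection. The
    registry layout, the driver names and the priority numbers are
    illustrative assumptions, not values from the patch.

```python
from typing import List, Tuple

# (algorithm name, driver name, priority)
Registry = List[Tuple[str, str, int]]


def pick(registry: Registry, alg_name: str) -> str:
    """Return the driver with the highest priority for alg_name."""
    candidates = [(prio, drv) for name, drv, prio in registry if name == alg_name]
    if not candidates:
        raise LookupError(alg_name)
    return max(candidates)[1]
```

    With a hardware driver registered at priority 300 and a generic one at
    100, the hardware driver wins; once its priority is dropped to 0 the
    generic one is picked by default, while the hardware driver remains
    registered and can still be requested by driver name for testing.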
    
    Reported-by: Klaus Kudielka <[email protected]>
    Signed-off-by: Herbert Xu <[email protected]>
    herbertx committed Oct 10, 2024
    e845d23
  19. net: do not delay dst_entries_add() in dst_release()

    dst_entries_add() uses per-cpu data that might be freed at netns
    dismantle from ip6_route_net_exit() calling dst_entries_destroy()
    
    Before ip6_route_net_exit() can be called, we release all
    the dsts associated with this netns, via calls to dst_release(),
    which waits an rcu grace period before calling dst_destroy()
    
    dst_entries_add() use in dst_destroy() is racy, because
    dst_entries_destroy() could have been called already.
    
    Decrementing the number of dsts must happen sooner.
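
    The race can be modelled in a few lines of Python. This is a sketch
    under simplifying assumptions: EntryCounter stands in for the per-netns
    counter, the rcu_callbacks list stands in for work deferred past an RCU
    grace period, and all names are hypothetical.

```python
from typing import Callable, List


class EntryCounter:
    def __init__(self) -> None:
        self.value = 0
        self.destroyed = False

    def add(self, n: int) -> None:
        if self.destroyed:
            raise RuntimeError("counter used after it was destroyed")
        self.value += n

    def destroy(self) -> None:
        self.destroyed = True


def release(counter: EntryCounter,
            rcu_callbacks: List[Callable[[], None]],
            decrement_early: bool) -> None:
    if decrement_early:
        counter.add(-1)                     # account for the dst right away
        rcu_callbacks.append(lambda: None)  # only the final free is deferred
    else:
        rcu_callbacks.append(lambda: counter.add(-1))


def dismantle(decrement_early: bool) -> bool:
    """Return True if netns teardown completes without a use-after-free."""
    counter = EntryCounter()
    counter.add(1)
    rcu_callbacks: List[Callable[[], None]] = []
    release(counter, rcu_callbacks, decrement_early)
    counter.destroy()  # netns exit destroys the counter first
    try:
        for callback in rcu_callbacks:
            callback()
    except RuntimeError:
        return False
    return True
```

    Decrementing at release time keeps the deferred callback from touching
    the counter after it has been destroyed.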
    
    Notes:
    
    1) in CONFIG_XFRM case, dst_destroy() can call
       dst_release_immediate(child), this might also cause UAF
       if the child does not have DST_NOCOUNT set.
       IPSEC maintainers might take a look and see how to address this.
    
    2) There is also discussion about removing this count of dst,
       which might happen in future kernels.
    
    Fixes: f886497 ("ipv4: fix dst race in sk_dst_get()")
    Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/
    Reported-by: Naresh Kamboju <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Tested-by: Naresh Kamboju <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: Xin Long <[email protected]>
    Cc: Steffen Klassert <[email protected]>
    Reviewed-by: Xin Long <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Eric Dumazet authored and Paolo Abeni committed Oct 10, 2024
    ac888d5
  20. mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling

    While working with the T-Head 1520 LicheePi4A SoC, certain conditions
    arose that allowed me to reproduce a race issue in the sdhci code.
    
    To reproduce the bug, you need to enable the sdio1 controller in the
    device tree file
    `arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi` as follows:
    
    &sdio1 {
    	bus-width = <4>;
    	max-frequency = <100000000>;
    	no-sd;
    	no-mmc;
    	broken-cd;
    	cap-sd-highspeed;
    	post-power-on-delay-ms = <50>;
    	status = "okay";
    	wakeup-source;
    	keep-power-in-suspend;
    };
    
    When resetting the SoC using the reset button, the following messages
    appear in the dmesg log:
    
    [    8.164898] mmc2: Got command interrupt 0x00000001 even though no command operation was in progress.
    [    8.174054] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
    [    8.180503] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00000005
    [    8.186950] mmc2: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
    [    8.193395] mmc2: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
    [    8.199841] mmc2: sdhci: Present:   0x03da0000 | Host ctl: 0x00000000
    [    8.206287] mmc2: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
    [    8.212733] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x0000decf
    [    8.219178] mmc2: sdhci: Timeout:   0x00000000 | Int stat: 0x00000000
    [    8.225622] mmc2: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
    [    8.232068] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
    [    8.238513] mmc2: sdhci: Caps:      0x3f69c881 | Caps_1:   0x08008177
    [    8.244959] mmc2: sdhci: Cmd:       0x00000502 | Max curr: 0x00191919
    [    8.254115] mmc2: sdhci: Resp[0]:   0x00001009 | Resp[1]:  0x00000000
    [    8.260561] mmc2: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
    [    8.267005] mmc2: sdhci: Host ctl2: 0x00001000
    [    8.271453] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
    [    8.278594] mmc2: sdhci: ============================================
    
    I also enabled some traces to better understand the problem:
    
         kworker/3:1-62      [003] .....     8.163538: mmc_request_start:
    mmc2: start struct mmc_request[000000000d30cc0c]: cmd_opcode=5
    cmd_arg=0x0 cmd_flags=0x2e1 cmd_retries=0 stop_opcode=0 stop_arg=0x0
    stop_flags=0x0 stop_retries=0 sbc_opcode=0 sbc_arg=0x0 sbc_flags=0x0
    sbc_retires=0 blocks=0 block_size=0 blk_addr=0 data_flags=0x0 tag=0
    can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1
    retune_period=0
              <idle>-0       [000] d.h2.     8.164816: sdhci_cmd_irq:
    hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x10000
    intmask_p=0x18000
         irq/24-mmc2-96      [000] .....     8.164840: sdhci_thread_irq:
    msg=
         irq/24-mmc2-96      [000] d.h2.     8.164896: sdhci_cmd_irq:
    hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x1
    intmask_p=0x1
         irq/24-mmc2-96      [000] .....     8.285142: mmc_request_done:
    mmc2: end struct mmc_request[000000000d30cc0c]: cmd_opcode=5
    cmd_err=-110 cmd_resp=0x0 0x0 0x0 0x0 cmd_retries=0 stop_opcode=0
    stop_err=0 stop_resp=0x0 0x0 0x0 0x0 stop_retries=0 sbc_opcode=0
    sbc_err=0 sbc_resp=0x0 0x0 0x0 0x0 sbc_retries=0 bytes_xfered=0
    data_err=0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0
    hold_retune=1 retune_period=0
    
    Here's what happens: the __mmc_start_request function is called with
    opcode 5. Since the power to the Wi-Fi card, which resides on this SDIO
    bus, is initially off after the reset, an interrupt SDHCI_INT_TIMEOUT is
    triggered. Immediately after that, a second interrupt SDHCI_INT_RESPONSE
    is triggered. Depending on the exact timing, these conditions can
    trigger the following race problem:
    
    1) The sdhci_cmd_irq top half handles the command as an error. It sets
       host->cmd to NULL and host->pending_reset to true.
    2) The sdhci_thread_irq bottom half is scheduled next and executes faster
       than the second interrupt handler for SDHCI_INT_RESPONSE. It clears
       host->pending_reset before the SDHCI_INT_RESPONSE handler runs.
    3) The pending interrupt SDHCI_INT_RESPONSE handler gets called, triggering
       a code path that prints: "mmc2: Got command interrupt 0x00000001 even
       though no command operation was in progress."
    
    To solve this issue, we need to clear pending interrupts when resetting
    host->pending_reset. This ensures that after sdhci_thread_irq restores
    interrupts, there are no pending stale interrupts.
    
    The behavior observed here is non-compliant with the SDHCI standard.
    Place the code in the sdhci-of-dwcmshc driver to account for a
    hardware-specific quirk instead of the core SDHCI code.
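
The ordering in steps 1-3 can be replayed with a small userspace model. This is only a sketch: `host_model`, `top_half`, `bottom_half`, and `run_scenario` are hypothetical names, not driver code, and the two bit values are taken from the `intmask` fields in the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model; bit values mirror the intmask fields in the trace. */
#define INT_RESPONSE 0x00000001u
#define INT_TIMEOUT  0x00010000u

struct host_model {
	uint32_t int_status;	/* latched interrupt bits not yet handled */
	bool cmd_active;	/* stands in for host->cmd != NULL */
	bool pending_reset;	/* stands in for host->pending_reset */
	unsigned int spurious;	/* "no command in progress" complaints */
};

/* Top half: handle one latched interrupt bit. */
static void top_half(struct host_model *h, uint32_t bit)
{
	assert(h->int_status & bit);
	h->int_status &= ~bit;
	if (!h->cmd_active) {
		if (!h->pending_reset)
			h->spurious++;	/* stale interrupt reaches the warning path */
		return;
	}
	h->cmd_active = false;
	if (bit == INT_TIMEOUT)
		h->pending_reset = true;	/* error: defer the reset to the thread */
}

/* Bottom half: finish the reset, optionally dropping stale status bits. */
static void bottom_half(struct host_model *h, bool clear_stale)
{
	if (!h->pending_reset)
		return;
	h->pending_reset = false;
	if (clear_stale)
		h->int_status = 0;
}

/* Replay steps 1-3 and return how many warnings were counted. */
static unsigned int run_scenario(bool clear_stale)
{
	struct host_model h = { .cmd_active = true };

	h.int_status |= INT_TIMEOUT;
	top_half(&h, INT_TIMEOUT);		/* step 1 */
	h.int_status |= INT_RESPONSE;		/* second interrupt latches */
	bottom_half(&h, clear_stale);		/* step 2 */
	if (h.int_status & INT_RESPONSE)
		top_half(&h, INT_RESPONSE);	/* step 3 */
	return h.spurious;
}
```

In this model, `run_scenario(false)` counts one warning, while `run_scenario(true)` counts none because the stale bit is dropped together with the pending reset.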
    
    Signed-off-by: Michal Wilczynski <[email protected]>
    Acked-by: Adrian Hunter <[email protected]>
    Fixes: 43658a5 ("mmc: sdhci-of-dwcmshc: Add support for T-Head TH1520")
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Michal Wilczynski authored and storulf committed Oct 10, 2024
    27e8fe0
  21. Merge tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
    
    Pablo Neira Ayuso says:
    
    ====================
    Netfilter fixes for net
    
    The following patchset contains Netfilter fixes for net:
    
    1) Restrict xtables extensions to families that are safe; syzbot found
       a way to combine ebtables with extensions that are never used by
       userspace tools. From Florian Westphal.
    
    2) Set l3mdev unconditionally whenever possible in nft_fib to fix a
       lookup mismatch, also from Florian.
    
    netfilter pull request 24-10-09
    
    * tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
      selftests: netfilter: conntrack_vrf.sh: add fib test case
      netfilter: fib: check correct rtable in vrf setups
      netfilter: xtables: avoid NFPROTO_UNSPEC where needed
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Paolo Abeni committed Oct 10, 2024
    9a3cd87
  22. Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
    
    This reverts commit f790b5c.
    
    The reverted commit was not ready to be applied because it depends on
    other OPP/pmdomain changes that didn't make it for the last release
    cycle. Let's revert it to fix the behaviour.
    
    Signed-off-by: Ulf Hansson <[email protected]>
    Acked-by: Viresh Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    storulf committed Oct 10, 2024
    fa36b4b
  23. PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()

    The dev_pm_domain_attach|detach_list() functions are not resource managed,
    hence they should not use devm_* helpers to manage allocation/freeing of
    data. Let's fix this by converting to the traditional alloc/free functions.
    
    Fixes: 161e16a ("PM: domains: Add helper functions to attach/detach multiple PM domains")
    Cc: [email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Acked-by: Viresh Kumar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    storulf committed Oct 10, 2024
    7738568
  24. rtnetlink: Add bulk registration helpers for rtnetlink message handlers.

    Before commit addf9b9 ("net: rtnetlink: use rcu to free rtnl message
    handlers"), once rtnl_msg_handlers[protocol] was allocated, the following
    rtnl_register_module() for the same protocol never failed.
    
    However, after the commit, rtnl_msg_handlers[protocol][msgtype] needs to
    be allocated in each rtnl_register_module(), so each call could fail.
    
    Many callers of rtnl_register_module() do not handle the returned error,
    so error handling would have to be added in many places.
    
    To handle that easily, let's add wrapper functions for bulk registration
    of rtnetlink message handlers.
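
The all-or-nothing idea behind bulk registration can be sketched in userspace C. This is a minimal model, not the kernel implementation: `struct handler`, `register_one`, `register_many`, `unregister_many`, and the `fail_on_msgtype` test hook are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; none of these names are the kernel's. */
#define MAX_MSGTYPE 8

struct handler {
	int msgtype;		/* slot to register into */
	int (*doit)(void);	/* callback for that slot */
};

static int (*table[MAX_MSGTYPE])(void);
static int fail_on_msgtype = -1;	/* test hook: simulate an allocation failure */

/* Example callback used to fill a slot. */
static int noop_handler(void)
{
	return 0;
}

/* Register one handler; may fail, like a per-call allocation. */
static int register_one(const struct handler *h)
{
	if (h->msgtype < 0 || h->msgtype >= MAX_MSGTYPE)
		return -1;
	if (h->msgtype == fail_on_msgtype)
		return -1;
	table[h->msgtype] = h->doit;
	return 0;
}

/* Remove the first n handlers of the array from the table. */
static void unregister_many(const struct handler *hs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(hs[i].msgtype >= 0 && hs[i].msgtype < MAX_MSGTYPE);
		table[hs[i].msgtype] = NULL;
	}
}

/* All-or-nothing: on failure, undo the entries registered so far. */
static int register_many(const struct handler *hs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = register_one(&hs[i]);

		if (err) {
			unregister_many(hs, i);
			return err;
		}
	}
	return 0;
}
```

A caller then checks a single return value instead of one per handler, and a failed call leaves no partially registered state behind.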
    
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    q2ven authored and Paolo Abeni committed Oct 10, 2024
    07cc7b0
  25. vxlan: Handle error of rtnl_register_module().

    Since its introduction, vxlan_vnifilter_init() has been ignoring the
    return value of rtnl_register_module(), which could fail silently.
    
    Handling the error allows users to view a module as an all-or-nothing
    thing in terms of the rtnetlink functionality.  This prevents syzkaller
    from reporting spurious errors from its tests, where OOM often occurs
    and modules are automatically loaded.
    
    Let's handle the errors by rtnl_register_many().
    
    Fixes: f9c4bb0 ("vxlan: vni filtering support on collect metadata device")
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Reviewed-by: Nikolay Aleksandrov <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    q2ven authored and Paolo Abeni committed Oct 10, 2024
    78b7b99
  26. bridge: Handle error of rtnl_register_module().

    Since its introduction, br_vlan_rtnl_init() has been ignoring the return
    value of rtnl_register_module(), which could fail silently.
    
    Handling the error allows users to view a module as an all-or-nothing
    thing in terms of the rtnetlink functionality.  This prevents syzkaller
    from reporting spurious errors from its tests, where OOM often occurs
    and modules are automatically loaded.
    
    Let's handle the errors by rtnl_register_many().
    
    Fixes: 8dcea18 ("net: bridge: vlan: add rtm definitions and dump support")
    Fixes: f26b296 ("net: bridge: vlan: add new rtm message support")
    Fixes: adb3ce9 ("net: bridge: vlan: add del rtm message support")
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Acked-by: Nikolay Aleksandrov <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    q2ven authored and Paolo Abeni committed Oct 10, 2024
    cba5e43
  27. mctp: Handle error of rtnl_register_module().

    Since its introduction, mctp has been ignoring the return value of
    rtnl_register_module(), which could fail silently.
    
    Handling the error allows users to view a module as an all-or-nothing
    thing in terms of the rtnetlink functionality.  This prevents syzkaller
    from reporting spurious errors from its tests, where OOM often occurs
    and modules are automatically loaded.
    
    Let's handle the errors by rtnl_register_many().
    
    Fixes: 583be98 ("mctp: Add device handling and netlink interface")
    Fixes: 831119f ("mctp: Add neighbour netlink interface")
    Fixes: 06d2f4c ("mctp: Add netlink route management")
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Reviewed-by: Jeremy Kerr <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    q2ven authored and Paolo Abeni committed Oct 10, 2024
    d517056
  28. mpls: Handle error of rtnl_register_module().

    Since its introduction, mpls_init() has been ignoring the return
    value of rtnl_register_module(), which could fail silently.
    
    Handling the error allows users to view a module as an all-or-nothing
    thing in terms of the rtnetlink functionality.  This prevents syzkaller
    from reporting spurious errors from its tests, where OOM often occurs
    and modules are automatically loaded.
    
    Let's handle the errors by rtnl_register_many().
    
    Fixes: 03c0566 ("mpls: Netlink commands to add, remove, and dump routes")
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    q2ven authored and Paolo Abeni committed Oct 10, 2024
    5be2062
  29. phonet: Handle error of rtnl_register_module().

    Before commit addf9b9 ("net: rtnetlink: use rcu to free rtnl
    message handlers"), once the first rtnl_register_module() allocated
    rtnl_msg_handlers[PF_PHONET], the following calls never failed.
    
    However, after the commit, rtnl_register_module() could fail silently
    to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error
    handling for each call.
    
    Handling the error allows users to view a module as an all-or-nothing
    thing in terms of the rtnetlink functionality.  This prevents syzkaller
    from reporting spurious errors from its tests, where OOM often occurs
    and modules are automatically loaded.
    
    Let's use rtnl_register_many() to handle the errors easily.
    
    Fixes: addf9b9 ("net: rtnetlink: use rcu to free rtnl message handlers")
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Acked-by: Rémi Denis-Courmont <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    q2ven authored and Paolo Abeni committed Oct 10, 2024
    b5e837c
  30. Merge branch 'rtnetlink-handle-error-of-rtnl_register_module'

    Kuniyuki Iwashima says:
    
    ====================
    rtnetlink: Handle error of rtnl_register_module().
    
    While converting phonet to per-netns RTNL, I found a weird comment
    
      /* Further rtnl_register_module() cannot fail */
    
    that used to be true but no longer is after commit addf9b9 ("net:
    rtnetlink: use rcu to free rtnl message handlers").
    
    Many callers of rtnl_register_module() just ignore the returned
    value but should handle it properly.
    
    This series introduces two helpers, rtnl_register_many() and
    rtnl_unregister_many(), to do that easily and fix such callers.
    
    All rtnl_register() and rtnl_register_module() calls will be converted
    to the _many() variants, and some rtnl_lock() calls will be saved in
    _many() later in net-next.
    
    Changes:
      v4:
        * Add more context in changelog of each patch
    
      v3: https://lore.kernel.org/all/[email protected]/
        * Move module *owner to struct rtnl_msg_handler
        * Make struct rtnl_msg_handler args/vars const
        * Update mctp goto labels
    
      v2: https://lore.kernel.org/netdev/[email protected]/
        * Remove __exit from mctp_neigh_exit().
    
      v1: https://lore.kernel.org/netdev/[email protected]/
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Paolo Abeni committed Oct 10, 2024
    ffc8fa9
  31. docs: netdev: document guidance on cleanup patches

    The purpose of this section is to document the current practice
    regarding clean-up patches which address checkpatch warnings and similar
    problems. I feel there is value in having this documented so others
    can easily refer to it.
    
    Clearly this topic is subjective. And to some extent the current
    practice discourages a wider range of patches than is described here.
    But I feel it is best to start somewhere, with the most well established
    part of the current practice.
    
    Signed-off-by: Simon Horman <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    horms authored and kuba-moo committed Oct 10, 2024
    aeb218d
  32. ppp: fix ppp_async_encode() illegal access

    syzbot reported an issue in ppp_async_encode() [1]
    
    In this case, pppoe_sendmsg() is called with a zero size.
    Then ppp_async_encode() is called with an empty skb.
    
    BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
     BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
      ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
      ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
      ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
      ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
      ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
      pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
      sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
      __release_sock+0x1da/0x330 net/core/sock.c:3072
      release_sock+0x6b/0x250 net/core/sock.c:3626
      pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
      sock_sendmsg_nosec net/socket.c:729 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:744
      ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
      __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
      __do_sys_sendmmsg net/socket.c:2771 [inline]
      __se_sys_sendmmsg net/socket.c:2768 [inline]
      __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
      x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:4092 [inline]
      slab_alloc_node mm/slub.c:4135 [inline]
      kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
      __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
      alloc_skb include/linux/skbuff.h:1322 [inline]
      sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
      pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
      sock_sendmsg_nosec net/socket.c:729 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:744
      ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
      __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
      __do_sys_sendmmsg net/socket.c:2771 [inline]
      __se_sys_sendmmsg net/socket.c:2768 [inline]
      __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
      x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
    
    Fixes: 1da177e ("Linux-2.6.12-rc2")
    Reported-by: [email protected]
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Eric Dumazet authored and kuba-moo committed Oct 10, 2024
    40dddd4
  33. net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC

    Eric reported a panic with IPPROTO_SMC and pointed out that when
    INET_PROTOSW_ICSK is set, icsk->icsk_sync_mss must be set too.
    
    Bug: Unable to handle kernel NULL pointer dereference at virtual address
    0000000000000000
    Mem abort info:
    ESR = 0x0000000086000005
    EC = 0x21: IABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    FSC = 0x05: level 1 translation fault
    user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000
    [0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003,
    pud=0000000000000000
    Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted
    6.11.0-rc7-syzkaller-g5f5673607153 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine,
    BIOS Google 08/06/2024
    pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : 0x0
    lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910
    sp : ffff80009b887a90
    x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000
    x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00
    x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000
    x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee
    x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001
    x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003
    x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1
    x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
    x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000
    x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000
    Call trace:
    0x0
    netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000
    smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593
    smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973
    security_socket_post_create+0x94/0xd4 security/security.c:4425
    __sock_create+0x4c8/0x884 net/socket.c:1587
    sock_create net/socket.c:1622 [inline]
    __sys_socket_create net/socket.c:1659 [inline]
    __sys_socket+0x134/0x340 net/socket.c:1706
    __do_sys_socket net/socket.c:1720 [inline]
    __se_sys_socket net/socket.c:1718 [inline]
    __arm64_sys_socket+0x7c/0x94 net/socket.c:1718
    __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
    invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
    el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
    do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
    el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
    el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
    el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
    Code: ???????? ???????? ???????? ???????? (????????)
    ---[ end trace 0000000000000000 ]---
    
    This patch adds a toy implementation that performs a simple return to
    prevent such a panic. This is because MSS can be set in sock_create_kern
    or smc_setsockopt, similar to how it's done in AF_SMC. However, for
    AF_SMC, there is currently no way to synchronize MSS within
    __sys_connect_file. This toy implementation lays the groundwork for us
    to support such a feature for IPPROTO_SMC in the future.
    
    Fixes: d25a92c ("net/smc: Introduce IPPROTO_SMC")
    Reported-by: Eric Dumazet <[email protected]>
    Signed-off-by: D. Wythe <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Reviewed-by: Wenjia Zhang <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    D. Wythe authored and kuba-moo committed Oct 10, 2024
    6fd27ea
  34. slip: make slhc_remember() more robust against malicious packets

    syzbot found that slhc_remember() was missing checks against
    malicious packets [1].
    
    slhc_remember() only checked that the size of the packet was at least
    20 bytes, which is not good enough.
    
    We need to make sure the packet includes the IPv4 and TCP headers
    that are supposed to be carried.
    
    Add iph and th pointers to make the code more readable.
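
The kind of length validation described above can be sketched as a standalone C function. This is an illustration under assumptions, not the patch itself: `ip_tcp_hdr_len` and the two constants are hypothetical names, and the function works on a raw byte buffer rather than on the kernel's header structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR 20	/* bytes: IHL of 5 32-bit words */
#define TCP_MIN_HDR  20	/* bytes: data offset of 5 32-bit words */

/*
 * Return the combined IPv4 + TCP header length in bytes if both headers
 * fit entirely inside buf[0..len), or -1 otherwise.
 */
static int ip_tcp_hdr_len(const uint8_t *buf, size_t len)
{
	size_t ihl, thl;

	assert(buf != NULL);
	if (len < IPV4_MIN_HDR)
		return -1;
	ihl = (size_t)(buf[0] & 0x0f) * 4;	/* IPv4 IHL: low nibble of byte 0 */
	if (ihl < IPV4_MIN_HDR || len < ihl + TCP_MIN_HDR)
		return -1;
	thl = (size_t)(buf[ihl + 12] >> 4) * 4;	/* TCP data offset: high nibble of byte 12 */
	if (thl < TCP_MIN_HDR || len < ihl + thl)
		return -1;
	return (int)(ihl + thl);
}
```

Each field is read only after the preceding length check has shown that the byte lies inside the buffer, which is the property a bare "at least 20 bytes" test does not give.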
    
    [1]
    
    BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
      slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
      ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455
      ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline]
      ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212
      ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327
      pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
      sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
      __release_sock+0x1da/0x330 net/core/sock.c:3072
      release_sock+0x6b/0x250 net/core/sock.c:3626
      pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
      sock_sendmsg_nosec net/socket.c:729 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:744
      ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
      __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
      __do_sys_sendmmsg net/socket.c:2771 [inline]
      __se_sys_sendmmsg net/socket.c:2768 [inline]
      __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
      x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:4091 [inline]
      slab_alloc_node mm/slub.c:4134 [inline]
      kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
      __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
      alloc_skb include/linux/skbuff.h:1322 [inline]
      sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
      pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
      sock_sendmsg_nosec net/socket.c:729 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:744
      ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
      __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
      __do_sys_sendmmsg net/socket.c:2771 [inline]
      __se_sys_sendmmsg net/socket.c:2768 [inline]
      __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
      x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
    
    Fixes: b5451d7 ("slip: Move the SLIP drivers")
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
    Signed-off-by: Eric Dumazet <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Eric Dumazet authored and kuba-moo committed Oct 10, 2024
    7d3fce8
  35. sched_ext: use correct function name in pick_task_scx() warning message

    pick_next_task_scx() was turned into pick_task_scx() by
    commit 753e283 ("sched_ext: Unify regular and core-sched pick
    task paths"). Update the outdated message.
    
    Signed-off-by: Honglei Wang <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>
    Honglei Wang authored and htejun committed Oct 10, 2024
    c425180
  36. MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
    
    We already exclude wireless drivers from the netdev@ traffic, to
    delegate it to linux-wireless@, and avoid overwhelming netdev@.
    
    Many of the following wireless-related MAINTAINERS sections
    are already excluded from the NETWORKING [GENERAL] section.
    For consistency, exclude those that are not.
    
    * 802.11 (including CFG80211/NL80211)
    * MAC80211
    * RFKILL
    
    Acked-by: Johannes Berg <[email protected]>
    Signed-off-by: Simon Horman <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    horms authored and kuba-moo committed Oct 10, 2024
    9937aae
  37. MAINTAINERS: Add headers and mailing list to UDP section

    Add netdev mailing list and some more udp.h headers to the UDP section.
    This is now more consistent with the TCP section.
    
    Acked-by: Willem de Bruijn <[email protected]>
    Signed-off-by: Simon Horman <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    horms authored and kuba-moo committed Oct 10, 2024
    5404b5a
  38. Merge branch 'maintainers-networking-file-coverage-updates'

    Simon Horman says:
    
    ====================
    MAINTAINERS: Networking file coverage updates
    
    The aim of this proposal is to make the handling of some files
    related to Networking and Wireless more consistent. It does so by:
    
    1. Adding some more headers to the UDP section, making it consistent
       with the TCP section.
    
    2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
       making their handling consistent with other files related to
       Wireless.
    
    The aim of this is to make things more consistent, and for MAINTAINERS
    to better reflect the situation on the ground.  I am more than happy to
    be told that the current state of affairs is fine, or for other ideas to
    be discussed.
    
    v1: https://lore.kernel.org/[email protected]
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Oct 10, 2024
    7b43ba6
  39. Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
    
    Pull xfs fixes from Carlos Maiolino:
    
     - A few small typo fixes
    
     - fstests xfs/538 DEBUG-only fix
    
     - Performance fix on blockgc on COW'ed files, by skipping trims on
       cowblock inodes currently opened for write
    
     - Prevent cowblocks from being freed under dirty pagecache during unshare
    
     - Update MAINTAINERS file to quote the new maintainer
    
    * tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
      xfs: fix a typo
      xfs: don't free cowblocks from under dirty pagecache on unshare
      xfs: skip background cowblock trims on inodes open for write
      xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
      xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
      xfs: don't ifdef around the exact minlen allocations
      xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
      xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
      xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
      xfs: return bool from xfs_attr3_leaf_add
      xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
      xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
      xfs: scrub: convert comma to semicolon
      xfs: Remove empty declartion in header file
      MAINTAINERS: add Carlos Maiolino as XFS release manager
    torvalds committed Oct 10, 2024
    825ec75
  40. rcu/nocb: Fix rcuog wake-up from offline softirq

    After a CPU has set itself offline and before it eventually calls
    rcutree_report_cpu_dead(), there are still opportunities for callbacks
    to be enqueued, for example from a softirq. When that happens on NOCB,
    the rcuog wake-up is deferred through an IPI to an online CPU in order
    not to call into the scheduler and risk arming the RT-bandwidth after
    hrtimers have been migrated out and disabled.
    
    But performing a synchronized IPI from a softirq is buggy as reported in
    the following scenario:
    
            WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
            Modules linked in: rcutorture torture
            CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
            Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
            RIP: 0010:smp_call_function_single
            <IRQ>
            swake_up_one_online
            __call_rcu_nocb_wake
            __call_rcu_common
            ? rcu_torture_one_read
            call_timer_fn
            __run_timers
            run_timer_softirq
            handle_softirqs
            irq_exit_rcu
            ? tick_handle_periodic
            sysvec_apic_timer_interrupt
            </IRQ>
    
    Fix this by forcing the deferred rcuog wake-up through the NOCB timer
    when the CPU is offline. The actual wake-up will happen from
    rcutree_report_cpu_dead().
    
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-lkp/[email protected]
    Fixes: 9139f93 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
    Reviewed-by: "Joel Fernandes (Google)" <[email protected]>
    Signed-off-by: Frederic Weisbecker <[email protected]>
    Signed-off-by: Neeraj Upadhyay <[email protected]>
    Frederic Weisbecker authored and Neeraj Upadhyay committed Oct 10, 2024
    f7345cc
  41. Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
    
    Pull nfsd fixes from Chuck Lever:
    
     - Fix NFSD bring-up / shutdown
    
     - Fix a UAF when releasing a stateid
    
    * tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
      nfsd: fix possible badness in FREE_STATEID
      nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
      NFSD: Mark filecache "down" if init fails
    torvalds committed Oct 10, 2024
    5870963
  42. Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
    
     - update fstrim loop and add more cancellation points, fix reported
       delayed or blocked suspend if there's a huge chunk queued
    
     - fix error handling in recent qgroup xarray conversion
    
     - in zoned mode, fix warning printing device path without RCU
       protection
    
     - again fix invalid extent xarray state (6252690), lost due to
       refactoring
    
    * tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
      btrfs: zoned: fix missing RCU locking in error message when loading zone info
      btrfs: fix missing error handling when adding delayed ref with qgroups enabled
      btrfs: add cancellation points to trim loops
      btrfs: split remaining space to discard in chunks
    torvalds committed Oct 10, 2024
    eb952c4
  43. of: Skip kunit tests when arm64+ACPI doesn't populate root node

    A root node is required to apply DT overlays. A root node is usually
    present after commit 7b937cc ("of: Create of_root if no dtb
    provided by firmware"), except on arm64 systems booted with ACPI
    tables. In that case, the root node is intentionally not populated
    because it would "allow DT devices to be instantiated atop an ACPI base
    system"[1].
    
    Introduce an OF function that skips the kunit test if the root node
    isn't populated. Limit the test to when both CONFIG_ARM64 and
    CONFIG_ACPI are set, because otherwise the lack of a root node is a bug.
    Make the function private and take a kunit test parameter so that it
    can't be abused to test for the presence of the root node in non-test
    code.
    
    Use this function to skip tests that require the root node. Currently
    that's the DT tests and any tests that apply overlays.
    
    Reported-by: Guenter Roeck <[email protected]>
    Closes: https://lore.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/r/[email protected] [1]
    Fixes: 893ecc6 ("of: Add KUnit test to confirm DTB is loaded")
    Signed-off-by: Stephen Boyd <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Acked-by: Mark Rutland <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Rob Herring (Arm) <[email protected]>
    bebarino authored and robherring committed Oct 10, 2024
    6e0391e
  44. Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/sc…

    …m/linux/kernel/git/trace/linux-trace
    
    Pull tracing fix from Steven Rostedt:
     "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
      callbacks
    
      When a ring buffer is mapped to memory assigned at boot, it also
      splits it up evenly between the possible CPUs. But the allocation code
      still attached a CPU notifier callback to this ring buffer. When a CPU
      is added, the callback will happen and another per-cpu buffer is
      created for the ring buffer.
    
      But for boot mapped buffers, there is no room to add another one (as
      they were all created already). The result of calling the CPU hotplug
      notifier on a boot mapped ring buffer is unpredictable and could lead
      to a system crash.
    
      If the ring buffer is boot mapped simply do not attach the CPU
      notifier to it"
    
    * tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
      ring-buffer: Do not have boot mapped buffers hook to CPU hotplug
    torvalds committed Oct 10, 2024
    0edab8d
  45. Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/netdev/net
    
    Pull networking fixes from Jakub Kicinski:
     "Including fixes from bluetooth and netfilter.
    
      Current release - regressions:
    
       - dsa: sja1105: fix reception from VLAN-unaware bridges
    
       - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
         enabled"
    
       - eth: fec: don't save PTP state if PTP is unsupported
    
      Current release - new code bugs:
    
       - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref
    
       - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop
    
       - phy: aquantia: AQR115c fix up PMA capabilities
    
      Previous releases - regressions:
    
       - tcp: 3 fixes for retrans_stamp and undo logic
    
      Previous releases - always broken:
    
       - net: do not delay dst_entries_add() in dst_release()
    
       - netfilter: restrict xtables extensions to families that are safe,
         syzbot found a way to combine ebtables with extensions that are
         never used by userspace tools
    
       - sctp: ensure sk_state is set to CLOSED if hashing fails in
         sctp_listen_start
    
       - mptcp: handle consistently DSS corruption, and prevent corruption
         due to large pmtu xmit"
    
    * tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
      MAINTAINERS: Add headers and mailing list to UDP section
      MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
      slip: make slhc_remember() more robust against malicious packets
      net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
      ppp: fix ppp_async_encode() illegal access
      docs: netdev: document guidance on cleanup patches
      phonet: Handle error of rtnl_register_module().
      mpls: Handle error of rtnl_register_module().
      mctp: Handle error of rtnl_register_module().
      bridge: Handle error of rtnl_register_module().
      vxlan: Handle error of rtnl_register_module().
      rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
      net: do not delay dst_entries_add() in dst_release()
      mptcp: pm: do not remove closing subflows
      mptcp: fallback when MPTCP opts are dropped after 1st data
      tcp: fix mptcp DSS corruption due to large pmtu xmit
      mptcp: handle consistently DSS corruption
      net: netconsole: fix wrong warning
      net: dsa: refuse cross-chip mirroring operations
      net: fec: don't save PTP state if PTP is unsupported
      ...
    torvalds committed Oct 10, 2024
    1d227fc
  46. Revert "sched_ext: Use shorter slice while bypassing"

    This reverts commit 6f34d8d.
    
    Slice length is ignored while bypassing and tasks are switched on every tick
    and thus the patch does not make any difference. The perceived difference
    was from test noise.
    
    Signed-off-by: Tejun Heo <[email protected]>
    Acked-by: David Vernet <[email protected]>
    htejun committed Oct 10, 2024
    54baa7a
  47. sched_ext: Start schedulers with consistent p->scx.slice values

    The disable path caps p->scx.slice to SCX_SLICE_DFL. As the field is already
    being ignored at this stage during disable, the only effect this has is that
    when the next BPF scheduler is loaded, it won't see unreasonable left-over
    slices. Ultimately, this shouldn't matter but it's better to start in a
    known state. Drop p->scx.slice capping from the disable path and instead
    reset it to SCX_SLICE_DFL in the enable path.
    
    Signed-off-by: Tejun Heo <[email protected]>
    Acked-by: David Vernet <[email protected]>
    htejun committed Oct 10, 2024
    3fdb9eb
  48. sched_ext: Move scx_builtin_idle_enabled check to scx_bpf_select_cpu_…

    …dfl()
    
    Move the sanity check from the inner function scx_select_cpu_dfl() to the
    exported kfunc scx_bpf_select_cpu_dfl(). This doesn't cause behavior
    differences and will allow using scx_select_cpu_dfl() in bypass mode
    regardless of scx_builtin_idle_enabled.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Oct 10, 2024
    cc3e1ca
  49. sched_ext: bypass mode shouldn't depend on ops.select_cpu()

    Bypass mode was depending on ops.select_cpu() which can't be trusted as with
    the rest of the BPF scheduler. Always enable and use scx_select_cpu_dfl() in
    bypass mode.
    
    Signed-off-by: Tejun Heo <[email protected]>
    Acked-by: David Vernet <[email protected]>
    htejun committed Oct 10, 2024
    aebe7ae
  50. sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers

    Iterating with scx_task_iter involves scx_tasks_lock and optionally the rq
    lock of the task being iterated. Both locks can be released during iteration
    and the iteration can be continued after re-grabbing scx_tasks_lock.
    Currently, all lock handling is pushed to the caller which is a bit
    cumbersome and makes it difficult to add lock-aware behaviors. Make the
    scx_task_iter helpers handle scx_tasks_lock.
    
    - scx_task_iter_init/scx_task_iter_exit() now grab and release
      scx_tasks_lock, respectively. Renamed to
      scx_task_iter_start/scx_task_iter_stop() to more clearly indicate that
      there are non-trivial side-effects.
    
    - Add __ prefix to scx_task_iter_rq_unlock() to indicate that the function
      is internal.
    
    - Add scx_task_iter_unlock/relock(). The former drops both rq lock (if held)
      and scx_tasks_lock and the latter re-locks only scx_tasks_lock.
    
    This doesn't cause behavior changes and will be used to implement stall
    avoidance.
    
    Signed-off-by: Tejun Heo <[email protected]>
    Acked-by: David Vernet <[email protected]>
    htejun committed Oct 10, 2024
    967da57
  51. sched_ext: Don't hold scx_tasks_lock for too long

    While enabling and disabling a BPF scheduler, every task is iterated a
    couple times by walking scx_tasks. Except for one, all iterations keep
    holding scx_tasks_lock. On multi-socket systems under heavy rq lock
    contention and with a high number of threads, this can lead to RCU and other
    stalls.
    
    The following is triggered on a 2 x AMD EPYC 7642 system (192 logical CPUs)
    running `stress-ng --workload 150 --workload-threads 10` with >400k idle
    threads and RCU stall period reduced to 5s:
    
      rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
      rcu:     91-...!: (10 ticks this GP) idle=0754/1/0x4000000000000000 softirq=18204/18206 fqs=17
      rcu:     186-...!: (17 ticks this GP) idle=ec54/1/0x4000000000000000 softirq=25863/25866 fqs=17
      rcu:     (detected by 80, t=10042 jiffies, g=89305, q=33 ncpus=192)
      Sending NMI from CPU 80 to CPUs 91:
      NMI backtrace for cpu 91
      CPU: 91 UID: 0 PID: 284038 Comm: sched_ext_ops_h Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471
      Hardware name: Supermicro Super Server/H11DSi, BIOS 2.8 12/14/2023
      Sched_ext: simple (disabling+all)
      RIP: 0010:queued_spin_lock_slowpath+0x17b/0x2f0
      Code: 02 c0 10 03 00 83 79 08 00 75 08 f3 90 83 79 08 00 74 f8 48 8b 11 48 85 d2 74 09 0f 0d 0a eb 0a 31 d2 eb 06 31 d2 eb 02 f3 90 <8b> 07 66 85 c0 75 f7 39 d8 75 0d be 01 00 00 00 89 d8 f0 0f b1 37
      RSP: 0018:ffffc9000fadfcb8 EFLAGS: 00000002
      RAX: 0000000001700001 RBX: 0000000001700000 RCX: ffff88bfcaaf10c0
      RDX: 0000000000000000 RSI: 0000000000000101 RDI: ffff88bfca8f0080
      RBP: 0000000001700000 R08: 0000000000000090 R09: ffffffffffffffff
      R10: ffff88a74761b268 R11: 0000000000000000 R12: ffff88a6b6765460
      R13: ffffc9000fadfd60 R14: ffff88bfca8f0080 R15: ffff88bfcaac0000
      FS:  0000000000000000(0000) GS:ffff88bfcaac0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f5c55f526a0 CR3: 0000000afd474000 CR4: 0000000000350eb0
      Call Trace:
       <NMI>
       </NMI>
       <TASK>
       do_raw_spin_lock+0x9c/0xb0
       task_rq_lock+0x50/0x190
       scx_task_iter_next_locked+0x157/0x170
       scx_ops_disable_workfn+0x2c2/0xbf0
       kthread_worker_fn+0x108/0x2a0
       kthread+0xeb/0x110
       ret_from_fork+0x36/0x40
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      Sending NMI from CPU 80 to CPUs 186:
      NMI backtrace for cpu 186
      CPU: 186 UID: 0 PID: 51248 Comm: fish Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471
    
    scx_task_iter can safely drop locks while iterating. Make
    scx_task_iter_next() drop scx_tasks_lock every 32 iterations to avoid
    stalls.
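
The batching idea described above can be illustrated with a small user-space sketch. This is not the kernel code: `fake_lock`, `walk_with_breaks()` and `ITER_BATCH` are hypothetical names, and the lock is a plain counter so that the number of re-acquisitions can be observed. Only the batch size of 32 comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define ITER_BATCH 32	/* "every 32 iterations", per the commit message */

/* Hypothetical stand-in for a spinlock; it only tracks state for the demo. */
struct fake_lock {
	int held;
	unsigned long acquisitions;
};

void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * Visit nr_items entries while holding the lock, but release and
 * re-acquire it after each full batch so that other lock waiters
 * (and, in a kernel, RCU) get a chance to make progress.
 * Returns the number of entries visited.
 */
size_t walk_with_breaks(struct fake_lock *lock, size_t nr_items)
{
	size_t visited = 0;

	fake_lock_acquire(lock);
	for (size_t i = 0; i < nr_items; i++) {
		if (i && i % ITER_BATCH == 0) {
			fake_lock_release(lock);
			/* a real implementation would allow rescheduling here */
			fake_lock_acquire(lock);
		}
		visited++;
	}
	fake_lock_release(lock);
	return visited;
}
```

Walking 100 entries this way takes the lock four times (once up front, then again at entries 32, 64 and 96), so no single hold covers more than 32 entries.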
    
    Signed-off-by: Tejun Heo <[email protected]>
    Acked-by: David Vernet <[email protected]>
    htejun committed Oct 10, 2024
    b07996c
  52.
    fe4a435
  53. Merge tag 'drm-misc-fixes-2024-10-10' of https://gitlab.freedesktop.o…

    …rg/drm/misc/kernel into drm-fixes
    
    Short summary of fixes pull:
    
    fbdev-dma:
    - Only clean up deferred I/O if instantiated
    
    nouveau:
    - dmem: Fix privileged error in copy engine channel; Fix possible
    data leak in migrate_to_ram()
    - gsp: Fix coding style
    
    sched:
    - Avoid leaking lockdep map
    
    v3d:
    - Stop active perfmon before destroying it
    
    vc4:
    - Stop active perfmon before destroying it
    
    xe:
    - Drop GuC submit_wq pool
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    airlied committed Oct 10, 2024
    b634acb

Commits on Oct 11, 2024

  1. Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org…

    …/drm/xe/kernel into drm-fixes
    
    Driver Changes:
    - Fix error checking with xa_store() (Matthew Auld)
    - Fix missing freq restore on GSC load error (Vinay)
    - Fix wedged_mode file permission (Matt Roper)
    - Fix use-after-free in ct communication (Matthew Auld)
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Lucas De Marchi <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/jri65tmv3bjbhqhxs5smv45nazssxzhtwphojem4uufwtjuliy@gsdhlh6kzsdy
    airlied committed Oct 11, 2024
    ac44ff7
  2. powerpc/8xx: Fix kernel DTLB miss on dcbz

    The following OOPS is encountered while loading the test_bpf module
    on powerpc 8xx:
    
    [  218.835567] BUG: Unable to handle kernel data access on write at 0xcb000000
    [  218.842473] Faulting instruction address: 0xc0017a80
    [  218.847451] Oops: Kernel access of bad area, sig: 11 [#1]
    [  218.852854] BE PAGE_SIZE=16K PREEMPT CMPC885
    [  218.857207] SAF3000 DIE NOTIFICATION
    [  218.860713] Modules linked in: test_bpf(+) test_module
    [  218.865867] CPU: 0 UID: 0 PID: 527 Comm: insmod Not tainted 6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty #1280
    [  218.875546] Hardware name: MIAE 8xx 0x500000 CMPC885
    [  218.880521] NIP:  c0017a80 LR: beab859c CTR: 000101d4
    [  218.885584] REGS: cac2bc90 TRAP: 0300   Not tainted  (6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty)
    [  218.894308] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 55005555  XER: a0007100
    [  218.901290] DAR: cb000000 DSISR: c2000000
    [  218.901290] GPR00: 000185d1 cac2bd50 c21b9580 caf7c030 c3883fcc 00000008 cafffffc 00000000
    [  218.901290] GPR08: 00040000 18300000 20000000 00000004 99005555 100d815e ca669d08 00000369
    [  218.901290] GPR16: ca730000 00000000 ca2c004c 00000000 00000000 0000035d 00000311 00000369
    [  218.901290] GPR24: ca732240 00000001 00030ba3 c3800000 00000000 00185d48 caf7c000 ca2c004c
    [  218.941087] NIP [c0017a80] memcpy+0x88/0xec
    [  218.945277] LR [beab859c] test_bpf_init+0x22c/0x3c90 [test_bpf]
    [  218.951476] Call Trace:
    [  218.953916] [cac2bd50] [beab8570] test_bpf_init+0x200/0x3c90 [test_bpf] (unreliable)
    [  218.962034] [cac2bde0] [c0004c04] do_one_initcall+0x4c/0x1fc
    [  218.967706] [cac2be40] [c00a2ec4] do_init_module+0x68/0x360
    [  218.973292] [cac2be60] [c00a5194] init_module_from_file+0x8c/0xc0
    [  218.979401] [cac2bed0] [c00a5568] sys_finit_module+0x250/0x3f0
    [  218.985248] [cac2bf20] [c000e390] system_call_exception+0x8c/0x15c
    [  218.991444] [cac2bf30] [c00120a8] ret_from_syscall+0x0/0x28
    
    This happens in the main loop of memcpy()
    
      ==>	c0017a80:	7c 0b 37 ec 	dcbz    r11,r6
    	c0017a84:	80 e4 00 04 	lwz     r7,4(r4)
    	c0017a88:	81 04 00 08 	lwz     r8,8(r4)
    	c0017a8c:	81 24 00 0c 	lwz     r9,12(r4)
    	c0017a90:	85 44 00 10 	lwzu    r10,16(r4)
    	c0017a94:	90 e6 00 04 	stw     r7,4(r6)
    	c0017a98:	91 06 00 08 	stw     r8,8(r6)
    	c0017a9c:	91 26 00 0c 	stw     r9,12(r6)
    	c0017aa0:	95 46 00 10 	stwu    r10,16(r6)
    	c0017aa4:	42 00 ff dc 	bdnz    c0017a80 <memcpy+0x88>
    
    Commit ac9f97f ("powerpc/8xx: Inconditionally use task PGDIR in
    DTLB misses") relies on re-reading the DAR register to know if an error is
    due to a missing copy of a PMD entry in the task's PGDIR, although DAR
    was already read in the exception prolog and copied into the thread
    struct. This is because it is done very early in the exception and
    there are not enough registers available to keep a pointer to the thread
    struct.
    
    However, the dcbz instruction is buggy and doesn't update the DAR register
    on fault. That is detected and generates a call to the FixupDAR workaround,
    which updates the DAR copy in the thread struct but doesn't fix the DAR
    register.
    
    Let's fix DAR in addition to updating the DAR copy in the thread struct.
    
    Fixes: ac9f97f ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses")
    Signed-off-by: Christophe Leroy <[email protected]>
    Signed-off-by: Michael Ellerman <[email protected]>
    Link: https://msgid.link/2b851399bd87e81c6ccb87ea3a7a6b32c7aa04d7.1728118396.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 11, 2024
    8956c58
  3. erofs: ensure regular inodes for file-backed mounts

    Only regular inodes are allowed for file-backed mounts, not directories
    (as seen in the original syzbot case) or special inodes.
    
    Also ensure that .read_folio() is implemented on the underlying fs
    for the primary device.
    
    Fixes: fb17675 ("erofs: add file-backed mount support")
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/r/[email protected]
    Tested-by: [email protected]
    Reviewed-by: Chao Yu <[email protected]>
    Signed-off-by: Gao Xiang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    hsiangkao committed Oct 11, 2024
    416a8b2
  4. erofs: get rid of z_erofs_try_to_claim_pcluster()

    Just fold it into the caller for simplicity.
    
    Reviewed-by: Chao Yu <[email protected]>
    Signed-off-by: Gao Xiang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    hsiangkao committed Oct 11, 2024
    2402082
  5. erofs: get rid of kaddr in struct z_erofs_maprecorder

    `kaddr` becomes useless after switching to metabuf.
    
    Reviewed-by: Chao Yu <[email protected]>
    Signed-off-by: Gao Xiang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    hsiangkao committed Oct 11, 2024
    ae54567
  6. HID: wacom: Hardcode (non-inverted) AES pens as BTN_TOOL_PEN

    Unlike EMR tools which encode type information in their tool ID, tools
    for AES sensors are all "generic pens". It is inappropriate to make use
    of the wacom_intuos_get_tool_type function when dealing with these kinds
    of devices. Instead, we should only ever report BTN_TOOL_PEN or
    BTN_TOOL_RUBBER, depending on the state of the Eraser and Invert
    bits.
    
    Reported-by: Daniel Jutz <[email protected]>
    Closes: https://lore.kernel.org/linux-input/[email protected]/
    Bisected-by: Christian Heusel <[email protected]>
    Fixes: 9c2913b ("HID: wacom: more appropriate tool type categorization")
    Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/1041
    Link: linuxwacom/input-wacom#440
    Signed-off-by: Jason Gerecke <[email protected]>
    Cc: [email protected]
    Acked-by: Benjamin Tissoires <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    jigpu authored and Jiri Kosina committed Oct 11, 2024
    2934b12
  7. f2fs: allow parallel DIO reads

    This fixes a regression which prevents parallel DIO reads.
    
    Fixes: 0cac511 ("f2fs: fix to avoid racing in between read and OPU dio write")
    Reviewed-by: Daeho Jeong <[email protected]>
    Signed-off-by: Jaegeuk Kim <[email protected]>
    Jaegeuk Kim committed Oct 11, 2024
    332fade
  8. ksmbd: add support for supplementary groups

    Even though a system user has a supplementary group, it gets
    NT_STATUS_ACCESS_DENIED when attempting to create a file or directory.
    This patch adds KSMBD_EVENT_LOGIN_REQUEST_EXT/RESPONSE_EXT netlink events
    to get the supplementary groups list. The new netlink event doesn't break
    backward compatibility when using old ksmbd-tools.
    
    Co-developed-by: Atte Heikkilä <[email protected]>
    Signed-off-by: Atte Heikkilä <[email protected]>
    Signed-off-by: Namjae Jeon <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    namjaejeon authored and Steve French committed Oct 11, 2024
    a77e0e0
  9. btrfs: use sector numbers as keys for the dirty extents xarray

    We are using the logical address ("bytenr") of an extent as the key for
    qgroup records in the dirty extents xarray. This is a problem because the
    xarrays use "unsigned long" for keys/indices, meaning that on a 32-bit
    platform any extent starting at or beyond 4G is truncated, which is far too
    low a limit as virtually everyone is using storage with more than 4G of
    space. This means a "bytenr" of 4G gets truncated to 0, and so do 8G and
    16G for example, resulting in incorrect qgroup accounting.
    
    Fix this by using sector numbers as keys instead, that is, using keys that
    match the logical address right shifted by fs_info->sectorsize_bits, which
    is what we do for the fs_info->buffer_radix that tracks extent buffers
    (radix trees also use an "unsigned long" type for keys). This also makes
    the index space more dense which helps optimize the xarray (as mentioned
    at Documentation/core-api/xarray.rst).
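
The truncation can be reproduced in user space with a short sketch. The names here (`index32_t`, `key_from_bytenr()`, `key_from_sector()`) are hypothetical and are not the btrfs helpers; `index32_t` stands in for a 32-bit "unsigned long", and a 4K sector size (12 bits) is assumed for the example values.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for "unsigned long" on a 32-bit platform (illustration only). */
typedef uint32_t index32_t;

/* Old scheme: the logical byte address is used directly as the key. */
index32_t key_from_bytenr(uint64_t bytenr)
{
	/* the upper 32 bits are silently dropped by the conversion */
	return (index32_t)bytenr;
}

/* New scheme: the key is the logical address in units of sectors. */
index32_t key_from_sector(uint64_t bytenr, unsigned int sectorsize_bits)
{
	assert(sectorsize_bits < 64);
	return (index32_t)(bytenr >> sectorsize_bits);
}
```

With byte addresses as keys, 4G and 8G both map to key 0 and collide. With 12-bit sector shifts they map to 1048576 and 2097152, and a 32-bit key then covers 2^44 bytes (16 TiB) of logical address space rather than 4 GiB.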
    
    Fixes: 3cce39a ("btrfs: qgroup: use xarray to track dirty extents in transaction")
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 11, 2024
    97420be
  10. RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES

    While running ISER over SIW, the initiator machine encounters a warning
    from skb_splice_from_iter() indicating that a slab page is being used in
    send_page. To address this, it is better to add a sendpage_ok() check
    within the driver itself, and if it returns 0, then the MSG_SPLICE_PAGES flag
    should be disabled before entering the network stack.
    
    A similar issue has been discussed for NVMe in this thread:
    https://lore.kernel.org/all/[email protected]/
    
      WARNING: CPU: 0 PID: 5342 at net/core/skbuff.c:7140 skb_splice_from_iter+0x173/0x320
      Call Trace:
       tcp_sendmsg_locked+0x368/0xe40
       siw_tx_hdt+0x695/0xa40 [siw]
       siw_qp_sq_process+0x102/0xb00 [siw]
       siw_sq_resume+0x39/0x110 [siw]
       siw_run_sq+0x74/0x160 [siw]
       kthread+0xd2/0x100
       ret_from_fork+0x34/0x40
       ret_from_fork_asm+0x1a/0x30
    
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Showrya M N <[email protected]>
    Signed-off-by: Potnuri Bharat Teja <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Showrya M N authored and jgunthorpe committed Oct 11, 2024
    4e1e3dd
  11. RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP

    ip_dev_find() always returns the real net_device address, whether traffic
    is running on a vlan or a real device. If traffic is over a vlan, filling
    the endpoint structure with the real ndev and attempting to send a connect
    request results in an RDMA_CM_EVENT_UNREACHABLE error.  This patch fixes the
    issue by using vlan_dev_real_dev().
    
    Fixes: 830662f ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Anumula Murali Mohan Reddy <[email protected]>
    Signed-off-by: Potnuri Bharat Teja <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Anumula-Murali-Mohan-Reddy authored and jgunthorpe committed Oct 11, 2024
    c659b40
  12. RDMA/irdma: Fix misspelling of "accept*"

    There is "accept*" misspelled as "accpet*" in the comments.  Fix the
    spelling.
    
    Fixes: 146b975 ("RDMA/irdma: Add connection manager")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Alexander Zubkov <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    user318 authored and jgunthorpe committed Oct 11, 2024
    8cddfa5
  13. RDMA/srpt: Make slab cache names unique

    Since commit 4c39529 ("slab: Warn on duplicate cache names when
    DEBUG_VM=y"), slab complains about duplicate cache names. Hence this
    patch. The approach is as follows:
    - Maintain an xarray with the slab size as index and a reference count
      and a kmem_cache pointer as contents. Use srpt-${slab_size} as kmem
      cache name.
    - Use 512-byte alignment for all slabs instead of only for some of the
      slabs.
    - Increment the reference count instead of calling kmem_cache_create().
    - Decrement the reference count instead of calling kmem_cache_destroy().
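
The size-keyed, reference-counted lookup in the list above might be sketched as follows. This is a user-space illustration only: a small fixed table replaces the xarray, no memory is actually allocated, and `cache_entry`, `cache_get()` and `cache_put()` are hypothetical names. Only the "srpt-<size>" naming scheme is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CACHES 8

struct cache_entry {
	size_t size;		/* lookup key: object size */
	unsigned int refs;	/* 0 means the slot is free */
	char name[32];		/* unique per size: "srpt-<size>" */
};

static struct cache_entry caches[MAX_CACHES];

/* Return the entry for this size, creating it on first use. */
struct cache_entry *cache_get(size_t size)
{
	struct cache_entry *free_slot = NULL;

	for (int i = 0; i < MAX_CACHES; i++) {
		if (caches[i].refs && caches[i].size == size) {
			caches[i].refs++;	/* share instead of re-creating */
			return &caches[i];
		}
		if (!caches[i].refs && !free_slot)
			free_slot = &caches[i];
	}
	if (!free_slot)
		return NULL;		/* table full */

	free_slot->size = size;
	free_slot->refs = 1;
	snprintf(free_slot->name, sizeof(free_slot->name), "srpt-%zu", size);
	return free_slot;
}

/* Drop one reference; the slot becomes reusable when it reaches zero. */
void cache_put(struct cache_entry *e)
{
	assert(e && e->refs > 0);
	e->refs--;
}
```

Because the name is derived from the size and each size has at most one live entry, two users asking for the same size share one cache and never register a duplicate name.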
    
    Fixes: 5dabcd0 ("RDMA/srpt: Add support for immediate data")
    Link: https://patch.msgid.link/r/[email protected]
    Reported-by: Shinichiro Kawasaki <[email protected]>
    Closes: https://lore.kernel.org/linux-block/xpe6bea7rakpyoyfvspvin2dsozjmjtjktpph7rep3h25tv7fb@ooz4cu5z6bq6/
    Suggested-by: Jason Gunthorpe <[email protected]>
    Signed-off-by: Bart Van Assche <[email protected]>
    Tested-by: Shin'ichiro Kawasaki <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    bvanassche authored and jgunthorpe committed Oct 11, 2024
    4d784c0
  14. btrfs: fix uninitialized pointer free in add_inode_ref()

    The add_inode_ref() function does not initialize the "name" struct when
    it is declared.  If any of the following calls to read_one_inode()
    returns NULL,
    
    	dir = read_one_inode(root, parent_objectid);
    	if (!dir) {
    		ret = -ENOENT;
    		goto out;
    	}
    
    	inode = read_one_inode(root, inode_objectid);
    	if (!inode) {
    		ret = -EIO;
    		goto out;
    	}
    
    then "name.name" would be freed on "out" before being initialized.
    
    out:
    	...
    	kfree(name.name);
    
    This issue was reported by Coverity with CID 1526744.
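
A user-space sketch of the pattern, assuming the usual remedy of zero-initializing the struct at declaration (the commit message does not spell out the exact change). `name_buf` and `lookup_name()` are hypothetical, and `free()` stands in for `kfree()`; both accept NULL as a no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct name_buf {
	char *name;
	size_t len;
};

/* Returns 0 on success, -1 on failure. fail_early simulates a failed lookup. */
int lookup_name(int fail_early, const char *src, size_t *out_len)
{
	struct name_buf name = { 0 };	/* name.name starts out as NULL */
	int ret = 0;

	if (fail_early) {
		ret = -1;
		goto out;		/* reaches free() before any allocation */
	}

	name.len = strlen(src);
	name.name = malloc(name.len + 1);
	if (!name.name) {
		ret = -1;
		goto out;
	}
	memcpy(name.name, src, name.len + 1);
	*out_len = name.len;

out:
	free(name.name);		/* safe: free(NULL) does nothing */
	return ret;
}
```

Without the initializer, the early `goto out` would pass an indeterminate pointer to `free()`, which is the class of bug described above.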
    
    Fixes: e43eec8 ("btrfs: use struct qstr instead of name and namelen pairs")
    CC: [email protected] # 6.6+
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Roi Martin <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    jroimartin authored and kdave committed Oct 11, 2024
    66691c6
  15. btrfs: send: cleanup unneeded return variable in changed_verity()

    As all changed_* functions need to return something, just return 0
    directly here, as the verity status is passed via the context.
    
    Reported by LKP: fs/btrfs/send.c:6877:5-8: Unneeded variable: "ret". Return "0" on line 6883
    
    Reported-by: kernel test robot <[email protected]>
    Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Signed-off-by: Christian Heusel <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    christian-heusel authored and kdave committed Oct 11, 2024
    a0af493
  16. btrfs: fix uninitialized pointer free on read_alloc_one_name() error

    The function read_alloc_one_name() does not initialize the name field of
    the passed fscrypt_str struct if kmalloc fails to allocate the
    corresponding buffer.  Thus, it is not guaranteed that
    fscrypt_str.name is initialized when freeing it.
    
    This is a follow-up to the linked patch that fixes the remaining
    instances of the bug introduced by commit e43eec8 ("btrfs: use
    struct qstr instead of name and namelen pairs").
    
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Fixes: e43eec8 ("btrfs: use struct qstr instead of name and namelen pairs")
    CC: [email protected] # 6.1+
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: Roi Martin <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    jroimartin authored and kdave committed Oct 11, 2024
    2ab5e24
  17. Merge tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/dr…

    …m/kernel
    
    Pull drm fixes from Dave Airlie:
     "Weekly fixes haul for drm, lots of small fixes all over, amdgpu, xe
      lead the way, some minor nouveau and radeon fixes, and then a bunch of
      misc all over.
    
      Nothing too scary or out of the ordinary.
    
      sched:
       - Avoid leaking lockdep map
    
      fbdev-dma:
       - Only clean up deferred I/O if instantiated
    
      amdgpu:
       - Fix invalid UBSAN warnings
       - Fix artifacts in MPO transitions
       - Hibernation fix
    
      amdkfd:
       - Fix an eviction fence leak
    
      radeon:
       - Add late register for connectors
       - Always set GEM function pointers
    
      i915:
       - HDCP refcount fix
    
      nouveau:
       - dmem: Fix privileged error in copy engine channel; Fix possible
         data leak in migrate_to_ram()
       - gsp: Fix coding style
    
      v3d:
       - Stop active perfmon before destroying it
    
      vc4:
       - Stop active perfmon before destroying it
    
      xe:
       - Drop GuC submit_wq pool
       - Fix error checking with xa_store()
       - Fix missing freq restore on GSC load error
       - Fix wedged_mode file permission
       - Fix use-after-free in ct communication"
    
    * tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel:
      drm/fbdev-dma: Only cleanup deferred I/O if necessary
      drm/xe: Make wedged_mode debugfs writable
      drm/xe: Restore GT freq on GSC load error
      drm/xe/guc_submit: fix xa_store() error checking
      drm/xe/ct: fix xa_store() error checking
      drm/xe/ct: prevent UAF in send_recv()
      drm/radeon: always set GEM function pointer
      nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error
      nouveau/dmem: Fix privileged error in copy engine channel
      drm/amd/display: fix hibernate entry for DCN35+
      drm/amd/display: Clear update flags after update has been applied
      drm/amdgpu: partially revert powerplay `__counted_by` changes
      drm/radeon: add late_register for connector
      drm/amdkfd: Fix an eviction fence leak
      drm/vc4: Stop the active perfmon before being destroyed
      drm/v3d: Stop the active perfmon before being destroyed
      drm/i915/hdcp: fix connector refcounting
      drm/nouveau/gsp: remove extraneous ; after mutex
      drm/xe: Drop GuC submit_wq pool
      drm/sched: Use drm sched lockdep map for submit_wq
    torvalds committed Oct 11, 2024
    befcc89
  18. Merge tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/libata/linux
    
    Pull ata fixes from Niklas Cassel:
    
     - Fix a hibernate regression where the disk was needlessly spun down
       and then immediately spun up both when entering and when resuming
       from hibernation (me)
    
     - Update the MAINTAINERS file to remove remnants of Jens's
       maintainership of libata (Damien)
    
    * tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
      ata: libata: Update MAINTAINERS file
      ata: libata: avoid superfluous disk spin down + spin up during hibernation
    torvalds committed Oct 11, 2024
    3700dc9
  19. Merge tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kerne…

    …l/git/ulfh/mmc
    
    Pull MMC fixes from Ulf Hansson:
     "MMC core:
       - Prevent splat from warning when setting maximum DMA segment
    
      MMC host:
       - mvsdio: Drop sg_miter support for PIO as it didn't work
       - sdhci-of-dwcmshc: Prevent stale interrupt for the T-Head 1520
         variant"
    
    * tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
      mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
      Revert "mmc: mvsdio: Use sg_miter for PIO"
      mmc: core: Only set maximum DMA segment size if DMA is supported
    torvalds committed Oct 11, 2024
    7351a87
  20. Merge tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/ulfh/linux-pm
    
    Pull pmdomain fixes from Ulf Hansson:
     "pmdomain core:
       - Fix alloc/free in dev_pm_domain_attach|detach_list()
    
      pmdomain providers:
       - qcom: Fix the return of uninitialized variable
    
      pmdomain consumers:
       - drm/tegra/gr3d: Revert conversion to dev_pm_domain_attach|detach_list()
    
      OPP core:
       - Fix error code in dev_pm_opp_set_config()"
    
    * tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
      PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()
      Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
      pmdomain: qcom-cpr: Fix the return of uninitialized variable
      OPP: fix error code in dev_pm_opp_set_config()
    torvalds committed Oct 11, 2024
    22e6aba
  21. Merge tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kerne…

    …l/git/rafael/linux-pm
    
    Pull ACPI fixes from Rafael Wysocki:
     "Reduce the number of ACPI IRQ override DMI quirks by combining quirks
      that cover similar systems while making them cover additional models
      at the same time (Hans de Goede)"
    
    * tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      ACPI: resource: Fold Asus Vivobook Pro N6506M* DMI quirks together
      ACPI: resource: Fold Asus ExpertBook B1402C* and B1502C* DMI quirks together
      ACPI: resource: Make Asus ExpertBook B2502 matches cover more models
      ACPI: resource: Make Asus ExpertBook B2402 matches cover more models
    torvalds committed Oct 11, 2024
    325354c
  22. Merge tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/rafael/linux-pm
    
    Pull thermal control fixes from Rafael Wysocki:
     "Address possible use-after-free scenarios during the processing of
      thermal netlink commands and during thermal zone removal (Rafael
      Wysocki)"
    
    * tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      thermal: core: Free tzp copy along with the thermal zone
      thermal: core: Reference count the zone in thermal_zone_get_by_id()
    torvalds committed Oct 11, 2024
    f8fafb6
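The second thermal fix above reference-counts the zone returned by a lookup. The general pattern can be sketched as follows. This is a single-threaded illustration with hypothetical names (`struct zone`, `zone_get_by_id()`, `zone_put()`); the kernel code would use its own reference-counting primitives and hold the appropriate lock during the lookup.

```c
#include <stddef.h>

/* Hypothetical zone object with a plain (non-atomic) reference count. */
struct zone {
	int id;
	int refcount;
};

/*
 * Look up a zone by id and take a reference before returning it, so the
 * object cannot be torn down while the caller is still using it.
 */
static struct zone *zone_get_by_id(struct zone *zones, size_t n, int id)
{
	for (size_t i = 0; i < n; i++) {
		if (zones[i].id == id) {
			zones[i].refcount++;	/* caller now holds a reference */
			return &zones[i];
		}
	}
	return NULL;
}

/* Drop the reference taken by zone_get_by_id(). */
static void zone_put(struct zone *z)
{
	if (z)
		z->refcount--;
}
```

Each successful `zone_get_by_id()` must be paired with a `zone_put()` once the caller is done with the zone.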
  23. Merge tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/rafael/linux-pm
    
    Pull power management fixes from Rafael Wysocki:
     "These address two issues in the TPMI module of the Intel RAPL power
      capping driver and one issue in the processor part of the Intel
      int340x thermal driver, update a CPU ID list and register definitions
      needed for RAPL PL4 support and remove some unused code.
    
      Specifics:
    
       - Fix the TPMI_RAPL_REG_DOMAIN_INFO register offset in the TPMI part
         of the Intel RAPL power capping driver, make it ignore minor
         hardware version mismatches (which only indicate exposing
         additional features) and update register definitions in it to
         enable PL4 support (Zhang Rui)
    
       - Add Arrow Lake-U to the list of processors supporting PL4 in the
         MSR part of the Intel RAPL power capping driver (Sumeet Pawnikar)
    
       - Remove excess pci_disable_device() calls from the processor part of
         the int340x thermal driver to address a warning triggered during
         module unload and remove unused CPU hotplug code related to RAPL
         support from it (Zhang Rui)"
    
    * tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      thermal: intel: int340x: processor: Add MMIO RAPL PL4 support
      thermal: intel: int340x: processor: Remove MMIO RAPL CPU hotplug support
      powercap: intel_rapl_msr: Add PL4 support for Arrowlake-U
      powercap: intel_rapl_tpmi: Ignore minor version change
      thermal: intel: int340x: processor: Fix warning during module unload
      powercap: intel_rapl_tpmi: Fix bogus register reading
    torvalds committed Oct 11, 2024
    e643eda
  24. Merge tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux

    Pull io_uring fixes from Jens Axboe:
    
     - Explicitly have a mshot_finished condition for IORING_OP_RECV in
       multishot mode, similarly to what IORING_OP_RECVMSG has. This doesn't
       fix a bug right now, but it makes it harder to actually have a bug
       here if a request takes multiple iterations to finish.
    
     - Fix handling of retry of read/write of !FMODE_NOWAIT files. If they
       are pollable, that's all we need.
    
    * tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux:
      io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT
      io_uring/rw: fix cflags posting for single issue multishot read
    torvalds committed Oct 11, 2024
    9e4c6c1
  25. selftests/rseq: Fix mm_cid test failure

    Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by:
    
    glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)")
    
    Without this fix, rseq selftests for mm_cid fail:
    
    ./run_param_test.sh
    Default parameters
    Running test spinlock
    Running compare-twice test spinlock
    Running mm_cid test spinlock
    Error: cpu id getter unavailable
    
    Fixes: 18c2355 ("selftests/rseq: Implement rseq mm_cid field support")
    Signed-off-by: Mathieu Desnoyers <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    CC: Boqun Feng <[email protected]>
    CC: "Paul E. McKenney" <[email protected]>
    Cc: Shuah Khan <[email protected]>
    CC: Carlos O'Donell <[email protected]>
    CC: Florian Weimer <[email protected]>
    CC: [email protected]
    CC: [email protected]
    Signed-off-by: Shuah Khan <[email protected]>
    compudj authored and shuahkh committed Oct 11, 2024
    a0cc649
  26. ftrace/selftest: Test combination of function_graph tracer and functi…

    …on profiler
    
    Masami reported a bug when running function graph tracing then the
    function profiler. The following commands would cause a kernel crash:
    
      # cd /sys/kernel/tracing/
      # echo function_graph > current_tracer
      # echo 1 > function_profile_enabled
    
    In that order. Create a test that exercises the two together to make sure
    this does not come back as a regression.
    
    Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devnote2
    
    Link: https://lore.kernel.org/all/[email protected]/
    Acked-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Shuah Khan <[email protected]>
    rostedt authored and shuahkh committed Oct 11, 2024
    4ee5ca9
  27. Merge tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/xen/tip
    
    Pull xen fix from Juergen Gross:
     "A fix for topology information of Xen PV guests"
    
    * tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
      x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE
    torvalds committed Oct 11, 2024
    d947d68
  28. Merge tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/rcu/linux
    
    Pull RCU fix from Neeraj Upadhyay:
     "Fix rcuog kthread wakeup invocation from softirq context on a CPU
      which has been marked offline.
    
      This can happen when new callbacks are enqueued from a softirq on an
      offline CPU before it calls rcutree_report_cpu_dead(). When this
      happens on NOCB configuration, the rcuog wake-up is deferred through
      an IPI to an online CPU. This is done to avoid call into the scheduler
      which can risk arming the RT-bandwidth after hrtimers have been
      migrated out and disabled.
    
      However, doing IPI call from softirq is not allowed: Fix this by
      forcing deferred rcuog wakeup through the NOCB timer when the CPU is
      offline"
    
    * tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
      rcu/nocb: Fix rcuog wake-up from offline softirq
    torvalds committed Oct 11, 2024
    a102976
  29. Merge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/l…

    …inux-nfs
    
    Pull NFS client fixes from Anna Schumaker:
     "Localio Bugfixes:
       - remove duplicated include in localio.c
       - fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
       - fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
       - fix nfsd_file tracepoints to handle NULL rqstp pointers
    
      Other Bugfixes:
       - fix program selection loop in svc_process_common
       - fix integer overflow in decode_rc_list()
       - prevent NULL-pointer dereference in nfs42_complete_copies()
       - fix CB_RECALL performance issues when using a large number of
         delegations"
    
    * tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
      NFS: remove revoked delegation from server's delegation list
      nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
      nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
      nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
      NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
      SUNRPC: Fix integer overflow in decode_rc_list()
      sunrpc: fix prog selection loop in svc_process_common
      nfs: Remove duplicated include in localio.c
    torvalds committed Oct 11, 2024
    6254d53
  30. Merge tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/brgl/linux
    
    Pull gpio fixes from Bartosz Golaszewski:
    
     - fix clock handle leak in probe() error path in gpio-aspeed
    
     - add a dummy register read to ensure the write actually completed
    
    * tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
      gpio: aspeed: Use devm_clk api to manage clock source
      gpio: aspeed: Add the flush write to ensure the write complete.
    torvalds committed Oct 11, 2024
    547fc32
  31. Merge tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/deller/linux-fbdev
    
    Pull fbdev platform driver fix from Helge Deller:
     "Switch fbdev drivers back to struct platform_driver::remove()
    
      Now that 'remove()' has been converted to the sane new API, there's
      no reason for the 'remove_new()' use, so this converts back to the
      traditional and simpler name.
    
      See commits
    
         5c5a768 ("platform: Provide a remove callback that returns no value")
         0edb555 ("platform: Make platform_driver::remove() return void")
    
      for background to this all"
    
    * tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
      fbdev: Switch back to struct platform_driver::remove()
    torvalds committed Oct 11, 2024
    9066258
  32. Merge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/s…

    …cm/linux/kernel/git/robh/linux
    
    Pull devicetree fixes from Rob Herring:
    
     - Disable kunit tests for arm64+ACPI
    
     - Fix refcount issue in kunit tests
    
     - Drop constraints on non-conformant 'interrupt-map' in fsl,ls-extirq
    
     - Drop type ref on 'msi-parent' in fsl,qoriq-mc binding
    
     - Move elgin,jg10309-01 to its own binding from trivial-devices
    
    * tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
      of: Skip kunit tests when arm64+ACPI doesn't populate root node
      of: Fix unbalanced of node refcount and memory leaks
      dt-bindings: interrupt-controller: fsl,ls-extirq: workaround wrong interrupt-map number
      dt-bindings: misc: fsl,qoriq-mc: remove ref for msi-parent
      dt-bindings: display: elgin,jg10309-01: Add own binding
    torvalds committed Oct 11, 2024
    974099e
  33. Merge tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pu…

    …b/scm/linux/kernel/git/shuah/linux-kselftest
    
    Pull kselftest fixes from Shuah Khan:
     "Fixes for build, run-time errors, and reporting errors:
    
       - ftrace: regression test for a kernel crash when running function
         graph tracing and then enabling function profiler.
    
       - rseq: fix for mm_cid test failure.
    
       - vDSO:
          - fixes to reporting skip and other error conditions
          - unconditionally build the chacha and getrandom tests on all
            architectures to make it easier for them to run in CIs
          - fix a build error by including sched.h to bring in the
            CLONE_NEWTIME define"
    
    * tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
      ftrace/selftest: Test combination of function_graph tracer and function profiler
      selftests/rseq: Fix mm_cid test failure
      selftests: vDSO: Explicitly include sched.h
      selftests: vDSO: improve getrandom and chacha error messages
      selftests: vDSO: unconditionally build getrandom test
      selftests: vDSO: unconditionally build chacha test
    torvalds committed Oct 11, 2024
    09f6b0c
  34. RDMA/bnxt_re: Fix the max CQ WQEs for older adapters

    Older adapters don't support the MAX CQ WQEs reported by older FW, so
    always restrict the reported value to 1M for older adapters.
    
    Fixes: 1ac5a40 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Abhishek Mohapatra <[email protected]>
    Reviewed-by: Chandramohan Akula <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Abhishek Mohapatra authored and jgunthorpe committed Oct 11, 2024
    ac6df53
  35. RDMA/bnxt_re: Fix out of bound check

    Driver exports pacing stats only on GenP5 and P7 adapters. But while
    parsing the pacing stats, driver has a check for "rdev->dbr_pacing".  This
    caused a trace when KASAN is enabled.
    
    BUG: KASAN: slab-out-of-bounds in bnxt_re_get_hw_stats+0x2b6a/0x2e00 [bnxt_re]
    Write of size 8 at addr ffff8885942a6340 by task modprobe/4809
    
    Fixes: 8b6573f ("bnxt_re: Update the debug counters for doorbell pacing")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Kalesh AP <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Kalesh AP authored and jgunthorpe committed Oct 11, 2024
    a9e6e74
  36. RDMA/bnxt_re: Fix incorrect dereference of srq in async event

    Currently driver is not getting correct srq. Dereference only if qplib has
    a valid srq.
    
    Fixes: b02fd3f ("RDMA/bnxt_re: Report async events and errors")
    Link: https://patch.msgid.link/r/[email protected]
    Reviewed-by: Saravanan Vajravel <[email protected]>
    Reviewed-by: Chandramohan Akula <[email protected]>
    Signed-off-by: Kashyap Desai <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    kadesai16 authored and jgunthorpe committed Oct 11, 2024
    87b4d8d
  37. RDMA/bnxt_re: Return more meaningful error

    When the HWRM command fails, the driver currently returns -EFAULT (Bad
    address). This does not look correct.
    
    Return -EIO (I/O error) instead.
    
    Fixes: cc1ec76 ("RDMA/bnxt_re: Fixing the Control path command and response handling")
    Fixes: 65288a2 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Kalesh AP <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Kalesh AP authored and jgunthorpe committed Oct 11, 2024
    98647df
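The change in error code described above can be sketched in a few lines. The helper name `hwrm_status_to_errno()` is hypothetical and the firmware status is simplified to "zero means success"; only the choice of `-EIO` over `-EFAULT` comes from the commit message.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate a firmware command status into a
 * negative errno. A failed command is an I/O problem with the device,
 * not a bad address supplied by the caller, so -EIO is the better fit.
 */
static int hwrm_status_to_errno(int fw_status)
{
	return fw_status ? -EIO : 0;
}
```

Callers that previously checked specifically for `-EFAULT` would need to be reviewed after a change like this.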
  38. RDMA/bnxt_re: Fix a possible NULL pointer dereference

    There is a possibility of a NULL pointer dereference in the failure path
    of bnxt_re_add_device().  To address that, moved the update of
    "rdev->adev" to bnxt_re_dev_add().
    
    Fixes: dee3da3 ("RDMA/bnxt_re: Change aux driver data to en_info to hold more information")
    Link: https://patch.msgid.link/r/[email protected]
    Reported-by: Dan Carpenter <[email protected]>
    Closes: https://lore.kernel.org/linux-rdma/CAH-L+nMCwymKGqf5pd8-FZNhxEkDD=kb6AoCaE6fAVi7b3e5Qw@mail.gmail.com/T/#t
    Signed-off-by: Kalesh AP <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Kalesh AP authored and jgunthorpe committed Oct 11, 2024
    0ba9294
  39. RDMA/bnxt_re: Avoid CPU lockups due to fifo occupancy check loop

    Driver waits indefinitely for the fifo occupancy to go below a threshold
    as soon as the pacing interrupt is received. This can cause soft lockup on
    one of the processors, if the rate of DB is very high.
    
    Add a loop count for FPGA and exit __wait_for_fifo_occupancy_below_th()
    if the loop takes too long. Pacing continues until the occupancy is
    below the threshold. This is ensured by the checks in
    bnxt_re_pacing_timer_exp and by further scheduling the pacing work based
    on the fifo occupancy.
    
    Fixes: 2ad4e63 ("RDMA/bnxt_re: Implement doorbell pacing algorithm")
    Link: https://patch.msgid.link/r/[email protected]
    Reviewed-by: Kalesh AP <[email protected]>
    Reviewed-by: Chandramohan Akula <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    selvintxavier authored and jgunthorpe committed Oct 11, 2024
    8be3e5b
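The idea of bounding the occupancy wait, as described in the commit above, can be sketched like this. The names, the iteration limit, and the simulated FIFO are all assumptions made for illustration; the driver reads a hardware register and uses its own limit.

```c
#include <stdbool.h>

#define MAX_POLL_ITERATIONS 1000u	/* hypothetical bound, not the driver's value */

/* Hypothetical callback that returns the current FIFO occupancy. */
typedef unsigned int (*read_occupancy_fn)(void *ctx);

/*
 * Poll until the occupancy drops below the threshold or the iteration
 * budget is spent. Returns true if the occupancy went below the
 * threshold, false if the loop gave up.
 */
static bool wait_for_occupancy_below(read_occupancy_fn read_occ, void *ctx,
				     unsigned int threshold)
{
	for (unsigned int i = 0; i < MAX_POLL_ITERATIONS; i++) {
		if (read_occ(ctx) < threshold)
			return true;
	}
	return false;	/* caller re-checks later instead of spinning */
}

/* Simulated FIFO: drains one entry per read until it reaches floor. */
struct sim_fifo {
	unsigned int occupancy;
	unsigned int floor;
};

static unsigned int sim_fifo_read(void *ctx)
{
	struct sim_fifo *f = ctx;

	if (f->occupancy > f->floor)
		f->occupancy--;
	return f->occupancy;
}
```

When the function returns false, the caller is expected to retry from a timer or work item rather than keep the CPU busy, which matches the deferred re-check the commit message describes.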
  40. RDMA/bnxt_re: Fix an error path in bnxt_re_add_device

    In bnxt_re_add_device(), when register netdev notifier fails, driver is
    not unregistering the IB device in the error cleanup path.  Also, removed
    the duplicate cleanup in error path of bnxt_re_probe.
    
    Fixes: 94a9dc6 ("RDMA/bnxt_re: Group all operations under add_device and remove_device")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Kalesh AP <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Kalesh AP authored and jgunthorpe committed Oct 11, 2024
    a5e099e
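The error-path fix above follows the usual goto-unwind pattern: a later setup step that fails must undo the earlier steps. A minimal sketch, with hypothetical step functions and a `struct dev_state` that only tracks what is currently registered:

```c
#include <stdbool.h>

/* Hypothetical state tracking which setup steps are in effect. */
struct dev_state {
	bool ib_registered;
	bool notifier_registered;
};

static int register_ib_device(struct dev_state *s)
{
	s->ib_registered = true;
	return 0;
}

static void unregister_ib_device(struct dev_state *s)
{
	s->ib_registered = false;
}

/* The fail flag simulates the notifier registration failing. */
static int register_notifier(struct dev_state *s, bool fail)
{
	if (fail)
		return -1;
	s->notifier_registered = true;
	return 0;
}

static int add_device(struct dev_state *s, bool notifier_fails)
{
	int rc;

	rc = register_ib_device(s);
	if (rc)
		return rc;

	rc = register_notifier(s, notifier_fails);
	if (rc)
		goto err_unregister_ib;

	return 0;

err_unregister_ib:
	/* Undo the earlier step so a failed add leaves nothing registered. */
	unregister_ib_device(s);
	return rc;
}
```

Keeping the unwinding inside `add_device()` also means its callers do not need their own copy of the cleanup, which is the kind of duplication the commit removes from the probe path.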
  41. RDMA/bnxt_re: Change the sequence of updating the CQ toggle value

    Currently the CQ toggle value in the shared page (read by the userlib) is
    updated as part of the cqn_handler. There is a potential race of
    application calling the CQ ARM doorbell immediately and using the old
    toggle value.
    
    Change the sequence of updating CQ toggle value to update in the
    bnxt_qplib_service_nq function immediately after reading the toggle value
    to be in sync with the HW updated value.
    
    Fixes: e275919 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Chandramohan Akula <[email protected]>
    Reviewed-by: Selvin Xavier <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    chandramohan-akula authored and jgunthorpe committed Oct 11, 2024
    2df4113
  42. RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages

    Avoid memory corruption while setting up Level-2 PBL pages for the non MR
    resources when num_pages > 256K.
    
    There will be a single PDE page address (contiguous pages in the case of >
    PAGE_SIZE), but, current logic assumes multiple pages, leading to invalid
    memory access after 256K PBL entries in the PDE.
    
    Fixes: 0c4dcd6 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Bhargava Chenna Marreddy <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    bmarreddy authored and jgunthorpe committed Oct 11, 2024
    7988bdb
  43. RDMA/bnxt_re: Fix the GID table length

    GID table length is reported by FW. The gid index which is passed to the
    driver during modify_qp/create_ah is restricted by the sgid_index field of
    struct ib_global_route.  sgid_index is u8 and the max sgid possible is
    256.
    
    Each GID entry in HW will have 2 GID entries in the kernel gid table.  So
    we can support twice the gid table size reported by FW. Also, restrict the
    max GID to 256.
    
    Fixes: 847b978 ("RDMA/bnxt_re: Restrict the max_gids to 256")
    Link: https://patch.msgid.link/r/[email protected]
    Signed-off-by: Kalesh AP <[email protected]>
    Signed-off-by: Selvin Xavier <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
    Kalesh AP authored and jgunthorpe committed Oct 11, 2024
    dc5006c
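The arithmetic in the GID table commit above (twice the FW-reported size, capped at 256 because `sgid_index` is a u8) can be written out as a small helper. The function and macro names are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

#define SGID_INDEX_LIMIT 256u	/* sgid_index is a u8, so indices 0..255 */

/*
 * Hypothetical helper: number of kernel GID table entries to expose for
 * a given FW-reported GID count. Each HW entry maps to two kernel
 * entries, and the result is capped at what a u8 index can address.
 */
static uint32_t kernel_gid_table_len(uint32_t fw_reported_gids)
{
	uint64_t doubled = (uint64_t)fw_reported_gids * 2;	/* avoid overflow */

	return doubled > SGID_INDEX_LIMIT ? SGID_INDEX_LIMIT : (uint32_t)doubled;
}
```

For example, a FW-reported size of 64 gives 128 kernel entries, while 128 or more gives the cap of 256.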

Commits on Oct 12, 2024

  1. bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID

    This fixes a kasan splat in the ec device removal tests.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 12, 2024
    7d84d9f
  2. bcachefs: Fix invalid shift in member_to_text()

    Reported-by: [email protected]
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 12, 2024
    c1bd21b
  3. bcachefs: Fix accounting replay flags

    BCH_TRANS_COMMIT_journal_reclaim without BCH_WATERMARK_reclaim means
    "return an error if low on journal space" - but accounting replay must
    succeed.
    
    Fixes koverstreet/bcachefs#656
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 12, 2024
    672f752
  4. bcachefs: Fix bkey_nocow_lock()

    This fixes an assertion pop in nocow_locking.c
    
    00243 kernel BUG at fs/bcachefs/nocow_locking.c:41!
    00243 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
    00243 Modules linked in:
    00243 Hardware name: linux,dummy-virt (DT)
    00243 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
    00244 pc : bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
    00244 lr : bkey_nocow_lock (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:79)
    00244 sp : ffffff80c82373b0
    00244 x29: ffffff80c82373b0 x28: ffffff80e08958c0 x27: ffffff80e0880000
    00244 x26: ffffff80c8237a98 x25: 00000000000000a0 x24: ffffff80c8237ab0
    00244 x23: 00000000000000c0 x22: 0000000000000008 x21: 0000000000000000
    00244 x20: ffffff80c8237a98 x19: 0000000000000018 x18: 0000000000000000
    00244 x17: 0000000000000000 x16: 000000000000003f x15: 0000000000000000
    00244 x14: 0000000000000008 x13: 0000000000000018 x12: 0000000000000000
    00244 x11: 0000000000000000 x10: ffffff80e0880000 x9 : ffffffc0803ac1a4
    00244 x8 : 0000000000000018 x7 : ffffff80c8237a88 x6 : ffffff80c8237ab0
    00244 x5 : ffffff80e08988d0 x4 : 00000000ffffffff x3 : 0000000000000000
    00244 x2 : 0000000000000004 x1 : 0003000000000d1e x0 : ffffff80e08988c0
    00244 Call trace:
    00244 bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
    00245 bch2_data_update_init (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:627 (discriminator 1))
    00245 promote_alloc.isra.0 (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:242 /home/testdashboard/linux-7/fs/bcachefs/io_read.c:304)
    00245 __bch2_read_extent (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:949)
    00246 __bch2_read (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:1215)
    00246 bch2_direct_IO_read (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:132)
    00246 bch2_read_iter (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:201)
    00247 aio_read.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:1602)
    00247 io_submit_one.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:2003 /home/testdashboard/linux-7/fs/aio.c:2052)
    00248 __arm64_sys_io_submit (/home/testdashboard/linux-7/fs/aio.c:2111 /home/testdashboard/linux-7/fs/aio.c:2081 /home/testdashboard/linux-7/fs/aio.c:2081)
    00248 invoke_syscall.constprop.0 (/home/testdashboard/linux-7/arch/arm64/include/asm/syscall.h:61 /home/testdashboard/linux-7/arch/arm64/kernel/syscall.c:54)
    00248 ========= FAILED TIMEOUT tiering_variable_buckets_replicas in 1200s
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 12, 2024
    9183c2b
  5. bcachefs: Improve check_snapshot_exists()

    Check if we have snapshot_trees or subvolumes that refer to the snapshot
    node being reconstructed, and use them.
    
    With this, the kill_btree_root test that blows away the snapshots btree
    now passes, and we're able to successfully reconstruct.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 12, 2024
    c986dd7
  6. Merge tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux…

    …/kernel/git/groeck/linux-staging
    
    Pull hwmon fixes from Guenter Roeck:
    
     - Add missing dependencies on REGMAP_I2C for several drivers
    
     - Fix memory leak in adt7475 driver
    
     - Relabel Columbiaville temperature sensor in intel-m10-bmc-hwmon
       driver to match other sensor labels
    
    * tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
      hwmon: (max1668) Add missing dependency on REGMAP_I2C
      hwmon: (ltc2991) Add missing dependency on REGMAP_I2C
      hwmon: (adt7470) Add missing dependency on REGMAP_I2C
      hwmon: (adm9240) Add missing dependency on REGMAP_I2C
      hwmon: (mc34vr500) Add missing dependency on REGMAP_I2C
      hwmon: (tmp513) Add missing dependency on REGMAP_I2C
      hwmon: (adt7475) Fix memory leak in adt7475_fan_pwm_config()
      hwmon: intel-m10-bmc-hwmon: relabel Columbiaville to CVL Die Temperature
    torvalds committed Oct 12, 2024
    05749ec
  7. Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/g…

    …it/jejb/scsi
    
    Pull SCSI fixes from James Bottomley:
     "Four small fixes, three in drivers and one in the FC transport class
      to add idempotence to state setting"
    
    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
      scsi: scsi_transport_fc: Allow setting rport state to current state
      scsi: wd33c93: Don't use stale scsi_pointer value
      scsi: fnic: Move flush_work initialization out of if block
      scsi: ufs: Use pre-calculated offsets in ufshcd_init_lrb()
    torvalds committed Oct 12, 2024
    7234e2e

Commits on Oct 13, 2024

  1. Merge tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/powerpc/linux
    
    Pull powerpc fix from Michael Ellerman:
    
     - Fix crash in memcpy on 8xx due to dcbz workaround since recent
       changes
    
    Thanks to Christophe Leroy.
    
    * tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/8xx: Fix kernel DTLB miss on dcbz
    torvalds committed Oct 13, 2024
    36c2545
  2. Merge tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linu…

    …x/kernel/git/gregkh/driver-core
    
    Pull driver core fixes from Greg KH:
     "Here is a single driver core fix, and a .mailmap update.
    
      The fix is for the rust driver core bindings: it turned out that the
      from_raw binding wasn't a good idea (don't want to pass a pointer to a
      reference counted object without actually incrementing the reference
      count). So this change fixes it up as the from_raw binding came in in
      -rc1.
    
      The other change is a .mailmap update.
    
      Both have been in linux-next for a while with no reported issues"
    
    * tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
      mailmap: update mail for Fiona Behrens
      rust: device: change the from_raw() function
    torvalds committed Oct 13, 2024
    f683c9b
  3. Merge tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/gregkh/usb
    
    Pull USB fixes from Greg KH:
     "Here are some small USB fixes for some reported problems for 6.12-rc3.
      Included in here are:
    
       - fix for a yurex driver regression introduced in -rc1
    
       - build error fix for usbg network filesystem code
    
       - onboard_usb_dev build fix
    
       - dwc3 driver fixes for reported errors
    
       - gadget driver fix
    
       - new USB storage driver quirk
    
       - xhci resume bugfix
    
      All of these have been in linux-next for a while with no reported
      issues"
    
    * tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
      net/9p/usbg: Fix build error
      USB: yurex: kill needless initialization in yurex_read
      Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"
      usb: xhci: Fix problem with xhci resume from suspend
      usb: misc: onboard_usb_dev: introduce new config symbol for usb5744 SMBus support
      usb: dwc3: core: Stop processing of pending events if controller is halted
      usb: dwc3: re-enable runtime PM after failed resume
      usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip
      usb: gadget: core: force synchronous registration
    torvalds committed Oct 13, 2024
    ba01565
  4. Merge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6

    Pull smb client fixes from Steve French:
     "Two fixes for Windows symlink handling"
    
    * tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6:
      cifs: Fix creating native symlinks pointing to current or parent directory
      cifs: Improve creating native symlinks pointing to directory
    torvalds committed Oct 13, 2024
    cfea70e
  5. Linux 6.12-rc3

    torvalds committed Oct 13, 2024
    8e929cb
  6. bcachefs: fix uaf in bch2_dio_write_done()

    Reported-by: [email protected]
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 13, 2024
    573ddcd
  7. bcachefs: Fix missing bounds checks in bch2_alloc_read()

    We were checking that the alloc key was for a valid device, but not a
    valid bucket.
    
    This is the upgrade path from versions prior to bcachefs being mainlined.
    
    Reported-by: [email protected]
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 13, 2024
    a319aea
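
    The missing check described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `key_is_valid` and the `devices` mapping are hypothetical names invented here, not bcachefs functions, and the real code works on on-disk keys rather than Python integers.

```python
def key_is_valid(devices, dev_idx, bucket_nr):
    """Return True only if both the device and the bucket exist.

    `devices` is a hypothetical mapping: device index -> number of buckets.
    """
    nbuckets = devices.get(dev_idx)
    if nbuckets is None:
        # Unknown device: the check that was already in place.
        return False
    # The added check: the bucket must lie inside the device.
    return 0 <= bucket_nr < nbuckets


devices = {0: 1024, 1: 2048}
```

    With this model, a key naming device 0 and bucket 5000 passes the device check alone but is rejected once the bucket range is checked too.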
  8. bcachefs: Add missing validation for bch_stripe.csum_granularity_bits

    Reported-by: [email protected]
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 13, 2024
    9f25dbe
  9. Merge tag 'hid-for-linus-2024101301' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/hid/hid
    
    Pull HID fixes from Jiri Kosina:
    
     - fix for memory corruption regression in amd_sfh driver (Basavaraj
       Natikar)
    
     - fix for mis-reporting of BTN_TOOL_PEN and BTN_TOOL_RUBBER for AES
       sensors tools in Wacom driver (Jason Gerecke)
    
     - fix for uninitialized variable use in intel-ish-hid driver
       (SurajSonawane2415)
    
     - a few device-specific quirks / device ID additions
    
    * tag 'hid-for-linus-2024101301' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
      HID: wacom: Hardcode (non-inverted) AES pens as BTN_TOOL_PEN
      HID: amd_sfh: Switch to device-managed dmam_alloc_coherent()
      HID: multitouch: Add quirk for HONOR MagicBook Art 14 touchpad
      HID: multitouch: Add support for B2402FVA track point
      HID: plantronics: Workaround for an unexcepted opposite volume key
      hid: intel-ish-hid: Fix uninitialized variable 'rv' in ish_fw_xfer_direct_dma
    torvalds committed Oct 13, 2024
    6485cf5

Commits on Oct 14, 2024

  1. bcachefs: Fix kasan splat in new_stripe_alloc_buckets()

    Update for BCH_SB_MEMBER_INVALID.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 14, 2024
    b1e5622
  2. bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev

    When creating a new stripe, we may reuse an existing stripe that has
    some empty and some nonempty blocks.
    
    Generally, the existing stripe won't change underneath us - except for
    block sector counts, which we copy to the new key in
    ec_stripe_key_update.
    
    But the device removal path can now invalidate stripe pointers to a
    device, and that can race with stripe reuse.
    
    Change ec_stripe_key_update() to check for and resolve this
    inconsistency.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 14, 2024
    cb6055e
  3. bcachefs: Fix sysfs warning in fstests generic/730,731

    sysfs warns if we're removing a symlink from a directory that's no
    longer in sysfs; this is triggered by fstests generic/730, which
    simulates hot removal of a block device.
    
    This patch is however not a correct fix, since checking
    kobj->state_in_sysfs on a kobj owned by another subsystem is racy.
    
    A better fix would be to add the appropriate check to
    sysfs_remove_link() - and sysfs_create_link() as well.
    
    But kobject_add_internal()/kobject_del() do not as of today have locking
    that would support that.
    
    Note that the block/holder.c code appears to be subject to this race as
    well.
    
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: "Rafael J. Wysocki" <[email protected]>
    Cc:  Christoph Hellwig <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Oct 14, 2024
    5e3b723
  4. Merge tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/xiang/erofs
    
    Pull erofs fixes from Gao Xiang:
     "The main one fixes a syzbot issue due to the invalid inode type out of
      file-backed mounts. The others are minor cleanups without actual logic
      changes.
    
      Summary:
    
       - Make sure only regular inodes can be used for file-backed mounts
    
       - Two minor codebase cleanups"
    
    * tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
      erofs: get rid of kaddr in `struct z_erofs_maprecorder`
      erofs: get rid of z_erofs_try_to_claim_pcluster()
      erofs: ensure regular inodes for file-backed mounts
    torvalds committed Oct 14, 2024
    63fa605
  5. Merge tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kerne…

    …l/git/jaegeuk/f2fs
    
    Pull f2fs fix from Jaegeuk Kim:
     "An urgent fix to resolve DIO read performance regression caused by
      'f2fs: fix to avoid racing in between read and OPU dio write'"
    
    * tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
      f2fs: allow parallel DIO reads
    torvalds committed Oct 14, 2024
    eca631b
  6. ring-buffer: Fix refcount setting of boot mapped buffers

    A ring buffer which has its buffer mapped at boot up to fixed memory
    should not be freed. Other buffers can be. The ref counting setup was
    wrong for both. It gave the not mapped buffers a ref count of zero, and the
    boot mapped buffer a ref count of 1. But a normally allocated buffer
    should be 1, where it can be removed.
    
    Keep the ref count of a normal boot buffer with its setup ref count (do
    not decrement it), and increment the fixed memory boot mapped buffer's ref
    count.
    
    Cc: Mathieu Desnoyers <[email protected]>
    Link: https://lore.kernel.org/[email protected]
    Fixes: e645535 ("tracing: Add option to use memmapped memory for trace boot instance")
    Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    rostedt committed Oct 14, 2024
    2cf9733
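
    The reference-count convention described in this commit can be modelled in a few lines of user-space Python. This is an illustrative sketch only: `Instance`, `get`, `put` and `can_remove` are hypothetical names, not the kernel's tracing API. It assumes what the message states: a normal instance holds a count of 1 and may be removed at that value, while a boot-mapped instance takes one extra reference so that it is never removable.

```python
class Instance:
    """Toy model of a tracing instance's reference count (hypothetical names)."""

    def __init__(self, boot_mapped=False):
        # Every instance starts with the single reference taken at setup.
        self.ref = 1
        if boot_mapped:
            # Memory mapped at boot is permanent: pin it with an extra reference.
            self.ref += 1

    def get(self):
        self.ref += 1

    def put(self):
        assert self.ref > 0, "unbalanced put"
        self.ref -= 1

    def can_remove(self):
        # Removal is allowed only when just the setup reference remains.
        return self.ref == 1


normal = Instance()
mapped = Instance(boot_mapped=True)

normal.get()                          # something starts using the instance
busy_while_in_use = normal.can_remove()
normal.put()                          # ... and releases it again
```

    Starting a normal instance at 0 instead would let a single `get`/`put` pair bring it back to a value that the removal check treats as free, which is the failure the commit describes.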
  7. sched_ext: Remove unnecessary cpu_relax()

    As described in commit b07996c ("sched_ext: Don't hold
    scx_tasks_lock for too long"), we're doing a cond_resched() every 32
    calls to scx_task_iter_next() to avoid RCU and other stalls. That commit
    also added a cpu_relax() to the codepath where we drop and reacquire the
    lock, but as Waiman described in [0], cpu_relax() should only be
    necessary in busy loops to avoid pounding on a cacheline (or to allow a
    hypertwin to more fully utilize a core).
    
    Let's remove the unnecessary cpu_relax().
    
    [0]: https://lore.kernel.org/all/[email protected]/
    
    Cc: Waiman Long <[email protected]>
    Signed-off-by: David Vernet <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>
    Byte-Lab authored and htejun committed Oct 14, 2024
    60e339b
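
    The pattern discussed here, dropping and retaking a lock every 32 iterations, can be sketched in user-space Python. The names `iterate_tasks` and `BATCH` are hypothetical, `threading.Lock` stands in for scx_tasks_lock, and `time.sleep(0)` is only a rough stand-in for cond_resched(); the kernel iterator also has to keep its place in the list across the unlock, which this toy version ignores.

```python
import threading
import time

BATCH = 32  # the "every 32 calls" figure quoted in the commit message


def iterate_tasks(tasks, lock, visit):
    """Visit every task under `lock`, releasing it briefly after each batch."""
    breaks = 0
    lock.acquire()
    try:
        for count, task in enumerate(tasks, start=1):
            visit(task)
            if count % BATCH == 0:
                # Let waiters in, offer to yield the CPU, then retake the lock.
                # No spin-wait hint is needed: this is not a busy loop.
                lock.release()
                time.sleep(0)
                lock.acquire()
                breaks += 1
    finally:
        lock.release()
    return breaks


seen = []
lock = threading.Lock()
breaks = iterate_tasks(range(100), lock, seen.append)
```

    For 100 tasks the lock is dropped after the 32nd, 64th and 96th visit.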

Commits on Oct 15, 2024

  1. ring-buffer: Fix reader locking when changing the sub buffer order

    The function ring_buffer_subbuf_order_set() updates each
    ring_buffer_per_cpu and installs new sub buffers that match the requested
    page order. This operation may be invoked concurrently with readers that
    rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or
    the ring_buffer_per_cpu.pages and reader_page pointers. However, no
    exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying
    the mentioned data while a reader also operates on them can then result in
    incorrect memory access and various crashes.
    
    Fix the problem by taking the reader_lock when updating a specific
    ring_buffer_per_cpu in ring_buffer_subbuf_order_set().
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
    
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Link: https://lore.kernel.org/[email protected]
    Fixes: 8e7b58c ("ring-buffer: Just update the subbuffers when changing their allocation order")
    Signed-off-by: Petr Pavlu <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    petrpavlu authored and rostedt committed Oct 15, 2024
    09661f7
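
    A minimal user-space sketch of the locking rule in this fix: the code that replaces the sub buffers takes the same lock that readers take. `PerCpuBuffer`, `read_page_size` and `set_order` are hypothetical names, `threading.Lock` stands in for the reader_lock, and the 4096-byte base page size is an assumption made for the example.

```python
import threading

PAGE_SIZE = 4096  # assumed base page size for this example


class PerCpuBuffer:
    """Toy per-CPU buffer whose pages are protected by a reader lock."""

    def __init__(self, order, nr_pages):
        self.reader_lock = threading.Lock()
        self.order = order
        self.pages = [bytearray(PAGE_SIZE << order) for _ in range(nr_pages)]

    def read_page_size(self):
        # Readers look at the pages only while holding the lock.
        with self.reader_lock:
            return len(self.pages[0])

    def set_order(self, order, nr_pages):
        # Build the replacement pages first, then swap them in under the
        # same lock, so a reader never sees a half-updated buffer.
        new_pages = [bytearray(PAGE_SIZE << order) for _ in range(nr_pages)]
        with self.reader_lock:
            self.order = order
            self.pages = new_pages


buf = PerCpuBuffer(order=0, nr_pages=2)
size_before = buf.read_page_size()
buf.set_order(order=1, nr_pages=2)
size_after = buf.read_page_size()
```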
  2. Merge tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs

    Pull bcachefs fixes from Kent Overstreet:
    
     - New metadata version inode_has_child_snapshots
    
       This fixes bugs with handling of unlinked inodes + snapshots, in
       particular when an inode is reattached after taking a snapshot;
       deleted inodes now get correctly cleaned up across snapshots.
    
     - Disk accounting rewrite fixes
         - validation fixes for when a device has been removed
         - fix journal replay failing with "journal_reclaim_would_deadlock"
    
     - Some more small fixes for erasure coding + device removal
    
     - Assorted small syzbot fixes
    
    * tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs: (27 commits)
      bcachefs: Fix sysfs warning in fstests generic/730,731
      bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev
      bcachefs: Fix kasan splat in new_stripe_alloc_buckets()
      bcachefs: Add missing validation for bch_stripe.csum_granularity_bits
      bcachefs: Fix missing bounds checks in bch2_alloc_read()
      bcachefs: fix uaf in bch2_dio_write_done()
      bcachefs: Improve check_snapshot_exists()
      bcachefs: Fix bkey_nocow_lock()
      bcachefs: Fix accounting replay flags
      bcachefs: Fix invalid shift in member_to_text()
      bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID
      bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry
      bcachefs: Check if stuck in journal_res_get()
      closures: Add closure_wait_event_timeout()
      bcachefs: Fix state lock involved deadlock
      bcachefs: Fix NULL pointer dereference in bch2_opt_to_text
      bcachefs: Release transaction before wake up
      bcachefs: add check for btree id against max in try read node
      bcachefs: Disk accounting device validation fixes
      bcachefs: bch2_inode_or_descendents_is_open()
      ...
    torvalds committed Oct 15, 2024
    bdc7276
  3. Merge tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/sc…

    …m/linux/kernel/git/trace/linux-trace
    
    Pull ring-buffer fixes from Steven Rostedt:
    
     - Fix ref counter of buffers assigned at boot up
    
       A tracing instance can be created from the kernel command line. If it
       maps to memory, it is considered permanent and should not be deleted,
       or bad things can happen. If it is not mapped to memory, then the
       user is fine to delete it via rmdir from the instances directory. But
       the ref counts assumed 0 was free to remove and greater than zero was
       not. But this was not the case. When an instance is created, it
       should have the reference of 1, and if it should not be removed, it
       must be greater than 1. The boot up code set normal instances with a
       ref count of 0, which could get removed if something accessed it and
       then released it. And memory mapped instances had a ref count of 1
       which meant it could be deleted, and bad things happen. Keep normal
       instances ref count as 1, and set memory mapped instances ref count
       to 2.
    
     - Protect sub buffer size (order) updates from other modifications
    
       When a ring buffer is changing the size of its sub-buffers, no other
       operations should be performed on the ring buffer. That includes
       reading it. But the locking only grabbed the buffer->mutex that keeps
       some operations from touching the ring buffer. It also must hold the
       cpu_buffer->reader_lock as well when updates happen as other paths
       use that to do some operations on the ring buffer.
    
    * tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
      ring-buffer: Fix reader locking when changing the sub buffer order
      ring-buffer: Fix refcount setting of boot mapped buffers
    torvalds committed Oct 15, 2024
    2f87d09

Commits on Oct 16, 2024

  1. Merge tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/…

    …scm/linux/kernel/git/tj/sched_ext
    
    Pull sched_ext fixes from Tejun Heo:
    
     - More issues reported in the enable/disable paths on large machines
       with many tasks due to scx_tasks_lock being held too long. Break up
       the task iterations
    
     - Remove ops.select_cpu() dependency in bypass mode so that a
       misbehaving implementation can't live-lock the machine by pushing all
       tasks to few CPUs in bypass mode
    
     - Other misc fixes
    
    * tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
      sched_ext: Remove unnecessary cpu_relax()
      sched_ext: Don't hold scx_tasks_lock for too long
      sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers
      sched_ext: bypass mode shouldn't depend on ops.select_cpu()
      sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl()
      sched_ext: Start schedulers with consistent p->scx.slice values
      Revert "sched_ext: Use shorter slice while bypassing"
      sched_ext: use correct function name in pick_task_scx() warning message
      selftests: sched_ext: Add sched_ext as proper selftest target
    torvalds committed Oct 16, 2024
    dff6584
  2. Merge tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git…

    …/herbert/crypto-2.6
    
    Pull crypto fixes from Herbert Xu:
    
     - Remove bogus testmgr ENOENT error messages
    
     - Ensure algorithm is still alive before marking it as tested
    
     - Disable buggy hash algorithms in marvell/cesa
    
    * tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
      crypto: marvell/cesa - Disable hash algorithms
      crypto: testmgr - Hide ENOENT errors better
      crypto: api - Fix liveliness check in crypto_alg_tested
    torvalds committed Oct 16, 2024
    6f6fc39
  3. Merge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

    Pull smb server fixes from Steve French:
    
     - fix race between session setup and session logoff
    
     - add supplementary group support
    
    * tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
      ksmbd: add support for supplementary groups
      ksmbd: fix user-after-free from session log off
    torvalds committed Oct 16, 2024
    9f635d4
  4. Merge tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
    
     - regression fix: dirty extents tracked in xarray for qgroups must be
       adjusted for 32bit platforms
    
     - fix potentially freeing uninitialized name in fscrypt structure
    
     - fix warning about unneeded variable in a send callback
    
    * tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: fix uninitialized pointer free on read_alloc_one_name() error
      btrfs: send: cleanup unneeded return variable in changed_verity()
      btrfs: fix uninitialized pointer free in add_inode_ref()
      btrfs: use sector numbers as keys for the dirty extents xarray
    torvalds committed Oct 16, 2024
    667b1d4
  5. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    …t/rdma/rdma
    
    Pull rdma fixes from Jason Gunthorpe:
     "Several miscellaneous fixes. A lot of bnxt_re activity, there will be
      more rc patches there coming.
    
       - Many bnxt_re bug fixes - Memory leaks, KASAN, NULL pointer deref,
         soft lockups, error unwinding and some small functional issues
    
       - Error unwind bug in rdma netlink
    
       - Two issues with incorrect VLAN detection for iWarp
    
       - skb_splice_from_iter() splat in siw
    
       - Give SRP slab caches unique names to resolve the merge window
         WARN_ON regression"
    
    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
      RDMA/bnxt_re: Fix the GID table length
      RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
      RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
      RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
      RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
      RDMA/bnxt_re: Fix a possible NULL pointer dereference
      RDMA/bnxt_re: Return more meaningful error
      RDMA/bnxt_re: Fix incorrect dereference of srq in async event
      RDMA/bnxt_re: Fix out of bound check
      RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
      RDMA/srpt: Make slab cache names unique
      RDMA/irdma: Fix misspelling of "accept*"
      RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
      RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES
      RDMA/core: Fix ENODEV error for iWARP test over vlan
      RDMA/nldev: Fix NULL pointer dereferences issue in rdma_nl_notify_event
      RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
      RDMA/bnxt_re: Add a check for memory allocation
      RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
      RDMA/bnxt_re: Fix a possible memory leak
    torvalds committed Oct 16, 2024
    c964ced

Commits on Oct 17, 2024

  1. btrfs: don't take dev_replace rwsem on task already holding it

    Running fstests btrfs/011 with MKFS_OPTIONS="-O rst" to force the usage of
    the RAID stripe-tree, we get the following splat from lockdep:
    
     BTRFS info (device sdd): dev_replace from /dev/sdd (devid 1) to /dev/sdb started
    
     ============================================
     WARNING: possible recursive locking detected
     6.11.0-rc3-btrfs-for-next #599 Not tainted
     --------------------------------------------
     btrfs/2326 is trying to acquire lock:
     ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
    
     but task is already holding lock:
     ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
    
     other info that might help us debug this:
      Possible unsafe locking scenario:
    
            CPU0
            ----
       lock(&fs_info->dev_replace.rwsem);
       lock(&fs_info->dev_replace.rwsem);
    
      *** DEADLOCK ***
    
      May be due to missing lock nesting notation
    
     1 lock held by btrfs/2326:
      #0: ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
    
     stack backtrace:
     CPU: 1 UID: 0 PID: 2326 Comm: btrfs Not tainted 6.11.0-rc3-btrfs-for-next #599
     Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
     Call Trace:
      <TASK>
      dump_stack_lvl+0x5b/0x80
      __lock_acquire+0x2798/0x69d0
      ? __pfx___lock_acquire+0x10/0x10
      ? __pfx___lock_acquire+0x10/0x10
      lock_acquire+0x19d/0x4a0
      ? btrfs_map_block+0x39f/0x2250
      ? __pfx_lock_acquire+0x10/0x10
      ? find_held_lock+0x2d/0x110
      ? lock_is_held_type+0x8f/0x100
      down_read+0x8e/0x440
      ? btrfs_map_block+0x39f/0x2250
      ? __pfx_down_read+0x10/0x10
      ? do_raw_read_unlock+0x44/0x70
      ? _raw_read_unlock+0x23/0x40
      btrfs_map_block+0x39f/0x2250
      ? btrfs_dev_replace_by_ioctl+0xd69/0x1d00
      ? btrfs_bio_counter_inc_blocked+0xd9/0x2e0
      ? __kasan_slab_alloc+0x6e/0x70
      ? __pfx_btrfs_map_block+0x10/0x10
      ? __pfx_btrfs_bio_counter_inc_blocked+0x10/0x10
      ? kmem_cache_alloc_noprof+0x1f2/0x300
      ? mempool_alloc_noprof+0xed/0x2b0
      btrfs_submit_chunk+0x28d/0x17e0
      ? __pfx_btrfs_submit_chunk+0x10/0x10
      ? bvec_alloc+0xd7/0x1b0
      ? bio_add_folio+0x171/0x270
      ? __pfx_bio_add_folio+0x10/0x10
      ? __kasan_check_read+0x20/0x20
      btrfs_submit_bio+0x37/0x80
      read_extent_buffer_pages+0x3df/0x6c0
      btrfs_read_extent_buffer+0x13e/0x5f0
      read_tree_block+0x81/0xe0
      read_block_for_search+0x4bd/0x7a0
      ? __pfx_read_block_for_search+0x10/0x10
      btrfs_search_slot+0x78d/0x2720
      ? __pfx_btrfs_search_slot+0x10/0x10
      ? lock_is_held_type+0x8f/0x100
      ? kasan_save_track+0x14/0x30
      ? __kasan_slab_alloc+0x6e/0x70
      ? kmem_cache_alloc_noprof+0x1f2/0x300
      btrfs_get_raid_extent_offset+0x181/0x820
      ? __pfx_lock_acquire+0x10/0x10
      ? __pfx_btrfs_get_raid_extent_offset+0x10/0x10
      ? down_read+0x194/0x440
      ? __pfx_down_read+0x10/0x10
      ? do_raw_read_unlock+0x44/0x70
      ? _raw_read_unlock+0x23/0x40
      btrfs_map_block+0x5b5/0x2250
      ? __pfx_btrfs_map_block+0x10/0x10
      scrub_submit_initial_read+0x8fe/0x11b0
      ? __pfx_scrub_submit_initial_read+0x10/0x10
      submit_initial_group_read+0x161/0x3a0
      ? lock_release+0x20e/0x710
      ? __pfx_submit_initial_group_read+0x10/0x10
      ? __pfx_lock_release+0x10/0x10
      scrub_simple_mirror.isra.0+0x3eb/0x580
      scrub_stripe+0xe4d/0x1440
      ? lock_release+0x20e/0x710
      ? __pfx_scrub_stripe+0x10/0x10
      ? __pfx_lock_release+0x10/0x10
      ? do_raw_read_unlock+0x44/0x70
      ? _raw_read_unlock+0x23/0x40
      scrub_chunk+0x257/0x4a0
      scrub_enumerate_chunks+0x64c/0xf70
      ? __mutex_unlock_slowpath+0x147/0x5f0
      ? __pfx_scrub_enumerate_chunks+0x10/0x10
      ? bit_wait_timeout+0xb0/0x170
      ? __up_read+0x189/0x700
      ? scrub_workers_get+0x231/0x300
      ? up_write+0x490/0x4f0
      btrfs_scrub_dev+0x52e/0xcd0
      ? create_pending_snapshots+0x230/0x250
      ? __pfx_btrfs_scrub_dev+0x10/0x10
      btrfs_dev_replace_by_ioctl+0xd69/0x1d00
      ? lock_acquire+0x19d/0x4a0
      ? __pfx_btrfs_dev_replace_by_ioctl+0x10/0x10
      ? lock_release+0x20e/0x710
      ? btrfs_ioctl+0xa09/0x74f0
      ? __pfx_lock_release+0x10/0x10
      ? do_raw_spin_lock+0x11e/0x240
      ? __pfx_do_raw_spin_lock+0x10/0x10
      btrfs_ioctl+0xa14/0x74f0
      ? lock_acquire+0x19d/0x4a0
      ? find_held_lock+0x2d/0x110
      ? __pfx_btrfs_ioctl+0x10/0x10
      ? lock_release+0x20e/0x710
      ? do_sigaction+0x3f0/0x860
      ? __pfx_do_vfs_ioctl+0x10/0x10
      ? do_raw_spin_lock+0x11e/0x240
      ? lockdep_hardirqs_on_prepare+0x270/0x3e0
      ? _raw_spin_unlock_irq+0x28/0x50
      ? do_sigaction+0x3f0/0x860
      ? __pfx_do_sigaction+0x10/0x10
      ? __x64_sys_rt_sigaction+0x18e/0x1e0
      ? __pfx___x64_sys_rt_sigaction+0x10/0x10
      ? __x64_sys_close+0x7c/0xd0
      __x64_sys_ioctl+0x137/0x190
      do_syscall_64+0x71/0x140
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
     RIP: 0033:0x7f0bd1114f9b
     Code: Unable to access opcode bytes at 0x7f0bd1114f71.
     RSP: 002b:00007ffc8a8c3130 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f0bd1114f9b
     RDX: 00007ffc8a8c35e0 RSI: 00000000ca289435 RDI: 0000000000000003
     RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007
     R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc8a8c6c85
     R13: 00000000398e72a0 R14: 0000000000004361 R15: 0000000000000004
      </TASK>
    
    This happens because on RAID stripe-tree filesystems we recurse back into
    btrfs_map_block() on scrub to perform the logical to device physical
    mapping.
    
    But as the device replace task is already holding the dev_replace::rwsem
    we deadlock.
    
    So don't take the dev_replace::rwsem in case our task is the task performing
    the device replace.
    
    Suggested-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Oct 17, 2024
    fe46c64
  2. btrfs: make assert_rbio() to only check CONFIG_BTRFS_ASSERT

    According to the description, CONFIG_BTRFS_DEBUG is only for extra
    debug info, meanwhile sanity checks should be managed by
    CONFIG_BTRFS_ASSERT.
    
    There is no need to check both to enable assert_rbio().
    
    Just remove the check for CONFIG_BTRFS_DEBUG.
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    5b3b62a
  3. btrfs: split out CONFIG_BTRFS_EXPERIMENTAL from CONFIG_BTRFS_DEBUG

    Currently CONFIG_BTRFS_DEBUG is not only for the extra debugging
    output, but also for experimental features.
    
    This makes it hard to distinguish planned but not yet stable features
    from those purely designed for debugging.
    
    This patch splits the following features into CONFIG_BTRFS_EXPERIMENTAL:
    
    - Extent map shrinker
      This seems to be the first one to exit experimental.
    
    - Extent tree v2
      This seems to be the last one to graduate from experimental.
    
    - Raid stripe tree
    - Csum offload mode
    - Send protocol v3
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    8f83607
  4. btrfs: zlib: make the compression path to handle sector size < page size

    Inside zlib_compress_folios(), each time we switch the input page cache,
    the @start is increased by PAGE_SIZE.
    
    But with the incoming compression support for sector size < page size
    (previously compression was only supported when the range was fully
    page aligned), this does not handle the following case:
    
        0          32K         64K          96K
        |          |///////////||///////////|
    
    @start has the initial value 32K, indicating the start filepos of the
    to-be-compressed range.
    
    And when grabbing the first page as input, we always call "start +=
    PAGE_SIZE;".
    
    But since @start begins at 32K, it will be increased by 64K, making it
    96K for the next range, which causes an incorrect input range and
    corruption for the future subpage compression.
    
    Fix it by only increasing @start by the input size.
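
A minimal userspace sketch of the arithmetic above, not the kernel code itself. The names EXAMPLE_PAGE_SIZE, input_len_in_page(), advance_by_page() and advance_by_input() are hypothetical, and the 64K page size is assumed from the diagram:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the example in the diagram (64K pages). */
#define EXAMPLE_PAGE_SIZE 65536ULL

/*
 * Bytes that can be read from the page containing @start without crossing
 * the page boundary or the end of the to-be-compressed range (@end).
 */
uint64_t input_len_in_page(uint64_t start, uint64_t end)
{
	uint64_t page_end = (start / EXAMPLE_PAGE_SIZE + 1) * EXAMPLE_PAGE_SIZE;

	assert(start < end);
	return (end < page_end ? end : page_end) - start;
}

/* Old behavior: always step a full page, wherever @start sits in the page. */
uint64_t advance_by_page(uint64_t start)
{
	return start + EXAMPLE_PAGE_SIZE;
}

/* Fixed behavior: step only by the bytes actually consumed as input. */
uint64_t advance_by_input(uint64_t start, uint64_t end)
{
	return start + input_len_in_page(start, end);
}
```

For the range [32K, 96K), advance_by_page(32K) jumps straight to 96K and skips the second half of the range, while advance_by_input(32K, 96K) lands on 64K, the start of the next page.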
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    18c2be4
  5. btrfs: zstd: make the compression path to handle sector size < page size

    Inside zstd_compress_folios(), after exhausting one input page, we need
    to switch to the next page as input.
    
    However when counting the total input bytes (@tot_in), we always increase
    it by PAGE_SIZE.
    
    For the following case, this can produce an incorrect value:
    
            0          32K         64K          96K
            |          |///////////||///////////|
    
    After compressing range [32K, 64K), we switch to the next page, and
    increase @tot_in by 64K, while we only read 32K.
    
    This will cause @total_in to return a value larger than the input
    length.
    
    Fix it by only increasing @tot_in by the input size.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    17a51a0
  6. btrfs: compression: add an ASSERT() to ensure the read-in length is sane

    There are already two bugs (one in zlib, one in zstd) where the
    compression path did not handle sector size < page size cases well,
    resulting in @total_in being larger than the to-be-compressed range
    length.
    
    That is enough reason to add an ASSERT() to make sure the total read-in
    length returned by btrfs_compress_folios() doesn't exceed the input
    length.
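
The kind of check described above can be sketched in userspace C as follows. This is only an illustration under assumptions: count_total_in() and EXAMPLE_PAGE_SIZE are hypothetical names, the 64K page size is assumed, and the standard assert() stands in for the kernel's ASSERT():

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size (64K), matching the earlier subpage examples. */
#define EXAMPLE_PAGE_SIZE 65536ULL

/*
 * Walk the range [start, start + len) page by page, the way a compression
 * loop consumes its input, and return the total number of bytes read in.
 */
uint64_t count_total_in(uint64_t start, uint64_t len)
{
	const uint64_t end = start + len;
	uint64_t cur = start;
	uint64_t total_in = 0;

	while (cur < end) {
		uint64_t page_end = (cur / EXAMPLE_PAGE_SIZE + 1) * EXAMPLE_PAGE_SIZE;
		uint64_t in_len = (end < page_end ? end : page_end) - cur;

		/* Account only for what was read, not a full page. */
		total_in += in_len;
		cur += in_len;
	}

	/* Sanity check: we can never read in more than we were given. */
	assert(total_in <= len);
	return total_in;
}
```

If the loop accounted a full page per iteration instead, a range starting in the middle of a page would trip the assertion right away.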
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    6eb293f
  7. btrfs: wait for writeback if sector size is smaller than page size

    [PROBLEM]
    If sector perfect compression is enabled for the sector size < page size
    case, the following case can lead to dirty ranges not being written back:
    
         0     32K     64K     96K     128K
         |     |///////||//////|     |/|
                                     124K
    
    In the above example, the page size is 64K, and we need to write back
    the above two pages.
    
    - Submit for page 0 (main thread)
      We found delalloc range [32K, 96K), which can be compressed.
      So we queue an async range for [32K, 96K).
      This means, the page unlock/clearing dirty/setting writeback will
      all happen in a workqueue context.
    
    - The compression is done, and the compressed range is submitted (workqueue)
      Since the compression is done asynchronously, it can finish before
      the main thread submits page 64K.
    
      Now the whole range [32K, 96K), involving two pages, will be marked
      writeback.
    
    - Submit for page 64K (main thread)
      extent_write_cache_pages() sees that wbc->sync_mode is WB_SYNC_NONE,
      so it skips the writeback wait, unlocks the page and exits.
    
      This means the dirty range [124K, 128K) will never be submitted,
      until the next writeback happens for page 64K.
    
    This will never happen for previous kernels because:
    
    - For sector size == page size case
      Since one page is one sector, if a page is marked writeback it will
      not have dirty flags.
      So this corner case will never hit.
    
    - For sector size < page size case
      We never do subpage compression, a range can only be submitted for
      compression if the range is fully page aligned.
      This change makes the subpage behavior mostly the same as non-subpage
      cases.
    
    [ENHANCEMENT]
    Instead of relying on the WB_SYNC_NONE check only, if it's a subpage
    case, then always wait for writeback flags.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    50e2162
  8. btrfs: make extent_range_clear_dirty_for_io() to handle sector size < page size cases
    
    For btrfs with sector size < page size (e.g. 4K sector size, 64K page
    size) and sector perfect compression support enabled, the following
    dirty ranges can lead to problems:
    
       0     32K     64K     96K    128K
       |     |///////||//////|    |/|
                                  124K
    
    In the above case, if we start writeback for that inode, the last dirty
    range [124K, 128K) will not be submitted, and its reserved space will
    leak:
    
    - Start writeback for page 0
      We find the range [32K, 96K) is suitable for compression, and queue it
      into a workqueue to do the delayed compression and submission.
    
    - Compression happens for range [32K, 96K)
      Function extent_range_clear_dirty_for_io() is called, however it
      only does full page handling, not considering any of the extra
      bitmaps for subpage cases.
    
      That function will clear page dirty for both page 0 and page 64K.
    
    - Writeback for the inode is done
      Because page 64K has its dirty flag cleared, it will not be considered
      as a writeback target.
    
    This means the range [124K, 128K) will not be submitted, and reserved
    space for it will be leaked.
    
    Fix this problem by using the subpage helper to clear the dirty flag.
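
The difference between full page and per-sector handling can be sketched in userspace C. This is not the btrfs subpage helper: struct example_page, range_to_mask(), mark_dirty() and clear_dirty_subpage() are hypothetical names, with one dirty bit per 4K sector of a 64K page as in the example above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE   65536u
#define EXAMPLE_SECTOR_SIZE 4096u

/* One dirty bit per sector of a single page starting at @page_start. */
struct example_page {
	uint64_t page_start;
	uint16_t dirty_bitmap;
};

/* Bitmask for the part of [start, start + len) that falls inside @page. */
uint16_t range_to_mask(const struct example_page *page, uint64_t start,
		       uint64_t len)
{
	uint64_t page_end = page->page_start + EXAMPLE_PAGE_SIZE;
	uint64_t end = start + len;
	uint16_t mask = 0;

	assert(start % EXAMPLE_SECTOR_SIZE == 0 && len % EXAMPLE_SECTOR_SIZE == 0);
	if (start < page->page_start)
		start = page->page_start;
	if (end > page_end)
		end = page_end;
	for (uint64_t cur = start; cur < end; cur += EXAMPLE_SECTOR_SIZE)
		mask |= (uint16_t)(1u << ((cur - page->page_start) / EXAMPLE_SECTOR_SIZE));
	return mask;
}

void mark_dirty(struct example_page *page, uint64_t start, uint64_t len)
{
	page->dirty_bitmap |= range_to_mask(page, start, len);
}

/*
 * Subpage-aware clearing: drop only the bits of the given range and report
 * whether the page still has dirty sectors left.
 */
bool clear_dirty_subpage(struct example_page *page, uint64_t start,
			 uint64_t len)
{
	page->dirty_bitmap &= (uint16_t)~range_to_mask(page, start, len);
	return page->dirty_bitmap != 0;
}
```

With page 64K dirty at [64K, 96K) and [124K, 128K), clearing the compressed range [32K, 96K) this way leaves the bit for [124K, 128K) set, so the page still counts as dirty and remains a writeback target.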
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    2923eaf
  9. btrfs: do not assume the full page range is not dirty in extent_writepage_io()
    
    The function extent_writepage_io() will submit the dirty sectors inside
    the page for the write.
    
    But recently, to co-operate with the incoming subpage compression
    enhancement, a new bitmap, btrfs_bio_ctrl::submit_bitmap, was
    introduced to exclude a subset of the dirty range from the
    submission.
    
    This is because we can have the following cases with 64K page size:
    
        0      16K       32K       48K       64K
        |      |/////////|         |/|
                                     52K
    
    For range [16K, 32K), we queue the dirty range for compression, which
    runs in a delayed workqueue.
    Then for range [48K, 52K), we go through the regular submission path.
    
    In that case, our btrfs_bio_ctrl::submit_bitmap will exclude the range
    [16K, 32K).
    
    The dirty flags for the range [16K, 32K) are only cleared when the
    compression is done, by the extent_clear_unlock_delalloc() call inside
    submit_one_async_extent().
    
    This patch fixes the false alert by removing the
    btrfs_folio_assert_not_dirty() check, since it's no longer correct for
    subpage compression cases.
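
A rough userspace illustration of why the check can no longer hold. The helpers sector_mask(), build_submit_bitmap() and dirty_after_submit() are hypothetical, and a 4K sector size is assumed for the 64K page in the diagram:

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE   65536u
#define EXAMPLE_SECTOR_SIZE 4096u

/* Bits for the sectors [start, end), given as byte offsets inside the page. */
uint16_t sector_mask(uint32_t start, uint32_t end)
{
	uint16_t mask = 0;

	assert(start % EXAMPLE_SECTOR_SIZE == 0 && end <= EXAMPLE_PAGE_SIZE);
	for (uint32_t cur = start; cur < end; cur += EXAMPLE_SECTOR_SIZE)
		mask |= (uint16_t)(1u << (cur / EXAMPLE_SECTOR_SIZE));
	return mask;
}

/* Sectors to submit now: dirty sectors minus those queued for compression. */
uint16_t build_submit_bitmap(uint16_t dirty, uint16_t queued_for_compression)
{
	return dirty & (uint16_t)~queued_for_compression;
}

/* Sectors that are still dirty once the regular submission has run. */
uint16_t dirty_after_submit(uint16_t dirty, uint16_t submit_bitmap)
{
	return dirty & (uint16_t)~submit_bitmap;
}
```

For the diagram above, the dirty sectors are [16K, 32K) and [48K, 52K). The submit bitmap only covers [48K, 52K), so after the regular submission the sectors of [16K, 32K) are legitimately still dirty until the async compression finishes.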
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    5ce7471
  10. btrfs: move the delalloc range bitmap search into extent_io.c

    Currently for subpage (sector size < page size) cases, we reuse the
    subpage locked bitmap to find all the delalloc ranges we have locked,
    and run all the ranges found.
    
    However such reuse is not perfect, e.g.:
    
        0       32K      64K      96K       128K
        |       |////////||///////|    |////|
                                       120K
    
    For the above range, writepage_delalloc() for page 0 will handle the
    range [32K, 96K); note that a delalloc range can go beyond the page
    boundary.
    
    But writepage_delalloc() for page 64K will only handle range [120K,
    128K), as the previous run on page 0 has already handled range [64K,
    96K).
    Meanwhile for the writeback we should expect range [64K, 96K) to also be
    locked, which leads to a mismatch between the locked bitmap and the
    delalloc range.
    
    This is not causing problems yet, but it's still an inconsistent
    behavior.
    
    So instead of relying on the subpage locked bitmap, do the delalloc
    range search with a local @delalloc_bitmap, so that we can remove the
    existing btrfs_folio_find_writer_locked().
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    938449b
  11. btrfs: mark all dirty sectors as locked inside writepage_delalloc()

    Currently we only mark sectors as locked if there is a *NEW* delalloc
    range for it.
    
    But a NEW delalloc range is not the same as the dirty sectors we want
    to submit, e.g.:
    
            0       32K      64K      96K       128K
            |       |////////||///////|    |////|
                                           120K
    
    For the above 64K page size case, writepage_delalloc() for page 0 will
    find and lock the delalloc range [32K, 96K), which is beyond the page
    boundary.
    
    Then when writepage_delalloc() is called for the page 64K, since [64K,
    96K) is already locked, only [120K, 128K) will be locked.
    
    This means, although range [64K, 96K) is dirty and will be submitted
    later by extent_writepage_io(), it will not be marked as locked.
    
    This is fine for now, as we call btrfs_folio_end_writer_lock_bitmap() to
    free every non-compressed sector, and compression is only allowed for
    full page range.
    
    But this is not safe for future sector perfect compression support, as
    this can lead to double folio unlock:
    
                  Thread A                 |           Thread B
    ---------------------------------------+--------------------------------
                                           | submit_one_async_extent()
                                           | |- extent_clear_unlock_delalloc()
    extent_writepage()                     |    |- btrfs_folio_end_writer_lock()
    |- btrfs_folio_end_writer_lock_bitmap()|       |- btrfs_subpage_end_and_test_writer()
       |                                   |       |  |- atomic_sub_and_test()
       |                                   |       |     /* Now the atomic value is 0 */
       |- if (atomic_read() == 0)          |       |
       |- folio_unlock()                   |       |- folio_unlock()
    
    The root cause is that the above range [64K, 96K) is dirtied and should
    also be locked, but it isn't.
    
    So to make everything more consistent and prepare for the incoming
    sector perfect compression, mark all dirty sectors as locked.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    81b8cc5
  12. btrfs: allow compression even if the range is not page aligned

    Previously for btrfs with sector size smaller than page size (subpage),
    we only allowed compression if the range was fully page aligned.
    
    This was to work around the asynchronous submission of compressed
    ranges, which delays the page unlock and writeback into a workqueue;
    furthermore asynchronous submission can lock a range of multiple
    sectors across a page boundary.
    
    Such asynchronous submission makes it very hard to co-operate with other
    regular writes.
    
    With the recent changes to the subpage folio unlock path, now
    asynchronous submission of compressed pages can co-operate with regular
    submission, so enable sector perfect compression if it's an experimental
    build.
    
    The ETA for moving this feature out of experimental is 6.15, and I hope
    all remaining corner cases can be exposed before that.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    6326713
  13. btrfs: avoid unnecessary device path update for the same device

    [PROBLEM]
    It is very common for udev to trigger a device scan, and every time a
    mounted btrfs device gets re-scanned through a different soft link, we
    get unnecessary device path updates. This is especially common for
    LVM based storage:
    
     # lvs
      scratch1 test -wi-ao---- 10.00g
      scratch2 test -wi-a----- 10.00g
      scratch3 test -wi-a----- 10.00g
      scratch4 test -wi-a----- 10.00g
      scratch5 test -wi-a----- 10.00g
      test     test -wi-a----- 10.00g
    
     # mkfs.btrfs -f /dev/test/scratch1
     # mount /dev/test/scratch1 /mnt/btrfs
     # dmesg -c
     [  205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154)
     [  205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9
     [  205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm
     [  205.713856] BTRFS info (device dm-4): using free-space-tree
     [  205.722324] BTRFS info (device dm-4): checking UUID tree
    
    So far so good, but even if we just touch any soft link of "dm-4", we
    get several unnecessary device path updates.
    
     # touch /dev/mapper/test-scratch1
     # dmesg -c
     [  469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221)
     [  469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221)
    
    Such device path rename is unnecessary and can lead to random path
    change due to the udev race.
    
    [CAUSE]
    Inside device_list_add(), we use a very primitive way of checking if
    the device has changed: strcmp().
    
    This can never handle links well, no matter whether they are hard or
    soft links.
    
    So every different link of the same device is treated as a different
    device, causing the unnecessary device path updates.
    
    [FIX]
    Introduce a helper, is_same_device(), and use path_equal() to properly
    detect the same block device.
    This way different soft links won't trigger the rename race.
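
As a userspace analogy only (the kernel side uses path_equal(), which is not what is shown here), the difference between comparing names and comparing what the names resolve to can be sketched like this. same_by_name() and same_by_identity() are hypothetical helpers:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Name comparison: every distinct spelling looks like a different device. */
bool same_by_name(const char *a, const char *b)
{
	assert(a && b);
	return strcmp(a, b) == 0;
}

/*
 * Identity comparison: stat() follows symbolic links, so two names that
 * resolve to the same inode on the same filesystem compare equal.
 */
bool same_by_identity(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a && b);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return false;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With the LVM setup above, same_by_name("/dev/mapper/test-scratch1", "/dev/dm-4") is false, while the identity check would be expected to report them as the same node, assuming both names resolve to the same device node.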
    
    Reviewed-by: Filipe Manana <[email protected]>
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
    Reported-by: Fabian Vogt <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    f7f6d8e
  14. btrfs: canonicalize the device path before adding it

    [PROBLEM]
    Currently btrfs accepts any file path for its device, resulting in some
    weird situations:
    
     # ./mount_by_fd /dev/test/scratch1  /mnt/btrfs/
    
    The program has the following source code:
    
     #include <fcntl.h>
     #include <stdio.h>
     #include <sys/mount.h>
    
     int main(int argc, char *argv[]) {
    	char path[256];
    	int fd;
    
    	if (argc < 3)
    		return 1;
    	/* Open the device, then mount it through its /proc/self/fd link. */
    	fd = open(argv[1], O_RDWR);
    	if (fd < 0)
    		return 1;
    	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    	return mount(path, argv[2], "btrfs", 0, NULL);
     }
    
    Then we can have the following weird device path:
    
     BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440)
    
    Normally it's not a big deal, and later udev can trigger a device path
    rename. But if udev didn't trigger, the device path "/proc/self/fd/3"
    will show up in mtab.
    
    [CAUSE]
    The filename "/proc/self/fd/3" refers to the opened file descriptor 3.
    In the above case, it's exactly the device we want to open, i.e. it
    points to "/dev/test/scratch1", which is another symlink pointing to
    "/dev/dm-2".
    
    Inside the kernel we resolve the mount source using LOOKUP_FOLLOW, which
    follows the symbolic link and grabs the proper block device.
    
    But inside btrfs we also save the filename into btrfs_device::name, and
    utilize that member to report our mount source, which leads to the above
    situation.
    
    [FIX]
    Instead of unconditionally trusting the path, check if the original file
    (not following the symbolic link) is inside "/dev/". If not, then
    manually look up the path to its final destination, and use that as our
    device path.
    
    This allows us to still use symbolic links, like
    "/dev/mapper/test-scratch" from LVM2, which is required for fstests runs
    with LVM2 setup.
    
    And really weird names, like the one in the above case, are resolved
    to "/dev/dm-2" instead.
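
The rule above can be approximated in userspace with realpath(). This is only a sketch of the idea, not the kernel implementation, and canonical_device_path() is a hypothetical name:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Write the path to report into @out (at least PATH_MAX bytes).
 * Paths already under "/dev/" are kept as given, symlinks included;
 * anything else is resolved to its final destination with realpath().
 * Returns 0 on success, -1 on failure.
 */
int canonical_device_path(const char *path, char *out)
{
	assert(path && out);
	if (strncmp(path, "/dev/", 5) == 0) {
		if (strlen(path) >= PATH_MAX)
			return -1;
		strcpy(out, path);
		return 0;
	}
	return realpath(path, out) ? 0 : -1;
}
```

Under this rule "/dev/mapper/test-scratch1" is reported unchanged, while a path such as "/proc/self/fd/3" is followed to wherever it finally points.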
    
    Reviewed-by: Filipe Manana <[email protected]>
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
    Reported-by: Fabian Vogt <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    4f0ed68
  15. btrfs: remove code duplication in ordered extent finishing

    Remove the duplicated transaction joining, block reserve setting and raid
    extent inserting in btrfs_finish_ordered_extent().
    
    While at it, also abort the transaction in case inserting a RAID
    stripe-tree entry fails.
    
    Suggested-by: Naohiro Aota <[email protected]>
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Oct 17, 2024
    9078516
  16. btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
    
    Now that we track qgroup extent records in an xarray we don't need to
    have a "bytenr" field in struct btrfs_qgroup_extent_record, since we
    can get it from the index of the record in the xarray.
    
    So remove the field and grab the bytenr from either the index key or any
    other place where it's available (delayed refs). This reduces the size of
    struct btrfs_qgroup_extent_record from 40 bytes down to 32 bytes, meaning
    that we now can store 128 instances of this structure instead of 102 per
    4K page.
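
The per-page counts quoted above follow from integer division of the page size by the structure size. A tiny check in C, with the hypothetical helper records_per_page() and the 4K page size taken from the text:

```c
#include <assert.h>
#include <stddef.h>

/* Page size used in the text above (4K). */
#define EXAMPLE_PAGE_BYTES 4096u

/* How many fixed-size records fit into one page (integer division). */
unsigned int records_per_page(size_t record_size)
{
	assert(record_size > 0);
	return (unsigned int)(EXAMPLE_PAGE_BYTES / record_size);
}
```

records_per_page(40) gives 102 and records_per_page(32) gives 128, matching the numbers in the message.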
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    d217a8f
  17. btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
    
    Instead of extracting fs_info from the transaction multiple times, store
    it in a local variable and use it.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    bc10046
  18. btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
    
    There's no need to hold the delayed refs spinlock when calling
    btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since
    it doesn't change anything in delayed refs and it only changes the xarray
    used to track qgroup extent records, which is protected by the xarray's
    lock.
    
    Holding the lock is only adding unnecessary lock contention with other
    tasks that actually need to take the lock to add/remove/change delayed
    references. So remove the locking.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    7d2835e
  19. btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
    
    Instead of dereferencing the delayed refs from the transaction multiple
    times, store it early in the local variable and then always use the
    variable.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    ad0bb2c
  20. btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()

    The qgroup record was allocated with kzalloc(), so it's pointless to set
    its old_roots member to NULL. Remove the assignment.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    c8421bc
  21. btrfs: remove redundant stop_loop variable in scrub_stripe()

    The variable stop_loop was originally introduced in commit 625f1c8
    ("Btrfs: improve the loop of scrub_stripe"). It was initialized to 0 in
    commit 3b080b2 ("Btrfs: scrub raid56 stripes in the right way").
    However, in a later commit 18d30ab ("btrfs: scrub: use
    scrub_simple_mirror() to handle RAID56 data stripe scrub"), the code
    that modified stop_loop was removed, making the variable redundant.
    
    Currently, stop_loop is only initialized with 0 and is never used or
    modified within the scrub_stripe() function. As a result, this patch
    removes the stop_loop variable to clean up the code and eliminate
    unnecessary redundancy.
    
    This change has no impact on functionality, as stop_loop was never
    utilized in any meaningful way in the final version of the code.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Riyan Dhiman <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Ryand1234 authored and kdave committed Oct 17, 2024
    e7f6492
  22. btrfs: remove unused page_to_inode and page_to_fs_info macros

    These macros are no longer used after the "btrfs: Cleaned up folio->page
    conversion" patch series [1] was applied, so remove them.
    
    [1]: https://patchwork.kernel.org/project/linux-btrfs/cover/[email protected]/
    
    Reviewed-by: Neal Gompa <[email protected]>
    Signed-off-by: Youling Tang <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Youling Tang authored and kdave committed Oct 17, 2024
    21ac0bf
  23. btrfs: correct typos in multiple comments across various files

    Fix some confusing spelling errors. The details are as follows:
    
    	block-group.c: 2800: 	uncompressible 	==> incompressible
    	extent-tree.c: 3131:	EXTEMT		==> EXTENT
    	extent_io.c: 3124: 	utlizing 	==> utilizing
    	extent_map.c: 1323: 	ealier		==> earlier
    	extent_map.c: 1325:	possiblity	==> possibility
    	fiemap.c: 189:		emmitted	==> emitted
    	fiemap.c: 197:		emmitted	==> emitted
    	fiemap.c: 203:		emmitted	==> emitted
    	transaction.h: 36:	trasaction	==> transaction
    	volumes.c: 5312:	filesysmte	==> filesystem
    	zoned.c: 1977:		trasnsaction	==> transaction
    
    Signed-off-by: Shen Lichuan <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Shen Lichuan authored and kdave committed Oct 17, 2024
    0647aa3
  24. btrfs: tests: add selftests for raid-stripe-tree

    Add a first batch of very basic self tests for the RAID stripe-tree.
    
    More test cases will follow exercising the tree.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Oct 17, 2024
    9c74f2c
  25. btrfs: remove unused btrfs_free_squota_rsv()

    btrfs_free_squota_rsv() was added in commit
    e85a0ad ("btrfs: ensure releasing squota reserve on head refs")
    but has remained unused since then.
    Remove it as we don't seem to need it and it was probably a leftover.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Dr. David Alan Gilbert <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Dr. David Alan Gilbert authored and kdave committed Oct 17, 2024
    1bbafcc
  26. btrfs: remove unused btrfs_is_parity_mirror()

    btrfs_is_parity_mirror() has been unused since commit 4886ff7
    ("btrfs: introduce a new helper to submit write bio for repair").
    Remove it as the code was refactored and we don't need the helper
    anymore.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Dr. David Alan Gilbert <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Dr. David Alan Gilbert authored and kdave committed Oct 17, 2024
    3d8ac55
  27. btrfs: remove unused btrfs_try_tree_write_lock()

    btrfs_try_tree_write_lock() has been unused since commit
    50b21d7 ("btrfs: submit a writeback bio per extent_buffer").
    Remove it as we don't need it anymore.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Dr. David Alan Gilbert <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Dr. David Alan Gilbert authored and kdave committed Oct 17, 2024
    7f92863
  28. btrfs: remove the dirty_page local variable

    Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
    recording the number of pages we dirtied in the current iteration.
    
    However we do not really need that variable, since it can be calculated
    from @pos and @copied.
    
    In fact there is already a problem inside the short copy path, where we
    use @dirty_pages to calculate the range we need to release.
    But that usage assumes sectorsize == PAGE_SIZE, which is no longer true.
    
    Instead of keeping @dirty_pages and causing incorrect usage, just
    calculate the number of dirtied pages inside btrfs_dirty_pages().
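
One way the page count can be derived from @pos and @copied is a round-up of the in-page offset plus the copied length. The sketch below is a userspace illustration under assumptions: dirtied_pages() is a hypothetical helper and a 4K page size is assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this example (4K). */
#define EXAMPLE_PAGE_SIZE 4096u

/* Number of pages touched by @copied bytes written at file offset @pos. */
size_t dirtied_pages(uint64_t pos, size_t copied)
{
	uint64_t offset_in_page = pos % EXAMPLE_PAGE_SIZE;

	/* Keep the round-up below from overflowing. */
	assert(copied <= UINT64_MAX - 2 * EXAMPLE_PAGE_SIZE);
	if (copied == 0)
		return 0;
	/* Round (offset + length) up to whole pages. */
	return (size_t)((offset_in_page + copied + EXAMPLE_PAGE_SIZE - 1) /
			EXAMPLE_PAGE_SIZE);
}
```

A 2-byte write at offset 4095 straddles a page boundary and therefore dirties two pages, even though far less than a page was copied.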
    
    Reviewed-by: Josef Bacik <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    3b7324c
  29. btrfs: simplify the page uptodate preparation for prepare_pages()

    Currently inside prepare_pages(), we handle the leading and trailing pages
    differently, and skip the middle pages (if any).  This is to avoid
    reading pages which are fully covered by the dirty range.
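
    The decision of which pages need to be read first can be sketched as
    below. This is a simplified userspace model, assuming 4K pages and a
    page that overlaps the write range; sketch_page_needs_read() and its
    parameters are hypothetical names, not the kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Decide whether a page must be read from disk before a buffered write
 * of [pos, pos + len) dirties it. `page_start` is the page aligned file
 * offset of the page, which is assumed to overlap the write range.
 */
static bool sketch_page_needs_read(uint64_t page_start, uint64_t pos,
				   uint64_t len, bool uptodate,
				   bool force_read)
{
	uint64_t page_end = page_start + SKETCH_PAGE_SIZE;
	/* Clamp the write range to this page. */
	uint64_t clamp_start = pos > page_start ? pos : page_start;
	uint64_t clamp_end = pos + len < page_end ? pos + len : page_end;

	if (uptodate)
		return false;
	if (force_read)
		return true;

	/* A page fully covered by the write has both ends page aligned. */
	return (clamp_start % SKETCH_PAGE_SIZE) != 0 ||
	       (clamp_end % SKETCH_PAGE_SIZE) != 0;
}
```

    With a write of 10000 bytes at offset 100, the first and last pages
    are only partially covered and need a read, while the middle page at
    offset 4096 is fully overwritten and does not.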
    
    Refactor the code by moving all checks (alignment check, range check,
    force read check) into prepare_uptodate_page(), so that prepare_pages()
    only needs to iterate over all the pages unconditionally.
    
    And since we're here, also update prepare_uptodate_page() to use the
    folio API rather than the old page API.
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    a85e63f
  30. btrfs: handle empty list of NOCOW ordered extents with checksum list

    Currently we BUG_ON() in btrfs_finish_one_ordered() if we are finishing
    an ordered extent that is flagged as NOCOW, but its checksum list is
    not empty.
    
    This is clearly a logic error which we can recover from by aborting the
    transaction.
    
    For developer builds which enable CONFIG_BTRFS_ASSERT, also ASSERT()
    that the list is empty.
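
    The pattern of recovering from a logic error instead of crashing can
    be sketched in userspace as follows. The struct layouts, the function
    name and the choice of -EINVAL are assumptions for illustration only;
    in the kernel the abort would go through the transaction abort path
    and the developer-only check through ASSERT().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ordered extent and the transaction. */
struct sketch_ordered_extent {
	bool nocow;        /* flagged as NOCOW */
	size_t nr_csums;   /* entries on the checksum list */
};

struct sketch_trans {
	int aborted_errno; /* 0 while the transaction is healthy */
};

/*
 * Finish an ordered extent. A NOCOW extent carrying checksums is a
 * logic error: abort the transaction and return an error rather than
 * bringing the whole system down.
 */
static int sketch_finish_one_ordered(struct sketch_trans *trans,
				     const struct sketch_ordered_extent *oe)
{
	if (oe->nocow && oe->nr_csums != 0) {
		trans->aborted_errno = -EINVAL; /* assumed error code */
		return -EINVAL;
	}
	return 0;
}
```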
    
    Suggested-by: Filipe Manana <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Oct 17, 2024
    df5af25
  31. btrfs: return ENODATA in case RST lookup fails

    In case a lookup in the RAID stripe-tree fails, return ENODATA instead of
    ENOENT to better distinguish stripe-tree lookups from other code paths
    where we return ENOENT.
    
    Suggested-by: Josef Bacik <[email protected]>
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Oct 17, 2024
    659f41e
  32. btrfs: scrub: skip initial RST lookup errors

    Performing the initial extent sector read on a RAID stripe-tree backed
    filesystem with pre-allocated extents will cause the RAID stripe-tree
    lookup code to return ENODATA, as pre-allocated extents do not have any
    on-disk bytes and thus no RAID stripe-tree entries.
    
    But the current scrub read code marks these extents as errors, because
    the lookup fails.
    
    If btrfs_map_block() returns -ENODATA, it means that the call to
    btrfs_get_raid_extent_offset() returned -ENODATA, because there is no
    entry for the corresponding range in the RAID stripe-tree. But as this
    range is in the extent tree it means we've hit a pre-allocated extent. In
    this case, don't mark the sector in the stripe's error bitmaps as faulty
    and carry on to the next.
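
    A minimal userspace model of that behaviour is sketched below. The
    bitmaps, struct and function names are hypothetical; only the idea of
    treating -ENODATA as "pre-allocated, skip" and every other failure as
    an error is taken from the description above.

```c
#include <errno.h>

#define SKETCH_NR_SECTORS 8u

/* Hypothetical stripe model: one bit per sector in each bitmap. */
struct sketch_stripe {
	unsigned int no_rst_entry_bitmap; /* pre-allocated, no stripe-tree entry */
	unsigned int io_failure_bitmap;   /* sectors whose mapping fails otherwise */
	unsigned int error_bitmap;        /* result: sectors marked faulty */
};

/* Stand-in for the block mapping call. */
static int sketch_map_block(const struct sketch_stripe *stripe,
			    unsigned int sector)
{
	if (stripe->no_rst_entry_bitmap & (1u << sector))
		return -ENODATA;
	if (stripe->io_failure_bitmap & (1u << sector))
		return -EIO;
	return 0;
}

/* Walk all sectors, skipping the ones that only fail with -ENODATA. */
static void sketch_read_sectors(struct sketch_stripe *stripe)
{
	unsigned int i;

	for (i = 0; i < SKETCH_NR_SECTORS; i++) {
		int ret = sketch_map_block(stripe, i);

		if (ret == -ENODATA)
			continue; /* pre-allocated extent, not an error */
		if (ret < 0)
			stripe->error_bitmap |= 1u << i;
	}
}
```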
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Oct 17, 2024
    e081590
  33. btrfs: qgroup: run delayed iputs after ordered extent completion

    When trying to flush qgroups in order to release space we run delayed
    iputs in order to release space from recently deleted files (their link
    count reached zero), and then we start delalloc and wait for any
    existing ordered extents to complete.
    
    However there's a time window here where we end up not doing the final
    iput on a deleted file which could release necessary space:
    
    1) An unlink operation starts;
    
    2) During the unlink, or right before it completes, delalloc is flushed
       and an ordered extent is created;
    
    3) When the ordered extent is created, the inode's ref count is
       incremented (with igrab() at alloc_ordered_extent());
    
    4) When the unlink finishes it doesn't drop the last reference on the
       inode and so it doesn't trigger inode eviction to delete all of
       the inode's items in its root and drop all references on its data
       extents;
    
    5) Another task enters try_flush_qgroup() to try to release space,
       it runs all delayed iputs, but there's no delayed iput yet for that
       deleted file because the ordered extent hasn't completed yet;
    
    6) Then at try_flush_qgroup() we wait for the ordered extent to complete
       and that results in adding a delayed iput at btrfs_put_ordered_extent()
       when called from btrfs_finish_one_ordered();
    
    7) Adding the delayed iput results in waking the cleaner kthread if it's
       not running already. However it may take some time for it to be
       scheduled, or it may be running but busy running auto defrag, dropping
       deleted snapshots or doing other work, so by the time we return from
       try_flush_qgroup() the space for the deleted file isn't released.
    
    Improve on this by running delayed iputs only after flushing delalloc
    and waiting for ordered extent completion.
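
    The effect of the ordering in steps 1) to 7) can be modelled with a
    tiny userspace state machine. All names here are hypothetical and the
    model ignores the cleaner kthread; it only shows why running delayed
    iputs before the ordered extent completes releases nothing.

```c
#include <stdbool.h>

/* Hypothetical model of a deleted file pinned by an ordered extent. */
struct sketch_fs {
	bool ordered_extent_pending; /* ordered extent holds an inode ref */
	bool delayed_iput_queued;    /* final iput is waiting to be run */
	bool space_released;         /* inode evicted, space given back */
};

/* Running delayed iputs only helps if one has been queued already. */
static void sketch_run_delayed_iputs(struct sketch_fs *fs)
{
	if (fs->delayed_iput_queued) {
		fs->delayed_iput_queued = false;
		fs->space_released = true;
	}
}

/* Completing the ordered extent is what queues the delayed iput. */
static void sketch_flush_and_wait_ordered(struct sketch_fs *fs)
{
	if (fs->ordered_extent_pending) {
		fs->ordered_extent_pending = false;
		fs->delayed_iput_queued = true;
	}
}

/* Old order: iputs first, then flush and wait. */
static bool sketch_try_flush_old(struct sketch_fs *fs)
{
	sketch_run_delayed_iputs(fs);
	sketch_flush_and_wait_ordered(fs);
	return fs->space_released;
}

/* New order: flush and wait first, then run the iputs. */
static bool sketch_try_flush_new(struct sketch_fs *fs)
{
	sketch_flush_and_wait_ordered(fs);
	sketch_run_delayed_iputs(fs);
	return fs->space_released;
}
```

    Starting from a state with a pending ordered extent, the old order
    returns with the space still held, while the new order releases it
    before returning.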
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    8162aaa
  34. btrfs: remove btrfs_set_range_writeback()

    The function btrfs_set_range_writeback() was originally a callback for
    metadata and data, to mark a range with writeback flag.
    
    Then it was converted into a common function call for both metadata and
    data.
    
    From the very beginning, the function had only been called on a full page,
    and it was later converted to handle a range inside a page.
    
    But it never needed to handle multiple pages, and since commit
    8189197 ("btrfs: refactor __extent_writepage_io() to do
    sector-by-sector submission") the function was only called on a
    sector-by-sector basis.
    
    This makes the function unnecessary, so it can be replaced with a simple
    btrfs_folio_set_writeback() call.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    0b2308e
  35. btrfs: zstd: assert the timer pointer in callback

    Make sure we got the right timer struct for the zstd workspace reclaim
    work.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    bde9f20
  36. btrfs: drop unused parameter path from btrfs_tree_mod_log_rewind()

    The path parameter was used for our own locking, which was eventually
    converted to rwsem. Last usage in ac5887c ("btrfs: locking: remove
    all the blocking helpers").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    1d19c12
  37. btrfs: drop unused parameter ctx from batch_delete_dir_index_items()

    The ctx parameter is not used, we can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    3994109
  38. btrfs: drop unused parameter fs_info from wait_reserve_ticket()

    The parameter is not used, we can also reach it from the space info if
    needed in the future.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    4995054
  39. btrfs: drop unused parameter fs_info from do_reclaim_sweep()

    The parameter is unused and we can get it from space info if needed.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    51f6c3a
  40. btrfs: send: drop unused parameter num from iterate_inode_ref_t callbacks
    
    None of the ref iteration callbacks needs the num parameter (this is for
    the directory item iteration), so we can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    8721e68
  41. btrfs: send: drop unused parameter index from iterate_inode_ref_t callbacks
    
    None of the ref iteration callbacks needs the index parameter (this is
    for the directory item iteration), so we can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    3555fea
  42. btrfs: scrub: drop unused parameter sctx from scrub_submit_extent_sector_read()
    
    The parameter is unused and we can reach sctx from scrub stripe if
    needed.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    2aa366d
  43. btrfs: drop unused parameter map from scrub_simple_mirror()

    The parameter map used to be passed to scrub_extent() until
    e02ee89 ("btrfs: scrub: switch scrub_simple_mirror() to
    scrub_stripe infrastructure"), where the scrub implementation was
    completely reworked.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    e6c00f4
  44. btrfs: qgroup: drop unused parameter fs_info from __del_qgroup_rb()

    We don't need fs_info here, everything is reachable from qgroup.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    f9bd555
  45. btrfs: drop unused transaction parameter from btrfs_qgroup_add_swapped_blocks()
    
    The caller replace_path() runs under transaction but we don't need it in
    btrfs_qgroup_add_swapped_blocks().
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    e3c79ce
  46. btrfs: lzo: drop unused parameter level from lzo_alloc_workspace()

    The LZO compression has only one level, we don't need to pass the
    parameter.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    3beb0db
  47. btrfs: drop unused parameter argp from btrfs_ioctl_quota_rescan_wait()

    We don't need the user passed parameter, rescan is a filesystem
    operation so fs_info is sufficient.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    7e22750
  48. btrfs: drop unused parameter inode from read_inline_extent()

    We don't need the inode pointer to read inline extent, it's all
    accessible from the path pointer.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    24f6fd6
  49. btrfs: drop unused parameter offset from __cow_file_range_inline()

    We don't need offset for inline extents, they always start from 0.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    05d2682
  50. btrfs: drop unused parameter file_offset from btrfs_encoded_read_regular_fill_pages()
    
    The file_offset parameter used to be passed to encoded read struct but
    was removed in commit b665aff ("btrfs: remove unused members from
    struct btrfs_encoded_read_private").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    2c007d2
  51. btrfs: drop unused parameter iov_iter from btrfs_write_check()

    The parameter 'from' has never been used since commit b8d8e1f
    ("btrfs: introduce btrfs_write_check()"), this is for buffered write.
    Direct io write needs it so it was probably an interface thing, but we
    can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    3c09d3b
  52. btrfs: drop unused parameter refs from visit_node_for_delete()

    The parameter duplicates what can be effectively obtained from
    wc->refs[level - 1] and this is what's actually used inside. Added in
    commit 2b73c7e ("btrfs: unify logic to decide if we need to walk
    down into a node during snapshot delete").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    afe6a70
  53. btrfs: drop unused parameter mask from try_release_extent_state()

    The mask parameter used for allocations got unified to GFP_NOFS and
    removed from relevant functions in 1d12680 ("btrfs: drop gfp from
    parameter extent state helpers").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    ce8e39d
  54. btrfs: drop unused parameter fs_info from folio_range_has_eb()

    The parameter was added in 8ff8466 ("btrfs: support subpage for
    extent buffer page release") for page but hasn't been used since, so we
    can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    2f3f009
  55. btrfs: drop unused parameter options from open_ctree()

    Since the new mount option parser in commit ad21f15 ("btrfs:
    switch to the new mount API") we don't pass the options like that
    anymore.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    4836659
  56. btrfs: drop unused parameter data from btrfs_fill_super()

    The only caller passes NULL, we can drop the parameter. This is since
    the new mount option parser done in 3bb17a2 ("btrfs: add get_tree
    callback for new mount API").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    3fcafae
  57. btrfs: drop unused parameter transaction from alloc_log_tree()

    The function got split in commit 6ab6ebb ("btrfs: split
    alloc_log_tree()") and since then transaction parameter has been unused.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    bea54c5
  58. btrfs: drop unused parameter fs_info from btrfs_match_dir_item_name()

    Cascaded removal of fs_info that is not needed in several functions.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    9c0eded
  59. btrfs: drop unused parameter level from alloc_heuristic_ws()

    The compression heuristic pass does not need a level, so we can drop the
    parameter.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Oct 17, 2024
    caebb14
  60. btrfs: zoned: fix zone unusable accounting for freed reserved extent

    When btrfs reserves an extent and does not use it (e.g., due to an error), it
    calls btrfs_free_reserved_extent() to free the reserved extent. In the
    process, it calls btrfs_add_free_space() and then it accounts the region
    bytes as block_group->zone_unusable.
    
    However, it leaves the space_info->bytes_zone_unusable side not updated. As
    a result, ENOSPC can happen even though a space_info reservation succeeded.
    The reservation succeeds because the freed region is not added to
    space_info->bytes_zone_unusable, leaving that space as "free". On the other
    hand, the corresponding block group counts it as zone_unusable and its
    allocation pointer is not rewound, so we cannot allocate an extent from that
    block group. That also defeats space_info's async/sync reclaim process and
    causes an ENOSPC error from the extent allocation process.
    
    Fix that by returning the space to space_info->bytes_zone_unusable.
    Ideally, since a bio is not submitted for this reserved region, we should
    return the space to free space and rewind the allocation pointer. But, it
    needs rework on extent allocation handling, so let it work in this way for
    now.
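
    The accounting mismatch and its fix can be illustrated with a small
    userspace model. The structs below keep only the counters named in
    the text plus an assumed pair of "reserved" counters, and the function
    name is hypothetical; the real code paths involve more state and
    locking.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced versions of the two accounting levels. */
struct sketch_space_info {
	uint64_t bytes_reserved;
	uint64_t bytes_zone_unusable;
};

struct sketch_block_group {
	struct sketch_space_info *space_info;
	uint64_t reserved;
	uint64_t zone_unusable;
};

/*
 * Free a reserved but unused extent on a zoned filesystem. The region
 * cannot be reused until the zone is reset, so it becomes unusable at
 * both levels. Before the fix only the block group counter was bumped.
 */
static void sketch_free_reserved_extent(struct sketch_block_group *bg,
					uint64_t len)
{
	bg->reserved -= len;
	bg->space_info->bytes_reserved -= len;

	bg->zone_unusable += len;
	/* The fix: keep the space_info side in sync with the block group. */
	bg->space_info->bytes_zone_unusable += len;
}
```

    With both counters updated, the space_info no longer reports the
    freed region as free space that the block group cannot actually
    provide.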
    
    Fixes: 169e0da ("btrfs: zoned: track unusable bytes for zones")
    CC: [email protected] # 5.15+
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Naohiro Aota <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    naota authored and kdave committed Oct 17, 2024
    9bbc899
  61. btrfs: fix error propagation of split bios

    The purpose of btrfs_bbio_propagate_error() is to propagate an error
    of a split bio to its original btrfs_bio, and report the error to the upper
    layer. However, it does not work well in some cases.
    
    * Case 1. Immediate (or quick) end_bio with an error
    
    When btrfs sends btrfs_bio to mirrored devices, btrfs calls
    btrfs_bio_end_io() when all the mirroring bios are completed. If that
    btrfs_bio was split, it is from btrfs_clone_bioset and its end_io function
    is btrfs_orig_write_end_io. For this case, btrfs_bbio_propagate_error()
    accesses the orig_bbio's bio context to increase the error count.
    
    That works well in most cases. However, if the end_io is called fast
    enough, orig_bbio's (remaining part after split) bio context may not be
    properly set at that time. Since the bio context is set when the orig_bbio
    (the last btrfs_bio) is sent to devices, that might be too late for an
    earlier split btrfs_bio's completion.  That will result in a NULL pointer
    dereference.
    
    That bug is easily reproducible by running btrfs/146 on zoned devices [1]
    and it shows the following trace.
    
    [1] You need the raid-stripe-tree feature as it creates a "-d raid0 -m raid1" FS.
    
      BUG: kernel NULL pointer dereference, address: 0000000000000020
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 1 UID: 0 PID: 13 Comm: kworker/u32:1 Not tainted 6.11.0-rc7-BTRFS-ZNS+ #474
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Workqueue: writeback wb_workfn (flush-btrfs-5)
      RIP: 0010:btrfs_bio_end_io+0xae/0xc0 [btrfs]
      BTRFS error (device dm-0): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
      RSP: 0018:ffffc9000006f248 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff888005a7f080 RCX: ffffc9000006f1dc
      RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff888005a7f080
      RBP: ffff888011dfc540 R08: 0000000000000000 R09: 0000000000000001
      R10: ffffffff82e508e0 R11: 0000000000000005 R12: ffff88800ddfbe58
      R13: ffff888005a7f080 R14: ffff888005a7f158 R15: ffff888005a7f158
      FS:  0000000000000000(0000) GS:ffff88803ea80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000020 CR3: 0000000002e22006 CR4: 0000000000370ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ? __die_body.cold+0x19/0x26
       ? page_fault_oops+0x13e/0x2b0
       ? _printk+0x58/0x73
       ? do_user_addr_fault+0x5f/0x750
       ? exc_page_fault+0x76/0x240
       ? asm_exc_page_fault+0x22/0x30
       ? btrfs_bio_end_io+0xae/0xc0 [btrfs]
       ? btrfs_log_dev_io_error+0x7f/0x90 [btrfs]
       btrfs_orig_write_end_io+0x51/0x90 [btrfs]
       dm_submit_bio+0x5c2/0xa50 [dm_mod]
       ? find_held_lock+0x2b/0x80
       ? blk_try_enter_queue+0x90/0x1e0
       __submit_bio+0xe0/0x130
       ? ktime_get+0x10a/0x160
       ? lockdep_hardirqs_on+0x74/0x100
       submit_bio_noacct_nocheck+0x199/0x410
       btrfs_submit_bio+0x7d/0x150 [btrfs]
       btrfs_submit_chunk+0x1a1/0x6d0 [btrfs]
       ? lockdep_hardirqs_on+0x74/0x100
       ? __folio_start_writeback+0x10/0x2c0
       btrfs_submit_bbio+0x1c/0x40 [btrfs]
       submit_one_bio+0x44/0x60 [btrfs]
       submit_extent_folio+0x13f/0x330 [btrfs]
       ? btrfs_set_range_writeback+0xa3/0xd0 [btrfs]
       extent_writepage_io+0x18b/0x360 [btrfs]
       extent_write_locked_range+0x17c/0x340 [btrfs]
       ? __pfx_end_bbio_data_write+0x10/0x10 [btrfs]
       run_delalloc_cow+0x71/0xd0 [btrfs]
       btrfs_run_delalloc_range+0x176/0x500 [btrfs]
       ? find_lock_delalloc_range+0x119/0x260 [btrfs]
       writepage_delalloc+0x2ab/0x480 [btrfs]
       extent_write_cache_pages+0x236/0x7d0 [btrfs]
       btrfs_writepages+0x72/0x130 [btrfs]
       do_writepages+0xd4/0x240
       ? find_held_lock+0x2b/0x80
       ? wbc_attach_and_unlock_inode+0x12c/0x290
       ? wbc_attach_and_unlock_inode+0x12c/0x290
       __writeback_single_inode+0x5c/0x4c0
       ? do_raw_spin_unlock+0x49/0xb0
       writeback_sb_inodes+0x22c/0x560
       __writeback_inodes_wb+0x4c/0xe0
       wb_writeback+0x1d6/0x3f0
       wb_workfn+0x334/0x520
       process_one_work+0x1ee/0x570
       ? lock_is_held_type+0xc6/0x130
       worker_thread+0x1d1/0x3b0
       ? __pfx_worker_thread+0x10/0x10
       kthread+0xee/0x120
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x30/0x50
       ? __pfx_kthread+0x10/0x10
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      Modules linked in: dm_mod btrfs blake2b_generic xor raid6_pq rapl
      CR2: 0000000000000020
    
    * Case 2. Earlier completion of orig_bbio for mirrored btrfs_bios
    
    btrfs_bbio_propagate_error() assumes the end_io function for orig_bbio is
    called last among split bios. In that case, btrfs_orig_write_end_io() sets
    the bio->bi_status to BLK_STS_IOERR by seeing the bioc->error [2].
    Otherwise, the increased orig_bio's bioc->error is not checked by anyone
    and BLK_STS_OK is returned to the upper layer.
    
    [2] Actually, this is not true. Because we only increase orig_bioc->errors
    by max_errors, the condition "atomic_read(&bioc->error) > bioc->max_errors"
    is still not met if only one split btrfs_bio fails.
    
    * Case 3. Later completion of orig_bbio for un-mirrored btrfs_bios
    
    In contrast to the above case, btrfs_bbio_propagate_error() is not working
    well if un-mirrored orig_bbio is completed last. It sets
    orig_bbio->bio.bi_status to the btrfs_bio's error. But, that is easily
    over-written by orig_bbio's completion status. If the status is BLK_STS_OK,
    the upper layer would not know the failure.
    
    * Solution
    
    Considering the above cases, we can only save the error status in the
    orig_bbio (remaining part after split) itself as it is always
    available. Also, the saved error status should be propagated when all the
    split btrfs_bios are finished (i.e., bbio->pending_ios == 0).
    
    This commit introduces "status" to btrfs_bbio and saves the first error of
    split bios to original btrfs_bio's "status" variable. When all the split
    bios are finished, the saved status is loaded into original btrfs_bio's
    status.
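
    The "save the first error, publish it when the last split finishes"
    idea can be sketched in userspace with C11 atomics. The struct, the
    function names and the plain int status codes are assumptions for
    illustration; the kernel uses its own btrfs_bio layout, blk_status_t
    and atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reduced model of the original (pre-split) bio. */
struct sketch_bbio {
	atomic_int pending_ios; /* split bios still in flight */
	atomic_int status;      /* 0 is OK; first non-zero error sticks */
	int final_status;       /* what the upper layer gets to see */
	bool completed;
};

static void sketch_bbio_init(struct sketch_bbio *orig, int nr_splits)
{
	atomic_init(&orig->pending_ios, nr_splits);
	atomic_init(&orig->status, 0);
	orig->final_status = 0;
	orig->completed = false;
}

/* Completion handler for one split bio. */
static void sketch_split_end_io(struct sketch_bbio *orig, int split_status)
{
	if (split_status != 0) {
		int expected = 0;

		/* Only the first error is recorded, later ones are ignored. */
		atomic_compare_exchange_strong(&orig->status, &expected,
					       split_status);
	}

	/* The split that drops the counter to zero publishes the result. */
	if (atomic_fetch_sub(&orig->pending_ios, 1) == 1) {
		orig->final_status = atomic_load(&orig->status);
		orig->completed = true;
	}
}
```

    Because the saved status lives in the original bio itself, it is
    available no matter which split completes first or last, and a later
    successful completion cannot overwrite an earlier error.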
    
    With this commit, btrfs/146 on zoned devices does not hit the NULL pointer
    dereference anymore.
    
    Fixes: 852eee6 ("btrfs: allow btrfs_submit_bio to split bios")
    CC: [email protected] # 6.6+
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Naohiro Aota <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    naota authored and kdave committed Oct 17, 2024
    467b190
  62. btrfs: clear force-compress on remount when compress mount option is given
    
    After the migration to use fs context for processing mount options we had
    a slight change in the semantics for remounting a filesystem that was
    mounted with compress-force. Before we could clear compress-force by
    passing only "-o compress[=algo]" during a remount, but after that change
    that no longer works: force-compress is still present and one needs
    to pass "-o compress-force=no,compress[=algo]" to the mount command.
    
    For example, when running on a 6.8+ kernel:
    
      $ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi
      $ mount | grep sdi
      /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/)
    
      $ mount -o remount,compress=zlib:5 /mnt/sdi
      $ mount | grep sdi
      /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/)
    
    On a 6.7 kernel (or older):
    
      $ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi
      $ mount | grep sdi
      /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/)
    
      $ mount -o remount,compress=zlib:5 /mnt/sdi
      $ mount | grep sdi
      /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/)
    
    So update btrfs_parse_param() to clear "compress-force" when "compress" is
    given, providing the same semantics as kernel 6.7 and older.
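
    The restored semantics can be sketched with a toy option parser. The
    struct and function below are hypothetical and ignore the "=algo"
    value entirely; they only model which flag each option sets or
    clears, not the real btrfs_parse_param().

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical reduced mount option state. */
struct sketch_mount_opts {
	bool compress;
	bool force_compress;
};

/* Apply one option key, in the order given on the command line. */
static void sketch_parse_param(struct sketch_mount_opts *opts,
			       const char *key)
{
	if (strcmp(key, "compress-force") == 0) {
		opts->compress = true;
		opts->force_compress = true;
	} else if (strcmp(key, "compress") == 0) {
		opts->compress = true;
		/* Plain "compress" clears a previous "compress-force". */
		opts->force_compress = false;
	}
}
```

    Applying "compress-force" at mount time and then "compress" on a
    remount leaves compression enabled with the force flag cleared, which
    matches the 6.7 output shown above.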
    
    Reported-by: Roman Mamedov <[email protected]>
    Link: https://lore.kernel.org/linux-btrfs/20241014182416.13d0f8b0@nvm/
    CC: [email protected] # 6.8+
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    b54ce3d
  63. btrfs: reduce lock contention when eb cache miss for btree search

    When crawling the btree, if an eb cache miss occurs, we change to use the eb
    read lock and release all previous locks (including the parent lock) to
    reduce lock contention.
    
    If an eb cache miss occurs in a leaf and IO is needed, before this
    change we released locks only from level 2 and up and read the leaf's
    content from disk while holding a lock on its parent (level 1), causing
    unnecessary lock contention on the parent. After this change we
    release locks from level 1 and up, lock level 0, and then read the leaf's
    content from disk.
    
    Because we have prepared the check parameters and hold the read lock of
    the eb, we can ensure that no race will occur during the check and cause
    unexpected errors.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Robbie Ko <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Robbie Ko authored and kdave committed Oct 17, 2024
    b18732b
  64. btrfs: add and use helper to remove extent map from its inode's tree

    Move the common code to remove an extent map from its inode's tree into a
    helper function and use it, reducing duplicated code.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    7e1135c
  65. btrfs: make the extent map shrinker run asynchronously as a work queue job

    Currently the extent map shrinker is run synchronously for kswapd tasks
    that end up calling the fs shrinker (fs/super.c:super_cache_scan()).
    This has some disadvantages and for some heavy workloads with memory
    pressure it can cause some delays and stalls that make a machine
    unresponsive for some periods. This happens because:
    
    1) We can have several kswapd tasks on machines with multiple NUMA zones,
       and running the extent map shrinker concurrently can cause high
       contention on some spin locks, namely the spin locks that protect
       the radix tree that tracks roots, the per root xarray that tracks
       open inodes and the list of delayed iputs. This not only delays the
       shrinker but also causes high CPU consumption and makes the task
       running the shrinker monopolize a core, resulting in the symptoms
       of an unresponsive system. This was noted in previous commits such as
       commit ae1e766 ("btrfs: only run the extent map shrinker from
       kswapd tasks");
    
    2) The extent map shrinker's iteration over inodes can often be slow, even
       after changing the data structure that tracks open inodes for a root
       from a red black tree (up to kernel 6.10) to an xarray (kernel 6.10+).
       While the transition to the xarray made things a bit faster, it's
       still somewhat slow - for example in a test scenario with 10000 inodes
       that have no extent maps loaded, the extent map shrinker took between
       5ms and 8ms, using a release, non-debug kernel. Iterating over the
       extent maps of an inode can also be slow if we have an inode with many
       thousands of extent maps, since we use a red black tree to track and
       search extent maps. So having the extent map shrinker run synchronously
       adds extra delay for other things a kswapd task does.
    
    So make the extent map shrinker run asynchronously as a job for the
    system unbounded workqueue, just like what we do for data and metadata
    space reclaim jobs.
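    
    The idea of handing the shrinker to a single deferred job can be
    illustrated with a small user space model. This is only a sketch under
    stated assumptions, not the btrfs implementation: struct shrink_work,
    request_shrink() and run_shrink_worker() are hypothetical names, and a
    C11 atomic flag stands in for the "work already pending" state that a
    kernel workqueue tracks for a work item.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one deferred shrinker job (not kernel code). */
struct shrink_work {
    atomic_bool pending;   /* true while a job is queued but not yet run */
    long nr_to_scan;       /* amount of work requested by the first caller */
    int runs;              /* how many times the worker body has executed */
};

/*
 * Called from the memory-pressure path. It never does the scan itself;
 * it only marks the job as pending. Returns true if this call queued the
 * job, false if a job was already pending (the request is coalesced).
 */
bool request_shrink(struct shrink_work *w, long nr_to_scan)
{
    bool expected = false;

    if (!atomic_compare_exchange_strong(&w->pending, &expected, true))
        return false;
    w->nr_to_scan = nr_to_scan;
    return true;
}

/*
 * Called by the single worker. Because only one worker runs the job,
 * the scan needs no lock against concurrent shrinker instances.
 */
void run_shrink_worker(struct shrink_work *w)
{
    if (!atomic_load(&w->pending))
        return;
    w->runs++;             /* placeholder for the actual scan */
    atomic_store(&w->pending, false);
}
```

    In this model a caller under memory pressure returns immediately after
    request_shrink(), and repeated requests made while a job is pending
    collapse into that one job.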
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    d4eefbc
  66. btrfs: simplify tracking progress for the extent map shrinker

    Now that the extent map shrinker can only be run by a single task (as a
    work queue item) there is no need to keep the progress of the shrinker
    protected by a spinlock or to pass the progress to trace events as
    parameters. So remove the lock and simplify the arguments for the trace
    events.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    ad6f27e
  67. btrfs: rename extent map shrinker members from struct btrfs_fs_info

    The names for the members of struct btrfs_fs_info related to the extent
    map shrinker are a bit too long, so rename them to be shorter by replacing
    the "extent_map_" prefix with the "em_" prefix.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    94a09da
  68. btrfs: re-enable the extent map shrinker

    Now that the extent map shrinker can only be run by a single task and runs
    asynchronously as a work queue job, enable it as it can no longer cause
    stalls on tasks allocating memory and entering the extent map shrinker
    through the fs shrinker (implemented by btrfs_free_cached_objects()).
    
    This is crucial to prevent exhaustion of memory due to unbounded extent
    map creation, primarily with direct IO but also for buffered IO on files
    with holes. This problem, for the direct IO case, was first reported in
    the Link tag below. That report was added to a Link tag of the first patch
    that introduced the extent map shrinker, commit 956a17d ("btrfs: add
    a shrinker for extent maps"), however the Link tag disappeared somehow
    from the committed patch (but was included in the submitted patch to the
    mailing list), so adding it below for future reference.
    
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    5a6cada
  69. btrfs: remove redundant level argument from read_block_for_search()

    The level parameter passed to read_block_for_search() always matches the
    level of the extent buffer passed in the "eb_ret" parameter, which we are
    also extracting into the "parent_level" local variable.
    
    So remove the level parameter and instead use the "parent_level" variable
    which in fact has a better name (it's the level of the parent node from
    which we are reading a child node/leaf).
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    0f9677a
  70. btrfs: simplify arguments for btrfs_verify_level_key()

    The only caller of btrfs_verify_level_key() is read_block_for_search() and
    it's passing 3 arguments to it that can be extracted from its on stack
    variable of type struct btrfs_tree_parent_check.
    
    So change btrfs_verify_level_key() to accept an argument of type
    struct btrfs_tree_parent_check instead of level, first key and parent
    transid arguments.
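    
    The shape of this refactoring can be sketched in plain C. The types and
    the function below (demo_key, demo_parent_check, demo_node,
    demo_verify_level_key()) are hypothetical stand-ins, not the btrfs
    definitions, and the check itself is deliberately simplified: the point
    is only that one pointer to the caller's on-stack structure replaces
    three separate arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real btrfs types. */
struct demo_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

struct demo_parent_check {
    uint64_t transid;           /* carried along, unused in this sketch */
    struct demo_key first_key;
    bool has_first_key;
    uint8_t level;
};

struct demo_node {
    uint8_t level;
    struct demo_key first_key;
};

/*
 * Before: verify(node, level, &first_key, parent_transid)
 * After:  verify(node, &check)
 * Returns 0 if the node matches the expectations, -1 otherwise.
 */
int demo_verify_level_key(const struct demo_node *node,
                          const struct demo_parent_check *check)
{
    if (node->level != check->level)
        return -1;
    if (!check->has_first_key)
        return 0;
    if (node->first_key.objectid != check->first_key.objectid ||
        node->first_key.type != check->first_key.type ||
        node->first_key.offset != check->first_key.offset)
        return -1;
    return 0;
}
```

    With this signature a new expectation becomes a new structure field
    rather than another parameter at every call site.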
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    7f066ce
  71. btrfs: remove redundant initializations for struct btrfs_tree_parent_check

    It's pointless to initialize the has_first_key field of the stack local
    btrfs_tree_parent_check structure at btrfs_tree_parent_check() and at
    btrfs_qgroup_trace_subtree() since all fields not explicitly initialized
    are zeroed out. In the case of the first function it's a bit odd because
    we are assigning 0 and the field is of type bool, however not incorrect
    since a 0 is converted to false.
    
    Just remove the explicit initializations due to their redundancy.
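    
    The C rule relied on here is that members omitted from an initializer
    list are initialized as if they had static storage duration, so
    integers become 0 and bool members become false. A minimal sketch,
    using a hypothetical struct demo_check rather than the real
    struct btrfs_tree_parent_check:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical structure, loosely modelled on a "parent check". */
struct demo_check {
    uint64_t transid;
    uint64_t owner_root;
    bool has_first_key;
    uint8_t level;
};

struct demo_check demo_make_check(uint8_t level, uint64_t transid)
{
    /*
     * Only two members are named. owner_root and has_first_key are
     * zero-initialized by the language, so writing
     * ".has_first_key = false" (or "= 0") here would be redundant.
     */
    struct demo_check check = {
        .level = level,
        .transid = transid,
    };

    return check;
}
```

    The same holds for an assignment of 0 to a bool member: the value
    converts to false, which is why the removed code was odd but not wrong.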
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    28cea0d
  72. btrfs: remove local generation variable from read_block_for_search()

    It's redundant to have the 'gen' variable since we already have the same
    value in the local btrfs_tree_parent_check structure. So remove it and
    instead use the structure's field.
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Oct 17, 2024
    7e3a5ca
  73. btrfs: do not clear read-only when adding sprout device

    If you follow the seed/sprout wiki, it suggests the following workflow:
    
    btrfstune -S 1 seed_dev
    mount seed_dev mnt
    btrfs device add sprout_dev
    mount -o remount,rw mnt
    
    The first mount mounts the FS readonly, which results in not setting
    BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
    somewhat surprisingly clears the readonly bit on the sb (though the
    mount is still practically readonly, from the user's perspective...).
    Finally, the remount checks the readonly bit on the sb against the flag
    and sees no change, so it does not run the code intended to run on
    ro->rw transitions, leaving BTRFS_FS_OPEN unset.
    
    As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
    does no work. This results in leaking deleted snapshots until we run out
    of space.
    
    I propose fixing it at the first departure from what feels reasonable:
    when we clear the readonly bit on the sb during device add.
    
    A new fstest I have written reproduces the bug and confirms the fix.
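    
    The sequence described above can be modelled as a tiny state machine.
    This is a sketch under assumptions, not the btrfs code: struct demo_fs
    and the demo_*() functions are hypothetical, with sb_rdonly standing
    for the readonly bit on the sb and fs_open standing for BTRFS_FS_OPEN.

```c
#include <stdbool.h>

/* Hypothetical model of the two pieces of state involved. */
struct demo_fs {
    bool sb_rdonly;  /* readonly bit on the superblock */
    bool fs_open;    /* models BTRFS_FS_OPEN */
};

/* Mounting a seed device: readonly, and the "open" flag is not set. */
void demo_mount_seed(struct demo_fs *fs)
{
    fs->sb_rdonly = true;
    fs->fs_open = false;
}

/* Adding the sprout device; clear_rdonly selects old vs. fixed behavior. */
void demo_device_add(struct demo_fs *fs, bool clear_rdonly)
{
    if (clear_rdonly)
        fs->sb_rdonly = false;
}

/* Remount only runs the ro->rw work when it sees the bit change. */
void demo_remount(struct demo_fs *fs, bool want_rdonly)
{
    if (fs->sb_rdonly == want_rdonly)
        return;               /* no transition detected, nothing runs */
    if (!want_rdonly)
        fs->fs_open = true;   /* work done on a ro->rw transition */
    fs->sb_rdonly = want_rdonly;
}

/* The cleaner only works once the filesystem is marked open. */
bool demo_cleaner_does_work(const struct demo_fs *fs)
{
    return fs->fs_open;
}
```

    Running mount, device add and remount,rw with clear_rdonly set to true
    leaves fs_open unset in this model, while leaving the bit alone lets
    the remount see the ro->rw transition.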
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Boris Burkov <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    boryas authored and kdave committed Oct 17, 2024
    78a423d
  74. btrfs: qgroup: set a more sane default value for subtree drop threshold

    Since commit 011b46c ("btrfs: skip subtree scan if it's too high to
    avoid low stall in btrfs_commit_transaction()"), btrfs qgroup can
    automatically skip large subtree scan at the cost of marking qgroup
    inconsistent.
    
    It's designed to address the final performance problem of snapshot drop
    with qgroup enabled, but to be safe the default value is
    BTRFS_MAX_LEVEL, requiring a user space daemon to set a different value
    to make it work.
    
    I'd say it's not a good idea to rely on a user space tool to set this
    default value, especially when some operations (snapshot dropping) can
    be triggered immediately after mount, leaving a very small window to
    use that sysfs interface.
    
    So instead of disabling this new feature by default, enable it with a
    low threshold (3), so that a large subvolume tree drop at mount time
    won't cause a huge qgroup workload.
    
    CC: [email protected] # 6.1
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    8a40d12
  75. btrfs: fix the delalloc range locking if sector size < page size

    Inside lock_delalloc_folios(), there are several problems related to
    sector size < page size handling:
    
    - Set the writer locks without checking if the folio is still valid
      We call btrfs_folio_start_writer_lock() just like it's folio_lock().
      But since the folio may not even be the folio of the current mapping,
      we can easily screw up the folio->private.
    
    - The range is not clamped inside the page
      This means we can overwrite other bitmaps if the start/len is not
      properly handled, and trigger the btrfs_subpage_assert().
    
    - @processed_end is always rounded up to page end
      If the delalloc range is not page aligned, and we need to retry
      (returning -EAGAIN), then we will unlock to the page end.
    
      Thankfully this is not a huge problem, as now
      btrfs_folio_end_writer_lock() can handle a range larger than the
      locked range, and only unlocks what is already locked.
    
    Fix all these problems by:
    
    - Lock and check the folio first, then call
      btrfs_folio_set_writer_lock()
      So that if we got a folio not belonging to the inode, we won't
      touch folio->private.
    
    - Properly truncate the range inside the page
    
    - Update @processed_end to the locked range end
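    
    The "truncate the range inside the page" step amounts to intersecting
    the delalloc range with the page's byte range. A minimal sketch,
    assuming a 64K page (DEMO_PAGE_SIZE) and hypothetical names; it is not
    the code from lock_delalloc_folios():

```c
#include <stdint.h>

/* Assumed page size for the sketch: 64K pages with smaller sectors. */
#define DEMO_PAGE_SIZE 65536ULL

struct demo_range {
    uint64_t start;
    uint64_t len;
};

/*
 * Clamp [start, start + len) to the page starting at page_start.
 * The result never extends past either page boundary; len is 0 when
 * the two ranges do not intersect.
 */
struct demo_range demo_clamp_to_page(uint64_t page_start, uint64_t start,
                                     uint64_t len)
{
    uint64_t page_end = page_start + DEMO_PAGE_SIZE;
    uint64_t end = start + len;
    uint64_t s = start > page_start ? start : page_start;
    uint64_t e = end < page_end ? end : page_end;
    struct demo_range r = { s, e > s ? e - s : 0 };

    return r;
}
```

    For a range that starts 4K before a page boundary and is 16K long, the
    first page gets only its last 4K and the second page the remaining
    12K, so per-page bitmaps are never touched outside their own page.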
    
    Fixes: 1e1de38 ("btrfs: make process_one_page() to handle subpage locking")
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    cd9721f
  76. btrfs: remove unused btrfs_folio_start_writer_lock()

    This function is not really suitable to lock a folio, as it lacks the
    proper mapping checks, thus the locked folio may not even belong to
    btrfs.
    
    For that reason the last user, inside lock_delalloc_folios(), has
    already been removed, so we can remove this function.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    3efe27c
  77. btrfs: unify to use writer locks for subpage locking

    Since commit d7172f5 ("btrfs: use per-buffer locking for
    extent_buffer reading"), metadata read no longer relies on the subpage
    reader locking.
    
    This means we do not need to maintain a different metadata/data split
    for locking, so we can convert the existing reader lock users by:
    
    - add_ra_bio_pages()
      Convert to btrfs_folio_set_writer_lock()
    
    - end_folio_read()
      Convert to btrfs_folio_end_writer_lock()
    
    - begin_folio_read()
      Convert to btrfs_folio_set_writer_lock()
    
    - folio_range_has_eb()
      Remove the subpage->readers checks, since it is always 0.
    
    - Remove btrfs_subpage_start_reader() and btrfs_subpage_end_reader()
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    9d64856
  78. btrfs: rename btrfs_folio_(set|start|end)_writer_lock()

    Since there is no user of reader locks, rename the writer locks into a
    more generic name, by removing the "_writer" part from the name.
    
    And also rename btrfs_subpage::writer into btrfs_subpage::locked.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Oct 17, 2024
    2be4c90

Commits on Oct 21, 2024

  1. btrfs: implement partial deletion of RAID stripe extents

    In our CI system, the RAID stripe tree configuration sometimes fails with
    the following ASSERT():
    
     assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64
    
    This ASSERT() triggers because, for the initial design of the RAID
    stripe-tree, I had the "one ordered-extent equals one bio" rule of zoned
    btrfs in mind.
    
    But for a RAID stripe-tree based system that is hosted not on a zoned
    storage device but on a regular device, this rule doesn't apply.
    
    So in case the range we want to delete starts in the middle of the
    previous item, grab the item and "truncate" its length. That is, clone
    the item, subtract the deleted portion from the key's offset, delete the
    old item and insert the new one.
    
    In case the range to delete ends in the middle of an item, we have to
    adjust both the item's key as well as the stripe extents and then
    re-insert the modified clone into the tree after deleting the old stripe
    extent.
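    
    The arithmetic of the two cases can be sketched as follows. This is a
    simplified model with hypothetical names (struct demo_stripe,
    demo_truncate_tail(), demo_trim_front()): it keeps a single physical
    address per item and edits the item in place, whereas the real code
    clones the item, deletes the old one and inserts the modified clone.

```c
#include <stdint.h>

/* Hypothetical stripe extent: key is (logical, length), one stride. */
struct demo_stripe {
    uint64_t logical;   /* key objectid: start of the logical range */
    uint64_t length;    /* key offset: length of the range */
    uint64_t physical;  /* physical start of the single stride */
};

/*
 * The deleted range starts in the middle of the item: keep the front,
 * so only the length (the key's offset) shrinks.
 * Returns -1 if del_start is not strictly inside the item.
 */
int demo_truncate_tail(struct demo_stripe *item, uint64_t del_start)
{
    uint64_t end = item->logical + item->length;

    if (del_start <= item->logical || del_start >= end)
        return -1;
    item->length = del_start - item->logical;
    return 0;
}

/*
 * The deleted range ends in the middle of the item: keep the tail,
 * so the key and the physical address both move forward.
 * Returns -1 if del_end is not strictly inside the item.
 */
int demo_trim_front(struct demo_stripe *item, uint64_t del_end)
{
    uint64_t end = item->logical + item->length;
    uint64_t diff;

    if (del_end <= item->logical || del_end >= end)
        return -1;
    diff = del_end - item->logical;
    item->logical += diff;
    item->physical += diff;
    item->length -= diff;
    return 0;
}
```

    Because the key changes in both cases, an in-place update is not
    enough in a real tree, which is why the commit message describes a
    delete followed by a re-insert.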
    
    Signed-off-by: Johannes Thumshirn <[email protected]>
    morbidrsa committed Oct 21, 2024
    6eba246
  2. btrfs: implement self-tests for partial RAID stripe-tree delete

    Implement self-tests for partial deletion of RAID stripe-tree entries.
    
    These two new tests cover both the deletion of the front of a RAID
    stripe-tree stripe extent as well as truncation of an item to make it
    smaller.
    
    Signed-off-by: Johannes Thumshirn <[email protected]>
    morbidrsa committed Oct 21, 2024
    7269e1c