[6.6] Track ClearLinux kernel performance patches #28

Draft: wants to merge 40 commits into base-6.6

kakra (Owner) commented Nov 26, 2023

Export patch series: https://github.com/kakra/linux/pull/28.patch

ClearLinux performance patches: a selected set of ClearLinux kernel patches which are supposed to improve performance, gaming experience, or general compatibility with the latest Intel CPUs (e.g. the asymmetric CPU cores of 12th gen or later).

fenrus75 and others added 30 commits November 19, 2023 21:10
Author: Arjan van de Ven

Signed-off-by: Miguel Bernal Marin
Signed-off-by: Jose Carlos Venegas Munoz
Both the VM and EXT4 have a "commit to disk after X seconds" time.
Currently the EXT4 time is shorter than our VM time, which is a bit
suboptimal; it's better for performance to let the VM do the writeouts
in bulk rather than having something deep in the journalling layer do them.

(DISTRO TWEAK -- NOT FOR UPSTREAM)

Signed-off-by: Arjan van de Ven
Signed-off-by: Jose Carlos Venegas Munoz
Reduce wakeups for PME checks, which are a workaround for miswired
boards (sadly, too many of them) in laptops.
Increase target_residency in cpuidle cstate

Tune intel_idle to be a bit less aggressive;
Clear Linux is cleaner in hygiene (wakeups) than the average Linux,
so we can afford changing these in a way that increases
performance while keeping power efficiency.
A few distro tweaks to add printk()s to better visualize boot time.

Author: Arjan van de Ven

Signed-off-by: Miguel Bernal Marin
No point recalibrating for a known-constant TSC;
skipping it saves 200+ ms of boot time.
ATA init is the long pole in the boot process, and it's asynchronous.
Move the graphics init after it so that ATA and graphics initialize
in parallel.
As Clear Linux boots fast, the device is not ready when
the mounting code is reached, so a device scan is retried
every 0.5 sec for at least 40 sec, and the async tasks
are synchronized.

Signed-off-by: Miguel Bernal Marin
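A minimal userspace sketch of the retry policy described above, assuming a caller-supplied probe and sleep function; `wait_for_root_device()`, `fake_dev`, `fake_probe()` and `no_sleep()` are hypothetical names for illustration, not the patch's code. With a 500 ms interval and a 40 s timeout it probes once immediately and then up to 80 more times.

```c
#include <stdbool.h>

typedef bool (*probe_fn)(void *ctx);       /* returns true once the device exists */
typedef void (*sleep_fn)(unsigned int ms); /* blocks for ms milliseconds */

/*
 * Probe immediately, then every interval_ms (must be non-zero) until
 * timeout_ms has elapsed. Returns the number of probes needed, or -1
 * if the device never appeared.
 */
int wait_for_root_device(probe_fn probe, void *ctx, sleep_fn sleep_ms,
                         unsigned int interval_ms, unsigned int timeout_ms)
{
    unsigned int waited_ms = 0;
    int attempts = 0;

    for (;;) {
        attempts++;
        if (probe(ctx))
            return attempts;
        if (waited_ms >= timeout_ms)
            return -1;
        sleep_ms(interval_ms);
        waited_ms += interval_ms;
    }
}

/* Simulated device for illustration: appears on probe number ready_on (0 = never). */
struct fake_dev {
    int ready_on;
    int probes;
};

static bool fake_probe(void *ctx)
{
    struct fake_dev *d = ctx;
    d->probes++;
    return d->ready_on != 0 && d->probes >= d->ready_on;
}

static void no_sleep(unsigned int ms)
{
    (void)ms; /* a real caller would sleep (e.g. msleep) here */
}
```

Passing the sleep function in keeps the policy testable without actually waiting 40 seconds.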
Add a module.sig_unenforce boot parameter to allow loading unsigned kernel
modules. The parameter is only effective if CONFIG_MODULE_SIG_FORCE is
enabled and the system is *not* Secure Booted.

Signed-off-by: Brett T. Warden
Signed-off-by: Miguel Bernal Marin
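The condition above can be written as a small truth table. This is a hedged sketch: `reject_unsigned_modules()` is a hypothetical helper, and it only models CONFIG_MODULE_SIG_FORCE plus the new parameter, ignoring other enforcement sources such as `module.sig_enforce` or lockdown.

```c
#include <stdbool.h>

/*
 * Returns true if unsigned modules must be rejected.
 * config_sig_force: CONFIG_MODULE_SIG_FORCE=y
 * sig_unenforce:    module.sig_unenforce given on the command line
 * secure_boot:      firmware reports Secure Boot as active
 */
bool reject_unsigned_modules(bool config_sig_force, bool sig_unenforce,
                             bool secure_boot)
{
    if (!config_sig_force)
        return false; /* not forced at build time (in this simplified model) */
    if (sig_unenforce && !secure_boot)
        return false; /* the override is honored only when not Secure Booted */
    return true;
}
```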
Search version-specific firmware paths before generic ones, and /etc before
/lib, so that users can provide specific overrides for generic firmware
and for distribution firmware.
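A sketch of how such an ordered search list could be expanded into candidate paths. The directory list and the exact interleaving (/etc versioned, /etc generic, /lib versioned, /lib generic) are assumptions for illustration, not copied from the patch; `fw_candidate_path()` is a hypothetical helper. A loader would try index 0, 1, 2, ... and use the first path that exists.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct fw_dir {
    const char *base; /* directory prefix */
    bool versioned;   /* true: append "/<kernel release>" */
};

/* Assumed order: version-specific before generic, /etc before /lib. */
static const struct fw_dir fw_search_order[] = {
    { "/etc/firmware", true  },
    { "/etc/firmware", false },
    { "/lib/firmware", true  },
    { "/lib/firmware", false },
};

#define FW_DIR_COUNT (sizeof(fw_search_order) / sizeof(fw_search_order[0]))

/*
 * Build the idx-th candidate path for firmware file `name`.
 * Returns false if idx is past the end of the list or out is too small.
 */
bool fw_candidate_path(size_t idx, const char *release, const char *name,
                       char *out, size_t out_len)
{
    if (idx >= FW_DIR_COUNT)
        return false;

    const struct fw_dir *d = &fw_search_order[idx];
    /* base + '/' + [release + '/'] + name + NUL */
    size_t need = strlen(d->base) + 1 + strlen(name) + 1;
    if (d->versioned)
        need += strlen(release) + 1;
    if (need > out_len)
        return false;

    if (d->versioned)
        snprintf(out, out_len, "%s/%s/%s", d->base, release, name);
    else
        snprintf(out, out_len, "%s/%s", d->base, name);
    return true;
}
```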
These settings are needed to prevent networking issues that occur when
the networking modules come up by default without explicit
settings, which breaks some setups.

We don't want the modprobe settings to be read at boot time
if we're not going to do anything else ever.
Kvmtool and Clear Containers support using user attributes to label host
files with the virtual uid/gid of the file in the container. This allows an
end user to manage their files and a complete uid space without all the ugly
namespace stuff.

The one gap in the support is symlinks, because an end user can change the
ownership of a symbolic link. We support attributes on these files, as you
can already (as root) set security attributes on them.

The current rules seem slightly over-paranoid, and as we have a use case, this
patch enables updating the attributes on a symbolic link IFF you are the
owner of the symlink (as permissions are not usually meaningful on the link
itself).

Signed-off-by: Alan Cox
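The ownership rule above, reduced to a predicate. This is a hedged sketch: `may_set_user_xattr_on_symlink()` is a hypothetical name, and the `privileged` flag stands in for whatever capability check the kernel applies; it models only the relaxed `user.*` case, and other namespaces keep their existing rules.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/*
 * May the caller set a "user." xattr on a symlink?
 * Allowed only for the link's owner or a privileged caller.
 * Returns false for other namespaces: this rule does not grant them.
 */
bool may_set_user_xattr_on_symlink(const char *xattr_name, uid_t caller,
                                   uid_t link_owner, bool privileged)
{
    if (strncmp(xattr_name, "user.", 5) != 0)
        return false;
    return privileged || caller == link_owner;
}
```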
tweak rwsem owner spinning a bit
Change libahci to ignore firmware's staggered spin-up flag. End-users
who wish to honor firmware's SSS flag can add the following kernel
parameter to a new file at /etc/kernel/cmdline.d/ignore_sss.conf:
    libahci.ignore_sss=0

And then run
    sudo clr-boot-manager update

Signed-off-by: Joe Konno
print cpu number when we print a crash
On systems with overclocking enabled, CPPC Highest Performance can be
hard-coded to 0xff. In this case, even if we have cores with different
highest performance, ITMT can't be enabled, as the current implementation
depends on CPPC Highest Performance.

On such systems we can use the MSR_HWP_CAPABILITIES maximum performance
field when CPPC Highest Performance is 0xff.

For legacy reasons, we can't depend solely on MSR_HWP_CAPABILITIES, as
on some older systems CPPC Highest Performance is the only way to identify
differently performing cores.

Signed-off-by: Srinivas Pandruvada
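A sketch of the fallback logic described above, assuming the per-core raw values have already been read. It uses the fact that bits 7:0 of MSR_HWP_CAPABILITIES hold the highest performance level; `core_highest_perf()`, `cores_are_asymmetric()` and `CPPC_HIGHEST_PERF_HARDCODED` are hypothetical names, not the driver's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for the hard-coded, unusable CPPC value. */
#define CPPC_HIGHEST_PERF_HARDCODED 0xffu

/* Bits 7:0 of MSR_HWP_CAPABILITIES hold the highest performance level. */
static uint32_t hwp_cap_highest_perf(uint64_t hwp_cap)
{
    return (uint32_t)(hwp_cap & 0xff);
}

/* Prefer CPPC (legacy systems rely on it); fall back to HWP when CPPC reads 0xff. */
uint32_t core_highest_perf(uint32_t cppc_highest, uint64_t hwp_cap)
{
    if (cppc_highest == CPPC_HIGHEST_PERF_HARDCODED)
        return hwp_cap_highest_perf(hwp_cap);
    return cppc_highest;
}

/* ITMT is only worth enabling if cores report different highest performance. */
bool cores_are_asymmetric(const uint32_t *perf, size_t ncpus)
{
    for (size_t i = 1; i < ncpus; i++)
        if (perf[i] != perf[0])
            return true;
    return false;
}
```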
Make sure there are at least 1024 per-CPU pages... a reasonably small
amount for today's systems.
Instead of using jiffies and waiting for jiffies to wrap before
measuring, use the higher-precision local_time for benchmarking.
Measure 2500 loops, which is accurate enough for benchmarking the
RAID algorithm data rates. Also add division-by-zero checks in case
the timing measurements are bogus.

This speeds up RAID benchmarking from 48,000 usecs to 4,000 usecs,
saving 0.044 seconds on boot.

Signed-off-by: Colin Ian King
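A userspace sketch of the measurement scheme: time a fixed 2500 iterations with a high-resolution clock and guard the rate computation against a zero interval. `clock_gettime(CLOCK_MONOTONIC)` stands in for the kernel clock here, and `throughput_mb_s()` and `bench_mb_s()` are hypothetical names, not the patch's code.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BENCH_LOOPS 2500 /* loop count from the commit message */

/* Monotonic time in nanoseconds (userspace stand-in for a kernel clock). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Decimal MB/s for `bytes` processed in `elapsed_ns`; 0 if the interval is bogus. */
uint64_t throughput_mb_s(uint64_t bytes, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0)
        return 0;                      /* avoid division by zero */
    return bytes * 1000u / elapsed_ns; /* bytes/ns * 1e9 / 1e6 */
}

/* Time a fixed number of runs of `fn` instead of waiting for a jiffy to wrap. */
uint64_t bench_mb_s(void (*fn)(void *buf, size_t len), void *buf, size_t len)
{
    uint64_t start = now_ns();
    for (int i = 0; i < BENCH_LOOPS; i++)
        fn(buf, len);
    return throughput_mb_s((uint64_t)len * BENCH_LOOPS, now_ns() - start);
}
```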
Printing timings for initcalls that successfully return after 0 usecs
provides little useful information and takes a small amount of time.
Disable the initcall timings for these specific cases. On an
Alder Lake i9-12900, this reduces kernel boot time by 0.67% (timed
up to the invocation of systemd) based on 10 boot measurements.

Signed-off-by: Colin Ian King
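The filter rule above as a predicate. This is a hypothetical helper, not the kernel's initcall debug code: it suppresses only the success-in-0-usecs case, so failures and any call with measurable duration are still printed.

```c
#include <stdbool.h>

/* Print the timing line unless the initcall succeeded (ret == 0) in 0 usecs. */
bool should_print_initcall_timing(int ret, unsigned long long duration_usecs)
{
    return ret != 0 || duration_usecs != 0;
}
```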
Place libraries right below the binary for PIE binaries; this helps code locality
(and thus performance).

Signed-off-by: Kai Krakow
Signed-off-by: Kai Krakow
Signed-off-by: Kai Krakow
ColinIanKing and others added 10 commits November 19, 2023 21:52
Some misguided apps hammer sched_yield() in a tight loop (they should be using futexes instead),
which causes massive lock contention even if there is little work to do or to yield to.
Rate-limit yielding, since the base scheduler already does a pretty good job of just
running the right things.

Signed-off-by: Kai Krakow
Signed-off-by: Kai Krakow
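One way to rate-limit yielding, sketched as a time-based limiter: a real yield is honored at most once per `min_interval_ns`, and calls in between return immediately. The patch may use a different heuristic; `struct yield_limiter`, `yield_allowed()` and the interval tunable are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct yield_limiter {
    uint64_t min_interval_ns; /* hypothetical tunable */
    uint64_t last_yield_ns;   /* time of the last honored yield */
    bool has_yielded;
};

/* Returns true if this sched_yield() call should actually yield. */
bool yield_allowed(struct yield_limiter *l, uint64_t now_ns)
{
    if (l->has_yielded && now_ns - l->last_yield_ns < l->min_interval_ns)
        return false; /* too soon: return to the caller without rescheduling */
    l->last_yield_ns = now_ns;
    l->has_yielded = true;
    return true;
}
```

In a kernel setting such state would naturally be per CPU (or per runqueue), so the limiter itself adds no shared-lock contention.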
Enabling SLAB_HWCACHE_ALIGN for the ACPI object caches improves
boot speed in the ACPICA core for object allocation and freeing,
especially in the AML parsing and execution phases of boot. Testing
with 100 boots shows an average saving in acpi_init of ~35,000
usecs compared to the unaligned version. Most of the ACPI objects
allocated and freed are very short-lived in the
critical paths for parsing and execution, so the extra memory used
for alignment isn't too onerous.

Signed-off-by: Colin Ian King
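To gauge the "extra memory used for alignment", here is a sketch of the per-object size after cache-line alignment, loosely modeled on `calculate_alignment()` in mm/slab_common.c: objects of at most half a line get a smaller power-of-two alignment so several can share one line. The 64-byte line size is an assumption, and the real code also applies an architecture minimum alignment that this sketch omits.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64u /* assumed; typical for x86-64 */

/* Alignment chosen when cache-line alignment is requested. */
size_t hwcache_alignment(size_t size)
{
    size_t align = CACHE_LINE_SIZE;
    while (align > 1 && size <= align / 2)
        align /= 2;
    return align;
}

/* Object size after rounding up to that (power-of-two) alignment. */
size_t hwcache_object_size(size_t size)
{
    size_t align = hwcache_alignment(size);
    return (size + align - 1) & ~(align - 1);
}
```

For example, a 40-byte object grows to 64 bytes (24 bytes of padding), while a 24-byte object only grows to 32 bytes.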
… pr_err

For x86 targets it's more pertinent to check for the lack of MWAIT than for AMD-specific
CPUs, so swap the order of the tests. Also make the pr_err a
pr_warn to align with other ENODEV warning messages.

Signed-off-by: Colin Ian King
Signed-off-by: Kai Krakow