Aem 4.17.4 #14

Closed
wants to merge 235 commits into from

krystian-hebel
Member

Testing CI...

jbeulich and others added 30 commits August 21, 2023 15:52
Old gcc won't cope with initializers involving unnamed struct/union
fields.

Fixes: 441b1b2 ("x86/emul: Switch x86_emulate_ctxt to cpu_policy")
Signed-off-by: Jan Beulich <[email protected]>
Acked-by: Andrew Cooper <[email protected]>
master commit: 7688466
master date: 2023-04-19 11:02:47 +0200
In osstest, the jobs using pygrub on arm64 on the branch linux-linus
started to fail with:
    [Errno 28] No space left on device
    Error writing temporary copy of ramdisk

This is because /var/run is small: when dom0 has only 512MB to work
with, /var/run is only 40MB. The size of both kernel and ramdisk in
these jobs is now about 42MB, so there is not enough space in /var/run.

So, to avoid writing a big binary in ramfs, we will use /var/lib
instead, like we already do when saving the device model state on
migration.

Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Anthony PERARD <[email protected]>
Reviewed-by: Jason Andryuk <[email protected]>
master commit: ad89640
master date: 2023-08-08 09:45:20 +0200
Defining ARCH and SRCARCH later in xen/Makefile allows switching to an
immediately evaluated variable type.

ARCH and SRCARCH depend on values defined in Config.mk and aren't used
for e.g. TARGET_SUBARCH or TARGET_ARCH, and not before they're needed in
a sub-make or a rule.

This will help reduce the number of times the shell rune is
run.

With GNU make 4.4, the number of executions of the command present in
these $(shell ) increased greatly. This is probably because as of make
4.4, exported variables are also added to the environment of the
$(shell ) construct.

Also, `make -d` shows a lot of these:
    Makefile:39: not recursively expanding SRCARCH to export to shell function
    Makefile:38: not recursively expanding ARCH to export to shell function

Reported-by: Jason Andryuk <[email protected]>
Signed-off-by: Anthony PERARD <[email protected]>
Tested-by: Jason Andryuk <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
(cherry picked from commit 58e0a3f)
Signed-off-by: Anthony PERARD <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
(cherry picked from commit a6ab7dd)
The same command is used to generate the value of both $(TARGET_ARCH)
and $(SRCARCH), as $(ARCH) is an alias for $(XEN_TARGET_ARCH).

Signed-off-by: Anthony PERARD <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
(cherry picked from commit ac27b3b)
With GNU make 4.4, the number of executions of the command present in
these $(shell ) increased greatly. This is probably because as of make
4.4, exported variables are also added to the environment of the
$(shell ) construct.

Also, `make -d` shows a lot of these:
    Makefile:15: not recursively expanding XEN_BUILD_DATE to export to shell function
    Makefile:16: not recursively expanding XEN_BUILD_TIME to export to shell function
    Makefile:17: not recursively expanding XEN_BUILD_HOST to export to shell function
    Makefile:14: not recursively expanding XEN_DOMAIN to export to shell function

So to avoid having these commands run more than necessary, we
will replace ?= by an equivalent but with immediate expansion.

Reported-by: Jason Andryuk <[email protected]>
Signed-off-by: Anthony PERARD <[email protected]>
Tested-by: Jason Andryuk <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
(cherry picked from commit 0c594c1)
With GNU make 4.4, the number of executions of the command present in
these $(shell ) increased greatly. This is probably because as of make
4.4, exported variables are also added to the environment of the
$(shell ) construct.

So to avoid having these commands run more than necessary, we
will replace ?= by an equivalent but with immediate expansion.

Reported-by: Jason Andryuk <[email protected]>
Signed-off-by: Anthony PERARD <[email protected]>
Tested-by: Jason Andryuk <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
(cherry picked from commit a07414d)
Our present approach is working fully behind the compiler's back. This
was found to not work with LTO. Employ ld's --wrap= option instead. Note
that while this makes the build work at least with new enough gcc (it
doesn't with gcc7, for example, due to tool chain side issues afaict),
according to my testing things still won't work when building the
fuzzing harness with afl-cc: While with the gcc7 tool chain I see afl-as
getting invoked, this does not happen with gcc13. Yet without using that
assembler wrapper the resulting binary will look uninstrumented to
afl-fuzz.

While checking the resulting binaries I noticed that we've gained uses
of snprintf() and strstr(), which only just so happen to not cause any
problems. Add wrappers for them as well.

Since we don't have any actual uses of v{,sn}printf(), no definitions of
their wrappers appear (just yet). But I think we want
__wrap_{,sn}printf() to properly use __real_v{,sn}printf() right away,
which means we need declarations of the latter.

Reported-by: Andrew Cooper <[email protected]>
Suggested-by: Andrew Cooper <[email protected]>
Signed-off-by: Jan Beulich <[email protected]>
Tested-by: Andrew Cooper <[email protected]>
Acked-by: Andrew Cooper <[email protected]>
(cherry picked from commit 6fba45c)
GCC 12 objects to pointers derived from a constant:

  util.c: In function 'find_rsdp':
  util.c:429:16: error: array subscript 0 is outside array bounds of 'uint16_t[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds]
    429 |     ebda_seg = *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
  cc1: all warnings being treated as errors

This is a GCC bug, but work around it rather than turning array-bounds
checking off generally.

Signed-off-by: Andrew Cooper <[email protected]>
Acked-by: Jan Beulich <[email protected]>
(cherry picked from commit e35138a)
Clang-15 complains:

  tcgbios.c:598:25: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  void tcpa_calling_int19h()
                          ^
                           void

C2x formally removes K&R syntax.  The declarations for these functions in
32bitprotos.h are already ANSI compatible.  Update the definitions to match.

Signed-off-by: Andrew Cooper <[email protected]>
Acked-by: Jan Beulich <[email protected]>
(cherry picked from commit a562afa)
As the Alpine 3.18 container notes:

  egrep: warning: egrep is obsolescent; using grep -E

Adjust it.

Signed-off-by: Andrew Cooper <[email protected]>
Acked-by: Jan Beulich <[email protected]>
(cherry picked from commit 5ddac3c)
CI: Update FreeBSD to 13.1

Also print the compiler version before starting.  It's not easy to find
otherwise, and does change from time to time.

Signed-off-by: Andrew Cooper <[email protected]>
Reviewed-by: Anthony PERARD <[email protected]>
(cherry picked from commit 5e7667e)

CI: Update FreeBSD to 13.2

Signed-off-by: Andrew Cooper <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
(cherry picked from commit f872a62)

CI: Update FreeBSD to 12.4

Signed-off-by: Andrew Cooper <[email protected]>
Reviewed-by: Roger Pau Monné <[email protected]>
(cherry picked from commit a735608)
Gitlab reports:

  node.c:158:17: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]

        ctrl->blocking = 1;
                       ^ ~
  1 error generated.
  make[4]: *** [/builds/xen-project/people/andyhhp/xen/tools/vchan/../../tools/Rules.mk:188: node.o] Error 1

In Xen 4.18, this was fixed with c/s 99ab02f ("tools: convert bitfields
to unsigned type") but this is an ABI change which can't be backported.

Switch 1 for -1 to provide a minimally invasive way to fix the build.

No functional change.
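
To illustrate why -1 is the value that fits: a signed one-bit bit-field can represent only 0 and -1, so storing 1 is what Clang objects to. The following is a minimal sketch with hypothetical struct and function names (it is not the vchan code); it assumes a two's complement target.

```c
#include <assert.h>

/* Hypothetical structs; only the field width and signedness matter. */
struct ctrl_signed   { signed int   blocking : 1; };  /* holds 0 or -1 */
struct ctrl_unsigned { unsigned int blocking : 1; };  /* holds 0 or 1  */

/* Set the signed flag the way the minimal fix does and read it back. */
int set_signed_flag(void)
{
    struct ctrl_signed c = { 0 };

    c.blocking = -1;            /* the only non-zero value that fits */
    assert(c.blocking != 0);    /* still reads back as "true" */
    return c.blocking;
}

/* The 4.18-style alternative: an unsigned field can hold 1 directly. */
unsigned int set_unsigned_flag(void)
{
    struct ctrl_unsigned c = { 0 };

    c.blocking = 1;
    return c.blocking;
}
```

Any code that only tests the flag for truth behaves the same with either representation, which is consistent with the "no functional change" claim.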

Signed-off-by: Andrew Cooper <[email protected]>
The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
When the hypervisor returns -ETIME (timeout in the past) Linux keeps
retrying to set up the timer with a higher timeout instead of
self-injecting a timer interrupt.

On boxes without any hardware assistance for logdirty we have seen HVM
Linux guests < 4.7 with 32vCPUs give up trying to set up the timer when
logdirty is enabled:

CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 506250 nsec
CE: xen increased min_delta_ns to 759375 nsec
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
Freezing user space processes ...
INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
Task dump for CPU 14:
swapper/14      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14

Thus leading to CPU stalls and a broken system as a result.

Work around this bogus usage by ignoring the VCPU_SSHOTTMR_future in
the hypervisor.  Old Linux versions are the only ones known to have
(wrongly) attempted to use the flag, and ignoring it is compatible
with the behavior expected by any guests setting that flag.

Note the usage of the flag has been removed from Linux by commit:

c06b6d70feb3 xen/x86: don't lose event interrupts

Which landed in Linux 4.7.

Signed-off-by: Roger Pau Monné <[email protected]>
Acked-by: Henry Wang <[email protected]> # CHANGELOG
Acked-by: Jan Beulich <[email protected]>
master commit: 19c6cbd
master date: 2023-05-03 13:36:05 +0200
Ensure that the base address is 2M aligned, or else the page table
entries created would be corrupt as reserved bits on the PDE end up
set.

We have encountered a broken firmware where grub2 would end up loading
Xen at a non 2M aligned region when using the multiboot2 protocol, and
that caused a very difficult to debug triple fault.

If the alignment is not as required by the page tables, print an error
message and stop the boot.  Also add a build time check that the
calculation of symbol offsets doesn't break alignment of passed
addresses.

The check could be performed earlier, but so far the alignment is
required by the page tables, and hence it feels more natural that the
check lives near to the piece of code that requires it.
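
As a rough illustration of the alignment test described above, here is a minimal sketch in C. The macro and function names are hypothetical, and the compile-time assertion only checks the constant itself; it stands in for, and is not, the build time check on symbol offsets that the commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the 2 MiB superpage size. */
#define SZ_2M (UINT64_C(1) << 21)

/* Compile-time sanity check: the mask trick below needs a power of two. */
static_assert((SZ_2M & (SZ_2M - 1)) == 0, "SZ_2M must be a power of two");

/* True if addr could be used as the base of a 2M mapping. */
bool is_2m_aligned(uint64_t addr)
{
    return (addr & (SZ_2M - 1)) == 0;
}

/* Start of the 2M region that contains addr. */
uint64_t align_down_2m(uint64_t addr)
{
    return addr & ~(SZ_2M - 1);
}
```

A loader-provided base such as 0x201000 fails `is_2m_aligned()`, which is the situation in which the boot is stopped with an error.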

Note that when booted as an EFI application from the PE entry point
the alignment check is already performed by
efi_arch_load_addr_check(), and hence there's no need to add another
check at the point where page tables get built in
efi_arch_memory_setup().

Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
master commit: 0946068
master date: 2023-05-03 13:36:25 +0200
A recent xentrace highlighted an unhandled corner case in the vcpu
"start-of-day" logic, if the trace starts after the last running ->
non-running transition, but before the first non-running -> running
transition.  Because start-of-day wasn't handled, vcpu_next_update()
was expecting p->current to be NULL, and tripping out with the
following error message when it wasn't:

vcpu_next_update: FATAL: p->current not NULL! (d32768dv$p, runstate RUNSTATE_INIT)

where 32768 is the DEFAULT_DOMAIN, and $p is the pcpu number.

Instead of calling vcpu_start() piecemeal throughout
sched_runstate_process(), call it at the top of the function if the
vcpu in question is still in RUNSTATE_INIT, so that we can handle all
the cases in one place.

Sketch out at the top of the function all cases which we need to
handle, and what to do in those cases.  Some transitions tell us where
v is running; some transitions tell us about what is (or is not)
running on p; some transitions tell us neither.

If a transition tells us where v is now running, update its state;
otherwise leave it in INIT, in order to avoid having to deal with TSC
skew on start-up.

If a transition tells us what is or is not running on p, update
p->current (either to v or NULL).  Otherwise leave it alone.

If neither, do nothing.

Reifying those rules:

- If we're continuing to run, set v to RUNNING, and use p->first_tsc
  as the runstate time.

- If we're starting to run, set v to RUNNING, and use ri->tsc as the
  runstate time.

- If v is being descheduled, leave v in the INIT state to avoid dealing
  with TSC skew; but set p->current to NULL so that whatever is
  scheduled next won't trigger the assert in vcpu_next_update().

- If a vcpu is waking up (switching from one non-runnable state to
  another non-runnable state), leave v in INIT, and p in whatever
  state it's in (which may be the default domain, or some other vcpu
  which has already run).
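
The four rules above can be sketched as a single dispatch on the kind of transition. Every type, enumerator and function name below is hypothetical and chosen only to mirror the text; this is not the xenalyze code. Setting p->current to v in the two "running" cases follows the general rule stated earlier that p->current is updated whenever a transition says what is running on p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names are hypothetical; they only mirror the rules in the text. */
enum vcpu_state { VCPU_INIT, VCPU_RUNNING };

enum transition {
    TR_CONTINUE_RUNNING,  /* v was and stays running on p */
    TR_START_RUNNING,     /* v starts running on p */
    TR_DESCHEDULED,       /* v stops running on p */
    TR_WAKING             /* v moves between two non-running states */
};

struct vcpu {
    enum vcpu_state state;
    uint64_t runstate_tsc;   /* time at which the current state began */
};

struct pcpu {
    struct vcpu *current;    /* NULL when nothing is known to run here */
    uint64_t first_tsc;      /* first timestamp seen on this pcpu */
};

/* Handle the first record seen for a vcpu that is still in INIT. */
void handle_init_transition(struct vcpu *v, struct pcpu *p,
                            enum transition t, uint64_t record_tsc)
{
    assert(v->state == VCPU_INIT);

    switch (t) {
    case TR_CONTINUE_RUNNING:
        v->state = VCPU_RUNNING;
        v->runstate_tsc = p->first_tsc;
        p->current = v;
        break;
    case TR_START_RUNNING:
        v->state = VCPU_RUNNING;
        v->runstate_tsc = record_tsc;
        p->current = v;
        break;
    case TR_DESCHEDULED:
        /* Leave v in INIT to sidestep TSC skew; p is now idle. */
        p->current = NULL;
        break;
    case TR_WAKING:
        /* Tells us nothing about v's location or about p. */
        break;
    }
}
```

Handling all cases in one function is what lets the start-of-day logic be reasoned about in one place instead of at each call site.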

While here, fix the comment above vcpu_start; it's called when the
vcpu state is INIT, not when current is the default domain.

Signed-off-by: George Dunlap <[email protected]>
Acked-by: Andrew Cooper <[email protected]>
Acked-by: Anthony PERARD <[email protected]>
master commit: aab4b38
master date: 2023-06-30 11:25:33 +0100
The current logic to init the local APIC and the IO-APIC does init the
local APIC LVTERR/ESR before doing any sanitization on the IO-APIC pin
configuration.  It's already noted on enable_IO_APIC() that Xen
shouldn't trust the IO-APIC being empty at bootup.

At XenServer we have a system where the IO-APIC 0 is handed to Xen
with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
with a vector of 0 (all fields of the RTE are zeroed).  Once the local
APIC LVTERR/ESR is enabled periodic injections from such pin cause the
local APIC to in turn inject periodic error vectors:

APIC error on CPU0: 00(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector

That prevents Xen from booting.

Move the masking of the IO-APIC pins ahead of the setup of the local
APIC.  This has the side effect of also moving the detection of the
pin where the i8259 is connected, as such detection must be done
before masking any pins.

Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
master commit: 813da5f
master date: 2023-07-17 10:31:10 +0200
Further changes will require access to the full RTE as a single value
in order to pass it to IOMMU interrupt remapping handlers.

No functional change intended.

Signed-off-by: Roger Pau Monné <[email protected]>
Acked-by: Jan Beulich <[email protected]>
master commit: cdc48cb
master date: 2023-07-28 09:39:44 +0200
Do not allow writing to RTE registers using io_apic_write and instead
require changes to RTE to be performed using ioapic_write_entry.

This is in preparation for passing the full contents of the RTE to the
IOMMU interrupt remapping handlers, so remapping entries for IO-APIC
RTEs can be updated atomically when possible.

While immediately this commit might expand the number of MMIO accesses
in order to update an IO-APIC RTE, further changes will benefit from
getting the full RTE value passed to the IOMMU handlers, as the logic
is greatly simplified when the IOMMU handlers can get the complete RTE
value in one go.

Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
master commit: ef7995e
master date: 2023-07-28 09:40:20 +0200
Preparatory change to unify the IO-APIC pin variable name between
io_apic_read_remap_rte() and amd_iommu_ioapic_update_ire(), so that
the local variable can be made a function parameter with the same name
across vendors.

Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
master commit: a478b38
master date: 2023-07-28 09:40:42 +0200
So that the remapping entry can be updated atomically when possible.

Doing such update atomically will avoid Xen having to mask the IO-APIC
pin prior to performing any interrupt movements (ie: changing the
destination and vector fields), as the interrupt remapping entry is
always consistent.

This also simplifies some of the logic on both VT-d and AMD-Vi
implementations, as having the full RTE available instead of half of
it avoids possibly having to read and update the missing other half
from hardware.

While there remove the explicit zeroing of new_ire fields in
ioapic_rte_to_remap_entry() and initialize the variable at definition
so all fields are zeroed.  Note fields could be also initialized with
final values at definition, but I found that likely too much to be
done at this time.

Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
master commit: 3e03317
master date: 2023-08-01 11:48:39 +0200
The check was missing an escape for the inner $, thus breaking things
in the unlikely event that the underlying assembler doesn't support this
option.

Fixes: 62d2229 ("build: silence GNU ld warning about executable stacks")
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Anthony PERARD <[email protected]>
master commit: d1f6a58
master date: 2023-08-14 09:58:19 +0200
The "cpuid_empty" label is also (in principle; maybe only for rubbish
input) reachable in the "cpuid_only" case. Hence the label needs to live
ahead of the check of the variable.

Fixes: 5b80cec ("libxl: introduce MSR data in libxl_cpuid_policy")
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Anthony PERARD <[email protected]>
master commit: ebce4e3
master date: 2023-08-17 16:24:17 +0200
tboot_shutdown() calls into tboot to perform the actual system shutdown.
tboot isn't built with endbr annotations, and Xen has CET-IBT enabled on
newer hardware.  shutdown_entry isn't annotated with endbr and Xen
faults:

Panic on CPU 0:
CONTROL-FLOW PROTECTION FAULT: #CP[0003] endbranch

And Xen hangs at this point.

Disabling CET-IBT let Xen and tboot power off, but reboot was
performing a poweroff instead of a warm reboot.  Disabling all of CET,
i.e. shadow stacks as well, lets tboot reboot properly.

Fixes: cdbe2b0 ("x86: Enable CET Indirect Branch Tracking")
Signed-off-by: Jason Andryuk <[email protected]>
Acked-by: Andrew Cooper <[email protected]>
Reviewed-by: Daniel P. Smith <[email protected]>
master commit: 0801868
master date: 2023-08-17 16:24:49 +0200
Fixes: 9864841 ("x86/vm_event: add support for VM_EVENT_REASON_INTERRUPT")
Signed-off-by: Jinoh Kang <[email protected]>
master commit: b2865c2
master date: 2023-08-18 20:21:44 +0100
At the time of XSA-170, the x86 instruction emulator was genuinely broken.  It
would load arbitrary values into %rip and putting a check here probably was
the best stopgap security fix.  It should have been reverted following c/s
81d3a0b "x86emul: limit-check branch targets" which corrected the emulator
behaviour.

However, everyone involved in XSA-170, myself included, failed to read the SDM
correctly.  On the subject of %rip consistency checks, the SDM stated:

  If the processor supports N < 64 linear-address bits, bits 63:N must be
  identical

A non-canonical %rip (and SSP more recently) is an explicitly legal state in
x86, and the VMEntry consistency checks are intentionally off-by-one from a
regular canonical check.

The consequence of this bug is that Xen will currently take a legal x86 state
which would successfully VMEnter, and corrupt it into having non-architectural
behaviour.

Furthermore, in the time this bugfix has been pending in public, I
successfully persuaded Intel to clarify the SDM, adding the following
clarification:

  The guest RIP value is not required to be canonical; the value of bit N-1
  may differ from that of bit N.
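
The off-by-one between the two checks can be made concrete with a small sketch. The function names and the parameter n (the number of linear-address bits, N in the SDM text) are hypothetical; the code models only the quoted SDM sentences, not Xen's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Regular canonical check: bits 63:N-1 must all be identical. */
bool is_canonical(uint64_t addr, unsigned int n)
{
    assert(n >= 1 && n < 64);

    uint64_t top = addr >> (n - 1);           /* bits 63:N-1 */
    return top == 0 || top == (UINT64_MAX >> (n - 1));
}

/* VMEntry consistency check: only bits 63:N must be identical. */
bool passes_vmentry_rip_check(uint64_t addr, unsigned int n)
{
    assert(n >= 1 && n < 64);

    uint64_t top = addr >> n;                 /* bits 63:N */
    return top == 0 || top == (UINT64_MAX >> n);
}
```

With n = 48, the value 0x0000800000000000 is non-canonical (bit 47 differs from bits 63:48) yet satisfies the VMEntry check, which is the legal state the commit stops Xen from corrupting.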

Fixes: ffbbfda ("x86/VMX: sanitize rIP before re-entering guest")
Signed-off-by: Andrew Cooper <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
master commit: 10c83bb
master date: 2023-08-23 18:44:59 +0100
The return value of bogus_8259A_irq() is wrong: the function will
return `true` when the IRQ is real and `false` when it's a spurious
IRQ.  This causes the "No irq handler for vector ..." message in
do_IRQ() to be printed for spurious i8259 interrupts which is not
intended (and not helpful).

Fix by inverting the return value of bogus_8259A_irq().

Fixes: 1329063 ('x86/i8259: Handle bogus spurious interrupts more quietly')
Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
master commit: 709f6c8
master date: 2023-08-30 10:03:53 +0200
…ress space

The region that needs to be cleaned/invalidated may be at the top
of the address space. This means that 'end' (i.e. 'p + size') will
be 0 and therefore nothing will be cleaned/invalidated as the check
in the loop will always be false.

On Arm64, we only support up to 48-bit Virtual
address space. So this is not a concern there. However, for 32-bit,
the mapcache is using the last 2GB of the address space. Therefore
we may not clean/invalidate properly some pages. This could lead
to memory corruption or data leakage (the scrubbed value may
still sit in the cache when the guest could read directly the memory
and therefore read the old content).

Rework invalidate_dcache_va_range(), clean_dcache_va_range(),
clean_and_invalidate_dcache_va_range() to handle a cache flush
with an element at the top of the address space.
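
The wrap-around can be reproduced with plain 32-bit arithmetic. The sketch below is an illustration only: the function names and the fixed 64-byte line size are assumptions, both inputs are assumed line-aligned, and the loops count cache lines instead of issuing real maintenance instructions. It is not the actual rework of the Arm helpers.

```c
#include <assert.h>
#include <stdint.h>

#define LINE 64u   /* assumed cache line size in bytes */

/* End-based loop: how many lines would be visited? */
unsigned int lines_visited_end_based(uint32_t p, uint32_t size)
{
    uint32_t end = p + size;   /* wraps to 0 at the top of the space */
    unsigned int n = 0;

    for ( ; p < end; p += LINE )
        n++;
    return n;
}

/* Size-based loop: iterate over the byte count, never compare to 'end'. */
unsigned int lines_visited_size_based(uint32_t p, uint32_t size)
{
    unsigned int n = 0;

    assert(p % LINE == 0 && size % LINE == 0);
    for ( uint32_t done = 0; done < size; done += LINE )
        n++;                   /* a real helper would act on p + done */
    return n;
}
```

For the last 4KB page of a 32-bit address space (p = 0xFFFFF000, size = 0x1000) the end-based loop visits no lines at all, while the size-based loop visits all 64.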

This is CVE-2023-34321 / XSA-437.

Reported-by: Julien Grall <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Signed-off-by: Julien Grall <[email protected]>
Acked-by: Bertrand Marquis <[email protected]>
master commit: 9a216e9
master date: 2023-09-05 14:30:08 +0200
Reportedly the AMD Custom APU 0405 found on SteamDeck, models 0x90 and
0x91, (quoting the respective Linux commit) is similarly affected. Put
another instance of our Zen1 vs Zen2 distinction checks in
amd_check_zenbleed(), forcing use of the chickenbit irrespective of
ucode version (building upon real hardware never surfacing a version of
0xffffffff).

Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Andrew Cooper <[email protected]>
(cherry picked from commit 145a69c)
krystian-hebel and others added 23 commits April 26, 2024 18:10
It used to be called from smp_callin(), however BUG_ON() was invoked on
multiple occasions before that. It may end up calling machine_restart()
which tries to get APIC ID for CPU running this code. If BSP detected
that x2APIC is enabled, get_apic_id() will try to use it for all CPUs.
Enabling x2APIC on secondary CPUs earlier protects against an endless
loop of #GP exceptions caused by attempts to read IA32_X2APIC_APICID
MSR while x2APIC is disabled in IA32_APIC_BASE.

Signed-off-by: Krystian Hebel <[email protected]>
If multiple CPUs called machine_restart() before actual restart took
place, but after boot CPU declared itself not online, ASSERT in
on_selected_cpus() will fail. A few calls later execution would end up
in machine_restart() again, with another frame on call stack for new
exception.

To protect against running out of stack, code checks if boot CPU is
still online before calling on_selected_cpus().

Signed-off-by: Krystian Hebel <[email protected]>
CPU id is obtained as a side effect of searching for appropriate
stack for AP. It can be used as a parameter to start_secondary().
Coincidentally this also makes further work on making AP bring-up
code parallel easier.

Signed-off-by: Krystian Hebel <[email protected]>
This will be used for parallel AP bring-up.

CPU_STATE_INIT changed direction. It was previously set by BSP and never
consumed by AP. Now it signals that AP got through assembly part of
initialization and waits for BSP to call notifiers that set up data
structures required for further initialization.

Signed-off-by: Krystian Hebel <[email protected]>
This is no longer necessary, since AP loops on cpu_state and CPU
index is passed as argument.

In addition, move TXT JOIN structure to static data. There is no
guarantee that it would be consumed before it is overwritten on BSP
stack.

Signed-off-by: Krystian Hebel <[email protected]>
This is another requirement for parallel AP bringup.

Signed-off-by: Krystian Hebel <[email protected]>
Multiple delays are required when sending IPIs and waiting for
responses. During boot, 4 such IPIs were sent per each AP. With this
change, only one set of broadcast IPIs is sent. This reduces boot time,
especially for platforms with large number of cores.

Single CPU initialization is still possible, it is used for hotplug.

During wakeup from S3 APs are started one by one. It should be possible
to enable parallel execution there as well, but I don't have a way of
testing it as of now.

Signed-off-by: Krystian Hebel <[email protected]>
Signed-off-by: Sergii Dmytruk <[email protected]>
SHA1 and SHA256 are hardcoded here, but their support by TPM is checked
for. Addition of event log for TPM2.0 will generalize the code further.

Signed-off-by: Sergii Dmytruk <[email protected]>
Signed-off-by: Sergii Dmytruk <[email protected]>
Do not rely on bootloader to do that to avoid discrepancies between
measured data and binary file that's being loaded.

Signed-off-by: Sergii Dmytruk <[email protected]>
This makes the function independent of the way in which SLRT is
discovered and moves discovery code into a separate function reusable in
other places.

Signed-off-by: Sergii Dmytruk <[email protected]>
To collect its core functionality in one place instead of having some in
intel_txt and other in tpm units.

TXT_EVTYPE_* now live in <asm/slaunch.h> and are called
DLE_EVTYPE_* despite being based on TXT specification.  This way code
for non-Intel won't need to include TXT header.

No functional changes.

Signed-off-by: Sergii Dmytruk <[email protected]>
It also works without doing this explicitly thanks to the fact that
TXT register space is located in the same 2MB page as TPM.

Signed-off-by: Sergii Dmytruk <[email protected]>
It holds the physical address of SLRT. The value is produced by
slaunch_early (known as txt_early previously), gets set in assembly and
is then used by the main C code, which doesn't need to know how we got
it (which is different for different CPUs).

Signed-off-by: Sergii Dmytruk <[email protected]>
secure-kernel-loader on AMD with SKINIT passes MBI as a parameter for
Multiboot kernel.

Another thing of interest is the location of SLRT which is bootloader's
data after SKL.

Signed-off-by: Sergii Dmytruk <[email protected]>
This mostly involves not running Intel-specific code when on AMD.
There are only a few new AMD-specific implementation details:
 - finding SLB start and size and mapping and protecting it
 - managing offset for adding the next TPM log entry

Signed-off-by: Sergii Dmytruk <[email protected]>
Some CPUs don't use default APIC base. Address in MSR is always valid,
and it is already read to test for x2APIC.

Signed-off-by: Krystian Hebel <[email protected]>
Just like the TPM2 case, this code path also needs extra handling on AMD
because TXT-compatible data prepared by SKL is stored inside the vendor
data field of the TCG header.

Signed-off-by: Sergii Dmytruk <[email protected]>
Map the TPM event log after the TXT regions are mapped to avoid
an early page fault when booting with slaunch.

Signed-off-by: Michał Żygowski <[email protected]>
@SergiiDmytruk
Member

Cleanup: @krystian-hebel, any point in keeping this PR and branch?

@krystian-hebel
Member Author

This was used for building packages for Qubes OS, but they switched to 4.19 last week, so this won't be needed anymore, closing.

@krystian-hebel krystian-hebel deleted the aem-4.17.4 branch August 26, 2024 11:41