LinuxLists.cc - x86: AMD Zen2 ymm registers rolling back

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, Feb 21, 2023 at 10:40:07PM -0800, Tavis Ormandy wrote:
> Hello, I'm experiencing a serious bug on AMD Zen2 that is causing
> registers to "roll back" to previous values after a context switch.

It probably doesn't matter all too much but which kernel are you running
this on?

Latest and greatest I hope...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-22 09:14:45

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, Feb 22, 2023 at 09:33:19AM +0100, Borislav Petkov wrote:
> It probably doesn't matter all too much but which kernel are you running
> this on?
>
> Latest and greatest I hope...

Also, what does

$ grep microcode /proc/cpuinfo | uniq

say?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-22 09:42:38

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On 22/02/2023 9:14 am, Borislav Petkov wrote:
> On Wed, Feb 22, 2023 at 09:33:19AM +0100, Borislav Petkov wrote:
>> It probably doesn't matter all too much but which kernel are you running
>> this on?
>>
>> Latest and greatest I hope...
> Also, what does
>
> $ grep microcode /proc/cpuinfo | uniq
>
> say?

This sounds suspiciously like an errata which was fixed with a ucode
update last year.

Check that you are up to date with ucode. If you are, perhaps the fix
needs another look at...

~Andrew

2023-02-22 10:10:00

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
> This sounds suspiciously like an errata which was fixed with a ucode
> update last year.

Yes, it looks like it.

Alternatively, you can try booting with "clearcpuid=xsaves" - that
should take care of your observation too but yeah, you should rather
update your microcode.

HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-22 21:27:28

by Tavis Ormandy

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, Feb 22, 2023 at 11:09:51AM +0100, Borislav Petkov wrote:
> On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
> > This sounds suspiciously like an errata which was fixed with a ucode
> > update last year.
>
> Yes, it looks like it.
>
> Alternatively, you can try booting with "clearcpuid=xsaves" - that
> should take care of your observation too but yeah, you should rather
> update your microcode.
>

Thanks - confirmed, it *doesn't* repro with 0x8301055, but does repro
with 0x830104d.

Annoyingly, I thought I was using the most recent microcode, but it seems
like there is some bug and debian wasn't applying it at boot.

That seems like a scary errata :-/

Tavis.

--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger [email protected]
_\_V _( ) _( ) @taviso

2023-02-22 22:18:49

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On 22/02/2023 9:26 pm, Tavis Ormandy wrote:
> On Wed, Feb 22, 2023 at 11:09:51AM +0100, Borislav Petkov wrote:
>> On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
>>> This sounds suspiciously like an errata which was fixed with a ucode
>>> update last year.
>> Yes, it looks like it.
>>
>> Alternatively, you can try booting with "clearcpuid=xsaves" - that
>> should take care of your observation too but yeah, you should rather
>> update your microcode.
>>
> Thanks - confirmed, it *doesn't* repro with 0x8301055, but does repro
> with 0x830104d.

For completeness, this is erratum 1386, details of which can be found in
https://www.amd.com/system/files/TechDocs/56323-PUB_1.00.pdf amongst others.

That said, looking at the list of preconditions to tickle this erratum,
I don't see anything in your code which would cause a change in MXCSR.
Which most likely means there's a path in the kernel modifying MXCSR and
that doesn't sound like a thing that ought to be happening by default.

> Annoyingly, I thought I was using the most recent microcode, but it seems
> like there is some bug and debian wasn't applying it at boot.
>
> That seems like a scary errata :-/

Honestly, this is tame compared to some of the things which get fixed in
ucode...

~Andrew

2023-02-22 22:25:14

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, Feb 22, 2023 at 01:26:37PM -0800, Tavis Ormandy wrote:
> Thanks - confirmed, it *doesn't* repro with 0x8301055, but does repro
> with 0x830104d.

Good.

> Annoyingly, I thought I was using the most recent microcode, but it seems
> like there is some bug and debian wasn't applying it at boot.

Well, actually, microcode loading is so simple - you don't absolutely
need to rely on the distros to do it for ya.

You simply get it from

git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

and copy all the bin files from amd-ucode/ into /lib/firmware/amd-ucode/

Technically, you'd need only the one for your family but the tools can
deal with multiple files in there.

And then you're all set - dracut or whatever creates your initrd will
add it. You can even add it by hand, see

Documentation/x86/microcode.rst

> That seems like a scary errata :-/

Yeah, modern x86 hardware is crazy complex. And we've had worse. :-\

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-22 22:26:53

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, Feb 22, 2023 at 10:17:35PM +0000, Andrew Cooper wrote:
> I don't see anything in your code which would cause a change in MXCSR.
> Which most likely means there's a path in the kernel modifying MXCSR and
> that doesn't sound like a thing that ought to be happening by default.

We were just talking about this internally too. We could trace it to see
what fiddles with MXCSR but meh, bigger fish to fry.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-28 18:47:30

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, 22 Feb 2023, Borislav Petkov wrote:

> On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
> > This sounds suspiciously like an errata which was fixed with a ucode
> > update last year.
>
> Yes, it looks like it.
>
> Alternatively, you can try booting with "clearcpuid=xsaves" - that
> should take care of your observation too but yeah, you should rather
> update your microcode.

Hi folks,

I can reproduce this bug on AMD Renoir SoC:

vendor_id : AuthenticAMD
cpu family : 23
model : 96
model name : AMD Ryzen 5 4600G with Radeon Graphics
stepping : 1
microcode : 0x8600104

for which there's no microcode update, the microcode_amd_fam17h.bin file
in the linux-firmware.git repo carries only the following patches:

$ ./amd_ucode_info.py microcode_amd_fam17h.bin
Microcode patches in microcode_amd_fam17h.bin:
Family=0x17 Model=0x08 Stepping=0x02: Patch=0x0800820d Length=3200 bytes
Family=0x17 Model=0x01 Stepping=0x02: Patch=0x0800126e Length=3200 bytes
Family=0x17 Model=0x31 Stepping=0x00: Patch=0x08301055 Length=3200 bytes

I've seen microcode version increase after a BIOS update, so it seems like
internally microcode patches exist for Renoir too, but it's up to hardware
vendors to pick them up as part of a BIOS update. Is there any chance of
a conventional release like for the above three CPU models?

Thanks.
Alexander

2023-02-28 19:20:13

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, Feb 28, 2023 at 09:47:22PM +0300, Alexander Monakov wrote:
> I've seen microcode version increase after a BIOS update,

Are you saying there's no BIOS update for your board?

Are you running the latest BIOS?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-28 19:24:56

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, 28 Feb 2023, Borislav Petkov wrote:

> On Tue, Feb 28, 2023 at 09:47:22PM +0300, Alexander Monakov wrote:
> > I've seen microcode version increase after a BIOS update,
>
> Are you saying there's no BIOS update for your board?

No, on the contrary, if there were no updates I would not be able to
see microcode version increase after updating.

> Are you running the latest BIOS?

Yes.

Alexander

2023-02-28 19:25:26

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, Feb 28, 2023 at 10:24:40PM +0300, Alexander Monakov wrote:
> No, on the contrary, if there were no updates I would not be able to
> see microcode version increase after updating.

So what are you saying then?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-28 19:29:29

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, 28 Feb 2023, Borislav Petkov wrote:

> On Tue, Feb 28, 2023 at 10:24:40PM +0300, Alexander Monakov wrote:
> > No, on the contrary, if there were no updates I would not be able to
> > see microcode version increase after updating.
>
> So what are you saying then?

That I can reproduce the bug even with the latest BIOS, i.e. the microcode
in the latest BIOS update does not have the fix, and the linux-firmware.git
microcode does not have a patch for this CPU family at all.

Hence the question how to get fixed microcode for this CPU family.

Alexander

2023-02-28 19:39:06

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On 28/02/2023 7:29 pm, Alexander Monakov wrote:
> On Tue, 28 Feb 2023, Borislav Petkov wrote:
>
>> On Tue, Feb 28, 2023 at 10:24:40PM +0300, Alexander Monakov wrote:
>>> No, on the contrary, if there were no updates I would not be able to
>>> see microcode version increase after updating.
>> So what are you saying then?
> That I can reproduce the bug even with the latest BIOS, i.e. the microcode
> in the latest BIOS update does not have the fix, and the linux-firmware.git
> microcode does not have a patch for this CPU family at all.
>
> Hence the question how to get fixed microcode for this CPU family.

I shouldn't be recommending this, but I've begged AMD many times to be
better at publishing microcode...

https://github.com/platomav/CPUMicrocodes

and you'll need

---8<---
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import sys
from pathlib import Path
from struct import pack

fout = sys.stdout

def stream_write(_):
    """Write to the output"""
    return fout.buffer.write(_)

blobs = []
for ucode in sorted(sys.argv[1:]):

    # cpu00xx0Fxx_
    fm = int(ucode[3:11], 16)

    eq = int("".join([ucode[5], ucode[6], ucode[9], ucode[10]]), 16)
    sz = Path(ucode).stat().st_size

    print("%08x, %s => %04x" % (fm, ucode[3:11], eq), file=sys.stderr)

    blobs.append((fm, eq, sz, open(ucode, "rb").read()))

equiv = []
for x in blobs:
    equiv.append(pack(b"=IIIHH", x[0], 0, 0, x[1], 0))
equiv.append(pack("=IIIHH", 0, 0, 0, 0, 0))

stream_write(pack("I", 0x00414d44)) # Magic

equiv_bin = b"".join(equiv)

stream_write(pack("II", 0, len(equiv_bin))) # Equiv table
stream_write(equiv_bin)

for blob in blobs:
    stream_write(pack("II", 1, len(blob[3])))
    stream_write(blob[3])
---8<---

to wrap the blob(s) from that git repo into the container format that
Linux and others know how to parse.

~Andrew

2023-02-28 20:56:23

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, Feb 28, 2023 at 10:29:23PM +0300, Alexander Monakov wrote:
> That I can reproduce the bug even with the latest BIOS,

Can you reproduce if you boot with

clearcpuid=xsaves

?

CONFIG_X86_FEATURE_NAMES must be enabled in your .config for that to
work.

Otherwise, try

clearcpuid=323

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-28 21:16:49

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, 28 Feb 2023, Borislav Petkov wrote:

> On Tue, Feb 28, 2023 at 10:29:23PM +0300, Alexander Monakov wrote:
> > That I can reproduce the bug even with the latest BIOS,
>
> Can you reproduce if you boot with
>
> clearcpuid=xsaves
>
> ?

No, with this option it is not reproducible.

Alexander

2023-02-28 21:46:16

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Tue, 28 Feb 2023, Andrew Cooper wrote:

> On 28/02/2023 7:29 pm, Alexander Monakov wrote:
> > On Tue, 28 Feb 2023, Borislav Petkov wrote:
> >
> > That I can reproduce the bug even with the latest BIOS, i.e. the microcode
> > in the latest BIOS update does not have the fix, and the linux-firmware.git
> > microcode does not have a patch for this CPU family at all.
> >
> > Hence the question how to get fixed microcode for this CPU family.
>
> I shouldn't be recommending this, but I've begged AMD many times to be
> better at publishing microcode...
>
> https://github.com/platomav/CPUMicrocodes
>
> and you'll need
[snip script]

Thank you, this is good to have. Unfortunately, even with microcode 0x08600109
taken from that repo the issue is reproducible (it is dated 2022-03-28, so
perhaps not really surprising).

Alexander

2023-03-01 00:23:19

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On 28/02/2023 9:16 pm, Alexander Monakov wrote:
> On Tue, 28 Feb 2023, Borislav Petkov wrote:
>> On Tue, Feb 28, 2023 at 10:29:23PM +0300, Alexander Monakov wrote:
>>> That I can reproduce the bug even with the latest BIOS,
>> Can you reproduce if you boot with
>>
>> clearcpuid=xsaves
>>
>> ?
> No, with this option it is not reproducible.

Ok.

Given that AMD do appear to have screwed up here, and the exploit does
reliably work on modern versions of Linux and up-to-date firmware, the
next course of action is to clobber XSAVES by default.

So we need a table for all Zen2 parts of ucode revisions below which we
force hide XSAVES as the erratum workaround.

That, or we skip the table and just hide XSAVES unconditionally on all
Fam17/18h CPUs... The Zen1/2 uarches have no supervisor states to
manage (AFAICT - the first supervisor states are CET in Zen3 I think),
and Linux already knows how to use XSAVEC (from virt usecases) which is
equivalent given no supervisor states.

Thoughts?

~Andrew

2023-03-01 08:54:25

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, Mar 01, 2023 at 12:23:01AM +0000, Andrew Cooper wrote:
> So we need a table for all Zen2 parts of ucode revisions below which we
> force hide XSAVES as the erratum workaround.

Working on it - stay tuned.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-07 16:49:32

by Tavis Ormandy

[permalink] [raw]

Subject: Re: x86: AMD Zen2 ymm registers rolling back

On Wed, Mar 01, 2023 at 09:54:14AM +0100, Borislav Petkov wrote:
> On Wed, Mar 01, 2023 at 12:23:01AM +0000, Andrew Cooper wrote:
> > So we need a table for all Zen2 parts of ucode revisions below which we
> > force hide XSAVES as the erratum workaround.
>
> Working on it - stay tuned.
>

FYI, I heard from someone offlist that they can reproduce with the
latest microcode available on this cpu:

cpu family : 23
model : 17
model name : AMD Ryzen 3 2200G with Radeon Vega Graphics
stepping : 0
microcode : 0x8101016

It seems likely many zen1 systems are affected without any microcode
patch, which seems not great to me.

I guess it would be nice if AMD could provide a list of devices they're
not going to provide a patch for...

Tavis.

--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger [email protected]
_\_V _( ) _( ) @taviso

2023-03-07 17:52:51

[permalink] [raw]

Subject: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch

AMD Erratum 1386 is summarised as:

XSAVES Instruction May Fail to Save XMM Registers to the Provided
State Save Area

This piece of accidental chronomancy causes the %xmm registers to
occasionally reset back to an older value.

Ignore the XSAVES feature on all AMD Zen1/2 hardware. The XSAVEC
instruction (which works fine) is equivalent on affected parts.

Reported-by: Tavis Ormandy <[email protected]>
Signed-off-by: Andrew Cooper <[email protected]>
---
CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: Dave Hansen <[email protected]>
CC: [email protected]
CC: "H. Peter Anvin" <[email protected]>
CC: [email protected]
CC: Tavis Ormandy <[email protected]>
CC: Alexander Monakov <[email protected]>

Only compile tested.

This wants backporting to all stable trees that understand XSAVES, but
before 5.19(?) needs the XSAVEC support backporting too...
---
arch/x86/kernel/cpu/amd.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 380753b14cab..f3a4bb479fd5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -890,6 +890,17 @@ static void init_amd_zn(struct cpuinfo_x86 *c)
node_reclaim_distance = 32;
#endif

+ /*
+ * Work around Erratum 1386. The XSAVES instruction malfunctions in
+ * certain circumstances on Zen1/2 uarch, and not all parts have had
+ * updated microcode at the time of writing (March 2023).
+ *
+ * Affected parts all have no supervisor XSAVE states, meaning that
+ * the XSAVEC instruction (which works fine) is equivelent.
+ */
+ if (c->x86 == 0x17)
+ clear_cpu_cap(c, X86_FEATURE_XSAVES);
+
/* Fix up CPUID bits, but only if not virtualised. */
if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {

--
2.30.2

2023-03-07 17:56:37

[permalink] [raw]

Subject: Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch

On Tue, Mar 07, 2023 at 05:46:43PM +0000, Andrew Cooper wrote:
> + * Work around Erratum 1386. The XSAVES instruction malfunctions in
> + * certain circumstances on Zen1/2 uarch, and not all parts have had
> + * updated microcode at the time of writing (March 2023).

Any reason for rushing this and not waiting for me to clarify affected
models first?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-07 18:29:38

[permalink] [raw]

Subject: Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch

On 07/03/2023 5:50 pm, Borislav Petkov wrote:
> On Tue, Mar 07, 2023 at 05:46:43PM +0000, Andrew Cooper wrote:
>> + * Work around Erratum 1386. The XSAVES instruction malfunctions in
>> + * certain circumstances on Zen1/2 uarch, and not all parts have had
>> + * updated microcode at the time of writing (March 2023).
> Any reason for rushing this and not waiting for me to clarify affected
> models first?

Well yes - more and more reports of impacted systems.

Having the full list of affected models is great and all, but how is it
going to change this patch as a workaround ?

~Andrew

2023-03-07 19:13:15

[permalink] [raw]

Subject: Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch

On Tue, Mar 07, 2023 at 06:22:01PM +0000, Andrew Cooper wrote:
> Well yes - more and more reports of impacted systems.
>
> Having the full list of affected models is great and all, but how is it
> going to change this patch as a workaround ?

We don't have to clear the feature flag on those systems which have
a fix.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-07 20:03:35

[permalink] [raw]

Subject: Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch

On 07/03/2023 6:56 pm, Borislav Petkov wrote:
> On Tue, Mar 07, 2023 at 06:22:01PM +0000, Andrew Cooper wrote:
>> Well yes - more and more reports of impacted systems.
>>
>> Having the full list of affected models is great and all, but how is it
>> going to change this patch as a workaround ?
> We don't have to clear the feature flag on those systems which have
> a fix.

Sure, but why is that helpful?

XSAVES and XSAVEC are functionally identical on Zen1/2 because these
CPUs don't advertise any supervisor XSAVE states.

It is only Zen3 where XSAVES starts doing more than XSAVEC (and even
then, only after the CET series actually gets merged...)

~Andrew

2023-03-07 20:09:25