2019-11-26 19:51:21

by Barret Rhoden

Subject: AVX register corruption from signal delivery

Hi -

The Go Team found a bug[1] where the AVX registers can get corrupted
during signal delivery. They were able to bisect it to commits related
to the "x86: load FPU registers on return to userland" patchset[2].

The bug requires the kernel to be built with GCC 9 to trigger. In
particular, arch/x86/kernel/fpu/signal.c needs to be built with GCC 9.
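
The reproducer is attached to the bug; at its core it is a stress loop
of roughly the following shape (a simplified sketch, not the actual
program - the handler, timer rate and iteration counts here are made
up):

/* Keep a known value live in a ymm register while an interval timer
 * hammers the task with signals; signal delivery saves and restores
 * the FPU state, and a buggy restore shows up as a mismatch.
 * Build with: gcc -O2 -mavx avx-sig.c */
#include <immintrin.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

static void handler(int sig)
{
        (void)sig;      /* delivery alone exercises the FPU sigframe path */
}

int main(void)
{
        struct sigaction sa;
        struct itimerval it = {
                .it_interval = { .tv_usec = 100 },
                .it_value    = { .tv_usec = 100 },
        };
        double expect[4] = { 1.0, 2.0, 3.0, 4.0 }, got[4];

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handler;
        sigaction(SIGVTALRM, &sa, NULL);
        setitimer(ITIMER_VIRTUAL, &it, NULL);

        for (unsigned long n = 0; ; n++) {
                __m256d v = _mm256_loadu_pd(expect);

                /* spin so signals land while v is (ideally) in a ymm reg */
                for (volatile int i = 0; i < 10000; i++)
                        ;
                _mm256_storeu_pd(got, v);
                if (memcmp(expect, got, sizeof(got))) {
                        fprintf(stderr, "AVX state corrupted after %lu rounds\n", n);
                        return 1;
                }
        }
}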

Thanks,

Barret

[1] https://bugzilla.kernel.org/show_bug.cgi?id=205663
[2]
https://lore.kernel.org/kvm/[email protected]/


Subject: Re: AVX register corruption from signal delivery

On 2019-11-26 14:49:55 [-0500], Barret Rhoden wrote:
> Hi -
Hi,

> The bug requires the kernel to be built with GCC 9 to trigger. In
> particular, arch/x86/kernel/fpu/signal.c needs to be built with GCC 9.

I've been pinged already, I will look into this. Please send me a
.config just to be sure. From browsing over the bug, CONFIG_PREEMPTION
was required.

> Thanks,
>
> Barret

Sebastian

2019-11-26 21:25:30

by Barret Rhoden

Subject: Re: AVX register corruption from signal delivery

On 11/26/19 3:20 PM, Sebastian Andrzej Siewior wrote:
> On 2019-11-26 14:49:55 [-0500], Barret Rhoden wrote:
>> Hi -
> Hi,
>
>> The bug requires the kernel to be built with GCC 9 to trigger. In
>> particular, arch/x86/kernel/fpu/signal.c needs to be built with GCC 9.
>
> I've been pinged already, I will look into this. Please send me a
> .config just to be sure. From browsing over the bug, CONFIG_PREEMPTION
> was required.

Thanks; config attached. I've been able to recreate it in QEMU with at
least 2 cores.

Barret




Attachments:
avx-bug.config.gz (27.57 kB)

2019-11-26 22:15:29

by Borislav Petkov

Subject: Re: AVX register corruption from signal delivery

On Tue, Nov 26, 2019 at 04:23:40PM -0500, Barret Rhoden wrote:
> Thanks; config attached. I've been able to recreate it in QEMU with at
> least 2 cores.

Yap, I can too, in my VM.

Btw, would you guys like to submit that reproducer test program

https://bugzilla.kernel.org/attachment.cgi?id=286073

into the kernel selftests pile here:

tools/testing/selftests/x86/

?

It needs proper cleanup to fit kernel coding style but it could be a
good start for collecting interesting FPU test cases...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-11-26 22:34:17

by Andy Lutomirski

Subject: Re: AVX register corruption from signal delivery



> On Nov 26, 2019, at 2:14 PM, Borislav Petkov <[email protected]> wrote:
>
> On Tue, Nov 26, 2019 at 04:23:40PM -0500, Barret Rhoden wrote:
>> Thanks; config attached. I've been able to recreate it in QEMU with at
>> least 2 cores.
>
> Yap, I can too, in my VM.
>
> Btw, would you guys like to submit that reproducer test program
>
> https://bugzilla.kernel.org/attachment.cgi?id=286073
>
> into the kernel selftests pile here:
>
> tools/testing/selftests/x86/
>
> ?
>
> It needs proper cleanup to fit kernel coding style but it could be a
> good start for collecting interesting FPU test cases.

If we do this, we should have selftests/x86/slow or otherwise have a fast vs slow mode. I really like that the entire suite takes under 2s.

2019-11-26 23:03:11

by Borislav Petkov

Subject: Re: AVX register corruption from signal delivery

On Tue, Nov 26, 2019 at 02:30:10PM -0800, Andy Lutomirski wrote:
> If we do this, we should have selftests/x86/slow or otherwise have a
> fast vs slow mode. I really like that the entire suite takes under 2s.

Sure, we can stick it under a separate Makefile target called "full" or
so to mean, run the full set of tests. We can run the fast ones first
and the slow ones then. Or something to that effect.
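
Something like having the test time-box itself would keep the default
run cheap; a hypothetical sketch (the SELFTESTS_FULL knob below is
invented, not an existing selftests convention):

#include <stdlib.h>
#include <time.h>

static double elapsed(const struct timespec *start)
{
        struct timespec now;

        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - start->tv_sec) +
               (now.tv_nsec - start->tv_nsec) / 1e9;
}

int main(void)
{
        /* run ~1s by default, much longer under a "full" target */
        double budget = getenv("SELFTESTS_FULL") ? 30.0 : 1.0;
        struct timespec start;

        clock_gettime(CLOCK_MONOTONIC, &start);
        while (elapsed(&start) < budget) {
                /* one round of the AVX/signal stress loop goes here */
        }
        return 0;
}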

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the
context that is currently loaded. It never changed during the life time
of a task and remained stable/constant.

Since we deferred loading the FPU registers on return to userland, the
content of fpu_fpregs_owner_ctx may change during preemption and must
not be cached.
This went unnoticed for some time and was now noticed, in particular
gcc-9 is able to cache that load in copy_fpstate_to_sigframe() and reuse
it in the retry loop:

copy_fpstate_to_sigframe()
load fpu_fpregs_owner_ctx and save on stack
fpregs_lock()
copy_fpregs_to_sigframe() /* failed */
fpregs_unlock()
*** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx ***

fault_in_pages_writeable() /* succeed, retry */

fpregs_lock()
__fpregs_load_activate()
fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
copy_fpregs_to_sigframe() /* succeeds, random FPU content */

This is a comparison of the assembly of gcc-9, without vs with this
patch:

| # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size))
| cmpq %rdx, %rax # tmp183, _4
| jb .L190 #,
|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|-#APP
|-# 512 "arch/x86/include/asm/fpu/internal.h" 1
|- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__
|-# 0 "" 2
|-#NO_APP
|- movq %rax, -88(%rbp) # pfo_ret__, %sfp

|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|- movq -88(%rbp), %rcx # %sfp, pfo_ret__
|- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#APP
|+# 512 "arch/x86/include/asm/fpu/internal.h" 1
|+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__
|+# 0 "" 2
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#NO_APP
|+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp

Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching of
fpu_fpregs_owner_ctx during preemption points.

Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")
---

There is no Signed-off-by here yet. Could this please be verified by
the reporter?

Also I would like to add
Debugged-by: Ian Lance Taylor

but I lack the complete address, and I'm not sure if he wants that.
Also, please send a Reported-by line since I'm not sure who started this.

arch/x86/include/asm/fpu/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 4c95c365058aa..44c48e34d7994 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)

static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
- return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
+ return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
}

/*
--
2.24.0

2019-11-27 14:09:35

by Borislav Petkov

Subject: Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

On Wed, Nov 27, 2019 at 01:42:43PM +0100, Sebastian Andrzej Siewior wrote:
> The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the
                                 ^
                                 to

> context that is currently loaded. It never changed during the life time
> of a task and remained stable/constant.
>
> Since we deferred loading the FPU registers on return to userland, the

Drop those "we"s :)

> content of fpu_fpregs_owner_ctx may change during preemption and must
> not be cached.
> This went unnoticed for some time and was now noticed, in particular
> gcc-9 is able to cache that load in copy_fpstate_to_sigframe() and reuse
> it in the retry loop:
>
> copy_fpstate_to_sigframe()
> load fpu_fpregs_owner_ctx and save on stack
> fpregs_lock()
> copy_fpregs_to_sigframe() /* failed */
> fpregs_unlock()
> *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx ***
>
> fault_in_pages_writeable() /* succeed, retry */
>
> fpregs_lock()
> __fpregs_load_activate()
> fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
> copy_fpregs_to_sigframe() /* succeeds, random FPU content */
>
> This is a comparison of the assembly of gcc-9, without vs with this
> patch:
>
> | # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size))
> | cmpq %rdx, %rax # tmp183, _4
> | jb .L190 #,
> |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |-#APP
> |-# 512 "arch/x86/include/asm/fpu/internal.h" 1
> |- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__
> |-# 0 "" 2
> |-#NO_APP
> |- movq %rax, -88(%rbp) # pfo_ret__, %sfp
> …
> |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |- movq -88(%rbp), %rcx # %sfp, pfo_ret__
> |- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp
> |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |+#APP
> |+# 512 "arch/x86/include/asm/fpu/internal.h" 1
> |+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__
> |+# 0 "" 2
> |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |+#NO_APP
> |+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp
>
> Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching of
> fpu_fpregs_owner_ctx during preemption points.
>
> Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")

Or

a352a3b7b792 ("x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD")

maybe, which adds the fpregs_unlock() ?

> ---
>
> There is no Signed-off-by here yet. Could this please be verified by
> the reporter?

Not the reporter, but I just tested it successfully too:

Tested-by: Borislav Petkov <[email protected]>

> Also I would like to add
> Debugged-by: Ian Lance Taylor

Yes, pls. CCed.

>
> but I lack the complete address, and I'm not sure if he wants that.
> Also, please send a Reported-by line since I'm not sure who started this.
>
> arch/x86/include/asm/fpu/internal.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 4c95c365058aa..44c48e34d7994 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
>
> static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
> {
> - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> }

And to add one more data point from IRC: this is also documented:

/*
* this_cpu_read() makes gcc load the percpu variable every time it is
* accessed while this_cpu_read_stable() allows the value to be cached.
                                                          ^^^^^^^^^^^^^^^

* this_cpu_read_stable() is more efficient and can be used if its value
* is guaranteed to be valid across cpus. The current users include
* get_current() and get_thread_info() both of which are actually
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
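
IOW, as a user-space analogue (all names below are invented for
illustration): a get_current()-style value is safe to cache because a
task never observes it changing under itself, while fpu_fpregs_owner_ctx
behaves like a shared pointer that another context may rewrite and must
be re-read at every check:

#include <stdatomic.h>

struct ctx { int id; };

static _Atomic(struct ctx *) owner_ctx;         /* like fpu_fpregs_owner_ctx */
static _Thread_local struct ctx *my_ctx;        /* like current: per-thread */

static int state_valid(struct ctx *c)
{
        /* analogous to this_cpu_read(): a fresh load on every call */
        return c == atomic_load_explicit(&owner_ctx, memory_order_relaxed);
}

static struct ctx *get_ctx(void)
{
        /* analogous to this_cpu_read_stable()/get_current(): this value
         * cannot change under the calling thread, caching is harmless */
        return my_ctx;
}

int main(void)
{
        struct ctx me = { .id = 1 };

        my_ctx = &me;
        atomic_store(&owner_ctx, &me);
        return !state_valid(get_ctx());         /* 0: we are the owner */
}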


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-11-27 16:07:27

by Rik van Riel

Subject: Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

On Wed, 2019-11-27 at 13:42 +0100, Sebastian Andrzej Siewior wrote:

> There is no Signed-off-by here yet. Could this please be verified by
> the reporter?

Next time this is posted, feel free to add this :)

Reviewed-by: Rik van Riel <[email protected]>

> diff --git a/arch/x86/include/asm/fpu/internal.h
> b/arch/x86/include/asm/fpu/internal.h
> index 4c95c365058aa..44c48e34d7994 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -509,7 +509,7 @@ static inline void
> __fpu_invalidate_fpregs_state(struct fpu *fpu)
>
> static inline int fpregs_state_valid(struct fpu *fpu, unsigned int
> cpu)
> {
> - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu
> == fpu->last_cpu;
> + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu ==
> fpu->last_cpu;
> }
>
> /*
--
All Rights Reversed.


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2019-11-27 18:46:20

by Barret Rhoden

Subject: Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

>> Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching of
>> fpu_fpregs_owner_ctx during preemption points.
>>
>> Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")
>
> Or
>
> a352a3b7b792 ("x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD")
>
> maybe, which adds the fpregs_unlock() ?

Using this_cpu_read_stable() (or some variant) seems to go back quite a
while; not sure when exactly it became a problem. If it helps, commit
d9c9ce34ed5c ("x86/fpu: Fault-in user stack if
copy_fpstate_to_sigframe() fails") was the one that popped up the most
during Austin's bisection.

>> Also I would like to add
>> Debugged-by: Ian Lance Taylor
>
> Yes, pls. CCed.

To close the loop on this, here's what Austin wrote on the bugzilla:

> --- Comment #2 from Austin Clements ([email protected]) ---
> I can confirm that the patch posted by Sebastian Andrzej Siewior at
> https://lkml.org/lkml/2019/11/27/304 fixes the issue both in our C reproducer
> and in our original Go reproducer. (Sorry, I'm not subscribed to LKML, so I
> can't reply there, and I'm on an airplane, so it's hard to get subscribed :)
>
> Regarding the question about the "Debugged-by" line in the patch, debugging was
> a joint effort between myself (Austin Clements <[email protected]>), David
> Chase <[email protected]>, and Ian Lance Taylor <[email protected]>.

Thanks,

Barret

Subject: [PATCH v2] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
to the context that is currently loaded. It never changed during the
lifetime of a task and remained stable/constant.

Since the loading of FPU registers was deferred until return to
userland, the content of fpu_fpregs_owner_ctx may change during
preemption and must not be cached.
This went unnoticed for some time; in particular, gcc-9 is able to
cache that load in copy_fpstate_to_sigframe() and reuse it in the retry
loop:

copy_fpstate_to_sigframe()
load fpu_fpregs_owner_ctx and save on stack
fpregs_lock()
copy_fpregs_to_sigframe() /* failed */
fpregs_unlock()
*** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx ***

fault_in_pages_writeable() /* succeed, retry */

fpregs_lock()
__fpregs_load_activate()
fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
copy_fpregs_to_sigframe() /* succeeds, random FPU content */

This is a comparison of the assembly of gcc-9, without vs with this
patch:

| # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size))
| cmpq %rdx, %rax # tmp183, _4
| jb .L190 #,
|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|-#APP
|-# 512 "arch/x86/include/asm/fpu/internal.h" 1
|- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__
|-# 0 "" 2
|-#NO_APP
|- movq %rax, -88(%rbp) # pfo_ret__, %sfp

|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|- movq -88(%rbp), %rcx # %sfp, pfo_ret__
|- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#APP
|+# 512 "arch/x86/include/asm/fpu/internal.h" 1
|+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__
|+# 0 "" 2
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#NO_APP
|+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp

Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching of
fpu_fpregs_owner_ctx during preemption points.

The Fixes: tag points to the commit where deferred FPU loading was
added. Since this commit, the compiler is no longer allowed to move the
load of fpu_fpregs_owner_ctx somewhere else / outside of the locked
section. A task preemption will change its value and stale content will
be observed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663
Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")
Debugged-by: Austin Clements <[email protected]>
Debugged-by: David Chase <[email protected]>
Debugged-by: Ian Lance Taylor <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---

v1…v2:
- Adding tags
- Explaining why Fixes: does not point to the bisected commit.

arch/x86/include/asm/fpu/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 4c95c365058aa..44c48e34d7994 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)

static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
- return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
+ return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
}

/*
--
2.24.0

Subject: [tip: x86/urgent] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 59c4bd853abcea95eccc167a7d7fd5f1a5f47b98
Gitweb: https://git.kernel.org/tip/59c4bd853abcea95eccc167a7d7fd5f1a5f47b98
Author: Sebastian Andrzej Siewior <[email protected]>
AuthorDate: Thu, 28 Nov 2019 09:53:06 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 28 Nov 2019 10:16:46 +01:00

x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
to the context that is currently loaded. It never changed during the
lifetime of a task - it remained stable/constant.

After the loading of FPU registers was deferred until return to
userland, the content of fpu_fpregs_owner_ctx may change during
preemption and must not be cached.

This went unnoticed for some time and was now noticed, in particular
since gcc 9 is caching that load in copy_fpstate_to_sigframe() and
reusing it in the retry loop:

copy_fpstate_to_sigframe()
load fpu_fpregs_owner_ctx and save on stack
fpregs_lock()
copy_fpregs_to_sigframe() /* failed */
fpregs_unlock()
*** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx ***

fault_in_pages_writeable() /* succeed, retry */

fpregs_lock()
__fpregs_load_activate()
fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
copy_fpregs_to_sigframe() /* succeeds, random FPU content */

This is a comparison of the assembly produced by gcc 9, without vs with this
patch:

| # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size))
| cmpq %rdx, %rax # tmp183, _4
| jb .L190 #,
|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|-#APP
|-# 512 "arch/x86/include/asm/fpu/internal.h" 1
|- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__
|-# 0 "" 2
|-#NO_APP
|- movq %rax, -88(%rbp) # pfo_ret__, %sfp

|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|- movq -88(%rbp), %rcx # %sfp, pfo_ret__
|- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#APP
|+# 512 "arch/x86/include/asm/fpu/internal.h" 1
|+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__
|+# 0 "" 2
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#NO_APP
|+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp

Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching of
fpu_fpregs_owner_ctx during preemption points.

The Fixes: tag points to the commit where deferred FPU loading was
added. Since this commit, the compiler is no longer allowed to move the
load of fpu_fpregs_owner_ctx somewhere else / outside of the locked
section. A task preemption will change its value and stale content will
be observed.

[ bp: Massage. ]

Debugged-by: Austin Clements <[email protected]>
Debugged-by: David Chase <[email protected]>
Debugged-by: Ian Lance Taylor <[email protected]>
Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Cc: Aubrey Li <[email protected]>
Cc: Austin Clements <[email protected]>
Cc: Barret Rhoden <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Chase <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Josh Bleecher Snyder <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663
---
arch/x86/include/asm/fpu/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 4c95c36..44c48e3 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)

static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
- return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
+ return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
}

/*

2019-11-29 16:59:11

by David Laight

Subject: RE: [PATCH v2] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

From Sebastian Andrzej Siewior
> Sent: 28 November 2019 08:53
> The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
> to the context that is currently loaded. It never changed during the
> lifetime of a task and remained stable/constant.
>
> Since the loading of FPU registers was deferred until return to
> userland, the content of fpu_fpregs_owner_ctx may change during
> preemption and must not be cached.
> This went unnoticed for some time; in particular, gcc-9 is able to
> cache that load in copy_fpstate_to_sigframe() and reuse it in the retry
> loop:
>
> copy_fpstate_to_sigframe()
> load fpu_fpregs_owner_ctx and save on stack
> fpregs_lock()
> copy_fpregs_to_sigframe() /* failed */
> fpregs_unlock()
> *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx ***
>
> fault_in_pages_writeable() /* succeed, retry */
>
> fpregs_lock()
> __fpregs_load_activate()
> fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
> copy_fpregs_to_sigframe() /* succeeds, random FPU content */

Should both fpregs_lock() and fpregs_unlock() contain a barrier() (or "memory" clobber)
to stop the compiler moving anything across them?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Subject: Re: [PATCH v2] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

On 2019-11-29 16:57:42 [+0000], David Laight wrote:
> Should both fpregs_lock() and fpregs_unlock() contain a barrier() (or "memory" clobber)
> to stop the compiler moving anything across them?

They already do.
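
Both are built from primitives which already imply compiler barriers.
A sketch of the definitions, from memory (see
arch/x86/include/asm/fpu/api.h):

static inline void fpregs_lock(void)
{
        preempt_disable();      /* contains barrier() */
        local_bh_disable();
}

static inline void fpregs_unlock(void)
{
        local_bh_enable();
        preempt_enable();       /* contains barrier() */
}

The barrier does not help against this_cpu_read_stable(), though: its
asm takes only the address of the per-CPU variable as a constant input,
with no memory operand, so the compiler may treat the result as
constant and reuse it even across a "memory" clobber. this_cpu_read()
reads the variable through a memory operand, which the clobber does
invalidate.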

> David

Sebastian