Hi,
On a 4-core Ivybridge platform, when doing a lot of suspend-to-ram/resume
cycles, we were observing processes randomly killed by a SIGFPE.
When dumping the FPU registers state on the SIGFPE (usually a floating stack
underflow/overflow on a floating point arithmetic operation), the FPU registers
looks empty or at least corrupted which was more or less impossible with
respect to the disassembled floating point code.
After doing more tracing, in the faulty case, the process seems to be keeping
FPU ownership over a secondary CPU unplug/re-plug triggered by the suspend.
Then it's doing a lazy restore of its FPU context (ie just using the current
FPU hardware registers as he is the owner) instead of writing them back
to the hardware from the version previously saved in the task context,
despite the fact the whole FPU hardware state has been lost.
Just invalidating the "fpu_owner_task" when disabling a secondary CPU seems
to solve my issue (it's already reset for the primary CPU).
By the way, when FPU the lazy restore patch was discussed back in february,
Ingo commented (in http://permalink.gmane.org/gmane.linux.kernel/1255423) :
"
I guess the CPU hotplug case deserves a comment in the code: CPU
hotplug + replug of the same (but meanwhile reset) CPU is safe
because fpu_owner_task[cpu] gets reset to NULL.
"
That contradicts my previous observation, so maybe I have totally overlooked
something in this mechanism.
Can you comment ?
I'm still putting my patch proposal in this thread.
The issue seems to exist since 3.4 after the FPU lazy restore was actually
implemented by commit 7e16838d "i387: support lazy restore of FPU state".
But the issue is mainly visible on 3.4 and 3.6 since on tip of tree, it is
hidden by the eager fpu implementation for platforms with xsave support,
but it still happens with eagerfpu=off.
To apply this change to 3.4, "this_cpu_write" needs to be replaced by
percpu_write.
--
Vincent
When a cpu enters S3 state, the FPU state is lost.
After resuming for S3, if we try to lazy restore the FPU for a process running
on the same CPU, this will result in a corrupted FPU context.
We can just invalidate the "fpu_owner_task", so nobody will try to
lazy restore a state which no longer exists in the hardware.
Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a
few hundreds cycles by a SIGFPE.
The issue seems to exist since 3.4 (after the FPU lazy restore was actually implemented),
to apply the change to 3.4, "this_cpu_write" needs to be replaced by percpu_write.
Cc: Duncan Laurie <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: <[email protected]> [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <[email protected]>
---
arch/x86/kernel/smpboot.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..7610c58 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
#include <asm/mwait.h>
#include <asm/apic.h>
#include <asm/io_apic.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
#include <linux/mc146818rtc.h>
@@ -1230,6 +1232,9 @@ int native_cpu_disable(void)
clear_local_APIC();
cpu_disable_common();
+
+ /* the FPU context will be lost, nobody owns it */
+ this_cpu_write(fpu_owner_task, NULL);
return 0;
}
--
1.7.7.3
On 11/30/2012 10:52 AM, Vincent Palatin wrote:
> When a cpu enters S3 state, the FPU state is lost.
> After resuming for S3, if we try to lazy restore the FPU for a process running
> on the same CPU, this will result in a corrupted FPU context.
>
> We can just invalidate the "fpu_owner_task", so nobody will try to
> lazy restore a state which no longer exists in the hardware.
>
> Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
> by doing thousands of suspend/resume cycles with 4 processes doing FPU
> operations running. Without the patch, a process is killed after a
> few hundreds cycles by a SIGFPE.
>
> The issue seems to exist since 3.4 (after the FPU lazy restore was actually implemented),
> to apply the change to 3.4, "this_cpu_write" needs to be replaced by percpu_write.
>
> Cc: Duncan Laurie <[email protected]>
> Cc: Olof Johansson <[email protected]>
> Cc: <[email protected]> [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
> Signed-off-by: Vincent Palatin <[email protected]>
Ouch! Thank you for catching this!
-hpa
Commit-ID: c9370d1039848a813a63c3fd44b6e4833a78246a
Gitweb: http://git.kernel.org/tip/c9370d1039848a813a63c3fd44b6e4833a78246a
Author: Vincent Palatin <[email protected]>
AuthorDate: Fri, 30 Nov 2012 10:52:03 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Fri, 30 Nov 2012 10:58:14 -0800
x86, fpu: Avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost.
After resuming for S3, if we try to lazy restore the FPU for a process running
on the same CPU, this will result in a corrupted FPU context.
We can just invalidate the "fpu_owner_task", so nobody will try to
lazy restore a state which no longer exists in the hardware.
Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a
few hundreds cycles by a SIGFPE.
The issue seems to exist since 3.4 (after the FPU lazy restore was actually implemented),
to apply the change to 3.4, "this_cpu_write" needs to be replaced by percpu_write.
Cc: Duncan Laurie <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: <[email protected]> [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/kernel/smpboot.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..7610c58 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
#include <asm/mwait.h>
#include <asm/apic.h>
#include <asm/io_apic.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
#include <linux/mc146818rtc.h>
@@ -1230,6 +1232,9 @@ int native_cpu_disable(void)
clear_local_APIC();
cpu_disable_common();
+
+ /* the FPU context will be lost, nobody owns it */
+ this_cpu_write(fpu_owner_task, NULL);
return 0;
}
On Fri, Nov 30, 2012 at 10:52 AM, Vincent Palatin <[email protected]> wrote:
> When a cpu enters S3 state, the FPU state is lost.
> After resuming for S3, if we try to lazy restore the FPU for a process running
> on the same CPU, this will result in a corrupted FPU context.
Good catch, and I think the patch is technically correct, but:
> We can just invalidate the "fpu_owner_task", so nobody will try to
> lazy restore a state which no longer exists in the hardware.
I think this works well, but is a tiny bit subtle.
And what I _don't_ necessarily think is the right thing to do is to
only comment on it in the commit message, because it means that later
generations will not necessarily ever notice. And it does result in a
new combination that I don't think we've had before: if the current
task is the owner, we now have "tsk->thread.fpu.has_fpu = 1" but with
"fpu_owner_task" being NULL.
Which gets us the semantics we want (we will save the current CPU info
when switching away, but we will not restore when switching back), but
my gut feel is that we really want to comment on that exact thing. And
possibly even make a helper function for this in <sm/fpu-internal.h>,
something like
/*
* Must be run with preemption disabled: this clears the fpu_owner_task,
* on this CPU.
*
* This will disable any lazy FPU state restore of the current FPU state,
* but if the current thread owns the FPU, it will still be saved by.
*/
static inline void __cpu_disable_lazy_restore(void)
{
this_cpu_write(fpu_owner_task, NULL);
}
and in fact I think the right place to do this *might* be in
"native_cpu_die()" instead, at which point it would actually be
something like
per_cpu(fpu_owner_task, cpu) = NULL;
*after* the CPU is dead, so that nothing ever can actually see the
state where a process is still running on the CPU and might possibly
use the FPU.
I dunno. I think doing it after really killing the CPU (ie in the
native_cpu_die() function) might be easier to think about, but I don't
really hate your patch either (it does make me go "ok, we need to
guarantee no scheduling or FP use after" - which is probably true, but
it's still some non-local thing). Either way, a comment about it and
abstracting whatever the invalidation sequence is in fpu-internal.h
sounds like a good idea.
Hmm?
Linus
On 11/30/2012 11:25 AM, Linus Torvalds wrote:
>
> and in fact I think the right place to do this *might* be in
> "native_cpu_die()" instead, at which point it would actually be
> something like
>
> per_cpu(fpu_owner_task, cpu) = NULL;
>
> *after* the CPU is dead, so that nothing ever can actually see the
> state where a process is still running on the CPU and might possibly
> use the FPU.
>
> I dunno. I think doing it after really killing the CPU (ie in the
> native_cpu_die() function) might be easier to think about, but I don't
> really hate your patch either (it does make me go "ok, we need to
> guarantee no scheduling or FP use after" - which is probably true, but
> it's still some non-local thing). Either way, a comment about it and
> abstracting whatever the invalidation sequence is in fpu-internal.h
> sounds like a good idea.
>
Hmm... from my point of view it would almost seem saner to do this on
the way *up*... as part of CPU (re-)initialization. After all, the
"nothing is currently running on this CPU" is part of the initial state
of the CPU, regardless of if we have ever been online before or not.
-hpa
On Fri, Nov 30, 2012 at 11:38 AM, H. Peter Anvin <[email protected]> wrote:
> On 11/30/2012 11:25 AM, Linus Torvalds wrote:
>>
>> and in fact I think the right place to do this *might* be in
>> "native_cpu_die()" instead, at which point it would actually be
>> something like
>>
>> per_cpu(fpu_owner_task, cpu) = NULL;
>>
>> *after* the CPU is dead, so that nothing ever can actually see the
>> state where a process is still running on the CPU and might possibly
>> use the FPU.
>>
>> I dunno. I think doing it after really killing the CPU (ie in the
>> native_cpu_die() function) might be easier to think about, but I don't
>> really hate your patch either (it does make me go "ok, we need to
>> guarantee no scheduling or FP use after" - which is probably true, but
>> it's still some non-local thing). Either way, a comment about it and
>> abstracting whatever the invalidation sequence is in fpu-internal.h
>> sounds like a good idea.
>>
>
> Hmm... from my point of view it would almost seem saner to do this on
> the way *up*... as part of CPU (re-)initialization. After all, the
> "nothing is currently running on this CPU" is part of the initial state
> of the CPU, regardless of if we have ever been online before or not.
That works for me too, and has the similar advantage of being easier
to think about than "the cpu is actually still being used, and we're
playing tricks with the FPU state cache" approach.
Regardless, much credit to Vincent (and presumably others on the
chromium team) for finding and debugging this.
Linus
On 11/30/2012 11:41 AM, Linus Torvalds wrote:
>>
>> Hmm... from my point of view it would almost seem saner to do this on
>> the way *up*... as part of CPU (re-)initialization. After all, the
>> "nothing is currently running on this CPU" is part of the initial state
>> of the CPU, regardless of if we have ever been online before or not.
>
> That works for me too, and has the similar advantage of being easier
> to think about than "the cpu is actually still being used, and we're
> playing tricks with the FPU state cache" approach.
>
> Regardless, much credit to Vincent (and presumably others on the
> chromium team) for finding and debugging this.
>
OK, how do you want to handle this, given that 3.7 is presumably
imminent? We can apply Vincent's patch now, and then restructure it
later, or we can go for the gold now, but replicating the testing
Vincent has done will probably take well into next week.
-hpa
When a cpu enters S3 state, the FPU state is lost.
After resuming for S3, if we try to lazy restore the FPU for a process running
on the same CPU, this will result in a corrupted FPU context.
Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU,
so nobody will try to lazy restore a state which doesn't exist in the hardware.
Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a
few hundreds cycles by a SIGFPE.
The issue seems to exist since 3.4 (after the FPU lazy restore was actually implemented),
to apply the change to 3.4, "this_cpu_write" needs to be replaced by percpu_write.
Cc: Duncan Laurie <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: <[email protected]> [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <[email protected]>
---
arch/x86/include/asm/fpu-internal.h | 15 +++++++++------
arch/x86/kernel/smpboot.c | 5 +++++
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 831dbb9..41ab26e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -399,14 +399,17 @@ static inline void drop_init_fpu(struct task_struct *tsk)
typedef struct { int preload; } fpu_switch_t;
/*
- * FIXME! We could do a totally lazy restore, but we need to
- * add a per-cpu "this was the task that last touched the FPU
- * on this CPU" variable, and the task needs to have a "I last
- * touched the FPU on this CPU" and check them.
+ * Must be run with preemption disabled: this clears the fpu_owner_task,
+ * on this CPU.
*
- * We don't do that yet, so "fpu_lazy_restore()" always returns
- * false, but some day..
+ * This will disable any lazy FPU state restore of the current FPU state,
+ * but if the current thread owns the FPU, it will still be saved by.
*/
+static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+{
+ per_cpu(fpu_owner_task, cpu) = NULL;
+}
+
static inline int fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
{
return new == this_cpu_read_stable(fpu_owner_task) &&
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f3e2ec8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
#include <asm/mwait.h>
#include <asm/apic.h>
#include <asm/io_apic.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
#include <linux/mc146818rtc.h>
@@ -818,6 +820,9 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+ /* the FPU context is blank, nobody can own it */
+ __cpu_disable_lazy_restore(cpu);
+
err = do_boot_cpu(apicid, cpu, tidle);
if (err) {
pr_debug("do_boot_cpu failed %d\n", err);
--
1.7.7.3
On 11/30/2012 11:54 AM, Vincent Palatin wrote:
>>
> I have done a patch v2 according to your suggestions.
> I will run the testing on it now.
> I probably need at least 2 to 3 hours to validate it.
>
That would be super. Let me know and I'll queue it up and send a pull
request with this and a few more urgent things to Linus.
-hpa
When a cpu enters S3 state, the FPU state is lost.
After resuming for S3, if we try to lazy restore the FPU for a process running
on the same CPU, this will result in a corrupted FPU context.
Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU,
so nobody will try to lazy restore a state which doesn't exist in the hardware.
Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a
few hundreds cycles by a SIGFPE.
Cc: Duncan Laurie <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: <[email protected]> [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <[email protected]>
---
Hi,
The patch updated according the HPA and Linus comments.
I'm still re-running the testing on v3.
Change in v3:
- remove misleading comment about 3.4 in the description.
Change in v2:
- add an helper function and comment in fpu-internal.h as described by Linus
- do the cleaning in the native_cpu_up function as suggested by HPA
Vincent
arch/x86/include/asm/fpu-internal.h | 15 +++++++++------
arch/x86/kernel/smpboot.c | 5 +++++
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 831dbb9..41ab26e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -399,14 +399,17 @@ static inline void drop_init_fpu(struct task_struct *tsk)
typedef struct { int preload; } fpu_switch_t;
/*
- * FIXME! We could do a totally lazy restore, but we need to
- * add a per-cpu "this was the task that last touched the FPU
- * on this CPU" variable, and the task needs to have a "I last
- * touched the FPU on this CPU" and check them.
+ * Must be run with preemption disabled: this clears the fpu_owner_task,
+ * on this CPU.
*
- * We don't do that yet, so "fpu_lazy_restore()" always returns
- * false, but some day..
+ * This will disable any lazy FPU state restore of the current FPU state,
+ * but if the current thread owns the FPU, it will still be saved by.
*/
+static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+{
+ per_cpu(fpu_owner_task, cpu) = NULL;
+}
+
static inline int fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
{
return new == this_cpu_read_stable(fpu_owner_task) &&
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f3e2ec8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
#include <asm/mwait.h>
#include <asm/apic.h>
#include <asm/io_apic.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
#include <linux/mc146818rtc.h>
@@ -818,6 +820,9 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+ /* the FPU context is blank, nobody can own it */
+ __cpu_disable_lazy_restore(cpu);
+
err = do_boot_cpu(apicid, cpu, tidle);
if (err) {
pr_debug("do_boot_cpu failed %d\n", err);
--
1.7.7.3
On Fri, Nov 30, 2012 at 11:55 AM, H. Peter Anvin <[email protected]> wrote:
>
> On 11/30/2012 11:54 AM, Vincent Palatin wrote:
> >>
> > I have done a patch v2 according to your suggestions.
> > I will run the testing on it now.
> > I probably need at least 2 to 3 hours to validate it.
> >
>
> That would be super. Let me know and I'll queue it up and send a pull
> request with this and a few more urgent things to Linus.
I have done 1000+ cycles so far with patch v3 (on 4-core Ivybridge and
no eagerfpu),
and did not hit my issue.
I let the testing going on,
but wrt the issue after suspend, this fixes it with very high probability
(ie I have never done that many cycles without hitting the issue).
--
Vincent
Commit-ID: 644c154186386bb1fa6446bc5e037b9ed098db46
Gitweb: http://git.kernel.org/tip/644c154186386bb1fa6446bc5e037b9ed098db46
Author: Vincent Palatin <[email protected]>
AuthorDate: Fri, 30 Nov 2012 12:15:32 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Fri, 30 Nov 2012 13:48:05 -0800
x86, fpu: Avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost.
After resuming for S3, if we try to lazy restore the FPU for a process running
on the same CPU, this will result in a corrupted FPU context.
Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU,
so nobody will try to lazy restore a state which doesn't exist in the hardware.
Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a
few hundreds cycles by a SIGFPE.
Cc: Duncan Laurie <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: <[email protected]> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/fpu-internal.h | 15 +++++++++------
arch/x86/kernel/smpboot.c | 5 +++++
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 831dbb9..41ab26e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -399,14 +399,17 @@ static inline void drop_init_fpu(struct task_struct *tsk)
typedef struct { int preload; } fpu_switch_t;
/*
- * FIXME! We could do a totally lazy restore, but we need to
- * add a per-cpu "this was the task that last touched the FPU
- * on this CPU" variable, and the task needs to have a "I last
- * touched the FPU on this CPU" and check them.
+ * Must be run with preemption disabled: this clears the fpu_owner_task,
+ * on this CPU.
*
- * We don't do that yet, so "fpu_lazy_restore()" always returns
- * false, but some day..
+ * This will disable any lazy FPU state restore of the current FPU state,
+ * but if the current thread owns the FPU, it will still be saved by.
*/
+static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+{
+ per_cpu(fpu_owner_task, cpu) = NULL;
+}
+
static inline int fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
{
return new == this_cpu_read_stable(fpu_owner_task) &&
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f3e2ec8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
#include <asm/mwait.h>
#include <asm/apic.h>
#include <asm/io_apic.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
#include <linux/mc146818rtc.h>
@@ -818,6 +820,9 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+ /* the FPU context is blank, nobody can own it */
+ __cpu_disable_lazy_restore(cpu);
+
err = do_boot_cpu(apicid, cpu, tidle);
if (err) {
pr_debug("do_boot_cpu failed %d\n", err);