LinuxLists.cc - [PATCH 0/5] i387: stable kernel backport

2012-02-22 20:55:10

by Linus Torvalds

[permalink] [raw]

Subject: [PATCH 0/5] i387: stable kernel backport

This is a reasonably minimal backport of the i387 state save/restore bug.

A few of the commits are just minimal "make it easier to backport" commits
that don't necessarily fix anything on their own. And a few of the others
are combinations of what is two commits in the development tree, because
it turned out to be easier and cleaner that way.

The last one is the one that fixes the x86-32 bug, but the preemption
fixes are real fixes too, although they are probably not something that
anybody has necessarily ever hit in reality because the race window is so
small. Even so, if the fix itself isn't that important, the "make it
easier to backport the main one" would still be a big argument for it.

I *really* hope that the people who could reproduce this bug will test the
back-port series too, since I never actually saw the bug personally to
begin with. And again, big thanks to Raphael who helped pinpoint and debug
this.

Linus

---
Linus Torvalds (5):
i387: math_state_restore() isn't called from asm
i387: make irq_fpu_usable() tests more robust
i387: fix x86-64 preemption-unsafe user stack save/restore
i387: move TS_USEDFPU clearing out of __save_init_fpu and into callers
i387: move TS_USEDFPU flag from thread_info to task_struct

arch/x86/include/asm/i387.h | 158 +++++++++++++++++++++++++++++++-----
arch/x86/include/asm/processor.h | 1 +
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/kernel/traps.c | 14 ++--
arch/x86/kernel/xsave.c | 12 +--
arch/x86/kvm/vmx.c | 2 +-
6 files changed, 150 insertions(+), 39 deletions(-)

--
1.7.9.188.g12766.dirty

2012-02-22 20:56:11

by Linus Torvalds

[permalink] [raw]

Subject: [PATCH 1/5] i387: math_state_restore() isn't called from asm

From: Linus Torvalds <[email protected]>
Date: Mon, 13 Feb 2012 13:47:25 -0800

[ Upstream commit be98c2cdb15ba26148cd2bd58a857d4f7759ed38 ]

It was marked asmlinkage for some really old and stale legacy reasons.
Fix that and the equally stale comment.

Noticed when debugging the irq_fpu_usable() bugs.

Signed-off-by: Linus Torvalds <[email protected]>
Cc: [email protected] # Helps back-porting
---

Ok, this is trivial, but correct, and avoids the silly clashes with the
other patches.

arch/x86/include/asm/i387.h | 2 +-
arch/x86/kernel/traps.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index c9e09ea05644..cba143210780 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -29,7 +29,7 @@ extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
extern int init_fpu(struct task_struct *child);
-extern asmlinkage void math_state_restore(void);
+extern void math_state_restore(void);
extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a8e3eb83466c..727e6c16f294 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -590,10 +590,10 @@ void __math_state_restore(void)
* Careful.. There are problems with IBM-designed IRQ13 behaviour.
* Don't touch unless you *really* know how it works.
*
- * Must be called with kernel preemption disabled (in this case,
- * local interrupts are disabled at the call-site in entry.S).
+ * Must be called with kernel preemption disabled (eg with local
+ * local interrupts as in the case of do_device_not_available).
*/
-asmlinkage void math_state_restore(void)
+void math_state_restore(void)
{
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = thread->task;
--
1.7.9.188.g12766.dirty

2012-02-22 20:56:43

by Linus Torvalds

[permalink] [raw]

Subject: [PATCH 2/5] i387: make irq_fpu_usable() tests more robust

From: Linus Torvalds <[email protected]>
Date: Mon, 13 Feb 2012 13:56:14 -0800

[ Upstream commit 5b1cbac37798805c1fee18c8cebe5c0a13975b17 and
subsequent fix from c38e23456278e967f094b08247ffc3711b1029b2 ]

Some code - especially the crypto layer - wants to use the x86
FP/MMX/AVX register set in what may be interrupt (typically softirq)
context.

That *can* be ok, but the tests for when it was ok were somewhat
suspect. We cannot touch the thread-specific status bits either, so
we'd better check that we're not going to try to save FP state or
anything like that.

Now, it may be that the TS bit is always cleared *before* we set the
USEDFPU bit (and only set when we had already cleared the USEDFP
before), so the TS bit test may actually have been sufficient, but it
certainly was not obviously so.

So this explicitly verifies that we will not touch the TS_USEDFPU bit,
and adds a few related sanity-checks. Because it seems that somehow
AES-NI is corrupting user FP state. The cause is not clear, and this
patch doesn't fix it, but while debugging it I really wanted the code to
be more obviously correct and robust.

Signed-off-by: Linus Torvalds <[email protected]>
Cc: [email protected] # Fixes (unimportant) bug and helps backporting
---
arch/x86/include/asm/i387.h | 54 ++++++++++++++++++++++++++++++++++++------
1 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index cba143210780..262bea981aa5 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -307,9 +307,54 @@ static inline void __clear_fpu(struct task_struct *tsk)
}
}

+/*
+ * Were we in an interrupt that interrupted kernel mode?
+ *
+ * We can do a kernel_fpu_begin/end() pair *ONLY* if that
+ * pair does nothing at all: TS_USEDFPU must be clear (so
+ * that we don't try to save the FPU state), and TS must
+ * be set (so that the clts/stts pair does nothing that is
+ * visible in the interrupted kernel thread).
+ */
+static inline bool interrupted_kernel_fpu_idle(void)
+{
+ return !(current_thread_info()->status & TS_USEDFPU) &&
+ (read_cr0() & X86_CR0_TS);
+}
+
+/*
+ * Were we in user mode (or vm86 mode) when we were
+ * interrupted?
+ *
+ * Doing kernel_fpu_begin/end() is ok if we are running
+ * in an interrupt context from user mode - we'll just
+ * save the FPU state as required.
+ */
+static inline bool interrupted_user_mode(void)
+{
+ struct pt_regs *regs = get_irq_regs();
+ return regs && user_mode_vm(regs);
+}
+
+/*
+ * Can we use the FPU in kernel mode with the
+ * whole "kernel_fpu_begin/end()" sequence?
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+static inline bool irq_fpu_usable(void)
+{
+ return !in_interrupt() ||
+ interrupted_user_mode() ||
+ interrupted_kernel_fpu_idle();
+}
+
static inline void kernel_fpu_begin(void)
{
struct thread_info *me = current_thread_info();
+
+ WARN_ON_ONCE(!irq_fpu_usable());
preempt_disable();
if (me->status & TS_USEDFPU)
__save_init_fpu(me->task);
@@ -323,14 +368,6 @@ static inline void kernel_fpu_end(void)
preempt_enable();
}

-static inline bool irq_fpu_usable(void)
-{
- struct pt_regs *regs;
-
- return !in_interrupt() || !(regs = get_irq_regs()) || \
- user_mode(regs) || (read_cr0() & X86_CR0_TS);
-}
-
/*
* Some instructions like VIA's padlock instructions generate a spurious
* DNA fault but don't modify SSE registers. And these instructions
@@ -367,6 +404,7 @@ static inline void irq_ts_restore(int TS_state)
*/
static inline void save_init_fpu(struct task_struct *tsk)
{
+ WARN_ON_ONCE(!(task_thread_info(tsk)->status & TS_USEDFPU));
preempt_disable();
__save_init_fpu(tsk);
stts();
--
1.7.9.188.g12766.dirty

2012-02-22 20:57:38

by Linus Torvalds

[permalink] [raw]

Subject: [PATCH 3/5] i387: fix x86-64 preemption-unsafe user stack save/restore

From: Linus Torvalds <[email protected]>
Date: Thu, 16 Feb 2012 09:15:04 -0800

[ Upstream commit 15d8791cae75dca27bfda8ecfe87dca9379d6bb0 ]

Commit 5b1cbac37798 ("i387: make irq_fpu_usable() tests more robust")
added a sanity check to the #NM handler to verify that we never cause
the "Device Not Available" exception in kernel mode.

However, that check actually pinpointed a (fundamental) race where we do
cause that exception as part of the signal stack FPU state save/restore
code.

Because we use the floating point instructions themselves to save and
restore state directly from user mode, we cannot do that atomically with
testing the TS_USEDFPU bit: the user mode access itself may cause a page
fault, which causes a task switch, which saves and restores the FP/MMX
state from the kernel buffers.

This kind of "recursive" FP state save is fine per se, but it means that
when the signal stack save/restore gets restarted, it will now take the
'#NM' exception we originally tried to avoid. With preemption this can
happen even without the page fault - but because of the user access, we
cannot just disable preemption around the save/restore instruction.

There are various ways to solve this, including using the
"enable/disable_page_fault()" helpers to not allow page faults at all
during the sequence, and fall back to copying things by hand without the
use of the native FP state save/restore instructions.

However, the simplest thing to do is to just allow the #NM from kernel
space, but fix the race in setting and clearing CR0.TS that this all
exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
atomic wrt scheduling, so while the actual state save/restore can be
interrupted and restarted, the act of actually clearing/setting CR0.TS
and the TS_USEDFPU bit together must not.

Instead of just adding random "preempt_disable/enable()" calls to what
is already excessively ugly code, this introduces some helper functions
that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
the user state instead.

Those helper functions should probably eventually replace the other
ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
some more: the task switching functionality in particular needs to
expose the difference between the 'prev' and 'next' threads, while the
new helper functions intentionally were written to only work with
'current'.

Signed-off-by: Linus Torvalds <[email protected]>
Cc: [email protected] # Fixes preemption bug and helps backporting
---
arch/x86/include/asm/i387.h | 42 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/xsave.c | 10 +++-------
2 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 262bea981aa5..6e87fa43c357 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -400,6 +400,48 @@ static inline void irq_ts_restore(int TS_state)
}

/*
+ * The question "does this thread have fpu access?"
+ * is slightly racy, since preemption could come in
+ * and revoke it immediately after the test.
+ *
+ * However, even in that very unlikely scenario,
+ * we can just assume we have FPU access - typically
+ * to save the FP state - we'll just take a #NM
+ * fault and get the FPU access back.
+ *
+ * The actual user_fpu_begin/end() functions
+ * need to be preemption-safe, though.
+ *
+ * NOTE! user_fpu_end() must be used only after you
+ * have saved the FP state, and user_fpu_begin() must
+ * be used only immediately before restoring it.
+ * These functions do not do any save/restore on
+ * their own.
+ */
+static inline int user_has_fpu(void)
+{
+ return current_thread_info()->status & TS_USEDFPU;
+}
+
+static inline void user_fpu_end(void)
+{
+ preempt_disable();
+ current_thread_info()->status &= ~TS_USEDFPU;
+ stts();
+ preempt_enable();
+}
+
+static inline void user_fpu_begin(void)
+{
+ preempt_disable();
+ if (!user_has_fpu()) {
+ clts();
+ current_thread_info()->status |= TS_USEDFPU;
+ }
+ preempt_enable();
+}
+
+/*
* These disable preemption on their own and are safe
*/
static inline void save_init_fpu(struct task_struct *tsk)
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a3911343976b..86f1f09a738a 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -168,7 +168,7 @@ int save_i387_xstate(void __user *buf)
if (!used_math())
return 0;

- if (task_thread_info(tsk)->status & TS_USEDFPU) {
+ if (user_has_fpu()) {
if (use_xsave())
err = xsave_user(buf);
else
@@ -176,8 +176,7 @@ int save_i387_xstate(void __user *buf)

if (err)
return err;
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
+ user_fpu_end();
} else {
sanitize_i387_state(tsk);
if (__copy_to_user(buf, &tsk->thread.fpu.state->fxsave,
@@ -292,10 +291,7 @@ int restore_i387_xstate(void __user *buf)
return err;
}

- if (!(task_thread_info(current)->status & TS_USEDFPU)) {
- clts();
- task_thread_info(current)->status |= TS_USEDFPU;
- }
+ user_fpu_begin();
if (use_xsave())
err = restore_user_xstate(buf);
else
--
1.7.9.188.g12766.dirty

2012-02-22 20:58:16

by Linus Torvalds

[permalink] [raw]

Subject: [PATCH 4/5] i387: move TS_USEDFPU clearing out of __save_init_fpu and into callers

From: Linus Torvalds <[email protected]>
Date: Thu, 16 Feb 2012 12:22:48 -0800

[ Upstream commit b6c66418dcad0fcf83cd1d0a39482db37bf4fc41 ]

Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do
it. By moving it into the callers, we always do the TS_USEDFPU next to
the CR0.TS accesses in the source code, and it's much easier to see how
the two go hand in hand.

Signed-off-by: Linus Torvalds <[email protected]>
Cc: [email protected] # Helps backporting
---
arch/x86/include/asm/i387.h | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 6e87fa43c357..55fb3aa84b0d 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -259,7 +259,6 @@ static inline void fpu_save_init(struct fpu *fpu)
static inline void __save_init_fpu(struct task_struct *tsk)
{
fpu_save_init(&tsk->thread.fpu);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
}

static inline int fpu_fxrstor_checking(struct fpu *fpu)
@@ -290,6 +289,7 @@ static inline void __unlazy_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
__save_init_fpu(tsk);
+ task_thread_info(tsk)->status &= ~TS_USEDFPU;
stts();
} else
tsk->fpu_counter = 0;
@@ -356,9 +356,11 @@ static inline void kernel_fpu_begin(void)

WARN_ON_ONCE(!irq_fpu_usable());
preempt_disable();
- if (me->status & TS_USEDFPU)
+ if (me->status & TS_USEDFPU) {
__save_init_fpu(me->task);
- else
+ me->status &= ~TS_USEDFPU;
+ /* We do 'stts()' in kernel_fpu_end() */
+ } else
clts();
}

@@ -449,6 +451,7 @@ static inline void save_init_fpu(struct task_struct *tsk)
WARN_ON_ONCE(!(task_thread_info(tsk)->status & TS_USEDFPU));
preempt_disable();
__save_init_fpu(tsk);
+ task_thread_info(tsk)->status &= ~TS_USEDFPU;
stts();
preempt_enable();
}
--
1.7.9.188.g12766.dirty

2012-02-22 20:59:30

by Linus Torvalds

[permalink] [raw]

Subject: [PATCH 5/5] i387: move TS_USEDFPU flag from thread_info to task_struct

From: Linus Torvalds <[email protected]>
Date: Thu, 16 Feb 2012 13:33:12 -0800

[ Upstream commits 6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 and
f94edacf998516ac9d849f7bc6949a703977a7f3 ]

This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.

This fixes two independent bugs at the same time:

- changing 'thread_info->status' from the scheduler causes nasty
problems for the other users of that variable, since it is defined to
be thread-synchronous (that's what the "TS_" part of the naming was
supposed to indicate).

So perfectly valid code could (and did) do

ti->status |= TS_RESTORE_SIGMASK;

and the compiler was free to do that as separate load, or and store
instructions. Which can cause problems with preemption, since a task
switch could happen in between, and change the TS_USEDFPU bit. The
change to TS_USEDFPU would be overwritten by the final store.

In practice, this seldom happened, though, because the 'status' field
was seldom used more than once, so gcc would generally tend to
generate code that used a read-modify-write instruction and thus
happened to avoid this problem - RMW instructions are naturally low
fat and preemption-safe.

- On x86-32, the current_thread_info() pointer would, during interrupts
and softirqs, point to a *copy* of the real thread_info, because
x86-32 uses %esp to calculate the thread_info address, and thus the
separate irq (and softirq) stacks would cause these kinds of odd
thread_info copy aliases.

This is normally not a problem, since interrupts aren't supposed to
look at thread information anyway (what thread is running at
interrupt time really isn't very well-defined), but it confused the
heck out of irq_fpu_usable() and the code that tried to squirrel
away the FPU state.

(It also caused untold confusion for us poor kernel developers).

It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling). And the FPU data that we are going to save/restore is
found there too.

Thanks to Arjan Van De Ven <[email protected]> for pointing us to
the %esp issue.

Cc: Arjan van de Ven <[email protected]>
Reported-and-tested-by: Raphael Prevost <[email protected]>
Acked-and-tested-by: Suresh Siddha <[email protected]>
Tested-by: Peter Anvin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Cc: [email protected] # The main bugfix
---

This is the actual main bugfix.

arch/x86/include/asm/i387.h | 79 ++++++++++++++++++++++++++----------
arch/x86/include/asm/processor.h | 1 +
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/kernel/traps.c | 8 +--
arch/x86/kernel/xsave.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
6 files changed, 63 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 55fb3aa84b0d..1c485363a21e 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -280,6 +280,47 @@ static inline int restore_fpu_checking(struct task_struct *tsk)
}

/*
+ * Software FPU state helpers. Careful: these need to
+ * be preemption protection *and* they need to be
+ * properly paired with the CR0.TS changes!
+ */
+static inline int __thread_has_fpu(struct task_struct *tsk)
+{
+ return tsk->thread.has_fpu;
+}
+
+/* Must be paired with an 'stts' after! */
+static inline void __thread_clear_has_fpu(struct task_struct *tsk)
+{
+ tsk->thread.has_fpu = 0;
+}
+
+/* Must be paired with a 'clts' before! */
+static inline void __thread_set_has_fpu(struct task_struct *tsk)
+{
+ tsk->thread.has_fpu = 1;
+}
+
+/*
+ * Encapsulate the CR0.TS handling together with the
+ * software flag.
+ *
+ * These generally need preemption protection to work,
+ * do try to avoid using these on their own.
+ */
+static inline void __thread_fpu_end(struct task_struct *tsk)
+{
+ __thread_clear_has_fpu(tsk);
+ stts();
+}
+
+static inline void __thread_fpu_begin(struct task_struct *tsk)
+{
+ clts();
+ __thread_set_has_fpu(tsk);
+}
+
+/*
* Signal frame handlers...
*/
extern int save_i387_xstate(void __user *buf);
@@ -287,23 +328,21 @@ extern int restore_i387_xstate(void __user *buf);

static inline void __unlazy_fpu(struct task_struct *tsk)
{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
+ if (__thread_has_fpu(tsk)) {
__save_init_fpu(tsk);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
+ __thread_fpu_end(tsk);
} else
tsk->fpu_counter = 0;
}

static inline void __clear_fpu(struct task_struct *tsk)
{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
+ if (__thread_has_fpu(tsk)) {
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
"2:\n"
_ASM_EXTABLE(1b, 2b));
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
+ __thread_fpu_end(tsk);
}
}

@@ -311,14 +350,14 @@ static inline void __clear_fpu(struct task_struct *tsk)
* Were we in an interrupt that interrupted kernel mode?
*
* We can do a kernel_fpu_begin/end() pair *ONLY* if that
- * pair does nothing at all: TS_USEDFPU must be clear (so
+ * pair does nothing at all: the thread must not have fpu (so
* that we don't try to save the FPU state), and TS must
* be set (so that the clts/stts pair does nothing that is
* visible in the interrupted kernel thread).
*/
static inline bool interrupted_kernel_fpu_idle(void)
{
- return !(current_thread_info()->status & TS_USEDFPU) &&
+ return !__thread_has_fpu(current) &&
(read_cr0() & X86_CR0_TS);
}

@@ -352,13 +391,13 @@ static inline bool irq_fpu_usable(void)

static inline void kernel_fpu_begin(void)
{
- struct thread_info *me = current_thread_info();
+ struct task_struct *me = current;

WARN_ON_ONCE(!irq_fpu_usable());
preempt_disable();
- if (me->status & TS_USEDFPU) {
- __save_init_fpu(me->task);
- me->status &= ~TS_USEDFPU;
+ if (__thread_has_fpu(me)) {
+ __save_init_fpu(me);
+ __thread_clear_has_fpu(me);
/* We do 'stts()' in kernel_fpu_end() */
} else
clts();
@@ -422,24 +461,21 @@ static inline void irq_ts_restore(int TS_state)
*/
static inline int user_has_fpu(void)
{
- return current_thread_info()->status & TS_USEDFPU;
+ return __thread_has_fpu(current);
}

static inline void user_fpu_end(void)
{
preempt_disable();
- current_thread_info()->status &= ~TS_USEDFPU;
- stts();
+ __thread_fpu_end(current);
preempt_enable();
}

static inline void user_fpu_begin(void)
{
preempt_disable();
- if (!user_has_fpu()) {
- clts();
- current_thread_info()->status |= TS_USEDFPU;
- }
+ if (!user_has_fpu())
+ __thread_fpu_begin(current);
preempt_enable();
}

@@ -448,11 +484,10 @@ static inline void user_fpu_begin(void)
*/
static inline void save_init_fpu(struct task_struct *tsk)
{
- WARN_ON_ONCE(!(task_thread_info(tsk)->status & TS_USEDFPU));
+ WARN_ON_ONCE(!__thread_has_fpu(tsk));
preempt_disable();
__save_init_fpu(tsk);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
+ __thread_fpu_end(tsk);
preempt_enable();
}

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b650435ffb53..bb3ee3629a0f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -456,6 +456,7 @@ struct thread_struct {
unsigned long trap_no;
unsigned long error_code;
/* floating point and extended processor state */
+ unsigned long has_fpu;
struct fpu fpu;
#ifdef CONFIG_X86_32
/* Virtual 86 mode info */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a1fe5c127b52..d7ef849714a1 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -242,8 +242,6 @@ static inline struct thread_info *current_thread_info(void)
* ever touches our thread-synchronous status, so we don't
* have to worry about atomic accesses.
*/
-#define TS_USEDFPU 0x0001 /* FPU was used by this task
- this quantum (SMP) */
#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
#define TS_POLLING 0x0004 /* idle task polling need_resched,
skip sending interrupt */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 727e6c16f294..a67d0a833a44 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -574,12 +574,11 @@ void __math_state_restore(void)
* Paranoid restore. send a SIGSEGV if we fail to restore the state.
*/
if (unlikely(restore_fpu_checking(tsk))) {
- stts();
+ __thread_fpu_end(tsk);
force_sig(SIGSEGV, tsk);
return;
}

- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
tsk->fpu_counter++;
}

@@ -595,8 +594,7 @@ void __math_state_restore(void)
*/
void math_state_restore(void)
{
- struct thread_info *thread = current_thread_info();
- struct task_struct *tsk = thread->task;
+ struct task_struct *tsk = current;

if (!tsk_used_math(tsk)) {
local_irq_enable();
@@ -613,7 +611,7 @@ void math_state_restore(void)
local_irq_disable();
}

- clts(); /* Allow maths ops (or we recurse) */
+ __thread_fpu_begin(tsk);

__math_state_restore();
}
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 86f1f09a738a..711091114119 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -47,7 +47,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
if (!fx)
return;

- BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);
+ BUG_ON(__thread_has_fpu(tsk));

xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 579a0b51696a..4ea76784fe65 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1456,7 +1456,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
#ifdef CONFIG_X86_64
wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
#endif
- if (current_thread_info()->status & TS_USEDFPU)
+ if (__thread_has_fpu(current))
clts();
load_gdt(&__get_cpu_var(host_gdt));
}
--
1.7.9.188.g12766.dirty

2012-02-22 21:02:40

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/22/2012 12:54 PM, Linus Torvalds wrote:
>
> I *really* hope that the people who could reproduce this bug will test the
> back-port series too, since I never actually saw the bug personally to
> begin with. And again, big thanks to Raphael who helped pinpoint and debug
> this.
>

I will run my tests on it overnight.

-hpa

2012-02-22 21:20:13

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 12:54:58PM -0800, Linus Torvalds wrote:
>
> This is a reasonably minimal backport of the i387 state save/restore bug.
>
> A few of the commits are just minimal "make it easier to backport" commits
> that don't necessarily fix anything on their own. And a few of the others
> are combinations of what is two commits in the development tree, because
> it turned out to be easier and cleaner that way.
>
> The last one is the one that fixes the x86-32 bug, but the preemption
> fixes are real fixes too, although they are probably not something that
> anybody has necessarily ever hit in reality because the race window is so
> small. Even so, if the fix itself isn't that important, the "make it
> easier to backport the main one" would still be a big argument for it.
>
> I *really* hope that the people who could reproduce this bug will test the
> back-port series too, since I never actually saw the bug personally to
> begin with. And again, big thanks to Raphael who helped pinpoint and debug
> this.

Thanks for doing the backport. Any ideas on how far back this problem
goes?

thanks,

greg k-h

2012-02-22 21:29:34

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 1:19 PM, Greg Kroah-Hartman
<[email protected]> wrote:
>
> Thanks for doing the backport. ?Any ideas on how far back this problem
> goes?

The fundamental bug goes back forever, but happily afaik you can only
*trigger* it by doing FPU accesses from interrupts, and nobody did
that until the AES-NI instructions came about.

So practically speaking it goes back to the introduction of
CRYPTO_AES_NI_INTEL, in commit 54b6a1bd5364 ("crypto: aes-ni - Add
support to Intel AES-NI instructions for x86_64 platform").

Which was merged into 2.6.30. So it still goes back pretty far.

The good news is that I *think* the whole i387 handling code hasn't
been touched much. But I didn't really check deeply.

Linus

2012-02-22 21:31:18

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 1:29 PM, Linus Torvalds
<[email protected]> wrote:
>
> The good news is that I *think* the whole i387 handling code hasn't
> been touched much. But I didn't really check deeply.

Btw, the backport was done to 3.2.0, in case you wonder. So that's
what the patches apply to. Whether they work, and whether they apply
to anything else, I'll leave for others to check ;)

Linus

2012-02-22 21:34:22

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 01:29:11PM -0800, Linus Torvalds wrote:
> On Wed, Feb 22, 2012 at 1:19 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > Thanks for doing the backport. ?Any ideas on how far back this problem
> > goes?
>
> The fundamental bug goes back forever, but happily afaik you can only
> *trigger* it by doing FPU accesses from interrupts, and nobody did
> that until the AES-NI instructions came about.
>
> So practically speaking it goes back to the introduction of
> CRYPTO_AES_NI_INTEL, in commit 54b6a1bd5364 ("crypto: aes-ni - Add
> support to Intel AES-NI instructions for x86_64 platform").
>
> Which was merged into 2.6.30. So it still goes back pretty far.
>
> The good news is that I *think* the whole i387 handling code hasn't
> been touched much. But I didn't really check deeply.

Ok, I'll see how far back I can backport it easily, after Peter verifies
that this series works for him on his box.

thanks,

greg k-h

2012-02-22 22:35:08

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/22/2012 12:54 PM, Linus Torvalds wrote:
>
> This is a reasonably minimal backport of the i387 state save/restore bug.
>
> A few of the commits are just minimal "make it easier to backport" commits
> that don't necessarily fix anything on their own. And a few of the others
> are combinations of what is two commits in the development tree, because
> it turned out to be easier and cleaner that way.
>
> The last one is the one that fixes the x86-32 bug, but the preemption
> fixes are real fixes too, although they are probably not something that
> anybody has necessarily ever hit in reality because the race window is so
> small. Even so, if the fix itself isn't that important, the "make it
> easier to backport the main one" would still be a big argument for it.
>
> I *really* hope that the people who could reproduce this bug will test the
> back-port series too, since I never actually saw the bug personally to
> begin with. And again, big thanks to Raphael who helped pinpoint and debug
> this.
>

Okay, this patchset does *NOT* work. It fails immediately on my system.
I hadn't ever seen it fail so fast.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-22 22:45:32

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/22/2012 02:34 PM, H. Peter Anvin wrote:
>
> Okay, this patchset does *NOT* work. It fails immediately on my system.
> I hadn't ever seen it fail so fast.
>

That was testing on top of 3.2.7, fwiw.

-hpa

2012-02-22 23:16:18

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 2:34 PM, H. Peter Anvin <[email protected]> wrote:
>
> Okay, this patchset does *NOT* work. ?It fails immediately on my system.
> ?I hadn't ever seen it fail so fast.

Ok, I probably missed something when I skipped the patches that
weren't appropriate for backporting (like the whole FPU state preload
removal and rewriting).

Back to the drawing board.

Linus

2012-02-22 23:32:12

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 3:15 PM, Linus Torvalds
<[email protected]> wrote:
>
> Ok, I probably missed something when I skipped the patches that
> weren't appropriate for backporting (like the whole FPU state preload
> removal and rewriting).

Ahh. Yes, I see what happened.

The FPU preloading patches cleaned up __switch_to() to use the
appropriate helpers to set TS_USEDFPU, but since I skipped them
(because I felt they weren't appropriate for -stable), those cleanups
got missed. So then the __math_state_restore()

Only the last patch needs fixing, methinks. Updated version attached.

Linus

Attachments:

0005-i387-move-TS_USEDFPU-flag-from-thread_info-to-task_s.patch (10.17 kB)

2012-02-23 00:14:50

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/22/2012 03:31 PM, Linus Torvalds wrote:
> On Wed, Feb 22, 2012 at 3:15 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Ok, I probably missed something when I skipped the patches that
>> weren't appropriate for backporting (like the whole FPU state preload
>> removal and rewriting).
>
> Ahh. Yes, I see what happened.
>
> The FPU preloading patches cleaned up __switch_to() to use the
> appropriate helpers to set TS_USEDFPU, but since I skipped them
> (because I felt they weren't appropriate for -stable), those cleanups
> got missed. So then the __math_state_restore()
>
> Only the last patch needs fixing, methinks. Updated version attached.
>
> Linus

Immediate failure, still.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-23 00:26:06

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 4:14 PM, H. Peter Anvin <[email protected]> wrote:
>
> Immediate failure, still.

Damn, I'll have to look closer at what else is missing then.

I really didn't want to back-port the whole series as-is, but that
would certainly have been easier.

Linus

2012-02-23 00:37:13

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 04:25:42PM -0800, Linus Torvalds wrote:
> On Wed, Feb 22, 2012 at 4:14 PM, H. Peter Anvin <[email protected]> wrote:
> >
> > Immediate failure, still.
>
> Damn, I'll have to look closer at what else is missing then.
>
> I really didn't want to back-port the whole series as-is, but that
> would certainly have been easier.

I can do that, it wouldn't be that hard, just give me the git commit
ids to apply.

I can do a release with just these fixes in it if you think that would
make it easier to test/verify.

thanks,

greg k-h

2012-02-23 02:25:34

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 23.02.2012 08:37, Greg Kroah-Hartman wrote:
> On Wed, Feb 22, 2012 at 04:25:42PM -0800, Linus Torvalds wrote:
>> On Wed, Feb 22, 2012 at 4:14 PM, H. Peter Anvin <[email protected]>
>> wrote:
>> >
>> > Immediate failure, still.
>>
>> Damn, I'll have to look closer at what else is missing then.
>>
>> I really didn't want to back-port the whole series as-is, but that
>> would certainly have been easier.

Hello,

Thank you for backporting this patchset to -stable. FWIW, the test
machine I had been working with has an uptime of 4 days now, with the
patchset in attachment applied on top of 3.2.6, so if it were
unpractical to trim it down further you can find solace in that it does
not break anything.

I'll be glad to help test any new patchset as well!

Also, the issue started before 2.6.30, 2.6.39 had the same issue, and
probably 2.6.38 even I did not test it for a while. 2.6.37 was the last
known "safe" release WRT this bug.

>
> I can do that, it wouldn't be that hard, just give me the git commit
> ids to apply.
>
> I can do a release with just these fixes in it if you think that
> would
> make it easier to test/verify.
>
> thanks,
>
> greg k-h

Attachments:

fpu-patch-1.diff (1.88 kB)
fpu-patch-2.diff (4.16 kB)
fpu-patch-3.diff (1.26 kB)
fpu-patch-4.diff (5.66 kB)
fpu-patch-5.diff (1.98 kB)
fpu-patch-6.diff (6.97 kB)
fpu-patch-9.diff (7.41 kB)
fpu-patch-81.diff (6.84 kB)
fpu-patch-82.diff (4.72 kB)
fpu-patch-91.diff (11.54 kB)
Download all attachments

2012-02-23 02:55:56

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 5:47 PM, <[email protected]> wrote:
>
> Thank you for backporting this patchset to -stable. FWIW, the test machine I
> had been working with has an uptime of 4 days now, with the patchset in
> attachment applied on top of 3.2.6, so if it were unpractical to trim it
> down further you can find solace in that it does not break anything.

Hmm. The patches in your attachements are whitespace-damaged. I was
going to apply that series and see what the difference was to my
minimal trial, but with the corruption that isn't possible.

I didn't find anything obviously wrong in my series, so..

Could you send the patches you used for your backport with the
whitespace fixed, and preferably with the patch numbering explained?

Linus

2012-02-23 03:12:35

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 23.02.2012 10:55, Linus Torvalds wrote:
> On Wed, Feb 22, 2012 at 5:47 PM, <[email protected]> wrote:
>>
>> Thank you for backporting this patchset to -stable. FWIW, the test
>> machine I
>> had been working with has an uptime of 4 days now, with the patchset
>> in
>> attachment applied on top of 3.2.6, so if it were unpractical to
>> trim it
>> down further you can find solace in that it does not break anything.
>
> Hmm. The patches in your attachements are whitespace-damaged. I was
> going to apply that series and see what the difference was to my
> minimal trial, but with the corruption that isn't possible.
>
> I didn't find anything obviously wrong in my series, so..
>
> Could you send the patches you used for your backport with the
> whitespace fixed, and preferably with the patch numbering explained?

The numbering is just so I can apply the patches in the right order
with a for loop in the packaging script. The missing 7* was the
experimental patches we tried which moved has_fpu in the thread_info
struct (which did not work).

The patchset is simply made of:
be98c2cdb15ba26148cd2bd58a857d4f7759ed38 (unmodified)
5b1cbac37798805c1fee18c8cebe5c0a13975b17 (")
c38e23456278e967f094b08247ffc3711b1029b2 (")
15d8791cae75dca27bfda8ecfe87dca9379d6bb0 (")
b6c66418dcad0fcf83cd1d0a39482db37bf4fc41 (")
6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 (")
b3b0870ef3ffed72b92415423da864f440f57ad6 (")
4903062b5485f0e2c286a23b44c9b59d9b017d53: this one requires a slight
modification:
-#define safe_address (kstat_cpu(0).cpustat.user)
instead of:
-#define safe_address
(__get_cpu_var(kernel_cpustat).cpustat[CPUTIME_USER])
f94edacf998516ac9d849f7bc6949a703977a7f3 (unmodified)
34ddc81a230b15c0e345b6b253049db731499f7e (")

>
> Linus

2012-02-23 03:37:54

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 6:41 PM, <[email protected]> wrote:
>
> The patchset is simply made of:
> be98c2cdb15ba26148cd2bd58a857d4f7759ed38 (unmodified)
> 5b1cbac37798805c1fee18c8cebe5c0a13975b17 (")
> c38e23456278e967f094b08247ffc3711b1029b2 (")
> 15d8791cae75dca27bfda8ecfe87dca9379d6bb0 (")
> b6c66418dcad0fcf83cd1d0a39482db37bf4fc41 (")
> 6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 (")
> b3b0870ef3ffed72b92415423da864f440f57ad6 (")
> 4903062b5485f0e2c286a23b44c9b59d9b017d53: this one requires a slight
> modification:
> -#define safe_address (kstat_cpu(0).cpustat.user)
> instead of:
> -#define safe_address (__get_cpu_var(kernel_cpustat).cpustat[CPUTIME_USER])
> f94edacf998516ac9d849f7bc6949a703977a7f3 (unmodified)
> 34ddc81a230b15c0e345b6b253049db731499f7e (")

Oh, Greg, since that series is tested by Raphael on top of 3.2.6
already, let's just make that be the stable backport.

It does mean that the stable backport will contain that whole "rip out
fp state preloading and reimplement it" thing, but hey, considering
that when I tried to avoid it I clearly screwed something up, maybe
that's all for the best. And perhaps staying closer to the development
tree is a good idea anyway in case there are any other issues.

Linus

2012-02-23 14:04:41

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 1/5] i387: math_state_restore() isn't called from asm

> + * Must be called with kernel preemption disabled (eg with local
> + * local interrupts as in the case of do_device_not_available).

"eg with local local interrupts"?

As long as I'm quibbling, "exempli gratia" deserves periods to make the
abbreviation, and a parenthetical comment at the end of a sentence is
usually better set off with a comma.

+ * Must be called with kernel preemption disabled, e.g. with local
+ * interrupts disabled as in the case of do_device_not_available.

2012-02-23 18:15:40

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 07:37:27PM -0800, Linus Torvalds wrote:
> On Wed, Feb 22, 2012 at 6:41 PM, <[email protected]> wrote:
> >
> > The patchset is simply made of:
> > be98c2cdb15ba26148cd2bd58a857d4f7759ed38 (unmodified)
> > 5b1cbac37798805c1fee18c8cebe5c0a13975b17 (")
> > c38e23456278e967f094b08247ffc3711b1029b2 (")
> > 15d8791cae75dca27bfda8ecfe87dca9379d6bb0 (")
> > b6c66418dcad0fcf83cd1d0a39482db37bf4fc41 (")
> > 6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 (")
> > b3b0870ef3ffed72b92415423da864f440f57ad6 (")
> > 4903062b5485f0e2c286a23b44c9b59d9b017d53: this one requires a slight
> > modification:
> > -#define safe_address (kstat_cpu(0).cpustat.user)
> > instead of:
> > -#define safe_address (__get_cpu_var(kernel_cpustat).cpustat[CPUTIME_USER])
> > f94edacf998516ac9d849f7bc6949a703977a7f3 (unmodified)
> > 34ddc81a230b15c0e345b6b253049db731499f7e (")
>
> Oh, Greg, since that series is tested by Raphael on top of 3.2.6
> already, let's just make that be the stable backport.
>
> It does mean that the stable backport will contain that whole "rip out
> fp state preloading and reimplement it" thing, but hey, considering
> that when I tried to avoid it I clearly screwed something up, maybe
> that's all for the best. And perhaps staying closer to the development
> tree is a good idea anyway in case there are any other issues.

I prefer the "staying closer" model, it works out better if there are
problems found later on. I'll work on queuing these up now.

thanks,

greg k-h

2012-02-23 19:36:35

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 10:41:57AM +0800, [email protected] wrote:
> On 23.02.2012 10:55, Linus Torvalds wrote:
> >On Wed, Feb 22, 2012 at 5:47 PM, <[email protected]> wrote:
> >>
> >>Thank you for backporting this patchset to -stable. FWIW, the
> >>test machine I
> >>had been working with has an uptime of 4 days now, with the
> >>patchset in
> >>attachment applied on top of 3.2.6, so if it were unpractical to
> >>trim it
> >>down further you can find solace in that it does not break anything.
> >
> >Hmm. The patches in your attachements are whitespace-damaged. I was
> >going to apply that series and see what the difference was to my
> >minimal trial, but with the corruption that isn't possible.
> >
> >I didn't find anything obviously wrong in my series, so..
> >
> >Could you send the patches you used for your backport with the
> >whitespace fixed, and preferably with the patch numbering explained?
>
> The numbering is just so I can apply the patches in the right order
> with a for loop in the packaging script. The missing 7* was the
> experimental patches we tried which moved has_fpu in the thread_info
> struct (which did not work).
>
> The patchset is simply made of:
> be98c2cdb15ba26148cd2bd58a857d4f7759ed38 (unmodified)
> 5b1cbac37798805c1fee18c8cebe5c0a13975b17 (")
> c38e23456278e967f094b08247ffc3711b1029b2 (")
> 15d8791cae75dca27bfda8ecfe87dca9379d6bb0 (")
> b6c66418dcad0fcf83cd1d0a39482db37bf4fc41 (")
> 6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 (")
> b3b0870ef3ffed72b92415423da864f440f57ad6 (")

These all applied fine, I've queued them up so far.

> 4903062b5485f0e2c286a23b44c9b59d9b017d53: this one requires a slight
> modification:
> -#define safe_address (kstat_cpu(0).cpustat.user)
> instead of:
> -#define safe_address
> (__get_cpu_var(kernel_cpustat).cpustat[CPUTIME_USER])

Hm, for 3.0-stable, yes, that's all that seems to be needed, but for
3.2-stable, something is odd here, there's some changes that are missing
here to get this correct (odds are you didn't test this on a AMD
processor, that's the odd portion of the merge.)

I'll dig into this some more, maybe we need another patch here...

greg k-h

2012-02-23 19:42:13

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 11:36 AM, Greg Kroah-Hartman
<[email protected]> wrote:
>

The insane safe_address games go away in the cleanup patch, so it
really shouldn't be much of an issue.

In the end, save_address should just be

#define safe_address (tsk->thread.has_fpu)

(and in fact the whole #define got removed entirely in mainline in
commit 80ab6f1e8c98, which might be fodder for -stable too)

Linus

2012-02-23 19:50:15

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 11:41:50AM -0800, Linus Torvalds wrote:
> On Thu, Feb 23, 2012 at 11:36 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
>
> The insane safe_address games go away in the cleanup patch, so it
> really shouldn't be much of an issue.
>
> In the end, save_address should just be
>
> #define safe_address (tsk->thread.has_fpu)
>
> (and in fact the whole #define got removed entirely in mainline in
> commit 80ab6f1e8c98, which might be fodder for -stable too)

That's not where the merge caused problems, it was in
arch/x86/kernel/traps.c and arch/x86/kernel/process_64.c, either I'm
missing a patch that needs to be added to the series, or something else
is odd, let me dig...

greg k-h

>
> Linus

2012-02-23 19:55:17

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 11:50:07AM -0800, Greg Kroah-Hartman wrote:
> On Thu, Feb 23, 2012 at 11:41:50AM -0800, Linus Torvalds wrote:
> > On Thu, Feb 23, 2012 at 11:36 AM, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> >
> > The insane safe_address games go away in the cleanup patch, so it
> > really shouldn't be much of an issue.
> >
> > In the end, save_address should just be
> >
> > #define safe_address (tsk->thread.has_fpu)
> >
> > (and in fact the whole #define got removed entirely in mainline in
> > commit 80ab6f1e8c98, which might be fodder for -stable too)
>
> That's not where the merge caused problems, it was in
> arch/x86/kernel/traps.c and arch/x86/kernel/process_64.c, either I'm
> missing a patch that needs to be added to the series, or something else
> is odd, let me dig...

No, something is messed up in my tree, let me try this again, this patch
isn't the merge problem, something else is going on...

greg

2012-02-23 20:03:01

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 11:55:11AM -0800, Greg Kroah-Hartman wrote:
> On Thu, Feb 23, 2012 at 11:50:07AM -0800, Greg Kroah-Hartman wrote:
> > On Thu, Feb 23, 2012 at 11:41:50AM -0800, Linus Torvalds wrote:
> > > On Thu, Feb 23, 2012 at 11:36 AM, Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > >
> > > The insane safe_address games go away in the cleanup patch, so it
> > > really shouldn't be much of an issue.
> > >
> > > In the end, save_address should just be
> > >
> > > #define safe_address (tsk->thread.has_fpu)
> > >
> > > (and in fact the whole #define got removed entirely in mainline in
> > > commit 80ab6f1e8c98, which might be fodder for -stable too)
> >
> > That's not where the merge caused problems, it was in
> > arch/x86/kernel/traps.c and arch/x86/kernel/process_64.c, either I'm
> > missing a patch that needs to be added to the series, or something else
> > is odd, let me dig...
>
> No, something is messed up in my tree, let me try this again, this patch
> isn't the merge problem, something else is going on...

Ok, that was operator error on my side, it should all now be
straightened out and the patches applied just like was expected.

sorry for the noise.

greg k-h

2012-02-23 20:14:45

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Wed, Feb 22, 2012 at 01:32:53PM -0800, Greg Kroah-Hartman wrote:
> On Wed, Feb 22, 2012 at 01:29:11PM -0800, Linus Torvalds wrote:
> > On Wed, Feb 22, 2012 at 1:19 PM, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > Thanks for doing the backport. ?Any ideas on how far back this problem
> > > goes?
> >
> > The fundamental bug goes back forever, but happily afaik you can only
> > *trigger* it by doing FPU accesses from interrupts, and nobody did
> > that until the AES-NI instructions came about.
> >
> > So practically speaking it goes back to the introduction of
> > CRYPTO_AES_NI_INTEL, in commit 54b6a1bd5364 ("crypto: aes-ni - Add
> > support to Intel AES-NI instructions for x86_64 platform").
> >
> > Which was merged into 2.6.30. So it still goes back pretty far.
> >
> > The good news is that I *think* the whole i387 handling code hasn't
> > been touched much. But I didn't really check deeply.
>
> Ok, I'll see how far back I can backport it easily, after Peter verifies
> that this series works for him on his box.

I've applied this now to the 3.0 and 3.2-stable trees. It looks like it
would work on the 2.6.32-stable tree, but it needs some tweaks, and as I
can't really test this, and the .32-stable tree is probably not going to
have problems in this area, I'll let someone else generate those patches
and test them if they feel it is needed there.

thanks,

greg k-h

2012-02-23 20:32:17

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 12:09 PM, Greg Kroah-Hartman wrote:
>
> I've applied this now to the 3.0 and 3.2-stable trees. It looks like it
> would work on the 2.6.32-stable tree, but it needs some tweaks, and as I
> can't really test this, and the .32-stable tree is probably not going to
> have problems in this area, I'll let someone else generate those patches
> and test them if they feel it is needed there.
>

It doesn't seem to appear in the stable git tree... anywhere I can get
your tree so I can test these out?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-23 20:48:50

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 12:29:56PM -0800, H. Peter Anvin wrote:
> On 02/23/2012 12:09 PM, Greg Kroah-Hartman wrote:
> >
> > I've applied this now to the 3.0 and 3.2-stable trees. It looks like it
> > would work on the 2.6.32-stable tree, but it needs some tweaks, and as I
> > can't really test this, and the .32-stable tree is probably not going to
> > have problems in this area, I'll let someone else generate those patches
> > and test them if they feel it is needed there.
> >
>
> It doesn't seem to appear in the stable git tree... anywhere I can get
> your tree so I can test these out?

They are in the stable-queue git tree at:
/pub/scm/linux/kernel/git/stable/stable-queue.git
on git.kernel.org

Testing them out would be great, if you want, I can knock up a git tree
with just these patches in it on 3.2.7 or anything else you need to test
with, if that makes it easier for you.

thanks,

greg k-h

2012-02-23 20:54:13

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 12:48 PM, Greg Kroah-Hartman wrote:
> On Thu, Feb 23, 2012 at 12:29:56PM -0800, H. Peter Anvin wrote:
>> On 02/23/2012 12:09 PM, Greg Kroah-Hartman wrote:
>>>
>>> I've applied this now to the 3.0 and 3.2-stable trees. It looks like it
>>> would work on the 2.6.32-stable tree, but it needs some tweaks, and as I
>>> can't really test this, and the .32-stable tree is probably not going to
>>> have problems in this area, I'll let someone else generate those patches
>>> and test them if they feel it is needed there.
>>>
>>
>> It doesn't seem to appear in the stable git tree... anywhere I can get
>> your tree so I can test these out?
>
> They are in the stable-queue git tree at:
> /pub/scm/linux/kernel/git/stable/stable-queue.git
> on git.kernel.org
>
> Testing them out would be great, if you want, I can knock up a git tree
> with just these patches in it on 3.2.7 or anything else you need to test
> with, if that makes it easier for you.
>

That would be awesome.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-23 21:10:24

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 12:51:56PM -0800, H. Peter Anvin wrote:
> On 02/23/2012 12:48 PM, Greg Kroah-Hartman wrote:
> > On Thu, Feb 23, 2012 at 12:29:56PM -0800, H. Peter Anvin wrote:
> >> On 02/23/2012 12:09 PM, Greg Kroah-Hartman wrote:
> >>>
> >>> I've applied this now to the 3.0 and 3.2-stable trees. It looks like it
> >>> would work on the 2.6.32-stable tree, but it needs some tweaks, and as I
> >>> can't really test this, and the .32-stable tree is probably not going to
> >>> have problems in this area, I'll let someone else generate those patches
> >>> and test them if they feel it is needed there.
> >>>
> >>
> >> It doesn't seem to appear in the stable git tree... anywhere I can get
> >> your tree so I can test these out?
> >
> > They are in the stable-queue git tree at:
> > /pub/scm/linux/kernel/git/stable/stable-queue.git
> > on git.kernel.org
> >
> > Testing them out would be great, if you want, I can knock up a git tree
> > with just these patches in it on 3.2.7 or anything else you need to test
> > with, if that makes it easier for you.
> >
>
> That would be awesome.

Ok, the patches are at:
git.kernel.org/pub/scm/linux/kernel/git/gregkh/stable-test.git linux-3.2.y-i387-test

They are on top of the 3.2.7 kernel release.

If that doesn't work for you, please let me know.

thanks,

greg k-h

2012-02-23 21:52:57

by Willy Tarreau

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

Hi,

On Thu, Feb 23, 2012 at 01:10:16PM -0800, Greg Kroah-Hartman wrote:
> On Thu, Feb 23, 2012 at 12:51:56PM -0800, H. Peter Anvin wrote:
> > On 02/23/2012 12:48 PM, Greg Kroah-Hartman wrote:
> > > On Thu, Feb 23, 2012 at 12:29:56PM -0800, H. Peter Anvin wrote:
> > >> On 02/23/2012 12:09 PM, Greg Kroah-Hartman wrote:
> > >>>
> > >>> I've applied this now to the 3.0 and 3.2-stable trees. It looks like it
> > >>> would work on the 2.6.32-stable tree, but it needs some tweaks, and as I
> > >>> can't really test this, and the .32-stable tree is probably not going to
> > >>> have problems in this area, I'll let someone else generate those patches
> > >>> and test them if they feel it is needed there.
> > >>>
> > >>
> > >> It doesn't seem to appear in the stable git tree... anywhere I can get
> > >> your tree so I can test these out?
> > >
> > > They are in the stable-queue git tree at:
> > > /pub/scm/linux/kernel/git/stable/stable-queue.git
> > > on git.kernel.org
> > >
> > > Testing them out would be great, if you want, I can knock up a git tree
> > > with just these patches in it on 3.2.7 or anything else you need to test
> > > with, if that makes it easier for you.
> > >
> >
> > That would be awesome.
>
> Ok, the patches are at:
> git.kernel.org/pub/scm/linux/kernel/git/gregkh/stable-test.git linux-3.2.y-i387-test
>
> They are on top of the 3.2.7 kernel release.
>
> If that doesn't work for you, please let me know.

I would test this too, but apart from ensuring my kernel still boots,
how do I ensure the patches do really fix what the ought to fix ? I
must admit I didn't catch the initial issue they were supposed to fix
unfortunately :-/

Thanks,
Willy

2012-02-23 22:12:12

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 1:52 PM, Willy Tarreau <[email protected]> wrote:
>
> I would test this too, but apart from ensuring my kernel still boots,
> how do I ensure the patches do really fix what the ought to fix ? I
> must admit I didn't catch the initial issue they were supposed to fix
> unfortunately :-/

Almost nobody did.

This only happens on modern CPU's that support the new AES-NI
instructions, and only with a 32-bit kernel (although the very
unlikely preemption issues can happen on x86-64 too). And you need to
have the AES instructions called from interrupts, which probably only
happens with the mac80211 wireless networking stack.

And even then you need WPA2 to trigger it (I guess AES is sometimes
used with "extended WPA1" too, but I dunno).

So it's not impossible to trigger, but you do need to have a fairly
recent CPU that happily runs in 64-bit mode, and install a 32-bit
system on it. And it needs to use the right wireless setup.

It's possible that the right solution for really older kernels is just
to say that AES_NI depends on X86_64.

Linus

2012-02-23 22:27:43

by Willy Tarreau

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 02:11:48PM -0800, Linus Torvalds wrote:
> On Thu, Feb 23, 2012 at 1:52 PM, Willy Tarreau <[email protected]> wrote:
> >
> > I would test this too, but apart from ensuring my kernel still boots,
> > how do I ensure the patches do really fix what the ought to fix ? I
> > must admit I didn't catch the initial issue they were supposed to fix
> > unfortunately :-/
>
> Almost nobody did.
>
> This only happens on modern CPU's that support the new AES-NI
> instructions, and only with a 32-bit kernel (although the very
> unlikely preemption issues can happen on x86-64 too). And you need to
> have the AES instructions called from interrupts, which probably only
> happens with the mac80211 wireless networking stack.
>
> And even then you need WPA2 to trigger it (I guess AES is sometimes
> used with "extended WPA1" too, but I dunno).
>
> So it's not impossible to trigger, but you do need to have a fairly
> recent CPU that happily runs in 64-bit mode, and install a 32-bit
> system on it. And it needs to use the right wireless setup.

OK so indeed I will only be able to check that it boots :-/

> It's possible that the right solution for really older kernels is just
> to say that AES_NI depends on X86_64.

Might be a good idea !

Thanks for the explanation,
Willy

2012-02-23 22:39:05

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 2:27 PM, Willy Tarreau <[email protected]> wrote:
>
> OK so indeed I will only be able to check that it boots :-/

Well, we could do some trivial test-harness that just forces the issue
with regular timer interrupts (and even without AES-NI). I think Peter
talked about that when we were trying to hunt it down - but I think he
was then able to reproduce the problem without anything special and we
dropped it.

Essentially, just doing something like

if (irq_fpu_usable()) {
kernel_fpu_begin();
kernel_fpu_end();
}

in do_irq() and do_softirq() would stress-test things even without
wireless, and even without AES-NI.

You'd still need an x86-32 machine to test on, because x86-64 was
immune to this issue.

But yeah, the impact of this seems to be small enough that for older
kernels (which are likely used on older systems for maintenance
anyway) disabling AES-NI on x86-32 really might be the way to go.

Linus

2012-02-23 22:49:26

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 02:38 PM, Linus Torvalds wrote:
>
> You'd still need an x86-32 machine to test on, because x86-64 was
> immune to this issue.
>
> But yeah, the impact of this seems to be small enough that for older
> kernels (which are likely used on older systems for maintenance
> anyway) disabling AES-NI on x86-32 really might be the way to go.
>

That would really suck for users of encrypted hard disks.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-23 22:49:38

by Willy Tarreau

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 02:38:42PM -0800, Linus Torvalds wrote:
> On Thu, Feb 23, 2012 at 2:27 PM, Willy Tarreau <[email protected]> wrote:
> >
> > OK so indeed I will only be able to check that it boots :-/
>
> Well, we could do some trivial test-harness that just forces the issue
> with regular timer interrupts (and even without AES-NI). I think Peter
> talked about that when we were trying to hunt it down - but I think he
> was then able to reproduce the problem without anything special and we
> dropped it.
>
> Essentially, just doing something like
>
> if (irq_fpu_usable()) {
> kernel_fpu_begin();
> kernel_fpu_end();
> }
>
> in do_irq() and do_softirq() would stress-test things even without
> wireless, and even without AES-NI.

Interesting...

> You'd still need an x86-32 machine to test on, because x86-64 was
> immune to this issue.
>
> But yeah, the impact of this seems to be small enough that for older
> kernels (which are likely used on older systems for maintenance
> anyway) disabling AES-NI on x86-32 really might be the way to go.

I agree. I don't know anybody using AES-NI on purpose on older x86-32
systems!

Willy

2012-02-23 22:52:36

by Willy Tarreau

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 02:48:51PM -0800, H. Peter Anvin wrote:
> On 02/23/2012 02:38 PM, Linus Torvalds wrote:
> >
> > You'd still need an x86-32 machine to test on, because x86-64 was
> > immune to this issue.
> >
> > But yeah, the impact of this seems to be small enough that for older
> > kernels (which are likely used on older systems for maintenance
> > anyway) disabling AES-NI on x86-32 really might be the way to go.
> >
>
> That would really suck for users of encrypted hard disks.

Peter, do you really think there are that many ? I think I only saw
AES-NI on recent 64-bit capable chips, and it's been a while that
users have been installing 64-bit distros on such machines. Note that
I'm not advocating for breaking existing setups, just that I'm surprized
by this combination (aes-ni + 32-bit).

Willy

2012-02-23 22:55:46

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 02:52 PM, Willy Tarreau wrote:
> On Thu, Feb 23, 2012 at 02:48:51PM -0800, H. Peter Anvin wrote:
>> On 02/23/2012 02:38 PM, Linus Torvalds wrote:
>>>
>>> You'd still need an x86-32 machine to test on, because x86-64 was
>>> immune to this issue.
>>>
>>> But yeah, the impact of this seems to be small enough that for older
>>> kernels (which are likely used on older systems for maintenance
>>> anyway) disabling AES-NI on x86-32 really might be the way to go.
>>>
>>
>> That would really suck for users of encrypted hard disks.
>
> Peter, do you really think there are that many ? I think I only saw
> AES-NI on recent 64-bit capable chips, and it's been a while that
> users have been installing 64-bit distros on such machines. Note that
> I'm not advocating for breaking existing setups, just that I'm surprized
> by this combination (aes-ni + 32-bit).
>

There are still people running 32-bit systems because they have some odd
compatibility constraints but now have to deal with corporate or other
security constraints; they may also have been using disk encryption
since before AES-NI was in but doing it on the integer side is way slower.

This is not AES-NI in the interrupt path, but I don't think there is a
knob for that.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-23 22:59:42

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 02:38:42PM -0800, Linus Torvalds wrote:
> On Thu, Feb 23, 2012 at 2:27 PM, Willy Tarreau <[email protected]> wrote:
> >
> > OK so indeed I will only be able to check that it boots :-/
>
> Well, we could do some trivial test-harness that just forces the issue
> with regular timer interrupts (and even without AES-NI). I think Peter
> talked about that when we were trying to hunt it down - but I think he
> was then able to reproduce the problem without anything special and we
> dropped it.
>
> Essentially, just doing something like
>
> if (irq_fpu_usable()) {
> kernel_fpu_begin();
> kernel_fpu_end();
> }
>
> in do_irq() and do_softirq() would stress-test things even without
> wireless, and even without AES-NI.
>
> You'd still need an x86-32 machine to test on, because x86-64 was
> immune to this issue.
>
> But yeah, the impact of this seems to be small enough that for older
> kernels (which are likely used on older systems for maintenance
> anyway) disabling AES-NI on x86-32 really might be the way to go.

I think that's already the case, 2.6.32 has the following depends for
CRYPTO_AES_NI_INTEL:
depends on (X86 || UML_X86) && 64BIT
It was this way until commit 0d258efb (crypto: aesni-intel - Ported
implementation to x86-32) which showed up in 2.6.38.

So we should be safe for 2.6.32 no changes needed, right?

greg k-h

2012-02-23 23:04:45

by Willy Tarreau

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 02:55:11PM -0800, H. Peter Anvin wrote:
> On 02/23/2012 02:52 PM, Willy Tarreau wrote:
> > On Thu, Feb 23, 2012 at 02:48:51PM -0800, H. Peter Anvin wrote:
> >> On 02/23/2012 02:38 PM, Linus Torvalds wrote:
> >>>
> >>> You'd still need an x86-32 machine to test on, because x86-64 was
> >>> immune to this issue.
> >>>
> >>> But yeah, the impact of this seems to be small enough that for older
> >>> kernels (which are likely used on older systems for maintenance
> >>> anyway) disabling AES-NI on x86-32 really might be the way to go.
> >>>
> >>
> >> That would really suck for users of encrypted hard disks.
> >
> > Peter, do you really think there are that many ? I think I only saw
> > AES-NI on recent 64-bit capable chips, and it's been a while that
> > users have been installing 64-bit distros on such machines. Note that
> > I'm not advocating for breaking existing setups, just that I'm surprized
> > by this combination (aes-ni + 32-bit).
> >
>
> There are still people running 32-bit systems because they have some odd
> compatibility constraints but now have to deal with corporate or other
> security constraints; they may also have been using disk encryption
> since before AES-NI was in but doing it on the integer side is way slower.

Indeed the combination looks plausible :-)

> This is not AES-NI in the interrupt path, but I don't think there is a
> knob for that.

OK. So let's hope this works then !

Willy

2012-02-23 23:07:49

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 01:10 PM, Greg Kroah-Hartman wrote:
>>
>> That would be awesome.
>
> Ok, the patches are at:
> git.kernel.org/pub/scm/linux/kernel/git/gregkh/stable-test.git linux-3.2.y-i387-test
>
> They are on top of the 3.2.7 kernel release.
>
> If that doesn't work for you, please let me know.
>

This is a bit odd.

I don't see the user space FPU corruption I did with the old code, but
the wifi drops off the network and won't come back after running for
quite a while. It might be coincidence, but I have had it happen a few
times now.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-23 23:16:14

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 03:05:32PM -0800, H. Peter Anvin wrote:
> On 02/23/2012 01:10 PM, Greg Kroah-Hartman wrote:
> >>
> >> That would be awesome.
> >
> > Ok, the patches are at:
> > git.kernel.org/pub/scm/linux/kernel/git/gregkh/stable-test.git linux-3.2.y-i387-test
> >
> > They are on top of the 3.2.7 kernel release.
> >
> > If that doesn't work for you, please let me know.
> >
>
> This is a bit odd.
>
> I don't see the user space FPU corruption I did with the old code, but
> the wifi drops off the network and won't come back after running for
> quite a while. It might be coincidence, but I have had it happen a few
> times now.

And a "clean" 3.2.7 doesn't have that problem?

Does the same thing happen in Linus's tree at the moment?

greg k-h

2012-02-23 23:18:15

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 03:16 PM, Greg Kroah-Hartman wrote:
>
> And a "clean" 3.2.7 doesn't have that problem?
>
> Does the same thing happen in Linus's tree at the moment?
>

Clean 3.2.7 I don't think has that problem, although of course when I
run the stress test it fails on other ways so it's hard to 100% rule out.

Linus' tree doesn't have that problems.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-23 23:19:47

by Suresh Siddha

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, 2012-02-23 at 15:18 -0800, H. Peter Anvin wrote:
> On 02/23/2012 03:16 PM, Greg Kroah-Hartman wrote:
> >
> > And a "clean" 3.2.7 doesn't have that problem?
> >
> > Does the same thing happen in Linus's tree at the moment?
> >
>
> Clean 3.2.7 I don't think has that problem, although of course when I
> run the stress test it fails on other ways so it's hard to 100% rule out.

Perhaps you can check the clean 3.2.7 kernel with AES-NI disabled?

thanks,
suresh

>
> Linus' tree doesn't have that problems.
>
> -hpa
>
>

2012-02-23 23:54:58

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On Thu, Feb 23, 2012 at 03:18:03PM -0800, H. Peter Anvin wrote:
> On 02/23/2012 03:16 PM, Greg Kroah-Hartman wrote:
> >
> > And a "clean" 3.2.7 doesn't have that problem?
> >
> > Does the same thing happen in Linus's tree at the moment?
> >
>
> Clean 3.2.7 I don't think has that problem, although of course when I
> run the stress test it fails on other ways so it's hard to 100% rule out.
>
> Linus' tree doesn't have that problems.

Ugh.

So, we need more testers, so let me release a -rc kernel with just these
changes in it, to get a wider testing range, and make an easy "this
kernel worked, but this one didn't" for people to work with.

I don't suppose that 'git bisect' would work for this, as it's really a
"all or nothing" type thing with this series from what I can tell,
right?

thanks,

greg k-h

2012-02-23 23:59:24

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 03:54 PM, Greg Kroah-Hartman wrote:
> On Thu, Feb 23, 2012 at 03:18:03PM -0800, H. Peter Anvin wrote:
>> On 02/23/2012 03:16 PM, Greg Kroah-Hartman wrote:
>>>
>>> And a "clean" 3.2.7 doesn't have that problem?
>>>
>>> Does the same thing happen in Linus's tree at the moment?
>>>
>>
>> Clean 3.2.7 I don't think has that problem, although of course when I
>> run the stress test it fails on other ways so it's hard to 100% rule out.
>>
>> Linus' tree doesn't have that problems.
>
> Ugh.
>
> So, we need more testers, so let me release a -rc kernel with just these
> changes in it, to get a wider testing range, and make an easy "this
> kernel worked, but this one didn't" for people to work with.
>
> I don't suppose that 'git bisect' would work for this, as it's really a
> "all or nothing" type thing with this series from what I can tell,
> right?
>

Yes. At Suresh' suggestion I am running a reference run without AES-NI,
just to make sure we're not chasing a wireless driver bug.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-24 00:48:01

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH 0/5] i387: stable kernel backport

On 02/23/2012 03:54 PM, Greg Kroah-Hartman wrote:
> On Thu, Feb 23, 2012 at 03:18:03PM -0800, H. Peter Anvin wrote:
>> On 02/23/2012 03:16 PM, Greg Kroah-Hartman wrote:
>>>
>>> And a "clean" 3.2.7 doesn't have that problem?
>>>
>>> Does the same thing happen in Linus's tree at the moment?
>>>
>>
>> Clean 3.2.7 I don't think has that problem, although of course when I
>> run the stress test it fails on other ways so it's hard to 100% rule out.
>>
>> Linus' tree doesn't have that problems.
>
> Ugh.
>
> So, we need more testers, so let me release a -rc kernel with just these
> changes in it, to get a wider testing range, and make an easy "this
> kernel worked, but this one didn't" for people to work with.
>
> I don't suppose that 'git bisect' would work for this, as it's really a
> "all or nothing" type thing with this series from what I can tell,
> right?
>

So far, disabling AES-NI works as expected, so I definitely don't think
it's a driver issue.

-hpa