2008-06-01 09:01:24

by Jürgen Mell

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

Hi,

> On Sat, May 17, 2008 at 06:31:08PM +0200, J?rgen Mell wrote:
> I tracked this down to a single kernel configuration option. If
> CONFIG_PREEMPT is set to 'y' the application will start crashing.
> If CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> application will run without errors.

With lots of help from Heinz-Bernd, Bernd and Oliver of the Einstein@Home
project I now found the the following:

1. Einstein@home will crash with trap #8 if the problem is present. The
error occurs between some minutes after starting Einstein up to more than
10 hours after starting Einstein. This seems to depend on how many other
applications are used on the system (it takes much more time, if only the
Einstein processes are active on the system).

2. The error was introduced between kernel.org kernels 2.6.19.7 and 2.6.20.
It is still present in 2.6.26-rc4

3. If I revert the patch

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167

in 2.6.20, Einstein does not crash anymore (program was run for more than
30 hours while system was in normal use with programming, multi-media
etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.

Now I need some help as I am not an expert in this area. What I assume is
that either the state of the FPU is not always restored (perhaps if the
process is swapped between the two cores?) or it is restored more than
once. Please keep in mind, that I am always running two Einstein processes
simultaneously on my two cores!
I am willing to do further testing of this problem if someone can give me a
hint how to continue.

Bye,

J?rgen


2008-06-01 11:41:16

by Andi Kleen

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

[email protected] writes:

> or it is restored more than
> once. Please keep in mind, that I am always running two Einstein processes
> simultaneously on my two cores!
> I am willing to do further testing of this problem if someone can give me a
> hint how to continue.

My bet would have been actually on aa283f49276e7d840a40fb01eee6de97eaa7e012
because it does some nasty things (enable interrupts in the middle
of __switch_to).

I looked through the old patchkit and couldn't find any specific
PREEMPT problems. All code it changes should run with preempt_off

You could verify with sticking WARN_ON_ONCE(preemptible()) into
all the places acc207616a91a413a50fdd8847a747c4a7324167
changes (__unlazy_fpu, math_state_restore) and see if that triggers
anywhere.

-Andi

2008-06-01 12:13:01

by Steven Rostedt

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack


[
Fixed Andi's email and added those that Signed off on the
problem commit.
]


On Sun, 1 Jun 2008 [email protected] wrote:

>
> Hi,
>
> > On Sat, May 17, 2008 at 06:31:08PM +0200, J?rgen Mell wrote:
> > I tracked this down to a single kernel configuration option. If
> > CONFIG_PREEMPT is set to 'y' the application will start crashing.
> > If CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> > application will run without errors.
>
> With lots of help from Heinz-Bernd, Bernd and Oliver of the Einstein@Home
> project I now found the the following:
>
> 1. Einstein@home will crash with trap #8 if the problem is present. The
> error occurs between some minutes after starting Einstein up to more than
> 10 hours after starting Einstein. This seems to depend on how many other
> applications are used on the system (it takes much more time, if only the
> Einstein processes are active on the system).
>
> 2. The error was introduced between kernel.org kernels 2.6.19.7 and 2.6.20.
> It is still present in 2.6.26-rc4
>
> 3. If I revert the patch
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167

Hi,

Thanks for bisecting this. I added the commiter and those that signed off
on the problem commit. They are the ones that will need to help you solve
this.

-- Steve


>
> in 2.6.20, Einstein does not crash anymore (program was run for more than
> 30 hours while system was in normal use with programming, multi-media
> etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
>
> Now I need some help as I am not an expert in this area. What I assume is
> that either the state of the FPU is not always restored (perhaps if the
> process is swapped between the two cores?) or it is restored more than
> once. Please keep in mind, that I am always running two Einstein processes
> simultaneously on my two cores!
> I am willing to do further testing of this problem if someone can give me a
> hint how to continue.
>
> Bye,
>
> J?rgen
>

2008-06-01 16:47:51

by Jürgen Mell

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> [email protected] writes:
> > or it is restored more than
> > once. Please keep in mind, that I am always running two Einstein
> > processes simultaneously on my two cores!
> > I am willing to do further testing of this problem if someone can give
> > me a hint how to continue.
>
> My bet would have been actually on
> aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some nasty
> things (enable interrupts in the middle of __switch_to).
>
> I looked through the old patchkit and couldn't find any specific
> PREEMPT problems. All code it changes should run with preempt_off
>
> You could verify with sticking WARN_ON_ONCE(preemptible()) into
> all the places acc207616a91a413a50fdd8847a747c4a7324167
> changes (__unlazy_fpu, math_state_restore) and see if that triggers
> anywhere.

No, that did not trigger. I put the WARN_ON_ONCE into process.c, traps.c
and also into the __unlazy_fpu macro in i387.h but I got no messages
anywhere (dmesg, /var/log/messages, /var/log/warn) when the trap #8
occurred.
Meanwhile I am also running the tests on another machine to make sure it is
not a hardware-related problem.

Any new ideas are welcome!

Meanwhile I will go back to 2.6.20 and revert
aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong track...

Bye,
J?rgen

2008-06-01 17:10:44

by Simon Holm Thøgersen

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

søn, 01 06 2008 kl. 11:01 +0200, skrev [email protected]:
[...]
>
> 3. If I revert the patch
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167
>
> in 2.6.20, Einstein does not crash anymore (program was run for more than
> 30 hours while system was in normal use with programming, multi-media
> etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
[...]

I don't think the bisected commit is responsible for anything, but
triggering a bug elsewhere with your workload. I've been chasing the
same problem I think, but with other symptoms.

I'm triggering the following by running an lguest guest, but I guess the
workload just need to have the right scheduler intensity to trigger the
bug.

BUG: sleeping function called from invalid context at mm/slab.c:3052
in_atomic():1, irqs_disabled():0
Pid: 4771, comm: lguest Not tainted
2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
[<c01146ee>] __might_sleep+0xe4/0xeb
[<c01605d9>] kmem_cache_alloc+0x22/0xb4
[<c0108479>] init_fpu+0xb0/0x14d
[<c0104768>] math_state_restore+0x26/0x5d
[<c01045ab>] device_not_available+0x43/0x48
[<c011007b>] ? handle_vm86_fault+0x213/0x6b8
[<c01029ad>] ? __switch_to+0x23/0x113
[<c02d6c9f>] schedule+0x221/0x2a4
[<c02d716a>] ? schedule_timeout+0x16/0x89
[<c016ed36>] ? __pollwait+0xaa/0xb0
[<c0168358>] ? pipe_poll+0x29/0x89
[<c016e79a>] ? do_select+0x478/0x4cd
[<c016ec8c>] ? __pollwait+0x0/0xb0
[<c0116657>] ? default_wake_function+0x0/0xd
[<c0116657>] ? default_wake_function+0x0/0xd
[<c0116657>] ? default_wake_function+0x0/0xd
[<c0116657>] ? default_wake_function+0x0/0xd
[<c0116657>] ? default_wake_function+0x0/0xd
[<c012f49d>] ? getnstimeofday+0x37/0xb7
[<c012d810>] ? ktime_get_ts+0x40/0x44
[<c012d827>] ? ktime_get+0x13/0x2f
[<c011406f>] ? hrtick_start_fair+0xd5/0x111
[<c01216f6>] ? internal_add_timer+0x8e/0x92
[<c01f27b7>] ? delay_tsc+0x4f/0x68
[<c025b05b>] ? ide_dma_intr+0x0/0x79
[<c01f274d>] ? __delay+0x9/0xb
[<c01f2766>] ? __const_udelay+0x17/0x19
[<c0256df7>] ? ide_execute_command+0x7b/0x95
[<c025a7a2>] ? ide_dma_start+0x24/0x36
[<c0259ebb>] ? do_rw_taskfile+0x1be/0x1cf
[<c025c31a>] ? ide_do_rw_disk+0x19a/0x1dd
[<c01f2766>] ? __const_udelay+0x17/0x19
[<c025554e>] ? ide_do_request+0x838/0x875
[<c0254941>] ? ide_end_request+0x7d/0x99
[<c016e9d8>] ? core_sys_select+0x1e9/0x2c7
[<c0146aa4>] ? find_lock_page+0xa1/0xbb
[<c01489a2>] ? filemap_fault+0x21c/0x382
[<c0146942>] ? unlock_page+0x24/0x27
[<c0151bac>] ? __do_fault+0x314/0x34c
[<c0153a56>] ? handle_mm_fault+0x291/0x65a
[<c016edcb>] ? sys_select+0x8f/0x143
[<c02d9dc8>] ? do_page_fault+0x33c/0x616
[<c0103a67>] ? sysenter_past_esp+0x78/0xb1
[<c02d0000>] ? packet_rcv+0x159/0x2c7
=======================
BUG: sleeping function called from invalid context at mm/slab.c:3052
in_atomic():1, irqs_disabled():0
Pid: 4771, comm: lguest Not tainted
2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
[<c01146ee>] __might_sleep+0xe4/0xeb
[<c01605d9>] kmem_cache_alloc+0x22/0xb4
[<c012f49d>] ? getnstimeofday+0x37/0xb7
[<c012f49d>] ? getnstimeofday+0x37/0xb7
[<c0108479>] init_fpu+0xb0/0x14d
[<c0104768>] math_state_restore+0x26/0x5d
[<c01045ab>] device_not_available+0x43/0x48
[<c0110000>] ? handle_vm86_fault+0x198/0x6b8
[<c01029ad>] ? __switch_to+0x23/0x113
[<c02d6c9f>] schedule+0x221/0x2a4
[<c012a95b>] ? prepare_to_wait+0x6c/0x84
[<c0168bdd>] ? pipe_wait+0x53/0x72
[<c012a76f>] ? autoremove_wake_function+0x0/0x30
[<c01692b9>] ? pipe_read+0x29a/0x302
[<c012d6ae>] ? hrtimer_start+0xcc/0xf8
[<c0115efd>] ? hrtick_set+0xcc/0x140
[<c01630b0>] ? do_sync_read+0xba/0xf8
[<c012a76f>] ? autoremove_wake_function+0x0/0x30
[<c01638b8>] ? default_llseek+0xa7/0xb5
[<c0162ff6>] ? do_sync_read+0x0/0xf8
[<c0163795>] ? vfs_read+0x8a/0x106
[<c0163ada>] ? sys_read+0x3b/0x60
[<c0103a67>] ? sysenter_past_esp+0x78/0xb1
=======================

I'm getting the traces with CONFIG_DEBUG_PREEMPT=y and no
CONFIG_DEBUG_SPINLOCK, since otherwise I'd just get

BUG: sleeping function called from invalid context at mm/slab.c:3052
BUG: spinlock recursion on CPU#0, lguest/5428

and nothing further. Using CONFIG_DEBUG_PREEMPT=y,
CONFIG_DEBUG_SPINLOCK=y and booting with "lapic nmi_watchdog=2" I was
also able to grab the following over netconsole though

BUG: sleeping function called from invalid context at mm/slab.c:3052
BUG: spinlock recursion on CPU#0, lguest/5428
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c01f1f76, registers:
Modules linked in: lg tun arc4 ecb crypto_blkcipher cryptomgr
crypto_algapi ieee80211_crypt_wep bridge llc snd_seq snd_seq_device
radeonfb uhci_hcd snd_intel8x0 ehci_hcd snd_ac97_codec ipw2200 usbcore
fb ac97_bus fb_ddc backlight snd_pcm ieee80211 firewire_ohci
firewire_core i2c_algo_bit snd_timer intel_agp cfbcopyarea snd agpgart
i2c_i801 ieee80211_crypt firmware_class cfbimgblt soundcore crc_itu_t
cfbfillrect rtc iTCO_wdt snd_page_alloc i2c_core

Pid: 5428, comm: lguest Not tainted
(2.6.26-rc4debug-locks-extended-00103-g1beee8d #2)
EIP: 0060:[<c01f1f76>] EFLAGS: 00000002 CPU: 0
EIP is at delay_tsc+0x2e/0x68
EAX: 100a8436 EBX: c03acd84 ECX: 00000000 EDX: 0000004c
ESI: 100a83fb EDI: 00000001 EBP: e9651738 ESP: e9651724
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process lguest (pid: 5428, ti=e9650000 task=efbc92c0 task.ti=e964c000)
Stack: 00000001 100a83fb c03acd84 02df69af 00000001 e9651740 c01f1f2d
e9651758
c01ff481 593ac9e8 c03acd84 00000092 000078cf e9651774 c02d4968
00000000
00000002 c01141c4 c03acd84 00000001 e965178c c01141c4 00000001
00007892
Call Trace:
[<c01f1f2d>] ? __delay+0x9/0xb
[<c01ff481>] ? _raw_spin_lock+0x83/0xcb
[<c02d4968>] ? _spin_lock_irqsave+0x46/0x4f
[<c01141c4>] ? __wake_up+0x15/0x3b
[<c01141c4>] ? __wake_up+0x15/0x3b
[<c0119cf0>] ? wake_up_klogd+0x2e/0x31
[<c0119ea0>] ? release_console_sem+0x1ad/0x1b5
[<c011a3d1>] ? vprintk+0x383/0x3c2
[<c0134276>] ? debug_check_no_locks_freed+0x111/0x13c
[<c0134276>] ? debug_check_no_locks_freed+0x111/0x13c
[<c011a425>] ? printk+0x15/0x17
[<c01ff286>] ? spin_bug+0x3f/0x80
[<c01ff432>] ? _raw_spin_lock+0x34/0xcb
[<c02d474e>] ? _spin_lock+0x2a/0x32
[<c011471b>] ? task_rq_lock+0x2a/0x31
[<c011471b>] ? task_rq_lock+0x2a/0x31
[<c0114dc8>] ? try_to_wake_up+0x15/0x81
[<c0114e3f>] ? default_wake_function+0xb/0xd
[<c012985f>] ? autoremove_wake_function+0xe/0x30
[<c0113626>] ? __wake_up_common+0x35/0x5b
[<c01141d7>] ? __wake_up+0x28/0x3b
[<c0119cf0>] ? wake_up_klogd+0x2e/0x31
[<c0119ea0>] ? release_console_sem+0x1ad/0x1b5
[<c011a3d1>] ? vprintk+0x383/0x3c2
[<c011a425>] ? printk+0x15/0x17
[<c0115232>] ? __might_sleep+0x8c/0x108
[<c0160c37>] ? kmem_cache_alloc+0x22/0xd6
[<c0108441>] ? init_fpu+0xb0/0x14d
[<c01047fd>] ? math_state_restore+0x2b/0x67
[<c010463a>] ? device_not_available+0x4e/0x53
[<c02d2b8a>] ? schedule+0x1fe/0x2a1
[<c011007b>] ? handle_vm86_fault+0x31f/0x6b8
[<c01029c1>] ? __switch_to+0x23/0x113
[<c02d2bc6>] ? schedule+0x23a/0x2a1
[<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
[<c02d4c3f>] ? _spin_unlock_irqrestore+0x56/0x6c
[<c02d2db8>] ? schedule_timeout+0x16/0x89
[<c016ee08>] ? __pollwait+0xaa/0xb0
[<c0168604>] ? pipe_poll+0x29/0x89
[<c016e844>] ? do_select+0x4a6/0x4f8
[<c016ed5e>] ? __pollwait+0x0/0xb0
[<c0114e34>] ? default_wake_function+0x0/0xd
[<c0114e34>] ? default_wake_function+0x0/0xd
[<c0114e34>] ? default_wake_function+0x0/0xd
[<c0114e34>] ? default_wake_function+0x0/0xd
[<c0114e34>] ? default_wake_function+0x0/0xd
[<c0113dfa>] ? hrtick_start_fair+0x117/0x11f
[<c01607a8>] ? kfree+0xad/0xc0
[<c0160833>] ? kmem_cache_free+0x78/0x8a
[<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
[<c0133fac>] ? mark_held_locks+0x4e/0x66
[<c01349ae>] ? __lock_acquire+0x488/0xb4c
[<c016eaad>] ? core_sys_select+0x217/0x2f2
[<c02d4d42>] ? _read_unlock_irq+0x36/0x4b
[<c0147fab>] ? unlock_page+0x24/0x27
[<c0152c7f>] ? __do_fault+0x31e/0x356
[<c0153fb3>] ? handle_mm_fault+0x2a0/0x637
[<c016ee9d>] ? sys_select+0x8f/0x143
[<c02d67d3>] ? do_page_fault+0x352/0x631
[<c0103b59>] ? restore_nocheck+0x12/0x15
[<c02d6481>] ? do_page_fault+0x0/0x631
[<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
[<c0103a4f>] ? sysenter_past_esp+0x78/0xd1
=======================
Code: 57 56 53 83 ec 08 89 45 ec b8 01 00 00 00 e8 b8 4b 0e 00 0f 31 0f
1f 40 00 b9 00 00 00 00 89 c6 89 c8 09 f0 89 45 f0 f3 90 0f 31 <0f> 1f
40 00 b9 00 00 00 00 89 c6 89 c8 09 f0 2b 45 f0 3b 45 ec


Let us take a closer look at __switch_to. I'm using test data from yet
another trace (that I actually captured before all the other traces
presented here).

BUG: unable to handle kernel NULL pointer dereference at 000001ff
IP: [<c0102964>] __switch_to+0x19/0xff

That is the only part of the trace I've got; there wasn't produced more.
Anyway, for that particular kernel I used, __switch_to disassembled to

0xc0102955 <__switch_to+10>: mov 0x4(%eax),%eax
0xc0102958 <__switch_to+13>: testb $0x1,0xc(%eax)
0xc010295c <__switch_to+17>: je 0xc0102995 <__switch_to+74>
0xc010295e <__switch_to+19>: mov 0x26c(%edi),%eax
0xc0102964 <__switch_to+25>: fnsave (%eax)

with corresponding source code

struct task_struct * __switch_to(struct task_struct *prev_p, struct
task_struct *next_p)
{
[...]
__unlazy_fpu(prev_p);
[...]
}

static inline void __unlazy_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
__save_init_fpu(tsk);
stts();
} else
tsk->fpu_counter = 0;
}

where __save_init_fpu(tsk) accesses
tsk->thread.xstate->fxsave

The first mov, testb and je of the disassembly are the if statement, and
the second mov and fnsave are part of __save_init_fpu that access
tsk->thread.xstate->fxsave. %eax (initially) and %edi hold prev_p from
__switch_to.

Hope that was readable and not too confusing. Now comes some partly
guesswork from my side that could be wrong. As far as I can tell
%eax (initially) and %edi / prev_p must hold 0xffffff93. Why the first
mov is valid then I'm not sure, but the if statement must be true by
chance. It is not by pure chance though, since prev_p always has the
same value (i.e. it is consistently "dereference at 000001ff" I get
across multiple runs).

So I'd say some sort of corruption caused by very fast context switch
between the launcher and switcher (or-whatever-their-names-are)
processes in lguest.

That is as far as I have been able to debug this, suggestions are
welcome. I guess I should note that I haven't tried with preemption
disabled, but I don't think there's much point in it.


Simon Holm Thøgersen

2008-06-02 21:31:53

by Suresh Siddha

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Sun, Jun 01, 2008 at 07:11:02PM +0200, Simon Holm Th?gersen wrote:
> s?n, 01 06 2008 kl. 11:01 +0200, skrev [email protected]:
> [...]
> >
> > 3. If I revert the patch
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167
> >
> > in 2.6.20, Einstein does not crash anymore (program was run for more than
> > 30 hours while system was in normal use with programming, multi-media
> > etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
> [...]
>
> I don't think the bisected commit is responsible for anything, but
> triggering a bug elsewhere with your workload. I've been chasing the
> same problem I think, but with other symptoms.

Simon, There seems to be multiple issues here. fpu corruption seems
to be a different problem compared to the issue you have encountered.

>
> I'm triggering the following by running an lguest guest, but I guess the
> workload just need to have the right scheduler intensity to trigger the
> bug.
>
> BUG: sleeping function called from invalid context at mm/slab.c:3052
> in_atomic():1, irqs_disabled():0
> Pid: 4771, comm: lguest Not tainted
> 2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
> [<c01146ee>] __might_sleep+0xe4/0xeb
> [<c01605d9>] kmem_cache_alloc+0x22/0xb4
> [<c0108479>] init_fpu+0xb0/0x14d
> [<c0104768>] math_state_restore+0x26/0x5d
> [<c01045ab>] device_not_available+0x43/0x48
> [<c011007b>] ? handle_vm86_fault+0x213/0x6b8
> [<c01029ad>] ? __switch_to+0x23/0x113
> [<c02d6c9f>] schedule+0x221/0x2a4

Simon, Can you please try the appended patch and see if it fixes this
issue? Thanks.
---

[patch] x86: fix blocking call (math_state_restore()) condition in __switch_to

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

Signed-off-by: Suresh Siddha <[email protected]>
---

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index f8476df..6d54833 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -649,8 +649,11 @@ struct task_struct * __switch_to(struct task_struct *prev_p, struct task_struct
/* If the task has used fpu the last 5 timeslices, just do a full
* restore of the math state immediately to avoid the trap; the
* chances of needing FPU soon are obviously high now
+ *
+ * tsk_used_math() checks prevent calling math_state_restore(),
+ * which can sleep in the case of !tsk_used_math()
*/
- if (next_p->fpu_counter > 5)
+ if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
math_state_restore();

/*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e2319f3..ac54ff5 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -658,8 +658,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/* If the task has used fpu the last 5 timeslices, just do a full
* restore of the math state immediately to avoid the trap; the
* chances of needing FPU soon are obviously high now
+ *
+ * tsk_used_math() checks prevent calling math_state_restore(),
+ * which can sleep in the case of !tsk_used_math()
*/
- if (next_p->fpu_counter>5)
+ if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
math_state_restore();
return prev_p;
}

2008-06-02 21:38:13

by Suresh Siddha

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Sun, Jun 01, 2008 at 06:47:29PM +0200, J?rgen Mell wrote:
> On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > [email protected] writes:
> > > or it is restored more than
> > > once. Please keep in mind, that I am always running two Einstein
> > > processes simultaneously on my two cores!
> > > I am willing to do further testing of this problem if someone can give
> > > me a hint how to continue.
> >
> > My bet would have been actually on
> > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some nasty
> > things (enable interrupts in the middle of __switch_to).
> >
> > I looked through the old patchkit and couldn't find any specific
> > PREEMPT problems. All code it changes should run with preempt_off
> >
> > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > changes (__unlazy_fpu, math_state_restore) and see if that triggers
> > anywhere.
>
> No, that did not trigger. I put the WARN_ON_ONCE into process.c, traps.c
> and also into the __unlazy_fpu macro in i387.h but I got no messages
> anywhere (dmesg, /var/log/messages, /var/log/warn) when the trap #8
> occurred.
> Meanwhile I am also running the tests on another machine to make sure it is
> not a hardware-related problem.
>
> Any new ideas are welcome!
>
> Meanwhile I will go back to 2.6.20 and revert
> aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong track...

2.6.20 doesn't have the commit 'aa283f49276e7d840a40fb01eee6de97eaa7e012'

As you are seeing this corruption problem starting from 2.6.20,
atleast recent(in 2.6.26 series) fpu changes don't play a role in this.

I will try to reproduce your issue.

thanks,
suresh

2008-06-02 22:57:39

by Suresh Siddha

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Mon, Jun 02, 2008 at 02:37:56PM -0700, Suresh Siddha wrote:
> On Sun, Jun 01, 2008 at 06:47:29PM +0200, J?rgen Mell wrote:
> > On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > > [email protected] writes:
> > > > or it is restored more than
> > > > once. Please keep in mind, that I am always running two Einstein
> > > > processes simultaneously on my two cores!
> > > > I am willing to do further testing of this problem if someone can give
> > > > me a hint how to continue.
> > >
> > > My bet would have been actually on
> > > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some nasty
> > > things (enable interrupts in the middle of __switch_to).
> > >
> > > I looked through the old patchkit and couldn't find any specific
> > > PREEMPT problems. All code it changes should run with preempt_off
> > >
> > > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > > changes (__unlazy_fpu, math_state_restore) and see if that triggers
> > > anywhere.
> >
> > No, that did not trigger. I put the WARN_ON_ONCE into process.c, traps.c
> > and also into the __unlazy_fpu macro in i387.h but I got no messages
> > anywhere (dmesg, /var/log/messages, /var/log/warn) when the trap #8
> > occurred.
> > Meanwhile I am also running the tests on another machine to make sure it is
> > not a hardware-related problem.
> >
> > Any new ideas are welcome!
> >
> > Meanwhile I will go back to 2.6.20 and revert
> > aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong track...
>
> 2.6.20 doesn't have the commit 'aa283f49276e7d840a40fb01eee6de97eaa7e012'
>
> As you are seeing this corruption problem starting from 2.6.20,
> atleast recent(in 2.6.26 series) fpu changes don't play a role in this.
>
> I will try to reproduce your issue.

J?rgen, I think I found the reason for your issue aswell.

As you observed, it is probably coming from the commit
acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU optimization

It's a side affect though. This is the failing scenario:

process 'A' in save_i387_ia32() just after clear_used_math()

Got an interrupt and pre-empted out.

At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()

This results in init_fpu() during the __switch_to()'s math_state_restore()

And resulting in fpu corruption which will be saved/restored
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.

So in short, yes the problem shows up for preempt enabled kernels and the
same patch I sent out 30 mins back (appended again) should fix your issue
aswell. Can you please test this and check if my theory is indeed correct.
If it fixes your issue aswell, then I will re-post the patch with
a new changelog and updated comments in the patch.

thanks,
suresh
---

[patch] x86: fix blocking call (math_state_restore()) condition in __switch_to

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

Signed-off-by: Suresh Siddha <[email protected]>
---

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index f8476df..6d54833 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -649,8 +649,11 @@ struct task_struct * __switch_to(struct task_struct *prev_p, struct task_struct
/* If the task has used fpu the last 5 timeslices, just do a full
* restore of the math state immediately to avoid the trap; the
* chances of needing FPU soon are obviously high now
+ *
+ * tsk_used_math() checks prevent calling math_state_restore(),
+ * which can sleep in the case of !tsk_used_math()
*/
- if (next_p->fpu_counter > 5)
+ if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
math_state_restore();

/*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e2319f3..ac54ff5 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -658,8 +658,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/* If the task has used fpu the last 5 timeslices, just do a full
* restore of the math state immediately to avoid the trap; the
* chances of needing FPU soon are obviously high now
+ *
+ * tsk_used_math() checks prevent calling math_state_restore(),
+ * which can sleep in the case of !tsk_used_math()
*/
- if (next_p->fpu_counter>5)
+ if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
math_state_restore();
return prev_p;

2008-06-03 06:04:03

by Jürgen Mell

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Dienstag, 3. Juni 2008, Suresh Siddha wrote:
> On Mon, Jun 02, 2008 at 02:37:56PM -0700, Suresh Siddha wrote:
> > On Sun, Jun 01, 2008 at 06:47:29PM +0200, J?rgen Mell wrote:
> > > On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > > > [email protected] writes:
> > > > > or it is restored more than
> > > > > once. Please keep in mind, that I am always running two Einstein
> > > > > processes simultaneously on my two cores!
> > > > > I am willing to do further testing of this problem if someone
> > > > > can give me a hint how to continue.
> > > >
> > > > My bet would have been actually on
> > > > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some
> > > > nasty things (enable interrupts in the middle of __switch_to).
> > > >
> > > > I looked through the old patchkit and couldn't find any specific
> > > > PREEMPT problems. All code it changes should run with preempt_off
> > > >
> > > > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > > > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > > > changes (__unlazy_fpu, math_state_restore) and see if that
> > > > triggers anywhere.
> > >
> > > No, that did not trigger. I put the WARN_ON_ONCE into process.c,
> > > traps.c and also into the __unlazy_fpu macro in i387.h but I got no
> > > messages anywhere (dmesg, /var/log/messages, /var/log/warn) when the
> > > trap #8 occurred.
> > > Meanwhile I am also running the tests on another machine to make
> > > sure it is not a hardware-related problem.
> > >
> > > Any new ideas are welcome!
> > >
> > > Meanwhile I will go back to 2.6.20 and revert
> > > aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong
> > > track...
> >
> > 2.6.20 doesn't have the commit
> > 'aa283f49276e7d840a40fb01eee6de97eaa7e012'
> >
> > As you are seeing this corruption problem starting from 2.6.20,
> > atleast recent(in 2.6.26 series) fpu changes don't play a role in
> > this.
> >
> > I will try to reproduce your issue.
>
> J?rgen, I think I found the reason for your issue aswell.
>
> As you observed, it is probably coming from the commit
> acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> optimization
>
> It's a side affect though. This is the failing scenario:
>
> process 'A' in save_i387_ia32() just after clear_used_math()
>
> Got an interrupt and pre-empted out.
>
> At the next context switch to process 'A' again, kernel tries to restore
> the math state proactively and sees a fpu_counter > 0 and
> !tsk_used_math()
>
> This results in init_fpu() during the __switch_to()'s
> math_state_restore()
>
> And resulting in fpu corruption which will be saved/restored
> (save_i387_fxsave and restore_i387_fxsave) during the remaining
> part of the signal handling after the context switch.
>
> So in short, yes the problem shows up for preempt enabled kernels and
> the same patch I sent out 30 mins back (appended again) should fix your
> issue aswell. Can you please test this and check if my theory is indeed
> correct. If it fixes your issue aswell, then I will re-post the patch
> with a new changelog and updated comments in the patch.
>
> thanks,
> suresh

Many thanks for the patch!
I will test this immediately but as it takes some time to make sure that
the problem is really gone it will take some time until I have a report.

Thanks,
J?rgen

2008-06-03 13:27:15

by Simon Holm Thøgersen

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

[CC lguest <[email protected]>]
man, 02 06 2008 kl. 14:31 -0700, skrev Suresh Siddha:
> On Sun, Jun 01, 2008 at 07:11:02PM +0200, Simon Holm Thøgersen wrote:
> > søn, 01 06 2008 kl. 11:01 +0200, skrev [email protected]:
> > [...]
> > >
> > > 3. If I revert the patch
> > >
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167
> > >
> > > in 2.6.20, Einstein does not crash anymore (program was run for more than
> > > 30 hours while system was in normal use with programming, multi-media
> > > etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
> > [...]
> >
> > I don't think the bisected commit is responsible for anything, but
> > triggering a bug elsewhere with your workload. I've been chasing the
> > same problem I think, but with other symptoms.
>
> Simon, There seems to be multiple issues here. fpu corruption seems
> to be a different problem compared to the issue you have encountered.
>
> >
> > I'm triggering the following by running an lguest guest, but I guess the
> > workload just need to have the right scheduler intensity to trigger the
> > bug.
> >
> > BUG: sleeping function called from invalid context at mm/slab.c:3052
> > in_atomic():1, irqs_disabled():0
> > Pid: 4771, comm: lguest Not tainted
> > 2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
> > [<c01146ee>] __might_sleep+0xe4/0xeb
> > [<c01605d9>] kmem_cache_alloc+0x22/0xb4
> > [<c0108479>] init_fpu+0xb0/0x14d
> > [<c0104768>] math_state_restore+0x26/0x5d
> > [<c01045ab>] device_not_available+0x43/0x48
> > [<c011007b>] ? handle_vm86_fault+0x213/0x6b8
> > [<c01029ad>] ? __switch_to+0x23/0x113
> > [<c02d6c9f>] schedule+0x221/0x2a4
>
> Simon, Can you please try the appended patch and see if it fixes this
> issue? Thanks.
> ---
>
> [patch] x86: fix blocking call (math_state_restore()) condition in __switch_to
>
> Add tsk_used_math() checks to prevent calling math_state_restore()
> which can sleep in the case of !tsk_used_math(). This prevents
> making a blocking call in __switch_to().
>
> Apparently "fpu_counter > 5" check is not enough, as in some signal handling
> and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
>
> Signed-off-by: Suresh Siddha <[email protected]>
> ---
Hi Suresh,

and thanks for looking into this. The patch did not fix the issue, but
I'm wondering if it is lguest calling math_state_restore in
drivers/lguest/x86/core.c that could be the problem?

Regardless of whether that is the issue, I think you (and everybody
else) will be able to reproduce the issue by running lguest on a 32-bit
system with CONFIG_PREEMPT=y and CONFIG_DEBUG_SPINLOCKS_SLEEP=y (I'm
also using CONFIG_DEBUG_PREEMPT=y but I don't think that matter). If you
download http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img and
run

Documentation/lguest/lguest 64 vmlinux --block=initrd-1.1-i386.img

it will very likely trigger the backtraces I'm getting. Has anyone on
the lguest list tried running with CONFIG_PREEMPT?


Simon

2008-06-03 19:44:06

by Suresh Siddha

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Tue, Jun 03, 2008 at 03:23:30PM +0200, Simon Holm Th?gersen wrote:
> > [patch] x86: fix blocking call (math_state_restore()) condition in __switch_to
> >
> > Add tsk_used_math() checks to prevent calling math_state_restore()
> > which can sleep in the case of !tsk_used_math(). This prevents
> > making a blocking call in __switch_to().
> >
> > Apparently "fpu_counter > 5" check is not enough, as in some signal handling
> > and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
> >
> > Signed-off-by: Suresh Siddha <[email protected]>
> > ---
> Hi Suresh,
>
> and thanks for looking into this. The patch did not fix the issue, but

Ok. You are probably running into different issue (please see below).
Above patch fixes a real issue and I think it should fix the fpu
corruption issue encountered by J?rgen. I will wait for J?rgen's test
results before pushing the above patch.

> I'm wondering if it is lguest calling math_state_restore in
> drivers/lguest/x86/core.c that could be the problem?

I def see a problem. In lguest_arch_run_guest(), MSR_IA32_SYSENTER_CS is not
restored before making the math_state_restore() call. As the
math_state_restore() can now block, this can cause issues. Appending
patch should fix this issue and from your oops report, it is not very
clear if the below patch should help fix your issue or not. Can you
please try the below appended patch.

>
> Regardless of whether that is the issue, I think you (and everybody
> else) will be able to reproduce the issue by running lguest on a 32-bit
> system with CONFIG_PREEMPT=y and CONFIG_DEBUG_SPINLOCKS_SLEEP=y (I'm
> also using CONFIG_DEBUG_PREEMPT=y but I don't think that matter). If you
> download http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img and
> run
>
> Documentation/lguest/lguest 64 vmlinux --block=initrd-1.1-i386.img
>
> it will very likely trigger the backtraces I'm getting.

If the below patch doesn't help fix your issue, then I will try to reproduce
it locally here.

thanks,
suresh
---

[patch] x86, lguest: Restore MSR_IA32_SYSENTER_CS before math_state_restore()

Restore MSR_IA32_SYSENTER_CS before making the blocking math_state_restore()
in lguest_arch_run_guest()

Signed-off-by: Suresh Siddha <[email protected]>
---

diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 5126d5d..9279ce7 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -191,6 +191,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
* was doing. */
run_guest_once(cpu, lguest_pages(raw_smp_processor_id()));

+ /* Restore SYSENTER if it's supposed to be on. */
+ if (boot_cpu_has(X86_FEATURE_SEP))
+ wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
+
/* Note that the "regs" structure contains two extra entries which are
* not really registers: a trap number which says what interrupt or
* trap made the switcher code come back, and an error code which some
@@ -203,13 +207,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
if (cpu->regs->trapnum == 14)
cpu->arch.last_pagefault = read_cr2();
/* Similarly, if we took a trap because the Guest used the FPU,
- * we have to restore the FPU it expects to see. */
+ * we have to restore the FPU it expects to see. math_state_restore() can
+ * re-enable interrupts and block. */
else if (cpu->regs->trapnum == 7)
math_state_restore();
-
- /* Restore SYSENTER if it's supposed to be on. */
- if (boot_cpu_has(X86_FEATURE_SEP))
- wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
}

/*H:130 Now we've examined the hypercall code; our Guest can make requests.

2008-06-03 21:08:44

by Simon Holm Thøgersen

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

tir, 03 06 2008 kl. 12:43 -0700, skrev Suresh Siddha:
> On Tue, Jun 03, 2008 at 03:23:30PM +0200, Simon Holm Thøgersen wrote:
> > > [patch] x86: fix blocking call (math_state_restore()) condition in __switch_to
> > >
> > > Add tsk_used_math() checks to prevent calling math_state_restore()
> > > which can sleep in the case of !tsk_used_math(). This prevents
> > > making a blocking call in __switch_to().
> > >
> > > Apparently "fpu_counter > 5" check is not enough, as in some signal handling
> > > and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
> > >
> > > Signed-off-by: Suresh Siddha <[email protected]>
> > > ---
> > Hi Suresh,
> >
> > and thanks for looking into this. The patch did not fix the issue, but
>
> Ok. You are probably running into different issue (please see below).
> Above patch fixes a real issue and I think it should fix the fpu
> corruption issue encountered by Jürgen. I will wait for Jürgen's test
> results before pushing the above patch.
>
> > I'm wondering if it is lguest calling math_state_restore in
> > drivers/lguest/x86/core.c that could be the problem?
>
> I def see a problem. In lguest_arch_run_guest(), MSR_IA32_SYSENTER_CS is not
> restored before making the math_state_restore() call. As the
> math_state_restore() can now block, this can cause issues. Appending
> patch should fix this issue and from your oops report, it is not very
> clear if the below patch should help fix your issue or not. Can you
> please try the below appended patch.
>
> >
> > Regardless of whether that is the issue, I think you (and everybody
> > else) will be able to reproduce the issue by running lguest on a 32-bit
> > system with CONFIG_PREEMPT=y and CONFIG_DEBUG_SPINLOCKS_SLEEP=y (I'm
> > also using CONFIG_DEBUG_PREEMPT=y but I don't think that matter). If you
> > download http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img and
> > run
> >
> > Documentation/lguest/lguest 64 vmlinux --block=initrd-1.1-i386.img
> >
> > it will very likely trigger the backtraces I'm getting.
>
> If the below patch doesn't help fix your issue, then I will try to reproduce
> it locally here.
>
It didn't, I'm afraid. I had both patches applied, and was able to
reproduce the trace fairly easily. The patches might have made the issue
slightly more difficult to provoke, but I'm not sure.


Simon

2008-06-04 07:44:49

by Jürgen Mell

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Tuesday, 3rd June 2008, Suresh Siddha wrote:
> On Mon, Jun 02, 2008 at 02:37:56PM -0700, Suresh Siddha wrote:
> > On Sun, Jun 01, 2008 at 06:47:29PM +0200, J?rgen Mell wrote:
> > > On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > > > [email protected] writes:
> > > > > or it is restored more than
> > > > > once. Please keep in mind, that I am always running two Einstein
> > > > > processes simultaneously on my two cores!
> > > > > I am willing to do further testing of this problem if someone
> > > > > can give me a hint how to continue.
> > > >
> > > > My bet would have been actually on
> > > > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some
> > > > nasty things (enable interrupts in the middle of __switch_to).
> > > >
> > > > I looked through the old patchkit and couldn't find any specific
> > > > PREEMPT problems. All code it changes should run with preempt_off
> > > >
> > > > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > > > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > > > changes (__unlazy_fpu, math_state_restore) and see if that
> > > > triggers anywhere.
> > >
> > > No, that did not trigger. I put the WARN_ON_ONCE into process.c,
> > > traps.c and also into the __unlazy_fpu macro in i387.h but I got no
> > > messages anywhere (dmesg, /var/log/messages, /var/log/warn) when the
> > > trap #8 occurred.
> > > Meanwhile I am also running the tests on another machine to make
> > > sure it is not a hardware-related problem.
> > >
> > > Any new ideas are welcome!
> > >
> > > Meanwhile I will go back to 2.6.20 and revert
> > > aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong
> > > track...
> >
> > 2.6.20 doesn't have the commit
> > 'aa283f49276e7d840a40fb01eee6de97eaa7e012'
> >
> > As you are seeing this corruption problem starting from 2.6.20,
> > atleast recent(in 2.6.26 series) fpu changes don't play a role in
> > this.
> >
> > I will try to reproduce your issue.
>
> J?rgen, I think I found the reason for your issue aswell.
>
> As you observed, it is probably coming from the commit
> acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> optimization
>
> It's a side affect though. This is the failing scenario:
>
> process 'A' in save_i387_ia32() just after clear_used_math()
>
> Got an interrupt and pre-empted out.
>
> At the next context switch to process 'A' again, kernel tries to restore
> the math state proactively and sees a fpu_counter > 0 and
> !tsk_used_math()
>
> This results in init_fpu() during the __switch_to()'s
> math_state_restore()
>
> And resulting in fpu corruption which will be saved/restored
> (save_i387_fxsave and restore_i387_fxsave) during the remaining
> part of the signal handling after the context switch.
>
> So in short, yes the problem shows up for preempt enabled kernels and
> the same patch I sent out 30 mins back (appended again) should fix your
> issue aswell. Can you please test this and check if my theory is indeed
> correct. If it fixes your issue aswell, then I will re-post the patch
> with a new changelog and updated comments in the patch.
>

I have applied your patch to both an openSUSE 2.6.22.17 kernel and a
2.6.26-rc4 kernel.org kernel and run the test with Einstein@home on two
different machines. One machine is running 24 hours now, the other 18
hours.

During this time there were no faults on both machines.

As it never before took more than 12 hours until the first appearance of
the problem, I think your patch fixed it. Very good work!

I will continue running the test, but I believe we can call this fixed.

Thank you again!
J?rgen

2008-06-04 10:53:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack


* J?rgen Mell <[email protected]> wrote:

> > J?rgen, I think I found the reason for your issue aswell.
> >
> > As you observed, it is probably coming from the commit
> > acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> > optimization
> >
> > It's a side affect though. This is the failing scenario:
> >
> > process 'A' in save_i387_ia32() just after clear_used_math()
> >
> > Got an interrupt and pre-empted out.
> >
> > At the next context switch to process 'A' again, kernel tries to restore
> > the math state proactively and sees a fpu_counter > 0 and
> > !tsk_used_math()
> >
> > This results in init_fpu() during the __switch_to()'s
> > math_state_restore()
> >
> > And resulting in fpu corruption which will be saved/restored
> > (save_i387_fxsave and restore_i387_fxsave) during the remaining
> > part of the signal handling after the context switch.
> >
> > So in short, yes the problem shows up for preempt enabled kernels and
> > the same patch I sent out 30 mins back (appended again) should fix your
> > issue aswell. Can you please test this and check if my theory is indeed
> > correct. If it fixes your issue aswell, then I will re-post the patch
> > with a new changelog and updated comments in the patch.
> >
>
> I have applied your patch to both an openSUSE 2.6.22.17 kernel and a
> 2.6.26-rc4 kernel.org kernel and run the test with Einstein@home on
> two different machines. One machine is running 24 hours now, the other
> 18 hours.
>
> During this time there were no faults on both machines.
>
> As it never before took more than 12 hours until the first appearance
> of the problem, I think your patch fixed it. Very good work!
>
> I will continue running the test, but I believe we can call this
> fixed.
>
> Thank you again!

fix applied to tip/x86/urgent. Thanks everyone, nice find!

Ingo

2008-06-04 12:55:36

by Steven Rostedt

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack


On Wed, 4 Jun 2008, Ingo Molnar wrote:
> * J?rgen Mell <[email protected]> wrote:
> >
> > Thank you again!

No, thank YOU. What you gave was excellent feedback.

>
> fix applied to tip/x86/urgent. Thanks everyone, nice find!
>

Ingo, should this be forward to the stable branch?

-- Steve

2008-06-04 13:06:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack


* Steven Rostedt <[email protected]> wrote:

> > fix applied to tip/x86/urgent. Thanks everyone, nice find!
>
> Ingo, should this be forward to the stable branch?

yes indeed. Cc:-ed.

Ingo