2004-09-12 18:48:12

by Roel van der Made

[permalink] [raw]
Subject: kernel 2.6.9-rc1-mm4 oops

Hi there,

This morning one of our (MySQL-)database serves crashed with the
following kernel trace. Anyone has an idea what could've caused it?
The machine is an SMP Xeon 2.8Ghz with 4G internal Reg. ECC ram running
4 scsi disks in sw raid 5 on a Debian (almost sid-)distribution.

The trace:

------------[ cut here ]------------
kernel BUG at kernel/exit.c:852!
invalid operand: 0000 [#1]
SMP
Modules linked in: ip_vs_wlc af_packet ipt_MARK iptable_mangle ip_tables ip_vs tg3 e1000 e100 eepro100 mii
nfsd exportfs nfs lockd sunrpc unix
CPU: 0
EIP: 0060:[<c011df03>] Not tainted VLI
EFLAGS: 00010246 (2.6.9-rc1-mm4-fw-xeon.1)
EIP is at next_thread+0xc/0x41
eax: 00000000 ebx: 00000001 ecx: 00000001 edx: e93c3aa0
esi: 00000000 edi: e93c3aa0 ebp: 00000000 esp: f3893dd8
ds: 007b es: 007b ss: 0068
Process snmpd (pid: 1182, threadinfo=f3892000 task=f3fa1550)
Stack: c0182368 f3893f14 e93c3aa0 c016cecb c30c8a00 c011542b c03bfbe0 c30c8a00
c017fcf6 e18d6eb0 e93c3aa0 0000000d c017fdad e93c3aa0 4143bbb4 247966f0
c016c653 c03bfbe0 e18d6eb0 c03a4bc5 c01802a0 f3e56c20 e18d6eb0 0000000d
Call Trace:
[<c0182368>] do_task_stat+0x279/0x752
[<c016cecb>] alloc_inode+0x1b/0x146

[<c011542b>] do_page_fault+0x19d/0x5c7
[<c017fcf6>] task_dumpable+0x39/0x4a
[<c017fdad>] proc_pid_make_inode+0xa6/0xe5
[<c016c653>] d_rehash+0x55/0x79
[<c01802a0>] proc_pident_lookup+0x100/0x26c
[<c0161586>] real_lookup+0xcd/0xf0
[<c016b468>] dput+0x24/0x209
[<c0162247>] link_path_walk+0xa3e/0xd89
[<c0182883>] proc_tgid_stat+0x1f/0x23
[<c017f3ed>] proc_info_read+0x6a/0x9f
[<c015417f>] vfs_read+0xbc/0x127
[<c015444d>] sys_read+0x51/0x80
[<c0105cdf>] syscall_call+0x7/0xb
Code: 8b 44 24 0c 89 04 24 e8 1d fc ff ff 83 ec 04 0f b6 44 24 08 c1 e0 08 89 04 24 e8 0a fc ff ff 89 c2
8b 80 d0 04 00 00 85 c0 75 08 <0f> 0b 54 03 e5

Thanks,

--
Roel van der Made .--.
GNU/Linux Systems/Network Engineer |o_o |
Telegraaf Media ICT - Internet Services |:_/ |
Tel.: 020 585 2229 // \ \
GnuPG Key available at: http://roel.net/gpgkey.asc


2004-09-13 07:55:29

by Kirill Korotaev

[permalink] [raw]
Subject: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

--- ./kernel/exit.c.nt 2004-09-13 11:18:26.000000000 +0400
+++ ./kernel/exit.c 2004-09-13 11:53:23.611075360 +0400
@@ -848,10 +848,7 @@ asmlinkage long sys_exit(int error_code)
task_t fastcall *next_thread(const task_t *p)
{
#ifdef CONFIG_SMP
- if (!p->sighand)
- BUG();
- if (!spin_is_locked(&p->sighand->siglock) &&
- !rwlock_is_locked(&tasklist_lock))
+ if (!rwlock_is_locked(&tasklist_lock))
BUG();
#endif
return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);


Attachments:
diff-next_thread (491.00 B)

2004-09-13 08:06:25

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

On Mon, Sep 13, 2004 at 12:06:39PM +0400, Kirill Korotaev wrote:
> --- ./kernel/exit.c.nt 2004-09-13 11:18:26.000000000 +0400
> +++ ./kernel/exit.c 2004-09-13 11:53:23.611075360 +0400
> @@ -848,10 +848,7 @@ asmlinkage long sys_exit(int error_code)
> task_t fastcall *next_thread(const task_t *p)
> {
> #ifdef CONFIG_SMP
> - if (!p->sighand)
> - BUG();
> - if (!spin_is_locked(&p->sighand->siglock) &&
> - !rwlock_is_locked(&tasklist_lock))
> + if (!rwlock_is_locked(&tasklist_lock))
> BUG();
> #endif
> return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);

Hmm, BUG_ON() for whatever users (embedded?) that define BUG()/BUG_ON()
checks to nops may be useful here since no side effects are expected
from rwlock_is_locked().

FWIW I got this once while running initscripts(!) on a 4x logical x86-64
on virgin 2.6.9-rc1-mm4 but couldn't reproduce it.


-- wli

2004-09-13 08:29:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops


* Kirill Korotaev <[email protected]> wrote:

> This patch removes sighand checks from the next_thread(), since they
> are incorrect and has nothing to do with the next_thread() function.
> So they could trigger BUG() when there were no actually bug at all.

the problem is, generally it is not valid to have a thread on the thread
list that has no ->sighand structure. This is what happens when we exit
a task:

write_lock_irq(&tasklist_lock);
...
__exit_sighand(p);
__unhash_process(p);

the BUG() is useful for all the code that uses next_thread() - you can
only do a safe next_thread() iteration if you've locked ->sighand.

there's one exception: in the procfs code we can get a reference to
almost-dead tasks as well that are not even in the tasklist. (This is a
relatively new thing introduced by me that can happen due to the
preemptability of some of the exit path.)

so i believe your fix papers over the real bug which is the use of an
almost-dead task for thread iterations. Since we've already done
__unhash_process() not doing the BUG introduces a more subtle bug: the
use of the stale PID pointers! So i believe the right fix is the one
below, which (under the safety of read_lock(tasklock)) checks for the
availability of task->sighand - and skips the thread iterations if so.

Ingo

--- linux/fs/proc/array.c.orig
+++ linux/fs/proc/array.c
@@ -356,7 +356,7 @@ static int do_task_stat(struct task_stru
stime = task->signal->stime;
}
}
- if (whole) {
+ if (whole && task->sighand) {
t = task;
do {
min_flt += t->min_flt;

2004-09-13 09:04:07

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

Ingo Molnar wrote:
> * Kirill Korotaev <[email protected]> wrote:

>>This patch removes sighand checks from the next_thread(), since they
>>are incorrect and has nothing to do with the next_thread() function.
>>So they could trigger BUG() when there were no actually bug at all.
>
> the problem is, generally it is not valid to have a thread on the thread
> list that has no ->sighand structure. This is what happens when we exit
> a task:
>
> write_lock_irq(&tasklist_lock);
> ...
> __exit_sighand(p);
> __unhash_process(p);

> the BUG() is useful for all the code that uses next_thread() - you can
> only do a safe next_thread() iteration if you've locked ->sighand.

1. I don't see spin_lock() on p->sighand->siglock in do_task_stat()
before calling next_thread(). And the check inside next_thread() permits
only one of the locks to be taken:

if (!spin_is_locked(&p->sighand->siglock) &&
!rwlock_is_locked(&tasklist_lock))

which is probably wrong, since tasklist_lock is always required!

2. I think the idea of checking sighand is quite obscure.
Probably it would be better to call pid_alive() for check at such places
in proc, isn't it?

3. And yes, now I agree that this check and BUG() prevented
next_thread() from walking through the deleted list_head in tsk->pid_list.

But I would propose to reorganize these checks in next_thread() to
something like this:

if (!rwlock_is_locked(&tasklist_lock) || p->pids[PIDTYPE_TGID].nr == 0)
BUG();

the last check ensures that we are still hashed and this check is more
straithforward for understanding, agree?

4. If we do checks this way, then we can found strange proc numeric
results. Suppose, you have read the do_task_stat() and it iterated
through the threads and summed the times in this loop with
next_thread(). Next, the thread dies and you can read the results w/o
this loop and threads times, only this thread stats. Looks a bit
invalid. Don't you think so?
Maybe we should return an error?

> so i believe your fix papers over the real bug which is the use of an
> almost-dead task for thread iterations. Since we've already done
> __unhash_process() not doing the BUG introduces a more subtle bug: the
> use of the stale PID pointers! So i believe the right fix is the one
> below, which (under the safety of read_lock(tasklock)) checks for the
> availability of task->sighand - and skips the thread iterations if so.
You patch is correct. Though I would like to hear on my previous notes
about pid_alive().

Kirill

2004-09-13 09:23:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops


* Kirill Korotaev <[email protected]> wrote:

> >the BUG() is useful for all the code that uses next_thread() - you can
> >only do a safe next_thread() iteration if you've locked ->sighand.

> 1. I don't see spin_lock() on p->sighand->siglock in do_task_stat()
> before calling next_thread(). And the check inside next_thread() permits
> only one of the locks to be taken:
>
> if (!spin_is_locked(&p->sighand->siglock) &&
> !rwlock_is_locked(&tasklist_lock))
>
> which is probably wrong, since tasklist_lock is always required!

It's not 'wrong' in terms of correctness it's simply too restrictive for
no reason. I agree that we should check for the tasklist lock only.

> 2. I think the idea of checking sighand is quite obscure. Probably it
> would be better to call pid_alive() for check at such places in proc,
> isn't it?

yeah, it's just as good of a check.

> 3. And yes, now I agree that this check and BUG() prevented
> next_thread() from walking through the deleted list_head in
> tsk->pid_list.

good.

> But I would propose to reorganize these checks in next_thread() to
> something like this:
>
> if (!rwlock_is_locked(&tasklist_lock) || p->pids[PIDTYPE_TGID].nr == 0)
> BUG();
>
> the last check ensures that we are still hashed and this check is more
> straithforward for understanding, agree?

yep - please send a new patch to Andrew.

> 4. If we do checks this way, then we can found strange proc numeric
> results. Suppose, you have read the do_task_stat() and it iterated
> through the threads and summed the times in this loop with
> next_thread(). Next, the thread dies and you can read the results w/o
> this loop and threads times, only this thread stats. Looks a bit
> invalid. Don't you think so? Maybe we should return an error?

i'd just skip filling out that statistics field - like my minimal patch
does.

Ingo

2004-09-13 13:34:44

by Roel van der Made

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

Hi,

On Mon, Sep 13, 2004 at 11:24:43AM +0200, Ingo Molnar wrote:
> * Kirill Korotaev <[email protected]> wrote:
> > >the BUG() is useful for all the code that uses next_thread() - you can
> > >only do a safe next_thread() iteration if you've locked ->sighand.
> > 1. I don't see spin_lock() on p->sighand->siglock in do_task_stat()
> > before calling next_thread(). And the check inside next_thread() permits
> > only one of the locks to be taken:
> > if (!spin_is_locked(&p->sighand->siglock) &&
> > !rwlock_is_locked(&tasklist_lock))

<snip>

> > the last check ensures that we are still hashed and this check is more
> > straithforward for understanding, agree?
>
> yep - please send a new patch to Andrew.

I'll be awaiting the patch and give it a shot.

Thanks all for the feedback.

--
Roel van der Made .--.
GNU/Linux Systems/Network Engineer |o_o |
Telegraaf Media ICT - Internet Services |:_/ |
Tel.: 020 585 2229 // \ \
GnuPG Key available at: http://roel.net/gpgkey.asc


Attachments:
(No filename) (1.06 kB)
(No filename) (189.00 B)
Download all attachments

2004-09-13 13:38:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops


* Roel van der Made <[email protected]> wrote:

> > > the last check ensures that we are still hashed and this check is more
> > > straithforward for understanding, agree?
> >
> > yep - please send a new patch to Andrew.
>
> I'll be awaiting the patch and give it a shot.
>
> Thanks all for the feedback.

you can try my patch too, it will do the job of fixing the bug. The
other changes we talked about are only improvements to the debugging
infrastructure.

Ingo

2004-09-13 13:42:46

by Roel van der Made

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

On Mon, Sep 13, 2004 at 03:38:47PM +0200, Ingo Molnar wrote:
> * Roel van der Made <[email protected]> wrote:
> > > > the last check ensures that we are still hashed and this check is more
> > > > straithforward for understanding, agree?
> > > yep - please send a new patch to Andrew.
> > I'll be awaiting the patch and give it a shot.
> > Thanks all for the feedback.
> you can try my patch too, it will do the job of fixing the bug. The
> other changes we talked about are only improvements to the debugging
> infrastructure.

Saw there's an mm5 release already, is your patch included in this one
also?

> Ingo

--
Roel van der Made .--.
GNU/Linux Systems/Network Engineer |o_o |
Telegraaf Media ICT - Internet Services |:_/ |
Tel.: 020 585 2229 // \ \
GnuPG Key available at: http://roel.net/gpgkey.asc


Attachments:
(No filename) (884.00 B)
(No filename) (189.00 B)
Download all attachments

2004-09-13 14:33:27

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

--- ./kernel/exit.c.nt 2004-09-13 18:00:12.727181136 +0400
+++ ./kernel/exit.c 2004-09-13 18:00:51.864231400 +0400
@@ -848,10 +848,7 @@ asmlinkage long sys_exit(int error_code)
task_t fastcall *next_thread(const task_t *p)
{
#ifdef CONFIG_SMP
- if (!p->sighand)
- BUG();
- if (!spin_is_locked(&p->sighand->siglock) &&
- !rwlock_is_locked(&tasklist_lock))
+ if (!rwlock_is_locked(&tasklist_lock) || p->pids[PIDTYPE_TGID].nr == 0)
BUG();
#endif
return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);


Attachments:
diff-task_stat (1.34 kB)
diff-next_thread (524.00 B)
Download all attachments

2004-09-13 15:02:41

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

Roel van der Made wrote:
> On Mon, Sep 13, 2004 at 03:38:47PM +0200, Ingo Molnar wrote:
>
>>* Roel van der Made <[email protected]> wrote:
>>
>>>>>the last check ensures that we are still hashed and this check is more
>>>>>straithforward for understanding, agree?
>>>>
>>>>yep - please send a new patch to Andrew.
>>>
>>>I'll be awaiting the patch and give it a shot.
>>>Thanks all for the feedback.
>>
>>you can try my patch too, it will do the job of fixing the bug. The
>>other changes we talked about are only improvements to the debugging
>>infrastructure.
>

> Saw there's an mm5 release already, is your patch included in this one
> also?
I've checked that -mm5 contains fix to the BUG() you reported.
It's slight different (looks almost like Ingo Molnar patch), but it's there.

Kirill