2009-03-02 10:52:18

by Carsten Aulbert

[permalink] [raw]
Subject: Re: kernel BUG at kernel/workqueue.c:291

Hi again,

In the meantime 43 of our nodes have been struck by this error. It seems
that the jobs of a certain user can trigger this bug; however, I have no
clue how to trigger it manually.

My questions:
Is this a known bug in 2.6.27.14 (we can upgrade to .19 if necessary)?
As this file was not modified recently, I suspect there is no ready
fix for it.

Do you need any more info about our systems (Intel X3220 based Supermicro
systems), the kernel config (deadline scheduler in use, ...) or something
else?

Carsten Aulbert schrieb:
> [228704.928037] ------------[ cut here ]------------
> [228704.928224] kernel BUG at kernel/workqueue.c:291!
> [228704.928404] invalid opcode: 0000 [1] SMP
> [228704.928647] CPU 0
> [228704.928852] Modules linked in: lm92 w83793 w83781d hwmon_vid hwmon nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 netconsole configfs ipmi_si ipmi_devintf ipmi_watchdog ipmi_poweroff ipmi_msghandler e1000e i2c_i801 8250_pnp 8250 serial_core i2c_core
> [228704.930002] Pid: 1609, comm: rpciod/0 Not tainted 2.6.27.14-nodes #1
> [228704.930002] RIP: 0010:[<ffffffff8023c6db>] [<ffffffff8023c6db>] run_workqueue+0x6f/0x102
> [228704.930002] RSP: 0018:ffff880214bcdec0 EFLAGS: 00010207
> [228704.930002] RAX: 0000000000000000 RBX: ffff880214b82f40 RCX: ffff880215444418
> [228704.930002] RDX: ffff880187d07d58 RSI: ffff880214bcdee0 RDI: ffff880215444410
> [228704.930002] RBP: ffffffffa0077186 R08: ffff880214bcc000 R09: ffff88021491f808
> [228704.930002] R10: 0000000000000246 R11: ffff880187d07d50 R12: ffff880214ad7d28
> [228704.930002] R13: ffffffff806065a0 R14: ffffffff80607280 R15: 0000000000000000
> [228704.930002] FS: 0000000000000000(0000) GS:ffffffff80636040(0000) knlGS:0000000000000000
> [228704.930002] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [228704.930002] CR2: 00007fc056333fd8 CR3: 00000001ed270000 CR4: 00000000000006e0
> [228704.930002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [228704.930002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [228704.930002] Process rpciod/0 (pid: 1609, threadinfo ffff880214bcc000, task ffff880217b08780)
> [228704.930002] Stack: ffff880214b82f40 ffff880214b82f40 ffff880214b82f58 ffffffff8023cff3
> [228704.930002] 0000000000000000 ffff880217b08780 ffffffff8023f7d7 ffff880214bcdef8
> [228704.930002] ffff880214bcdef8 ffffffff806065a0 ffffffff80607280 ffff880214b82f40
> [228704.930002] Call Trace:
> [228704.930002] [<ffffffff8023cff3>] ? worker_thread+0x90/0x9b
> [228704.930002] [<ffffffff8023f7d7>] ? autoremove_wake_function+0x0/0x2e
> [228704.930002] [<ffffffff8023cf63>] ? worker_thread+0x0/0x9b
> [228704.930002] [<ffffffff8023f6c2>] ? kthread+0x47/0x75
> [228704.930002] [<ffffffff8022afa8>] ? schedule_tail+0x27/0x5f
> [228704.930002] [<ffffffff8020ccb9>] ? child_rip+0xa/0x11
> [228704.930002] [<ffffffff8023f67b>] ? kthread+0x0/0x75
> [228704.930002] [<ffffffff8020ccaf>] ? child_rip+0x0/0x11
> [228704.930002]
> [228704.930002]
> [228704.930002] Code: 6f 18 48 89 7b 30 48 8b 11 48 8b 41 08 48 89 42 08 48 89 10 48 89 49 08 48 89 09 fe 03 fb 48 8b 41 f8 48 83 e0 fc 48 39 d8 74 04 <0f> 0b eb fe f0 80 61 f8 fe ff d5 65 48 8b 04 25 10 00 00 00 8b
> [228704.930002] RIP [<ffffffff8023c6db>] run_workqueue+0x6f/0x102
> [228704.930002] RSP <ffff880214bcdec0>
> [228704.941003] ---[ end trace deef6e5387b5a584 ]---

Thanks for any input; right now I'm quite helpless...

Cheers

Carsten


2009-03-03 07:27:19

by Andrew Morton

[permalink] [raw]
Subject: Re: kernel BUG at kernel/workqueue.c:291

On Mon, 02 Mar 2009 11:51:48 +0100 Carsten Aulbert <[email protected]> wrote:

> Hi again,
>
> In the meantime 43 of our nodes have been struck by this error. It seems
> that the jobs of a certain user can trigger this bug; however, I have no
> clue how to trigger it manually.

That's a lot of nodes.

> My questions:
> Is this a known bug in 2.6.27.14 (we can upgrade to .19 if necessary)?
> As this file was not modified recently, I suspect there is no ready
> fix for it.
>
> Do you need any more info about our systems (Intel X3220 based Supermicro
> systems), the kernel config (deadline scheduler in use, ...) or something
> else?

Let's cc the NFS developers, see if this rpciod crash is familiar to them?

> Carsten Aulbert schrieb:
> > [228704.928224] kernel BUG at kernel/workqueue.c:291!
> > [oops trace snipped; identical to the full trace quoted above]
>
> Thanks for any input; right now I'm quite helpless...

2009-03-03 07:36:50

by Carsten Aulbert

[permalink] [raw]
Subject: Re: kernel BUG at kernel/workqueue.c:291

Hi Andrew,

Andrew Morton schrieb:
>> In the meantime 43 of our nodes have been struck by this error. It seems
>> that the jobs of a certain user can trigger this bug; however, I have no
>> clue how to trigger it manually.
>
> That's a lot of nodes.
Quite, at least some percentage of the whole system.
>
> Let's cc the NFS developers, see if this rpciod crash is familiar to them?

Good idea, I should have done that myself - sorry

I think we were able to pinpoint at least one user's jobs that "generate"
this, but I need to talk to him about which NFS access patterns they use.

Systems are running Debian Etch,

dpkg -l | awk '/(nfs|portmap)/ {print $2 "\t\t" $3}'
libnfsidmap2 0.18-0
mountnfs 1.1.3-2
nfs-common 1.0.10-6+etch.1
nfs-kernel-server 1.0.10-6+etch.1
portmap 5-26


If you need more, please let me know! So far the machines are 'on hold',
i.e. we have not yet rebooted them, so we can still find out a little bit
more. If you (or anyone) think we can reboot them and put them back into
our scheduling queue, please let me know; the users are waiting for more
cycles.

Thanks a lot

Carsten

2009-03-03 15:17:29

by Trond Myklebust

[permalink] [raw]
Subject: Re: kernel BUG at kernel/workqueue.c:291

On Mon, 2009-03-02 at 23:26 -0800, Andrew Morton wrote:
> On Mon, 02 Mar 2009 11:51:48 +0100 Carsten Aulbert <[email protected]> wrote:
>
> > Hi again,
> >
> > In the meantime 43 of our nodes have been struck by this error. It seems
> > that the jobs of a certain user can trigger this bug; however, I have no
> > clue how to trigger it manually.
>
> That's a lot of nodes.
>
> > My questions:
> > Is this a known bug in 2.6.27.14 (we can upgrade to .19 if necessary)?
> > As this file was not modified recently, I suspect there is no ready
> > fix for it.
> >
> > Do you need any more info about our systems (Intel X3220 based Supermicro
> > systems), the kernel config (deadline scheduler in use, ...) or something
> > else?
>
> Let's cc the NFS developers, see if this rpciod crash is familiar to them?

Nope. I've never seen it before.

struct rpc_task does admittedly share storage for the work queue and the
rpc wait queue links, but if that were to be causing the reported
corruption, then it would mean that an rpc_task is simultaneously on a
wait queue and trying to execute on a work queue. I have no history of
that ever having happened.

Trond

> > Carsten Aulbert schrieb:
> > > [228704.928224] kernel BUG at kernel/workqueue.c:291!
> > > [oops trace snipped; identical to the full trace quoted above]
> >
> > Thanks for any input; right now I'm quite helpless...
>

2009-03-03 15:23:36

by Carsten Aulbert

[permalink] [raw]
Subject: Re: kernel BUG at kernel/workqueue.c:291

Hi Trond,

Trond Myklebust schrieb:
> struct rpc_task does admittedly share storage for the work queue and the
> rpc wait queue links, but if that were to be causing the reported
> corruption, then it would mean that an rpc_task is simultaneously on a
> wait queue and trying to execute on a work queue. I have no history of
> that ever having happened.

Is there anything I can give you to help narrow it down? As written, I
suspect a certain type of user job, but since literally thousands of
these jobs ran over the course of several days, it might be hard to
trigger this reliably again.

Cheers

Carsten

2009-03-03 20:41:38

by Aaron Straus

[permalink] [raw]
Subject: Re: kernel BUG at kernel/workqueue.c:291

Hi Carsten,

On Mar 03 04:23 PM, Carsten Aulbert wrote:
> > struct rpc_task does admittedly share storage for the work queue and the
> > rpc wait queue links, but if that were to be causing the reported
> > corruption, then it would mean that an rpc_task is simultaneously on a
> > wait queue and trying to execute on a work queue. I have no history of
> > that ever having happened.
>
> Is there anything I can give you to help narrow it down? As written, I
> suspect a certain type of user job, but since literally thousands of
> these jobs ran over the course of several days, it might be hard to
> trigger this reliably again.

Out of curiosity, do you know (approximately) the last kernel you ran
which did not show the BUG?

i.e. we know 2.6.27.14 is bad, what was the last known good kernel?

Thanks!

=a=


--
===================
Aaron Straus
[email protected]

2009-03-03 21:22:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: kernel BUG at kernel/workqueue.c:291

On Tue, 2009-03-03 at 16:23 +0100, Carsten Aulbert wrote:
> Hi Trond,
>
> Trond Myklebust schrieb:
> > struct rpc_task does admittedly share storage for the work queue and the
> > rpc wait queue links, but if that were to be causing the reported
> > corruption, then it would mean that an rpc_task is simultaneously on a
> > wait queue and trying to execute on a work queue. I have no history of
> > that ever having happened.
>
> Is there anything I can give you to help narrow it down? As written, I
> suspect a certain type of user job, but since literally thousands of
> these jobs ran over the course of several days, it might be hard to
> trigger this reliably again.

Could you try triggering it after applying the following patch? I think
I've spotted a potential race condition.

Cheers
Trond
---------------------------------------------------------------------------
From: Trond Myklebust <[email protected]>
SUNRPC: Tighten up the task locking rules in __rpc_execute()

We should probably not be testing any flags after we've cleared the
RPC_TASK_RUNNING flag, since rpc_make_runnable() is then free to assign the
rpc_task to another workqueue, which may then destroy it.

We can fix any races with rpc_make_runnable() by ensuring that we only
clear the RPC_TASK_RUNNING flag while holding the rpc_wait_queue->lock that
the task is supposed to be sleeping on (and then checking whether or not
the task really is sleeping).

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/sched.c | 33 ++++++++++++++++++++-------------
1 files changed, 20 insertions(+), 13 deletions(-)


diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 385f427..ff50a05 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -293,11 +293,6 @@ static void rpc_make_runnable(struct rpc_task *task)
 	rpc_clear_queued(task);
 	if (rpc_test_and_set_running(task))
 		return;
-	/* We might have raced */
-	if (RPC_IS_QUEUED(task)) {
-		rpc_clear_running(task);
-		return;
-	}
 	if (RPC_IS_ASYNC(task)) {
 		int status;
 
@@ -607,7 +602,9 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata)
  */
 static void __rpc_execute(struct rpc_task *task)
 {
-	int		status = 0;
+	struct rpc_wait_queue *queue;
+	int task_is_async = RPC_IS_ASYNC(task);
+	int status = 0;
 
 	dprintk("RPC: %5u __rpc_execute flags=0x%x\n",
 			task->tk_pid, task->tk_flags);
@@ -647,15 +644,25 @@ static void __rpc_execute(struct rpc_task *task)
 		 */
 		if (!RPC_IS_QUEUED(task))
 			continue;
-		rpc_clear_running(task);
-		if (RPC_IS_ASYNC(task)) {
-			/* Careful! we may have raced... */
-			if (RPC_IS_QUEUED(task))
-				return;
-			if (rpc_test_and_set_running(task))
-				return;
+		/*
+		 * The queue->lock protects against races with
+		 * rpc_make_runnable().
+		 *
+		 * Note that once we clear RPC_TASK_RUNNING on an asynchronous
+		 * rpc_task, rpc_make_runnable() can assign it to a
+		 * different workqueue. We therefore cannot assume that the
+		 * rpc_task pointer may still be dereferenced.
+		 */
+		queue = task->tk_waitqueue;
+		spin_lock_bh(&queue->lock);
+		if (!RPC_IS_QUEUED(task)) {
+			spin_unlock_bh(&queue->lock);
 			continue;
 		}
+		rpc_clear_running(task);
+		spin_unlock_bh(&queue->lock);
+		if (task_is_async)
+			return;
 
 		/* sync task: sleep here */
 		dprintk("RPC: %5u sync task going to sleep\n", task->tk_pid);