2008-12-15 18:29:25

by Alexander Beregalov

[permalink] [raw]
Subject: next-20081215: kernel BUG at block/cfq-iosched.c:1243

Hi

[ 22.857720] scsi_scan_0 used greatest stack depth: 4864 bytes left
[ 22.859221] scsi 1:0:0:0: CD-ROM HL-DT-ST CD-ROM GCR-8240N 1.10 PQ
: 0 ANSI: 5
[ 23.077531] ------------[ cut here ]------------
[ 23.080733] kernel BUG at block/cfq-iosched.c:1243!
[ 23.080733] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 23.080733] last sysfs file:
[ 23.080733] CPU 3
[ 23.080733] Modules linked in:
[ 23.080733] Pid: 580, comm: scsi_scan_0 Tainted: G W 2.6.28-rc8-next-
20081215 #98
[ 23.080733] RIP: 0010:[<ffffffff80371160>] [<ffffffff80371160>] cic_free_fun
c+0x18/0x88
[ 23.080733] RSP: 0018:ffff88007ce9fe40 EFLAGS: 00010246
[ 23.080733] RAX: ffff88007ceb6618 RBX: ffff88007cf1d680 RCX: 0000000000000001
[ 23.080733] RDX: 0000000000000001 RSI: ffff88007cf1d680 RDI: ffff88007cea6000
[ 23.080733] RBP: ffff88007ce9fe60 R08: 0000000000000246 R09: 0000000000000001
[ 23.080733] R10: ffff88007ce9fde0 R11: ffff88007f0489c0 R12: ffff88007cea6000
[ 23.080733] R13: ffff88007ce9ff00 R14: ffff88007cc49568 R15: ffff88007cc49568
[ 23.080733] FS: 0000000000000000(0000) GS:ffff88007f402a28(0000) knlGS:00000
00000000000
[ 23.080733] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 23.080733] CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
[ 23.080733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.080733] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 23.080733] Process scsi_scan_0 (pid: 580, threadinfo ffff88007ce9e000, task
ffff88007cc49310)
[ 23.080733] Stack:
[ 23.080733] ffff88007cf1d6f8 ffff88007cea6000 ffff88007ce9ff00 ffff88007cc49
568
[ 23.080733] ffff88007ce9fe80 ffffffff803711f5 ffff88007cea6000 ffff88007cc49
310
[ 23.080733] ffff88007ce9fea0 ffffffff8036c985 ffff88007ce9ff00 ffff88007cea6
000
[ 23.080733] Call Trace:
[ 23.080733] [<ffffffff803711f5>] cfq_free_io_context+0x25/0x2f
[ 23.080733] [<ffffffff8036c985>] put_io_context+0x53/0x72
[ 23.080733] [<ffffffff8036ca61>] exit_io_context+0x8e/0x95
[ 23.080733] [<ffffffff8023a679>] do_exit+0x858/0x88f
[ 23.080733] [<ffffffff8020c901>] child_rip+0x11/0x20
[ 23.080733] [<ffffffff8020c2c0>] ? restore_args+0x0/0x30
[ 23.080733] [<ffffffff80249f30>] ? kthread+0x0/0x76
[ 23.080733] [<ffffffff8020c8f0>] ? child_rip+0x0/0x20
[ 23.080733] Code: c6 75 92 60 80 48 89 cf 31 c0 e8 45 b3 00 00 c9 48 98 c3 55
48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 7e 08 00 75 04 <0f> 0b eb
fe 4c 8d 6f 08 4c 89 ef e8 77 40 17 00 48 8b 73 08 49
[ 23.080733] RIP [<ffffffff80371160>] cic_free_func+0x18/0x88
[ 23.080733] RSP <ffff88007ce9fe40>
[ 25.744530] ---[ end trace 4eaa2a86a8e2da22 ]---


2008-12-15 18:33:17

by Jens Axboe

[permalink] [raw]
Subject: Re: next-20081215: kernel BUG at block/cfq-iosched.c:1243

On Mon, Dec 15 2008, Alexander Beregalov wrote:
> Hi
>
> [ 22.857720] scsi_scan_0 used greatest stack depth: 4864 bytes left
> [ 22.859221] scsi 1:0:0:0: CD-ROM HL-DT-ST CD-ROM GCR-8240N 1.10 PQ
> : 0 ANSI: 5
> [ 23.077531] ------------[ cut here ]------------
> [ 23.080733] kernel BUG at block/cfq-iosched.c:1243!
> [ 23.080733] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 23.080733] last sysfs file:
> [ 23.080733] CPU 3
> [ 23.080733] Modules linked in:
> [ 23.080733] Pid: 580, comm: scsi_scan_0 Tainted: G W 2.6.28-rc8-next-
> 20081215 #98
> [ 23.080733] RIP: 0010:[<ffffffff80371160>] [<ffffffff80371160>] cic_free_fun
> c+0x18/0x88
> [ 23.080733] RSP: 0018:ffff88007ce9fe40 EFLAGS: 00010246
> [ 23.080733] RAX: ffff88007ceb6618 RBX: ffff88007cf1d680 RCX: 0000000000000001
> [ 23.080733] RDX: 0000000000000001 RSI: ffff88007cf1d680 RDI: ffff88007cea6000
> [ 23.080733] RBP: ffff88007ce9fe60 R08: 0000000000000246 R09: 0000000000000001
> [ 23.080733] R10: ffff88007ce9fde0 R11: ffff88007f0489c0 R12: ffff88007cea6000
> [ 23.080733] R13: ffff88007ce9ff00 R14: ffff88007cc49568 R15: ffff88007cc49568
> [ 23.080733] FS: 0000000000000000(0000) GS:ffff88007f402a28(0000) knlGS:00000
> 00000000000
> [ 23.080733] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 23.080733] CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
> [ 23.080733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 23.080733] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 23.080733] Process scsi_scan_0 (pid: 580, threadinfo ffff88007ce9e000, task
> ffff88007cc49310)
> [ 23.080733] Stack:
> [ 23.080733] ffff88007cf1d6f8 ffff88007cea6000 ffff88007ce9ff00 ffff88007cc49
> 568
> [ 23.080733] ffff88007ce9fe80 ffffffff803711f5 ffff88007cea6000 ffff88007cc49
> 310
> [ 23.080733] ffff88007ce9fea0 ffffffff8036c985 ffff88007ce9ff00 ffff88007cea6
> 000
> [ 23.080733] Call Trace:
> [ 23.080733] [<ffffffff803711f5>] cfq_free_io_context+0x25/0x2f
> [ 23.080733] [<ffffffff8036c985>] put_io_context+0x53/0x72
> [ 23.080733] [<ffffffff8036ca61>] exit_io_context+0x8e/0x95
> [ 23.080733] [<ffffffff8023a679>] do_exit+0x858/0x88f
> [ 23.080733] [<ffffffff8020c901>] child_rip+0x11/0x20
> [ 23.080733] [<ffffffff8020c2c0>] ? restore_args+0x0/0x30
> [ 23.080733] [<ffffffff80249f30>] ? kthread+0x0/0x76
> [ 23.080733] [<ffffffff8020c8f0>] ? child_rip+0x0/0x20
> [ 23.080733] Code: c6 75 92 60 80 48 89 cf 31 c0 e8 45 b3 00 00 c9 48 98 c3 55
> 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 7e 08 00 75 04 <0f> 0b eb
> fe 4c 8d 6f 08 4c 89 ef e8 77 40 17 00 48 8b 73 08 49
> [ 23.080733] RIP [<ffffffff80371160>] cic_free_func+0x18/0x88
> [ 23.080733] RSP <ffff88007ce9fe40>
> [ 25.744530] ---[ end trace 4eaa2a86a8e2da22 ]---

Odd, is it reproducible? What is your .config?

--
Jens Axboe

2008-12-15 18:39:13

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20081215: kernel BUG at block/cfq-iosched.c:1243

2008/12/15 Jens Axboe <[email protected]>:
>> [ 23.080733] kernel BUG at block/cfq-iosched.c:1243!
>> [ 23.080733] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

> Odd, is it reproducible? What is your .config?
Yes, it is

Config is in attach


Attachments:
(No filename) (259.00 B)
config-cfq (31.14 kB)
Download all attachments

2008-12-15 18:45:23

by Jens Axboe

[permalink] [raw]
Subject: Re: next-20081215: kernel BUG at block/cfq-iosched.c:1243

On Mon, Dec 15 2008, Alexander Beregalov wrote:
> 2008/12/15 Jens Axboe <[email protected]>:
> >> [ 23.080733] kernel BUG at block/cfq-iosched.c:1243!
> >> [ 23.080733] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>
> > Odd, is it reproducible? What is your .config?
> Yes, it is
>
> Config is in attach

It's weird, because the only change for CFQ in that area is the one
below. Does the oops go away if you back this out?


>From 030a3f02bb42fad91ae84c99924fe3ee162a634d Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Fri, 12 Dec 2008 15:56:31 +0100
Subject: [PATCH] cfq-iosched: fix race between exiting queue and exiting task

Original patch from Nikanth Karthikesan <[email protected]>

When a queue exits the queue lock is taken and cfq_exit_queue() would free all
the cic's associated with the queue.

But when a task exits, cfq_exit_io_context() gets cic one by one and then
locks the associated queue to call __cfq_exit_single_io_context. It looks like
between getting a cic from the ioc and locking the queue, the queue might have
exited on another cpu.

Fix this by rechecking the cfq_io_context queue key inside the queue lock
again, and not calling into __cfq_exit_single_io_context() if somebody
beat us to it.

Signed-off-by: Jens Axboe <[email protected]>
---
block/cfq-iosched.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ee8a90c..8a2485d 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1314,7 +1314,16 @@ static void cfq_exit_single_io_context(struct io_context *ioc,
unsigned long flags;

spin_lock_irqsave(q->queue_lock, flags);
+
+ /*
+ * Ensure we get a fresh copy of the ->key to prevent
+ * race between exiting task and queue
+ */
+ smp_read_barrier_depends();
+ if (cic->key)
+ __cfq_exit_single_io_context(cfqd, cic);
__cfq_exit_single_io_context(cfqd, cic);
+
spin_unlock_irqrestore(q->queue_lock, flags);
}
}
--
1.6.0.4.755.g6e139


--
Jens Axboe

2008-12-15 20:09:38

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20081215: kernel BUG at block/cfq-iosched.c:1243

2008/12/15 Jens Axboe <[email protected]>:
> On Mon, Dec 15 2008, Alexander Beregalov wrote:
>> 2008/12/15 Jens Axboe <[email protected]>:
>> >> [ 23.080733] kernel BUG at block/cfq-iosched.c:1243!
>> >> [ 23.080733] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>
>> > Odd, is it reproducible? What is your .config?
>> Yes, it is
>>
>> Config is in attach
>
> It's weird, because the only change for CFQ in that area is the one
> below. Does the oops go away if you back this out?
Yes, it can boot without the patch, no oops.

> From 030a3f02bb42fad91ae84c99924fe3ee162a634d Mon Sep 17 00:00:00 2001
> From: Jens Axboe <[email protected]>
> Date: Fri, 12 Dec 2008 15:56:31 +0100
> Subject: [PATCH] cfq-iosched: fix race between exiting queue and exiting task

2008-12-15 20:19:18

by Jens Axboe

[permalink] [raw]
Subject: Re: next-20081215: kernel BUG at block/cfq-iosched.c:1243

On Mon, Dec 15 2008, Alexander Beregalov wrote:
> 2008/12/15 Jens Axboe <[email protected]>:
> > On Mon, Dec 15 2008, Alexander Beregalov wrote:
> >> 2008/12/15 Jens Axboe <[email protected]>:
> >> >> [ 23.080733] kernel BUG at block/cfq-iosched.c:1243!
> >> >> [ 23.080733] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >>
> >> > Odd, is it reproducible? What is your .config?
> >> Yes, it is
> >>
> >> Config is in attach
> >
> > It's weird, because the only change for CFQ in that area is the one
> > below. Does the oops go away if you back this out?
> Yes, it can boot without the patch, no oops.

OK, dumb error. Please try this instead, on top of -next:

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 8a2485d..e8525fa 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1322,7 +1322,6 @@ static void cfq_exit_single_io_context(struct io_context *ioc,
smp_read_barrier_depends();
if (cic->key)
__cfq_exit_single_io_context(cfqd, cic);
- __cfq_exit_single_io_context(cfqd, cic);

spin_unlock_irqrestore(q->queue_lock, flags);
}

--
Jens Axboe

2008-12-15 20:29:27

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20081215: kernel BUG at block/cfq-iosched.c:1243

2008/12/15 Jens Axboe <[email protected]>:
> On Mon, Dec 15 2008, Alexander Beregalov wrote:
>> 2008/12/15 Jens Axboe <[email protected]>:
>> > On Mon, Dec 15 2008, Alexander Beregalov wrote:
>> >> 2008/12/15 Jens Axboe <[email protected]>:
>> >> >> [ 23.080733] kernel BUG at block/cfq-iosched.c:1243!
>> >> >> [ 23.080733] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> >>
>> >> > Odd, is it reproducible? What is your .config?
>> >> Yes, it is
>> >>
>> >> Config is in attach
>> >
>> > It's weird, because the only change for CFQ in that area is the one
>> > below. Does the oops go away if you back this out?
>> Yes, it can boot without the patch, no oops.
>
> OK, dumb error. Please try this instead, on top of -next:
It works without oopses.
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 8a2485d..e8525fa 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1322,7 +1322,6 @@ static void cfq_exit_single_io_context(struct io_context *ioc,
> smp_read_barrier_depends();
> if (cic->key)
> __cfq_exit_single_io_context(cfqd, cic);
> - __cfq_exit_single_io_context(cfqd, cic);
>
> spin_unlock_irqrestore(q->queue_lock, flags);
> }
>
> --
> Jens Axboe
>
>