All,
We've been trying to track down a nagging regression seen during some
port-disable/enable testing. The problem occurs anywhere from
20 minutes to 120 minutes of testing under very minimal I/O load:
[ 1143.890598] Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
[ 1143.896087] [<ffffffff802ed04f>] as_move_to_dispatch+0x11/0x135
[ 1143.904574] PGD 120ba067 PUD 31e8c067 PMD 0
[ 1143.908895] Oops: 0000 [1] SMP
[ 1143.912065] CPU 1
[ 1143.914092] Modules linked in: qla2xxx scsi_transport_fc
[ 1143.919453] Pid: 8362, comm: dt Not tainted 2.6.19 #39
[ 1143.924588] RIP: 0010:[<ffffffff802ed04f>] [<ffffffff802ed04f>] as_move_to_dispatch+0x11/0x135
[ 1143.933300] RSP: 0018:ffff8100021f7cf0 EFLAGS: 00010046
[ 1143.938610] RAX: 0000000000000000 RBX: ffff8100261392f0 RCX: 0000000000000000
[ 1143.945736] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8100261392f0
[ 1143.952865] RBP: 0000000000000000 R08: ffffffff804d0928 R09: ffff8100351a96e0
[ 1143.959992] R10: ffff8100351a96e0 R11: 0000000000000000 R12: 0000000000000000
[ 1143.967118] R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000001
[ 1143.974248] FS: 0000000000000000(0000) GS:ffff8100021c5930(0000) knlGS:0000000000000000
[ 1143.982327] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1143.988070] CR2: 0000000000000028 CR3: 000000003493b000 CR4: 00000000000006e0
[ 1143.995198] Process dt (pid: 8362, threadinfo ffff8100207a2000, task ffff810023d86840)
[ 1144.003103] Stack: 0000000000000000 ffff8100261392f0 0000000000000000 0000000000000001
[ 1144.011183] 0000000000000000 ffffffff802ed591 0000000000000000 ffff810035795990
[ 1144.018640] ffff81000f11b3f8 ffff810035795990 ffff810035795990 ffffffff802e53d6
[ 1144.025906] Call Trace:
[ 1144.028548] <IRQ> [<ffffffff802ed591>] as_dispatch_request+0x39d/0x3aa
[ 1144.035274] [<ffffffff802e53d6>] elv_next_request+0x45/0x151
[ 1144.041015] [<ffffffff8037b3cf>] scsi_request_fn+0x7a/0x375
[ 1144.046667] [<ffffffff802e8fa0>] blk_run_queue+0x42/0x74
[ 1144.052062] [<ffffffff8037b2bc>] scsi_next_command+0x2d/0x39
[ 1144.057805] [<ffffffff8037b85b>] scsi_end_request+0xc0/0xce
[ 1144.063460] [<ffffffff8037ba03>] scsi_io_completion+0x149/0x351
[ 1144.069464] [<ffffffff8038aad6>] sd_rw_intr+0x247/0x256
[ 1144.074770] [<ffffffff80376b92>] scsi_finish_command+0x88/0x8d
[ 1144.080686] [<ffffffff8037c25e>] scsi_softirq_done+0xe7/0xef
[ 1144.086429] [<ffffffff802e614c>] blk_done_softirq+0x63/0x71
[ 1144.092082] [<ffffffff80226487>] __do_softirq+0x51/0xbd
[ 1144.097394] [<ffffffff8020a88c>] call_softirq+0x1c/0x28
[ 1144.102701] [<ffffffff8020bffa>] do_softirq+0x2e/0x94
[ 1144.107836] [<ffffffff8020c10e>] do_IRQ+0xae/0xc8
[ 1144.112625] [<ffffffff80209c81>] ret_from_intr+0x0/0xb
[ 1144.117845] <EOI> [<ffffffff8027dfad>] __bio_add_page+0x1b8/0x1bf
[ 1144.124135] [<ffffffff8027ecb6>] blkdev_direct_IO+0x2c9/0x420
[ 1144.129964] [<ffffffff8021a1f4>] __activate_task+0x2d/0x3f
[ 1144.135537] [<ffffffff8040df0d>] _spin_unlock_irq+0x6/0x9
[ 1144.141024] [<ffffffff80241d86>] generic_file_direct_IO+0xa6/0xe8
[ 1144.147196] [<ffffffff80219efe>] __wake_up_common+0x42/0x6c
[ 1144.152852] [<ffffffff80242515>] generic_file_aio_read+0xcc/0x19d
[ 1144.159029] [<ffffffff8025d7cd>] do_sync_read+0xc8/0x10b
[ 1144.164426] [<ffffffff8023343c>] autoremove_wake_function+0x0/0x2e
[ 1144.170684] [<ffffffff8040df0d>] _spin_unlock_irq+0x6/0x9
[ 1144.176166] [<ffffffff8025c516>] do_filp_open+0x2d/0x3d
[ 1144.181480] [<ffffffff80259876>] poison_obj+0x35/0x44
[ 1144.186620] [<ffffffff8025e1c0>] vfs_read+0xac/0x14f
[ 1144.191668] [<ffffffff8025e322>] sys_read+0x45/0x6e
[ 1144.196630] [<ffffffff8020976e>] system_call+0x7e/0x83
[ 1144.201852]
[ 1144.203348]
[ 1144.203349] Code: 8b 46 28 48 8b 56 78 25 01 80 00 00 83 f8 01 48 8d 46 78 41
[ 1144.212416] RIP [<ffffffff802ed04f>] as_move_to_dispatch+0x11/0x135
[ 1144.218783] RSP <ffff8100021f7cf0>
[ 1144.222272] CR2: 0000000000000028
[ 1144.225866] <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 1144.233878] in_atomic():1, irqs_disabled():1
[ 1144.238147]
[ 1144.238148] Call Trace:
[ 1144.242088] <IRQ> [<ffffffff8021ae13>] __might_sleep+0xba/0xbc
[ 1144.248109] [<ffffffff80235c33>] down_read+0x15/0x1e
[ 1144.253163] [<ffffffff8022e430>] blocking_notifier_call_chain+0x13/0x36
[ 1144.259864] [<ffffffff802247af>] do_exit+0x22/0x843
[ 1144.264823] [<ffffffff80219259>] do_page_fault+0x708/0x775
[ 1144.270394] [<ffffffff8021433a>] smp_local_timer_interrupt+0x32/0x53
[ 1144.276826] [<ffffffff802143b0>] smp_apic_timer_interrupt+0x55/0x68
[ 1144.283174] [<ffffffff8020a336>] apic_timer_interrupt+0x66/0x70
[ 1144.289178] [<ffffffff8040e12d>] error_exit+0x0/0x84
[ 1144.294226] [<ffffffff802ed04f>] as_move_to_dispatch+0x11/0x135
[ 1144.300227] [<ffffffff802ed591>] as_dispatch_request+0x39d/0x3aa
[ 1144.306315] [<ffffffff802e53d6>] elv_next_request+0x45/0x151
[ 1144.312058] [<ffffffff8037b3cf>] scsi_request_fn+0x7a/0x375
[ 1144.317713] [<ffffffff802e8fa0>] blk_run_queue+0x42/0x74
[ 1144.323108] [<ffffffff8037b2bc>] scsi_next_command+0x2d/0x39
[ 1144.328850] [<ffffffff8037b85b>] scsi_end_request+0xc0/0xce
[ 1144.334505] [<ffffffff8037ba03>] scsi_io_completion+0x149/0x351
[ 1144.340507] [<ffffffff8038aad6>] sd_rw_intr+0x247/0x256
[ 1144.345815] [<ffffffff80376b92>] scsi_finish_command+0x88/0x8d
[ 1144.351730] [<ffffffff8037c25e>] scsi_softirq_done+0xe7/0xef
[ 1144.357474] [<ffffffff802e614c>] blk_done_softirq+0x63/0x71
[ 1144.363127] [<ffffffff80226487>] __do_softirq+0x51/0xbd
[ 1144.368437] [<ffffffff8020a88c>] call_softirq+0x1c/0x28
[ 1144.373745] [<ffffffff8020bffa>] do_softirq+0x2e/0x94
[ 1144.378880] [<ffffffff8020c10e>] do_IRQ+0xae/0xc8
[ 1144.383669] [<ffffffff80209c81>] ret_from_intr+0x0/0xb
[ 1144.388890] <EOI> [<ffffffff8027dfad>] __bio_add_page+0x1b8/0x1bf
[ 1144.395170] [<ffffffff8027ecb6>] blkdev_direct_IO+0x2c9/0x420
[ 1144.401001] [<ffffffff8021a1f4>] __activate_task+0x2d/0x3f
[ 1144.406569] [<ffffffff8040df0d>] _spin_unlock_irq+0x6/0x9
[ 1144.412051] [<ffffffff80241d86>] generic_file_direct_IO+0xa6/0xe8
[ 1144.418224] [<ffffffff80219efe>] __wake_up_common+0x42/0x6c
[ 1144.423881] [<ffffffff80242515>] generic_file_aio_read+0xcc/0x19d
[ 1144.430056] [<ffffffff8025d7cd>] do_sync_read+0xc8/0x10b
[ 1144.435451] [<ffffffff8023343c>] autoremove_wake_function+0x0/0x2e
[ 1144.441713] [<ffffffff8040df0d>] _spin_unlock_irq+0x6/0x9
[ 1144.447194] [<ffffffff8025c516>] do_filp_open+0x2d/0x3d
[ 1144.452503] [<ffffffff80259876>] poison_obj+0x35/0x44
[ 1144.457639] [<ffffffff8025e1c0>] vfs_read+0xac/0x14f
[ 1144.462688] [<ffffffff8025e322>] sys_read+0x45/0x6e
[ 1144.467650] [<ffffffff8020976e>] system_call+0x7e/0x83
[ 1144.472870]
[ 1144.474364] Kernel panic - not syncing: Aiee, killing interrupt handler!
Basically what is happening from the FC side is the initiator executes
a simple dt test:
dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
against a single LUN (a very basic Windows target-mode driver).
During the test, a port-enable/port-disable script is running against
the switch port that is connected to the target; this occurs every
sixty seconds, with a disabled duration of 2 seconds. Additionally,
the target itself is set to LOGO (logout), or drop off the topology,
every 30 seconds.
This test runs fine up to 2.6.19. Breakage appears to have been introduced in
2.6.20-rc1 (our firmware engineer noticed the problem in 2.6.20-rc4).
Unfortunately, our efforts at a basic bisection have been fruitless, as
according to git-bisect the bad commit is:
commit 029530f810dd5147f7e59b939eb22cfbe0beea12
Author: Adrian Bunk <[email protected]>
Date: Wed Dec 13 00:35:16 2006 -0800
[PATCH] one more EXPORT_UNUSED_SYMBOL removal
Signed-off-by: Adrian Bunk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
which doesn't make much sense... Here's the log...
$ git bisect log
git-bisect start
# good: [c3fe6924620fd733ffe8bc8a9da1e9cde08402b3] Linux 2.6.19
git-bisect good c3fe6924620fd733ffe8bc8a9da1e9cde08402b3
# bad: [cc016448b0bf0764928275d034e367753bde8162] Linux v2.6.20-rc1
git-bisect bad cc016448b0bf0764928275d034e367753bde8162
# good: [b2c03941b50944a268ee4d5823872f220809a3ba] IPMI: Allow hot system interface remove
git-bisect good b2c03941b50944a268ee4d5823872f220809a3ba
# good: [de1ba09b214056365d9082982905b255caafb7a2] fault injection: documentation and scripts
git-bisect good de1ba09b214056365d9082982905b255caafb7a2
# good: [116140b7f5c9182c86a4e419f81684209357aea7] Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
git-bisect good 116140b7f5c9182c86a4e419f81684209357aea7
# good: [775ba7ad491a154f99871fe603f03366e84ae159] Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
git-bisect good 775ba7ad491a154f99871fe603f03366e84ae159
# good: [69688262fb94e92a32f188b79c02dc32016d4927] update Tigran's email addresses
git-bisect good 69688262fb94e92a32f188b79c02dc32016d4927
# bad: [5faad620264290b17e80a8b0996b039ea0d5ac73] Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block
git-bisect bad 5faad620264290b17e80a8b0996b039ea0d5ac73
# bad: [3161986224a3faa8ccca3e665b7404d81e7ee3cf] fbdev: remove references to non-existent fbmon_valid_timings()
git-bisect bad 3161986224a3faa8ccca3e665b7404d81e7ee3cf
# bad: [c954e2a5d1c9662991a41282297ddebcadee0578] knfsd: nfsd4: make verify and nverify wrappers
git-bisect bad c954e2a5d1c9662991a41282297ddebcadee0578
# bad: [021d3a72459191a76e8e482ee4937ba6bc9fd712] knfsd: nfsd4: handling more nfsd_cross_mnt errors in nfsd4 readdir
git-bisect bad 021d3a72459191a76e8e482ee4937ba6bc9fd712
# bad: [b797b5beac966df5c5d96c0d39fe366f57135343] knfsd: svcrpc: fix gss krb5i memory leak
git-bisect bad b797b5beac966df5c5d96c0d39fe366f57135343
# bad: [b21a323710e77a27b2f66af901bd3640c30aba6e] remove the broken BLK_DEV_SWIM_IOP driver
git-bisect bad b21a323710e77a27b2f66af901bd3640c30aba6e
# bad: [029530f810dd5147f7e59b939eb22cfbe0beea12] one more EXPORT_UNUSED_SYMBOL removal
git-bisect bad 029530f810dd5147f7e59b939eb22cfbe0beea12
Not sure how much help it is, but while trying to instrument
as-iosched.c, we can see that the failure has a fairly stable
signature:
[15280.813479] as_dispatch_request: ad=ffff810038704da8 ; reads=0 ; writes=1 ; dir=0 ;fifo_list[async]=ffff81003f4071a0 ad-new_batch=0 ad->change_batch=0.
[15280.827032] as_dispatch_request: q=ffff81003fdae050 ; #_reqs=128 ; lmerge=ffff81003f4071a0 ;.
which means:
static int as_dispatch_request(request_queue_t *q, int force)
{
	...
	const int writes = !list_empty(&ad->fifo_list[REQ_ASYNC]);
'writes' is set, and the batch data direction has not changed
(ad->batch_data_dir != REQ_SYNC), in the following segment:
	...
	if (writes) {
dispatch_writes:
		BUG_ON(RB_EMPTY_ROOT(&ad->sort_list[REQ_ASYNC]));

		if (ad->batch_data_dir == REQ_SYNC) {
			ad->changed_batch = 1;

			/*
			 * new_batch might be 1 when the queue runs out of
			 * reads. A subsequent submission of a write might
			 * cause a change of batch before the read is finished.
			 */
			ad->new_batch = 0;
		}
		ad->batch_data_dir = REQ_ASYNC;
		ad->current_write_count = ad->write_batch_count;
		ad->write_batch_idled = 0;

		rq = ad->next_rq[ad->batch_data_dir];

		goto dispatch_request;
	}
ad->next_rq[ad->batch_data_dir] is NULL, and is then passed down to
as_move_to_dispatch() where the first dereference of rq:
static void as_move_to_dispatch(struct as_data *ad, struct request *rq)
{
	const int data_dir = rq_is_sync(rq);
borks the machine. What's odd (perhaps it's just our rudimentary
understanding of AS) is that there are segments of code where
ad->next_rq[REQ_ASYNC] is checked against NULL, yet in this 'writes'
case it is not.
Anyway, any ideas or hints? Attached is the .config used.
Thanks,
Andrew Vasquez
On Mon, 22 Jan 2007 10:35:10 -0800 Andrew Vasquez <[email protected]> wrote:
> All,
>
> We've been trying to track down a nagging regression seen during some
> port-disable/enable testing. The problem occurs anywhere from
> 20 minutes to 120 minutes of testing under very minimal I/O load:
>
> [ full oops trace snipped; quoted in full above ]
>
> Basically what is happening from the FC side is the initiator executes
> a simple dt test:
>
> dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
>
> against a single LUN (a very basic Windows target-mode driver).
> During the test, a port-enable/port-disable script is running against
> the switch port that is connected to the target; this occurs every
> sixty seconds, with a disabled duration of 2 seconds. Additionally,
> the target itself is set to LOGO (logout), or drop off the topology,
> every 30 seconds.
I don't understand what effect the port-enable/port-disable has upon the
system. Will it cause I/O errors, or what?
> This test runs fine up to 2.6.19.
One thing we did in there was to give direct-io-against-blockdevs some
special-case bio-preparation code. Perhaps this is tickling a bug somehow.
We can revert that change like this:
diff -puN fs/block_dev.c~a fs/block_dev.c
--- a/fs/block_dev.c~a
+++ a/fs/block_dev.c
@@ -196,8 +196,47 @@ static void blk_unget_page(struct page *
 	pvec->page[--pvec->idx] = page;
 }
 
+static int
+blkdev_get_blocks(struct inode *inode, sector_t iblock,
+		struct buffer_head *bh, int create)
+{
+	sector_t end_block = max_block(I_BDEV(inode));
+	unsigned long max_blocks = bh->b_size >> inode->i_blkbits;
+
+	if ((iblock + max_blocks) > end_block) {
+		max_blocks = end_block - iblock;
+		if ((long)max_blocks <= 0) {
+			if (create)
+				return -EIO;	/* write fully beyond EOF */
+			/*
+			 * It is a read which is fully beyond EOF. We return
+			 * a !buffer_mapped buffer
+			 */
+			max_blocks = 0;
+		}
+	}
+
+	bh->b_bdev = I_BDEV(inode);
+	bh->b_blocknr = iblock;
+	bh->b_size = max_blocks << inode->i_blkbits;
+	if (max_blocks)
+		set_buffer_mapped(bh);
+	return 0;
+}
+
 static ssize_t
 blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
+			loff_t offset, unsigned long nr_segs)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+
+	return blockdev_direct_IO_no_locking(rw, iocb, inode, I_BDEV(inode),
+				iov, offset, nr_segs, blkdev_get_blocks, NULL);
+}
+
+static ssize_t
+xxblkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	loff_t pos, unsigned long nr_segs)
 {
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
_
On Thu, 01 Feb 2007, Andrew Morton wrote:
> On Mon, 22 Jan 2007 10:35:10 -0800 Andrew Vasquez <[email protected]> wrote:
> > Basically what is happening from the FC side is the initiator executes
> > a simple dt test:
> >
> > dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
> >
> > against a single LUN (a very basic Windows target-mode driver).
> > During the test, a port-enable/port-disable script is running against
> > the switch port that is connected to the target; this occurs every
> > sixty seconds, with a disabled duration of 2 seconds. Additionally,
> > the target itself is set to LOGO (logout), or drop off the topology,
> > every 30 seconds.
>
> I don't understand what effect the port-enable/port-disable has upon the
> system. Will it cause I/O errors, or what?
No I/O errors should make their way to the upper layers (block/FS).
The system *should* be shielded from the fibre-channel fabric events.
I just wanted to explain what the (basic sanity) test did.
> > This test runs fine up to 2.6.19.
>
> One thing we did in there was to give direct-io-against-blockdevs some
> special-case bio-preparation code. Perhaps this is tickling a bug somehow.
>
> We can revert that change like this:
>
>
> diff -puN fs/block_dev.c~a fs/block_dev.c
> --- a/fs/block_dev.c~a
> +++ a/fs/block_dev.c
> @@ -196,8 +196,47 @@ static void blk_unget_page(struct page *
> pvec->page[--pvec->idx] = page;
> }
>
> +static int
> +blkdev_get_blocks(struct inode *inode, sector_t iblock,
> + struct buffer_head *bh, int create)
...
Hmm, with this patch we've noted two main differences:
1) I/O throughput with the basic 'dt' command used (above) is back to
60 MB/s, rather than the appalling 20-22 MB/s we were seeing with
2.6.20-rcX.
2) No panics -- so far with 2+ hours of testing. With our vanilla
2.6.20-rc7 system, the test could trigger the panic within 15 to
20 minutes.
We'll let this run over the weekend -- I'll certainly let you know if
anything has changed (failures).
--
Andrew Vasquez
On Fri, 2 Feb 2007 12:56:30 -0800
Andrew Vasquez <[email protected]> wrote:
> On Thu, 01 Feb 2007, Andrew Morton wrote:
>
> > On Mon, 22 Jan 2007 10:35:10 -0800 Andrew Vasquez <[email protected]> wrote:
> > > Basically what is happening from the FC side is the initiator executes
> > > a simple dt test:
> > >
> > > dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
> > >
> > > against a single LUN (a very basic Windows target-mode driver).
> > > During the test, a port-enable/port-disable script is running against
> > > the switch port that is connected to the target; this occurs every
> > > sixty seconds, with a disabled duration of 2 seconds. Additionally,
> > > the target itself is set to LOGO (logout), or drop off the topology,
> > > every 30 seconds.
> >
> > I don't understand what effect the port-enable/port-disable has upon the
> > system. Will it cause I/O errors, or what?
>
> No I/O errors should make their way to the upper layers (block/FS).
> The system *should* be shielded from the fibre-channel fabric events.
> I just wanted to explain what the (basic sanity) test did.
>
> > > This test runs fine up to 2.6.19.
> >
> > One thing we did in there was to give direct-io-against-blockdevs some
> > special-case bio-preparation code. Perhaps this is tickling a bug somehow.
> >
> > We can revert that change like this:
> >
> >
> > diff -puN fs/block_dev.c~a fs/block_dev.c
> > --- a/fs/block_dev.c~a
> > +++ a/fs/block_dev.c
> > @@ -196,8 +196,47 @@ static void blk_unget_page(struct page *
> > pvec->page[--pvec->idx] = page;
> > }
> >
> > +static int
> > +blkdev_get_blocks(struct inode *inode, sector_t iblock,
> > + struct buffer_head *bh, int create)
> ...
>
> Hmm, with this patch we've noted two main differences:
>
> 1) I/O throughput with the basic 'dt' command used (above) is back to
> 60 MB/s, rather than the appalling 20-22 MB/s we were seeing with
> 2.6.20-rcX.
>
> 2) No panics -- so far with 2+ hours of testing. With our vanilla
> 2.6.20-rc7 system, the test could trigger the panic within 15 to
> 20 minutes.
>
> We'll let this run over the weekend -- I'll certainly let you know if
> anything has changed (failures).
Oh crap, I didn't realise we had a performance regression as well.
direct-io against a blockdev is kinda important for some people, so this is
a must-fix for 2.6.20. I'll prepare a minimal patch to switch 2.6.20 back
to the 2.6.19 codepaths.
On Fri, 2 Feb 2007 12:56:30 -0800
Andrew Vasquez <[email protected]> wrote:
> > > dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
What is this mysterious dt command, btw?
On Fri, 2 Feb 2007 16:25:41 -0800 Andrew Morton wrote:
> On Fri, 2 Feb 2007 12:56:30 -0800
> Andrew Vasquez <[email protected]> wrote:
>
> > > > dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
>
> What is this mysterious dt command, btw?
I expect that it's the one here:
http://www.scsifaq.org/RMiller_Tools/index.html
---
~Randy
On Fri, 02 Feb 2007, Randy Dunlap wrote:
> On Fri, 2 Feb 2007 16:25:41 -0800 Andrew Morton wrote:
>
> > On Fri, 2 Feb 2007 12:56:30 -0800
> > Andrew Vasquez <[email protected]> wrote:
> >
> > > > > dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
> >
> > What is this mysterious dt command, btw?
>
> I expect that it's the one here:
> http://www.scsifaq.org/RMiller_Tools/index.html
yep, that's the one.