2008-01-29 14:02:19

by Nai Xia

[permalink] [raw]
Subject: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi,

Build environment: debian sid, gcc-4.2.3, i386.

The bug is in lately git-pull from

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

And it can be reproduced very easily on a machine with normal cdroms.
It halts booting and I grabbed the bug output with an serial console.

Code around the bug is:

/*
* If DRQ is clear, the command has completed.
*/
if ((stat & DRQ_STAT) == 0) {
spin_lock_irqsave(&ide_lock, flags);
if (__blk_end_request(rq, 0, 0))
BUG(); // Here comes the bug !
HWGROUP(drive)->rq = NULL;
spin_unlock_irqrestore(&ide_lock, flags);

return ide_stopped;
}

[ 6.964356] kernel BUG at drivers/ide/ide-cd.c:1726!
[ 6.964435] invalid opcode: 0000 [#1] PREEMPT SMP
[ 6.964435] Modules linked in:
[ 6.964435]
[ 6.964435] Pid: 1138, comm: scsi_id Not tainted (2.6.24-g8561b089 #19)
[ 6.964435] EIP: 0060:[<c03edf9c>] EFLAGS: 00010002 CPU: 0
[ 6.964435] EIP is at cdrom_newpc_intr+0x2bc/0x2c0
[ 6.964435] EAX: 00000001 EBX: 00000002 ECX: 00000001 EDX: 00000001
[ 6.964435] ESI: 00000003 EDI: cf0144d8 EBP: c07a3c58 ESP: cf2b59a8
[ 6.964435] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 6.964435] Process scsi_id (pid: 1138, ti=cf2b4000 task=cf8856b0 task.ti=cf2b4000)
[ 6.964435] Stack: 00000202 00000002 c0778e80 c0133b57 0000009e 00000000 cf009c00 00000050
[ 6.964435] 0000000f c07a3c00 cf009c00 c07a3c58 c03df2c9 00000000 00000000 00000000
[ 6.964435] 00000000 00000000 00000202 c03edce0 cfbcbb60 00000000 00000000 0000000f
[ 6.964435] Call Trace:
[ 6.964435] [<c0133b57>] lock_timer_base+0x27/0x60
[ 6.964435] [<c03df2c9>] ide_intr+0xa9/0x200
[ 6.964435] [<c03edce0>] cdrom_newpc_intr+0x0/0x2c0
[ 6.964435] [<c01503d5>] handle_IRQ_event+0x25/0x50
[ 6.964435] [<c01517d9>] handle_edge_irq+0xb9/0x120
[ 6.964435] [<c010687b>] do_IRQ+0x3b/0x70
[ 6.964435] [<c0104df3>] common_interrupt+0x23/0x28
[ 6.964435] [<c03dfa88>] ide_outsw+0x8/0x10
[ 6.964435] [<c03e0ae4>] ata_output_data+0x84/0x90
[ 6.964435] [<c03dfe07>] atapi_output_bytes+0x27/0x60
[ 6.964435] [<c03ee2f8>] cdrom_transfer_packet_command+0x98/0x110
[ 6.964435] [<c03eafb0>] cdrom_timer_expiry+0x0/0x60
[ 6.964435] [<c03ecd8b>] cdrom_start_packet_command+0x10b/0x130
[ 6.964435] [<c03ee370>] cdrom_do_newpc_cont+0x0/0x30
[ 6.964435] [<c03deb96>] ide_do_request+0x426/0x990
[ 6.964435] [<c0133ee7>] del_timer+0x57/0x70
[ 6.964435] [<c0321ae6>] blk_remove_plug+0x26/0x60
[ 6.964435] [<c031db95>] elv_drain_elevator+0x15/0x60
[ 6.964435] [<c031e6d8>] elv_insert+0xd8/0x170
[ 6.964435] [<c0321ba4>] blk_execute_rq_nowait+0x54/0xa0
[ 6.964435] [<c0321c70>] blk_execute_rq+0x80/0xf0
[ 6.964435] [<c0320a10>] blk_end_sync_rq+0x0/0x30
[ 6.964435] [<c019b563>] bio_add_pc_page+0x23/0x30
[ 6.964435] [<c0320c06>] blk_rq_bio_prep+0x96/0xb0
[ 6.964435] [<c0320f0c>] blk_rq_append_bio+0x1c/0x70
[ 6.964435] [<c03210f2>] blk_rq_map_user+0x122/0x1a0
[ 6.964435] [<c032577e>] sg_io+0x1be/0x320
[ 6.964435] [<c0325bdc>] scsi_cmd_ioctl+0x2fc/0x430
[ 6.964435] [<c03ec9a0>] idecd_revalidate_disk+0x10/0x20
[ 6.964435] [<c01794d9>] get_super+0x99/0xa0
[ 6.964435] [<c019c329>] __invalidate_device+0x39/0x50
[ 6.964435] [<c047d0b7>] cdrom_ioctl+0x37/0xe20
[ 6.964435] [<c01886c6>] __d_lookup+0x146/0x160
[ 6.964435] [<c03eb6ed>] idecd_ioctl+0x17d/0x190
[ 6.964435] [<c01804bf>] __link_path_walk+0xa1f/0xd50
[ 6.964435] [<c032366d>] blkdev_driver_ioctl+0x6d/0x80
[ 6.964435] [<c032390e>] blkdev_ioctl+0x28e/0x810
[ 6.964435] [<c032bd9f>] kobject_get+0xf/0x20
[ 6.964435] [<c0324169>] get_disk+0x29/0xa0
[ 6.964435] [<c03241e7>] exact_lock+0x7/0x10
[ 6.964435] [<c03b33b8>] kobj_lookup+0x148/0x160
[ 6.964435] [<c019cb47>] do_open+0xb7/0x290
[ 6.964435] [<c019cf50>] blkdev_open+0x0/0x70
[ 6.964435] [<c019cf80>] blkdev_open+0x30/0x70
[ 6.964435] [<c01757ed>] __dentry_open+0x16d/0x1f0
[ 6.964435] [<c0175925>] nameidata_to_filp+0x35/0x40
[ 6.964435] [<c017597b>] do_filp_open+0x4b/0x60
[ 6.964435] [<c017a5c2>] sys_readlinkat+0x32/0x90
[ 6.964435] [<c019c218>] block_ioctl+0x18/0x20
[ 6.964435] [<c019c200>] block_ioctl+0x0/0x20
[ 6.964435] [<c0182cfb>] do_ioctl+0x2b/0x90
[ 6.964435] [<c0182f7e>] vfs_ioctl+0x21e/0x2a0
[ 6.964435] [<c018303d>] sys_ioctl+0x3d/0x70
[ 6.964435] [<c010444a>] syscall_call+0x7/0xb
[ 6.964435] =======================
[ 6.964435] Code: 04 00 00 89 44 24 0c e9 b4 fe ff ff b8 00 71 70 c0 e8 49 a5 18 00 31 c9 31 d2 89 c3 89 f8 e8 ac 27 f3 ff 85 c0 0f 84 5c fe ff ff <0f> 0b eb fe 83 ec 38 31 d2 89 7c 24 30 89 c7 89 5c 24 28 8d 4c
[ 6.964435] EIP: [<c03edf9c>] cdrom_newpc_intr+0x2bc/0x2c0 SS:ESP 0068:cf2b59a8
[ 6.964435] Kernel panic - not syncing: Fatal exception in interrupt

--
Best Regards,

Nai


2008-01-29 16:10:25

by Florian Lohoff

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Tue, Jan 29, 2008 at 10:01:45PM +0800, Nai Xia wrote:
>
> Build environment: debian sid, gcc-4.2.3, i386.
>
> The bug is in lately git-pull from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>
> And it can be reproduced very easily on a machine with normal cdroms.
> It halts booting and I grabbed the bug output with an serial console.

I get this on boot on an Core Duo Lifebook E8110 with an DVD Recorder which
renders this kernel unusable:

[ 40.406743] ------------[ cut here ]------------
[ 40.410255] kernel BUG at drivers/ide/ide-cd.c:1726!
[ 40.410255] invalid opcode: 0000 [#1] PREEMPT SMP
[ 40.410255] Modules linked in: ehci_hcd(+) ohci1394(+) uhci_hcd(+) output yenta_socket(+) rsrc_nonstatic irda parport_pc(+) parport mmc_core ieee1394 pcmcia_core sky2 ide_cd usbcore crc_ccitt
[ 40.410255]
[ 40.410255] Pid: 1115, comm: modprobe Not tainted (2.6.25-rc0 #1)
[ 40.410255] EIP: 0060:[<e0055e70>] EFLAGS: 00010002 CPU: 0
[ 40.410255] EIP is at cdrom_newpc_intr+0xfb/0x29f [ide_cd]
[ 40.410255] EAX: 00000001 EBX: 00000002 ECX: 0000009e EDX: 00000001
[ 40.410255] ESI: 00000003 EDI: c0547f18 EBP: de52bdbc ESP: de52bd8c
[ 40.410255] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 40.410255] Process modprobe (pid: 1115, ti=de52a000 task=de616000 task.ti=de52a000)
[ 40.410255] Stack: c0528d00 de52bda8 df2d9600 de52bdb0 df24bc20 00000000 0000009e de52bdbc
[ 40.410255] 00000050 c0547ec0 c0547f18 df24bc00 de52bdf4 c025bc76 00000c64 de54ab7c
[ 40.410255] c050b760 000200d2 c0491758 0000000e 00000202 e0055d75 de616000 df24d660
[ 40.410255] Call Trace:
[ 40.410255] [<c0104f2e>] show_trace_log_lvl+0x1a/0x2f
[ 40.410255] [<c0104fde>] show_stack_log_lvl+0x9b/0xa3
[ 40.410255] [<c010508a>] show_registers+0xa4/0x1ba
[ 40.410255] [<c01052bd>] die+0x11d/0x1fe
[ 40.410255] [<c0105428>] do_trap+0x8a/0xa3
[ 40.410255] [<c0105700>] do_invalid_op+0x88/0x92
[ 40.410255] [<c037d142>] error_code+0x72/0x78
[ 40.410255] [<c025bc76>] ide_intr+0x166/0x1c9
[ 40.410255] [<c014dbee>] handle_IRQ_event+0x21/0x48
[ 40.410255] [<c014ebac>] handle_edge_irq+0xa8/0x10d
[ 40.410255] [<c0106848>] do_IRQ+0x5c/0x78
[ 40.410255] [<c010494f>] common_interrupt+0x23/0x28
[ 40.410255] [<c014207c>] sys_init_module+0x29d/0x18e1
[ 40.410255] [<c0103f3e>] sysenter_past_esp+0x5f/0x85
[ 40.410255] =======================
[ 40.410255] Code: f6 45 f0 08 8b 92 98 00 00 00 89 55 e8 75 3e b8 80 e5 4c c0 e8 81 6d 32 e0 31 c9 31 d2 89 c3 8b 45 d8 e8 bf 1e 17 e0 85 c0 74 04 <0f> 0b eb fe 8b 47 2c 89 da 8b 40 08 c7 40 1c 00 00 00 00 b8 80
[ 40.410255] EIP: [<e0055e70>] cdrom_newpc_intr+0xfb/0x29f [ide_cd] SS:ESP 0068:de52bd8c
[ 40.410255] Kernel panic - not syncing: Fatal exception in interrupt

Flo
--
Florian Lohoff [email protected] +49-171-2280134
Those who would give up a little freedom to get a little
security shall soon have neither - Benjamin Franklin


Attachments:
(No filename) (2.95 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2008-01-29 20:15:08

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi,

Sorry for the inconvenience.
I can't reproduce this bug on my test environment.
So please give me more information below to find what's wrong.

On Tue, 29 Jan 2008 22:01:45 +0800, Nai Xia wrote:
> Hi,
>
> Build environment: debian sid, gcc-4.2.3, i386.
>
> The bug is in lately git-pull from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>
> And it can be reproduced very easily on a machine with normal cdroms.
> It halts booting and I grabbed the bug output with an serial console.
>
> Code around the bug is:
>
> /*
> * If DRQ is clear, the command has completed.
> */
> if ((stat & DRQ_STAT) == 0) {
> spin_lock_irqsave(&ide_lock, flags);
> if (__blk_end_request(rq, 0, 0))
> BUG(); // Here comes the bug !
> HWGROUP(drive)->rq = NULL;
> spin_unlock_irqrestore(&ide_lock, flags);
>
> return ide_stopped;
> }
>
> [ 6.964356] kernel BUG at drivers/ide/ide-cd.c:1726!
> [ 6.964435] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 6.964435] Modules linked in:
> [ 6.964435]
> [ 6.964435] Pid: 1138, comm: scsi_id Not tainted (2.6.24-g8561b089 #19)
> [ 6.964435] EIP: 0060:[<c03edf9c>] EFLAGS: 00010002 CPU: 0
> [ 6.964435] EIP is at cdrom_newpc_intr+0x2bc/0x2c0
> [ 6.964435] EAX: 00000001 EBX: 00000002 ECX: 00000001 EDX: 00000001
> [ 6.964435] ESI: 00000003 EDI: cf0144d8 EBP: c07a3c58 ESP: cf2b59a8
> [ 6.964435] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 6.964435] Process scsi_id (pid: 1138, ti=cf2b4000 task=cf8856b0 task.ti=cf2b4000)
> [ 6.964435] Stack: 00000202 00000002 c0778e80 c0133b57 0000009e 00000000 cf009c00 00000050
> [ 6.964435] 0000000f c07a3c00 cf009c00 c07a3c58 c03df2c9 00000000 00000000 00000000
> [ 6.964435] 00000000 00000000 00000202 c03edce0 cfbcbb60 00000000 00000000 0000000f
> [ 6.964435] Call Trace:
> [ 6.964435] [<c0133b57>] lock_timer_base+0x27/0x60
> [ 6.964435] [<c03df2c9>] ide_intr+0xa9/0x200
> [ 6.964435] [<c03edce0>] cdrom_newpc_intr+0x0/0x2c0
> [ 6.964435] [<c01503d5>] handle_IRQ_event+0x25/0x50
> [ 6.964435] [<c01517d9>] handle_edge_irq+0xb9/0x120
> [ 6.964435] [<c010687b>] do_IRQ+0x3b/0x70
> [ 6.964435] [<c0104df3>] common_interrupt+0x23/0x28
> [ 6.964435] [<c03dfa88>] ide_outsw+0x8/0x10
> [ 6.964435] [<c03e0ae4>] ata_output_data+0x84/0x90
> [ 6.964435] [<c03dfe07>] atapi_output_bytes+0x27/0x60
> [ 6.964435] [<c03ee2f8>] cdrom_transfer_packet_command+0x98/0x110
> [ 6.964435] [<c03eafb0>] cdrom_timer_expiry+0x0/0x60
> [ 6.964435] [<c03ecd8b>] cdrom_start_packet_command+0x10b/0x130
> [ 6.964435] [<c03ee370>] cdrom_do_newpc_cont+0x0/0x30
> [ 6.964435] [<c03deb96>] ide_do_request+0x426/0x990
> [ 6.964435] [<c0133ee7>] del_timer+0x57/0x70
> [ 6.964435] [<c0321ae6>] blk_remove_plug+0x26/0x60
> [ 6.964435] [<c031db95>] elv_drain_elevator+0x15/0x60
> [ 6.964435] [<c031e6d8>] elv_insert+0xd8/0x170
> [ 6.964435] [<c0321ba4>] blk_execute_rq_nowait+0x54/0xa0
> [ 6.964435] [<c0321c70>] blk_execute_rq+0x80/0xf0
> [ 6.964435] [<c0320a10>] blk_end_sync_rq+0x0/0x30
> [ 6.964435] [<c019b563>] bio_add_pc_page+0x23/0x30
> [ 6.964435] [<c0320c06>] blk_rq_bio_prep+0x96/0xb0
> [ 6.964435] [<c0320f0c>] blk_rq_append_bio+0x1c/0x70
> [ 6.964435] [<c03210f2>] blk_rq_map_user+0x122/0x1a0
> [ 6.964435] [<c032577e>] sg_io+0x1be/0x320
> [ 6.964435] [<c0325bdc>] scsi_cmd_ioctl+0x2fc/0x430
> [ 6.964435] [<c03ec9a0>] idecd_revalidate_disk+0x10/0x20
> [ 6.964435] [<c01794d9>] get_super+0x99/0xa0
> [ 6.964435] [<c019c329>] __invalidate_device+0x39/0x50
> [ 6.964435] [<c047d0b7>] cdrom_ioctl+0x37/0xe20
> [ 6.964435] [<c01886c6>] __d_lookup+0x146/0x160
> [ 6.964435] [<c03eb6ed>] idecd_ioctl+0x17d/0x190
> [ 6.964435] [<c01804bf>] __link_path_walk+0xa1f/0xd50
> [ 6.964435] [<c032366d>] blkdev_driver_ioctl+0x6d/0x80
> [ 6.964435] [<c032390e>] blkdev_ioctl+0x28e/0x810
> [ 6.964435] [<c032bd9f>] kobject_get+0xf/0x20
> [ 6.964435] [<c0324169>] get_disk+0x29/0xa0
> [ 6.964435] [<c03241e7>] exact_lock+0x7/0x10
> [ 6.964435] [<c03b33b8>] kobj_lookup+0x148/0x160
> [ 6.964435] [<c019cb47>] do_open+0xb7/0x290
> [ 6.964435] [<c019cf50>] blkdev_open+0x0/0x70
> [ 6.964435] [<c019cf80>] blkdev_open+0x30/0x70
> [ 6.964435] [<c01757ed>] __dentry_open+0x16d/0x1f0
> [ 6.964435] [<c0175925>] nameidata_to_filp+0x35/0x40
> [ 6.964435] [<c017597b>] do_filp_open+0x4b/0x60
> [ 6.964435] [<c017a5c2>] sys_readlinkat+0x32/0x90
> [ 6.964435] [<c019c218>] block_ioctl+0x18/0x20
> [ 6.964435] [<c019c200>] block_ioctl+0x0/0x20
> [ 6.964435] [<c0182cfb>] do_ioctl+0x2b/0x90
> [ 6.964435] [<c0182f7e>] vfs_ioctl+0x21e/0x2a0
> [ 6.964435] [<c018303d>] sys_ioctl+0x3d/0x70
> [ 6.964435] [<c010444a>] syscall_call+0x7/0xb
> [ 6.964435] =======================
> [ 6.964435] Code: 04 00 00 89 44 24 0c e9 b4 fe ff ff b8 00 71 70 c0 e8 49 a5 18 00 31 c9 31 d2 89 c3 89 f8 e8 ac 27 f3 ff 85 c0 0f 84 5c fe ff ff <0f> 0b eb fe 83 ec 38 31 d2 89 7c 24 30 89 c7 89 5c 24 28 8d 4c
> [ 6.964435] EIP: [<c03edf9c>] cdrom_newpc_intr+0x2bc/0x2c0 SS:ESP 0068:cf2b59a8
> [ 6.964435] Kernel panic - not syncing: Fatal exception in interrupt

Could you give me all boot messages and your config?


Are you able to get some outputs on 2.6.24 + the patch below?
If yes, original ide-cd might have a problem.

--- 2.6.24.orig/drivers/ide/ide-cd.c 2008-01-24 17:58:37.000000000 -0500
+++ 2.6.24/drivers/ide/ide-cd.c 2008-01-29 20:55:59.000000000 -0500
@@ -1711,8 +1711,12 @@
/*
* If DRQ is clear, the command has completed.
*/
- if ((stat & DRQ_STAT) == 0)
+ if ((stat & DRQ_STAT) == 0) {
+ if (end_that_request_chunk(rq, 1, 0))
+ blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
+
goto end_request;
+ }

/*
* check which way to transfer data



If that's not comfortable for you, could you try with the patch
on top of current 2.6.24-git and send outputs?

--- linus-git.orig/drivers/ide/ide-cd.c 2008-01-29 10:04:51.000000000 -0500
+++ linus-git/drivers/ide/ide-cd.c 2008-01-29 21:00:19.000000000 -0500
@@ -1723,7 +1723,7 @@
if ((stat & DRQ_STAT) == 0) {
spin_lock_irqsave(&ide_lock, flags);
if (__blk_end_request(rq, 0, 0))
- BUG();
+ blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
HWGROUP(drive)->rq = NULL;
spin_unlock_irqrestore(&ide_lock, flags);


Thanks,
Kiyoshi Ueda

2008-01-29 23:24:19

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi Bart,

On Tue, 29 Jan 2008 14:22:53 -0800, Roland Dreier wrote:
> Hi, I saw the same BUG from ide-cd on one of my systems. I applied
> the debugging patch to replace the BUG with blk_dump_rq_flags(), and I
> got the output below (full boot log and .config attached to this
> email).
>
> Please let me know if there's anything else that would help debug the
> problem.

Thank you for the information, Roland.


> [ 4.072271] Uniform CD-ROM driver Revision: 3.20
> [ 4.098236] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 4.100269]
> [ 4.100269] sector 1949759, nr/cnr 0/0
> [ 4.100269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> [ 4.100269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 4.101005] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 4.104269]
> [ 4.104269] sector 1949759, nr/cnr 0/0
> [ 4.104269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> [ 4.104269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 4.109203] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 4.112270]
> [ 4.112270] sector 1949759, nr/cnr 0/0
> [ 4.112270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> [ 4.112270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 4.112945] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 4.116270]
> [ 4.116270] sector 1949759, nr/cnr 0/0
> [ 4.116270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> [ 4.116270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00

Bart,
This means that the rq still has a bio even after DRQ_STAT is cleared.
The original ide-cd code was calling only end_that_request_last() there.
So I thought that the rq should have no bio when DRQ_STAT is cleared,
otherwise the bio leaks.

Was my understanding wrong and is that correct behavior in ide-cd?

Thanks,
Kiyoshi Ueda

Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089


Hi,

On Wednesday 30 January 2008, Kiyoshi Ueda wrote:
> Hi Bart,
>
> On Tue, 29 Jan 2008 14:22:53 -0800, Roland Dreier wrote:
> > Hi, I saw the same BUG from ide-cd on one of my systems. I applied
> > the debugging patch to replace the BUG with blk_dump_rq_flags(), and I
> > got the output below (full boot log and .config attached to this
> > email).
> >
> > Please let me know if there's anything else that would help debug the
> > problem.
>
> Thank you for the information, Roland.
>
>
> > [ 4.072271] Uniform CD-ROM driver Revision: 3.20
> > [ 4.098236] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.100269]
> > [ 4.100269] sector 1949759, nr/cnr 0/0
> > [ 4.100269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > [ 4.100269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 4.101005] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.104269]
> > [ 4.104269] sector 1949759, nr/cnr 0/0
> > [ 4.104269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > [ 4.104269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 4.109203] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.112270]
> > [ 4.112270] sector 1949759, nr/cnr 0/0
> > [ 4.112270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > [ 4.112270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 4.112945] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.116270]
> > [ 4.116270] sector 1949759, nr/cnr 0/0
> > [ 4.116270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > [ 4.116270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
>
> Bart,
> This means that the rq still has a bio even after DRQ_STAT is cleared.
> The original ide-cd code was calling only end_that_request_last() there.
> So I thought that the rq should have no bio when DRQ_STAT is cleared,
> otherwise the bio leaks.
>
> Was my understanding wrong and is that correct behavior in ide-cd?

Added Borislav to cc:.

PS I'm extremely busy with "real-life" (unfortunately IDE hacking is not
my paid job) and the friday is the earliest date on which I would be able
to look in detail into this problem and other outstanding IDE stuff, sorry.

Thanks,
Bart

2008-01-30 05:04:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Wed, Jan 30, 2008 at 12:58:33AM +0100, Bartlomiej Zolnierkiewicz wrote:
>
> Hi,
>
> On Wednesday 30 January 2008, Kiyoshi Ueda wrote:
> > Hi Bart,
> >
> > On Tue, 29 Jan 2008 14:22:53 -0800, Roland Dreier wrote:
> > > Hi, I saw the same BUG from ide-cd on one of my systems. I applied
> > > the debugging patch to replace the BUG with blk_dump_rq_flags(), and I
> > > got the output below (full boot log and .config attached to this
> > > email).
> > >
> > > Please let me know if there's anything else that would help debug the
> > > problem.
> >
> > Thank you for the information, Roland.
> >
> >
> > > [ 4.072271] Uniform CD-ROM driver Revision: 3.20
> > > [ 4.098236] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > [ 4.100269]
> > > [ 4.100269] sector 1949759, nr/cnr 0/0
> > > [ 4.100269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > > [ 4.100269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > [ 4.101005] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > [ 4.104269]
> > > [ 4.104269] sector 1949759, nr/cnr 0/0
> > > [ 4.104269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > > [ 4.104269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > [ 4.109203] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > [ 4.112270]
> > > [ 4.112270] sector 1949759, nr/cnr 0/0
> > > [ 4.112270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > > [ 4.112270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > [ 4.112945] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > [ 4.116270]
> > > [ 4.116270] sector 1949759, nr/cnr 0/0
> > > [ 4.116270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > > [ 4.116270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> >
> > Bart,
> > This means that the rq still has a bio even after DRQ_STAT is cleared.
> > The original ide-cd code was calling only end_that_request_last() there.
> > So I thought that the rq should have no bio when DRQ_STAT is cleared,
> > otherwise the bio leaks.
> >
> > Was my understanding wrong and is that correct behavior in ide-cd?
>
> Added Borislav to cc:.
>
> PS I'm extremely busy with "real-life" (unfortunately IDE hacking is not
> my paid job) and the friday is the earliest date on which I would be able
> to look in detail into this problem and other outstanding IDE stuff, sorry.

Same here, will be able to look into it tomorrow. In the meantime, can someone
direct me the full BUG() output?
--
Regards/Gru?,
Boris.

2008-01-30 05:17:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Wed, Jan 30, 2008 at 06:03:47AM +0100, Borislav Petkov wrote:
> On Wed, Jan 30, 2008 at 12:58:33AM +0100, Bartlomiej Zolnierkiewicz wrote:
> >
> > Hi,
> >
> > On Wednesday 30 January 2008, Kiyoshi Ueda wrote:
> > > Hi Bart,
> > >
> > > On Tue, 29 Jan 2008 14:22:53 -0800, Roland Dreier wrote:
> > > > Hi, I saw the same BUG from ide-cd on one of my systems. I applied
> > > > the debugging patch to replace the BUG with blk_dump_rq_flags(), and I
> > > > got the output below (full boot log and .config attached to this
> > > > email).
> > > >
> > > > Please let me know if there's anything else that would help debug the
> > > > problem.
> > >
> > > Thank you for the information, Roland.
> > >
> > >
> > > > [ 4.072271] Uniform CD-ROM driver Revision: 3.20
> > > > [ 4.098236] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > > [ 4.100269]
> > > > [ 4.100269] sector 1949759, nr/cnr 0/0
> > > > [ 4.100269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > > > [ 4.100269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > > [ 4.101005] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > > [ 4.104269]
> > > > [ 4.104269] sector 1949759, nr/cnr 0/0
> > > > [ 4.104269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > > > [ 4.104269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > > [ 4.109203] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > > [ 4.112270]
> > > > [ 4.112270] sector 1949759, nr/cnr 0/0
> > > > [ 4.112270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > > > [ 4.112270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > > [ 4.112945] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > > > [ 4.116270]
> > > > [ 4.116270] sector 1949759, nr/cnr 0/0
> > > > [ 4.116270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > > > [ 4.116270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > >
> > > Bart,
> > > This means that the rq still has a bio even after DRQ_STAT is cleared.
> > > The original ide-cd code was calling only end_that_request_last() there.
> > > So I thought that the rq should have no bio when DRQ_STAT is cleared,
> > > otherwise the bio leaks.
> > >
> > > Was my understanding wrong and is that correct behavior in ide-cd?
> >
> > Added Borislav to cc:.
> >
> > PS I'm extremely busy with "real-life" (unfortunately IDE hacking is not
> > my paid job) and the friday is the earliest date on which I would be able
> > to look in detail into this problem and other outstanding IDE stuff, sorry.
>
> Same here, will be able to look into it tomorrow. In the meantime, can someone
> direct me the full BUG() output?

Nevermind. Got it, thanks.

--
Regards/Gru?,
Boris.

2008-01-31 01:28:07

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi Roland, Borislav, Bart,

Added linux-ide ML, since we may be able to get helps from other
ide experts. This thread started from:
http://lkml.org/lkml/2008/1/29/140

On Tue, 29 Jan 2008 18:23:56 -0500 (EST), Kiyoshi Ueda wrote:
> Hi Bart,
>
> On Tue, 29 Jan 2008 14:22:53 -0800, Roland Dreier wrote:
> > Hi, I saw the same BUG from ide-cd on one of my systems. I applied
> > the debugging patch to replace the BUG with blk_dump_rq_flags(), and I
> > got the output below (full boot log and .config attached to this
> > email).
> >
> > Please let me know if there's anything else that would help debug the
> > problem.
>
> Thank you for the information, Roland.
>
>
> > [ 4.072271] Uniform CD-ROM driver Revision: 3.20
> > [ 4.098236] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.100269]
> > [ 4.100269] sector 1949759, nr/cnr 0/0
> > [ 4.100269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > [ 4.100269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 4.101005] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.104269]
> > [ 4.104269] sector 1949759, nr/cnr 0/0
> > [ 4.104269] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > [ 4.104269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 4.109203] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.112270]
> > [ 4.112270] sector 1949759, nr/cnr 0/0
> > [ 4.112270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d8
> > [ 4.112270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 4.112945] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> > [ 4.116270]
> > [ 4.116270] sector 1949759, nr/cnr 0/0
> > [ 4.116270] bio ffff8102418cc600, biotail ffff8102418cc600, buffer 0000000000000000, d2
> > [ 4.116270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
>
> Bart,
> This means that the rq still has a bio even after DRQ_STAT is cleared.
> The original ide-cd code was calling only end_that_request_last() there.
> So I thought that the rq should have no bio when DRQ_STAT is cleared,
> otherwise the bio leaks.
>
> Was my understanding wrong and is that correct behavior in ide-cd?

I borrowed a box having the same nForce chipset and tried sg_inq
command to submit the GPCMD_INQUIRY ("cdb: 12" of the debug message).
I confirmed that ide-cd run through the code path (DRQ_STAT == 0)
by the same debug patch, but requests always don't have bio there
on my test box. So I can't reproduce the problem yet.
-----------------------------------------------------------------------
ide-cd: rq: dev hda: type=2, flags=114c8

sector 37958141, nr/cnr 0/0
bio 00000000, biotail f78e4980, buffer 00000000, data 00000000, len 0
cdb: 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00
-----------------------------------------------------------------------


The original code was calling only end_that_request_last() here,
but no problem happened.
This may mean that the upper layer can handle the rq correctly,
no matter whether the rq still has a bio or not.
If so, we should be able to unlink the bio by calling
end_that_request_chunk() with remaining data size.



Roland,
Could you try the patch below and give me all boot messages again?

This patch displays debug messages against requests still having bio,
then tries to unlink all bios from the rq before the rq is completed.
So your system may be able to continue to work correctly
after displaying debug messages.
I'd like to see the debug messages and know whether your system
still gets the problem or not.

--- a/drivers/ide/ide-cd.c 2008-01-30 18:24:51.000000000 -0500
+++ b/drivers/ide/ide-cd.c 2008-01-30 18:24:33.000000000 -0500
@@ -1722,8 +1722,18 @@ static ide_startstop_t cdrom_newpc_intr(
*/
if ((stat & DRQ_STAT) == 0) {
spin_lock_irqsave(&ide_lock, flags);
- if (__blk_end_request(rq, 0, 0))
- BUG();
+ if (__blk_end_request(rq, 0, 0)) {
+ blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
+ printk("backup: data_len=%u bi_size=%u\n",
+ rq->data_len, rq->bio->bi_size);
+
+ if (__blk_end_request(rq, 0, rq->data_len)) {
+ blk_dump_rq_flags(rq, "ide-cd: BAD rq");
+ printk("backup: data_len=%u bi_size=%u\n",
+ rq->data_len, rq->bio->bi_size);
+ BUG();
+ }
+ }
HWGROUP(drive)->rq = NULL;
spin_unlock_irqrestore(&ide_lock, flags);

Thanks,
Kiyoshi Ueda

2008-01-31 07:51:06

by Nai Xia

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

My dmesg relevant info is quite similar:

[ 6.875041] Freeing unused kernel memory: 320k freed
[ 8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[ 8.144439]
[ 8.144439] sector 10824201199534213, nr/cnr 0/0
[ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
00000000, len 158
[ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[ 8.144439] backup: data_len=158 bi_size=158
[ 8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[ 8.160756]
[ 8.160756] sector 2669858, nr/cnr 0/0
[ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
00000000, len 158
[ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[ 8.160756] backup: data_len=158 bi_size=158
[ 14.851101] eth0: link up
[ 27.121883] eth0: no IPv6 routers present


And by the way, Kiyoshi,
This can be reproduced in a typical setup vmware workstation 6.02 with
a vritual IDE cdrom,
in case you wanna catch that with your own eyes. :-)
Thanks for your trying hard to correct this annoying bug.

Best Regards,

Nai


On 1/31/08, Roland Dreier <[email protected]> wrote:
> > Could you try the patch below and give me all boot messages again?
>
> Sure, no problem, see below for full log (I updated to the latest git,
> which seems to have some other unrelated problems with things timing
> out earlier in the boot, but it does get to the ide-cd init); here's
> the relevant looking debug info:
>
> [ 163.990058] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 163.990060]
> [ 163.990061] sector 1987647, nr/cnr 0/0
> [ 163.990063] bio ffff8101c418c200, biotail ffff8101c418c200, buffer
> 0000000000000000, data 0000000000000000, len 158
> [ 163.990064] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 163.990069] backup: data_len=158 bi_size=158
> [ 163.994021] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 163.994023]
> [ 163.994024] sector 1987647, nr/cnr 0/0
> [ 163.994025] bio ffff8101c418c200, biotail ffff8101c418c200, buffer
> 0000000000000000, data 0000000000000000, len 158
> [ 163.994027] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 163.994032] backup: data_len=158 bi_size=158
>
> > This patch displays debug messages against requests still having bio,
> > then tries to unlink all bios from the rq before the rq is completed.
> > So your system may be able to continue to work correctly
> > after displaying debug messages.
>
> By the way, even with your first debug patch, I had no problem using
> the box once it booted. I didn't try the CD-ROM but everything else
> seemed fine.
>
> Also, I saw the same issue on a different box (Dell SC1435 with
> serverworks IDE):
>
> [ 1.367648] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> [ 1.371649] ide: Assuming 33MHz system bus speed for PIO modes;
> override with idebus=xx
> [ 2.612417] SvrWks HT1000: IDE controller (0x1166:0x0214 rev 0x00) at
> PCI slot 0000:00:02.1
> [ 2.612439] SvrWks HT1000: not 100% native mode: will probe irqs
> later
> [ 2.616427] ide0: BM-DMA at 0x08c0-0x08c7, BIOS settings:
> hda:DMA, hdb:pio
> [ 2.623505] Probing IDE interface ide0...
> [ 4.037070] hda: PHILIPS DVD-ROM SDR089, ATAPI CD/DVD-ROM drive
> [ 4.041801] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
> [ 4.058447] hda: UDMA/33 mode selected
> [ 4.061278] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> [ 4.632739] hda: ATAPI 24X DVD-ROM drive, 2048kB Cache
> ...
> [ 8.631615] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 8.631615]
> [ 8.631615] sector 2867855, nr/cnr 0/0
> [ 8.631615] bio ffff81021fba3f40, biotail ffff81021fba3f40, buffer
> 0000000000000000, data 0000000000000000, len 158
> [ 8.631615] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 8.660505] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
> [ 8.660505]
> [ 8.660505] sector 2867855, nr/cnr 0/0
> [ 8.660505] bio ffff81021fba3f40, biotail ffff81021fba3f40, buffer
> 0000000000000000, data 0000000000000000, len 62
> [ 8.660505] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
>
> Let me know if you'd like debug info from that too.
>
>

2008-01-31 13:06:19

by Jens Axboe

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Thu, Jan 31 2008, Nai Xia wrote:
> My dmesg relevant info is quite similar:
>
> [ 6.875041] Freeing unused kernel memory: 320k freed
> [ 8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> [ 8.144439]
> [ 8.144439] sector 10824201199534213, nr/cnr 0/0
> [ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
> 00000000, len 158
> [ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 8.144439] backup: data_len=158 bi_size=158
> [ 8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> [ 8.160756]
> [ 8.160756] sector 2669858, nr/cnr 0/0
> [ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
> 00000000, len 158
> [ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [ 8.160756] backup: data_len=158 bi_size=158
> [ 14.851101] eth0: link up
> [ 27.121883] eth0: no IPv6 routers present
>
>
> And by the way, Kiyoshi,
> This can be reproduced in a typical setup vmware workstation 6.02 with
> a vritual IDE cdrom,
> in case you wanna catch that with your own eyes. :-)
> Thanks for your trying hard to correct this annoying bug.

The below fix should be enough. It's perfectly legal to have leftover
byte counts when the drive signals completion, happens all the time for
eg user issued commands where you don't know an exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
*/
if ((stat & DRQ_STAT) == 0) {
spin_lock_irqsave(&ide_lock, flags);
- if (__blk_end_request(rq, 0, 0))
+ if (__blk_end_request(rq, 0, rq->data_len))
BUG();
HWGROUP(drive)->rq = NULL;
spin_unlock_irqrestore(&ide_lock, flags);

--
Jens Axboe

2008-01-31 14:57:06

by Florian Lohoff

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
> The below fix should be enough. It's perfectly legal to have leftover
> byte counts when the drive signals completion, happens all the time for
> eg user issued commands where you don't know an exact byte count.
>
> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> index 74c6087..bee05a3 100644
> --- a/drivers/ide/ide-cd.c
> +++ b/drivers/ide/ide-cd.c
> @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
> */
> if ((stat & DRQ_STAT) == 0) {
> spin_lock_irqsave(&ide_lock, flags);
> - if (__blk_end_request(rq, 0, 0))
> + if (__blk_end_request(rq, 0, rq->data_len))
> BUG();
> HWGROUP(drive)->rq = NULL;
> spin_unlock_irqrestore(&ide_lock, flags);
>

Fixes the crash on boot for me ...

Flo
--
Florian Lohoff [email protected] +49-171-2280134
Those who would give up a little freedom to get a little
security shall soon have neither - Benjamin Franklin


Attachments:
(No filename) (0.99 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2008-01-31 15:04:37

by Jens Axboe

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Thu, Jan 31 2008, Florian Lohoff wrote:
> On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
> > The below fix should be enough. It's perfectly legal to have leftover
> > byte counts when the drive signals completion, happens all the time for
> > eg user issued commands where you don't know an exact byte count.
> >
> > diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> > index 74c6087..bee05a3 100644
> > --- a/drivers/ide/ide-cd.c
> > +++ b/drivers/ide/ide-cd.c
> > @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
> > */
> > if ((stat & DRQ_STAT) == 0) {
> > spin_lock_irqsave(&ide_lock, flags);
> > - if (__blk_end_request(rq, 0, 0))
> > + if (__blk_end_request(rq, 0, rq->data_len))
> > BUG();
> > HWGROUP(drive)->rq = NULL;
> > spin_unlock_irqrestore(&ide_lock, flags);
> >
>
> Fixes the crash on boot for me ...

Great, thanks for confirming that. I'll make sure the patch goes
upstream today, if Linus is available.

--
Jens Axboe

2008-01-31 17:06:49

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi Jens,

On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
> On Thu, Jan 31 2008, Nai Xia wrote:
> > My dmesg relevant info is quite similar:
> >
> > [ 6.875041] Freeing unused kernel memory: 320k freed
> > [ 8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > [ 8.144439]
> > [ 8.144439] sector 10824201199534213, nr/cnr 0/0
> > [ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
> > 00000000, len 158
> > [ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 8.144439] backup: data_len=158 bi_size=158
> > [ 8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > [ 8.160756]
> > [ 8.160756] sector 2669858, nr/cnr 0/0
> > [ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
> > 00000000, len 158
> > [ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 8.160756] backup: data_len=158 bi_size=158
> > [ 14.851101] eth0: link up
> > [ 27.121883] eth0: no IPv6 routers present
> >
> >
> > And by the way, Kiyoshi,
> > This can be reproduced in a typical setup vmware workstation 6.02 with
> > a vritual IDE cdrom,
> > in case you wanna catch that with your own eyes. :-)
> > Thanks for your trying hard to correct this annoying bug.
>
> The below fix should be enough. It's perfectly legal to have leftover
> byte counts when the drive signals completion, happens all the time for
> eg user issued commands where you don't know an exact byte count.
>
> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> index 74c6087..bee05a3 100644
> --- a/drivers/ide/ide-cd.c
> +++ b/drivers/ide/ide-cd.c
> @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
> */
> if ((stat & DRQ_STAT) == 0) {
> spin_lock_irqsave(&ide_lock, flags);
> - if (__blk_end_request(rq, 0, 0))
> + if (__blk_end_request(rq, 0, rq->data_len))
> BUG();
> HWGROUP(drive)->rq = NULL;
> spin_unlock_irqrestore(&ide_lock, flags);

OK, I undarstand the leftover is legal.

By the way, is it safe to always return success if there is a leftover?
I thought we might have to complete the rq with -EIO in such case.

Thanks,
Kiyoshi Ueda

2008-01-31 18:19:05

by Jens Axboe

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089



On 31/01/2008, at 18.04, Kiyoshi Ueda <[email protected]> wrote:

> Hi Jens,
>
> On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
>> On Thu, Jan 31 2008, Nai Xia wrote:
>>> My dmesg relevant info is quite similar:
>>>
>>> [ 6.875041] Freeing unused kernel memory: 320k freed
>>> [ 8.143120] ide-cd: rq still having bio: dev hdc: type=2,
>>> flags=114c8
>>> [ 8.144439]
>>> [ 8.144439] sector 10824201199534213, nr/cnr 0/0
>>> [ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
>>> 00000000, len 158
>>> [ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
>>> [ 8.144439] backup: data_len=158 bi_size=158
>>> [ 8.160756] ide-cd: rq still having bio: dev hdc: type=2,
>>> flags=114c8
>>> [ 8.160756]
>>> [ 8.160756] sector 2669858, nr/cnr 0/0
>>> [ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
>>> 00000000, len 158
>>> [ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
>>> [ 8.160756] backup: data_len=158 bi_size=158
>>> [ 14.851101] eth0: link up
>>> [ 27.121883] eth0: no IPv6 routers present
>>>
>>>
>>> And by the way, Kiyoshi,
>>> This can be reproduced in a typical setup vmware workstation 6.02
>>> with
>>> a vritual IDE cdrom,
>>> in case you wanna catch that with your own eyes. :-)
>>> Thanks for your trying hard to correct this annoying bug.
>>
>> The below fix should be enough. It's perfectly legal to have leftover
>> byte counts when the drive signals completion, happens all the time
>> for
>> eg user issued commands where you don't know an exact byte count.
>>
>> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
>> index 74c6087..bee05a3 100644
>> --- a/drivers/ide/ide-cd.c
>> +++ b/drivers/ide/ide-cd.c
>> @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr
>> (ide_drive_t *drive)
>> */
>> if ((stat & DRQ_STAT) == 0) {
>> spin_lock_irqsave(&ide_lock, flags);
>> - if (__blk_end_request(rq, 0, 0))
>> + if (__blk_end_request(rq, 0, rq->data_len))
>> BUG();
>> HWGROUP(drive)->rq = NULL;
>> spin_unlock_irqrestore(&ide_lock, flags);
>
> OK, I undarstand the leftover is legal.
>
> By the way, is it safe to always return success if there is a
> leftover?
> I thought we might have to complete the rq with -EIO in such case.

data_len being non zero should pass the residual count back to the
issuer.

2008-01-31 20:36:23

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi Jens,

On Thu, 31 Jan 2008 19:16:54 +0100, Jens Axboe wrote:
> On 31/01/2008, at 18.04, Kiyoshi Ueda <[email protected]> wrote:
> > On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
> >> On Thu, Jan 31 2008, Nai Xia wrote:
> >>> My dmesg relevant info is quite similar:
> >>>
> >>> [ 6.875041] Freeing unused kernel memory: 320k freed
> >>> [ 8.143120] ide-cd: rq still having bio: dev hdc: type=2,
> >>> flags=114c8
> >>> [ 8.144439]
> >>> [ 8.144439] sector 10824201199534213, nr/cnr 0/0
> >>> [ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
> >>> 00000000, len 158
> >>> [ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> >>> [ 8.144439] backup: data_len=158 bi_size=158
> >>> [ 8.160756] ide-cd: rq still having bio: dev hdc: type=2,
> >>> flags=114c8
> >>> [ 8.160756]
> >>> [ 8.160756] sector 2669858, nr/cnr 0/0
> >>> [ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
> >>> 00000000, len 158
> >>> [ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> >>> [ 8.160756] backup: data_len=158 bi_size=158
> >>> [ 14.851101] eth0: link up
> >>> [ 27.121883] eth0: no IPv6 routers present
> >>>
> >>>
> >>> And by the way, Kiyoshi,
> >>> This can be reproduced in a typical setup vmware workstation 6.02
> >>> with
> >>> a vritual IDE cdrom,
> >>> in case you wanna catch that with your own eyes. :-)
> >>> Thanks for your trying hard to correct this annoying bug.
> >>
> >> The below fix should be enough. It's perfectly legal to have leftover
> >> byte counts when the drive signals completion, happens all the time
> >> for eg user issued commands where you don't know an exact byte count.
> >>
> >> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> >> index 74c6087..bee05a3 100644
> >> --- a/drivers/ide/ide-cd.c
> >> +++ b/drivers/ide/ide-cd.c
> >> @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr
> >> (ide_drive_t *drive)
> >> */
> >> if ((stat & DRQ_STAT) == 0) {
> >> spin_lock_irqsave(&ide_lock, flags);
> >> - if (__blk_end_request(rq, 0, 0))
> >> + if (__blk_end_request(rq, 0, rq->data_len))
> >> BUG();
> >> HWGROUP(drive)->rq = NULL;
> >> spin_unlock_irqrestore(&ide_lock, flags);
> >
> > OK, I undarstand the leftover is legal.
> >
> > By the way, is it safe to always return success if there is a
> > leftover?
> > I thought we might have to complete the rq with -EIO in such case.
>
> data_len being non zero should pass the residual count back to the
> issuer.

Aah, so the issuer can know how many bytes of the I/Os are not done,
and the error status of the bio which is completed by
end_that_request_first() in __blk_end_request() don't matter
for the issuer.
OK, thanks. I think the patch is fine.

Thanks,
Kiyoshi Ueda

2008-01-31 21:38:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
> On Thu, Jan 31 2008, Nai Xia wrote:
> > My dmesg relevant info is quite similar:
> >
> > [ 6.875041] Freeing unused kernel memory: 320k freed
> > [ 8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > [ 8.144439]
> > [ 8.144439] sector 10824201199534213, nr/cnr 0/0
> > [ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
> > 00000000, len 158
> > [ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 8.144439] backup: data_len=158 bi_size=158
> > [ 8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > [ 8.160756]
> > [ 8.160756] sector 2669858, nr/cnr 0/0
> > [ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
> > 00000000, len 158
> > [ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > [ 8.160756] backup: data_len=158 bi_size=158
> > [ 14.851101] eth0: link up
> > [ 27.121883] eth0: no IPv6 routers present
> >
> >
> > And by the way, Kiyoshi,
> > This can be reproduced in a typical setup vmware workstation 6.02 with
> > a vritual IDE cdrom,
> > in case you wanna catch that with your own eyes. :-)
> > Thanks for your trying hard to correct this annoying bug.
>
> The below fix should be enough. It's perfectly legal to have leftover
> byte counts when the drive signals completion, happens all the time for
> eg user issued commands where you don't know an exact byte count.

Actually, this behavior has been the case even before the __blk_end_request()
changes. I did test plain 2.6.24 with the following


--- linux-2.6/drivers/ide/ide-cd.c 2008-01-31 22:18:59.000000000 +0100
+++ linux-2.6/drivers/ide/ide-cd.c-new 2008-01-31 22:18:50.000000000 +0100
@@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
/*
* If DRQ is clear, the command has completed.
*/
- if ((stat & DRQ_STAT) == 0)
+ if ((stat & DRQ_STAT) == 0) {
+ blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
+ printk("backup: data_len=%u bi_size=%u\n",
+ rq->data_len, rq->bio->bi_size);
goto end_request;
+ }

/*
* check which way to transfer data


to see whether we've been getting residual byte counts:

Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
Jan 31 22:10:06 gollum kernel: [ 26.702945]
Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0
Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer 00000000, data 00000000, len 158
Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158

... so we've been simply silently ignoring this until now so i guess we don't
need to BUG() for something that's totally benign.

--
Regards/Gru?,
Boris.

2008-01-31 22:36:20

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi Boris,

Thank you for the confirmation of original behavior.

On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote:
> On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
> > On Thu, Jan 31 2008, Nai Xia wrote:
> > > My dmesg relevant info is quite similar:
> > >
> > > [ 6.875041] Freeing unused kernel memory: 320k freed
> > > [ 8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > > [ 8.144439]
> > > [ 8.144439] sector 10824201199534213, nr/cnr 0/0
> > > [ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
> > > 00000000, len 158
> > > [ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > [ 8.144439] backup: data_len=158 bi_size=158
> > > [ 8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > > [ 8.160756]
> > > [ 8.160756] sector 2669858, nr/cnr 0/0
> > > [ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
> > > 00000000, len 158
> > > [ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > [ 8.160756] backup: data_len=158 bi_size=158
> > > [ 14.851101] eth0: link up
> > > [ 27.121883] eth0: no IPv6 routers present
> > >
> > >
> > > And by the way, Kiyoshi,
> > > This can be reproduced in a typical setup vmware workstation 6.02 with
> > > a vritual IDE cdrom,
> > > in case you wanna catch that with your own eyes. :-)
> > > Thanks for your trying hard to correct this annoying bug.
> >
> > The below fix should be enough. It's perfectly legal to have leftover
> > byte counts when the drive signals completion, happens all the time for
> > eg user issued commands where you don't know an exact byte count.
>
> Actually, this behavior has been the case even before the __blk_end_request()
> changes. I did test plain 2.6.24 with the following
>
>
> --- linux-2.6/drivers/ide/ide-cd.c 2008-01-31 22:18:59.000000000 +0100
> +++ linux-2.6/drivers/ide/ide-cd.c-new 2008-01-31 22:18:50.000000000 +0100
> @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
> /*
> * If DRQ is clear, the command has completed.
> */
> - if ((stat & DRQ_STAT) == 0)
> + if ((stat & DRQ_STAT) == 0) {
> + blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
> + printk("backup: data_len=%u bi_size=%u\n",
> + rq->data_len, rq->bio->bi_size);
> goto end_request;
> + }
>
> /*
> * check which way to transfer data
>
>
> to see whether we've been getting residual byte counts:
>
> Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> Jan 31 22:10:06 gollum kernel: [ 26.702945]
> Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0
> Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer 00000000, data 00000000, len 158
> Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158
>
> ... so we've been simply silently ignoring this until now so i guess we don't
> need to BUG() for something that's totally benign.

end_that_request_last() is not called when __blk_end_reuqest()
returns 1. Then, the issuer isn't waken up.
So I think the BUG() or error messages should be there.

And fortunately, the issuer seems not to mind whether
end_that_request_first() is called for the remaining bio or not.

So I think Jens' patch is fine.

Thanks,
Kiyoshi Ueda

2008-02-01 07:53:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Thu, Jan 31, 2008 at 05:35:56PM -0500, Kiyoshi Ueda wrote:
> Hi Boris,
>
> Thank you for the confirmation of original behavior.
>
> On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote:
> > On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
> > > On Thu, Jan 31 2008, Nai Xia wrote:
> > > > My dmesg relevant info is quite similar:
> > > >
> > > > [ 6.875041] Freeing unused kernel memory: 320k freed
> > > > [ 8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > > > [ 8.144439]
> > > > [ 8.144439] sector 10824201199534213, nr/cnr 0/0
> > > > [ 8.144439] bio cf029280, biotail cf029280, buffer 00000000, data
> > > > 00000000, len 158
> > > > [ 8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > > [ 8.144439] backup: data_len=158 bi_size=158
> > > > [ 8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > > > [ 8.160756]
> > > > [ 8.160756] sector 2669858, nr/cnr 0/0
> > > > [ 8.160756] bio cf029300, biotail cf029300, buffer 00000000, data
> > > > 00000000, len 158
> > > > [ 8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > > [ 8.160756] backup: data_len=158 bi_size=158
> > > > [ 14.851101] eth0: link up
> > > > [ 27.121883] eth0: no IPv6 routers present
> > > >
> > > >
> > > > And by the way, Kiyoshi,
> > > > This can be reproduced in a typical setup vmware workstation 6.02 with
> > > > a vritual IDE cdrom,
> > > > in case you wanna catch that with your own eyes. :-)
> > > > Thanks for your trying hard to correct this annoying bug.
> > >
> > > The below fix should be enough. It's perfectly legal to have leftover
> > > byte counts when the drive signals completion, happens all the time for
> > > eg user issued commands where you don't know an exact byte count.
> >
> > Actually, this behavior has been the case even before the __blk_end_request()
> > changes. I did test plain 2.6.24 with the following
> >
> >
> > --- linux-2.6/drivers/ide/ide-cd.c 2008-01-31 22:18:59.000000000 +0100
> > +++ linux-2.6/drivers/ide/ide-cd.c-new 2008-01-31 22:18:50.000000000 +0100
> > @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
> > /*
> > * If DRQ is clear, the command has completed.
> > */
> > - if ((stat & DRQ_STAT) == 0)
> > + if ((stat & DRQ_STAT) == 0) {
> > + blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
> > + printk("backup: data_len=%u bi_size=%u\n",
> > + rq->data_len, rq->bio->bi_size);
> > goto end_request;
> > + }
> >
> > /*
> > * check which way to transfer data
> >
> >
> > to see whether we've been getting residual byte counts:
> >
> > Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > Jan 31 22:10:06 gollum kernel: [ 26.702945]
> > Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0
> > Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer 00000000, data 00000000, len 158
> > Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158
> >
> > ... so we've been simply silently ignoring this until now so i guess we don't
> > need to BUG() for something that's totally benign.

Hi Kiyoshi,

> end_that_request_last() is not called when __blk_end_reuqest()
> returns 1. Then, the issuer isn't waken up.
> So I think the BUG() or error messages should be there.

you mean, end_that_request_last() isn't called when __end_that_request_first()
returns an error and this is the case only for fs and pc requests. Otherwise it
_is_ called, thus simulating somewhat the previous behavior. However, we never BUG()'ged
on residual byte counts before and this driver has been in the kernel tree for
ages, so what puzzles me now is how is BUG()'ing here better than before and
shouldn't we simply issue a warning instead of killing the interrupt handler...

..or am i missing something?

--
Regards/Gru?,
Boris.

2008-02-01 17:40:03

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi Boris,

On Fri, 1 Feb 2008 08:51:17 +0100, Borislav Petkov wrote:
> > > > The below fix should be enough. It's perfectly legal to have leftover
> > > > byte counts when the drive signals completion, happens all the time for
> > > > eg user issued commands where you don't know an exact byte count.
> > >
> > > Actually, this behavior has been the case even before
> > > the __blk_end_request() changes.
> > > I did test plain 2.6.24 with the following
> > >
> > >
> > > --- linux-2.6/drivers/ide/ide-cd.c 2008-01-31 22:18:59.000000000 +0100
> > > +++ linux-2.6/drivers/ide/ide-cd.c-new 2008-01-31 22:18:50.000000000 +0100
> > > @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
> > > /*
> > > * If DRQ is clear, the command has completed.
> > > */
> > > - if ((stat & DRQ_STAT) == 0)
> > > + if ((stat & DRQ_STAT) == 0) {
> > > + blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
> > > + printk("backup: data_len=%u bi_size=%u\n",
> > > + rq->data_len, rq->bio->bi_size);
> > > goto end_request;
> > > + }
> > >
> > > /*
> > > * check which way to transfer data
> > >
> > >
> > > to see whether we've been getting residual byte counts:
> > >
> > > Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > > Jan 31 22:10:06 gollum kernel: [ 26.702945]
> > > Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0
> > > Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer 00000000, data 00000000, len 158
> > > Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158
> > >
> > > ... so we've been simply silently ignoring this until now so
> > > i guess we don't need to BUG() for something that's totally benign.
>
> Hi Kiyoshi,
>
> > end_that_request_last() is not called when __blk_end_reuqest()
> > returns 1. Then, the issuer isn't waken up.
> > So I think the BUG() or error messages should be there.
>
> you mean, end_that_request_last() isn't called when __end_that_request_first()
> returns an error and this is the case only for fs and pc requests.
> Otherwise it _is_ called, thus simulating somewhat the previous behavior.
> However, we never BUG()'ged on residual byte counts before and
> this driver has been in the kernel tree for ages, so what puzzles
> me now is how is BUG()'ing here better than before and shouldn't we
> simply issue a warning instead of killing the interrupt handler...

The Jens' patch passes the residual byte counts to __blk_end_request(),
so __end_that_reqeust_first() should never return 1 and we should never
BUG() on the residual byte counts, unless inconsistency happens such as
the size of remaining bios is bigger than the residual byte counts.

So if __blk_end_request() returns 1 even with the Jens' patch,
it means that the block layer or the driver really have a bug.
And then, the request and the bios could leak or the issuer
would wait forever because end_that_request_last() isn't called.

The previous behavior might ignore such inconsistency and leak only
the bios because it was calling end_that_request_last() anyway.
I would like to BUG() in such cases personally, but I don't object
strongly if you prefer not to BUG().

Thanks,
Kiyoshi Ueda

2008-02-01 18:30:31

by Borislav Petkov

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Fri, Feb 01, 2008 at 12:39:27PM -0500, Kiyoshi Ueda wrote:

<snip>

> > > end_that_request_last() is not called when __blk_end_reuqest()
> > > returns 1. Then, the issuer isn't waken up.
> > > So I think the BUG() or error messages should be there.
> >
> > you mean, end_that_request_last() isn't called when __end_that_request_first()
> > returns an error and this is the case only for fs and pc requests.
> > Otherwise it _is_ called, thus simulating somewhat the previous behavior.
> > However, we never BUG()'ged on residual byte counts before and
> > this driver has been in the kernel tree for ages, so what puzzles
> > me now is how is BUG()'ing here better than before and shouldn't we
> > simply issue a warning instead of killing the interrupt handler...
>
> The Jens' patch passes the residual byte counts to __blk_end_request(),
> so __end_that_reqeust_first() should never return 1 and we should never
> BUG() on the residual byte counts, unless inconsistency happens such as
> the size of remaining bios is bigger than the residual byte counts.

yep.

> So if __blk_end_request() returns 1 even with the Jens' patch,
> it means that the block layer or the driver really have a bug.
> And then, the request and the bios could leak or the issuer
> would wait forever because end_that_request_last() isn't called.
>
> The previous behavior might ignore such inconsistency and leak only
> the bios because it was calling end_that_request_last() anyway.
> I would like to BUG() in such cases personally, but I don't object
> strongly if you prefer not to BUG().

BUG() is definitely what we should do here to catch this case of sizeof(bios) >
rq->data_len. Putting a brown paper bag over the issue will never get it fixed
if it really leaks bios. Thanks for clarifying that.

By the way, shouldn't we be doing a little branch prediction here:

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
*/
if ((stat & DRQ_STAT) == 0) {
spin_lock_irqsave(&ide_lock, flags);
- if (__blk_end_request(rq, 0, 0))
+ if (unlikely(__blk_end_request(rq, 0, rq->data_len)))
BUG();
HWGROUP(drive)->rq = NULL;
spin_unlock_irqrestore(&ide_lock, flags);


--
Regards/Gru?,
Boris.

2008-02-01 19:39:09

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Hi Boris,

On Fri, 1 Feb 2008 19:29:09 +0100, Borislav Petkov wrote:
> > > > end_that_request_last() is not called when __blk_end_reuqest()
> > > > returns 1. Then, the issuer isn't waken up.
> > > > So I think the BUG() or error messages should be there.
> > >
> > > you mean, end_that_request_last() isn't called when __end_that_request_first()
> > > returns an error and this is the case only for fs and pc requests.
> > > Otherwise it _is_ called, thus simulating somewhat the previous behavior.
> > > However, we never BUG()'ged on residual byte counts before and
> > > this driver has been in the kernel tree for ages, so what puzzles
> > > me now is how is BUG()'ing here better than before and shouldn't we
> > > simply issue a warning instead of killing the interrupt handler...
> >
> > The Jens' patch passes the residual byte counts to __blk_end_request(),
> > so __end_that_reqeust_first() should never return 1 and we should never
> > BUG() on the residual byte counts, unless inconsistency happens such as
> > the size of remaining bios is bigger than the residual byte counts.
>
> yep.
>
> > So if __blk_end_request() returns 1 even with the Jens' patch,
> > it means that the block layer or the driver really have a bug.
> > And then, the request and the bios could leak or the issuer
> > would wait forever because end_that_request_last() isn't called.
> >
> > The previous behavior might ignore such inconsistency and leak only
> > the bios because it was calling end_that_request_last() anyway.
> > I would like to BUG() in such cases personally, but I don't object
> > strongly if you prefer not to BUG().
>
> BUG() is definitely what we should do here to catch this case of sizeof(bios) >
> rq->data_len. Putting a brown paper bag over the issue will never get it fixed
> if it really leaks bios. Thanks for clarifying that.
>
> By the way, shouldn't we be doing a little branch prediction here:
>
> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> index 74c6087..bee05a3 100644
> --- a/drivers/ide/ide-cd.c
> +++ b/drivers/ide/ide-cd.c
> @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
> */
> if ((stat & DRQ_STAT) == 0) {
> spin_lock_irqsave(&ide_lock, flags);
> - if (__blk_end_request(rq, 0, 0))
> + if (unlikely(__blk_end_request(rq, 0, rq->data_len)))
> BUG();
> HWGROUP(drive)->rq = NULL;
> spin_unlock_irqrestore(&ide_lock, flags);

I think it's reasonable.

Thanks,
Kiyoshi Ueda

Subject: Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

On Thursday 31 January 2008, Jens Axboe wrote:
> On Thu, Jan 31 2008, Florian Lohoff wrote:
> > On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
> > > The below fix should be enough. It's perfectly legal to have leftover
> > > byte counts when the drive signals completion, happens all the time for
> > > eg user issued commands where you don't know an exact byte count.
> > >
> > > diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> > > index 74c6087..bee05a3 100644
> > > --- a/drivers/ide/ide-cd.c
> > > +++ b/drivers/ide/ide-cd.c
> > > @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
> > > */
> > > if ((stat & DRQ_STAT) == 0) {
> > > spin_lock_irqsave(&ide_lock, flags);
> > > - if (__blk_end_request(rq, 0, 0))
> > > + if (__blk_end_request(rq, 0, rq->data_len))
> > > BUG();
> > > HWGROUP(drive)->rq = NULL;
> > > spin_unlock_irqrestore(&ide_lock, flags);
> > >
> >
> > Fixes the crash on boot for me ...
>
> Great, thanks for confirming that. I'll make sure the patch goes
> upstream today, if Linus is available.

Thanks Jens.