2007-05-14 18:26:24

by Gerald Britton

[permalink] [raw]
Subject: [PATCH] cciss: Fix pci_driver.shutdown while device is still active

Fix an Oops in the cciss driver caused by system shutdown while a filesystem
on a cciss device is still active. The cciss_remove_one function only
properly removes the device if the device has been cleanly released by its
users, which is not the case when the pci_driver.shutdown method is called.

This patch adds a new cciss_shutdown function to better match the pattern
used by various SCSI drivers: deactivate device interrupts and flush caches.
It also alters the cciss_remove_one function to match and readds the
__devexit annotation that was removed when cciss_remove_one was serving as
the pci_driver.shutdown method.

Signed-off-by: Gerald Britton <[email protected]>
---
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 370dfe1..5acc6c4 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -3469,13 +3469,39 @@ static int __devinit cciss_init_one(struct pci_dev *pdev,
return -1;
}

-static void cciss_remove_one(struct pci_dev *pdev)
+static void cciss_shutdown(struct pci_dev *pdev)
{
ctlr_info_t *tmp_ptr;
- int i, j;
+ int i;
char flush_buf[4];
int return_code;

+ tmp_ptr = pci_get_drvdata(pdev);
+ if (tmp_ptr == NULL)
+ return;
+ i = tmp_ptr->ctlr;
+ if (hba[i] == NULL)
+ return;
+
+ /* Turn board interrupts off and send the flush cache command */
+ /* sendcmd will turn off interrupt, and send the flush...
+ * To write all data in the battery backed cache to disks */
+ memset(flush_buf, 0, 4);
+ return_code = sendcmd(CCISS_CACHE_FLUSH, i, flush_buf, 4, 0, 0, 0, NULL,
+ TYPE_CMD);
+ if (return_code == IO_OK) {
+ printk(KERN_INFO "Completed flushing cache on controller %d\n", i);
+ } else {
+ printk(KERN_WARNING "Error flushing cache on controller %d\n", i);
+ }
+ free_irq(hba[i]->intr[2], hba[i]);
+}
+
+static void __devexit cciss_remove_one(struct pci_dev *pdev)
+{
+ ctlr_info_t *tmp_ptr;
+ int i, j;
+
if (pci_get_drvdata(pdev) == NULL) {
printk(KERN_ERR "cciss: Unable to remove device \n");
return;
@@ -3506,18 +3532,7 @@ static void cciss_remove_one(struct pci_dev *pdev)

cciss_unregister_scsi(i); /* unhook from SCSI subsystem */

- /* Turn board interrupts off and send the flush cache command */
- /* sendcmd will turn off interrupt, and send the flush...
- * To write all data in the battery backed cache to disks */
- memset(flush_buf, 0, 4);
- return_code = sendcmd(CCISS_CACHE_FLUSH, i, flush_buf, 4, 0, 0, 0, NULL,
- TYPE_CMD);
- if (return_code == IO_OK) {
- printk(KERN_INFO "Completed flushing cache on controller %d\n", i);
- } else {
- printk(KERN_WARNING "Error flushing cache on controller %d\n", i);
- }
- free_irq(hba[i]->intr[2], hba[i]);
+ cciss_shutdown(pdev);

#ifdef CONFIG_PCI_MSI
if (hba[i]->msix_vector)
@@ -3550,7 +3565,7 @@ static struct pci_driver cciss_pci_driver = {
.probe = cciss_init_one,
.remove = __devexit_p(cciss_remove_one),
.id_table = cciss_pci_device_id, /* id_table */
- .shutdown = cciss_remove_one,
+ .shutdown = cciss_shutdown,
};

/*


2007-05-14 19:07:36

by Mike Miller

[permalink] [raw]
Subject: RE: [PATCH] cciss: Fix pci_driver.shutdown while device is still active



> -----Original Message-----
> From: Gerald Britton [mailto:[email protected]]
> Sent: Monday, May 14, 2007 12:53 PM
> To: [email protected]; [email protected];
> Miller, Mike (OS Dev); ISS StorageDev;
> [email protected]; [email protected]
> Subject: [PATCH] cciss: Fix pci_driver.shutdown while device
> is still active
>
> Fix an Oops in the cciss driver caused by system shutdown
> while a filesystem on a cciss device is still active. The
> cciss_remove_one function only properly removes the device if
> the device has been cleanly released by its users, which is
> not the case when the pci_driver.shutdown method is called.

Please send the Oops output.

mikem

>
> This patch adds a new cciss_shutdown function to better match
> the pattern used by various SCSI drivers: deactivate device
> interrupts and flush caches.
> It also alters the cciss_remove_one function to match and
> readds the __devexit annotation that was removed when
> cciss_remove_one was serving as the pci_driver.shutdown method.
>
> Signed-off-by: Gerald Britton <[email protected]>
> ---
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> index 370dfe1..5acc6c4 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -3469,13 +3469,39 @@ static int __devinit
> cciss_init_one(struct pci_dev *pdev,
> return -1;
> }
>
> -static void cciss_remove_one(struct pci_dev *pdev)
> +static void cciss_shutdown(struct pci_dev *pdev)
> {
> ctlr_info_t *tmp_ptr;
> - int i, j;
> + int i;
> char flush_buf[4];
> int return_code;
>
> + tmp_ptr = pci_get_drvdata(pdev);
> + if (tmp_ptr == NULL)
> + return;
> + i = tmp_ptr->ctlr;
> + if (hba[i] == NULL)
> + return;
> +
> + /* Turn board interrupts off and send the flush cache
> command */
> + /* sendcmd will turn off interrupt, and send the flush...
> + * To write all data in the battery backed cache to disks */
> + memset(flush_buf, 0, 4);
> + return_code = sendcmd(CCISS_CACHE_FLUSH, i, flush_buf,
> 4, 0, 0, 0, NULL,
> + TYPE_CMD);
> + if (return_code == IO_OK) {
> + printk(KERN_INFO "Completed flushing cache on
> controller %d\n", i);
> + } else {
> + printk(KERN_WARNING "Error flushing cache on
> controller %d\n", i);
> + }
> + free_irq(hba[i]->intr[2], hba[i]);
> +}
> +
> +static void __devexit cciss_remove_one(struct pci_dev *pdev) {
> + ctlr_info_t *tmp_ptr;
> + int i, j;
> +
> if (pci_get_drvdata(pdev) == NULL) {
> printk(KERN_ERR "cciss: Unable to remove device \n");
> return;
> @@ -3506,18 +3532,7 @@ static void cciss_remove_one(struct
> pci_dev *pdev)
>
> cciss_unregister_scsi(i); /* unhook from SCSI subsystem */
>
> - /* Turn board interrupts off and send the flush cache
> command */
> - /* sendcmd will turn off interrupt, and send the flush...
> - * To write all data in the battery backed cache to disks */
> - memset(flush_buf, 0, 4);
> - return_code = sendcmd(CCISS_CACHE_FLUSH, i, flush_buf,
> 4, 0, 0, 0, NULL,
> - TYPE_CMD);
> - if (return_code == IO_OK) {
> - printk(KERN_INFO "Completed flushing cache on
> controller %d\n", i);
> - } else {
> - printk(KERN_WARNING "Error flushing cache on
> controller %d\n", i);
> - }
> - free_irq(hba[i]->intr[2], hba[i]);
> + cciss_shutdown(pdev);
>
> #ifdef CONFIG_PCI_MSI
> if (hba[i]->msix_vector)
> @@ -3550,7 +3565,7 @@ static struct pci_driver cciss_pci_driver = {
> .probe = cciss_init_one,
> .remove = __devexit_p(cciss_remove_one),
> .id_table = cciss_pci_device_id, /* id_table */
> - .shutdown = cciss_remove_one,
> + .shutdown = cciss_shutdown,
> };
>
> /*
>

2007-05-14 19:21:26

by Gerald Britton

[permalink] [raw]
Subject: Re: [PATCH] cciss: Fix pci_driver.shutdown while device is still active

On Mon, May 14, 2007 at 07:06:45PM -0000, Miller, Mike (OS Dev) wrote:
> Please send the Oops output.

Sure, here it is, this was built from a clean 2.6.21.1 tree (despite the
-dirty in the kernel version).

-- Gerald

Unmounting local filesystems...umount: /: device is busy
done.
mount: / is busy
Rebooting... md: stopping all md devices.
p<6>Completed flushing cache on controller 0
BUG: unable to handle kernel NULL pointer dereference<2>EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #879553 offset 0
Aborting journal on device cciss/c0d0p1.
Buffer I/O error on device cciss/c0d0p1, logical block 518
lost page write due to I/O error on cciss/c0d0p1
Remounting filesystem read-only
Buffer I/O error on device cciss/c0d0p1, logical block 0
lost page write due to I/O error on cciss/c0d0p1
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #879553 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #456065 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #879553 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #879553 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #197834 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #197834 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #197834 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #197834 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #197834 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #197834 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #197834 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #293192 offset 0
EXT3-fs error (device cciss/c0d0p1): ext3_find_entry: reading directory #456065 offset 0
at virtual address 00000024
printing eip:
c04dbb7e
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: tun ipv6 pcspkr e1000 bnx2 ipmi_si ipmi_msghandler sg sr_mod cdrom ata_piix ata_generic libata ehci_hcd joydev uhci_hcd ext3 jbd mbcache cciss scsi_mod
CPU: 2
EIP: 0060:[<c04dbb7e>] Not tainted VLI
EFLAGS: 00010002 (2.6.21.1-dirty #2)
EIP is at elv_completed_request+0x7e/0x90
eax: 00000000 ebx: dfbf3784 ecx: 00000000 edx: f7497d60
esi: dfbf3784 edi: f7497d60 ebp: f7c80000 esp: c07b6f94
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process reboot (pid: 4004, ti=c07b6000 task=dfadba70 task.ti=c1920000)
Stack: f7497d60 dfbf3784 c04df4d8 f7c7b094 00000282 f7497d60 f7c80000 f883b445
00000001 f7c70000 f7497d68 c07b6fd0 c07add00 c07ac380 c04e0549 c07b6fd0
c07b6fd0 00000001 c075db20 c0432df4 00000002 0000000a c1920d68 c1920000
Call Trace:
[<c04df4d8>] __blk_put_request+0x28/0xa0
[<f883b445>] cciss_softirq_done+0xa5/0x110 [cciss]
[<c04e0549>] blk_done_softirq+0x59/0x70
[<c0432df4>] __do_softirq+0xc4/0xe0
[<c0407199>] do_softirq+0x59/0xd0
[<c0432eb5>] irq_exit+0x75/0x80
[<c041d13a>] smp_apic_timer_interrupt+0x2a/0x40
[<c0405b74>] apic_timer_interrupt+0x28/0x30
[<c042e5ce>] vprintk+0x1ae/0x290
[<f883d465>] do_cciss_intr+0x4e5/0x4f0 [cciss]
[<f883c8ad>] sendcmd+0x8d/0x2b0 [cciss]
[<c042e417>] printk+0x17/0x20
[<f883e52d>] cciss_remove_one+0xfd/0x230 [cciss]
[<c04fa654>] pci_device_shutdown+0x14/0x20
[<c0570de0>] device_shutdown+0xf0/0x100
[<c043c14e>] kernel_restart+0xe/0x40
[<c063c6ac>] lock_kernel+0x1c/0x40
[<c043c327>] sys_reboot+0xf7/0x1b0
[<c04606e1>] filemap_nopage+0x211/0x3a0
[<c046d923>] do_no_page+0x1a3/0x310
[<c06147b0>] inet_sock_destruct+0xd0/0x210
[<c063c458>] _spin_lock_bh+0x8/0x20
[<c04a14be>] invalidate_inode_buffers+0xe/0xb0
[<c0491835>] dput+0x25/0x150
[<c0481ebf>] __fput+0x11f/0x190
[<c04967cb>] mntput_no_expire+0x1b/0x90
[<c0480483>] filp_close+0x43/0x70
[<c0480514>] sys_close+0x64/0xb0
[<c040511c>] syscall_call+0x7/0xb
=======================
Code: d1 89 d8 31 c9 ba 02 00 00 00 e8 6e 09 00 00 89 d8 ff 53 40 eb bc 8d b4 26 00 00 00 00 ff 8b 44 01 00 00 f6 42 14 04 74 9f 8b 00 <8b> 48 24 85 c9 74 96 89 d8 ff d1 eb 90 90 8d 74 26 00 83 ec 10
EIP: [<c04dbb7e>] elv_completed_request+0x7e/0x90 SS:ESP 0068:c07b6f94
Kernel panic - not syncing: Fatal exception in interrupt
BUG: at /home/gbritton/kernel/linux-2.6/arch/i386/kernel/smp.c:546 smp_call_function()
[<c041b51e>] smp_call_function+0xfe/0x110
[<c040617e>] show_registers+0x18e/0x2a0
[<c0552f24>] do_unblank_screen+0x24/0x150
[<c041b58b>] smp_send_stop+0x1b/0x30
[<c042da70>] panic+0x50/0x100
[<c04064a7>] die+0x1c7/0x220
[<c063e199>] do_page_fault+0x2a9/0x560
[<c053fb28>] __add_entropy_words+0x68/0x1b0
[<c063def0>] do_page_fault+0x0/0x560
[<c063c794>] error_code+0x7c/0x84
[<c04dbb7e>] elv_completed_request+0x7e/0x90
[<c04df4d8>] __blk_put_request+0x28/0xa0
[<f883b445>] cciss_softirq_done+0xa5/0x110 [cciss]
[<c04e0549>] blk_done_softirq+0x59/0x70
[<c0432df4>] __do_softirq+0xc4/0xe0
[<c0407199>] do_softirq+0x59/0xd0
[<c0432eb5>] irq_exit+0x75/0x80
[<c041d13a>] smp_apic_timer_interrupt+0x2a/0x40
[<c0405b74>] apic_timer_interrupt+0x28/0x30
[<c042e5ce>] vprintk+0x1ae/0x290
[<f883d465>] do_cciss_intr+0x4e5/0x4f0 [cciss]
[<f883c8ad>] sendcmd+0x8d/0x2b0 [cciss]
[<c042e417>] printk+0x17/0x20
[<f883e52d>] cciss_remove_one+0xfd/0x230 [cciss]
[<c04fa654>] pci_device_shutdown+0x14/0x20
[<c0570de0>] device_shutdown+0xf0/0x100
[<c043c14e>] kernel_restart+0xe/0x40
[<c063c6ac>] lock_kernel+0x1c/0x40
[<c043c327>] sys_reboot+0xf7/0x1b0
[<c04606e1>] filemap_nopage+0x211/0x3a0
[<c046d923>] do_no_page+0x1a3/0x310
[<c06147b0>] inet_sock_destruct+0xd0/0x210
[<c063c458>] _spin_lock_bh+0x8/0x20
[<c04a14be>] invalidate_inode_buffers+0xe/0xb0
[<c0491835>] dput+0x25/0x150
[<c0481ebf>] __fput+0x11f/0x190
[<c04967cb>] mntput_no_expire+0x1b/0x90
[<c0480483>] filp_close+0x43/0x70
[<c0480514>] sys_close+0x64/0xb0
[<c040511c>] syscall_call+0x7/0xb
=======================

2007-05-17 21:34:31

by Mike Miller

[permalink] [raw]
Subject: Re: [PATCH] cciss: Fix pci_driver.shutdown while device is still active

On Mon, 2007-05-14 at 17:53 +0000, Gerald Britton wrote:
> Fix an Oops in the cciss driver caused by system shutdown while a filesystem
> on a cciss device is still active. The cciss_remove_one function only
> properly removes the device if the device has been cleanly released by its
> users, which is not the case when the pci_driver.shutdown method is called.
>
> This patch adds a new cciss_shutdown function to better match the pattern
> used by various SCSI drivers: deactivate device interrupts and flush caches.
> It also alters the cciss_remove_one function to match and readds the
> __devexit annotation that was removed when cciss_remove_one was serving as
> the pci_driver.shutdown method.

Sorry I've taken so long to reply. I've been testing this patch with up
to 512 logical volumes. Looks good. I may make a tweak or 2 after it's
merged, I have other changes that touch the same code.

ACKed-by: Mike Miller <[email protected]>

>
> Signed-off-by: Gerald Britton <[email protected]>
> ---
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> index 370dfe1..5acc6c4 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -3469,13 +3469,39 @@ static int __devinit cciss_init_one(struct pci_dev *pdev,
> return -1;
> }
>
> -static void cciss_remove_one(struct pci_dev *pdev)
> +static void cciss_shutdown(struct pci_dev *pdev)
> {
> ctlr_info_t *tmp_ptr;
> - int i, j;
> + int i;
> char flush_buf[4];
> int return_code;
>
> + tmp_ptr = pci_get_drvdata(pdev);
> + if (tmp_ptr == NULL)
> + return;
> + i = tmp_ptr->ctlr;
> + if (hba[i] == NULL)
> + return;
> +
> + /* Turn board interrupts off and send the flush cache command */
> + /* sendcmd will turn off interrupt, and send the flush...
> + * To write all data in the battery backed cache to disks */
> + memset(flush_buf, 0, 4);
> + return_code = sendcmd(CCISS_CACHE_FLUSH, i, flush_buf, 4, 0, 0, 0, NULL,
> + TYPE_CMD);
> + if (return_code == IO_OK) {
> + printk(KERN_INFO "Completed flushing cache on controller %d\n", i);
> + } else {
> + printk(KERN_WARNING "Error flushing cache on controller %d\n", i);
> + }
> + free_irq(hba[i]->intr[2], hba[i]);
> +}
> +
> +static void __devexit cciss_remove_one(struct pci_dev *pdev)
> +{
> + ctlr_info_t *tmp_ptr;
> + int i, j;
> +
> if (pci_get_drvdata(pdev) == NULL) {
> printk(KERN_ERR "cciss: Unable to remove device \n");
> return;
> @@ -3506,18 +3532,7 @@ static void cciss_remove_one(struct pci_dev *pdev)
>
> cciss_unregister_scsi(i); /* unhook from SCSI subsystem */
>
> - /* Turn board interrupts off and send the flush cache command */
> - /* sendcmd will turn off interrupt, and send the flush...
> - * To write all data in the battery backed cache to disks */
> - memset(flush_buf, 0, 4);
> - return_code = sendcmd(CCISS_CACHE_FLUSH, i, flush_buf, 4, 0, 0, 0, NULL,
> - TYPE_CMD);
> - if (return_code == IO_OK) {
> - printk(KERN_INFO "Completed flushing cache on controller %d\n", i);
> - } else {
> - printk(KERN_WARNING "Error flushing cache on controller %d\n", i);
> - }
> - free_irq(hba[i]->intr[2], hba[i]);
> + cciss_shutdown(pdev);
>
> #ifdef CONFIG_PCI_MSI
> if (hba[i]->msix_vector)
> @@ -3550,7 +3565,7 @@ static struct pci_driver cciss_pci_driver = {
> .probe = cciss_init_one,
> .remove = __devexit_p(cciss_remove_one),
> .id_table = cciss_pci_device_id, /* id_table */
> - .shutdown = cciss_remove_one,
> + .shutdown = cciss_shutdown,
> };
>
> /*

2007-05-17 22:19:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] cciss: Fix pci_driver.shutdown while device is still active

On Thu, 17 May 2007 16:33:12 -0500
"MIke Miller (OS Dev)" <[email protected]> wrote:

> On Mon, 2007-05-14 at 17:53 +0000, Gerald Britton wrote:
> > Fix an Oops in the cciss driver caused by system shutdown while a filesystem
> > on a cciss device is still active. The cciss_remove_one function only
> > properly removes the device if the device has been cleanly released by its
> > users, which is not the case when the pci_driver.shutdown method is called.
> >
> > This patch adds a new cciss_shutdown function to better match the pattern
> > used by various SCSI drivers: deactivate device interrupts and flush caches.
> > It also alters the cciss_remove_one function to match and readds the
> > __devexit annotation that was removed when cciss_remove_one was serving as
> > the pci_driver.shutdown method.
>
> Sorry I've taken so long to reply. I've been testing this patch with up
> to 512 logical volumes. Looks good. I may make a tweak or 2 after it's
> merged, I have other changes that touch the same code.
>
> ACKed-by: Mike Miller <[email protected]>
>

Thanks. I marked this as needed-in-2.6.21.x also.