2008-01-30 19:48:42

by Sven Köhler

Subject: [BUG?] GDTH driver not working after upgrade to 2.6.24

Hi,

I have upgraded a system to kernel 2.6.24. After that, it failed to
boot with the usual message saying that the rootfs on device /dev/sda1
cannot be mounted (a RAID1 run by the controller below).

With 2.6.23.12, everything is working fine.

# lspci -v:

03:01.0 RAID bus controller: Intel Corporation RAID Controller
Subsystem: Intel Corporation Unknown device 01db
Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17
Memory at ddffc000 (32-bit, prefetchable) [size=16K]
[virtual] Expansion ROM at deef0000 [disabled] [size=32K]
Capabilities: [80] Power Management version 2

# GDT-related dmesg output (2.6.23.12):

GDT-HA: Storage RAID Controller Driver. Version: 3.05
ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17
GDT-HA: Found 1 PCI Storage RAID Controllers
Configuring GDT-PCI HA at 3/1 IRQ 17
GDT-HA 0: Name: SRCU42L
scsi0 : SRCU42L
scsi 0:0:0:0: Direct-Access Intel Host Drive #00 PQ: 0 ANSI: 2
scsi 0:2:6:0: Processor ESG-SHV SCA HSBP M29 1.06 PQ: 0 ANSI: 2
sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
sd 0:0:0:0: [sda] Assuming Write Enabled
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
sd 0:0:0:0: [sda] Assuming Write Enabled
sd 0:0:0:0: [sda] Assuming drive cache: write through
sda: sda1 sda2 < sda5 >
sd 0:0:0:0: [sda] Attached SCSI disk

# cat /boot/config-2.6.24 |grep GDT

CONFIG_SCSI_GDTH=y




Any ideas?

http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2
shows huge changes related to drivers/scsi/gdth*.

I can't test at the moment. The system went into production.


Regards,
Sven



2008-01-31 10:09:27

by Boaz Harrosh

Subject: Re: [BUG?] GDTH driver not working after upgrade to 2.6.24

On Wed, Jan 30 2008 at 21:47 +0200, Sven Köhler <[email protected]> wrote:
> Hi,
>
> so i have upgraded a system to kernel 2.6.24. After that, it failed to
> boot with the usual message telling, that the rootfs on device /dev/sda1
> cannot be mounted (a raid1 run by the controller below).
>
> With 2.6.23.12, everything is working fine.
>
> # lspci -v:
>
> 03:01.0 RAID bus controller: Intel Corporation RAID Controller
> Subsystem: Intel Corporation Unknown device 01db
> Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17
> Memory at ddffc000 (32-bit, prefetchable) [size=16K]
> [virtual] Expansion ROM at deef0000 [disabled] [size=32K]
> Capabilities: [80] Power Management version 2
>
> # GDT-related dmesg output (2.6.23.12):
>
> GDT-HA: Storage RAID Controller Driver. Version: 3.05
> ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17
> GDT-HA: Found 1 PCI Storage RAID Controllers
> Configuring GDT-PCI HA at 3/1 IRQ 17
> GDT-HA 0: Name: SRCU42L
> scsi0 : SRCU42L
> scsi 0:0:0:0: Direct-Access Intel Host Drive #00 PQ: 0 ANSI: 2
> scsi 0:2:6:0: Processor ESG-SHV SCA HSBP M29 1.06 PQ: 0 ANSI: 2
> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
> sd 0:0:0:0: [sda] Assuming Write Enabled
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
> sd 0:0:0:0: [sda] Assuming Write Enabled
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sda: sda1 sda2 < sda5 >
> sd 0:0:0:0: [sda] Attached SCSI disk
>
> # cat /boot/config-2.6.24 |grep GDT
>
> CONFIG_SCSI_GDTH=y
>
>
>
>
> Any ideas?
>
> http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2
> show huge drivers/scsi/gdth* related changes.
>
> Can't test at the moment. System went production.
>
>
> Regards,
> Sven
>

Hi Sven!

CCing to the scsi mailing list.

Yes, the gdth driver underwent open-heart surgery in kernel 2.6.24. The bad thing
about it is that none of the three coders who did it had any HW to work
on. One of them is me. We cried out for testers for a long time, but no one came forward.

Could you test patches for us? The first thing would be to enable debug output with the patch below.

If you absolutely need a 2.6.24 kernel plus gdth in a production system, you could
check out the 2.6.23 driver and compile it. The old driver will work the same in 2.6.24.
It will not, however, even compile in 2.6.25-rcx.

If anyone wants to send me a card that uses the gdth driver, I will be very happy
to debug this card and return it once I'm done.

Boaz

---
git-diff --stat -p
drivers/scsi/gdth.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index c825239..eca72c4 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -188,6 +188,7 @@ static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
struct gdth_cmndinfo *cmndinfo);
static void gdth_scsi_done(struct scsi_cmnd *scp);

+#define DEBUG_GDTH 1
#ifdef DEBUG_GDTH
static unchar DebugState = DEBUG_GDTH;






2008-01-31 11:57:05

by Sven Köhler

Subject: Re: [BUG?] GDTH driver not working after upgrade to 2.6.24

> Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing
> about it is that all three of the Coders that did that did not have any HW to work
> on. One of them is me. We did cry for tester for a long time but no one came forward.

All I'd like to ask is WHY!?
WHY such big open-heart surgery?
OK, you have your reasons.

> Could you test patches for us? first thing would be to enable debug output patch below.

The machine is still in production. It will be replaced by a 64-bit system
with an AACRAID card some time in the future. Then I could maybe test
patches on the old machine. But unfortunately, it's not my machine; I
just administer it. And I don't know what exactly the future
of it is.

> If you absolutely need a 2.6.24 kernel, + gdth in a production system you could
> checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24.
> It will not however even compile in 2.6.25-rcx.

I see. Thanks for the hint. That would be an alternative. Actually, I don't
need 2.6.24, but if something security-related is fixed, then I'm not
able to move on to 2.6.24. And I'm using Gentoo. There's not really a
kernel maintained by the distribution.

> If any one wants to send me a card that uses the gdth driver, I will be very happy
> to debug this card, and return it once I'm done.

I don't think it's a separate card that I could send you. It's some
19" rack server with such a card on-board, I think.


So I don't see much opportunity for me to test patches and stuff :-(



2008-01-31 12:36:55

by Boaz Harrosh

Subject: Re: [BUG?] GDTH driver not working after upgrade to 2.6.24

On Thu, Jan 31 2008 at 13:07 +0200, Sven Köhler <[email protected]> wrote:
>> Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing
>> about it is that all three of the Coders that did that did not have any HW to work
>> on. One of them is me. We did cry for tester for a long time but no one came forward.
>
> All i'd like to ask is WHY!?
> WHY such a big open heart surgery?
> OK, you have your reasons.
>
>> Could you test patches for us? first thing would be to enable debug output patch below.
>
> The machine is still production. It will be replaced by a 64bit system
> with an AACRAID card some time in the future. Then, i could maybe test
> patches on the old machine. But unfortunatly, it's not my machine. It's
> just administrated by me. And i don't know, what's exactly is the future
> of it.
>
>> If you absolutely need a 2.6.24 kernel, + gdth in a production system you could
>> checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24.
>> It will not however even compile in 2.6.25-rcx.
>
> I see. Thanks for the hint. Would be an alternative. Actually, i don't
> need 2.6.24 - but if something security related is fixed, then i'm not
> able to move on to 2.6.24. And i'm using Gentoo. There's not really a
> kernel maintained by the distribution.
>
>> If any one wants to send me a card that uses the gdth driver, I will be very happy
>> to debug this card, and return it once I'm done.
>
> I don't think, it's a seperate card, that i could send you. It's some
> 19" rack server with such a card on-board, i think.
>
>
> So it don't see much opportunity for me to test patched and stuff :-(
>

Thanks. Perhaps someone else then.
Anyone with gdth HW who can test patches?
Your lspci said: "Intel Corporation RAID Controller". Matthew,
is there a gdth card lying around in an Intel lab near you?

James, do we need to mark gdth BROKEN for 2.6.24 and higher?

Boaz

2008-01-31 16:39:24

by Jan Engelhardt

Subject: Re: [BUG?] GDTH driver not working after upgrade to 2.6.24

Hi,

On Jan 31 2008 14:35, Boaz Harrosh wrote:
>
>Thanks, Perhaps someone else then.
>Anyone with gdth HW that can test patches?

Is bisecting down the existing chain and finding the bad commit
sufficient? (I also take new patches.)

>Your lspci said: "Intel Corporation RAID Controller" Matthew
>is there a gdth card lying around in an Intel lab near you?
>
>James do we need to mark gdth BROKEN for 2.6.24 and higher?

I say revert it for the time being.

2008-01-31 16:53:23

by Boaz Harrosh

Subject: Re: [BUG?] GDTH driver not working after upgrade to 2.6.24

On Thu, Jan 31 2008 at 18:39 +0200, Jan Engelhardt <[email protected]> wrote:
> Hi,
>
> On Jan 31 2008 14:35, Boaz Harrosh wrote:
>> Thanks, Perhaps someone else then.
>> Anyone with gdth HW that can test patches?
>
> Is bisecting down the existing chain and finding the bad commit
> sufficient? (I also take new patches.)
Most certainly. If you are willing, please... I'm looking for someone
responsive.

First - enable debug prints (see the original post)
and send me the prints.
Second - bisection would be great, yes.
Third - accepting patches and testing would be great, thanks.

>
>> Your lspci said: "Intel Corporation RAID Controller" Matthew
>> is there a gdth card lying around in an Intel lab near you?
>>
>> James do we need to mark gdth BROKEN for 2.6.24 and higher?
>
> I say revert it for the time being.

It could be reverted for 2.6.24.x maintenance releases, but for 2.6.25-xxx
it cannot, as the old driver will not compile, and the fix for that is what you see in the code.

Boaz

2008-02-12 17:31:57

by Boaz Harrosh

Subject: [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage

On Thu, Jan 31 2008 at 12:08 +0200, Boaz Harrosh <[email protected]> wrote:
> On Wed, Jan 30 2008 at 21:47 +0200, Sven Köhler <[email protected]> wrote:
>> Hi,
>>
>> so i have upgraded a system to kernel 2.6.24. After that, it failed to
>> boot with the usual message telling, that the rootfs on device /dev/sda1
>> cannot be mounted (a raid1 run by the controller below).
>>
>> With 2.6.23.12, everything is working fine.
>>
>> # lspci -v:
>>
>> 03:01.0 RAID bus controller: Intel Corporation RAID Controller
>> Subsystem: Intel Corporation Unknown device 01db
>> Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17
>> Memory at ddffc000 (32-bit, prefetchable) [size=16K]
>> [virtual] Expansion ROM at deef0000 [disabled] [size=32K]
>> Capabilities: [80] Power Management version 2
>>
>> # GDT-related dmesg output (2.6.23.12):
>>
>> GDT-HA: Storage RAID Controller Driver. Version: 3.05
>> ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17
>> GDT-HA: Found 1 PCI Storage RAID Controllers
>> Configuring GDT-PCI HA at 3/1 IRQ 17
>> GDT-HA 0: Name: SRCU42L
>> scsi0 : SRCU42L
>> scsi 0:0:0:0: Direct-Access Intel Host Drive #00 PQ: 0 ANSI: 2
>> scsi 0:2:6:0: Processor ESG-SHV SCA HSBP M29 1.06 PQ: 0 ANSI: 2
>> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
>> sd 0:0:0:0: [sda] Assuming Write Enabled
>> sd 0:0:0:0: [sda] Assuming drive cache: write through
>> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
>> sd 0:0:0:0: [sda] Assuming Write Enabled
>> sd 0:0:0:0: [sda] Assuming drive cache: write through
>> sda: sda1 sda2 < sda5 >
>> sd 0:0:0:0: [sda] Attached SCSI disk
>>
>> # cat /boot/config-2.6.24 |grep GDT
>>
>> CONFIG_SCSI_GDTH=y
>>
>>
>>
>>
>> Any ideas?
>>
>> http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2
>> show huge drivers/scsi/gdth* related changes.
>>
>> Can't test at the moment. System went production.
>>
>>
>> Regards,
>> Sven
>>
>
> Hi Sven!
<snip>

With the kind help of:
Joerg Dorchain: <[email protected]>
Jon Chelton <[email protected]>
Stefan Priebe <[email protected]>

who let me take up their machines, their effort, and their time,
I hope I have been able to fix the gdth driver for the 2.6.24 kernel and onward.
Actually it was a simple miss by Christoph, but with my inexperience
it took me a bisection and a while to find it.

Both Joerg and Stefan were able to boot with these patches and work
on their machines. Jon Chelton's disk array should also work; we're testing.

I have submitted 2 patches. They should also be included, after some testing,
in the 2.6.24.x stable releases. (They will be posted there after some more testing.)

[PATCH 1/2] gdth: scan for scsi devices
Simple, but most fatal.

[PATCH 2/2] gdth: bugfix for the Timer at exit crash
James, please inspect and comment on this patch. It has not yet been tested
by the original bug submitter.

In the original gdth series Christoph forgot to add the call to
scsi_scan_host(). Jeff's alternative patches did do the scan: after everything
was probed, the code would loop over all found cards and scan them. However, I prefer
to scan individually at each probe, because I think this way it is better prepared
for the hot-plug API, where the discovery is done outside of the driver and the
probe is called on a single host at a time. Is that right? Please comment.
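
The two scan strategies described above can be modelled in plain userspace C. This is only a toy sketch, not kernel code: `toy_host`, `toy_add_host()`, `toy_scan_host()`, `toy_probe_one()` and `toy_scan_all()` are hypothetical stand-ins for the host structure, scsi_add_host(), scsi_scan_host(), the per-bus probe functions and a scan loop over all probed hosts. It only shows that a host which is registered but never scanned exposes no devices, and that either strategy closes that gap.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HOSTS 4

/* Hypothetical stand-in for a SCSI host; not the kernel's struct Scsi_Host. */
struct toy_host {
    int registered; /* models a successful scsi_add_host() */
    int scanned;    /* models scsi_scan_host() having run */
};

static struct toy_host toy_hosts[TOY_MAX_HOSTS];
static size_t toy_host_count;

/* Models host registration; always succeeds in this toy. */
static int toy_add_host(struct toy_host *h)
{
    h->registered = 1;
    return 0;
}

/* Models device discovery; only meaningful on a registered host. */
static void toy_scan_host(struct toy_host *h)
{
    if (h->registered)
        h->scanned = 1;
}

/* Probe one controller. scan_now != 0 selects the per-probe scan. */
static int toy_probe_one(int scan_now)
{
    struct toy_host *h;

    if (toy_host_count == TOY_MAX_HOSTS)
        return -1;
    h = &toy_hosts[toy_host_count];
    h->registered = 0;
    h->scanned = 0;
    if (toy_add_host(h) != 0)
        return -1;
    toy_host_count++;
    if (scan_now)
        toy_scan_host(h);
    return 0;
}

/* The alternative: scan every host once all probing is done. */
static void toy_scan_all(void)
{
    size_t i;

    for (i = 0; i < toy_host_count; i++)
        toy_scan_host(&toy_hosts[i]);
}

/* Number of hosts whose devices have been discovered. */
static size_t toy_count_scanned(void)
{
    size_t i, n = 0;

    for (i = 0; i < toy_host_count; i++)
        n += (size_t)toy_hosts[i].scanned;
    return n;
}
```

Calling `toy_probe_one(0)` alone mirrors the 2.6.24 breakage: the host exists, but nothing was scanned. The per-probe variant `toy_probe_one(1)` needs no knowledge of the other hosts, which is the property argued for above.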

Test away.

Many thanks to Joerg, Jon && Stefan. Cheers.

Boaz

2008-02-12 17:36:34

by Boaz Harrosh

Subject: [BUGFIX 1/2] gdth: scan for scsi devices


The patch: "gdth: switch to modern scsi host registration"

missed one simple fact when moving away from scsi_module.c:
the need to call scsi_scan_host() on the probed host.
With this fix the gdth driver from 2.6.24 is again able to
see drives and boot.

Signed-off-by: Boaz Harrosh <[email protected]>
Tested-by: Joerg Dorchain <[email protected]>
Tested-by: Stefan Priebe <[email protected]>
Tested-by: Jon Chelton <[email protected]>
---
drivers/scsi/gdth.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index b253b8c..8eb78be 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -4838,6 +4838,9 @@ static int __init gdth_isa_probe_one(ulong32 isa_bios)
if (error)
goto out_free_coal_stat;
list_add_tail(&ha->list, &gdth_instances);
+
+ scsi_scan_host(shp);
+
return 0;

out_free_coal_stat:
@@ -4965,6 +4968,9 @@ static int __init gdth_eisa_probe_one(ushort eisa_slot)
if (error)
goto out_free_coal_stat;
list_add_tail(&ha->list, &gdth_instances);
+
+ scsi_scan_host(shp);
+
return 0;

out_free_ccb_phys:
@@ -5102,6 +5108,9 @@ static int __init gdth_pci_probe_one(gdth_pci_str *pcistr, int ctr)
if (error)
goto out_free_coal_stat;
list_add_tail(&ha->list, &gdth_instances);
+
+ scsi_scan_host(shp);
+
return 0;

out_free_coal_stat:
--
1.5.3.3

2008-02-12 17:42:22

by Boaz Harrosh

Subject: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash


gdth_exit would first remove all cards, then stop the timer,
and would not sync with the timer function. This caused a crash
in gdth_timeout() when the module was unloaded.

So del_timer_sync() the timer before we delete the cards.

NOT YET TESTED

Signed-off-by: Boaz Harrosh <[email protected]>
---
drivers/scsi/gdth.c | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..103280e 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data)
gdth_ha_str *ha;
ulong flags;

+ BUG_ON(list_empty(&gdth_instances));
+
ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
spin_lock_irqsave(&ha->smp_lock, flags);

@@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha)
ha->sdev = NULL;
}

- gdth_flush(ha);
-
if (shp->irq)
free_irq(shp->irq,ha);

@@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void)
{
gdth_ha_str *ha;

- list_for_each_entry(ha, &gdth_instances, list)
- gdth_remove_one(ha);
+ unregister_chrdev(major,"gdth");
+ unregister_reboot_notifier(&gdth_notifier);

#ifdef GDTH_STATISTICS
- del_timer(&gdth_timer);
+ del_timer_sync(&gdth_timer);
#endif
- unregister_chrdev(major,"gdth");
- unregister_reboot_notifier(&gdth_notifier);
+
+ list_for_each_entry(ha, &gdth_instances, list)
+ gdth_remove_one(ha);
}

module_init(gdth_init);
--
1.5.3.3

2008-02-12 18:05:45

by Christoph Hellwig

Subject: Re: [BUGFIX 1/2] gdth: scan for scsi devices

On Tue, Feb 12, 2008 at 07:35:22PM +0200, Boaz Harrosh wrote:
>
> The patch: "gdth: switch to modern scsi host registration"
>
> missed one simple fact when moving a way from scsi_module.c.
> That is to call scsi_scan_host() on the probed host.
> With this the gdth driver from 2.6.24 is again able to
> see drives and boot.

Doh, someone please hand me a brown paper bag. My first series
of patches had this but it got dropped when I rebased it over various
janitor cleanups.

The patch looks obviously correct, thanks.

by Stefan Priebe

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

Hello!

I've tested this patch now - and it works fine. Now rmmod, halt and
reboot also works.

Stefan Priebe


Boaz Harrosh wrote:
> gdth _exit would first remove all cards then stop the timer
> and would not sync with the timer function. This caused a crash
> in gdth_timer() when module was unloaded.
>
> del_timer_sync the timer before we delete the cards.
>
> NOT YET TESTED
>
> Signed-off-by: Boaz Harrosh <[email protected]>
> ---
> drivers/scsi/gdth.c | 15 ++++++++-------
> 1 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
> index 8eb78be..103280e 100644
> --- a/drivers/scsi/gdth.c
> +++ b/drivers/scsi/gdth.c
> @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data)
> gdth_ha_str *ha;
> ulong flags;
>
> + BUG_ON(list_empty(&gdth_instances));
> +
> ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
> spin_lock_irqsave(&ha->smp_lock, flags);
>
> @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha)
> ha->sdev = NULL;
> }
>
> - gdth_flush(ha);
> -
> if (shp->irq)
> free_irq(shp->irq,ha);
>
> @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void)
> {
> gdth_ha_str *ha;
>
> - list_for_each_entry(ha, &gdth_instances, list)
> - gdth_remove_one(ha);
> + unregister_chrdev(major,"gdth");
> + unregister_reboot_notifier(&gdth_notifier);
>
> #ifdef GDTH_STATISTICS
> - del_timer(&gdth_timer);
> + del_timer_sync(&gdth_timer);
> #endif
> - unregister_chrdev(major,"gdth");
> - unregister_reboot_notifier(&gdth_notifier);
> +
> + list_for_each_entry(ha, &gdth_instances, list)
> + gdth_remove_one(ha);
> }
>
> module_init(gdth_init);

2008-02-13 09:06:27

by Boaz Harrosh

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 9:06 +0200, Stefan Priebe - allied internet ag <[email protected]> wrote:
> Hello!
>
> I've tested this patch now - and it works fine. Now rmmod, halt and
> reboot also works.
>
> Stefan Priebe
>
This is great news, Stefan. Thank you very much for all your time
and effort; without it we could not have fixed all this.

Boaz

2008-02-13 10:50:33

by Boaz Harrosh

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Tue, Feb 12 2008 at 19:40 +0200, Boaz Harrosh <[email protected]> wrote:
> gdth _exit would first remove all cards then stop the timer
> and would not sync with the timer function. This caused a crash
> in gdth_timer() when module was unloaded.
>
> del_timer_sync the timer before we delete the cards.
>
> NOT YET TESTED
>
> Signed-off-by: Boaz Harrosh <[email protected]>

Tested-by: Stefan Priebe <[email protected]>

> ---
> drivers/scsi/gdth.c | 15 ++++++++-------
> 1 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
> index 8eb78be..103280e 100644
> --- a/drivers/scsi/gdth.c
> +++ b/drivers/scsi/gdth.c
> @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data)
> gdth_ha_str *ha;
> ulong flags;
>
> + BUG_ON(list_empty(&gdth_instances));
> +
> ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
> spin_lock_irqsave(&ha->smp_lock, flags);
>
> @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha)
> ha->sdev = NULL;
> }
>
> - gdth_flush(ha);
> -
> if (shp->irq)
> free_irq(shp->irq,ha);
>
> @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void)
> {
> gdth_ha_str *ha;
>
> - list_for_each_entry(ha, &gdth_instances, list)
> - gdth_remove_one(ha);
> + unregister_chrdev(major,"gdth");
> + unregister_reboot_notifier(&gdth_notifier);
>
> #ifdef GDTH_STATISTICS
> - del_timer(&gdth_timer);
> + del_timer_sync(&gdth_timer);
> #endif
> - unregister_chrdev(major,"gdth");
> - unregister_reboot_notifier(&gdth_notifier);
> +
> + list_for_each_entry(ha, &gdth_instances, list)
> + gdth_remove_one(ha);
> }
>
> module_init(gdth_init);

James, please put this patch in rc-fixes also. It has now been tested
by a few people, and it solves a reproducible problem in the unloading
of the driver.

It was not yet confirmed by Andrew's reporter with the:
+ if (list_empty(&gdth_instances))
+ return;

at gdth_timeout() in the -mm tree. In my patch I have converted the if() to a
BUG_ON because now it should not happen. But I figure it is no worse than
what there is now, which is nothing.
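
Why the empty-list case "should not happen" once the teardown order is swapped can be sketched with a single-threaded toy model in userspace C. Everything here is hypothetical (`toy_driver`, `toy_timer_fire()`, `toy_exit_old()`, `toy_exit_new()`); it is not the driver's code. A late timer tick is simulated by hand at the worst possible moment, and clearing `timer_armed` stands in for del_timer_sync().

```c
#include <assert.h>

/* Hypothetical toy state; not the driver's real data structures. */
struct toy_driver {
    int timer_armed; /* 1 while the statistics timer may still fire */
    int instances;   /* number of cards on the instance list */
    int crashed;     /* set when the timer body ran with no cards left */
};

/* Models the timer body, which uses the first card on the list. */
static void toy_timer_fire(struct toy_driver *d)
{
    if (!d->timer_armed)
        return;
    if (d->instances == 0)
        d->crashed = 1; /* the case the BUG_ON() in the patch would catch */
}

/* Old order: remove the cards, then stop the timer. */
static void toy_exit_old(struct toy_driver *d)
{
    d->instances = 0;
    toy_timer_fire(d); /* worst case: a tick lands in this window */
    d->timer_armed = 0;
}

/* New order: stop the timer first, then remove the cards. */
static void toy_exit_new(struct toy_driver *d)
{
    d->timer_armed = 0;
    toy_timer_fire(d); /* a late tick is now a no-op */
    d->instances = 0;
}
```

With the old order the simulated tick finds an empty list; with the new order the same tick does nothing, so the check in the timer body becomes an assertion rather than a normal code path.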

With your recommendation I will push both patches to the stable branches.
People have emailed me requesting it.

Thanks
Boaz

2008-02-13 15:45:09

by James Bottomley

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> - gdth_flush(ha);
> -

This piece doesn't look right. gdth_flush() forces the internal cache
to disk backing. If you remove it, you're taking the chance that the
machine will be powered off without a writeback which can cause data
corruption.

James

2008-02-13 15:57:54

by Boaz Harrosh

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>> - gdth_flush(ha);
>> -
>
> This piece doesn't look right. gdth_flush() forces the internal cache
> to disk backing. If you remove it, you're taking the chance that the
> machine will be powered off without a writeback which can cause data
> corruption.
>
> James
>
Yes.
I have had more problems reported with exit, and am just sending one more patch, which puts
this back in and which was tested.

So I will resend this one plus one new one.

Boaz

2008-02-13 16:35:32

by Boaz Harrosh

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[email protected]> wrote:
> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>> - gdth_flush(ha);
>>> -
>> This piece doesn't look right. gdth_flush() forces the internal cache
>> to disk backing. If you remove it, you're taking the chance that the
>> machine will be powered off without a writeback which can cause data
>> corruption.
>>
>> James
>>
> Yes.
> I have more problems reported, with exit, and am just sending one more patch that puts
> this back in. Which was tested.
>
> So I will resend this one plus one new one.
>
> Boaz
>

The gdth driver would do a register_reboot_notifier(&gdth_notifier)
to a gdth_halt() function, which would then redo half of what gdth_exit
does, and wrongly so, and crash.

Are we guaranteed in today's kernel that a module's .exit function will be called
on a halt or reboot? If so, then there is no need for duplication and
gdth_halt() should go.

I have submitted a patch that replaces the previous one I submitted with a deeper
fix.
[PATCH] gdth: bugfix for the at-exit problems

If you ask me, this whole gdth_flush() thing is a crackup. sd and scsi-ml issue
SCSI flush commands whenever needed. The controller has no business caching
data in memory longer than what is stated in the standard, RAID controller or no RAID
controller, virtual or not virtual device. Data on platter means data on platter.
What if there is a power outage? What can the driver do then?

Boaz

2008-02-13 16:36:56

by Boaz Harrosh

Subject: [PATCH ver2] gdth: bugfix for the at-exit problems


gdth_exit would first remove all cards, then stop the timer,
and would not sync with the timer function. This caused a crash
in gdth_timeout() when the module was unloaded.
So del_timer_sync() the timer before we delete the cards.

Also, a reboot notifier function was registered but is
not needed anymore, and it would crash. So remove gdth_halt
and the reboot notifier registration.

Signed-off-by: Boaz Harrosh <[email protected]>
---
drivers/scsi/gdth.c | 64 +++++++-------------------------------------------
1 files changed, 9 insertions(+), 55 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..8469fe6 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
unsigned int cmd, unsigned long arg);

static void gdth_flush(gdth_ha_str *ha);
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
struct gdth_cmndinfo *cmndinfo);
@@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
#include "gdth_proc.h"
#include "gdth_proc.c"

-/* notifier block to get a notify on system shutdown/halt/reboot */
-static struct notifier_block gdth_notifier = {
- gdth_halt, NULL, 0
-};
-static int notifier_disabled = 0;
-
static gdth_ha_str *gdth_find_ha(int hanum)
{
gdth_ha_str *ha;
@@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
gdth_ha_str *ha;
ulong flags;

+ BUG_ON(list_empty(&gdth_instances));
+
ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
spin_lock_irqsave(&ha->smp_lock, flags);

@@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
}
}

-/* shutdown routine */
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
-{
- gdth_ha_str *ha;
-#ifndef __alpha__
- gdth_cmd_str gdtcmd;
- char cmnd[MAX_COMMAND_SIZE];
-#endif
-
- if (notifier_disabled)
- return NOTIFY_OK;
-
- TRACE2(("gdth_halt() event %d\n",(int)event));
- if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
- return NOTIFY_DONE;
-
- notifier_disabled = 1;
- printk("GDT-HA: Flushing all host drives .. ");
- list_for_each_entry(ha, &gdth_instances, list) {
- gdth_flush(ha);
-
-#ifndef __alpha__
- /* controller reset */
- memset(cmnd, 0xff, MAX_COMMAND_SIZE);
- gdtcmd.BoardNode = LOCALBOARD;
- gdtcmd.Service = CACHESERVICE;
- gdtcmd.OpCode = GDT_RESET;
- TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
- gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
-#endif
- }
- printk("Done.\n");
-
-#ifdef GDTH_STATISTICS
- del_timer(&gdth_timer);
-#endif
- return NOTIFY_OK;
-}
-
/* configure lun */
static int gdth_slave_configure(struct scsi_device *sdev)
{
@@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)

scsi_remove_host(shp);

+ gdth_flush(ha);
+
if (ha->sdev) {
scsi_free_host_dev(ha->sdev);
ha->sdev = NULL;
}

- gdth_flush(ha);
-
if (shp->irq)
free_irq(shp->irq,ha);

@@ -5235,8 +5191,6 @@ static int __init gdth_init(void)
add_timer(&gdth_timer);
#endif
major = register_chrdev(0,"gdth", &gdth_fops);
- notifier_disabled = 0;
- register_reboot_notifier(&gdth_notifier);
gdth_polling = FALSE;
return 0;
}
@@ -5245,14 +5199,14 @@ static void __exit gdth_exit(void)
{
gdth_ha_str *ha;

- list_for_each_entry(ha, &gdth_instances, list)
- gdth_remove_one(ha);
+ unregister_chrdev(major,"gdth");

#ifdef GDTH_STATISTICS
- del_timer(&gdth_timer);
+ del_timer_sync(&gdth_timer);
#endif
- unregister_chrdev(major,"gdth");
- unregister_reboot_notifier(&gdth_notifier);
+
+ list_for_each_entry(ha, &gdth_instances, list)
+ gdth_remove_one(ha);
}

module_init(gdth_init);
--
1.5.3.3

2008-02-13 16:45:48

by James Bottomley

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash


On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[email protected]> wrote:
> > On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
> >> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>> - gdth_flush(ha);
> >>> -
> >> This piece doesn't look right. gdth_flush() forces the internal cache
> >> to disk backing. If you remove it, you're taking the chance that the
> >> machine will be powered off without a writeback which can cause data
> >> corruption.
> >>
> >> James
> >>
> > Yes.
> > I have more problems reported, with exit, and am just sending one more patch that puts
> > this back in. Which was tested.
> >
> > So I will resend this one plus one new one.
> >
> > Boaz
> >
>
> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> to a gdth_halt() function, which would then redo half of what gdth_exit
> does, and wrongly so, and crash.
>
> Are we guaranteed in todays kernel that modules .exit function be called
> on an halt or reboot? If so then there is no need for duplications and
> the gdth_halt() should go.

No. The __exit section is actually discardable if you promise never to
remove the module.

James

2008-02-13 16:51:56

by Boaz Harrosh

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <[email protected]> wrote:
> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[email protected]> wrote:
>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>>>> - gdth_flush(ha);
>>>>> -
>>>> This piece doesn't look right. gdth_flush() forces the internal cache
>>>> to disk backing. If you remove it, you're taking the chance that the
>>>> machine will be powered off without a writeback which can cause data
>>>> corruption.
>>>>
>>>> James
>>>>
>>> Yes.
>>> I have more problems reported, with exit, and am just sending one more patch that puts
>>> this back in. Which was tested.
>>>
>>> So I will resend this one plus one new one.
>>>
>>> Boaz
>>>
>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
>> to a gdth_halt() function, which would then redo half of what gdth_exit
>> does, and wrongly so, and crash.
>>
>> Are we guaranteed in todays kernel that modules .exit function be called
>> on an halt or reboot? If so then there is no need for duplications and
>> the gdth_halt() should go.
>
> No. The __exit section is actually discardable if you promise never to
> remove the module.
>
I don't understand; please explain.
What does a driver need to do if it needs a consistent shutdown routine,
module or built-in, unload or shutdown?


> James
>
>

2008-02-13 17:05:26

by James Bottomley

Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <[email protected]> wrote:
> > On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> >> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[email protected]> wrote:
> >>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
> >>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>>>> - gdth_flush(ha);
> >>>>> -
> >>>> This piece doesn't look right. gdth_flush() forces the internal cache
> >>>> to disk backing. If you remove it, you're taking the chance that the
> >>>> machine will be powered off without a writeback which can cause data
> >>>> corruption.
> >>>>
> >>>> James
> >>>>
> >>> Yes.
> >>> I have more problems reported, with exit, and am just sending one more patch that puts
> >>> this back in. Which was tested.
> >>>
> >>> So I will resend this one plus one new one.
> >>>
> >>> Boaz
> >>>
> >> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> >> to a gdth_halt() function, which would then redo half of what gdth_exit
> >> does, and wrongly so, and crash.
> >>
> >> Are we guaranteed in today's kernel that a module's .exit function will be
> >> called on a halt or reboot? If so, then there is no need for duplication and
> >> gdth_halt() should go.
> >
> > No. The __exit section is actually discardable if you promise never to
> > remove the module.
> >
> I don't understand, please explain.
> What does a driver need to do if it needs a consistent shutdown routine?
> Module or built-in? Unload or shutdown?

It needs to register a reboot notifier, which gdth does.

However, the notifier is only called on reboot, so it also needs to
clean up correctly on module exit as well.

The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
command. That's done by a shutdown notifier from sd, so the correct
thing would always get done; however it does mean the driver has to be
in a condition to process the last sync cache command.

For the quick fix, just keep the current infrastructure and put back the
gdth_flush() command where it can be effective.

James

2008-02-13 17:14:09

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <[email protected]> wrote:
> On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
>> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <[email protected]> wrote:
>>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
>>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[email protected]> wrote:
>>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
>>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>>>>>> - gdth_flush(ha);
>>>>>>> -
>>>>>> This piece doesn't look right. gdth_flush() forces the internal cache
>>>>>> to disk backing. If you remove it, you're taking the chance that the
>>>>>> machine will be powered off without a writeback which can cause data
>>>>>> corruption.
>>>>>>
>>>>>> James
>>>>>>
>>>>> Yes.
>>>>> I have more problems reported, with exit, and am just sending one more patch that puts
>>>>> this back in. Which was tested.
>>>>>
>>>>> So I will resend this one plus one new one.
>>>>>
>>>>> Boaz
>>>>>
>>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
>>>> to a gdth_halt() function, which would then redo half of what gdth_exit
>>>> does, and wrongly so, and crash.
>>>>
>>>> Are we guaranteed in today's kernel that a module's .exit function will be
>>>> called on a halt or reboot? If so, then there is no need for duplication and
>>>> gdth_halt() should go.
>>> No. The __exit section is actually discardable if you promise never to
>>> remove the module.
>>>
>> I don't understand, please explain.
>> What does a driver need to do if it needs a consistent shutdown routine?
>> Module or built-in? Unload or shutdown?
>
> It needs to register a reboot notifier, which gdth does.
>
> However, the notifier is only called on reboot, so it also needs to
> clean up correctly on module exit as well.
>
> The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
> command. That's done by a shutdown notifier from sd, so the correct
> thing would always get done; however it does mean the driver has to be
> in a condition to process the last sync cache command.
>
> For the quick fix, just keep the current infrastructure and put back the
> gdth_flush() command where it can be effective.
>
> James
>
>
Totally untested.

---
From: Boaz Harrosh <[email protected]>
Subject: [PATCH] gdth: bugfix for the at-exit problems

gdth_exit would first remove all cards, then stop the timer,
and would not sync with the timer function. This caused a crash
in gdth_timeout() when the module was unloaded.
So del_timer_sync the timer before we delete the cards.

Also, the reboot notifier function would crash. So unify
the exit and halt functions with a gdth_shutdown() that's
called by both.

Signed-off-by: Boaz Harrosh <[email protected]>
---
drivers/scsi/gdth.c | 99 ++++++++++++++++++++------------------------------
1 files changed, 40 insertions(+), 59 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..7bb9b45 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
unsigned int cmd, unsigned long arg);

static void gdth_flush(gdth_ha_str *ha);
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
struct gdth_cmndinfo *cmndinfo);
@@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
#include "gdth_proc.h"
#include "gdth_proc.c"

-/* notifier block to get a notify on system shutdown/halt/reboot */
-static struct notifier_block gdth_notifier = {
- gdth_halt, NULL, 0
-};
-static int notifier_disabled = 0;
-
static gdth_ha_str *gdth_find_ha(int hanum)
{
gdth_ha_str *ha;
@@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
gdth_ha_str *ha;
ulong flags;

+ BUG_ON(list_empty(&gdth_instances));
+
ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
spin_lock_irqsave(&ha->smp_lock, flags);

@@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
}
}

-/* shutdown routine */
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
-{
- gdth_ha_str *ha;
-#ifndef __alpha__
- gdth_cmd_str gdtcmd;
- char cmnd[MAX_COMMAND_SIZE];
-#endif
-
- if (notifier_disabled)
- return NOTIFY_OK;
-
- TRACE2(("gdth_halt() event %d\n",(int)event));
- if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
- return NOTIFY_DONE;
-
- notifier_disabled = 1;
- printk("GDT-HA: Flushing all host drives .. ");
- list_for_each_entry(ha, &gdth_instances, list) {
- gdth_flush(ha);
-
-#ifndef __alpha__
- /* controller reset */
- memset(cmnd, 0xff, MAX_COMMAND_SIZE);
- gdtcmd.BoardNode = LOCALBOARD;
- gdtcmd.Service = CACHESERVICE;
- gdtcmd.OpCode = GDT_RESET;
- TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
- gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
-#endif
- }
- printk("Done.\n");
-
-#ifdef GDTH_STATISTICS
- del_timer(&gdth_timer);
-#endif
- return NOTIFY_OK;
-}
-
/* configure lun */
static int gdth_slave_configure(struct scsi_device *sdev)
{
@@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)

scsi_remove_host(shp);

+ gdth_flush(ha);
+
if (ha->sdev) {
scsi_free_host_dev(ha->sdev);
ha->sdev = NULL;
}

- gdth_flush(ha);
-
if (shp->irq)
free_irq(shp->irq,ha);

@@ -5173,6 +5129,40 @@ static void gdth_remove_one(gdth_ha_str *ha)
scsi_host_put(shp);
}

+static void gdth_shutdown(void);
+static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
+{
+ TRACE2(("gdth_halt() event %d\n",(int)event));
+ if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
+ return NOTIFY_DONE;
+
+ gdth_shutdown();
+ return NOTIFY_OK;
+}
+
+static struct notifier_block gdth_notifier = {
+ gdth_halt, NULL, 0
+};
+
+bool gdth_shutdown_done;
+static void gdth_shutdown()
+{
+ gdth_ha_str *ha;
+ if (gdth_shutdown_done)
+ return;
+
+ gdth_shutdown_done = true;
+ unregister_chrdev(major,"gdth");
+ unregister_reboot_notifier(&gdth_notifier);
+
+#ifdef GDTH_STATISTICS
+ del_timer_sync(&gdth_timer);
+#endif
+
+ list_for_each_entry(ha, &gdth_instances, list)
+ gdth_remove_one(ha);
+}
+
static int __init gdth_init(void)
{
if (disable) {
@@ -5185,6 +5175,7 @@ static int __init gdth_init(void)
GDTH_VERSION_STR);

/* initializations */
+ gdth_shutdown_done = false;
gdth_polling = TRUE;
gdth_clear_events();

@@ -5235,7 +5226,6 @@ static int __init gdth_init(void)
add_timer(&gdth_timer);
#endif
major = register_chrdev(0,"gdth", &gdth_fops);
- notifier_disabled = 0;
register_reboot_notifier(&gdth_notifier);
gdth_polling = FALSE;
return 0;
@@ -5243,16 +5233,7 @@ static int __init gdth_init(void)

static void __exit gdth_exit(void)
{
- gdth_ha_str *ha;
-
- list_for_each_entry(ha, &gdth_instances, list)
- gdth_remove_one(ha);
-
-#ifdef GDTH_STATISTICS
- del_timer(&gdth_timer);
-#endif
- unregister_chrdev(major,"gdth");
- unregister_reboot_notifier(&gdth_notifier);
+ gdth_shutdown();
}

module_init(gdth_init);
--
1.5.3.3

2008-02-13 17:20:58

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <[email protected]> wrote:
> On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
>> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <[email protected]> wrote:
>>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
>>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[email protected]> wrote:
>>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
>>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>>>>>> - gdth_flush(ha);
>>>>>>> -
>>>>>> This piece doesn't look right. gdth_flush() forces the internal cache
>>>>>> to disk backing. If you remove it, you're taking the chance that the
>>>>>> machine will be powered off without a writeback which can cause data
>>>>>> corruption.
>>>>>>
>>>>>> James
>>>>>>
>>>>> Yes.
>>>>> I have more problems reported, with exit, and am just sending one more patch that puts
>>>>> this back in. Which was tested.
>>>>>
>>>>> So I will resend this one plus one new one.
>>>>>
>>>>> Boaz
>>>>>
>>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
>>>> to a gdth_halt() function, which would then redo half of what gdth_exit
>>>> does, and wrongly so, and crash.
>>>>
>>>> Are we guaranteed in today's kernel that a module's .exit function will be
>>>> called on a halt or reboot? If so, then there is no need for duplication and
>>>> gdth_halt() should go.
>>> No. The __exit section is actually discardable if you promise never to
>>> remove the module.
>>>
>> I don't understand, please explain.
>> What does a driver need to do if it needs a consistent shutdown routine?
>> Module or built-in? Unload or shutdown?
>
> It needs to register a reboot notifier, which gdth does.
>
> However, the notifier is only called on reboot, so it also needs to
> clean up correctly on module exit as well.
>
> The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE

Why would we think that the controller does not support this command?
Is it not in the mandatory section of the standard?

> command. That's done by a shutdown notifier from sd, so the correct
> thing would always get done; however it does mean the driver has to be
> in a condition to process the last sync cache command.

Why would it not be ready? What do other drivers do?
The driver is ready until the very end of the module's .exit. Is that good
enough?

>
> For the quick fix, just keep the current infrastructure and put back the
> gdth_flush() command where it can be effective.
>

Just did. But if needed, I would prefer to emulate the SCSI SYNCHRONIZE CACHE
command rather than use that reboot notifier thing. Please advise.

> James
>
>

Boaz

2008-02-13 17:34:34

by James Bottomley

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, 2008-02-13 at 19:18 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <[email protected]> wrote:
> > It needs to register a reboot notifier, which gdth does.
> >
> > However, the notifier is only called on reboot, so it also needs to
> > clean up correctly on module exit as well.
> >
> > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
>
> Why would we think that the controller does not support this command?
> Is it not in the mandatory section of the standard?

Um, because the controller isn't a SCSI device. It's an emulated device,
which means the SCSI commands are processed in the driver. It does look
like the driver<->HBA communication is some sort of translated dialect
of SCSI.

> > command. That's done by a shutdown notifier from sd, so the correct
> > thing would always get done; however it does mean the driver has to be
> > in a condition to process the last sync cache command.
>
> Why would it not be ready? What do other drivers do?
> The driver is ready until the very end of the module's .exit. Is that good
> enough?

shutdown is called as part of device removal and module unload ...
usually from scsi_remove_host(). So you can't tear down command
processing before that point.

> >
> > For the quick fix, just keep the current infrastructure and put back the
> > gdth_flush() command where it can be effective.
> >
>
> Just did. But if needed, I would prefer to emulate the SCSI SYNCHRONIZE CACHE
> command rather than use that reboot notifier thing. Please advise.

I think such a change, though desirable, would be too large to count as
a bug fix.

James
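To make the SYNCHRONIZE CACHE alternative discussed above concrete, here is a minimal user-space sketch in C. It is not gdth code: `model_ha`, `model_flush()` and `model_queuecommand()` are hypothetical names invented for illustration, and the only real-world value used is the SCSI opcode 0x35 for SYNCHRONIZE CACHE(10). The sketch assumes a driver that emulates SCSI in software and therefore has to translate the sync request into a cache flush itself. It also shows why command processing must stay up until the last sync has been handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SYNCHRONIZE_CACHE 0x35  /* SCSI SYNCHRONIZE CACHE(10) opcode */
#define MODEL_TEST_UNIT_READY   0x00  /* SCSI TEST UNIT READY opcode */

/* Model-local status codes, not kernel or SCSI status values. */
enum model_status { MODEL_GOOD, MODEL_UNSUPPORTED, MODEL_NOT_READY };

/* Hypothetical stand-in for one controller with a write-back cache. */
struct model_ha {
    int dirty_blocks;   /* blocks held only in the controller cache */
    int accepting;      /* nonzero while command processing is still up */
};

/* Models what gdth_flush() is described as doing: write the cache back. */
static void model_flush(struct model_ha *ha)
{
    ha->dirty_blocks = 0;
}

/* Models the emulated command dispatch for a single CDB. */
static enum model_status model_queuecommand(struct model_ha *ha,
                                            const uint8_t *cdb)
{
    assert(ha != NULL && cdb != NULL);

    if (!ha->accepting)
        return MODEL_NOT_READY;   /* torn down too early: the sync is lost */

    switch (cdb[0]) {
    case MODEL_TEST_UNIT_READY:
        return MODEL_GOOD;
    case MODEL_SYNCHRONIZE_CACHE:
        model_flush(ha);          /* the emulation does the real work here */
        return MODEL_GOOD;
    default:
        return MODEL_UNSUPPORTED;
    }
}
```

In this model a sync that arrives after `accepting` has been cleared leaves `dirty_blocks` untouched, which is the situation the ordering around scsi_remove_host() is meant to avoid.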

2008-02-13 17:37:09

by James Bottomley

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, 2008-02-13 at 19:12 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <[email protected]> wrote:
> > On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
> >> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <[email protected]> wrote:
> >>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> >>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[email protected]> wrote:
> >>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[email protected]> wrote:
> >>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>>>>>> - gdth_flush(ha);
> >>>>>>> -
> >>>>>> This piece doesn't look right. gdth_flush() forces the internal cache
> >>>>>> to disk backing. If you remove it, you're taking the chance that the
> >>>>>> machine will be powered off without a writeback which can cause data
> >>>>>> corruption.
> >>>>>>
> >>>>>> James
> >>>>>>
> >>>>> Yes.
> >>>>> I have more problems reported, with exit, and am just sending one more patch that puts
> >>>>> this back in. Which was tested.
> >>>>>
> >>>>> So I will resend this one plus one new one.
> >>>>>
> >>>>> Boaz
> >>>>>
> >>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> >>>> to a gdth_halt() function, which would then redo half of what gdth_exit
> >>>> does, and wrongly so, and crash.
> >>>>
> >>>> Are we guaranteed in today's kernel that a module's .exit function will be
> >>>> called on a halt or reboot? If so, then there is no need for duplication and
> >>>> gdth_halt() should go.
> >>> No. The __exit section is actually discardable if you promise never to
> >>> remove the module.
> >>>
> >> I don't understand, please explain.
> >> What does a driver need to do if it needs a consistent shutdown routine?
> >> Module or built-in? Unload or shutdown?
> >
> > It needs to register a reboot notifier, which gdth does.
> >
> > However, the notifier is only called on reboot, so it also needs to
> > clean up correctly on module exit as well.
> >
> > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
> > command. That's done by a shutdown notifier from sd, so the correct
> > thing would always get done; however it does mean the driver has to be
> > in a condition to process the last sync cache command.
> >
> > For the quick fix, just keep the current infrastructure and put back the
> > gdth_flush() command where it can be effective.
> >
> > James
> >
> >
> Totally untested.
>
> ---
> From: Boaz Harrosh <[email protected]>
> Subject: [PATCH] gdth: bugfix for the at-exit problems
>
> gdth_exit would first remove all cards, then stop the timer,
> and would not sync with the timer function. This caused a crash
> in gdth_timeout() when the module was unloaded.
> So del_timer_sync the timer before we delete the cards.
>
> Also, the reboot notifier function would crash. So unify
> the exit and halt functions with a gdth_shutdown() that's
> called by both.
>
> Signed-off-by: Boaz Harrosh <[email protected]>
> ---
> drivers/scsi/gdth.c | 99 ++++++++++++++++++++------------------------------
> 1 files changed, 40 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
> index 8eb78be..7bb9b45 100644
> --- a/drivers/scsi/gdth.c
> +++ b/drivers/scsi/gdth.c
> @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
> unsigned int cmd, unsigned long arg);
>
> static void gdth_flush(gdth_ha_str *ha);
> -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
> static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
> static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
> struct gdth_cmndinfo *cmndinfo);
> @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
> #include "gdth_proc.h"
> #include "gdth_proc.c"
>
> -/* notifier block to get a notify on system shutdown/halt/reboot */
> -static struct notifier_block gdth_notifier = {
> - gdth_halt, NULL, 0
> -};
> -static int notifier_disabled = 0;
> -
> static gdth_ha_str *gdth_find_ha(int hanum)
> {
> gdth_ha_str *ha;
> @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
> gdth_ha_str *ha;
> ulong flags;
>
> + BUG_ON(list_empty(&gdth_instances));
> +
> ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
> spin_lock_irqsave(&ha->smp_lock, flags);
>
> @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
> }
> }
>
> -/* shutdown routine */
> -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
> -{
> - gdth_ha_str *ha;
> -#ifndef __alpha__
> - gdth_cmd_str gdtcmd;
> - char cmnd[MAX_COMMAND_SIZE];
> -#endif
> -
> - if (notifier_disabled)
> - return NOTIFY_OK;
> -
> - TRACE2(("gdth_halt() event %d\n",(int)event));
> - if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
> - return NOTIFY_DONE;
> -
> - notifier_disabled = 1;
> - printk("GDT-HA: Flushing all host drives .. ");
> - list_for_each_entry(ha, &gdth_instances, list) {
> - gdth_flush(ha);
> -
> -#ifndef __alpha__
> - /* controller reset */
> - memset(cmnd, 0xff, MAX_COMMAND_SIZE);
> - gdtcmd.BoardNode = LOCALBOARD;
> - gdtcmd.Service = CACHESERVICE;
> - gdtcmd.OpCode = GDT_RESET;
> - TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
> - gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
> -#endif
> - }
> - printk("Done.\n");
> -
> -#ifdef GDTH_STATISTICS
> - del_timer(&gdth_timer);
> -#endif
> - return NOTIFY_OK;
> -}
> -
> /* configure lun */
> static int gdth_slave_configure(struct scsi_device *sdev)
> {
> @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)
>
> scsi_remove_host(shp);
>
> + gdth_flush(ha);
> +
> if (ha->sdev) {
> scsi_free_host_dev(ha->sdev);
> ha->sdev = NULL;
> }
>
> - gdth_flush(ha);
> -
> if (shp->irq)
> free_irq(shp->irq,ha);
>
> @@ -5173,6 +5129,40 @@ static void gdth_remove_one(gdth_ha_str *ha)
> scsi_host_put(shp);
> }
>
> +static void gdth_shutdown(void);
> +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
> +{
> + TRACE2(("gdth_halt() event %d\n",(int)event));
> + if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
> + return NOTIFY_DONE;
> +
> + gdth_shutdown();
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block gdth_notifier = {
> + gdth_halt, NULL, 0
> +};
> +
> +bool gdth_shutdown_done;

Static police alert! Just make it static and move it into
gdth_shutdown()

> +static void gdth_shutdown()
> +{
> + gdth_ha_str *ha;
> + if (gdth_shutdown_done)
> + return;
> +
> + gdth_shutdown_done = true;
> + unregister_chrdev(major,"gdth");
> + unregister_reboot_notifier(&gdth_notifier);

I'm not sure you can do this, aren't reboot notifiers called with the
rwsem held? In which case the unregister which also takes the rwsem
will hang the system.

James
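The hang described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: it assumes, as James suspects, that the chain is walked with a non-recursive lock held and that unregistering needs that same lock exclusively. Every name here (`model_chain`, `model_unregister()`, `model_call_chain()`, `model_halt_unregisters()`) is hypothetical, and where real code would block forever the model returns an error code so the problem can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WOULD_DEADLOCK (-1)

struct model_chain;
typedef void (*model_cb)(struct model_chain *chain);

/* Hypothetical one-slot notifier chain guarded by a non-recursive lock. */
struct model_chain {
    bool read_held;           /* true while the chain is being walked */
    model_cb cb;              /* the single registered callback, or NULL */
    int last_unregister_rc;   /* result seen by the callback, for inspection */
};

/* Needs the lock exclusively, so it cannot proceed while a walk holds it. */
static int model_unregister(struct model_chain *chain)
{
    assert(chain != NULL);
    if (chain->read_held)
        return MODEL_WOULD_DEADLOCK;  /* real code would block forever here */
    chain->cb = NULL;
    return 0;
}

/* Walks the chain with the lock held, as a reboot is assumed to do. */
static void model_call_chain(struct model_chain *chain)
{
    assert(chain != NULL);
    chain->read_held = true;
    if (chain->cb)
        chain->cb(chain);
    chain->read_held = false;
}

/* Callback shaped like the patch above: it unregisters itself while running. */
static void model_halt_unregisters(struct model_chain *chain)
{
    chain->last_unregister_rc = model_unregister(chain);
}
```

In the model, unregistering from inside the callback reports `MODEL_WOULD_DEADLOCK`, while the same call made after the walk has finished succeeds. That is the reason for doing the unregister only from the module exit path.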

2008-02-13 19:39:14

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash


On Feb 13 2008 11:03, Boaz Harrosh wrote:
>>
>> I've tested this patch now - and it works fine. Now rmmod, halt and
>> reboot also works.
>>
>> Stefan Priebe
>>
>This is great news, Stefan. Thank you very much for all your time
>and effort; without it we could not have fixed all this.

Do you have a git tree with the latest pieces?

2008-02-14 06:51:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13, 2008 at 11:03:45AM -0600, James Bottomley wrote:
> > I don't understand, please explain.
> > What does a driver need to do if it needs a consistent shutdown routine?
> > Module or built-in? Unload or shutdown?
>
> It needs to register a reboot notifier, which gdth does.

Well, for a crappy legacy driver that's the way, but it's not really
recommended. As soon as a driver uses the proper driver model,
e.g. gdth for pci using Jeff's pci hotplug patches, it can just
implement the ->shutdown method that is called before shutdown/kexec
and can do the right thing.

2008-02-14 10:51:48

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 19:36 +0200, James Bottomley <[email protected]> wrote:
<snip>
>> ---
>> From: Boaz Harrosh <[email protected]>
>> Subject: [PATCH] gdth: bugfix for the at-exit problems
>>
>> gdth_exit would first remove all cards, then stop the timer,
>> and would not sync with the timer function. This caused a crash
>> in gdth_timeout() when the module was unloaded.
>> So del_timer_sync the timer before we delete the cards.
>>
>> Also, the reboot notifier function would crash. So unify
>> the exit and halt functions with a gdth_shutdown() that's
>> called by both.
>>
>> Signed-off-by: Boaz Harrosh <[email protected]>
>> ---
<snip>
>> +static struct notifier_block gdth_notifier = {
>> + gdth_halt, NULL, 0
>> +};
>> +
>> +bool gdth_shutdown_done;
>

Right, I forgot the static. But I use it in gdth_init(), so it
must be external to gdth_shutdown(), unless you promise me that
gdth_init() will never ever be called after a call to shutdown.
Anyway, the hot-plug patch changes all that. This is only
for the 2.6.24 bugfixes.

> Static police alert! Just make it static and move it into
> gdth_shutdown()
>
>> +static void gdth_shutdown()
>> +{
>> + gdth_ha_str *ha;
>> + if (gdth_shutdown_done)
>> + return;
>> +
>> + gdth_shutdown_done = true;
>> + unregister_chrdev(major,"gdth");
>> + unregister_reboot_notifier(&gdth_notifier);
>
> I'm not sure you can do this, aren't reboot notifiers called with the
> rwsem held? In which case the unregister which also takes the rwsem
> will hang the system.
>
Hmm, you can't remove a notifier from within the notifier. Thanks, James, for
the catch; that's what happens when you don't test your own patches.

I have moved unregister_reboot_notifier to gdth_exit.
> James
>

Will send a new version for review. Please note that this is a bugfix patch
on top of 2.6.24. It is not needed for Jeff's hot-plug patch.

There will be one more bugfix patch for a crash at the user-mode ioctl code.

Boaz

2008-02-14 12:00:28

by Boaz Harrosh

[permalink] [raw]
Subject: [PATCH] gdth: bugfix for the at-exit problems


This is a bugfix for the 2.6.24.x stable releases.

gdth_exit would first remove all cards, then stop the timer,
and would not sync with the timer function. This caused a crash
in gdth_timeout() when the module was unloaded.
So del_timer_sync the timer before we delete the cards.

Also, the reboot notifier function would crash. So unify
the exit and halt functions with a gdth_shutdown() that's
called by both.

Signed-off-by: Boaz Harrosh <[email protected]>
---
drivers/scsi/gdth.c | 90 ++++++++++++++++++++------------------------------
1 files changed, 36 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..3828b23 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
unsigned int cmd, unsigned long arg);

static void gdth_flush(gdth_ha_str *ha);
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
struct gdth_cmndinfo *cmndinfo);
@@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
#include "gdth_proc.h"
#include "gdth_proc.c"

-/* notifier block to get a notify on system shutdown/halt/reboot */
-static struct notifier_block gdth_notifier = {
- gdth_halt, NULL, 0
-};
-static int notifier_disabled = 0;
-
static gdth_ha_str *gdth_find_ha(int hanum)
{
gdth_ha_str *ha;
@@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
gdth_ha_str *ha;
ulong flags;

+ BUG_ON(list_empty(&gdth_instances));
+
ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
spin_lock_irqsave(&ha->smp_lock, flags);

@@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
}
}

-/* shutdown routine */
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
-{
- gdth_ha_str *ha;
-#ifndef __alpha__
- gdth_cmd_str gdtcmd;
- char cmnd[MAX_COMMAND_SIZE];
-#endif
-
- if (notifier_disabled)
- return NOTIFY_OK;
-
- TRACE2(("gdth_halt() event %d\n",(int)event));
- if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
- return NOTIFY_DONE;
-
- notifier_disabled = 1;
- printk("GDT-HA: Flushing all host drives .. ");
- list_for_each_entry(ha, &gdth_instances, list) {
- gdth_flush(ha);
-
-#ifndef __alpha__
- /* controller reset */
- memset(cmnd, 0xff, MAX_COMMAND_SIZE);
- gdtcmd.BoardNode = LOCALBOARD;
- gdtcmd.Service = CACHESERVICE;
- gdtcmd.OpCode = GDT_RESET;
- TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
- gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
-#endif
- }
- printk("Done.\n");
-
-#ifdef GDTH_STATISTICS
- del_timer(&gdth_timer);
-#endif
- return NOTIFY_OK;
-}
-
/* configure lun */
static int gdth_slave_configure(struct scsi_device *sdev)
{
@@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)

scsi_remove_host(shp);

+ gdth_flush(ha);
+
if (ha->sdev) {
scsi_free_host_dev(ha->sdev);
ha->sdev = NULL;
}

- gdth_flush(ha);
-
if (shp->irq)
free_irq(shp->irq,ha);

@@ -5173,6 +5129,22 @@ static void gdth_remove_one(gdth_ha_str *ha)
scsi_host_put(shp);
}

+static void gdth_shutdown(void);
+static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
+{
+ TRACE2(("gdth_halt() event %d\n", (int)event));
+ if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
+ return NOTIFY_DONE;
+
+ gdth_shutdown();
+ return NOTIFY_OK;
+}
+
+static struct notifier_block gdth_notifier = {
+ gdth_halt, NULL, 0
+};
+static bool gdth_shutdown_done;
+
static int __init gdth_init(void)
{
if (disable) {
@@ -5185,6 +5157,7 @@ static int __init gdth_init(void)
GDTH_VERSION_STR);

/* initializations */
+ gdth_shutdown_done = false;
gdth_polling = TRUE;
gdth_clear_events();

@@ -5235,23 +5208,32 @@ static int __init gdth_init(void)
add_timer(&gdth_timer);
#endif
major = register_chrdev(0,"gdth", &gdth_fops);
- notifier_disabled = 0;
register_reboot_notifier(&gdth_notifier);
gdth_polling = FALSE;
return 0;
}

-static void __exit gdth_exit(void)
+static void gdth_shutdown()
{
gdth_ha_str *ha;

- list_for_each_entry(ha, &gdth_instances, list)
- gdth_remove_one(ha);
+ if (gdth_shutdown_done)
+ return;
+
+ gdth_shutdown_done = true;
+ unregister_chrdev(major, "gdth");

#ifdef GDTH_STATISTICS
- del_timer(&gdth_timer);
+ del_timer_sync(&gdth_timer);
#endif
- unregister_chrdev(major,"gdth");
+
+ list_for_each_entry(ha, &gdth_instances, list)
+ gdth_remove_one(ha);
+}
+
+static void __exit gdth_exit(void)
+{
+ gdth_shutdown();
unregister_reboot_notifier(&gdth_notifier);
}

--
1.5.3.3
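The ordering problem this patch addresses (timer stopped only after the cards are gone) can be illustrated with a small user-space model in C. It is a sketch, not driver code: `model_driver`, `model_timer_tick()`, `model_exit_cards_first()` and `model_exit_timer_first()` are hypothetical names, and a single explicit "tick" stands in for a timer expiry that lands in the window between the two teardown steps. The model assumes, as the BUG_ON() added in the patch suggests, that the timer handler takes the first entry of the instance list without checking for an empty list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver state: a card count plus one periodic timer. */
struct model_driver {
    int n_cards;       /* entries on the instance list */
    bool timer_armed;  /* true until the timer has been stopped */
};

/* One timer expiry: 0 if harmless, -1 if it would read an empty list. */
static int model_timer_tick(const struct model_driver *d)
{
    assert(d != NULL);
    if (!d->timer_armed)
        return 0;                    /* a stopped timer never runs */
    return d->n_cards > 0 ? 0 : -1;  /* the handler takes the first entry */
}

/* Old order: remove the cards, then stop the timer. */
static int model_exit_cards_first(struct model_driver *d)
{
    int rc;

    d->n_cards = 0;
    rc = model_timer_tick(d);   /* an expiry may land in this window */
    d->timer_armed = false;
    return rc;
}

/* Patched order: stop the timer and wait for it, then remove the cards. */
static int model_exit_timer_first(struct model_driver *d)
{
    int rc;

    d->timer_armed = false;
    d->n_cards = 0;
    rc = model_timer_tick(d);   /* nothing can fire any more */
    return rc;
}
```

With two cards and an armed timer, the cards-first order reports the bad access and the timer-first order does not. The model says nothing about how likely the window is to be hit in practice.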

2008-02-14 15:59:55

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

On Wed, Feb 13 2008 at 21:38 +0200, Jan Engelhardt <[email protected]> wrote:
> On Feb 13 2008 11:03, Boaz Harrosh wrote:
>>> I've tested this patch now - and it works fine. Now rmmod, halt and
>>> reboot also works.
>>>
>>> Stefan Priebe
>>>
>> This is great news, Stefan. Thank you very much for all your time
>> and effort; without it we could not have fixed all this.
>
> Do you have a git tree with the latest pieces?
No, scsi-misc I guess ;)

I could put it on a branch here:
git://git.bhalevy.com/open-osd gdth

Give me an hour.

Boaz

2008-02-14 16:11:08

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] gdth: bugfix for the at-exit problems

On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote:
> This is a bugfix for the 2.6.24.x stable releases.
>
> gdth_exit would first remove all cards, then stop the timer,
> and would not sync with the timer function. This caused a crash
> in gdth_timeout() when the module was unloaded.
> So del_timer_sync the timer before we delete the cards.
>
> Also, the reboot notifier function would crash. So unify
> the exit and halt functions with a gdth_shutdown() that's
> called by both.

The patch looks fine now, thanks. Can we actually get a tester just to
make sure there's nothing I missed?

James

2008-02-14 16:20:18

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH] gdth: bugfix for the at-exit problems

On Thu, Feb 14 2008 at 18:10 +0200, James Bottomley <[email protected]> wrote:
> On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote:
>> This is a bugfix for the 2.6.24.x stable releases.
>>
>> gdth_exit would first remove all cards, then stop the timer,
>> and would not sync with the timer function. This caused a crash
>> in gdth_timeout() when the module was unloaded.
>> So del_timer_sync the timer before we delete the cards.
>>
>> Also, the reboot notifier function would crash. So unify
>> the exit and halt functions with a gdth_shutdown() that's
>> called by both.
>
> The patch looks fine now, thanks. Can we actually get a tester just to
> make sure there's nothing I missed.
>
> James
>
>

Yes, and the tester reported a breakage. We are on it.
Apparently, you cannot do a full deallocation of resources
in the reboot notifier, nor would you want to, I guess.

But you can do the flush. The exit function is never called
on a reboot, and card access is valid to the end.

Please comment.

So I pretty much reverted that patch, but did leave some
cleanups.

Also, we found the other problems reported with the user-mode tools
and cat /proc/sys/gdth/0

So two patches are on the way, with the above reverted. Give us a few
hours to test everything.

Thanks
Boaz
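The split described in this last message (flush only in the reboot notifier, full teardown only at module exit) can be sketched as a user-space model in C. Nothing here is taken from the follow-up patches, which are not part of this thread: `model_ha`, `model_halt()` and `model_exit()` are hypothetical names, and the event and return codes are model-local enums rather than the kernel's SYS_* and NOTIFY_* values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model-local event and return codes; the values are illustrative only. */
enum model_event { MODEL_RESTART, MODEL_HALT, MODEL_POWER_OFF, MODEL_OTHER };
enum model_ret { MODEL_NOTIFY_DONE, MODEL_NOTIFY_OK };

/* Hypothetical stand-in for one controller. */
struct model_ha {
    int dirty_blocks;      /* blocks held only in the controller cache */
    bool resources_held;   /* IRQ, memory and so on still allocated */
};

/* Models what gdth_flush() is described as doing: write the cache back. */
static void model_flush(struct model_ha *ha)
{
    ha->dirty_blocks = 0;
}

/* Reboot path: write the cache back, but leave every resource in place. */
static enum model_ret model_halt(struct model_ha *ha, enum model_event event)
{
    assert(ha != NULL);
    if (event != MODEL_RESTART && event != MODEL_HALT &&
        event != MODEL_POWER_OFF)
        return MODEL_NOTIFY_DONE;   /* not a shutdown event: ignore it */
    model_flush(ha);
    return MODEL_NOTIFY_OK;
}

/* Unload path: flush first, then release what the card was using. */
static void model_exit(struct model_ha *ha)
{
    assert(ha != NULL);
    model_flush(ha);
    ha->resources_held = false;
}
```

The design point is that both paths end with an empty cache, but only the unload path gives resources back; on a reboot the card stays fully set up until the machine goes down.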