2007-05-30 10:12:25

by Geller Sandor

Subject: HPT374 IDE problem with 2.6.21.* kernels

Hi,

I saw a similar report yesterday with the subject '2.6.21.1 - 97% wait time
on IDE operations'.

After upgrading from the 2.6.20.7 kernel to 2.6.21.1, my system started to
reset the IDE bus occasionally. "DMA timeout" and "resetting IDE bus"
messages appeared in the syslog. I've changed the two disks attached to the
HPT374 controller, and it was always the first disk that had problems. I've
replaced the cables and plugged the disks into other IDE ports, but it was
only a matter of time before the next IDE reset. When I upgraded to 2.6.21.3
the resets became much more frequent, and this time DMA was also disabled on
the first disk. I turned DMA back on manually with hdparm, and a few seconds
of intense I/O activity resulted in another IDE reset.

After reverting to 2.6.20.12 the problem seems to be gone. BTW I'm using
the PATA driver for the HPT374, not the libata one.

Is this a known problem? Is there a way I can help locate the cause of
the problem?

Regards,

Geller Sandor <[email protected]>


2007-06-01 20:46:31

by Andrew Morton

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

On Wed, 30 May 2007 11:30:00 +0200 (CEST)
Geller Sandor <[email protected]> wrote:

> Hi,
>
> I saw a similar report yesterday with '2.6.21.1 - 97% wait time on IDE
> operations' subject.
>
> After upgrading from 2.6.20.7 kernel to 2.6.21.1 my system started to
> reset infrequenly the IDE bus. In the syslog DMA timeout, resetting IDE
> bus messages appeared. I've changed the two disks attached to the HPT374
> controller, and always the first disk had problems. I've replaced cables,
> plugged the disks into other IDE ports, but it was only a matter of time
> to experience an IDE reset. When I upgraded to 2.6.21.3 the resets became
> much more frequent, this time even DMA was disabled too on the first disk.
> I turned DMA back Manually with hdparm, and a few seconds of intense IO
> activity resulted in another IDE reset.
>
> Reverting back to 2.6.20.12 the problem seems to be gone. BTW I'm using
> the PATA driver for the HTP374, not the libata one.
>
> Is this a known problem/ is there a way I can help locating the cause of
> the problem?
>

(cc's added)

2007-06-01 20:52:37

by Sergei Shtylyov

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Hello.

Andrew Morton wrote:

>>I saw a similar report yesterday with '2.6.21.1 - 97% wait time on IDE
>>operations' subject.

>>After upgrading from 2.6.20.7 kernel to 2.6.21.1 my system started to
>>reset infrequenly the IDE bus. In the syslog DMA timeout, resetting IDE
>>bus messages appeared. I've changed the two disks attached to the HPT374
>>controller, and always the first disk had problems. I've replaced cables,
>>plugged the disks into other IDE ports, but it was only a matter of time
>>to experience an IDE reset. When I upgraded to 2.6.21.3 the resets became
>>much more frequent, this time even DMA was disabled too on the first disk.
>>I turned DMA back Manually with hdparm, and a few seconds of intense IO
>>activity resulted in another IDE reset.

>>Reverting back to 2.6.20.12 the problem seems to be gone. BTW I'm using
>>the PATA driver for the HTP374, not the libata one.

>>Is this a known problem/ is there a way I can help locating the cause of
>>the problem?

Yes, please post the boot log and the IDE reset log too for starters...

> (cc's added)

WBR, Sergei

2007-06-01 21:14:20

by Geller Sandor

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

On Sat, 2 Jun 2007, Sergei Shtylyov wrote:

> Hello.
>
> Andrew Morton wrote:
>
>>> I saw a similar report yesterday with '2.6.21.1 - 97% wait time on IDE
>>> operations' subject.
>
>>> After upgrading from 2.6.20.7 kernel to 2.6.21.1 my system started to
>>> reset infrequenly the IDE bus. In the syslog DMA timeout, resetting IDE
>>> bus messages appeared. I've changed the two disks attached to the HPT374
>>> controller, and always the first disk had problems. I've replaced cables,
>>> plugged the disks into other IDE ports, but it was only a matter of time
>>> to experience an IDE reset. When I upgraded to 2.6.21.3 the resets became
>>> much more frequent, this time even DMA was disabled too on the first disk.
>>> I turned DMA back Manually with hdparm, and a few seconds of intense IO
>>> activity resulted in another IDE reset.
>
>>> Reverting back to 2.6.20.12 the problem seems to be gone. BTW I'm using
>>> the PATA driver for the HTP374, not the libata one.
>
>>> Is this a known problem/ is there a way I can help locating the cause of
>>> the problem?
>
> Yes, please post the boot log and the IDE reset log too for starters...
>
>> (cc's added)
>
> WBR, Sergei

Hi,

The log of a typical IDE reset is available here:

http://petra.hos.u-szeged.hu/~wildy/syslog.gz

This was the worst case: the IDE bus was reset during system boot.

Geller Sandor <[email protected]>

2007-06-01 21:24:58

by Sergei Shtylyov

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Hello.

Geller Sandor wrote:

>>>> I saw a similar report yesterday with '2.6.21.1 - 97% wait time on
>>>> IDE operations' subject.

>>>> After upgrading from 2.6.20.7 kernel to 2.6.21.1 my system started
>>>> to reset infrequenly the IDE bus. In the syslog DMA timeout,
>>>> resetting IDE bus messages appeared. I've changed the two disks
>>>> attached to the HPT374 controller, and always the first disk had
>>>> problems. I've replaced cables, plugged the disks into other IDE
>>>> ports, but it was only a matter of time to experience an IDE reset.
>>>> When I upgraded to 2.6.21.3 the resets became much more frequent,
>>>> this time even DMA was disabled too on the first disk. I turned DMA
>>>> back Manually with hdparm, and a few seconds of intense IO activity
>>>> resulted in another IDE reset.
>>
>>
>>>> Reverting back to 2.6.20.12 the problem seems to be gone. BTW I'm
>>>> using the PATA driver for the HTP374, not the libata one.
>>
>>
>>>> Is this a known problem/ is there a way I can help locating the
>>>> cause of the problem?

>> Yes, please post the boot log and the IDE reset log too for starters...

>>> (cc's added)

> The log of a typical IDE reset is available here:

> http://petra.hos.u-szeged.hu/~wildy/syslog.gz

> This was the worst case: the IDE bus was resetted during the system boot.

Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?
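
To make the suggested change concrete, here is a minimal sketch, not the
actual driver code. Only the macro name HPT374_ALLOW_ATA133_6 comes from
this thread; the helper hpt374_max_udma_mode() is hypothetical and merely
shows how a compile-time switch of this kind can cap the highest UDMA mode
offered on the chip:

```c
/* Compile-time switch named in the thread; 0 is the value to test with. */
#define HPT374_ALLOW_ATA133_6	0

/*
 * Hypothetical helper (not from hpt366.c): the highest UDMA mode that
 * would be offered on HPT374 depending on the switch above.
 */
static int hpt374_max_udma_mode(void)
{
	return HPT374_ALLOW_ATA133_6 ? 6 : 5;
}
```

With the switch at 0 the sketch caps the chip at UDMA5, which is the effect
the test is after.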

MBR, Sergei

2007-06-01 22:41:19

by Geller Sandor

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

On Sat, 2 Jun 2007, Sergei Shtylyov wrote:

>> The log of a typical IDE reset is available here:
>
>> http://petra.hos.u-szeged.hu/~wildy/syslog.gz
>
>> This was the worst case: the IDE bus was resetted during the system boot.
>
> Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
> drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?


Hi Sergei,

This looks promising. Using a vanilla 2.6.22-rc3 I was able to reproduce
the problem within a few seconds. With the above modification the machine
has been running under heavy disk I/O without problems for 30 minutes...

Regards,

Geller Sandor <[email protected]>

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels


Hi,

On Saturday 02 June 2007, Geller Sandor wrote:
> On Sat, 2 Jun 2007, Sergei Shtylyov wrote:
>
> >> The log of a typical IDE reset is available here:
> >
> >> http://petra.hos.u-szeged.hu/~wildy/syslog.gz
> >
> >> This was the worst case: the IDE bus was resetted during the system boot.
> >
> > Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
> > drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?
>
>
> Hi Sergei,
>
> This looks promising. Using a vanilla 2.6.22-rc3 I was able to reproduce
> the problem within a few seconds. With the above modification the machine
> is running under heavy disk I/O without problems since 30 minutes...

Did it fix the problem for good?

Sergei, do we need to disallow UDMA6 completely on HPT374 or
is it only an issue with some problematic devices (=> blacklist)?

Either way we need to fix it somehow for 2.6.22.

Thanks,
Bart

2007-06-03 10:37:51

by Geller Sandor

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Hello,

On Sun, 3 Jun 2007, Bartlomiej Zolnierkiewicz wrote:

>
> Hi,
>
> On Saturday 02 June 2007, Geller Sandor wrote:
>> On Sat, 2 Jun 2007, Sergei Shtylyov wrote:
>>
>>>> The log of a typical IDE reset is available here:
>>>
>>>> http://petra.hos.u-szeged.hu/~wildy/syslog.gz
>>>
>>>> This was the worst case: the IDE bus was resetted during the system boot.
>>>
>>> Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
>>> drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?
>>
>>
>> Hi Sergei,
>>
>> This looks promising. Using a vanilla 2.6.22-rc3 I was able to reproduce
>> the problem within a few seconds. With the above modification the machine
>> is running under heavy disk I/O without problems since 30 minutes...
>
> Did it fix the problem for good?

So far it seems so. There hasn't been any problem since I applied the fix.

> Sergei, do we need to disallow UDMA6 completely on HPT734 or
> is it only an issue with some problematic devices (=> blacklist)?
>
> Either way we need to fix it somehow for 2.6.22.

For the record: this HPT374 is running quite outdated firmware (1.22) -
maybe newer firmware works correctly. I'm going to upgrade the firmware to
the latest version (which was released in 2004...), but unfortunately I
won't have access to this machine for the next 2-3 weeks, so I can't check
this within the release cycle of 2.6.22. If you are interested, I will
post the result of the firmware upgrade.

Regards,

Sandor

2007-06-03 17:35:30

by Sergei Shtylyov

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Hello.

Geller Sandor wrote:

>>>>> The log of a typical IDE reset is available here:

>>>>> http://petra.hos.u-szeged.hu/~wildy/syslog.gz

>>>>> This was the worst case: the IDE bus was resetted during the system
>>>>> boot.

>>>> Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
>>>> drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?

>>> Hi Sergei,

>>> This looks promising. Using a vanilla 2.6.22-rc3 I was able to reproduce
>>> the problem within a few seconds. With the above modification the
>>> machine
>>> is running under heavy disk I/O without problems since 30 minutes...

>> Did it fix the problem for good?

> It seems so far. There hasn't been any problem since I've applied the fix.

>> Sergei, do we need to disallow UDMA6 completely on HPT734 or
>> is it only an issue with some problematic devices (=> blacklist)?

Note that I didn't change what the old code was doing in this regard --
although the HPT374 spec does *not* say that UDMA6 is supported, it had been
enabled. What has *really* changed for HPT374 was:

- in 2.6.20-rc1, the driver switched to using the actual 33 MHz timing table
instead of the old one, matching 50 MHz (and so, severely underclocked);

- in 2.6.21-rc1, the driver switched from 33 MHz PCI to 66 MHz DPLL clock.

Disallowing UDMA6 would clock the chip with 50 MHz DPLL; however, the
original report claimed that something changed for the worse between 2.6.21.1
and .3, but nothing changed in drivers/ide/ between those releases...

>> Either way we need to fix it somehow for 2.6.22.

> For the record: this HTP374 is running with a quite outdated firmware
> (1.22) - maybe newer firmwares work correctly. I'm going to upgrade the
> firmware to the latest one (which was released in 2004...), but
> unfortunately in the upcoming 2-3 weeks I won't have access to this
> machine, so I can't check the case within the release cycle of 2.6.22.
> If you were interested I would post the result of the firmware upgrade.

I don't think this will matter...

> Regards,
> Sandor

MBR, Sergei

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels


Hello,

On Sunday 03 June 2007, Sergei Shtylyov wrote:
> Geller Sandor wrote:
> Hello.
>
> >>>>> The log of a typical IDE reset is available here:
>
> >>>>> http://petra.hos.u-szeged.hu/~wildy/syslog.gz
>
> >>>>> This was the worst case: the IDE bus was resetted during the system
> >>>>> boot.
>
> >>>> Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
> >>>> drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?
>
> >>> Hi Sergei,
>
> >>> This looks promising. Using a vanilla 2.6.22-rc3 I was able to reproduce
> >>> the problem within a few seconds. With the above modification the
> >>> machine
> >>> is running under heavy disk I/O without problems since 30 minutes...
>
> >> Did it fix the problem for good?
>
> > It seems so far. There hasn't been any problem since I've applied the fix.
>
> >> Sergei, do we need to disallow UDMA6 completely on HPT734 or
> >> is it only an issue with some problematic devices (=> blacklist)?
>
> Note that I didn't change what the old code was doing in this regard --
> although the HPT374 spec does *not* say that UDMA6 is supported, it had been
> enabled. What have *really* changed for HPT374 was:
>
> - in 2.6.20-rc1, the driver switched to using the actual 33 MHz timing table
> instead of the old one, matching 50 MHz (and so, severely underclocked);
>
> - in 2.6.21-rc1, the driver switched from 33 MHz PCI to 66 MHz DPLL clock.
>
> Disallowing UDMA6 would clock the chip with 50 MHz DPLL, howewer, the

I felt inspired by this explanation (thanks!) and took a look at the
hpt374-opensource-v2.10 vendor driver. Here is something interesting:

glbdata.c:

...
#ifdef CLOCK_66MHZ
ULONG setting370_66[] = {
0xd029d5e, 0xd029d26, 0xc829ca6, 0xc829c84, 0xc829c62,
0x2c829d2c, 0x2c829c66, 0x2c829c62,
0x1c829c62, 0x1c9a9c62, 0x1c929c62, 0x1c8e9c62, 0x1c8a9c62,
0x1c8a9c62/*0x1cae9c62*/, 0x1c869c62, 0x1c869c62,
};
...

hpt366.c:

...
static u32 sixty_six_base_hpt37x[] = {
/* XFER_UDMA_6 */ 0x1c869c62,
/* XFER_UDMA_5 */ 0x1cae9c62, /* 0x1c8a9c62 */
...

So we are using Dual ATA Clock for UDMA5 whereas the vendor driver doesn't
(the only other mode which uses Dual ATA Clock, in both drivers, is the
rarely used UDMA3).

As a result, the UDMA cycle time should equal 22.5ns instead of 30ns
(the spec defines it at 16.8ns, ide_timings[] uses 20ns) when using the
66 MHz DPLL clock. In theory everything should play nice, but the data
manual for HPT374 contains a weird note that Dual ATA Clock is meant to
implement ATA100 read and write at different clocks (there is no further
explanation of this).

Geller reported that the problems started after migrating from 2.6.20.7 to
2.6.21.1 (the affected disks are using UDMA5) and at the same time the driver
switched from 33 MHz PCI to 66 MHz DPLL clock. Also the issue is completely
fixed by using 50 MHz DPLL clock (UDMA5 timing for 50 MHz DPLL clock is
0x12848242 so UDMA cycle time equals 20ns and is smaller than the one
obtained using 66 MHz DPLL clock).
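
The cycle times quoted above can be reproduced with a small sketch. It
assumes the UDMA cycle count sits in bits 18-20 of the timing word and the
Dual ATA Clock flag in bit 21 (a layout inferred from the figures in this
message, not quoted from the data manual), and that the 66 MHz and 50 MHz
DPLL clocks have nominal periods of 15 ns and 20 ns. The function and macro
names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions, inferred from the figures in this thread. */
#define UDMA_CYCLE_SHIFT	18
#define UDMA_CYCLE_MASK		0x7u
#define DUAL_CLOCK_BIT		(1u << 21)

/* Nominal DPLL clock periods in picoseconds. */
#define PERIOD_66MHZ_PS		15000u
#define PERIOD_50MHZ_PS		20000u

/* UDMA cycle time in picoseconds for a timing word at a given clock period. */
static uint32_t udma_cycle_ps(uint32_t timing, uint32_t clock_period_ps)
{
	uint32_t cycles = (timing >> UDMA_CYCLE_SHIFT) & UDMA_CYCLE_MASK;

	assert(clock_period_ps > 0);
	/* With the dual-clock flag set, each count lasts half a clock period. */
	if (timing & DUAL_CLOCK_BIT)
		clock_period_ps /= 2;
	return cycles * clock_period_ps;
}
```

Under these assumptions, 0x1cae9c62 gives 22.5 ns and 0x1c8a9c62 gives 30 ns
at 66 MHz, while 0x12848242 gives 20 ns at 50 MHz, matching the numbers
above.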

It all makes me wonder whether it is really safe to use Dual ATA Clock for
UDMA5 and whether we should just be using "the official" timing instead...

Sergei?

> original report claimed that something has changed to worse between 2.6.21.1
> and .3 but nothing changed in drivers/ide/ between those releases...

It could be that md changes from 2.6.21.3 have influenced the situation
(by putting more stress on disks etc)...

Thanks,
Bart

2007-06-05 12:44:16

by Sergei Shtylyov

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Hello.

Bartlomiej Zolnierkiewicz wrote:

>>>>>>>The log of a typical IDE reset is available here:

>>>>>>>http://petra.hos.u-szeged.hu/~wildy/syslog.gz

>>>>>>>This was the worst case: the IDE bus was resetted during the system
>>>>>>>boot.

>>>>>> Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
>>>>>>drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?

>>>>>Hi Sergei,

>>>>>This looks promising. Using a vanilla 2.6.22-rc3 I was able to reproduce
>>>>>the problem within a few seconds. With the above modification the
>>>>>machine
>>>>>is running under heavy disk I/O without problems since 30 minutes...

>>>>Did it fix the problem for good?

>>>It seems so far. There hasn't been any problem since I've applied the fix.

>>>>Sergei, do we need to disallow UDMA6 completely on HPT734 or
>>>>is it only an issue with some problematic devices (=> blacklist)?

>> Note that I didn't change what the old code was doing in this regard --
>>although the HPT374 spec does *not* say that UDMA6 is supported, it had been
>>enabled. What have *really* changed for HPT374 was:

>>- in 2.6.20-rc1, the driver switched to using the actual 33 MHz timing table
>> instead of the old one, matching 50 MHz (and so, severely underclocked);

>>- in 2.6.21-rc1, the driver switched from 33 MHz PCI to 66 MHz DPLL clock.

>> Disallowing UDMA6 would clock the chip with 50 MHz DPLL, howewer, the

> I felt inspired by this explanation (thanks!) and took a look at
> hpt374-opensource-v2.10 vendor driver. Here is something interesting:

> glbdata.c:

> ...
> #ifdef CLOCK_66MHZ
> ULONG setting370_66[] = {
> 0xd029d5e, 0xd029d26, 0xc829ca6, 0xc829c84, 0xc829c62,
> 0x2c829d2c, 0x2c829c66, 0x2c829c62,
> 0x1c829c62, 0x1c9a9c62, 0x1c929c62, 0x1c8e9c62, 0x1c8a9c62,
> 0x1c8a9c62/*0x1cae9c62*/, 0x1c869c62, 0x1c869c62,
> };
> ...

> hpt366.c:

> ...
> static u32 sixty_six_base_hpt37x[] = {
> /* XFER_UDMA_6 */ 0x1c869c62,
> /* XFER_UDMA_5 */ 0x1cae9c62, /* 0x1c8a9c62 */
> ...

> So we are using Dual ATA Clock for UDMA5 whereas vendor driver doesn't

This is so in all other HPT drivers (and HPT371N datasheet has the same
figures -- this chip is the only one supporting UDMA6 and having the default
DPLL clock > 50 MHz). Note that it means that there's no actual UDMA5 since
the timing exactly matches that one used for UDMA4.
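
For a sense of what identical UDMA4/UDMA5 timings cost, here is a rough
sketch of the peak burst rate for a given UDMA cycle time. It relies only on
the fact that one 16-bit word is transferred per UDMA cycle; the function
name is made up for illustration:

```c
#include <stdint.h>

/* Peak UDMA burst rate in bytes per second for a cycle time in picoseconds. */
static uint64_t udma_peak_bytes_per_sec(uint32_t cycle_ps)
{
	/* One 16-bit word (2 bytes) moves per UDMA cycle; 1 s = 10^12 ps. */
	return cycle_ps ? (2ULL * 1000000000000ULL) / cycle_ps : 0;
}
```

A 30 ns cycle gives about 66 MB/s (the UDMA4 rate), while a true 20 ns UDMA5
cycle would give 100 MB/s.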

> (the only other mode which uses Dual ATA Clock, in both drivers, is rarely
> used UDMA3).

And UDMA4 with 50 MHz clock.

> Thanks to this UDMA cycle time should be equal 22.5ns instead of 30ns
> (spec defines it at 16.8ns, ide_timings[] uses 20ns) when using 66 MHz DPLL
> clock. In theory everything should play nice but the data manual for HPT374

And it does -- on other chips.

> contains weird note that Dual ATA Clock is meant to implement ATA100 read
> and write at different clocks (there is no more explanation to this).

That's the thing that keeps me confused in the other datasheets too --
from my interpretation of their timing figures it seemed to control a 2x ATA
clock multiplier. The HPT370 datasheet just gives different timings and SCR2
values for reads/writes in UDMA5 (I've disabled this mode on HPT370, from
which the read performance only gained -- not sure if it makes sense to
restore the old clock turnaround hack).

> Geller reported that the problems started after migrating from 2.6.20.7 to
> 2.6.21.1 (the affected disks are using UDMA5) and at the same time the driver
> switched from 33 MHz PCI to 66 MHz DPLL clock. Also the issue is completely
> fixed by using 50 MHz DPLL clock (UDMA5 timing for 50 MHz DPLL clock is
> 0x12848242 so UDMA cycle time equals 20ns and is smaller than the one
> obtained using 66 MHz DPLL clock).


> It all makes me wonder whether it is really safe to use Dual ATA Clock for
> UDMA5 and whether we should just be using "the offical" timing instead...

Not sure. I had no problems with this on the HPT371N/302 and 371N was
clocked by 66 MHz DPLL from the start (its default clock is 75 MHz however).
I'm still holding to my hypothesis that HPT374 simply can't tolerate 66
MHz DPLL clock, and the UDMA5 timing figures that you've cited seem to prove that.
I'm going to post a patch today -- how about completely prohibiting UDMA6
on HPT374?

> Thanks,
> Bart

WBR, Sergei

2007-06-05 14:13:36

by Sergei Shtylyov

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Hello, I wrote:

>> I felt inspired by this explanation (thanks!) and took a look at
>> hpt374-opensource-v2.10 vendor driver. Here is something interesting:

>> glbdata.c:

>> ...
>> #ifdef CLOCK_66MHZ
>> ULONG setting370_66[] = {
>> 0xd029d5e, 0xd029d26, 0xc829ca6, 0xc829c84, 0xc829c62,
>> 0x2c829d2c, 0x2c829c66, 0x2c829c62,
>> 0x1c829c62, 0x1c9a9c62, 0x1c929c62, 0x1c8e9c62, 0x1c8a9c62,
>> 0x1c8a9c62/*0x1cae9c62*/, 0x1c869c62, 0x1c869c62,
>> };
>> ...

>> hpt366.c:

>> ...
>> static u32 sixty_six_base_hpt37x[] = {
>> /* XFER_UDMA_6 */ 0x1c869c62,
>> /* XFER_UDMA_5 */ 0x1cae9c62, /* 0x1c8a9c62 */
>> ...

>> So we are using Dual ATA Clock for UDMA5 whereas vendor driver doesn't

> This is so in all other HPT drivers (and HPT371N datasheet has the
> same figures -- this chip is the only one supporting UDMA6 and having
> the default DPLL clock > 50 MHz).

What I meant to say was the only one I have a datasheet for. :-)

MBR, Sergei

2007-06-05 20:07:28

by Sergei Shtylyov

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Hello, I wrote:

>>>> This looks promising. Using a vanilla 2.6.22-rc3 I was able to
>>>> reproduce
>>>> the problem within a few seconds. With the above modification the
>>>> machine
>>>> is running under heavy disk I/O without problems since 30 minutes...

>>> Did it fix the problem for good?

>> It seems so far. There hasn't been any problem since I've applied the
>> fix.

>>> Sergei, do we need to disallow UDMA6 completely on HPT734 or
>>> is it only an issue with some problematic devices (=> blacklist)?

> Note that I didn't change what the old code was doing in this regard
> -- although the HPT374 spec does *not* say that UDMA6 is supported, it
> had been enabled. What have *really* changed for HPT374 was:

No, I lied (my memory failed me and in the end I forgot to check
myself). It was me who enabled it by default (that there should have been no
option to do this is another question). :-<

> - in 2.6.20-rc1, the driver switched to using the actual 33 MHz timing
> table
> instead of the old one, matching 50 MHz (and so, severely underclocked);

> - in 2.6.21-rc1, the driver switched from 33 MHz PCI to 66 MHz DPLL clock.

> Disallowing UDMA6 would clock the chip with 50 MHz DPLL, howewer, the
> original report claimed that something has changed to worse between
> 2.6.21.1 and .3 but nothing changed in drivers/ide/ between those
> releases...

>>> Either way we need to fix it somehow for 2.6.22.

>> For the record: this HTP374 is running with a quite outdated firmware
>> (1.22) - maybe newer firmwares work correctly. I'm going to upgrade
>> the firmware to the latest one (which was released in 2004...), but
>> unfortunately in the upcoming 2-3 weeks I won't have access to this
>> machine, so I can't check the case within the release cycle of 2.6.22.
>> If you were interested I would post the result of the firmware upgrade.

> I don't think this will matter...

>> Regards,
>> Sandor

MBR, Sergei

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

On Tuesday 05 June 2007, Sergei Shtylyov wrote:
> Hello.
>
> Bartlomiej Zolnierkiewicz wrote:
>
> >>>>>>>The log of a typical IDE reset is available here:
>
> >>>>>>>http://petra.hos.u-szeged.hu/~wildy/syslog.gz
>
> >>>>>>>This was the worst case: the IDE bus was resetted during the system
> >>>>>>>boot.
>
> >>>>>> Could you try setting HPT374_ALLOW_ATA133_6 to 0 in
> >>>>>>drivers/ide/pci/hpt366.c and rebuild/reboot the kernel?
>
> >>>>>Hi Sergei,
>
> >>>>>This looks promising. Using a vanilla 2.6.22-rc3 I was able to reproduce
> >>>>>the problem within a few seconds. With the above modification the
> >>>>>machine
> >>>>>is running under heavy disk I/O without problems since 30 minutes...
>
> >>>>Did it fix the problem for good?
>
> >>>It seems so far. There hasn't been any problem since I've applied the fix.
>
> >>>>Sergei, do we need to disallow UDMA6 completely on HPT734 or
> >>>>is it only an issue with some problematic devices (=> blacklist)?
>
> >> Note that I didn't change what the old code was doing in this regard --
> >>although the HPT374 spec does *not* say that UDMA6 is supported, it had been
> >>enabled. What have *really* changed for HPT374 was:
>
> >>- in 2.6.20-rc1, the driver switched to using the actual 33 MHz timing table
> >> instead of the old one, matching 50 MHz (and so, severely underclocked);
>
> >>- in 2.6.21-rc1, the driver switched from 33 MHz PCI to 66 MHz DPLL clock.
>
> >> Disallowing UDMA6 would clock the chip with 50 MHz DPLL, howewer, the
>
> > I felt inspired by this explanation (thanks!) and took a look at
> > hpt374-opensource-v2.10 vendor driver. Here is something interesting:
>
> > glbdata.c:
>
> > ...
> > #ifdef CLOCK_66MHZ
> > ULONG setting370_66[] = {
> > 0xd029d5e, 0xd029d26, 0xc829ca6, 0xc829c84, 0xc829c62,
> > 0x2c829d2c, 0x2c829c66, 0x2c829c62,
> > 0x1c829c62, 0x1c9a9c62, 0x1c929c62, 0x1c8e9c62, 0x1c8a9c62,
> > 0x1c8a9c62/*0x1cae9c62*/, 0x1c869c62, 0x1c869c62,
> > };
> > ...
>
> > hpt366.c:
>
> > ...
> > static u32 sixty_six_base_hpt37x[] = {
> > /* XFER_UDMA_6 */ 0x1c869c62,
> > /* XFER_UDMA_5 */ 0x1cae9c62, /* 0x1c8a9c62 */
> > ...
>
> > So we are using Dual ATA Clock for UDMA5 whereas vendor driver doesn't
>
> This is so in all other HPT drivers (and HPT371N datasheet has the same
> figures -- this chip is the only one supporting UDMA6 and having the default
> DPLL clock > 50 MHz). Note that it means that there's no actual UDMA5 since
> the timing exactly matches that one used for UDMA4.
>
> > (the only other mode which uses Dual ATA Clock, in both drivers, is rarely
> > used UDMA3).
>
> And UDMA4 with 50 MHz clock.
>
> > Thanks to this UDMA cycle time should be equal 22.5ns instead of 30ns
> > (spec defines it at 16.8ns, ide_timings[] uses 20ns) when using 66 MHz DPLL
> > clock. In theory everything should play nice but the data manual for HPT374
>
> And it does -- on other chips.

My beautiful theory failed... Oh, well... ;)

> > contains weird note that Dual ATA Clock is meant to implement ATA100 read
> > and write at different clocks (there is no more explanation to this).
>
> That's the thing that keeps me confused in the other datasheets too --
> from my interpretation of their timing figures it seemed to control 2x ATA
> clock multipler. HPT370 datasheet just gives different timings and SCR2 values
> for reads/writes in UDMA5 (I've disabled this mode on HPT370 from which the
> read performance only gained -- not sure if it makes sense to restore the old
> clock turnaround hack).
>
> > Geller reported that the problems started after migrating from 2.6.20.7 to
> > 2.6.21.1 (the affected disks are using UDMA5) and at the same time the driver
> > switched from 33 MHz PCI to 66 MHz DPLL clock. Also the issue is completely
> > fixed by using 50 MHz DPLL clock (UDMA5 timing for 50 MHz DPLL clock is
> > 0x12848242 so UDMA cycle time equals 20ns and is smaller than the one
> > obtained using 66 MHz DPLL clock).
>
>
> > It all makes me wonder whether it is really safe to use Dual ATA Clock for
> > UDMA5 and whether we should just be using "the offical" timing instead...
>
> Not sure. I had no problems with this on the HPT371N/302 and 371N was
> clocked by 66 MHz DPLL from the start (its default clock is 75 MHz however).
> I'm still holding to my hypothesis that HPT374 simply can't tolerate 66
> MHz DPLL clock, and the UDMA5 timing figures that you've cited seem to prove that.
> I'm going to post a patch today -- how about completely prohibiting UDMA6
> on HPT374?

Sounds fine, in case somebody misses it we can introduce something like
hpt374_allow_66mhz_dpll module parameter...

Thanks,
Bart

2007-06-09 10:11:45

by Sergei Shtylyov

Subject: Re: HPT374 IDE problem with 2.6.21.* kernels

Bartlomiej Zolnierkiewicz wrote:

>>>>>>Sergei, do we need to disallow UDMA6 completely on HPT734 or
>>>>>>is it only an issue with some problematic devices (=> blacklist)?

>>>> Note that I didn't change what the old code was doing in this regard --
>>>>although the HPT374 spec does *not* say that UDMA6 is supported, it had been
>>>>enabled. What have *really* changed for HPT374 was:

>>>>- in 2.6.20-rc1, the driver switched to using the actual 33 MHz timing table
>>>> instead of the old one, matching 50 MHz (and so, severely underclocked);

>>>>- in 2.6.21-rc1, the driver switched from 33 MHz PCI to 66 MHz DPLL clock.

>>>> Disallowing UDMA6 would clock the chip with 50 MHz DPLL, howewer, the

>>>I felt inspired by this explanation (thanks!) and took a look at
>>>hpt374-opensource-v2.10 vendor driver. Here is something interesting:
>>
>>>glbdata.c:
>>
>>>...
>>>#ifdef CLOCK_66MHZ
>>>ULONG setting370_66[] = {
>>> 0xd029d5e, 0xd029d26, 0xc829ca6, 0xc829c84, 0xc829c62,
>>> 0x2c829d2c, 0x2c829c66, 0x2c829c62,
>>> 0x1c829c62, 0x1c9a9c62, 0x1c929c62, 0x1c8e9c62, 0x1c8a9c62,
>>> 0x1c8a9c62/*0x1cae9c62*/, 0x1c869c62, 0x1c869c62,
>>>};
>>>...

>>>hpt366.c:

>>>...
>>>static u32 sixty_six_base_hpt37x[] = {
>>> /* XFER_UDMA_6 */ 0x1c869c62,
>>> /* XFER_UDMA_5 */ 0x1cae9c62, /* 0x1c8a9c62 */
>>>...

>>>So we are using Dual ATA Clock for UDMA5 whereas vendor driver doesn't

>> This is so in all other HPT drivers (and HPT371N datasheet has the same
>>figures -- this chip is the only one supporting UDMA6 and having the default
>>DPLL clock > 50 MHz). Note that it means that there's no actual UDMA5 since
>>the timing exactly matches that one used for UDMA4.

>>>(the only other mode which uses Dual ATA Clock, in both drivers, is rarely
>>>used UDMA3).

>> And UDMA4 with 50 MHz clock.

>>>Thanks to this UDMA cycle time should be equal 22.5ns instead of 30ns
>>>(spec defines it at 16.8ns, ide_timings[] uses 20ns) when using 66 MHz DPLL
>>>clock. In theory everything should play nice but the data manual for HPT374

>> And it does -- on other chips.

> My beautiful theory failed... Oh, well... ;)

Sigh, if we only knew why HPT decided that UDMA5 timings should be the
same as UDMA4 -- probably they had some reason...

>>>contains weird note that Dual ATA Clock is meant to implement ATA100 read
>>>and write at different clocks (there is no more explanation to this).

>> That's the thing that keeps me confused in the other datasheets too --
>>from my interpretation of their timing figures it seemed to control 2x ATA
>>clock multipler. HPT370 datasheet just gives different timings and SCR2 values
>>for reads/writes in UDMA5 (I've disabled this mode on HPT370 from which the
>>read performance only gained -- not sure if it makes sense to restore the old
>>clock turnaround hack).

It used to clock the writes from DPLL in UDMA5, and clock UDMA5 reads and
all other modes from PCI... And the result was dog slow reads in UDMA5 which
UDMA4 was beating by about 8 MB/s... Maybe that's why UDMA4 timings were used
for UDMA5 in later chips by HPT -- but the real UDMA5 yielded faster transfer
speeds than UDMA4 for those chips...

>>>Geller reported that the problems started after migrating from 2.6.20.7 to
>>>2.6.21.1 (the affected disks are using UDMA5) and at the same time the driver
>>>switched from 33 MHz PCI to 66 MHz DPLL clock. Also the issue is completely
>>>fixed by using 50 MHz DPLL clock (UDMA5 timing for 50 MHz DPLL clock is
>>>0x12848242 so UDMA cycle time equals 20ns and is smaller than the one
>>>obtained using 66 MHz DPLL clock).

>>>It all makes me wonder whether it is really safe to use Dual ATA Clock for
>>>UDMA5 and whether we should just be using "the offical" timing instead...

>> Not sure. I had no problems with this on the HPT371N/302 and 371N was
>>clocked by 66 MHz DPLL from the start (its default clock is 75 MHz however).

I meant to say 77... :-)

>> I'm still holding to my hypothesis that HPT374 simply can't tolerate 66
>>MHz DPLL clock, and the UDMA5 timing figures that you've cited seem to prove that.
>> I'm going to post a patch today -- how about completely prohibiting UDMA6
>>on HPT374?

> Sounds fine, in case somebody misses it we can introduce something like
> hpt374_allow_66mhz_dpll module parameter...

Don't think anybody will miss it. Anyway, chip spec doesn't say that it's
supported.

> Thanks,
> Bart

MBR, Sergei