Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."


Jeff,

Could we get this merged ASAP (it was first posted on 26th February)?

This patch fixes data corruption for libata PATA ServerWorks and HPT drivers.

[ IDE users are already living a happy life since they are not affected
(mode masking has always worked correctly in the original drivers)...

Can we make life happy also for libata PATA users? :) ]

On Wednesday 05 March 2008, [email protected] wrote:
> From: Alan Cox <[email protected]>
>
> When masking mask out the modes that are unsupported not the ones that are
> supported. This makes life happier.
>
> [[email protected]: coding-style fixes]
> Signed-off-by: Alan Cox <[email protected]>
> Cc: Jeff Garzik <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> drivers/ata/pata_hpt366.c | 6 +++---
> drivers/ata/pata_hpt37x.c | 6 +++---
> drivers/ata/pata_serverworks.c | 2 +-
> 3 files changed, 7 insertions(+), 7 deletions(-)
>
> diff -puN drivers/ata/pata_hpt366.c~pata-i-do-not-think-it-means-what-you-think-it-means drivers/ata/pata_hpt366.c
> --- a/drivers/ata/pata_hpt366.c~pata-i-do-not-think-it-means-what-you-think-it-means
> +++ a/drivers/ata/pata_hpt366.c
> @@ -27,7 +27,7 @@
> #include <linux/libata.h>
>
> #define DRV_NAME "pata_hpt366"
> -#define DRV_VERSION "0.6.1"
> +#define DRV_VERSION "0.6.2"
>
> struct hpt_clock {
> u8 xfer_speed;
> @@ -180,9 +180,9 @@ static unsigned long hpt366_filter(struc
> if (hpt_dma_blacklisted(adev, "UDMA", bad_ata33))
> mask &= ~ATA_MASK_UDMA;
> if (hpt_dma_blacklisted(adev, "UDMA3", bad_ata66_3))
> - mask &= ~(0x07 << ATA_SHIFT_UDMA);
> + mask &= ~(0xF8 << ATA_SHIFT_UDMA);
> if (hpt_dma_blacklisted(adev, "UDMA4", bad_ata66_4))
> - mask &= ~(0x0F << ATA_SHIFT_UDMA);
> + mask &= ~(0xF0 << ATA_SHIFT_UDMA);
> }
> return ata_pci_default_filter(adev, mask);
> }
> diff -puN drivers/ata/pata_hpt37x.c~pata-i-do-not-think-it-means-what-you-think-it-means drivers/ata/pata_hpt37x.c
> --- a/drivers/ata/pata_hpt37x.c~pata-i-do-not-think-it-means-what-you-think-it-means
> +++ a/drivers/ata/pata_hpt37x.c
> @@ -24,7 +24,7 @@
> #include <linux/libata.h>
>
> #define DRV_NAME "pata_hpt37x"
> -#define DRV_VERSION "0.6.9"
> +#define DRV_VERSION "0.6.11"
>
> struct hpt_clock {
> u8 xfer_speed;
> @@ -281,7 +281,7 @@ static unsigned long hpt370_filter(struc
> if (hpt_dma_blacklisted(adev, "UDMA", bad_ata33))
> mask &= ~ATA_MASK_UDMA;
> if (hpt_dma_blacklisted(adev, "UDMA100", bad_ata100_5))
> - mask &= ~(0x1F << ATA_SHIFT_UDMA);
> + mask &= ~(0xE0 << ATA_SHIFT_UDMA);
> }
> return ata_pci_default_filter(adev, mask);
> }
> @@ -297,7 +297,7 @@ static unsigned long hpt370a_filter(stru
> {
> if (adev->class == ATA_DEV_ATA) {
> if (hpt_dma_blacklisted(adev, "UDMA100", bad_ata100_5))
> - mask &= ~ (0x1F << ATA_SHIFT_UDMA);
> + mask &= ~(0xE0 << ATA_SHIFT_UDMA);
> }
> return ata_pci_default_filter(adev, mask);
> }
> diff -puN drivers/ata/pata_serverworks.c~pata-i-do-not-think-it-means-what-you-think-it-means drivers/ata/pata_serverworks.c
> --- a/drivers/ata/pata_serverworks.c~pata-i-do-not-think-it-means-what-you-think-it-means
> +++ a/drivers/ata/pata_serverworks.c
> @@ -226,7 +226,7 @@ static unsigned long serverworks_csb_fil
>
> for (i = 0; (p = csb_bad_ata100[i]) != NULL; i++) {
> if (!strcmp(p, model_num))
> - mask &= ~(0x1F << ATA_SHIFT_UDMA);
> + mask &= ~(0xE0 << ATA_SHIFT_UDMA);
> }
> return ata_pci_default_filter(adev, mask);
> }
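
For readers following the mask arithmetic in the patch above, here is a
minimal sketch of why the old constants were inverted. It is not the driver
code: the shift value of 12 is assumed for illustration only (the real
ATA_SHIFT_UDMA is defined in <linux/libata.h>), and udma_above(),
filter_old() and filter_new() are hypothetical helper names. Bit n of the
8-bit UDMA field stands for UDMA mode n, so "no UDMA3 or faster" means
clearing bits 3..7 (0xF8), not bits 0..2 (0x07).

```c
#include <assert.h>

/* Assumed for illustration only; the real values live in <linux/libata.h>. */
#define ATA_SHIFT_UDMA 12
#define ATA_MASK_UDMA  (0xFFUL << ATA_SHIFT_UDMA)

/* Bits for the UDMA modes above highest_ok, e.g. 2 gives 0xF8 (UDMA3..UDMA7). */
static unsigned long udma_above(int highest_ok)
{
	assert(highest_ok >= 0 && highest_ok <= 7);
	return (0xFFUL << (highest_ok + 1)) & 0xFFUL;
}

/* Pre-patch "UDMA3" blacklist handling: clears UDMA0..2, keeps UDMA3..7. */
static unsigned long filter_old(unsigned long mask)
{
	return mask & ~(0x07UL << ATA_SHIFT_UDMA);
}

/* Patched handling: clears UDMA3..7, keeps UDMA0..2. */
static unsigned long filter_new(unsigned long mask)
{
	return mask & ~(0xF8UL << ATA_SHIFT_UDMA);
}
```

Under these assumptions, udma_above(2), udma_above(3) and udma_above(4)
give the three constants the patch introduces (0xF8, 0xF0 and 0xE0), while
filter_old() leaves exactly the blacklisted fast modes enabled.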


2008-03-05 01:32:34

by Andrew Morton

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

On Wed, 5 Mar 2008 02:34:40 +0100
Bartlomiej Zolnierkiewicz <[email protected]> wrote:

> On Wednesday 05 March 2008, [email protected] wrote:
> > From: Alan Cox <[email protected]>
> >
> > When masking mask out the modes that are unsupported not the ones that are
> > supported. This makes life happier.
> >
>
> Jeff,
>
> Could we get this merged ASAP (it was first posted on 26th February)?
>
> This patch fixes data corruption for libata PATA ServerWorks and HPT drivers.
>
> [ IDE users are already living a happy life since they are not affected
> (mode masking has always worked correctly in the original drivers)...
>
> Can we make life happy also for libata PATA users? :) ]
>

(edited to undo top-posting damage)

I didn't know any of that. The changelog might have been kinda fun, but
given that it failed to tell us that the patch fixes data-corruption
errors, the changelog was excruciatingly bad.

Do we need this fix in 2.6.24.x as well?

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

On Wednesday 05 March 2008, Andrew Morton wrote:
> On Wed, 5 Mar 2008 02:34:40 +0100
> Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>
> > On Wednesday 05 March 2008, [email protected] wrote:
> > > From: Alan Cox <[email protected]>
> > >
> > > When masking mask out the modes that are unsupported not the ones that are
> > > supported. This makes life happier.
> > >
> >
> > Jeff,
> >
> > Could we get this merged ASAP (it was first posted on 26th February)?
> >
> > This patch fixes data corruption for libata PATA ServerWorks and HPT drivers.
> >
> > [ IDE users are already living a happy life since they are not affected
> > (mode masking has always worked correctly in the original drivers)...
> >
> > Can we make life happy also for libata PATA users? :) ]
> >
>
> (edited to undo top-posting damage)
>
> I didn't know any of that. The changelog might have been kinda fun, but
> given that it failed to tell us that the patch fixes data-corruption
> errors, the changelog was excruciatingly bad.
>
> Do we need this fix in 2.6.24.x as well?

Let me quote Alan's opinion on this:

"I think that would be wise ;)"

nothing needs to be added here ;)

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

On Wednesday 05 March 2008, Alan Cox wrote:
>
> > I didn't know any of that. The changelog might have been kinda fun, but
> > given that it failed to tell us that the patch fixes data-corruption
> > errors, the changelog was excruciatingly bad.
>
> I have no reason/evidence to believe it fixes data corruption errors of
> any kind. For the specific combinations of device it should simply avoid
> a long pause, complaints and a switch to lower speeds.

http://www.mail-archive.com/[email protected]/msg16599.html

There is a strange coincidence between being on the blacklist and FIFO corruption.

https://bugzilla.redhat.com/show_bug.cgi?id=433557

Bugzilla Bug 433557: Data corrupion with Fedora8 on HPT370 disk controller (Abit BX133 mobo)

IBM-DTLA-307030 is on the blacklist....

[ there can be more libata problems involved, anyway FC6 w/IDE works fine ]

> The ATA disk case with serverworks (which is a potential corruptor) was
> always correctly handled.

for OSB4 yes but...

/* Seagate Barracuda ATA IV Family drives in UDMA mode 5
* can overrun their FIFOs when used with the CSB5 */

static const char *csb_bad_ata100[] = {

Thanks,
Bart

2008-03-05 15:01:32

by Alan

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."


> I didn't know any of that. The changelog might have been kinda fun, but
> given that it failed to tell us that the patch fixes data-corruption
> errors, the changelog was excruciatingly bad.

I have no reason/evidence to believe it fixes data corruption errors of
any kind. For the specific combinations of device it should simply avoid
a long pause, complaints and a switch to lower speeds.

The ATA disk case with serverworks (which is a potential corruptor) was
always correctly handled.

2008-03-05 15:03:15

by Alan

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

> There is a strange coincidence between being on the blacklist and FIFO corruption.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=433557
>
> Bugzilla Bug 433557: Data corrupion with Fedora8 on HPT370 disk controller (Abit BX133 mobo)
>
> IBM-DTLA-307030 is on the blacklist....

I wish it was more than a co-incidence but the trace shows it dropped
speed as expected.

> for OSB4 yes but...
>
> /* Seagate Barracuda ATA IV Family drives in UDMA mode 5
> * can overrun their FIFOs when used with the CSB5 */

Which gives you a CRC error according to my notes

Alan

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

On Wed, Mar 5, 2008 at 3:39 PM, Alan Cox <[email protected]> wrote:
> > There is a strange coincidence between being on the blacklist and FIFO corruption.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=433557
> >
> > Bugzilla Bug 433557: Data corrupion with Fedora8 on HPT370 disk controller (Abit BX133 mobo)
> >
> > IBM-DTLA-307030 is on the blacklist....
>
> I wish it was more than a co-incidence but the trace shows it dropped
> speed as expected.

the trace shows only ST380011A (ata1) dropping speed...

I would suggest that you ask Wojciech to test the patch
(unless it has already happened).

> > for OSB4 yes but...
> >
> > /* Seagate Barracuda ATA IV Family drives in UDMA mode 5
> > * can overrun their FIFOs when used with the CSB5 */
>
> Which gives you a CRC error according to my notes

I have no documentation / errata for ServerWorks chipsets (everything is
NDA-ed) or the hardware in question so I'll trust your opinion on this.

Thanks,
Bart

2008-03-05 16:35:20

by Krzysztof Halasa

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

Bartlomiej Zolnierkiewicz <[email protected]> writes:

> IBM-DTLA-307030 is on the blacklist....

Any still in use? Surprised.
--
Krzysztof Halasa

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

On Wed, Mar 5, 2008 at 5:34 PM, Krzysztof Halasa <[email protected]> wrote:
> Bartlomiej Zolnierkiewicz <[email protected]> writes:
>
> > IBM-DTLA-307030 is on the blacklist....
>
> Any still in use? Surprised.

https://bugzilla.redhat.com/show_bug.cgi?id=433557

the bug was opened two weeks ago so I assume that the drive is still in use

[ Actually, I'm not surprised as we are getting patches/reports for much
older/buggier hardware... After all we are doing Linux not some other
OS whose every release obsoletes the old hardware. ;-) ]

2008-03-05 16:55:49

by Krzysztof Halasa

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

"Bartlomiej Zolnierkiewicz" <[email protected]> writes:

> [ Actually, I'm not surprised as we are getting patches/reports for much
> older/buggier hardware... After all we are doing Linux not some other
> OS whose every release obsoletes the old hardware. ;-) ]

It isn't just about old/buggy hardware (personally I'm using much
older items). The DTLA disks had "rather" short MTBF, with something
like 30% returns in the first year after purchase, according to a
friendly distributor. Despite firmware upgrades, they were unfixable.
--
Krzysztof Halasa

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

On Wed, Mar 5, 2008 at 5:55 PM, Krzysztof Halasa <[email protected]> wrote:
> "Bartlomiej Zolnierkiewicz" <[email protected]> writes:
>
>
> > [ Actually, I'm not surprised as we are getting patches/reports for much
> > older/buggier hardware... After all we are doing Linux not some other
> > OS whose every release obsoletes the old hardware. ;-) ]
>
> It isn't just about old/buggy hardware (personally I'm using much
> older items). The DTLA disks had "rather" short MTBF, with something
> like 30% returns in the first year after purchase, according to a
> friendly distributor. Despite firmware upgrades, they were unfixable.

Ah, you meant that it was a DeathStar drive... well, a lucky survivor...

2008-03-05 18:34:39

by Zan Lynx

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."


On Wed, 2008-03-05 at 18:22 +0100, Bartlomiej Zolnierkiewicz wrote:
> On Wed, Mar 5, 2008 at 5:55 PM, Krzysztof Halasa <[email protected]> wrote:
> > "Bartlomiej Zolnierkiewicz" <[email protected]> writes:
> >
> >
> > > [ Actually, I'm not surprised as we are getting patches/reports for much
> > > older/buggier hardware... After all we are doing Linux not some other
> > > OS whose every release obsoletes the old hardware. ;-) ]
> >
> > It isn't just about old/buggy hardware (personally I'm using much
> > older items). The DTLA disks had "rather" short MTBF, with something
> > like 30% returns in the first year after purchase, according to a
> > friendly distributor. Despite firmware upgrades, they were unfixable.
>
> Ah, you meant that it was a DeathStar drive... well, a lucky survivor...

In my experience what they needed was proper cooling. I have a 3ware
RAID-5 array of 4 120 GB DeskStar drives still working. In a nice RAID
enclosure with fans, not tucked next to an overclocked video card and
the power supply.
--
Zan Lynx <[email protected]>



2008-03-06 01:33:42

by Krzysztof Halasa

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

Zan Lynx <[email protected]> writes:

> In my experience what they needed was proper cooling. I have a 3ware
> RAID-5 array of 4 120 GB DeskStar drives still working.

I think the largest "deathstars" (75GXP?) were 75 GB.
--
Krzysztof Halasa

2008-03-06 11:13:21

by Ville Syrjälä

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

On Thu, Mar 06, 2008 at 02:33:29AM +0100, Krzysztof Halasa wrote:
> Zan Lynx <[email protected]> writes:
>
> > In my experience what they needed was proper cooling. I have a 3ware
> > RAID-5 array of 4 120 GB DeskStar drives still working.
>
> I think the largest "deathstars" (75GXP?) were 75 GB.

AFAIK there were basically two series of deathstars. The original
DTLA<something> and the more recent IC35<something>. The IC35 series
were bigger (120GB is the most common size I've seen for those).
Proper cooling and firmware upgrade usually fixed the deathstarness on
both series. I still have some of both, not in active use for a year or
two but still working. As a strange coincidence I was just pulling out
some old data from them yesterday.

--
Ville Syrjälä
[email protected]
http://www.sci.fi/~syrjala/

2008-03-06 14:58:57

by Mark Lord

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

Ville Syrjälä wrote:
> On Thu, Mar 06, 2008 at 02:33:29AM +0100, Krzysztof Halasa wrote:
>> Zan Lynx <[email protected]> writes:
>>
>>> In my experience what they needed was proper cooling. I have a 3ware
>>> RAID-5 array of 4 120 GB DeskStar drives still working.
>> I think the largest "deathstars" (75GXP?) were 75 GB.
>
> AFAIK there were basically two series of deathstars. The original
> DTLA<something> and the more recent IC35<something>. The IC35 series
> were bigger (120GB is the most common size I've seen for those).
> Proper cooling and firmware upgrade usually fixed the deathstarness on
> both series. I still have some of both, not in active use for a year or
> two but still working. As a strange coincidence I was just pulling out
> some old data from them yesterday.
..

The original Deathstar ailment had nothing to do with firmware or cooling.
But rather, a bad batch of chips that IBM had the misfortune to use a lot of.

The chips would grow tiny internal whiskers over a period of 2+ years,
and eventually short circuit themselves.

My last one died here just a few weeks ago, after sitting on the shelf
for nearly all of its life. Never more than perhaps 40 power-on hours total,
and never enough to get very warm.

Cheers

2008-03-06 15:01:04

by Mark Lord

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

Mark Lord wrote:
>
> The original Deathstar ailment had nothing to do with firmware or cooling.
> But rather, a bad batch of chips that IBM had the misfortune to use a
> lot of.
>
> The chips would grow tiny internal whiskers over a period of 2+ years,
> and eventually short circuit themselves.
...

Oddly enough, the Wikipedia entry doesn't include this information,
but does talk about other failure modes of the 75GXP series.

Cheers

2008-03-06 22:11:06

by Krzysztof Halasa

Subject: Re: [patch 3/3] pata: "I do not think it means, what you think it means."

Ville Syrjälä <[email protected]> writes:

> AFAIK there were basically two series of deathstars. The original
> DTLA<something> and the more recent IC35<something>.

Said friendly distributor recognizes only one deathstar line.
Personally I lost only two such drives, 45 GB I think. My only IC35
(still) produces some strange noises but it was like this from the
beginning and I guess it's normal. IC25 (?, 2.5") still working, too -
noises are a bit different than on IC35.

> Proper cooling and firmware upgrade usually fixed the deathstarness on
> both series.

Nope, DTLAs were not field-fixable, I think it was some problem with
drive electronics. Anyway replacements from IBM were dying the same
death, firmware upgrades or not, brand new or repaired, temperature or
not.

Perhaps there were other problems with them (switching while writing
to medium IIRC) - a different story.
--
Krzysztof Halasa