2007-02-01 01:50:37

by TJ

Subject: [PATCH 1/1] filesystem: Disk Errors at boot-time caused by probe of partitions

From: TJ <[email protected]>

Applies to: up to and including 2.6.20-rc7

Update to previous patch.

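Fixed a logical error in the if() statements guarding the conditional printk() calls. They tested the flags with bit-wise OR, which is always true, so every message was printed regardless of which flags were set. Replaced bit-wise OR with AND.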

Signed-off-by: TJ <[email protected]>

--- fs/partitions/msdos.tj.c 2007-02-01 00:41:57.000000000 +0000
+++ fs/partitions/msdos.tj.1.c 2007-02-01 01:07:31.000000000 +0000
@@ -459,13 +459,13 @@ int check_sane_values(struct partition *
 		if (insane) { /* insanity found; report it */
 			ret = -1; /* error code */
 			printk("\n"); /* start error report on a fresh line */
-			if (insane | 1)
+			if (insane & 1)
 				printk(" partition %d: start (sector %d) beyond end of disk (sector %d)\n",
 					slot, START_SECT(p), (unsigned int) bdev->bd_disk->capacity-1);
-			if (insane | 2)
+			if (insane & 2)
 				printk(" partition %d: end (sector %d) beyond end of disk (sector %d)\n",
 					slot, START_SECT(p)+NR_SECTS(p)-1, (unsigned int) bdev->bd_disk->capacity-1);
-			if (insane | 4)
+			if (insane & 4)
 				printk(" partition %d: insane extended contents\n", slot);
 		}
 	}
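
For readers following the OR-versus-AND fix, here is a minimal user-space sketch of why the original tests always fired. It is not the kernel code: the flag names, sanity_flags(), reports_with_or() and reports_with_and() are hypothetical helpers invented for illustration, and the only thing taken from the patch is the meaning of the bit values 1, 2 and 4.

```c
#include <assert.h>

/* Hypothetical names; the patch itself uses the bare values 1, 2 and 4. */
enum {
	INSANE_START    = 1,	/* start sector beyond end of disk */
	INSANE_END      = 2,	/* end sector beyond end of disk */
	INSANE_EXTENDED = 4	/* insane extended contents */
};

/* Hypothetical helper: build the flag word for one partition entry. */
static unsigned int sanity_flags(unsigned long long start,
				 unsigned long long nr_sects,
				 unsigned long long capacity)
{
	unsigned int insane = 0;
	unsigned long long last;

	assert(capacity > 0);
	last = capacity - 1;	/* last valid sector on the disk */

	if (start > last)
		insane |= INSANE_START;
	if (nr_sects && start + nr_sects - 1 > last)
		insane |= INSANE_END;
	return insane;
}

/* Buggy form: (insane | flag) is non-zero for any value of insane. */
static int reports_with_or(unsigned int insane)
{
	int n = 0;

	if (insane | INSANE_START)
		n++;
	if (insane | INSANE_END)
		n++;
	if (insane | INSANE_EXTENDED)
		n++;
	return n;	/* always 3 */
}

/* Fixed form: (insane & flag) is non-zero only when that bit is set. */
static int reports_with_and(unsigned int insane)
{
	int n = 0;

	if (insane & INSANE_START)
		n++;
	if (insane & INSANE_END)
		n++;
	if (insane & INSANE_EXTENDED)
		n++;
	return n;	/* number of flags actually set */
}
```

With a 1000-sector disk, sanity_flags(900, 200, 1000) sets only the "end beyond disk" bit. reports_with_and() then counts one message, while reports_with_or() counts three.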





2007-02-01 04:22:17

by Robert Hancock

Subject: Re: [PATCH 1/1] filesystem: Disk Errors at boot-time caused by probe of partitions

TJ wrote:
> From: TJ <[email protected]>
>
> Applies to: up to and including 2.6.20-rc7
>
> This rare but critical bug has the potential to cause a hardware failure on disk drives by
> allowing the system to repeatedly attempt to seek to sectors beyond the end of the physical
> disk, causing sustained 'head banging'.

It seems pretty unlikely that telling a hard drive to seek past its
capacity would cause it to damage itself, that would be some pretty
moronic firmware. Though, you never know - if it's true, let me know
what kind of drives these are, so I know never to buy one :-)

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-02-01 11:28:10

by TJ

Subject: Re: [PATCH 1/1] filesystem: Disk Errors at boot-time caused by probe of partitions

On Wed, 2007-01-31 at 22:20 -0600, Robert Hancock wrote:

> It seems pretty unlikely that telling a hard drive to seek past its
> capacity would cause it to damage itself, that would be some pretty
> moronic firmware. Though, you never know - if it's true, let me know
> what kind of drives these are, so I know never to buy one :-)
>

Hi Robert,

Yes, I'd have expected that too. I'm particularly surprised the
drive-logic doesn't refuse to move the heads and just report the failure
based on the LBA value.

These are Maxtor drives, but it's also happened with IBM drives in
another system with a similar configuration, as a test.

During the bug-hunt (which started end of December) I built about 100
kernels trying to track down the root cause, and therefore went through
many reboot cycles.

Several times the drives were 'knocked out' and would refuse to
initialise during POST. The only remedy was to leave the system powered
down for a while - the rest seemed to do them good.

The difficulty I had in debugging was that the errors are generated on the
work-queue and interrupt handling side, and it was extremely difficult
to pin-point the root cause because the symptoms (drive seek errors)
occur well after the partition tables have been scanned, and also repeat
themselves several times during system start-up.

TJ.

2007-02-01 19:12:55

by Phillip Susi

Subject: Re: [PATCH 1/1] filesystem: Disk Errors at boot-time caused by probe of partitions

I think you may be barking up the wrong tree because IIRC, these
requests for data beyond the end of the disk never make it to the drive;
the kernel fails them in the block layer. There was a patch a while
back to fix the partition detection code to NOT request sectors beyond
the end of the disk, but I don't think it was ever merged.

In any case, if you are sure the requests are making it to the drive and
causing damage, I hope you give Maxtor and IBM a sound thrashing for
using retarded firmware.
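
A minimal sketch of the kind of bounds check described above, where an out-of-range request is failed in software before anything is sent to the drive. This is an illustration under stated assumptions, not the kernel's block-layer code: struct toy_disk and toy_check_request() are hypothetical names, and returning -EIO is an assumed convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block device; not the kernel's structure. */
struct toy_disk {
	uint64_t capacity;	/* total number of sectors */
};

/*
 * Return 0 if the range [sector, sector + nr_sectors) lies entirely on
 * the disk, or -EIO if any part of it is beyond the last sector. The
 * comparison is arranged so that sector + nr_sectors cannot overflow.
 */
static int toy_check_request(const struct toy_disk *disk,
			     uint64_t sector, uint32_t nr_sectors)
{
	assert(disk != NULL);

	if (nr_sectors == 0)
		return 0;	/* empty request: nothing to check */
	if (sector >= disk->capacity)
		return -EIO;	/* starts beyond end of disk */
	if (nr_sectors > disk->capacity - sector)
		return -EIO;	/* runs off the end of the disk */
	return 0;
}
```

For a 1000-sector disk, an 8-sector read at sector 992 passes, while the same read at sector 993 is rejected without ever reaching the hardware.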

TJ wrote:
> Hi Robert,
>
> Yes, I'd have expected that too. I'm particularly surprised the
> drive-logic doesn't refuse to move the heads and just report the failure
> based on the LBA value.
>
> These are Maxtor drives, but it's also happened with IBM drives in
> another system with a similar configuration, as a test.
>
> During the bug-hunt (which started end of December) I built about 100
> kernels trying to track down the root cause, and therefore went through
> many reboot cycles.
>
> Several times the drives were 'knocked out' and would refuse to
> initialise during POST. The only remedy was to leave the system powered
> down for a while - the rest seemed to do them good.
>
> The difficulty I had in debugging was that the errors are generated on the
> work-queue and interrupt handling side, and it was extremely difficult
> to pin-point the root cause because the symptoms (drive seek errors)
> occur well after the partition tables have been scanned, and also repeat
> themselves several times during system start-up.

2007-02-01 20:32:23

by Alan

Subject: Re: [PATCH 1/1] filesystem: Disk Errors at boot-time caused by probe of partitions

On Thu, 01 Feb 2007 14:12:42 -0500
Phillip Susi <[email protected]> wrote:

> I think you may be barking up the wrong tree because IIRC, these
> requests for data beyond the end of the disk never make it to the drive;
> the kernel fails them in the block layer. There was a patch a while
> back to fix the partition detection code to NOT request sectors beyond
> the end of the disk, but I don't think it was ever merged.

ide-scsi and libata support this correctly. Ingo Molnar also ported the
recent CD changes related to size handling. None of these are relevant to
hard disks.

> In any case, if you are sure the requests are making it to the drive and
> causing damage, I hope you give Maxtor and IBM a sound thrashing for
> using retarded firmware.

All the IBM and Maxtor drives I've played with correctly error when a
sector isn't available. It's pretty implausible they would do otherwise,
as the "sector" is a logical mapping onto the drive's internal file
system these days.

Fed a wrong sector any drive I know of will report that the sector cannot
be found.