2004-11-07 21:54:40

by Matt Domsch

Subject: Re: EFI partition code broken..

On Sun, Nov 07, 2004 at 11:30:18AM -0800, Linus Torvalds wrote:
> There's a few reports of various USB storage devices locking up. The last
> one was an iPod, but there's apparently others too.
>
> The reason? They are unhappy if you access them past the end, and they
> seem to have problems reporting their true size.
>
> And the EFI partitioning code will happily just blindly try to access the
> last sector, because that's where the EFI partition is. Boom. Immediately
> dead iPod/whatever.

Another train of thought, and copying gregkh for inspiration. Is there
any way to know which devices lie about their size, and fix that with
quirk code in the device discovery routines? While I can fix
fs/partitions/efi.c to not always do I/O to the end of the
purported size of the device, userspace and 'dd' can't. If we could
quirk down the reported size for devices known to lie, then everything
which uses that value wouldn't have to have its own rules for such.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com


2004-11-07 22:41:10

by Linus Torvalds

Subject: Re: EFI partition code broken..



On Sun, 7 Nov 2004, Matt Domsch wrote:
>
> Another train of thought, and copying gregkh for inspiration. Is there
> any way to know which devices lie about their size, and fix that with
> quirk code in the device discovery routines?

The USB layer actually has some quirks like this, but I think that's a
bug waiting to happen.

The thing is, if you start doing quirks, you _are_ screwed in the end.
Don't do it. It's not just a maintenance nightmare, it's fundamentally
wrong. It fundamentally takes the approach of "you have to have a kernel
that is two years newer than the hardware you have", which is an approach
that I just find incredibly broken.

Quirks work slightly better in practice for stuff that seldom changes,
and/or where we have fairly good vendor support. So CPUs, for example,
are largely ok with quirks (aka "errata"). But random regular devices?
Please no.

Side note: the USB storage stuff has historically had tons of quirks,
largely because the SCSI layer used to do crap-all to try to be sane. The
SCSI layer historically only cared about high-end devices, and then the
USB storage model clashed pretty hard with the old SCSI layer belief that
standards are something that people follow etc.

Happily, most of those quirks are hopefully stale these days, because the
SCSI layer has been slowly converted to the idea that you don't use every
documented feature under the sun just because it exists.

So I'm trying to make for _fewer_ quirks rather than more of them.

Linus

2004-11-07 23:47:04

by Andries Brouwer

Subject: Re: EFI partition code broken..

On Sun, Nov 07, 2004 at 03:52:04PM -0600, Matt Domsch wrote:
> On Sun, Nov 07, 2004 at 11:30:18AM -0800, Linus Torvalds wrote:
> > There's a few reports of various USB storage devices locking up. The last
> > one was an iPod, but there's apparently others too.
> >
> > The reason? They are unhappy if you access them past the end, and they
> > seem to have problems reporting their true size.
> >
> > And the EFI partitioning code will happily just blindly try to access the
> > last sector, because that's where the EFI partition is. Boom. Immediately
> > dead iPod/whatever.
>
> Another train of thought, and copying gregkh for inspiration. Is there
> any way to know which devices lie about their size, and fix that with
> quirk code in the device discovery routines? While I can fix
> fs/partitions/efi.c to not always do I/O to the end of the
> purported size of the device, userspace and 'dd' can't. If we could
> quirk down the reported size for devices known to lie, then everything
> which uses that value wouldn't have to have its own rules for such.

You see, Linux does automatic partition reading - nothing the user
can do about that - and goes Boom. One has to be more careful with
things that happen fully automatically than with things that happen
from user space. Perhaps the user learns not to do certain things.

The reason things go wrong is confusion about the result of asking
for the capacity. The SCSI way is to report the highest available address,
so that one has to add 1 to get the capacity. The ATA way is to give the
capacity. Some USB devices (that should use the SCSI way) get it wrong,
and thus give an off-by-1 answer, making the device one sector too large.
I introduced the US_FL_FIX_CAPACITY flag for that (in unusual_devs.h),
so once it is known that a device is bad it can be added to the list
with quirks. But it is better to be careful, and only do I/O to the last
sector when the user really asks for that.

Andries
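
The off-by-one described in the last message can be sketched in a few
lines of C. This is only an illustration, not the usb-storage or sd
code: the function name sector_count() and the flag FIX_CAPACITY are
hypothetical, with the flag loosely modeled on the idea behind
US_FL_FIX_CAPACITY. It assumes the device answers with a 32-bit value,
which a conforming SCSI device fills with the highest addressable
block and a misbehaving one fills with the block count itself.

```c
#include <stdint.h>

/* Hypothetical per-device flag for this sketch: set when the device is
 * known to report its sector count where SCSI expects the address of
 * the last sector. */
#define FIX_CAPACITY 0x1u

/*
 * Turn the value a device reported into a usable sector count.
 *
 * A conforming device reports the highest valid block address, so the
 * count is that value plus one. A device flagged with FIX_CAPACITY has
 * already reported the count, and adding one would invent a sector
 * that does not exist.
 */
static uint64_t sector_count(uint32_t reported, unsigned int flags)
{
    if (flags & FIX_CAPACITY)
        return reported;

    /* Widen before adding so 0xFFFFFFFF does not wrap to zero. */
    return (uint64_t)reported + 1;
}
```

For a 1000-sector device, a conforming unit reports 999 and
sector_count(999, 0) gives 1000. A misbehaving unit reports 1000;
without the flag the result is 1001, and any read of the "last" sector
(number 1000) falls past the real end of the device. With the flag set,
sector_count(1000, FIX_CAPACITY) gives 1000 again.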