2009-03-11 01:50:54

by Norman Diamond

Subject: Off-by-one in both LIBATA and IDE drivers

It looks like both LIBATA and the old IDE drivers have an
off-by-one error in deciding whether to use READ SECTOR(S)
instead of READ SECTOR(S) EXT.

Sorry, the kernels I tested are all around version 2.6.24.
Knoppix 6.0.1 has a newer kernel but doesn't have hdparm,
so I haven't tested it yet. Also this adds onto a thread
that I started in linux-kernel yesterday, but a lot of
information here is new.

LBA number 0x0fffffff fits in 28 bits but READ SECTOR(S)
can't reliably handle it. In my testing the largest LBA
number that could be read reliably using READ SECTOR(S) is
0x0ffffffe.
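
To make the boundary concrete: 0x0fffffff is 2^28 - 1, the largest value a 28-bit LBA field can hold. The sketch below is illustrative Python, not kernel code. Both function names are hypothetical, and the 256-sector cap per LBA28 command is taken from the ATA command set rather than from this thread. It contrasts a check that still accepts a request ending on sector 0x0fffffff with a stricter one that sends such a request to the EXT (LBA48) command instead.

```python
LBA28_LIMIT = 1 << 28  # 268435456; a 28-bit LBA field holds 0 .. 0x0fffffff

def lba28_ok_lenient(block, n_block):
    """Hypothetical check: accept if the LAST sector still fits in 28 bits."""
    return (block + n_block - 1) < LBA28_LIMIT and n_block <= 256

def lba28_ok_strict(block, n_block):
    """Hypothetical check: also keep sector 0x0fffffff out of LBA28 requests."""
    return (block + n_block) < LBA28_LIMIT and n_block <= 256

# Single-sector reads just below, at, and just above the boundary
for sector in (268435454, 268435455, 268435456):
    print(hex(sector), lba28_ok_lenient(sector, 1), lba28_ok_strict(sector, 1))
```

The two checks disagree only for requests whose last sector is exactly 0x0fffffff, which is the one sector reported as failing here.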

It looks like a Western Digital drive could handle sector
number 0x0fffffff in LBA28, but Toshiba and Seagate
couldn't.

This can be reproduced using the commands
hdparm --read-sector 268435455 /dev/sda
or
hdparm --read-sector 268435455 /dev/hda
depending on which driver is involved.

Sectors 268435454 and 268435456 have no problem. I didn't
take a close look but a pretty obvious guess is that LBA28
worked for the preceding one and LBA48 worked for the
following one.

If LIBATA is in charge, dmesg shows that the command was
0x20 (READ SECTOR) and the sector number appears to be
correct, but the error bit is IDNF.

If IDE is in charge, dmesg shows that the command was 0x20
(READ SECTOR) and the sector number is shown as 16777215
instead of 268435455. This looks like an error in the
error message to compound the error in the main code.
I'll guess the command had the additional nibble correct,
just that LBA28 can't handle it.

Sometimes the dd command can read that sector but
sometimes not. If dd fails then dmesg shows several error
messages with sector numbers, 0x0ffffff8, 268435448, and
33554431. Someone decides 8 sectors should be read so the
start sector makes sense in both hex and decimal, but the
error sector is missing three bits instead of missing an
entire nibble. And here, the command code is 0xc8, READ
DMA, LBA28 again. This one needs changing to READ DMA
EXT.
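
The odd sector numbers in those messages are consistent with the high bits of the LBA being dropped when the error is reported. A small Python check of the arithmetic follows; it only verifies the numbers quoted above and assumes nothing about where in the driver the bits go missing.

```python
full = 0x0fffffff            # 268435455, the sector that fails

ide_msg = 16777215           # sector printed by the old IDE driver
dd_msg = 33554431            # error sector printed after the failed dd

# Old IDE message: only the low 24 bits survive (the top nibble is lost)
ide_matches = ide_msg == full & ((1 << 24) - 1)

# dd failure message: only the low 25 bits survive
dd_matches = dd_msg == full & ((1 << 25) - 1)

# Number of set bits missing from each reported value
ide_missing = bin(full).count("1") - bin(ide_msg).count("1")
dd_missing = bin(full).count("1") - bin(dd_msg).count("1")

print(hex(ide_msg), ide_matches, ide_missing)
print(hex(dd_msg), dd_matches, dd_missing)
```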

All drives in my test were SATA, and the chipsets were
ICH7 or ICH7M, but BIOSes differed in setting ICH7 to
expose IDE or SATA interfaces. In the case with ICH7 and
full SATA operation, the Western Digital drive worked but
Seagate failed.


--------------------------------------
Power up the Internet with Yahoo! Toolbar.
http://pr.mail.yahoo.co.jp/toolbar/


2009-03-11 03:14:45

by Jim Paris

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Norman Diamond wrote:
> It looks like both LIBATA and the old IDE drivers have an
> off-by-one error in deciding whether to use READ SECTOR(S)
> instead of READ SECTOR(S) EXT.

Hi,

This was fixed here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119

-jim

2009-03-11 03:28:22

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Jim Paris wrote:
> Norman Diamond wrote:
>>
>> It looks like both LIBATA and the old IDE drivers
>> have an off-by-one error in deciding whether to use
>> READ SECTOR(S) instead of READ SECTOR(S) EXT.
>
> This was fixed here:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119

Thank you.

I will see if I can port that fix to 2.6.24.3, because
Slax 6.0.3 and kernel 2.6.24.3 avoided some other bugs of
later kernels.



2009-03-11 08:26:51

by Alan

Subject: Re: Off-by-one in both LIBATA and IDE drivers

On Wed, 11 Mar 2009 10:50:39 +0900 (JST)
Norman Diamond wrote:

> It looks like both LIBATA and the old IDE drivers have an
> off-by-one error in deciding whether to use READ SECTOR(S)
> instead of READ SECTOR(S) EXT.

This was fixed some time ago, you need a newer kernel.

2009-03-11 08:38:59

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Alan Cox wrote:
> Norman Diamond wrote:
>
>> It looks like both LIBATA and the old IDE drivers
>> have an off-by-one error in deciding whether to use
>> READ SECTOR(S) instead of READ SECTOR(S) EXT.
>
> This was fixed some time ago, you need a newer
> kernel.

Well, either that or I need an older kernel. It was also
fixed in 2.6.20, for which I think there was a ready-made
Slax distribution.

On another topic, trying 2.6.20 in whatever Slax
distribution it was, an Intel ICH7M had DMA enabled on
both /dev/hda and /dev/hdc. I understand that the change
which was made shortly after that is considered to be by
design not a bug. In 2.6.20 I didn't even have to type a
"combined_mode" parameter, it just worked. I understand
that the addition and subsequent deletion of the
"combined_mode" parameter are considered to be by design
not bugs. But it is not at all pleasant that my /dev/hda
runs at 1.3 megabytes per second in 2.6.24.3 and later,
when it used to run at 45 megabytes per second in 2.6.20.
Yeah I know libata is supposed to solve all this stuff.
Removed some bugs and added others.

I can't go newer than 2.6.24.3 until I find a version
where TASKFILEs start working again. There might not be
one. Yeah I know libata is supposed to solve all this
stuff. I wonder how much time I'll need and how many
varieties of hardware I'll have to buy to see if it's
really fixed.



2009-03-11 21:04:28

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Sergei Shtylyov wrote:
> Norman Diamond wrote:
>> [attribution stolen:]
>>> [Norman Diamond:]
>>>> It looks like both LIBATA and the old IDE drivers
>>>> have an off-by-one error in deciding whether to use
>>>> READ SECTOR(S) instead of READ SECTOR(S) EXT.
>>>>
>>> This was fixed here:
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119
>>
>> Thank you.
>> I will see if I can port that fix to 2.6.24.3, because
>> Slax 6.0.3 and kernel 2.6.24.3 avoided some other bugs of
>> later kernels.
>
> Note that this doesn't fix it for the IDE core.

That helps explain my subsequent testing. I'm still having problems.

> The ide-disk driver however seems to use LBA48 regardless of the sector
> address.

But that doesn't explain my subsequent testing. That behaviour would cause
it to work in one of my present cases, but it still fails.

> But still there are incorrect capacity checks, and it can fail with
> drivers not supporting LBA48 with DMA...

Yeah that could be part of it. 2.6.20 turned on DMA more often than later
kernels, so I'm now thinking of reverting to 2.6.20. I've heard (but
couldn't test personally) that 2.6.20 gave abysmal performance to users of
AMD chipsets. Since Intel users outnumber them, I'm inclined to revert to a
version that worked for Intel even though AMD suffers.


2009-03-12 00:11:23

by Robert Hancock

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Norman Diamond wrote:
> Alan Cox wrote:
>> Norman Diamond wrote:
>>
>>> It looks like both LIBATA and the old IDE drivers
>>> have an off-by-one error in deciding whether to use
>>> READ SECTOR(S) instead of READ SECTOR(S) EXT.
>> This was fixed some time ago, you need a newer
>> kernel.
>
> Well, either that or I need an older kernel. It was also
> fixed in 2.6.20, for which I think there was a ready-made
> Slax distribution.
>
> On another topic, trying 2.6.20 in whatever Slax
> distribution it was, an Intel ICH7M had DMA enabled on
> both /dev/hda and /dev/hdc. I understand that the change
> which was made shortly after that is considered to be by
> design not a bug. In 2.6.20 I didn't even have to type a
> "combined_mode" parameter, it just worked. I understand
> that the addition and subsequent deletion of the
> "combined_mode" parameter are considered to be by design
> not bugs. But it is not at all pleasant that my /dev/hda
> runs at 1.3 megabytes per second in 2.6.24.3 and later,
> when it used to run at 45 megabytes per second in 2.6.20.
> Yeah I know libata is supposed to solve all this stuff.
> Removed some bugs and added others.

You really shouldn't use the IDE drivers with SATA devices, if that's
what you're talking about as far as the previous behavior. They were
really never designed for it.

>
> I can't go newer than 2.6.24.3 until I find a version
> where TASKFILEs start working again. There might not be
> one. Yeah I know libata is supposed to solve all this
> stuff. I wonder how much time I'll need and how many
> varieties of hardware I'll have to buy to see if it's
> really fixed.

Realistically, although some people still work on it, testing coverage
of the old IDE drivers is not that great these days, since most
distributions no longer use it. The crusty, byzantine IDE code base
doesn't exactly make it easy for the inexperienced to debug problems,
either.

2009-03-12 02:28:45

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Robert Hancock wrote:
> Norman Diamond wrote:
>> Alan Cox wrote:
>>> Norman Diamond wrote:
>>>
>>>> It looks like both LIBATA and the old IDE drivers
>>>> have an off-by-one error in deciding whether to
>>>> use READ SECTOR(S) instead of READ SECTOR(S) EXT.
>>>
>>> This was fixed some time ago, you need a newer
>>> kernel.
>>
>> Well, either that or I need an older kernel. It
>> was also fixed in 2.6.20, for which I think there
>> was a ready-made Slax distribution.
>>
>> On another topic, trying 2.6.20 in whatever Slax
>> distribution it was, an Intel ICH7M had DMA
>> enabled on both /dev/hda and /dev/hdc. I
>> understand that [subsequent changes losing DMA]
>> is considered to be by design not a bug.
>
> You really shouldn't use the IDE drivers with SATA
> devices, if that's what you're talking about as far
> as the previous behavior. They were really never
> designed for it.

There's next to nothing that I can do about it.
If the BIOS sets the Intel chip to present an ATA
interface then the IDE drivers take control early in
the boot process. If the BIOS sets the Intel chip to
present a SATA interface then LIBATA takes control
early in the boot process.

I'm considering constructing a boot menu where the
default command line has hda=noprobe hdb=noprobe
hdc=noprobe hdd=noprobe, and an alternate boot option
omits those. Even if this will be reliable enough, it
won't be easy to explain to customers (if you don't
believe that then read some tech support stories).

> Realistically, although some people still work on
> it, testing coverage of the old IDE drivers is not
> that great these days, since most distributions no
> longer use it.

Even Knoppix 6.0.1, whose kernel isn't too antique
yet, assigned /dev/hda and /dev/hdc on my Dell D820
with ICH7M.


2009-03-12 04:26:25

by Robert Hancock

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Norman Diamond wrote:
> There's next to nothing that I can do about it.
> If the BIOS sets the Intel chip to present an ATA
> interface then the IDE drivers take control early in
> the boot process. If the BIOS sets the Intel chip to
> present a SATA interface then LIBATA takes control
> early in the boot process.
>
> I'm considering constructing a boot menu where the
> default command line has hda=noprobe hdb=noprobe
> hdc=noprobe hdd=noprobe, and an alternate boot option
> omits those. Even if this will be reliable enough, it
> won't be easy to explain to customers (if you don't
> believe that then read some tech support stories).

I think that at some point the IDE drivers were updated to be less
aggressive about taking control of anything that looked like an IDE
controller, but I'm not certain. These kinds of problems are more or less
inevitable when you configure two drivers that will attach to the same
device; it's hard to control which one will attach. (It's especially bad
if two drivers will attach to different parts of the same controller,
which is the combined mode fiasco we had for a while.)

Newer distributions like Fedora generally set CONFIG_IDE=n
entirely and avoid the problem.

>
>> Realistically, although some people still work on
>> it, testing coverage of the old IDE drivers is not
>> that great these days, since most distributions no
>> longer use it.
>
> Even Knoppix 6.0.1, whose kernel isn't too antique
> yet, assigned /dev/hda and /dev/hdc on my Dell D820
> with ICH7M.

2009-03-12 11:21:32

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Jim Paris wrote:
> Norman Diamond wrote:
>>
>> It looks like both LIBATA and the old IDE drivers have an off-by-one
>> error in deciding whether to use READ SECTOR(S) instead of READ SECTOR(S)
>> EXT.
>
> This was fixed here:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119

I'm still having trouble after applying the same patch to 2.6.24.3 (applying
it three times in order to build Slax). But now I wonder if it's no longer
the fault of drivers.

Does hdparm construct its own taskfiles for ATA and SATA in order to produce
an error trying to read sector number 0x0fffffff even after I patched the
kernel?

If dd works then did I adequately patch the kernel?

Meanwhile I think the kernel needs more patches than ata.h.
(1) libata-core.c contains a suspicious expression 1UL << 28.
(2) sata_inic162x.c contains a suspicious expression 1 << 28.


2009-03-12 14:31:05

by Mark Lord

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Norman Diamond wrote:
> Jim Paris wrote:
>> Norman Diamond wrote:
>>>
>>> It looks like both LIBATA and the old IDE drivers have an off-by-one
>>> error in deciding whether to use READ SECTOR(S) instead of READ
>>> SECTOR(S) EXT.
>>
>> This was fixed here:
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119
>>
>
> I'm still having trouble after applying the same patch to 2.6.24.3
> (applying it three times in order to build Slax). But now I wonder if
> it's no longer the fault of drivers.
>
> Does hdparm construct its own taskfiles for ATA and SATA in order to
> produce an error trying to read sector number 0x0fffffff even after I
> patched the kernel?
..

What, *exactly*, do you mean there?
Yes, hdparm constructs its own taskfiles for the --read-sector subcommand.
Are you hitting these errors with the latest hdparm (9.12)?

???

2009-03-12 23:02:47

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Mark Lord wrote:
> Norman Diamond wrote:
>> Jim Paris wrote:
>>> Norman Diamond wrote:
>>>>
>>>> It looks like both LIBATA and the old IDE drivers
>>>> have an off-by-one error in deciding whether to
>>>> use READ SECTOR(S) instead of READ SECTOR(S) EXT.

I don't know for sure now but the old IDE drivers
might be OK on this matter. When hdparm got errors
I thought they were the driver's fault, but now it's
clear it was hdparm's fault.

It still seems doubtful whether LIBATA is fully fixed.

>>> This was fixed here:
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119

(Though that patch doesn't mention two suspicious .c
files that compare sector numbers to expressions
1UL << 28 and 1 << 28.)

>> I'm still having trouble after applying the same
>> patch to 2.6.24.3 (applying it three times in order
>> to build Slax).
>> Does hdparm construct its own taskfiles for ATA and
>> SATA in order to produce an error trying to read
>> sector number 0x0fffffff even after I patched the
>> kernel?
>
> What, *exactly*, do you mean there?

hdparm --read-sector 268435455 /dev/hda
and
hdparm --read-sector 268435455 /dev/sda
still gave errors even after I patched LIBATA in
kernel 2.6.24.3. The exceptions are Western Digital
drives that apparently accept LBA28 instructions on
that sector number.

> Yes, hdparm constructs its own taskfiles for the
> --read-sector subcommand.

Thank you. So I think my build of Slax now will work
except for two obscure chipsets under LIBATA, and will
work for every situation that I'm aware of under the
old IDE drivers (though the old IDE drivers will run
unbearably slowly in some cases).

> Are you hitting these errors with the latest hdparm
> (9.12)?

Sorry, I just checked and it's hdparm 8.6. Anyway
it's a relief to see that I probably have a build
working well enough for present needs, and I just have
to use tools other than that build's version of hdparm
to test it.


2009-03-13 07:41:49

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

LIBATA with the patch to ata.h now handles all sectors on
hard drives that it recognizes.

An example of a hard drive that it recognizes is one that
is attached to an Intel ICH7M chipset when hda=noprobe
hdc=noprobe have been specified in the boot command.

An example of a hard drive that it doesn't recognize is
one that is attached to an Intel PIIX4 chipset when
hda=noprobe hdc=noprobe have been specified in the boot
command.

In either case, the boot parameters persuade the old IDE
drivers not to grab the controllers.

With ICH7M, LIBATA takes over and runs both the hard drive
and DVD at full speed.

With PIIX4, LIBATA initializes. End of story. Slax can't
find its own CD. If I only use hda=noprobe then the old
IDE controller assigns hdc to the CD and Slax finds it,
but the hard drive is still undetected. Behaviour is the
same under VMware as in a genuine old PC.

LIBATA's PIIX drivers are built in along with everything
else. They just seem not to get executed.

What am I missing?


2009-03-13 14:45:28

by Robert Hancock

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Norman Diamond wrote:
> LIBATA with the patch to ata.h now handles all sectors on
> hard drives that it recognizes.
>
> An example of a hard drive that it recognizes is one that
> is attached to an Intel ICH7M chipset when hda=noprobe
> hdc=noprobe have been specified in the boot command.
>
> An example of a hard drive that it doesn't recognize is
> one that is attached to an Intel PIIX4 chipset when
> hda=noprobe hdc=noprobe have been specified in the boot
> command.
>
> In either case, the boot parameters persuade the old IDE
> drivers not to grab the controllers.
>
> With ICH7M, LIBATA takes over and runs both the hard drive
> and DVD at full speed.
>
> With PIIX4, LIBATA initializes. End of story. Slax can't
> find its own CD. If I only use hda=noprobe then the old
> IDE controller assigns hdc to the CD and Slax finds it,
> but the hard drive is still undetected. Behaviour is the
> same under VMware as in a genuine old PC.
>
> LIBATA's PIIX drivers are built in along with everything
> else. They just seem not to get executed.
>
> What am I missing?

I assume that ATA_PIIX is set in the configuration. The lspci -vn and
dmesg output from when it fails to detect would be useful.


2009-03-14 02:12:39

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Robert Hancock wrote:
> Norman Diamond wrote:
>>
>> LIBATA with the patch to ata.h now handles all sectors on
>> hard drives that it recognizes.
>>
>> An example of a hard drive that it recognizes is one that
>> is attached to an Intel ICH7M chipset when hda=noprobe
>> hdc=noprobe have been specified in the boot command.
>>
>> An example of a hard drive that it doesn't recognize is
>> one that is attached to an Intel PIIX4 chipset when
>> hda=noprobe hdc=noprobe have been specified in the boot
>> command.
>>
>> In either case, the boot parameters persuade the old IDE
>> drivers not to grab the controllers.
>>
>> With ICH7M, LIBATA takes over and runs both the hard drive
>> and DVD at full speed.
>>
>> With PIIX4, LIBATA initializes. End of story. Slax can't
>> find its own CD. If I only use hda=noprobe then the old
>> IDE controller assigns hdc to the CD and Slax finds it,
>> but the hard drive is still undetected. Behaviour is the
>> same under VMware as in a genuine old PC.
>>
>> LIBATA's PIIX drivers are built in along with everything
>> else. They just seem not to get executed.
>>
>> What am I missing?
>
> I assume that ATA_PIIX is set in the configuration.. The lspci -vn and
> dmesg output from when it fails to detect would be useful.

Yes, and built in (not even as a module).

Sorry I'm away from the machine, but even near the machine it's a pain to
copy text exactly because it's running under Slax (live CD) with no network
shares. Even when quoting a dump from the TASKFILE breakage in a later
kernel I had to read and type by hand.

I think the PIIX4 devices of VMware server are well known. Besides vendor
8086 and device 7111, VMware's subsystem is coded into LIBATA's PIIX driver.

For the same PIIX4 devices on a real machine the subsystem is different but
they're still vendor 8086 and device 7111.

If I omit hda=noprobe hdb=noprobe hdc=noprobe hdd=noprobe then dmesg shows
the old IDE driver probing properly, assigning hda to the hard drive and hdc
to Slax's CD-ROM. After that LIBATA 3.0 loads but has nothing to do.

If I include the four noprobes (or as another experiment, ide0=noprobe
ide1=noprobe hda=none hdb=none hdc=none hdd=none), dmesg shows something
recognizing that PIIX4 isn't completely native (as always), then the old IDE
drivers obediently ignoring those devices and reasonably proceeding to probe
and find nothing on ide2, ide3, ide4, and ide5. After that LIBATA 3.0 loads
and still does nothing, even though we know what it should be doing.

In comparison, I did the same experiments again on Dell and Lenovo machines
whose BIOSes set ICH7M chipsets to present ATA interfaces, with no option to
present AHCI interfaces. If I omit the noprobes then the old IDE drivers
take over, the hard drive and DVD drive run UDMA to the ICH7M chips, and the
old IDE drivers run PIO to the ICH7M's ATA interfaces on hda and hdc (with
no option to enable use_dma). If I include the noprobes then the old IDE
drivers obediently refrain, and this time LIBATA 3.0 takes over as it
should.

As a workaround I recompiled Slax's kernel 2.6.24.3 and Slax's other stuff,
with the old IDE drivers changed to modules instead of built in, and LIBATA
built in the same as before. Now LIBATA 3.0 takes over as it should, both
for PIIX4 and ICH7M. But I still worry that someone's going to have
chipsets from UMC or ALi or AMD or VIA or SIS or something, where the old
IDE drivers are necessary, and who knows if this is going to work. I don't
have all the machines I need for testing.

On a different tangent, LIBATA's off-by-one error was present in 2.6.20. I
booted that Slax with no modification on a machine with ICH7M, a Toshiba
250GB hard drive got /dev/sda with DMA, and the DVD drive got /dev/hdc and I
forgot to check if it got DMA. Three dd commands should have all failed:
dd if=/dev/sda of=/dev/null bs=512 skip=268435455 count=1
dd if=/dev/sda of=/dev/null bs=512 skip=268435448 count=8
dd if=/dev/sda of=/dev/null bs=512 skip=268435440 count=16
Somehow the third one worked. Two error messages and one silence, 100%
repro. OK, I'm not the only one pulling hair out over this stuff. I
noticed LIBATA's blacklists for broken firmware in various devices.
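
For reference, the three dd commands above all cover ranges that end on the same sector. A short Python sketch of the arithmetic, assuming 512-byte sectors as in the bs=512 arguments; it makes no claim about how the block layer actually splits or merges these reads into ATA commands.

```python
BOUNDARY = 0x0fffffff  # 268435455

# (skip, count) pairs taken from the three dd commands
requests = [(268435455, 1), (268435448, 8), (268435440, 16)]

for skip, count in requests:
    last = skip + count - 1          # last sector the command asks for
    print(f"skip={skip} count={count} last={last} ({last:#010x})")

# Collect the last sector of every request for comparison with the boundary
last_sectors = [skip + count - 1 for skip, count in requests]
```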


2009-03-14 02:15:50

by Robert Hancock

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Norman Diamond wrote:
>> I assume that ATA_PIIX is set in the configuration.. The lspci -vn and
>> dmesg output from when it fails to detect would be useful.
>
> Yes, and built in (not even as a module).
>
> Sorry I'm away from the machine, but even near the machine it's a pain to
> copy text exactly because it's running under Slax (live CD) with no network
> shares. Even when quoting a dump from the TASKFILE breakage in a later
> kernel I had to read and type by hand.
>
> I think the PIIX4 devices of VMware server are well known. Besides vendor
> 8086 and device 7111, VMware's subsystem is coded into LIBATA's PIIX
> driver.
>
> For the same PIIX4 devices on a real machine the subsystem is different but
> they're still vendor 8086 and device 7111.
>
> If I omit hda=noprobe hdb=noprobe hdc=noprobe hdd=noprobe then dmesg shows
> the old IDE driver probing properly, assigning hda to the hard drive and
> hdc to Slax's CD-ROM. After that LIBATA 3.0 loads but has nothing to do.
>
> If I include the four noprobes (or as another experiment, ide0=noprobe
> ide1=noprobe hda=none hdb=none hdc=none hdd=none), dmesg shows something
> recognizing that PIIX4 isn't completely native (as always), then the old
> IDE drivers obediently ignoring those devices and reasonably proceeding to
> probe and find nothing on ide2, ide3, ide4, and ide5. After that LIBATA 3.0
> loads and still does nothing, even though we know what it should be doing.

ata_piix should definitely attach to 8086/7111 regardless of the
subsystem IDs. I can't really explain why it's not. You should at least
be seeing a message about the ata_piix version, if not then somehow the
driver probe function isn't even being called.

>
> In comparison, I did the same experiments again on Dell and Lenovo machines
> whose BIOSes set ICH7M chipsets to present ATA interfaces, with no option
> to present AHCI interfaces. If I omit the noprobes then the old IDE drivers
> take over, the hard drive and DVD drive run UDMA to the ICH7M chips, and
> the old IDE drivers run PIO to the ICH7M's ATA interfaces on hda and hdc
> (with no option to enable use_dma). If I include the noprobes then the old
> IDE drivers obediently refrain, and this time LIBATA 3.0 takes over as it
> should.
>
> As a workaround I recompiled Slax's kernel 2.6.24.3 and Slax's other stuff,
> with the old IDE drivers changed to modules instead of built in, and LIBATA
> built in the same as before. Now LIBATA 3.0 takes over as it should, both
> for PIIX4 and ICH7M. But I still worry that someone's going to have
> chipsets from UMC or ALi or AMD or VIA or SIS or something, where the old
> IDE drivers are necessary, and who knows if this is going to work. I don't
> have all the machines I need for testing.

The old IDE drivers shouldn't be necessary on any of those, at least not
in current kernels. I don't know what the state of all of those libata
drivers in 2.6.24 was.

>
> On a different tangent, LIBATA's off-by-one error was present in 2.6.20. I
> booted that Slax with no modification on a machine with ICH7M, a Toshiba
> 250GB hard drive got /dev/sda with DMA, and the DVD drive got /dev/hdc and
> I forgot to check if it got DMA. Three dd commands should have all failed:
> dd if=/dev/sda of=/dev/null bs=512 skip=268435455 count=1
> dd if=/dev/sda of=/dev/null bs=512 skip=268435448 count=8
> dd if=/dev/sda of=/dev/null bs=512 skip=268435440 count=16
> Somehow the third one worked. Two error messages and one silence, 100%
> repro. OK, I'm not the only one pulling hair out over this stuff. I
> noticed LIBATA's blacklists for broken firmware in various devices.

2009-03-14 08:46:19

by Alan

Subject: Re: Off-by-one in both LIBATA and IDE drivers

> I think the PIIX4 devices of VMware server are well known. Besides vendor
> 8086 and device 7111, VMware's subsystem is coded into LIBATA's PIIX driver.

That is a known working case, so it's something about your setup or build.

You might want to try building with CONFIG_IDE=n and CONFIG_ATA +
ATA_PIIX + AHCI = y to double check your config.

2009-03-14 08:47:55

by Alan

Subject: Re: Off-by-one in both LIBATA and IDE drivers

> ata_piix should definitely attach to 8086/7111 regardless of the
> subsystem IDs. I can't really explain why it's not. You should at least
> be seeing a message about the ata_piix version, if not then somehow the
> driver probe function isn't even being called..

Because on at least some kernels

hda=noprobe ... etc

does not stop the old IDE layer grabbing the PCI device and continuing to
claim ownership (which means ATA cannot attach to it). Arguably this is
correct behaviour as the IDE layer was asked to ignore the devices *NOT*
to ignore the controller.

Alan
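
A toy model of the ownership behaviour described in this message, written in Python. Nothing here is kernel API: the class, the function, and the "old-ide-piix" driver name are hypothetical. The only assumption is the one stated above, namely that a PCI device has at most one bound driver and that whichever driver claims it first keeps it.

```python
class PciDevice:
    def __init__(self, vendor, device):
        self.vendor, self.device = vendor, device
        self.owner = None  # name of the driver currently bound, if any

def try_bind(dev, driver_name, id_table):
    """Bind driver_name to dev if the IDs match and nobody owns it yet."""
    if dev.owner is not None:
        return False                      # already claimed by another driver
    if (dev.vendor, dev.device) not in id_table:
        return False                      # this driver does not handle the device
    dev.owner = driver_name
    return True

piix4 = PciDevice(0x8086, 0x7111)
ids = {(0x8086, 0x7111)}

# The built-in old IDE driver gets there first and claims the controller,
# even though hda=noprobe etc. told it to skip the drives behind it.
ide_bound = try_bind(piix4, "old-ide-piix", ids)
ata_bound = try_bind(piix4, "ata_piix", ids)
print(piix4.owner, ide_bound, ata_bound)
```

In this model, reversing the order of the two try_bind calls lets ata_piix win, which matches the workaround of building the old IDE drivers as modules so that they load after the built-in LIBATA drivers.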

2009-03-14 09:35:01

by Norman Diamond

Subject: Re: Off-by-one in both LIBATA and IDE drivers

Alan Cox wrote:
> [attribution stolen:]
>> ata_piix should definitely attach to 8086/7111 regardless of the
>> subsystem IDs. I can't really explain why it's not. You should at least
>> be seeing a message about the ata_piix version, if not then somehow the
>> driver probe function isn't even being called..
>
> Because on at least some kernels
> hda=noprobe ... etc
> does not stop the old IDE layer grabbing the PCI device and continuing to
> claim ownership (which means ATA cannot attach to it).

That could be part of the explanation. dmesg did include the usual
statements about PIIX4 not being 100% native and irqs would be probed later
(though of course they weren't probed later).

> Arguably this is correct behaviour as the IDE layer was asked to ignore
> the devices *NOT* to ignore the controller.

That doesn't explain why ide0=noprobe ide1=noprobe hda=none hdb=none
hdc=none hdd=none still didn't stop the old IDE layer from grabbing
something. ATA didn't report whether there was something it couldn't grab
onto, all it did was report loading and that was the end of it.


Alan Cox also wrote:
> [Norman Diamond:]
>> I think the PIIX4 devices of VMware server are well known. Besides
>> vendor 8086 and device 7111, VMware's subsystem is coded into LIBATA's
>> PIIX driver.
>
> That is a known working case so its something about your setup or build.

I doubt it. I think your other message explained part of the reason, though
something is still missing.

> You might want to try building with CONFIG_IDE=n and CONFIG_ATA + ATA_PIIX
> + AHCI = y to double check your config.

CONFIG_ATA + ATA_PIIX + AHCI were always y all along. An experiment
changing old IDE from y to m got PIIX4 working, but I'm still nervous about
lots of cases where I don't have machines with other old IDE chipsets to
test.


2009-03-14 10:48:16

by Alan

Subject: Re: Off-by-one in both LIBATA and IDE drivers

> That doesn't explain why ide0=noprobe ide1=noprobe hda=none hdb=none
> hdc=none hdd=none still didn't stop the old IDE layer from grabbing
> something. ATA didn't report whether there was something it couldn't grab
> onto, all it did was report loading and that was the end of it.

It doesn't report that case because that case is considered quite normal
and in addition the probing is handled by the PCI core so it would have
to make a real effort to do so.

As to why ide0=noprobe doesn't work, you'd have to ask the IDE folks, but
it used to be the case that it didn't deal with PCI devices, just legacy