Hi Alan,
In 2.6.23 and previous, CD writing works fine on my system. I'm using
ata_piix on:
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA
IDE Controller (rev 01)
When I'm running CD writing utilities, I sometimes see this message in
the kernel logs:
ata2.00: 66 bytes trailing data
Things do work fine though.
With 2.6.24-rc1, I can't write CDs. Instead of the above message, I get:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data 10 in
res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2: EH complete
and the software acts oddly - for example Brasero tells me that the
blank CD I have inserted has a capacity of 17,179,869,184gb and it
doesn't let me write CDs as it says the medium is not writable.
git bisect lead me to commit 2db78dd302d26d242d3e8e5c4c5024b6c3ea93c2 as
the culprit.
Author: Alan Cox <[email protected]>
Date: Tue Oct 2 13:53:04 2007 -0700
libata_scsi: Fix ATAPI transfer lengths
Some controller variants snoop the ATAPI length value for Packet
transfers to do state machine and FIFO management. Thus we want to
set it properly, even for cases where it is otherwise meaningless.
Any ideas?
Thanks!
Daniel
On Tue, 30 Oct 2007 15:14:39 +0000
Daniel Drake <[email protected]> wrote:
> Hi Alan,
>
> In 2.6.23 and previous, CD writing works fine on my system. I'm using
> ata_piix on:
>
> 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA
> IDE Controller (rev 01)
>
> When I'm running CD writing utilities, I sometimes see this message in
> the kernel logs:
> ata2.00: 66 bytes trailing data
> Things do work fine though.
By luck in part I suspect. That shouldn't be happening and indicates
something is very wrong. With the fifo buffers set right that may well
get worse.
> git bisect lead me to commit 2db78dd302d26d242d3e8e5c4c5024b6c3ea93c2 as
> the culprit.
>
> Author: Alan Cox <[email protected]>
> Date: Tue Oct 2 13:53:04 2007 -0700
>
> libata_scsi: Fix ATAPI transfer lengths
>
> Some controller variants snoop the ATAPI length value for Packet
> transfers to do state machine and FIFO management. Thus we want to
> set it properly, even for cases where it is otherwise meaningless.
>
> Any ideas?
Not immediately - but if you've got wrong transfer lengths its a
candidate for this.
Ok lets start with the basics
If you mount a CD and use it does it work
If you use cdrecord does it work ?
What vendor drive and does it seem to be a specific box/drive that
triggers this ?
Alan
Daniel Drake wrote:
> Hi Alan,
>
> In 2.6.23 and previous, CD writing works fine on my system. I'm using
> ata_piix on:
>
> 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA
> IDE Controller (rev 01)
>
> When I'm running CD writing utilities, I sometimes see this message in
> the kernel logs:
> ata2.00: 66 bytes trailing data
> Things do work fine though.
>
> With 2.6.24-rc1, I can't write CDs. Instead of the above message, I get:
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data 10 in
> res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/33
> ata2: EH complete
>
> and the software acts oddly - for example Brasero tells me that the
> blank CD I have inserted has a capacity of 17,179,869,184gb and it
> doesn't let me write CDs as it says the medium is not writable.
>
> git bisect lead me to commit 2db78dd302d26d242d3e8e5c4c5024b6c3ea93c2 as
> the culprit.
>
> Author: Alan Cox <[email protected]>
> Date: Tue Oct 2 13:53:04 2007 -0700
>
> libata_scsi: Fix ATAPI transfer lengths
>
> Some controller variants snoop the ATAPI length value for Packet
> transfers to do state machine and FIFO management. Thus we want to
> set it properly, even for cases where it is otherwise meaningless.
>
> Any ideas?
hrm
we may need to make that controller-specific, given that the code prior
to 2db78dd302d26d242d3e8e5c4c5024b6c3ea93c2 was working for most...
Jeff
> > Any ideas?
>
> hrm
>
> we may need to make that controller-specific, given that the code prior
> to 2db78dd302d26d242d3e8e5c4c5024b6c3ea93c2 was working for most...
Removing it breaks some devices on any controller.
I don't think this one is a simple compatibility issue - its a behaviour
we have done for ten years on PIIX/ICH chipsets without problems in the
old IDE. Also those logs of 66 bytes overrun are worrying - the app is
either issuing incorrectly sized commands, the drive is providing
incorrectly sized answers or something really evil is going on in the
midlayer translations.
And the ICH7 is in fact the chipset I did the original testing on ...
Need to work out if this is some nutty gentoo app, a quirky device we need
to dig into or something else. I've asked Daniel for some first test
points to try and eliminate the obvious.
Alan
#include <stdio.h>
#include <scsi/sg.h>
#include <scsi/scsi.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(void)
{
struct sg_io_hdr transport;
unsigned char cmd[] = {
0x5a, //opcode -- mode sense(10)
0x08, //dbd, llbaa -- dbd=1
0x2a, //page code -- BRASERO_SPC_PAGE_STATUS
// spc-3 says thats "CD capabilities and mechanical status"
//
0x00, //brasero says reserved, spc3 says subpage code
0x00, //reserved
0x00, //reserved
0x00, //alloc len
0x0a, //alloc len
0x00, //ctl
};
unsigned char buffer[10];
unsigned char sense_data[19];
int r;
int i;
int fd = open("/dev/sr0", O_RDONLY|O_NONBLOCK);
if (fd < 0) {
printf("open failed\n");
exit(1);
}
memset(&transport, 0, sizeof(transport));
memset(buffer, 0, sizeof(buffer));
memset(sense_data, 0, sizeof(sense_data));
transport.interface_id = 'S';
transport.cmdp = cmd;
transport.cmd_len = sizeof(cmd);
transport.dxferp = buffer;
transport.dxfer_len = sizeof(buffer);
transport.sbp = sense_data;
transport.mx_sb_len = sizeof(sense_data);
transport.dxfer_direction = SG_DXFER_FROM_DEV;
r = ioctl(fd, SG_IO, &transport);
printf("result %d\n", r);
if ((transport.masked_status & CHECK_CONDITION) && transport.sb_len_wr) {
printf("check sense data:\n");
for (i = 0; i < sizeof(sense_data); i++)
printf("%02x ", sense_data[i]);
printf("\n");
}
}
Daniel Drake wrote:
> I have narrowed down the exact command submission which causes those
> nasty messages in dmesg. The function which submits this is named
> "brasero_medium_get_page_2A_write_speed_desc".
Just as an extra data point...
I've compiled the test program an ran it on my ICH7/Pentium D box running
2.6.24-rc1-192-gef49c32 (x86_64), but it does not give the error and
nothing in dmesg. I just get:
result 0
My SATA DVD-RW drive is:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: Optiarc Model: DVD RW AD-7170S Rev: 1.00
Type: CD-ROM ANSI SCSI revision: 05
> The nutty app I was using for burning is Brasero, a GNOME app which does
> some SG_IO directly with the drive. (I guess it has some bad error
> handling and doesn't realise when some I/O path has failed)
I would guess Brasero is issuing a command with the length of data
wrongly set. In the old code that might well just produce errors of the
"Umm wtf is this data left over for ?", with the new code the drive is
likely to change state as it knows the transfer size and that will
*correctly* cause an HSM error and what follows.
Now the question is who gets the length wrong - Brasero or the ata
translation code in libata
Alan
Alan Cox wrote:
> I would guess Brasero is issuing a command with the length of data
> wrongly set. In the old code that might well just produce errors of the
> "Umm wtf is this data left over for ?", with the new code the drive is
> likely to change state as it knows the transfer size and that will
> *correctly* cause an HSM error and what follows.
>
> Now the question is who gets the length wrong - Brasero or the ata
> translation code in libata
Brasero does exactly the same as my test app which I attached to my last
mail. Is my test app wrong?
Daniel
On Tue, 30 Oct 2007 19:21:29 +0000
Daniel Drake <[email protected]> wrote:
> Alan Cox wrote:
> > I would guess Brasero is issuing a command with the length of data
> > wrongly set. In the old code that might well just produce errors of the
> > "Umm wtf is this data left over for ?", with the new code the drive is
> > likely to change state as it knows the transfer size and that will
> > *correctly* cause an HSM error and what follows.
> >
> > Now the question is who gets the length wrong - Brasero or the ata
> > translation code in libata
>
> Brasero does exactly the same as my test app which I attached to my last
> mail. Is my test app wrong?
Would need to double check the SCSI specificatons to be sure but I think
you are asking for less data than the drive wishes to provide. You
aren't allowed to do that with ATA.
Alan
On Wed, Oct 31 2007, Alan Cox wrote:
> On Tue, 30 Oct 2007 19:21:29 +0000
> Daniel Drake <[email protected]> wrote:
>
> > Alan Cox wrote:
> > > I would guess Brasero is issuing a command with the length of data
> > > wrongly set. In the old code that might well just produce errors of the
> > > "Umm wtf is this data left over for ?", with the new code the drive is
> > > likely to change state as it knows the transfer size and that will
> > > *correctly* cause an HSM error and what follows.
> > >
> > > Now the question is who gets the length wrong - Brasero or the ata
> > > translation code in libata
> >
> > Brasero does exactly the same as my test app which I attached to my last
> > mail. Is my test app wrong?
>
> Would need to double check the SCSI specificatons to be sure but I think
> you are asking for less data than the drive wishes to provide. You
> aren't allowed to do that with ATA.
ide-cd handles this by throwing the excess away, which I think is the
sane way to do this.
--
Jens Axboe
Jens Axboe wrote:
> On Wed, Oct 31 2007, Alan Cox wrote:
>> On Tue, 30 Oct 2007 19:21:29 +0000
>> Daniel Drake <[email protected]> wrote:
>>
>>> Alan Cox wrote:
>>>> I would guess Brasero is issuing a command with the length of data
>>>> wrongly set. In the old code that might well just produce errors of the
>>>> "Umm wtf is this data left over for ?", with the new code the drive is
>>>> likely to change state as it knows the transfer size and that will
>>>> *correctly* cause an HSM error and what follows.
>>>>
>>>> Now the question is who gets the length wrong - Brasero or the ata
>>>> translation code in libata
>>> Brasero does exactly the same as my test app which I attached to my last
>>> mail. Is my test app wrong?
>> Would need to double check the SCSI specificatons to be sure but I think
>> you are asking for less data than the drive wishes to provide. You
>> aren't allowed to do that with ATA.
>
> ide-cd handles this by throwing the excess away, which I think is the
> sane way to do this.
That's easy for the PIO case. But CD writing is normally DMA, which
means you will get a DMA engine exception if the device wants to give
you more data than the scatter/gather entries permit.
Jeff
> > Would need to double check the SCSI specificatons to be sure but I think
> > you are asking for less data than the drive wishes to provide. You
> > aren't allowed to do that with ATA.
>
> ide-cd handles this by throwing the excess away, which I think is the
> sane way to do this.
It also fails for some cases because the controller snoops the length
information in the packet command.
Alan
On Wed, Oct 31 2007, Jeff Garzik wrote:
> Jens Axboe wrote:
>> On Wed, Oct 31 2007, Alan Cox wrote:
>>> On Tue, 30 Oct 2007 19:21:29 +0000
>>> Daniel Drake <[email protected]> wrote:
>>>
>>>> Alan Cox wrote:
>>>>> I would guess Brasero is issuing a command with the length of data
>>>>> wrongly set. In the old code that might well just produce errors of the
>>>>> "Umm wtf is this data left over for ?", with the new code the drive is
>>>>> likely to change state as it knows the transfer size and that will
>>>>> *correctly* cause an HSM error and what follows.
>>>>>
>>>>> Now the question is who gets the length wrong - Brasero or the ata
>>>>> translation code in libata
>>>> Brasero does exactly the same as my test app which I attached to my last
>>>> mail. Is my test app wrong?
>>> Would need to double check the SCSI specificatons to be sure but I think
>>> you are asking for less data than the drive wishes to provide. You
>>> aren't allowed to do that with ATA.
>> ide-cd handles this by throwing the excess away, which I think is the
>> sane way to do this.
>
> That's easy for the PIO case. But CD writing is normally DMA, which means
> you will get a DMA engine exception if the device wants to give you more
> data than the scatter/gather entries permit.
Right, that's of course problematic... There has to be a way to recover
that situation though, or you can't export any user command issue
facility.
--
Jens Axboe
Jens Axboe wrote:
> Right, that's of course problematic... There has to be a way to recover
> that situation though, or you can't export any user command issue
> facility.
You cannot hope to handle all possible effects arising from an app
providing an invalid sg header / cdb.
Once you start talking "recovery" you are already screwed: we are
talking about low-level hardware commands that are passed straight to
the hardware. It is trivial to lock up hardware, brick hardware, and
corrupt data at that level.
If this is NOT a privileged app, we must update the command validation
to ensure that invalid commands are not transported to the hardware.
If this is a privileged app, our work is done. Fix the app. We gave
root rope, and he took it.
I even venture to say that "accept anything, clean up afterwards" is
/impossible/ to implement, in addition to being dangerous.
Jeff
> If this is NOT a privileged app, we must update the command validation
> to ensure that invalid commands are not transported to the hardware.
We allow only a subset of standard commands to be issued by unprivileged
applications. We don't strictly enforce transfer sizes and its arguable
on SCSI transports this doesn't matter one iota. On ATA it perhaps
matters a bit more.
Note btw - the reset sequence cased by excess data off these drives is
something we jump up and down and trigger. For PIO draining the bits is
fine, for DMA pretty much all controllers will clean up happily enough.
Its also nothing like the problem you might think as most drives told
"send me 150 bytes, oh and the transfer length is 40" will in fact send
40.
Alan
Jeff Garzik wrote:
> Jens Axboe wrote:
>> Right, that's of course problematic... There has to be a way to recover
>> that situation though, or you can't export any user command issue
>> facility.
>
> You cannot hope to handle all possible effects arising from an app
> providing an invalid sg header / cdb.
Is it "invalid" though? As a SCSI command, it seems perfectly valid to
ask for less data than the total amount the drive could possibly send.
It's only when we have to translate it to an ATA command in libata that
it becomes invalid there, right?
Are you saying that we should limit the SG_IO commands to only ones that
are valid both as SCSI *and* as their ATA translations? Would that be
for all SCSI devices, or just ones that libata backs?
Daniel
On Wed, Oct 31 2007, Jeff Garzik wrote:
> Jens Axboe wrote:
> >Right, that's of course problematic... There has to be a way to recover
> >that situation though, or you can't export any user command issue
> >facility.
>
> You cannot hope to handle all possible effects arising from an app
> providing an invalid sg header / cdb.
>
> Once you start talking "recovery" you are already screwed: we are
> talking about low-level hardware commands that are passed straight to
> the hardware. It is trivial to lock up hardware, brick hardware, and
> corrupt data at that level.
>
>
> If this is NOT a privileged app, we must update the command validation
> to ensure that invalid commands are not transported to the hardware.
>
> If this is a privileged app, our work is done. Fix the app. We gave
> root rope, and he took it.
Woaw, back the truck up a bit :-)
I'm talking about simple things - like asking for 8 bytes of sense data.
Simple mistakes. You cannot possibly check for everything like that in a
command filter, it's utterly impossible.
> I even venture to say that "accept anything, clean up afterwards" is
> /impossible/ to implement, in addition to being dangerous.
Certainly, that's not what I'm talking about.
--
Jens Axboe
Hello, Jeff.
Jeff Garzik wrote:
> That's easy for the PIO case. But CD writing is normally DMA, which
> means you will get a DMA engine exception if the device wants to give
> you more data than the scatter/gather entries permit.
For sense data and mode pages, the standard-sanctioned way to know the
transfer size is to issue command with short buffer size just enough to
contain the fixed size header part, determine actual transfer size from
it and issue the command again with the correct buffer size. This
doesn't happen for READ/WRITE commands. Transfer sizes are
pre-determined for those commands and WRITE's to optical devices often
can't be retried w/o side effect.
I've just went through the ATA spec and this basically means we can't
use DMA for these variable-transfer-length commands. Some DMA engines
have "throw away what's left over bit" in its command structure or SG
entry but not all do and none of drivers we currently has such feature
enabled.
Hmmm.... reading ide-cd.c::cdrom_pc_intr(). OIC, ide-cd is dealing with
this problem by draining PIO after BMDMA engine is done. This is
possible for BMDMA engines as they simply step out when the SG entries
are exhausted; then, the interrupt handler kicks in and drains the
left-over using PIO. This just isn't possible with more modern DMA engines.
This really makes me think libata should do these commands via PIO
unless we're gonna enable leftover draining for each DMA engine
implementation or blacklist the ones which can't drain individually.
Then again, nobody really knows how well those features would work as
probably none has actually used them.
Thanks.
--
tejun
Tejun Heo wrote:
> Hello, Jeff.
>
> Jeff Garzik wrote:
>> That's easy for the PIO case. But CD writing is normally DMA, which
>> means you will get a DMA engine exception if the device wants to give
>> you more data than the scatter/gather entries permit.
>
> For sense data and mode pages, the standard-sanctioned way to know the
> transfer size is to issue command with short buffer size just enough to
> contain the fixed size header part, determine actual transfer size from
> it and issue the command again with the correct buffer size. This
> doesn't happen for READ/WRITE commands. Transfer sizes are
> pre-determined for those commands and WRITE's to optical devices often
> can't be retried w/o side effect.
>
> I've just went through the ATA spec and this basically means we can't
> use DMA for these variable-transfer-length commands. Some DMA engines
> have "throw away what's left over bit" in its command structure or SG
> entry but not all do and none of drivers we currently has such feature
> enabled.
>
> Hmmm.... reading ide-cd.c::cdrom_pc_intr(). OIC, ide-cd is dealing with
> this problem by draining PIO after BMDMA engine is done. This is
> possible for BMDMA engines as they simply step out when the SG entries
> are exhausted; then, the interrupt handler kicks in and drains the
> left-over using PIO. This just isn't possible with more modern DMA engines.
>
> This really makes me think libata should do these commands via PIO
> unless we're gonna enable leftover draining for each DMA engine
> implementation or blacklist the ones which can't drain individually.
> Then again, nobody really knows how well those features would work as
> probably none has actually used them.
Eeek, please ignore this. I somehow completely forgot about Allocation
Length fields in CDBs.
Thanks.
--
tejun
Ok, gave this a hard look.
This is basically a behavior change with regards to how we program the
bcount(low) and bcount(high) registers.
Issues about FIFO draining and devices returning too-much data are
ultimately tangential. Furthermore, this is an ATAPI PIO issue, as
demonstrated by (a) Alan's patch did not change DMA lbam/lbah
programming and (b) Daniel's report of the message "ata2.00: 66 bytes
trailing data" which occurs in the PIO state machine.
To survey the behaviors for ATAPI PIO:
ide-cd: read/write commands, blimit = 32k
others cmds, blimit = xfer_len
old libata (pre-Alan): blimit = 8k
new libata behavior: blimit = xfer_len
thus Alan's patch was moving us CLOSER to ide-cd's behavior (if we
ignore read/write commands for the moment, which are not at issue).
and the end result is that the change from old-libata to new-libata
behavior broke Daniel's case, which is a device known to ignore the SCSI
command CDB's allocation length field.
Additionally, let's add some background:
libata was originally written for first-gen SATA controllers (incl.
first-gen SATA bridges), most notably ata_piix, the controller Daniel is
using.
I chose blimit 8k because I felt that matches the maximum size of a SATA
Data FIS. I felt -- this was a wild-added guess -- that setting blimit
thusly would properly configure all the magic internal FIFOs and data
buffers in silicon that handled these crazy ATAPI devices with random
transfer lengths.
I am not drawing any conclusions yet, but I'm thinking that blimit=8k
may be a better choice for SATA ATAPI.
Other comment:
* as Tejun noted in IRC, we don't have a clear idea of what triggered
the HSM violation. turning on debugging printk's would help, but
unfortunately that means turning them on for everything, including hard
drives.
Jeff
O> Hmmm.... reading ide-cd.c::cdrom_pc_intr(). OIC, ide-cd is dealing with
> this problem by draining PIO after BMDMA engine is done. This is
> possible for BMDMA engines as they simply step out when the SG entries
> are exhausted; then, the interrupt handler kicks in and drains the
> left-over using PIO. This just isn't possible with more modern DMA engines.
Its not possible with old ones either - its a bug not a feature and is
why there is special casing for stuff like promise where this hangs the
chip.
Alan
> demonstrated by (a) Alan's patch did not change DMA lbam/lbah
> programming and (b) Daniel's report of the message "ata2.00: 66 bytes
(a) did. Well the original did, dunno about your version.
> I am not drawing any conclusions yet, but I'm thinking that blimit=8k
> may be a better choice for SATA ATAPI.
Only if your transfer is actually 8K or more.
Alan
Alan Cox wrote:
>> demonstrated by (a) Alan's patch did not change DMA lbam/lbah
>> programming and (b) Daniel's report of the message "ata2.00: 66 bytes
>
> (a) did. Well the original did, dunno about your version.
We are both half-right. I reverted my version of that completely,
applied your version verbatim, and pushed it upstream. The result:
atapi_xlat _did_ start programming lbam/lbah for DMA (another behavior
change), but request-sense DMA path was not changed to program lba[mh]
for DMA.
>> I am not drawing any conclusions yet, but I'm thinking that blimit=8k
>> may be a better choice for SATA ATAPI.
>
> Only if your transfer is actually 8K or more.
No, that's precisely the problem cause for what Daniel is reporting. We
are setting blimit=xfer_len(==10), when the device wants to return more
than 10 bytes. When set to the old limit (8k), the problem didn't occur.
Prior to this change, __atapi_pio_bytes() happily discarded trailing
data, so the software already knows how to eat trailing data left to us
by the ATAPI device. The printk (and lack of problem) indicated that
this code was active and working fine.
Jeff
> We are both half-right. I reverted my version of that completely,
> applied your version verbatim, and pushed it upstream. The result:
> atapi_xlat _did_ start programming lbam/lbah for DMA (another behavior
> change), but request-sense DMA path was not changed to program lba[mh]
> for DMA.
I didn't realise we ever issued reques sense by DMA ?
> > Only if your transfer is actually 8K or more.
>
> No, that's precisely the problem cause for what Daniel is reporting. We
> are setting blimit=xfer_len(==10), when the device wants to return more
> than 10 bytes. When set to the old limit (8k), the problem didn't occur.
Lots of *other* problems occur instead. Daniel is reporting that if he
makes a stupid request to a buggy drive he gets a reset and the system
continues happily. Even that reset being a reset not a new command issue
is actually us being excessively paranoid.
Prior to these changes people were reporting things like "I cannot
use my CD-ROM".
So given we have ten years of experience of the method I put in taken
from the old IDE layer versus a good way to make one buggy request work
and break everything else - I *know* which I consider to be the right
path.
And PIO drains of DMA controllers do not work for most controllers, some
will simply hang solid if you touch the data register as part of a DMA
sequence (eg promise 2026x)
Alan
Alan Cox wrote:
> Lots of *other* problems occur instead. Daniel is reporting that if he
> makes a stupid request to a buggy drive he gets a reset and the system
> continues happily. Even that reset being a reset not a new command issue
> is actually us being excessively paranoid.
Sorry if I'm sprouting nonsense here -- I don't have the knowledge that
you all do. However, just wanted to point out that I looked up the info
about that mode page in the MMC specs.
http://www.t10.org/ftp/t10/drafts/mmc4/mmc4r05a.pdf page 513.
"E.3.3 MM Capabilities and Mechanical Status Page (Page Code 2Ah)"
The length of this mode page varies from drive to drive, so there is no
"one size" that you can supply to the SG_IO command (unless you want to
use a stupidly large buffer) to retrieve all the data at once. Instead,
as Tejun describes, you put a short read request in first, look at byte
1 of the page which tells you the length, and then read the whole lot.
Again, ignore me if I'm not contributing anything useful, but I'm
increasingly thinking that the SG_IO command block in question is
perfectly valid, and doing a short read of the mode page in question
(and probably others too) is in fact required before you can know its
true size to do a full read anyway.
Thanks,
Daniel
> "one size" that you can supply to the SG_IO command (unless you want to
> use a stupidly large buffer) to retrieve all the data at once. Instead,
> as Tejun describes, you put a short read request in first, look at byte
> 1 of the page which tells you the length, and then read the whole lot.
ATAPI effectively requires you supply a stupidly large buffer. In theory
you set the transfer size in the lba registers and it all works. In
practice some drives ignore this and there isn't a nice reliable way to
clean up.
Old IDE tries to pio the extra out but that won't work on all controllers
so the wrong combination will still blow up on you.
> Again, ignore me if I'm not contributing anything useful, but I'm
> increasingly thinking that the SG_IO command block in question is
> perfectly valid, and doing a short read of the mode page in question
> (and probably others too) is in fact required before you can know its
> true size to do a full read anyway.
Welcome to the wonderful world of IDE, where the spec sucks and the
drives manage to do even worse things.
We can try and clean up better in these cases at least for PIO transfers
by trying to drain the data beyond this point, on the controllers that
cope with this but really - fix the app to reflect reality: ATAPI is SCSI
as spoken by yokels
Alan
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8ee56e5..5424a59 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5251,9 +5251,13 @@ fsm_start:
if (likely(status & (ATA_ERR | ATA_DF)))
/* device stops HSM for abort/error */
qc->err_mask |= AC_ERR_DEV;
- else
+ else {
/* HSM violation. Let EH handle this */
+ ata_dev_printk(qc->dev, KERN_WARNING,
+ "HSM violation: HSM_ST_FIRST: "
+ "!DRQ && !ERR && !DF\n");
qc->err_mask |= AC_ERR_HSM;
+ }
ap->hsm_task_state = HSM_ST_ERR;
goto fsm_start;
@@ -5344,13 +5348,17 @@ fsm_start:
if (likely(status & (ATA_ERR | ATA_DF)))
/* device stops HSM for abort/error */
qc->err_mask |= AC_ERR_DEV;
- else
+ else {
/* HSM violation. Let EH handle this.
* Phantom devices also trigger this
* condition. Mark hint.
*/
+ ata_dev_printk(qc->dev, KERN_WARNING,
+ "HSM violation: HSM_ST: "
+ "!DRQ && !ERR && !DF\n");
qc->err_mask |= AC_ERR_HSM |
AC_ERR_NODEV_HINT;
+ }
ap->hsm_task_state = HSM_ST_ERR;
goto fsm_start;
@@ -5375,8 +5383,12 @@ fsm_start:
status = ata_wait_idle(ap);
}
- if (status & (ATA_BUSY | ATA_DRQ))
+ if (status & (ATA_BUSY | ATA_DRQ)) {
+ ata_dev_printk(qc->dev, KERN_WARNING,
+ "HSM violation: HSM_ST: "
+ "ERR/DF w/ BUSY/DRQ\n");
qc->err_mask |= AC_ERR_HSM;
+ }
/* ata_pio_sectors() might change the
* state to HSM_ST_LAST. so, the state
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 496edaf..8caaa77 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1394,6 +1394,8 @@ static void ata_eh_analyze_serror(struct ata_link *link)
action |= ATA_EH_SOFTRESET;
}
if (serror & SERR_PROTOCOL) {
+ ata_link_printk(link, KERN_WARNING,
+ "HSM violation: analyze_serror: SERR_PROTOCOL\n");
err_mask |= AC_ERR_HSM;
action |= ATA_EH_SOFTRESET;
}
@@ -1504,6 +1506,8 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc,
u8 stat = tf->command, err = tf->feature;
if ((stat & (ATA_BUSY | ATA_DRQ | ATA_DRDY)) != ATA_DRDY) {
+ ata_dev_printk(qc->dev, KERN_WARNING,
+ "HSM violation: eh_analyze_tf: BUSY|DRQ\n");
qc->err_mask |= AC_ERR_HSM;
return ATA_EH_SOFTRESET;
}
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
Alan Cox wrote:
>> "one size" that you can supply to the SG_IO command (unless you want to
>> use a stupidly large buffer) to retrieve all the data at once. Instead,
>> as Tejun describes, you put a short read request in first, look at byte
>> 1 of the page which tells you the length, and then read the whole lot.
>
> ATAPI effectively requires you supply a stupidly large buffer. In theory
> you set the transfer size in the lba registers and it all works. In
> practice some drives ignore this and there isn't a nice reliable way to
> clean up.
The transfer size is specified in Allocation Length field inside CDB of
commands which can transfer variable length data. I got confused about
this too but both the host and device know how much data can be transferred.
> Welcome to the wonderful world of IDE, where the spec sucks and the
> drives manage to do even worse things.
>
> We can try and clean up better in these cases at least for PIO transfers
> by trying to drain the data beyond this point, on the controllers that
> cope with this but really - fix the app to reflect reality: ATAPI is SCSI
> as spoken by yokels
Not that I disagree to this point tho. :-)
--
tejun
Daniel Drake wrote:
> Again, ignore me if I'm not contributing anything useful, but I'm
> increasingly thinking that the SG_IO command block in question is
> perfectly valid, and doing a short read of the mode page in question
> (and probably others too) is in fact required before you can know its
> true size to do a full read anyway.
Correct -- I do not see anything wrong with the SCSI command being
passed to the device.
It is quite normal to request an allocation length less than the amount
of data to be returned -- often this is even a requirement, in order for
the SCSI client to see how much data buffer is must allocate for the
returned data.
This technique is used all over SCSI, both in userland and in the
kernel. Nothing wrong with the test app AFAICS.
This behavior can be adequated summarized as a drive firmware bug, where
the device will puke when both SCSI allocation length and ATA byte count
limit are below the amount of data it wishes to return.
At this point I think we're interested to see the output from Tejun's
debugging patch, posted in this thread.
Jeff
Tejun Heo wrote:
> Yeap, the SG command is fine. The drive is being weird tho. The
> allocation length field says 10 bytes, so it should just have
> transferred 10 bytes without causing HSM violation.
>
> Can you please apply the attached patch and report what the kernel says
> after triggering the error condition?
Thanks for looking at this, the kernel now says:
<4>ata2.00: HSM violation: eh_analyze_tf: BUSY|DRQ
<3>ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
<3>ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data
10 in
<4> res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM
violation)
<3>ata2.00: status: { DRDY DRQ }
<6>ata2: soft resetting link
<6>ata2.00: configured for UDMA/33
<6>ata2: EH complete
Daniel
Daniel Drake wrote:
> Tejun Heo wrote:
>> Yeap, the SG command is fine. The drive is being weird tho. The
>> allocation length field says 10 bytes, so it should just have
>> transferred 10 bytes without causing HSM violation.
>>
>> Can you please apply the attached patch and report what the kernel says
>> after triggering the error condition?
>
> Thanks for looking at this, the kernel now says:
>
> <4>ata2.00: HSM violation: eh_analyze_tf: BUSY|DRQ
> <3>ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> <3>ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data
> 10 in
> <4> res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM
> violation)
> <3>ata2.00: status: { DRDY DRQ }
> <6>ata2: soft resetting link
> <6>ata2.00: configured for UDMA/33
> <6>ata2: EH complete
Does this patch fix the problem?
--
tejun
Tejun Heo wrote:
> Does this patch fix the problem?
[snip]
I hope it does... that patch removes a call to sg_last() and in general
cleans up the function a bit, which is nice.
Jeff
Tejun Heo wrote:
>> <4>ata2.00: HSM violation: eh_analyze_tf: BUSY|DRQ
>> <3>ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> <3>ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data
>> 10 in
>> <4> res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM
>> violation)
>> <3>ata2.00: status: { DRDY DRQ }
>> <6>ata2: soft resetting link
>> <6>ata2.00: configured for UDMA/33
>> <6>ata2: EH complete
>
> Does this patch fix the problem?
That fixes it, thanks! There is no more ugly error in dmesg, the test
prog doesn't print any sense data, and brasero works OK too. However,
these messages appear in the kernel log every time I run the test app
(or when brasero does its thing):
<4>ata2.00: 10 bytes trailing data
<4>ata2.00: 10 bytes trailing data
<4>ata2.00: 10 bytes trailing data
<4>ata2.00: 10 bytes trailing data
<4>ata2.00: 10 bytes trailing data
<4>ata2.00: 10 bytes trailing data
<4>ata2.00: 6 bytes trailing data
Daniel
Daniel Drake wrote:
> Tejun Heo wrote:
>>> <4>ata2.00: HSM violation: eh_analyze_tf: BUSY|DRQ
>>> <3>ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>>> <3>ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data
>>> 10 in
>>> <4> res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM
>>> violation)
>>> <3>ata2.00: status: { DRDY DRQ }
>>> <6>ata2: soft resetting link
>>> <6>ata2.00: configured for UDMA/33
>>> <6>ata2: EH complete
>>
>> Does this patch fix the problem?
>
> That fixes it, thanks! There is no more ugly error in dmesg, the test
> prog doesn't print any sense data, and brasero works OK too. However,
> these messages appear in the kernel log every time I run the test app
> (or when brasero does its thing):
>
> <4>ata2.00: 10 bytes trailing data
> <4>ata2.00: 10 bytes trailing data
> <4>ata2.00: 10 bytes trailing data
> <4>ata2.00: 10 bytes trailing data
> <4>ata2.00: 10 bytes trailing data
> <4>ata2.00: 10 bytes trailing data
> <4>ata2.00: 6 bytes trailing data
Yeah, that's expected. What's going on here is that your drive sends
full mode sense data (76bytes) regardless of allocation size in CDB but
it does honor transfer chunk set in the PACKET TF, which is set to the
same value as allocation size by Alan's patch. So, now the drive sends
the 76 bytes in 8 chunks. The first chunk is transferred into the sg
buffer and the following chunks are thrown away.
Previously, transfer chunk was set to 8k, so the drive claims to
transfer 76 bytes from the buegging, libata transfers leading 10 bytes
got transferred into the user buffer and throws away what's remaining.
The change caused problem because libata HSM always switches to
HSM_ST_LAST (command sequence completion) after filling the command
buffer completely. So, throwing away is activated iff the extra data to
throw away is transfered together with the last chunk of useful data.
With the chunk size reduced to allocation size, the initial chunk fills
the data buffer completely and all the extra bytes are transfered in
separate chunks. However, libata HSM expects command sequence to
complete after the initial chunk but the drive asserts DRQ for the next
chunk on the following interrupt, so HSM violation is triggered.
The patch modifies HSM such that it keeps throwing away extra data as
long as the drive asserts DRQ which is how IDE driver does it.
However, there's still remaining issues. What does happen if you raise
allocation length and buffersize of the test program to 16? ie. Change
0x0a in cmd[] to 0x10 and increase buffer[10] to buffer[16].
Thanks.
--
tejun
Tejun Heo wrote:
> Daniel Drake wrote:
>
>>Tejun Heo wrote:
>>
>>>><4>ata2.00: HSM violation: eh_analyze_tf: BUSY|DRQ
>>>><3>ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>>>><3>ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data
>>>>10 in
>>>><4> res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM
>>>>violation)
>>>><3>ata2.00: status: { DRDY DRQ }
>>>><6>ata2: soft resetting link
>>>><6>ata2.00: configured for UDMA/33
>>>><6>ata2: EH complete
>>>
>>>Does this patch fix the problem?
>>
>>That fixes it, thanks! There is no more ugly error in dmesg, the test
>>prog doesn't print any sense data, and brasero works OK too. However,
>>these messages appear in the kernel log every time I run the test app
>>(or when brasero does its thing):
>>
>><4>ata2.00: 10 bytes trailing data
>><4>ata2.00: 10 bytes trailing data
>><4>ata2.00: 10 bytes trailing data
>><4>ata2.00: 10 bytes trailing data
>><4>ata2.00: 10 bytes trailing data
>><4>ata2.00: 10 bytes trailing data
>><4>ata2.00: 6 bytes trailing data
>
>
> Yeah, that's expected. What's going on here is that your drive sends
> full mode sense data (76bytes) regardless of allocation size in CDB but
> it does honor transfer chunk set in the PACKET TF, which is set to the
> same value as allocation size by Alan's patch. So, now the drive sends
> the 76 bytes in 8 chunks. The first chunk is transferred into the sg
> buffer and the following chunks are thrown away.
>
> Previously, transfer chunk was set to 8k, so the drive claims to
> transfer 76 bytes from the buegging, libata transfers leading 10 bytes
> got transferred into the user buffer and throws away what's remaining.
> The change caused problem because libata HSM always switches to
> HSM_ST_LAST (command sequence completion) after filling the command
> buffer completely. So, throwing away is activated iff the extra data to
> throw away is transfered together with the last chunk of useful data.
>
> With the chunk size reduced to allocation size, the initial chunk fills
> the data buffer completely and all the extra bytes are transfered in
> separate chunks. However, libata HSM expects command sequence to
> complete after the initial chunk but the drive asserts DRQ for the next
> chunk on the following interrupt, so HSM violation is triggered.
>
> The patch modifies HSM such that it keeps throwing away extra data as
> long as the drive asserts DRQ which is how IDE driver does it.
>
>From past experience, some dead ATAPI devices stuck in DRQ=1. We should
take care of such situation, otherwise the HSM might get into an infinite
loop of waiting for the dead ATAPI device to say DRQ=0 and discarding
endless "trailing data".
Maybe we could set a limit here. If the ATAPI device keeps DRQ=1 and
exceeds the limit, we consider it as HSM violation and have EH handle it.
--
albert
> Maybe we could set a limit here. If the ATAPI device keeps DRQ=1 and
> exceeds the limit, we consider it as HSM violation and have EH handle it.
On a DMA transfer its basically out of our control (and a PIO drain will
lock some controllers solid until power cycle), but on PIO we know we
never set a chunk size over 64K, so if we exceed 64K its time to apply a
larger hammer
Alan Cox wrote:
>> Maybe we could set a limit here. If the ATAPI device keeps DRQ=1 and
>> exceeds the limit, we consider it as HSM violation and have EH handle it.
>
> On a DMA transfer its basically out of our control (and a PIO drain will
> lock some controllers solid until power cycle),
Do such controllers lock up on PIO draining after PIO transfers too?
Can you tell which are those controllers?
> but on PIO we know we
> never set a chunk size over 64K, so if we exceed 64K its time to apply a
> larger hammer
Draining is related to the amount of data the drive responds not to the
chunk size. I agree 64k should be enough for most cases but I think
there can be corner cases where this doesn't hold.
Thanks.
--
tejun
Tejun Heo wrote:
> However, there's still remaining issues. What does happen if you raise
> allocation length and buffersize of the test program to 16? ie. Change
> 0x0a in cmd[] to 0x10 and increase buffer[10] to buffer[16].
Eek. The process hangs in D state for a good 60 seconds or so.
Caught a trace with sysrq.
<6>a.out D 00200202 0 7015 3217
<4> b801fc7c 00200082 b029d5cb 00200202 b04a4c00 b04a4c00 b046ee58
b0472040
<4> b24fa000 b24fa140 b17fc040 00000000 b264e000 b801fb80 b264e000
b01f266c
<4> b278f800 00200202 b0285eb7 7fffffff 7fffffff b801fc7c b801fc7c
b035327c
<4>Call Trace:
<4> [<b029d5cb>] ata_scsi_translate+0xd7/0x105
<4> [<b01f266c>] elv_next_request+0xe3/0xf2
<4> [<b0285eb7>] scsi_dispatch_cmd+0x172/0x1aa
<4> [<b035327c>] schedule_timeout+0x13/0x8d
<4> [<b0128830>] del_timer+0x48/0x4e
<4> [<b0352abc>] wait_for_common+0xb2/0x118
<4> [<b011bc10>] default_wake_function+0x0/0x8
<4> [<b01f50d6>] blk_execute_rq+0xa6/0xbc
<4> [<b01f412b>] blk_end_sync_rq+0x0/0x23
<4> [<b0181a14>] bio_add_pc_page+0x23/0x28
<4> [<b0182116>] bio_copy_user+0x116/0x221
<4> [<b0181e5b>] bio_phys_segments+0xe/0x14
<4> [<b01f4253>] blk_rq_bio_prep+0x28/0xa4
<4> [<b01f452a>] blk_rq_append_bio+0x11/0x3a
<4> [<b01f46a0>] blk_rq_map_user+0xfc/0x17f
<4> [<b01f8173>] sg_io+0x20e/0x2f0
<4> [<b01f8408>] scsi_cmd_ioctl+0x1b3/0x34b
When it unhung, this appeared in the logs
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd a0/01:00:00:10:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data 16 in
res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: port is slow to respond, please be patient (Status 0xd0)
ata2: device not ready (errno=-16), forcing hardreset
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2: EH complete
And the app itself outputted:
result 0
check sense data:
72 0b 00 00 00 00 00 0e 09 0c 00 00 00 03 00 00 00 00 00
Daniel
On Mon, 05 Nov 2007 09:05:48 +0900
Tejun Heo <[email protected]> wrote:
> Alan Cox wrote:
> >> Maybe we could set a limit here. If the ATAPI device keeps DRQ=1 and
> >> exceeds the limit, we consider it as HSM violation and have EH handle it.
> >
> > On a DMA transfer its basically out of our control (and a PIO drain will
> > lock some controllers solid until power cycle),
>
> Do such controllers lock up on PIO draining after PIO transfers too?
> Can you tell which are those controllers?
Promise PDC202xx will lock on a PIO drain of a DMA transfer or (if you
reset it before you drain) on a PIO drain of a PIO transfer.
> > but on PIO we know we
> > never set a chunk size over 64K, so if we exceed 64K its time to apply a
> > larger hammer
>
> Draining is related to the amount of data the drive responds not to the
> chunk size. I agree 64k should be enough for most cases but I think
> there can be corner cases where this doesn't hold.
Tapes perhaps yes ?
Alan Cox wrote:
> On Mon, 05 Nov 2007 09:05:48 +0900
> Tejun Heo <[email protected]> wrote:
>
>> Alan Cox wrote:
>>>> Maybe we could set a limit here. If the ATAPI device keeps DRQ=1 and
>>>> exceeds the limit, we consider it as HSM violation and have EH handle it.
>>> On a DMA transfer its basically out of our control (and a PIO drain will
>>> lock some controllers solid until power cycle),
>> Do such controllers lock up on PIO draining after PIO transfers too?
>> Can you tell which are those controllers?
>
> Promise PDC202xx will lock on a PIO drain of a DMA transfer or (if you
> reset it before you drain) on a PIO drain of a PIO transfer.
Just to make sure I'm not misunderstanding it. So, PDC202xx will lock
up if you transfer allocation size, reset and then try to drain, but
it's okay to keep draining as part of continued PIO transfer, right?
--
tejun
> Just to make sure I'm not misunderstanding it. So, PDC202xx will lock
> up if you transfer allocation size, reset and then try to drain, but
> it's okay to keep draining as part of continued PIO transfer, right?
The following hang mine:
PIO transfer
Reset state machine
Read data register
DMA transfer in progress
Read data register
And this does not
PIO transfer
Keep reading data register while checking DRQ