2005-12-20 21:55:45

by John Treubig

Subject: ATA Write Error and Time-out Notification in User Space

Where would I look in the LibATA/SCSI chain to permit Write Error and
Time-out notification to be passed back to user space without hanging the
system?

BACKGROUND: We've implemented a disk drive testing system under 2.6 using
LibATA to access the ATA devices. During testing, we attach additional
"test" drives to the system and perform reads and writes to these drives.
The system drive is not part of the test environment. During testing we
expect to see errors reported (read, write and time-out) from the "test"
drives. When SCSI disks report errors, the SCSI handlers perform as
expected, reporting the error and recovering. When ATA drives report
errors, only read errors recover and we are able to capture the error.
Write and time-out errors hang the system.

Hardware tested:
Promise Ultra133 TX2 (PDC20269 chip; pata_pdc2027x driver)

Kernel versions tested:
2.6.11
2.6.14 rc2
2.6.15 rc5

RECREATING THE PROBLEM: In our situation, we see the write and time-out
failures when we screen drives beyond their commercial temperature limits
for military applications using a custom application to read and write to
the "test" drives. Rather than having to have a drive in a special
temperature environment, the easiest way to simulate the failures is to
unplug power to a test drive while it is under test. The resulting errors
and time-outs will hang the system. Under this scenario, any type of read
or write to the "test" drive will fail the same as with our application.
I've attached copies of the system messages and startup-log to give further
details into the hangs.

---
Best wishes,
John Treubig
VT Miltope
Senior Test Engineer
(334) 613-6495


Attachments:
dmesg.txt (15.27 kB)
messages1.gz (11.86 kB)

2005-12-20 22:50:41

by Alan

Subject: Re: ATA Write Error and Time-out Notification in User Space

On Tue, 2005-12-20 at 15:55 -0600, John Treubig wrote:
> Where would I look in the LibATA/SCSI chain to permit Write Error and
> Time-out notification to be passed back to user space without hanging the
> system?


Some background first:

The 2.6 block layer can generally handle passing errors back up. It has
a load of problems with EOF on media that is variable size but block
that need fixing but the fundamental errors get back to the block layer.

Unfortunately although they get back to the block request the full error
is not propagated further up the stack. That's actually tricky for the
general file system case as I/O is asynchronous to the actual file
system accesses and we may even hit errors on pages we didn't actually
need.

One result of that is that on write errors we generally mark a volume
offline and processes accessing it get stuck.

> drives. When SCSI disks report errors, the SCSI handlers perform as
> expected, reporting the error and recovering. When ATA drives report
> errors, only read errors recover and we are able to capture the error.
> Write and time-out errors hang the system.

The problem with the file system layer at this point is it isn't clear
how you get the device back. What you should see is a sequence of
retries and then the volume going offline.

I don't know how complete your log is but it doesn't end with the
expected 'giving up' and volume offlining. Is that because the final
messages don't hit the log or are they just not seen ? The promise
devices have some "interesting" behaviour when you reset the chip.

There is a second problem with PATA too. If the drive decides to keel
over asserting IORDY it's game over. The bus transactions will hang and
the CPU gets stuck. That would _not_ be my first suspicion however.



2005-12-21 00:31:51

by Drew Winstel

Subject: Re: ATA Write Error and Time-out Notification in User Space

Hi Alan,

On Tuesday, December 20, 2005 16:50, Alan Cox wrote:
> On Tue, 2005-12-20 at 15:55 -0600, John Treubig wrote:
> > Where would I look in the LibATA/SCSI chain to permit Write Error and
> > Time-out notification to be passed back to user space without hanging the
> > system?
>
> Some background first:
>
> The 2.6 block layer can generally handle passing errors back up. It has
> a load of problems with EOF on media that is variable size but block
> that need fixing but the fundamental errors get back to the block layer.
>
> Unfortunately although they get back to the block request the full error
> is not propagated further up the stack. That's actually tricky for the
> general file system case as I/O is asynchronous to the actual file
> system accesses and we may even hit errors on pages we didn't actually
> need.
>
> One result of that is that on write errors we generally mark a volume
> offline and processes accessing it get stuck.
>

With the application that John is using (namely, it delivers reads and writes
directly to the drive via various SG ioctls), the file system is not an
issue, hence wanting the errors to be returned to userspace.

I presume this means that John would have to look at the block level error
handling as opposed to the SCSI level?

<snip>

Drew

2005-12-22 01:08:32

by Alan

Subject: Re: ATA Write Error and Time-out Notification in User Space

On Tue, 2005-12-20 at 18:31 -0600, Drew Winstel wrote:
> With the application that John is using (namely, it delivers reads and writes
> directly to the drive via various SG ioctls), the file system is not an
> issue, hence wanting the errors to be returned to userspace.
>
> I presume this means that John would have to look at the block level error
> handling as opposed to the SCSI level?

If you are using the sg ioctls then the commands are dispatched and the
results come through the request queues but not the block layer above.

In that case you really shouldn't be seeing a hang.

Alan
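
For readers following along, here is a minimal user-space sketch of how an SG_IO caller might set a timeout and then tell a time-out apart from a device-reported error. It is not the application discussed in this thread. The sg_io_hdr_t fields and SG_INFO_* macros come from <scsi/sg.h>; the helper names (submit_read10, classify_sg_result), the enum, and the locally defined host status values are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Host status values as used by the kernel SCSI mid-layer. They are not
 * exported through <scsi/sg.h>, so they are redefined here (assumption). */
#define HOST_DID_NO_CONNECT 0x01
#define HOST_DID_TIME_OUT   0x03

/* Hypothetical result categories for the test application. */
enum io_result { IO_OK, IO_TIMEOUT, IO_NO_DEVICE, IO_TRANSPORT_ERROR, IO_DEVICE_ERROR };

/* Classify a header that the SG_IO ioctl has already filled in. */
enum io_result classify_sg_result(const sg_io_hdr_t *hdr)
{
    assert(hdr != NULL);
    if ((hdr->info & SG_INFO_OK_MASK) == SG_INFO_OK)
        return IO_OK;
    if (hdr->host_status == HOST_DID_TIME_OUT)
        return IO_TIMEOUT;
    if (hdr->host_status == HOST_DID_NO_CONNECT)
        return IO_NO_DEVICE;
    if (hdr->host_status != 0)
        return IO_TRANSPORT_ERROR;
    /* Host side is clean, so the device itself reported the problem;
     * hdr->status and the sense buffer hold the details. */
    return IO_DEVICE_ERROR;
}

/* Issue a one-block READ(10) with an explicit timeout in milliseconds.
 * Returns the ioctl() result; on 0, pass *hdr to classify_sg_result(). */
int submit_read10(int fd, unsigned int lba, unsigned char *buf,
                  unsigned int block_size, unsigned int timeout_ms,
                  unsigned char *sense, unsigned char sense_len,
                  sg_io_hdr_t *hdr)
{
    unsigned char cdb[10] = { 0x28, 0, 0, 0, 0, 0, 0, 0, 1, 0 };

    cdb[2] = (unsigned char)(lba >> 24);   /* LBA is big-endian in the CDB */
    cdb[3] = (unsigned char)(lba >> 16);
    cdb[4] = (unsigned char)(lba >> 8);
    cdb[5] = (unsigned char)lba;

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id    = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmd_len         = sizeof(cdb);
    hdr->cmdp            = cdb;            /* only valid during the ioctl */
    hdr->dxfer_len       = block_size;
    hdr->dxferp          = buf;
    hdr->mx_sb_len       = sense_len;
    hdr->sbp             = sense;
    hdr->timeout         = timeout_ms;

    return ioctl(fd, SG_IO, hdr);
}
```

The classification only helps if the ioctl returns at all. A hang in the low-level driver, as reported in this thread, never reaches this code.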

2006-01-03 18:29:56

by John Treubig

Subject: Re: ATA Write Error and Time-out Notification in User Space

Alan,

Happy new year and many thanks for your help. I took the information from
the prior submissions and wanted to see if the Promise sub-system was my
failure point. I put a drive on the Secondary IDE bus hanging off the
motherboard Nvidia NForce 2 controller, began an access and pulled the plug.
Sure enough the failures occurred and were passed back to user level, but
the system did not hang. I've repeated this a number of times. I moved the
same drive to the Promise Controller and the hang occurs. Thus it seems we
have proved the Promise sub-system is my problem.

You suggested that something like IORDY hanging was not likely the problem.
Given the data, what would you suspect? I've included the message logs for
the NForce for comparison.




Attachments:
failure.txt (14.61 kB)

2006-01-03 18:56:41

by Alan

Subject: Re: ATA Write Error and Time-out Notification in User Space

On Tue, 2006-01-03 at 12:29 -0600, John Treubig wrote:
> failure point. I put a drive on the Secondary IDE bus hanging off the
> motherboard Nvidia NForce 2 controller, began an access and pulled the plug.
> Sure enough the failures occurred and were passed back to user level, but
> the system did not hang. I've repeated this a number of times. I moved the
> same drive to the Promise Controller and the hang occurs. Thus it seems we
> have proved the Promise sub-system is my problem.


Bingo. Yes I know why this is occurring now.

There is a known old bug with error handling in some cases on promise
chips. The core kernel code tries to clean up any remaining data after
an error (to handle chip prefetch/postwrite FIFOs) if DRQ_STAT is
asserted. It's a nice trick, saves on resets and slow recovery but isn't
compatible with some promise controllers.

The -mm tree has a partial but incomplete fix to this implemented, the
base kernel does not have this fixed.

It's been known for some time so perhaps the ide maintainers have patches
waiting for 2.6.16 now it's opened?

Alan

2006-01-03 19:27:30

by John Treubig

Subject: Re: ATA Write Error and Time-out Notification in User Space

I receive this as great news, only I don't know where the -mm tree is
located to see if I can get the patch or fix! Can you give me a few
pointers?!




2006-01-03 21:47:07

by Alan

Subject: Re: ATA Write Error and Time-out Notification in User Space

On Tue, 2006-01-03 at 13:27 -0600, John Treubig wrote:
> I receive this as great news, only I don't know where the -mm tree is
> located to see if I can get the patch or fix! Can you give me a few
> pointers?!

The -mm patches live on kernel.org in pub/linux/kernel/people/akpm/

The patch you want is the one to drivers/ide/ide-io.c although be aware
it will make non PCI ATA controllers crash on errors if applied. The
"right" fix for this is probably to have a hwif->flush_data() function
that defaults to try_to_flush_leftover_data() so that the knowledge
involved is not hacked into the ide core but kept in the driver.


Alan
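
The per-driver hook suggested above could look roughly like the following stand-alone model. It is only a sketch of the design idea, not a patch. The structures are simplified stand-ins for the kernel's types. The default_flush_leftover_data() stub stands in for try_to_flush_leftover_data(). The Promise override, which skips the drain and asks for a reset instead, is a hypothetical choice.

```c
#include <assert.h>
#include <stddef.h>

struct drive;

/* Simplified stand-in for the interface structure: it carries one
 * overridable hook for flushing leftover data after an error. */
struct hwif {
    void (*flush_data)(struct drive *drive);
};

struct drive {
    struct hwif *hwif;
    int flushed;       /* times the default drain ran */
    int needs_reset;   /* set when a driver opts for a reset instead */
};

/* Stub standing in for try_to_flush_leftover_data(). */
static void default_flush_leftover_data(struct drive *drive)
{
    drive->flushed++;
}

/* Hypothetical Promise override: do not touch the data register,
 * just request a reset. */
static void promise_flush_data(struct drive *drive)
{
    drive->needs_reset = 1;
}

/* Core initialisation installs the default; a driver may replace it. */
void hwif_init(struct hwif *hwif)
{
    hwif->flush_data = default_flush_leftover_data;
}

/* The core error path calls through the hook and needs no
 * controller-specific knowledge of its own. */
void handle_error(struct drive *drive)
{
    assert(drive != NULL && drive->hwif != NULL);
    assert(drive->hwif->flush_data != NULL);
    drive->hwif->flush_data(drive);
}
```

A driver for an affected controller would assign its own function after hwif_init(). Every other driver keeps the existing behaviour without being changed.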

2006-01-05 20:27:09

by John Treubig

Subject: Re: ATA Write Error and Time-out Notification in User Space

I applied the patch to the mm and yes, we have no more hangs! The bad news
is the application gets no notification that there has been any error in the
drive subsystem. I have only had one instance that I got a notification
back, but this was due to me adding a BUG_ON statement prior to the PRINTKs.

Here are the details of what occurs: with the application running, when I
unplug the drive, my app still tries to read and write because it has gotten
no error from the SG calls. Is there a way that we can notify the kernel
that this device is dead and cause all future accesses to result in an
error?

I have had great difficulty getting the PRINTKs to output anything for this
specific error, yet I've been able to get other PRINTKs in the IDE-IO.C to
output. I have attached the messages that do result.





Attachments:
junk1.txt (8.85 kB)