2006-01-10 22:30:39

by John Treubig

Subject: Error handling in LibATA

I've been working on a problem with a Promise 20269 PATA adapter under
LibATA: if the drive has a write error or time-out, the application that is
accessing the drive through SG should see some sort of error, but it does
not. My first problem was that my system hung. After patching IDE-IO.C with
a recognized patch, I have been able to keep the system from hanging. Now
the only problem is that the application gets no notification that the drive
has been rendered inaccessible. (The test case is to run a system with my
app going, and then pull the power from the drive. The system log shows the
errors, but nothing gets back to the app.) The app does get notifications if
I perform the same type of test on a drive attached to the motherboard
secondary IDE adapter, so we know the app is correctly implemented.

I've traced the problem to libata-core.c, where the errors are caught in
ata_qc_timeout(). I'd like to put a call in libata-core.c that would cause
an error to be reflected back to the application. Can you suggest the
function or method that would do this?

Best wishes,
John Treubig
VT Miltope Corporation



2006-01-10 23:12:10

by Alan

Subject: Re: Error handling in LibATA

On Tue, 2006-01-10 at 16:30 -0600, John Treubig wrote:
> I've traced the errors down to the fact that the errors are caught in
> libata-core.c (ata_qc_timeout). I'd like to put a call in libata-core.c


drivers/ide/* doesn't use libata. libata is used by the new PATA/SATA
drivers.

2006-01-10 23:21:45

by Douglas Gilbert

Subject: Re: Error handling in LibATA

John Treubig wrote:
> I've been working on a problem with Promise 20269 PATA adapter under
> LibATA that if the drive has a write error or time-out, the application
> that is accessing the drive using SG should see some sort of error. My
> first problem was my system hung. After patching the IDE-IO.C, with a
> recognized patch, I have been able to keep my system from hanging. Now
> the only problem is the application gets no notification that the drive
> has been rendered inaccessible. (Test case is to run a system with my
> app going, and then pull the power from the drive. System log shows the
> errors, but nothing gets back to the app). The app does get
> notifications if I perform the same type of test on a drive attached to
> the motherboard secondary IDE adapter, so we know the app is correctly
> implemented.
>
> I've traced the errors down to the fact that the errors are caught in
> libata-core.c (ata_qc_timeout). I'd like to put a call in libata-core.c
> that would cause an error to be reflected back to the application. Can
> you suggest the function or method that would do this?

John,
SG_IO ioctl users would normally expect to see DRIVER_TIMEOUT
(plus a suggest mask) in sg_io_hdr::driver_status when a
mid level timeout goes off. So that needs to be "wired"
in libata (along with some other transport errors I suspect).

Here is an example of a timeout using the scsi_debug driver:
# modprobe scsi_debug ptype=9 delay=200000
# lsscsi -g
[0:0:0:0] comms Linux scsi_debug 0004 - /dev/sg0
# sg_start /dev/sg0
start stop unit: transport: Driver_status=0x06 [DRIVER_TIMEOUT, SUGGEST_OK]
START STOP UNIT command failed
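
For anyone writing their own SG application, the check described above might look roughly like the sketch below. It is an illustration, not code from this thread. The SKETCH_* constants and both helper functions are made-up names. The value 0x06 for DRIVER_TIMEOUT is taken from the sg_start output above, and the 0x0f mask assumes the usual split of driver_status into a low "driver" nibble and a high "suggest" nibble.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Local names for this sketch only (not taken from a system header). */
#define SKETCH_DRIVER_MASK    0x0f  /* assumed: low nibble holds the driver code */
#define SKETCH_DRIVER_TIMEOUT 0x06  /* value shown in the sg_start output */

/* Return 1 if the driver nibble of driver_status reports a timeout. */
int sketch_driver_timed_out(unsigned int driver_status)
{
	return (driver_status & SKETCH_DRIVER_MASK) == SKETCH_DRIVER_TIMEOUT;
}

/*
 * Send TEST UNIT READY through SG_IO on an already opened sg device.
 * Returns -1 if the ioctl itself fails, 1 if any status field reports
 * a problem, and 0 if the command completed cleanly.
 * *timed_out is set when the driver status carries the timeout code.
 */
int sketch_test_unit_ready(int fd, int *timed_out)
{
	unsigned char cdb[6] = { 0x00, 0, 0, 0, 0, 0 }; /* TEST UNIT READY */
	unsigned char sense[32];
	struct sg_io_hdr hdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';
	hdr.dxfer_direction = SG_DXFER_NONE;
	hdr.cmdp = cdb;
	hdr.cmd_len = sizeof(cdb);
	hdr.sbp = sense;
	hdr.mx_sb_len = sizeof(sense);
	hdr.timeout = 5000; /* milliseconds */

	*timed_out = 0;
	if (ioctl(fd, SG_IO, &hdr) < 0)
		return -1;

	*timed_out = sketch_driver_timed_out(hdr.driver_status);
	return (hdr.status || hdr.host_status || hdr.driver_status) ? 1 : 0;
}
```

An application would open the sg node (for example /dev/sg0), call sketch_test_unit_ready() and treat a non-zero return as a failed command. The point of the thread is that, with the drive on the libata path, all three status fields came back clean even though the command had timed out.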


Doug Gilbert

2006-01-11 02:07:48

by Albert Lee

Subject: Re: Error handling in LibATA

John Treubig wrote:
> I've been working on a problem with Promise 20269 PATA adapter under
> LibATA that if the drive has a write error or time-out, the application
> that is accessing the drive using SG should see some sort of error. My
> first problem was my system hung. After patching the IDE-IO.C, with a
> recognized patch, I have been able to keep my system from hanging. Now
> the only problem is the application gets no notification that the drive
> has been rendered inaccessible. (Test case is to run a system with my
> app going, and then pull the power from the drive. System log shows the
> errors, but nothing gets back to the app). The app does get
> notifications if I perform the same type of test on a drive attached to
> the motherboard secondary IDE adapter, so we know the app is correctly
> implemented.

As Alan commented, it is not clear whether you are using IDE or libata.
Could you send the boot dmesg?

>
> I've traced the errors down to the fact that the errors are caught in
> libata-core.c (ata_qc_timeout). I'd like to put a call in libata-core.c
> that would cause an error to be reflected back to the application. Can
> you suggest the function or method that would do this?
>

If you are using libata, maybe the following patch can help.
It checks more bits of drv_stat, so a status such as 0x00 is reported as an error.

Albert

========

--- linux/drivers/scsi/libata-core.c 2006-01-11 09:47:25.000000000 +0800
+++ errmask/drivers/scsi/libata-core.c 2006-01-11 09:51:09.000000000 +0800
@@ -3418,8 +3418,14 @@ static void ata_qc_timeout(struct ata_qu
 		printk(KERN_ERR "ata%u: command 0x%x timeout, stat 0x%x host_stat 0x%x\n",
 		       ap->id, qc->tf.command, drv_stat, host_stat);

+		/* If drv_stat looks ok (0x50 normally), we treat this
+		 * as lost interrupt and complete the qc as normal.
+		 * If drv_stat looks bad (0x00, 0xff, etc), err_mask is set.
+		 */
+		if (!ata_ok(drv_stat))
+			qc->err_mask |= __ac_err_mask(drv_stat);
+
 		/* complete taskfile transaction */
-		qc->err_mask |= ac_err_mask(drv_stat);
 		ata_qc_complete(qc);
 		break;
 	}
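
To see why a status of 0x00 slipped through before the patch, here is a small user-space sketch of the decision. It is not kernel code. The sketch_* helpers are hypothetical stand-ins modeled on what ata_ok(), ac_err_mask() and __ac_err_mask() are assumed to do in libata.h, and the ERR_* values are illustrative only, so check the header in your own tree. The status register bits are the standard ATA ones.

```c
#include <stdint.h>

/* ATA status register bits (standard ATA definitions). */
#define ST_BUSY 0x80
#define ST_DRDY 0x40
#define ST_DF   0x20
#define ST_DRQ  0x08
#define ST_ERR  0x01

/* Illustrative error-mask bits; the real AC_ERR_* values may differ. */
enum { ERR_ATA_BUS = 1 << 0, ERR_DEV = 1 << 1, ERR_OTHER = 1 << 2 };

/* Assumed behavior of ata_ok(): of the checked bits, only DRDY is set. */
int sketch_ata_ok(uint8_t status)
{
	return (status & (ST_BUSY | ST_DRDY | ST_DF | ST_DRQ | ST_ERR)) == ST_DRDY;
}

/* Assumed behavior of ac_err_mask(): looks only at BUSY, ERR and DF. */
unsigned int sketch_ac_err_mask(uint8_t status)
{
	if (status & ST_BUSY)
		return ERR_ATA_BUS;
	if (status & (ST_ERR | ST_DF))
		return ERR_DEV;
	return 0;
}

/* Assumed behavior of __ac_err_mask(): same, but never returns 0. */
unsigned int sketch_ac_err_mask_nonzero(uint8_t status)
{
	unsigned int mask = sketch_ac_err_mask(status);
	return mask ? mask : ERR_OTHER;
}

/* Error mask computed on the timeout path before the patch. */
unsigned int sketch_timeout_mask_old(uint8_t drv_stat)
{
	return sketch_ac_err_mask(drv_stat);
}

/* Error mask computed on the timeout path after the patch. */
unsigned int sketch_timeout_mask_new(uint8_t drv_stat)
{
	return sketch_ata_ok(drv_stat) ? 0 : sketch_ac_err_mask_nonzero(drv_stat);
}
```

Under these assumptions a status of 0x00 yields an empty mask on the old path and a non-zero mask on the new one. A status of 0x50 stays clean on both paths, and 0xff was already flagged on both because BUSY is set.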




2006-01-11 15:30:11

by John Treubig

Subject: Re: Error handling in LibATA

Thanks all for your recommendations. I'm in the process of installing your
patch. Per your requests I've attached copies of lspci, dmesg and two
excerpts from the messages file. The messages from libata.msg are from a
drive attached to the Promise adapter and ide.msg are from a drive attached
to the motherboard secondary IDE. In both cases power was removed during
testing.

My system is built with 2.6.15-rc5 and I will let you know my results.

Thanks again,
John



Attachments:
dmesg.lst (15.32 kB)
pci.lst (1.74 kB)
ide.msg (10.12 kB)
libata.msg (4.49 kB)

2006-01-12 19:49:21

by John Treubig

Subject: Re: Error handling in LibATA

Albert,

I began to try your patch and noticed that the code was different from my
copy of 2.6.15-rc5, so I downloaded 2.6.15 (release), and the base copy of
libata-core.c is still different. A good indicator is that
ata_qc_complete() takes 2 parameters in the code from 2.6.15. Can you
tell me where I can find the copy you're working from, and whether I need
any other files to support it?

Best wishes,
John Treubig
VT Miltope



2006-01-13 05:45:37

by Albert Lee

Subject: Re: Error handling in LibATA


>
> I began to try your patch and noticed that the code was different than
> my copy of 2.6.15 rc5, so I downloaded 2.6.15 (release) and still see
> that the base copy of libata-core.c is different. A good indicator was
> that ata_qc_complete() requires 2 parameters in the code from 2.6.15.
> Can you tell me where I can find the copy you're working off and if I have
> to have any other files to support it?

Hi John,

Please find attached the patch for the 2.6.15 tree.
Please also enable ATA_DEBUG and ATA_VERBOSE_DEBUG in libata.h
to get a detailed log during the test. Thanks.

Albert

=================

--- linux-2.6.15/drivers/scsi/libata-core.c 2006-01-03 11:21:10.000000000 +0800
+++ errmask/drivers/scsi/libata-core.c 2006-01-13 13:27:12.000000000 +0800
@@ -3312,6 +3312,7 @@ static void ata_qc_timeout(struct ata_qu
 	struct ata_host_set *host_set = ap->host_set;
 	u8 host_stat = 0, drv_stat;
 	unsigned long flags;
+	unsigned int err_mask = 0;

 	DPRINTK("ENTER\n");

@@ -3346,8 +3347,15 @@ static void ata_qc_timeout(struct ata_qu
 		printk(KERN_ERR "ata%u: command 0x%x timeout, stat 0x%x host_stat 0x%x\n",
 		       ap->id, qc->tf.command, drv_stat, host_stat);

+		/* If drv_stat looks ok (0x50 normally), we treat this
+		 * as lost interrupt and complete the qc as normal.
+		 * If drv_stat looks bad (0x00, 0xff, etc), err_mask is set.
+		 */
+		if (!ata_ok(drv_stat))
+			err_mask |= __ac_err_mask(drv_stat);
+
 		/* complete taskfile transaction */
-		ata_qc_complete(qc, ac_err_mask(drv_stat));
+		ata_qc_complete(qc, err_mask);
 		break;
 	}
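
The two debug switches mentioned above are compile-time macros. A minimal sketch of the change follows, assuming the 2.6.15 include/linux/libata.h ships them as #undef lines near the top of the file. That placement is an assumption about the header rather than something shown in this thread, so check your own tree.

```c
/* include/linux/libata.h (assumed location of the switches) */
#define ATA_DEBUG		/* assumed original: #undef ATA_DEBUG */
#define ATA_VERBOSE_DEBUG	/* assumed original: #undef ATA_VERBOSE_DEBUG */
```

libata and the low-level driver need to be rebuilt after the change. The extra output then goes to the kernel log.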




2006-01-13 15:41:14

by John Treubig

Subject: Re: Error handling in LibATA

Thanks Albert for the quick reply. The patch you sent fixed the problem.
To recap, from a base of 2.6.15, I have had to apply your patch (to permit
errors to be seen by my application) plus a patch to ide-io.c (to prevent
the kernel from hanging). Hope these can be incorporated in future
releases.

Best wishes,
John Treubig
VT Miltope

