2008-06-22 23:23:59

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

Bartlomiej Zolnierkiewicz <[email protected]> wrote:
> Hi,
>
> On Thursday 19 June 2008, Elias Oltmanns wrote:
>> Hi Bart,
>>
>> currently, the code path executing an HDIO_DRIVE_RESET ioctl is broken
>> in various ways. Firstly, ide_abort() is called with the ide_lock held
>> and may call ide_end_request() further down the line which will try to
>> grab the same lock again. More importantly though, the whole concept of
>> aborting an inflight request is flawed -- at least as far as ide_abort()
>> is concerned.
>>
>> The patch below tries to address these issues by handling the
>> HDIO_DRIVE_RESET ioctl in-band.
[...]
> One more thing, ide-2.6 is for syncing with Linus _only_ (it seems I'm to
> blame for the confusion), please rebase your work on top of pata-2.6 quilt
> tree (some comments which should ease the transition below):
>
> http://kernel.org/pub/linux/kernel/people/bart/pata-2.6/
>
> [ This tree is pulled into linux-next so you may prefer to just use the
> latest -next snapshost instead. ]

Indeed, the following patch series is based on nex-20080620. Just to be
absolutely clear though, this is actually a bug fix since a

# hdparm -w /dev/hda

currently freezes the system if there happens to be any I/O operation in
progress. Not sure whether this is serious enough for -rc or, indeed,
-stable trees, but I thought I'd mention it.

[...]
>> + rq->cmd[0] = REQ_DRIVE_RESET;
>> + rq->cmd_type = REQ_TYPE_SPECIAL;
>> + rq->cmd_flags |= REQ_SOFTBARRIER;
>> + ide_do_drive_cmd(drive, rq, ide_end);
>
> ide_do_drive_cmd() now handles only ide_preempt + it seems that previously
> the ioctl would wait till the reset is complete so probably this needs to
> use blk_execute_rq() instead

There is some uncertainty here: Previously, the code didn't actually
wait for the reset to complete before returning but did so right after
the reset sequence had been initiated. It might be desirable to wait
for completion before returning though and that's the way I've
implemented it in the new patch (first in the series). This has the
advantage that user space could be informed when anything should have
gone wrong (like a timeout). This would obviously mean a functional
change visible from user space but then fixing the bug in the first
place is a change visible in user space. See the fourth patch in the
series for what I've had in mind.

>
> [...]
>
> Oh, and now that you've fixed HDIO_DRIVE_RESET you get the following bonus:
> ide_abort(), __ide_abort() and ide_driver_t.abort all can be removed in the
> incremental patch! ;)

Done. It's the second in the series.

So here is the summary of the patch series to come:

1. Fixes the buggy implementation of HDIO_DRIVE_RESET handling.
2. Removes all code that has been made superfluous by the previous
change.
3. Documents the way HDIO_DRIVE_RESET is handled accurately.
4. Adds some more error reporting facilities and documents them as well.

Please be particularly alert when reviewing the last patch. I merely
did what seemed to be the right and obvious thing to do but I ironed out
some irregularities along the way which (for all improbability) may have
been there for some reason or other. It beats me, for instance, why
->polling but not ->resetting should be reset to 0 when
sil_sata_reset_poll() returns non zero. So, I now both are 0 once any
of the poll functions returns ide_stopped.

Regards,

Elias


2008-06-22 23:28:39

by Elias Oltmanns

[permalink] [raw]
Subject: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

Currently, the code path executing an HDIO_DRIVE_RESET ioctl is broken
in various ways. Most importantly, it is treated as an out of band
request in an illegal way which may very likely lead to system lock ups.
Use the drive's request queue to avoid this problem (and fix a locking
issue for free along the way).

Signed-off-by: Elias Oltmanns <[email protected]>
---

drivers/ide/ide-io.c | 23 ++++++++++++++++++++++-
drivers/ide/ide-iops.c | 14 +++++++++++++-
drivers/ide/ide.c | 33 ++++++++++++++-------------------
include/linux/ide.h | 6 ++++++
4 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 2805774..d80f1b9 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -766,6 +766,18 @@ static ide_startstop_t execute_drive_cmd (ide_drive_t *drive,
return ide_stopped;
}

+static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
+{
+ switch (rq->cmd[0]) {
+ case REQ_DRIVE_RESET:
+ return ide_do_reset(drive);
+ default:
+ blk_dump_rq_flags(rq, "ide_special_rq - bad request");
+ ide_end_request(drive, 0, 0);
+ return ide_stopped;
+ }
+}
+
static void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
{
struct request_pm_state *pm = rq->data;
@@ -869,7 +881,16 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
pm->pm_step == ide_pm_state_completed)
ide_complete_pm_request(drive, rq);
return startstop;
- }
+ } else if (!rq->rq_disk && blk_special_request(rq))
+ /*
+ * TODO: Once all ULDs have been modified to
+ * check for specific op codes rather than
+ * blindly accepting any special request, the
+ * echeck for ->rq_disk above may be replaced
+ * by a more suitable mechanism or even
+ * dropped entirely.
+ */
+ return ide_special_rq(drive, rq);

drv = *(ide_driver_t **)rq->rq_disk->private_data;
return drv->do_request(drive, rq, block);
diff --git a/drivers/ide/ide-iops.c b/drivers/ide/ide-iops.c
index 66e39e5..d68fc2a 100644
--- a/drivers/ide/ide-iops.c
+++ b/drivers/ide/ide-iops.c
@@ -905,6 +905,14 @@ void ide_execute_pkt_cmd(ide_drive_t *drive)
}
EXPORT_SYMBOL_GPL(ide_execute_pkt_cmd);

+static inline void ide_complete_drive_reset(ide_drive_t *drive)
+{
+ struct request *rq = HWGROUP(drive)->rq;
+
+ if (rq && blk_special_request(rq) && rq->cmd[0] == REQ_DRIVE_RESET)
+ ide_end_request(drive, 1, 0);
+}
+
/* needed below */
static ide_startstop_t do_reset1 (ide_drive_t *, int);

@@ -941,6 +949,7 @@ static ide_startstop_t atapi_reset_pollfunc (ide_drive_t *drive)
/* done polling */
hwgroup->polling = 0;
hwgroup->resetting = 0;
+ ide_complete_drive_reset(drive);
return ide_stopped;
}

@@ -961,7 +970,7 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
if (port_ops->reset_poll(drive)) {
printk(KERN_ERR "%s: host reset_poll failure for %s.\n",
hwif->name, drive->name);
- return ide_stopped;
+ goto out;
}
}

@@ -1005,6 +1014,8 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
}
hwgroup->polling = 0; /* done polling */
hwgroup->resetting = 0; /* done reset attempt */
+out:
+ ide_complete_drive_reset(drive);
return ide_stopped;
}

@@ -1112,6 +1123,7 @@ static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)

if (io_ports->ctl_addr == 0) {
spin_unlock_irqrestore(&ide_lock, flags);
+ ide_complete_drive_reset(drive);
return ide_stopped;
}

diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index b0739c9..bf17529 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -529,6 +529,19 @@ static int generic_ide_resume(struct device *dev)
return err;
}

+static void generic_drive_reset(ide_drive_t *drive)
+{
+ struct request *rq;
+
+ rq = blk_get_request(drive->queue, READ, __GFP_WAIT);
+ rq->cmd_type = REQ_TYPE_SPECIAL;
+ rq->cmd_len = 1;
+ rq->cmd[0] = REQ_DRIVE_RESET;
+ rq->cmd_flags |= REQ_SOFTBARRIER;
+ blk_execute_rq(drive->queue, NULL, rq, 1);
+ blk_put_request(rq);
+}
+
int generic_ide_ioctl(ide_drive_t *drive, struct file *file, struct block_device *bdev,
unsigned int cmd, unsigned long arg)
{
@@ -603,31 +616,13 @@ int generic_ide_ioctl(ide_drive_t *drive, struct file *file, struct block_device
if (!capable(CAP_SYS_ADMIN))
return -EACCES;

- /*
- * Abort the current command on the
- * group if there is one, taking
- * care not to allow anything else
- * to be queued and to die on the
- * spot if we miss one somehow
- */
-
spin_lock_irqsave(&ide_lock, flags);
-
if (HWGROUP(drive)->resetting) {
spin_unlock_irqrestore(&ide_lock, flags);
return -EBUSY;
}
-
- ide_abort(drive, "drive reset");
-
- BUG_ON(HWGROUP(drive)->handler);
-
- /* Ensure nothing gets queued after we
- drop the lock. Reset will clear the busy */
-
- HWGROUP(drive)->busy = 1;
spin_unlock_irqrestore(&ide_lock, flags);
- (void) ide_do_reset(drive);
+ generic_drive_reset(drive);

return 0;
case HDIO_GET_BUSSTATE:
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 5ddae9b..8fbe838 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -139,6 +139,12 @@ struct ide_io_ports {
#define WAIT_MIN_SLEEP (2*HZ/100) /* 20msec - minimum sleep time */

/*
+ * Op codes for special requests to be handled by ide_special_rq().
+ * Values should be in the range of 0x20 to 0x3f.
+ */
+#define REQ_DRIVE_RESET 0x20
+
+/*
* Check for an interrupt and acknowledge the interrupt status
*/
struct hwif_s;

2008-06-22 23:32:58

by Elias Oltmanns

[permalink] [raw]
Subject: [PATCH 2/4] IDE: Remove unused code

Remove some code which has been made obsolete and hasn't worked properly
before anyway.

Signed-off-by: Elias Oltmanns <[email protected]>
---

drivers/ide/ide-cd.c | 1 -
drivers/ide/ide-disk.c | 1 -
drivers/ide/ide-floppy.c | 1 -
drivers/ide/ide-io.c | 49 ----------------------------------------------
drivers/ide/ide-tape.c | 1 -
drivers/scsi/ide-scsi.c | 14 -------------
include/linux/ide.h | 5 -----
7 files changed, 0 insertions(+), 72 deletions(-)

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index b5ec249..26e8972 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1940,7 +1940,6 @@ static ide_driver_t ide_cdrom_driver = {
.do_request = ide_cd_do_request,
.end_request = ide_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idecd_proc,
#endif
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index 5f49a4a..3a2e802 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -985,7 +985,6 @@ static ide_driver_t idedisk_driver = {
.do_request = ide_do_rw_disk,
.end_request = ide_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idedisk_proc,
#endif
diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
index c7e4433..011d720 100644
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -1129,7 +1129,6 @@ static ide_driver_t idefloppy_driver = {
.do_request = idefloppy_do_request,
.end_request = idefloppy_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idefloppy_proc,
#endif
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index d80f1b9..731f975 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -504,55 +504,6 @@ ide_startstop_t ide_error (ide_drive_t *drive, const char *msg, u8 stat)

EXPORT_SYMBOL_GPL(ide_error);

-ide_startstop_t __ide_abort(ide_drive_t *drive, struct request *rq)
-{
- if (drive->media != ide_disk)
- rq->errors |= ERROR_RESET;
-
- ide_kill_rq(drive, rq);
-
- return ide_stopped;
-}
-
-EXPORT_SYMBOL_GPL(__ide_abort);
-
-/**
- * ide_abort - abort pending IDE operations
- * @drive: drive the error occurred on
- * @msg: message to report
- *
- * ide_abort kills and cleans up when we are about to do a
- * host initiated reset on active commands. Longer term we
- * want handlers to have sensible abort handling themselves
- *
- * This differs fundamentally from ide_error because in
- * this case the command is doing just fine when we
- * blow it away.
- */
-
-ide_startstop_t ide_abort(ide_drive_t *drive, const char *msg)
-{
- struct request *rq;
-
- if (drive == NULL || (rq = HWGROUP(drive)->rq) == NULL)
- return ide_stopped;
-
- /* retry only "normal" I/O: */
- if (!blk_fs_request(rq)) {
- rq->errors = 1;
- ide_end_drive_cmd(drive, BUSY_STAT, 0);
- return ide_stopped;
- }
-
- if (rq->rq_disk) {
- ide_driver_t *drv;
-
- drv = *(ide_driver_t **)rq->rq_disk->private_data;
- return drv->abort(drive, rq);
- } else
- return __ide_abort(drive, rq);
-}
-
static void ide_tf_set_specify_cmd(ide_drive_t *drive, struct ide_taskfile *tf)
{
tf->nsect = drive->sect;
diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
index 2deb23d..353dd11 100644
--- a/drivers/ide/ide-tape.c
+++ b/drivers/ide/ide-tape.c
@@ -2591,7 +2591,6 @@ static ide_driver_t idetape_driver = {
.do_request = idetape_do_request,
.end_request = idetape_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idetape_proc,
#endif
diff --git a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
index 683bce3..f843c13 100644
--- a/drivers/scsi/ide-scsi.c
+++ b/drivers/scsi/ide-scsi.c
@@ -258,19 +258,6 @@ idescsi_atapi_error(ide_drive_t *drive, struct request *rq, u8 stat, u8 err)
return ide_stopped;
}

-static ide_startstop_t
-idescsi_atapi_abort(ide_drive_t *drive, struct request *rq)
-{
- debug_log("%s called for %lu\n", __func__,
- ((struct ide_atapi_pc *) rq->special)->scsi_cmd->serial_number);
-
- rq->errors |= ERROR_MAX;
-
- idescsi_end_request(drive, 0, 0);
-
- return ide_stopped;
-}
-
static int idescsi_end_request (ide_drive_t *drive, int uptodate, int nrsecs)
{
idescsi_scsi_t *scsi = drive_to_idescsi(drive);
@@ -524,7 +511,6 @@ static ide_driver_t idescsi_driver = {
.do_request = idescsi_do_request,
.end_request = idescsi_end_request,
.error = idescsi_atapi_error,
- .abort = idescsi_atapi_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idescsi_proc,
#endif
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 8fbe838..483df6b 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -777,7 +777,6 @@ struct ide_driver_s {
ide_startstop_t (*do_request)(ide_drive_t *, struct request *, sector_t);
int (*end_request)(ide_drive_t *, int, int);
ide_startstop_t (*error)(ide_drive_t *, struct request *rq, u8, u8);
- ide_startstop_t (*abort)(ide_drive_t *, struct request *rq);
struct device_driver gen_driver;
int (*probe)(ide_drive_t *);
void (*remove)(ide_drive_t *);
@@ -819,10 +818,6 @@ ide_startstop_t __ide_error(ide_drive_t *, struct request *, u8, u8);

ide_startstop_t ide_error (ide_drive_t *drive, const char *msg, byte stat);

-ide_startstop_t __ide_abort(ide_drive_t *, struct request *);
-
-extern ide_startstop_t ide_abort(ide_drive_t *, const char *);
-
extern void ide_fix_driveid(struct hd_driveid *);

extern void ide_fixstring(u8 *, const int, const int);

2008-06-22 23:35:23

by Elias Oltmanns

[permalink] [raw]
Subject: [PATCH 3/4] Update documentation of HDIO_DRIVE_RESET ioctl

Alter the entry for HDIO_DRIVE_RESET in Documentation/ioctl/hdio.txt to
reflect a functional change in the driver. Besides, the entry has been
inaccurate before.

Signed-off-by: Elias Oltmanns <[email protected]>
---

Documentation/ioctl/hdio.txt | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt
index c19efde..a859734 100644
--- a/Documentation/ioctl/hdio.txt
+++ b/Documentation/ioctl/hdio.txt
@@ -508,12 +508,12 @@ HDIO_DRIVE_RESET execute a device reset

error returns:
EACCES Access denied: requires CAP_SYS_ADMIN
+ EBUSY Device busy: reset operation in progress already

notes:

- Abort any current command, prevent anything else from being
- queued, execute a reset on the device, and issue BLKRRPART
- ioctl on the block device.
+ Execute a reset on the device as soon as the current IO
+ operation has completed.

Executes an ATAPI soft reset if applicable, otherwise
executes an ATA soft reset on the controller.

2008-06-23 07:48:33

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

Elias Oltmanns <[email protected]> wrote:
[...]
> @@ -941,6 +949,7 @@ static ide_startstop_t atapi_reset_pollfunc (ide_drive_t *drive)
> /* done polling */
> hwgroup->polling = 0;
> hwgroup->resetting = 0;

Actually, ->resetting needs to be protected by the ide_lock here.

[...]
> @@ -1005,6 +1014,8 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
> }
> hwgroup->polling = 0; /* done polling */
> hwgroup->resetting = 0; /* done reset attempt */

Same as above. Unless I have missed something, a simple

spin_lock(&ide_lock)

should suffice since there cannot possibly be another interrupt that
changes ->resetting behind our back. I'll send an updated version of
patches 1 and 4 once I have your opinion on the current series.

Regards,

Elias

2008-06-23 09:34:54

by Alan

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

> in various ways. Most importantly, it is treated as an out of band
> request in an illegal way which may very likely lead to system lock ups.
> Use the drive's request queue to avoid this problem (and fix a locking
> issue for free along the way).

It was always designed to be, and used out of band. One of the important
uses of the ioctl is to abort a running command when an interface has
jammed up. If you end up queueing it behind that command you've lost most
of the reason for the ioctl anyway (and you might as well just remove it
really given SG_IO exists).

Other than the command aborting bit, it looks a good idea - that code has
always been racy and raced against timer handlers, irq handlers and if
neither of them got it then a speed changedown raced the lot 8(

Alan

2008-06-23 09:37:19

by Alan

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

> ->polling but not ->resetting should be reset to 0 when
> sil_sata_reset_poll() returns non zero. So, I now both are 0 once any
> of the poll functions returns ide_stopped.

Historically the polling value was a hack to reduce the size of the
timer/irq/polling races where IRQs end up unmasked again. Resetting
indicated if a reset was currently progressing (in essence 'don't issue
any more resets for the moment'). At the time at least there were cases
where polling=1 resetting=0 could occur (notably DMA crc error handling).

Alan

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling


On Monday 23 June 2008, Elias Oltmanns wrote:
> Currently, the code path executing an HDIO_DRIVE_RESET ioctl is broken
> in various ways. Most importantly, it is treated as an out of band
> request in an illegal way which may very likely lead to system lock ups.
> Use the drive's request queue to avoid this problem (and fix a locking
> issue for free along the way).
>
> Signed-off-by: Elias Oltmanns <[email protected]>

[...]

> + } else if (!rq->rq_disk && blk_special_request(rq))
> + /*
> + * TODO: Once all ULDs have been modified to
> + * check for specific op codes rather than
> + * blindly accepting any special request, the
> + * echeck for ->rq_disk above may be replaced

'echeck' typo.

[...]

> +static inline void ide_complete_drive_reset(ide_drive_t *drive)
> +{
> + struct request *rq = HWGROUP(drive)->rq;
> +
> + if (rq && blk_special_request(rq) && rq->cmd[0] == REQ_DRIVE_RESET)

Shouldn't we be also checking for !rq->rq_disk here?

[ also HWGROUP() macro is on its way out ]

Otherwise OK.

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

On Monday 23 June 2008, Elias Oltmanns wrote:
> Bartlomiej Zolnierkiewicz <[email protected]> wrote:
> > Hi,
> >
> > On Thursday 19 June 2008, Elias Oltmanns wrote:
> >> Hi Bart,
> >>
> >> currently, the code path executing an HDIO_DRIVE_RESET ioctl is broken
> >> in various ways. Firstly, ide_abort() is called with the ide_lock held
> >> and may call ide_end_request() further down the line which will try to
> >> grab the same lock again. More importantly though, the whole concept of
> >> aborting an inflight request is flawed -- at least as far as ide_abort()
> >> is concerned.
> >>
> >> The patch below tries to address these issues by handling the
> >> HDIO_DRIVE_RESET ioctl in-band.
> [...]
> > One more thing, ide-2.6 is for syncing with Linus _only_ (it seems I'm to
> > blame for the confusion), please rebase your work on top of pata-2.6 quilt
> > tree (some comments which should ease the transition below):
> >
> > http://kernel.org/pub/linux/kernel/people/bart/pata-2.6/
> >
> > [ This tree is pulled into linux-next so you may prefer to just use the
> > latest -next snapshost instead. ]
>
> Indeed, the following patch series is based on nex-20080620. Just to be
> absolutely clear though, this is actually a bug fix since a
>
> # hdparm -w /dev/hda
>
> currently freezes the system if there happens to be any I/O operation in
> progress. Not sure whether this is serious enough for -rc or, indeed,
> -stable trees, but I thought I'd mention it.

According to 'man hdparm':

-w Perform a device reset (DANGEROUS). Do NOT use this option. It
exists for unlikely situations where a reboot might otherwise be
required to get a confused drive back into a useable state.

so I don't think that a rush is necessary (however we may still want to get
patch #1 in for 2.6.26).

BTW Your fix adds framework which can be re-used for fixing locking of IDE
settings (+ it finally makes sense to dust-off my IDE settings rework which
was always low-prio and was never posted :) and maybe it could be also used
for drive->special handling.

> [...]
> >> + rq->cmd[0] = REQ_DRIVE_RESET;
> >> + rq->cmd_type = REQ_TYPE_SPECIAL;
> >> + rq->cmd_flags |= REQ_SOFTBARRIER;
> >> + ide_do_drive_cmd(drive, rq, ide_end);
> >
> > ide_do_drive_cmd() now handles only ide_preempt + it seems that previously
> > the ioctl would wait till the reset is complete so probably this needs to
> > use blk_execute_rq() instead
>
> There is some uncertainty here: Previously, the code didn't actually
> wait for the reset to complete before returning but did so right after
> the reset sequence had been initiated. It might be desirable to wait
> for completion before returning though and that's the way I've
> implemented it in the new patch (first in the series). This has the
> advantage that user space could be informed when anything should have
> gone wrong (like a timeout). This would obviously mean a functional
> change visible from user space but then fixing the bug in the first
> place is a change visible in user space. See the fourth patch in the
> series for what I've had in mind.
>
> >
> > [...]
> >
> > Oh, and now that you've fixed HDIO_DRIVE_RESET you get the following bonus:
> > ide_abort(), __ide_abort() and ide_driver_t.abort all can be removed in the
> > incremental patch! ;)
>
> Done. It's the second in the series.

Thanks.

> So here is the summary of the patch series to come:
>
> 1. Fixes the buggy implementation of HDIO_DRIVE_RESET handling.

Two minor issues here (see reply to the patch).

> 2. Removes all code that has been made superfluous by the previous
> change.
> 3. Documents the way HDIO_DRIVE_RESET is handled accurately.
> 4. Adds some more error reporting facilities and documents them as well.
>
> Please be particularly alert when reviewing the last patch. I merely
> did what seemed to be the right and obvious thing to do but I ironed out
> some irregularities along the way which (for all improbability) may have
> been there for some reason or other. It beats me, for instance, why
> ->polling but not ->resetting should be reset to 0 when
> sil_sata_reset_poll() returns non zero. So, I now both are 0 once any
> of the poll functions returns ide_stopped.

Looks OK but please move the above description to the patch description.

Patches #2 and #3 also look good.

Bart

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

On Monday 23 June 2008, Elias Oltmanns wrote:
> Elias Oltmanns <[email protected]> wrote:
> [...]
> > @@ -941,6 +949,7 @@ static ide_startstop_t atapi_reset_pollfunc (ide_drive_t *drive)
> > /* done polling */
> > hwgroup->polling = 0;
> > hwgroup->resetting = 0;
>
> Actually, ->resetting needs to be protected by the ide_lock here.
>
> [...]
> > @@ -1005,6 +1014,8 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
> > }
> > hwgroup->polling = 0; /* done polling */
> > hwgroup->resetting = 0; /* done reset attempt */
>
> Same as above. Unless I have missed something, a simple
>
> spin_lock(&ide_lock)
>
> should suffice since there cannot possibly be another interrupt that
> changes ->resetting behind our back. I'll send an updated version of
> patches 1 and 4 once I have your opinion on the current series.

With patch #1 we may as well just remove ->resetting and allow the next
reset request to be added to the queue.

2008-06-24 07:02:55

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

Alan Cox <[email protected]> wrote:
>> in various ways. Most importantly, it is treated as an out of band
>> request in an illegal way which may very likely lead to system lock ups.
>> Use the drive's request queue to avoid this problem (and fix a locking
>> issue for free along the way).
>
> It was always designed to be, and used out of band. One of the important
> uses of the ioctl is to abort a running command when an interface has
> jammed up. If you end up queueing it behind that command you've lost most
> of the reason for the ioctl anyway (and you might as well just remove it
> really given SG_IO exists).

Well, I can see your point. In fact, there really doesn't seem to be an
alternative to the out of band approach for the purposes you described.
Now, I even think that I could perhaps fix the request aborting properly
and restore the original behaviour. Moreover, I may very likely live to
regret having removed ide_abort() and friends when implementing disk
shock protection in the IDE layer. Maybe I should try to send an
alternative patch for discussion. On the other hand I don't see the
equivalent for HDIO_DRIVE_RESET in libata which makes me wonder whether
this ioctl has actually been used in real life for the purposes you
described.

>
> Other than the command aborting bit, it looks a good idea - that code
>has
> always been racy and raced against timer handlers, irq handlers and if
> neither of them got it then a speed changedown raced the lot 8(

My idea to solve this would be roughly this: Change ide_set_handler to
leave the ->handler and ->expiry members alone if they have been set on
entry. If a request is being processed by the time a HDIO_DRIVE_RESET
ioctl is received, these callbacks will be changed so the reset sequence
will be started on the next interrupt, timeout, or when the ->busy flag
is cleared. I'm not quite sure yet whether things will work out the way
I want them to and I don't know whether HDIO_DRIVE_RESET actually
justifies the effort since I don't knowof an equivalent in libata
anyway. But as I said, it might come in handy for other purposes.

Comments?

Elias

2008-06-24 07:14:42

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

Bartlomiej Zolnierkiewicz <[email protected]> wrote:
> On Monday 23 June 2008, Elias Oltmanns wrote:
> [...]
>
>> +static inline void ide_complete_drive_reset(ide_drive_t *drive)
>> +{
>> + struct request *rq = HWGROUP(drive)->rq;
>> +
>> + if (rq && blk_special_request(rq) && rq->cmd[0] == REQ_DRIVE_RESET)
>
> Shouldn't we be also checking for !rq->rq_disk here?

Well, it would probably feel more consistent. However, as far as I can
tell, opcodes 0x20 and above aren't used by any of the ULDs right now
and it would probably make sense to reserve that range (or part of it)
for the generic ide_special_rq() handler. Consequently, it is sufficient
to check for REQ_TYPE and op code, as I did.

Regards,

Elias

2008-06-24 07:23:48

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

Bartlomiej Zolnierkiewicz <[email protected]> wrote:
> On Monday 23 June 2008, Elias Oltmanns wrote:
[...]
>> Indeed, the following patch series is based on nex-20080620. Just to be
>> absolutely clear though, this is actually a bug fix since a
>>
>> # hdparm -w /dev/hda
>>
>> currently freezes the system if there happens to be any I/O operation in
>> progress. Not sure whether this is serious enough for -rc or, indeed,
>> -stable trees, but I thought I'd mention it.
>
> According to 'man hdparm':
>
> -w Perform a device reset (DANGEROUS). Do NOT use this option. It
> exists for unlikely situations where a reboot might otherwise be
> required to get a confused drive back into a useable state.
>
> so I don't think that a rush is necessary (however we may still want to get
> patch #1 in for 2.6.26).

I see.

>
> BTW Your fix adds framework which can be re-used for fixing locking of
>IDE
> settings (+ it finally makes sense to dust-off my IDE settings rework which
> was always low-prio and was never posted :) and maybe it could be also used
> for drive->special handling.

Alan's remark made me think again and I'm still not quite sure whether I
can (and indeed should) fix ide_abort() and allow for out of band
execution of HDIO_DRIVE_RESET as well as idle immediate with head unload
for disk shock protection. What are your views on Alan's remark and my
reply?

[...]
>> 4. Adds some more error reporting facilities and documents them as well.
>>
>> Please be particularly alert when reviewing the last patch. I merely
>> did what seemed to be the right and obvious thing to do but I ironed out
>> some irregularities along the way which (for all improbability) may have
>> been there for some reason or other. It beats me, for instance, why
>> ->polling but not ->resetting should be reset to 0 when
>> sil_sata_reset_poll() returns non zero. So, I now both are 0 once any
>> of the poll functions returns ide_stopped.
>
> Looks OK but please move the above description to the patch description.

Will do.

>
> Patches #2 and #3 also look good.

Thanks for reviewing.

Regards,

Elias

2008-06-24 09:28:50

by Alan

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

> alternative to the out of band approach for the purposes you described.
> Now, I even think that I could perhaps fix the request aborting properly

I'm not sure the two are mutually conflicting. The queued reset you've
done is very much cleaner so in theory you can abort the command in the
ioctl and then queue the rest of the reset/recovery as you have done

> equivalent for HDIO_DRIVE_RESET in libata which makes me wonder whether
> this ioctl has actually been used in real life for the purposes you
> described.

Its on the todo list along with more speed control - it gets hard to do
right and cleanly.

> My idea to solve this would be roughly this: Change ide_set_handler to
> leave the ->handler and ->expiry members alone if they have been set on
> entry. If a request is being processed by the time a HDIO_DRIVE_RESET

The basic problem is that we have to issue a speed change and the driver
layer speed change code assumed this was issued polled with a wait (in
fact for some controllers/disks it has to be polled). Not a good thing to
discover you are doing in an IRQ handler ...

> Comments?

I think your basic approach is right. It might be that the abort path
wants to abort and then queue the rest. I could of course be completely
wrong too 8)

Alan

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

On Tuesday 24 June 2008, Elias Oltmanns wrote:
> Bartlomiej Zolnierkiewicz <[email protected]> wrote:

[...]

> > BTW Your fix adds framework which can be re-used for fixing locking of
> >IDE
> > settings (+ it finally makes sense to dust-off my IDE settings rework which
> > was always low-prio and was never posted :) and maybe it could be also used
> > for drive->special handling.
>
> Alan's remark made me think again and I'm still not quite sure whether I
> can (and indeed should) fix ide_abort() and allow for out of band

[...]

I don't see a reliable way to fix ide_abort() - once the request/command
is started hardware can be already in a state that makes aborting hard if
not impossible.

Thanks,
Bart

2008-06-24 12:50:37

by Alan

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

> I don't see a reliable way to fix ide_abort() - once the request/command
> is started hardware can be already in a state that makes aborting hard if
> not impossible.

It depends on the ATA version what you do but you end up doing a reset
sequence without waiting for the existing command to finish if your drive
is too new to have IDLE IMMEDIATE. What you can't do is wait for the
command to finish before issuing a reset because it may never finish.

I don't see why you think it's "hard". We have timeout handlers for many
commands and those reset/abort just fine.

Alan

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

On Tue, Jun 24, 2008 at 2:32 PM, Alan Cox <[email protected]> wrote:
>> I don't see a reliable way to fix ide_abort() - once the request/command
>> is started hardware can be already in a state that makes aborting hard if
>> not impossible.
>
> It depends on the ATA version what you do but you end up doing a reset
> sequence without waiting for the existing command to finish if your drive
> is too new to have IDLE IMMEDIATE. What you can't do is wait for the
> command to finish before issuing a reset because it may never finish.
>
> I don't see why you think it's "hard". We have timeout handlers for many
> commands and those reset/abort just fine.

They are different beasts from user-space initiated abort operation
which can happen in any moment (timeout handlers explicetely know
what state software/hardware is supposed to be currently in) and is
in no way synchronized with the current request/command processing.

It may be possible to fix it but it will be really hard to get it
right and I don't think it is worth the pain for broken-by-design
hack in an odd ioctl workarounding shortcomings in core code error
recovery (which should be fixed instead, if is not fixed already).

Thanks,
Bart

2008-06-24 13:55:10

by Alan

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

> > I don't see why you think it's "hard". We have timeout handlers for many
> > commands and those reset/abort just fine.
>
> They are different beasts from user-space initiated abort operation

No they are not. They are the *same* thing in every respect.

You have the drive in an unknown state, you want it back. If your drive
lost a command due to noise or a firmware flaw you have no idea about the
state it is actually in (supposed to be is irrelevant)

Alan

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

On Tue, Jun 24, 2008 at 3:35 PM, Alan Cox <[email protected]> wrote:
>> > I don't see why you think it's "hard". We have timeout handlers for many
>> > commands and those reset/abort just fine.
>>
>> They are different beasts from user-space initiated abort operation
>
> No they are not. They are the *same* thing in every respect.
>
> You have the drive in an unknown state, you want it back. If your drive
> lost a command due to noise or a firmware flaw you have no idea about the
> state it is actually in (supposed to be is irrelevant)

I generally agree with you w.r.t. to drive side of the operations but
the drive is only part of the equation (the host and the request states
are the others) so 'supposed to be is' is quite relevant.

Also abort request can happen i.e. while the command is being prepared
& issued (it is done without ide_lock being taken and the timeout is not
even armed yet) so there are additional issues to take care of.

IOW while the two operations are very similar on the drive level they
certainly have different needs and requirements at the higher level
(abort operation being especially tricky).

Thanks,
Bart

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

On Tue, Jun 24, 2008 at 4:19 PM, Bartlomiej Zolnierkiewicz
<[email protected]> wrote:

[...]

> IOW while the two operations are very similar on the drive level they
> certainly have different needs and requirements at the higher level
> (abort operation being especially tricky).

The confusion seems to be caused by unfortunate use of the word
'hardware' in my initial mail ('system' should be more proper)...

2008-06-25 11:24:29

by Elias Oltmanns

[permalink] [raw]
Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

"Bartlomiej Zolnierkiewicz" <[email protected]> wrote:
> On Tue, Jun 24, 2008 at 3:35 PM, Alan Cox <[email protected]> wrote:
>>> > I don't see why you think it's "hard". We have timeout handlers for many
>
>>> > commands and those reset/abort just fine.
>>>
>>> They are different beasts from user-space initiated abort operation
>>
>> No they are not. They are the *same* thing in every respect.
>>
>> You have the drive in an unknown state, you want it back. If your drive
>> lost a command due to noise or a firmware flaw you have no idea about the
>> state it is actually in (supposed to be is irrelevant)
>
> I generally agree with you w.r.t. to drive side of the operations but
> the drive is only part of the equation (the host and the request states
> are the others) so 'supposed to be is' is quite relevant.
>
> Also abort request can happen i.e. while the command is being prepared
> & issued (it is done without ide_lock being taken and the timeout is not
> even armed yet) so there are additional issues to take care of.

Yes there are. Still, I think it should be feasible which is why I
personally would prefer to drop the second patch in the series for the
time being. But then I can keep it around for reference locally and the
original infrastructure can be found in the history after all.

Even though the patch series currently doesn't fully restore the
intended functionality, I'd like to merge it now. Command aborting
didn't work reliably (if at all) before and now, at least, a simple
ioctl won't harm a healthy system anymore. Since I think that we can add
command aborting back later, I'd like to keep the HDIO_DRIVE_RESET ioctl
even if it should currently be superfluous given SG_IO.

So, here comes the second run of the patch series based on
next-20080624. All suggested changes have been incorporated except for
one additional check for ->rq_disk which I consider unnecessary (and you
haven't insisted on it). In particular, this means that in the second
patch the ->resetting member has been removed from the ide_hwgroup_t
structure for completeness' sake (even though I'm not all in favour of
applying that second patch just yet).

Regards,

Elias

2008-06-25 11:27:48

by Elias Oltmanns

[permalink] [raw]
Subject: [PATCH 1/4 v2] IDE: Fix HDIO_DRIVE_RESET handling

Currently, the code path executing an HDIO_DRIVE_RESET ioctl is broken
in various ways. Most importantly, it is treated as an out of band
request in an illegal way which may very likely lead to system lock ups.
Use the drive's request queue to avoid this problem (and fix a locking
issue for free along the way).

Signed-off-by: Elias Oltmanns <[email protected]>
---

drivers/ide/ide-io.c | 23 ++++++++++++++++++++++-
drivers/ide/ide-iops.c | 18 +++++++++++++-----
drivers/ide/ide.c | 41 +++++++++++++++--------------------------
include/linux/ide.h | 6 ++++++
4 files changed, 56 insertions(+), 32 deletions(-)

diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 2805774..2b33c12 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -766,6 +766,18 @@ static ide_startstop_t execute_drive_cmd (ide_drive_t *drive,
return ide_stopped;
}

+static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
+{
+ switch (rq->cmd[0]) {
+ case REQ_DRIVE_RESET:
+ return ide_do_reset(drive);
+ default:
+ blk_dump_rq_flags(rq, "ide_special_rq - bad request");
+ ide_end_request(drive, 0, 0);
+ return ide_stopped;
+ }
+}
+
static void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
{
struct request_pm_state *pm = rq->data;
@@ -869,7 +881,16 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
pm->pm_step == ide_pm_state_completed)
ide_complete_pm_request(drive, rq);
return startstop;
- }
+ } else if (!rq->rq_disk && blk_special_request(rq))
+ /*
+ * TODO: Once all ULDs have been modified to
+ * check for specific op codes rather than
+ * blindly accepting any special request, the
+ * check for ->rq_disk above may be replaced
+ * by a more suitable mechanism or even
+ * dropped entirely.
+ */
+ return ide_special_rq(drive, rq);

drv = *(ide_driver_t **)rq->rq_disk->private_data;
return drv->do_request(drive, rq, block);
diff --git a/drivers/ide/ide-iops.c b/drivers/ide/ide-iops.c
index 66e39e5..80e782b 100644
--- a/drivers/ide/ide-iops.c
+++ b/drivers/ide/ide-iops.c
@@ -905,6 +905,14 @@ void ide_execute_pkt_cmd(ide_drive_t *drive)
}
EXPORT_SYMBOL_GPL(ide_execute_pkt_cmd);

+static inline void ide_complete_drive_reset(ide_drive_t *drive)
+{
+ struct request *rq = drive->hwif->hwgroup->rq;
+
+ if (rq && blk_special_request(rq) && rq->cmd[0] == REQ_DRIVE_RESET)
+ ide_end_request(drive, 1, 0);
+}
+
/* needed below */
static ide_startstop_t do_reset1 (ide_drive_t *, int);

@@ -940,7 +948,7 @@ static ide_startstop_t atapi_reset_pollfunc (ide_drive_t *drive)
}
/* done polling */
hwgroup->polling = 0;
- hwgroup->resetting = 0;
+ ide_complete_drive_reset(drive);
return ide_stopped;
}

@@ -961,7 +969,7 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
if (port_ops->reset_poll(drive)) {
printk(KERN_ERR "%s: host reset_poll failure for %s.\n",
hwif->name, drive->name);
- return ide_stopped;
+ goto out;
}
}

@@ -1004,7 +1012,8 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
}
}
hwgroup->polling = 0; /* done polling */
- hwgroup->resetting = 0; /* done reset attempt */
+out:
+ ide_complete_drive_reset(drive);
return ide_stopped;
}

@@ -1090,7 +1099,6 @@ static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)

/* For an ATAPI device, first try an ATAPI SRST. */
if (drive->media != ide_disk && !do_not_try_atapi) {
- hwgroup->resetting = 1;
pre_reset(drive);
SELECT_DRIVE(drive);
udelay (20);
@@ -1112,10 +1120,10 @@ static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)

if (io_ports->ctl_addr == 0) {
spin_unlock_irqrestore(&ide_lock, flags);
+ ide_complete_drive_reset(drive);
return ide_stopped;
}

- hwgroup->resetting = 1;
/*
* Note that we also set nIEN while resetting the device,
* to mask unwanted interrupts from the interface during the reset.
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index b0739c9..dbedb02 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -529,6 +529,19 @@ static int generic_ide_resume(struct device *dev)
return err;
}

+static void generic_drive_reset(ide_drive_t *drive)
+{
+ struct request *rq;
+
+ rq = blk_get_request(drive->queue, READ, __GFP_WAIT);
+ rq->cmd_type = REQ_TYPE_SPECIAL;
+ rq->cmd_len = 1;
+ rq->cmd[0] = REQ_DRIVE_RESET;
+ rq->cmd_flags |= REQ_SOFTBARRIER;
+ blk_execute_rq(drive->queue, NULL, rq, 1);
+ blk_put_request(rq);
+}
+
int generic_ide_ioctl(ide_drive_t *drive, struct file *file, struct block_device *bdev,
unsigned int cmd, unsigned long arg)
{
@@ -603,33 +616,9 @@ int generic_ide_ioctl(ide_drive_t *drive, struct file *file, struct block_device
if (!capable(CAP_SYS_ADMIN))
return -EACCES;

- /*
- * Abort the current command on the
- * group if there is one, taking
- * care not to allow anything else
- * to be queued and to die on the
- * spot if we miss one somehow
- */
-
- spin_lock_irqsave(&ide_lock, flags);
-
- if (HWGROUP(drive)->resetting) {
- spin_unlock_irqrestore(&ide_lock, flags);
- return -EBUSY;
- }
-
- ide_abort(drive, "drive reset");
-
- BUG_ON(HWGROUP(drive)->handler);
-
- /* Ensure nothing gets queued after we
- drop the lock. Reset will clear the busy */
-
- HWGROUP(drive)->busy = 1;
- spin_unlock_irqrestore(&ide_lock, flags);
- (void) ide_do_reset(drive);
-
+ generic_drive_reset(drive);
return 0;
+
case HDIO_GET_BUSSTATE:
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 5ddae9b..8fbe838 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -139,6 +139,12 @@ struct ide_io_ports {
#define WAIT_MIN_SLEEP (2*HZ/100) /* 20msec - minimum sleep time */

/*
+ * Op codes for special requests to be handled by ide_special_rq().
+ * Values should be in the range of 0x20 to 0x3f.
+ */
+#define REQ_DRIVE_RESET 0x20
+
+/*
* Check for an interrupt and acknowledge the interrupt status
*/
struct hwif_s;

2008-06-25 11:29:10

by Elias Oltmanns

[permalink] [raw]
Subject: [PATCH 2/4 v2] IDE: Remove unused code

Remove some code which has been made obsolete and hasn't worked properly
before anyway. Part of the infrastructure may be reintroduced in a
follow up patch to implement a working command aborting facility.

Signed-off-by: Elias Oltmanns <[email protected]>
---

drivers/ide/ide-cd.c | 1 -
drivers/ide/ide-disk.c | 1 -
drivers/ide/ide-floppy.c | 1 -
drivers/ide/ide-io.c | 49 ----------------------------------------------
drivers/ide/ide-tape.c | 1 -
drivers/scsi/ide-scsi.c | 14 -------------
include/linux/ide.h | 7 -------
7 files changed, 0 insertions(+), 74 deletions(-)

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index b5ec249..26e8972 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1940,7 +1940,6 @@ static ide_driver_t ide_cdrom_driver = {
.do_request = ide_cd_do_request,
.end_request = ide_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idecd_proc,
#endif
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index 5f49a4a..3a2e802 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -985,7 +985,6 @@ static ide_driver_t idedisk_driver = {
.do_request = ide_do_rw_disk,
.end_request = ide_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idedisk_proc,
#endif
diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
index c7e4433..011d720 100644
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -1129,7 +1129,6 @@ static ide_driver_t idefloppy_driver = {
.do_request = idefloppy_do_request,
.end_request = idefloppy_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idefloppy_proc,
#endif
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 2b33c12..661b75a 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -504,55 +504,6 @@ ide_startstop_t ide_error (ide_drive_t *drive, const char *msg, u8 stat)

EXPORT_SYMBOL_GPL(ide_error);

-ide_startstop_t __ide_abort(ide_drive_t *drive, struct request *rq)
-{
- if (drive->media != ide_disk)
- rq->errors |= ERROR_RESET;
-
- ide_kill_rq(drive, rq);
-
- return ide_stopped;
-}
-
-EXPORT_SYMBOL_GPL(__ide_abort);
-
-/**
- * ide_abort - abort pending IDE operations
- * @drive: drive the error occurred on
- * @msg: message to report
- *
- * ide_abort kills and cleans up when we are about to do a
- * host initiated reset on active commands. Longer term we
- * want handlers to have sensible abort handling themselves
- *
- * This differs fundamentally from ide_error because in
- * this case the command is doing just fine when we
- * blow it away.
- */
-
-ide_startstop_t ide_abort(ide_drive_t *drive, const char *msg)
-{
- struct request *rq;
-
- if (drive == NULL || (rq = HWGROUP(drive)->rq) == NULL)
- return ide_stopped;
-
- /* retry only "normal" I/O: */
- if (!blk_fs_request(rq)) {
- rq->errors = 1;
- ide_end_drive_cmd(drive, BUSY_STAT, 0);
- return ide_stopped;
- }
-
- if (rq->rq_disk) {
- ide_driver_t *drv;
-
- drv = *(ide_driver_t **)rq->rq_disk->private_data;
- return drv->abort(drive, rq);
- } else
- return __ide_abort(drive, rq);
-}
-
static void ide_tf_set_specify_cmd(ide_drive_t *drive, struct ide_taskfile *tf)
{
tf->nsect = drive->sect;
diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
index 2deb23d..353dd11 100644
--- a/drivers/ide/ide-tape.c
+++ b/drivers/ide/ide-tape.c
@@ -2591,7 +2591,6 @@ static ide_driver_t idetape_driver = {
.do_request = idetape_do_request,
.end_request = idetape_end_request,
.error = __ide_error,
- .abort = __ide_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idetape_proc,
#endif
diff --git a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
index 683bce3..f843c13 100644
--- a/drivers/scsi/ide-scsi.c
+++ b/drivers/scsi/ide-scsi.c
@@ -258,19 +258,6 @@ idescsi_atapi_error(ide_drive_t *drive, struct request *rq, u8 stat, u8 err)
return ide_stopped;
}

-static ide_startstop_t
-idescsi_atapi_abort(ide_drive_t *drive, struct request *rq)
-{
- debug_log("%s called for %lu\n", __func__,
- ((struct ide_atapi_pc *) rq->special)->scsi_cmd->serial_number);
-
- rq->errors |= ERROR_MAX;
-
- idescsi_end_request(drive, 0, 0);
-
- return ide_stopped;
-}
-
static int idescsi_end_request (ide_drive_t *drive, int uptodate, int nrsecs)
{
idescsi_scsi_t *scsi = drive_to_idescsi(drive);
@@ -524,7 +511,6 @@ static ide_driver_t idescsi_driver = {
.do_request = idescsi_do_request,
.end_request = idescsi_end_request,
.error = idescsi_atapi_error,
- .abort = idescsi_atapi_abort,
#ifdef CONFIG_IDE_PROC_FS
.proc = idescsi_proc,
#endif
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 8fbe838..25f4c65 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -556,8 +556,6 @@ typedef struct hwgroup_s {
unsigned int sleeping : 1;
/* BOOL: polling active & poll_timeout field valid */
unsigned int polling : 1;
- /* BOOL: in a polling reset situation. Must not trigger another reset yet */
- unsigned int resetting : 1;

/* current drive */
ide_drive_t *drive;
@@ -777,7 +775,6 @@ struct ide_driver_s {
ide_startstop_t (*do_request)(ide_drive_t *, struct request *, sector_t);
int (*end_request)(ide_drive_t *, int, int);
ide_startstop_t (*error)(ide_drive_t *, struct request *rq, u8, u8);
- ide_startstop_t (*abort)(ide_drive_t *, struct request *rq);
struct device_driver gen_driver;
int (*probe)(ide_drive_t *);
void (*remove)(ide_drive_t *);
@@ -819,10 +816,6 @@ ide_startstop_t __ide_error(ide_drive_t *, struct request *, u8, u8);

ide_startstop_t ide_error (ide_drive_t *drive, const char *msg, byte stat);

-ide_startstop_t __ide_abort(ide_drive_t *, struct request *);
-
-extern ide_startstop_t ide_abort(ide_drive_t *, const char *);
-
extern void ide_fix_driveid(struct hd_driveid *);

extern void ide_fixstring(u8 *, const int, const int);

2008-06-25 11:30:17

by Elias Oltmanns

[permalink] [raw]
Subject: [PATCH 3/4 v2] Update documentation of HDIO_DRIVE_RESET ioctl

Alter the entry for HDIO_DRIVE_RESET in Documentation/ioctl/hdio.txt to
reflect a functional change in the driver. Besides, the entry has been
inaccurate before.

Signed-off-by: Elias Oltmanns <[email protected]>
---

Documentation/ioctl/hdio.txt | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt
index c19efde..44d283d 100644
--- a/Documentation/ioctl/hdio.txt
+++ b/Documentation/ioctl/hdio.txt
@@ -511,9 +511,8 @@ HDIO_DRIVE_RESET execute a device reset

notes:

- Abort any current command, prevent anything else from being
- queued, execute a reset on the device, and issue BLKRRPART
- ioctl on the block device.
+ Execute a reset on the device as soon as the current IO
+ operation has completed.

Executes an ATAPI soft reset if applicable, otherwise
executes an ATA soft reset on the controller.

2008-06-25 11:31:21

by Elias Oltmanns

[permalink] [raw]
Subject: [PATCH 4/4 v2] IDE: Report errors during drive reset back to user space

Make sure that each error condition during the execution of an
HDIO_DRIVE_RESET ioctl is actually reported to the calling process.
Also, unify the exit path of reset_pollfunc() when returning ide_stopped
since the need of ->port_ops->reset_poll() to be treated specially has
vanished (way back, it seems).

Signed-off-by: Elias Oltmanns <[email protected]>
---

Documentation/ioctl/hdio.txt | 2 ++
drivers/ide/ide-iops.c | 18 +++++++++++-------
drivers/ide/ide.c | 10 ++++++----
drivers/ide/pci/siimage.c | 3 +--
4 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt
index 44d283d..91a6ecb 100644
--- a/Documentation/ioctl/hdio.txt
+++ b/Documentation/ioctl/hdio.txt
@@ -508,6 +508,8 @@ HDIO_DRIVE_RESET execute a device reset

error returns:
EACCES Access denied: requires CAP_SYS_ADMIN
+ ENXIO No such device: phy dead or ctl_addr == 0
+ EIO I/O error: reset timed out or hardware error

notes:

diff --git a/drivers/ide/ide-iops.c b/drivers/ide/ide-iops.c
index 80e782b..6a8b955 100644
--- a/drivers/ide/ide-iops.c
+++ b/drivers/ide/ide-iops.c
@@ -905,12 +905,12 @@ void ide_execute_pkt_cmd(ide_drive_t *drive)
}
EXPORT_SYMBOL_GPL(ide_execute_pkt_cmd);

-static inline void ide_complete_drive_reset(ide_drive_t *drive)
+static inline void ide_complete_drive_reset(ide_drive_t *drive, int err)
{
struct request *rq = drive->hwif->hwgroup->rq;

if (rq && blk_special_request(rq) && rq->cmd[0] == REQ_DRIVE_RESET)
- ide_end_request(drive, 1, 0);
+ ide_end_request(drive, err ? err : 1, 0);
}

/* needed below */
@@ -948,7 +948,7 @@ static ide_startstop_t atapi_reset_pollfunc (ide_drive_t *drive)
}
/* done polling */
hwgroup->polling = 0;
- ide_complete_drive_reset(drive);
+ ide_complete_drive_reset(drive, 0);
return ide_stopped;
}

@@ -964,9 +964,11 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
ide_hwif_t *hwif = HWIF(drive);
const struct ide_port_ops *port_ops = hwif->port_ops;
u8 tmp;
+ int err = 0;

if (port_ops && port_ops->reset_poll) {
- if (port_ops->reset_poll(drive)) {
+ err = port_ops->reset_poll(drive);
+ if (err) {
printk(KERN_ERR "%s: host reset_poll failure for %s.\n",
hwif->name, drive->name);
goto out;
@@ -983,6 +985,7 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
}
printk("%s: reset timed-out, status=0x%02x\n", hwif->name, tmp);
drive->failures++;
+ err = -EIO;
} else {
printk("%s: reset: ", hwif->name);
tmp = ide_read_error(drive);
@@ -1009,11 +1012,12 @@ static ide_startstop_t reset_pollfunc (ide_drive_t *drive)
if (tmp & 0x80)
printk("; slave: failed");
printk("\n");
+ err = -EIO;
}
}
- hwgroup->polling = 0; /* done polling */
out:
- ide_complete_drive_reset(drive);
+ hwgroup->polling = 0; /* done polling */
+ ide_complete_drive_reset(drive, err);
return ide_stopped;
}

@@ -1120,7 +1124,7 @@ static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)

if (io_ports->ctl_addr == 0) {
spin_unlock_irqrestore(&ide_lock, flags);
- ide_complete_drive_reset(drive);
+ ide_complete_drive_reset(drive, -ENXIO);
return ide_stopped;
}

diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index dbedb02..dfdc48a 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -529,17 +529,20 @@ static int generic_ide_resume(struct device *dev)
return err;
}

-static void generic_drive_reset(ide_drive_t *drive)
+static int generic_drive_reset(ide_drive_t *drive)
{
struct request *rq;
+ int ret = 0;

rq = blk_get_request(drive->queue, READ, __GFP_WAIT);
rq->cmd_type = REQ_TYPE_SPECIAL;
rq->cmd_len = 1;
rq->cmd[0] = REQ_DRIVE_RESET;
rq->cmd_flags |= REQ_SOFTBARRIER;
- blk_execute_rq(drive->queue, NULL, rq, 1);
+ if (blk_execute_rq(drive->queue, NULL, rq, 1))
+ ret = rq->errors;
blk_put_request(rq);
+ return ret;
}

int generic_ide_ioctl(ide_drive_t *drive, struct file *file, struct block_device *bdev,
@@ -616,8 +619,7 @@ int generic_ide_ioctl(ide_drive_t *drive, struct file *file, struct block_device
if (!capable(CAP_SYS_ADMIN))
return -EACCES;

- generic_drive_reset(drive);
- return 0;
+ return generic_drive_reset(drive);

case HDIO_GET_BUSSTATE:
if (!capable(CAP_SYS_ADMIN))
diff --git a/drivers/ide/pci/siimage.c b/drivers/ide/pci/siimage.c
index b75e9bb..6e9d765 100644
--- a/drivers/ide/pci/siimage.c
+++ b/drivers/ide/pci/siimage.c
@@ -421,8 +421,7 @@ static int sil_sata_reset_poll(ide_drive_t *drive)
if ((sata_stat & 0x03) != 0x03) {
printk(KERN_WARNING "%s: reset phy dead, status=0x%08x\n",
hwif->name, sata_stat);
- HWGROUP(drive)->polling = 0;
- return ide_started;
+ return -ENXIO;
}
}

Subject: Re: [PATCH] IDE: Fix HDIO_DRIVE_RESET handling

On Wednesday 25 June 2008, Elias Oltmanns wrote:
> "Bartlomiej Zolnierkiewicz" <[email protected]> wrote:
> > On Tue, Jun 24, 2008 at 3:35 PM, Alan Cox <[email protected]> wrote:
> >>> > I don't see why you think it's "hard". We have timeout handlers for many
> >
> >>> > commands and those reset/abort just fine.
> >>>
> >>> They are different beasts from user-space initiated abort operation
> >>
> >> No they are not. They are the *same* thing in every respect.
> >>
> >> You have the drive in an unknown state, you want it back. If your drive
> >> lost a command due to noise or a firmware flaw you have no idea about the
> >> state it is actually in (supposed to be is irrelevant)
> >
> > I generally agree with you w.r.t. to drive side of the operations but
> > the drive is only part of the equation (the host and the request states
> > are the others) so 'supposed to be is' is quite relevant.
> >
> > Also abort request can happen i.e. while the command is being prepared
> > & issued (it is done without ide_lock being taken and the timeout is not
> > even armed yet) so there are additional issues to take care of.
>
> Yes there are. Still, I think it should be feasible which is why I
> personally would prefer to drop the second patch in the series for the
> time being. But then I can keep it around for reference locally and the
> original infrastructure can be found in the history after all.
>
> Even though the patch series currently doesn't fully restore the
> intended functionality, I'd like to merge it now. Command aborting

Thanks, I applied everything and queued it for 2.6.27.

[ including patch #2, we should re-add aborting when fixed/necessary ]

> didn't work reliably (if at all) before and now, at least, a simple
> ioctl won't harm a healthy system anymore. Since I think that we can add
> command aborting back later, I'd like to keep the HDIO_DRIVE_RESET ioctl
> even if it should currently be superfluous given SG_IO.

SG_IO ATA pass-through is unsupported currently in drivers/ide/
(though nowadays it should be quite easy to add it if somebody is
interested) so it is indeed the best to leave the ioctl for now.

Bart