2005-03-23 02:14:52

by Tejun Heo

[permalink] [raw]
Subject: [PATCH scsi-misc-2.6 00/08] scsi: small fixes & cleanups

Hello, James.
Hello, Jens.

These are series of small fixes & cleanups. The last two patches
deal with reference counting and hot unplugging oops. Patches are
against scsi-misc-2.6 tree (this is the devel tree, right?).

Jens, please try #08 and tell me if you still get oops. AFAICT,
reference counting isn't the issue here. We're over-counting not
under-counting (see description of #07).

All compile cleanly and I haven't found any problem yet. USB
hot-unplugging doesn't create oops or offline device anymore for me.

[ Start of patch descriptions ]

01_scsi_remove_scsi_release_buffers.patch
: remove unused bounce-buffer release path

Buffer bouncing hasn't been done inside the scsi midlayer for
quite sometime now, but bounce-buffer release paths are still
around. This patch removes these unused paths.

02_scsi_no_special_on_requeue.patch
: don't use blk_insert_request() for requeueing

blk_insert_request() has 'reinsert' argument, which, when set,
turns on REQ_SPECIAL and REQ_SOFTBARRIER and requeues the
request. SCSI midlayer was the only user of this feature and
all requeued requests become special requests defeating
quiesce state. This patch makes scsi midlayer use
blk_requeue_request() for requeueing and removes 'reinsert'
feature from blk_insert_request().

Note: In drivers/scsi/scsi_lib.c, scsi_single_lun_run() and
scsi_run_queue() are moved upward unchanged.

03_scsi_remove_internal_timeout.patch
: remove unused scsi_cmnd->internal_timeout field

scsi_cmnd->internal_timeout field doesn't have any meaning
anymore. Kill the field.

04_scsi_remove_volatile.patch
: remove meaningless volatile qualifiers from structure definitions

scsi_device->device_busy, Scsi_Host->host_busy and
->host_failed have volatile qualifiers, but the qualifiers
don't serve any purpose. Kill them. While at it, protect
->host_failed update in scsi_error for consistency and clarity.

05_scsi_timer_cleanup.patch
: remove a timer race from scsi_queue_insert() and cleanup timer

scsi_queue_insert() has four callers. Three callers call with
timer disabled and one (the second invocation in
scsi_dispatch_cmd()) calls with timer activated.
scsi_queue_insert() used to always call scsi_delete_timer()
and ignore the return value. This results in race with timer
expiration. Remove scsi_delete_timer() call from
scsi_queue_insert() and make the caller delete timer and check
the return value.

While at it, as, once a scsi timer is added, it should expire
or be deleted before reused, make scsi_add_timer() strict
about timer reuses. Now timer expiration function clears
->function and all timer deletion should go through
scsi_delete_timer(). Also, remove bogus ->eh_action tests
from scsi_eh_{done|times_out} functions. The condition is
always true and the test is somewhat misleading.

06_scsi_remove_serial_number_at_timeout.patch
: remove meaningless scsi_cmnd->serial_number_at_timeout field

scsi_cmnd->serial_number_at_timeout doesn't serve any purpose
anymore. All serial_number == serial_number_at_timeout tests
are always true in abort callbacks. Kill the field. Also, as
->pid always equals ->serial_number and ->serial_number
doesn't have any special meaning anymore, update comments
above ->serial_number accordingly. Once we remove all uses of
this field from all lldd's, this field should go.

07_scsi_refcnt_cleanup.patch
: remove bogus {get|put}_device() calls

SCSI request submission paths can be categorized like the
following.

* through high-level driver (sd, st, sg...)
+ requests (fs / pc)
+ ioctls
+ flushes (issue_flush / barrier rqs)
+ backing dev (unplug fn / field referencing)
+ high-level specific (init / revalidation...)
* through scsi-midlayer
+ midlevel specific (init...)
+ transport specific (domain validations...)

All accesses either

* open high-level driver before submitting requests and
closes with no request left.
* get_device() before submitting requests and put_device()
with no request left.

So, basically, SCSI high-level object (scsi_disk) and
mid-level object (scsi_device) are reference counted by users,
not the requests they submit. Reference count cannot go zero
with active users and users cannot access the object once the
reference count reaches zero.

So, the {get/put}_device() calls in scsi_get_command() and
scsi_request_fn() are bogus and misleading. In addition,
get_device() cannot synchronize 1->0 and 0->1 transitions and
always returns the device pointer given as the argument. The
== NULL tests are just misleading.

08_scsi_hot_unplug_fix.patch
: fix hot unplug sequence

When hot-unplugging using scsi_remove_host() function (as usb
does), scsi_forget_host() used to be called before
scsi_host_cancel(). So, the device gets removed first without
request cleanup and scsi_host_cancel() never gets to call
scsi_device_cancel() on the removed devices. This results in
premature completion of hot-unplugging process with active
requests left in queue, eventually leading to hang/offlined
device or oops when the active command times out.

This patch makes scsi_remove_host() call scsi_host_cancel()
first such that the host is first transited into cancel state
and all requests of all devices are killed, and then, the
devices are removed. This patch fixes the oops in eh after
hot-unplugging bug.

[ End of patch descriptions ]

Thanks a lot.

--
tejun


2005-03-23 02:15:31

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 01/08] scsi: remove unused bounce-buffer release path

01_scsi_remove_scsi_release_buffers.patch

Buffer bouncing hasn't been done inside the scsi midlayer for
quite sometime now, but bounce-buffer release paths are still
around. This patch removes these unused paths.

Signed-off-by: Tejun Heo <[email protected]>

scsi_lib.c | 60 +++++-------------------------------------------------------
1 files changed, 5 insertions(+), 55 deletions(-)

Index: scsi-export/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_lib.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_lib.c 2005-03-23 09:40:09.000000000 +0900
@@ -622,45 +622,6 @@ static void scsi_free_sgtable(struct sca
}

/*
- * Function: scsi_release_buffers()
- *
- * Purpose: Completion processing for block device I/O requests.
- *
- * Arguments: cmd - command that we are bailing.
- *
- * Lock status: Assumed that no lock is held upon entry.
- *
- * Returns: Nothing
- *
- * Notes: In the event that an upper level driver rejects a
- * command, we must release resources allocated during
- * the __init_io() function. Primarily this would involve
- * the scatter-gather table, and potentially any bounce
- * buffers.
- */
-static void scsi_release_buffers(struct scsi_cmnd *cmd)
-{
- struct request *req = cmd->request;
-
- /*
- * Free up any indirection buffers we allocated for DMA purposes.
- */
- if (cmd->use_sg)
- scsi_free_sgtable(cmd->request_buffer, cmd->sglist_len);
- else if (cmd->request_buffer != req->buffer)
- kfree(cmd->request_buffer);
-
- /*
- * Zero these out. They now point to freed memory, and it is
- * dangerous to hang onto the pointers.
- */
- cmd->buffer = NULL;
- cmd->bufflen = 0;
- cmd->request_buffer = NULL;
- cmd->request_bufflen = 0;
-}
-
-/*
* Function: scsi_io_completion()
*
* Purpose: Completion processing for block device I/O requests.
@@ -703,22 +664,8 @@ void scsi_io_completion(struct scsi_cmnd
if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
return;

- /*
- * Free up any indirection buffers we allocated for DMA purposes.
- * For the case of a READ, we need to copy the data out of the
- * bounce buffer and into the real buffer.
- */
if (cmd->use_sg)
scsi_free_sgtable(cmd->buffer, cmd->sglist_len);
- else if (cmd->buffer != req->buffer) {
- if (rq_data_dir(req) == READ) {
- unsigned long flags;
- char *to = bio_kmap_irq(req->bio, &flags);
- memcpy(to, cmd->buffer, cmd->bufflen);
- bio_kunmap_irq(to, &flags);
- }
- kfree(cmd->buffer);
- }

if (result) {
sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
@@ -963,7 +910,8 @@ static int scsi_init_io(struct scsi_cmnd
req->current_nr_sectors);

/* release the command and kill it */
- scsi_release_buffers(cmd);
+ if (cmd->use_sg)
+ scsi_free_sgtable(cmd->request_buffer, cmd->sglist_len);
scsi_put_command(cmd);
return BLKPREP_KILL;
}
@@ -1140,7 +1088,9 @@ static int scsi_prep_fn(struct request_q
*/
drv = *(struct scsi_driver **)req->rq_disk->private_data;
if (unlikely(!drv->init_command(cmd))) {
- scsi_release_buffers(cmd);
+ if (cmd->use_sg)
+ scsi_free_sgtable(cmd->request_buffer,
+ cmd->sglist_len);
scsi_put_command(cmd);
return BLKPREP_KILL;
}

2005-03-23 02:18:37

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 02/08] scsi: don't use blk_insert_request() for requeueing

02_scsi_no_special_on_requeue.patch

blk_insert_request() has 'reinsert' argument, which, when set,
turns on REQ_SPECIAL and REQ_SOFTBARRIER and requeues the
request. SCSI midlayer was the only user of this feature and
all requeued requests become special requests defeating
quiesce state. This patch makes scsi midlayer use
blk_requeue_request() for requeueing and removes 'reinsert'
feature from blk_insert_request().

Note: In drivers/scsi/scsi_lib.c, scsi_single_lun_run() and
scsi_run_queue() are moved upward unchanged.

Signed-off-by: Tejun Heo <[email protected]>

drivers/block/ll_rw_blk.c | 20 +--
drivers/block/paride/pd.c | 2
drivers/block/sx8.c | 4
drivers/scsi/scsi_lib.c | 238 +++++++++++++++++++++++-----------------------
include/linux/blkdev.h | 2
5 files changed, 133 insertions(+), 133 deletions(-)

Index: scsi-export/drivers/block/ll_rw_blk.c
===================================================================
--- scsi-export.orig/drivers/block/ll_rw_blk.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/block/ll_rw_blk.c 2005-03-23 09:40:09.000000000 +0900
@@ -2028,7 +2028,6 @@ EXPORT_SYMBOL(blk_requeue_request);
* @rq: request to be inserted
* @at_head: insert request at head or tail of queue
* @data: private data
- * @reinsert: true if request it a reinsertion of previously processed one
*
* Description:
* Many block devices need to execute commands asynchronously, so they don't
@@ -2043,8 +2042,9 @@ EXPORT_SYMBOL(blk_requeue_request);
* host that is unable to accept a particular command.
*/
void blk_insert_request(request_queue_t *q, struct request *rq,
- int at_head, void *data, int reinsert)
+ int at_head, void *data)
{
+ int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
unsigned long flags;

/*
@@ -2061,20 +2061,12 @@ void blk_insert_request(request_queue_t
/*
* If command is tagged, release the tag
*/
- if (reinsert)
- blk_requeue_request(q, rq);
- else {
- int where = ELEVATOR_INSERT_BACK;
+ if (blk_rq_tagged(rq))
+ blk_queue_end_tag(q, rq);

- if (at_head)
- where = ELEVATOR_INSERT_FRONT;
+ drive_stat_acct(rq, rq->nr_sectors, 1);
+ __elv_add_request(q, rq, where, 0);

- if (blk_rq_tagged(rq))
- blk_queue_end_tag(q, rq);
-
- drive_stat_acct(rq, rq->nr_sectors, 1);
- __elv_add_request(q, rq, where, 0);
- }
if (blk_queue_plugged(q))
__generic_unplug_device(q);
else
Index: scsi-export/drivers/block/paride/pd.c
===================================================================
--- scsi-export.orig/drivers/block/paride/pd.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/block/paride/pd.c 2005-03-23 09:40:09.000000000 +0900
@@ -723,7 +723,7 @@ static int pd_special_command(struct pd_
rq.ref_count = 1;
rq.waiting = &wait;
rq.end_io = blk_end_sync_rq;
- blk_insert_request(disk->gd->queue, &rq, 0, func, 0);
+ blk_insert_request(disk->gd->queue, &rq, 0, func);
wait_for_completion(&wait);
rq.waiting = NULL;
if (rq.errors)
Index: scsi-export/drivers/block/sx8.c
===================================================================
--- scsi-export.orig/drivers/block/sx8.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/block/sx8.c 2005-03-23 09:40:09.000000000 +0900
@@ -614,7 +614,7 @@ static int carm_array_info (struct carm_
spin_unlock_irq(&host->lock);

DPRINTK("blk_insert_request, tag == %u\n", idx);
- blk_insert_request(host->oob_q, crq->rq, 1, crq, 0);
+ blk_insert_request(host->oob_q, crq->rq, 1, crq);

return 0;

@@ -653,7 +653,7 @@ static int carm_send_special (struct car
crq->msg_bucket = (u32) rc;

DPRINTK("blk_insert_request, tag == %u\n", idx);
- blk_insert_request(host->oob_q, crq->rq, 1, crq, 0);
+ blk_insert_request(host->oob_q, crq->rq, 1, crq);

return 0;
}
Index: scsi-export/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_lib.c 2005-03-23 09:40:09.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_lib.c 2005-03-23 09:40:09.000000000 +0900
@@ -65,6 +65,109 @@ struct scsi_host_sg_pool scsi_sg_pools[]


/*
+ * Called for single_lun devices on IO completion. Clear starget_sdev_user,
+ * and call blk_run_queue for all the scsi_devices on the target -
+ * including current_sdev first.
+ *
+ * Called with *no* scsi locks held.
+ */
+static void scsi_single_lun_run(struct scsi_device *current_sdev)
+{
+ struct Scsi_Host *shost = current_sdev->host;
+ struct scsi_device *sdev, *tmp;
+ struct scsi_target *starget = scsi_target(current_sdev);
+ unsigned long flags;
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ starget->starget_sdev_user = NULL;
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
+ /*
+ * Call blk_run_queue for all LUNs on the target, starting with
+ * current_sdev. We race with others (to set starget_sdev_user),
+ * but in most cases, we will be first. Ideally, each LU on the
+ * target would get some limited time or requests on the target.
+ */
+ blk_run_queue(current_sdev->request_queue);
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (starget->starget_sdev_user)
+ goto out;
+ list_for_each_entry_safe(sdev, tmp, &starget->devices,
+ same_target_siblings) {
+ if (sdev == current_sdev)
+ continue;
+ if (scsi_device_get(sdev))
+ continue;
+
+ spin_unlock_irqrestore(shost->host_lock, flags);
+ blk_run_queue(sdev->request_queue);
+ spin_lock_irqsave(shost->host_lock, flags);
+
+ scsi_device_put(sdev);
+ }
+ out:
+ spin_unlock_irqrestore(shost->host_lock, flags);
+}
+
+/*
+ * Function: scsi_run_queue()
+ *
+ * Purpose: Select a proper request queue to serve next
+ *
+ * Arguments: q - last request's queue
+ *
+ * Returns: Nothing
+ *
+ * Notes: The previous command was completely finished, start
+ * a new one if possible.
+ */
+static void scsi_run_queue(struct request_queue *q)
+{
+ struct scsi_device *sdev = q->queuedata;
+ struct Scsi_Host *shost = sdev->host;
+ unsigned long flags;
+
+ if (sdev->single_lun)
+ scsi_single_lun_run(sdev);
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ while (!list_empty(&shost->starved_list) &&
+ !shost->host_blocked && !shost->host_self_blocked &&
+ !((shost->can_queue > 0) &&
+ (shost->host_busy >= shost->can_queue))) {
+ /*
+ * As long as shost is accepting commands and we have
+ * starved queues, call blk_run_queue. scsi_request_fn
+ * drops the queue_lock and can add us back to the
+ * starved_list.
+ *
+ * host_lock protects the starved_list and starved_entry.
+ * scsi_request_fn must get the host_lock before checking
+ * or modifying starved_list or starved_entry.
+ */
+ sdev = list_entry(shost->starved_list.next,
+ struct scsi_device, starved_entry);
+ list_del_init(&sdev->starved_entry);
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
+ blk_run_queue(sdev->request_queue);
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (unlikely(!list_empty(&sdev->starved_entry)))
+ /*
+ * sdev lost a race, and was put back on the
+ * starved list. This is unlikely but without this
+ * in theory we could loop forever.
+ */
+ break;
+ }
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
+ blk_run_queue(q);
+}
+
+/*
* Function: scsi_insert_special_req()
*
* Purpose: Insert pre-formed request into request queue.
@@ -92,7 +195,7 @@ int scsi_insert_special_req(struct scsi_
*/
sreq->sr_request->flags &= ~REQ_DONTPREP;
blk_insert_request(sreq->sr_device->request_queue, sreq->sr_request,
- at_head, sreq, 0);
+ at_head, sreq);
return 0;
}

@@ -119,6 +222,8 @@ int scsi_queue_insert(struct scsi_cmnd *
{
struct Scsi_Host *host = cmd->device->host;
struct scsi_device *device = cmd->device;
+ struct request_queue *q = device->request_queue;
+ unsigned long flags;

SCSI_LOG_MLQUEUE(1,
printk("Inserting command %p into mlqueue\n", cmd));
@@ -160,17 +265,17 @@ int scsi_queue_insert(struct scsi_cmnd *
scsi_device_unbusy(device);

/*
- * Insert this command at the head of the queue for it's device.
- * It will go before all other commands that are already in the queue.
- *
- * NOTE: there is magic here about the way the queue is plugged if
- * we have no outstanding commands.
- *
- * Although this *doesn't* plug the queue, it does call the request
- * function. The SCSI request function detects the blocked condition
- * and plugs the queue appropriately.
+ * Requeue the command. Turn on REQ_SOFTBARRIER to prevent
+ * other requests from passing this request.
*/
- blk_insert_request(device->request_queue, cmd->request, 1, cmd, 1);
+ cmd->request->flags |= REQ_SOFTBARRIER;
+
+ spin_lock_irqsave(q->queue_lock, flags);
+ blk_requeue_request(q, cmd->request);
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ scsi_run_queue(q);
+
return 0;
}

@@ -355,109 +460,6 @@ void scsi_device_unbusy(struct scsi_devi
}

/*
- * Called for single_lun devices on IO completion. Clear starget_sdev_user,
- * and call blk_run_queue for all the scsi_devices on the target -
- * including current_sdev first.
- *
- * Called with *no* scsi locks held.
- */
-static void scsi_single_lun_run(struct scsi_device *current_sdev)
-{
- struct Scsi_Host *shost = current_sdev->host;
- struct scsi_device *sdev, *tmp;
- struct scsi_target *starget = scsi_target(current_sdev);
- unsigned long flags;
-
- spin_lock_irqsave(shost->host_lock, flags);
- starget->starget_sdev_user = NULL;
- spin_unlock_irqrestore(shost->host_lock, flags);
-
- /*
- * Call blk_run_queue for all LUNs on the target, starting with
- * current_sdev. We race with others (to set starget_sdev_user),
- * but in most cases, we will be first. Ideally, each LU on the
- * target would get some limited time or requests on the target.
- */
- blk_run_queue(current_sdev->request_queue);
-
- spin_lock_irqsave(shost->host_lock, flags);
- if (starget->starget_sdev_user)
- goto out;
- list_for_each_entry_safe(sdev, tmp, &starget->devices,
- same_target_siblings) {
- if (sdev == current_sdev)
- continue;
- if (scsi_device_get(sdev))
- continue;
-
- spin_unlock_irqrestore(shost->host_lock, flags);
- blk_run_queue(sdev->request_queue);
- spin_lock_irqsave(shost->host_lock, flags);
-
- scsi_device_put(sdev);
- }
- out:
- spin_unlock_irqrestore(shost->host_lock, flags);
-}
-
-/*
- * Function: scsi_run_queue()
- *
- * Purpose: Select a proper request queue to serve next
- *
- * Arguments: q - last request's queue
- *
- * Returns: Nothing
- *
- * Notes: The previous command was completely finished, start
- * a new one if possible.
- */
-static void scsi_run_queue(struct request_queue *q)
-{
- struct scsi_device *sdev = q->queuedata;
- struct Scsi_Host *shost = sdev->host;
- unsigned long flags;
-
- if (sdev->single_lun)
- scsi_single_lun_run(sdev);
-
- spin_lock_irqsave(shost->host_lock, flags);
- while (!list_empty(&shost->starved_list) &&
- !shost->host_blocked && !shost->host_self_blocked &&
- !((shost->can_queue > 0) &&
- (shost->host_busy >= shost->can_queue))) {
- /*
- * As long as shost is accepting commands and we have
- * starved queues, call blk_run_queue. scsi_request_fn
- * drops the queue_lock and can add us back to the
- * starved_list.
- *
- * host_lock protects the starved_list and starved_entry.
- * scsi_request_fn must get the host_lock before checking
- * or modifying starved_list or starved_entry.
- */
- sdev = list_entry(shost->starved_list.next,
- struct scsi_device, starved_entry);
- list_del_init(&sdev->starved_entry);
- spin_unlock_irqrestore(shost->host_lock, flags);
-
- blk_run_queue(sdev->request_queue);
-
- spin_lock_irqsave(shost->host_lock, flags);
- if (unlikely(!list_empty(&sdev->starved_entry)))
- /*
- * sdev lost a race, and was put back on the
- * starved list. This is unlikely but without this
- * in theory we could loop forever.
- */
- break;
- }
- spin_unlock_irqrestore(shost->host_lock, flags);
-
- blk_run_queue(q);
-}
-
-/*
* Function: scsi_requeue_command()
*
* Purpose: Handle post-processing of completed commands.
@@ -476,8 +478,14 @@ static void scsi_run_queue(struct reques
*/
static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
{
+ unsigned long flags;
+
cmd->request->flags &= ~REQ_DONTPREP;
- blk_insert_request(q, cmd->request, 1, cmd, 1);
+ cmd->request->flags |= REQ_SOFTBARRIER;
+
+ spin_lock_irqsave(q->queue_lock, flags);
+ blk_requeue_request(q, cmd->request);
+ spin_unlock_irqrestore(q->queue_lock, flags);

scsi_run_queue(q);
}
Index: scsi-export/include/linux/blkdev.h
===================================================================
--- scsi-export.orig/include/linux/blkdev.h 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/include/linux/blkdev.h 2005-03-23 09:40:09.000000000 +0900
@@ -541,7 +541,7 @@ extern void blk_end_sync_rq(struct reque
extern void blk_attempt_remerge(request_queue_t *, struct request *);
extern void __blk_attempt_remerge(request_queue_t *, struct request *);
extern struct request *blk_get_request(request_queue_t *, int, int);
-extern void blk_insert_request(request_queue_t *, struct request *, int, void *, int);
+extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
extern void blk_requeue_request(request_queue_t *, struct request *);
extern void blk_plug_device(request_queue_t *);
extern int blk_remove_plug(request_queue_t *);

2005-03-23 02:19:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 03/08] scsi: remove unused scsi_cmnd->internal_timeout field

03_scsi_remove_internal_timeout.patch

scsi_cmnd->internal_timeout field doesn't have any meaning
anymore. Kill the field.

Signed-off-by: Tejun Heo <[email protected]>

drivers/scsi/advansys.c | 2 --
drivers/scsi/pci2000.c | 4 ++--
drivers/scsi/scsi.c | 1 -
drivers/scsi/scsi_error.c | 1 -
drivers/scsi/scsi_lib.c | 1 -
drivers/scsi/scsi_priv.h | 5 -----
include/scsi/scsi_cmnd.h | 6 ------
7 files changed, 2 insertions(+), 18 deletions(-)

Index: scsi-export/drivers/scsi/advansys.c
===================================================================
--- scsi-export.orig/drivers/scsi/advansys.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/advansys.c 2005-03-23 09:40:09.000000000 +0900
@@ -9206,8 +9206,6 @@ asc_prt_scsi_cmnd(struct scsi_cmnd *s)
" timeout_per_command %d, timeout_total %d, timeout %d\n",
s->timeout_per_command, s->timeout_total, s->timeout);

- printk(" internal_timeout %u\n", s->internal_timeout);
-
printk(
" scsi_done 0x%lx, done 0x%lx, host_scribble 0x%lx, result 0x%x\n",
(ulong) s->scsi_done, (ulong) s->done,
Index: scsi-export/drivers/scsi/pci2000.c
===================================================================
--- scsi-export.orig/drivers/scsi/pci2000.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/pci2000.c 2005-03-23 09:40:09.000000000 +0900
@@ -438,8 +438,8 @@ int Pci2000_QueueCommand (Scsi_Cmnd *SCp
if ( bus )
{
DEB (if(*cdb) printk ("\nCDB: %X- %X %X %X %X %X %X %X %X %X %X ", SCpnt->cmd_len, cdb[0], cdb[1], cdb[2], cdb[3], cdb[4], cdb[5], cdb[6], cdb[7], cdb[8], cdb[9]));
- DEB (if(*cdb) printk ("\ntimeout_per_command: %d, timeout_total: %d, timeout: %d, internal_timout: %d", SCpnt->timeout_per_command,
- SCpnt->timeout_total, SCpnt->timeout, SCpnt->internal_timeout));
+ DEB (if(*cdb) printk ("\ntimeout_per_command: %d, timeout_total: %d, timeout: %d", SCpnt->timeout_per_command,
+ SCpnt->timeout_total, SCpnt->timeout));
outl (SCpnt->timeout_per_command, padapter->mb1);
outb_p (CMD_SCSI_TIMEOUT, padapter->cmd);
if ( WaitReady (padapter) )
Index: scsi-export/drivers/scsi/scsi.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/scsi.c 2005-03-23 09:40:09.000000000 +0900
@@ -716,7 +716,6 @@ void scsi_init_cmd_from_req(struct scsi_
/*
* Start the timer ticking.
*/
- cmd->internal_timeout = NORMAL_TIMEOUT;
cmd->abort_reason = 0;
cmd->result = 0;

Index: scsi-export/drivers/scsi/scsi_error.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_error.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_error.c 2005-03-23 09:40:09.000000000 +0900
@@ -1843,7 +1843,6 @@ scsi_reset_provider(struct scsi_device *
scmd->bufflen = 0;
scmd->request_buffer = NULL;
scmd->request_bufflen = 0;
- scmd->internal_timeout = NORMAL_TIMEOUT;
scmd->abort_reason = DID_ABORT;

scmd->cmd_len = 0;
Index: scsi-export/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_lib.c 2005-03-23 09:40:09.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_lib.c 2005-03-23 09:40:09.000000000 +0900
@@ -414,7 +414,6 @@ static int scsi_init_cmd_errh(struct scs
memcpy(cmd->data_cmnd, cmd->cmnd, sizeof(cmd->cmnd));
cmd->buffer = cmd->request_buffer;
cmd->bufflen = cmd->request_bufflen;
- cmd->internal_timeout = NORMAL_TIMEOUT;
cmd->abort_reason = 0;

return 1;
Index: scsi-export/drivers/scsi/scsi_priv.h
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_priv.h 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_priv.h 2005-03-23 09:40:09.000000000 +0900
@@ -30,11 +30,6 @@ struct Scsi_Host;
#define SCSI_REQ_MAGIC 0x75F6D354

/*
- * Flag bit for the internal_timeout array
- */
-#define NORMAL_TIMEOUT 0
-
-/*
* Scsi Error Handler Flags
*/
#define scsi_eh_eflags_chk(scp, flags) \
Index: scsi-export/include/scsi/scsi_cmnd.h
===================================================================
--- scsi-export.orig/include/scsi/scsi_cmnd.h 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/include/scsi/scsi_cmnd.h 2005-03-23 09:40:09.000000000 +0900
@@ -65,12 +65,6 @@ struct scsi_cmnd {
int timeout_total;
int timeout;

- /*
- * We handle the timeout differently if it happens when a reset,
- * abort, etc are in process.
- */
- unsigned volatile char internal_timeout;
-
unsigned char cmd_len;
unsigned char old_cmd_len;
enum dma_data_direction sc_data_direction;

2005-03-23 02:24:05

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

08_scsi_hot_unplug_fix.patch

When hot-unplugging using scsi_remove_host() function (as usb
does), scsi_forget_host() used to be called before
scsi_host_cancel(). So, the device gets removed first without
request cleanup and scsi_host_cancel() never gets to call
scsi_device_cancel() on the removed devices. This results in
premature completion of hot-unplugging process with active
requests left in queue, eventually leading to hang/offlined
device or oops when the active command times out.

This patch makes scsi_remove_host() call scsi_host_cancel()
first such that the host is first transited into cancel state
and all requests of all devices are killed, and then, the
devices are removed. This patch fixes the oops in eh after
hot-unplugging bug.

Signed-off-by: Tejun Heo <[email protected]>

hosts.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

Index: scsi-export/drivers/scsi/hosts.c
===================================================================
--- scsi-export.orig/drivers/scsi/hosts.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/hosts.c 2005-03-23 09:40:11.000000000 +0900
@@ -74,8 +74,8 @@ void scsi_host_cancel(struct Scsi_Host *
**/
void scsi_remove_host(struct Scsi_Host *shost)
{
- scsi_forget_host(shost);
scsi_host_cancel(shost, 0);
+ scsi_forget_host(shost);
scsi_proc_host_rm(shost);

set_bit(SHOST_DEL, &shost->shost_state);

2005-03-23 02:24:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 06/08] scsi: remove meaningless scsi_cmnd->serial_number_at_timeout field

06_scsi_remove_serial_number_at_timeout.patch

scsi_cmnd->serial_number_at_timeout doesn't serve any purpose
anymore. All serial_number == serial_number_at_timeout tests
are always true in abort callbacks. Kill the field. Also, as
->pid always equals ->serial_number and ->serial_number
doesn't have any special meaning anymore, update comments
above ->serial_number accordingly. Once we remove all uses of
this field from all lldd's, this field should go.

Signed-off-by: Tejun Heo <[email protected]>

drivers/scsi/BusLogic.c | 7 -------
drivers/scsi/advansys.c | 5 ++---
drivers/scsi/ips.c | 7 -------
drivers/scsi/ncr53c8xx.c | 14 ++------------
drivers/scsi/qla2xxx/qla_dbg.c | 6 ++----
drivers/scsi/scsi.c | 2 --
drivers/scsi/scsi_error.c | 7 -------
drivers/scsi/scsi_lib.c | 1 -
drivers/scsi/sym53c8xx_2/sym_glue.c | 6 ------
include/scsi/scsi_cmnd.h | 22 +++++++++-------------
10 files changed, 15 insertions(+), 62 deletions(-)

Index: scsi-export/drivers/scsi/BusLogic.c
===================================================================
--- scsi-export.orig/drivers/scsi/BusLogic.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/BusLogic.c 2005-03-23 09:40:11.000000000 +0900
@@ -2958,13 +2958,6 @@ static int BusLogic_AbortCommand(struct
struct BusLogic_CCB *CCB;
BusLogic_IncrementErrorCounter(&HostAdapter->TargetStatistics[TargetID].CommandAbortsRequested);
/*
- If this Command has already completed, then no Abort is necessary.
- */
- if (Command->serial_number != Command->serial_number_at_timeout) {
- BusLogic_Warning("Unable to Abort Command to Target %d - " "Already Completed\n", HostAdapter, TargetID);
- return SUCCESS;
- }
- /*
Attempt to find an Active CCB for this Command. If no Active CCB for this
Command is found, then no Abort is necessary.
*/
Index: scsi-export/drivers/scsi/advansys.c
===================================================================
--- scsi-export.orig/drivers/scsi/advansys.c 2005-03-23 09:40:09.000000000 +0900
+++ scsi-export/drivers/scsi/advansys.c 2005-03-23 09:40:11.000000000 +0900
@@ -9198,9 +9198,8 @@ asc_prt_scsi_cmnd(struct scsi_cmnd *s)
s->use_sg, s->sglist_len, s->abort_reason);

printk(
-" serial_number 0x%x, serial_number_at_timeout 0x%x, retries %d, allowed %d\n",
- (unsigned) s->serial_number, (unsigned) s->serial_number_at_timeout,
- s->retries, s->allowed);
+" serial_number 0x%x, retries %d, allowed %d\n",
+ (unsigned) s->serial_number, s->retries, s->allowed);

printk(
" timeout_per_command %d, timeout_total %d, timeout %d\n",
Index: scsi-export/drivers/scsi/ips.c
===================================================================
--- scsi-export.orig/drivers/scsi/ips.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/ips.c 2005-03-23 09:40:11.000000000 +0900
@@ -833,13 +833,6 @@ ips_eh_abort(Scsi_Cmnd * SC)
if (!ha->active)
return (FAILED);

- if (SC->serial_number != SC->serial_number_at_timeout) {
- /* HMM, looks like a bogus command */
- DEBUG(1, "Abort called with bogus scsi command");
-
- return (FAILED);
- }
-
/* See if the command is on the copp queue */
item = ha->copp_waitlist.head;
while ((item) && (item->scsi_cmd != SC))
Index: scsi-export/drivers/scsi/ncr53c8xx.c
===================================================================
--- scsi-export.orig/drivers/scsi/ncr53c8xx.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/ncr53c8xx.c 2005-03-23 09:40:11.000000000 +0900
@@ -7601,24 +7601,14 @@ static int ncr53c8xx_abort(struct scsi_c
struct scsi_cmnd *done_list;

#if defined SCSI_RESET_SYNCHRONOUS && defined SCSI_RESET_ASYNCHRONOUS
- printk("ncr53c8xx_abort: pid=%lu serial_number=%ld serial_number_at_timeout=%ld\n",
- cmd->pid, cmd->serial_number, cmd->serial_number_at_timeout);
+ printk("ncr53c8xx_abort: pid=%lu serial_number=%ld\n",
+ cmd->pid, cmd->serial_number);
#else
printk("ncr53c8xx_abort: command pid %lu\n", cmd->pid);
#endif

NCR_LOCK_NCB(np, flags);

-#if defined SCSI_RESET_SYNCHRONOUS && defined SCSI_RESET_ASYNCHRONOUS
- /*
- * We have to just ignore abort requests in some situations.
- */
- if (cmd->serial_number != cmd->serial_number_at_timeout) {
- sts = SCSI_ABORT_NOT_RUNNING;
- goto out;
- }
-#endif
-
sts = ncr_abort_command(np, cmd);
out:
done_list = np->done_list;
Index: scsi-export/drivers/scsi/qla2xxx/qla_dbg.c
===================================================================
--- scsi-export.orig/drivers/scsi/qla2xxx/qla_dbg.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/qla2xxx/qla_dbg.c 2005-03-23 09:40:11.000000000 +0900
@@ -1050,10 +1050,8 @@ qla2x00_print_scsi_cmd(struct scsi_cmnd
for (i = 0; i < cmd->cmd_len; i++) {
printk("0x%02x ", cmd->cmnd[i]);
}
- printk("\n seg_cnt=%d, allowed=%d, retries=%d, "
- "serial_number_at_timeout=0x%lx\n",
- cmd->use_sg, cmd->allowed, cmd->retries,
- cmd->serial_number_at_timeout);
+ printk("\n seg_cnt=%d, allowed=%d, retries=%d\n",
+ cmd->use_sg, cmd->allowed, cmd->retries);
printk(" request buffer=0x%p, request buffer len=0x%x\n",
cmd->request_buffer, cmd->request_bufflen);
printk(" tag=%d, transfersize=0x%x\n",
Index: scsi-export/drivers/scsi/scsi.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi.c 2005-03-23 09:40:10.000000000 +0900
+++ scsi-export/drivers/scsi/scsi.c 2005-03-23 09:40:11.000000000 +0900
@@ -687,7 +687,6 @@ void scsi_init_cmd_from_req(struct scsi_
cmd->request = sreq->sr_request;
memcpy(cmd->data_cmnd, sreq->sr_cmnd, sizeof(cmd->data_cmnd));
cmd->serial_number = 0;
- cmd->serial_number_at_timeout = 0;
cmd->bufflen = sreq->sr_bufflen;
cmd->buffer = sreq->sr_buffer;
cmd->retries = 0;
@@ -766,7 +765,6 @@ void __scsi_done(struct scsi_cmnd *cmd)
* Set the serial numbers back to zero
*/
cmd->serial_number = 0;
- cmd->serial_number_at_timeout = 0;
cmd->state = SCSI_STATE_BHQUEUE;
cmd->owner = SCSI_OWNER_BH_HANDLER;

Index: scsi-export/drivers/scsi/scsi_error.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_error.c 2005-03-23 09:40:10.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_error.c 2005-03-23 09:40:11.000000000 +0900
@@ -80,11 +80,6 @@ int scsi_eh_scmd_add(struct scsi_cmnd *s
*/
scmd->owner = SCSI_OWNER_ERROR_HANDLER;
scmd->state = SCSI_STATE_FAILED;
- /*
- * Set the serial_number_at_timeout to the current
- * serial_number
- */
- scmd->serial_number_at_timeout = scmd->serial_number;
list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
set_bit(SHOST_RECOVERY, &shost->shost_state);
shost->host_failed++;
@@ -1060,7 +1055,6 @@ static int scsi_try_bus_reset(struct scs
SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Snd Bus RST\n",
__FUNCTION__));
scmd->owner = SCSI_OWNER_LOWLEVEL;
- scmd->serial_number_at_timeout = scmd->serial_number;

if (!scmd->device->host->hostt->eh_bus_reset_handler)
return FAILED;
@@ -1092,7 +1086,6 @@ static int scsi_try_host_reset(struct sc
SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Snd Host RST\n",
__FUNCTION__));
scmd->owner = SCSI_OWNER_LOWLEVEL;
- scmd->serial_number_at_timeout = scmd->serial_number;

if (!scmd->device->host->hostt->eh_host_reset_handler)
return FAILED;
Index: scsi-export/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_lib.c 2005-03-23 09:40:10.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_lib.c 2005-03-23 09:40:11.000000000 +0900
@@ -386,7 +386,6 @@ static int scsi_init_cmd_errh(struct scs
{
cmd->owner = SCSI_OWNER_MIDLEVEL;
cmd->serial_number = 0;
- cmd->serial_number_at_timeout = 0;
cmd->abort_reason = 0;

memset(cmd->sense_buffer, 0, sizeof cmd->sense_buffer);
Index: scsi-export/drivers/scsi/sym53c8xx_2/sym_glue.c
===================================================================
--- scsi-export.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-03-23 09:40:11.000000000 +0900
@@ -799,12 +799,6 @@ static int sym_eh_handler(int op, char *

dev_warn(&cmd->device->sdev_gendev, "%s operation started.\n", opname);

-#if 0
- /* This one should be the result of some race, thus to ignore */
- if (cmd->serial_number != cmd->serial_number_at_timeout)
- goto prepare;
-#endif
-
/* This one is queued in some place -> to wait for completion */
FOR_EACH_QUEUED_ELEMENT(&np->busy_ccbq, qp) {
struct sym_ccb *cp = sym_que_entry(qp, struct sym_ccb, link_ccbq);
Index: scsi-export/include/scsi/scsi_cmnd.h
===================================================================
--- scsi-export.orig/include/scsi/scsi_cmnd.h 2005-03-23 09:40:09.000000000 +0900
+++ scsi-export/include/scsi/scsi_cmnd.h 2005-03-23 09:40:11.000000000 +0900
@@ -43,21 +43,17 @@ struct scsi_cmnd {
void (*done) (struct scsi_cmnd *); /* Mid-level done function */

/*
- * A SCSI Command is assigned a nonzero serial_number when internal_cmnd
- * passes it to the driver's queue command function. The serial_number
- * is cleared when scsi_done is entered indicating that the command has
- * been completed. If a timeout occurs, the serial number at the moment
- * of timeout is copied into serial_number_at_timeout. By subsequently
- * comparing the serial_number and serial_number_at_timeout fields
- * during abort or reset processing, we can detect whether the command
- * has already completed. This also detects cases where the command has
- * completed and the SCSI Command structure has already being reused
- * for another command, so that we can avoid incorrectly aborting or
- * resetting the new command.
- * The serial number is only unique per host.
+ * A SCSI Command is assigned a nonzero serial_number before passed
+ * to the driver's queue command function. The serial_number is
+ * cleared when scsi_done is entered indicating that the command
+ * has been completed. It currently doesn't have much use other
+ * than printk's. Some lldd's use this number for other purposes.
+ * It's almost certain that such usages are either incorrect or
+ * meaningless. Please kill all usages other than printk's. Also,
+ * as this number is always identical to ->pid, please convert
+ * printk's to use ->pid, so that we can kill this field.
*/
unsigned long serial_number;
- unsigned long serial_number_at_timeout;

int retries;
int allowed;

2005-03-23 02:28:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 07/08] scsi: remove bogus {get|put}_device() calls

07_scsi_refcnt_cleanup.patch

SCSI request submission paths can be categorized like the
following.

* through high-level driver (sd, st, sg...)
+ requests (fs / pc)
+ ioctls
+ flushes (issue_flush / barrier rqs)
+ backing dev (unplug fn / field referencing)
+ high-level specific (init / revalidation...)
* through scsi-midlayer
+ midlevel specific (init...)
+ transport specific (domain validations...)

All accesses either

* open high-level driver before submitting requests and
closes with no request left.
* get_device() before submitting requests and put_device()
with no request left.

So, basically, SCSI high-level object (scsi_disk) and
mid-level object (scsi_device) are reference counted by users,
not the requests they submit. Reference count cannot go zero
with active users and users cannot access the object once the
reference count reaches zero.

So, the {get/put}_device() calls in scsi_get_command() and
scsi_request_fn() are bogus and misleading. In addition,
get_device() cannot synchronize 1->0 and 0->1 transitions and
always returns the device pointer given as the argument. The
== NULL tests are just misleading.

Signed-off-by: Tejun Heo <[email protected]>

scsi.c | 9 +--------
scsi_lib.c | 12 +-----------
2 files changed, 2 insertions(+), 19 deletions(-)

Index: scsi-export/drivers/scsi/scsi.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi.c 2005-03-23 09:40:11.000000000 +0900
+++ scsi-export/drivers/scsi/scsi.c 2005-03-23 09:40:11.000000000 +0900
@@ -246,10 +246,6 @@ struct scsi_cmnd *scsi_get_command(struc
{
struct scsi_cmnd *cmd;

- /* Bail if we can't get a reference to the device */
- if (!get_device(&dev->sdev_gendev))
- return NULL;
-
cmd = __scsi_get_command(dev->host, gfp_mask);

if (likely(cmd != NULL)) {
@@ -264,8 +260,7 @@ struct scsi_cmnd *scsi_get_command(struc
spin_lock_irqsave(&dev->list_lock, flags);
list_add_tail(&cmd->list, &dev->cmd_list);
spin_unlock_irqrestore(&dev->list_lock, flags);
- } else
- put_device(&dev->sdev_gendev);
+ }

return cmd;
}
@@ -303,8 +298,6 @@ void scsi_put_command(struct scsi_cmnd *

if (likely(cmd != NULL))
kmem_cache_free(shost->cmd_pool->slab, cmd);
-
- put_device(&sdev->sdev_gendev);
}
EXPORT_SYMBOL(scsi_put_command);

Index: scsi-export/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_lib.c 2005-03-23 09:40:11.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_lib.c 2005-03-23 09:40:11.000000000 +0900
@@ -1200,10 +1200,6 @@ static void scsi_request_fn(struct reque
struct scsi_cmnd *cmd;
struct request *req;

- if(!get_device(&sdev->sdev_gendev))
- /* We must be tearing the block queue down already */
- return;
-
/*
* To start with, we keep looping until the queue is empty, or until
* the host is no longer able to accept any more requests.
@@ -1288,7 +1284,7 @@ static void scsi_request_fn(struct reque
}
}

- goto out;
+ return;

not_ready:
spin_unlock_irq(shost->host_lock);
@@ -1306,12 +1302,6 @@ static void scsi_request_fn(struct reque
sdev->device_busy--;
if(sdev->device_busy == 0)
blk_plug_device(q);
- out:
- /* must be careful here...if we trigger the ->remove() function
- * we cannot be holding the q lock */
- spin_unlock_irq(q->queue_lock);
- put_device(&sdev->sdev_gendev);
- spin_lock_irq(q->queue_lock);
}

u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)

2005-03-23 02:20:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 04/08] scsi: remove meaningless volatile qualifiers from structure definitions

04_scsi_remove_volatile.patch

scsi_device->device_busy, Scsi_Host->host_busy and
->host_failed have volatile qualifiers, but the qualifiers
don't serve any purpose. Kill them. While at it, protect
->host_failed update in scsi_error for consistency and clarity.

Signed-off-by: Tejun Heo <[email protected]>

drivers/scsi/scsi_error.c | 6 +++++-
include/scsi/scsi_device.h | 2 +-
include/scsi/scsi_host.h | 4 ++--
3 files changed, 8 insertions(+), 4 deletions(-)

Index: scsi-export/drivers/scsi/scsi_error.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_error.c 2005-03-23 09:40:09.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_error.c 2005-03-23 09:40:10.000000000 +0900
@@ -652,9 +652,13 @@ static int scsi_request_sense(struct scs
static void scsi_eh_finish_cmd(struct scsi_cmnd *scmd,
struct list_head *done_q)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(scmd->device->host->host_lock, flags);
scmd->device->host->host_failed--;
- scmd->state = SCSI_STATE_BHQUEUE;
+ spin_unlock_irqrestore(scmd->device->host->host_lock, flags);

+ scmd->state = SCSI_STATE_BHQUEUE;
scsi_eh_eflags_clr_all(scmd);

/*
Index: scsi-export/include/scsi/scsi_device.h
===================================================================
--- scsi-export.orig/include/scsi/scsi_device.h 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/include/scsi/scsi_device.h 2005-03-23 09:40:10.000000000 +0900
@@ -43,7 +43,7 @@ struct scsi_device {
struct list_head siblings; /* list of all devices on this host */
struct list_head same_target_siblings; /* just the devices sharing same target id */

- volatile unsigned short device_busy; /* commands actually active on low-level */
+ unsigned short device_busy; /* commands actually active on low-level */
spinlock_t sdev_lock; /* also the request queue_lock */
spinlock_t list_lock;
struct list_head cmd_list; /* queue of in use SCSI Command structures */
Index: scsi-export/include/scsi/scsi_host.h
===================================================================
--- scsi-export.orig/include/scsi/scsi_host.h 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/include/scsi/scsi_host.h 2005-03-23 09:40:10.000000000 +0900
@@ -448,8 +448,8 @@ struct Scsi_Host {
wait_queue_head_t host_wait;
struct scsi_host_template *hostt;
struct scsi_transport_template *transportt;
- volatile unsigned short host_busy; /* commands actually active on low-level */
- volatile unsigned short host_failed; /* commands that failed. */
+ unsigned short host_busy; /* commands actually active on low-level */
+ unsigned short host_failed; /* commands that failed. */

unsigned short host_no; /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
int resetting; /* if set, it means that last_reset is a valid value */

2005-03-23 02:33:38

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 05/08] scsi: remove a timer race from scsi_queue_insert() and cleanup timer

05_scsi_timer_cleanup.patch

scsi_queue_insert() has four callers. Three callers call with
timer disabled and one (the second invocation in
scsi_dispatch_cmd()) calls with timer activated.
scsi_queue_insert() used to always call scsi_delete_timer()
and ignore the return value. This results in race with timer
expiration. Remove scsi_delete_timer() call from
scsi_queue_insert() and make the caller delete timer and check
the return value.

While at it, as, once a scsi timer is added, it should expire
or be deleted before reused, make scsi_add_timer() strict
about timer reuses. Now timer expiration function clears
->function and all timer deletion should go through
scsi_delete_timer(). Also, remove bogus ->eh_action tests
from scsi_eh_{done|times_out} functions. The condition is
always true and the test is somewhat misleading.

Signed-off-by: Tejun Heo <[email protected]>

aic7xxx/aic79xx_osm.c | 1 +
aic7xxx/aic7xxx_osm.c | 1 +
scsi.c | 7 ++++---
scsi_error.c | 24 +++++++-----------------
scsi_lib.c | 6 ------
5 files changed, 13 insertions(+), 26 deletions(-)

Index: scsi-export/drivers/scsi/aic7xxx/aic79xx_osm.c
===================================================================
--- scsi-export.orig/drivers/scsi/aic7xxx/aic79xx_osm.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/aic7xxx/aic79xx_osm.c 2005-03-23 09:40:10.000000000 +0900
@@ -2725,6 +2725,7 @@ ahd_linux_dv_target(struct ahd_softc *ah
/* Queue the command and wait for it to complete */
/* Abuse eh_timeout in the scsi_cmnd struct for our purposes */
init_timer(&cmd->eh_timeout);
+ cmd->eh_timeout.function = NULL;
#ifdef AHD_DEBUG
if ((ahd_debug & AHD_SHOW_MESSAGES) != 0)
/*
Index: scsi-export/drivers/scsi/aic7xxx/aic7xxx_osm.c
===================================================================
--- scsi-export.orig/drivers/scsi/aic7xxx/aic7xxx_osm.c 2005-03-23 09:39:36.000000000 +0900
+++ scsi-export/drivers/scsi/aic7xxx/aic7xxx_osm.c 2005-03-23 09:40:10.000000000 +0900
@@ -2409,6 +2409,7 @@ ahc_linux_dv_target(struct ahc_softc *ah
/* Queue the command and wait for it to complete */
/* Abuse eh_timeout in the scsi_cmnd struct for our purposes */
init_timer(&cmd->eh_timeout);
+ cmd->eh_timeout.function = NULL;
#ifdef AHC_DEBUG
if ((ahc_debug & AHC_SHOW_MESSAGES) != 0)
/*
Index: scsi-export/drivers/scsi/scsi.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi.c 2005-03-23 09:40:09.000000000 +0900
+++ scsi-export/drivers/scsi/scsi.c 2005-03-23 09:40:10.000000000 +0900
@@ -639,9 +639,10 @@ int scsi_dispatch_cmd(struct scsi_cmnd *
spin_unlock_irqrestore(host->host_lock, flags);
if (rtn) {
atomic_inc(&cmd->device->iodone_cnt);
- scsi_queue_insert(cmd,
- (rtn == SCSI_MLQUEUE_DEVICE_BUSY) ?
- rtn : SCSI_MLQUEUE_HOST_BUSY);
+ if (scsi_delete_timer(cmd))
+ scsi_queue_insert(cmd,
+ (rtn == SCSI_MLQUEUE_DEVICE_BUSY) ?
+ rtn : SCSI_MLQUEUE_HOST_BUSY);
SCSI_LOG_MLQUEUE(3,
printk("queuecommand : request rejected\n"));
}
Index: scsi-export/drivers/scsi/scsi_error.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_error.c 2005-03-23 09:40:10.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_error.c 2005-03-23 09:40:10.000000000 +0900
@@ -107,14 +107,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *s
void scsi_add_timer(struct scsi_cmnd *scmd, int timeout,
void (*complete)(struct scsi_cmnd *))
{
-
- /*
- * If the clock was already running for this command, then
- * first delete the timer. The timer handling code gets rather
- * confused if we don't do this.
- */
- if (scmd->eh_timeout.function)
- del_timer(&scmd->eh_timeout);
+ BUG_ON(scmd->eh_timeout.function);

scmd->eh_timeout.data = (unsigned long)scmd;
scmd->eh_timeout.expires = jiffies + timeout;
@@ -170,6 +163,9 @@ void scsi_times_out(struct scsi_cmnd *sc
{
scsi_log_completion(scmd, TIMEOUT_ERROR);

+ scmd->eh_timeout.data = (unsigned long)NULL;
+ scmd->eh_timeout.function = NULL;
+
if (scmd->device->host->hostt->eh_timed_out)
switch (scmd->device->host->hostt->eh_timed_out(scmd)) {
case EH_HANDLED:
@@ -442,9 +438,7 @@ static void scsi_eh_times_out(struct scs
scsi_eh_eflags_set(scmd, SCSI_EH_REC_TIMEOUT);
SCSI_LOG_ERROR_RECOVERY(3, printk("%s: scmd:%p\n", __FUNCTION__,
scmd));
-
- if (scmd->device->host->eh_action)
- up(scmd->device->host->eh_action);
+ up(scmd->device->host->eh_action);
}

/**
@@ -459,15 +453,12 @@ static void scsi_eh_done(struct scsi_cmn
* way of stopping the timeout handler from running, so we must
* always defer to it.
*/
- if (del_timer(&scmd->eh_timeout)) {
+ if (scsi_delete_timer(scmd)) {
scmd->request->rq_status = RQ_SCSI_DONE;
scmd->owner = SCSI_OWNER_ERROR_HANDLER;
-
SCSI_LOG_ERROR_RECOVERY(3, printk("%s scmd: %p result: %x\n",
__FUNCTION__, scmd, scmd->result));
-
- if (scmd->device->host->eh_action)
- up(scmd->device->host->eh_action);
+ up(scmd->device->host->eh_action);
}
}

@@ -1881,7 +1872,6 @@ scsi_reset_provider(struct scsi_device *
rtn = FAILED;
}

- scsi_delete_timer(scmd);
scsi_next_command(scmd);
return rtn;
}
Index: scsi-export/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-export.orig/drivers/scsi/scsi_lib.c 2005-03-23 09:40:09.000000000 +0900
+++ scsi-export/drivers/scsi/scsi_lib.c 2005-03-23 09:40:10.000000000 +0900
@@ -229,12 +229,6 @@ int scsi_queue_insert(struct scsi_cmnd *
printk("Inserting command %p into mlqueue\n", cmd));

/*
- * We are inserting the command into the ml queue. First, we
- * cancel the timer, so it doesn't time out.
- */
- scsi_delete_timer(cmd);
-
- /*
* Next, set the appropriate busy bit for the device/host.
*
* If the host/device isn't busy, assume that something actually

2005-03-23 04:07:53

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 01/08] scsi: remove unused bounce-buffer release path

On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> 01_scsi_remove_scsi_release_buffers.patch
>
> Buffer bouncing hasn't been done inside the scsi midlayer for
> quite sometime now, but bounce-buffer release paths are still
> around. This patch removes these unused paths.

Yes, but scsi_release_buffers isn't referring to bounce buffers anymore,
it's simply releasing the sg buffers.

[...]
> - else if (cmd->buffer != req->buffer) {
> - if (rq_data_dir(req) == READ) {
> - unsigned long flags;
> - char *to = bio_kmap_irq(req->bio, &flags);
> - memcpy(to, cmd->buffer, cmd->bufflen);
> - bio_kunmap_irq(to, &flags);
> - }
> - kfree(cmd->buffer);
> - }

I'll defer to Jens here, but I don't thing you can just remove this ...
sg_io with a misaligned buffer will fail without this.

That rather nasty code freeing cmd->buffer needs to be in there as
well ... so it does make sense to keep this API

James


2005-03-23 04:09:50

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> When hot-unplugging using scsi_remove_host() function (as usb
> does), scsi_forget_host() used to be called before
> scsi_host_cancel(). So, the device gets removed first without
> request cleanup and scsi_host_cancel() never gets to call
> scsi_device_cancel() on the removed devices. This results in
> premature completion of hot-unplugging process with active
> requests left in queue, eventually leading to hang/offlined
> device or oops when the active command times out.
>
> This patch makes scsi_remove_host() call scsi_host_cancel()
> first such that the host is first transited into cancel state
> and all requests of all devices are killed, and then, the
> devices are removed. This patch fixes the oops in eh after
> hot-unplugging bug.

This is actually simply reversing this patch:

http://marc.theaimsgroup.com/?l=linux-scsi&m=109268755500248

And all it does is give us the previous consequences back.

The oops isn't in the eh it's in the usb-storage eh routine.

However, the current host code does need fixing, but the fix is to move
it over to a proper state model rather than the current bit twiddling we
do.

James


2005-03-23 04:15:37

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 04/08] scsi: remove meaningless volatile qualifiers from structure definitions

On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> scsi_device->device_busy, Scsi_Host->host_busy and
> ->host_failed have volatile qualifiers, but the qualifiers
> don't serve any purpose. Kill them. While at it, protect
> ->host_failed update in scsi_error for consistency and clarity.

Well ... the data here is volatile so what you're advocating is a move
away from a volatile variable model to a protected variable one ... did
you audit all users of both of these to make sure we have protection on
all of them? It looks like the sata strategy handlers would still rely
on the volatile data.

James


2005-03-23 04:16:18

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 07/08] scsi: remove bogus {get|put}_device() calls

On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> So, basically, SCSI high-level object (scsi_disk) and
> mid-level object (scsi_device) are reference counted by users,
> not the requests they submit. Reference count cannot go zero
> with active users and users cannot access the object once the
> reference count reaches zero.

Actually, no. Unfortunately we still have some fire and forget APIs, so
the contention that we always have an open refcounted descriptor isn't
always true.

James


2005-03-23 04:22:42

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 04/08] scsi: remove meaningless volatile qualifiers from structure definitions

James Bottomley wrote:
> On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
>
>> scsi_device->device_busy, Scsi_Host->host_busy and
>> ->host_failed have volatile qualifiers, but the qualifiers
>> don't serve any purpose. Kill them. While at it, protect
>> ->host_failed update in scsi_error for consistency and clarity.
>
>
> Well ... the data here is volatile so what you're advocating is a move
> away from a volatile variable model to a protected variable one ... did
> you audit all users of both of these to make sure we have protection on
> all of them? It looks like the sata strategy handlers would still rely
> on the volatile data.

volatile is almost always (a) buggy, or (b) hiding bugs. At the very
least, barriers are usually needed.

Almost every case really wants to be inside a spinlock, or atomic_t, or
similarly protected.

Specifically for SATA, I am making the presumption that SCSI is smart
enough not to mess with host_failed until my error handler completes.

Jeff


2005-03-23 04:51:15

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

Hi,

James Bottomley wrote:
> On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
>
>> When hot-unplugging using scsi_remove_host() function (as usb
>> does), scsi_forget_host() used to be called before
>> scsi_host_cancel(). So, the device gets removed first without
>> request cleanup and scsi_host_cancel() never gets to call
>> scsi_device_cancel() on the removed devices. This results in
>> premature completion of hot-unplugging process with active
>> requests left in queue, eventually leading to hang/offlined
>> device or oops when the active command times out.
>>
>> This patch makes scsi_remove_host() call scsi_host_cancel()
>> first such that the host is first transited into cancel state
>> and all requests of all devices are killed, and then, the
>> devices are removed. This patch fixes the oops in eh after
>> hot-unplugging bug.
>
>
> This is actually simply reversing this patch:
>
> http://marc.theaimsgroup.com/?l=linux-scsi&m=109268755500248
>
> And all it does is give us the previous consequences back.
>
> The oops isn't in the eh it's in the usb-storage eh routine.

Well, but it's because scsi midlayer calls back into usb-storage eh
after the detaching process is complete.

> However, the current host code does need fixing, but the fix is to move
> it over to a proper state model rather than the current bit twiddling we
> do.

I agree & am working on it. This patch was mainly to verify Jens' oops.

--
tejun

2005-03-23 05:28:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 04/08] scsi: remove meaningless volatile qualifiers from structure definitions

Hello, guys.

On Tue, Mar 22, 2005 at 11:22:23PM -0500, Jeff Garzik wrote:
> James Bottomley wrote:
> >On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> >
> >> scsi_device->device_busy, Scsi_Host->host_busy and
> >> ->host_failed have volatile qualifiers, but the qualifiers
> >> don't serve any purpose. Kill them. While at it, protect
> >> ->host_failed update in scsi_error for consistency and clarity.
> >
> >
> >Well ... the data here is volatile so what you're advocating is a move
> >away from a volatile variable model to a protected variable one ... did
> >you audit all users of both of these to make sure we have protection on
> >all of them? It looks like the sata strategy handlers would still rely
> >on the volatile data.

Yes, I did (well, tried :-). Adding locking/unlocking was just for
clarity. We have synchronization w/ implied memory barrier before and
after eh processing, so we don't really need to acquire the lock
there. And while adding it, I forgot about libata strategy function.
Sorry about that. I removed the locking from scsi_error and added
comments to data structure definition that those fields can be
accessed without acquiring the host lock. I think it's clearer this
way.

> volatile is almost always (a) buggy, or (b) hiding bugs. At the very
> least, barriers are usually needed.
>
> Almost every case really wants to be inside a spinlock, or atomic_t, or
> similarly protected.
>
> Specifically for SATA, I am making the presumption that SCSI is smart
> enough not to mess with host_failed until my error handler completes.

Yes, volatile only instructs the compiler to not cache the variable
in the register and not move around accesses to the variable. The
only valid usage would be raw synchronization with variables (busy
checking, two flag variable synchronization, etc...), but even those
usages are better off with explicit barriers and cpu relaxations to
clarify what's going on.

Currently all accesses outside eh are properly protected with locks
except for the following two cases.

* sg_proc_seq_show_dev(): read access, informational. doesn't matter.
* check looping in scsi_device_quiesce(): we have proper barrier.


Signed-off-by: Tejun Heo <[email protected]>


Index: scsi-export/include/scsi/scsi_device.h
===================================================================
--- scsi-export.orig/include/scsi/scsi_device.h 2005-03-23 09:40:12.000000000 +0900
+++ scsi-export/include/scsi/scsi_device.h 2005-03-23 14:04:59.000000000 +0900
@@ -43,7 +43,8 @@ struct scsi_device {
struct list_head siblings; /* list of all devices on this host */
struct list_head same_target_siblings; /* just the devices sharing same target id */

- volatile unsigned short device_busy; /* commands actually active on low-level */
+ unsigned short device_busy; /* commands actually active on
+ * low-level. protected by sdev_lock. */
spinlock_t sdev_lock; /* also the request queue_lock */
spinlock_t list_lock;
struct list_head cmd_list; /* queue of in use SCSI Command structures */
Index: scsi-export/include/scsi/scsi_host.h
===================================================================
--- scsi-export.orig/include/scsi/scsi_host.h 2005-03-23 09:40:12.000000000 +0900
+++ scsi-export/include/scsi/scsi_host.h 2005-03-23 14:04:59.000000000 +0900
@@ -448,8 +448,14 @@ struct Scsi_Host {
wait_queue_head_t host_wait;
struct scsi_host_template *hostt;
struct scsi_transport_template *transportt;
- volatile unsigned short host_busy; /* commands actually active on low-level */
- volatile unsigned short host_failed; /* commands that failed. */
+
+ /*
+ * The following two fields are protected with host_lock;
+ * however, eh routines can safely access during eh processing
+ * without acquiring the lock.
+ */
+ unsigned short host_busy; /* commands actually active on low-level */
+ unsigned short host_failed; /* commands that failed. */

unsigned short host_no; /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
int resetting; /* if set, it means that last_reset is a valid value */

2005-03-23 06:36:57

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 01/08] scsi: remove unused bounce-buffer release path


Hello, James.

James Bottomley wrote:
> On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
>
>>01_scsi_remove_scsi_release_buffers.patch
>>
>> Buffer bouncing hasn't been done inside the scsi midlayer for
>> quite sometime now, but bounce-buffer release paths are still
>> around. This patch removes these unused paths.
>
>
> Yes, but scsi_release_buffers isn't referring to bounce buffers anymore,
> it's simply releasing the sg buffers.
>

That's what I did. Replacing scsi_release_buffers() calls with calls
to scsi_free_sgtable(). The only logic removed is bounce-buffer
release/copy-back.

> [...]
>
>>- else if (cmd->buffer != req->buffer) {
>>- if (rq_data_dir(req) == READ) {
>>- unsigned long flags;
>>- char *to = bio_kmap_irq(req->bio, &flags);
>>- memcpy(to, cmd->buffer, cmd->bufflen);
>>- bio_kunmap_irq(to, &flags);
>>- }
>>- kfree(cmd->buffer);
>>- }
>
>
> I'll defer to Jens here, but I don't thing you can just remove this ...
> sg_io with a misaligned buffer will fail without this.

AFAIK, those are done by blk_rq_map_user() and blk_rq_unmap_user(),
both of which are invoked directly by sg_io().

> That rather nasty code freeing cmd->buffer needs to be in there as
> well ... so it does make sense to keep this API

That code is invoked only for REQ_BLOCK_PC requests without bio, and I
digged pretty hard but, in those cases, AFAICT, the callers are
responsible for supplying dma-able buffers and nothing seems to alter
cmd->buffer after the cmd gets initialized, but I might be missing
things here. If so, please point out.

Thanks.

--
tejun

2005-03-23 07:20:01

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Wed, Mar 23 2005, Tejun Heo wrote:
> Hi,
>
> James Bottomley wrote:
> >On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> >
> >> When hot-unplugging using scsi_remove_host() function (as usb
> >> does), scsi_forget_host() used to be called before
> >> scsi_host_cancel(). So, the device gets removed first without
> >> request cleanup and scsi_host_cancel() never gets to call
> >> scsi_device_cancel() on the removed devices. This results in
> >> premature completion of hot-unplugging process with active
> >> requests left in queue, eventually leading to hang/offlined
> >> device or oops when the active command times out.
> >>
> >> This patch makes scsi_remove_host() call scsi_host_cancel()
> >> first such that the host is first transited into cancel state
> >> and all requests of all devices are killed, and then, the
> >> devices are removed. This patch fixes the oops in eh after
> >> hot-unplugging bug.
> >
> >
> >This is actually simply reversing this patch:
> >
> >http://marc.theaimsgroup.com/?l=linux-scsi&m=109268755500248
> >
> >And all it does is give us the previous consequences back.
> >
> >The oops isn't in the eh it's in the usb-storage eh routine.
>
> Well, but it's because scsi midlayer calls back into usb-storage eh
> after the detaching process is complete.
>
> >However, the current host code does need fixing, but the fix is to move
> >it over to a proper state model rather than the current bit twiddling we
> >do.
>
> I agree & am working on it. This patch was mainly to verify Jens' oops.

It is not the oops I am getting. When I get a few minutes today, I'll
reproduce with vanilla and post it here.

--
Jens Axboe

2005-03-23 09:15:11

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 07/08] scsi: remove bogus {get|put}_device() calls

Hi,

James Bottomley wrote:
> On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
>
>> So, basically, SCSI high-level object (scsi_disk) and
>> mid-level object (scsi_device) are reference counted by users,
>> not the requests they submit. Reference count cannot go zero
>> with active users and users cannot access the object once the
>> reference count reaches zero.
>
>
> Actually, no. Unfortunately we still have some fire and forget APIs, so
> the contention that we always have an open refcounted descriptor isn't
> always true.

Yeap, you're right. So, what we have is

* All high-level users have open access to the scsi high-level
object on issueing requests, but may close it before its requests
complete.
* All mid-layer users do get_device() before submitting requests,
but may put_device() before its requests complete.

Thanks for pointing that out. :-)

--
tejun

2005-03-23 15:12:37

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Wed, 2005-03-23 at 13:50 +0900, Tejun Heo wrote:
> Well, but it's because scsi midlayer calls back into usb-storage eh
> after the detaching process is complete.

Yes, but that's legitimate. It's always been explicitly stated that we
can't ensure absolute synchronisation in the stack: storage devices
must expect to have to reject I/O for devices they think have been
removed.

> > However, the current host code does need fixing, but the fix is to move
> > it over to a proper state model rather than the current bit twiddling we
> > do.
>
> I agree & am working on it. This patch was mainly to verify Jens' oops.

Thanks! You can look at the device state model as a guide ...
originally that was a bit mask too.

James

2005-03-23 15:16:55

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 04/08] scsi: remove meaningless volatile qualifiers from structure definitions

On Tue, 2005-03-22 at 23:22 -0500, Jeff Garzik wrote:
> volatile is almost always (a) buggy, or (b) hiding bugs. At the very
> least, barriers are usually needed.

The choice is either barrier or volatile usually. volatile is nasty
primarily because it causes compiler pessimism in variable reloading.

> Almost every case really wants to be inside a spinlock, or atomic_t, or
> similarly protected.

I know that's what I'm asking if an audit has been conducted for...to
replace the volatile, accesses have to be barrier protected.

> Specifically for SATA, I am making the presumption that SCSI is smart
> enough not to mess with host_failed until my error handler completes.

Yes, that's a valid assumption ... and by the single threaded nature of
the error handler, always true. However, the proposed patch wanted to
add a spinlock around the access in the scsi eh thread (the comment
stating for clarity). Thus, the same change should be made in SATA for
consistency.

Since that change isn't in the patch, I was asking if all the users of
these variables had been audited for barriers instead ... since the
answer looks to be "no" to me.

James


2005-03-23 15:21:20

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Wed, 2005-03-23 at 08:19 +0100, Jens Axboe wrote:
> It is not the oops I am getting. When I get a few minutes today, I'll
> reproduce with vanilla and post it here.

Well, I have news too. Unfortunately, the python script I posted is
hanging in D wait. When I tested all of this out (with a similar
script) in the 2.6.10 timeframe, it wasn't doing this, so we have some
other problem introduced into the stack since then, sigh.

Also it means my test isn't effective, so I need to track down the
open/close hang before I can make progress.

James


2005-03-23 15:26:08

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Wed, Mar 23 2005, James Bottomley wrote:
> On Wed, 2005-03-23 at 08:19 +0100, Jens Axboe wrote:
> > It is not the oops I am getting. When I get a few minutes today, I'll
> > reproduce with vanilla and post it here.
>
> Well, I have news too. Unfortunately, the python script I posted is
> hanging in D wait. When I tested all of this out (with a similar
> script) in the 2.6.10 timeframe, it wasn't doing this, so we have some
> other problem introduced into the stack since then, sigh.

Let me guess, it is hanging in wait_for_completion()?

> Also it means my test isn't effective, so I need to track down the
> open/close hang before I can make progress.

Makes sense, that is why you are not seeing the crash :)

--
Jens Axboe

2005-03-23 15:28:25

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 01/08] scsi: remove unused bounce-buffer release path

On Wed, Mar 23 2005, Tejun Heo wrote:
>
> Hello, James.
>
> James Bottomley wrote:
> >On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> >
> >>01_scsi_remove_scsi_release_buffers.patch
> >>
> >> Buffer bouncing hasn't been done inside the scsi midlayer for
> >> quite sometime now, but bounce-buffer release paths are still
> >> around. This patch removes these unused paths.
> >
> >
> >Yes, but scsi_release_buffers isn't referring to bounce buffers anymore,
> >it's simply releasing the sg buffers.
> >
>
> That's what I did. Replacing scsi_release_buffers() calls with calls
> to scsi_free_sgtable(). The only logic removed is bounce-buffer
> release/copy-back.
>
> >[...]
> >
> >>- else if (cmd->buffer != req->buffer) {
> >>- if (rq_data_dir(req) == READ) {
> >>- unsigned long flags;
> >>- char *to = bio_kmap_irq(req->bio, &flags);
> >>- memcpy(to, cmd->buffer, cmd->bufflen);
> >>- bio_kunmap_irq(to, &flags);
> >>- }
> >>- kfree(cmd->buffer);
> >>- }
> >
> >
> >I'll defer to Jens here, but I don't thing you can just remove this ...
> >sg_io with a misaligned buffer will fail without this.
>
> AFAIK, those are done by blk_rq_map_user() and blk_rq_unmap_user(),
> both of which are invoked directly by sg_io().
>
> >That rather nasty code freeing cmd->buffer needs to be in there as
> >well ... so it does make sense to keep this API
>
> That code is invoked only for REQ_BLOCK_PC requests without bio, and I
> digged pretty hard but, in those cases, AFAICT, the callers are
> responsible for supplying dma-able buffers and nothing seems to alter
> cmd->buffer after the cmd gets initialized, but I might be missing
> things here. If so, please point out.

That did not use to be true - eg request coming from the CDROM layer to
sr had to be bounced in the scsi layer for isa host adapters. I bet that
is still true.

--
Jens Axboe

2005-03-25 00:54:50

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Wed, 2005-03-23 at 16:25 +0100, Jens Axboe wrote:
> Let me guess, it is hanging in wait_for_completion()?

Yes, I have the trace now. Why is curious. This is the trace of the
failure:

Mar 24 18:40:34 localhost kernel: usb 4-2: USB disconnect, address 3
Mar 24 18:40:34 localhost kernel: sd 0:0:0:0: CMD c25c98b0 done, completing
Mar 24 18:40:34 localhost kernel: 0:0:0:0: cmd c25c98b0 returning
Mar 24 18:40:34 localhost kernel: 0:0:0:0: cmd c25c98b0 going out <6>Read Capacity (10) 25 00 00 00 00 00 00 00 00 00
Mar 24 18:40:34 localhost kernel: scsi0 (0:0): rejecting I/O to dead device (req c25c98b0)
Mar 24 18:40:34 localhost kernel: usb 4-2: new full speed USB device using uhci_hcd and address 4
Mar 24 18:40:34 localhost kernel: scsi1 : SCSI emulation for USB Mass Storage devices
Mar 24 18:40:34 localhost kernel: 1:0:0:0: cmd c1a1b4b0 going out <6>Inquiry 12 00 00 00 24 00

The problem occurs when the mid-layer rejects the I/O to the dead
device. Here it returns BLKPREP_KILL to the prep function, but after
that we never get a completion back.

I'll dig around in ll_rw_blk.c to see if I can trace the problem, but
you know this code better than I do ...

James


2005-03-25 03:15:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

Hello, James and Jens.

On Thu, Mar 24, 2005 at 06:45:58PM -0600, James Bottomley wrote:
> On Wed, 2005-03-23 at 16:25 +0100, Jens Axboe wrote:
> > Let me guess, it is hanging in wait_for_completion()?
>
> Yes, I have the trace now. Why is curious. This is the trace of the
> failure:
>
> Mar 24 18:40:34 localhost kernel: usb 4-2: USB disconnect, address 3
> Mar 24 18:40:34 localhost kernel: sd 0:0:0:0: CMD c25c98b0 done, completing
> Mar 24 18:40:34 localhost kernel: 0:0:0:0: cmd c25c98b0 returning
> Mar 24 18:40:34 localhost kernel: 0:0:0:0: cmd c25c98b0 going out <6>Read Capacity (10) 25 00 00 00 00 00 00 00 00 00
> Mar 24 18:40:34 localhost kernel: scsi0 (0:0): rejecting I/O to dead device (req c25c98b0)
> Mar 24 18:40:34 localhost kernel: usb 4-2: new full speed USB device using uhci_hcd and address 4
> Mar 24 18:40:34 localhost kernel: scsi1 : SCSI emulation for USB Mass Storage devices
> Mar 24 18:40:34 localhost kernel: 1:0:0:0: cmd c1a1b4b0 going out <6>Inquiry 12 00 00 00 24 00
>
> The problem occurs when the mid-layer rejects the I/O to the dead
> device. Here it returns BLKPREP_KILL to the prep function, but after
> that we never get a completion back.

I think I found the cause. Special requests submitted using
scsi_do_req() never initializes ->end_io(). Normally, SCSI midlayer
terminates special requests inside the SCSI midlayer without passing
through the blkdev layer. However, if a device is going away or taken
offline, blkdev layer gets to terminate special requests and, as
->end_io() is never set-up, nothing happens and the completion gets
lost.

The following patch implements scsi_do_req_endio() and sets up
->end_io() and ->end_io_data before sending out special commands.
It's a quick fix & hacky. I think the proper fix might be one of

* Don't return BLKPREP_KILL in the prep_fn and always terminate
special commands inside request_fn without using end_that_*
functions.

* I don't really know why the scsi_request/scsi_cmnd distincion
is made (resource usage?), but, if it's a legacy thing, replace
scsi_request with scsi_cmnd; then we can BLKPREP_KILL without using
dummy scsi_cmnd.


Signed-off-by: Tejun Heo <[email protected]>


# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2005/03/25 11:57:25+09:00 [email protected]
# midlayer special command termination fix
#
# drivers/scsi/scsi_lib.c
# 2005/03/25 11:57:17+09:00 [email protected] +30 -0
# midlayer special command termination fix
#
diff -Nru a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c 2005-03-25 11:59:28 +09:00
+++ b/drivers/scsi/scsi_lib.c 2005-03-25 11:59:28 +09:00
@@ -274,6 +274,33 @@
}

/*
+ * Special requests usually gets terminated inside scsi midlayer
+ * proper; however, they can be terminated by the blkdev layer when
+ * scsi_prep_fn() returns BLKPREP_KILL or scsi_request_fn() detects
+ * offline condition. The following callback is invoked when the
+ * blkdev layer terminates a special request. Emulate DID_NO_CONNECT.
+ */
+static void scsi_do_req_endio(struct request *rq)
+{
+ struct scsi_request *sreq = rq->end_io_data;
+ struct request_queue *q = sreq->sr_device->request_queue;
+ struct scsi_cmnd cmd;
+
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.device = sreq->sr_device;
+ scsi_init_cmd_from_req(&cmd, sreq);
+ /* Our command is dummy, nullify back link. */
+ sreq->sr_command = NULL;
+
+ sreq->sr_result = cmd.result = DID_NO_CONNECT << 16;
+
+ /* The sreq->done() callback expects queue_lock to be unlocked. */
+ spin_unlock(q->queue_lock);
+ cmd.done(&cmd);
+ spin_lock(q->queue_lock);
+}
+
+/*
* Function: scsi_do_req
*
* Purpose: Queue a SCSI request
@@ -326,6 +353,9 @@

if (sreq->sr_cmd_len == 0)
sreq->sr_cmd_len = COMMAND_SIZE(sreq->sr_cmnd[0]);
+
+ sreq->sr_request->end_io = scsi_do_req_endio;
+ sreq->sr_request->end_io_data = sreq;

/*
* head injection *required* here otherwise quiesce won't work

2005-03-25 05:03:00

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Fri, 2005-03-25 at 12:15 +0900, Tejun Heo wrote:
> I think I found the cause. Special requests submitted using
> scsi_do_req() never initializes ->end_io(). Normally, SCSI midlayer
> terminates special requests inside the SCSI midlayer without passing
> through the blkdev layer. However, if a device is going away or taken
> offline, blkdev layer gets to terminate special requests and, as
> ->end_io() is never set-up, nothing happens and the completion gets
> lost.

The analysis is exactly correct, well done! I think your patch is a bit
overly complex, though. We can achieve the same effect simply by
executing the completion without changing the rq_status like the patch
below.

Jens, To go back to the original problem, except when I hit the usb-
storage error handling oops, I can plug and unplug to my hearts content
and everything works.

James

===== drivers/scsi/scsi_lib.c 1.152 vs edited =====
--- 1.152/drivers/scsi/scsi_lib.c 2005-03-18 05:33:09 -06:00
+++ edited/drivers/scsi/scsi_lib.c 2005-03-24 22:59:18 -06:00
@@ -252,6 +252,16 @@
complete(req->waiting);
}

+/* This is the end routine we get to if a command was never attached
+ * to the request. Simply complete the request without changing
+ * rq_status; this will cause a DRIVER_ERROR. */
+static void scsi_wait_req_end_io(struct request *req)
+{
+ BUG_ON(!req->waiting);
+
+ complete(req->waiting);
+}
+
void scsi_wait_req(struct scsi_request *sreq, const void *cmnd, void *buffer,
unsigned bufflen, int timeout, int retries)
{
@@ -259,6 +269,7 @@

sreq->sr_request->waiting = &wait;
sreq->sr_request->rq_status = RQ_SCSI_BUSY;
+ sreq->sr_request->end_io = scsi_wait_req_end_io;
scsi_do_req(sreq, cmnd, buffer, bufflen, scsi_wait_done,
timeout, retries);
wait_for_completion(&wait);


2005-03-25 05:38:53

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

Hi,

On Thu, Mar 24, 2005 at 11:02:45PM -0600, James Bottomley wrote:
> On Fri, 2005-03-25 at 12:15 +0900, Tejun Heo wrote:
> > I think I found the cause. Special requests submitted using
> > scsi_do_req() never initializes ->end_io(). Normally, SCSI midlayer
> > terminates special requests inside the SCSI midlayer without passing
> > through the blkdev layer. However, if a device is going away or taken
> > offline, blkdev layer gets to terminate special requests and, as
> > ->end_io() is never set-up, nothing happens and the completion gets
> > lost.
>
> The analysis is exactly correct, well done! I think your patch is a bit
> overly complex, though. We can achieve the same effect simply by
> executing the completion without changing the rq_status like the patch
> below.
>
> Jens, To go back to the original problem, except when I hit the usb-
> storage error handling oops, I can plug and unplug to my hearts content
> and everything works.

We have users of scsi_do_req() other than scsi_wait_req() and they
use different done() functions to do different things. I've checked
other done functions and none uses contents inside the passed
scsi_cmnd, so using a dummy command should be okay with them. Am I
missing something here?

Oh, and I would really appreciate if you can fill me in / give a
pointer about the scsi_request/scsi_cmnd distinction.

Thanks a lot.

--
tejun

2005-03-25 19:19:57

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
> We have users of scsi_do_req() other than scsi_wait_req() and they
> use different done() functions to do different things. I've checked
> other done functions and none uses contents inside the passed
> scsi_cmnd, so using a dummy command should be okay with them. Am I
> missing something here?

Well ... the other users are supposed to be going away. They're
actually all coded wrongly in some way or other ... perhaps I should
speed up the process.

> Oh, and I would really appreciate if you can fill me in / give a
> pointer about the scsi_request/scsi_cmnd distinction.

The block layer speaks in terms of requests and the scsi layers in terms
of commands. The scsi_request_fn() actually associates a request with a
command. However, since SCSI uses the block layer for queueing, all the
internal scsi command submit paths have to use requests. This is what a
scsi_request is. The reason for the special casing is that we can't use
the normal REQ_CMD or REQ_BLOCK_PC paths because they need ULD
initialisation and back end processing.

James


2005-03-25 21:43:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence


Hello, James.

James Bottomley wrote:
> On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
>
>> We have users of scsi_do_req() other than scsi_wait_req() and they
>>use different done() functions to do different things. I've checked
>>other done functions and none uses contents inside the passed
>>scsi_cmnd, so using a dummy command should be okay with them. Am I
>>missing something here?
>
>
> Well ... the other users are supposed to be going away. They're
> actually all coded wrongly in some way or other ... perhaps I should
> speed up the process.

Sounds great. :-)

>> Oh, and I would really appreciate if you can fill me in / give a
>>pointer about the scsi_request/scsi_cmnd distinction.
>
> The block layer speaks in terms of requests and the scsi layers in terms
> of commands. The scsi_request_fn() actually associates a request with a
> command. However, since SCSI uses the block layer for queueing, all the
> internal scsi command submit paths have to use requests. This is what a
> scsi_request is. The reason for the special casing is that we can't use
> the normal REQ_CMD or REQ_BLOCK_PC paths because they need ULD
> initialisation and back end processing.

What I meant was we could just use scsi_cmnd instead of scsi_request
for commands. Currently, we do the following for special commands.

1. Allocate scsi_request and request (two are linked)
2. Initialize scsi_request as needed
3. queue the request
4. the request is dispatched
5. scsi_cmnd is initialized from scsi_request
6. scsi_cmnd is executed
7. result code and sense copied back to scsi_request
8. request is completed

Instead, we can

1. Allocate scsi_cmnd and request (two are linked)
2. Initialize scsi_cmnd as needed
3. queue the request
4. the request is dispatched
5. scsi_cmnd is executed
6. request is completed

As the latter seemed more straight-forward to me, I was wondering if
there were reasons that I wasn't aware of.

Thanks.

--
tejun

2005-03-25 22:52:16

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Sat, 2005-03-26 at 06:43 +0900, Tejun Heo wrote:
> 1. Allocate scsi_request and request (two are linked)

This can't be done because the scsi_cmnd's are allocated specially (slab
with reserve pool).

James


2005-03-26 07:26:07

by Kai Mäkisara (Kolumbus)

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Fri, 25 Mar 2005, James Bottomley wrote:

> On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
> > We have users of scsi_do_req() other than scsi_wait_req() and they
> > use different done() functions to do different things. I've checked
> > other done functions and none uses contents inside the passed
> > scsi_cmnd, so using a dummy command should be okay with them. Am I
> > missing something here?
>
> Well ... the other users are supposed to be going away. They're
> actually all coded wrongly in some way or other ... perhaps I should
> speed up the process.
>
I have seen you mention this several times now and I am getting more and
more worried. The reason is that scsi_wait_req() is a synchronous
interface and it does not allow a driver to do this:

- send a request
- do other useful things/let the user do useful work
- wait for completion before starting another request

I fully agree that doing done() correctly _is_ a problem, especially when
the SCSI subsystem evolves and the high-level driver writers do not follow
the development closely enough.

One solution to these problems would be to let the drivers still use
scsi_do_req() and their own done() function, but create two
(three) helpers:
- one to be called at the beginning of done(); it would do what needs to
be done here but lets the driver to do some special things of its own if
necessary
- one to be called to wait for the request to finish
(- one to do scsi_ro_req() and the things necessary before these)

Having these helpers would isolate the user of the SCSI subsystem from the
internals. scsi_wait_req() should call these functions and no additional
maintenance would be needed for this additional asynchronous interface.

The current drivers may not do any work in done() that could not be done
later but there is one patch pending where this happens: the st
performance statistics patch needs to get the time stamp when the SCSI
command is processed.

--
Kai

2005-03-26 14:49:11

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

On Sat, 2005-03-26 at 09:27 +0200, Kai Makisara wrote:
> I fully agree that doing done() correctly _is_ a problem, especially when
> the SCSI subsystem evolves and the high-level driver writers do not follow
> the development closely enough.
>
> One solution to these problems would be to let the drivers still use
> scsi_do_req() and their own done() function, but create two
> (three) helpers:
> - one to be called at the beginning of done(); it would do what needs to
> be done here but lets the driver to do some special things of its own if
> necessary
> - one to be called to wait for the request to finish
> (- one to do scsi_ro_req() and the things necessary before these)

Yes. The drivers that use it just need visiting with a big hammer.
However, our character ULDs (st and sg) use it because they try to
simulate fire and forget in the async write path (That's the only time
you actually don't wait on completion for scsi_do_req).

This comes about because the mid-layer is block oriented, so you can't
use any of the read/write machinery. We could fix this by having a
generic character tap to a block queue for use in cases like SCSI where
the underlying driver uses block queues even if the actual device isn't
a block device.

Essentially, the character tap would simply submit a stream of reads and
writes through the block queue. Then we could modify st and sg to use
an identical framework to the other ULDs ... you get a setup API and a
returning command API which are called for every I/O and the block layer
gets to handle the async/not-async pieces.

James


2005-03-29 17:04:21

by Patrick Mansfield

[permalink] [raw]
Subject: Re: [PATCH scsi-misc-2.6 07/08] scsi: remove bogus {get|put}_device() calls

On Wed, Mar 23, 2005 at 06:13:26PM +0900, Tejun Heo wrote:
> Hi,
>
> James Bottomley wrote:
> >On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> >
> >> So, basically, SCSI high-level object (scsi_disk) and
> >> mid-level object (scsi_device) are reference counted by users,
> >> not the requests they submit. Reference count cannot go zero
> >> with active users and users cannot access the object once the
> >> reference count reaches zero.
> >
> >
> >Actually, no. Unfortunately we still have some fire and forget APIs, so
> >the contention that we always have an open refcounted descriptor isn't
> >always true.

What API's, and what usage?

> Yeap, you're right. So, what we have is

> * All high-level users have open access to the scsi high-level
> object on issueing requests, but may close it before its requests
> complete.

> * All mid-layer users do get_device() before submitting requests,
> but may put_device() before its requests complete.

Any LLDD's issuing requests should be doing a get/put around the request.

Any upper level drivers calling scsi_device_put() before a request
completes is likely a bug. sg has code in place to handle the
post-release/close completion of IO (IMO, a bad design).

Are any upper level drivers calling scsi_device_put() while they have
outstanding IO?

The scan code never calls upper level drivers probe functions via
device_add unless we are going to keep the scsi_device (well, there are
error paths in scsi_sysfs_add_sdev that look bad - we don't check the
result of scsi_sysfs_add_sdev). But for completeness, we could add
get/puts to the scan code.

As you pointed out, the current get_device() will never return NULL when
called via:

get_device(&sdev->sdev_gendev)

The current code only narrows the window where problems might occur, I
don't see how it can completely avoid races with removal.

And the patch removes code from the mainline scsi IO paths.

-- Patrick Mansfield