2024-03-07 15:19:42

by Wenchao Hao

[permalink] [raw]
Subject: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle

I am testing SCSI error handle with my previous scsi_debug error
injection patches, and found some issues when removing device and
error handler happened together.

These issues are triggered because devices in removing would be skipped
when calling shost_for_each_device().

The issues are found:
1. statistic info printed at beginning of scsi_error_handler is wrong
2. device reset is not triggered

V4:
- Remove the forth patch which fix IO hang when device removing
becaust the issue is fixed by commit '6df0e077d76bd (scsi: core:
Kick the requeue list after inserting when flushing)'

V3:
- Update patch description
- Update comments of functions added

V2:
- Fix IO hang by run all devices' queue after error handler
- Do not modify shost_for_each_device() directly but add a new
helper to iterate devices but do not skip devices in removing

Wenchao Hao (3):
scsi: core: Add new helper to iterate all devices of host
scsi: scsi_error: Fix wrong statistic when print error info
scsi: scsi_error: Fix device reset is not triggered

drivers/scsi/scsi.c | 46 ++++++++++++++++++++++++++------------
drivers/scsi/scsi_error.c | 4 ++--
include/scsi/scsi_device.h | 25 ++++++++++++++++++---
3 files changed, 56 insertions(+), 19 deletions(-)

--
2.32.0



2024-03-07 15:19:56

by Wenchao Hao

[permalink] [raw]
Subject: [PATCH v4 1/3] scsi: core: Add new helper to iterate all devices of host

shost_for_each_device() would skip devices which is in SDEV_CANCEL or
SDEV_DEL state, for some scenarios, we donot want to skip these devices,
so add a new macro shost_for_each_device_include_deleted() to handle it.

Following changes are introduced:

1. Rework scsi_device_get(), add new helper __scsi_device_get() which
determine if skip deleted scsi_device by parameter "skip_deleted".
2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
is used when calling __scsi_device_get()
3. Update shost_for_each_device() to call __scsi_iterate_devices() with
"skip_deleted" true
4. Add new macro shost_for_each_device_include_deleted() which call
__scsi_iterate_devices() with "skip_deleted" false

Signed-off-by: Wenchao Hao <[email protected]>
---
drivers/scsi/scsi.c | 46 ++++++++++++++++++++++++++------------
include/scsi/scsi_device.h | 25 ++++++++++++++++++---
2 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 8cad9792a562..ec0a5f7e6ab8 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -733,20 +733,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
return 0;
}

-/**
- * scsi_device_get - get an additional reference to a scsi_device
+/*
+ * __scsi_device_get - get an additional reference to a scsi_device
* @sdev: device to get a reference to
- *
- * Description: Gets a reference to the scsi_device and increments the use count
- * of the underlying LLDD module. You must hold host_lock of the
- * parent Scsi_Host or already have a reference when calling this.
- *
- * This will fail if a device is deleted or cancelled, or when the LLD module
- * is in the process of being unloaded.
+ * @skip_deleted: when true, would return failed if device is deleted
*/
-int scsi_device_get(struct scsi_device *sdev)
+static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
{
- if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
+ /*
+ * if skip_deleted is true and device is in removing, return failed
+ */
+ if (skip_deleted &&
+ (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
goto fail;
if (!try_module_get(sdev->host->hostt->module))
goto fail;
@@ -759,6 +757,22 @@ int scsi_device_get(struct scsi_device *sdev)
fail:
return -ENXIO;
}
+
+/**
+ * scsi_device_get - get an additional reference to a scsi_device
+ * @sdev: device to get a reference to
+ *
+ * Description: Gets a reference to the scsi_device and increments the use count
+ * of the underlying LLDD module. You must hold host_lock of the
+ * parent Scsi_Host or already have a reference when calling this.
+ *
+ * This will fail if a device is deleted or cancelled, or when the LLD module
+ * is in the process of being unloaded.
+ */
+int scsi_device_get(struct scsi_device *sdev)
+{
+ return __scsi_device_get(sdev, 0);
+}
EXPORT_SYMBOL(scsi_device_get);

/**
@@ -778,9 +792,13 @@ void scsi_device_put(struct scsi_device *sdev)
}
EXPORT_SYMBOL(scsi_device_put);

-/* helper for shost_for_each_device, see that for documentation */
+/**
+ * helper for shost_for_each_device, see that for documentation
+ * @skip_deleted: if true, sdev in progress of removing would be skipped
+ */
struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
- struct scsi_device *prev)
+ struct scsi_device *prev,
+ bool skip_deleted)
{
struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
struct scsi_device *next = NULL;
@@ -790,7 +808,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
while (list->next != &shost->__devices) {
next = list_entry(list->next, struct scsi_device, siblings);
/* skip devices that we can't get a reference to */
- if (!scsi_device_get(next))
+ if (!__scsi_device_get(next, skip_deleted))
break;
next = NULL;
list = list->next;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index c38f4fe5e64c..fc6d999cd201 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -411,7 +411,8 @@ extern void __starget_for_each_device(struct scsi_target *, void *,

/* only exposed to implement shost_for_each_device */
extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
- struct scsi_device *);
+ struct scsi_device *,
+ bool);

/**
* shost_for_each_device - iterate over all devices of a host
@@ -421,11 +422,29 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
* Iterator that returns each device attached to @shost. This loop
* takes a reference on each device and releases it at the end. If
* you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro would skip sdev which is in progress of removing
*/
#define shost_for_each_device(sdev, shost) \
- for ((sdev) = __scsi_iterate_devices((shost), NULL); \
+ for ((sdev) = __scsi_iterate_devices((shost), NULL, 1); \
+ (sdev); \
+ (sdev) = __scsi_iterate_devices((shost), (sdev), 1))
+
+/*
+ * shost_for_each_device_include_deleted- iterate over all devices of a host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct scsi_host to iterate over
+ *
+ * Iterator that returns each device attached to @shost. This loop
+ * takes a reference on each device and releases it at the end. If
+ * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro would include sdev which is in progress of removing
+ */
+#define shost_for_each_device_include_deleted(sdev, shost) \
+ for ((sdev) = __scsi_iterate_devices((shost), NULL, 0); \
(sdev); \
- (sdev) = __scsi_iterate_devices((shost), (sdev)))
+ (sdev) = __scsi_iterate_devices((shost), (sdev), 0))

/**
* __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
--
2.32.0


2024-03-26 06:37:47

by Wenchao Hao

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle

On 2024/3/7 22:43, Wenchao Hao wrote:
> I am testing SCSI error handle with my previous scsi_debug error
> injection patches, and found some issues when removing device and
> error handler happened together.
>
> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().
>

Ping...

> The issues are found:
> 1. statistic info printed at beginning of scsi_error_handler is wrong
> 2. device reset is not triggered
>
> V4:
> - Remove the forth patch which fix IO hang when device removing
> becaust the issue is fixed by commit '6df0e077d76bd (scsi: core:
> Kick the requeue list after inserting when flushing)'
>
> V3:
> - Update patch description
> - Update comments of functions added
>
> V2:
> - Fix IO hang by run all devices' queue after error handler
> - Do not modify shost_for_each_device() directly but add a new
> helper to iterate devices but do not skip devices in removing
>
> Wenchao Hao (3):
> scsi: core: Add new helper to iterate all devices of host
> scsi: scsi_error: Fix wrong statistic when print error info
> scsi: scsi_error: Fix device reset is not triggered
>
> drivers/scsi/scsi.c | 46 ++++++++++++++++++++++++++------------
> drivers/scsi/scsi_error.c | 4 ++--
> include/scsi/scsi_device.h | 25 ++++++++++++++++++---
> 3 files changed, 56 insertions(+), 19 deletions(-)
>


2024-04-12 02:07:19

by Wenchao Hao

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle

On 2024/3/7 22:43, Wenchao Hao wrote:
> I am testing SCSI error handle with my previous scsi_debug error
> injection patches, and found some issues when removing device and
> error handler happened together.
>
> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().
>

Friendly ping...


> The issues are found:
> 1. statistic info printed at beginning of scsi_error_handler is wrong
> 2. device reset is not triggered
>
> V4:
> - Remove the forth patch which fix IO hang when device removing
> becaust the issue is fixed by commit '6df0e077d76bd (scsi: core:
> Kick the requeue list after inserting when flushing)'
>
> V3:
> - Update patch description
> - Update comments of functions added
>
> V2:
> - Fix IO hang by run all devices' queue after error handler
> - Do not modify shost_for_each_device() directly but add a new
> helper to iterate devices but do not skip devices in removing
>
> Wenchao Hao (3):
> scsi: core: Add new helper to iterate all devices of host
> scsi: scsi_error: Fix wrong statistic when print error info
> scsi: scsi_error: Fix device reset is not triggered
>
> drivers/scsi/scsi.c | 46 ++++++++++++++++++++++++++------------
> drivers/scsi/scsi_error.c | 4 ++--
> include/scsi/scsi_device.h | 25 ++++++++++++++++++---
> 3 files changed, 56 insertions(+), 19 deletions(-)
>


2024-04-15 15:09:08

by Markus Elfring

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle


> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().


How do you think about to add the tag “Fixes” to any of your patches?

Regards,
Markus

2024-04-17 15:00:53

by Wenchao Hao

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle

On 4/15/24 11:07 PM, Markus Elfring wrote:
> …
>> These issues are triggered because devices in removing would be skipped
>> when calling shost_for_each_device().
> …
>
> How do you think about to add the tag “Fixes” to any of your patches?
>

Thanks for your suggestion, but I don't know how to add this tag. Could you
please tell me how to do?

I just added "fix" in my cover letter and some patch's subject.

> Regards,
> Markus
>


2024-04-17 15:08:37

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle



On Wed, 17 Apr 2024, Wenchao Hao wrote:

> On 4/15/24 11:07 PM, Markus Elfring wrote:
> > …
> >> These issues are triggered because devices in removing would be skipped
> >> when calling shost_for_each_device().
> > …
> >
> > How do you think about to add the tag “Fixes” to any of your patches?
> >
>
> Thanks for your suggestion, but I don't know how to add this tag. Could you
> please tell me how to do?
>
> I just added "fix" in my cover letter and some patch's subject.

Search in the git history for other patches that contain Fixes:

Search in process/submitting-patches.rst for Fixes:

julia

>
> > Regards,
> > Markus
> >
>
>
>

2024-04-17 15:29:35

by Wenchao Hao

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle

On 4/17/24 11:06 PM, Julia Lawall wrote:
>
>
> On Wed, 17 Apr 2024, Wenchao Hao wrote:
>
>> On 4/15/24 11:07 PM, Markus Elfring wrote:
>>> …
>>>> These issues are triggered because devices in removing would be skipped
>>>> when calling shost_for_each_device().
>>> …
>>>
>>> How do you think about to add the tag “Fixes” to any of your patches?
>>>
>>
>> Thanks for your suggestion, but I don't know how to add this tag. Could you
>> please tell me how to do?
>>
>> I just added "fix" in my cover letter and some patch's subject.
>
> Search in the git history for other patches that contain Fixes:
>
> Search in process/submitting-patches.rst for Fixes:
>

These issues are introduced at first version of git record, which is
1da177e4c3f4 ("Linux-2.6.12-rc2"). I think "Fixes" tag should not added
for this commit.

> julia
>
>>
>>> Regards,
>>> Markus
>>>
>>
>>
>>


2024-04-17 15:40:00

by Wenchao Hao

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] SCSI: Fix issues between removing device and error handle

On 4/17/24 11:06 PM, Julia Lawall wrote:
>
>
> On Wed, 17 Apr 2024, Wenchao Hao wrote:
>
>> On 4/15/24 11:07 PM, Markus Elfring wrote:
>>> …
>>>> These issues are triggered because devices in removing would be skipped
>>>> when calling shost_for_each_device().
>>> …
>>>
>>> How do you think about to add the tag “Fixes” to any of your patches?
>>>
>>
>> Thanks for your suggestion, but I don't know how to add this tag. Could you
>> please tell me how to do?
>>
>> I just added "fix" in my cover letter and some patch's subject.
>
> Search in the git history for other patches that contain Fixes:
>
> Search in process/submitting-patches.rst for Fixes:
>

Thank you, I would add this tag in my next post.

> julia
>
>>
>>> Regards,
>>> Markus
>>>
>>
>>
>>


2024-04-17 16:11:54

by Markus Elfring

[permalink] [raw]
Subject: Re: [v4 0/3] SCSI: Fix issues between removing device and error handle

>> Search in process/submitting-patches.rst for Fixes:
>
> These issues are introduced at first version of git record, which is
> 1da177e4c3f4 ("Linux-2.6.12-rc2"). I think "Fixes" tag should not added
> for this commit.

I suggest to take also another look at information in a table
like “Releases fixed in v6.8” from the article “Development statistics for 6.8”
by Jonathan Corbet.
https://lwn.net/Articles/964106/

Regards,
Markus

2024-04-17 16:39:53

by Wenchao Hao

[permalink] [raw]
Subject: Re: [v4 0/3] SCSI: Fix issues between removing device and error handle

On 4/17/24 11:48 PM, Markus Elfring wrote:
>>> Search in process/submitting-patches.rst for Fixes:
>>
>> These issues are introduced at first version of git record, which is
>> 1da177e4c3f4 ("Linux-2.6.12-rc2"). I think "Fixes" tag should not added
>> for this commit.
>
> I suggest to take also another look at information in a table
> like “Releases fixed in v6.8” from the article “Development statistics for 6.8”
> by Jonathan Corbet.
> https://lwn.net/Articles/964106/
>

Thank you, I found some Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") in this article.
Then I searched in git log and found these "Fixes" tag too.

> Regards,
> Markus