I am testing SCSI error handle with my previous scsi_debug error
injection patches, and found some issues when removing device and
error handler happened together.
These issues are triggered because devices in removing would be skipped
when calling shost_for_each_device().
Three issues are found:
1. statistic info printed at beginning of scsi_error_handler is wrong
2. device reset is not triggered
3. IO requeued to request_queue would be hang after error handle
V2:
- Fix IO hang by run all devices' queue after error handler
- Do not modify shost_for_each_device() directly but add a new
helper to iterate devices but do not skip devices in removing
Wenchao Hao (4):
scsi: core: Add new helper to iterate all devices of host
scsi: scsi_error: Fix wrong statistic when print error info
scsi: scsi_error: Fix device reset is not triggered
scsi: scsi_core: Fix IO hang when device removing
drivers/scsi/scsi.c | 43 +++++++++++++++++++++++++-------------
drivers/scsi/scsi_error.c | 4 ++--
drivers/scsi/scsi_lib.c | 2 +-
include/scsi/scsi_device.h | 25 +++++++++++++++++++---
4 files changed, 53 insertions(+), 21 deletions(-)
--
2.32.0
On 2023/9/28 15:35, Wenchao Hao wrote:
> I am testing SCSI error handle with my previous scsi_debug error
> injection patches, and found some issues when removing device and
> error handler happened together.
>
> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().
>
ping...
> Three issues are found:
> 1. statistic info printed at beginning of scsi_error_handler is wrong
> 2. device reset is not triggered
> 3. IO requeued to request_queue would be hang after error handle
>
> V2:
> - Fix IO hang by run all devices' queue after error handler
> - Do not modify shost_for_each_device() directly but add a new
> helper to iterate devices but do not skip devices in removing
>
> Wenchao Hao (4):
> scsi: core: Add new helper to iterate all devices of host
> scsi: scsi_error: Fix wrong statistic when print error info
> scsi: scsi_error: Fix device reset is not triggered
> scsi: scsi_core: Fix IO hang when device removing
>
> drivers/scsi/scsi.c | 43 +++++++++++++++++++++++++-------------
> drivers/scsi/scsi_error.c | 4 ++--
> drivers/scsi/scsi_lib.c | 2 +-
> include/scsi/scsi_device.h | 25 +++++++++++++++++++---
> 4 files changed, 53 insertions(+), 21 deletions(-)
>
On 2023/9/28 15:35, Wenchao Hao wrote:
> I am testing SCSI error handle with my previous scsi_debug error
> injection patches, and found some issues when removing device and
> error handler happened together.
>
> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().
>
> Three issues are found:
> 1. statistic info printed at beginning of scsi_error_handler is wrong
> 2. device reset is not triggered
> 3. IO requeued to request_queue would be hang after error handle
>
These patches fix bug which is easy to recurrent when removing device
and error handle happened together, so friendly ping again...
> V2:
> - Fix IO hang by run all devices' queue after error handler
> - Do not modify shost_for_each_device() directly but add a new
> helper to iterate devices but do not skip devices in removing
>
> Wenchao Hao (4):
> scsi: core: Add new helper to iterate all devices of host
> scsi: scsi_error: Fix wrong statistic when print error info
> scsi: scsi_error: Fix device reset is not triggered
> scsi: scsi_core: Fix IO hang when device removing
>
> drivers/scsi/scsi.c | 43 +++++++++++++++++++++++++-------------
> drivers/scsi/scsi_error.c | 4 ++--
> drivers/scsi/scsi_lib.c | 2 +-
> include/scsi/scsi_device.h | 25 +++++++++++++++++++---
> 4 files changed, 53 insertions(+), 21 deletions(-)
>
On 2023/9/28 15:35, Wenchao Hao wrote:
> I am testing SCSI error handle with my previous scsi_debug error
> injection patches, and found some issues when removing device and
> error handler happened together.
>
> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().
>
> Three issues are found:
> 1. statistic info printed at beginning of scsi_error_handler is wrong
> 2. device reset is not triggered
> 3. IO requeued to request_queue would be hang after error handle
>
Hi Martin, would you help review these patches?
> V2:
> - Fix IO hang by run all devices' queue after error handler
> - Do not modify shost_for_each_device() directly but add a new
> helper to iterate devices but do not skip devices in removing
>
> Wenchao Hao (4):
> scsi: core: Add new helper to iterate all devices of host
> scsi: scsi_error: Fix wrong statistic when print error info
> scsi: scsi_error: Fix device reset is not triggered
> scsi: scsi_core: Fix IO hang when device removing
>
> drivers/scsi/scsi.c | 43 +++++++++++++++++++++++++-------------
> drivers/scsi/scsi_error.c | 4 ++--
> drivers/scsi/scsi_lib.c | 2 +-
> include/scsi/scsi_device.h | 25 +++++++++++++++++++---
> 4 files changed, 53 insertions(+), 21 deletions(-)
>