Currently, I/O requests can still be submitted to the UFS device while
the UFS shutdown flow is running. This can lead to races such as the
scenario below, and the system may eventually crash due to unclocked
register accesses.

To fix this kind of issue, explicitly quiesce all SCSI devices before
UFS shutdown to block all I/O requests sent from the block layer.

Example of a racing scenario, while the UFS device is runtime-suspended:

Thread #1: Executing the UFS shutdown flow, e.g.,
           ufshcd_suspend(UFS_SHUTDOWN_PM)
Thread #2: Executing the runtime resume flow triggered by an I/O request,
           e.g., ufshcd_resume(UFS_RUNTIME_PM)

This breaks the assumption that UFS PM flows cannot run concurrently,
and unexpected racing behavior may result.

Signed-off-by: Stanley Chu <[email protected]>
---
 drivers/scsi/ufs/ufshcd.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9d180da77488..2e18596f3a8e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -159,6 +159,12 @@ struct ufs_pm_lvl_states ufs_pm_lvl_states[] = {
 	{UFS_POWERDOWN_PWR_MODE, UIC_LINK_OFF_STATE},
 };
 
+#define ufshcd_scsi_for_each_sdev(fn) \
+	list_for_each_entry(starget, &hba->host->__targets, siblings) { \
+		__starget_for_each_device(starget, NULL, \
+					  fn); \
+	}
+
 static inline enum ufs_dev_pwr_mode
 ufs_get_pm_lvl_to_dev_pwr_mode(enum ufs_pm_level lvl)
 {
@@ -8620,6 +8626,13 @@ int ufshcd_runtime_idle(struct ufs_hba *hba)
 }
 EXPORT_SYMBOL(ufshcd_runtime_idle);
 
+static void ufshcd_quiesce_sdev(struct scsi_device *sdev, void *data)
+{
+	/* Suspended devices are already quiesced so can be skipped */
+	if (!pm_runtime_suspended(&sdev->sdev_gendev))
+		scsi_device_quiesce(sdev);
+}
+
 /**
  * ufshcd_shutdown - shutdown routine
  * @hba: per adapter instance
@@ -8631,6 +8644,7 @@ EXPORT_SYMBOL(ufshcd_runtime_idle);
 int ufshcd_shutdown(struct ufs_hba *hba)
 {
 	int ret = 0;
+	struct scsi_target *starget;
 
 	if (!hba->is_powered)
 		goto out;
@@ -8644,6 +8658,21 @@ int ufshcd_shutdown(struct ufs_hba *hba)
 		goto out;
 	}
 
+	/*
+	 * Quiesce all SCSI devices to prevent any non-PM requests sending
+	 * from block layer during and after shutdown.
+	 *
+	 * Here we can not use blk_cleanup_queue() since PM requests
+	 * (with BLK_MQ_REQ_PREEMPT flag) are still required to be sent
+	 * through block layer. Therefore SCSI command queued after the
+	 * scsi_target_quiesce() call returned will block until
+	 * blk_cleanup_queue() is called.
+	 *
+	 * Besides, scsi_target_"un"quiesce (e.g., scsi_target_resume) can
+	 * be ignored since shutdown is one-way flow.
+	 */
+	ufshcd_scsi_for_each_sdev(ufshcd_quiesce_sdev);
+
 	ret = ufshcd_suspend(hba, UFS_SHUTDOWN_PM);
 out:
 	if (ret)
--
2.18.0
Hi Stanley,
On 2020-07-24 22:01, Stanley Chu wrote:
> Currently I/O request could be still submitted to UFS device while
> UFS is working on shutdown flow. This may lead to racing as below
> scenarios and finally system may crash due to unclocked register
> accesses.
>
> To fix this kind of issues, specifically quiesce all SCSI devices
> before UFS shutdown to block all I/O request sending from block
> layer.
>
> Example of racing scenario: While UFS device is runtime-suspended
>
> Thread #1: Executing UFS shutdown flow, e.g.,
> ufshcd_suspend(UFS_SHUTDOWN_PM)
> Thread #2: Executing runtime resume flow triggered by I/O request,
> e.g., ufshcd_resume(UFS_RUNTIME_PM)
>
I don't quite get it, how can you prevent block layer PM from initiating
hba runtime resume by quiescing the scsi devices? Block layer PM
initiates hba async runtime resume in blk_queue_enter(). But quiescing
the scsi devices can only prevent general I/O requests from passing
through scsi_queue_rq() callback.

Say hba is runtime suspended, if an I/O request to sda is sent from
block layer (sda must be runtime suspended as well at this time),
blk_queue_enter() initiates async runtime resume for sda. But since
sda's parents are also runtime suspended, the RPM framework shall do
runtime resume to the devices in the sequence hba->host->target->sda.
In this case, ufshcd_resume() still runs concurrently, no?
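
The parent-first resume order described above can be sketched with a small userspace model. This is not kernel code: `toy_dev`, `toy_log`, `toy_rpm_resume()` and `toy_resume_sda()` are hypothetical names, and the model only assumes that a suspended parent is resumed before its child, as the paragraph above describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a device in the PM hierarchy; not the kernel type. */
struct toy_dev {
	const char *name;
	struct toy_dev *parent;
	int suspended;
};

#define TOY_MAX_LOG 8

/* Records the order in which resume callbacks ran. */
struct toy_log {
	const char *order[TOY_MAX_LOG];
	size_t count;
};

/* Resume @dev, resuming any suspended ancestor first. */
void toy_rpm_resume(struct toy_dev *dev, struct toy_log *log)
{
	if (!dev || !dev->suspended)
		return;
	toy_rpm_resume(dev->parent, log);
	assert(log->count < TOY_MAX_LOG);
	dev->suspended = 0;
	log->order[log->count++] = dev->name;
}

/*
 * Build hba -> host -> target -> sda, all suspended, then resume sda.
 * Returns the number of devices whose resume callback ran.
 */
size_t toy_resume_sda(struct toy_log *log)
{
	struct toy_dev hba = { "hba", NULL, 1 };
	struct toy_dev host = { "host", &hba, 1 };
	struct toy_dev target = { "target", &host, 1 };
	struct toy_dev sda = { "sda", &target, 1 };

	log->count = 0;
	toy_rpm_resume(&sda, log);
	return log->count;
}
```

In this model a resume request aimed only at sda still runs the hba resume step first, which is the path that quiescing the SCSI devices does not close.
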
Thanks,
Can Guo.
Hi Can,
On Mon, 2020-07-27 at 15:30 +0800, Can Guo wrote:
> Hi Stanley,
>
> On 2020-07-24 22:01, Stanley Chu wrote:
> > Currently I/O request could be still submitted to UFS device while
> > UFS is working on shutdown flow. This may lead to racing as below
> > scenarios and finally system may crash due to unclocked register
> > accesses.
> >
> > To fix this kind of issues, specifically quiesce all SCSI devices
> > before UFS shutdown to block all I/O request sending from block
> > layer.
> >
> > Example of racing scenario: While UFS device is runtime-suspended
> >
> > Thread #1: Executing UFS shutdown flow, e.g.,
> > ufshcd_suspend(UFS_SHUTDOWN_PM)
> > Thread #2: Executing runtime resume flow triggered by I/O request,
> > e.g., ufshcd_resume(UFS_RUNTIME_PM)
> >
>
> I don't quite get it, how can you prevent block layer PM from initiating
> hba runtime resume by quiescing the scsi devices? Block layer PM
> initiates hba async runtime resume in blk_queue_enter(). But quiescing
> the scsi devices can only prevent general I/O requests from passing
> through scsi_queue_rq() callback.
>
> Say hba is runtime suspended, if an I/O request to sda is sent from
> block layer (sda must be runtime suspended as well at this time),
> blk_queue_enter() initiates async runtime resume for sda. But since
> sda's parents are also runtime suspended, the RPM framework shall do
> runtime resume to the devices in the sequence hba->host->target->sda.
> In this case, ufshcd_resume() still runs concurrently, no?
>
You are right. This patch cannot fix the case you mentioned. It only
blocks "general I/O requests".

So perhaps we also need the patch below?

#2 scsi: ufs: Use pm_runtime_get_sync in shutdown flow
https://patchwork.kernel.org/patch/10964097/

Patch #2 lets the runtime PM framework manage and prevent concurrent
runtime operations in the device driver.

Patch #1 (this patch) then blocks general I/O requests after the
ufshcd device is resumed.

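
A rough userspace model of the combined ordering, as a sketch only. The struct and function names (`toy_hba`, `toy_get_sync()`, `toy_submit_io()`, `toy_shutdown_prepare()`) are hypothetical, and the content of patch #2 is assumed from its title: resume the host synchronously and hold a reference before quiescing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy host state; field names are illustrative, not struct ufs_hba. */
struct toy_hba {
	bool rpm_active;   /* runtime PM status is ACTIVE */
	int usage_count;   /* runtime PM usage counter */
	bool quiesced;     /* SCSI devices quiesced: normal I/O blocked */
	int resume_calls;  /* times the driver resume callback ran */
};

/* Models pm_runtime_get_sync(): take a reference, resume if needed. */
void toy_get_sync(struct toy_hba *hba)
{
	hba->usage_count++;
	if (!hba->rpm_active) {
		hba->resume_calls++;
		hba->rpm_active = true;
	}
}

/*
 * Models a normal I/O request arriving from the block layer.
 * Returns true if the request reaches the driver.
 */
bool toy_submit_io(struct toy_hba *hba)
{
	if (!hba->rpm_active) {
		/* Block layer PM would kick a runtime resume here. */
		hba->resume_calls++;
		hba->rpm_active = true;
	}
	return !hba->quiesced;
}

/* Shutdown ordering discussed above: resume and pin, then quiesce. */
void toy_shutdown_prepare(struct toy_hba *hba)
{
	assert(hba);
	toy_get_sync(hba);     /* role of patch #2 */
	hba->quiesced = true;  /* role of patch #1 */
}
```

With this ordering, an I/O request that arrives after the preparation step neither triggers another resume callback nor reaches the driver in the model.
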
Thanks,
Stanley Chu
Hi Can,
On Fri, 2020-07-31 at 16:58 +0800, Can Guo wrote:
> Hi Stanley,
>
> On 2020-07-31 16:22, Stanley Chu wrote:
> > Hi Can,
> >
> > On Mon, 2020-07-27 at 15:30 +0800, Can Guo wrote:
> >> Hi Stanley,
> >>
> >> On 2020-07-24 22:01, Stanley Chu wrote:
> >> > Currently I/O request could be still submitted to UFS device while
> >> > UFS is working on shutdown flow. This may lead to racing as below
> >> > scenarios and finally system may crash due to unclocked register
> >> > accesses.
> >> >
> >> > To fix this kind of issues, specifically quiesce all SCSI devices
> >> > before UFS shutdown to block all I/O request sending from block
> >> > layer.
> >> >
> >> > Example of racing scenario: While UFS device is runtime-suspended
> >> >
> >> > Thread #1: Executing UFS shutdown flow, e.g.,
> >> > ufshcd_suspend(UFS_SHUTDOWN_PM)
> >> > Thread #2: Executing runtime resume flow triggered by I/O request,
> >> > e.g., ufshcd_resume(UFS_RUNTIME_PM)
> >> >
> >>
> >> I don't quite get it, how can you prevent block layer PM from initiating
> >> hba runtime resume by quiescing the scsi devices? Block layer PM
> >> initiates hba async runtime resume in blk_queue_enter(). But quiescing
> >> the scsi devices can only prevent general I/O requests from passing
> >> through scsi_queue_rq() callback.
> >>
> >> Say hba is runtime suspended, if an I/O request to sda is sent from
> >> block layer (sda must be runtime suspended as well at this time),
> >> blk_queue_enter() initiates async runtime resume for sda. But since
> >> sda's parents are also runtime suspended, the RPM framework shall do
> >> runtime resume to the devices in the sequence hba->host->target->sda.
> >> In this case, ufshcd_resume() still runs concurrently, no?
> >>
> >
> > You are right. This patch can not fix the case you mentioned. It just
> > prevents "general I/O requests".
> >
> > So perhaps we also need below patch?
> >
> > #2 scsi: ufs: Use pm_runtime_get_sync in shutdown flow
> > https://patchwork.kernel.org/patch/10964097/
>
> That is what I am talking about; we definitely need this. Since
> you are already working on fixes to the shutdown path, I will
> not upload my fixes (they look basically the same as yours). However,
> regarding the new change: if pm_runtime_get_sync(hba->dev) < 0,
> hba can still be runtime ACTIVE, so why go directly to out without
> checking hba's runtime status?
>
Thanks for the reminder. I will fix it and resend both patches as
a new series to fix the shutdown path.

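
One possible shape of that check, as a userspace sketch rather than the actual resend. `toy_host`, `toy_get_sync()` and `toy_shutdown()` are hypothetical stand-ins, and the stub simply returns a preset value in place of pm_runtime_get_sync().

```c
#include <assert.h>
#include <stdbool.h>

enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

/* Toy host; not struct ufs_hba. */
struct toy_host {
	enum toy_rpm_status status;
	int get_sync_ret;      /* value the stubbed get_sync returns */
	bool shutdown_pm_ran;  /* set once the shutdown suspend ran */
};

/* Stub standing in for pm_runtime_get_sync(): returns the preset result. */
int toy_get_sync(struct toy_host *h)
{
	return h->get_sync_ret;
}

/* Returns true if the shutdown suspend step ran. */
bool toy_shutdown(struct toy_host *h)
{
	int ret;

	assert(h);
	ret = toy_get_sync(h);

	/*
	 * A negative return does not by itself mean the host is suspended,
	 * so consult the runtime status before skipping the suspend step.
	 */
	if (ret < 0 && h->status != TOY_RPM_ACTIVE)
		return false;

	h->shutdown_pm_ran = true;
	return true;
}
```

The point of the sketch is only the condition: an error return combined with an ACTIVE status still proceeds to the shutdown suspend instead of jumping straight to out.
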
Thanks so much,
Stanley Chu