LinuxLists.cc - Re: [PATCH] scsi: ufs: fix livelock on ufshcd_clear_ua

2020-12-17 01:30:17

Subject: Re: [PATCH] scsi: ufs: fix livelock on ufshcd_clear_ua_wlun

Hi Jaegeuk,

On Wed, 2020-12-16 at 11:02 -0800, Jaegeuk Kim wrote:
> From: Jaegeuk Kim <[email protected]>
>
> This fixes the below livelock which is caused by calling a scsi command before
> ufshcd_scsi_unblock_requests() in ufshcd_ungate_work().
>
> Workqueue: ufs_clk_gating_0 ufshcd_ungate_work
> Call trace:
> __switch_to+0x298/0x2bc
> __schedule+0x59c/0x760
> schedule+0xac/0xf0
> schedule_timeout+0x44/0x1b4
> io_schedule_timeout+0x44/0x68
> wait_for_common_io+0x7c/0x100
> wait_for_completion_io+0x14/0x20
> blk_execute_rq+0x94/0xd0
> __scsi_execute+0x100/0x1c0
> ufshcd_clear_ua_wlun+0x124/0x1c8
> ufshcd_host_reset_and_restore+0x1d0/0x2cc
> ufshcd_link_recovery+0xac/0x134
> ufshcd_uic_hibern8_exit+0x1e8/0x1f0
> ufshcd_ungate_work+0xac/0x130

According to the latest mainline kernel, once ufshcd_uic_hibern8_exit()
encounters an error, the error handler work is scheduled instead, without
blocking ufshcd_uic_hibern8_exit(). In addition,
ufshcd_scsi_unblock_requests() is invoked before leaving
ufshcd_uic_hibern8_exit(), so this call stack no longer exists.

Thanks,
Stanley Chu

> process_one_work+0x270/0x47c
> worker_thread+0x27c/0x4d8
> kthread+0x13c/0x320
> ret_from_fork+0x10/0x18
>
> Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> drivers/scsi/ufs/ufshcd.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index e221add25a7e..b0998db1b781 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -1603,6 +1603,7 @@ static void ufshcd_ungate_work(struct work_struct *work)
>  	}
>  unblock_reqs:
>  	ufshcd_scsi_unblock_requests(hba);
> +	ufshcd_clear_ua_wluns(hba);
>  }
>
>  /**
> @@ -6913,7 +6914,7 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
>
>  	/* Establish the link again and restore the device */
>  	err = ufshcd_probe_hba(hba, false);
> -	if (!err)
> +	if (!err && !hba->clk_gating.is_suspended)
>  		ufshcd_clear_ua_wluns(hba);
>  out:
>  	if (err)
> @@ -8745,6 +8746,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  		ufshcd_resume_clkscaling(hba);
>  	hba->clk_gating.is_suspended = false;
>  	hba->dev_info.b_rpm_dev_flush_capable = false;
> +	ufshcd_clear_ua_wluns(hba);
>  	ufshcd_release(hba);
>  out:
>  	if (hba->dev_info.b_rpm_dev_flush_capable) {
> @@ -8855,6 +8857,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
>  	}
>
> +	ufshcd_clear_ua_wluns(hba);
> +
>  	/* Schedule clock gating in case of no access to UFS device yet */
>  	ufshcd_release(hba);
>

2020-12-17 01:44:29

by Jaegeuk Kim

Subject: Re: [PATCH] scsi: ufs: fix livelock on ufshcd_clear_ua_wlun

On 12/17, Stanley Chu wrote:
> Hi Jaegeuk,
>
> On Wed, 2020-12-16 at 11:02 -0800, Jaegeuk Kim wrote:
> > From: Jaegeuk Kim <[email protected]>
> >
> > This fixes the below livelock which is caused by calling a scsi command before
> > ufshcd_scsi_unblock_requests() in ufshcd_ungate_work().
> >
> > Workqueue: ufs_clk_gating_0 ufshcd_ungate_work
> > Call trace:
> > __switch_to+0x298/0x2bc
> > __schedule+0x59c/0x760
> > schedule+0xac/0xf0
> > schedule_timeout+0x44/0x1b4
> > io_schedule_timeout+0x44/0x68
> > wait_for_common_io+0x7c/0x100
> > wait_for_completion_io+0x14/0x20
> > blk_execute_rq+0x94/0xd0
> > __scsi_execute+0x100/0x1c0
> > ufshcd_clear_ua_wlun+0x124/0x1c8
> > ufshcd_host_reset_and_restore+0x1d0/0x2cc
> > ufshcd_link_recovery+0xac/0x134
> > ufshcd_uic_hibern8_exit+0x1e8/0x1f0
> > ufshcd_ungate_work+0xac/0x130
>
> According to the latest mainline kernel, once ufshcd_uic_hibern8_exit()
> encounters an error, the error handler work is scheduled instead, without
> blocking ufshcd_uic_hibern8_exit(). In addition,
> ufshcd_scsi_unblock_requests() is invoked before leaving
> ufshcd_uic_hibern8_exit(), so this call stack no longer exists.

Oh, thank you for pointing this out. It seems the patch below is what
changed this:
4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and other error recovery paths")

Next time, I need to check upstream more carefully. :P

>
> Thanks,
> Stanley Chu
>
> > process_one_work+0x270/0x47c
> > worker_thread+0x27c/0x4d8
> > kthread+0x13c/0x320
> > ret_from_fork+0x10/0x18
> >
> > Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > drivers/scsi/ufs/ufshcd.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index e221add25a7e..b0998db1b781 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -1603,6 +1603,7 @@ static void ufshcd_ungate_work(struct work_struct *work)
> >  	}
> >  unblock_reqs:
> >  	ufshcd_scsi_unblock_requests(hba);
> > +	ufshcd_clear_ua_wluns(hba);
> >  }
> >
> >  /**
> > @@ -6913,7 +6914,7 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
> >
> >  	/* Establish the link again and restore the device */
> >  	err = ufshcd_probe_hba(hba, false);
> > -	if (!err)
> > +	if (!err && !hba->clk_gating.is_suspended)
> >  		ufshcd_clear_ua_wluns(hba);
> >  out:
> >  	if (err)
> > @@ -8745,6 +8746,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
> >  		ufshcd_resume_clkscaling(hba);
> >  	hba->clk_gating.is_suspended = false;
> >  	hba->dev_info.b_rpm_dev_flush_capable = false;
> > +	ufshcd_clear_ua_wluns(hba);
> >  	ufshcd_release(hba);
> >  out:
> >  	if (hba->dev_info.b_rpm_dev_flush_capable) {
> > @@ -8855,6 +8857,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
> >  		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
> >  	}
> >
> > +	ufshcd_clear_ua_wluns(hba);
> > +
> >  	/* Schedule clock gating in case of no access to UFS device yet */
> >  	ufshcd_release(hba);
> >
>