LinuxLists.cc - [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

2021-09-17 11:50:30

Subject: [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

Assume a scenario where task A and B call ufshcd_devfreq_scale()
simultaneously. After task B calls downgrade_write() [1], but before it
calls down_read() [3], if task A calls down_write() [2], when task B calls
down_read() [3], it will lead to dead lock. Fix this by utilizing the
existing flag scaling.is_allowed to make sure only one task can do clock
scaling at a time.

Task A -
down_write [2]
ufshcd_clock_scaling_prepare
ufshcd_devfreq_scale
ufshcd_clkscale_enable_store

Task B -
down_read [3]
ufshcd_exec_dev_cmd
ufshcd_query_flag
ufshcd_wb_ctrl
downgrade_write [1]
ufshcd_devfreq_scale
ufshcd_devfreq_target
devfreq_set_target
update_devfreq
devfreq_performance_handler
governor_store

Fixes: 0e9d4ca43ba81 ("scsi: ufs: Protect some contexts from unexpected clock scaling")
Signed-off-by: Can Guo <[email protected]>

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 3841ab49..782a9c8 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1186,6 +1186,7 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
goto out;
}

+ hba->clk_scaling.is_allowed = false;
/* let's not get into low power until clock scaling is completed */
ufshcd_hold(hba, false);

@@ -1193,12 +1194,10 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
return ret;
}

-static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, bool writelock)
+static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
{
- if (writelock)
- up_write(&hba->clk_scaling_lock);
- else
- up_read(&hba->clk_scaling_lock);
+ hba->clk_scaling.is_allowed = true;
+ up_write(&hba->clk_scaling_lock);
ufshcd_scsi_unblock_requests(hba);
ufshcd_release(hba);
}
@@ -1215,7 +1214,6 @@ static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, bool writelock)
static int ufshcd_devfreq_scale(struct ufs_hba *hba, bool scale_up)
{
int ret = 0;
- bool is_writelock = true;

ret = ufshcd_clock_scaling_prepare(hba);
if (ret)
@@ -1245,12 +1243,12 @@ static int ufshcd_devfreq_scale(struct ufs_hba *hba, bool scale_up)
}

/* Enable Write Booster if we have scaled up else disable it */
- downgrade_write(&hba->clk_scaling_lock);
- is_writelock = false;
+ up_write(&hba->clk_scaling_lock);
ufshcd_wb_toggle(hba, scale_up);
+ down_write(&hba->clk_scaling_lock);

out_unprepare:
- ufshcd_clock_scaling_unprepare(hba, is_writelock);
+ ufshcd_clock_scaling_unprepare(hba);
return ret;
}

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2021-09-18 01:31:00

by Bart Van Assche

[permalink] [raw]

Subject: Re: [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

On 9/16/21 6:51 PM, Can Guo wrote:
> Assume a scenario where task A and B call ufshcd_devfreq_scale()
> simultaneously. After task B calls downgrade_write() [1], but before it
> calls down_read() [3], if task A calls down_write() [2], when task B calls
> down_read() [3], it will lead to dead lock.

Something is wrong with the above description. The downgrade_write() call is
not followed by down_read() but by up_read(). Additionally, I don't see how
concurrent calls of ufshcd_devfreq_scale() could lead to a deadlock. If one
thread calls downgrade_write() and another thread calls down_write() immediately,
that down_write() call will block until the other thread has called up_read()
without triggering a deadlock.

Thanks,

Bart.

2021-09-29 04:01:25

by Can Guo

[permalink] [raw]

Subject: Re: [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

Hi Bart,

On 2021-09-18 01:27, Bart Van Assche wrote:
> On 9/16/21 6:51 PM, Can Guo wrote:
>> Assume a scenario where task A and B call ufshcd_devfreq_scale()
>> simultaneously. After task B calls downgrade_write() [1], but before
>> it
>> calls down_read() [3], if task A calls down_write() [2], when task B
>> calls
>> down_read() [3], it will lead to dead lock.
>
> Something is wrong with the above description. The downgrade_write()
> call is
> not followed by down_read() but by up_read(). Additionally, I don't see
> how
> concurrent calls of ufshcd_devfreq_scale() could lead to a deadlock.

As mentioned in the commit msg, the down_read() [3] is from
ufshcd_wb_ctrl().

Task A -
down_write [2]
ufshcd_clock_scaling_prepare
ufshcd_devfreq_scale
ufshcd_clkscale_enable_store

Task B -
down_read [3]
ufshcd_exec_dev_cmd
ufshcd_query_flag
ufshcd_wb_ctrl
downgrade_write [1]
ufshcd_devfreq_scale
ufshcd_devfreq_target
devfreq_set_target
update_devfreq
devfreq_performance_handler
governor_store

> If one thread calls downgrade_write() and another thread calls
> down_write()
> immediately, that down_write() call will block until the other thread
> has called up_read()
> without triggering a deadlock.

Since the down_write() caller is blocked, the down_read() caller, which
comes after
down_write(), is blocked too, no? downgrade_write() keeps lock owner as
it is, but
it does not change the fact that readers and writers can be blocked by
each other.

>
> Thanks,
>
> Bart.

Thanks,

Can.

2021-09-29 19:31:21

by Bart Van Assche

[permalink] [raw]

Subject: Re: [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

On 9/28/21 8:31 PM, Can Guo wrote:
> On 2021-09-18 01:27, Bart Van Assche wrote:
>> On 9/16/21 6:51 PM, Can Guo wrote:
>>> Assume a scenario where task A and B call ufshcd_devfreq_scale()
>>> simultaneously. After task B calls downgrade_write() [1], but before it
>>> calls down_read() [3], if task A calls down_write() [2], when task B calls
>>> down_read() [3], it will lead to dead lock.
>>
>> Something is wrong with the above description. The downgrade_write() call is
>> not followed by down_read() but by up_read(). Additionally, I don't see how
>> concurrent calls of ufshcd_devfreq_scale() could lead to a deadlock.
>
> As mentioned in the commit msg, the down_read() [3] is from ufshcd_wb_ctrl().
>
> Task A -
> down_write [2]
> ufshcd_clock_scaling_prepare
> ufshcd_devfreq_scale
> ufshcd_clkscale_enable_store
>
> Task B -
> down_read [3]
> ufshcd_exec_dev_cmd
> ufshcd_query_flag
> ufshcd_wb_ctrl
> downgrade_write [1]
> ufshcd_devfreq_scale
> ufshcd_devfreq_target
> devfreq_set_target
> update_devfreq
> devfreq_performance_handler
> governor_store
>
>
>> If one thread calls downgrade_write() and another thread calls down_write()
>> immediately, that down_write() call will block until the other thread has called up_read()
>> without triggering a deadlock.
>
> Since the down_write() caller is blocked, the down_read() caller, which comes after
> down_write(), is blocked too, no? downgrade_write() keeps lock owner as it is, but
> it does not change the fact that readers and writers can be blocked by each other.

Please use the upstream function names when posting upstream patches. I think that
ufshcd_wb_ctrl() has been renamed into ufshcd_wb_toggle().

So the deadlock is caused by nested locking - one task holding a reader lock, another
task calling down_write() and next the first task grabbing the reader lock recursively?
I prefer one of the following two solutions above the patch that has been posted since
I expect that both alternatives will result in easier to maintain UFS code:
- Fix the down_read() implementation. Making down_read() wait in case of nested locking
seems wrong to me.
- Modify the UFS driver such that it does not lock hba->clk_scaling_lock recursively.

Thanks,

Bart.

2021-09-30 04:03:16

by Can Guo

[permalink] [raw]

Subject: Re: [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

On 2021-09-30 02:15, Bart Van Assche wrote:
> On 9/28/21 8:31 PM, Can Guo wrote:
>> On 2021-09-18 01:27, Bart Van Assche wrote:
>>> On 9/16/21 6:51 PM, Can Guo wrote:
>>>> Assume a scenario where task A and B call ufshcd_devfreq_scale()
>>>> simultaneously. After task B calls downgrade_write() [1], but before
>>>> it
>>>> calls down_read() [3], if task A calls down_write() [2], when task B
>>>> calls
>>>> down_read() [3], it will lead to dead lock.
>>>
>>> Something is wrong with the above description. The downgrade_write()
>>> call is
>>> not followed by down_read() but by up_read(). Additionally, I don't
>>> see how
>>> concurrent calls of ufshcd_devfreq_scale() could lead to a deadlock.
>>
>> As mentioned in the commit msg, the down_read() [3] is from
>> ufshcd_wb_ctrl().
>>
>> Task A -
>> down_write [2]
>> ufshcd_clock_scaling_prepare
>> ufshcd_devfreq_scale
>> ufshcd_clkscale_enable_store
>>
>> Task B -
>> down_read [3]
>> ufshcd_exec_dev_cmd
>> ufshcd_query_flag
>> ufshcd_wb_ctrl
>> downgrade_write [1]
>> ufshcd_devfreq_scale
>> ufshcd_devfreq_target
>> devfreq_set_target
>> update_devfreq
>> devfreq_performance_handler
>> governor_store
>>
>>
>>> If one thread calls downgrade_write() and another thread calls
>>> down_write()
>>> immediately, that down_write() call will block until the other thread
>>> has called up_read()
>>> without triggering a deadlock.
>>
>> Since the down_write() caller is blocked, the down_read() caller,
>> which comes after
>> down_write(), is blocked too, no? downgrade_write() keeps lock owner
>> as it is, but
>> it does not change the fact that readers and writers can be blocked by
>> each other.
>
> Please use the upstream function names when posting upstream patches.
> I think that
> ufshcd_wb_ctrl() has been renamed into ufshcd_wb_toggle().
>
> So the deadlock is caused by nested locking - one task holding a
> reader lock, another
> task calling down_write() and next the first task grabbing the reader
> lock recursively?
> I prefer one of the following two solutions above the patch that has
> been posted since
> I expect that both alternatives will result in easier to maintain UFS
> code:
> - Fix the down_read() implementation. Making down_read() wait in case
> of nested locking
> seems wrong to me.
> - Modify the UFS driver such that it does not lock
> hba->clk_scaling_lock recursively.

My current change is the 2nd solution - drop the hba->clk_scaling_lock
before calls ufshcd_wb_toggle() to avoid recursive lock.

Thanks,

Can Guo.

>
> Thanks,
>
> Bart.

2021-09-30 06:07:26

by Adrian Hunter

[permalink] [raw]

Subject: Re: [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

On 30/09/2021 06:57, Can Guo wrote:
> On 2021-09-30 02:15, Bart Van Assche wrote:
>> On 9/28/21 8:31 PM, Can Guo wrote:
>>> On 2021-09-18 01:27, Bart Van Assche wrote:
>>>> On 9/16/21 6:51 PM, Can Guo wrote:
>>>>> Assume a scenario where task A and B call ufshcd_devfreq_scale()
>>>>> simultaneously. After task B calls downgrade_write() [1], but before it
>>>>> calls down_read() [3], if task A calls down_write() [2], when task B calls
>>>>> down_read() [3], it will lead to dead lock.
>>>>
>>>> Something is wrong with the above description. The downgrade_write() call is
>>>> not followed by down_read() but by up_read(). Additionally, I don't see how
>>>> concurrent calls of ufshcd_devfreq_scale() could lead to a deadlock.
>>>
>>> As mentioned in the commit msg, the down_read() [3] is from ufshcd_wb_ctrl().
>>>
>>> Task A -
>>> down_write [2]
>>> ufshcd_clock_scaling_prepare
>>> ufshcd_devfreq_scale
>>> ufshcd_clkscale_enable_store
>>>
>>> Task B -
>>> down_read [3]
>>> ufshcd_exec_dev_cmd
>>> ufshcd_query_flag
>>> ufshcd_wb_ctrl
>>> downgrade_write [1]
>>> ufshcd_devfreq_scale
>>> ufshcd_devfreq_target
>>> devfreq_set_target
>>> update_devfreq
>>> devfreq_performance_handler
>>> governor_store
>>>
>>>
>>>> If one thread calls downgrade_write() and another thread calls down_write()
>>>> immediately, that down_write() call will block until the other thread has called up_read()
>>>> without triggering a deadlock.
>>>
>>> Since the down_write() caller is blocked, the down_read() caller, which comes after
>>> down_write(), is blocked too, no? downgrade_write() keeps lock owner as it is, but
>>> it does not change the fact that readers and writers can be blocked by each other.
>>
>> Please use the upstream function names when posting upstream patches.
>> I think that
>> ufshcd_wb_ctrl() has been renamed into ufshcd_wb_toggle().
>>
>> So the deadlock is caused by nested locking - one task holding a
>> reader lock, another
>> task calling down_write() and next the first task grabbing the reader
>> lock recursively?
>> I prefer one of the following two solutions above the patch that has
>> been posted since
>> I expect that both alternatives will result in easier to maintain UFS code:
>> - Fix the down_read() implementation. Making down_read() wait in case
>> of nested locking
>> seems wrong to me.
>> - Modify the UFS driver such that it does not lock
>> hba->clk_scaling_lock recursively.
>
> My current change is the 2nd solution - drop the hba->clk_scaling_lock
> before calls ufshcd_wb_toggle() to avoid recursive lock.

I have been looking at elevating hba->clk_scaling_lock to be a general purpose
lock for ufshcd. That includes allowing down_read if down_write is held already.
The plan being to hold down_write during the error handler instead of releasing it,
which would plug some gaps in synchronization.

So the locking code would look like this:

diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index a42a289eaef93..63b6a26e3a327 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -898,7 +898,8 @@ struct ufs_hba {
enum bkops_status urgent_bkops_lvl;
bool is_urgent_bkops_lvl_checked;

- struct rw_semaphore clk_scaling_lock;
+ struct rw_semaphore host_rw_sem;
+ struct task_struct *excl_task;
unsigned char desc_size[QUERY_DESC_IDN_MAX];
atomic_t scsi_block_reqs_cnt;

@@ -1418,4 +1419,42 @@ static inline int ufshcd_rpmb_rpm_put(struct ufs_hba *hba)
return pm_runtime_put(&hba->sdev_rpmb->sdev_gendev);
}

+static inline void ufshcd_down_read(struct ufs_hba *hba)
+{
+ if (hba->excl_task != current)
+ down_read(&hba->host_rw_sem);
+}
+
+static inline void ufshcd_up_read(struct ufs_hba *hba)
+{
+ if (hba->excl_task != current)
+ up_read(&hba->host_rw_sem);
+}
+
+static inline int ufshcd_down_read_trylock(struct ufs_hba *hba)
+{
+ if (hba->excl_task == current)
+ return 1;
+
+ return down_read_trylock(&hba->host_rw_sem);
+}
+
+static inline void ufshcd_down_write(struct ufs_hba *hba)
+{
+ down_write(&hba->host_rw_sem);
+ hba->excl_task = current;
+}
+
+static inline void ufshcd_up_write(struct ufs_hba *hba)
+{
+ hba->excl_task = NULL;
+ up_write(&hba->host_rw_sem);
+}
+
+static inline void ufshcd_downgrade_write(struct ufs_hba *hba)
+{
+ hba->excl_task = NULL;
+ downgrade_write(&hba->host_rw_sem);
+}
+
#endif /* End of Header */