2023-03-04 01:41:53

by ZhaoLong Wang

Subject: [PATCH] ubi: Fix deadlock caused by recursively holding work_sem

While the background thread (bgt) is processing work, if sync_erase()
returns -EBUSY or some other error code in __erase_worker(),
schedule_erase() is called again. This takes down_read(&ubi->work_sem)
a second time, which can block behind a down_write(&ubi->work_sem)
waiting in ubi_update_fastmap() and cause a deadlock.

         ubi bgt                        other task
 do_work
  down_read(&ubi->work_sem)         ubi_update_fastmap
  erase_worker                        # Blocked by down_read
   __erase_worker                     down_write(&ubi->work_sem)
    schedule_erase
     schedule_ubi_work
      down_read(&ubi->work_sem)

Fix this by changing the @nested argument of this schedule_erase() call
to 'true', so that down_read(&ubi->work_sem) is not acquired recursively.

Also fix the incorrect comment on the @nested parameter of
schedule_erase(): @nested must also be true when work_sem is held via
down_write(&ubi->work_sem), not only when it is held in read mode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217093
Fixes: 2e8f08deabbc ("ubi: Fix races around ubi_refill_pools()")
Signed-off-by: ZhaoLong Wang <[email protected]>
---
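Not part of the commit: the interleaving in the trace above can be replayed with a small userspace sketch. It assumes that a reader arriving while a writer is queued has to wait, which is the behaviour the commit message relies on. Every toy_* name is hypothetical and only models the locking pattern; none of it is UBI or kernel code.

```c
#include <stdbool.h>

/* Toy model of ubi->work_sem; this is NOT the kernel's rw_semaphore. */
struct toy_rwsem {
	int readers;         /* read holds currently granted */
	bool writer_waiting; /* a writer is queued behind those readers */
};

/* Returns false where a real down_read() would sleep behind a queued writer. */
static bool toy_down_read(struct toy_rwsem *sem)
{
	if (sem->writer_waiting)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* A writer cannot proceed while readers hold the semaphore, so it queues. */
static void toy_down_write_attempt(struct toy_rwsem *sem)
{
	if (sem->readers > 0)
		sem->writer_waiting = true;
}

/* Models the re-schedule step: nested callers queue work without locking. */
static bool toy_schedule_erase(struct toy_rwsem *sem, int *queued, bool nested)
{
	if (!nested) {
		if (!toy_down_read(sem))
			return false; /* stuck: would sleep behind the writer */
		(*queued)++;
		toy_up_read(sem);
		return true;
	}
	(*queued)++;
	return true;
}

/* Replays the interleaving from the trace; returns true if it deadlocks. */
bool toy_replay_deadlocks(bool nested)
{
	struct toy_rwsem sem = { 0, false };
	int queued = 0;

	toy_down_read(&sem);          /* bgt: do_work() takes the read hold */
	toy_down_write_attempt(&sem); /* other task: ubi_update_fastmap() */
	if (!toy_schedule_erase(&sem, &queued, nested))
		return true;          /* bgt waits on writer, writer on bgt */
	toy_up_read(&sem);            /* bgt releases; the writer can run */
	return false;
}
```

In this model toy_replay_deadlocks(false) reports a deadlock and toy_replay_deadlocks(true) does not, mirroring the one-argument change in the diff below.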
drivers/mtd/ubi/wl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 40f39e5d6dfc..26a214f016c1 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -575,7 +575,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
  * @vol_id: the volume ID that last used this PEB
  * @lnum: the last used logical eraseblock number for the PEB
  * @torture: if the physical eraseblock has to be tortured
- * @nested: denotes whether the work_sem is already held in read mode
+ * @nested: denotes whether the work_sem is already held
  *
  * This function returns zero in case of success and a %-ENOMEM in case of
  * failure.
@@ -1131,7 +1131,7 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
 		int err1;

 		/* Re-schedule the LEB for erasure */
-		err1 = schedule_erase(ubi, e, vol_id, lnum, 0, false);
+		err1 = schedule_erase(ubi, e, vol_id, lnum, 0, true);
 		if (err1) {
 			spin_lock(&ubi->wl_lock);
 			wl_entry_destroy(ubi, e);
--
2.31.1



2023-03-04 02:32:39

by Zhihao Cheng

Subject: Re: [PATCH] ubi: Fix deadlock caused by recursively holding work_sem

Reviewed-by: Zhihao Cheng <[email protected]>


2023-03-04 16:59:56

by Richard Weinberger

Subject: Re: [PATCH] ubi: Fix deadlock caused by recursively holding work_sem

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
>
> Reviewed-by: Zhihao Cheng <[email protected]>

Applied to -next. Thanks everyone!

Thanks,
//richard