2023-11-21 07:09:19

by SEO HOYOUNG

Subject: [PATCH v3] scsi: ufs: core: fix racing issue during ufshcd_mcq_abort

If a CQ completion IRQ is raised during abort processing, the command
that the abort was handling has already been completed.
In that case the utag needed to clear the cmd can no longer be obtained,
which leads to the NULL pointer dereference shown in the log below.

ufshcd_try_to_abort_task: cmd pending in the device. tag = 25
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000194
Mem abort info:
ESR = 0x0000000096000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0

pc : blk_mq_unique_tag+0x8/0x14
lr : ufshcd_mcq_sq_cleanup+0x6c/0x1b8
sp : ffffffc03e3b3b10
x29: ffffffc03e3b3b10 x28: 0000000000000001 x27: ffffff8830b34f68
x26: ffffff8830b34f6c x25: ffffff8830b34040 x24: 0000000000000000
x23: 0000000000000f18 x22: ffffffc03e3b3bb8 x21: 0000000000000019
x20: 0000000000000019 x19: ffffff8830b309b0 x18: ffffffc00a1b5380
x17: 00000000529c6ef0 x16: 00000000529c6ef0 x15: 0000000000000000
x14: 0000000000000010 x13: 0000000000000032 x12: 0000001169e8a5bc
x11: 0000000000000001 x10: ffffff885dfc1588 x9 : 0000000000000019
x8 : 0000000000000000 x7 : 0000000000000001 x6 : fffffffdef706f28
x5 : 000000000000283d x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000003 x1 : 0000000000000019 x0 : ffffff8855781200
Call trace:
blk_mq_unique_tag+0x8/0x14
ufshcd_clear_cmd+0x34/0x118
ufshcd_try_to_abort_task+0x1c4/0x4b0
ufshcd_err_handler+0x8d0/0xd24
process_one_work+0x1e4/0x43c
worker_thread+0x25c/0x430
kthread+0x104/0x1d4
ret_from_fork+0x10/0x20

v1 -> v2: fix build error

v2 -> v3: move to ufshcd_mcq_sq_cleanup() function

Bart said that lrbp->cmd could be changed before ufshcd_clear_cmd() was
called, so lrbp->cmd check was moved to ufshcd_clear_cmd().
In the case of legacy mode, spin_lock is used to protect before clear cmd,
but spin_lock cannot be used due to mcq mode, so it is necessary to check
the status of lrbp->cmd.

Change-Id: Id8412190e60286d00a30820591566835cefbf47e
Signed-off-by: SEO HOYOUNG <[email protected]>
---
drivers/ufs/core/ufs-mcq.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 2ba8ec254dce..deb6dac724c8 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -507,6 +507,10 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
 	if (hba->quirks & UFSHCD_QUIRK_MCQ_BROKEN_RTC)
 		return -ETIMEDOUT;

+	if (!ufshcd_cmd_inflight(cmd) ||
+	    test_bit(SCMD_STATE_COMPLETE, &cmd->state))
+		return 0;
+
 	if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
 		if (!cmd)
 			return -EINVAL;
--
2.26.0


2023-11-21 17:57:50

by Bart Van Assche

Subject: Re: [PATCH v3] scsi: ufs: core: fix racing issue during ufshcd_mcq_abort

On 11/20/23 23:11, SEO HOYOUNG wrote:
> Bart said that lrbp->cmd could be changed before ufshcd_clear_cmd() was
> called, so lrbp->cmd check was moved to ufshcd_clear_cmd().
> In the case of legacy mode, spin_lock is used to protect before clear cmd,
> but spin_lock cannot be used due to mcq mode, so it is necessary to check
> the status of lrbp->cmd.

Does this mean that the race that I mentioned has not been addressed at all?
ufshcd_mcq_sq_cleanup() is called by ufshcd_clear_cmd(). No locks are held by
ufshcd_eh_device_reset_handler() when it calls ufshcd_clear_cmd(). So I think
there is still a race between the code added by this patch and the completion
interrupt.

> Change-Id: Id8412190e60286d00a30820591566835cefbf47e

No Change-Ids in patches that are posted on upstream mailing lists please.

> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index 2ba8ec254dce..deb6dac724c8 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -507,6 +507,10 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
>  	if (hba->quirks & UFSHCD_QUIRK_MCQ_BROKEN_RTC)
>  		return -ETIMEDOUT;
>
> +	if (!ufshcd_cmd_inflight(cmd) ||
> +	    test_bit(SCMD_STATE_COMPLETE, &cmd->state))
> +		return 0;
> +
>  	if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
>  		if (!cmd)
>  			return -EINVAL;

Thanks,

Bart.
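
To make the race described above concrete: any test of the command state
followed by a separate action leaves a window in which the completion
interrupt can run. The sketch below is a userspace model only. It is not the
fix discussed in this thread and it is not kernel code; the names mock_state,
mock_slot and mock_claim() are hypothetical. It shows one general way to
close such a window: an atomic compare-and-exchange that lets exactly one of
the two paths (completion or abort) claim the command.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical command states for the model; not kernel definitions. */
enum mock_state { MOCK_INFLIGHT, MOCK_COMPLETED, MOCK_ABORTED };

/* Hypothetical stand-in for one command slot. */
struct mock_slot {
	_Atomic int state;
};

/*
 * Try to move the slot from MOCK_INFLIGHT to new_state in one atomic step.
 * Returns true only for the single caller that wins; every later caller
 * sees that the slot is no longer in flight and must leave it alone.
 */
static bool mock_claim(struct mock_slot *slot, int new_state)
{
	int expected = MOCK_INFLIGHT;

	return atomic_compare_exchange_strong(&slot->state, &expected,
					      new_state);
}
```

In this model the abort path would call mock_claim(slot, MOCK_ABORTED) and
touch the command only if that returns true. Whether something comparable
suits the MCQ abort path is left open here.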

2023-11-22 09:23:45

by Dan Carpenter

Subject: Re: [PATCH v3] scsi: ufs: core: fix racing issue during ufshcd_mcq_abort

Hi SEO,

kernel test robot noticed the following build warnings:

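[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]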

url: https://github.com/intel-lab-lkp/linux/commits/SEO-HOYOUNG/scsi-ufs-core-fix-racing-issue-during-ufshcd_mcq_abort/20231121-151923
base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
patch link: https://lore.kernel.org/r/20231121071128.7743-1-hy50.seo%40samsung.com
patch subject: [PATCH v3] scsi: ufs: core: fix racing issue during ufshcd_mcq_abort
config: powerpc-randconfig-r071-20231122 (https://download.01.org/0day-ci/archive/20231122/[email protected]/config)
compiler: powerpc-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20231122/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Reported-by: Dan Carpenter <[email protected]>
| Closes: https://lore.kernel.org/r/[email protected]/

smatch warnings:
drivers/ufs/core/ufs-mcq.c:515 ufshcd_mcq_sq_cleanup() warn: variable dereferenced before check 'cmd' (see line 511)

vim +/cmd +515 drivers/ufs/core/ufs-mcq.c

8d7290348992f2 Bao D. Nguyen 2023-05-29  498  int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
8d7290348992f2 Bao D. Nguyen 2023-05-29  499  {
8d7290348992f2 Bao D. Nguyen 2023-05-29  500          struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
8d7290348992f2 Bao D. Nguyen 2023-05-29  501          struct scsi_cmnd *cmd = lrbp->cmd;
8d7290348992f2 Bao D. Nguyen 2023-05-29  502          struct ufs_hw_queue *hwq;
8d7290348992f2 Bao D. Nguyen 2023-05-29  503          void __iomem *reg, *opr_sqd_base;
8d7290348992f2 Bao D. Nguyen 2023-05-29  504          u32 nexus, id, val;
8d7290348992f2 Bao D. Nguyen 2023-05-29  505          int err;
8d7290348992f2 Bao D. Nguyen 2023-05-29  506
aa9d5d0015a8b7 Po-Wen Kao    2023-06-12  507          if (hba->quirks & UFSHCD_QUIRK_MCQ_BROKEN_RTC)
aa9d5d0015a8b7 Po-Wen Kao    2023-06-12  508                  return -ETIMEDOUT;
aa9d5d0015a8b7 Po-Wen Kao    2023-06-12  509
5363c9d813101c SEO HOYOUNG   2023-11-21  510          if (!ufshcd_cmd_inflight(cmd) ||
5363c9d813101c SEO HOYOUNG   2023-11-21 @511              test_bit(SCMD_STATE_COMPLETE, &cmd->state))
                                                                                        ^^^^^^^^^^^
The patch adds a new unchecked dereference

5363c9d813101c SEO HOYOUNG   2023-11-21  512                  return 0;
5363c9d813101c SEO HOYOUNG   2023-11-21  513
8d7290348992f2 Bao D. Nguyen 2023-05-29  514          if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
8d7290348992f2 Bao D. Nguyen 2023-05-29 @515                  if (!cmd)
                                                                   ^^^
But the old code assumed "cmd" could be NULL

8d7290348992f2 Bao D. Nguyen 2023-05-29  516                          return -EINVAL;
8d7290348992f2 Bao D. Nguyen 2023-05-29  517                  hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
8d7290348992f2 Bao D. Nguyen 2023-05-29  518          } else {
8d7290348992f2 Bao D. Nguyen 2023-05-29  519                  hwq = hba->dev_cmd_queue;
8d7290348992f2 Bao D. Nguyen 2023-05-29  520          }

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
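
The ordering that the smatch warning is about can be shown with a small
userspace model. This is a sketch under stated assumptions, not a proposed
replacement for the patch: struct mock_cmd, mock_cmd_already_done() and
mock_sq_cleanup_precheck() are hypothetical stand-ins, and the two bool
fields only imitate what ufshcd_cmd_inflight() and the SCMD_STATE_COMPLETE
bit report. The point is simply that the NULL test comes before any access
through the pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct scsi_cmnd; not the kernel type. */
struct mock_cmd {
	bool inflight;	/* imitates the result of ufshcd_cmd_inflight() */
	bool complete;	/* imitates the SCMD_STATE_COMPLETE bit */
};

/* True when the command has nothing left to clean up. Requires cmd != NULL. */
static bool mock_cmd_already_done(const struct mock_cmd *cmd)
{
	return !cmd->inflight || cmd->complete;
}

/*
 * Hypothetical model of the checks at the top of ufshcd_mcq_sq_cleanup().
 * 'reserved' stands for task_tag == hba->nutrs - UFSHCD_NUM_RESERVED.
 * Returns -EINVAL for a missing command, 0 when the command is already
 * done, and 1 when the caller should go on with the cleanup.
 */
static int mock_sq_cleanup_precheck(const struct mock_cmd *cmd, bool reserved)
{
	if (!reserved) {
		if (!cmd)			/* NULL test first */
			return -EINVAL;
		if (mock_cmd_already_done(cmd))	/* dereference only after it */
			return 0;
	}
	return 1;
}
```

Called with a NULL command and a non-reserved tag, the model returns -EINVAL
without touching the pointer; a completed command yields 0.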