2023-06-28 12:05:38

by Chengfeng Ye

[permalink] [raw]
Subject: [PATCH v2] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
context, other process context code should disable irq or bottom-half
before acquire the same lock, otherwise deadlock could happen if the
timer preempt the execution while the lock is held in process context
on the same CPU.

Possible deadlock scenario
bcm_vk_open()
-> bcm_vk_get_ctx()
-> spin_lock(&vk->ctx_lock)
<timer iterrupt>
-> bcm_vk_hb_poll()
-> bcm_vk_blk_drv_access()
-> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)

This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock, which reported the following
warning when analyzing the linux kernel 6.4-rc7 release.

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_open
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_release
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306

The tentative patch fix the potential deadlock by spin_lock_irqsave().

Signed-off-by: Chengfeng Ye <[email protected]>
---
drivers/misc/bcm-vk/bcm_vk_dev.c | 10 ++++++----
drivers/misc/bcm-vk/bcm_vk_msg.c | 10 ++++++----
2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c b/drivers/misc/bcm-vk/bcm_vk_dev.c
index d4a96137728d..dfe16154b25a 100644
--- a/drivers/misc/bcm-vk/bcm_vk_dev.c
+++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
@@ -503,13 +503,14 @@ static int bcm_vk_sync_card_info(struct bcm_vk *vk)
void bcm_vk_blk_drv_access(struct bcm_vk *vk)
{
int i;
+ unsigned long flags;

/*
* kill all the apps except for the process that is resetting.
* If not called during reset, reset_pid will be 0, and all will be
* killed.
*/
- spin_lock(&vk->ctx_lock);
+ spin_lock_irqsave(&vk->ctx_lock, flags);

/* set msgq_inited to 0 so that all rd/wr will be blocked */
atomic_set(&vk->msgq_inited, 0);
@@ -527,7 +528,7 @@ void bcm_vk_blk_drv_access(struct bcm_vk *vk)
}
}
bcm_vk_tty_terminate_tty_user(vk);
- spin_unlock(&vk->ctx_lock);
+ spin_unlock_irqrestore(&vk->ctx_lock, flags);
}

static void bcm_vk_buf_notify(struct bcm_vk *vk, void *bufp,
@@ -1143,6 +1144,7 @@ static long bcm_vk_reset(struct bcm_vk *vk, struct vk_reset __user *arg)
int ret = 0;
u32 ramdump_reset;
int special_reset;
+ unsigned long flags;

if (copy_from_user(&reset, arg, sizeof(struct vk_reset)))
return -EFAULT;
@@ -1166,7 +1168,7 @@ static long bcm_vk_reset(struct bcm_vk *vk, struct vk_reset __user *arg)
*/
bcm_vk_send_shutdown_msg(vk, VK_SHUTDOWN_GRACEFUL, 0, 0);

- spin_lock(&vk->ctx_lock);
+ spin_lock_irqsave(&vk->ctx_lock, flags);
if (!vk->reset_pid) {
vk->reset_pid = task_pid_nr(current);
} else {
@@ -1174,7 +1176,7 @@ static long bcm_vk_reset(struct bcm_vk *vk, struct vk_reset __user *arg)
vk->reset_pid);
ret = -EACCES;
}
- spin_unlock(&vk->ctx_lock);
+ spin_unlock_irqrestore(&vk->ctx_lock, flags);
if (ret)
goto err_exit;

diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c b/drivers/misc/bcm-vk/bcm_vk_msg.c
index 3c081504f38c..ea887fa97a9e 100644
--- a/drivers/misc/bcm-vk/bcm_vk_msg.c
+++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
@@ -212,8 +212,9 @@ static struct bcm_vk_ctx *bcm_vk_get_ctx(struct bcm_vk *vk, const pid_t pid)
u32 i;
struct bcm_vk_ctx *ctx = NULL;
u32 hash_idx = hash_32(pid, VK_PID_HT_SHIFT_BIT);
+ unsigned long flags;

- spin_lock(&vk->ctx_lock);
+ spin_lock_irqsave(&vk->ctx_lock, flags);

/* check if it is in reset, if so, don't allow */
if (vk->reset_pid) {
@@ -253,7 +254,7 @@ static struct bcm_vk_ctx *bcm_vk_get_ctx(struct bcm_vk *vk, const pid_t pid)

all_in_use_exit:
in_reset_exit:
- spin_unlock(&vk->ctx_lock);
+ spin_unlock_irqrestore(&vk->ctx_lock, flags);

return ctx;
}
@@ -295,6 +296,7 @@ static int bcm_vk_free_ctx(struct bcm_vk *vk, struct bcm_vk_ctx *ctx)
pid_t pid;
struct bcm_vk_ctx *entry;
int count = 0;
+ unsigned long flags;

if (!ctx) {
dev_err(&vk->pdev->dev, "NULL context detected\n");
@@ -303,7 +305,7 @@ static int bcm_vk_free_ctx(struct bcm_vk *vk, struct bcm_vk_ctx *ctx)
idx = ctx->idx;
pid = ctx->pid;

- spin_lock(&vk->ctx_lock);
+ spin_lock_irqsave(&vk->ctx_lock, flags);

if (!vk->ctx[idx].in_use) {
dev_err(&vk->pdev->dev, "context[%d] not in use!\n", idx);
@@ -320,7 +322,7 @@ static int bcm_vk_free_ctx(struct bcm_vk *vk, struct bcm_vk_ctx *ctx)
}
}

- spin_unlock(&vk->ctx_lock);
+ spin_unlock_irqrestore(&vk->ctx_lock, flags);

return count;
}
--
2.17.1



2023-06-28 12:13:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

On Wed, Jun 28, 2023 at 11:29:58AM +0000, Chengfeng Ye wrote:
> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
> context, other process context code should disable irq or bottom-half
> before acquire the same lock, otherwise deadlock could happen if the
> timer preempt the execution while the lock is held in process context
> on the same CPU.
>
> Possible deadlock scenario
> bcm_vk_open()
> -> bcm_vk_get_ctx()
> -> spin_lock(&vk->ctx_lock)
> <timer iterrupt>
> -> bcm_vk_hb_poll()
> -> bcm_vk_blk_drv_access()
> -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>
> This flaw was found using an experimental static analysis tool we are
> developing for irq-related deadlock, which reported the following
> warning when analyzing the linux kernel 6.4-rc7 release.
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_open
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_release
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
>
> The tentative patch fix the potential deadlock by spin_lock_irqsave().
>
> Signed-off-by: Chengfeng Ye <[email protected]>
> ---

You do not mention how you tested to verify that your change is correct
:(


2023-06-28 12:21:02

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

On Wed, Jun 28, 2023, at 13:29, Chengfeng Ye wrote:
> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
> context, other process context code should disable irq or bottom-half
> before acquire the same lock, otherwise deadlock could happen if the
> timer preempt the execution while the lock is held in process context
> on the same CPU.
>
> Possible deadlock scenario
> bcm_vk_open()
> -> bcm_vk_get_ctx()
> -> spin_lock(&vk->ctx_lock)
> <timer iterrupt>
> -> bcm_vk_hb_poll()
> -> bcm_vk_blk_drv_access()
> -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>
> This flaw was found using an experimental static analysis tool we are
> developing for irq-related deadlock, which reported the following
> warning when analyzing the linux kernel 6.4-rc7 release.

The timer function does not seem to be performance critical at all,
it might be nicer to just move it into process context using
a delayed workqueue instead of a timer.

Arnd

2023-06-28 12:37:11

by Chengfeng Ye

[permalink] [raw]
Subject: Re: [PATCH v2] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

Honestly it is not tested dynamically, I only manually check the code
and send those issues I think are more likely to be true and show it
via the patch.

Just let me know If you plan to fix it and I am willing to help with new
patch version.

Best Regards,
Chengfeng

2023-06-29 18:36:46

by Chengfeng Ye

[permalink] [raw]
Subject: Re: [PATCH v2] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

> The timer function does not seem to be performance critical at all,
> it might be nicer to just move it into process context using
> a delayed workqueue instead of a timer.

Thanks for the suggestion, new patch is sent with a delayed workqueue.

Best Regards,
Chengfeng

Arnd Bergmann <[email protected]> 于2023年6月28日周三 19:56写道:
>
> On Wed, Jun 28, 2023, at 13:29, Chengfeng Ye wrote:
> > As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
> > context, other process context code should disable irq or bottom-half
> > before acquire the same lock, otherwise deadlock could happen if the
> > timer preempt the execution while the lock is held in process context
> > on the same CPU.
> >
> > Possible deadlock scenario
> > bcm_vk_open()
> > -> bcm_vk_get_ctx()
> > -> spin_lock(&vk->ctx_lock)
> > <timer iterrupt>
> > -> bcm_vk_hb_poll()
> > -> bcm_vk_blk_drv_access()
> > -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
> >
> > This flaw was found using an experimental static analysis tool we are
> > developing for irq-related deadlock, which reported the following
> > warning when analyzing the linux kernel 6.4-rc7 release.
>
> The timer function does not seem to be performance critical at all,
> it might be nicer to just move it into process context using
> a delayed workqueue instead of a timer.
>
> Arnd