2023-06-29 18:33:31

by Chengfeng Ye

[permalink] [raw]
Subject: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
context, other process context code should disable irq or bottom-half
before acquire the same lock, otherwise deadlock could happen if the
timer preempt the execution while the lock is held in process context
on the same CPU.

Possible deadlock scenario
bcm_vk_open()
-> bcm_vk_get_ctx()
-> spin_lock(&vk->ctx_lock)
<timer iterrupt>
-> bcm_vk_hb_poll()
-> bcm_vk_blk_drv_access()
-> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)

This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock, which reported the following
warning when analyzing the linux kernel 6.4-rc7 release.

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_open
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_release
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306

As suggested by Arnd, the tentative patch fix the potential deadlocks
by replacing the timer with delay workqueue. x86_64 allyesconfig using
GCC shows no new warning. Note that no runtime testing was performed
due to no device on hand.

Signed-off-by: Chengfeng Ye <[email protected]>
---
drivers/misc/bcm-vk/bcm_vk.h | 2 +-
drivers/misc/bcm-vk/bcm_vk_msg.c | 14 +++++++-------
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/bcm-vk/bcm_vk.h b/drivers/misc/bcm-vk/bcm_vk.h
index 25d51222eedf..386884c2a263 100644
--- a/drivers/misc/bcm-vk/bcm_vk.h
+++ b/drivers/misc/bcm-vk/bcm_vk.h
@@ -340,7 +340,7 @@ struct bcm_vk_proc_mon_info {
};

struct bcm_vk_hb_ctrl {
- struct timer_list timer;
+ struct delayed_work work;
u32 last_uptime;
u32 lost_cnt;
};
diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c b/drivers/misc/bcm-vk/bcm_vk_msg.c
index 3c081504f38c..e17d81231ea6 100644
--- a/drivers/misc/bcm-vk/bcm_vk_msg.c
+++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
@@ -137,11 +137,11 @@ void bcm_vk_set_host_alert(struct bcm_vk *vk, u32 bit_mask)
#define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ)
#define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S)

-static void bcm_vk_hb_poll(struct timer_list *t)
+static void bcm_vk_hb_poll(struct work_struct *work)
{
u32 uptime_s;
- struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl,
- timer);
+ struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work), struct bcm_vk_hb_ctrl,
+ work);
struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl);

if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) {
@@ -177,22 +177,22 @@ static void bcm_vk_hb_poll(struct timer_list *t)
bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL);
}
/* re-arm timer */
- mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
+ schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
}

void bcm_vk_hb_init(struct bcm_vk *vk)
{
struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;

- timer_setup(&hb->timer, bcm_vk_hb_poll, 0);
- mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
+ INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll);
+ schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
}

void bcm_vk_hb_deinit(struct bcm_vk *vk)
{
struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;

- del_timer(&hb->timer);
+ cancel_delayed_work_sync(&hb->work);
}

static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,
--
2.17.1



2023-06-29 18:55:48

by Scott Branden

[permalink] [raw]
Subject: Re: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

Hi Chengfeng,

Thanks for the work on this. We need some time to review and test.

On 2023-06-29 11:29, Chengfeng Ye wrote:
> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
> context, other process context code should disable irq or bottom-half
> before acquire the same lock, otherwise deadlock could happen if the
> timer preempt the execution while the lock is held in process context
> on the same CPU.
>
> Possible deadlock scenario
> bcm_vk_open()
> -> bcm_vk_get_ctx()
> -> spin_lock(&vk->ctx_lock)
> <timer iterrupt>
> -> bcm_vk_hb_poll()
> -> bcm_vk_blk_drv_access()
> -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>
> This flaw was found using an experimental static analysis tool we are
> developing for irq-related deadlock, which reported the following
> warning when analyzing the linux kernel 6.4-rc7 release.
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_open
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_release
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
>
> As suggested by Arnd, the tentative patch fix the potential deadlocks
> by replacing the timer with delay workqueue. x86_64 allyesconfig using
> GCC shows no new warning. Note that no runtime testing was performed
> due to no device on hand.
>
> Signed-off-by: Chengfeng Ye <[email protected]>
> ---
> drivers/misc/bcm-vk/bcm_vk.h | 2 +-
> drivers/misc/bcm-vk/bcm_vk_msg.c | 14 +++++++-------
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/misc/bcm-vk/bcm_vk.h b/drivers/misc/bcm-vk/bcm_vk.h
> index 25d51222eedf..386884c2a263 100644
> --- a/drivers/misc/bcm-vk/bcm_vk.h
> +++ b/drivers/misc/bcm-vk/bcm_vk.h
> @@ -340,7 +340,7 @@ struct bcm_vk_proc_mon_info {
> };
>
> struct bcm_vk_hb_ctrl {
> - struct timer_list timer;
> + struct delayed_work work;
> u32 last_uptime;
> u32 lost_cnt;
> };
> diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c b/drivers/misc/bcm-vk/bcm_vk_msg.c
> index 3c081504f38c..e17d81231ea6 100644
> --- a/drivers/misc/bcm-vk/bcm_vk_msg.c
> +++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
> @@ -137,11 +137,11 @@ void bcm_vk_set_host_alert(struct bcm_vk *vk, u32 bit_mask)
> #define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ)
> #define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S)
>
> -static void bcm_vk_hb_poll(struct timer_list *t)
> +static void bcm_vk_hb_poll(struct work_struct *work)
> {
> u32 uptime_s;
> - struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl,
> - timer);
> + struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work), struct bcm_vk_hb_ctrl,
> + work);
> struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl);
>
> if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) {
> @@ -177,22 +177,22 @@ static void bcm_vk_hb_poll(struct timer_list *t)
> bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL);
> }
> /* re-arm timer */
> - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
> + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
> }
>
> void bcm_vk_hb_init(struct bcm_vk *vk)
> {
> struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>
> - timer_setup(&hb->timer, bcm_vk_hb_poll, 0);
> - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
> + INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll);
> + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
> }
>
> void bcm_vk_hb_deinit(struct bcm_vk *vk)
> {
> struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>
> - del_timer(&hb->timer);
> + cancel_delayed_work_sync(&hb->work);
> }
>
> static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-07-19 17:19:21

by Scott Branden

[permalink] [raw]
Subject: Re: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

Works fine - thanks.

On 2023-06-29 11:29, Chengfeng Ye wrote:
> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
> context, other process context code should disable irq or bottom-half
> before acquire the same lock, otherwise deadlock could happen if the
> timer preempt the execution while the lock is held in process context
> on the same CPU.
>
> Possible deadlock scenario
> bcm_vk_open()
> -> bcm_vk_get_ctx()
> -> spin_lock(&vk->ctx_lock)
> <timer iterrupt>
> -> bcm_vk_hb_poll()
> -> bcm_vk_blk_drv_access()
> -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>
> This flaw was found using an experimental static analysis tool we are
> developing for irq-related deadlock, which reported the following
> warning when analyzing the linux kernel 6.4-rc7 release.
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_open
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_release
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
>
> As suggested by Arnd, the tentative patch fix the potential deadlocks
> by replacing the timer with delay workqueue. x86_64 allyesconfig using
> GCC shows no new warning. Note that no runtime testing was performed
> due to no device on hand.
>
> Signed-off-by: Chengfeng Ye <[email protected]>
Acked-by: Scott Branden <[email protected]>
Tested-by: Desmond Yan <[email protected]>

> ---
> drivers/misc/bcm-vk/bcm_vk.h | 2 +-
> drivers/misc/bcm-vk/bcm_vk_msg.c | 14 +++++++-------
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/misc/bcm-vk/bcm_vk.h b/drivers/misc/bcm-vk/bcm_vk.h
> index 25d51222eedf..386884c2a263 100644
> --- a/drivers/misc/bcm-vk/bcm_vk.h
> +++ b/drivers/misc/bcm-vk/bcm_vk.h
> @@ -340,7 +340,7 @@ struct bcm_vk_proc_mon_info {
> };
>
> struct bcm_vk_hb_ctrl {
> - struct timer_list timer;
> + struct delayed_work work;
> u32 last_uptime;
> u32 lost_cnt;
> };
> diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c b/drivers/misc/bcm-vk/bcm_vk_msg.c
> index 3c081504f38c..e17d81231ea6 100644
> --- a/drivers/misc/bcm-vk/bcm_vk_msg.c
> +++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
> @@ -137,11 +137,11 @@ void bcm_vk_set_host_alert(struct bcm_vk *vk, u32 bit_mask)
> #define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ)
> #define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S)
>
> -static void bcm_vk_hb_poll(struct timer_list *t)
> +static void bcm_vk_hb_poll(struct work_struct *work)
> {
> u32 uptime_s;
> - struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl,
> - timer);
> + struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work), struct bcm_vk_hb_ctrl,
> + work);
> struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl);
>
> if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) {
> @@ -177,22 +177,22 @@ static void bcm_vk_hb_poll(struct timer_list *t)
> bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL);
> }
> /* re-arm timer */
> - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
> + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
> }
>
> void bcm_vk_hb_init(struct bcm_vk *vk)
> {
> struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>
> - timer_setup(&hb->timer, bcm_vk_hb_poll, 0);
> - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
> + INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll);
> + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
> }
>
> void bcm_vk_hb_deinit(struct bcm_vk *vk)
> {
> struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>
> - del_timer(&hb->timer);
> + cancel_delayed_work_sync(&hb->work);
> }
>
> static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-07-19 17:33:32

by Scott Branden

[permalink] [raw]
Subject: Re: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

Works fine. Corrected Acked-by and Tested-by below.

On 2023-06-29 11:29, Chengfeng Ye wrote:
> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
> context, other process context code should disable irq or bottom-half
> before acquire the same lock, otherwise deadlock could happen if the
> timer preempt the execution while the lock is held in process context
> on the same CPU.
>
> Possible deadlock scenario
> bcm_vk_open()
> -> bcm_vk_get_ctx()
> -> spin_lock(&vk->ctx_lock)
> <timer iterrupt>
> -> bcm_vk_hb_poll()
> -> bcm_vk_blk_drv_access()
> -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>
> This flaw was found using an experimental static analysis tool we are
> developing for irq-related deadlock, which reported the following
> warning when analyzing the linux kernel 6.4-rc7 release.
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_ioctl
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_open
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
>
> [Deadlock]: &vk->ctx_lock
> [Interrupt]: bcm_vk_hb_poll
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
> [Locking Unit]: bcm_vk_release
> -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
>
> As suggested by Arnd, the tentative patch fix the potential deadlocks
> by replacing the timer with delay workqueue. x86_64 allyesconfig using
> GCC shows no new warning. Note that no runtime testing was performed
> due to no device on hand.
>
> Signed-off-by: Chengfeng Ye <[email protected]>
Acked-by: Scott Branden <[email protected]>
Tested-by: Desmond Yan <[email protected]>

> ---
> drivers/misc/bcm-vk/bcm_vk.h | 2 +-
> drivers/misc/bcm-vk/bcm_vk_msg.c | 14 +++++++-------
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/misc/bcm-vk/bcm_vk.h b/drivers/misc/bcm-vk/bcm_vk.h
> index 25d51222eedf..386884c2a263 100644
> --- a/drivers/misc/bcm-vk/bcm_vk.h
> +++ b/drivers/misc/bcm-vk/bcm_vk.h
> @@ -340,7 +340,7 @@ struct bcm_vk_proc_mon_info {
> };
>
> struct bcm_vk_hb_ctrl {
> - struct timer_list timer;
> + struct delayed_work work;
> u32 last_uptime;
> u32 lost_cnt;
> };
> diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c b/drivers/misc/bcm-vk/bcm_vk_msg.c
> index 3c081504f38c..e17d81231ea6 100644
> --- a/drivers/misc/bcm-vk/bcm_vk_msg.c
> +++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
> @@ -137,11 +137,11 @@ void bcm_vk_set_host_alert(struct bcm_vk *vk, u32 bit_mask)
> #define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ)
> #define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S)
>
> -static void bcm_vk_hb_poll(struct timer_list *t)
> +static void bcm_vk_hb_poll(struct work_struct *work)
> {
> u32 uptime_s;
> - struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl,
> - timer);
> + struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work), struct bcm_vk_hb_ctrl,
> + work);
> struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl);
>
> if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) {
> @@ -177,22 +177,22 @@ static void bcm_vk_hb_poll(struct timer_list *t)
> bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL);
> }
> /* re-arm timer */
> - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
> + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
> }
>
> void bcm_vk_hb_init(struct bcm_vk *vk)
> {
> struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>
> - timer_setup(&hb->timer, bcm_vk_hb_poll, 0);
> - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
> + INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll);
> + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
> }
>
> void bcm_vk_hb_deinit(struct bcm_vk *vk)
> {
> struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>
> - del_timer(&hb->timer);
> + cancel_delayed_work_sync(&hb->work);
> }
>
> static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-07-19 17:51:54

by Ray Jui

[permalink] [raw]
Subject: Re: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock



On 7/19/2023 10:07 AM, Scott Branden wrote:
> Works fine - thanks.

So apparently the choice of using the timer previously was not due to
performance reasons?

If performance is a concern by converting to use workqueue (now it runs
in process/thread context than softirq), I assume you are aware of
another easy way to fix this potential deadlock issue? :)
>
> On 2023-06-29 11:29, Chengfeng Ye wrote:
>> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
>> context, other process context code should disable irq or bottom-half
>> before acquire the same lock, otherwise deadlock could happen if the
>> timer preempt the execution while the lock is held in process context
>> on the same CPU.
>>
>> Possible deadlock scenario
>> bcm_vk_open()
>>      -> bcm_vk_get_ctx()
>>      -> spin_lock(&vk->ctx_lock)
>>     <timer iterrupt>
>>     -> bcm_vk_hb_poll()
>>     -> bcm_vk_blk_drv_access()
>>     -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>>
>> This flaw was found using an experimental static analysis tool we are
>> developing for irq-related deadlock, which reported the following
>> warning when analyzing the linux kernel 6.4-rc7 release.
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_ioctl
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_ioctl
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_open
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_release
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
>>
>> As suggested by Arnd, the tentative patch fix the potential deadlocks
>> by replacing the timer with delay workqueue. x86_64 allyesconfig using
>> GCC shows no new warning. Note that no runtime testing was performed
>> due to no device on hand.
>>
>> Signed-off-by: Chengfeng Ye <[email protected]>
> Acked-by: Scott Branden <[email protected]>
> Tested-by: Desmond Yan <[email protected]>
>
>> ---
>>   drivers/misc/bcm-vk/bcm_vk.h     |  2 +-
>>   drivers/misc/bcm-vk/bcm_vk_msg.c | 14 +++++++-------
>>   2 files changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/misc/bcm-vk/bcm_vk.h b/drivers/misc/bcm-vk/bcm_vk.h
>> index 25d51222eedf..386884c2a263 100644
>> --- a/drivers/misc/bcm-vk/bcm_vk.h
>> +++ b/drivers/misc/bcm-vk/bcm_vk.h
>> @@ -340,7 +340,7 @@ struct bcm_vk_proc_mon_info {
>>   };
>>     struct bcm_vk_hb_ctrl {
>> -    struct timer_list timer;
>> +    struct delayed_work work;
>>       u32 last_uptime;
>>       u32 lost_cnt;
>>   };
>> diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c
>> b/drivers/misc/bcm-vk/bcm_vk_msg.c
>> index 3c081504f38c..e17d81231ea6 100644
>> --- a/drivers/misc/bcm-vk/bcm_vk_msg.c
>> +++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
>> @@ -137,11 +137,11 @@ void bcm_vk_set_host_alert(struct bcm_vk *vk,
>> u32 bit_mask)
>>   #define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ)
>>   #define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S)
>>   -static void bcm_vk_hb_poll(struct timer_list *t)
>> +static void bcm_vk_hb_poll(struct work_struct *work)
>>   {
>>       u32 uptime_s;
>> -    struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl,
>> -                         timer);
>> +    struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work),
>> struct bcm_vk_hb_ctrl,
>> +                         work);
>>       struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl);
>>         if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) {
>> @@ -177,22 +177,22 @@ static void bcm_vk_hb_poll(struct timer_list *t)
>>           bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL);
>>       }
>>       /* re-arm timer */
>> -    mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
>> +    schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
>>   }
>>     void bcm_vk_hb_init(struct bcm_vk *vk)
>>   {
>>       struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>>   -    timer_setup(&hb->timer, bcm_vk_hb_poll, 0);
>> -    mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
>> +    INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll);
>> +    schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
>>   }
>>     void bcm_vk_hb_deinit(struct bcm_vk *vk)
>>   {
>>       struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>>   -    del_timer(&hb->timer);
>> +    cancel_delayed_work_sync(&hb->work);
>>   }
>>     static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,


Attachments:
smime.p7s (4.10 kB)
S/MIME Cryptographic Signature

2023-07-19 18:34:22

by Scott Branden

[permalink] [raw]
Subject: Re: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock



On 2023-07-19 10:31, Ray Jui wrote:
>
>
> On 7/19/2023 10:07 AM, Scott Branden wrote:
>> Works fine - thanks.
>
> So apparently the choice of using the timer previously was not due to
> performance reasons?
Correct - the heartbeat does not need high performance.

>
> If performance is a concern by converting to use workqueue (now it runs
> in process/thread context than softirq), I assume you are aware of
> another easy way to fix this potential deadlock issue? :)
Performance not a concern.

>>
>> On 2023-06-29 11:29, Chengfeng Ye wrote:
>>> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
>>> context, other process context code should disable irq or bottom-half
>>> before acquire the same lock, otherwise deadlock could happen if the
>>> timer preempt the execution while the lock is held in process context
>>> on the same CPU.
>>>
>>> Possible deadlock scenario
>>> bcm_vk_open()
>>>      -> bcm_vk_get_ctx()
>>>      -> spin_lock(&vk->ctx_lock)
>>>     <timer iterrupt>
>>>     -> bcm_vk_hb_poll()
>>>     -> bcm_vk_blk_drv_access()
>>>     -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>>>
>>> This flaw was found using an experimental static analysis tool we are
>>> developing for irq-related deadlock, which reported the following
>>> warning when analyzing the linux kernel 6.4-rc7 release.
>>>
>>> [Deadlock]: &vk->ctx_lock
>>>    [Interrupt]: bcm_vk_hb_poll
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>>    [Locking Unit]: bcm_vk_ioctl
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>>
>>> [Deadlock]: &vk->ctx_lock
>>>    [Interrupt]: bcm_vk_hb_poll
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>>    [Locking Unit]: bcm_vk_ioctl
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
>>>
>>> [Deadlock]: &vk->ctx_lock
>>>    [Interrupt]: bcm_vk_hb_poll
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>>    [Locking Unit]: bcm_vk_open
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
>>>
>>> [Deadlock]: &vk->ctx_lock
>>>    [Interrupt]: bcm_vk_hb_poll
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>>    [Locking Unit]: bcm_vk_release
>>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
>>>
>>> As suggested by Arnd, the tentative patch fix the potential deadlocks
>>> by replacing the timer with delay workqueue. x86_64 allyesconfig using
>>> GCC shows no new warning. Note that no runtime testing was performed
>>> due to no device on hand.
>>>
>>> Signed-off-by: Chengfeng Ye <[email protected]>
>> Acked-by: Scott Branden <[email protected]>
>> Tested-by: Desmond Yan <[email protected]>
>>
>>> ---
>>>   drivers/misc/bcm-vk/bcm_vk.h     |  2 +-
>>>   drivers/misc/bcm-vk/bcm_vk_msg.c | 14 +++++++-------
>>>   2 files changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/misc/bcm-vk/bcm_vk.h b/drivers/misc/bcm-vk/bcm_vk.h
>>> index 25d51222eedf..386884c2a263 100644
>>> --- a/drivers/misc/bcm-vk/bcm_vk.h
>>> +++ b/drivers/misc/bcm-vk/bcm_vk.h
>>> @@ -340,7 +340,7 @@ struct bcm_vk_proc_mon_info {
>>>   };
>>>     struct bcm_vk_hb_ctrl {
>>> -    struct timer_list timer;
>>> +    struct delayed_work work;
>>>       u32 last_uptime;
>>>       u32 lost_cnt;
>>>   };
>>> diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c
>>> b/drivers/misc/bcm-vk/bcm_vk_msg.c
>>> index 3c081504f38c..e17d81231ea6 100644
>>> --- a/drivers/misc/bcm-vk/bcm_vk_msg.c
>>> +++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
>>> @@ -137,11 +137,11 @@ void bcm_vk_set_host_alert(struct bcm_vk *vk,
>>> u32 bit_mask)
>>>   #define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ)
>>>   #define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S)
>>>   -static void bcm_vk_hb_poll(struct timer_list *t)
>>> +static void bcm_vk_hb_poll(struct work_struct *work)
>>>   {
>>>       u32 uptime_s;
>>> -    struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl,
>>> -                         timer);
>>> +    struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work),
>>> struct bcm_vk_hb_ctrl,
>>> +                         work);
>>>       struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl);
>>>         if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) {
>>> @@ -177,22 +177,22 @@ static void bcm_vk_hb_poll(struct timer_list *t)
>>>           bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL);
>>>       }
>>>       /* re-arm timer */
>>> -    mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
>>> +    schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
>>>   }
>>>     void bcm_vk_hb_init(struct bcm_vk *vk)
>>>   {
>>>       struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>>>   -    timer_setup(&hb->timer, bcm_vk_hb_poll, 0);
>>> -    mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
>>> +    INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll);
>>> +    schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
>>>   }
>>>     void bcm_vk_hb_deinit(struct bcm_vk *vk)
>>>   {
>>>       struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>>>   -    del_timer(&hb->timer);
>>> +    cancel_delayed_work_sync(&hb->work);
>>>   }
>>>     static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-07-20 07:59:53

by Chengfeng Ye

[permalink] [raw]
Subject: Re: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

> Works fine. Corrected Acked-by and Tested-by below.
> Tested-by: Desmond Yan <[email protected]>
> Acked-by: Scott Branden <[email protected]>

Thanks much for your effort in reviewing and testing the patch.

Best Regards,
Chengfeng