LinuxLists.cc - [PATCH] kdb: Fix the deadlock issue in KDB debugging.

2024-02-28 05:12:20

Subject: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

master cpu : After executing the go command, a deadlock occurs.
slave cpu: may be performing thread migration,
acquiring the running queue lock of master CPU.
Then it was interrupted by kdb NMI and entered the nmi_handler process.
(nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
while(1){ touch wathcdog}.)

example:
BUG: spinlock lockup suspected on CPU#0, namex/10450
lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
[<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
[<ffffffff8147a7fc>] ? schedule+0x3c/0x90
[<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
[<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
[<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
[<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
[<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
[<ffffffff8108c540>] ? wake_up_q+0x70/0x70
[<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
[<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
CPU: 0 PID: 10450 Comm: namex Tainted: G O 4.4.65 #1
Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
Call Trace:
<#DB> [<ffffffff8124e883>] dump_stack+0x85/0xc2
[<ffffffff810a7f7f>] spin_dump+0x7f/0x100
[<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
[<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
[<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
[<ffffffff8108c4c5>] wake_up_process+0x15/0x20
[<ffffffff8107b371>] insert_work+0x81/0xc0
[<ffffffff8107b4e5>] __queue_work+0x135/0x390
[<ffffffff8107b786>] queue_work_on+0x46/0x90
[<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
[<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
[<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
[<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
[<ffffffff81054eb5>] kgdb_notify+0x35/0x70
[<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
[<ffffffff8108304d>] notify_die+0x3d/0x50
[<ffffffff81017219>] do_int3+0x89/0x120
[<ffffffff81480fb4>] int3+0x44/0x80

Signed-off-by: LiuYe <[email protected]>
---
drivers/tty/serial/kgdboc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..945318ef1 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,9 @@
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/serial_core.h>
+#include <linux/smp.h>
+
+#include "../kernel/sched/sched.h"

#define MAX_CONFIG_LEN 40

@@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
dbg_restore_graphics = 0;
con_debug_leave();
}
- kgdboc_restore_input();
+ if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
+ kgdboc_restore_input();
}

static struct kgdb_io kgdboc_io_ops = {
--
2.25.1

-------------------------------------------------------------------------------------------------------------------------------------
???ʼ????丽???????»??????ŵı?????Ϣ???????ڷ??͸???????ַ???г?
?ĸ??˻?Ⱥ?顣??ֹ?κ??????????κ???ʽʹ?ã?????????????ȫ???????ֵ?й¶?????ơ?
??ɢ???????ʼ??е???Ϣ?????????????˱??ʼ????????????绰???ʼ?֪ͨ?????˲?ɾ????
?ʼ???
This e-mail and its attachments contain confidential information from New H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

2024-02-28 12:12:45

by Daniel Thompson

[permalink] [raw]

Subject: Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> master cpu : After executing the go command, a deadlock occurs.
> slave cpu: may be performing thread migration,
> acquiring the running queue lock of master CPU.
> Then it was interrupted by kdb NMI and entered the nmi_handler process.
> (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
> while(1){ touch wathcdog}.)

I think this description is a little short and doesn't clearly explain
the cause. How about:

Currently, if kgdboc includes 'kdb', then kgdboc will attempt to
use schedule_work() to provoke a keyboard reset when transitioning out
of the debugger and back to normal operation. This can cause
deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slace CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

> example:
> BUG: spinlock lockup suspected on CPU#0, namex/10450
> lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
> ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
> ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
> [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
> [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
> [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
> [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
> [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
> [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
> [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
> [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
> [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
> [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
> CPU: 0 PID: 10450 Comm: namex Tainted: G O 4.4.65 #1
> Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
> 0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
> ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
> 000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
> Call Trace:
> <#DB> [<ffffffff8124e883>] dump_stack+0x85/0xc2
> [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
> [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
> [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
> [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
> [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
> [<ffffffff8107b371>] insert_work+0x81/0xc0
> [<ffffffff8107b4e5>] __queue_work+0x135/0x390
> [<ffffffff8107b786>] queue_work_on+0x46/0x90
> [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
> [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
> [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
> [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
> [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
> [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
> [<ffffffff8108304d>] notify_die+0x3d/0x50
> [<ffffffff81017219>] do_int3+0x89/0x120
> [<ffffffff81480fb4>] int3+0x44/0x80
>
> Signed-off-by: LiuYe <[email protected]>
> ---
> drivers/tty/serial/kgdboc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 7ce7bb164..945318ef1 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,9 @@
> #include <linux/module.h>
> #include <linux/platform_device.h>
> #include <linux/serial_core.h>
> +#include <linux/smp.h>
> +
> +#include "../kernel/sched/sched.h"
>
> #define MAX_CONFIG_LEN 40
>
> @@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
> dbg_restore_graphics = 0;
> con_debug_leave();
> }
> - kgdboc_restore_input();
> + if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
> + kgdboc_restore_input();

I don't think solving this by access internal scheduler state is the
right approach .

The description I wrote above perhaps already suggests why. The
deadlock occurs because it is unsafe to call schedule_work() from
the debug trap handler. The debug trap handler in your stack trace is not
running from an NMI but it certainly has NMI-like properties. Therefore
a better fix is not to call schedule_work() at all from the debug trap
handler.

Instead we need to use an NMI-safe API such as irq_work_queue() and that
irq_work can call schedule_work() and trigger the keyboard reset.

Daniel.

2024-03-01 03:32:16

Subject: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: 答复 : [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: [PATCH] kdb: Fix the deadlock iss ue in KDB debugging.

Subject: Re: 答复: 答复 : [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlo ck issue in KDB debugging.

Subject: Re: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: 答复: 答复: 答复: 答复 : [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: 答复: 答复: 答复: [PATCH] kdb: Fi x the deadlock issue in KDB debugging.

Subject: Re: 答复: 答复: 答复: 答复: 答复 : [PATCH] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: 答复: 答复: 答复: 答复: [PATCH] k db: Fix the deadlock issue in KDB debugging.

Subject: [PATCH v1] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH v1] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: 答复 : [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: [PATCH V4] kdb: Fix the deadlock issu e in KDB debugging.

Subject: Re: 答复 : [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V5] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V5] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: [PATCH V5] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V6] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V6] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V7] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V7] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V9] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: 答复 : [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: [PATCH V10] kdb: Fix the dead lock issue in KDB debugging.

Subject: Re: 答复: 答复 : [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: [PATCH V11] kdb: Fix the deadlock issue in KDB debugging.

Subject: Re: 答复: 答复: 答复 : [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.

Subject: 答复: 答复: 答复: 答复: [PATCH V10] k db: Fix the deadlock issue in KDB debugging.