2020-05-22 14:38:19

by Sumit Garg

[permalink] [raw]
Subject: [PATCH v2] kdb: Make kdb_printf robust to run in NMI context

While rounding up CPUs via NMIs, its possible that a rounded up CPU
maybe holding a console port lock leading to kgdb master CPU stuck in
a deadlock during invocation of console write operations. So in order
to avoid such a deadlock, invoke bust_spinlocks() prior to invocation
of console handlers.

Also, add a check for console port to be enabled prior to invocation of
corresponding handler.

Suggested-by: Petr Mladek <[email protected]>
Suggested-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Sumit Garg <[email protected]>
---

Changes in v2:
- Use oops_in_progress directly instead of bust_spinlocks().

kernel/debug/kdb/kdb_io.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 924bc92..3a5a068 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -699,7 +699,11 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
}
}
for_each_console(c) {
+ if (!(c->flags & CON_ENABLED))
+ continue;
+ ++oops_in_progress;
c->write(c, cp, retlen - (cp - kdb_buffer));
+ --oops_in_progress;
touch_nmi_watchdog();
}
}
@@ -761,7 +765,11 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
}
}
for_each_console(c) {
+ if (!(c->flags & CON_ENABLED))
+ continue;
+ ++oops_in_progress;
c->write(c, moreprompt, strlen(moreprompt));
+ --oops_in_progress;
touch_nmi_watchdog();
}

--
2.7.4


2020-05-22 16:37:21

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v2] kdb: Make kdb_printf robust to run in NMI context

On Fri, May 22, 2020 at 08:03:47PM +0530, Sumit Garg wrote:
> While rounding up CPUs via NMIs, its possible that a rounded up CPU
> maybe holding a console port lock leading to kgdb master CPU stuck in
> a deadlock during invocation of console write operations. So in order
> to avoid such a deadlock, invoke bust_spinlocks() prior to invocation
> of console handlers.
>
> Also, add a check for console port to be enabled prior to invocation of
> corresponding handler.

Perhaps this should have been two patches.

In fact, to be honest, I'd suggest combining all the patches to improve
kdb console handling (including a fixed version of the RFC) into a
single patch set.


> Suggested-by: Petr Mladek <[email protected]>
> Suggested-by: Sergey Senozhatsky <[email protected]>
> Signed-off-by: Sumit Garg <[email protected]>
> ---
>
> Changes in v2:
> - Use oops_in_progress directly instead of bust_spinlocks().
>
> kernel/debug/kdb/kdb_io.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> index 924bc92..3a5a068 100644
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -699,7 +699,11 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> }
> }
> for_each_console(c) {
> + if (!(c->flags & CON_ENABLED))
> + continue;
> + ++oops_in_progress;

Given the subtly of what is going on I think we need some comments in
the code on what we are doing and why.


> c->write(c, cp, retlen - (cp - kdb_buffer));
> + --oops_in_progress;
> touch_nmi_watchdog();
> }
> }
> @@ -761,7 +765,11 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> }
> }
> for_each_console(c) {
> + if (!(c->flags & CON_ENABLED))
> + continue;
> + ++oops_in_progress;
> c->write(c, moreprompt, strlen(moreprompt));
> + --oops_in_progress;
> touch_nmi_watchdog();

As with the other patches maybe the first patch in the set should be
factoring out the common code before making changes to it.


Daniel.

2020-05-26 11:37:52

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH v2] kdb: Make kdb_printf robust to run in NMI context

On Fri, 22 May 2020 at 22:05, Daniel Thompson
<[email protected]> wrote:
>
> On Fri, May 22, 2020 at 08:03:47PM +0530, Sumit Garg wrote:
> > While rounding up CPUs via NMIs, its possible that a rounded up CPU
> > maybe holding a console port lock leading to kgdb master CPU stuck in
> > a deadlock during invocation of console write operations. So in order
> > to avoid such a deadlock, invoke bust_spinlocks() prior to invocation
> > of console handlers.
> >
> > Also, add a check for console port to be enabled prior to invocation of
> > corresponding handler.
>
> Perhaps this should have been two patches.
>

Okay, will split this patch into two.

> In fact, to be honest, I'd suggest combining all the patches to improve
> kdb console handling (including a fixed version of the RFC) into a
> single patch set.

Yeah it makes sense to have a combined patch set to improve kdb
console handling. But I posted the RFC patch separately as I expected
comments and discussions to come up with an accepted approach.

So let me wait for an agreement on RFC patch after which I can include
that patch in this patch set.

>
>
> > Suggested-by: Petr Mladek <[email protected]>
> > Suggested-by: Sergey Senozhatsky <[email protected]>
> > Signed-off-by: Sumit Garg <[email protected]>
> > ---
> >
> > Changes in v2:
> > - Use oops_in_progress directly instead of bust_spinlocks().
> >
> > kernel/debug/kdb/kdb_io.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> > index 924bc92..3a5a068 100644
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -699,7 +699,11 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> > }
> > }
> > for_each_console(c) {
> > + if (!(c->flags & CON_ENABLED))
> > + continue;
> > + ++oops_in_progress;
>
> Given the subtly of what is going on I think we need some comments in
> the code on what we are doing and why.

Sure, will add comments.

>
>
> > c->write(c, cp, retlen - (cp - kdb_buffer));
> > + --oops_in_progress;
> > touch_nmi_watchdog();
> > }
> > }
> > @@ -761,7 +765,11 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> > }
> > }
> > for_each_console(c) {
> > + if (!(c->flags & CON_ENABLED))
> > + continue;
> > + ++oops_in_progress;
> > c->write(c, moreprompt, strlen(moreprompt));
> > + --oops_in_progress;
> > touch_nmi_watchdog();
>
> As with the other patches maybe the first patch in the set should be
> factoring out the common code before making changes to it.

Sure, will factor out common code as initial patch.

-Sumit

>
>
> Daniel.