2022-03-08 22:08:18

by Sumit Garg

[permalink] [raw]
Subject: [RFT v4] tty/sysrq: Make sysrq handler NMI aware

Allow a magic sysrq to be triggered from an NMI context. This is done
via marking some sysrq actions as NMI safe. Safe actions will be allowed
to run from NMI context whilst that cannot run from an NMI will be queued
as irq_work for later processing.

The major use-case is to add NMI debugging capabilities to the kernel
in order to debug scenarios such as:
- Primary CPU is stuck in deadlock with interrupts disabled and hence
doesn't honor serial device interrupt. So having magic sysrq triggered
as an NMI is helpful for debugging.
- Always enabled NMI based magic sysrq irrespective of whether the serial
TTY port is active or not.
- Apart from UART interrupts, it allows magic sysrq to be triggered from
a diagnostic NMI interrupt on systems such as A64FX.

A particular sysrq handler is only marked as NMI safe in case the handler
isn't contending for any synchronization primitives as in NMI context
they are expected to cause deadlocks. Note that the debug sysrq do not
contend for any synchronization primitives. It does call kgdb_breakpoint()
to provoke a trap but that trap handler should be NMI safe on
architectures that implement an NMI.

Signed-off-by: Sumit Garg <[email protected]>
---

Changes in v4:
- Use atomic operations for sysrq key variable to gracefully handle
concurrent sysrq entry on multiple CPUs.
- Rename sysrq_nmi_key to sysrq_key as it isn't anymore specific to NMI
context.
- Addressed other misc. comments from Doug.

Changes in v3:
- Extend commit message to include use-cases.
- Get rid of redundant kfifo stuff.
- Incorporate other misc. feedback from Peter Z.

Changes in v2:
- Rebased to 5.17-rc5.
- Separate this patch from complete patch-set [1] as its relevant for
other diagnostic NMI interrupts [2] as well apart from uart NMI
interrupts.
- Incorporated suggestions from Doug.

[1] https://lore.kernel.org/linux-arm-kernel/CAFA6WYOWHgmYYt=KGXDh2hKiuy_rQbJfi279ev0+s-Qh7L21kA@mail.gmail.com/t/#m2b5006f08581448020eb24566927a104d0b95c44
[2] https://lore.kernel.org/all/[email protected]/T/

drivers/tty/sysrq.c | 50 ++++++++++++++++++++++++++++++++++++++-
include/linux/sysrq.h | 1 +
kernel/debug/debug_core.c | 1 +
3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index bbfd004449b5..005c9f9e0004 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -51,6 +51,7 @@
#include <linux/syscalls.h>
#include <linux/of.h>
#include <linux/rcupdate.h>
+#include <linux/irq_work.h>

#include <asm/ptrace.h>
#include <asm/irq_regs.h>
@@ -112,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
.help_msg = "loglevel(0-9)",
.action_msg = "Changing Loglevel",
.enable_mask = SYSRQ_ENABLE_LOG,
+ .nmi_safe = true,
};

#ifdef CONFIG_VT
@@ -159,6 +161,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
.help_msg = "crash(c)",
.action_msg = "Trigger a crash",
.enable_mask = SYSRQ_ENABLE_DUMP,
+ .nmi_safe = true,
};

static void sysrq_handle_reboot(int key)
@@ -172,6 +175,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
.help_msg = "reboot(b)",
.action_msg = "Resetting",
.enable_mask = SYSRQ_ENABLE_BOOT,
+ .nmi_safe = true,
};

const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
@@ -219,6 +223,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
.handler = sysrq_handle_showlocks,
.help_msg = "show-all-locks(d)",
.action_msg = "Show Locks Held",
+ .nmi_safe = true,
};
#else
#define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
@@ -291,6 +296,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
.help_msg = "show-registers(p)",
.action_msg = "Show Regs",
.enable_mask = SYSRQ_ENABLE_DUMP,
+ .nmi_safe = true,
};

static void sysrq_handle_showstate(int key)
@@ -328,6 +334,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
.help_msg = "dump-ftrace-buffer(z)",
.action_msg = "Dump ftrace buffer",
.enable_mask = SYSRQ_ENABLE_DUMP,
+ .nmi_safe = true,
};
#else
#define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
@@ -566,12 +573,46 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
sysrq_key_table[i] = op_p;
}

+static atomic_t sysrq_key = ATOMIC_INIT(-1);
+
+static void sysrq_do_irq_work(struct irq_work *work)
+{
+ const struct sysrq_key_op *op_p;
+ int orig_suppress_printk;
+ int key = atomic_read(&sysrq_key);
+
+ orig_suppress_printk = suppress_printk;
+ suppress_printk = 0;
+
+ rcu_sysrq_start();
+ rcu_read_lock();
+
+ op_p = __sysrq_get_key_op(key);
+ if (op_p)
+ op_p->handler(key);
+
+ rcu_read_unlock();
+ rcu_sysrq_end();
+
+ suppress_printk = orig_suppress_printk;
+ atomic_set(&sysrq_key, -1);
+}
+
+static DEFINE_IRQ_WORK(sysrq_irq_work, sysrq_do_irq_work);
+
void __handle_sysrq(int key, bool check_mask)
{
const struct sysrq_key_op *op_p;
int orig_log_level;
int orig_suppress_printk;
int i;
+ bool irq_work = false;
+
+ /* Skip sysrq handling if one already in progress */
+ if (atomic_cmpxchg(&sysrq_key, -1, key) != -1) {
+ pr_warn("Skip sysrq key: %i as one already in progress\n", key);
+ return;
+ }

orig_suppress_printk = suppress_printk;
suppress_printk = 0;
@@ -596,7 +637,11 @@ void __handle_sysrq(int key, bool check_mask)
if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
pr_info("%s\n", op_p->action_msg);
console_loglevel = orig_log_level;
- op_p->handler(key);
+
+ if (in_nmi() && !op_p->nmi_safe)
+ irq_work = irq_work_queue(&sysrq_irq_work);
+ else
+ op_p->handler(key);
} else {
pr_info("This sysrq operation is disabled.\n");
console_loglevel = orig_log_level;
@@ -623,6 +668,9 @@ void __handle_sysrq(int key, bool check_mask)
rcu_sysrq_end();

suppress_printk = orig_suppress_printk;
+
+ if (!irq_work)
+ atomic_set(&sysrq_key, -1);
}

void handle_sysrq(int key)
diff --git a/include/linux/sysrq.h b/include/linux/sysrq.h
index 3a582ec7a2f1..630b5b9dc225 100644
--- a/include/linux/sysrq.h
+++ b/include/linux/sysrq.h
@@ -34,6 +34,7 @@ struct sysrq_key_op {
const char * const help_msg;
const char * const action_msg;
const int enable_mask;
+ const bool nmi_safe;
};

#ifdef CONFIG_MAGIC_SYSRQ
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index da06a5553835..53b56114f59b 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -978,6 +978,7 @@ static const struct sysrq_key_op sysrq_dbg_op = {
.handler = sysrq_handle_dbg,
.help_msg = "debug(g)",
.action_msg = "DEBUG",
+ .nmi_safe = true,
};
#endif

--
2.25.1


2022-03-08 23:22:20

by Daniel Thompson

[permalink] [raw]
Subject: Re: [RFT v4] tty/sysrq: Make sysrq handler NMI aware

On Tue, Mar 08, 2022 at 08:13:43PM +0530, Sumit Garg wrote:
> Hi Daniel,
>
> On Mon, 7 Mar 2022 at 19:53, Daniel Thompson <[email protected]> wrote:
> >
> > On Mon, Mar 07, 2022 at 04:33:28PM +0530, Sumit Garg wrote:
> > > Allow a magic sysrq to be triggered from an NMI context. This is done
> > > via marking some sysrq actions as NMI safe. Safe actions will be allowed
> > > to run from NMI context whilst that cannot run from an NMI will be queued
> > > as irq_work for later processing.
> > >
> > > <snip>
> > >
> > > @@ -566,12 +573,46 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> > > sysrq_key_table[i] = op_p;
> > > }
> > >
> > > +static atomic_t sysrq_key = ATOMIC_INIT(-1);
> > > +
> > > +static void sysrq_do_irq_work(struct irq_work *work)
> > > +{
> > > + const struct sysrq_key_op *op_p;
> > > + int orig_suppress_printk;
> > > + int key = atomic_read(&sysrq_key);
> > > +
> > > + orig_suppress_printk = suppress_printk;
> > > + suppress_printk = 0;
> > > +
> > > + rcu_sysrq_start();
> > > + rcu_read_lock();
> > > +
> > > + op_p = __sysrq_get_key_op(key);
> > > + if (op_p)
> > > + op_p->handler(key);
> > > +
> > > + rcu_read_unlock();
> > > + rcu_sysrq_end();
> > > +
> > > + suppress_printk = orig_suppress_printk;
> > > + atomic_set(&sysrq_key, -1);
> > > +}
> > > +
> > > +static DEFINE_IRQ_WORK(sysrq_irq_work, sysrq_do_irq_work);
> > > +
> > > void __handle_sysrq(int key, bool check_mask)
> > > {
> > > const struct sysrq_key_op *op_p;
> > > int orig_log_level;
> > > int orig_suppress_printk;
> > > int i;
> > > + bool irq_work = false;
> > > +
> > > + /* Skip sysrq handling if one already in progress */
> > > + if (atomic_cmpxchg(&sysrq_key, -1, key) != -1) {
> > > + pr_warn("Skip sysrq key: %i as one already in progress\n", key);
> > > + return;
> > > + }
> >
> > Doesn't this logic needlessly jam sysrq handling if the irq_work cannot
> > be undertaken?
> >
>
> Here this is done purposefully to ensure synchronisation of three
> contexts while handling sysrq:
> 1. Thread context
> 2. IRQ context
> 3. NMI context

Why is it necessary to provide such synchronization?

Also, if there really is an existing bug in the way thread and irq
contexts interact (e.g. something we can tickle without NMI being
involved) then that should probably be tackled in a separate patch
and with an explanation of the synchronization problem.


> > A console user could unwittingly attempt an !nmi_safe SysRq action on
> > a damaged system that cannot service interrupts. Logic that prevents
> > things like backtrace, ftrace dump, kgdb or reboot is actively harmful
> > to that user's capability to figure out why their original sysrq doesn't
> > work.
>
> I see your point.
>
> >
> > I think the logic to prohibht multiple deferred sysrqs should only
> > be present on code paths where we are actually going to defer the sysrq.
> >
>
> It's not only there to prohibit multiple deferred sysrq (as that alone
> could be handled by irq_work_queue()) but rather to avoid parallelism
> scenarios that Doug mentioned on prior versions.

I'm afraid I'm still a little lost here. I've only done a quick review
but sysrq's entry/exit protocols look like they are intended to handle
stacked contexts. This shouldn't be all that suprising since, even
without your changes, a sysrq triggered by an irq will interrupt
a sysrq triggered using /proc/sysrq-trigger .


> How about the following add-on change to allow passthrough for broken
> irq_work systems?

My question ultimately boils down to whether the existing logic
is necessary, not whether we can make it even more complex!

So before thinking too much about this change I think it would be
useful to have a clear example of the circumstances that you think
it will not be safe to run an NMI-safe sysrq from an NMI.


Daniel.


> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index 005c9f9e0004..0a91d3ccf862 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -608,6 +608,15 @@ void __handle_sysrq(int key, bool check_mask)
> int i;
> bool irq_work = false;
>
> + /*
> + * Handle a case if irq_work cannot be undertaken on a damaged
> + * system stuck in hard lockup and cannot service interrupts.
> + * In such cases we shouldn't atleast block NMI safe handlers
> + * that doesn't depend on irq_work.
> + */
> + if (irq_work_is_pending(&sysrq_irq_work))
> + atomic_set(&sysrq_key, -1);
> +
> /* Skip sysrq handling if one already in progress */
> if (atomic_cmpxchg(&sysrq_key, -1, key) != -1) {
> pr_warn("Skip sysrq key: %i as one already in progress\n", key);
>
> -Sumit
>
> >
> > Daniel.

2022-03-09 08:31:41

by Sumit Garg

[permalink] [raw]
Subject: Re: [RFT v4] tty/sysrq: Make sysrq handler NMI aware

On Tue, 8 Mar 2022 at 21:16, Daniel Thompson <[email protected]> wrote:
>
> On Tue, Mar 08, 2022 at 08:13:43PM +0530, Sumit Garg wrote:
> > Hi Daniel,
> >
> > On Mon, 7 Mar 2022 at 19:53, Daniel Thompson <[email protected]> wrote:
> > >
> > > On Mon, Mar 07, 2022 at 04:33:28PM +0530, Sumit Garg wrote:
> > > > Allow a magic sysrq to be triggered from an NMI context. This is done
> > > > via marking some sysrq actions as NMI safe. Safe actions will be allowed
> > > > to run from NMI context whilst that cannot run from an NMI will be queued
> > > > as irq_work for later processing.
> > > >
> > > > <snip>
> > > >
> > > > @@ -566,12 +573,46 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> > > > sysrq_key_table[i] = op_p;
> > > > }
> > > >
> > > > +static atomic_t sysrq_key = ATOMIC_INIT(-1);
> > > > +
> > > > +static void sysrq_do_irq_work(struct irq_work *work)
> > > > +{
> > > > + const struct sysrq_key_op *op_p;
> > > > + int orig_suppress_printk;
> > > > + int key = atomic_read(&sysrq_key);
> > > > +
> > > > + orig_suppress_printk = suppress_printk;
> > > > + suppress_printk = 0;
> > > > +
> > > > + rcu_sysrq_start();
> > > > + rcu_read_lock();
> > > > +
> > > > + op_p = __sysrq_get_key_op(key);
> > > > + if (op_p)
> > > > + op_p->handler(key);
> > > > +
> > > > + rcu_read_unlock();
> > > > + rcu_sysrq_end();
> > > > +
> > > > + suppress_printk = orig_suppress_printk;
> > > > + atomic_set(&sysrq_key, -1);
> > > > +}
> > > > +
> > > > +static DEFINE_IRQ_WORK(sysrq_irq_work, sysrq_do_irq_work);
> > > > +
> > > > void __handle_sysrq(int key, bool check_mask)
> > > > {
> > > > const struct sysrq_key_op *op_p;
> > > > int orig_log_level;
> > > > int orig_suppress_printk;
> > > > int i;
> > > > + bool irq_work = false;
> > > > +
> > > > + /* Skip sysrq handling if one already in progress */
> > > > + if (atomic_cmpxchg(&sysrq_key, -1, key) != -1) {
> > > > + pr_warn("Skip sysrq key: %i as one already in progress\n", key);
> > > > + return;
> > > > + }
> > >
> > > Doesn't this logic needlessly jam sysrq handling if the irq_work cannot
> > > be undertaken?
> > >
> >
> > Here this is done purposefully to ensure synchronisation of three
> > contexts while handling sysrq:
> > 1. Thread context
> > 2. IRQ context
> > 3. NMI context
>
> Why is it necessary to provide such synchronization?
>
> Also, if there really is an existing bug in the way thread and irq
> contexts interact (e.g. something we can tickle without NMI being
> involved) then that should probably be tackled in a separate patch
> and with an explanation of the synchronization problem.
>
>
> > > A console user could unwittingly attempt an !nmi_safe SysRq action on
> > > a damaged system that cannot service interrupts. Logic that prevents
> > > things like backtrace, ftrace dump, kgdb or reboot is actively harmful
> > > to that user's capability to figure out why their original sysrq doesn't
> > > work.
> >
> > I see your point.
> >
> > >
> > > I think the logic to prohibht multiple deferred sysrqs should only
> > > be present on code paths where we are actually going to defer the sysrq.
> > >
> >
> > It's not only there to prohibit multiple deferred sysrq (as that alone
> > could be handled by irq_work_queue()) but rather to avoid parallelism
> > scenarios that Doug mentioned on prior versions.
>
> I'm afraid I'm still a little lost here. I've only done a quick review
> but sysrq's entry/exit protocols look like they are intended to handle
> stacked contexts. This shouldn't be all that suprising since, even
> without your changes, a sysrq triggered by an irq will interrupt
> a sysrq triggered using /proc/sysrq-trigger .
>

Yeah you are right. I see problems with how globals like
"suppress_printk" and "console_loglevel" are modified and restored
within __handle_sysrq(). A concurrent sysrq may easily lead to
incorrect value restoration as an example below:

Thread 1
Thread 2
orig_suppress_printk = suppress_printk; # here value is 1
suppress_printk = 0;

orig_suppress_printk = suppress_printk; #
here value is 0
suppress_printk = orig_suppress_printk; # here value is 1

suppress_printk = 0;

suppress_printk = orig_suppress_printk;
#incorrect value restored as 0

Greg,

Do let me know if I am missing something otherwise I will create a
separate patch for this issue.

-Sumit

>
> > How about the following add-on change to allow passthrough for broken
> > irq_work systems?
>
> My question ultimately boils down to whether the existing logic
> is necessary, not whether we can make it even more complex!
>
> So before thinking too much about this change I think it would be
> useful to have a clear example of the circumstances that you think
> it will not be safe to run an NMI-safe sysrq from an NMI.
>
>
> Daniel.
>
>
> > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > index 005c9f9e0004..0a91d3ccf862 100644
> > --- a/drivers/tty/sysrq.c
> > +++ b/drivers/tty/sysrq.c
> > @@ -608,6 +608,15 @@ void __handle_sysrq(int key, bool check_mask)
> > int i;
> > bool irq_work = false;
> >
> > + /*
> > + * Handle a case if irq_work cannot be undertaken on a damaged
> > + * system stuck in hard lockup and cannot service interrupts.
> > + * In such cases we shouldn't atleast block NMI safe handlers
> > + * that doesn't depend on irq_work.
> > + */
> > + if (irq_work_is_pending(&sysrq_irq_work))
> > + atomic_set(&sysrq_key, -1);
> > +
> > /* Skip sysrq handling if one already in progress */
> > if (atomic_cmpxchg(&sysrq_key, -1, key) != -1) {
> > pr_warn("Skip sysrq key: %i as one already in progress\n", key);
> >
> > -Sumit
> >
> > >
> > > Daniel.