Make it possible for UARTs to trigger magic sysrq from an NMI. With the
advent of pseudo NMIs on arm64 it became quite generic to request a serial
device interrupt as an NMI rather than an IRQ. Having NMI driven serial
RX allows us to trigger magic sysrq as an NMI and hence drop into the
kernel debugger in NMI context.
The major use-case is to add NMI debugging capabilities to the kernel
in order to debug scenarios such as:
- The primary CPU is stuck in a deadlock with interrupts disabled and hence
doesn't honor the serial device interrupt, so having magic sysrq triggered
as an NMI is helpful for debugging.
- Always-on NMI based magic sysrq, irrespective of whether the serial
TTY port is active or not.
Currently there is an existing kgdb NMI serial driver upstream which
provides a partial implementation of a separate ttyNMI0 port, but it
remained siloed from the serial core/drivers, which made it a bit odd to
enable using the serial device interrupt, and hence it remained unused.
It seems to have been intended to avoid almost all custom NMI changes to
the UART drivers.
This patch-set instead makes the serial core/drivers NMI aware, which in
turn provides NMI debugging capabilities via magic sysrq, so there is no
longer any specific reason to keep this special driver. Remove it
instead.
Approach:
---------
The overall idea is to intercept serial RX characters in NMI context. If
a character is part of the magic sysrq sequence, the corresponding
handler is allowed to run in NMI context. Otherwise, all other RX and TX
operations are deferred to an IRQ work queue so that they run in normal
interrupt context.
This approach is demonstrated using amba-pl011 driver.
Patch-wise description:
-----------------------
Patch #1 prepares magic sysrq handler to be NMI aware.
Patch #2 adds NMI framework to serial core.
Patch #3 and #4 demonstrates NMI aware uart port using amba-pl011 driver.
Patch #5 removes kgdb NMI serial driver.
Goal of this RFC:
-----------------
My main reason for sharing this as an RFC is to help decide whether or
not to continue with this approach. The next step for me would be to port
the work to a system with an 8250 UART.
Usage:
------
This RFC has been developed on top of 5.8-rc3. If anyone is interested
in giving it a try on QEMU, just enable the following config options in
addition to the arm64 defconfig:
CONFIG_KGDB=y
CONFIG_KGDB_KDB=y
CONFIG_ARM64_PSEUDO_NMI=y
QEMU command line to test:
$ qemu-system-aarch64 -nographic -machine virt,gic-version=3 -cpu cortex-a57 \
-smp 2 -kernel arch/arm64/boot/Image -append 'console=ttyAMA0,38400 \
keep_bootcon root=/dev/vda2 irqchip.gicv3_pseudo_nmi=1 kgdboc=ttyAMA0' \
-initrd rootfs-arm64.cpio.gz
NMI entry into kgdb via sysrq:
- Ctrl-a b (send break), then g
Reference:
----------
For more details about NMI/FIQ debugger, refer to this blog post [1].
[1] https://www.linaro.org/blog/debugging-arm-kernels-using-nmifiq/
I look forward to your comments and feedback.
Sumit Garg (5):
tty/sysrq: Make sysrq handler NMI aware
serial: core: Add framework to allow NMI aware serial drivers
serial: amba-pl011: Re-order APIs definition
serial: amba-pl011: Enable NMI aware uart port
serial: Remove KGDB NMI serial driver
drivers/tty/serial/Kconfig | 19 --
drivers/tty/serial/Makefile | 1 -
drivers/tty/serial/amba-pl011.c | 232 +++++++++++++++++-------
drivers/tty/serial/kgdb_nmi.c | 383 ---------------------------------------
drivers/tty/serial/kgdboc.c | 8 -
drivers/tty/serial/serial_core.c | 120 +++++++++++-
drivers/tty/sysrq.c | 33 +++-
include/linux/kgdb.h | 10 -
include/linux/serial_core.h | 67 +++++++
include/linux/sysrq.h | 1 +
kernel/debug/debug_core.c | 1 +
11 files changed, 386 insertions(+), 489 deletions(-)
delete mode 100644 drivers/tty/serial/kgdb_nmi.c
--
2.7.4
In a future patch we will add support to the serial core to make it
possible to trigger a magic sysrq from NMI context. Prepare for this by
marking some sysrq actions as NMI safe. Safe actions will be allowed to
run from NMI context, whilst those that cannot run from an NMI will be
queued as irq_work for later processing.
A particular sysrq handler is only marked as NMI safe if the handler
doesn't contend for any synchronization primitives, as in NMI context
those are expected to cause deadlocks. Note that the debug sysrq does not
contend for any synchronization primitives. It does call kgdb_breakpoint()
to provoke a trap, but that trap handler should be NMI safe on
architectures that implement an NMI.
Signed-off-by: Sumit Garg <[email protected]>
---
drivers/tty/sysrq.c | 33 ++++++++++++++++++++++++++++++++-
include/linux/sysrq.h | 1 +
kernel/debug/debug_core.c | 1 +
3 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 7c95afa9..8017e33 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -50,6 +50,8 @@
#include <linux/syscalls.h>
#include <linux/of.h>
#include <linux/rcupdate.h>
+#include <linux/irq_work.h>
+#include <linux/kfifo.h>
#include <asm/ptrace.h>
#include <asm/irq_regs.h>
@@ -111,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
.help_msg = "loglevel(0-9)",
.action_msg = "Changing Loglevel",
.enable_mask = SYSRQ_ENABLE_LOG,
+ .nmi_safe = true,
};
#ifdef CONFIG_VT
@@ -157,6 +160,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
.help_msg = "crash(c)",
.action_msg = "Trigger a crash",
.enable_mask = SYSRQ_ENABLE_DUMP,
+ .nmi_safe = true,
};
static void sysrq_handle_reboot(int key)
@@ -170,6 +174,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
.help_msg = "reboot(b)",
.action_msg = "Resetting",
.enable_mask = SYSRQ_ENABLE_BOOT,
+ .nmi_safe = true,
};
const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
@@ -217,6 +222,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
.handler = sysrq_handle_showlocks,
.help_msg = "show-all-locks(d)",
.action_msg = "Show Locks Held",
+ .nmi_safe = true,
};
#else
#define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
@@ -289,6 +295,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
.help_msg = "show-registers(p)",
.action_msg = "Show Regs",
.enable_mask = SYSRQ_ENABLE_DUMP,
+ .nmi_safe = true,
};
static void sysrq_handle_showstate(int key)
@@ -326,6 +333,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
.help_msg = "dump-ftrace-buffer(z)",
.action_msg = "Dump ftrace buffer",
.enable_mask = SYSRQ_ENABLE_DUMP,
+ .nmi_safe = true,
};
#else
#define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
@@ -538,6 +546,23 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
sysrq_key_table[i] = op_p;
}
+#define SYSRQ_NMI_FIFO_SIZE 64
+static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
+
+static void sysrq_do_nmi_work(struct irq_work *work)
+{
+ const struct sysrq_key_op *op_p;
+ int key;
+
+ while (kfifo_out(&sysrq_nmi_fifo, &key, 1)) {
+ op_p = __sysrq_get_key_op(key);
+ if (op_p)
+ op_p->handler(key);
+ }
+}
+
+static DEFINE_IRQ_WORK(sysrq_nmi_work, sysrq_do_nmi_work);
+
void __handle_sysrq(int key, bool check_mask)
{
const struct sysrq_key_op *op_p;
@@ -568,7 +593,13 @@ void __handle_sysrq(int key, bool check_mask)
if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
pr_info("%s\n", op_p->action_msg);
console_loglevel = orig_log_level;
- op_p->handler(key);
+
+ if (in_nmi() && !op_p->nmi_safe) {
+ kfifo_in(&sysrq_nmi_fifo, &key, 1);
+ irq_work_queue(&sysrq_nmi_work);
+ } else {
+ op_p->handler(key);
+ }
} else {
pr_info("This sysrq operation is disabled.\n");
console_loglevel = orig_log_level;
diff --git a/include/linux/sysrq.h b/include/linux/sysrq.h
index 3a582ec..630b5b9 100644
--- a/include/linux/sysrq.h
+++ b/include/linux/sysrq.h
@@ -34,6 +34,7 @@ struct sysrq_key_op {
const char * const help_msg;
const char * const action_msg;
const int enable_mask;
+ const bool nmi_safe;
};
#ifdef CONFIG_MAGIC_SYSRQ
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 9e59347..2b51173 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -943,6 +943,7 @@ static const struct sysrq_key_op sysrq_dbg_op = {
.handler = sysrq_handle_dbg,
.help_msg = "debug(g)",
.action_msg = "DEBUG",
+ .nmi_safe = true,
};
#endif
--
2.7.4
Add NMI framework APIs to the serial core which can be leveraged by
serial drivers to implement NMI driven serial transfers. These APIs are
kept under CONFIG_CONSOLE_POLL as currently kgdb initializing the uart in
polling mode is the only known user that would enable an NMI driven
serial port.
The general idea is to intercept RX characters in NMI context. If a
character is part of the magic sysrq sequence, the corresponding handler
is allowed to run in NMI context. Otherwise, all other RX and TX
operations are deferred to an IRQ work queue so that they run in normal
interrupt context.
Also, since the magic sysrq entry APIs will need to be invoked from NMI
context, make those APIs NMI safe by deferring NMI-unsafe work to the
IRQ work queue.
Signed-off-by: Sumit Garg <[email protected]>
---
drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
include/linux/serial_core.h | 67 ++++++++++++++++++++++
2 files changed, 185 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 57840cf..6342e90 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
return true;
}
+#ifdef CONFIG_CONSOLE_POLL
+ if (in_nmi())
+ irq_work_queue(&port->nmi_state.sysrq_toggle_work);
+ else
+ schedule_work(&sysrq_enable_work);
+#else
schedule_work(&sysrq_enable_work);
-
+#endif
port->sysrq = 0;
return true;
}
@@ -3273,12 +3279,122 @@ int uart_handle_break(struct uart_port *port)
port->sysrq = 0;
}
- if (port->flags & UPF_SAK)
+ if (port->flags & UPF_SAK) {
+#ifdef CONFIG_CONSOLE_POLL
+ if (in_nmi())
+ irq_work_queue(&port->nmi_state.sysrq_sak_work);
+ else
+ do_SAK(state->port.tty);
+#else
do_SAK(state->port.tty);
+#endif
+ }
return 0;
}
EXPORT_SYMBOL_GPL(uart_handle_break);
+#ifdef CONFIG_CONSOLE_POLL
+int uart_nmi_handle_char(struct uart_port *port, unsigned int status,
+ unsigned int overrun, unsigned int ch,
+ unsigned int flag)
+{
+ struct uart_nmi_rx_data rx_data;
+
+ if (!in_nmi())
+ return 0;
+
+ rx_data.status = status;
+ rx_data.overrun = overrun;
+ rx_data.ch = ch;
+ rx_data.flag = flag;
+
+ if (!kfifo_in(&port->nmi_state.rx_fifo, &rx_data, 1))
+ ++port->icount.buf_overrun;
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(uart_nmi_handle_char);
+
+static void uart_nmi_rx_work(struct irq_work *rx_work)
+{
+ struct uart_nmi_state *nmi_state =
+ container_of(rx_work, struct uart_nmi_state, rx_work);
+ struct uart_port *port =
+ container_of(nmi_state, struct uart_port, nmi_state);
+ struct uart_nmi_rx_data rx_data;
+
+ /*
+ * In polling mode, serial device is initialized much prior to
+ * TTY port becoming active. This scenario is especially useful
+ * from debugging perspective such that magic sysrq or debugger
+ * entry would still be possible even when TTY port isn't
+ * active (consider a boot hang case or if a user hasn't opened
+ * the serial port). So we discard any other RX data apart from
+ * magic sysrq commands in case TTY port isn't active.
+ */
+ if (!port->state || !tty_port_active(&port->state->port)) {
+ kfifo_reset(&nmi_state->rx_fifo);
+ return;
+ }
+
+ spin_lock(&port->lock);
+ while (kfifo_out(&nmi_state->rx_fifo, &rx_data, 1))
+ uart_insert_char(port, rx_data.status, rx_data.overrun,
+ rx_data.ch, rx_data.flag);
+ spin_unlock(&port->lock);
+
+ tty_flip_buffer_push(&port->state->port);
+}
+
+static void uart_nmi_tx_work(struct irq_work *tx_work)
+{
+ struct uart_nmi_state *nmi_state =
+ container_of(tx_work, struct uart_nmi_state, tx_work);
+ struct uart_port *port =
+ container_of(nmi_state, struct uart_port, nmi_state);
+
+ spin_lock(&port->lock);
+ if (nmi_state->tx_irq_callback)
+ nmi_state->tx_irq_callback(port);
+ spin_unlock(&port->lock);
+}
+
+static void uart_nmi_sak_work(struct irq_work *work)
+{
+ struct uart_nmi_state *nmi_state =
+ container_of(work, struct uart_nmi_state, sysrq_sak_work);
+ struct uart_port *port =
+ container_of(nmi_state, struct uart_port, nmi_state);
+
+ do_SAK(port->state->port.tty);
+}
+
+#ifdef CONFIG_MAGIC_SYSRQ_SERIAL
+static void uart_nmi_toggle_work(struct irq_work *work)
+{
+ schedule_work(&sysrq_enable_work);
+}
+#endif
+
+int uart_nmi_state_init(struct uart_port *port)
+{
+ int ret;
+
+ ret = kfifo_alloc(&port->nmi_state.rx_fifo, 256, GFP_KERNEL);
+ if (ret)
+ return ret;
+
+ init_irq_work(&port->nmi_state.rx_work, uart_nmi_rx_work);
+ init_irq_work(&port->nmi_state.tx_work, uart_nmi_tx_work);
+ init_irq_work(&port->nmi_state.sysrq_sak_work, uart_nmi_sak_work);
+#ifdef CONFIG_MAGIC_SYSRQ_SERIAL
+ init_irq_work(&port->nmi_state.sysrq_toggle_work, uart_nmi_toggle_work);
+#endif
+ return ret;
+}
+EXPORT_SYMBOL_GPL(uart_nmi_state_init);
+#endif
+
EXPORT_SYMBOL(uart_write_wakeup);
EXPORT_SYMBOL(uart_register_driver);
EXPORT_SYMBOL(uart_unregister_driver);
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 9fd550e..84487a9 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -18,6 +18,8 @@
#include <linux/tty.h>
#include <linux/mutex.h>
#include <linux/sysrq.h>
+#include <linux/irq_work.h>
+#include <linux/kfifo.h>
#include <uapi/linux/serial_core.h>
#ifdef CONFIG_SERIAL_CORE_CONSOLE
@@ -103,6 +105,28 @@ struct uart_icount {
typedef unsigned int __bitwise upf_t;
typedef unsigned int __bitwise upstat_t;
+#ifdef CONFIG_CONSOLE_POLL
+struct uart_nmi_rx_data {
+ unsigned int status;
+ unsigned int overrun;
+ unsigned int ch;
+ unsigned int flag;
+};
+
+struct uart_nmi_state {
+ bool active;
+
+ struct irq_work tx_work;
+ void (*tx_irq_callback)(struct uart_port *port);
+
+ struct irq_work rx_work;
+ DECLARE_KFIFO_PTR(rx_fifo, struct uart_nmi_rx_data);
+
+ struct irq_work sysrq_sak_work;
+ struct irq_work sysrq_toggle_work;
+};
+#endif
+
struct uart_port {
spinlock_t lock; /* port lock */
unsigned long iobase; /* in/out[bwl] */
@@ -255,6 +279,9 @@ struct uart_port {
struct gpio_desc *rs485_term_gpio; /* enable RS485 bus termination */
struct serial_iso7816 iso7816;
void *private_data; /* generic platform data pointer */
+#ifdef CONFIG_CONSOLE_POLL
+ struct uart_nmi_state nmi_state;
+#endif
};
static inline int serial_port_in(struct uart_port *up, int offset)
@@ -475,4 +502,44 @@ extern int uart_handle_break(struct uart_port *port);
!((cflag) & CLOCAL))
int uart_get_rs485_mode(struct uart_port *port);
+
+/*
+ * The following are helper functions for the NMI aware serial drivers.
+ * Currently NMI support is only enabled under polling mode.
+ */
+
+#ifdef CONFIG_CONSOLE_POLL
+int uart_nmi_state_init(struct uart_port *port);
+int uart_nmi_handle_char(struct uart_port *port, unsigned int status,
+ unsigned int overrun, unsigned int ch,
+ unsigned int flag);
+
+static inline bool uart_nmi_active(struct uart_port *port)
+{
+ return port->nmi_state.active;
+}
+
+static inline void uart_set_nmi_active(struct uart_port *port, bool val)
+{
+ port->nmi_state.active = val;
+}
+#else
+static inline int uart_nmi_handle_char(struct uart_port *port,
+ unsigned int status,
+ unsigned int overrun,
+ unsigned int ch, unsigned int flag)
+{
+ return 0;
+}
+
+static inline bool uart_nmi_active(struct uart_port *port)
+{
+ return false;
+}
+
+static inline void uart_set_nmi_active(struct uart_port *port, bool val)
+{
+}
+#endif
+
#endif /* LINUX_SERIAL_CORE_H */
--
2.7.4
A future patch will need to call pl011_hwinit() and
pl011_enable_interrupts() before they are currently defined. Move
them closer to the front of the file. There is no change in the
implementation of either function.
Signed-off-by: Sumit Garg <[email protected]>
---
drivers/tty/serial/amba-pl011.c | 148 ++++++++++++++++++++--------------------
1 file changed, 74 insertions(+), 74 deletions(-)
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 8efd7c2..0983c5e 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1581,6 +1581,80 @@ static void pl011_break_ctl(struct uart_port *port, int break_state)
spin_unlock_irqrestore(&uap->port.lock, flags);
}
+static int pl011_hwinit(struct uart_port *port)
+{
+ struct uart_amba_port *uap =
+ container_of(port, struct uart_amba_port, port);
+ int retval;
+
+ /* Optionaly enable pins to be muxed in and configured */
+ pinctrl_pm_select_default_state(port->dev);
+
+ /*
+ * Try to enable the clock producer.
+ */
+ retval = clk_prepare_enable(uap->clk);
+ if (retval)
+ return retval;
+
+ uap->port.uartclk = clk_get_rate(uap->clk);
+
+ /* Clear pending error and receive interrupts */
+ pl011_write(UART011_OEIS | UART011_BEIS | UART011_PEIS |
+ UART011_FEIS | UART011_RTIS | UART011_RXIS,
+ uap, REG_ICR);
+
+ /*
+ * Save interrupts enable mask, and enable RX interrupts in case if
+ * the interrupt is used for NMI entry.
+ */
+ uap->im = pl011_read(uap, REG_IMSC);
+ pl011_write(UART011_RTIM | UART011_RXIM, uap, REG_IMSC);
+
+ if (dev_get_platdata(uap->port.dev)) {
+ struct amba_pl011_data *plat;
+
+ plat = dev_get_platdata(uap->port.dev);
+ if (plat->init)
+ plat->init();
+ }
+ return 0;
+}
+
+/*
+ * Enable interrupts, only timeouts when using DMA
+ * if initial RX DMA job failed, start in interrupt mode
+ * as well.
+ */
+static void pl011_enable_interrupts(struct uart_amba_port *uap)
+{
+ unsigned int i;
+
+ spin_lock_irq(&uap->port.lock);
+
+ /* Clear out any spuriously appearing RX interrupts */
+ pl011_write(UART011_RTIS | UART011_RXIS, uap, REG_ICR);
+
+ /*
+ * RXIS is asserted only when the RX FIFO transitions from below
+ * to above the trigger threshold. If the RX FIFO is already
+ * full to the threshold this can't happen and RXIS will now be
+ * stuck off. Drain the RX FIFO explicitly to fix this:
+ */
+ for (i = 0; i < uap->fifosize * 2; ++i) {
+ if (pl011_read(uap, REG_FR) & UART01x_FR_RXFE)
+ break;
+
+ pl011_read(uap, REG_DR);
+ }
+
+ uap->im = UART011_RTIM;
+ if (!pl011_dma_rx_running(uap))
+ uap->im |= UART011_RXIM;
+ pl011_write(uap->im, uap, REG_IMSC);
+ spin_unlock_irq(&uap->port.lock);
+}
+
#ifdef CONFIG_CONSOLE_POLL
static void pl011_quiesce_irqs(struct uart_port *port)
@@ -1639,46 +1713,6 @@ static void pl011_put_poll_char(struct uart_port *port,
#endif /* CONFIG_CONSOLE_POLL */
-static int pl011_hwinit(struct uart_port *port)
-{
- struct uart_amba_port *uap =
- container_of(port, struct uart_amba_port, port);
- int retval;
-
- /* Optionaly enable pins to be muxed in and configured */
- pinctrl_pm_select_default_state(port->dev);
-
- /*
- * Try to enable the clock producer.
- */
- retval = clk_prepare_enable(uap->clk);
- if (retval)
- return retval;
-
- uap->port.uartclk = clk_get_rate(uap->clk);
-
- /* Clear pending error and receive interrupts */
- pl011_write(UART011_OEIS | UART011_BEIS | UART011_PEIS |
- UART011_FEIS | UART011_RTIS | UART011_RXIS,
- uap, REG_ICR);
-
- /*
- * Save interrupts enable mask, and enable RX interrupts in case if
- * the interrupt is used for NMI entry.
- */
- uap->im = pl011_read(uap, REG_IMSC);
- pl011_write(UART011_RTIM | UART011_RXIM, uap, REG_IMSC);
-
- if (dev_get_platdata(uap->port.dev)) {
- struct amba_pl011_data *plat;
-
- plat = dev_get_platdata(uap->port.dev);
- if (plat->init)
- plat->init();
- }
- return 0;
-}
-
static bool pl011_split_lcrh(const struct uart_amba_port *uap)
{
return pl011_reg_to_offset(uap, REG_LCRH_RX) !=
@@ -1707,40 +1741,6 @@ static int pl011_allocate_irq(struct uart_amba_port *uap)
return request_irq(uap->port.irq, pl011_int, IRQF_SHARED, "uart-pl011", uap);
}
-/*
- * Enable interrupts, only timeouts when using DMA
- * if initial RX DMA job failed, start in interrupt mode
- * as well.
- */
-static void pl011_enable_interrupts(struct uart_amba_port *uap)
-{
- unsigned int i;
-
- spin_lock_irq(&uap->port.lock);
-
- /* Clear out any spuriously appearing RX interrupts */
- pl011_write(UART011_RTIS | UART011_RXIS, uap, REG_ICR);
-
- /*
- * RXIS is asserted only when the RX FIFO transitions from below
- * to above the trigger threshold. If the RX FIFO is already
- * full to the threshold this can't happen and RXIS will now be
- * stuck off. Drain the RX FIFO explicitly to fix this:
- */
- for (i = 0; i < uap->fifosize * 2; ++i) {
- if (pl011_read(uap, REG_FR) & UART01x_FR_RXFE)
- break;
-
- pl011_read(uap, REG_DR);
- }
-
- uap->im = UART011_RTIM;
- if (!pl011_dma_rx_running(uap))
- uap->im |= UART011_RXIM;
- pl011_write(uap->im, uap, REG_IMSC);
- spin_unlock_irq(&uap->port.lock);
-}
-
static int pl011_startup(struct uart_port *port)
{
struct uart_amba_port *uap =
--
2.7.4
Allow the serial device interrupt to be requested as an NMI during
initialization in polling mode. If the irqchip doesn't support requesting
the serial device interrupt as an NMI, fall back to requesting it as a
normal IRQ.
Currently this NMI aware uart port only supports NMI driven programmed
I/O operation; DMA operation isn't supported.
While operating in NMI mode, RX always remains active irrespective of
whether the corresponding TTY port is active or not, so we bail out of
the startup, shutdown and rx_stop APIs early if NMI mode is active.
Also, get rid of the modification of the interrupt enable mask in
pl011_hwinit(), as we now have a proper way to enable interrupts for NMI
entry using pl011_enable_interrupts().
Signed-off-by: Sumit Garg <[email protected]>
---
drivers/tty/serial/amba-pl011.c | 124 ++++++++++++++++++++++++++++++++++++----
1 file changed, 113 insertions(+), 11 deletions(-)
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 0983c5e..5df1c07 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -41,6 +41,8 @@
#include <linux/sizes.h>
#include <linux/io.h>
#include <linux/acpi.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
#include "amba-pl011.h"
@@ -347,6 +349,10 @@ static int pl011_fifo_to_tty(struct uart_amba_port *uap)
if (uart_handle_sysrq_char(&uap->port, ch & 255))
continue;
+ if (uart_nmi_handle_char(&uap->port, ch, UART011_DR_OE, ch,
+ flag))
+ continue;
+
uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag);
}
@@ -1316,6 +1322,9 @@ static void pl011_stop_rx(struct uart_port *port)
struct uart_amba_port *uap =
container_of(port, struct uart_amba_port, port);
+ if (uart_nmi_active(port))
+ return;
+
uap->im &= ~(UART011_RXIM|UART011_RTIM|UART011_FEIM|
UART011_PEIM|UART011_BEIM|UART011_OEIM);
pl011_write(uap->im, uap, REG_IMSC);
@@ -1604,13 +1613,6 @@ static int pl011_hwinit(struct uart_port *port)
UART011_FEIS | UART011_RTIS | UART011_RXIS,
uap, REG_ICR);
- /*
- * Save interrupts enable mask, and enable RX interrupts in case if
- * the interrupt is used for NMI entry.
- */
- uap->im = pl011_read(uap, REG_IMSC);
- pl011_write(UART011_RTIM | UART011_RXIM, uap, REG_IMSC);
-
if (dev_get_platdata(uap->port.dev)) {
struct amba_pl011_data *plat;
@@ -1711,6 +1713,96 @@ static void pl011_put_poll_char(struct uart_port *port,
pl011_write(ch, uap, REG_DR);
}
+static irqreturn_t pl011_nmi_int(int irq, void *dev_id)
+{
+ struct uart_amba_port *uap = dev_id;
+ unsigned int status, pass_counter = AMBA_ISR_PASS_LIMIT;
+ int handled = 0;
+
+ status = pl011_read(uap, REG_MIS);
+ if (status) {
+ do {
+ check_apply_cts_event_workaround(uap);
+
+ pl011_write(status, uap, REG_ICR);
+
+ if (status & (UART011_RTIS|UART011_RXIS)) {
+ pl011_fifo_to_tty(uap);
+ irq_work_queue(&uap->port.nmi_state.rx_work);
+ }
+
+ if (status & UART011_TXIS)
+ irq_work_queue(&uap->port.nmi_state.tx_work);
+
+ if (pass_counter-- == 0)
+ break;
+
+ status = pl011_read(uap, REG_MIS);
+ } while (status != 0);
+ handled = 1;
+ }
+
+ return IRQ_RETVAL(handled);
+}
+
+static int pl011_allocate_nmi(struct uart_amba_port *uap)
+{
+ int ret;
+
+ irq_set_status_flags(uap->port.irq, IRQ_NOAUTOEN);
+ ret = request_nmi(uap->port.irq, pl011_nmi_int, IRQF_PERCPU,
+ "uart-pl011", uap);
+ if (ret) {
+ irq_clear_status_flags(uap->port.irq, IRQ_NOAUTOEN);
+ return ret;
+ }
+
+ enable_irq(uap->port.irq);
+
+ return ret;
+}
+
+static void pl011_tx_irq_callback(struct uart_port *port)
+{
+ struct uart_amba_port *uap =
+ container_of(port, struct uart_amba_port, port);
+
+ spin_lock(&port->lock);
+ pl011_tx_chars(uap, true);
+ spin_unlock(&port->lock);
+}
+
+static int pl011_poll_init(struct uart_port *port)
+{
+ struct uart_amba_port *uap =
+ container_of(port, struct uart_amba_port, port);
+ int retval;
+
+ retval = pl011_hwinit(port);
+ if (retval)
+ goto clk_dis;
+
+ /* In case NMI isn't supported, fallback to normal interrupt mode */
+ retval = pl011_allocate_nmi(uap);
+ if (retval)
+ return 0;
+
+ retval = uart_nmi_state_init(port);
+ if (retval)
+ goto clk_dis;
+
+ port->nmi_state.tx_irq_callback = pl011_tx_irq_callback;
+ uart_set_nmi_active(port, true);
+
+ pl011_enable_interrupts(uap);
+
+ return 0;
+
+ clk_dis:
+ clk_disable_unprepare(uap->clk);
+ return retval;
+}
+
#endif /* CONFIG_CONSOLE_POLL */
static bool pl011_split_lcrh(const struct uart_amba_port *uap)
@@ -1736,8 +1828,6 @@ static void pl011_write_lcr_h(struct uart_amba_port *uap, unsigned int lcr_h)
static int pl011_allocate_irq(struct uart_amba_port *uap)
{
- pl011_write(uap->im, uap, REG_IMSC);
-
return request_irq(uap->port.irq, pl011_int, IRQF_SHARED, "uart-pl011", uap);
}
@@ -1748,6 +1838,9 @@ static int pl011_startup(struct uart_port *port)
unsigned int cr;
int retval;
+ if (uart_nmi_active(port))
+ return 0;
+
retval = pl011_hwinit(port);
if (retval)
goto clk_dis;
@@ -1790,6 +1883,9 @@ static int sbsa_uart_startup(struct uart_port *port)
container_of(port, struct uart_amba_port, port);
int retval;
+ if (uart_nmi_active(port))
+ return 0;
+
retval = pl011_hwinit(port);
if (retval)
return retval;
@@ -1859,6 +1955,9 @@ static void pl011_shutdown(struct uart_port *port)
struct uart_amba_port *uap =
container_of(port, struct uart_amba_port, port);
+ if (uart_nmi_active(port))
+ return;
+
pl011_disable_interrupts(uap);
pl011_dma_shutdown(uap);
@@ -1891,6 +1990,9 @@ static void sbsa_uart_shutdown(struct uart_port *port)
struct uart_amba_port *uap =
container_of(port, struct uart_amba_port, port);
+ if (uart_nmi_active(port))
+ return;
+
pl011_disable_interrupts(uap);
free_irq(uap->port.irq, uap);
@@ -2142,7 +2244,7 @@ static const struct uart_ops amba_pl011_pops = {
.config_port = pl011_config_port,
.verify_port = pl011_verify_port,
#ifdef CONFIG_CONSOLE_POLL
- .poll_init = pl011_hwinit,
+ .poll_init = pl011_poll_init,
.poll_get_char = pl011_get_poll_char,
.poll_put_char = pl011_put_poll_char,
#endif
@@ -2173,7 +2275,7 @@ static const struct uart_ops sbsa_uart_pops = {
.config_port = pl011_config_port,
.verify_port = pl011_verify_port,
#ifdef CONFIG_CONSOLE_POLL
- .poll_init = pl011_hwinit,
+ .poll_init = pl011_poll_init,
.poll_get_char = pl011_get_poll_char,
.poll_put_char = pl011_put_poll_char,
#endif
--
2.7.4
This driver provided a special ttyNMI0 port to enable NMI debugging
capabilities for kgdb, but it remained siloed from the serial
core/drivers, which made it a bit odd to enable using the serial device
interrupt, and hence it remained unused.
Now that the serial core/drivers are becoming NMI aware, which in turn
provides NMI debugging capabilities via magic sysrq, there is no specific
reason to keep this special driver. So remove it.
Signed-off-by: Sumit Garg <[email protected]>
---
drivers/tty/serial/Kconfig | 19 ---
drivers/tty/serial/Makefile | 1 -
drivers/tty/serial/kgdb_nmi.c | 383 ------------------------------------------
drivers/tty/serial/kgdboc.c | 8 -
include/linux/kgdb.h | 10 --
5 files changed, 421 deletions(-)
delete mode 100644 drivers/tty/serial/kgdb_nmi.c
diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index 780908d..625d283 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -176,25 +176,6 @@ config SERIAL_ATMEL_TTYAT
Say Y if you have an external 8250/16C550 UART. If unsure, say N.
-config SERIAL_KGDB_NMI
- bool "Serial console over KGDB NMI debugger port"
- depends on KGDB_SERIAL_CONSOLE
- help
- This special driver allows you to temporary use NMI debugger port
- as a normal console (assuming that the port is attached to KGDB).
-
- Unlike KDB's disable_nmi command, with this driver you are always
- able to go back to the debugger using KGDB escape sequence ($3#33).
- This is because this console driver processes the input in NMI
- context, and thus is able to intercept the magic sequence.
-
- Note that since the console interprets input and uses polling
- communication methods, for things like PPP you still must fully
- detach debugger port from the KGDB NMI (i.e. disable_nmi), and
- use raw console.
-
- If unsure, say N.
-
config SERIAL_MESON
tristate "Meson serial port support"
depends on ARCH_MESON
diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
index d056ee6..9ea6263 100644
--- a/drivers/tty/serial/Makefile
+++ b/drivers/tty/serial/Makefile
@@ -93,5 +93,4 @@ obj-$(CONFIG_SERIAL_SIFIVE) += sifive.o
# GPIOLIB helpers for modem control lines
obj-$(CONFIG_SERIAL_MCTRL_GPIO) += serial_mctrl_gpio.o
-obj-$(CONFIG_SERIAL_KGDB_NMI) += kgdb_nmi.o
obj-$(CONFIG_KGDB_SERIAL_CONSOLE) += kgdboc.o
diff --git a/drivers/tty/serial/kgdb_nmi.c b/drivers/tty/serial/kgdb_nmi.c
deleted file mode 100644
index 6004c0c..0000000
--- a/drivers/tty/serial/kgdb_nmi.c
+++ /dev/null
@@ -1,383 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * KGDB NMI serial console
- *
- * Copyright 2010 Google, Inc.
- * Arve Hjønnevåg <[email protected]>
- * Colin Cross <[email protected]>
- * Copyright 2012 Linaro Ltd.
- * Anton Vorontsov <[email protected]>
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/slab.h>
-#include <linux/errno.h>
-#include <linux/atomic.h>
-#include <linux/console.h>
-#include <linux/tty.h>
-#include <linux/tty_driver.h>
-#include <linux/tty_flip.h>
-#include <linux/serial_core.h>
-#include <linux/interrupt.h>
-#include <linux/hrtimer.h>
-#include <linux/tick.h>
-#include <linux/kfifo.h>
-#include <linux/kgdb.h>
-#include <linux/kdb.h>
-
-static int kgdb_nmi_knock = 1;
-module_param_named(knock, kgdb_nmi_knock, int, 0600);
-MODULE_PARM_DESC(knock, "if set to 1 (default), the special '$3#33' command " \
- "must be used to enter the debugger; when set to 0, " \
- "hitting return key is enough to enter the debugger; " \
- "when set to -1, the debugger is entered immediately " \
- "upon NMI");
-
-static char *kgdb_nmi_magic = "$3#33";
-module_param_named(magic, kgdb_nmi_magic, charp, 0600);
-MODULE_PARM_DESC(magic, "magic sequence to enter NMI debugger (default $3#33)");
-
-static atomic_t kgdb_nmi_num_readers = ATOMIC_INIT(0);
-
-static int kgdb_nmi_console_setup(struct console *co, char *options)
-{
- arch_kgdb_ops.enable_nmi(1);
-
- /* The NMI console uses the dbg_io_ops to issue console messages. To
- * avoid duplicate messages during kdb sessions we must inform kdb's
- * I/O utilities that messages sent to the console will automatically
- * be displayed on the dbg_io.
- */
- dbg_io_ops->cons = co;
-
- return 0;
-}
-
-static void kgdb_nmi_console_write(struct console *co, const char *s, uint c)
-{
- int i;
-
- for (i = 0; i < c; i++)
- dbg_io_ops->write_char(s[i]);
-}
-
-static struct tty_driver *kgdb_nmi_tty_driver;
-
-static struct tty_driver *kgdb_nmi_console_device(struct console *co, int *idx)
-{
- *idx = co->index;
- return kgdb_nmi_tty_driver;
-}
-
-static struct console kgdb_nmi_console = {
- .name = "ttyNMI",
- .setup = kgdb_nmi_console_setup,
- .write = kgdb_nmi_console_write,
- .device = kgdb_nmi_console_device,
- .flags = CON_PRINTBUFFER | CON_ANYTIME,
- .index = -1,
-};
-
-/*
- * This is usually the maximum rate on debug ports. We make fifo large enough
- * to make copy-pasting to the terminal usable.
- */
-#define KGDB_NMI_BAUD 115200
-#define KGDB_NMI_FIFO_SIZE roundup_pow_of_two(KGDB_NMI_BAUD / 8 / HZ)
-
-struct kgdb_nmi_tty_priv {
- struct tty_port port;
- struct timer_list timer;
- STRUCT_KFIFO(char, KGDB_NMI_FIFO_SIZE) fifo;
-};
-
-static struct tty_port *kgdb_nmi_port;
-
-static void kgdb_tty_recv(int ch)
-{
- struct kgdb_nmi_tty_priv *priv;
- char c = ch;
-
- if (!kgdb_nmi_port || ch < 0)
- return;
- /*
- * Can't use port->tty->driver_data as tty might be not there. Timer
- * will check for tty and will get the ref, but here we don't have to
- * do that, and actually, we can't: we're in NMI context, no locks are
- * possible.
- */
- priv = container_of(kgdb_nmi_port, struct kgdb_nmi_tty_priv, port);
- kfifo_in(&priv->fifo, &c, 1);
-}
-
-static int kgdb_nmi_poll_one_knock(void)
-{
- static int n;
- int c = -1;
- const char *magic = kgdb_nmi_magic;
- size_t m = strlen(magic);
- bool printch = false;
-
- c = dbg_io_ops->read_char();
- if (c == NO_POLL_CHAR)
- return c;
-
- if (!kgdb_nmi_knock && (c == '\r' || c == '\n')) {
- return 1;
- } else if (c == magic[n]) {
- n = (n + 1) % m;
- if (!n)
- return 1;
- printch = true;
- } else {
- n = 0;
- }
-
- if (atomic_read(&kgdb_nmi_num_readers)) {
- kgdb_tty_recv(c);
- return 0;
- }
-
- if (printch) {
- kdb_printf("%c", c);
- return 0;
- }
-
- kdb_printf("\r%s %s to enter the debugger> %*s",
- kgdb_nmi_knock ? "Type" : "Hit",
- kgdb_nmi_knock ? magic : "<return>", (int)m, "");
- while (m--)
- kdb_printf("\b");
- return 0;
-}
-
-/**
- * kgdb_nmi_poll_knock - Check if it is time to enter the debugger
- *
- * "Serial ports are often noisy, especially when muxed over another port (we
- * often use serial over the headset connector). Noise on the async command
- * line just causes characters that are ignored, on a command line that blocked
- * execution noise would be catastrophic." -- Colin Cross
- *
- * So, this function implements KGDB/KDB knocking on the serial line: we won't
- * enter the debugger until we receive a known magic phrase (which is actually
- * "$3#33", known as "escape to KDB" command. There is also a relaxed variant
- * of knocking, i.e. just pressing the return key is enough to enter the
- * debugger. And if knocking is disabled, the function always returns 1.
- */
-bool kgdb_nmi_poll_knock(void)
-{
- if (kgdb_nmi_knock < 0)
- return true;
-
- while (1) {
- int ret;
-
- ret = kgdb_nmi_poll_one_knock();
- if (ret == NO_POLL_CHAR)
- return false;
- else if (ret == 1)
- break;
- }
- return true;
-}
-
-/*
- * The tasklet is cheap, it does not cause wakeups when reschedules itself,
- * instead it waits for the next tick.
- */
-static void kgdb_nmi_tty_receiver(struct timer_list *t)
-{
- struct kgdb_nmi_tty_priv *priv = from_timer(priv, t, timer);
- char ch;
-
- priv->timer.expires = jiffies + (HZ/100);
- add_timer(&priv->timer);
-
- if (likely(!atomic_read(&kgdb_nmi_num_readers) ||
- !kfifo_len(&priv->fifo)))
- return;
-
- while (kfifo_out(&priv->fifo, &ch, 1))
- tty_insert_flip_char(&priv->port, ch, TTY_NORMAL);
- tty_flip_buffer_push(&priv->port);
-}
-
-static int kgdb_nmi_tty_activate(struct tty_port *port, struct tty_struct *tty)
-{
- struct kgdb_nmi_tty_priv *priv =
- container_of(port, struct kgdb_nmi_tty_priv, port);
-
- kgdb_nmi_port = port;
- priv->timer.expires = jiffies + (HZ/100);
- add_timer(&priv->timer);
-
- return 0;
-}
-
-static void kgdb_nmi_tty_shutdown(struct tty_port *port)
-{
- struct kgdb_nmi_tty_priv *priv =
- container_of(port, struct kgdb_nmi_tty_priv, port);
-
- del_timer(&priv->timer);
- kgdb_nmi_port = NULL;
-}
-
-static const struct tty_port_operations kgdb_nmi_tty_port_ops = {
- .activate = kgdb_nmi_tty_activate,
- .shutdown = kgdb_nmi_tty_shutdown,
-};
-
-static int kgdb_nmi_tty_install(struct tty_driver *drv, struct tty_struct *tty)
-{
- struct kgdb_nmi_tty_priv *priv;
- int ret;
-
- priv = kzalloc(sizeof(*priv), GFP_KERNEL);
- if (!priv)
- return -ENOMEM;
-
- INIT_KFIFO(priv->fifo);
- timer_setup(&priv->timer, kgdb_nmi_tty_receiver, 0);
- tty_port_init(&priv->port);
- priv->port.ops = &kgdb_nmi_tty_port_ops;
- tty->driver_data = priv;
-
- ret = tty_port_install(&priv->port, drv, tty);
- if (ret) {
- pr_err("%s: can't install tty port: %d\n", __func__, ret);
- goto err;
- }
- return 0;
-err:
- tty_port_destroy(&priv->port);
- kfree(priv);
- return ret;
-}
-
-static void kgdb_nmi_tty_cleanup(struct tty_struct *tty)
-{
- struct kgdb_nmi_tty_priv *priv = tty->driver_data;
-
- tty->driver_data = NULL;
- tty_port_destroy(&priv->port);
- kfree(priv);
-}
-
-static int kgdb_nmi_tty_open(struct tty_struct *tty, struct file *file)
-{
- struct kgdb_nmi_tty_priv *priv = tty->driver_data;
- unsigned int mode = file->f_flags & O_ACCMODE;
- int ret;
-
- ret = tty_port_open(&priv->port, tty, file);
- if (!ret && (mode == O_RDONLY || mode == O_RDWR))
- atomic_inc(&kgdb_nmi_num_readers);
-
- return ret;
-}
-
-static void kgdb_nmi_tty_close(struct tty_struct *tty, struct file *file)
-{
- struct kgdb_nmi_tty_priv *priv = tty->driver_data;
- unsigned int mode = file->f_flags & O_ACCMODE;
-
- if (mode == O_RDONLY || mode == O_RDWR)
- atomic_dec(&kgdb_nmi_num_readers);
-
- tty_port_close(&priv->port, tty, file);
-}
-
-static void kgdb_nmi_tty_hangup(struct tty_struct *tty)
-{
- struct kgdb_nmi_tty_priv *priv = tty->driver_data;
-
- tty_port_hangup(&priv->port);
-}
-
-static int kgdb_nmi_tty_write_room(struct tty_struct *tty)
-{
- /* Actually, we can handle any amount as we use polled writes. */
- return 2048;
-}
-
-static int kgdb_nmi_tty_write(struct tty_struct *tty, const unchar *buf, int c)
-{
- int i;
-
- for (i = 0; i < c; i++)
- dbg_io_ops->write_char(buf[i]);
- return c;
-}
-
-static const struct tty_operations kgdb_nmi_tty_ops = {
- .open = kgdb_nmi_tty_open,
- .close = kgdb_nmi_tty_close,
- .install = kgdb_nmi_tty_install,
- .cleanup = kgdb_nmi_tty_cleanup,
- .hangup = kgdb_nmi_tty_hangup,
- .write_room = kgdb_nmi_tty_write_room,
- .write = kgdb_nmi_tty_write,
-};
-
-int kgdb_register_nmi_console(void)
-{
- int ret;
-
- if (!arch_kgdb_ops.enable_nmi)
- return 0;
-
- kgdb_nmi_tty_driver = alloc_tty_driver(1);
- if (!kgdb_nmi_tty_driver) {
- pr_err("%s: cannot allocate tty\n", __func__);
- return -ENOMEM;
- }
- kgdb_nmi_tty_driver->driver_name = "ttyNMI";
- kgdb_nmi_tty_driver->name = "ttyNMI";
- kgdb_nmi_tty_driver->num = 1;
- kgdb_nmi_tty_driver->type = TTY_DRIVER_TYPE_SERIAL;
- kgdb_nmi_tty_driver->subtype = SERIAL_TYPE_NORMAL;
- kgdb_nmi_tty_driver->flags = TTY_DRIVER_REAL_RAW;
- kgdb_nmi_tty_driver->init_termios = tty_std_termios;
- tty_termios_encode_baud_rate(&kgdb_nmi_tty_driver->init_termios,
- KGDB_NMI_BAUD, KGDB_NMI_BAUD);
- tty_set_operations(kgdb_nmi_tty_driver, &kgdb_nmi_tty_ops);
-
- ret = tty_register_driver(kgdb_nmi_tty_driver);
- if (ret) {
- pr_err("%s: can't register tty driver: %d\n", __func__, ret);
- goto err_drv_reg;
- }
-
- register_console(&kgdb_nmi_console);
-
- return 0;
-err_drv_reg:
- put_tty_driver(kgdb_nmi_tty_driver);
- return ret;
-}
-EXPORT_SYMBOL_GPL(kgdb_register_nmi_console);
-
-int kgdb_unregister_nmi_console(void)
-{
- int ret;
-
- if (!arch_kgdb_ops.enable_nmi)
- return 0;
- arch_kgdb_ops.enable_nmi(0);
-
- ret = unregister_console(&kgdb_nmi_console);
- if (ret)
- return ret;
-
- ret = tty_unregister_driver(kgdb_nmi_tty_driver);
- if (ret)
- return ret;
- put_tty_driver(kgdb_nmi_tty_driver);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(kgdb_unregister_nmi_console);
diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 84ffede..e959e72 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -158,8 +158,6 @@ static void cleanup_kgdboc(void)
if (configured != 1)
return;
- if (kgdb_unregister_nmi_console())
- return;
kgdboc_unregister_kbd();
kgdb_unregister_io_module(&kgdboc_io_ops);
}
@@ -210,16 +208,10 @@ static int configure_kgdboc(void)
if (err)
goto noconfig;
- err = kgdb_register_nmi_console();
- if (err)
- goto nmi_con_failed;
-
configured = 1;
return 0;
-nmi_con_failed:
- kgdb_unregister_io_module(&kgdboc_io_ops);
noconfig:
kgdboc_unregister_kbd();
configured = 0;
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 529116b..2e8c5de 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -294,16 +294,6 @@ extern const struct kgdb_arch arch_kgdb_ops;
extern unsigned long kgdb_arch_pc(int exception, struct pt_regs *regs);
-#ifdef CONFIG_SERIAL_KGDB_NMI
-extern int kgdb_register_nmi_console(void);
-extern int kgdb_unregister_nmi_console(void);
-extern bool kgdb_nmi_poll_knock(void);
-#else
-static inline int kgdb_register_nmi_console(void) { return 0; }
-static inline int kgdb_unregister_nmi_console(void) { return 0; }
-static inline bool kgdb_nmi_poll_knock(void) { return true; }
-#endif
-
extern int kgdb_register_io_module(struct kgdb_io *local_kgdb_io_ops);
extern void kgdb_unregister_io_module(struct kgdb_io *local_kgdb_io_ops);
extern struct kgdb_io *dbg_io_ops;
--
2.7.4
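[Editor's note: for reference, the "knock" logic deleted in the patch above boils down to a small per-character state machine. Below is a standalone userspace sketch of it; knock_feed() is a hypothetical name (the real code keeps a static match counter inside kgdb_nmi_poll_one_knock()), and no kernel APIs are used.]

```c
#include <string.h>

/*
 * Standalone sketch of the "knock" matcher removed in the patch above.
 * knock_feed() is a hypothetical name; the driver kept this state in a
 * static counter inside kgdb_nmi_poll_one_knock().
 */
static const char knock_magic[] = "$3#33";

/*
 * Feed one received character. Returns 1 once the whole magic
 * sequence has been seen; a mismatching character restarts the match,
 * so line noise merely resets the state instead of entering kdb.
 */
int knock_feed(int c)
{
	static size_t n;
	size_t m = strlen(knock_magic);

	if (c == knock_magic[n]) {
		n = (n + 1) % m;
		if (!n)
			return 1;	/* full "$3#33" matched */
	} else {
		n = 0;			/* noise: start over */
	}
	return 0;
}
```

This is why "serial ports are often noisy" (see the kgdb_nmi_poll_knock() comment above) does not cause spurious debugger entries: only the exact sequence completes the match.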
On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
>
> Make it possible for UARTs to trigger magic sysrq from an NMI. With the
> advent of pseudo NMIs on arm64 it became quite generic to request serial
> device interrupt as an NMI rather than IRQ. And having NMI driven serial
> RX will allow us to trigger magic sysrq as an NMI and hence drop into
> kernel debugger in NMI context.
>
> The major use-case is to add NMI debugging capabilities to the kernel
> in order to debug scenarios such as:
> - Primary CPU is stuck in deadlock with interrupts disabled and hence
> doesn't honor serial device interrupt. So having magic sysrq triggered
> as an NMI is helpful for debugging.
> - Always enabled NMI based magic sysrq irrespective of whether the serial
> TTY port is active or not.
>
> Currently there is an existing kgdb NMI serial driver which provides
> partial implementation in upstream to have a separate ttyNMI0 port but
> that remained in silos with the serial core/drivers which made it a bit
> odd to enable using serial device interrupt and hence remained unused. It
> seems to be clearly intended to avoid almost all custom NMI changes to
> the UART driver.
>
> But this patch-set allows the serial core/drivers to be NMI aware which
> in turn provides NMI debugging capabilities via magic sysrq and hence
> there is no specific reason to keep this special driver. So remove it
> instead.
>
> Approach:
> ---------
>
> The overall idea is to intercept serial RX characters in NMI context, if
> those are specific to magic sysrq then allow corresponding handler to run
> in NMI context. Otherwise, defer all other RX and TX operations onto IRQ
> work queue in order to run those in normal interrupt context.
>
> This approach is demonstrated using amba-pl011 driver.
>
> Patch-wise description:
> -----------------------
>
> Patch #1 prepares magic sysrq handler to be NMI aware.
> Patch #2 adds NMI framework to serial core.
> Patch #3 and #4 demonstrates NMI aware uart port using amba-pl011 driver.
> Patch #5 removes kgdb NMI serial driver.
>
> Goal of this RFC:
> -----------------
>
> My main reason for sharing this as an RFC is to help decide whether or
> not to continue with this approach. The next step for me would be to port
> the work to a system with an 8250 UART.
>
A gentle reminder to seek feedback on this series.
-Sumit
> Usage:
> ------
>
> This RFC has been developed on top of 5.8-rc3 and if anyone is interested
> to give this a try on QEMU, just enable following config options
> additional to arm64 defconfig:
>
> CONFIG_KGDB=y
> CONFIG_KGDB_KDB=y
> CONFIG_ARM64_PSEUDO_NMI=y
>
> Qemu command line to test:
>
> $ qemu-system-aarch64 -nographic -machine virt,gic-version=3 -cpu cortex-a57 \
> -smp 2 -kernel arch/arm64/boot/Image -append 'console=ttyAMA0,38400 \
> keep_bootcon root=/dev/vda2 irqchip.gicv3_pseudo_nmi=1 kgdboc=ttyAMA0' \
> -initrd rootfs-arm64.cpio.gz
>
> NMI entry into kgdb via sysrq:
> - Ctrl a + b + g
>
> Reference:
> ----------
>
> For more details about NMI/FIQ debugger, refer to this blog post [1].
>
> [1] https://www.linaro.org/blog/debugging-arm-kernels-using-nmifiq/
>
> I do look forward to your comments and feedback.
>
> Sumit Garg (5):
> tty/sysrq: Make sysrq handler NMI aware
> serial: core: Add framework to allow NMI aware serial drivers
> serial: amba-pl011: Re-order APIs definition
> serial: amba-pl011: Enable NMI aware uart port
> serial: Remove KGDB NMI serial driver
>
> drivers/tty/serial/Kconfig | 19 --
> drivers/tty/serial/Makefile | 1 -
> drivers/tty/serial/amba-pl011.c | 232 +++++++++++++++++-------
> drivers/tty/serial/kgdb_nmi.c | 383 ---------------------------------------
> drivers/tty/serial/kgdboc.c | 8 -
> drivers/tty/serial/serial_core.c | 120 +++++++++++-
> drivers/tty/sysrq.c | 33 +++-
> include/linux/kgdb.h | 10 -
> include/linux/serial_core.h | 67 +++++++
> include/linux/sysrq.h | 1 +
> kernel/debug/debug_core.c | 1 +
> 11 files changed, 386 insertions(+), 489 deletions(-)
> delete mode 100644 drivers/tty/serial/kgdb_nmi.c
>
> --
> 2.7.4
>
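[Editor's note: the intercept-and-defer scheme the cover letter describes can be sketched in plain C as below. All names here are hypothetical illustrations, not the series' actual API; the real patches hook the driver's RX path and use an IRQ work queue for the deferred half.]

```c
#include <stdbool.h>

/*
 * Sketch of the deferral scheme: only the sysrq check runs in "NMI
 * context"; every other character is queued into a small single-
 * producer FIFO and drained later from "IRQ work" context.
 */
#define RX_FIFO_SIZE 64U		/* power of two */

static unsigned char rx_fifo[RX_FIFO_SIZE];
static unsigned int rx_head, rx_tail;
static bool sysrq_pending;

/* Called from NMI context: no locks may be taken here. */
void nmi_rx_char(unsigned char c, bool sysrq_armed)
{
	if (sysrq_armed && c == 'g') {	/* e.g. sysrq-g -> enter kgdb */
		sysrq_pending = true;	/* handled directly in NMI */
		return;
	}
	if (rx_head - rx_tail < RX_FIFO_SIZE)
		rx_fifo[rx_head++ % RX_FIFO_SIZE] = c;	/* defer */
}

/* Called later from IRQ-work context: push deferred chars to the TTY. */
int irq_work_drain(unsigned char *out, int max)
{
	int n = 0;

	while (rx_tail != rx_head && n < max)
		out[n++] = rx_fifo[rx_tail++ % RX_FIFO_SIZE];
	return n;
}
```

The point of the split is that the NMI half only compares characters and sets flags, while everything that needs locks (tty flip buffers, uart port lock) happens in the drained half in normal interrupt context.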
On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> >
> > [...]
>
> A gentle reminder to seek feedback on this series.
It's the middle of the merge window, and I can't do anything.
Also, I almost never review RFC patches as I have way too many
patches that people think are "right" to review first...
I suggest you work to flesh this out first and submit something that you
feel works properly.
good luck!
greg k-h
Hi Greg,
Thanks for your comments.
On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> > On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> > >
> > > [...]
> >
> > A gentle reminder to seek feedback on this series.
>
> It's the middle of the merge window, and I can't do anything.
>
> Also, I almost never review RFC patches as I have have way too many
> patches that people think are "right" to review first...
>
Okay, I understand and I can definitely wait for your feedback.
> I suggest you work to flesh this out first and submit something that you
> feels works properly.
>
IIUC, in order to make this approach substantial I need to make it
work with the 8250 UART (the major serial driver), correct? Currently it
works properly for the amba-pl011 driver.
> good luck!
>
Thanks.
-Sumit
> greg k-h
On Tue, Aug 11, 2020 at 07:59:24PM +0530, Sumit Garg wrote:
> Hi Greg,
>
> Thanks for your comments.
>
> On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> > > On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> > > >
> > > > > [...]
> > >
> > > A gentle reminder to seek feedback on this series.
> >
> > It's the middle of the merge window, and I can't do anything.
> >
> > Also, I almost never review RFC patches as I have way too many
> > patches that people think are "right" to review first...
> >
>
> Okay, I understand and I can definitely wait for your feedback.
My feedback here is this:
> > I suggest you work to flesh this out first and submit something that you
> > feel works properly.
:)
> > IIUC, in order to make this approach substantial I need to make it
> > work with the 8250 UART (the major serial driver), correct? Currently it
> > works properly for the amba-pl011 driver.
Yes, try to do that, or better yet, make it work with all serial drivers
automatically.
thanks,
greg k-h
Hi,
On Tue, Aug 11, 2020 at 7:58 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, Aug 11, 2020 at 07:59:24PM +0530, Sumit Garg wrote:
> > Hi Greg,
> >
> > Thanks for your comments.
> >
> > On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> > > > On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> > > > >
> > > > > > [...]
> > > >
> > > > A gentle reminder to seek feedback on this series.
It's been on my list for a while. I started it Friday but ran out of
time. This week hasn't been going as smoothly as I hoped but I'll
prioritize this since it's been too long.
> > > It's the middle of the merge window, and I can't do anything.
> > >
> > > Also, I almost never review RFC patches as I have way too many
> > > patches that people think are "right" to review first...
> > >
> >
> > Okay, I understand and I can definitely wait for your feedback.
>
> My feedback here is this:
>
> > > I suggest you work to flesh this out first and submit something that you
> > > feel works properly.
>
> :)
>
> > IIUC, in order to make this approach substantial I need to make it
> > work with the 8250 UART (the major serial driver), correct? Currently it
> > works properly for the amba-pl011 driver.
>
> Yes, try to do that, or better yet, make it work with all serial drivers
> automatically.
A bit of early feedback...
Although I'm not sure we can do Greg's "make it work everywhere
automatically", it's possible you could get half of your patch done
automatically. Specifically, your patch really does two things:
a) It leaves the serial port "active" all the time to look for sysrq.
In other words even if there is no serial client it's always reading
the port looking for characters. IMO this concept should be separated
out from the NMI concept and _could_ automatically work for all serial
drivers. You'd just need something in the serial core that acted like
a default client if nobody else opened the serial port. The nice
thing here is that we go through all the normal code paths and don't
need special cases in the driver.
b) It enables NMI for your particular serial driver. This seems like
it'd be hard to do automatically because you can't do the same things
at NMI that you could do in a normal interrupt handler.
NOTE: to me, a) is more important than b) (though it'd be nice to have
both). This would be especially true the earlier you could make a)
work since the main time when an "agetty" isn't running on my serial
port to read characters is during bootup.
Why is b) less important to me? Sure, it would let you drop into the
debugger in the case where the CPU handling serial port interrupts is
hung with IRQs disabled, but it _wouldn't_ let you drop into the
debugger in the case where a different CPU is hung with IRQs disabled.
To get that we need NMI roundup (which, I know, you are also working
on for arm64). ...and, if we've got NMI roundup, presumably we can
find our way into the debugger by either moving the serial interrupt
to a different CPU ahead of time or using some type of lockup detector
(which I know you are also working on for arm64).
One last bit of feedback is that I noticed that you didn't try to
implement the old "knock" functionality of the old NMI driver that's
being deleted. That is: your new patches don't provide an alternate
way to drop into the debugger for systems where BREAK isn't hooked up.
That's not a hard requirement, but I was kinda hoping for it since I
have some systems that haven't routed BREAK properly. ;-)
I'll try to get some more detailed feedback in the next few days.
-Doug
On Tue, 11 Aug 2020 at 20:28, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, Aug 11, 2020 at 07:59:24PM +0530, Sumit Garg wrote:
> > Hi Greg,
> >
> > Thanks for your comments.
> >
> > On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> > > > On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> > > > >
> > > > > > [...]
> > > >
> > > > A gentle reminder to seek feedback on this series.
> > >
> > > It's the middle of the merge window, and I can't do anything.
> > >
> > > Also, I almost never review RFC patches as I have way too many
> > > patches that people think are "right" to review first...
> > >
> >
> > Okay, I understand and I can definitely wait for your feedback.
>
> My feedback here is this:
>
> > > I suggest you work to flesh this out first and submit something that you
> > > feel works properly.
>
> :)
>
> > IIUC, in order to make this approach substantial I need to make it
> > work with 8250 UART (major serial driver), correct? As currently it
> > works properly with the amba-pl011 driver.
>
> Yes, try to do that, or better yet, make it work with all serial drivers
> automatically.
I would like to make serial drivers work automatically, but
unfortunately the interrupt request/handling code is pretty specific
to each serial driver.
BTW, I will look for ways to make it much easier for serial
drivers to adapt.
-Sumit
>
> thanks,
>
> greg k-h
Hi Doug,
On Tue, 11 Aug 2020 at 22:46, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Tue, Aug 11, 2020 at 7:58 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Tue, Aug 11, 2020 at 07:59:24PM +0530, Sumit Garg wrote:
> > > Hi Greg,
> > >
> > > Thanks for your comments.
> > >
> > > On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> > > > > On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> > > > > >
> > > > > > Make it possible for UARTs to trigger magic sysrq from an NMI. With the
> > > > > > advent of pseudo NMIs on arm64 it became quite generic to request serial
> > > > > > device interrupt as an NMI rather than IRQ. And having NMI driven serial
> > > > > > RX will allow us to trigger magic sysrq as an NMI and hence drop into
> > > > > > kernel debugger in NMI context.
> > > > > >
> > > > > > The major use-case is to add NMI debugging capabilities to the kernel
> > > > > > in order to debug scenarios such as:
> > > > > > - Primary CPU is stuck in deadlock with interrupts disabled and hence
> > > > > > doesn't honor serial device interrupt. So having magic sysrq triggered
> > > > > > as an NMI is helpful for debugging.
> > > > > > - Always enabled NMI based magic sysrq irrespective of whether the serial
> > > > > > TTY port is active or not.
> > > > > >
> > > > > > Currently there is an existing kgdb NMI serial driver which provides
> > > > > > partial implementation in upstream to have a separate ttyNMI0 port but
> > > > > > that remained in silos with the serial core/drivers which made it a bit
> > > > > > odd to enable using serial device interrupt and hence remained unused. It
> > > > > > seems to be clearly intended to avoid almost all custom NMI changes to
> > > > > > the UART driver.
> > > > > >
> > > > > > But this patch-set allows the serial core/drivers to be NMI aware which
> > > > > > in turn provides NMI debugging capabilities via magic sysrq and hence
> > > > > > there is no specific reason to keep this special driver. So remove it
> > > > > > instead.
> > > > > >
> > > > > > Approach:
> > > > > > ---------
> > > > > >
> > > > > > The overall idea is to intercept serial RX characters in NMI context, if
> > > > > > those are specific to magic sysrq then allow corresponding handler to run
> > > > > > in NMI context. Otherwise, defer all other RX and TX operations onto IRQ
> > > > > > work queue in order to run those in normal interrupt context.
> > > > > >
> > > > > > This approach is demonstrated using amba-pl011 driver.
> > > > > >
> > > > > > Patch-wise description:
> > > > > > -----------------------
> > > > > >
> > > > > > Patch #1 prepares magic sysrq handler to be NMI aware.
> > > > > > Patch #2 adds NMI framework to serial core.
> > > > > > Patch #3 and #4 demonstrates NMI aware uart port using amba-pl011 driver.
> > > > > > Patch #5 removes kgdb NMI serial driver.
> > > > > >
> > > > > > Goal of this RFC:
> > > > > > -----------------
> > > > > >
> > > > > > My main reason for sharing this as an RFC is to help decide whether or
> > > > > > not to continue with this approach. The next step for me would be to port
> > > > > > the work to a system with an 8250 UART.
> > > > > >
> > > > >
> > > > > A gentle reminder to seek feedback on this series.
>
> It's been on my list for a while. I started it Friday but ran out of
> time. This week hasn't been going as smoothly as I hoped but I'll
> prioritize this since it's been too long.
>
No worries and thanks for your feedback.
>
> > > > It's the middle of the merge window, and I can't do anything.
> > > >
> > > > Also, I almost never review RFC patches as I have way too many
> > > > patches that people think are "right" to review first...
> > > >
> > >
> > > Okay, I understand and I can definitely wait for your feedback.
> >
> > My feedback here is this:
> >
> > > > I suggest you work to flesh this out first and submit something that you
> > > > feel works properly.
> >
> > :)
> >
> > > IIUC, in order to make this approach substantial I need to make it
> > > work with 8250 UART (major serial driver), correct? As currently it
> > > works properly with the amba-pl011 driver.
> >
> > Yes, try to do that, or better yet, make it work with all serial drivers
> > automatically.
>
> A bit of early feedback...
>
> Although I'm not sure we can do Greg's "make it work everywhere
> automatically", it's possible you could get half of your patch done
> automatically. Specifically, your patch really does two things:
>
> a) It leaves the serial port "active" all the time to look for sysrq.
> In other words even if there is no serial client it's always reading
> the port looking for characters. IMO this concept should be separated
> out from the NMI concept and _could_ automatically work for all serial
> drivers. You'd just need something in the serial core that acted like
> a default client if nobody else opened the serial port. The nice
> thing here is that we go through all the normal code paths and don't
> need special cases in the driver.
Okay, I will try to explore this option of having a default serial
port client. Would this client be active during normal serial
operation or only when kgdb is active? One drawback I see for normal
operation is power management: a user who is not using the serial
port may want to disable the corresponding clock in order to reduce
power consumption.
>
> b) It enables NMI for your particular serial driver. This seems like
> it'd be hard to do automatically because you can't do the same things
> at NMI that you could do in a normal interrupt handler.
Agree.
>
> NOTE: to me, a) is more important than b) (though it'd be nice to have
> both). This would be especially true the earlier you could make a)
> work since the main time when an "agetty" isn't running on my serial
> port to read characters is during bootup.
>
> Why is b) less important to me? Sure, it would let you drop into the
> debugger in the case where the CPU handling serial port interrupts is
> hung with IRQs disabled, but it _wouldn't_ let you drop into the
> debugger in the case where a different CPU is hung with IRQs disabled.
> To get that we need NMI roundup (which, I know, you are also working
> on for arm64). ...and, if we've got NMI roundup, presumably we can
> find our way into the debugger by either moving the serial interrupt
> to a different CPU ahead of time or using some type of lockup detector
> (which I know you are also working on for arm64).
>
Thanks for sharing your preferences. I will try to get a) sorted out first.
Overall I agree with your approaches to debugging hard-lockup
scenarios, but they might not be so trivial for kernel engineers who
don't possess kernel debugging experience like you do. :)
And I still think NMI aware magic sysrq is useful for scenarios such as:
- Try to get system information during hard-lockup rather than just
panic via hard-lockup detection.
- Do normal start/stop debugger activity on a core which was stuck in
hard-lockup.
- Random boot freezes which are not easily reproducible.
>
> One last bit of feedback is that I noticed that you didn't try to
> implement the old "knock" functionality of the old NMI driver that's
> being deleted. That is: your new patches don't provide an alternate
> way to drop into the debugger for systems where BREAK isn't hooked up.
> That's not a hard requirement, but I was kinda hoping for it since I
> have some systems that haven't routed BREAK properly. ;-)
>
Yeah, implementing kgdb "knock" functionality via a common hook in
the serial core is on my TODO list.
>
> I'll try to get some more detailed feedback in the next few days.
Thanks. I do look forward to your feedback.
-Sumit
>
> -Doug
Hi,
On Wed, Aug 12, 2020 at 7:53 AM Sumit Garg <[email protected]> wrote:
>
> Hi Doug,
>
> On Tue, 11 Aug 2020 at 22:46, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Tue, Aug 11, 2020 at 7:58 AM Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > On Tue, Aug 11, 2020 at 07:59:24PM +0530, Sumit Garg wrote:
> > > > Hi Greg,
> > > >
> > > > Thanks for your comments.
> > > >
> > > > On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> > > > > > On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> > > > > > >
> > > > > > > Make it possible for UARTs to trigger magic sysrq from an NMI. With the
> > > > > > > advent of pseudo NMIs on arm64 it became quite generic to request serial
> > > > > > > device interrupt as an NMI rather than IRQ. And having NMI driven serial
> > > > > > > RX will allow us to trigger magic sysrq as an NMI and hence drop into
> > > > > > > kernel debugger in NMI context.
> > > > > > >
> > > > > > > The major use-case is to add NMI debugging capabilities to the kernel
> > > > > > > in order to debug scenarios such as:
> > > > > > > - Primary CPU is stuck in deadlock with interrupts disabled and hence
> > > > > > > doesn't honor serial device interrupt. So having magic sysrq triggered
> > > > > > > as an NMI is helpful for debugging.
> > > > > > > - Always enabled NMI based magic sysrq irrespective of whether the serial
> > > > > > > TTY port is active or not.
> > > > > > >
> > > > > > > Currently there is an existing kgdb NMI serial driver which provides
> > > > > > > partial implementation in upstream to have a separate ttyNMI0 port but
> > > > > > > that remained in silos with the serial core/drivers which made it a bit
> > > > > > > odd to enable using serial device interrupt and hence remained unused. It
> > > > > > > seems to be clearly intended to avoid almost all custom NMI changes to
> > > > > > > the UART driver.
> > > > > > >
> > > > > > > But this patch-set allows the serial core/drivers to be NMI aware which
> > > > > > > in turn provides NMI debugging capabilities via magic sysrq and hence
> > > > > > > there is no specific reason to keep this special driver. So remove it
> > > > > > > instead.
> > > > > > >
> > > > > > > Approach:
> > > > > > > ---------
> > > > > > >
> > > > > > > The overall idea is to intercept serial RX characters in NMI context, if
> > > > > > > those are specific to magic sysrq then allow corresponding handler to run
> > > > > > > in NMI context. Otherwise, defer all other RX and TX operations onto IRQ
> > > > > > > work queue in order to run those in normal interrupt context.
> > > > > > >
> > > > > > > This approach is demonstrated using amba-pl011 driver.
> > > > > > >
> > > > > > > Patch-wise description:
> > > > > > > -----------------------
> > > > > > >
> > > > > > > Patch #1 prepares magic sysrq handler to be NMI aware.
> > > > > > > Patch #2 adds NMI framework to serial core.
> > > > > > > Patch #3 and #4 demonstrates NMI aware uart port using amba-pl011 driver.
> > > > > > > Patch #5 removes kgdb NMI serial driver.
> > > > > > >
> > > > > > > Goal of this RFC:
> > > > > > > -----------------
> > > > > > >
> > > > > > > My main reason for sharing this as an RFC is to help decide whether or
> > > > > > > not to continue with this approach. The next step for me would be to port
> > > > > > > the work to a system with an 8250 UART.
> > > > > > >
> > > > > >
> > > > > > A gentle reminder to seek feedback on this series.
> >
> > It's been on my list for a while. I started it Friday but ran out of
> > time. This week hasn't been going as smoothly as I hoped but I'll
> > prioritize this since it's been too long.
> >
>
> No worries and thanks for your feedback.
>
> >
> > > > > It's the middle of the merge window, and I can't do anything.
> > > > >
> > > > > Also, I almost never review RFC patches as I have way too many
> > > > > patches that people think are "right" to review first...
> > > > >
> > > >
> > > > Okay, I understand and I can definitely wait for your feedback.
> > >
> > > My feedback here is this:
> > >
> > > > > I suggest you work to flesh this out first and submit something that you
> > > > > feel works properly.
> > >
> > > :)
> > >
> > > > IIUC, in order to make this approach substantial I need to make it
> > > > work with 8250 UART (major serial driver), correct? As currently it
> > > > works properly with the amba-pl011 driver.
> > >
> > > Yes, try to do that, or better yet, make it work with all serial drivers
> > > automatically.
> >
> > A bit of early feedback...
> >
> > Although I'm not sure we can do Greg's "make it work everywhere
> > automatically", it's possible you could get half of your patch done
> > automatically. Specifically, your patch really does two things:
> >
> > a) It leaves the serial port "active" all the time to look for sysrq.
> > In other words even if there is no serial client it's always reading
> > the port looking for characters. IMO this concept should be separated
> > out from the NMI concept and _could_ automatically work for all serial
> > drivers. You'd just need something in the serial core that acted like
> > a default client if nobody else opened the serial port. The nice
> > thing here is that we go through all the normal code paths and don't
> > need special cases in the driver.
>
> Okay, I will try to explore this option of having a default serial
> port client. Would this client be active during normal serial
> operation or only when kgdb is active? One drawback I see for normal
> operation is power management: a user who is not using the serial
> port may want to disable the corresponding clock in order to reduce
> power consumption.
If I could pick the ideal, I'd say we'd do it any time the console is
configured for that port and magic sysrq is enabled. Presumably if
they're already choosing to output kernel log messages to the serial
port and they've enabled magic sysrq they're in a state where they'd
be OK with the extra power of also listening for characters?
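For reference, the gating condition Doug describes could be as small as the following check in the serial core. This is only a sketch: the function name is invented, while `uart_console()` and `sysrq_mask()` are existing helpers.

```c
/*
 * Hypothetical sketch (the function name is made up): the serial core
 * would keep RX alive with a "default client" exactly when the port is
 * a registered console and magic sysrq is currently enabled.
 */
static bool uart_wants_default_listener(struct uart_port *port)
{
	return uart_console(port) && sysrq_mask();
}
```

That would naturally cover the power-management concern above, since a port that is neither a console nor sysrq-enabled would never be held active.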
> > b) It enables NMI for your particular serial driver. This seems like
> > it'd be hard to do automatically because you can't do the same things
> > at NMI that you could do in a normal interrupt handler.
>
> Agree.
>
> >
> > NOTE: to me, a) is more important than b) (though it'd be nice to have
> > both). This would be especially true the earlier you could make a)
> > work since the main time when an "agetty" isn't running on my serial
> > port to read characters is during bootup.
> >
> > Why is b) less important to me? Sure, it would let you drop into the
> > debugger in the case where the CPU handling serial port interrupts is
> > hung with IRQs disabled, but it _wouldn't_ let you drop into the
> > debugger in the case where a different CPU is hung with IRQs disabled.
> > To get that we need NMI roundup (which, I know, you are also working
> > on for arm64). ...and, if we've got NMI roundup, presumably we can
> > find our way into the debugger by either moving the serial interrupt
> > to a different CPU ahead of time or using some type of lockup detector
> > (which I know you are also working on for arm64).
> >
>
> Thanks for sharing your preferences. I will try to get a) sorted out first.
>
> Overall I agree with your approaches to debugging hard-lockup
> scenarios, but they might not be so trivial for kernel engineers who
> don't possess kernel debugging experience like you do. :)
>
> And I still think NMI aware magic sysrq is useful for scenarios such as:
> - Try to get system information during hard-lockup rather than just
> panic via hard-lockup detection.
> - Do normal start/stop debugger activity on a core which was stuck in
> hard-lockup.
> - Random boot freezes which are not easily reproducible.
Don't get me wrong. Having sysrq from NMI seems like a good feature
to me. That being said, it will require non-trivial changes to each
serial driver to support it and that means that not all serial drivers
will support it. It also starts requiring knowledge of how NMIs work
(what's allowed in NMI mode / not allowed / how to avoid races) for
authors of serial drivers. I have a bit of a worry that the benefit
won't outweigh the extra complexity, but I guess time will tell. One
last worry is that I assume that most people testing (and even
automated testing labs) will either always enable NMI or won't enable
NMI. That means that everyone will be only testing one codepath or
the other and (given the complexity) the non-tested codepath will
break.
Hrm. Along the lines of the above, though: almost no modern systems
are uniprocessor. That means that even if one CPU is stuck with IRQs
off it's fairly likely that some other CPU is OK. Presumably you'd
get almost as much benefit as your patch but with more done
automatically if you could figure out how to detect that the serial
interrupt isn't being serviced and re-route it to a different CPU.
...or possibly you could use some variant of the hard lockup detector
and move all interrupts off a locked up CPU? You could make this an
option that's "default Y" when kgdb is turned on or something?
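The rerouting idea could be sketched with the existing genirq API like this. Hedged sketch only: `irq_set_affinity()` is a real kernel call, but the wrapper name and the hook point (lockup detector) are assumptions.

```c
/*
 * Sketch: on hard-lockup detection, migrate the UART interrupt away
 * from the stuck CPU so serial sysrq keeps being serviced. The wrapper
 * name is hypothetical; irq_set_affinity() is the existing genirq call.
 */
static void uart_evict_irq_from_cpu(unsigned int irq, int stuck_cpu)
{
	struct cpumask mask;

	cpumask_copy(&mask, cpu_online_mask);
	cpumask_clear_cpu(stuck_cpu, &mask);
	if (!cpumask_empty(&mask))
		irq_set_affinity(irq, &mask);
}
```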
> > One last bit of feedback is that I noticed that you didn't try to
> > implement the old "knock" functionality of the old NMI driver that's
> > being deleted. That is: your new patches don't provide an alternate
> > way to drop into the debugger for systems where BREAK isn't hooked up.
> > That's not a hard requirement, but I was kinda hoping for it since I
> > have some systems that haven't routed BREAK properly. ;-)
> >
>
> Yeah, implementing kgdb "knock" functionality via a common hook in
> the serial core is on my TODO list.
>
> >
> > I'll try to get some more detailed feedback in the next few days.
>
> Thanks. I do look forward to your feedback.
>
> -Sumit
>
> >
> > -Doug
Hi,
On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
>
> Allow the serial device interrupt to be requested as an NMI during
> initialization in polling mode. If the irqchip doesn't support the serial
> device interrupt as an NMI, fall back to requesting it as a normal IRQ.
>
> Currently this NMI aware uart port only supports NMI driven programmed
> IO operation whereas DMA operation isn't supported.
>
> And while operating in NMI mode, RX always remains active irrespective
> of whether the corresponding TTY port is active or not. So we directly bail
> out of startup, shutdown and rx_stop APIs if NMI mode is active.
>
> Also, get rid of modification to interrupts enable mask in pl011_hwinit()
> as now we have a proper way to enable interrupts for NMI entry using
> pl011_enable_interrupts().
>
> Signed-off-by: Sumit Garg <[email protected]>
> ---
> drivers/tty/serial/amba-pl011.c | 124 ++++++++++++++++++++++++++++++++++++----
> 1 file changed, 113 insertions(+), 11 deletions(-)
Overall: I ran out of time to do a super full review, but presumably
you're going to spin this series anyway and I'll look at it again
then. For now a few things I noticed below...
> diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
> index 0983c5e..5df1c07 100644
> --- a/drivers/tty/serial/amba-pl011.c
> +++ b/drivers/tty/serial/amba-pl011.c
> @@ -41,6 +41,8 @@
> #include <linux/sizes.h>
> #include <linux/io.h>
> #include <linux/acpi.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
>
> #include "amba-pl011.h"
>
> @@ -347,6 +349,10 @@ static int pl011_fifo_to_tty(struct uart_amba_port *uap)
> if (uart_handle_sysrq_char(&uap->port, ch & 255))
> continue;
>
> + if (uart_nmi_handle_char(&uap->port, ch, UART011_DR_OE, ch,
> + flag))
> + continue;
> +
> uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag);
> }
>
> @@ -1316,6 +1322,9 @@ static void pl011_stop_rx(struct uart_port *port)
> struct uart_amba_port *uap =
> container_of(port, struct uart_amba_port, port);
>
> + if (uart_nmi_active(port))
> + return;
> +
> uap->im &= ~(UART011_RXIM|UART011_RTIM|UART011_FEIM|
> UART011_PEIM|UART011_BEIM|UART011_OEIM);
> pl011_write(uap->im, uap, REG_IMSC);
> @@ -1604,13 +1613,6 @@ static int pl011_hwinit(struct uart_port *port)
> UART011_FEIS | UART011_RTIS | UART011_RXIS,
> uap, REG_ICR);
>
> - /*
> - * Save interrupts enable mask, and enable RX interrupts in case if
> - * the interrupt is used for NMI entry.
> - */
> - uap->im = pl011_read(uap, REG_IMSC);
> - pl011_write(UART011_RTIM | UART011_RXIM, uap, REG_IMSC);
> -
> if (dev_get_platdata(uap->port.dev)) {
> struct amba_pl011_data *plat;
>
> @@ -1711,6 +1713,96 @@ static void pl011_put_poll_char(struct uart_port *port,
> pl011_write(ch, uap, REG_DR);
> }
>
> +static irqreturn_t pl011_nmi_int(int irq, void *dev_id)
> +{
I wish there was a better way to share code between this and
pl011_int(), but I guess it'd be too ugly? If nothing else it feels
like you should do something to make it more obvious to anyone looking
at them that they are sister functions and any change to one of them
should be reflected in the other. Maybe they should be logically next
to each other?
> + struct uart_amba_port *uap = dev_id;
> + unsigned int status, pass_counter = AMBA_ISR_PASS_LIMIT;
> + int handled = 0;
> +
> + status = pl011_read(uap, REG_MIS);
> + if (status) {
> + do {
> + check_apply_cts_event_workaround(uap);
> +
> + pl011_write(status, uap, REG_ICR);
> +
> + if (status & (UART011_RTIS|UART011_RXIS)) {
> + pl011_fifo_to_tty(uap);
> + irq_work_queue(&uap->port.nmi_state.rx_work);
It feels like it might be beneficial to not call irq_work_queue() in a
loop. It doesn't hurt but it feels like, at least, it's going to keep
doing a bunch of atomic operations. It's not like it'll cause the
work to run any sooner because it has to run on the same CPU, right?
> + }
> +
> + if (status & UART011_TXIS)
> + irq_work_queue(&uap->port.nmi_state.tx_work);
Here too...
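One way to act on both of the comments above would be to latch flags inside the loop and queue each irq_work at most once after it. A fragment against the patch's own names, untested:

```c
	bool queue_rx = false, queue_tx = false;

	do {
		/* ... existing per-pass CTS workaround and ICR write ... */
		if (status & (UART011_RTIS | UART011_RXIS)) {
			pl011_fifo_to_tty(uap);
			queue_rx = true;
		}
		if (status & UART011_TXIS)
			queue_tx = true;

		if (pass_counter-- == 0)
			break;

		status = pl011_read(uap, REG_MIS);
	} while (status != 0);

	/* Queue at most once per interrupt instead of once per pass. */
	if (queue_rx)
		irq_work_queue(&uap->port.nmi_state.rx_work);
	if (queue_tx)
		irq_work_queue(&uap->port.nmi_state.tx_work);
```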
> +
> + if (pass_counter-- == 0)
> + break;
> +
> + status = pl011_read(uap, REG_MIS);
> + } while (status != 0);
> + handled = 1;
> + }
> +
> + return IRQ_RETVAL(handled);
> +}
> +
> +static int pl011_allocate_nmi(struct uart_amba_port *uap)
> +{
> + int ret;
> +
> + irq_set_status_flags(uap->port.irq, IRQ_NOAUTOEN);
> + ret = request_nmi(uap->port.irq, pl011_nmi_int, IRQF_PERCPU,
> + "uart-pl011", uap);
> + if (ret) {
> + irq_clear_status_flags(uap->port.irq, IRQ_NOAUTOEN);
> + return ret;
> + }
> +
> + enable_irq(uap->port.irq);
> +
> + return ret;
> +}
> +
> +static void pl011_tx_irq_callback(struct uart_port *port)
> +{
> + struct uart_amba_port *uap =
> + container_of(port, struct uart_amba_port, port);
> +
> + spin_lock(&port->lock);
> + pl011_tx_chars(uap, true);
> + spin_unlock(&port->lock);
> +}
> +
> +static int pl011_poll_init(struct uart_port *port)
> +{
> + struct uart_amba_port *uap =
> + container_of(port, struct uart_amba_port, port);
> + int retval;
> +
> + retval = pl011_hwinit(port);
> + if (retval)
> + goto clk_dis;
I don't think you want "goto clk_dis" here.
> +
> + /* In case NMI isn't supported, fallback to normal interrupt mode */
> + retval = pl011_allocate_nmi(uap);
> + if (retval)
> + return 0;
> +
> + retval = uart_nmi_state_init(port);
> + if (retval)
> + goto clk_dis;
Wouldn't you also need to somehow call free_nmi() in the error case?
> + port->nmi_state.tx_irq_callback = pl011_tx_irq_callback;
> + uart_set_nmi_active(port, true);
> +
> + pl011_enable_interrupts(uap);
> +
> + return 0;
> +
> + clk_dis:
> + clk_disable_unprepare(uap->clk);
> + return retval;
> +}
> +
> #endif /* CONFIG_CONSOLE_POLL */
>
> static bool pl011_split_lcrh(const struct uart_amba_port *uap)
> @@ -1736,8 +1828,6 @@ static void pl011_write_lcr_h(struct uart_amba_port *uap, unsigned int lcr_h)
>
> static int pl011_allocate_irq(struct uart_amba_port *uap)
> {
> - pl011_write(uap->im, uap, REG_IMSC);
> -
> return request_irq(uap->port.irq, pl011_int, IRQF_SHARED, "uart-pl011", uap);
> }
>
> @@ -1748,6 +1838,9 @@ static int pl011_startup(struct uart_port *port)
> unsigned int cr;
> int retval;
>
> + if (uart_nmi_active(port))
> + return 0;
> +
> retval = pl011_hwinit(port);
> if (retval)
> goto clk_dis;
> @@ -1790,6 +1883,9 @@ static int sbsa_uart_startup(struct uart_port *port)
> container_of(port, struct uart_amba_port, port);
> int retval;
>
> + if (uart_nmi_active(port))
> + return 0;
> +
> retval = pl011_hwinit(port);
> if (retval)
> return retval;
> @@ -1859,6 +1955,9 @@ static void pl011_shutdown(struct uart_port *port)
> struct uart_amba_port *uap =
> container_of(port, struct uart_amba_port, port);
>
> + if (uart_nmi_active(port))
> + return;
> +
> pl011_disable_interrupts(uap);
>
> pl011_dma_shutdown(uap);
> @@ -1891,6 +1990,9 @@ static void sbsa_uart_shutdown(struct uart_port *port)
> struct uart_amba_port *uap =
> container_of(port, struct uart_amba_port, port);
>
> + if (uart_nmi_active(port))
> + return;
> +
> pl011_disable_interrupts(uap);
>
> free_irq(uap->port.irq, uap);
> @@ -2142,7 +2244,7 @@ static const struct uart_ops amba_pl011_pops = {
> .config_port = pl011_config_port,
> .verify_port = pl011_verify_port,
> #ifdef CONFIG_CONSOLE_POLL
> - .poll_init = pl011_hwinit,
> + .poll_init = pl011_poll_init,
Do we need to add a "free" at this point?
> .poll_get_char = pl011_get_poll_char,
> .poll_put_char = pl011_put_poll_char,
> #endif
> @@ -2173,7 +2275,7 @@ static const struct uart_ops sbsa_uart_pops = {
> .config_port = pl011_config_port,
> .verify_port = pl011_verify_port,
> #ifdef CONFIG_CONSOLE_POLL
> - .poll_init = pl011_hwinit,
> + .poll_init = pl011_poll_init,
> .poll_get_char = pl011_get_poll_char,
> .poll_put_char = pl011_put_poll_char,
> #endif
> --
> 2.7.4
>
Hi,
On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
>
> Add NMI framework APIs in serial core which can be leveraged by serial
> drivers to have NMI driven serial transfers. These APIs are kept under
> CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> is the only known user to enable NMI driven serial port.
>
> The general idea is to intercept RX characters in NMI context, if those
> are specific to magic sysrq then allow corresponding handler to run in
> NMI context. Otherwise defer all other RX and TX operations to IRQ work
> queue in order to run those in normal interrupt context.
>
> Also, since the magic sysrq entry APIs will need to be invoked from NMI
> context, make those APIs NMI safe by deferring NMI-unsafe work to the
> IRQ work queue.
>
> Signed-off-by: Sumit Garg <[email protected]>
> ---
> drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> include/linux/serial_core.h | 67 ++++++++++++++++++++++
> 2 files changed, 185 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> index 57840cf..6342e90 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> return true;
> }
>
> +#ifdef CONFIG_CONSOLE_POLL
> + if (in_nmi())
> + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> + else
> + schedule_work(&sysrq_enable_work);
> +#else
> schedule_work(&sysrq_enable_work);
> -
> +#endif
It should be a very high bar to have #ifdefs inside functions. I
don't think this meets it. Instead maybe something like this
(untested and maybe slightly wrong syntax, but hopefully makes
sense?):
Outside the function:

#ifdef CONFIG_CONSOLE_POLL
#define queue_port_nmi_work(port, work_type) \
	irq_work_queue(&port->nmi_state.work_type)
#else
#define queue_port_nmi_work(port, work_type)
#endif

...and then:

	if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
		queue_port_nmi_work(port, sysrq_toggle_work);
	else
		schedule_work(&sysrq_enable_work);
---
The whole double-hopping is really quite annoying. I guess
schedule_work() can't be called from NMI context but can be called
from IRQ context? So you need to first transition from NMI context to
IRQ context and then go and schedule the work? Almost feels like we
should just fix schedule_work() to do this double-hop for you if
called from NMI context. Seems like you could even re-use the list
pointers in the work_struct to keep the queue of people who need to be
scheduled from the next irq_work? Worst case it seems like you could
add a schedule_work_nmi() that would do all the hoops for you. ...but
I also know very little about NMI so maybe I'm being naive.
> port->sysrq = 0;
> return true;
> }
> @@ -3273,12 +3279,122 @@ int uart_handle_break(struct uart_port *port)
> port->sysrq = 0;
> }
>
> - if (port->flags & UPF_SAK)
> + if (port->flags & UPF_SAK) {
> +#ifdef CONFIG_CONSOLE_POLL
> + if (in_nmi())
> + irq_work_queue(&port->nmi_state.sysrq_sak_work);
> + else
> + do_SAK(state->port.tty);
> +#else
> do_SAK(state->port.tty);
> +#endif
> + }
Similar comment as above about avoiding #ifdef in functions. NOTE: if
you have something like schedule_work_nmi() I think you could just
modify the do_SAK() function to call it and consider do_SAK() to be
NMI safe.
> return 0;
> }
> EXPORT_SYMBOL_GPL(uart_handle_break);
>
> +#ifdef CONFIG_CONSOLE_POLL
> +int uart_nmi_handle_char(struct uart_port *port, unsigned int status,
> + unsigned int overrun, unsigned int ch,
> + unsigned int flag)
> +{
> + struct uart_nmi_rx_data rx_data;
> +
> + if (!in_nmi())
> + return 0;
> +
> + rx_data.status = status;
> + rx_data.overrun = overrun;
> + rx_data.ch = ch;
> + rx_data.flag = flag;
> +
> + if (!kfifo_in(&port->nmi_state.rx_fifo, &rx_data, 1))
> + ++port->icount.buf_overrun;
> +
> + return 1;
> +}
> +EXPORT_SYMBOL_GPL(uart_nmi_handle_char);
> +
> +static void uart_nmi_rx_work(struct irq_work *rx_work)
> +{
> + struct uart_nmi_state *nmi_state =
> + container_of(rx_work, struct uart_nmi_state, rx_work);
> + struct uart_port *port =
> + container_of(nmi_state, struct uart_port, nmi_state);
> + struct uart_nmi_rx_data rx_data;
> +
> + /*
> + * In polling mode, serial device is initialized much prior to
> + * TTY port becoming active. This scenario is especially useful
> + * from debugging perspective such that magic sysrq or debugger
> + * entry would still be possible even when TTY port isn't
> + * active (consider a boot hang case or if a user hasn't opened
> + * the serial port). So we discard any other RX data apart from
> + * magic sysrq commands in case TTY port isn't active.
> + */
> + if (!port->state || !tty_port_active(&port->state->port)) {
> + kfifo_reset(&nmi_state->rx_fifo);
> + return;
> + }
> +
> + spin_lock(&port->lock);
> + while (kfifo_out(&nmi_state->rx_fifo, &rx_data, 1))
> + uart_insert_char(port, rx_data.status, rx_data.overrun,
> + rx_data.ch, rx_data.flag);
> + spin_unlock(&port->lock);
> +
> + tty_flip_buffer_push(&port->state->port);
> +}
> +
> +static void uart_nmi_tx_work(struct irq_work *tx_work)
> +{
> + struct uart_nmi_state *nmi_state =
> + container_of(tx_work, struct uart_nmi_state, tx_work);
> + struct uart_port *port =
> + container_of(nmi_state, struct uart_port, nmi_state);
> +
> + spin_lock(&port->lock);
> + if (nmi_state->tx_irq_callback)
> + nmi_state->tx_irq_callback(port);
> + spin_unlock(&port->lock);
> +}
> +
> +static void uart_nmi_sak_work(struct irq_work *work)
> +{
> + struct uart_nmi_state *nmi_state =
> + container_of(work, struct uart_nmi_state, sysrq_sak_work);
> + struct uart_port *port =
> + container_of(nmi_state, struct uart_port, nmi_state);
> +
> + do_SAK(port->state->port.tty);
> +}
> +
> +#ifdef CONFIG_MAGIC_SYSRQ_SERIAL
> +static void uart_nmi_toggle_work(struct irq_work *work)
> +{
> + schedule_work(&sysrq_enable_work);
> +}
Nit: weird that it's called "toggle" work but it just wraps the "enable" work.
> +#endif
> +
> +int uart_nmi_state_init(struct uart_port *port)
> +{
> + int ret;
> +
> + ret = kfifo_alloc(&port->nmi_state.rx_fifo, 256, GFP_KERNEL);
> + if (ret)
> + return ret;
> +
> + init_irq_work(&port->nmi_state.rx_work, uart_nmi_rx_work);
> + init_irq_work(&port->nmi_state.tx_work, uart_nmi_tx_work);
> + init_irq_work(&port->nmi_state.sysrq_sak_work, uart_nmi_sak_work);
> +#ifdef CONFIG_MAGIC_SYSRQ_SERIAL
> + init_irq_work(&port->nmi_state.sysrq_toggle_work, uart_nmi_toggle_work);
> +#endif
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(uart_nmi_state_init);
> +#endif
> +
> EXPORT_SYMBOL(uart_write_wakeup);
> EXPORT_SYMBOL(uart_register_driver);
> EXPORT_SYMBOL(uart_unregister_driver);
> diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> index 9fd550e..84487a9 100644
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -18,6 +18,8 @@
> #include <linux/tty.h>
> #include <linux/mutex.h>
> #include <linux/sysrq.h>
> +#include <linux/irq_work.h>
> +#include <linux/kfifo.h>
> #include <uapi/linux/serial_core.h>
>
> #ifdef CONFIG_SERIAL_CORE_CONSOLE
> @@ -103,6 +105,28 @@ struct uart_icount {
> typedef unsigned int __bitwise upf_t;
> typedef unsigned int __bitwise upstat_t;
>
> +#ifdef CONFIG_CONSOLE_POLL
> +struct uart_nmi_rx_data {
> + unsigned int status;
> + unsigned int overrun;
> + unsigned int ch;
> + unsigned int flag;
> +};
> +
> +struct uart_nmi_state {
> + bool active;
> +
> + struct irq_work tx_work;
> + void (*tx_irq_callback)(struct uart_port *port);
> +
> + struct irq_work rx_work;
> + DECLARE_KFIFO_PTR(rx_fifo, struct uart_nmi_rx_data);
> +
> + struct irq_work sysrq_sak_work;
> + struct irq_work sysrq_toggle_work;
> +};
> +#endif
> +
> struct uart_port {
> spinlock_t lock; /* port lock */
> unsigned long iobase; /* in/out[bwl] */
> @@ -255,6 +279,9 @@ struct uart_port {
> struct gpio_desc *rs485_term_gpio; /* enable RS485 bus termination */
> struct serial_iso7816 iso7816;
> void *private_data; /* generic platform data pointer */
> +#ifdef CONFIG_CONSOLE_POLL
> + struct uart_nmi_state nmi_state;
> +#endif
> };
>
> static inline int serial_port_in(struct uart_port *up, int offset)
> @@ -475,4 +502,44 @@ extern int uart_handle_break(struct uart_port *port);
> !((cflag) & CLOCAL))
>
> int uart_get_rs485_mode(struct uart_port *port);
> +
> +/*
> + * The following are helper functions for the NMI aware serial drivers.
> + * Currently NMI support is only enabled under polling mode.
> + */
> +
> +#ifdef CONFIG_CONSOLE_POLL
> +int uart_nmi_state_init(struct uart_port *port);
> +int uart_nmi_handle_char(struct uart_port *port, unsigned int status,
> + unsigned int overrun, unsigned int ch,
> + unsigned int flag);
> +
> +static inline bool uart_nmi_active(struct uart_port *port)
> +{
> + return port->nmi_state.active;
> +}
> +
> +static inline void uart_set_nmi_active(struct uart_port *port, bool val)
> +{
> + port->nmi_state.active = val;
> +}
> +#else
> +static inline int uart_nmi_handle_char(struct uart_port *port,
> + unsigned int status,
> + unsigned int overrun,
> + unsigned int ch, unsigned int flag)
> +{
> + return 0;
> +}
> +
> +static inline bool uart_nmi_active(struct uart_port *port)
> +{
> + return false;
> +}
> +
> +static inline void uart_set_nmi_active(struct uart_port *port, bool val)
> +{
> +}
> +#endif
> +
> #endif /* LINUX_SERIAL_CORE_H */
> --
> 2.7.4
>
Hi,
On Tue, Jul 21, 2020 at 5:10 AM Sumit Garg <[email protected]> wrote:
>
> In a future patch we will add support to the serial core to make it
> possible to trigger a magic sysrq from an NMI context. Prepare for this
> by marking some sysrq actions as NMI safe. Safe actions will be allowed
> to run from NMI context, whilst those that cannot run from an NMI will
> be queued as irq_work for later processing.
>
> A particular sysrq handler is only marked as NMI safe when it does not
> contend for any synchronization primitives, as in NMI context those are
> expected to cause deadlocks. Note that the debug sysrq does not contend
> for any synchronization primitives. It does call kgdb_breakpoint() to
> provoke a trap, but that trap handler should be NMI safe on
> architectures that implement an NMI.
>
> Signed-off-by: Sumit Garg <[email protected]>
> ---
> drivers/tty/sysrq.c | 33 ++++++++++++++++++++++++++++++++-
> include/linux/sysrq.h | 1 +
> kernel/debug/debug_core.c | 1 +
> 3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index 7c95afa9..8017e33 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -50,6 +50,8 @@
> #include <linux/syscalls.h>
> #include <linux/of.h>
> #include <linux/rcupdate.h>
> +#include <linux/irq_work.h>
> +#include <linux/kfifo.h>
>
> #include <asm/ptrace.h>
> #include <asm/irq_regs.h>
> @@ -111,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
> .help_msg = "loglevel(0-9)",
> .action_msg = "Changing Loglevel",
> .enable_mask = SYSRQ_ENABLE_LOG,
> + .nmi_safe = true,
> };
>
> #ifdef CONFIG_VT
> @@ -157,6 +160,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
> .help_msg = "crash(c)",
> .action_msg = "Trigger a crash",
> .enable_mask = SYSRQ_ENABLE_DUMP,
> + .nmi_safe = true,
> };
>
> static void sysrq_handle_reboot(int key)
> @@ -170,6 +174,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
> .help_msg = "reboot(b)",
> .action_msg = "Resetting",
> .enable_mask = SYSRQ_ENABLE_BOOT,
> + .nmi_safe = true,
> };
>
> const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
> @@ -217,6 +222,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
> .handler = sysrq_handle_showlocks,
> .help_msg = "show-all-locks(d)",
> .action_msg = "Show Locks Held",
> + .nmi_safe = true,
> };
> #else
> #define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
> @@ -289,6 +295,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
> .help_msg = "show-registers(p)",
> .action_msg = "Show Regs",
> .enable_mask = SYSRQ_ENABLE_DUMP,
> + .nmi_safe = true,
> };
>
> static void sysrq_handle_showstate(int key)
> @@ -326,6 +333,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
> .help_msg = "dump-ftrace-buffer(z)",
> .action_msg = "Dump ftrace buffer",
> .enable_mask = SYSRQ_ENABLE_DUMP,
> + .nmi_safe = true,
> };
> #else
> #define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
> @@ -538,6 +546,23 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> sysrq_key_table[i] = op_p;
> }
>
> +#define SYSRQ_NMI_FIFO_SIZE 64
> +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
A 64-entry FIFO seems excessive. Quite honestly even a FIFO seems a
bit excessive and it feels like if two sysrqs were received in super
quick succession that it would be OK to just process the first one. I
guess if it simplifies the processing to have a FIFO then it shouldn't
hurt, but no need for 64 entries.
> +static void sysrq_do_nmi_work(struct irq_work *work)
> +{
> + const struct sysrq_key_op *op_p;
> + int key;
> +
> + while (kfifo_out(&sysrq_nmi_fifo, &key, 1)) {
> + op_p = __sysrq_get_key_op(key);
> + if (op_p)
> + op_p->handler(key);
> + }
Do you need to manage "suppress_printk" in this function? Do you need
to call rcu_sysrq_start() and rcu_read_lock()?
If so, how do you prevent racing between the mucking we're doing with
these things and the mucking that the NMI does with them?
> +}
> +
> +static DEFINE_IRQ_WORK(sysrq_nmi_work, sysrq_do_nmi_work);
> +
> void __handle_sysrq(int key, bool check_mask)
> {
> const struct sysrq_key_op *op_p;
> @@ -568,7 +593,13 @@ void __handle_sysrq(int key, bool check_mask)
> if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
> pr_info("%s\n", op_p->action_msg);
> console_loglevel = orig_log_level;
> - op_p->handler(key);
> +
> + if (in_nmi() && !op_p->nmi_safe) {
> + kfifo_in(&sysrq_nmi_fifo, &key, 1);
Rather than kfifo_in() and kfifo_out(), I think you can use
kfifo_put() and kfifo_get(). As I understand it those just get/put
one element which is what you want.
> + irq_work_queue(&sysrq_nmi_work);
Wishful thinking, but (as far as I can tell) irq_work_queue() only
queues work on the CPU running the NMI. I don't have lots of NMI
experience, but any chance there is a variant that will queue work on
any CPU? Then sysrq handlers that aren't NMI aware will be more
likely to work.
> + } else {
> + op_p->handler(key);
> + }
> } else {
> pr_info("This sysrq operation is disabled.\n");
> console_loglevel = orig_log_level;
> diff --git a/include/linux/sysrq.h b/include/linux/sysrq.h
> index 3a582ec..630b5b9 100644
> --- a/include/linux/sysrq.h
> +++ b/include/linux/sysrq.h
> @@ -34,6 +34,7 @@ struct sysrq_key_op {
> const char * const help_msg;
> const char * const action_msg;
> const int enable_mask;
> + const bool nmi_safe;
> };
>
> #ifdef CONFIG_MAGIC_SYSRQ
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 9e59347..2b51173 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -943,6 +943,7 @@ static const struct sysrq_key_op sysrq_dbg_op = {
> .handler = sysrq_handle_dbg,
> .help_msg = "debug(g)",
> .action_msg = "DEBUG",
> + .nmi_safe = true,
> };
> #endif
>
> --
> 2.7.4
>
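Doug's kfifo suggestion above would make the fifo hunks read roughly as follows. This is a sketch, not a tested revision of the patch; kfifo_put() and kfifo_get() operate on a single element, which matches the usage here. On the other question raised above: irq_work_queue_on() does exist for targeting a specific CPU, but its remote path is not NMI-safe (the implementation warns when called from NMI context), so it would not help directly in this code path.

```c
/* Sketch of the sysrq NMI hunks using the single-element kfifo API
 * (illustrative rework of the patch above, untested): */
static void sysrq_do_nmi_work(struct irq_work *work)
{
	const struct sysrq_key_op *op_p;
	int key;

	/* kfifo_get() returns 0 once the fifo is empty */
	while (kfifo_get(&sysrq_nmi_fifo, &key)) {
		op_p = __sysrq_get_key_op(key);
		if (op_p)
			op_p->handler(key);
	}
}

/* ...and in __handle_sysrq(): */
	if (in_nmi() && !op_p->nmi_safe) {
		/* kfifo_put() returns 0 (key dropped) when the fifo is full */
		kfifo_put(&sysrq_nmi_fifo, key);
		irq_work_queue(&sysrq_nmi_work);
	} else {
		op_p->handler(key);
	}
```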
Hi,
On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Wed, Aug 12, 2020 at 7:53 AM Sumit Garg <[email protected]> wrote:
> >
> > Hi Doug,
> >
> > On Tue, 11 Aug 2020 at 22:46, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Aug 11, 2020 at 7:58 AM Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Aug 11, 2020 at 07:59:24PM +0530, Sumit Garg wrote:
> > > > > Hi Greg,
> > > > >
> > > > > Thanks for your comments.
> > > > >
> > > > > On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
> > > > > > > On Tue, 21 Jul 2020 at 17:40, Sumit Garg <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Make it possible for UARTs to trigger magic sysrq from an NMI. With the
> > > > > > > > advent of pseudo NMIs on arm64 it became quite generic to request serial
> > > > > > > > device interrupt as an NMI rather than IRQ. And having NMI driven serial
> > > > > > > > RX will allow us to trigger magic sysrq as an NMI and hence drop into
> > > > > > > > kernel debugger in NMI context.
> > > > > > > >
> > > > > > > > The major use-case is to add NMI debugging capabilities to the kernel
> > > > > > > > in order to debug scenarios such as:
> > > > > > > > - Primary CPU is stuck in deadlock with interrupts disabled and hence
> > > > > > > > doesn't honor serial device interrupt. So having magic sysrq triggered
> > > > > > > > as an NMI is helpful for debugging.
> > > > > > > > - Always enabled NMI based magic sysrq irrespective of whether the serial
> > > > > > > > TTY port is active or not.
> > > > > > > >
> > > > > > > > Currently there is an existing kgdb NMI serial driver which provides
> > > > > > > > partial implementation in upstream to have a separate ttyNMI0 port but
> > > > > > > > that remained in silos with the serial core/drivers which made it a bit
> > > > > > > > odd to enable using serial device interrupt and hence remained unused. It
> > > > > > > > seems to be clearly intended to avoid almost all custom NMI changes to
> > > > > > > > the UART driver.
> > > > > > > >
> > > > > > > > But this patch-set allows the serial core/drivers to be NMI aware which
> > > > > > > > in turn provides NMI debugging capabilities via magic sysrq and hence
> > > > > > > > there is no specific reason to keep this special driver. So remove it
> > > > > > > > instead.
> > > > > > > >
> > > > > > > > Approach:
> > > > > > > > ---------
> > > > > > > >
> > > > > > > > The overall idea is to intercept serial RX characters in NMI context, if
> > > > > > > > those are specific to magic sysrq then allow corresponding handler to run
> > > > > > > > in NMI context. Otherwise, defer all other RX and TX operations onto IRQ
> > > > > > > > work queue in order to run those in normal interrupt context.
> > > > > > > >
> > > > > > > > This approach is demonstrated using amba-pl011 driver.
> > > > > > > >
> > > > > > > > Patch-wise description:
> > > > > > > > -----------------------
> > > > > > > >
> > > > > > > > Patch #1 prepares magic sysrq handler to be NMI aware.
> > > > > > > > Patch #2 adds NMI framework to serial core.
> > > > > > > > Patch #3 and #4 demonstrates NMI aware uart port using amba-pl011 driver.
> > > > > > > > Patch #5 removes kgdb NMI serial driver.
> > > > > > > >
> > > > > > > > Goal of this RFC:
> > > > > > > > -----------------
> > > > > > > >
> > > > > > > > My main reason for sharing this as an RFC is to help decide whether or
> > > > > > > > not to continue with this approach. The next step for me would be to port
> > > > > > > > the work to a system with an 8250 UART.
> > > > > > > >
> > > > > > >
> > > > > > > A gentle reminder to seek feedback on this series.
> > >
> > > It's been on my list for a while. I started it Friday but ran out of
> > > time. This week hasn't been going as smoothly as I hoped but I'll
> > > prioritize this since it's been too long.
> > >
> >
> > No worries and thanks for your feedback.
> >
> > >
> > > > > > It's the middle of the merge window, and I can't do anything.
> > > > > >
> > > > > > Also, I almost never review RFC patches as I have way too many
> > > > > > patches that people think are "right" to review first...
> > > > > >
> > > > >
> > > > > Okay, I understand and I can definitely wait for your feedback.
> > > >
> > > > My feedback here is this:
> > > >
> > > > > > I suggest you work to flesh this out first and submit something that you
> > > > > > feel works properly.
> > > >
> > > > :)
> > > >
> > > > > IIUC, in order to make this approach substantial I need to make it
> > > > > work with 8250 UART (major serial driver), correct? As currently it
> > > > > works properly for amba-pl011 driver.
> > > >
> > > > Yes, try to do that, or better yet, make it work with all serial drivers
> > > > automatically.
> > >
> > > A bit of early feedback...
> > >
> > > Although I'm not sure we can do Greg's "make it work everywhere
> > > automatically", it's possible you could get half of your patch done
> > > automatically. Specifically, your patch really does two things:
> > >
> > > a) It leaves the serial port "active" all the time to look for sysrq.
> > > In other words even if there is no serial client it's always reading
> > > the port looking for characters. IMO this concept should be separated
> > > out from the NMI concept and _could_ automatically work for all serial
> > > drivers. You'd just need something in the serial core that acted like
> > > a default client if nobody else opened the serial port. The nice
> > > thing here is that we go through all the normal code paths and don't
> > > need special cases in the driver.
> >
> > Okay, I will try to explore this option of a default serial port
> > client. Would this client be active in normal serial operation or only
> > active when we have kgdb enabled? One drawback I see for normal
> > operation could be power management, e.g. if the user is not using the
> > serial port and would like to disable the corresponding clock to reduce
> > power consumption.
>
> If I could pick the ideal, I'd say we'd do it any time the console is
> configured for that port and magic sysrq is enabled. Presumably if
> they're already choosing to output kernel log messages to the serial
> port and they've enabled magic sysrq they're in a state where they'd
> be OK with the extra power of also listening for characters?
>
>
> > > b) It enables NMI for your particular serial driver. This seems like
> > > it'd be hard to do automatically because you can't do the same things
> > > at NMI that you could do in a normal interrupt handler.
> >
> > Agree.
> >
> > >
> > > NOTE: to me, a) is more important than b) (though it'd be nice to have
> > > both). This would be especially true the earlier you could make a)
> > > work since the main time when an "agetty" isn't running on my serial
> > > port to read characters is during bootup.
> > >
> > > Why is b) less important to me? Sure, it would let you drop into the
> > > debugger in the case where the CPU handling serial port interrupts is
> > > hung with IRQs disabled, but it _wouldn't_ let you drop into the
> > > debugger in the case where a different CPU is hung with IRQs disabled.
> > > To get that we need NMI roundup (which, I know, you are also working
> > > on for arm64). ...and, if we've got NMI roundup, presumably we can
> > > find our way into the debugger by either moving the serial interrupt
> > > to a different CPU ahead of time or using some type of lockup detector
> > > (which I know you are also working on for arm64).
> > >
> >
> > Thanks for sharing your preferences. I will try to get a) sorted out first.
> >
> > Overall I agree with your approaches to debug hard-lockup scenarios
> > but they might not be so trivial for kernel engineers who don't
> > possess the kernel debugging experience you do. :)
> >
> > And I still think NMI aware magic sysrq is useful for scenarios such as:
> > - Try to get system information during hard-lockup rather than just
> > panic via hard-lockup detection.
> > - Do normal start/stop debugger activity on a core which was stuck in
> > hard-lockup.
> > - Random boot freezes which are not easily reproducible.
>
> Don't get me wrong. Having sysrq from NMI seems like a good feature
> to me. That being said, it will require non-trivial changes to each
> serial driver to support it and that means that not all serial drivers
> will support it. It also starts requiring knowledge of how NMIs work
> (what's allowed in NMI mode / not allowed / how to avoid races) for
> authors of serial drivers. I have a bit of a worry that the benefit
> won't outweigh the extra complexity, but I guess time will tell. One
> last worry is that I assume that most people testing (and even
> automated testing labs) will either always enable NMI or won't enable
> NMI. That means that everyone will be only testing one codepath or
> the other and (given the complexity) the non-tested codepath will
> break.
>
> Hrm. Along the lines of the above, though: almost no modern systems
> are uniprocessor. That means that even if one CPU is stuck with IRQs
> off it's fairly likely that some other CPU is OK. Presumably you'd
> get almost as much benefit as your patch but with more done
> automatically if you could figure out how to detect that the serial
> interrupt isn't being serviced and re-route it to a different CPU.
> ...or possibly you could use some variant of the hard lockup detector
> and move all interrupts off a locked up CPU? You could make this an
> option that's "default Y" when kgdb is turned on or something?
One other idea occurred to me that's maybe simpler. You could in
theory just poll the serial port periodically to accomplish this. It would
actually probably even work to call the normal serial port interrupt
routine from any random CPU. On many serial drivers the entire
interrupt handler is wrapped with:
spin_lock_irqsave(&uap->port.lock, flags);
...
spin_unlock_irqrestore(&uap->port.lock, flags);
And a few (the ones I was involved in fixing) have the similar pattern
of using uart_unlock_and_check_sysrq().
Any serial drivers following this pattern could have their interrupt
routine called periodically just to poll for characters and it'd be
fine, right? ...and having it take a second before a sysrq comes in
this case is probably not the end of the world?
One nice benefit of this is that it would actually work _better_ on
SMP systems for any sysrqs that aren't NMI safe. Specifically with
your patch series those would be queued with irq_work_queue() which
means they'd be blocked if the CPU processing the NMI is stuck with
IRQs disabled. With the polling mechanism they'd nicely just run on a
different CPU.
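For drivers following the locking patterns described above, the polling fallback might look something like this. All of the `sysrq_poll_*` names and the ISR function pointer are hypothetical (they stand in for whatever handler the driver registers), and the one-second period is arbitrary:

```c
/* Sketch of periodic sysrq polling (hypothetical; names and the
 * ISR hook below are assumptions, not existing serial-core API): */
static struct delayed_work sysrq_poll_work;
static struct uart_port *sysrq_poll_port;
static irqreturn_t (*sysrq_poll_isr)(int irq, void *dev_id);

static void sysrq_poll_fn(struct work_struct *work)
{
	/*
	 * Safe for ISRs that take port->lock with irqsave (or use
	 * uart_unlock_and_check_sysrq()): they may run on any CPU.
	 */
	sysrq_poll_isr(sysrq_poll_port->irq, sysrq_poll_port);
	schedule_delayed_work(&sysrq_poll_work, HZ);	/* re-arm, ~1s */
}

static void sysrq_poll_start(struct uart_port *port,
			     irqreturn_t (*isr)(int, void *))
{
	sysrq_poll_port = port;
	sysrq_poll_isr = isr;
	INIT_DELAYED_WORK(&sysrq_poll_work, sysrq_poll_fn);
	schedule_delayed_work(&sysrq_poll_work, HZ);
}
```

One design point worth noting: because the work item runs in ordinary process context on whichever CPU the workqueue picks, sysrq handlers reached this way avoid the NMI-safety question entirely, at the cost of up to a poll period of latency.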
> > > One last bit of feedback is that I noticed that you didn't try to
> > > implement the old "knock" functionality of the old NMI driver that's
> > > being deleted. That is: your new patches don't provide an alternate
> > > way to drop into the debugger for systems where BREAK isn't hooked up.
> > > That's not a hard requirement, but I was kinda hoping for it since I
> > > have some systems that haven't routed BREAK properly. ;-)
> > >
> >
> > Yeah, this is on my TODO list to have a kgdb "knock" functionality to
> > be implemented via a common hook in serial core.
> >
> > >
> > > I'll try to get some more detailed feedback in the next few days.
> >
> > Thanks. I do look forward to your feedback.
> >
> > -Sumit
> >
> > >
> > > -Doug
On Thu, 13 Aug 2020 at 05:38, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
>
> On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Wed, Aug 12, 2020 at 7:53 AM Sumit Garg <[email protected]> wrote:
> > >
> > > [snip]
> > If I could pick the ideal, I'd say we'd do it any time the console is
> > configured for that port and magic sysrq is enabled. Presumably if
> > they're already choosing to output kernel log messages to the serial
> > port and they've enabled magic sysrq they're in a state where they'd
> > be OK with the extra power of also listening for characters?
> >
Okay, sounds reasonable to me.
> >
> > > [snip]
> >
> > Don't get me wrong. Having sysrq from NMI seems like a good feature
> > to me.
Yeah I understand what you meant but I was just trying to highlight
additional benefits that sysrq from NMI provides.
> > That being said, it will require non-trivial changes to each
> > serial driver to support it and that means that not all serial drivers
> > will support it. It also starts requiring knowledge of how NMIs work
> > (what's allowed in NMI mode / not allowed / how to avoid races) for
> > authors of serial drivers. I have a bit of a worry that the benefit
> > won't outweigh the extra complexity, but I guess time will tell.
Yes, I understand these concerns as well. That's why I have tried to
defer almost all of the work to the IRQ work queue, apart from the
essential parsing of RX chars in NMI mode, and to keep most of this
code common in the serial core.
> > One
> > last worry is that I assume that most people testing (and even
> > automated testing labs) will either always enable NMI or won't enable
> > NMI. That means that everyone will be only testing one codepath or
> > the other and (given the complexity) the non-tested codepath will
> > break.
> >
The current patch-set only enables NMI mode when the debugger (kgdb)
is enabled, which I think is mostly suitable for development
environments. So most testing will exercise the existing IRQ mode
only.
However, it's very much possible to make NMI mode the default for a
particular serial driver if the underlying irqchip supports it, but
that depends on whether we really see any production-level usage of
the NMI debug feature.
> > Hrm. Along the lines of the above, though: almost no modern systems
> > are uniprocessor. That means that even if one CPU is stuck with IRQs
> > off it's fairly likely that some other CPU is OK. Presumably you'd
> > get almost as much benefit as your patch but with more done
> > automatically if you could figure out how to detect that the serial
> > interrupt isn't being serviced and re-route it to a different CPU.
> > ...or possibly you could use some variant of the hard lockup detector
> > and move all interrupts off a locked up CPU? You could make this an
> > option that's "default Y" when kgdb is turned on or something?
Yes, we can reroute serial interrupts, but what I meant to say is that
rerouting at most gets us a backtrace of the CPU stuck in a hard
lockup, whereas with an NMI debugger entry on the locked-up CPU we can
do normal start/stop/single-step debugging on that CPU.
>
> One other idea occurred to me that's maybe simpler. You could in
> theory just poll the serial port periodically to accomplish. It would
> actually probably even work to call the normal serial port interrupt
> routine from any random CPU. On many serial drivers the entire
> interrupt handler is wrapped with:
>
> spin_lock_irqsave(&uap->port.lock, flags);
> ...
> spin_unlock_irqrestore(&uap->port.lock, flags);
>
> And a few (the ones I was involved in fixing) have the similar pattern
> of using uart_unlock_and_check_sysrq().
>
> Any serial drivers following this pattern could have their interrupt
> routine called periodically just to poll for characters and it'd be
> fine, right? ...and having it take a second before a sysrq comes in
> this case is probably not the end of the world?
>
Are you proposing to have the complete RX operation run in polling mode
with the RX interrupt disabled (e.g. using a kernel thread)?
>
> One nice benefit of this is that it would actually work _better_ on
> SMP systems for any sysrqs that aren't NMI safe. Specifically with
> your patch series those would be queued with irq_work_queue() which
> means they'd be blocked if the CPU processing the NMI is stuck with
> IRQs disabled.
Yes, the sysrq handlers which aren't NMI safe will behave similarly to
the existing IRQ-based sysrq handlers.
> With the polling mechanism they'd nicely just run on a
> different CPU.
It looks like polling would add significant CPU overhead, so I am not
sure that is the preferred approach.
-Sumit
>
>
> > > > One last bit of feedback is that I noticed that you didn't try to
> > > > implement the old "knock" functionality of the old NMI driver that's
> > > > being deleted. That is: your new patches don't provide an alternate
> > > > way to drop into the debugger for systems where BREAK isn't hooked up.
> > > > That's not a hard requirement, but I was kinda hoping for it since I
> > > > have some systems that haven't routed BREAK properly. ;-)
> > > >
> > >
> > > Yeah, this is on my TODO list to have a kgdb "knock" functionality to
> > > be implemented via a common hook in serial core.
> > >
> > > >
> > > > I'll try to get some more detailed feedback in the next few days.
> > >
> > > Thanks. I do look forward to your feedback.
> > >
> > > -Sumit
> > >
> > > >
> > > > -Doug
On Thu, Aug 13, 2020 at 02:55:12PM +0530, Sumit Garg wrote:
> On Thu, 13 Aug 2020 at 05:38, Doug Anderson <[email protected]> wrote:
> > On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
> > > One
> > > last worry is that I assume that most people testing (and even
> > > automated testing labs) will either always enable NMI or won't enable
> > > NMI. That means that everyone will be only testing one codepath or
> > > the other and (given the complexity) the non-tested codepath will
> > > break.
> > >
>
> The current patch-set only makes this NMI to work when debugger (kgdb)
> is enabled which I think is mostly suitable for development
> environments. So most people testing will involve existing IRQ mode
> only.
>
> However, it's very much possible to make NMI mode as default for a
> particular serial driver if the underlying irqchip supports it but it
> depends if we really see any production level usage of NMI debug
> feature.
The effect of this patch is not to make kgdb work from NMI; it is to
make (some) SysRqs work from NMI. I think that only allowing it to be
deployed for kgdb users is a mistake.
Having it deploy automatically for kgdb users might be OK, but it
seems sensible to make this feature available to other users too.
Daniel.
On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> >
> > Allow serial device interrupt to be requested as an NMI during
> > initialization in polling mode. If the irqchip doesn't support serial
> > device interrupt as an NMI, then fall back to requesting it as a normal IRQ.
> >
> > Currently this NMI aware uart port only supports NMI driven programmed
> > IO operation whereas DMA operation isn't supported.
> >
> > And while operating in NMI mode, RX always remains active irrespective
> > of whether corresponding TTY port is active or not. So we directly bail
> > out of startup, shutdown and rx_stop APIs if NMI mode is active.
> >
> > Also, get rid of modification to interrupts enable mask in pl011_hwinit()
> > as now we have a proper way to enable interrupts for NMI entry using
> > pl011_enable_interrupts().
> >
> > Signed-off-by: Sumit Garg <[email protected]>
> > ---
> > drivers/tty/serial/amba-pl011.c | 124 ++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 113 insertions(+), 11 deletions(-)
>
> Overall: I ran out of time to do a super full review, but presumably
> you're going to spin this series anyway and I'll look at it again
> then. For now a few things I noticed below...
>
Sure and thanks for your review.
>
> > diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
> > index 0983c5e..5df1c07 100644
> > --- a/drivers/tty/serial/amba-pl011.c
> > +++ b/drivers/tty/serial/amba-pl011.c
> > @@ -41,6 +41,8 @@
> > #include <linux/sizes.h>
> > #include <linux/io.h>
> > #include <linux/acpi.h>
> > +#include <linux/irq.h>
> > +#include <linux/irqdesc.h>
> >
> > #include "amba-pl011.h"
> >
> > @@ -347,6 +349,10 @@ static int pl011_fifo_to_tty(struct uart_amba_port *uap)
> > if (uart_handle_sysrq_char(&uap->port, ch & 255))
> > continue;
> >
> > + if (uart_nmi_handle_char(&uap->port, ch, UART011_DR_OE, ch,
> > + flag))
> > + continue;
> > +
> > uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag);
> > }
> >
> > @@ -1316,6 +1322,9 @@ static void pl011_stop_rx(struct uart_port *port)
> > struct uart_amba_port *uap =
> > container_of(port, struct uart_amba_port, port);
> >
> > + if (uart_nmi_active(port))
> > + return;
> > +
> > uap->im &= ~(UART011_RXIM|UART011_RTIM|UART011_FEIM|
> > UART011_PEIM|UART011_BEIM|UART011_OEIM);
> > pl011_write(uap->im, uap, REG_IMSC);
> > @@ -1604,13 +1613,6 @@ static int pl011_hwinit(struct uart_port *port)
> > UART011_FEIS | UART011_RTIS | UART011_RXIS,
> > uap, REG_ICR);
> >
> > - /*
> > - * Save interrupts enable mask, and enable RX interrupts in case if
> > - * the interrupt is used for NMI entry.
> > - */
> > - uap->im = pl011_read(uap, REG_IMSC);
> > - pl011_write(UART011_RTIM | UART011_RXIM, uap, REG_IMSC);
> > -
> > if (dev_get_platdata(uap->port.dev)) {
> > struct amba_pl011_data *plat;
> >
> > @@ -1711,6 +1713,96 @@ static void pl011_put_poll_char(struct uart_port *port,
> > pl011_write(ch, uap, REG_DR);
> > }
> >
> > +static irqreturn_t pl011_nmi_int(int irq, void *dev_id)
> > +{
>
> I wish there was a better way to share code between this and
> pl011_int(), but I guess it'd be too ugly? If nothing else it feels
> like you should do something to make it more obvious to anyone looking
> at them that they are sister functions and any change to one of them
> should be reflected in the other. Maybe they should be logically next
> to each other?
>
Yes, I can make them sit logically next to each other.
>
> > + struct uart_amba_port *uap = dev_id;
> > + unsigned int status, pass_counter = AMBA_ISR_PASS_LIMIT;
> > + int handled = 0;
> > +
> > + status = pl011_read(uap, REG_MIS);
> > + if (status) {
> > + do {
> > + check_apply_cts_event_workaround(uap);
> > +
> > + pl011_write(status, uap, REG_ICR);
> > +
> > + if (status & (UART011_RTIS|UART011_RXIS)) {
> > + pl011_fifo_to_tty(uap);
> > + irq_work_queue(&uap->port.nmi_state.rx_work);
>
> It feels like it might be beneficial to not call irq_work_queue() in a
> loop. It doesn't hurt but it feels like, at least, it's going to keep
> doing a bunch of atomic operations. It's not like it'll cause the
> work to run any sooner because it has to run on the same CPU, right?
>
AFAIK, the loop here is about re-checking the interrupt status in case
another interrupt is raised while we are servicing the prior one. But
I think that's an unlikely case here, since we defer the actual work
and the serial transfer rate is slow.
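Doug's suggestion, queueing each irq_work at most once per interrupt rather than once per loop pass, could look roughly like this. A userspace sketch with made-up status bits and counters standing in for `irq_work_queue()`; the real handler would of course still drain the FIFO inside the loop.

```c
#include <assert.h>

#define RXIS 0x1	/* stand-ins for the UART011_RTIS/RXIS/TXIS bits */
#define TXIS 0x2

static int rx_queue_calls;	/* counts irq_work_queue(&rx_work) */
static int tx_queue_calls;	/* counts irq_work_queue(&tx_work) */

/* Accumulate the status bits seen across all passes of the loop, then
 * queue each deferred work item at most once after the loop ends. */
static void handle_interrupt(const unsigned int *mis_reads, int n)
{
	unsigned int pending = 0;
	int i;

	for (i = 0; i < n; i++)
		pending |= mis_reads[i];	/* FIFO draining elided */

	if (pending & RXIS)
		rx_queue_calls++;		/* queue RX work once */
	if (pending & TXIS)
		tx_queue_calls++;		/* queue TX work once */
}
```

This avoids the repeated atomic operations of per-iteration queueing; since irq_work runs on the same CPU anyway, queueing earlier would not make it run sooner.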
>
> > + }
> > +
> > + if (status & UART011_TXIS)
> > + irq_work_queue(&uap->port.nmi_state.tx_work);
>
> Here too...
>
Ditto.
>
> > +
> > + if (pass_counter-- == 0)
> > + break;
> > +
> > + status = pl011_read(uap, REG_MIS);
> > + } while (status != 0);
> > + handled = 1;
> > + }
> > +
> > + return IRQ_RETVAL(handled);
> > +}
> > +
> > +static int pl011_allocate_nmi(struct uart_amba_port *uap)
> > +{
> > + int ret;
> > +
> > + irq_set_status_flags(uap->port.irq, IRQ_NOAUTOEN);
> > + ret = request_nmi(uap->port.irq, pl011_nmi_int, IRQF_PERCPU,
> > + "uart-pl011", uap);
> > + if (ret) {
> > + irq_clear_status_flags(uap->port.irq, IRQ_NOAUTOEN);
> > + return ret;
> > + }
> > +
> > + enable_irq(uap->port.irq);
> > +
> > + return ret;
> > +}
> > +
> > +static void pl011_tx_irq_callback(struct uart_port *port)
> > +{
> > + struct uart_amba_port *uap =
> > + container_of(port, struct uart_amba_port, port);
> > +
> > + spin_lock(&port->lock);
> > + pl011_tx_chars(uap, true);
> > + spin_unlock(&port->lock);
> > +}
> > +
> > +static int pl011_poll_init(struct uart_port *port)
> > +{
> > + struct uart_amba_port *uap =
> > + container_of(port, struct uart_amba_port, port);
> > + int retval;
> > +
> > + retval = pl011_hwinit(port);
> > + if (retval)
> > + goto clk_dis;
>
> I don't think you want "goto clk_dis" here.
>
Yeah, will fix it to return here instead.
>
> > +
> > + /* In case NMI isn't supported, fallback to normal interrupt mode */
> > + retval = pl011_allocate_nmi(uap);
> > + if (retval)
> > + return 0;
> > +
> > + retval = uart_nmi_state_init(port);
> > + if (retval)
> > + goto clk_dis;
>
> Wouldn't you also need to somehow call free_nmi() in the error case?
>
Yes, will fix it.
>
> > + port->nmi_state.tx_irq_callback = pl011_tx_irq_callback;
> > + uart_set_nmi_active(port, true);
> > +
> > + pl011_enable_interrupts(uap);
> > +
> > + return 0;
> > +
> > + clk_dis:
> > + clk_disable_unprepare(uap->clk);
> > + return retval;
> > +}
> > +
> > #endif /* CONFIG_CONSOLE_POLL */
> >
> > static bool pl011_split_lcrh(const struct uart_amba_port *uap)
> > @@ -1736,8 +1828,6 @@ static void pl011_write_lcr_h(struct uart_amba_port *uap, unsigned int lcr_h)
> >
> > static int pl011_allocate_irq(struct uart_amba_port *uap)
> > {
> > - pl011_write(uap->im, uap, REG_IMSC);
> > -
> > return request_irq(uap->port.irq, pl011_int, IRQF_SHARED, "uart-pl011", uap);
> > }
> >
> > @@ -1748,6 +1838,9 @@ static int pl011_startup(struct uart_port *port)
> > unsigned int cr;
> > int retval;
> >
> > + if (uart_nmi_active(port))
> > + return 0;
> > +
> > retval = pl011_hwinit(port);
> > if (retval)
> > goto clk_dis;
> > @@ -1790,6 +1883,9 @@ static int sbsa_uart_startup(struct uart_port *port)
> > container_of(port, struct uart_amba_port, port);
> > int retval;
> >
> > + if (uart_nmi_active(port))
> > + return 0;
> > +
> > retval = pl011_hwinit(port);
> > if (retval)
> > return retval;
> > @@ -1859,6 +1955,9 @@ static void pl011_shutdown(struct uart_port *port)
> > struct uart_amba_port *uap =
> > container_of(port, struct uart_amba_port, port);
> >
> > + if (uart_nmi_active(port))
> > + return;
> > +
> > pl011_disable_interrupts(uap);
> >
> > pl011_dma_shutdown(uap);
> > @@ -1891,6 +1990,9 @@ static void sbsa_uart_shutdown(struct uart_port *port)
> > struct uart_amba_port *uap =
> > container_of(port, struct uart_amba_port, port);
> >
> > + if (uart_nmi_active(port))
> > + return;
> > +
> > pl011_disable_interrupts(uap);
> >
> > free_irq(uap->port.irq, uap);
> > @@ -2142,7 +2244,7 @@ static const struct uart_ops amba_pl011_pops = {
> > .config_port = pl011_config_port,
> > .verify_port = pl011_verify_port,
> > #ifdef CONFIG_CONSOLE_POLL
> > - .poll_init = pl011_hwinit,
> > + .poll_init = pl011_poll_init,
>
> Do we need to add a "free" at this point?
>
Where do you envision it being used? Currently, once we enable NMI we
would like it to remain active throughout the boot cycle.
-Sumit
>
>
> > .poll_get_char = pl011_get_poll_char,
> > .poll_put_char = pl011_put_poll_char,
> > #endif
> > @@ -2173,7 +2275,7 @@ static const struct uart_ops sbsa_uart_pops = {
> > .config_port = pl011_config_port,
> > .verify_port = pl011_verify_port,
> > #ifdef CONFIG_CONSOLE_POLL
> > - .poll_init = pl011_hwinit,
> > + .poll_init = pl011_poll_init,
> > .poll_get_char = pl011_get_poll_char,
> > .poll_put_char = pl011_put_poll_char,
> > #endif
> > --
> > 2.7.4
> >
On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> >
> > Add NMI framework APIs in serial core which can be leveraged by serial
> > drivers to have NMI driven serial transfers. These APIs are kept under
> > CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> > is the only known user to enable NMI driven serial port.
> >
> > The general idea is to intercept RX characters in NMI context, if those
> > are specific to magic sysrq then allow corresponding handler to run in
> > NMI context. Otherwise defer all other RX and TX operations to IRQ work
> > queue in order to run those in normal interrupt context.
> >
> > Also, since magic sysrq entry APIs will need to be invoked from NMI
> > context, so make those APIs NMI safe via deferring NMI unsafe work to
> > IRQ work queue.
> >
> > Signed-off-by: Sumit Garg <[email protected]>
> > ---
> > drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> > include/linux/serial_core.h | 67 ++++++++++++++++++++++
> > 2 files changed, 185 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > index 57840cf..6342e90 100644
> > --- a/drivers/tty/serial/serial_core.c
> > +++ b/drivers/tty/serial/serial_core.c
> > @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> > return true;
> > }
> >
> > +#ifdef CONFIG_CONSOLE_POLL
> > + if (in_nmi())
> > + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> > + else
> > + schedule_work(&sysrq_enable_work);
> > +#else
> > schedule_work(&sysrq_enable_work);
> > -
> > +#endif
>
> It should be a very high bar to have #ifdefs inside functions. I
> don't think this meets it. Instead maybe something like this
> (untested and maybe slightly wrong syntax, but hopefully makes
> sense?):
>
> Outside the function:
>
> #ifdef CONFIG_CONSOLE_POLL
> #define queue_port_nmi_work(port, work_type)
> irq_work_queue(&port->nmi_state.work_type)
> #else
> #define queue_port_nmi_work(port, work_type)
> #endif
>
> ...and then:
>
> if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
> queue_port_nmi_work(port, sysrq_toggle_work);
> else
> schedule_work(&sysrq_enable_work);
>
> ---
>
> The whole double-hopping is really quite annoying. I guess
> schedule_work() can't be called from NMI context but can be called
> from IRQ context? So you need to first transition from NMI context to
> IRQ context and then go and schedule the work? Almost feels like we
> should just fix schedule_work() to do this double-hop for you if
> called from NMI context. Seems like you could even re-use the list
> pointers in the work_struct to keep the queue of people who need to be
> scheduled from the next irq_work? Worst case it seems like you could
> add a schedule_work_nmi() that would do all the hoops for you. ...but
> I also know very little about NMI so maybe I'm being naive.
>
Thanks for this suggestion, and yes, indeed we could make
schedule_work() NMI safe and in turn get rid of all these #ifdefs.
Have a look at the changes below:
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 26de0ca..1daf1b4 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -14,6 +14,7 @@
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/rcupdate.h>
+#include <linux/irq_work.h>
struct workqueue_struct;
@@ -106,6 +107,7 @@ struct work_struct {
#ifdef CONFIG_LOCKDEP
struct lockdep_map lockdep_map;
#endif
+ struct irq_work iw;
};
#define WORK_DATA_INIT() ATOMIC_LONG_INIT((unsigned
long)WORK_STRUCT_NO_POOL)
@@ -478,6 +480,8 @@ extern void print_worker_info(const char *log_lvl,
struct task_struct *task);
extern void show_workqueue_state(void);
extern void wq_worker_comm(char *buf, size_t size, struct task_struct *task);
+extern void queue_work_nmi(struct irq_work *iw);
+
/**
* queue_work - queue work on a workqueue
* @wq: workqueue to use
@@ -565,6 +569,11 @@ static inline bool schedule_work_on(int cpu,
struct work_struct *work)
*/
static inline bool schedule_work(struct work_struct *work)
{
+ if (in_nmi()) {
+ init_irq_work(&work->iw, queue_work_nmi);
+ return irq_work_queue(&work->iw);
+ }
+
return queue_work(system_wq, work);
}
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c41c3c1..aa22883 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1524,6 +1524,14 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq,
}
EXPORT_SYMBOL(queue_work_on);
+void queue_work_nmi(struct irq_work *iw)
+{
+ struct work_struct *work = container_of(iw, struct work_struct, iw);
+
+ queue_work(system_wq, work);
+}
+EXPORT_SYMBOL(queue_work_nmi);
+
/**
* workqueue_select_cpu_near - Select a CPU based on NUMA node
* @node: NUMA node ID that we want to select a CPU from
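The double hop that the diff above implements can be modeled in a few lines of plain C. This is a userspace toy, not the kernel API: the single-slot "queues" and all names are deliberate simplifications of irq_work and the workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*work_fn)(void);

static work_fn irq_work_slot;	/* one pending irq_work, for brevity */
static work_fn wq_slot;		/* one pending workqueue item        */
static int ran;
static bool in_nmi_flag;

static bool in_nmi(void) { return in_nmi_flag; }

static void do_work(void) { ran++; }

/* Hop 2: runs from IRQ context, finally queues onto the workqueue. */
static void queue_work_nmi_model(void) { wq_slot = do_work; }

/* Models the schedule_work() change: from NMI, take the extra hop
 * through irq_work because the workqueue can't be touched from NMI. */
static void schedule_work_model(void)
{
	if (in_nmi())
		irq_work_slot = queue_work_nmi_model;	/* hop 1 */
	else
		wq_slot = do_work;			/* direct path */
}

static void flush_irq_work(void)	/* runs in IRQ context */
{
	if (irq_work_slot) {
		irq_work_slot();
		irq_work_slot = NULL;
	}
}

static void flush_workqueue_model(void)	/* runs in task context */
{
	if (wq_slot) {
		wq_slot();
		wq_slot = NULL;
	}
}
```

The work becomes visible to the workqueue only after the intermediate irq_work has run, which is exactly why a CPU stuck with IRQs disabled blocks the non-NMI-safe handlers.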
>
> > port->sysrq = 0;
> > return true;
> > }
> > @@ -3273,12 +3279,122 @@ int uart_handle_break(struct uart_port *port)
> > port->sysrq = 0;
> > }
> >
> > - if (port->flags & UPF_SAK)
> > + if (port->flags & UPF_SAK) {
> > +#ifdef CONFIG_CONSOLE_POLL
> > + if (in_nmi())
> > + irq_work_queue(&port->nmi_state.sysrq_sak_work);
> > + else
> > + do_SAK(state->port.tty);
> > +#else
> > do_SAK(state->port.tty);
> > +#endif
> > + }
>
> Similar comment as above about avoiding #ifdef in functions. NOTE: if
> you have something like schedule_work_nmi() I think you could just
> modify the do_SAK() function to call it and consider do_SAK() to be
> NMI safe.
>
See above.
>
> > return 0;
> > }
> > EXPORT_SYMBOL_GPL(uart_handle_break);
> >
> > +#ifdef CONFIG_CONSOLE_POLL
> > +int uart_nmi_handle_char(struct uart_port *port, unsigned int status,
> > + unsigned int overrun, unsigned int ch,
> > + unsigned int flag)
> > +{
> > + struct uart_nmi_rx_data rx_data;
> > +
> > + if (!in_nmi())
> > + return 0;
> > +
> > + rx_data.status = status;
> > + rx_data.overrun = overrun;
> > + rx_data.ch = ch;
> > + rx_data.flag = flag;
> > +
> > + if (!kfifo_in(&port->nmi_state.rx_fifo, &rx_data, 1))
> > + ++port->icount.buf_overrun;
> > +
> > + return 1;
> > +}
> > +EXPORT_SYMBOL_GPL(uart_nmi_handle_char);
> > +
> > +static void uart_nmi_rx_work(struct irq_work *rx_work)
> > +{
> > + struct uart_nmi_state *nmi_state =
> > + container_of(rx_work, struct uart_nmi_state, rx_work);
> > + struct uart_port *port =
> > + container_of(nmi_state, struct uart_port, nmi_state);
> > + struct uart_nmi_rx_data rx_data;
> > +
> > + /*
> > + * In polling mode, serial device is initialized much prior to
> > + * TTY port becoming active. This scenario is especially useful
> > + * from debugging perspective such that magic sysrq or debugger
> > + * entry would still be possible even when TTY port isn't
> > + * active (consider a boot hang case or if a user hasn't opened
> > + * the serial port). So we discard any other RX data apart from
> > + * magic sysrq commands in case TTY port isn't active.
> > + */
> > + if (!port->state || !tty_port_active(&port->state->port)) {
> > + kfifo_reset(&nmi_state->rx_fifo);
> > + return;
> > + }
> > +
> > + spin_lock(&port->lock);
> > + while (kfifo_out(&nmi_state->rx_fifo, &rx_data, 1))
> > + uart_insert_char(port, rx_data.status, rx_data.overrun,
> > + rx_data.ch, rx_data.flag);
> > + spin_unlock(&port->lock);
> > +
> > + tty_flip_buffer_push(&port->state->port);
> > +}
> > +
> > +static void uart_nmi_tx_work(struct irq_work *tx_work)
> > +{
> > + struct uart_nmi_state *nmi_state =
> > + container_of(tx_work, struct uart_nmi_state, tx_work);
> > + struct uart_port *port =
> > + container_of(nmi_state, struct uart_port, nmi_state);
> > +
> > + spin_lock(&port->lock);
> > + if (nmi_state->tx_irq_callback)
> > + nmi_state->tx_irq_callback(port);
> > + spin_unlock(&port->lock);
> > +}
> > +
> > +static void uart_nmi_sak_work(struct irq_work *work)
> > +{
> > + struct uart_nmi_state *nmi_state =
> > + container_of(work, struct uart_nmi_state, sysrq_sak_work);
> > + struct uart_port *port =
> > + container_of(nmi_state, struct uart_port, nmi_state);
> > +
> > + do_SAK(port->state->port.tty);
> > +}
> > +
> > +#ifdef CONFIG_MAGIC_SYSRQ_SERIAL
> > +static void uart_nmi_toggle_work(struct irq_work *work)
> > +{
> > + schedule_work(&sysrq_enable_work);
> > +}
>
> Nit: weird that it's called "toggle" work but just wrapps "enable" work.
>
Okay, but this API will no longer be needed if we make schedule_work()
NMI safe (see above).
-Sumit
>
>
> > +#endif
> > +
> > +int uart_nmi_state_init(struct uart_port *port)
> > +{
> > + int ret;
> > +
> > + ret = kfifo_alloc(&port->nmi_state.rx_fifo, 256, GFP_KERNEL);
> > + if (ret)
> > + return ret;
> > +
> > + init_irq_work(&port->nmi_state.rx_work, uart_nmi_rx_work);
> > + init_irq_work(&port->nmi_state.tx_work, uart_nmi_tx_work);
> > + init_irq_work(&port->nmi_state.sysrq_sak_work, uart_nmi_sak_work);
> > +#ifdef CONFIG_MAGIC_SYSRQ_SERIAL
> > + init_irq_work(&port->nmi_state.sysrq_toggle_work, uart_nmi_toggle_work);
> > +#endif
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(uart_nmi_state_init);
> > +#endif
> > +
> > EXPORT_SYMBOL(uart_write_wakeup);
> > EXPORT_SYMBOL(uart_register_driver);
> > EXPORT_SYMBOL(uart_unregister_driver);
> > diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> > index 9fd550e..84487a9 100644
> > --- a/include/linux/serial_core.h
> > +++ b/include/linux/serial_core.h
> > @@ -18,6 +18,8 @@
> > #include <linux/tty.h>
> > #include <linux/mutex.h>
> > #include <linux/sysrq.h>
> > +#include <linux/irq_work.h>
> > +#include <linux/kfifo.h>
> > #include <uapi/linux/serial_core.h>
> >
> > #ifdef CONFIG_SERIAL_CORE_CONSOLE
> > @@ -103,6 +105,28 @@ struct uart_icount {
> > typedef unsigned int __bitwise upf_t;
> > typedef unsigned int __bitwise upstat_t;
> >
> > +#ifdef CONFIG_CONSOLE_POLL
> > +struct uart_nmi_rx_data {
> > + unsigned int status;
> > + unsigned int overrun;
> > + unsigned int ch;
> > + unsigned int flag;
> > +};
> > +
> > +struct uart_nmi_state {
> > + bool active;
> > +
> > + struct irq_work tx_work;
> > + void (*tx_irq_callback)(struct uart_port *port);
> > +
> > + struct irq_work rx_work;
> > + DECLARE_KFIFO_PTR(rx_fifo, struct uart_nmi_rx_data);
> > +
> > + struct irq_work sysrq_sak_work;
> > + struct irq_work sysrq_toggle_work;
> > +};
> > +#endif
> > +
> > struct uart_port {
> > spinlock_t lock; /* port lock */
> > unsigned long iobase; /* in/out[bwl] */
> > @@ -255,6 +279,9 @@ struct uart_port {
> > struct gpio_desc *rs485_term_gpio; /* enable RS485 bus termination */
> > struct serial_iso7816 iso7816;
> > void *private_data; /* generic platform data pointer */
> > +#ifdef CONFIG_CONSOLE_POLL
> > + struct uart_nmi_state nmi_state;
> > +#endif
> > };
> >
> > static inline int serial_port_in(struct uart_port *up, int offset)
> > @@ -475,4 +502,44 @@ extern int uart_handle_break(struct uart_port *port);
> > !((cflag) & CLOCAL))
> >
> > int uart_get_rs485_mode(struct uart_port *port);
> > +
> > +/*
> > + * The following are helper functions for the NMI aware serial drivers.
> > + * Currently NMI support is only enabled under polling mode.
> > + */
> > +
> > +#ifdef CONFIG_CONSOLE_POLL
> > +int uart_nmi_state_init(struct uart_port *port);
> > +int uart_nmi_handle_char(struct uart_port *port, unsigned int status,
> > + unsigned int overrun, unsigned int ch,
> > + unsigned int flag);
> > +
> > +static inline bool uart_nmi_active(struct uart_port *port)
> > +{
> > + return port->nmi_state.active;
> > +}
> > +
> > +static inline void uart_set_nmi_active(struct uart_port *port, bool val)
> > +{
> > + port->nmi_state.active = val;
> > +}
> > +#else
> > +static inline int uart_nmi_handle_char(struct uart_port *port,
> > + unsigned int status,
> > + unsigned int overrun,
> > + unsigned int ch, unsigned int flag)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline bool uart_nmi_active(struct uart_port *port)
> > +{
> > + return false;
> > +}
> > +
> > +static inline void uart_set_nmi_active(struct uart_port *port, bool val)
> > +{
> > +}
> > +#endif
> > +
> > #endif /* LINUX_SERIAL_CORE_H */
> > --
> > 2.7.4
> >
Hi,
On Thu, Aug 13, 2020 at 7:19 AM Sumit Garg <[email protected]> wrote:
>
> On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> > >
> > > Add NMI framework APIs in serial core which can be leveraged by serial
> > > drivers to have NMI driven serial transfers. These APIs are kept under
> > > CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> > > is the only known user to enable NMI driven serial port.
> > >
> > > The general idea is to intercept RX characters in NMI context, if those
> > > are specific to magic sysrq then allow corresponding handler to run in
> > > NMI context. Otherwise defer all other RX and TX operations to IRQ work
> > > queue in order to run those in normal interrupt context.
> > >
> > > Also, since magic sysrq entry APIs will need to be invoked from NMI
> > > context, so make those APIs NMI safe via deferring NMI unsafe work to
> > > IRQ work queue.
> > >
> > > Signed-off-by: Sumit Garg <[email protected]>
> > > ---
> > > drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> > > include/linux/serial_core.h | 67 ++++++++++++++++++++++
> > > 2 files changed, 185 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > index 57840cf..6342e90 100644
> > > --- a/drivers/tty/serial/serial_core.c
> > > +++ b/drivers/tty/serial/serial_core.c
> > > @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> > > return true;
> > > }
> > >
> > > +#ifdef CONFIG_CONSOLE_POLL
> > > + if (in_nmi())
> > > + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> > > + else
> > > + schedule_work(&sysrq_enable_work);
> > > +#else
> > > schedule_work(&sysrq_enable_work);
> > > -
> > > +#endif
> >
> > It should be a very high bar to have #ifdefs inside functions. I
> > don't think this meets it. Instead maybe something like this
> > (untested and maybe slightly wrong syntax, but hopefully makes
> > sense?):
> >
> > Outside the function:
> >
> > #ifdef CONFIG_CONSOLE_POLL
> > #define queue_port_nmi_work(port, work_type)
> > irq_work_queue(&port->nmi_state.work_type)
> > #else
> > #define queue_port_nmi_work(port, work_type)
> > #endif
> >
> > ...and then:
> >
> > if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
> > queue_port_nmi_work(port, sysrq_toggle_work);
> > else
> > schedule_work(&sysrq_enable_work);
> >
> > ---
> >
> > The whole double-hopping is really quite annoying. I guess
> > schedule_work() can't be called from NMI context but can be called
> > from IRQ context? So you need to first transition from NMI context to
> > IRQ context and then go and schedule the work? Almost feels like we
> > should just fix schedule_work() to do this double-hop for you if
> > called from NMI context. Seems like you could even re-use the list
> > pointers in the work_struct to keep the queue of people who need to be
> > scheduled from the next irq_work? Worst case it seems like you could
> > add a schedule_work_nmi() that would do all the hoops for you. ...but
> > I also know very little about NMI so maybe I'm being naive.
> >
>
> Thanks for this suggestion and yes indeed we could make
> schedule_work() NMI safe and in turn get rid of all this #ifdefs. Have
> a look at below changes:
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 26de0ca..1daf1b4 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -14,6 +14,7 @@
> #include <linux/atomic.h>
> #include <linux/cpumask.h>
> #include <linux/rcupdate.h>
> +#include <linux/irq_work.h>
>
> struct workqueue_struct;
>
> @@ -106,6 +107,7 @@ struct work_struct {
> #ifdef CONFIG_LOCKDEP
> struct lockdep_map lockdep_map;
> #endif
> + struct irq_work iw;
Hrm, I was thinking you could just have a single queue per CPU then
you don't need to add all this extra data to every single "struct
work_struct". I was thinking you could use the existing list node in
the "struct work_struct" to keep track of the list of things. ...but
maybe my idea this isn't actually valid because the linked list might
be in use if we're scheduling work that's already pending / running?
In any case, I worry that people won't be happy with the extra
overhead per "struct work_struct". Can we reduce it at all? It still
does feel like you could get by with a single global queue and thus
you wouldn't need to store the function pointer and flags with every
"struct work_struct", right? So all you'd need is a single pointer
for the linked list? I haven't actually tried implementing this,
though, so I could certainly be wrong.
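One lock-free shape the single-queue idea could take is an intrusive stack that threads a single pointer through each node, so nothing beyond that pointer needs to live in the work item. This is an illustrative userspace sketch of the concept only, not the kernel's implementation, and it sidesteps the re-queue-while-pending problem raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct nmi_work {
	struct nmi_work *next;	/* the only per-item overhead */
	void (*fn)(void);
};

static _Atomic(struct nmi_work *) nmi_list;

static int hits;
static void bump(void) { hits++; }	/* demo callback */

/* NMI-safe push: a single CAS loop onto a lock-free stack. */
static void nmi_push(struct nmi_work *w)
{
	struct nmi_work *head = atomic_load(&nmi_list);

	do {
		w->next = head;
	} while (!atomic_compare_exchange_weak(&nmi_list, &head, w));
}

/* IRQ side: detach the whole list atomically, then run each entry. */
static void irq_flush(void)
{
	struct nmi_work *w = atomic_exchange(&nmi_list, NULL);

	while (w) {
		struct nmi_work *next = w->next;

		w->fn();
		w = next;
	}
}
```

Note this reuses the node pointer for queueing, so, as worried above, it is only safe if an item can't be pushed again while it is still on the list.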
-Doug
Hi,
On Thu, Aug 13, 2020 at 2:25 AM Sumit Garg <[email protected]> wrote:
>
> > One other idea occurred to me that's maybe simpler. You could in
> > theory just poll the serial port periodically to accomplish. It would
> > actually probably even work to call the normal serial port interrupt
> > routine from any random CPU. On many serial drivers the entire
> > interrupt handler is wrapped with:
> >
> > spin_lock_irqsave(&uap->port.lock, flags);
> > ...
> > spin_unlock_irqrestore(&uap->port.lock, flags);
> >
> > And a few (the ones I was involved in fixing) have the similar pattern
> > of using uart_unlock_and_check_sysrq().
> >
> > Any serial drivers following this pattern could have their interrupt
> > routine called periodically just to poll for characters and it'd be
> > fine, right? ...and having it take a second before a sysrq comes in
> > this case is probably not the end of the world?
> >
>
> Are you proposing to have complete RX operation in polling mode with
> RX interrupt disabled (eg. using a kernel thread)?
No, I'm suggesting a hybrid approach. Leave the interrupts enabled as
usual, but _also_ poll every 500 ms or 1 second (maybe make it
configurable?). In some serial drivers (ones that hold the lock for
the whole interrupt routine) this polling function could actually be
the same as the normal interrupt handler so it'd be trivially easy to
implement and maintain.
NOTE: This is not the same type of polling that kgdb does today. The
existing polling is really only intended to work when we're dropped
into the debugger. This would be more like a "poll_irq" type function
that would do all the normal work the interrupt did and is really just
there in the case that the CPU that the interrupt is routed to is
locked up.
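The reason this hybrid scheme is cheap to implement is that the entire ISR body already runs under the port lock, so a watchdog poll can call the very same body from any context. A userspace model of that property, with a plain flag standing in for spin_lock_irqsave() and hypothetical names throughout:

```c
#include <assert.h>

static int port_locked;		/* stands in for spin_lock_irqsave() */
static int fifo_chars;		/* pretend RX FIFO fill level */
static int chars_delivered;

/* Called from the real IRQ *or* from the periodic poller: the lock
 * serializes both callers, so no extra state is needed. */
static void serial_isr_body(void)
{
	assert(!port_locked);		/* lock held: callers serialized */
	port_locked = 1;
	while (fifo_chars > 0) {
		fifo_chars--;
		chars_delivered++;	/* uart_insert_char() etc. elided */
	}
	port_locked = 0;
}

static void poll_tick(void)	/* e.g. from a 1 s timer on any CPU */
{
	serial_isr_body();	/* identical code path to the IRQ */
}
```

In other words, the poller adds no new state to the driver; it is just a second, slow caller of the existing interrupt routine.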
> > One nice benefit of this is that it would actually work _better_ on
> > SMP systems for any sysrqs that aren't NMI safe. Specifically with
> > your patch series those would be queued with irq_work_queue() which
> > means they'd be blocked if the CPU processing the NMI is stuck with
> > IRQs disabled.
>
> Yes, the sysrq handlers which aren't NMI safe will behave similarly to
> existing IRQ based sysrq handlers.
>
> > With the polling mechanism they'd nicely just run on a
> > different CPU.
>
> It looks like polling would cause much CPU overhead. So I am not sure
> if that is the preferred approach.
Maybe now it's clearer that there should be almost no overhead. When
dealing with a SYSRQ it's fine if there's a bit of a delay before it's
processed, so polling every 1 second is probably fine.
-Doug
+ Peter (author of irq_work.c)
On Thu, 13 Aug 2020 at 05:30, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Tue, Jul 21, 2020 at 5:10 AM Sumit Garg <[email protected]> wrote:
> >
> > In a future patch we will add support to the serial core to make it
> > possible to trigger a magic sysrq from an NMI context. Prepare for this
> > by marking some sysrq actions as NMI safe. Safe actions will be allowed
> > to run from NMI context whilst those that cannot run from an NMI will be queued
> > as irq_work for later processing.
> >
> > A particular sysrq handler is only marked as NMI safe if the handler
> > doesn't contend for any synchronization primitives, since in NMI context
> > those are expected to cause deadlocks. Note that the debug sysrq does not
> > contend for any synchronization primitives. It does call kgdb_breakpoint()
> > to provoke a trap but that trap handler should be NMI safe on
> > architectures that implement an NMI.
> >
> > Signed-off-by: Sumit Garg <[email protected]>
> > ---
> > drivers/tty/sysrq.c | 33 ++++++++++++++++++++++++++++++++-
> > include/linux/sysrq.h | 1 +
> > kernel/debug/debug_core.c | 1 +
> > 3 files changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > index 7c95afa9..8017e33 100644
> > --- a/drivers/tty/sysrq.c
> > +++ b/drivers/tty/sysrq.c
> > @@ -50,6 +50,8 @@
> > #include <linux/syscalls.h>
> > #include <linux/of.h>
> > #include <linux/rcupdate.h>
> > +#include <linux/irq_work.h>
> > +#include <linux/kfifo.h>
> >
> > #include <asm/ptrace.h>
> > #include <asm/irq_regs.h>
> > @@ -111,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
> > .help_msg = "loglevel(0-9)",
> > .action_msg = "Changing Loglevel",
> > .enable_mask = SYSRQ_ENABLE_LOG,
> > + .nmi_safe = true,
> > };
> >
> > #ifdef CONFIG_VT
> > @@ -157,6 +160,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
> > .help_msg = "crash(c)",
> > .action_msg = "Trigger a crash",
> > .enable_mask = SYSRQ_ENABLE_DUMP,
> > + .nmi_safe = true,
> > };
> >
> > static void sysrq_handle_reboot(int key)
> > @@ -170,6 +174,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
> > .help_msg = "reboot(b)",
> > .action_msg = "Resetting",
> > .enable_mask = SYSRQ_ENABLE_BOOT,
> > + .nmi_safe = true,
> > };
> >
> > const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
> > @@ -217,6 +222,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
> > .handler = sysrq_handle_showlocks,
> > .help_msg = "show-all-locks(d)",
> > .action_msg = "Show Locks Held",
> > + .nmi_safe = true,
> > };
> > #else
> > #define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
> > @@ -289,6 +295,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
> > .help_msg = "show-registers(p)",
> > .action_msg = "Show Regs",
> > .enable_mask = SYSRQ_ENABLE_DUMP,
> > + .nmi_safe = true,
> > };
> >
> > static void sysrq_handle_showstate(int key)
> > @@ -326,6 +333,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
> > .help_msg = "dump-ftrace-buffer(z)",
> > .action_msg = "Dump ftrace buffer",
> > .enable_mask = SYSRQ_ENABLE_DUMP,
> > + .nmi_safe = true,
> > };
> > #else
> > #define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
> > @@ -538,6 +546,23 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> > sysrq_key_table[i] = op_p;
> > }
> >
> > +#define SYSRQ_NMI_FIFO_SIZE 64
> > +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
>
> A 64-entry FIFO seems excessive. Quite honestly even a FIFO seems a
> bit excessive and it feels like if two sysrqs were received in super
> quick succession that it would be OK to just process the first one. I
> guess if it simplifies the processing to have a FIFO then it shouldn't
> hurt, but no need for 64 entries.
>
Okay, would a 2-entry FIFO work here? We still need a FIFO to pass
on the key parameter.
>
> > +static void sysrq_do_nmi_work(struct irq_work *work)
> > +{
> > + const struct sysrq_key_op *op_p;
> > + int key;
> > +
> > + while (kfifo_out(&sysrq_nmi_fifo, &key, 1)) {
> > + op_p = __sysrq_get_key_op(key);
> > + if (op_p)
> > + op_p->handler(key);
> > + }
>
> Do you need to manage "suppress_printk" in this function? Do you need
> to call rcu_sysrq_start() and rcu_read_lock()?
Ah I missed those. Will add them here instead.
>
> If so, how do you prevent racing between the mucking we're doing with
> these things and the mucking that the NMI does with them?
IIUC, you mean to highlight the race where a scheduled sysrq is
executing in IRQ context while we receive a new sysrq in NMI context,
correct? If so, this seems to be a trickier situation. I think the
appropriate way to handle it would be to deny any further sysrq
handling until the prior sysrq handling is complete. Your views?
>
>
> > +}
> > +
> > +static DEFINE_IRQ_WORK(sysrq_nmi_work, sysrq_do_nmi_work);
> > +
> > void __handle_sysrq(int key, bool check_mask)
> > {
> > const struct sysrq_key_op *op_p;
> > @@ -568,7 +593,13 @@ void __handle_sysrq(int key, bool check_mask)
> > if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
> > pr_info("%s\n", op_p->action_msg);
> > console_loglevel = orig_log_level;
> > - op_p->handler(key);
> > +
> > + if (in_nmi() && !op_p->nmi_safe) {
> > + kfifo_in(&sysrq_nmi_fifo, &key, 1);
>
> Rather than kfifo_in() and kfifo_out(), I think you can use
> kfifo_put() and kfifo_get(). As I understand it those just get/put
> one element which is what you want.
Okay, will use kfifo_put() and kfifo_get() here instead.
>
>
> > + irq_work_queue(&sysrq_nmi_work);
>
> Wishful thinking, but (as far as I can tell) irq_work_queue() only
> queues work on the CPU running the NMI. I don't have lots of NMI
> experience, but any chance there is a variant that will queue work on
> any CPU? Then sysrq handlers that aren't NMI aware will be more
> likely to work.
>
Unfortunately, queuing work on other CPUs isn't safe in NMI context;
see this warning [1]. The comment mentions:
/* Arch remote IPI send/receive backend aren't NMI safe */
Peter,
Can you throw some light here as to why it isn't considered NMI-safe
to send remote IPI in NMI context? Is it an arch specific limitation?
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq_work.c#n103
-Sumit
>
>
>
> > + } else {
> > + op_p->handler(key);
> > + }
> > } else {
> > pr_info("This sysrq operation is disabled.\n");
> > console_loglevel = orig_log_level;
> > diff --git a/include/linux/sysrq.h b/include/linux/sysrq.h
> > index 3a582ec..630b5b9 100644
> > --- a/include/linux/sysrq.h
> > +++ b/include/linux/sysrq.h
> > @@ -34,6 +34,7 @@ struct sysrq_key_op {
> > const char * const help_msg;
> > const char * const action_msg;
> > const int enable_mask;
> > + const bool nmi_safe;
> > };
> >
> > #ifdef CONFIG_MAGIC_SYSRQ
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index 9e59347..2b51173 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -943,6 +943,7 @@ static const struct sysrq_key_op sysrq_dbg_op = {
> > .handler = sysrq_handle_dbg,
> > .help_msg = "debug(g)",
> > .action_msg = "DEBUG",
> > + .nmi_safe = true,
> > };
> > #endif
> >
> > --
> > 2.7.4
> >
On Thu, 13 Aug 2020 at 20:08, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Thu, Aug 13, 2020 at 7:19 AM Sumit Garg <[email protected]> wrote:
> >
> > On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> > > >
> > > > Add NMI framework APIs in serial core which can be leveraged by serial
> > > > drivers to have NMI driven serial transfers. These APIs are kept under
> > > > CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> > > > is the only known user to enable NMI driven serial port.
> > > >
> > > > The general idea is to intercept RX characters in NMI context, if those
> > > > are specific to magic sysrq then allow corresponding handler to run in
> > > > NMI context. Otherwise defer all other RX and TX operations to IRQ work
> > > > queue in order to run those in normal interrupt context.
> > > >
> > > > Also, since magic sysrq entry APIs will need to be invoked from NMI
> > > > context, make those APIs NMI safe by deferring NMI-unsafe work to the
> > > > IRQ work queue.
> > > >
> > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > ---
> > > > drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> > > > include/linux/serial_core.h | 67 ++++++++++++++++++++++
> > > > 2 files changed, 185 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > > index 57840cf..6342e90 100644
> > > > --- a/drivers/tty/serial/serial_core.c
> > > > +++ b/drivers/tty/serial/serial_core.c
> > > > @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> > > > return true;
> > > > }
> > > >
> > > > +#ifdef CONFIG_CONSOLE_POLL
> > > > + if (in_nmi())
> > > > + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> > > > + else
> > > > + schedule_work(&sysrq_enable_work);
> > > > +#else
> > > > schedule_work(&sysrq_enable_work);
> > > > -
> > > > +#endif
> > >
> > > It should be a very high bar to have #ifdefs inside functions. I
> > > don't think this meets it. Instead maybe something like this
> > > (untested and maybe slightly wrong syntax, but hopefully makes
> > > sense?):
> > >
> > > Outside the function:
> > >
> > > #ifdef CONFIG_CONSOLE_POLL
> > > #define queue_port_nmi_work(port, work_type)
> > > irq_work_queue(&port->nmi_state.work_type)
> > > #else
> > > #define queue_port_nmi_work(port, work_type)
> > > #endif
> > >
> > > ...and then:
> > >
> > > if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
> > > queue_port_nmi_work(port, sysrq_toggle_work);
> > > else
> > > schedule_work(&sysrq_enable_work);
> > >
> > > ---
> > >
> > > The whole double-hopping is really quite annoying. I guess
> > > schedule_work() can't be called from NMI context but can be called
> > > from IRQ context? So you need to first transition from NMI context to
> > > IRQ context and then go and schedule the work? Almost feels like we
> > > should just fix schedule_work() to do this double-hop for you if
> > > called from NMI context. Seems like you could even re-use the list
> > > pointers in the work_struct to keep the queue of people who need to be
> > > scheduled from the next irq_work? Worst case it seems like you could
> > > add a schedule_work_nmi() that would do all the hoops for you. ...but
> > > I also know very little about NMI so maybe I'm being naive.
> > >
> >
> > Thanks for this suggestion. Yes, indeed we could make
> > schedule_work() NMI safe and in turn get rid of all these #ifdefs. Have
> > a look at the changes below:
> >
> > diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> > index 26de0ca..1daf1b4 100644
> > --- a/include/linux/workqueue.h
> > +++ b/include/linux/workqueue.h
> > @@ -14,6 +14,7 @@
> > #include <linux/atomic.h>
> > #include <linux/cpumask.h>
> > #include <linux/rcupdate.h>
> > +#include <linux/irq_work.h>
> >
> > struct workqueue_struct;
> >
> > @@ -106,6 +107,7 @@ struct work_struct {
> > #ifdef CONFIG_LOCKDEP
> > struct lockdep_map lockdep_map;
> > #endif
> > + struct irq_work iw;
>
> Hrm, I was thinking you could just have a single queue per CPU then
> you don't need to add all this extra data to every single "struct
> work_struct". I was thinking you could use the existing list node in
> the "struct work_struct" to keep track of the list of things. ...but
> maybe this idea isn't actually valid because the linked list might
> be in use if we're scheduling work that's already pending / running?
>
> In any case, I worry that people won't be happy with the extra
> overhead per "struct work_struct". Can we reduce it at all? It still
> does feel like you could get by with a single global queue and thus
> you wouldn't need to store the function pointer and flags with every
> "struct work_struct", right? So all you'd need is a single pointer
> for the linked list? I haven't actually tried implementing this,
> though, so I could certainly be wrong.
Let me try to elaborate here:
Here we are dealing with two different layers of deferred work: one is
irq_work (NMI safe) using "struct irq_work", and the other is the normal
workqueue (NMI unsafe) using "struct work_struct".
So when we are in NMI context, the only option is to use irq_work to
defer work, and we need to pass a reference to "struct irq_work". Now in
the following irq_work function:
+void queue_work_nmi(struct irq_work *iw)
+{
+ struct work_struct *work = container_of(iw, struct work_struct, iw);
+
+ queue_work(system_wq, work);
+}
+EXPORT_SYMBOL(queue_work_nmi);
we can't find a reference to "struct work_struct" unless there is a 1:1
mapping with "struct irq_work". So we require a way to establish this
mapping, and embedding "struct irq_work" in "struct work_struct"
achieves that. If you have a better way to achieve this, I
can use that instead.
-Sumit
>
>
> -Doug
On Thu, 13 Aug 2020 at 15:47, Daniel Thompson
<[email protected]> wrote:
>
> On Thu, Aug 13, 2020 at 02:55:12PM +0530, Sumit Garg wrote:
> > On Thu, 13 Aug 2020 at 05:38, Doug Anderson <[email protected]> wrote:
> > > On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
> > > > One
> > > > last worry is that I assume that most people testing (and even
> > > > automated testing labs) will either always enable NMI or won't enable
> > > > NMI. That means that everyone will be only testing one codepath or
> > > > the other and (given the complexity) the non-tested codepath will
> > > > break.
> > > >
> >
> > The current patch-set only makes this NMI mode work when the debugger
> > (kgdb) is enabled, which I think is mostly suitable for development
> > environments. So most people testing will involve the existing IRQ mode
> > only.
> >
> > However, it's very much possible to make NMI mode the default for a
> > particular serial driver if the underlying irqchip supports it, but that
> > depends on whether we really see any production-level usage of the NMI
> > debug feature.
>
> The effect of this patch is not to make kgdb work from NMI it is to make
> (some) SysRqs work from NMI. I think that only allowing it to deploy for
> kgdb users is a mistake.
>
> Having it deploy automatically for kgdb users might be OK but it seems
> sensible to make this feature available for other users too.
I think I wasn't clear enough in my prior reply. What I meant to
say is that this patch-set enables NMI support for a particular serial
driver via the ".poll_init()" interface, and the only current user of
that interface is kgdb.
So if there are other users interested in this feature, they can use
the ".poll_init()" interface as well to enable it.
-Sumit
>
> Daniel.
On Thu, 13 Aug 2020 at 20:56, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Thu, Aug 13, 2020 at 2:25 AM Sumit Garg <[email protected]> wrote:
> >
> > > One other idea occurred to me that's maybe simpler. You could in
> > > theory just poll the serial port periodically to accomplish. It would
> > > actually probably even work to call the normal serial port interrupt
> > > routine from any random CPU. On many serial drivers the entire
> > > interrupt handler is wrapped with:
> > >
> > > spin_lock_irqsave(&uap->port.lock, flags);
> > > ...
> > > spin_unlock_irqrestore(&uap->port.lock, flags);
> > >
> > > And a few (the ones I was involved in fixing) have the similar pattern
> > > of using uart_unlock_and_check_sysrq().
> > >
> > > Any serial drivers following this pattern could have their interrupt
> > > routine called periodically just to poll for characters and it'd be
> > > fine, right? ...and having it take a second before a sysrq comes in
> > > this case is probably not the end of the world?
> > >
> >
> > Are you proposing to have complete RX operation in polling mode with
> > RX interrupt disabled (e.g. using a kernel thread)?
>
> No, I'm suggesting a hybrid approach. Leave the interrupts enabled as
> usual, but _also_ poll every 500 ms or 1 second (maybe make it
> configurable?). In some serial drivers (ones that hold the lock for
> the whole interrupt routine) this polling function could actually be
> the same as the normal interrupt handler so it'd be trivially easy to
> implement and maintain.
>
> NOTE: This is not the same type of polling that kgdb does today. The
> existing polling is really only intended to work when we're dropped
> into the debugger. This would be more like a "poll_irq" type function
> that would do all the normal work the interrupt did and is really just
> there in the case that the CPU that the interrupt is routed to is
> locked up.
>
Your idea sounds interesting. I think where we are heading is to have
an always-active listener on the serial port that can be scheduled on
any random active CPU. And to keep its CPU overhead negligible, it can
sleep and only wake up to listen once every 500 ms or 1 second
(configurable).
I will think more about it and probably give it a try with a PoC.
-Sumit
>
> > > One nice benefit of this is that it would actually work _better_ on
> > > SMP systems for any sysrqs that aren't NMI safe. Specifically with
> > > your patch series those would be queued with irq_work_queue() which
> > > means they'd be blocked if the CPU processing the NMI is stuck with
> > > IRQs disabled.
> >
> > Yes, the sysrq handlers which aren't NMI safe will behave similarly to
> > existing IRQ based sysrq handlers.
> >
> > > With the polling mechanism they'd nicely just run on a
> > > different CPU.
> >
> > It looks like polling would cause much CPU overhead. So I am not sure
> > if that is the preferred approach.
>
> Maybe now it's clearer that there should be almost no overhead. When
> dealing with a SYSRQ it's fine if there's a bit of a delay before it's
> processed, so polling every 1 second is probably fine.
>
> -Doug
On Fri, Aug 14, 2020 at 04:47:11PM +0530, Sumit Garg wrote:
> On Thu, 13 Aug 2020 at 20:08, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, Aug 13, 2020 at 7:19 AM Sumit Garg <[email protected]> wrote:
> > >
> > > On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> > > > >
> > > > > Add NMI framework APIs in serial core which can be leveraged by serial
> > > > > drivers to have NMI driven serial transfers. These APIs are kept under
> > > > > CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> > > > > is the only known user to enable NMI driven serial port.
> > > > >
> > > > > The general idea is to intercept RX characters in NMI context, if those
> > > > > are specific to magic sysrq then allow corresponding handler to run in
> > > > > NMI context. Otherwise defer all other RX and TX operations to IRQ work
> > > > > queue in order to run those in normal interrupt context.
> > > > >
> > > > > Also, since magic sysrq entry APIs will need to be invoked from NMI
> > > > > context, make those APIs NMI safe by deferring NMI-unsafe work to the
> > > > > IRQ work queue.
> > > > >
> > > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > > ---
> > > > > drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> > > > > include/linux/serial_core.h | 67 ++++++++++++++++++++++
> > > > > 2 files changed, 185 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > > > index 57840cf..6342e90 100644
> > > > > --- a/drivers/tty/serial/serial_core.c
> > > > > +++ b/drivers/tty/serial/serial_core.c
> > > > > @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> > > > > return true;
> > > > > }
> > > > >
> > > > > +#ifdef CONFIG_CONSOLE_POLL
> > > > > + if (in_nmi())
> > > > > + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> > > > > + else
> > > > > + schedule_work(&sysrq_enable_work);
> > > > > +#else
> > > > > schedule_work(&sysrq_enable_work);
> > > > > -
> > > > > +#endif
> > > >
> > > > It should be a very high bar to have #ifdefs inside functions. I
> > > > don't think this meets it. Instead maybe something like this
> > > > (untested and maybe slightly wrong syntax, but hopefully makes
> > > > sense?):
> > > >
> > > > Outside the function:
> > > >
> > > > #ifdef CONFIG_CONSOLE_POLL
> > > > #define queue_port_nmi_work(port, work_type)
> > > > irq_work_queue(&port->nmi_state.work_type)
> > > > #else
> > > > #define queue_port_nmi_work(port, work_type)
> > > > #endif
> > > >
> > > > ...and then:
> > > >
> > > > if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
> > > > queue_port_nmi_work(port, sysrq_toggle_work);
> > > > else
> > > > schedule_work(&sysrq_enable_work);
> > > >
> > > > ---
> > > >
> > > > The whole double-hopping is really quite annoying. I guess
> > > > schedule_work() can't be called from NMI context but can be called
> > > > from IRQ context? So you need to first transition from NMI context to
> > > > IRQ context and then go and schedule the work? Almost feels like we
> > > > should just fix schedule_work() to do this double-hop for you if
> > > > called from NMI context. Seems like you could even re-use the list
> > > > pointers in the work_struct to keep the queue of people who need to be
> > > > scheduled from the next irq_work? Worst case it seems like you could
> > > > add a schedule_work_nmi() that would do all the hoops for you. ...but
> > > > I also know very little about NMI so maybe I'm being naive.
> > > >
> > >
> > > Thanks for this suggestion and yes indeed we could make
> > > schedule_work() NMI safe and in turn get rid of all this #ifdefs. Have
> > > a look at below changes:
> > >
> > > diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> > > index 26de0ca..1daf1b4 100644
> > > --- a/include/linux/workqueue.h
> > > +++ b/include/linux/workqueue.h
> > > @@ -14,6 +14,7 @@
> > > #include <linux/atomic.h>
> > > #include <linux/cpumask.h>
> > > #include <linux/rcupdate.h>
> > > +#include <linux/irq_work.h>
> > >
> > > struct workqueue_struct;
> > >
> > > @@ -106,6 +107,7 @@ struct work_struct {
> > > #ifdef CONFIG_LOCKDEP
> > > struct lockdep_map lockdep_map;
> > > #endif
> > > + struct irq_work iw;
> >
> > Hrm, I was thinking you could just have a single queue per CPU then
> > you don't need to add all this extra data to every single "struct
> > work_struct". I was thinking you could use the existing list node in
> > the "struct work_struct" to keep track of the list of things. ...but
> > maybe this idea isn't actually valid because the linked list might
> > be in use if we're scheduling work that's already pending / running?
> >
> > In any case, I worry that people won't be happy with the extra
> > overhead per "struct work_struct". Can we reduce it at all? It still
> > does feel like you could get by with a single global queue and thus
> > you wouldn't need to store the function pointer and flags with every
> > "struct work_struct", right? So all you'd need is a single pointer
> > for the linked list? I haven't actually tried implementing this,
> > though, so I could certainly be wrong.
>
> Let me try to elaborate here:
>
> Here we are dealing with 2 different layers of deferring work, one is
> irq_work (NMI safe) using "struct irq_work" and other is normal
> workqueue (NMI unsafe) using "struct work_struct".
>
> So when we are in NMI context, the only option is to use irq_work to
> defer work and need to pass reference to "struct irq_work". Now in
> following irq_work function:
>
> +void queue_work_nmi(struct irq_work *iw)
> +{
> + struct work_struct *work = container_of(iw, struct work_struct, iw);
> +
> + queue_work(system_wq, work);
> +}
> +EXPORT_SYMBOL(queue_work_nmi);
>
> we can't find a reference to "struct work_struct" until there is 1:1
> mapping with "struct irq_work". So we require a way to establish this
> mapping and having "struct irq_work" as part of "struct work_struct"
> tries to achieve that. If you have any better way to achieve this, I
> can use that instead.
Perhaps don't consider this to be "fixing schedule_work()" but providing
an NMI-safe alternative to schedule_work().
Does it look better if you create a new type to map the two structures
together? Alternatively, are there enough existing use-cases to justify
extending irq_work_queue() with irq_work_schedule() or something similar?
Daniel.
On Fri, Aug 14, 2020 at 05:36:36PM +0530, Sumit Garg wrote:
> On Thu, 13 Aug 2020 at 15:47, Daniel Thompson
> <[email protected]> wrote:
> >
> > On Thu, Aug 13, 2020 at 02:55:12PM +0530, Sumit Garg wrote:
> > > On Thu, 13 Aug 2020 at 05:38, Doug Anderson <[email protected]> wrote:
> > > > On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
> > > > > One
> > > > > last worry is that I assume that most people testing (and even
> > > > > automated testing labs) will either always enable NMI or won't enable
> > > > > NMI. That means that everyone will be only testing one codepath or
> > > > > the other and (given the complexity) the non-tested codepath will
> > > > > break.
> > > > >
> > >
> > > The current patch-set only makes this NMI mode work when the debugger
> > > (kgdb) is enabled, which I think is mostly suitable for development
> > > environments. So most people testing will involve the existing IRQ mode
> > > only.
> > >
> > > However, it's very much possible to make NMI mode the default for a
> > > particular serial driver if the underlying irqchip supports it, but that
> > > depends on whether we really see any production-level usage of the NMI
> > > debug feature.
> >
> > The effect of this patch is not to make kgdb work from NMI it is to make
> > (some) SysRqs work from NMI. I think that only allowing it to deploy for
> > kgdb users is a mistake.
> >
> > Having it deploy automatically for kgdb users might be OK but it seems
> > sensible to make this feature available for other users too.
>
> I think I wasn't clear enough in my prior reply. Actually I meant to
> say that this patch-set enables NMI support for a particular serial
> driver via ".poll_init()" interface and the only current user of that
> interface is kgdb.
>
> So if there are other users interested in this feature, they can use
> ".poll_init()" interface as well to enable it.
Huh?
We appear to be speaking interchangeably about users (people who sit in
front of the machine and want a stack trace) and sub-systems ;-).
I don't think other SysRq commands have quite such a direct relationship
between the sub-system and the sysrq command. For example, who are you
expecting to call .poll_init() if a user wants to use the SysRq to
provoke a stack trace?
Daniel.
On Fri, Aug 14, 2020 at 12:54:35PM +0530, Sumit Garg wrote:
> On Thu, 13 Aug 2020 at 05:30, Doug Anderson <[email protected]> wrote:
> > On Tue, Jul 21, 2020 at 5:10 AM Sumit Garg <[email protected]> wrote:
> > Wishful thinking, but (as far as I can tell) irq_work_queue() only
> > queues work on the CPU running the NMI. I don't have lots of NMI
> > experience, but any chance there is a variant that will queue work on
> > any CPU? Then sysrq handlers that aren't NMI aware will be more
> > likely to work.
> >
>
> Unfortunately, queuing work on other CPUs isn't safe in NMI context,
> see this warning [1]. The comment mentions:
>
> /* Arch remote IPI send/receive backend aren't NMI safe */
>
> Peter,
>
> Can you throw some light here as to why it isn't considered NMI-safe
> to send remote IPI in NMI context? Is it an arch specific limitation?
Yeah, remote irq_work uses __smp_call_single_queue() /
send_call_function_single_ipi(), which isn't safe to use from NMI
context in general.
arch_irq_work_raise() is very carefully implemented on a number of
platforms to be able to (self) IPI from NMI context.
Hi,
On Fri, Aug 14, 2020 at 4:17 AM Sumit Garg <[email protected]> wrote:
>
> On Thu, 13 Aug 2020 at 20:08, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, Aug 13, 2020 at 7:19 AM Sumit Garg <[email protected]> wrote:
> > >
> > > On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> > > > >
> > > > > Add NMI framework APIs in serial core which can be leveraged by serial
> > > > > drivers to have NMI driven serial transfers. These APIs are kept under
> > > > > CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> > > > > is the only known user to enable NMI driven serial port.
> > > > >
> > > > > The general idea is to intercept RX characters in NMI context, if those
> > > > > are specific to magic sysrq then allow corresponding handler to run in
> > > > > NMI context. Otherwise defer all other RX and TX operations to IRQ work
> > > > > queue in order to run those in normal interrupt context.
> > > > >
> > > > > Also, since magic sysrq entry APIs will need to be invoked from NMI
> > > > > context, make those APIs NMI safe by deferring NMI-unsafe work to the
> > > > > IRQ work queue.
> > > > >
> > > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > > ---
> > > > > drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> > > > > include/linux/serial_core.h | 67 ++++++++++++++++++++++
> > > > > 2 files changed, 185 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > > > index 57840cf..6342e90 100644
> > > > > --- a/drivers/tty/serial/serial_core.c
> > > > > +++ b/drivers/tty/serial/serial_core.c
> > > > > @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> > > > > return true;
> > > > > }
> > > > >
> > > > > +#ifdef CONFIG_CONSOLE_POLL
> > > > > + if (in_nmi())
> > > > > + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> > > > > + else
> > > > > + schedule_work(&sysrq_enable_work);
> > > > > +#else
> > > > > schedule_work(&sysrq_enable_work);
> > > > > -
> > > > > +#endif
> > > >
> > > > It should be a very high bar to have #ifdefs inside functions. I
> > > > don't think this meets it. Instead maybe something like this
> > > > (untested and maybe slightly wrong syntax, but hopefully makes
> > > > sense?):
> > > >
> > > > Outside the function:
> > > >
> > > > #ifdef CONFIG_CONSOLE_POLL
> > > > #define queue_port_nmi_work(port, work_type)
> > > > irq_work_queue(&port->nmi_state.work_type)
> > > > #else
> > > > #define queue_port_nmi_work(port, work_type)
> > > > #endif
> > > >
> > > > ...and then:
> > > >
> > > > if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
> > > > queue_port_nmi_work(port, sysrq_toggle_work);
> > > > else
> > > > schedule_work(&sysrq_enable_work);
> > > >
> > > > ---
> > > >
> > > > The whole double-hopping is really quite annoying. I guess
> > > > schedule_work() can't be called from NMI context but can be called
> > > > from IRQ context? So you need to first transition from NMI context to
> > > > IRQ context and then go and schedule the work? Almost feels like we
> > > > should just fix schedule_work() to do this double-hop for you if
> > > > called from NMI context. Seems like you could even re-use the list
> > > > pointers in the work_struct to keep the queue of people who need to be
> > > > scheduled from the next irq_work? Worst case it seems like you could
> > > > add a schedule_work_nmi() that would do all the hoops for you. ...but
> > > > I also know very little about NMI so maybe I'm being naive.
> > > >
> > >
> > > Thanks for this suggestion and yes indeed we could make
> > > schedule_work() NMI safe and in turn get rid of all these #ifdefs. Have
> > > a look at below changes:
> > >
> > > diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> > > index 26de0ca..1daf1b4 100644
> > > --- a/include/linux/workqueue.h
> > > +++ b/include/linux/workqueue.h
> > > @@ -14,6 +14,7 @@
> > > #include <linux/atomic.h>
> > > #include <linux/cpumask.h>
> > > #include <linux/rcupdate.h>
> > > +#include <linux/irq_work.h>
> > >
> > > struct workqueue_struct;
> > >
> > > @@ -106,6 +107,7 @@ struct work_struct {
> > > #ifdef CONFIG_LOCKDEP
> > > struct lockdep_map lockdep_map;
> > > #endif
> > > + struct irq_work iw;
> >
> > Hrm, I was thinking you could just have a single queue per CPU then
> > you don't need to add all this extra data to every single "struct
> > work_struct". I was thinking you could use the existing list node in
> > the "struct work_struct" to keep track of the list of things. ...but
> > maybe my idea this isn't actually valid because the linked list might
> > be in use if we're scheduling work that's already pending / running?
> >
> > In any case, I worry that people won't be happy with the extra
> > overhead per "struct work_struct". Can we reduce it at all? It still
> > does feel like you could get by with a single global queue and thus
> > you wouldn't need to store the function pointer and flags with every
> > "struct work_struct", right? So all you'd need is a single pointer
> > for the linked list? I haven't actually tried implementing this,
> > though, so I could certainly be wrong.
>
> Let me try to elaborate here:
>
> Here we are dealing with 2 different layers of deferring work, one is
> irq_work (NMI safe) using "struct irq_work" and the other is the normal
> workqueue (NMI unsafe) using "struct work_struct".
>
> So when we are in NMI context, the only option is to use irq_work to
> defer work and we need to pass a reference to "struct irq_work". Now in
> following irq_work function:
>
> +void queue_work_nmi(struct irq_work *iw)
> +{
> + struct work_struct *work = container_of(iw, struct work_struct, iw);
> +
> + queue_work(system_wq, work);
> +}
> +EXPORT_SYMBOL(queue_work_nmi);
>
> we can't find a reference to "struct work_struct" unless there is a 1:1
> mapping with "struct irq_work". So we require a way to establish this
> mapping and having "struct irq_work" as part of "struct work_struct"
> tries to achieve that. If you have any better way to achieve this, I
> can use that instead.
So I guess the two options to avoid the overhead are:
1. Create a new struct:
struct nmi_queuable_work_struct {
struct work_struct work;
struct irq_work iw;
};
Then the overhead is only needed for those that want this
functionality. Those people would need to use a variant
nmi_schedule_work() which, depending on in_nmi(), would either
schedule it directly or use the extra work.
Looks like Daniel already responded and suggested this.
2. Something that duplicates the code of at least part of irq_work and
therefore saves the need to store the function pointer. Think of it
this way: if you made a whole copy of irq_work that was hardcoded to
just call the function you wanted then you wouldn't need to store a
function pointer. This is, of course, excessive. I was trying to
figure out if you could do less by only copying the NMI-safe
linked-list manipulation, but this is probably impossible and not
worth it anyway.
-Doug
Hi,
On Fri, Aug 14, 2020 at 12:24 AM Sumit Garg <[email protected]> wrote:
>
> + Peter (author of irq_work.c)
>
> On Thu, 13 Aug 2020 at 05:30, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Tue, Jul 21, 2020 at 5:10 AM Sumit Garg <[email protected]> wrote:
> > >
> > > In a future patch we will add support to the serial core to make it
> > > possible to trigger a magic sysrq from an NMI context. Prepare for this
> > > by marking some sysrq actions as NMI safe. Safe actions will be allowed
> > > to run from NMI context whilst those that cannot run from an NMI will be queued
> > > as irq_work for later processing.
> > >
> > > A particular sysrq handler is only marked as NMI safe if the handler
> > > doesn't contend for any synchronization primitives, since in NMI context
> > > they are expected to cause deadlocks. Note that the debug sysrq does not
> > > contend for any synchronization primitives. It does call kgdb_breakpoint()
> > > to provoke a trap but that trap handler should be NMI safe on
> > > architectures that implement an NMI.
> > >
> > > Signed-off-by: Sumit Garg <[email protected]>
> > > ---
> > > drivers/tty/sysrq.c | 33 ++++++++++++++++++++++++++++++++-
> > > include/linux/sysrq.h | 1 +
> > > kernel/debug/debug_core.c | 1 +
> > > 3 files changed, 34 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > > index 7c95afa9..8017e33 100644
> > > --- a/drivers/tty/sysrq.c
> > > +++ b/drivers/tty/sysrq.c
> > > @@ -50,6 +50,8 @@
> > > #include <linux/syscalls.h>
> > > #include <linux/of.h>
> > > #include <linux/rcupdate.h>
> > > +#include <linux/irq_work.h>
> > > +#include <linux/kfifo.h>
> > >
> > > #include <asm/ptrace.h>
> > > #include <asm/irq_regs.h>
> > > @@ -111,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
> > > .help_msg = "loglevel(0-9)",
> > > .action_msg = "Changing Loglevel",
> > > .enable_mask = SYSRQ_ENABLE_LOG,
> > > + .nmi_safe = true,
> > > };
> > >
> > > #ifdef CONFIG_VT
> > > @@ -157,6 +160,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
> > > .help_msg = "crash(c)",
> > > .action_msg = "Trigger a crash",
> > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > + .nmi_safe = true,
> > > };
> > >
> > > static void sysrq_handle_reboot(int key)
> > > @@ -170,6 +174,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
> > > .help_msg = "reboot(b)",
> > > .action_msg = "Resetting",
> > > .enable_mask = SYSRQ_ENABLE_BOOT,
> > > + .nmi_safe = true,
> > > };
> > >
> > > const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
> > > @@ -217,6 +222,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
> > > .handler = sysrq_handle_showlocks,
> > > .help_msg = "show-all-locks(d)",
> > > .action_msg = "Show Locks Held",
> > > + .nmi_safe = true,
> > > };
> > > #else
> > > #define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
> > > @@ -289,6 +295,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
> > > .help_msg = "show-registers(p)",
> > > .action_msg = "Show Regs",
> > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > + .nmi_safe = true,
> > > };
> > >
> > > static void sysrq_handle_showstate(int key)
> > > @@ -326,6 +333,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
> > > .help_msg = "dump-ftrace-buffer(z)",
> > > .action_msg = "Dump ftrace buffer",
> > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > + .nmi_safe = true,
> > > };
> > > #else
> > > #define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
> > > @@ -538,6 +546,23 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> > > sysrq_key_table[i] = op_p;
> > > }
> > >
> > > +#define SYSRQ_NMI_FIFO_SIZE 64
> > > +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
> >
> > A 64-entry FIFO seems excessive. Quite honestly even a FIFO seems a
> > bit excessive and it feels like if two sysrqs were received in super
> > quick succession that it would be OK to just process the first one. I
> > guess if it simplifies the processing to have a FIFO then it shouldn't
> > hurt, but no need for 64 entries.
> >
>
> Okay, would a 2-entry FIFO work here? We still need a FIFO to pass
> on the key parameter.
...or even a 1-entry FIFO if that makes sense?
> > > +static void sysrq_do_nmi_work(struct irq_work *work)
> > > +{
> > > + const struct sysrq_key_op *op_p;
> > > + int key;
> > > +
> > > + while (kfifo_out(&sysrq_nmi_fifo, &key, 1)) {
> > > + op_p = __sysrq_get_key_op(key);
> > > + if (op_p)
> > > + op_p->handler(key);
> > > + }
> >
> > Do you need to manage "suppress_printk" in this function? Do you need
> > to call rcu_sysrq_start() and rcu_read_lock()?
>
> Ah I missed those. Will add them here instead.
>
> >
> > If so, how do you prevent racing between the mucking we're doing with
> > these things and the mucking that the NMI does with them?
>
> IIUC, here you meant to highlight the race where a scheduled sysrq is
> executing in IRQ context and we receive a new sysrq in NMI context,
> correct? If yes, this seems to be a trickier situation. I think the
> appropriate way to handle it would be to deny any further sysrq
> handling until the prior sysrq handling is complete, your views?
The problem is that in some cases you're running NMIs directly at FIQ
time and other cases you're running them at IRQ time. So you
definitely can't just move it to NMI.
Skipping looking for other SYSRQs until the old one is complete sounds
good to me. Again my ignorance will make me sound like a fool,
probably, but can you use the kfifo as a form of mutual exclusion? If
you have a 1-entry kfifo, maybe:
1. First try to add to the "FIFO". If it fails (out of space) then a
sysrq is in progress. Ignore this one.
2. Decide if you're NMI-safe or not.
3. If NMI safe, modify "suppress_printk", call rcu functions, then
call the handler. Restore suppress_printk and then dequeue from FIFO.
4. If not-NMI safe, the irq worker would "peek" into the FIFO, do its
work (wrapped with "suppress_printk" and the like), and not dequeue
until it's done.
In the above you'd use the FIFO as a locking mechanism. I don't know
if that's a valid use of it or if there is a better NMI-safe mechanism
for this. I think the kfifo docs talk about only one reader and one
writer and here we have two readers, so maybe it's illegal. It also
seems weird to have a 1-entry "FIFO" and feels like there's probably a
better data structure for this.
On Fri, 14 Aug 2020 at 19:48, Daniel Thompson
<[email protected]> wrote:
>
> On Fri, Aug 14, 2020 at 05:36:36PM +0530, Sumit Garg wrote:
> > On Thu, 13 Aug 2020 at 15:47, Daniel Thompson
> > <[email protected]> wrote:
> > >
> > > On Thu, Aug 13, 2020 at 02:55:12PM +0530, Sumit Garg wrote:
> > > > On Thu, 13 Aug 2020 at 05:38, Doug Anderson <[email protected]> wrote:
> > > > > On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
> > > > > > One
> > > > > > last worry is that I assume that most people testing (and even
> > > > > > automated testing labs) will either always enable NMI or won't enable
> > > > > > NMI. That means that everyone will be only testing one codepath or
> > > > > > the other and (given the complexity) the non-tested codepath will
> > > > > > break.
> > > > > >
> > > >
> > > > The current patch-set only makes NMI work when the debugger (kgdb)
> > > > is enabled which I think is mostly suitable for development
> > > > environments. So most people testing will involve existing IRQ mode
> > > > only.
> > > >
> > > > However, it's very much possible to make NMI mode as default for a
> > > > particular serial driver if the underlying irqchip supports it but it
> > > > depends if we really see any production level usage of NMI debug
> > > > feature.
> > >
> > > The effect of this patch is not to make kgdb work from NMI it is to make
> > > (some) SysRqs work from NMI. I think that only allowing it to deploy for
> > > kgdb users is a mistake.
> > >
> > > Having it deploy automatically for kgdb users might be OK but it seems
> > > sensible to make this feature available for other users too.
> >
> > I think I wasn't clear enough in my prior reply. Actually I meant to
> > say that this patch-set enables NMI support for a particular serial
> > driver via ".poll_init()" interface and the only current user of that
> > interface is kgdb.
> >
> > So if there are other users interested in this feature, they can use
> > ".poll_init()" interface as well to enable it.
>
> Huh?
>
> We appear to be speaking interchangeably about users (people who sit in
> front of the machine and want a stack trace) and sub-systems ;-).
>
> I don't think other SysRq commands have quite such a direct relationship
> between the sub-system and the sysrq command. For example who are you
> expecting to call .poll_init() if a user wants to use the SysRq to
> provoke a stack trace?
>
Ah, I see. So you meant to provide a user-space interface to
dynamically enable/disable NMI debug, correct? It will require IRQ <->
NMI switching at runtime which should be doable safely.
-Sumit
>
> Daniel.
On Mon, Aug 17, 2020 at 10:42:43AM +0530, Sumit Garg wrote:
> On Fri, 14 Aug 2020 at 19:48, Daniel Thompson
> <[email protected]> wrote:
> >
> > On Fri, Aug 14, 2020 at 05:36:36PM +0530, Sumit Garg wrote:
> > > On Thu, 13 Aug 2020 at 15:47, Daniel Thompson
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Aug 13, 2020 at 02:55:12PM +0530, Sumit Garg wrote:
> > > > > On Thu, 13 Aug 2020 at 05:38, Doug Anderson <[email protected]> wrote:
> > > > > > On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
> > > > > > > One
> > > > > > > last worry is that I assume that most people testing (and even
> > > > > > > automated testing labs) will either always enable NMI or won't enable
> > > > > > > NMI. That means that everyone will be only testing one codepath or
> > > > > > > the other and (given the complexity) the non-tested codepath will
> > > > > > > break.
> > > > > > >
> > > > >
> > > > > The current patch-set only makes NMI work when the debugger (kgdb)
> > > > > is enabled which I think is mostly suitable for development
> > > > > environments. So most people testing will involve existing IRQ mode
> > > > > only.
> > > > >
> > > > > However, it's very much possible to make NMI mode as default for a
> > > > > particular serial driver if the underlying irqchip supports it but it
> > > > > depends if we really see any production level usage of NMI debug
> > > > > feature.
> > > >
> > > > The effect of this patch is not to make kgdb work from NMI it is to make
> > > > (some) SysRqs work from NMI. I think that only allowing it to deploy for
> > > > kgdb users is a mistake.
> > > >
> > > > Having it deploy automatically for kgdb users might be OK but it seems
> > > > sensible to make this feature available for other users too.
> > >
> > > I think I wasn't clear enough in my prior reply. Actually I meant to
> > > say that this patch-set enables NMI support for a particular serial
> > > driver via ".poll_init()" interface and the only current user of that
> > > interface is kgdb.
> > >
> > > So if there are other users interested in this feature, they can use
> > > ".poll_init()" interface as well to enable it.
> >
> > Huh?
> >
> > We appear to be speaking interchangeably about users (people who sit in
> > front of the machine and want a stack trace) and sub-systems ;-).
> >
> > I don't think other SysRq commands have quite such a direct relationship
> > between the sub-system and the sysrq command. For example who are you
> > expecting to call .poll_init() if a user wants to use the SysRq to
> > provoke a stack trace?
> >
>
> Ah, I see. So you meant to provide a user-space interface to
> dynamically enable/disable NMI debug, correct? It will require IRQ <->
> NMI switching at runtime which should be doable safely.
I haven't given much thought to the exact mechanism, though I would
perhaps have started by thinking about a module parameter.
From an RFC point of view, I simply think this feature is potentially
useful on systems without kgdb (which, let's be honest, are firmly in
the majority) so making .poll_init() the only way to activate it is a
mistake.
Daniel.
On Fri, 14 Aug 2020 at 19:43, Daniel Thompson
<[email protected]> wrote:
>
> On Fri, Aug 14, 2020 at 04:47:11PM +0530, Sumit Garg wrote:
> > On Thu, 13 Aug 2020 at 20:08, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Aug 13, 2020 at 7:19 AM Sumit Garg <[email protected]> wrote:
> > > >
> > > > On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> > > > > >
> > > > > > Add NMI framework APIs in serial core which can be leveraged by serial
> > > > > > drivers to have NMI driven serial transfers. These APIs are kept under
> > > > > > CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> > > > > > is the only known user to enable NMI driven serial port.
> > > > > >
> > > > > > The general idea is to intercept RX characters in NMI context, if those
> > > > > > are specific to magic sysrq then allow corresponding handler to run in
> > > > > > NMI context. Otherwise defer all other RX and TX operations to IRQ work
> > > > > > queue in order to run those in normal interrupt context.
> > > > > >
> > > > > > Also, since magic sysrq entry APIs will need to be invoked from NMI
> > > > > > context, make those APIs NMI safe by deferring NMI-unsafe work to
> > > > > > IRQ work queue.
> > > > > >
> > > > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > > > ---
> > > > > > drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> > > > > > include/linux/serial_core.h | 67 ++++++++++++++++++++++
> > > > > > 2 files changed, 185 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > > > > index 57840cf..6342e90 100644
> > > > > > --- a/drivers/tty/serial/serial_core.c
> > > > > > +++ b/drivers/tty/serial/serial_core.c
> > > > > > @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> > > > > > return true;
> > > > > > }
> > > > > >
> > > > > > +#ifdef CONFIG_CONSOLE_POLL
> > > > > > + if (in_nmi())
> > > > > > + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> > > > > > + else
> > > > > > + schedule_work(&sysrq_enable_work);
> > > > > > +#else
> > > > > > schedule_work(&sysrq_enable_work);
> > > > > > -
> > > > > > +#endif
> > > > >
> > > > > It should be a very high bar to have #ifdefs inside functions. I
> > > > > don't think this meets it. Instead maybe something like this
> > > > > (untested and maybe slightly wrong syntax, but hopefully makes
> > > > > sense?):
> > > > >
> > > > > Outside the function:
> > > > >
> > > > > #ifdef CONFIG_CONSOLE_POLL
> > > > > #define queue_port_nmi_work(port, work_type)
> > > > > irq_work_queue(&port->nmi_state.work_type)
> > > > > #else
> > > > > #define queue_port_nmi_work(port, work_type)
> > > > > #endif
> > > > >
> > > > > ...and then:
> > > > >
> > > > > if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
> > > > > queue_port_nmi_work(port, sysrq_toggle_work);
> > > > > else
> > > > > schedule_work(&sysrq_enable_work);
> > > > >
> > > > > ---
> > > > >
> > > > > The whole double-hopping is really quite annoying. I guess
> > > > > schedule_work() can't be called from NMI context but can be called
> > > > > from IRQ context? So you need to first transition from NMI context to
> > > > > IRQ context and then go and schedule the work? Almost feels like we
> > > > > should just fix schedule_work() to do this double-hop for you if
> > > > > called from NMI context. Seems like you could even re-use the list
> > > > > pointers in the work_struct to keep the queue of people who need to be
> > > > > scheduled from the next irq_work? Worst case it seems like you could
> > > > > add a schedule_work_nmi() that would do all the hoops for you. ...but
> > > > > I also know very little about NMI so maybe I'm being naive.
> > > > >
> > > >
> > > > Thanks for this suggestion and yes indeed we could make
> > > > schedule_work() NMI safe and in turn get rid of all these #ifdefs. Have
> > > > a look at below changes:
> > > >
> > > > diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> > > > index 26de0ca..1daf1b4 100644
> > > > --- a/include/linux/workqueue.h
> > > > +++ b/include/linux/workqueue.h
> > > > @@ -14,6 +14,7 @@
> > > > #include <linux/atomic.h>
> > > > #include <linux/cpumask.h>
> > > > #include <linux/rcupdate.h>
> > > > +#include <linux/irq_work.h>
> > > >
> > > > struct workqueue_struct;
> > > >
> > > > @@ -106,6 +107,7 @@ struct work_struct {
> > > > #ifdef CONFIG_LOCKDEP
> > > > struct lockdep_map lockdep_map;
> > > > #endif
> > > > + struct irq_work iw;
> > >
> > > Hrm, I was thinking you could just have a single queue per CPU then
> > > you don't need to add all this extra data to every single "struct
> > > work_struct". I was thinking you could use the existing list node in
> > > the "struct work_struct" to keep track of the list of things. ...but
> > > maybe my idea isn't actually valid because the linked list might
> > > be in use if we're scheduling work that's already pending / running?
> > >
> > > In any case, I worry that people won't be happy with the extra
> > > overhead per "struct work_struct". Can we reduce it at all? It still
> > > does feel like you could get by with a single global queue and thus
> > > you wouldn't need to store the function pointer and flags with every
> > > "struct work_struct", right? So all you'd need is a single pointer
> > > for the linked list? I haven't actually tried implementing this,
> > > though, so I could certainly be wrong.
> >
> > Let me try to elaborate here:
> >
> > Here we are dealing with 2 different layers of deferring work, one is
> > irq_work (NMI safe) using "struct irq_work" and the other is the normal
> > workqueue (NMI unsafe) using "struct work_struct".
> >
> > So when we are in NMI context, the only option is to use irq_work to
> > defer work and we need to pass a reference to "struct irq_work". Now in
> > following irq_work function:
> >
> > +void queue_work_nmi(struct irq_work *iw)
> > +{
> > + struct work_struct *work = container_of(iw, struct work_struct, iw);
> > +
> > + queue_work(system_wq, work);
> > +}
> > +EXPORT_SYMBOL(queue_work_nmi);
> >
> > we can't find a reference to "struct work_struct" unless there is a 1:1
> > mapping with "struct irq_work". So we require a way to establish this
> > mapping and having "struct irq_work" as part of "struct work_struct"
> > tries to achieve that. If you have any better way to achieve this, I
> > can use that instead.
>
> Perhaps don't consider this to be "fixing schedule_work()" but providing
> an NMI-safe alternative to schedule_work().
Okay.
>
> Does it look better if you create a new type to map the two structures
> together. Alternatively are there enough existing use-cases to want to
> extend irq_work_queue() with irq_work_schedule() or something similar?
>
Thanks for your suggestion, irq_work_schedule() looked even better
without any overhead, see below:
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 3082378..1eade89 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -3,6 +3,7 @@
#define _LINUX_IRQ_WORK_H
#include <linux/smp_types.h>
+#include <linux/workqueue.h>
/*
* An entry can be in one of four states:
@@ -24,6 +25,11 @@ struct irq_work {
void (*func)(struct irq_work *);
};
+struct irq_work_schedule {
+ struct irq_work work;
+ struct work_struct *sched_work;
+};
+
static inline
void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
{
@@ -39,6 +45,7 @@ void init_irq_work(struct irq_work *work, void
(*func)(struct irq_work *))
bool irq_work_queue(struct irq_work *work);
bool irq_work_queue_on(struct irq_work *work, int cpu);
+bool irq_work_schedule(struct work_struct *sched_work);
void irq_work_tick(void);
void irq_work_sync(struct irq_work *work);
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index eca8396..3880316 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -24,6 +24,8 @@
static DEFINE_PER_CPU(struct llist_head, raised_list);
static DEFINE_PER_CPU(struct llist_head, lazy_list);
+static struct irq_work_schedule irq_work_sched;
+
/*
* Claim the entry so that no one else will poke at it.
*/
@@ -79,6 +81,25 @@ bool irq_work_queue(struct irq_work *work)
}
EXPORT_SYMBOL_GPL(irq_work_queue);
+static void irq_work_schedule_fn(struct irq_work *work)
+{
+ struct irq_work_schedule *irq_work_sched =
+ container_of(work, struct irq_work_schedule, work);
+
+ if (irq_work_sched->sched_work)
+ schedule_work(irq_work_sched->sched_work);
+}
+
+/* Schedule work via irq work queue */
+bool irq_work_schedule(struct work_struct *sched_work)
+{
+ init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
+ irq_work_sched.sched_work = sched_work;
+
+ return irq_work_queue(&irq_work_sched.work);
+}
+EXPORT_SYMBOL_GPL(irq_work_schedule);
+
/*
* Enqueue the irq_work @work on @cpu unless it's already pending
* somewhere.
-Sumit
>
> Daniel.
On Fri, 14 Aug 2020 at 20:14, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Fri, Aug 14, 2020 at 4:17 AM Sumit Garg <[email protected]> wrote:
> >
> > On Thu, 13 Aug 2020 at 20:08, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Aug 13, 2020 at 7:19 AM Sumit Garg <[email protected]> wrote:
> > > >
> > > > On Thu, 13 Aug 2020 at 05:29, Doug Anderson <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Jul 21, 2020 at 5:11 AM Sumit Garg <[email protected]> wrote:
> > > > > >
> > > > > > Add NMI framework APIs in serial core which can be leveraged by serial
> > > > > > drivers to have NMI driven serial transfers. These APIs are kept under
> > > > > > CONFIG_CONSOLE_POLL as currently kgdb initializing uart in polling mode
> > > > > > is the only known user to enable NMI driven serial port.
> > > > > >
> > > > > > The general idea is to intercept RX characters in NMI context, if those
> > > > > > are specific to magic sysrq then allow corresponding handler to run in
> > > > > > NMI context. Otherwise defer all other RX and TX operations to IRQ work
> > > > > > queue in order to run those in normal interrupt context.
> > > > > >
> > > > > > Also, since magic sysrq entry APIs will need to be invoked from NMI
> > > > > > context, make those APIs NMI safe by deferring NMI-unsafe work to
> > > > > > IRQ work queue.
> > > > > >
> > > > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > > > ---
> > > > > > drivers/tty/serial/serial_core.c | 120 ++++++++++++++++++++++++++++++++++++++-
> > > > > > include/linux/serial_core.h | 67 ++++++++++++++++++++++
> > > > > > 2 files changed, 185 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > > > > index 57840cf..6342e90 100644
> > > > > > --- a/drivers/tty/serial/serial_core.c
> > > > > > +++ b/drivers/tty/serial/serial_core.c
> > > > > > @@ -3181,8 +3181,14 @@ static bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch)
> > > > > > return true;
> > > > > > }
> > > > > >
> > > > > > +#ifdef CONFIG_CONSOLE_POLL
> > > > > > + if (in_nmi())
> > > > > > + irq_work_queue(&port->nmi_state.sysrq_toggle_work);
> > > > > > + else
> > > > > > + schedule_work(&sysrq_enable_work);
> > > > > > +#else
> > > > > > schedule_work(&sysrq_enable_work);
> > > > > > -
> > > > > > +#endif
> > > > >
> > > > > It should be a very high bar to have #ifdefs inside functions. I
> > > > > don't think this meets it. Instead maybe something like this
> > > > > (untested and maybe slightly wrong syntax, but hopefully makes
> > > > > sense?):
> > > > >
> > > > > Outside the function:
> > > > >
> > > > > #ifdef CONFIG_CONSOLE_POLL
> > > > > #define queue_port_nmi_work(port, work_type)
> > > > > irq_work_queue(&port->nmi_state.work_type)
> > > > > #else
> > > > > #define queue_port_nmi_work(port, work_type)
> > > > > #endif
> > > > >
> > > > > ...and then:
> > > > >
> > > > > if (IS_ENABLED(CONFIG_CONSOLE_POLL) && in_nmi())
> > > > > queue_port_nmi_work(port, sysrq_toggle_work);
> > > > > else
> > > > > schedule_work(&sysrq_enable_work);
> > > > >
> > > > > ---
> > > > >
> > > > > The whole double-hopping is really quite annoying. I guess
> > > > > schedule_work() can't be called from NMI context but can be called
> > > > > from IRQ context? So you need to first transition from NMI context to
> > > > > IRQ context and then go and schedule the work? Almost feels like we
> > > > > should just fix schedule_work() to do this double-hop for you if
> > > > > called from NMI context. Seems like you could even re-use the list
> > > > > pointers in the work_struct to keep the queue of people who need to be
> > > > > scheduled from the next irq_work? Worst case it seems like you could
> > > > > add a schedule_work_nmi() that would do all the hoops for you. ...but
> > > > > I also know very little about NMI so maybe I'm being naive.
> > > > >
> > > >
> > > > Thanks for this suggestion and yes indeed we could make
> > > > schedule_work() NMI safe and in turn get rid of all these #ifdefs. Have
> > > > a look at below changes:
> > > >
> > > > diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> > > > index 26de0ca..1daf1b4 100644
> > > > --- a/include/linux/workqueue.h
> > > > +++ b/include/linux/workqueue.h
> > > > @@ -14,6 +14,7 @@
> > > > #include <linux/atomic.h>
> > > > #include <linux/cpumask.h>
> > > > #include <linux/rcupdate.h>
> > > > +#include <linux/irq_work.h>
> > > >
> > > > struct workqueue_struct;
> > > >
> > > > @@ -106,6 +107,7 @@ struct work_struct {
> > > > #ifdef CONFIG_LOCKDEP
> > > > struct lockdep_map lockdep_map;
> > > > #endif
> > > > + struct irq_work iw;
> > >
> > > Hrm, I was thinking you could just have a single queue per CPU then
> > > you don't need to add all this extra data to every single "struct
> > > work_struct". I was thinking you could use the existing list node in
> > > the "struct work_struct" to keep track of the list of things. ...but
> > > maybe my idea isn't actually valid because the linked list might
> > > be in use if we're scheduling work that's already pending / running?
> > >
> > > In any case, I worry that people won't be happy with the extra
> > > overhead per "struct work_struct". Can we reduce it at all? It still
> > > does feel like you could get by with a single global queue and thus
> > > you wouldn't need to store the function pointer and flags with every
> > > "struct work_struct", right? So all you'd need is a single pointer
> > > for the linked list? I haven't actually tried implementing this,
> > > though, so I could certainly be wrong.
> >
> > Let me try to elaborate here:
> >
> > Here we are dealing with 2 different layers of deferring work, one is
> > irq_work (NMI safe) using "struct irq_work" and the other is the normal
> > workqueue (NMI unsafe) using "struct work_struct".
> >
> > So when we are in NMI context, the only option is to use irq_work to
> > defer work and we need to pass a reference to "struct irq_work". Now in
> > following irq_work function:
> >
> > +void queue_work_nmi(struct irq_work *iw)
> > +{
> > + struct work_struct *work = container_of(iw, struct work_struct, iw);
> > +
> > + queue_work(system_wq, work);
> > +}
> > +EXPORT_SYMBOL(queue_work_nmi);
> >
> > we can't find a reference to "struct work_struct" unless there is a 1:1
> > mapping with "struct irq_work". So we require a way to establish this
> > mapping and having "struct irq_work" as part of "struct work_struct"
> > tries to achieve that. If you have any better way to achieve this, I
> > can use that instead.
>
> So I guess the two options to avoid the overhead are:
>
> 1. Create a new struct:
>
> struct nmi_queuable_work_struct {
> struct work_struct work;
> struct irq_work iw;
> };
>
> Then the overhead is only needed for those that want this
> functionality. Those people would need to use a variant
> nmi_schedule_work() which, depending on in_nmi(), would either
> schedule it directly or use the extra work.
>
> Looks like Daniel already responded and suggested this.
>
>
> 2. Something that duplicates the code of at least part of irq_work and
> therefore saves the need to store the function pointer. Think of it
> this way: if you made a whole copy of irq_work that was hardcoded to
> just call the function you wanted then you wouldn't need to store a
> function pointer. This is, of course, excessive. I was trying to
> figure out if you could do less by only copying the NMI-safe
> linked-list manipulation, but this is probably impossible and not
> worth it anyway.
>
Thanks for your suggestions. I came up with an approach without any
overhead (see my reply to Daniel).
-Sumit
> -Doug
Hi,
On Mon, Aug 17, 2020 at 5:27 AM Sumit Garg <[email protected]> wrote:
>
> Thanks for your suggestion, irq_work_schedule() looked even better
> without any overhead, see below:
>
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 3082378..1eade89 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -3,6 +3,7 @@
> #define _LINUX_IRQ_WORK_H
>
> #include <linux/smp_types.h>
> +#include <linux/workqueue.h>
>
> /*
> * An entry can be in one of four states:
> @@ -24,6 +25,11 @@ struct irq_work {
> void (*func)(struct irq_work *);
> };
>
> +struct irq_work_schedule {
> + struct irq_work work;
> + struct work_struct *sched_work;
> +};
> +
> static inline
> void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
> {
> @@ -39,6 +45,7 @@ void init_irq_work(struct irq_work *work, void
> (*func)(struct irq_work *))
>
> bool irq_work_queue(struct irq_work *work);
> bool irq_work_queue_on(struct irq_work *work, int cpu);
> +bool irq_work_schedule(struct work_struct *sched_work);
>
> void irq_work_tick(void);
> void irq_work_sync(struct irq_work *work);
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index eca8396..3880316 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -24,6 +24,8 @@
> static DEFINE_PER_CPU(struct llist_head, raised_list);
> static DEFINE_PER_CPU(struct llist_head, lazy_list);
>
> +static struct irq_work_schedule irq_work_sched;
> +
> /*
> * Claim the entry so that no one else will poke at it.
> */
> @@ -79,6 +81,25 @@ bool irq_work_queue(struct irq_work *work)
> }
> EXPORT_SYMBOL_GPL(irq_work_queue);
>
> +static void irq_work_schedule_fn(struct irq_work *work)
> +{
> + struct irq_work_schedule *irq_work_sched =
> + container_of(work, struct irq_work_schedule, work);
> +
> + if (irq_work_sched->sched_work)
> + schedule_work(irq_work_sched->sched_work);
> +}
> +
> +/* Schedule work via irq work queue */
> +bool irq_work_schedule(struct work_struct *sched_work)
> +{
> + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> + irq_work_sched.sched_work = sched_work;
> +
> + return irq_work_queue(&irq_work_sched.work);
> +}
> +EXPORT_SYMBOL_GPL(irq_work_schedule);
Wait, howzat work? There's a single global variable that you stash
the "sched_work" into with no locking? What if two people schedule
work at the same time?
-Doug
On Fri, 14 Aug 2020 at 20:27, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Fri, Aug 14, 2020 at 12:24 AM Sumit Garg <[email protected]> wrote:
> >
> > + Peter (author of irq_work.c)
> >
> > On Thu, 13 Aug 2020 at 05:30, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Jul 21, 2020 at 5:10 AM Sumit Garg <[email protected]> wrote:
> > > >
> > > > In a future patch we will add support to the serial core to make it
> > > > possible to trigger a magic sysrq from an NMI context. Prepare for this
> > > > by marking some sysrq actions as NMI safe. Safe actions will be allowed
> > > > to run from NMI context whilst those that cannot run from an NMI will be queued
> > > > as irq_work for later processing.
> > > >
> > > > A particular sysrq handler is only marked as NMI safe in case the handler
> > > > isn't contending for any synchronization primitives as in NMI context
> > > > they are expected to cause deadlocks. Note that the debug sysrq does not
> > > > contend for any synchronization primitives. It does call kgdb_breakpoint()
> > > > to provoke a trap but that trap handler should be NMI safe on
> > > > architectures that implement an NMI.
> > > >
> > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > ---
> > > > drivers/tty/sysrq.c | 33 ++++++++++++++++++++++++++++++++-
> > > > include/linux/sysrq.h | 1 +
> > > > kernel/debug/debug_core.c | 1 +
> > > > 3 files changed, 34 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > > > index 7c95afa9..8017e33 100644
> > > > --- a/drivers/tty/sysrq.c
> > > > +++ b/drivers/tty/sysrq.c
> > > > @@ -50,6 +50,8 @@
> > > > #include <linux/syscalls.h>
> > > > #include <linux/of.h>
> > > > #include <linux/rcupdate.h>
> > > > +#include <linux/irq_work.h>
> > > > +#include <linux/kfifo.h>
> > > >
> > > > #include <asm/ptrace.h>
> > > > #include <asm/irq_regs.h>
> > > > @@ -111,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
> > > > .help_msg = "loglevel(0-9)",
> > > > .action_msg = "Changing Loglevel",
> > > > .enable_mask = SYSRQ_ENABLE_LOG,
> > > > + .nmi_safe = true,
> > > > };
> > > >
> > > > #ifdef CONFIG_VT
> > > > @@ -157,6 +160,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
> > > > .help_msg = "crash(c)",
> > > > .action_msg = "Trigger a crash",
> > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > + .nmi_safe = true,
> > > > };
> > > >
> > > > static void sysrq_handle_reboot(int key)
> > > > @@ -170,6 +174,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
> > > > .help_msg = "reboot(b)",
> > > > .action_msg = "Resetting",
> > > > .enable_mask = SYSRQ_ENABLE_BOOT,
> > > > + .nmi_safe = true,
> > > > };
> > > >
> > > > const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
> > > > @@ -217,6 +222,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
> > > > .handler = sysrq_handle_showlocks,
> > > > .help_msg = "show-all-locks(d)",
> > > > .action_msg = "Show Locks Held",
> > > > + .nmi_safe = true,
> > > > };
> > > > #else
> > > > #define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
> > > > @@ -289,6 +295,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
> > > > .help_msg = "show-registers(p)",
> > > > .action_msg = "Show Regs",
> > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > + .nmi_safe = true,
> > > > };
> > > >
> > > > static void sysrq_handle_showstate(int key)
> > > > @@ -326,6 +333,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
> > > > .help_msg = "dump-ftrace-buffer(z)",
> > > > .action_msg = "Dump ftrace buffer",
> > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > + .nmi_safe = true,
> > > > };
> > > > #else
> > > > #define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
> > > > @@ -538,6 +546,23 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> > > > sysrq_key_table[i] = op_p;
> > > > }
> > > >
> > > > +#define SYSRQ_NMI_FIFO_SIZE 64
> > > > +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
> > >
> > > A 64-entry FIFO seems excessive. Quite honestly even a FIFO seems a
> > > bit excessive and it feels like if two sysrqs were received in super
> > > quick succession that it would be OK to just process the first one. I
> > > guess if it simplifies the processing to have a FIFO then it shouldn't
> > > hurt, but no need for 64 entries.
> > >
> >
> > Okay, would a 2-entry FIFO work here? We need a FIFO here since we have
> > to pass on the key parameter.
>
> ...or even a 1-entry FIFO if that makes sense?
>
Yes, it would make sense but unfortunately it isn't supported by kfifo
(the size must be a power of 2).
>
> > > > +static void sysrq_do_nmi_work(struct irq_work *work)
> > > > +{
> > > > + const struct sysrq_key_op *op_p;
> > > > + int key;
> > > > +
> > > > + while (kfifo_out(&sysrq_nmi_fifo, &key, 1)) {
> > > > + op_p = __sysrq_get_key_op(key);
> > > > + if (op_p)
> > > > + op_p->handler(key);
> > > > + }
> > >
> > > Do you need to manage "suppress_printk" in this function? Do you need
> > > to call rcu_sysrq_start() and rcu_read_lock()?
> >
> > Ah I missed those. Will add them here instead.
> >
> > >
> > > If so, how do you prevent racing between the mucking we're doing with
> > > these things and the mucking that the NMI does with them?
> >
> > IIUC, here you meant to highlight the race where a scheduled sysrq is
> > executing in IRQ context and we receive a new sysrq in NMI context,
> > correct? If yes, this seems to be a trickier situation. I think the
> > appropriate way to handle it would be to deny any further sysrq
> > handling until the prior sysrq handling is complete, your views?
>
> The problem is that in some cases you're running NMIs directly at FIQ
> time and other cases you're running them at IRQ time. So you
> definitely can't just move it to NMI.
>
> Skipping looking for other SYSRQs until the old one is complete sounds
> good to me. Again my ignorance will make me sound like a fool,
> probably, but can you use the kfifo as a form of mutual exclusion? If
> you have a 1-entry kfifo, maybe:
>
> 1. First try to add to the "FIFO". If it fails (out of space) then a
> sysrq is in progress. Ignore this one.
> 2. Decide if you're NMI-safe or not.
> 3. If NMI safe, modify "suppress_printk", call rcu functions, then
> call the handler. Restore suppress_printk and then dequeue from FIFO.
> 4. If not-NMI safe, the irq worker would "peek" into the FIFO, do its
> work (wrapped with "suppress_printk" and the like), and not dequeue
> until it's done.
>
> In the above you'd use the FIFO as a locking mechanism. I don't know
> if that's a valid use of it or if there is a better NMI-safe mechanism
> for this. I think the kfifo docs talk about only one reader and one
> writer and here we have two readers, so maybe it's illegal. It also
> seems weird to have a 1-entry "FIFO" and feels like there's probably a
> better data structure for this.
Thanks for your suggestions. Have a look at the implementation below; I
have used a 2-entry FIFO but only a single entry is used for the locking
mechanism:
@@ -538,6 +546,39 @@ static void __sysrq_put_key_op(int key, const
struct sysrq_key_op *op_p)
sysrq_key_table[i] = op_p;
}
+#define SYSRQ_NMI_FIFO_SIZE 2
+static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
+
+static void sysrq_do_nmi_work(struct irq_work *work)
+{
+ const struct sysrq_key_op *op_p;
+ int orig_suppress_printk;
+ int key;
+
+ orig_suppress_printk = suppress_printk;
+ suppress_printk = 0;
+
+ rcu_sysrq_start();
+ rcu_read_lock();
+
+ if (kfifo_peek(&sysrq_nmi_fifo, &key)) {
+ op_p = __sysrq_get_key_op(key);
+ if (op_p)
+ op_p->handler(key);
+ }
+
+ rcu_read_unlock();
+ rcu_sysrq_end();
+
+ suppress_printk = orig_suppress_printk;
+
+ /* Pop contents from fifo if any */
+ while (kfifo_get(&sysrq_nmi_fifo, &key))
+ ;
+}
+
+static DEFINE_IRQ_WORK(sysrq_nmi_work, sysrq_do_nmi_work);
+
void __handle_sysrq(int key, bool check_mask)
{
const struct sysrq_key_op *op_p;
@@ -545,6 +586,10 @@ void __handle_sysrq(int key, bool check_mask)
int orig_suppress_printk;
int i;
+ /* Skip sysrq handling if one already in progress */
+ if (!kfifo_is_empty(&sysrq_nmi_fifo))
+ return;
+
orig_suppress_printk = suppress_printk;
suppress_printk = 0;
@@ -568,7 +613,13 @@ void __handle_sysrq(int key, bool check_mask)
if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
pr_info("%s\n", op_p->action_msg);
console_loglevel = orig_log_level;
- op_p->handler(key);
+
+ if (in_nmi() && !op_p->nmi_safe) {
+ kfifo_put(&sysrq_nmi_fifo, key);
+ irq_work_queue(&sysrq_nmi_work);
+ } else {
+ op_p->handler(key);
+ }
} else {
pr_info("This sysrq operation is disabled.\n");
console_loglevel = orig_log_level;
-Sumit
On Mon, 17 Aug 2020 at 14:58, Daniel Thompson
<[email protected]> wrote:
>
> On Mon, Aug 17, 2020 at 10:42:43AM +0530, Sumit Garg wrote:
> > On Fri, 14 Aug 2020 at 19:48, Daniel Thompson
> > <[email protected]> wrote:
> > >
> > > On Fri, Aug 14, 2020 at 05:36:36PM +0530, Sumit Garg wrote:
> > > > On Thu, 13 Aug 2020 at 15:47, Daniel Thompson
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Thu, Aug 13, 2020 at 02:55:12PM +0530, Sumit Garg wrote:
> > > > > > On Thu, 13 Aug 2020 at 05:38, Doug Anderson <[email protected]> wrote:
> > > > > > > On Wed, Aug 12, 2020 at 8:27 AM Doug Anderson <[email protected]> wrote:
> > > > > > > > One
> > > > > > > > last worry is that I assume that most people testing (and even
> > > > > > > > automated testing labs) will either always enable NMI or won't enable
> > > > > > > > NMI. That means that everyone will be only testing one codepath or
> > > > > > > > the other and (given the complexity) the non-tested codepath will
> > > > > > > > break.
> > > > > > > >
> > > > > >
> > > > > > The current patch-set only makes this NMI work when the debugger (kgdb)
> > > > > > is enabled which I think is mostly suitable for development
> > > > > > environments. So most people testing will involve existing IRQ mode
> > > > > > only.
> > > > > >
> > > > > > However, it's very much possible to make NMI mode as default for a
> > > > > > particular serial driver if the underlying irqchip supports it but it
> > > > > > depends if we really see any production level usage of NMI debug
> > > > > > feature.
> > > > >
> > > > > The effect of this patch is not to make kgdb work from NMI it is to make
> > > > > (some) SysRqs work from NMI. I think that only allowing it to deploy for
> > > > > kgdb users is a mistake.
> > > > >
> > > > > Having it deploy automatically for kgdb users might be OK but it seems
> > > > > sensible to make this feature available for other users too.
> > > >
> > > > I think I wasn't clear enough in my prior reply. Actually I meant to
> > > > say that this patch-set enables NMI support for a particular serial
> > > > driver via ".poll_init()" interface and the only current user of that
> > > > interface is kgdb.
> > > >
> > > > So if there are other users interested in this feature, they can use
> > > > ".poll_init()" interface as well to enable it.
> > >
> > > Huh?
> > >
> > > We appear to speaking interchangably about users (people who sit in
> > > front of the machine and want a stack trace) and sub-systems ;-).
> > >
> > > I don't think other SysRq commands have quite such a direct relationship
> > > between the sub-system and the sysrq command. For example who are you
> > > expecting to call .poll_init() if a user wants to use the SysRq to
> > > provoke a stack trace?
> > >
> >
> > Ah, I see. So you meant to provide a user-space interface to
> > dynamically enable/disable NMI debug, correct? It will require IRQ <->
> > NMI switching at runtime which should be doable safely.
>
> I haven't given much thought to the exact mechanism, though I would
> perhaps have started by thinking about a module parameter.
>
> From an RFC point of view, I simply think this feature is potentially
> useful on systems without kgdb (which, let's be honest, are firmly in
> the majority) so making .poll_init() the only way to activate it is a
> mistake.
>
Makes sense, will add a module parameter to enable this feature during
boot as well.
-Sumit
>
> Daniel.
On Mon, 17 Aug 2020 at 19:27, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Mon, Aug 17, 2020 at 5:27 AM Sumit Garg <[email protected]> wrote:
> >
> > Thanks for your suggestion, irq_work_schedule() looked even better
> > without any overhead, see below:
> >
> > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > index 3082378..1eade89 100644
> > --- a/include/linux/irq_work.h
> > +++ b/include/linux/irq_work.h
> > @@ -3,6 +3,7 @@
> > #define _LINUX_IRQ_WORK_H
> >
> > #include <linux/smp_types.h>
> > +#include <linux/workqueue.h>
> >
> > /*
> > * An entry can be in one of four states:
> > @@ -24,6 +25,11 @@ struct irq_work {
> > void (*func)(struct irq_work *);
> > };
> >
> > +struct irq_work_schedule {
> > + struct irq_work work;
> > + struct work_struct *sched_work;
> > +};
> > +
> > static inline
> > void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
> > {
> > @@ -39,6 +45,7 @@ void init_irq_work(struct irq_work *work, void
> > (*func)(struct irq_work *))
> >
> > bool irq_work_queue(struct irq_work *work);
> > bool irq_work_queue_on(struct irq_work *work, int cpu);
> > +bool irq_work_schedule(struct work_struct *sched_work);
> >
> > void irq_work_tick(void);
> > void irq_work_sync(struct irq_work *work);
> > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > index eca8396..3880316 100644
> > --- a/kernel/irq_work.c
> > +++ b/kernel/irq_work.c
> > @@ -24,6 +24,8 @@
> > static DEFINE_PER_CPU(struct llist_head, raised_list);
> > static DEFINE_PER_CPU(struct llist_head, lazy_list);
> >
> > +static struct irq_work_schedule irq_work_sched;
> > +
> > /*
> > * Claim the entry so that no one else will poke at it.
> > */
> > @@ -79,6 +81,25 @@ bool irq_work_queue(struct irq_work *work)
> > }
> > EXPORT_SYMBOL_GPL(irq_work_queue);
> >
> > +static void irq_work_schedule_fn(struct irq_work *work)
> > +{
> > + struct irq_work_schedule *irq_work_sched =
> > + container_of(work, struct irq_work_schedule, work);
> > +
> > + if (irq_work_sched->sched_work)
> > + schedule_work(irq_work_sched->sched_work);
> > +}
> > +
> > +/* Schedule work via irq work queue */
> > +bool irq_work_schedule(struct work_struct *sched_work)
> > +{
> > + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> > + irq_work_sched.sched_work = sched_work;
> > +
> > + return irq_work_queue(&irq_work_sched.work);
> > +}
> > +EXPORT_SYMBOL_GPL(irq_work_schedule);
>
> Wait, howzat work? There's a single global variable that you stash
> the "sched_work" into with no locking? What if two people schedule
> work at the same time?
This API is intended to be invoked from NMI context only, so I think
there will be a single user at a time. And we can make that explicit
as well:
+/* Schedule work via irq work queue */
+bool irq_work_schedule(struct work_struct *sched_work)
+{
+ if (in_nmi()) {
+ init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
+ irq_work_sched.sched_work = sched_work;
+
+ return irq_work_queue(&irq_work_sched.work);
+ }
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(irq_work_schedule);
-Sumit
>
> -Doug
On Mon, Aug 17, 2020 at 07:53:55PM +0530, Sumit Garg wrote:
> On Mon, 17 Aug 2020 at 19:27, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Mon, Aug 17, 2020 at 5:27 AM Sumit Garg <[email protected]> wrote:
> > >
> > > Thanks for your suggestion, irq_work_schedule() looked even better
> > > without any overhead, see below:
> > >
> > > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > > index 3082378..1eade89 100644
> > > --- a/include/linux/irq_work.h
> > > +++ b/include/linux/irq_work.h
> > > @@ -3,6 +3,7 @@
> > > #define _LINUX_IRQ_WORK_H
> > >
> > > #include <linux/smp_types.h>
> > > +#include <linux/workqueue.h>
> > >
> > > /*
> > > * An entry can be in one of four states:
> > > @@ -24,6 +25,11 @@ struct irq_work {
> > > void (*func)(struct irq_work *);
> > > };
> > >
> > > +struct irq_work_schedule {
> > > + struct irq_work work;
> > > + struct work_struct *sched_work;
> > > +};
> > > +
> > > static inline
> > > void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
> > > {
> > > @@ -39,6 +45,7 @@ void init_irq_work(struct irq_work *work, void
> > > (*func)(struct irq_work *))
> > >
> > > bool irq_work_queue(struct irq_work *work);
> > > bool irq_work_queue_on(struct irq_work *work, int cpu);
> > > +bool irq_work_schedule(struct work_struct *sched_work);
> > >
> > > void irq_work_tick(void);
> > > void irq_work_sync(struct irq_work *work);
> > > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > > index eca8396..3880316 100644
> > > --- a/kernel/irq_work.c
> > > +++ b/kernel/irq_work.c
> > > @@ -24,6 +24,8 @@
> > > static DEFINE_PER_CPU(struct llist_head, raised_list);
> > > static DEFINE_PER_CPU(struct llist_head, lazy_list);
> > >
> > > +static struct irq_work_schedule irq_work_sched;
> > > +
> > > /*
> > > * Claim the entry so that no one else will poke at it.
> > > */
> > > @@ -79,6 +81,25 @@ bool irq_work_queue(struct irq_work *work)
> > > }
> > > EXPORT_SYMBOL_GPL(irq_work_queue);
> > >
> > > +static void irq_work_schedule_fn(struct irq_work *work)
> > > +{
> > > + struct irq_work_schedule *irq_work_sched =
> > > + container_of(work, struct irq_work_schedule, work);
> > > +
> > > + if (irq_work_sched->sched_work)
> > > + schedule_work(irq_work_sched->sched_work);
> > > +}
> > > +
> > > +/* Schedule work via irq work queue */
> > > +bool irq_work_schedule(struct work_struct *sched_work)
> > > +{
> > > + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> > > + irq_work_sched.sched_work = sched_work;
> > > +
> > > + return irq_work_queue(&irq_work_sched.work);
> > > +}
> > > +EXPORT_SYMBOL_GPL(irq_work_schedule);
> >
> > Wait, howzat work? There's a single global variable that you stash
> > the "sched_work" into with no locking? What if two people schedule
> > work at the same time?
>
> This API is intended to be invoked from NMI context only, so I think
> there will be a single user at a time.
How can you possibly know that?
This is library code, not a helper in a driver.
Daniel.
> And we can make that explicit
> as well:
>
> +/* Schedule work via irq work queue */
> +bool irq_work_schedule(struct work_struct *sched_work)
> +{
> + if (in_nmi()) {
> + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> + irq_work_sched.sched_work = sched_work;
> +
> + return irq_work_queue(&irq_work_sched.work);
> + }
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(irq_work_schedule);
>
> -Sumit
>
> >
> > -Doug
Hi,
On Mon, Aug 17, 2020 at 7:08 AM Sumit Garg <[email protected]> wrote:
>
> On Fri, 14 Aug 2020 at 20:27, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Fri, Aug 14, 2020 at 12:24 AM Sumit Garg <[email protected]> wrote:
> > >
> > > + Peter (author of irq_work.c)
> > >
> > > On Thu, 13 Aug 2020 at 05:30, Doug Anderson <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Jul 21, 2020 at 5:10 AM Sumit Garg <[email protected]> wrote:
> > > > >
> > > > > In a future patch we will add support to the serial core to make it
> > > > > possible to trigger a magic sysrq from an NMI context. Prepare for this
> > > > > by marking some sysrq actions as NMI safe. Safe actions will be allowed
> > > > > to run from NMI context whilst those that cannot run from an NMI will be queued
> > > > > as irq_work for later processing.
> > > > >
> > > > > A particular sysrq handler is only marked as NMI safe in case the handler
> > > > > isn't contending for any synchronization primitives as in NMI context
> > > > > they are expected to cause deadlocks. Note that the debug sysrq does not
> > > > > contend for any synchronization primitives. It does call kgdb_breakpoint()
> > > > > to provoke a trap but that trap handler should be NMI safe on
> > > > > architectures that implement an NMI.
> > > > >
> > > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > > ---
> > > > > drivers/tty/sysrq.c | 33 ++++++++++++++++++++++++++++++++-
> > > > > include/linux/sysrq.h | 1 +
> > > > > kernel/debug/debug_core.c | 1 +
> > > > > 3 files changed, 34 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > > > > index 7c95afa9..8017e33 100644
> > > > > --- a/drivers/tty/sysrq.c
> > > > > +++ b/drivers/tty/sysrq.c
> > > > > @@ -50,6 +50,8 @@
> > > > > #include <linux/syscalls.h>
> > > > > #include <linux/of.h>
> > > > > #include <linux/rcupdate.h>
> > > > > +#include <linux/irq_work.h>
> > > > > +#include <linux/kfifo.h>
> > > > >
> > > > > #include <asm/ptrace.h>
> > > > > #include <asm/irq_regs.h>
> > > > > @@ -111,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
> > > > > .help_msg = "loglevel(0-9)",
> > > > > .action_msg = "Changing Loglevel",
> > > > > .enable_mask = SYSRQ_ENABLE_LOG,
> > > > > + .nmi_safe = true,
> > > > > };
> > > > >
> > > > > #ifdef CONFIG_VT
> > > > > @@ -157,6 +160,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
> > > > > .help_msg = "crash(c)",
> > > > > .action_msg = "Trigger a crash",
> > > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > + .nmi_safe = true,
> > > > > };
> > > > >
> > > > > static void sysrq_handle_reboot(int key)
> > > > > @@ -170,6 +174,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
> > > > > .help_msg = "reboot(b)",
> > > > > .action_msg = "Resetting",
> > > > > .enable_mask = SYSRQ_ENABLE_BOOT,
> > > > > + .nmi_safe = true,
> > > > > };
> > > > >
> > > > > const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
> > > > > @@ -217,6 +222,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
> > > > > .handler = sysrq_handle_showlocks,
> > > > > .help_msg = "show-all-locks(d)",
> > > > > .action_msg = "Show Locks Held",
> > > > > + .nmi_safe = true,
> > > > > };
> > > > > #else
> > > > > #define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
> > > > > @@ -289,6 +295,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
> > > > > .help_msg = "show-registers(p)",
> > > > > .action_msg = "Show Regs",
> > > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > + .nmi_safe = true,
> > > > > };
> > > > >
> > > > > static void sysrq_handle_showstate(int key)
> > > > > @@ -326,6 +333,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
> > > > > .help_msg = "dump-ftrace-buffer(z)",
> > > > > .action_msg = "Dump ftrace buffer",
> > > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > + .nmi_safe = true,
> > > > > };
> > > > > #else
> > > > > #define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
> > > > > @@ -538,6 +546,23 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> > > > > sysrq_key_table[i] = op_p;
> > > > > }
> > > > >
> > > > > +#define SYSRQ_NMI_FIFO_SIZE 64
> > > > > +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
> > > >
> > > > A 64-entry FIFO seems excessive. Quite honestly even a FIFO seems a
> > > > bit excessive and it feels like if two sysrqs were received in super
> > > > quick succession that it would be OK to just process the first one. I
> > > > guess if it simplifies the processing to have a FIFO then it shouldn't
> > > > hurt, but no need for 64 entries.
> > > >
> > >
> > > Okay, would a 2-entry FIFO work here? We need a FIFO here since we have
> > > to pass on the key parameter.
> >
> > ...or even a 1-entry FIFO if that makes sense?
> >
>
> Yes, it would make sense but unfortunately it isn't supported by kfifo
> (the size must be a power of 2).
Typically 1 is considered to be a power of 2 since 2^0 = 1.
...ah, but it appears that size < 2 is not allowed. Oh well.
> > > > > +static void sysrq_do_nmi_work(struct irq_work *work)
> > > > > +{
> > > > > + const struct sysrq_key_op *op_p;
> > > > > + int key;
> > > > > +
> > > > > + while (kfifo_out(&sysrq_nmi_fifo, &key, 1)) {
> > > > > + op_p = __sysrq_get_key_op(key);
> > > > > + if (op_p)
> > > > > + op_p->handler(key);
> > > > > + }
> > > >
> > > > Do you need to manage "suppress_printk" in this function? Do you need
> > > > to call rcu_sysrq_start() and rcu_read_lock()?
> > >
> > > Ah I missed those. Will add them here instead.
> > >
> > > >
> > > > If so, how do you prevent racing between the mucking we're doing with
> > > > these things and the mucking that the NMI does with them?
> > >
> > > IIUC, here you meant to highlight the race where a scheduled sysrq is
> > > executing in IRQ context and we receive a new sysrq in NMI context,
> > > correct? If yes, this seems to be a trickier situation. I think the
> > > appropriate way to handle it would be to deny any further sysrq
> > > handling until the prior sysrq handling is complete, your views?
> >
> > The problem is that in some cases you're running NMIs directly at FIQ
> > time and other cases you're running them at IRQ time. So you
> > definitely can't just move it to NMI.
> >
> > Skipping looking for other SYSRQs until the old one is complete sounds
> > good to me. Again my ignorance will make me sound like a fool,
> > probably, but can you use the kfifo as a form of mutual exclusion? If
> > you have a 1-entry kfifo, maybe:
> >
> > 1. First try to add to the "FIFO". If it fails (out of space) then a
> > sysrq is in progress. Ignore this one.
> > 2. Decide if you're NMI-safe or not.
> > 3. If NMI safe, modify "suppress_printk", call rcu functions, then
> > call the handler. Restore suppress_printk and then dequeue from FIFO.
> > 4. If not-NMI safe, the irq worker would "peek" into the FIFO, do its
> > work (wrapped with "suppress_printk" and the like), and not dequeue
> > until it's done.
> >
> > In the above you'd use the FIFO as a locking mechanism. I don't know
> > if that's a valid use of it or if there is a better NMI-safe mechanism
> > for this. I think the kfifo docs talk about only one reader and one
> > writer and here we have two readers, so maybe it's illegal. It also
> > seems weird to have a 1-entry "FIFO" and feels like there's probably a
> > better data structure for this.
>
> Thanks for your suggestions. Have a look at the implementation below; I
> have used a 2-entry FIFO but only a single entry is used for the locking
> mechanism:
>
> @@ -538,6 +546,39 @@ static void __sysrq_put_key_op(int key, const
> struct sysrq_key_op *op_p)
> sysrq_key_table[i] = op_p;
> }
>
> +#define SYSRQ_NMI_FIFO_SIZE 2
> +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
> +
> +static void sysrq_do_nmi_work(struct irq_work *work)
> +{
> + const struct sysrq_key_op *op_p;
> + int orig_suppress_printk;
> + int key;
> +
> + orig_suppress_printk = suppress_printk;
> + suppress_printk = 0;
> +
> + rcu_sysrq_start();
> + rcu_read_lock();
> +
> + if (kfifo_peek(&sysrq_nmi_fifo, &key)) {
> + op_p = __sysrq_get_key_op(key);
> + if (op_p)
> + op_p->handler(key);
> + }
> +
> + rcu_read_unlock();
> + rcu_sysrq_end();
> +
> + suppress_printk = orig_suppress_printk;
> +
> + /* Pop contents from fifo if any */
> + while (kfifo_get(&sysrq_nmi_fifo, &key))
> + ;
I think you can use kfifo_reset_out().
> +}
> +
> +static DEFINE_IRQ_WORK(sysrq_nmi_work, sysrq_do_nmi_work);
> +
> void __handle_sysrq(int key, bool check_mask)
> {
> const struct sysrq_key_op *op_p;
> @@ -545,6 +586,10 @@ void __handle_sysrq(int key, bool check_mask)
> int orig_suppress_printk;
> int i;
>
> + /* Skip sysrq handling if one already in progress */
> + if (!kfifo_is_empty(&sysrq_nmi_fifo))
> + return;
This _seems_ OK to me since I'd imagine kfifo_is_empty() is as safe
for the writer to do as kfifo_is_full() is and kfifo_is_full() is part
of kfifo_put().
I guess there's no better synchronism mechanism that we can use?
> +
> orig_suppress_printk = suppress_printk;
> suppress_printk = 0;
>
> @@ -568,7 +613,13 @@ void __handle_sysrq(int key, bool check_mask)
> if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
> pr_info("%s\n", op_p->action_msg);
> console_loglevel = orig_log_level;
> - op_p->handler(key);
> +
> + if (in_nmi() && !op_p->nmi_safe) {
> + kfifo_put(&sysrq_nmi_fifo, key);
> + irq_work_queue(&sysrq_nmi_work);
> + } else {
> + op_p->handler(key);
> + }
> } else {
> pr_info("This sysrq operation is disabled.\n");
> console_loglevel = orig_log_level;
>
> -Sumit
On Mon, Aug 17, 2020 at 05:57:03PM +0530, Sumit Garg wrote:
> On Fri, 14 Aug 2020 at 19:43, Daniel Thompson
> <[email protected]> wrote:
> > On Fri, Aug 14, 2020 at 04:47:11PM +0530, Sumit Garg wrote:
> > Does it look better if you create a new type to map the two structures
> > together. Alternatively are there enough existing use-cases to want to
> > extend irq_work_queue() with irq_work_schedule() or something similar?
> >
>
> Thanks for your suggestion, irq_work_schedule() looked even better
> without any overhead, see below:
>
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 3082378..1eade89 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -3,6 +3,7 @@
> #define _LINUX_IRQ_WORK_H
>
> #include <linux/smp_types.h>
> +#include <linux/workqueue.h>
>
> /*
> * An entry can be in one of four states:
> @@ -24,6 +25,11 @@ struct irq_work {
> void (*func)(struct irq_work *);
> };
>
> +struct irq_work_schedule {
> + struct irq_work work;
> + struct work_struct *sched_work;
> +};
> +
> static inline
> void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
> {
> @@ -39,6 +45,7 @@ void init_irq_work(struct irq_work *work, void
> (*func)(struct irq_work *))
>
> bool irq_work_queue(struct irq_work *work);
> bool irq_work_queue_on(struct irq_work *work, int cpu);
> +bool irq_work_schedule(struct work_struct *sched_work);
>
> void irq_work_tick(void);
> void irq_work_sync(struct irq_work *work);
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index eca8396..3880316 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -24,6 +24,8 @@
> static DEFINE_PER_CPU(struct llist_head, raised_list);
> static DEFINE_PER_CPU(struct llist_head, lazy_list);
>
> +static struct irq_work_schedule irq_work_sched;
> +
> /*
> * Claim the entry so that no one else will poke at it.
> */
> @@ -79,6 +81,25 @@ bool irq_work_queue(struct irq_work *work)
> }
> EXPORT_SYMBOL_GPL(irq_work_queue);
>
> +static void irq_work_schedule_fn(struct irq_work *work)
> +{
> + struct irq_work_schedule *irq_work_sched =
> + container_of(work, struct irq_work_schedule, work);
> +
> + if (irq_work_sched->sched_work)
> + schedule_work(irq_work_sched->sched_work);
> +}
> +
> +/* Schedule work via irq work queue */
> +bool irq_work_schedule(struct work_struct *sched_work)
> +{
> + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> + irq_work_sched.sched_work = sched_work;
> +
> + return irq_work_queue(&irq_work_sched.work);
> +}
> +EXPORT_SYMBOL_GPL(irq_work_schedule);
> +
This is irredeemably broken.
Even if we didn't care about dropping events (which we do) then when you
overwrite irq_work_sched with a copy of another work_struct, either of
which could currently be enqueued somewhere, then you will cause some
very nasty corruption.
Daniel.
On Mon, 17 Aug 2020 at 19:58, Daniel Thompson
<[email protected]> wrote:
>
> On Mon, Aug 17, 2020 at 05:57:03PM +0530, Sumit Garg wrote:
> > On Fri, 14 Aug 2020 at 19:43, Daniel Thompson
> > <[email protected]> wrote:
> > > On Fri, Aug 14, 2020 at 04:47:11PM +0530, Sumit Garg wrote:
> > > Does it look better if you create a new type to map the two structures
> > > together. Alternatively are there enough existing use-cases to want to
> > > extend irq_work_queue() with irq_work_schedule() or something similar?
> > >
> >
> > Thanks for your suggestion, irq_work_schedule() looked even better
> > without any overhead, see below:
> >
> > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > index 3082378..1eade89 100644
> > --- a/include/linux/irq_work.h
> > +++ b/include/linux/irq_work.h
> > @@ -3,6 +3,7 @@
> > #define _LINUX_IRQ_WORK_H
> >
> > #include <linux/smp_types.h>
> > +#include <linux/workqueue.h>
> >
> > /*
> > * An entry can be in one of four states:
> > @@ -24,6 +25,11 @@ struct irq_work {
> > void (*func)(struct irq_work *);
> > };
> >
> > +struct irq_work_schedule {
> > + struct irq_work work;
> > + struct work_struct *sched_work;
> > +};
> > +
> > static inline
> > void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
> > {
> > @@ -39,6 +45,7 @@ void init_irq_work(struct irq_work *work, void
> > (*func)(struct irq_work *))
> >
> > bool irq_work_queue(struct irq_work *work);
> > bool irq_work_queue_on(struct irq_work *work, int cpu);
> > +bool irq_work_schedule(struct work_struct *sched_work);
> >
> > void irq_work_tick(void);
> > void irq_work_sync(struct irq_work *work);
> > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > index eca8396..3880316 100644
> > --- a/kernel/irq_work.c
> > +++ b/kernel/irq_work.c
> > @@ -24,6 +24,8 @@
> > static DEFINE_PER_CPU(struct llist_head, raised_list);
> > static DEFINE_PER_CPU(struct llist_head, lazy_list);
> >
> > +static struct irq_work_schedule irq_work_sched;
> > +
> > /*
> > * Claim the entry so that no one else will poke at it.
> > */
> > @@ -79,6 +81,25 @@ bool irq_work_queue(struct irq_work *work)
> > }
> > EXPORT_SYMBOL_GPL(irq_work_queue);
> >
> > +static void irq_work_schedule_fn(struct irq_work *work)
> > +{
> > + struct irq_work_schedule *irq_work_sched =
> > + container_of(work, struct irq_work_schedule, work);
> > +
> > + if (irq_work_sched->sched_work)
> > + schedule_work(irq_work_sched->sched_work);
> > +}
> > +
> > +/* Schedule work via irq work queue */
> > +bool irq_work_schedule(struct work_struct *sched_work)
> > +{
> > + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> > + irq_work_sched.sched_work = sched_work;
> > +
> > + return irq_work_queue(&irq_work_sched.work);
> > +}
> > +EXPORT_SYMBOL_GPL(irq_work_schedule);
> > +
>
> This is irredeemably broken.
>
> Even if we didn't care about dropping events (which we do) then when you
> overwrite irq_work_sched with a copy of another work_struct, either of
> which could currently be enqueued somewhere, then you will cause some
> very nasty corruption.
>
Okay, I see your point. I think there isn't a way to avoid a
caller-specific struct such as:
struct nmi_queuable_work_struct {
	struct work_struct work;
	struct irq_work iw;
};
So in that case will shift to approach as suggested by Doug to rather
have a new nmi_schedule_work() API.
-Sumit
>
> Daniel.
On Mon, 17 Aug 2020 at 20:02, Daniel Thompson
<[email protected]> wrote:
>
> On Mon, Aug 17, 2020 at 07:53:55PM +0530, Sumit Garg wrote:
> > On Mon, 17 Aug 2020 at 19:27, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, Aug 17, 2020 at 5:27 AM Sumit Garg <[email protected]> wrote:
> > > >
> > > > Thanks for your suggestion, irq_work_schedule() looked even better
> > > > without any overhead, see below:
> > > >
> > > > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > > > index 3082378..1eade89 100644
> > > > --- a/include/linux/irq_work.h
> > > > +++ b/include/linux/irq_work.h
> > > > @@ -3,6 +3,7 @@
> > > > #define _LINUX_IRQ_WORK_H
> > > >
> > > > #include <linux/smp_types.h>
> > > > +#include <linux/workqueue.h>
> > > >
> > > > /*
> > > > * An entry can be in one of four states:
> > > > @@ -24,6 +25,11 @@ struct irq_work {
> > > > void (*func)(struct irq_work *);
> > > > };
> > > >
> > > > +struct irq_work_schedule {
> > > > + struct irq_work work;
> > > > + struct work_struct *sched_work;
> > > > +};
> > > > +
> > > > static inline
> > > > void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
> > > > {
> > > > @@ -39,6 +45,7 @@ void init_irq_work(struct irq_work *work, void
> > > > (*func)(struct irq_work *))
> > > >
> > > > bool irq_work_queue(struct irq_work *work);
> > > > bool irq_work_queue_on(struct irq_work *work, int cpu);
> > > > +bool irq_work_schedule(struct work_struct *sched_work);
> > > >
> > > > void irq_work_tick(void);
> > > > void irq_work_sync(struct irq_work *work);
> > > > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > > > index eca8396..3880316 100644
> > > > --- a/kernel/irq_work.c
> > > > +++ b/kernel/irq_work.c
> > > > @@ -24,6 +24,8 @@
> > > > static DEFINE_PER_CPU(struct llist_head, raised_list);
> > > > static DEFINE_PER_CPU(struct llist_head, lazy_list);
> > > >
> > > > +static struct irq_work_schedule irq_work_sched;
> > > > +
> > > > /*
> > > > * Claim the entry so that no one else will poke at it.
> > > > */
> > > > @@ -79,6 +81,25 @@ bool irq_work_queue(struct irq_work *work)
> > > > }
> > > > EXPORT_SYMBOL_GPL(irq_work_queue);
> > > >
> > > > +static void irq_work_schedule_fn(struct irq_work *work)
> > > > +{
> > > > + struct irq_work_schedule *irq_work_sched =
> > > > + container_of(work, struct irq_work_schedule, work);
> > > > +
> > > > + if (irq_work_sched->sched_work)
> > > > + schedule_work(irq_work_sched->sched_work);
> > > > +}
> > > > +
> > > > +/* Schedule work via irq work queue */
> > > > +bool irq_work_schedule(struct work_struct *sched_work)
> > > > +{
> > > > + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> > > > + irq_work_sched.sched_work = sched_work;
> > > > +
> > > > + return irq_work_queue(&irq_work_sched.work);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(irq_work_schedule);
> > >
> > > Wait, howzat work? There's a single global variable that you stash
> > > the "sched_work" into with no locking? What if two people schedule
> > > work at the same time?
> >
> > This API is intended to be invoked from NMI context only, so I think
> > there will be a single user at a time.
>
> How can you possibly know that?
I guess here you are referring to NMI nesting, correct?
Anyway, I am going to shift to another implementation as mentioned in
the other thread.
-Sumit
>
> This is library code, not a helper in a driver.
>
>
> Daniel.
>
>
> > And we can make that explicit
> > as well:
> >
> > +/* Schedule work via irq work queue */
> > +bool irq_work_schedule(struct work_struct *sched_work)
> > +{
> > + if (in_nmi()) {
> > + init_irq_work(&irq_work_sched.work, irq_work_schedule_fn);
> > + irq_work_sched.sched_work = sched_work;
> > +
> > + return irq_work_queue(&irq_work_sched.work);
> > + }
> > +
> > + return false;
> > +}
> > +EXPORT_SYMBOL_GPL(irq_work_schedule);
> >
> > -Sumit
> >
> > >
> > > -Doug
On Mon, 17 Aug 2020 at 22:49, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Mon, Aug 17, 2020 at 7:08 AM Sumit Garg <[email protected]> wrote:
> >
> > On Fri, 14 Aug 2020 at 20:27, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Aug 14, 2020 at 12:24 AM Sumit Garg <[email protected]> wrote:
> > > >
> > > > + Peter (author of irq_work.c)
> > > >
> > > > On Thu, 13 Aug 2020 at 05:30, Doug Anderson <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Jul 21, 2020 at 5:10 AM Sumit Garg <[email protected]> wrote:
> > > > > >
> > > > > > In a future patch we will add support to the serial core to make it
> > > > > > possible to trigger a magic sysrq from an NMI context. Prepare for this
> > > > > > by marking some sysrq actions as NMI safe. Safe actions will be allowed
> > > > > > to run from NMI context whilst those that cannot run from an NMI will be queued
> > > > > > as irq_work for later processing.
> > > > > >
> > > > > > A particular sysrq handler is only marked as NMI safe in case the handler
> > > > > > isn't contending for any synchronization primitives as in NMI context
> > > > > > they are expected to cause deadlocks. Note that the debug sysrq does not
> > > > > > contend for any synchronization primitives. It does call kgdb_breakpoint()
> > > > > > to provoke a trap but that trap handler should be NMI safe on
> > > > > > architectures that implement an NMI.
> > > > > >
> > > > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > > > ---
> > > > > > drivers/tty/sysrq.c | 33 ++++++++++++++++++++++++++++++++-
> > > > > > include/linux/sysrq.h | 1 +
> > > > > > kernel/debug/debug_core.c | 1 +
> > > > > > 3 files changed, 34 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > > > > > index 7c95afa9..8017e33 100644
> > > > > > --- a/drivers/tty/sysrq.c
> > > > > > +++ b/drivers/tty/sysrq.c
> > > > > > @@ -50,6 +50,8 @@
> > > > > > #include <linux/syscalls.h>
> > > > > > #include <linux/of.h>
> > > > > > #include <linux/rcupdate.h>
> > > > > > +#include <linux/irq_work.h>
> > > > > > +#include <linux/kfifo.h>
> > > > > >
> > > > > > #include <asm/ptrace.h>
> > > > > > #include <asm/irq_regs.h>
> > > > > > @@ -111,6 +113,7 @@ static const struct sysrq_key_op sysrq_loglevel_op = {
> > > > > > .help_msg = "loglevel(0-9)",
> > > > > > .action_msg = "Changing Loglevel",
> > > > > > .enable_mask = SYSRQ_ENABLE_LOG,
> > > > > > + .nmi_safe = true,
> > > > > > };
> > > > > >
> > > > > > #ifdef CONFIG_VT
> > > > > > @@ -157,6 +160,7 @@ static const struct sysrq_key_op sysrq_crash_op = {
> > > > > > .help_msg = "crash(c)",
> > > > > > .action_msg = "Trigger a crash",
> > > > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > > + .nmi_safe = true,
> > > > > > };
> > > > > >
> > > > > > static void sysrq_handle_reboot(int key)
> > > > > > @@ -170,6 +174,7 @@ static const struct sysrq_key_op sysrq_reboot_op = {
> > > > > > .help_msg = "reboot(b)",
> > > > > > .action_msg = "Resetting",
> > > > > > .enable_mask = SYSRQ_ENABLE_BOOT,
> > > > > > + .nmi_safe = true,
> > > > > > };
> > > > > >
> > > > > > const struct sysrq_key_op *__sysrq_reboot_op = &sysrq_reboot_op;
> > > > > > @@ -217,6 +222,7 @@ static const struct sysrq_key_op sysrq_showlocks_op = {
> > > > > > .handler = sysrq_handle_showlocks,
> > > > > > .help_msg = "show-all-locks(d)",
> > > > > > .action_msg = "Show Locks Held",
> > > > > > + .nmi_safe = true,
> > > > > > };
> > > > > > #else
> > > > > > #define sysrq_showlocks_op (*(const struct sysrq_key_op *)NULL)
> > > > > > @@ -289,6 +295,7 @@ static const struct sysrq_key_op sysrq_showregs_op = {
> > > > > > .help_msg = "show-registers(p)",
> > > > > > .action_msg = "Show Regs",
> > > > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > > + .nmi_safe = true,
> > > > > > };
> > > > > >
> > > > > > static void sysrq_handle_showstate(int key)
> > > > > > @@ -326,6 +333,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
> > > > > > .help_msg = "dump-ftrace-buffer(z)",
> > > > > > .action_msg = "Dump ftrace buffer",
> > > > > > .enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > > + .nmi_safe = true,
> > > > > > };
> > > > > > #else
> > > > > > #define sysrq_ftrace_dump_op (*(const struct sysrq_key_op *)NULL)
> > > > > > @@ -538,6 +546,23 @@ static void __sysrq_put_key_op(int key, const struct sysrq_key_op *op_p)
> > > > > > sysrq_key_table[i] = op_p;
> > > > > > }
> > > > > >
> > > > > > +#define SYSRQ_NMI_FIFO_SIZE 64
> > > > > > +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
> > > > >
> > > > > A 64-entry FIFO seems excessive. Quite honestly even a FIFO seems a
> > > > > bit excessive and it feels like if two sysrqs were received in super
> > > > > quick succession that it would be OK to just process the first one. I
> > > > > guess if it simplifies the processing to have a FIFO then it shouldn't
> > > > > hurt, but no need for 64 entries.
> > > > >
> > > >
> > > > Okay, would a 2-entry FIFO work here? As here we need a FIFO to pass
> > > > on the key parameter.
> > >
> > > ...or even a 1-entry FIFO if that makes sense?
> > >
> >
> > Yes it would make sense but unfortunately not supported by kfifo
> > (size: power of 2).
>
> Typically 1 is considered to be a power of 2 since 2^0 = 1.
>
> ...ah, but it appears that size < 2 is not allowed. Oh well.
>
>
> > > > > > +static void sysrq_do_nmi_work(struct irq_work *work)
> > > > > > +{
> > > > > > + const struct sysrq_key_op *op_p;
> > > > > > + int key;
> > > > > > +
> > > > > > + while (kfifo_out(&sysrq_nmi_fifo, &key, 1)) {
> > > > > > + op_p = __sysrq_get_key_op(key);
> > > > > > + if (op_p)
> > > > > > + op_p->handler(key);
> > > > > > + }
> > > > >
> > > > > Do you need to manage "suppress_printk" in this function? Do you need
> > > > > to call rcu_sysrq_start() and rcu_read_lock()?
> > > >
> > > > Ah I missed those. Will add them here instead.
> > > >
> > > > >
> > > > > If so, how do you prevent racing between the mucking we're doing with
> > > > > these things and the mucking that the NMI does with them?
> > > >
> > > > IIUC, here you meant to highlight the race while scheduled sysrq is
> > > > executing in IRQ context and we receive a new sysrq in NMI context,
> > > > correct? If yes, this seems to be a trickier situation. I think the
> > > > appropriate way to handle it would be to deny any further sysrq
> > > > handling until the prior sysrq handling is complete, your views?
> > >
> > > The problem is that in some cases you're running NMIs directly at FIQ
> > > time and other cases you're running them at IRQ time. So you
> > > definitely can't just move it to NMI.
> > >
> > > Skipping looking for other SYSRQs until the old one is complete sounds
> > > good to me. Again my ignorance will make me sound like a fool,
> > > probably, but can you use the kfifo as a form of mutual exclusion? If
> > > you have a 1-entry kfifo, maybe:
> > >
> > > 1. First try to add to the "FIFO". If it fails (out of space) then a
> > > sysrq is in progress. Ignore this one.
> > > 2. Decide if you're NMI-safe or not.
> > > 3. If NMI safe, modify "suppress_printk", call rcu functions, then
> > > call the handler. Restore suppress_printk and then dequeue from FIFO.
> > > 4. If not-NMI safe, the irq worker would "peek" into the FIFO, do its
> > > work (wrapped with "suppress_printk" and the like), and not dequeue
> > > until it's done.
> > >
> > > In the above you'd use the FIFO as a locking mechanism. I don't know
> > > if that's a valid use of it or if there is a better NMI-safe mechanism
> > > for this. I think the kfifo docs talk about only one reader and one
> > > writer and here we have two readers, so maybe it's illegal. It also
> > > seems weird to have a 1-entry "FIFO" and feels like there's probably a
> > > better data structure for this.
> >
> > Thanks for your suggestions. Have a look at below implementation, I
> > have used 2-entry fifo but only single entry used for locking
> > mechanism:
> >
> > @@ -538,6 +546,39 @@ static void __sysrq_put_key_op(int key, const
> > struct sysrq_key_op *op_p)
> > sysrq_key_table[i] = op_p;
> > }
> >
> > +#define SYSRQ_NMI_FIFO_SIZE 2
> > +static DEFINE_KFIFO(sysrq_nmi_fifo, int, SYSRQ_NMI_FIFO_SIZE);
> > +
> > +static void sysrq_do_nmi_work(struct irq_work *work)
> > +{
> > + const struct sysrq_key_op *op_p;
> > + int orig_suppress_printk;
> > + int key;
> > +
> > + orig_suppress_printk = suppress_printk;
> > + suppress_printk = 0;
> > +
> > + rcu_sysrq_start();
> > + rcu_read_lock();
> > +
> > + if (kfifo_peek(&sysrq_nmi_fifo, &key)) {
> > + op_p = __sysrq_get_key_op(key);
> > + if (op_p)
> > + op_p->handler(key);
> > + }
> > +
> > + rcu_read_unlock();
> > + rcu_sysrq_end();
> > +
> > + suppress_printk = orig_suppress_printk;
> > +
> > + /* Pop contents from fifo if any */
> > + while (kfifo_get(&sysrq_nmi_fifo, &key))
> > + ;
>
> I think you can use kfifo_reset_out().
>
Okay, it sounds safe as well when used concurrently with
kfifo_is_empty(). Will use it instead.
>
> > +}
> > +
> > +static DEFINE_IRQ_WORK(sysrq_nmi_work, sysrq_do_nmi_work);
> > +
> > void __handle_sysrq(int key, bool check_mask)
> > {
> > const struct sysrq_key_op *op_p;
> > @@ -545,6 +586,10 @@ void __handle_sysrq(int key, bool check_mask)
> > int orig_suppress_printk;
> > int i;
> >
> > + /* Skip sysrq handling if one already in progress */
> > + if (!kfifo_is_empty(&sysrq_nmi_fifo))
> > + return;
>
> This _seems_ OK to me since I'd imagine kfifo_is_empty() is as safe
> for the writer to do as kfifo_is_full() is and kfifo_is_full() is part
> of kfifo_put().
>
> I guess there's no better synchronization mechanism that we can use?
>
Yeah, unless someone else has a better idea.
-Sumit
>
> > +
> > orig_suppress_printk = suppress_printk;
> > suppress_printk = 0;
> >
> > @@ -568,7 +613,13 @@ void __handle_sysrq(int key, bool check_mask)
> > if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
> > pr_info("%s\n", op_p->action_msg);
> > console_loglevel = orig_log_level;
> > - op_p->handler(key);
> > +
> > + if (in_nmi() && !op_p->nmi_safe) {
> > + kfifo_put(&sysrq_nmi_fifo, key);
> > + irq_work_queue(&sysrq_nmi_work);
> > + } else {
> > + op_p->handler(key);
> > + }
> > } else {
> > pr_info("This sysrq operation is disabled.\n");
> > console_loglevel = orig_log_level;
> >
> > -Sumit