I started out this series trying to make sysrq work over the serial
console on qcom_geni_serial, then fell into a rat's nest.
To solve the deadlock I faced when enabling sysrq I tried to borrow
code from '8250_port.c' which avoided grabbing the port lock in
console_write(). ...but since these days I try to run with lockdep on
all the time, I found it caused an annoying lockdep splat (which I
also reproduced on my rk3399 board). ...so I instead changed my
qcom_geni_serial solution to borrow code from 'msm_serial.c'
I wasn't super happy with the solution in 'msm_serial.c' though. I
don't like releasing the spinlock there. Not only is it ugly but it
means we are unlocking / re-locking _all the time_ even though sysrq
characters are rare. ...so I came up with what I think is a better
solution and then implemented it for qcom_geni_serial.
Since I had a good way to test 8250-based UARTs, I also fixed that
driver to use my new method. When doing so, I ran into a missing
msm_serial.c at all, so I didn't switch that (or all other serial
drivers for that matter) to the new method.
After fixing all the above issues, I found the next lockdep splat in
kgdb and I think I've worked around it in a good-enough way, but I'm
much less confident about this. Hopefully folks can take a look at
it.
In general, patches earlier in this series should be "less
controversial" and hopefully can land even if people don't like
patches later in the series. ;-)
Looking back, this is pretty much two series squashed that could be
treated indepdently. The first is a serial series and the second is a
kgdb series.
For all serial patches I'd expect them to go through the tty tree once
they've been reviewed.
If folks are OK w/ the 'smp' patch it probably should go in some core
kernel tree. The kgdb patch won't work without it, though, so to land
that we'd need coordination between the folks landing that and the
folks landing the 'smp' patch.
Douglas Anderson (7):
serial: qcom_geni_serial: Finish supporting sysrq
serial: core: Allow processing sysrq at port unlock time
serial: qcom_geni_serial: Process sysrq at port unlock time
serial: core: Include console.h from serial_core.h
serial: 8250: Process sysrq at port unlock time
smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
kgdb: Remove irq flags and local_irq_enable/disable from roundup
arch/arc/kernel/kgdb.c | 4 +--
arch/arm/kernel/kgdb.c | 4 +--
arch/arm64/kernel/kgdb.c | 4 +--
arch/hexagon/kernel/kgdb.c | 11 ++----
arch/mips/kernel/kgdb.c | 4 +--
arch/powerpc/kernel/kgdb.c | 2 +-
arch/sh/kernel/kgdb.c | 4 +--
arch/sparc/kernel/smp_64.c | 2 +-
arch/x86/kernel/kgdb.c | 9 ++---
drivers/tty/serial/8250/8250_aspeed_vuart.c | 6 +++-
drivers/tty/serial/8250/8250_fsl.c | 6 +++-
drivers/tty/serial/8250/8250_omap.c | 6 +++-
drivers/tty/serial/8250/8250_port.c | 8 ++---
drivers/tty/serial/qcom_geni_serial.c | 10 ++++--
include/linux/kgdb.h | 9 ++---
include/linux/serial_core.h | 38 ++++++++++++++++++++-
kernel/debug/debug_core.c | 2 +-
kernel/smp.c | 4 ++-
18 files changed, 80 insertions(+), 53 deletions(-)
--
2.19.1.568.g152ad8e336-goog
The geni serial driver already had some sysrq code in it, but since
SUPPORT_SYSRQ wasn't defined the code didn't do anything useful.
Let's make it useful by adding that define using the same formula
found in other serial drivers.
In order to prevent deadlock, we'll take a page from the
'msm_serial.c' where the spinlock is released around
uart_handle_sysrq_char(). This seemed better than copying from
'8250_port.c' where we skip locking in the console_write function
since the '8250_port.c' method can cause lockdep warnings when
dropping into kgdb.
Signed-off-by: Douglas Anderson <[email protected]>
---
drivers/tty/serial/qcom_geni_serial.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index d3b5261ee80a..3c8e0202da8b 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1,6 +1,10 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2017-2018, The Linux foundation. All rights reserved.
+#if defined(CONFIG_SERIAL_QCOM_GENI_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+# define SUPPORT_SYSRQ
+#endif
+
#include <linux/clk.h>
#include <linux/console.h>
#include <linux/io.h>
@@ -495,7 +499,10 @@ static int handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
continue;
}
+ spin_unlock(&uport->lock);
sysrq = uart_handle_sysrq_char(uport, buf[c]);
+ spin_lock(&uport->lock);
+
if (!sysrq)
tty_insert_flip_char(tport, buf[c], TTY_NORMAL);
}
--
2.19.1.568.g152ad8e336-goog
Let's take advantage of the new ("serial: core: Allow processing sysrq
at port unlock time") to handle sysrqs more cleanly.
Signed-off-by: Douglas Anderson <[email protected]>
---
drivers/tty/serial/qcom_geni_serial.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 3c8e0202da8b..20edce1e222e 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -499,9 +499,7 @@ static int handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
continue;
}
- spin_unlock(&uport->lock);
- sysrq = uart_handle_sysrq_char(uport, buf[c]);
- spin_lock(&uport->lock);
+ sysrq = uart_prepare_sysrq_char(uport, buf[c]);
if (!sysrq)
tty_insert_flip_char(tport, buf[c], TTY_NORMAL);
@@ -811,7 +809,8 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
qcom_geni_serial_handle_rx(uport, drop_rx);
out_unlock:
- spin_unlock_irqrestore(&uport->lock, flags);
+ uart_unlock_and_check_sysrq(uport, flags);
+
return IRQ_HANDLED;
}
--
2.19.1.568.g152ad8e336-goog
In kgdb_roundup_cpus() we've got code that looks like:
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
local_irq_disable();
In certain cases when we drop into kgdb (like with sysrq-g on a serial
console) we'll get a big yell that looks like:
sysrq: SysRq : DEBUG
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
pstate: 604003c9 (nZCv DAIF +PAN -UAO)
pc : lockdep_hardirqs_on+0xf0/0x160
...
Call trace:
lockdep_hardirqs_on+0xf0/0x160
trace_hardirqs_on+0x188/0x1ac
kgdb_roundup_cpus+0x14/0x3c
kgdb_cpu_enter+0x53c/0x5cc
kgdb_handle_exception+0x180/0x1d4
kgdb_compiled_brk_fn+0x30/0x3c
brk_handler+0x134/0x178
do_debug_exception+0xfc/0x178
el1_dbg+0x18/0x78
kgdb_breakpoint+0x34/0x58
sysrq_handle_dbg+0x54/0x5c
__handle_sysrq+0x114/0x21c
handle_sysrq+0x30/0x3c
qcom_geni_serial_isr+0x2dc/0x30c
...
...
irq event stamp: ...45
hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4
hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34
softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
---[ end trace adf21f830c46e638 ]---
Let's add kgdb to the list of reasons not to warn in
smp_call_function_many(). That will allow us (in a future patch) to
stop calling local_irq_enable() which will get rid of the original
splat.
NOTE: with this change comes the obvious question: will we start
deadlocking more often now when we drop into the debugger. I can't
say that for sure one way or the other, but the fact that we do the
same logic for "oops_in_progress" makes me feel like it shouldn't
happen too often. Also note that the old logic of turning on
interrupts temporarily wasn't exactly safe since (I presume) that
could have violated spin_lock_irqsave() semantics and ended up with a
deadlock of its own.
Signed-off-by: Douglas Anderson <[email protected]>
---
kernel/smp.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 163c451af42e..bb581e58c8dc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,6 +19,7 @@
#include <linux/sched.h>
#include <linux/sched/idle.h>
#include <linux/hypervisor.h>
+#include <linux/kgdb.h>
#include "smpboot.h"
@@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
* can't happen.
*/
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
- && !oops_in_progress && !early_boot_irqs_disabled);
+ && !oops_in_progress && !early_boot_irqs_disabled
+ && !in_dbg_master());
/* Try to fastpath. So, what's a CPU they want? Ignoring this one. */
cpu = cpumask_first_and(mask, cpu_online_mask);
--
2.19.1.568.g152ad8e336-goog
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:
> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().
Nobody used those flags. Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them. So we can definitely remove the flags.
Speaking of calling local_irq_enable(), it seems like a bad idea and
it caused a nice splat on my system with lockdep turned on.
Specifically it hit:
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
See the previous patch in this series ("smp: Don't yell about IRQs
disabled in kgdb_roundup_cpus()") for more details, but the last few
things on the stack were this on my arm64 board:
lockdep_hardirqs_on+0xf0/0x160
trace_hardirqs_on+0x188/0x1ac
kgdb_roundup_cpus+0x14/0x3c
As agrued in the the commit text of ("smp: Don't yell about IRQs
disabled in kgdb_roundup_cpus()"), it seems better to make
smp_call_function() lenient about kgdb than to locally turn on IRQs
here. Thus let's totally remove all the local_irq_enable() and
local_irq_disable() calls from all of the kgdb_roundup_cpus() calls.
Signed-off-by: Douglas Anderson <[email protected]>
---
arch/arc/kernel/kgdb.c | 4 +---
arch/arm/kernel/kgdb.c | 4 +---
arch/arm64/kernel/kgdb.c | 4 +---
arch/hexagon/kernel/kgdb.c | 11 ++---------
arch/mips/kernel/kgdb.c | 4 +---
arch/powerpc/kernel/kgdb.c | 2 +-
arch/sh/kernel/kgdb.c | 4 +---
arch/sparc/kernel/smp_64.c | 2 +-
arch/x86/kernel/kgdb.c | 9 ++-------
include/linux/kgdb.h | 9 ++-------
kernel/debug/debug_core.c | 2 +-
11 files changed, 14 insertions(+), 41 deletions(-)
diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..d94d3cb7f9e8 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -197,11 +197,9 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), NULL);
}
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
- local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
}
struct kgdb_arch arch_kgdb_ops = {
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..a80e9259f7e9 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -175,11 +175,9 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
- local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
}
static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..5d171c26788f 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -289,11 +289,9 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
- local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
}
static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..30fbc491cf45 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
*
* On SMP systems, we need to get the attention of the other CPUs
* and get them be in a known state. This should do what is needed
* to get the other CPUs to call kgdb_wait(). Note that on some arches,
* the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
*
* On non-SMP systems, this is not called.
*/
@@ -139,11 +134,9 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
- local_irq_enable();
smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
- local_irq_disable();
}
#endif
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..6671a279966f 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -219,11 +219,9 @@ static void kgdb_call_nmi_hook(void *ignored)
set_fs(old_fs);
}
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
- local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
}
static int compute_signal(int tt)
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 59c578f865aa..b0e804844be0 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
}
#ifdef CONFIG_SMP
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
smp_send_debugger_break();
}
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index 4f04c6638a4d..86b3ea927e42 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -319,11 +319,9 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
- local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
}
static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 4792e08ad36b..f45d876983f1 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1014,7 +1014,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
}
#ifdef CONFIG_KGDB
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
smp_cross_call(&xcall_kgdb_capture, 0, 0, 0);
}
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 8e36f249646e..ac6291a4178d 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -422,21 +422,16 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
#ifdef CONFIG_SMP
/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
*
* On SMP systems, we need to get the attention of the other CPUs
* and get them be in a known state. This should do what is needed
* to get the other CPUs to call kgdb_wait(). Note that on some arches,
* the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
*
* On non-SMP systems, this is not called.
*/
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
apic->send_IPI_allbutself(APIC_DM_NMI);
}
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index e465bb15912d..05e5b2eb0d32 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -178,21 +178,16 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,
/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
*
* On SMP systems, we need to get the attention of the other CPUs
* and get them into a known state. This should do what is needed
* to get the other CPUs to call kgdb_wait(). Note that on some arches,
* the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
*
* On non-SMP systems, this is not called.
*/
-extern void kgdb_roundup_cpus(unsigned long flags);
+extern void kgdb_roundup_cpus(void);
/**
* kgdb_arch_set_pc - Generic call back to the program counter
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 65c0f1363788..f3cadda45f07 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -593,7 +593,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
/* Signal the other CPUs to enter kgdb_wait() */
else if ((!kgdb_single_step) && kgdb_do_roundup)
- kgdb_roundup_cpus(flags);
+ kgdb_roundup_cpus();
#endif
/*
--
2.19.1.568.g152ad8e336-goog
Let's take advantage of the new ("serial: core: Allow processing sysrq
at port unlock time") to handle sysrqs more cleanly.
Signed-off-by: Douglas Anderson <[email protected]>
---
I don't have a great way to test all 8250 variants, but I've at least
tested rk3288 / rk3399 and they seem to work. Hopefully I got the
aspeed_vuart / fsl / omap variants right (I only compile tested
those).
drivers/tty/serial/8250/8250_aspeed_vuart.c | 6 +++++-
drivers/tty/serial/8250/8250_fsl.c | 6 +++++-
drivers/tty/serial/8250/8250_omap.c | 6 +++++-
drivers/tty/serial/8250/8250_port.c | 8 +++-----
4 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_aspeed_vuart.c b/drivers/tty/serial/8250/8250_aspeed_vuart.c
index 435bec40dee6..0438d9a905ce 100644
--- a/drivers/tty/serial/8250/8250_aspeed_vuart.c
+++ b/drivers/tty/serial/8250/8250_aspeed_vuart.c
@@ -5,6 +5,10 @@
* Copyright (C) 2016 Jeremy Kerr <[email protected]>, IBM Corp.
* Copyright (C) 2006 Arnd Bergmann <[email protected]>, IBM Corp.
*/
+#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#define SUPPORT_SYSRQ
+#endif
+
#include <linux/device.h>
#include <linux/module.h>
#include <linux/of_address.h>
@@ -293,7 +297,7 @@ static int aspeed_vuart_handle_irq(struct uart_port *port)
if (lsr & UART_LSR_THRE)
serial8250_tx_chars(up);
- spin_unlock_irqrestore(&port->lock, flags);
+ uart_unlock_and_check_sysrq(port, flags);
return 1;
}
diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index 6640a4c7ddd1..ff3dcaea5d93 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -1,4 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
+#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#define SUPPORT_SYSRQ
+#endif
+
#include <linux/serial_reg.h>
#include <linux/serial_8250.h>
@@ -54,7 +58,7 @@ int fsl8250_handle_irq(struct uart_port *port)
serial8250_tx_chars(up);
up->lsr_saved_flags = orig_lsr;
- spin_unlock_irqrestore(&up->port.lock, flags);
+ uart_unlock_and_check_sysrq(&up->port, flags);
return 1;
}
EXPORT_SYMBOL_GPL(fsl8250_handle_irq);
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index a019286f8bb6..ad7ba7d0f28d 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -8,6 +8,10 @@
*
*/
+#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#define SUPPORT_SYSRQ
+#endif
+
#include <linux/device.h>
#include <linux/io.h>
#include <linux/module.h>
@@ -1085,7 +1089,7 @@ static int omap_8250_dma_handle_irq(struct uart_port *port)
}
}
- spin_unlock_irqrestore(&port->lock, flags);
+ uart_unlock_and_check_sysrq(port, flags);
serial8250_rpm_put(up);
return 1;
}
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index f776b3eafb96..c39482b96111 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1755,7 +1755,7 @@ void serial8250_read_char(struct uart_8250_port *up, unsigned char lsr)
else if (lsr & UART_LSR_FE)
flag = TTY_FRAME;
}
- if (uart_handle_sysrq_char(port, ch))
+ if (uart_prepare_sysrq_char(port, ch))
return;
uart_insert_char(port, lsr, UART_LSR_OE, ch, flag);
@@ -1897,7 +1897,7 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
if ((!up->dma || up->dma->tx_err) && (status & UART_LSR_THRE))
serial8250_tx_chars(up);
- spin_unlock_irqrestore(&port->lock, flags);
+ uart_unlock_and_check_sysrq(port, flags);
return 1;
}
EXPORT_SYMBOL_GPL(serial8250_handle_irq);
@@ -3258,9 +3258,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
serial8250_rpm_get(up);
- if (port->sysrq)
- locked = 0;
- else if (oops_in_progress)
+ if (oops_in_progress)
locked = spin_trylock_irqsave(&port->lock, flags);
else
spin_lock_irqsave(&port->lock, flags);
--
2.19.1.568.g152ad8e336-goog
In the static inline function uart_handle_break() in serial_core.h we
dereference port->cons. That gives an error unless console.h is also
included.
This error hasn't shown up till now because everyone who has defined
SUPPORT_SYSRQ has also included console.h, but it's a bit ugly to make
this requirement. Let's make the include explicit.
Signed-off-by: Douglas Anderson <[email protected]>
---
include/linux/serial_core.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 78de9d929762..5fe2b037e833 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -22,6 +22,7 @@
#include <linux/bitops.h>
#include <linux/compiler.h>
+#include <linux/console.h>
#include <linux/interrupt.h>
#include <linux/circ_buf.h>
#include <linux/spinlock.h>
--
2.19.1.568.g152ad8e336-goog
Right now serial drivers process sysrq keys deep in their character
receiving code. This means that they've already grabbed their
port->lock spinlock. This can end up getting in the way if we've go
to do serial stuff (especially kgdb) in response to the sysrq.
Serial drivers have various hacks in them to handle this. Looking at
'8250_port.c' you can see that the console_write() skips locking if
we're in the sysrq handler. Looking at 'msm_serial.c' you can see
that the port lock is dropped around uart_handle_sysrq_char().
It turns out that these hacks aren't exactly perfect. If you have
lockdep turned on and use something like the 8250_port hack you'll get
a splat that looks like:
WARNING: possible circular locking dependency detected
[...] is trying to acquire lock:
... (console_owner){-.-.}, at: console_unlock+0x2e0/0x5e4
but task is already holding lock:
... (&port_lock_key){-.-.}, at: serial8250_handle_irq+0x30/0xe4
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&port_lock_key){-.-.}:
_raw_spin_lock_irqsave+0x58/0x70
serial8250_console_write+0xa8/0x250
univ8250_console_write+0x40/0x4c
console_unlock+0x528/0x5e4
register_console+0x2c4/0x3b0
uart_add_one_port+0x350/0x478
serial8250_register_8250_port+0x350/0x3a8
dw8250_probe+0x67c/0x754
platform_drv_probe+0x58/0xa4
really_probe+0x150/0x294
driver_probe_device+0xac/0xe8
__driver_attach+0x98/0xd0
bus_for_each_dev+0x84/0xc8
driver_attach+0x2c/0x34
bus_add_driver+0xf0/0x1ec
driver_register+0xb4/0x100
__platform_driver_register+0x60/0x6c
dw8250_platform_driver_init+0x20/0x28
...
-> #0 (console_owner){-.-.}:
lock_acquire+0x1e8/0x214
console_unlock+0x35c/0x5e4
vprintk_emit+0x230/0x274
vprintk_default+0x7c/0x84
vprintk_func+0x190/0x1bc
printk+0x80/0xa0
__handle_sysrq+0x104/0x21c
handle_sysrq+0x30/0x3c
serial8250_read_char+0x15c/0x18c
serial8250_rx_chars+0x34/0x74
serial8250_handle_irq+0x9c/0xe4
dw8250_handle_irq+0x98/0xcc
serial8250_interrupt+0x50/0xe8
...
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&port_lock_key);
lock(console_owner);
lock(&port_lock_key);
lock(console_owner);
*** DEADLOCK ***
The hack used in 'msm_serial.c' doesn't cause the above splats but it
seems a bit ugly to unlock / lock our spinlock deep in our irq
handler.
It seems like we could defer processing the sysrq until the end of the
interrupt handler right after we've unlocked the port. With this
scheme if a whole batch of sysrq characters comes in one irq then we
won't handle them all, but that seems like it should be a fine
compromise.
Signed-off-by: Douglas Anderson <[email protected]>
---
include/linux/serial_core.h | 37 ++++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 047fa67d039b..78de9d929762 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -175,6 +175,7 @@ struct uart_port {
struct console *cons; /* struct console, if any */
#if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
unsigned long sysrq; /* sysrq timeout */
+ unsigned int sysrq_ch; /* char for sysrq */
#endif
/* flags must be updated while holding port mutex */
@@ -485,8 +486,42 @@ uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
}
return 0;
}
+static inline int
+uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+ if (port->sysrq) {
+ if (ch && time_before(jiffies, port->sysrq)) {
+ port->sysrq_ch = ch;
+ port->sysrq = 0;
+ return 1;
+ }
+ port->sysrq = 0;
+ }
+ return 0;
+}
+static inline void
+uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
+{
+ int sysrq_ch;
+
+ sysrq_ch = port->sysrq_ch;
+ port->sysrq_ch = 0;
+
+ spin_unlock_irqrestore(&port->lock, irqflags);
+
+ if (sysrq_ch)
+ handle_sysrq(sysrq_ch);
+}
#else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
+static inline int
+uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
+static inline void
+uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
+{
+ spin_unlock_irqrestore(&port->lock, irqflags);
+}
#endif
/*
--
2.19.1.568.g152ad8e336-goog
Hi,
On Mon, Oct 29, 2018 at 11:08 AM Douglas Anderson <[email protected]> wrote:
>
> Right now serial drivers process sysrq keys deep in their character
> receiving code. This means that they've already grabbed their
> port->lock spinlock. This can end up getting in the way if we've go
> to do serial stuff (especially kgdb) in response to the sysrq.
>
> Serial drivers have various hacks in them to handle this. Looking at
> '8250_port.c' you can see that the console_write() skips locking if
> we're in the sysrq handler. Looking at 'msm_serial.c' you can see
> that the port lock is dropped around uart_handle_sysrq_char().
>
> It turns out that these hacks aren't exactly perfect. If you have
> lockdep turned on and use something like the 8250_port hack you'll get
> a splat that looks like:
>
> WARNING: possible circular locking dependency detected
> [...] is trying to acquire lock:
> ... (console_owner){-.-.}, at: console_unlock+0x2e0/0x5e4
>
> but task is already holding lock:
> ... (&port_lock_key){-.-.}, at: serial8250_handle_irq+0x30/0xe4
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&port_lock_key){-.-.}:
> _raw_spin_lock_irqsave+0x58/0x70
> serial8250_console_write+0xa8/0x250
> univ8250_console_write+0x40/0x4c
> console_unlock+0x528/0x5e4
> register_console+0x2c4/0x3b0
> uart_add_one_port+0x350/0x478
> serial8250_register_8250_port+0x350/0x3a8
> dw8250_probe+0x67c/0x754
> platform_drv_probe+0x58/0xa4
> really_probe+0x150/0x294
> driver_probe_device+0xac/0xe8
> __driver_attach+0x98/0xd0
> bus_for_each_dev+0x84/0xc8
> driver_attach+0x2c/0x34
> bus_add_driver+0xf0/0x1ec
> driver_register+0xb4/0x100
> __platform_driver_register+0x60/0x6c
> dw8250_platform_driver_init+0x20/0x28
> ...
>
> -> #0 (console_owner){-.-.}:
> lock_acquire+0x1e8/0x214
> console_unlock+0x35c/0x5e4
> vprintk_emit+0x230/0x274
> vprintk_default+0x7c/0x84
> vprintk_func+0x190/0x1bc
> printk+0x80/0xa0
> __handle_sysrq+0x104/0x21c
> handle_sysrq+0x30/0x3c
> serial8250_read_char+0x15c/0x18c
> serial8250_rx_chars+0x34/0x74
> serial8250_handle_irq+0x9c/0xe4
> dw8250_handle_irq+0x98/0xcc
> serial8250_interrupt+0x50/0xe8
> ...
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&port_lock_key);
> lock(console_owner);
> lock(&port_lock_key);
> lock(console_owner);
>
> *** DEADLOCK ***
>
> The hack used in 'msm_serial.c' doesn't cause the above splats but it
> seems a bit ugly to unlock / lock our spinlock deep in our irq
> handler.
>
> It seems like we could defer processing the sysrq until the end of the
> interrupt handler right after we've unlocked the port. With this
> scheme if a whole batch of sysrq characters comes in one irq then we
> won't handle them all, but that seems like it should be a fine
> compromise.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> include/linux/serial_core.h | 37 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> index 047fa67d039b..78de9d929762 100644
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -175,6 +175,7 @@ struct uart_port {
> struct console *cons; /* struct console, if any */
> #if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
> unsigned long sysrq; /* sysrq timeout */
> + unsigned int sysrq_ch; /* char for sysrq */
> #endif
>
> /* flags must be updated while holding port mutex */
> @@ -485,8 +486,42 @@ uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
> }
> return 0;
> }
> +static inline int
> +uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch)
> +{
> + if (port->sysrq) {
> + if (ch && time_before(jiffies, port->sysrq)) {
> + port->sysrq_ch = ch;
> + port->sysrq = 0;
> + return 1;
> + }
> + port->sysrq = 0;
> + }
> + return 0;
> +}
> +static inline void
> +uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
> +{
> + int sysrq_ch;
> +
> + sysrq_ch = port->sysrq_ch;
> + port->sysrq_ch = 0;
> +
> + spin_unlock_irqrestore(&port->lock, irqflags);
> +
> + if (sysrq_ch)
> + handle_sysrq(sysrq_ch);
> +}
> #else
> -#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
> +static inline int
> +uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
> +static inline int
> +uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
> +static inline void
> +uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
> +{
> + spin_unlock_irqrestore(&port->lock, irqflags);
> +}
Jeremy wrote me to point out that I messed up and didn't get this
patch posted to the linux-serial list. Sorry about that. :( I guess
get_mainatiners doesn't notice that this include file is relevant to
serial and I didn't notice either since I was too focused on trying to
figure out if it was really the right call to Cc all the arch
maintainers on the cover letter and the last patch... Sigh.
If/when I need to repost, I'll make sure to get linux-serial. For now
at least they are on LKML so probably the easiest place to find all
the patches is:
https://lore.kernel.org/patchwork/cover/1004280/
...if you clock on the "show" next to "Related" you get the whole
series. Using the message ID from there you can also find them at:
https://lkml.kernel.org/r/[email protected]
Both places allow you to grab 'mbox' files which (which a bit of a
hassle--sorry) can allow you to reply /apply patches.
-Doug
On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> In kgdb_roundup_cpus() we've got code that looks like:
> local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> local_irq_disable();
>
> In certain cases when we drop into kgdb (like with sysrq-g on a serial
> console) we'll get a big yell that looks like:
>
> sysrq: SysRq : DEBUG
> ------------[ cut here ]------------
> DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
> pstate: 604003c9 (nZCv DAIF +PAN -UAO)
> pc : lockdep_hardirqs_on+0xf0/0x160
> ...
> Call trace:
> lockdep_hardirqs_on+0xf0/0x160
> trace_hardirqs_on+0x188/0x1ac
> kgdb_roundup_cpus+0x14/0x3c
> kgdb_cpu_enter+0x53c/0x5cc
> kgdb_handle_exception+0x180/0x1d4
> kgdb_compiled_brk_fn+0x30/0x3c
> brk_handler+0x134/0x178
> do_debug_exception+0xfc/0x178
> el1_dbg+0x18/0x78
> kgdb_breakpoint+0x34/0x58
> sysrq_handle_dbg+0x54/0x5c
> __handle_sysrq+0x114/0x21c
> handle_sysrq+0x30/0x3c
> qcom_geni_serial_isr+0x2dc/0x30c
> ...
> ...
> irq event stamp: ...45
> hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4
> hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
> softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34
> softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
> ---[ end trace adf21f830c46e638 ]---
>
> Let's add kgdb to the list of reasons not to warn in
> smp_call_function_many(). That will allow us (in a future patch) to
> stop calling local_irq_enable() which will get rid of the original
> splat.
>
> NOTE: with this change comes the obvious question: will we start
> deadlocking more often now when we drop into the debugger. I can't
> say that for sure one way or the other, but the fact that we do the
> same logic for "oops_in_progress" makes me feel like it shouldn't
> happen too often. Also note that the old logic of turning on
> interrupts temporarily wasn't exactly safe since (I presume) that
> could have violated spin_lock_irqsave() semantics and ended up with a
> deadlock of its own.
This is part of the code to bring all the cores to a halt and since
the other cores are still running kgdb isn't yet able to use the fact
all the CPUs are halted to bend the rules. It is better for this code
to play by the rules if it can.
Is is possible to get the roundup functions to use a private csd
alongside smp_call_function_single_async()? We could add a helper
function to the debug core to avoid having to add cpu_online loops into
every kgdb_roundup_cpus() implementaton.
Daniel.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> kernel/smp.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 163c451af42e..bb581e58c8dc 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -19,6 +19,7 @@
> #include <linux/sched.h>
> #include <linux/sched/idle.h>
> #include <linux/hypervisor.h>
> +#include <linux/kgdb.h>
>
> #include "smpboot.h"
>
> @@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
> * can't happen.
> */
> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> - && !oops_in_progress && !early_boot_irqs_disabled);
> + && !oops_in_progress && !early_boot_irqs_disabled
> + && !in_dbg_master());
>
> /* Try to fastpath. So, what's a CPU they want? Ignoring this one. */
> cpu = cpumask_first_and(mask, cpu_online_mask);
> --
> 2.19.1.568.g152ad8e336-goog
>
On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> The function kgdb_roundup_cpus() was passed a parameter that was
> documented as:
>
> > the flags that will be used when restoring the interrupts. There is
> > local_irq_save() call before kgdb_roundup_cpus().
>
> Nobody used those flags. Anyone who wanted to temporarily turn on
> interrupts just did local_irq_enable() and local_irq_disable() without
> looking at them. So we can definitely remove the flags.
On the whole I'd rather that this change...
> Speaking of calling local_irq_enable(), it seems like a bad idea and
> it caused a nice splat on my system with lockdep turned on.
> Specifically it hit:
> DEBUG_LOCKS_WARN_ON(current->hardirq_context)
... and fixes for this this were in separate patches. They don't appear
especially related.
Daniel.
> See the previous patch in this series ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()") for more details, but the last few
> things on the stack were this on my arm64 board:
> lockdep_hardirqs_on+0xf0/0x160
> trace_hardirqs_on+0x188/0x1ac
> kgdb_roundup_cpus+0x14/0x3c
>
> As agrued in the the commit text of ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()"), it seems better to make
> smp_call_function() lenient about kgdb than to locally turn on IRQs
> here. Thus let's totally remove all the local_irq_enable() and
> local_irq_disable() calls from all of the kgdb_roundup_cpus() calls.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> arch/arc/kernel/kgdb.c | 4 +---
> arch/arm/kernel/kgdb.c | 4 +---
> arch/arm64/kernel/kgdb.c | 4 +---
> arch/hexagon/kernel/kgdb.c | 11 ++---------
> arch/mips/kernel/kgdb.c | 4 +---
> arch/powerpc/kernel/kgdb.c | 2 +-
> arch/sh/kernel/kgdb.c | 4 +---
> arch/sparc/kernel/smp_64.c | 2 +-
> arch/x86/kernel/kgdb.c | 9 ++-------
> include/linux/kgdb.h | 9 ++-------
> kernel/debug/debug_core.c | 2 +-
> 11 files changed, 14 insertions(+), 41 deletions(-)
>
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..d94d3cb7f9e8 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -197,11 +197,9 @@ static void kgdb_call_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), NULL);
> }
>
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> - local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
> }
>
> struct kgdb_arch arch_kgdb_ops = {
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..a80e9259f7e9 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -175,11 +175,9 @@ static void kgdb_call_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> }
>
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> - local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
> }
>
> static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..5d171c26788f 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -289,11 +289,9 @@ static void kgdb_call_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> }
>
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> - local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
> }
>
> static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..30fbc491cf45 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
>
> /**
> * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
> *
> * On SMP systems, we need to get the attention of the other CPUs
> * and get them be in a known state. This should do what is needed
> * to get the other CPUs to call kgdb_wait(). Note that on some arches,
> * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
> *
> * On non-SMP systems, this is not called.
> */
> @@ -139,11 +134,9 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> }
>
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> - local_irq_enable();
> smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
> - local_irq_disable();
> }
> #endif
>
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..6671a279966f 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -219,11 +219,9 @@ static void kgdb_call_nmi_hook(void *ignored)
> set_fs(old_fs);
> }
>
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> - local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
> }
>
> static int compute_signal(int tt)
> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
> index 59c578f865aa..b0e804844be0 100644
> --- a/arch/powerpc/kernel/kgdb.c
> +++ b/arch/powerpc/kernel/kgdb.c
> @@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
> }
>
> #ifdef CONFIG_SMP
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> smp_send_debugger_break();
> }
> diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
> index 4f04c6638a4d..86b3ea927e42 100644
> --- a/arch/sh/kernel/kgdb.c
> +++ b/arch/sh/kernel/kgdb.c
> @@ -319,11 +319,9 @@ static void kgdb_call_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> }
>
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> - local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
> }
>
> static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
> index 4792e08ad36b..f45d876983f1 100644
> --- a/arch/sparc/kernel/smp_64.c
> +++ b/arch/sparc/kernel/smp_64.c
> @@ -1014,7 +1014,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
> }
>
> #ifdef CONFIG_KGDB
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> smp_cross_call(&xcall_kgdb_capture, 0, 0, 0);
> }
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 8e36f249646e..ac6291a4178d 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -422,21 +422,16 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
> #ifdef CONFIG_SMP
> /**
> * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
> *
> * On SMP systems, we need to get the attention of the other CPUs
> * and get them be in a known state. This should do what is needed
> * to get the other CPUs to call kgdb_wait(). Note that on some arches,
> * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
> *
> * On non-SMP systems, this is not called.
> */
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
> {
> apic->send_IPI_allbutself(APIC_DM_NMI);
> }
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index e465bb15912d..05e5b2eb0d32 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -178,21 +178,16 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,
>
> /**
> * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
> *
> * On SMP systems, we need to get the attention of the other CPUs
> * and get them into a known state. This should do what is needed
> * to get the other CPUs to call kgdb_wait(). Note that on some arches,
> * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
> *
> * On non-SMP systems, this is not called.
> */
> -extern void kgdb_roundup_cpus(unsigned long flags);
> +extern void kgdb_roundup_cpus(void);
>
> /**
> * kgdb_arch_set_pc - Generic call back to the program counter
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 65c0f1363788..f3cadda45f07 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -593,7 +593,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
>
> /* Signal the other CPUs to enter kgdb_wait() */
> else if ((!kgdb_single_step) && kgdb_do_roundup)
> - kgdb_roundup_cpus(flags);
> + kgdb_roundup_cpus();
> #endif
>
> /*
> --
> 2.19.1.568.g152ad8e336-goog
>
On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> Looking back, this is pretty much two series squashed that could be
> treated indepdently. The first is a serial series and the second is a
> kgdb series.
Indeed.
I couldn't work out the link between the first 5 patches and the last 2
until I read this...
Is anything in the 01->05 patch set even related to kgdb? From the stack
traces it looks to me like the lock dep warning would trigger for any
sysrq. I think separating into two threads for v2 would be sensible.
Daniel.
>
> For all serial patches I'd expect them to go through the tty tree once
> they've been reviewed.
>
> If folks are OK w/ the 'smp' patch it probably should go in some core
> kernel tree. The kgdb patch won't work without it, though, so to land
> that we'd need coordination between the folks landing that and the
> folks landing the 'smp' patch.
>
>
> Douglas Anderson (7):
> serial: qcom_geni_serial: Finish supporting sysrq
> serial: core: Allow processing sysrq at port unlock time
> serial: qcom_geni_serial: Process sysrq at port unlock time
> serial: core: Include console.h from serial_core.h
> serial: 8250: Process sysrq at port unlock time
> smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
> kgdb: Remove irq flags and local_irq_enable/disable from roundup
>
> arch/arc/kernel/kgdb.c | 4 +--
> arch/arm/kernel/kgdb.c | 4 +--
> arch/arm64/kernel/kgdb.c | 4 +--
> arch/hexagon/kernel/kgdb.c | 11 ++----
> arch/mips/kernel/kgdb.c | 4 +--
> arch/powerpc/kernel/kgdb.c | 2 +-
> arch/sh/kernel/kgdb.c | 4 +--
> arch/sparc/kernel/smp_64.c | 2 +-
> arch/x86/kernel/kgdb.c | 9 ++---
> drivers/tty/serial/8250/8250_aspeed_vuart.c | 6 +++-
> drivers/tty/serial/8250/8250_fsl.c | 6 +++-
> drivers/tty/serial/8250/8250_omap.c | 6 +++-
> drivers/tty/serial/8250/8250_port.c | 8 ++---
> drivers/tty/serial/qcom_geni_serial.c | 10 ++++--
> include/linux/kgdb.h | 9 ++---
> include/linux/serial_core.h | 38 ++++++++++++++++++++-
> kernel/debug/debug_core.c | 2 +-
> kernel/smp.c | 4 ++-
> 18 files changed, 80 insertions(+), 53 deletions(-)
>
> --
> 2.19.1.568.g152ad8e336-goog
>
On Tue, Oct 30, 2018 at 11:56:28AM +0000, Daniel Thompson wrote:
> On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> > Looking back, this is pretty much two series squashed that could be
> > treated indepdently. The first is a serial series and the second is a
> > kgdb series.
>
> Indeed.
>
> I couldn't work out the link between the first 5 patches and the last 2
> until I read this...
>
> Is anything in the 01->05 patch set even related to kgdb? From the stack
> traces it looks to me like the lock dep warning would trigger for any
> sysrq. I think separating into two threads for v2 would be sensible.
I'm concerned about calling smp_call_function() from IRQ context with
IRQs disabled - that will block the ability of the _calling_ CPU to
process IPIs from other CPUs in the system. If we have other CPUs
waiting on their IPIs to complete on _this_ CPU, we could end up
deadlocking while trying to grab the CSD lock.
This is the intention of warnings in smp_call_function*() - to catch
cases where deadlocks _can_ occur, but do not reliably show up.
The exceptions to the warning (disregarding oops_in_progress) are
chosen to allow IRQs-disabled calls when we're sure that the rest of
the system isn't going to be sending the calling CPU an IPI (eg,
because the CPU isn't marked online, and we only send IPIs to online
CPUs, or if we're still early in the kernel boot and hence have no
other CPUs running.) The exception is oops_in_progress, which can
occur at any time - even with the current CPU already holding some
CSD locks (eg, oops while trying to send an IPI.)
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> I started out this series trying to make sysrq work over the serial
> console on qcom_geni_serial, then fell into a rat's nest.
>
> To solve the deadlock I faced when enabling sysrq I tried to borrow
> code from '8250_port.c' which avoided grabbing the port lock in
> console_write(). ...but since these days I try to run with lockdep on
> all the time, I found it caused an annoying lockdep splat (which I
> also reproduced on my rk3399 board). ...so I instead changed my
> qcom_geni_serial solution to borrow code from 'msm_serial.c'
>
> I wasn't super happy with the solution in 'msm_serial.c' though. I
> don't like releasing the spinlock there. Not only is it ugly but it
> means we are unlocking / re-locking _all the time_ even though sysrq
> characters are rare. ...so I came up with what I think is a better
> solution and then implemented it for qcom_geni_serial.
>
> Since I had a good way to test 8250-based UARTs, I also fixed that
> driver to use my new method. When doing so, I ran into a missing
> msm_serial.c at all, so I didn't switch that (or all other serial
> drivers for that matter) to the new method.
>
> After fixing all the above issues, I found the next lockdep splat in
> kgdb and I think I've worked around it in a good-enough way, but I'm
> much less confident about this. Hopefully folks can take a look at
> it.
>
> In general, patches earlier in this series should be "less
> controversial" and hopefully can land even if people don't like
> patches later in the series. ;-)
>
> Looking back, this is pretty much two series squashed that could be
> treated indepdently. The first is a serial series and the second is a
> kgdb series.
>
> For all serial patches I'd expect them to go through the tty tree once
> they've been reviewed.
>
> If folks are OK w/ the 'smp' patch it probably should go in some core
> kernel tree. The kgdb patch won't work without it, though, so to land
> that we'd need coordination between the folks landing that and the
> folks landing the 'smp' patch.
I have got only 0/7 and 5/7, everything okay with your mail client and other tools?
>
>
> Douglas Anderson (7):
> serial: qcom_geni_serial: Finish supporting sysrq
> serial: core: Allow processing sysrq at port unlock time
> serial: qcom_geni_serial: Process sysrq at port unlock time
> serial: core: Include console.h from serial_core.h
> serial: 8250: Process sysrq at port unlock time
> smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
> kgdb: Remove irq flags and local_irq_enable/disable from roundup
>
> arch/arc/kernel/kgdb.c | 4 +--
> arch/arm/kernel/kgdb.c | 4 +--
> arch/arm64/kernel/kgdb.c | 4 +--
> arch/hexagon/kernel/kgdb.c | 11 ++----
> arch/mips/kernel/kgdb.c | 4 +--
> arch/powerpc/kernel/kgdb.c | 2 +-
> arch/sh/kernel/kgdb.c | 4 +--
> arch/sparc/kernel/smp_64.c | 2 +-
> arch/x86/kernel/kgdb.c | 9 ++---
> drivers/tty/serial/8250/8250_aspeed_vuart.c | 6 +++-
> drivers/tty/serial/8250/8250_fsl.c | 6 +++-
> drivers/tty/serial/8250/8250_omap.c | 6 +++-
> drivers/tty/serial/8250/8250_port.c | 8 ++---
> drivers/tty/serial/qcom_geni_serial.c | 10 ++++--
> include/linux/kgdb.h | 9 ++---
> include/linux/serial_core.h | 38 ++++++++++++++++++++-
> kernel/debug/debug_core.c | 2 +-
> kernel/smp.c | 4 ++-
> 18 files changed, 80 insertions(+), 53 deletions(-)
>
> --
> 2.19.1.568.g152ad8e336-goog
>
--
With Best Regards,
Andy Shevchenko
On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> In kgdb_roundup_cpus() we've got code that looks like:
> local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> local_irq_disable();
> Let's add kgdb to the list of reasons not to warn in
> smp_call_function_many(). That will allow us (in a future patch) to
> stop calling local_irq_enable() which will get rid of the original
> splat.
>
> NOTE: with this change comes the obvious question: will we start
> deadlocking more often now when we drop into the debugger. I can't
> say that for sure one way or the other, but the fact that we do the
> same logic for "oops_in_progress" makes me feel like it shouldn't
> happen too often. Also note that the old logic of turning on
> interrupts temporarily wasn't exactly safe since (I presume) that
> could have violated spin_lock_irqsave() semantics and ended up with a
> deadlock of its own.
How is any of that not utterly and terminally broken?
> @@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
> * can't happen.
> */
> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> - && !oops_in_progress && !early_boot_irqs_disabled);
> + && !oops_in_progress && !early_boot_irqs_disabled
> + && !in_dbg_master());
>
> /* Try to fastpath. So, what's a CPU they want? Ignoring this one. */
> cpu = cpumask_first_and(mask, cpu_online_mask);
Not a fan of this. There is a distinct difference between
oops_in_progress and dropping into kgdb in that you don't ever expect to
return/survive oopses, whereas we do expect to survive kgdb.
Also, how does kgdb work at all without actual NMIs ?
So no, NAK on this.
Daniel,
On Tue, Oct 30, 2018 at 2:42 AM Daniel Thompson
<[email protected]> wrote:
>
> On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> > In kgdb_roundup_cpus() we've got code that looks like:
> > local_irq_enable();
> > smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> > local_irq_disable();
> >
> > In certain cases when we drop into kgdb (like with sysrq-g on a serial
> > console) we'll get a big yell that looks like:
> >
> > sysrq: SysRq : DEBUG
> > ------------[ cut here ]------------
> > DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> > WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
> > pstate: 604003c9 (nZCv DAIF +PAN -UAO)
> > pc : lockdep_hardirqs_on+0xf0/0x160
> > ...
> > Call trace:
> > lockdep_hardirqs_on+0xf0/0x160
> > trace_hardirqs_on+0x188/0x1ac
> > kgdb_roundup_cpus+0x14/0x3c
> > kgdb_cpu_enter+0x53c/0x5cc
> > kgdb_handle_exception+0x180/0x1d4
> > kgdb_compiled_brk_fn+0x30/0x3c
> > brk_handler+0x134/0x178
> > do_debug_exception+0xfc/0x178
> > el1_dbg+0x18/0x78
> > kgdb_breakpoint+0x34/0x58
> > sysrq_handle_dbg+0x54/0x5c
> > __handle_sysrq+0x114/0x21c
> > handle_sysrq+0x30/0x3c
> > qcom_geni_serial_isr+0x2dc/0x30c
> > ...
> > ...
> > irq event stamp: ...45
> > hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4
> > hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
> > softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34
> > softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
> > ---[ end trace adf21f830c46e638 ]---
> >
> > Let's add kgdb to the list of reasons not to warn in
> > smp_call_function_many(). That will allow us (in a future patch) to
> > stop calling local_irq_enable() which will get rid of the original
> > splat.
> >
> > NOTE: with this change comes the obvious question: will we start
> > deadlocking more often now when we drop into the debugger. I can't
> > say that for sure one way or the other, but the fact that we do the
> > same logic for "oops_in_progress" makes me feel like it shouldn't
> > happen too often. Also note that the old logic of turning on
> > interrupts temporarily wasn't exactly safe since (I presume) that
> > could have violated spin_lock_irqsave() semantics and ended up with a
> > deadlock of its own.
>
> This is part of the code to bring all the cores to a halt and since
> the other cores are still running kgdb isn't yet able to use the fact
> all the CPUs are halted to bend the rules. It is better for this code
> to play by the rules if it can.
>
> Is is possible to get the roundup functions to use a private csd
> alongside smp_call_function_single_async()? We could add a helper
> function to the debug core to avoid having to add cpu_online loops into
> every kgdb_roundup_cpus() implementaton.
Exactly the kind of helpful suggestion I was looking for. Thank you
very much! See v2 and hopefully it matches what you were thinking of.
-Doug
Hi,
On Tue, Oct 30, 2018 at 4:46 AM Daniel Thompson
<[email protected]> wrote:
>
> On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> > The function kgdb_roundup_cpus() was passed a parameter that was
> > documented as:
> >
> > > the flags that will be used when restoring the interrupts. There is
> > > local_irq_save() call before kgdb_roundup_cpus().
> >
> > Nobody used those flags. Anyone who wanted to temporarily turn on
> > interrupts just did local_irq_enable() and local_irq_disable() without
> > looking at them. So we can definitely remove the flags.
>
> On the whole I'd rather that this change...
>
>
> > Speaking of calling local_irq_enable(), it seems like a bad idea and
> > it caused a nice splat on my system with lockdep turned on.
> > Specifically it hit:
> > DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>
> ... and fixes for this this were in separate patches. They don't appear
> especially related.
Agreed that this is cleaner. Done for v2.
-Doug
Quoting Douglas Anderson (2018-10-29 11:07:01)
> The geni serial driver already had some sysrq code in it, but since
> SUPPORT_SYSRQ wasn't defined the code didn't do anything useful.
> Let's make it useful by adding that define using the same formula
> found in other serial drivers.
>
> In order to prevent deadlock, we'll take a page from the
> 'msm_serial.c' where the spinlock is released around
> uart_handle_sysrq_char(). This seemed better than copying from
> '8250_port.c' where we skip locking in the console_write function
> since the '8250_port.c' method can cause lockdep warnings when
> dropping into kgdb.
>
> Signed-off-by: Douglas Anderson <[email protected]>
Given that it's the same as msm_serial.c, and I wrote the msm_serial.c
"hack" then:
Reviewed-by: Stephen Boyd <[email protected]>