Hi folks, this the second iteration of the panic notifiers refactor work,
but limited to the fixes/clean-ups in the first moment. The (full) V1 is
available at:
https://lore.kernel.org/lkml/[email protected]/
The idea of splitting the series is that, originally we had a bunch of fixes
followed by the notifiers refactor, but this second part (the effective
refactor) is a bit "polemic", with reviews having antagonistic goals and some
complexities - it might be hard to achieve consensus.
For the curious, here is a good summary of the conflicting views and some
strategies we might take in the refactor V2:
https://lore.kernel.org/lkml/[email protected]/
So splitting and sending only the simple fixes/clean-ups in a first moment
makes sense, this way we don't prevent them to be discussed/merged/reworked
while the more complex part is subject to scrutiny in a different (future)
email thread.
I've tried to test this series building for all affected architecture/drivers
and also through some boot/runtime tests; below the test "matrix" used:
Build tests (using cross-compilers): alpha, arm, arm64, parisc, um, x86_64.
Boot/Runtime tests: x86_64 (Hyper-V and QEMU guests).
Here is the link with the .config files used:
https://people.igalia.com/gpiccoli/panic_notifiers_configs/5.19-rc7/
(tried my best to build all the affected code).
The series is based on 5.19-rc7; I'd like to ask that, if possible, maintainers
take the patches here in their trees, since there is no need to merge the series
as whole, patches are independent from each other.
Regarding the CC strategy, I've tried to reduce a bit the list of CCed emails,
given that it was huge in the first iteration. Hopefully I didn't forget
anybody interested in the topic (my apologies if so).
As usual, reviews / comments are always welcome, thanks in advance for them!
Cheers,
Guilherme
Guilherme G. Piccoli (13):
ARM: Disable FIQs (but not IRQs) on CPUs shutdown paths
notifier: Add panic notifiers info and purge trailing whitespaces
firmware: google: Test spinlock on panic path to avoid lockups
soc: bcm: brcmstb: Document panic notifier action and remove useless header
alpha: Clean-up the panic notifier code
um: Improve panic notifiers consistency and ordering
parisc: Replace regular spinlock with spin_trylock on panic path
tracing: Improve panic/die notifiers
notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
EDAC/altera: Skip the panic notifier if kdump is loaded
video/hyperv_fb: Avoid taking busy spinlock on panic path
drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
panic: Fixes the panic_print NMI backtrace setting
arch/alpha/kernel/setup.c | 36 ++++-----
arch/arm/kernel/machine_kexec.c | 2 +
arch/arm/kernel/smp.c | 5 +-
arch/parisc/include/asm/pdc.h | 1 +
arch/parisc/kernel/firmware.c | 27 ++++++-
arch/um/drivers/mconsole_kern.c | 7 +-
arch/um/kernel/um_arch.c | 8 +-
drivers/edac/altera_edac.c | 16 +++-
drivers/firmware/google/gsmi.c | 8 ++
drivers/hv/ring_buffer.c | 16 ++++
drivers/hv/vmbus_drv.c | 109 +++++++++++++++++-----------
drivers/parisc/power.c | 17 +++--
drivers/soc/bcm/brcmstb/pm/pm-arm.c | 16 +++-
drivers/video/fbdev/hyperv_fb.c | 16 +++-
include/linux/hyperv.h | 2 +
include/linux/notifier.h | 8 +-
kernel/notifier.c | 22 ++++--
kernel/panic.c | 47 +++++++-----
kernel/trace/trace.c | 55 +++++++-------
19 files changed, 268 insertions(+), 150 deletions(-)
--
2.37.1
The Hyper-V framebuffer code registers a panic notifier in order
to try updating its fbdev if the kernel crashed. The notifier
callback is straightforward, but it calls the vmbus_sendpacket()
routine eventually, and such function takes a spinlock for the
ring buffer operations.
Panic path runs in atomic context, with local interrupts and
preemption disabled, and all secondary CPUs shutdown. That said,
taking a spinlock might cause a lockup if a secondary CPU was
disabled with such lock taken. Fix it here by checking if the
ring buffer spinlock is busy on Hyper-V framebuffer panic notifier;
if so, bail-out avoiding the potential lockup scenario.
Cc: Andrea Parri (Microsoft) <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Tianyu Lan <[email protected]>
Cc: Wei Liu <[email protected]>
Tested-by: Fabio A M Martins <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- new patch, based on the discussion in [0].
[0] https://lore.kernel.org/lkml/[email protected]/
drivers/hv/ring_buffer.c | 16 ++++++++++++++++
drivers/video/fbdev/hyperv_fb.c | 8 +++++++-
include/linux/hyperv.h | 2 ++
3 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 59a4aa86d1f3..9ceb3a7e8d19 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -280,6 +280,22 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info)
ring_info->pkt_buffer_size = 0;
}
+/*
+ * Check if the ring buffer spinlock is available to take or not; used on
+ * atomic contexts, like panic path (see the Hyper-V framebuffer driver).
+ */
+
+bool hv_ringbuffer_spinlock_busy(struct vmbus_channel *channel)
+{
+ struct hv_ring_buffer_info *rinfo = &channel->outbound;
+
+ if (spin_is_locked(&rinfo->ring_lock))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(hv_ringbuffer_spinlock_busy);
+
/* Write to the ring buffer. */
int hv_ringbuffer_write(struct vmbus_channel *channel,
const struct kvec *kv_list, u32 kv_count,
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index 886c564787f1..e1b65a01fb96 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -783,12 +783,18 @@ static void hvfb_ondemand_refresh_throttle(struct hvfb_par *par,
static int hvfb_on_panic(struct notifier_block *nb,
unsigned long e, void *p)
{
+ struct hv_device *hdev;
struct hvfb_par *par;
struct fb_info *info;
par = container_of(nb, struct hvfb_par, hvfb_panic_nb);
- par->synchronous_fb = true;
info = par->info;
+ hdev = device_to_hv_device(info->device);
+
+ if (hv_ringbuffer_spinlock_busy(hdev->channel))
+ return NOTIFY_DONE;
+
+ par->synchronous_fb = true;
if (par->need_docopy)
hvfb_docopy(par, 0, dio_fb_size);
synthvid_update(info, 0, 0, INT_MAX, INT_MAX);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 3b42264333ef..646f1da9f27e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1341,6 +1341,8 @@ struct hv_ring_buffer_debug_info {
int hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
struct hv_ring_buffer_debug_info *debug_info);
+bool hv_ringbuffer_spinlock_busy(struct vmbus_channel *channel);
+
/* Vmbus interface */
#define vmbus_driver_register(driver) \
__vmbus_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
--
2.37.1
Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
is responsible for that. IRQs are architecturally masked when we
take an interrupt, but FIQs are high priority than IRQs, hence they
aren't masked. With that said, it makes sense to disable FIQs here,
but there's no need for (re-)disabling IRQs.
More than that: there is an alternative path for disabling CPUs,
in the form of function crash_smp_send_stop(), which is used for
kexec/panic path. This function relies on a SMP call that also
triggers a busy-wait loop [at machine_crash_nonpanic_core()], but
without disabling FIQs. This might lead to odd scenarios, like
early interrupts in the boot of kexec'd kernel or even interrupts
in secondary "disabled" CPUs while the main one still works in the
panic path and assumes all secondary CPUs are (really!) off.
So, let's disable FIQs in both paths and *not* disable IRQs a second
time, since they are already masked in both paths by the architecture.
This way, we keep both CPU quiesce paths consistent and safe.
Cc: Marc Zyngier <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Small wording improvement (thanks Michael Kelley);
- Only disable FIQs, since IRQs are masked by architecture
definition when we take an interrupt. Thanks a lot Russell
and Marc for the discussion [0].
Should we add a Fixes tag here? If so, maybe the proper target is:
b23065313297 ("ARM: 6522/1: kexec: Add call to non-crashing cores through IPI")
[0] https://lore.kernel.org/lkml/[email protected]/
arch/arm/kernel/machine_kexec.c | 2 ++
arch/arm/kernel/smp.c | 5 ++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..0b482bcb97f7 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -77,6 +77,8 @@ void machine_crash_nonpanic_core(void *unused)
{
struct pt_regs regs;
+ local_fiq_disable();
+
crash_setup_regs(®s, get_irq_regs());
printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has crashed\n",
smp_processor_id());
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 73fc645fc4c7..0c24ec1fd88e 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -600,6 +600,8 @@ static DEFINE_RAW_SPINLOCK(stop_lock);
*/
static void ipi_cpu_stop(unsigned int cpu)
{
+ local_fiq_disable();
+
if (system_state <= SYSTEM_RUNNING) {
raw_spin_lock(&stop_lock);
pr_crit("CPU%u: stopping\n", cpu);
@@ -609,9 +611,6 @@ static void ipi_cpu_stop(unsigned int cpu)
set_cpu_online(cpu, false);
- local_fiq_disable();
- local_irq_disable();
-
while (1) {
cpu_relax();
wfe();
--
2.37.1
Currently we have a debug infrastructure in the notifiers file, but
it's very simple/limited. Extend it by:
(a) Showing all registered/unregistered notifiers' callback names;
(b) Adding a dynamic debug tuning to allow showing called notifiers'
function names. Notice that this should be guarded as a tunable since
it can flood the kernel log buffer.
Cc: Arjan van de Ven <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Xiaoming Ni <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Major improvement thanks to the great idea from Xiaoming - changed
all the ksym wheel reinvention to printk %ps modifier;
- Instead of ifdefs, using IS_ENABLED() - thanks Steven.
- Removed an unlikely() hint on debug path.
kernel/notifier.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 0d5bd62c480e..350761b34f8a 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -37,6 +37,10 @@ static int notifier_chain_register(struct notifier_block **nl,
}
n->next = *nl;
rcu_assign_pointer(*nl, n);
+
+ if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
+ pr_info("notifiers: registered %ps\n", n->notifier_call);
+
return 0;
}
@@ -46,6 +50,11 @@ static int notifier_chain_unregister(struct notifier_block **nl,
while ((*nl) != NULL) {
if ((*nl) == n) {
rcu_assign_pointer(*nl, n->next);
+
+ if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
+ pr_info("notifiers: unregistered %ps\n",
+ n->notifier_call);
+
return 0;
}
nl = &((*nl)->next);
@@ -77,13 +86,14 @@ static int notifier_call_chain(struct notifier_block **nl,
while (nb && nr_to_call) {
next_nb = rcu_dereference_raw(nb->next);
-#ifdef CONFIG_DEBUG_NOTIFIERS
- if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
- WARN(1, "Invalid notifier called!");
- nb = next_nb;
- continue;
+ if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS)) {
+ if (!func_ptr_is_kernel_text(nb->notifier_call)) {
+ WARN(1, "Invalid notifier called!");
+ nb = next_nb;
+ continue;
+ }
+ pr_debug("notifiers: calling %ps\n", nb->notifier_call);
}
-#endif
ret = nb->notifier_call(nb, val, v);
if (nr_calls)
--
2.37.1
The panic notifiers' callbacks execute in an atomic context, with
interrupts/preemption disabled, and all CPUs not running the panic
function are off, so it's very dangerous to wait on a regular
spinlock, there's a risk of deadlock.
Refactor the panic notifier of parisc/power driver to make use
of spin_trylock - for that, we've added a second version of the
soft-power function. Also, some comments were reorganized and
trailing white spaces, useless header inclusion and blank lines
were removed.
Cc: "James E.J. Bottomley" <[email protected]>
Acked-by: Helge Deller <[email protected]> # parisc
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Added Helge's ACK - thanks!
arch/parisc/include/asm/pdc.h | 1 +
arch/parisc/kernel/firmware.c | 27 +++++++++++++++++++++++----
drivers/parisc/power.c | 17 ++++++++++-------
3 files changed, 34 insertions(+), 11 deletions(-)
diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
index b643092d4b98..7a106008e258 100644
--- a/arch/parisc/include/asm/pdc.h
+++ b/arch/parisc/include/asm/pdc.h
@@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
int pdc_do_reset(void);
int pdc_soft_power_info(unsigned long *power_reg);
int pdc_soft_power_button(int sw_control);
+int pdc_soft_power_button_panic(int sw_control);
void pdc_io_reset(void);
void pdc_io_reset_devices(void);
int pdc_iodc_getc(void);
diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
index 6a7e315bcc2e..0e2f70b592f4 100644
--- a/arch/parisc/kernel/firmware.c
+++ b/arch/parisc/kernel/firmware.c
@@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long *power_reg)
}
/*
- * pdc_soft_power_button - Control the soft power button behaviour
- * @sw_control: 0 for hardware control, 1 for software control
+ * pdc_soft_power_button{_panic} - Control the soft power button behaviour
+ * @sw_control: 0 for hardware control, 1 for software control
*
*
* This PDC function places the soft power button under software or
* hardware control.
- * Under software control the OS may control to when to allow to shut
- * down the system. Under hardware control pressing the power button
+ * Under software control the OS may control to when to allow to shut
+ * down the system. Under hardware control pressing the power button
* powers off the system immediately.
+ *
+ * The _panic version relies in spin_trylock to prevent deadlock
+ * on panic path.
*/
int pdc_soft_power_button(int sw_control)
{
@@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
return retval;
}
+int pdc_soft_power_button_panic(int sw_control)
+{
+ int retval;
+ unsigned long flags;
+
+ if (!spin_trylock_irqsave(&pdc_lock, flags)) {
+ pr_emerg("Couldn't enable soft power button\n");
+ return -EBUSY; /* ignored by the panic notifier */
+ }
+
+ retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, __pa(pdc_result), sw_control);
+ spin_unlock_irqrestore(&pdc_lock, flags);
+
+ return retval;
+}
+
/*
* pdc_io_reset - Hack to avoid overlapping range registers of Bridges devices.
* Primarily a problem on T600 (which parisc-linux doesn't support) but
diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
index 456776bd8ee6..8512884de2cf 100644
--- a/drivers/parisc/power.c
+++ b/drivers/parisc/power.c
@@ -37,7 +37,6 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
-#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/reboot.h>
#include <linux/sched/signal.h>
@@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
-/* parisc_panic_event() is called by the panic handler.
- * As soon as a panic occurs, our tasklets above will not be
- * executed any longer. This function then re-enables the
- * soft-power switch and allows the user to switch off the system
+/*
+ * parisc_panic_event() is called by the panic handler.
+ *
+ * As soon as a panic occurs, our tasklets above will not
+ * be executed any longer. This function then re-enables
+ * the soft-power switch and allows the user to switch off
+ * the system. We rely in pdc_soft_power_button_panic()
+ * since this version spin_trylocks (instead of regular
+ * spinlock), preventing deadlocks on panic path.
*/
static int parisc_panic_event(struct notifier_block *this,
unsigned long event, void *ptr)
{
/* re-enable the soft-power switch */
- pdc_soft_power_button(0);
+ pdc_soft_power_button_panic(0);
return NOTIFY_DONE;
}
@@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
.priority = INT_MAX,
};
-
static int __init power_init(void)
{
unsigned long ret;
--
2.37.1
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing hypervisor <-> guest connection,
disabling some paravirtualized timer) as well as to data collection
(sending panic information to the hypervisor) and framebuffer management.
The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. As part of an effort to
clean-up the panic notifiers mechanism and better document things, we
hereby address some of the issues/complexities of Hyper-V panic handling
through the following changes:
(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.
This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.
(b) There is a Hyper-V framebuffer panic notifier, which relies in doing
a vmbus operation that demands a valid connection. So, we must order this
notifier with the panic notifier from vmbus_drv.c, to guarantee that the
framebuffer code executes before the vmbus connection is unloaded.
Also, this commit removes a useless header.
Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.
Cc: Andrea Parri (Microsoft) <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Tianyu Lan <[email protected]>
Cc: Wei Liu <[email protected]>
Tested-by: Fabio A M Martins <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Unfortunately we cannot rely in the crash shutdown (custom) handler
to perform the vmbus unload - arm64 architecture doesn't have this
"feature" [0]. So, in V2 we kept the notifier behavior and always
unload the vmbus connection, no matter what - thanks Michael for
pointing that;
- Removed the Fixes tags as per Michael suggestion;
- As per Petr suggestion, we abandoned the idea of distinguish among
notifiers using an id - so, in V2 we rely in the old and good address
comparison for that. Thanks Petr for the enriching discussion!
[0] https://lore.kernel.org/lkml/[email protected]/
drivers/hv/vmbus_drv.c | 109 +++++++++++++++++++-------------
drivers/video/fbdev/hyperv_fb.c | 8 +++
2 files changed, 74 insertions(+), 43 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 547ae334e5cd..75c5623e7289 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -25,7 +25,6 @@
#include <linux/sched/task_stack.h>
#include <linux/delay.h>
-#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/ptrace.h>
#include <linux/screen_info.h>
@@ -69,53 +68,74 @@ static int hyperv_report_reg(void)
return !sysctl_record_panic_msg || !hv_panic_page;
}
-static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
+/*
+ * The panic notifier below is responsible solely for unloading the
+ * vmbus connection, which is necessary in a panic event.
+ *
+ * Notice an intrincate relation of this notifier with Hyper-V
+ * framebuffer panic notifier exists - we need vmbus connection alive
+ * there in order to succeed, so we need to order both with each other
+ * [see hvfb_on_panic()] - this is done using notifiers' priorities.
+ */
+static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
void *args)
+{
+ vmbus_initiate_unload(true);
+ return NOTIFY_DONE;
+}
+static struct notifier_block hyperv_panic_vmbus_unload_block = {
+ .notifier_call = hv_panic_vmbus_unload,
+ .priority = INT_MIN + 1, /* almost the latest one to execute */
+};
+
+static int hv_die_panic_notify_crash(struct notifier_block *self,
+ unsigned long val, void *args);
+
+static struct notifier_block hyperv_die_report_block = {
+ .notifier_call = hv_die_panic_notify_crash,
+};
+static struct notifier_block hyperv_panic_report_block = {
+ .notifier_call = hv_die_panic_notify_crash,
+};
+
+/*
+ * The following callback works both as die and panic notifier; its
+ * goal is to provide panic information to the hypervisor unless the
+ * kmsg dumper is used [see hv_kmsg_dump()], which provides more
+ * information but isn't always available.
+ *
+ * Notice that both the panic/die report notifiers are registered only
+ * if we have the capability HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE set.
+ */
+static int hv_die_panic_notify_crash(struct notifier_block *self,
+ unsigned long val, void *args)
{
struct pt_regs *regs;
+ bool is_die;
- vmbus_initiate_unload(true);
-
- /*
- * Hyper-V should be notified only once about a panic. If we will be
- * doing hv_kmsg_dump() with kmsg data later, don't do the notification
- * here.
- */
- if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE
- && hyperv_report_reg()) {
+ /* Don't notify Hyper-V unless we have a die oops event or panic. */
+ if (self == &hyperv_panic_report_block) {
+ is_die = false;
regs = current_pt_regs();
- hyperv_report_panic(regs, val, false);
+ } else { /* die event */
+ if (val != DIE_OOPS)
+ return NOTIFY_DONE;
+
+ is_die = true;
+ regs = ((struct die_args *)args)->regs;
}
- return NOTIFY_DONE;
-}
-
-static int hyperv_die_event(struct notifier_block *nb, unsigned long val,
- void *args)
-{
- struct die_args *die = args;
- struct pt_regs *regs = die->regs;
-
- /* Don't notify Hyper-V if the die event is other than oops */
- if (val != DIE_OOPS)
- return NOTIFY_DONE;
/*
- * Hyper-V should be notified only once about a panic. If we will be
- * doing hv_kmsg_dump() with kmsg data later, don't do the notification
- * here.
+ * Hyper-V should be notified only once about a panic/die. If we will
+ * be calling hv_kmsg_dump() later with kmsg data, don't do the
+ * notification here.
*/
if (hyperv_report_reg())
- hyperv_report_panic(regs, val, true);
+ hyperv_report_panic(regs, val, is_die);
+
return NOTIFY_DONE;
}
-static struct notifier_block hyperv_die_block = {
- .notifier_call = hyperv_die_event,
-};
-static struct notifier_block hyperv_panic_block = {
- .notifier_call = hyperv_panic_event,
-};
-
static const char *fb_mmio_name = "fb_range";
static struct resource *fb_mmio;
static struct resource *hyperv_mmio;
@@ -1534,16 +1554,17 @@ static int vmbus_bus_init(void)
if (hyperv_crash_ctl & HV_CRASH_CTL_CRASH_NOTIFY_MSG)
hv_kmsg_dump_register();
- register_die_notifier(&hyperv_die_block);
+ register_die_notifier(&hyperv_die_report_block);
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &hyperv_panic_report_block);
}
/*
- * Always register the panic notifier because we need to unload
- * the VMbus channel connection to prevent any VMbus
- * activity after the VM panics.
+ * Always register the vmbus unload panic notifier because we
+ * need to shut the VMbus channel connection on panic.
*/
atomic_notifier_chain_register(&panic_notifier_list,
- &hyperv_panic_block);
+ &hyperv_panic_vmbus_unload_block);
vmbus_request_offers();
@@ -2765,15 +2786,17 @@ static void __exit vmbus_exit(void)
if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
kmsg_dump_unregister(&hv_kmsg_dumper);
- unregister_die_notifier(&hyperv_die_block);
+ unregister_die_notifier(&hyperv_die_report_block);
+ atomic_notifier_chain_unregister(&panic_notifier_list,
+ &hyperv_panic_report_block);
}
/*
- * The panic notifier is always registered, hence we should
+ * The vmbus panic notifier is always registered, hence we should
* also unconditionally unregister it here as well.
*/
atomic_notifier_chain_unregister(&panic_notifier_list,
- &hyperv_panic_block);
+ &hyperv_panic_vmbus_unload_block);
free_page((unsigned long)hv_panic_page);
unregister_sysctl_table(hv_ctl_table_hdr);
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index e1b65a01fb96..9234d31d4305 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -1216,7 +1216,15 @@ static int hvfb_probe(struct hv_device *hdev,
par->fb_ready = true;
par->synchronous_fb = false;
+
+ /*
+ * We need to be sure this panic notifier runs _before_ the
+ * vmbus disconnect, so order it by priority. It must execute
+ * before the function hv_panic_vmbus_unload() [drivers/hv/vmbus_drv.c],
+ * which is almost at the end of list, with priority = INT_MIN + 1.
+ */
par->hvfb_panic_nb.notifier_call = hvfb_on_panic;
+ par->hvfb_panic_nb.priority = INT_MIN + 10,
atomic_notifier_chain_register(&panic_notifier_list,
&par->hvfb_panic_nb);
--
2.37.1
The alpha panic notifier has some code issues, not following
the conventions of other notifiers. Also, it might halt the
machine but still it is set to run as early as possible, which
doesn't seem to be a good idea.
So, let's clean the code and set the notifier to run as the
latest, following the same approach other architectures are
doing - also, remove the unnecessary include of a header already
included indirectly.
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Richard Henderson <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Fixed rth email address;
- Added Petr's review tag - thanks!
arch/alpha/kernel/setup.c | 36 +++++++++++++++---------------------
1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index b4fbbba30aa2..d88bdf852753 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -41,19 +41,11 @@
#include <linux/sysrq.h>
#include <linux/reboot.h>
#endif
-#include <linux/notifier.h>
#include <asm/setup.h>
#include <asm/io.h>
#include <linux/log2.h>
#include <linux/export.h>
-static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
-static struct notifier_block alpha_panic_block = {
- alpha_panic_event,
- NULL,
- INT_MAX /* try to do it first */
-};
-
#include <linux/uaccess.h>
#include <asm/hwrpb.h>
#include <asm/dma.h>
@@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
};
#endif
+static int alpha_panic_event(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+ /* If we are using SRM and serial console, just hard halt here. */
+ if (alpha_using_srm && srmcons_output)
+ __halt();
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block alpha_panic_block = {
+ .notifier_call = alpha_panic_event,
+ .priority = INT_MIN, /* may not return, do it last */
+};
+
void __init
setup_arch(char **cmdline_p)
{
@@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
.show = show_cpuinfo,
};
-
-static int
-alpha_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
-{
-#if 1
- /* FIXME FIXME FIXME */
- /* If we are using SRM and serial console, just hard halt here. */
- if (alpha_using_srm && srmcons_output)
- __halt();
-#endif
- return NOTIFY_DONE;
-}
-
static __init int add_pcspkr(void)
{
struct platform_device *pd;
--
2.37.1
The panic notifier of this driver is very simple code-wise, just a
memory write to a special position with some numeric code. But this
is not clear from the semantic point-of-view, and there is no public
documentation about that either.
After discussing this in the mailing-lists [0] and having Florian
explained it very well, document that in the code for the future
generations asking the same questions. Also, while at it, remove
a useless header.
[0] https://lore.kernel.org/lkml/[email protected]
Cc: Brian Norris <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Justin Chen <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Markus Mayer <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Removed the Fixes tag;
- Added Florian's ACK - thanks!
drivers/soc/bcm/brcmstb/pm/pm-arm.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/soc/bcm/brcmstb/pm/pm-arm.c b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
index 70ad0f3dce28..2a6adaa29596 100644
--- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
+++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
@@ -25,7 +25,6 @@
#include <linux/kernel.h>
#include <linux/memblock.h>
#include <linux/module.h>
-#include <linux/notifier.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/panic_notifier.h>
@@ -664,7 +663,20 @@ static void __iomem *brcmstb_ioremap_match(const struct of_device_id *matches,
return of_io_request_and_map(dn, index, dn->full_name);
}
-
+/*
+ * The AON is a small domain in the SoC that can retain its state across
+ * various system wide sleep states and specific reset conditions; the
+ * AON DATA RAM is a small RAM of a few words (< 1KB) which can store
+ * persistent information across such events.
+ *
+ * The purpose of the below panic notifier is to help with notifying
+ * the bootloader that a panic occurred and so that it should try its
+ * best to preserve the DRAM contents holding that buffer for recovery
+ * by the kernel as opposed to wiping out DRAM clean again.
+ *
+ * Reference: comment from Florian Fainelli, at
+ * https://lore.kernel.org/lkml/[email protected]
+ */
static int brcmstb_pm_panic_notify(struct notifier_block *nb,
unsigned long action, void *data)
{
--
2.37.1
Currently the panic notifiers from user mode linux don't follow
the convention for most of the other notifiers present in the
kernel (indentation, priority setting, numeric return).
More important, the priorities could be improved, since it's a
special case (userspace), hence we could run the notifiers earlier;
user mode linux shouldn't care much with other panic notifiers but
the ordering among the mconsole and arch notifier is important,
given that the arch one effectively triggers a core dump.
Fix that by running the mconsole notifier as the first panic
notifier, followed by the architecture one (that coredumps).
Cc: Anton Ivanov <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Richard Weinberger <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Kept the notifier header to avoid implicit usage - thanks
Johannes for the suggestion!
arch/um/drivers/mconsole_kern.c | 7 +++----
arch/um/kernel/um_arch.c | 8 ++++----
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 8ca67a692683..69af3ce8407a 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -846,13 +846,12 @@ static int notify_panic(struct notifier_block *self, unsigned long unused1,
mconsole_notify(notify_socket, MCONSOLE_PANIC, message,
strlen(message) + 1);
- return 0;
+ return NOTIFY_DONE;
}
static struct notifier_block panic_exit_notifier = {
- .notifier_call = notify_panic,
- .next = NULL,
- .priority = 1
+ .notifier_call = notify_panic,
+ .priority = INT_MAX, /* run as soon as possible */
};
static int add_notifier(void)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 9838967d0b2f..970fdccc2f94 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -246,13 +246,13 @@ static int panic_exit(struct notifier_block *self, unsigned long unused1,
bust_spinlocks(0);
uml_exitcode = 1;
os_dump_core();
- return 0;
+
+ return NOTIFY_DONE;
}
static struct notifier_block panic_exit_notifier = {
- .notifier_call = panic_exit,
- .next = NULL,
- .priority = 0
+ .notifier_call = panic_exit,
+ .priority = INT_MAX - 1, /* run as 2nd notifier, won't return */
};
void uml_finishsetup(void)
--
2.37.1
Commit 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in panic_print")
introduced a setting for the "panic_print" kernel parameter to allow
users to request a NMI backtrace on panic. Problem is that the panic_print
handling happens after the secondary CPUs are already disabled, hence
this option ended-up being kind of a no-op - kernel skips the NMI trace
in idling CPUs, which is the case of offline CPUs.
Fix it by checking the NMI backtrace bit in the panic_print prior to
the CPU disabling function.
Fixes: 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in panic_print")
Cc: Feng Tang <[email protected]>
Cc: Petr Mladek <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- new patch, there was no V1 of this one.
Hi folks, thanks upfront for reviews. This is a new patch, fixing an issue
I found in my tests, so I shoved it into this fixes series.
Notice that while at it, I got rid of the "crash_kexec_post_notifiers"
local copy in panic(). This was introduced by commit b26e27ddfd2a
("kexec: use core_param for crash_kexec_post_notifiers boot option"),
but it is not clear from comments or commit message why this local copy
is required.
My understanding is that it's a mechanism to prevent some concurrency,
in case some other CPU modify this variable while panic() is running.
I find it very unlikely, hence I removed it - but if people consider
this copy needed, I can respin this patch and keep it, even providing a
comment about that, in order to be explict about its need.
Let me know your thoughts! Cheers,
Guilherme
kernel/panic.c | 47 +++++++++++++++++++++++++++--------------------
1 file changed, 27 insertions(+), 20 deletions(-)
diff --git a/kernel/panic.c b/kernel/panic.c
index a3308af28a21..7fb604e95ef9 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -180,9 +180,6 @@ static void panic_print_sys_info(bool console_flush)
return;
}
- if (panic_print & PANIC_PRINT_ALL_CPU_BT)
- trigger_all_cpu_backtrace();
-
if (panic_print & PANIC_PRINT_TASK_INFO)
show_state();
@@ -199,6 +196,30 @@ static void panic_print_sys_info(bool console_flush)
ftrace_dump(DUMP_ALL);
}
+/*
+ * Helper that triggers the NMI backtrace (if set in panic_print)
+ * and then performs the secondary CPUs shutdown - we cannot have
+ * the NMI backtrace after the CPUs are off!
+ */
+static void panic_other_cpus_shutdown(void)
+{
+ if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+ trigger_all_cpu_backtrace();
+
+ /*
+ * Note that smp_send_stop() is the usual SMP shutdown function,
+ * which unfortunately may not be hardened to work in a panic
+ * situation. If we want to do crash dump after notifier calls
+ * and kmsg_dump, we will need architecture dependent extra
+ * bits in addition to stopping other CPUs, hence we rely on
+ * crash_smp_send_stop() for that.
+ */
+ if (!crash_kexec_post_notifiers)
+ smp_send_stop();
+ else
+ crash_smp_send_stop();
+}
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -214,7 +235,6 @@ void panic(const char *fmt, ...)
long i, i_next = 0, len;
int state = 0;
int old_cpu, this_cpu;
- bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
if (panic_on_warn) {
/*
@@ -289,23 +309,10 @@ void panic(const char *fmt, ...)
*
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
- if (!_crash_kexec_post_notifiers) {
+ if (!crash_kexec_post_notifiers)
__crash_kexec(NULL);
- /*
- * Note smp_send_stop is the usual smp shutdown function, which
- * unfortunately means it may not be hardened to work in a
- * panic situation.
- */
- smp_send_stop();
- } else {
- /*
- * If we want to do crash dump after notifier calls and
- * kmsg_dump, we will need architecture dependent extra
- * works in addition to stopping other CPUs.
- */
- crash_smp_send_stop();
- }
+ panic_other_cpus_shutdown();
/*
* Run any panic handlers, including those that might need to
@@ -326,7 +333,7 @@ void panic(const char *fmt, ...)
*
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
- if (_crash_kexec_post_notifiers)
+ if (crash_kexec_post_notifiers)
__crash_kexec(NULL);
#ifdef CONFIG_VT
--
2.37.1
The altera_edac panic notifier performs some data collection with
regards errors detected; such code relies in the regmap layer to
perform reads/writes, so the code is abstracted and there is some
risk level to execute that, since the panic path runs in atomic
context, with interrupts/preemption and secondary CPUs disabled.
Users want the information collected in this panic notifier though,
so in order to balance the risk/benefit, let's skip the altera panic
notifier if kdump is loaded. While at it, remove a useless header
and encompass a macro inside the sole ifdef block it is used.
Cc: Dinh Nguyen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- new patch, based on the discussion in [0].
[0] https://lore.kernel.org/lkml/[email protected]/
drivers/edac/altera_edac.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index e7e8e624a436..741fe5539154 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -16,7 +16,6 @@
#include <linux/kernel.h>
#include <linux/mfd/altera-sysmgr.h>
#include <linux/mfd/syscon.h>
-#include <linux/notifier.h>
#include <linux/of_address.h>
#include <linux/of_irq.h>
#include <linux/of_platform.h>
@@ -24,6 +23,7 @@
#include <linux/platform_device.h>
#include <linux/regmap.h>
#include <linux/types.h>
+#include <linux/kexec.h>
#include <linux/uaccess.h>
#include "altera_edac.h"
@@ -2063,22 +2063,30 @@ static const struct irq_domain_ops a10_eccmgr_ic_ops = {
};
/************** Stratix 10 EDAC Double Bit Error Handler ************/
-#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
-
#ifdef CONFIG_64BIT
/* panic routine issues reboot on non-zero panic_timeout */
extern int panic_timeout;
+#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
+
/*
* The double bit error is handled through SError which is fatal. This is
* called as a panic notifier to printout ECC error info as part of the panic.
+ *
+ * Notice that if kdump is set, we take the risk avoidance approach and
+ * skip the notifier, given that users are expected to have access to a
+ * full vmcore.
*/
static int s10_edac_dberr_handler(struct notifier_block *this,
unsigned long event, void *ptr)
{
- struct altr_arria10_edac *edac = to_a10edac(this, panic_notifier);
+ struct altr_arria10_edac *edac;
int err_addr, dberror;
+ if (kexec_crash_loaded())
+ return NOTIFY_DONE;
+
+ edac = to_a10edac(this, panic_notifier);
regmap_read(edac->ecc_mgr_map, S10_SYSMGR_ECC_INTSTAT_DERR_OFST,
&dberror);
regmap_write(edac->ecc_mgr_map, S10_SYSMGR_UE_VAL_OFST, dberror);
--
2.37.1
Currently the tracing dump_on_oops feature is implemented
through separate notifiers, one for die/oops and the other
for panic - given they have the same functionality, let's
unify them.
Also improve the function comment and change the priority of
the notifier to make it execute earlier, avoiding showing useless
trace data (like the callback names for the other notifiers);
finally, we also removed an unnecessary header inclusion.
Cc: Petr Mladek <[email protected]>
Cc: Sergei Shtylyov <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
V2:
- Different approach; instead of using IDs to distinguish die and
panic events, rely on address comparison like other notifiers do
and as per Petr's suggestion;
- Removed ACK from Steven since the code changed.
kernel/trace/trace.c | 55 ++++++++++++++++++++++----------------------
1 file changed, 27 insertions(+), 28 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b8dd54627075..2a436b645c70 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -19,7 +19,6 @@
#include <linux/kallsyms.h>
#include <linux/security.h>
#include <linux/seq_file.h>
-#include <linux/notifier.h>
#include <linux/irqflags.h>
#include <linux/debugfs.h>
#include <linux/tracefs.h>
@@ -9777,40 +9776,40 @@ static __init int tracer_init_tracefs(void)
fs_initcall(tracer_init_tracefs);
-static int trace_panic_handler(struct notifier_block *this,
- unsigned long event, void *unused)
-{
- if (ftrace_dump_on_oops)
- ftrace_dump(ftrace_dump_on_oops);
- return NOTIFY_OK;
-}
+static int trace_die_panic_handler(struct notifier_block *self,
+ unsigned long ev, void *unused);
static struct notifier_block trace_panic_notifier = {
- .notifier_call = trace_panic_handler,
- .next = NULL,
- .priority = 150 /* priority: INT_MAX >= x >= 0 */
+ .notifier_call = trace_die_panic_handler,
+ .priority = INT_MAX - 1,
};
-static int trace_die_handler(struct notifier_block *self,
- unsigned long val,
- void *data)
-{
- switch (val) {
- case DIE_OOPS:
- if (ftrace_dump_on_oops)
- ftrace_dump(ftrace_dump_on_oops);
- break;
- default:
- break;
- }
- return NOTIFY_OK;
-}
-
static struct notifier_block trace_die_notifier = {
- .notifier_call = trace_die_handler,
- .priority = 200
+ .notifier_call = trace_die_panic_handler,
+ .priority = INT_MAX - 1,
};
+/*
+ * The idea is to execute the following die/panic callback early, in order
+ * to avoid showing irrelevant information in the trace (like other panic
+ * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
+ * warnings get disabled (to prevent potential log flooding).
+ */
+static int trace_die_panic_handler(struct notifier_block *self,
+ unsigned long ev, void *unused)
+{
+ if (!ftrace_dump_on_oops)
+ goto out;
+
+ if (self == &trace_die_notifier && ev != DIE_OOPS)
+ goto out;
+
+ ftrace_dump(ftrace_dump_on_oops);
+
+out:
+ return NOTIFY_DONE;
+}
+
/*
* printk is set to max of 1024, we really don't need it that big.
* Nothing should be printing 1000 characters anyway.
--
2.37.1
On 19/07/2022 17:33, Arjan van de Ven wrote:
> On 7/19/2022 12:53 PM, Guilherme G. Piccoli wrote:
>> Currently we have a debug infrastructure in the notifiers file, but
>> it's very simple/limited. Extend it by:
>>
>> (a) Showing all registered/unregistered notifiers' callback names;
>
>
> I'm not yet convinced that this is the right direction.
> The original intent for this "debug" feature was to be lightweight enough that it could run in production, since at the time, rootkits
> liked to clobber/hijack notifiers and there were also some other signs of corruption at the time.
>
> By making something print (even at pr_info) for what are probably frequent non-error operations, you turn something that is light
> into something that's a lot more heavy and generally that's not a great idea... it'll be a performance surprise.
>
>
Is registering/un-registering notifiers a hot path, or performance
sensitive usually? For me, this patch proved to be very useful, and once
enabled, shows relatively few entries in dmesg, these operations aren't
so common thing it seems.
Also, if this Kconfig option was meant to run in production, maybe the
first thing would be have some sysfs tuning or anything able to turn it
on - I've worked with a variety of customers and the most terrifying
thing in servers is to install a new kernel and reboot heh
My understanding is that this debug infrastructure would be useful for
notifiers writers and people playing with the core notifiers
code...tracing would be much more useful in the context of checking if
some specific notifier got registered/executed in production environment
I guess.
Cheers,
Guilherme
On 7/19/2022 1:44 PM, Guilherme G. Piccoli wrote:
> On 19/07/2022 17:33, Arjan van de Ven wrote:
>> On 7/19/2022 12:53 PM, Guilherme G. Piccoli wrote:
>>> Currently we have a debug infrastructure in the notifiers file, but
>>> it's very simple/limited. Extend it by:
>>>
>>> (a) Showing all registered/unregistered notifiers' callback names;
>>
>>
>> I'm not yet convinced that this is the right direction.
>> The original intent for this "debug" feature was to be lightweight enough that it could run in production, since at the time, rootkits
>> liked to clobber/hijack notifiers and there were also some other signs of corruption at the time.
>>
>> By making something print (even at pr_info) for what are probably frequent non-error operations, you turn something that is light
>> into something that's a lot more heavy and generally that's not a great idea... it'll be a performance surprise.
>>
>>
>
> Is registering/un-registering notifiers a hot path, or performance
> sensitive usually? For me, this patch proved to be very useful, and once
> enabled, shows relatively few entries in dmesg, these operations aren't
> so common thing it seems.
>
> Also, if this Kconfig option was meant to run in production, maybe the
> first thing would be have some sysfs tuning or anything able to turn it
> on - I've worked with a variety of customers and the most terrifying
> thing in servers is to install a new kernel and reboot heh
>
> My understanding is that this debug infrastructure would be useful for
> notifiers writers and people playing with the core notifiers
> code...tracing would be much more useful in the context of checking if
> some specific notifier got registered/executed in production environment
> I guess.
I would totally support an approach where instead of pr_info, there's a tracepoint
for these events (and that shouldnt' need to be conditional on a config option)
that's not what the patch does though.
On 7/19/2022 12:53 PM, Guilherme G. Piccoli wrote:
> Currently we have a debug infrastructure in the notifiers file, but
> it's very simple/limited. Extend it by:
>
> (a) Showing all registered/unregistered notifiers' callback names;
I'm not yet convinced that this is the right direction.
The original intent for this "debug" feature was to be lightweight enough that it could run in production, since at the time, rootkits
liked to clobber/hijack notifiers and there were also some other signs of corruption at the time.
By making something print (even at pr_info) for what are probably frequent non-error operations, you turn something that is light
into something that's a lot more heavy and generally that's not a great idea... it'll be a performance surprise.
On 19/07/2022 17:48, Arjan van de Ven wrote:
> [...]
> I would totally support an approach where instead of pr_info, there's a tracepoint
> for these events (and that shouldnt' need to be conditional on a config option)
>
> that's not what the patch does though.
This is a good idea Arjan! We could use trace events or pr_debug() -
which one do you prefer?
With that, we could maybe remove this Kconfig option, having it always
enabled, what do you think ?
On 7/19/2022 2:00 PM, Guilherme G. Piccoli wrote:
> On 19/07/2022 17:48, Arjan van de Ven wrote:
>> [...]
>> I would totally support an approach where instead of pr_info, there's a tracepoint
>> for these events (and that shouldnt' need to be conditional on a config option)
>>
>> that's not what the patch does though.
>
> This is a good idea Arjan! We could use trace events or pr_debug() -
> which one do you prefer?
>
I'd go for a trace point to be honest
Hi Guilherme,
On Tue, 19 Jul 2022 16:53:20 -0300
"Guilherme G. Piccoli" <[email protected]> wrote:
> The panic notifiers' callbacks execute in an atomic context, with
> interrupts/preemption disabled, and all CPUs not running the panic
> function are off, so it's very dangerous to wait on a regular
> spinlock, there's a risk of deadlock.
>
> Refactor the panic notifier of parisc/power driver to make use
> of spin_trylock - for that, we've added a second version of the
> soft-power function. Also, some comments were reorganized and
> trailing white spaces, useless header inclusion and blank lines
> were removed.
>
> Cc: "James E.J. Bottomley" <[email protected]>
> Acked-by: Helge Deller <[email protected]> # parisc
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - Added Helge's ACK - thanks!
>
> arch/parisc/include/asm/pdc.h | 1 +
> arch/parisc/kernel/firmware.c | 27 +++++++++++++++++++++++----
> drivers/parisc/power.c | 17 ++++++++++-------
> 3 files changed, 34 insertions(+), 11 deletions(-)
>
> diff --git a/arch/parisc/include/asm/pdc.h
> b/arch/parisc/include/asm/pdc.h index b643092d4b98..7a106008e258
> 100644 --- a/arch/parisc/include/asm/pdc.h
> +++ b/arch/parisc/include/asm/pdc.h
> @@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long
> ftc_bitmap); int pdc_do_reset(void);
> int pdc_soft_power_info(unsigned long *power_reg);
> int pdc_soft_power_button(int sw_control);
> +int pdc_soft_power_button_panic(int sw_control);
> void pdc_io_reset(void);
> void pdc_io_reset_devices(void);
> int pdc_iodc_getc(void);
> diff --git a/arch/parisc/kernel/firmware.c
> b/arch/parisc/kernel/firmware.c index 6a7e315bcc2e..0e2f70b592f4
> 100644 --- a/arch/parisc/kernel/firmware.c
> +++ b/arch/parisc/kernel/firmware.c
> @@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long
> *power_reg) }
>
> /*
> - * pdc_soft_power_button - Control the soft power button behaviour
> - * @sw_control: 0 for hardware control, 1 for software control
> + * pdc_soft_power_button{_panic} - Control the soft power button
> behaviour
> + * @sw_control: 0 for hardware control, 1 for software control
> *
> *
> * This PDC function places the soft power button under software or
> * hardware control.
> - * Under software control the OS may control to when to allow to
> shut
> - * down the system. Under hardware control pressing the power button
> + * Under software control the OS may control to when to allow to shut
> + * down the system. Under hardware control pressing the power button
> * powers off the system immediately.
> + *
> + * The _panic version relies in spin_trylock to prevent deadlock
> + * on panic path.
in => on
> */
> int pdc_soft_power_button(int sw_control)
> {
> @@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
> return retval;
> }
>
> +int pdc_soft_power_button_panic(int sw_control)
> +{
> + int retval;
> + unsigned long flags;
> +
> + if (!spin_trylock_irqsave(&pdc_lock, flags)) {
> + pr_emerg("Couldn't enable soft power button\n");
> + return -EBUSY; /* ignored by the panic notifier */
> + }
> +
> + retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE,
> __pa(pdc_result), sw_control);
> + spin_unlock_irqrestore(&pdc_lock, flags);
> +
> + return retval;
> +}
> +
> /*
> * pdc_io_reset - Hack to avoid overlapping range registers of
> Bridges devices.
> * Primarily a problem on T600 (which parisc-linux doesn't support)
> but diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
> index 456776bd8ee6..8512884de2cf 100644
> --- a/drivers/parisc/power.c
> +++ b/drivers/parisc/power.c
> @@ -37,7 +37,6 @@
> #include <linux/module.h>
> #include <linux/init.h>
> #include <linux/kernel.h>
> -#include <linux/notifier.h>
> #include <linux/panic_notifier.h>
> #include <linux/reboot.h>
> #include <linux/sched/signal.h>
> @@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void
> *x)
>
>
> -/* parisc_panic_event() is called by the panic handler.
> - * As soon as a panic occurs, our tasklets above will not be
> - * executed any longer. This function then re-enables the
> - * soft-power switch and allows the user to switch off the system
> +/*
> + * parisc_panic_event() is called by the panic handler.
> + *
> + * As soon as a panic occurs, our tasklets above will not
> + * be executed any longer. This function then re-enables
> + * the soft-power switch and allows the user to switch off
> + * the system. We rely in pdc_soft_power_button_panic()
> + * since this version spin_trylocks (instead of regular
> + * spinlock), preventing deadlocks on panic path.
> */
> static int parisc_panic_event(struct notifier_block *this,
> unsigned long event, void *ptr)
> {
> /* re-enable the soft-power switch */
> - pdc_soft_power_button(0);
> + pdc_soft_power_button_panic(0);
> return NOTIFY_DONE;
> }
>
> @@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block =
> { .priority = INT_MAX,
> };
>
> -
> static int __init power_init(void)
> {
> unsigned long ret;
Kind regards,
jer
On Tue, 19 Jul 2022 16:53:17 -0300, "Guilherme G. Piccoli" <[email protected]> wrote:
> The panic notifier of this driver is very simple code-wise, just a
> memory write to a special position with some numeric code. But this
> is not clear from the semantic point-of-view, and there is no public
> documentation about that either.
>
> After discussing this in the mailing-lists [0] and having Florian
> explained it very well, document that in the code for the future
> generations asking the same questions. Also, while at it, remove
> a useless header.
>
> [0] https://lore.kernel.org/lkml/[email protected]
>
> Cc: Brian Norris <[email protected]>
> Cc: Doug Berger <[email protected]>
> Cc: Justin Chen <[email protected]>
> Cc: Lee Jones <[email protected]>
> Cc: Markus Mayer <[email protected]>
> Acked-by: Florian Fainelli <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
Applied to https://github.com/Broadcom/stblinux/commits/drivers/next, thanks!
--
Florian
On 19/07/2022 22:43, Jeroen Roovers wrote:
> Hi Guilherme,
> [...]
>> + *
>> + * The _panic version relies in spin_trylock to prevent deadlock
>> + * on panic path.
>
> in => on
>
Hi Jer, thanks for the suggestion!
Helge, do you think you could fix it when applying, if there's no other
issue in the patch?
Thanks,
Guilherme
On 19/07/2022 19:04, Arjan van de Ven wrote:
> On 7/19/2022 2:00 PM, Guilherme G. Piccoli wrote:
>> On 19/07/2022 17:48, Arjan van de Ven wrote:
>>> [...]
>>> I would totally support an approach where instead of pr_info, there's a tracepoint
>>> for these events (and that shouldnt' need to be conditional on a config option)
>>>
>>> that's not what the patch does though.
>>
>> This is a good idea Arjan! We could use trace events or pr_debug() -
>> which one do you prefer?
>>
>
> I'd go for a trace point to be honest
>
ACK, I'll re-work it as a trace event then.
Cheers,
Guilherme
On 20/07/2022 20:00, Florian Fainelli wrote:
> [...]
>
> Applied to https://github.com/Broadcom/stblinux/commits/drivers/next, thanks!
> --
> Florian
Thanks Florian =)
On 7/21/22 15:19, Guilherme G. Piccoli wrote:
> On 19/07/2022 22:43, Jeroen Roovers wrote:
>> Hi Guilherme,
>> [...]
>>> + *
>>> + * The _panic version relies in spin_trylock to prevent deadlock
>>> + * on panic path.
>>
>> in => on
>
> Hi Jer, thanks for the suggestion!
>
> Helge, do you think you could fix it when applying, if there's no other
> issue in the patch?
> Thanks,
Guilherme, I'd really prefer that you push the whole series at once through
some generic tree.
Helge
On 21/07/2022 10:45, Helge Deller wrote:
> [...]
> Guilherme, I'd really prefer that you push the whole series at once through
> some generic tree.
>
> Helge
Hmm..OK.
Some maintainers will take patches from here and merge, but given your
preference I can talk to Andrew to see if the can pick via his tree
(along with the generic panic patches).
Cheers,
Guilherme
From: Guilherme G. Piccoli <[email protected]> Sent: Tuesday, July 19, 2022 12:53 PM
>
> The Hyper-V framebuffer code registers a panic notifier in order
> to try updating its fbdev if the kernel crashed. The notifier
> callback is straightforward, but it calls the vmbus_sendpacket()
> routine eventually, and such function takes a spinlock for the
> ring buffer operations.
>
> Panic path runs in atomic context, with local interrupts and
> preemption disabled, and all secondary CPUs shutdown. That said,
> taking a spinlock might cause a lockup if a secondary CPU was
> disabled with such lock taken. Fix it here by checking if the
> ring buffer spinlock is busy on Hyper-V framebuffer panic notifier;
> if so, bail-out avoiding the potential lockup scenario.
>
> Cc: Andrea Parri (Microsoft) <[email protected]>
> Cc: Dexuan Cui <[email protected]>
> Cc: Haiyang Zhang <[email protected]>
> Cc: "K. Y. Srinivasan" <[email protected]>
> Cc: Michael Kelley <[email protected]>
> Cc: Stephen Hemminger <[email protected]>
> Cc: Tianyu Lan <[email protected]>
> Cc: Wei Liu <[email protected]>
> Tested-by: Fabio A M Martins <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - new patch, based on the discussion in [0].
> [0] https://lore.kernel.org/lkml/[email protected]/
>
> drivers/hv/ring_buffer.c | 16 ++++++++++++++++
> drivers/video/fbdev/hyperv_fb.c | 8 +++++++-
> include/linux/hyperv.h | 2 ++
> 3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 59a4aa86d1f3..9ceb3a7e8d19 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -280,6 +280,22 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info
> *ring_info)
> ring_info->pkt_buffer_size = 0;
> }
>
> +/*
> + * Check if the ring buffer spinlock is available to take or not; used on
> + * atomic contexts, like panic path (see the Hyper-V framebuffer driver).
> + */
> +
> +bool hv_ringbuffer_spinlock_busy(struct vmbus_channel *channel)
> +{
> + struct hv_ring_buffer_info *rinfo = &channel->outbound;
> +
> + if (spin_is_locked(&rinfo->ring_lock))
> + return true;
> +
> + return false;
Could simplify the code as just:
return spin_is_locked(&rinfo->ring_lock);
> +}
> +EXPORT_SYMBOL_GPL(hv_ringbuffer_spinlock_busy);
> +
> /* Write to the ring buffer. */
> int hv_ringbuffer_write(struct vmbus_channel *channel,
> const struct kvec *kv_list, u32 kv_count,
> diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
> index 886c564787f1..e1b65a01fb96 100644
> --- a/drivers/video/fbdev/hyperv_fb.c
> +++ b/drivers/video/fbdev/hyperv_fb.c
> @@ -783,12 +783,18 @@ static void hvfb_ondemand_refresh_throttle(struct
> hvfb_par *par,
> static int hvfb_on_panic(struct notifier_block *nb,
> unsigned long e, void *p)
> {
> + struct hv_device *hdev;
> struct hvfb_par *par;
> struct fb_info *info;
>
> par = container_of(nb, struct hvfb_par, hvfb_panic_nb);
> - par->synchronous_fb = true;
> info = par->info;
> + hdev = device_to_hv_device(info->device);
> +
> + if (hv_ringbuffer_spinlock_busy(hdev->channel))
> + return NOTIFY_DONE;
> +
> + par->synchronous_fb = true;
> if (par->need_docopy)
> hvfb_docopy(par, 0, dio_fb_size);
> synthvid_update(info, 0, 0, INT_MAX, INT_MAX);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 3b42264333ef..646f1da9f27e 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1341,6 +1341,8 @@ struct hv_ring_buffer_debug_info {
> int hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
> struct hv_ring_buffer_debug_info *debug_info);
>
> +bool hv_ringbuffer_spinlock_busy(struct vmbus_channel *channel);
> +
> /* Vmbus interface */
> #define vmbus_driver_register(driver) \
> __vmbus_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
> --
> 2.37.1
From: Guilherme G. Piccoli <[email protected]> Sent: Tuesday, July 19, 2022 12:53 PM
>
> Currently Hyper-V guests are among the most relevant users of the panic
> infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
> both in cleaning-up procedures (closing hypervisor <-> guest connection,
> disabling some paravirtualized timer) as well as to data collection
> (sending panic information to the hypervisor) and framebuffer management.
>
> The thing is: some notifiers are related to others, ordering matters, some
> functionalities are duplicated and there are lots of conditionals behind
> sending panic information to the hypervisor. As part of an effort to
> clean-up the panic notifiers mechanism and better document things, we
> hereby address some of the issues/complexities of Hyper-V panic handling
> through the following changes:
>
> (a) We have die and panic notifiers on vmbus_drv.c and both have goals of
> sending panic information to the hypervisor, though the panic notifier is
> also responsible for a cleaning-up procedure.
>
> This commit clears the code by splitting the panic notifier in two, one
> for closing the vmbus connection whereas the other is only for sending
> panic info to hypervisor. With that, it was possible to merge the die and
> panic notifiers in a single/well-documented function, and clear some
> conditional complexities on sending such information to the hypervisor.
>
> (b) There is a Hyper-V framebuffer panic notifier, which relies in doing
> a vmbus operation that demands a valid connection. So, we must order this
> notifier with the panic notifier from vmbus_drv.c, to guarantee that the
> framebuffer code executes before the vmbus connection is unloaded.
>
> Also, this commit removes a useless header.
>
> Although there is code rework and re-ordering, we expect that this change
> has no functional regressions but instead optimize the path and increase
> panic reliability on Hyper-V. This was tested on Hyper-V with success.
>
> Cc: Andrea Parri (Microsoft) <[email protected]>
> Cc: Dexuan Cui <[email protected]>
> Cc: Haiyang Zhang <[email protected]>
> Cc: "K. Y. Srinivasan" <[email protected]>
> Cc: Michael Kelley <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Stephen Hemminger <[email protected]>
> Cc: Tianyu Lan <[email protected]>
> Cc: Wei Liu <[email protected]>
> Tested-by: Fabio A M Martins <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
>
> V2:
> - Unfortunately we cannot rely in the crash shutdown (custom) handler
> to perform the vmbus unload - arm64 architecture doesn't have this
> "feature" [0]. So, in V2 we kept the notifier behavior and always
> unload the vmbus connection, no matter what - thanks Michael for
> pointing that;
>
> - Removed the Fixes tags as per Michael suggestion;
>
> - As per Petr suggestion, we abandoned the idea of distinguish among
> notifiers using an id - so, in V2 we rely in the old and good address
> comparison for that. Thanks Petr for the enriching discussion!
>
> [0]
> https://lore.kernel.org/lkml/[email protected]/
>
>
> drivers/hv/vmbus_drv.c | 109 +++++++++++++++++++-------------
> drivers/video/fbdev/hyperv_fb.c | 8 +++
> 2 files changed, 74 insertions(+), 43 deletions(-)
>
Reviewed-by: Michael Kelley <[email protected]>
On 25/07/2022 15:09, Michael Kelley (LINUX) wrote:
> [...]
>> +bool hv_ringbuffer_spinlock_busy(struct vmbus_channel *channel)
>> +{
>> + struct hv_ring_buffer_info *rinfo = &channel->outbound;
>> +
>> + if (spin_is_locked(&rinfo->ring_lock))
>> + return true;
>> +
>> + return false;
>
> Could simplify the code as just:
>
> return spin_is_locked(&rinfo->ring_lock);
>
Sure, makes sense! Thanks for the suggestion, I'll do that for V3.
Cheers,
Guilherme
On 07/19/22 at 04:53pm, Guilherme G. Piccoli wrote:
> Currently the tracing dump_on_oops feature is implemented
> through separate notifiers, one for die/oops and the other
> for panic - given they have the same functionality, let's
> unify them.
>
> Also improve the function comment and change the priority of
> the notifier to make it execute earlier, avoiding showing useless
> trace data (like the callback names for the other notifiers);
> finally, we also removed an unnecessary header inclusion.
>
> Cc: Petr Mladek <[email protected]>
> Cc: Sergei Shtylyov <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - Different approach; instead of using IDs to distinguish die and
> panic events, rely on address comparison like other notifiers do
> and as per Petr's suggestion;
>
> - Removed ACK from Steven since the code changed.
>
> kernel/trace/trace.c | 55 ++++++++++++++++++++++----------------------
> 1 file changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index b8dd54627075..2a436b645c70 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -19,7 +19,6 @@
> #include <linux/kallsyms.h>
> #include <linux/security.h>
> #include <linux/seq_file.h>
> -#include <linux/notifier.h>
> #include <linux/irqflags.h>
> #include <linux/debugfs.h>
> #include <linux/tracefs.h>
> @@ -9777,40 +9776,40 @@ static __init int tracer_init_tracefs(void)
>
> fs_initcall(tracer_init_tracefs);
>
> -static int trace_panic_handler(struct notifier_block *this,
> - unsigned long event, void *unused)
> -{
> - if (ftrace_dump_on_oops)
> - ftrace_dump(ftrace_dump_on_oops);
> - return NOTIFY_OK;
> -}
> +static int trace_die_panic_handler(struct notifier_block *self,
> + unsigned long ev, void *unused);
>
> static struct notifier_block trace_panic_notifier = {
> - .notifier_call = trace_panic_handler,
> - .next = NULL,
> - .priority = 150 /* priority: INT_MAX >= x >= 0 */
> + .notifier_call = trace_die_panic_handler,
> + .priority = INT_MAX - 1,
> };
>
> -static int trace_die_handler(struct notifier_block *self,
> - unsigned long val,
> - void *data)
> -{
> - switch (val) {
> - case DIE_OOPS:
> - if (ftrace_dump_on_oops)
> - ftrace_dump(ftrace_dump_on_oops);
> - break;
> - default:
> - break;
> - }
> - return NOTIFY_OK;
> -}
> -
> static struct notifier_block trace_die_notifier = {
> - .notifier_call = trace_die_handler,
> - .priority = 200
> + .notifier_call = trace_die_panic_handler,
> + .priority = INT_MAX - 1,
> };
>
> +/*
> + * The idea is to execute the following die/panic callback early, in order
> + * to avoid showing irrelevant information in the trace (like other panic
> + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> + * warnings get disabled (to prevent potential log flooding).
> + */
> +static int trace_die_panic_handler(struct notifier_block *self,
> + unsigned long ev, void *unused)
> +{
> + if (!ftrace_dump_on_oops)
> + goto out;
> +
> + if (self == &trace_die_notifier && ev != DIE_OOPS)
> + goto out;
Although the switch-case code of original trace_die_handler() is werid,
this unification is not much more comfortable. Just personal feeling
from code style, not strong opinion. Leave it to trace reviewers.
> +
> + ftrace_dump(ftrace_dump_on_oops);
> +
> +out:
> + return NOTIFY_DONE;
> +}
> +
> /*
> * printk is set to max of 1024, we really don't need it that big.
> * Nothing should be printing 1000 characters anyway.
> --
> 2.37.1
>
On 08/03/22 at 05:36pm, Baoquan He wrote:
> On 07/19/22 at 04:53pm, Guilherme G. Piccoli wrote:
> > Currently the tracing dump_on_oops feature is implemented
> > through separate notifiers, one for die/oops and the other
> > for panic - given they have the same functionality, let's
> > unify them.
> >
> > Also improve the function comment and change the priority of
> > the notifier to make it execute earlier, avoiding showing useless
> > trace data (like the callback names for the other notifiers);
> > finally, we also removed an unnecessary header inclusion.
> >
> > Cc: Petr Mladek <[email protected]>
> > Cc: Sergei Shtylyov <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Signed-off-by: Guilherme G. Piccoli <[email protected]>
> >
> > ---
> >
> > V2:
> > - Different approach; instead of using IDs to distinguish die and
> > panic events, rely on address comparison like other notifiers do
> > and as per Petr's suggestion;
> >
> > - Removed ACK from Steven since the code changed.
> >
> > kernel/trace/trace.c | 55 ++++++++++++++++++++++----------------------
> > 1 file changed, 27 insertions(+), 28 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index b8dd54627075..2a436b645c70 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -19,7 +19,6 @@
> > #include <linux/kallsyms.h>
> > #include <linux/security.h>
> > #include <linux/seq_file.h>
> > -#include <linux/notifier.h>
> > #include <linux/irqflags.h>
> > #include <linux/debugfs.h>
> > #include <linux/tracefs.h>
> > @@ -9777,40 +9776,40 @@ static __init int tracer_init_tracefs(void)
> >
> > fs_initcall(tracer_init_tracefs);
> >
> > -static int trace_panic_handler(struct notifier_block *this,
> > - unsigned long event, void *unused)
> > -{
> > - if (ftrace_dump_on_oops)
> > - ftrace_dump(ftrace_dump_on_oops);
> > - return NOTIFY_OK;
> > -}
> > +static int trace_die_panic_handler(struct notifier_block *self,
> > + unsigned long ev, void *unused);
> >
> > static struct notifier_block trace_panic_notifier = {
> > - .notifier_call = trace_panic_handler,
> > - .next = NULL,
> > - .priority = 150 /* priority: INT_MAX >= x >= 0 */
> > + .notifier_call = trace_die_panic_handler,
> > + .priority = INT_MAX - 1,
> > };
> >
> > -static int trace_die_handler(struct notifier_block *self,
> > - unsigned long val,
> > - void *data)
> > -{
> > - switch (val) {
> > - case DIE_OOPS:
> > - if (ftrace_dump_on_oops)
> > - ftrace_dump(ftrace_dump_on_oops);
> > - break;
> > - default:
> > - break;
> > - }
> > - return NOTIFY_OK;
> > -}
> > -
> > static struct notifier_block trace_die_notifier = {
> > - .notifier_call = trace_die_handler,
> > - .priority = 200
> > + .notifier_call = trace_die_panic_handler,
> > + .priority = INT_MAX - 1,
> > };
> >
> > +/*
> > + * The idea is to execute the following die/panic callback early, in order
> > + * to avoid showing irrelevant information in the trace (like other panic
> > + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> > + * warnings get disabled (to prevent potential log flooding).
> > + */
> > +static int trace_die_panic_handler(struct notifier_block *self,
> > + unsigned long ev, void *unused)
> > +{
> > + if (!ftrace_dump_on_oops)
> > + goto out;
> > +
> > + if (self == &trace_die_notifier && ev != DIE_OOPS)
> > + goto out;
>
> Although the switch-case code of original trace_die_handler() is werid,
> this unification is not much more comfortable. Just personal feeling
> from code style, not strong opinion. Leave it to trace reviewers.
Please ignore this comment.
I use b4 to grab this patchset and applied, and started to check patch
one by one. Then I realize it's all about cleanups which have got
consensus in earlier rounds. Hope it can be merged when other people's
concern is addressed, the whole series looks good to me, I have no
strong concern to them.
>
> > +
> > + ftrace_dump(ftrace_dump_on_oops);
> > +
> > +out:
> > + return NOTIFY_DONE;
> > +}
> > +
> > /*
> > * printk is set to max of 1024, we really don't need it that big.
> > * Nothing should be printing 1000 characters anyway.
> > --
> > 2.37.1
> >
>
On 03/08/2022 06:52, Baoquan He wrote:
> [...]
>>
>> Although the switch-case code of original trace_die_handler() is werid,
>> this unification is not much more comfortable. Just personal feeling
>> from code style, not strong opinion. Leave it to trace reviewers.
>
> Please ignore this comment.
>
> I use b4 to grab this patchset and applied, and started to check patch
> one by one. Then I realize it's all about cleanups which have got
> consensus in earlier rounds. Hope it can be merged when other people's
> concern is addressed, the whole series looks good to me, I have no
> strong concern to them.
>
Thanks a lot for your reviews Baoquan, much appreciated =)
On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> Currently the panic notifiers from user mode linux don't follow
> the convention for most of the other notifiers present in the
> kernel (indentation, priority setting, numeric return).
> More important, the priorities could be improved, since it's a
> special case (userspace), hence we could run the notifiers earlier;
> user mode linux shouldn't care much with other panic notifiers but
> the ordering among the mconsole and arch notifier is important,
> given that the arch one effectively triggers a core dump.
>
> Fix that by running the mconsole notifier as the first panic
> notifier, followed by the architecture one (that coredumps).
>
> Cc: Anton Ivanov <[email protected]>
> Cc: Johannes Berg <[email protected]>
> Cc: Richard Weinberger <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - Kept the notifier header to avoid implicit usage - thanks
> Johannes for the suggestion!
>
> arch/um/drivers/mconsole_kern.c | 7 +++----
> arch/um/kernel/um_arch.c | 8 ++++----
> 2 files changed, 7 insertions(+), 8 deletions(-)
> [...]
Hi Johannes, do you feel this one is good now, after your last review?
Thanks in advance,
Guilherme
On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. IRQs are architecturally masked when we
> take an interrupt, but FIQs are high priority than IRQs, hence they
> aren't masked. With that said, it makes sense to disable FIQs here,
> but there's no need for (re-)disabling IRQs.
>
> More than that: there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), which is used for
> kexec/panic path. This function relies on a SMP call that also
> triggers a busy-wait loop [at machine_crash_nonpanic_core()], but
> without disabling FIQs. This might lead to odd scenarios, like
> early interrupts in the boot of kexec'd kernel or even interrupts
> in secondary "disabled" CPUs while the main one still works in the
> panic path and assumes all secondary CPUs are (really!) off.
>
> So, let's disable FIQs in both paths and *not* disable IRQs a second
> time, since they are already masked in both paths by the architecture.
> This way, we keep both CPU quiesce paths consistent and safe.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Michael Kelley <[email protected]>
> Cc: Russell King <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - Small wording improvement (thanks Michael Kelley);
> - Only disable FIQs, since IRQs are masked by architecture
> definition when we take an interrupt. Thanks a lot Russell
> and Marc for the discussion [0].
>
> Should we add a Fixes tag here? If so, maybe the proper target is:
> b23065313297 ("ARM: 6522/1: kexec: Add call to non-crashing cores through IPI")
>
> [0] https://lore.kernel.org/lkml/[email protected]/
>
> arch/arm/kernel/machine_kexec.c | 2 ++
> arch/arm/kernel/smp.c | 5 ++---
> 2 files changed, 4 insertions(+), 3 deletions(-)
> [...]
Hi Mark / Russell, do you think this one is good enough or is there room
for improvement?
Appreciate the reviews!
Cheers,
Guilherme
On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> Currently the tracing dump_on_oops feature is implemented
> through separate notifiers, one for die/oops and the other
> for panic - given they have the same functionality, let's
> unify them.
>
> Also improve the function comment and change the priority of
> the notifier to make it execute earlier, avoiding showing useless
> trace data (like the callback names for the other notifiers);
> finally, we also removed an unnecessary header inclusion.
>
> Cc: Petr Mladek <[email protected]>
> Cc: Sergei Shtylyov <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - Different approach; instead of using IDs to distinguish die and
> panic events, rely on address comparison like other notifiers do
> and as per Petr's suggestion;
>
> - Removed ACK from Steven since the code changed.
>
> kernel/trace/trace.c | 55 ++++++++++++++++++++++----------------------
> 1 file changed, 27 insertions(+), 28 deletions(-)
> [...]
Hi Sergei / Steve, do you think this version is good now, after your
last round of reviews?
Thanks,
Guilherme
On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
>
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
>
> Cc: Dinh Nguyen <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Tony Luck <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - new patch, based on the discussion in [0].
> [0] https://lore.kernel.org/lkml/[email protected]/
>
> drivers/edac/altera_edac.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
> [...]
Hey Tony / Dinh, do you think this patch is good, based on the
discussion we had [0] last iteration?
Thanks in advance,
Guilherme
[0]
https://lore.kernel.org/lkml/[email protected]/
On Sun, 2022-08-07 at 12:40 -0300, Guilherme G. Piccoli wrote:
> On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> > Currently the panic notifiers from user mode linux don't follow
> > the convention for most of the other notifiers present in the
> > kernel (indentation, priority setting, numeric return).
> > More important, the priorities could be improved, since it's a
> > special case (userspace), hence we could run the notifiers earlier;
> > user mode linux shouldn't care much with other panic notifiers but
> > the ordering among the mconsole and arch notifier is important,
> > given that the arch one effectively triggers a core dump.
> >
> > Fix that by running the mconsole notifier as the first panic
> > notifier, followed by the architecture one (that coredumps).
> >
> > Cc: Anton Ivanov <[email protected]>
> > Cc: Johannes Berg <[email protected]>
> > Cc: Richard Weinberger <[email protected]>
> > Signed-off-by: Guilherme G. Piccoli <[email protected]>
> >
> > ---
> >
> > V2:
> > - Kept the notifier header to avoid implicit usage - thanks
> > Johannes for the suggestion!
> >
> > arch/um/drivers/mconsole_kern.c | 7 +++----
> > arch/um/kernel/um_arch.c | 8 ++++----
> > 2 files changed, 7 insertions(+), 8 deletions(-)
> > [...]
>
> Hi Johannes, do you feel this one is good now, after your last review?
> Thanks in advance,
>
Yeah, no objections, my previous comment was just a minor almost style
issue anyway.
johannes
On 09/08/2022 15:09, Johannes Berg wrote:
> [...]
>>> V2:
>>> - Kept the notifier header to avoid implicit usage - thanks
>>> Johannes for the suggestion!
>>>
>>> arch/um/drivers/mconsole_kern.c | 7 +++----
>>> arch/um/kernel/um_arch.c | 8 ++++----
>>> 2 files changed, 7 insertions(+), 8 deletions(-)
>>> [...]
>>
>> Hi Johannes, do you feel this one is good now, after your last review?
>> Thanks in advance,
>>
>
> Yeah, no objections, my previous comment was just a minor almost style
> issue anyway.
>
> johannes
Perfect, thank you! Let me take the opportunity to ask you something I'm
asking all the maintainers involved here - do you prefer taking the
patch through your tree, or to get it landed with the whole series, at
once, from some maintainer?
Cheers!
On Tue, 2022-08-09 at 16:03 -0300, Guilherme G. Piccoli wrote:
> On 09/08/2022 15:09, Johannes Berg wrote:
> > [...]
> > > > V2:
> > > > - Kept the notifier header to avoid implicit usage - thanks
> > > > Johannes for the suggestion!
> > > >
> > > > arch/um/drivers/mconsole_kern.c | 7 +++----
> > > > arch/um/kernel/um_arch.c | 8 ++++----
> > > > 2 files changed, 7 insertions(+), 8 deletions(-)
> > > > [...]
> > >
> > > Hi Johannes, do you feel this one is good now, after your last review?
> > > Thanks in advance,
> > >
> >
> > Yeah, no objections, my previous comment was just a minor almost style
> > issue anyway.
> >
> > johannes
>
> Perfect, thank you! Let me take the opportunity to ask you something I'm
> asking all the maintainers involved here - do you prefer taking the
> patch through your tree, or to get it landed with the whole series, at
> once, from some maintainer?
>
Hm. I don't think we'd really care, but so far I was thinking - since
it's a series - it'd go through some appropriate tree all together. If
you think it should be applied separately, let us know.
johannes
On 09/08/2022 16:08, Johannes Berg wrote:
> [...]
>> Perfect, thank you! Let me take the opportunity to ask you something I'm
>> asking all the maintainers involved here - do you prefer taking the
>> patch through your tree, or to get it landed with the whole series, at
>> once, from some maintainer?
>>
> Hm. I don't think we'd really care, but so far I was thinking - since
> it's a series - it'd go through some appropriate tree all together. If
> you think it should be applied separately, let us know.
>
> johannes
For me, it would be easier if maintainers pick the patches into their
-next/-fixes trees when they think the patch is good enough, but some
maintainers complained that prefer the whole series approach (and some
others are already taking the patches into their trees).
Given that, in case you do have a linux-um tree and feel OK with that, I
appreciate if you merge it, so I can remove the patch in next iteration.
If you prefer the whole series approach, OK as well, your call =)
Thanks,
Guilherme
On Tue, 19 Jul 2022 16:53:21 -0300
"Guilherme G. Piccoli" <[email protected]> wrote:
Sorry for the late review, but this fell to the bottom of my queue :-/
> +/*
> + * The idea is to execute the following die/panic callback early, in order
> + * to avoid showing irrelevant information in the trace (like other panic
> + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> + * warnings get disabled (to prevent potential log flooding).
> + */
> +static int trace_die_panic_handler(struct notifier_block *self,
> + unsigned long ev, void *unused)
> +{
> + if (!ftrace_dump_on_oops)
> + goto out;
> +
> + if (self == &trace_die_notifier && ev != DIE_OOPS)
> + goto out;
I really hate gotos that are not for clean ups.
> +
> + ftrace_dump(ftrace_dump_on_oops);
> +
> +out:
> + return NOTIFY_DONE;
> +}
> +
Just do:
static int trace_die_panic_handler(struct notifier_block *self,
unsigned long ev, void *unused)
{
if (!ftrace_dump_on_oops)
return NOTIFY_DONE;
/* The die notifier requires DIE_OOPS to trigger */
if (self == &trace_die_notifier && ev != DIE_OOPS)
return NOTIFY_DONE;
ftrace_dump(ftrace_dump_on_oops);
return NOTIFY_DONE;
}
Thanks,
Other than that, Acked-by: Steven Rostedt (Google) <[email protected]>
-- Steve
On Tue, Aug 16, 2022 at 10:14:45AM -0400, Steven Rostedt wrote:
> On Tue, 19 Jul 2022 16:53:21 -0300
> "Guilherme G. Piccoli" <[email protected]> wrote:
>
>
> Sorry for the late review, but this fell to the bottom of my queue :-/
>
> > +/*
> > + * The idea is to execute the following die/panic callback early, in order
> > + * to avoid showing irrelevant information in the trace (like other panic
> > + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> > + * warnings get disabled (to prevent potential log flooding).
> > + */
> > +static int trace_die_panic_handler(struct notifier_block *self,
> > + unsigned long ev, void *unused)
> > +{
> > + if (!ftrace_dump_on_oops)
> > + goto out;
> > +
> > + if (self == &trace_die_notifier && ev != DIE_OOPS)
> > + goto out;
>
> I really hate gotos that are not for clean ups.
>
> > +
> > + ftrace_dump(ftrace_dump_on_oops);
> > +
> > +out:
> > + return NOTIFY_DONE;
> > +}
> > +
>
> Just do:
>
> static int trace_die_panic_handler(struct notifier_block *self,
> unsigned long ev, void *unused)
> {
> if (!ftrace_dump_on_oops)
> return NOTIFY_DONE;
>
> /* The die notifier requires DIE_OOPS to trigger */
> if (self == &trace_die_notifier && ev != DIE_OOPS)
> return NOTIFY_DONE;
>
> ftrace_dump(ftrace_dump_on_oops);
>
> return NOTIFY_DONE;
> }
Or better yet:
if (ftrace_dump_on_oops) {
/* The die notifier requires DIE_OOPS to trigger */
if (self != &trace_die_notifier || ev == DIE_OOPS)
ftrace_dump(ftrace_dump_on_oops);
}
return NOTIFY_DONE;
Alan Stern
> Thanks,
>
> Other than that, Acked-by: Steven Rostedt (Google) <[email protected]>
>
> -- Steve
On Tue, 16 Aug 2022 10:57:20 -0400
Alan Stern <[email protected]> wrote:
> > static int trace_die_panic_handler(struct notifier_block *self,
> > unsigned long ev, void *unused)
> > {
> > if (!ftrace_dump_on_oops)
> > return NOTIFY_DONE;
> >
> > /* The die notifier requires DIE_OOPS to trigger */
> > if (self == &trace_die_notifier && ev != DIE_OOPS)
> > return NOTIFY_DONE;
> >
> > ftrace_dump(ftrace_dump_on_oops);
> >
> > return NOTIFY_DONE;
> > }
>
> Or better yet:
>
> if (ftrace_dump_on_oops) {
>
> /* The die notifier requires DIE_OOPS to trigger */
> if (self != &trace_die_notifier || ev == DIE_OOPS)
> ftrace_dump(ftrace_dump_on_oops);
> }
> return NOTIFY_DONE;
>
That may be more consolidated but less easy to read and follow. This is far
from a fast path.
As I maintain this bike-shed, I prefer the one I suggested ;-)
-- Steve
On 7/19/22 14:53, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
>
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
>
> Cc: Dinh Nguyen <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Tony Luck <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> ---
>
> V2:
> - new patch, based on the discussion in [0].
> [0] https://lore.kernel.org/lkml/[email protected]/
>
Acked-by: Dinh Nguyen <[email protected]>
On 16/08/2022 15:44, Dinh Nguyen wrote:
> [...]
>> V2:
>> - new patch, based on the discussion in [0].
>> [0] https://lore.kernel.org/lkml/[email protected]/
>>
>
> Acked-by: Dinh Nguyen <[email protected]>
Thanks a lot Dinh!
There is something I'm asking for maintainers on patches in this series,
so here it goes for you (CC Borislav / Tony): do you think it's possible
to pick this one in your tree directly, from this series, or do you
prefer I re-submit as a standalone patch, on linux-edac maybe?
Thanks,
Guilherme
On 16/08/2022 12:52, Steven Rostedt wrote:
> On Tue, 16 Aug 2022 10:57:20 -0400
> Alan Stern <[email protected]> wrote:
>
>>> static int trace_die_panic_handler(struct notifier_block *self,
>>> unsigned long ev, void *unused)
>>> {
>>> if (!ftrace_dump_on_oops)
>>> return NOTIFY_DONE;
>>>
>>> /* The die notifier requires DIE_OOPS to trigger */
>>> if (self == &trace_die_notifier && ev != DIE_OOPS)
>>> return NOTIFY_DONE;
>>>
>>> ftrace_dump(ftrace_dump_on_oops);
>>>
>>> return NOTIFY_DONE;
>>> }
>>
>> Or better yet:
>>
>> if (ftrace_dump_on_oops) {
>>
>> /* The die notifier requires DIE_OOPS to trigger */
>> if (self != &trace_die_notifier || ev == DIE_OOPS)
>> ftrace_dump(ftrace_dump_on_oops);
>> }
>> return NOTIFY_DONE;
>>
>
> That may be more consolidated but less easy to read and follow. This is far
> from a fast path.
>
> As I maintain this bike-shed, I prefer the one I suggested ;-)
>
> -- Steve
Perfect Steve and Alan, appreciate your suggestions!
I'll submit V3 using your change Steve - honestly, I'm not sure why in
the heck I put a goto there, yours is basically the same code, modulo
the goto heheh
A braino from me, for sure!
Cheers,
Guilherme
On Tue, Jul 19, 2022 at 04:53:23PM -0300, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
>
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
How does the fact that kdump is loaded, obviate the need to print
information about the errors?
Are you suggesting that people who have the whole vmcore would be able
to piece together the error information?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 17/08/2022 14:31, Borislav Petkov wrote:
> [...]
>
> How does the fact that kdump is loaded, obviate the need to print
> information about the errors?
>
> Are you suggesting that people who have the whole vmcore would be able
> to piece together the error information?
>
Hi Boris, thanks for chiming in.
So, this is part of an effort to clean-up the panic path. Currently, if
a kdump happens, *all* the panic notifiers are skipped by default,
including this one. In this scenario, this patch seems like a no-op.
But happens that in the refactor we are proposing [0], some notifiers
should run before the kdump. We are basically putting some ordering in
the way notifiers are executed, while documenting this properly and with
the goal of not increasing the failure risk for kdump.
This patch is useful so we can bring the altera EDAC notifier to run
earlier while not increasing the risk on kdump - this operation is a bit
"delicate" to happen in the panic scenario. The origin of this patch was
a discussion with Tony/Peter [1], guess we can call it a "compromise
solution".
Let me know if you disagree or have more questions, and in case you'd
like to check/participate in the whole panic notifiers refactor
discussion, it would be great =)
Cheers,
Guilherme
[0]
https://lore.kernel.org/lkml/[email protected]/
[1]
https://lore.kernel.org/lkml/[email protected]/
On Wed, Aug 17, 2022 at 03:45:30PM -0300, Guilherme G. Piccoli wrote:
> But happens that in the refactor we are proposing [0], some notifiers
> should run before the kdump. We are basically putting some ordering in
> the way notifiers are executed, while documenting this properly and with
> the goal of not increasing the failure risk for kdump.
What is "the failure risk for kdump"?
Some of the notifiers which run before kdump might fail and thus prevent
the machine from kdumping?
> This patch is useful so we can bring the altera EDAC notifier to run
> earlier while not increasing the risk on kdump - this operation is a bit
> "delicate" to happen in the panic scenario. The origin of this patch was
> a discussion with Tony/Peter [1], guess we can call it a "compromise
> solution".
My question stands: if kdump is loaded and the s10_edac_dberr_handler()
does not read the the fatal errors and they don't get shown in dmesg
before the machine panics, how do you intend to show that information to
the user?
Because fatal errors are something you absolutely wanna show, at least,
in dmesg!
I don't think you can "read" the errors from vmcore - they need to be
read from the hw registers before the machine dies.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 17/08/2022 16:34, Borislav Petkov wrote:
> [...]
>
> What is "the failure risk for kdump"?
>
> Some of the notifiers which run before kdump might fail and thus prevent
> the machine from kdumping?
>
Exactly; some notifiers could break the machine and prevent a successful
kdump. The EDAC one is consider medium risk, due to invasive operations
(register readings on panic situation).
> [...]
> My question stands: if kdump is loaded and the s10_edac_dberr_handler()
> does not read the the fatal errors and they don't get shown in dmesg
> before the machine panics, how do you intend to show that information to
> the user?
>
> Because fatal errors are something you absolutely wanna show, at least,
> in dmesg!
>
> I don't think you can "read" the errors from vmcore - they need to be
> read from the hw registers before the machine dies.
>
My understanding is the same as yours, i.e., this is not possible to
collect from vmcore, it requires register reading. But again: if you
kdump your machine today, you won't collect this information, patch
changed nothing in that regard.
The one thing it changes is that you'd skip the altera register dump if
kdump is set AND you managed to also set "crash_kexec_post_notifiers".
In case you / Dinh / Tony disagrees with the patch, it's fine and we can
discard it, but then this notifier couldn't run early in the refactor we
are doing, it'd postponed to run later. This are is full of trade-offs,
we just need to choose what compromise solution is preferred by the
majority of developers =)
Cheers,
Guilherme
On Wed, Aug 17, 2022 at 05:28:34PM -0300, Guilherme G. Piccoli wrote:
> My understanding is the same as yours, i.e., this is not possible to
> collect from vmcore, it requires register reading. But again: if you
> kdump your machine today, you won't collect this information, patch
> changed nothing in that regard.
Why won't you be able to collect it? You can certainly access dmesg in
the vmcore and see those errors logged there.
> The one thing it changes is that you'd skip the altera register dump if
> kdump is set AND you managed to also set "crash_kexec_post_notifiers".
What your patch changes is, it prevents s10_edac_dberr_handler() from
logging potentially important fatal hw errors when kdump is loaded.
If Dinh is fine with that, I'll take the patch. But it looks like a bad
idea to me.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Aug 17, 2022 at 06:39:07PM -0300, Guilherme G. Piccoli wrote:
> Sorry for the confusion, let me try to be a bit more clear:
I think you're missing the point. Lemme try again:
You *absolutely* must log those errors because they're important. It
doesn't matter if this is done in a panic notifier and you're changing
that whole shebang or through some other magic.
If you do, then this driver needs to *still* *log* those fatal errors -
regardless through a panic notifier or some novel contraption it wants
to use.
So if you want to change the panic notifiers, you *must* make sure those
errors still do get logged.
Better?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 17/08/2022 18:02, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 05:28:34PM -0300, Guilherme G. Piccoli wrote:
>> My understanding is the same as yours, i.e., this is not possible to
>> collect from vmcore, it requires register reading. But again: if you
>> kdump your machine today, you won't collect this information, patch
>> changed nothing in that regard.
>
> Why won't you be able to collect it? You can certainly access dmesg in
> the vmcore and see those errors logged there.
Sorry for the confusion, let me try to be a bit more clear:
(1) if we kdump but we *didn't run* s10_edac_dberr_handler() before
kdump, the information is lost, since s10_edac_dberr_handler() performs
register readings. That information is not contained inside the vmcore.
(2) If for some reason the function s10_edac_dberr_handler() *was
executed prior to kdump*, of course the registers information would be
on dmesg, easy to collect in the vmcore.
Makes sense?
>
>> The one thing it changes is that you'd skip the altera register dump if
>> kdump is set AND you managed to also set "crash_kexec_post_notifiers".
>
> What your patch changes is, it prevents s10_edac_dberr_handler() from
> logging potentially important fatal hw errors when kdump is loaded.
Agreed. If kdump is loaded, we cannot log that information (modulo that
we do not collect it today by default on kdump as well).
The other part of story (the reason of the patch) is that we plan to
start running this panic notifier a bit earlier, being able to collect
such edac logs with pstore, for example.
>
> If Dinh is fine with that, I'll take the patch. But it looks like a bad
> idea to me.
>
I think we should seek what the majority of the folks consider the best,
in order to converge to some well-accepted solution. I'm completely OK
in dropping this one and rework with some other idea, or we can leave
this panic notifier as is, continue running that a bit later.
Tony / Petr (when back), suggestions are welcome =)
Cheers,
Guilherme
On Wed, Aug 17, 2022 at 06:56:11PM -0300, Guilherme G. Piccoli wrote:
> But do you agree that currently, in case of a kdump, that information
> *is not collected*, with our without my patch?
If for some reason that panic notifier does not get run before kdump,
then that's a bug and that driver should not use a panic notifier to log
hw errors in the first place.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Aug 17, 2022 at 07:09:26PM -0300, Guilherme G. Piccoli wrote:
> Again - a matter of a trade-off, a good compromise must be agreed by all
> parties (kdump maintainers are usually extremely afraid of taking risks
> to not break kdump).
Right, and logging hw errors from a panic notifier feels simply weird.
x86 has its own, special notifier exactly for that and it is independent
from the panic path but it gets run right after the exception raised due
to the hw error is done.
Dunno if ARM has such facilities tho...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 17/08/2022 19:00, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 06:56:11PM -0300, Guilherme G. Piccoli wrote:
>> But do you agree that currently, in case of a kdump, that information
>> *is not collected*, with our without my patch?
>
> If for some reason that panic notifier does not get run before kdump,
> then that's a bug and that driver should not use a panic notifier to log
> hw errors in the first place.
>
Indeed, the notifiers don't run before kdump by default, in a conscious
decision of the kdump maintainers.
You might be right, in the sense that maybe the edac error handler
shouldn't run as a panic notifier. Let's see if Tony / Dinh can chime in
on that discussion - we could move it to run in the kexec event as well,
so it'd always run before a kdump, but maybe the risk it offers during
panic time is not worth.
Again - a matter of a trade-off, a good compromise must be agreed by all
parties (kdump maintainers are usually extremely afraid of taking risks
to not break kdump).
Cheers!
On 17/08/2022 18:46, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 06:39:07PM -0300, Guilherme G. Piccoli wrote:
>> Sorry for the confusion, let me try to be a bit more clear:
>
> I think you're missing the point. Lemme try again:
>
> You *absolutely* must log those errors because they're important. It
> doesn't matter if this is done in a panic notifier and you're changing
> that whole shebang or through some other magic.
>
> If you do, then this driver needs to *still* *log* those fatal errors -
> regardless through a panic notifier or some novel contraption it wants
> to use.
>
> So if you want to change the panic notifiers, you *must* make sure those
> errors still do get logged.
>
> Better?
>
Boris, I understand the importance of the logs, for sure!
But do you agree that currently, in case of a kdump, that information
*is not collected*, with our without my patch?
On 17/08/2022 19:19, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 07:09:26PM -0300, Guilherme G. Piccoli wrote:
>> Again - a matter of a trade-off, a good compromise must be agreed by all
>> parties (kdump maintainers are usually extremely afraid of taking risks
>> to not break kdump).
>
> Right, and logging hw errors from a panic notifier feels simply weird.
>
> x86 has its own, special notifier exactly for that and it is independent
> from the panic path but it gets run right after the exception raised due
> to the hw error is done.
>
> Dunno if ARM has such facilities tho...
>
> Thx.
>
Yeah, MCE stuff right? Not sure for ARM and other architectures as well.
Cheers,
Guilherme