2022-04-28 05:48:31

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 00/30] The panic notifiers refactor

Hey folks, this is an attempt to improve/refactor the dated panic notifiers
infrastructure. This is strongly based in a suggestion made by Pter Mladek [0]
some time ago, and it's finally ready. Below I'll detail the patch ordering,
testing made, etc.
First, a bit about the reason behind this.

The panic notifiers list is an infrastructure that allows callbacks to execute
during panic time. Happens that anybody can add functions there, no ordering
is enforced (by default) and the decision to execute or not such notifiers
before kdump may lead to high risk of failure in crash scenarios - default is
not to execute any of them. There is a parameter acting as a switch for that.
But some architectures require some notifiers, so..it's messy.

The suggestion from Petr came after a patch submission to add a notifiers
filter, allowing the notifiers selection by function name, which was welcomed
by some people, but not by Petr, which claimed the code should indeed have a
refactor - and it made a lot of sense, his suggestion makes code more clear
and reliable.

So, this series might be split in 3 portions:

Part 1: the first 18 patches are mostly fixes (one or two might be considered
improvements), mostly replacing spinlocks/mutexes with safer alternatives for
atomic contexts, like spin_trylock, etc. We also focused on commenting
everything that is possible and clean-up code.

Part 2, the core: patches 19-25 are the main refactor, which splits the panic
notifiers list in three, introduce the concept of panic notifier level and
clean-up and highly comment the code, effectively leading to a more reliable
and clear, yet highly customizable panic path.

Part 3: The remaining 5 patches are fixes that _require the main refactor_
patches, they don't make sense without the core changes - but again, these are
small fixes and not part of the main goal of refactoring the panic code.

I've tried my best to make the patches the more "bisectable" as possible, so
they tend to be self-contained and easy to backport (specially patches from
part 1). Notice that the series is *based on 5.18-rc4* - usually a refactor
like this would be based on linux-next, but since we have many fixes in the
series, I kept it based on mainline tree. Of course I could change that in a
subsequent iteration, if desired.

Since this touches multiple architectures and drivers, it's very difficult to
test it really (by executing all touched code). So, my tests split in two
approaches: build tests and real tests, that involves panic triggering with
and without kdump, changing panic notifiers level, etc.

Build tests (using cross-compilers): alpha, arm, arm64, mips (sgi 22 and 32),
parisc, s390, sparc, um, x86_64 (couldn't get a functional xtensa cross
compiler).

Real/full tests: x86_64 (Hyper-V and QEMU guests) + PowerPC (pseries guest).

Here is the link with the .config files used: https://people.igalia.com/gpiccoli/panic_notifiers_configs/
(tried my best to build all the affected code).

Finally, a bit about my CCing strategy: I've included everybody present in the
original thread [0] plus some maintainers and other interested parties as CC
in the full series. But the patches have individual CC lists, for people that
are definitely related to them but might not care much for the whole series;
nevertheless, _everybody_ mentioned at least once in some patch is CCed in this
cover-letter. Hopefully I didn't forget to include anybody - all the mailing
lists were CCed in the whole series. Apologies in advance if (a) you received
emails you didn't want to or, (b) I forgot to include you but it was something
considered interesting by you.

Thanks in advance for reviews / comments / suggestions!
Cheers,

Guilherme

[0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

Guilherme G. Piccoli (30):
x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
notifier: Add panic notifiers info and purge trailing whitespaces
firmware: google: Convert regular spinlock into trylock on panic path
misc/pvpanic: Convert regular spinlock into trylock on panic path
soc: bcm: brcmstb: Document panic notifier action and remove useless header
mips: ip22: Reword PANICED to PANICKED and remove useless header
powerpc/setup: Refactor/untangle panic notifiers
coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
alpha: Clean-up the panic notifier code
um: Improve panic notifiers consistency and ordering
parisc: Replace regular spinlock with spin_trylock on panic path
s390/consoles: Improve panic notifiers reliability
panic: Properly identify the panic event to the notifiers' callbacks
bus: brcmstb_gisb: Clean-up panic/die notifiers
drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
tracing: Improve panic/die notifiers
notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
panic: Add the panic hypervisor notifier list
panic: Add the panic informational notifier list
panic: Introduce the panic pre-reboot notifier list
panic: Introduce the panic post-reboot notifier list
printk: kmsg_dump: Introduce helper to inform number of dumpers
panic: Refactor the panic path
panic, printk: Add console flush parameter and convert panic_print to a notifier
Drivers: hv: Do not force all panic notifiers to execute before kdump
powerpc: Do not force all panic notifiers to execute before kdump
panic: Unexport crash_kexec_post_notifiers
powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic
um: Avoid duplicate call to kmsg_dump()

.../admin-guide/kernel-parameters.txt | 54 ++-
Documentation/admin-guide/sysctl/kernel.rst | 5 +-
arch/alpha/kernel/setup.c | 40 +--
arch/arm/kernel/machine_kexec.c | 3 +
arch/arm64/kernel/setup.c | 2 +-
arch/mips/kernel/relocate.c | 2 +-
arch/mips/sgi-ip22/ip22-reset.c | 13 +-
arch/mips/sgi-ip32/ip32-reset.c | 3 +-
arch/parisc/include/asm/pdc.h | 1 +
arch/parisc/kernel/firmware.c | 27 +-
arch/parisc/kernel/pdc_chassis.c | 3 +-
arch/powerpc/include/asm/bug.h | 2 +-
arch/powerpc/kernel/fadump.c | 8 -
arch/powerpc/kernel/setup-common.c | 76 ++--
arch/powerpc/kernel/traps.c | 6 +-
arch/powerpc/platforms/powernv/opal.c | 2 +-
arch/powerpc/platforms/ps3/setup.c | 2 +-
arch/powerpc/platforms/pseries/setup.c | 2 +-
arch/s390/kernel/ipl.c | 4 +-
arch/s390/kernel/setup.c | 19 +-
arch/sparc/kernel/setup_32.c | 27 +-
arch/sparc/kernel/setup_64.c | 29 +-
arch/sparc/kernel/sstate.c | 3 +-
arch/um/drivers/mconsole_kern.c | 10 +-
arch/um/kernel/um_arch.c | 11 +-
arch/x86/include/asm/cpu.h | 1 +
arch/x86/kernel/crash.c | 8 +-
arch/x86/kernel/reboot.c | 14 +-
arch/x86/kernel/setup.c | 2 +-
arch/x86/xen/enlighten.c | 2 +-
arch/xtensa/platforms/iss/setup.c | 4 +-
drivers/bus/brcmstb_gisb.c | 28 +-
drivers/char/ipmi/ipmi_msghandler.c | 12 +-
drivers/edac/altera_edac.c | 3 +-
drivers/firmware/google/gsmi.c | 10 +-
drivers/hv/hv_common.c | 12 -
drivers/hv/vmbus_drv.c | 113 +++---
.../hwtracing/coresight/coresight-cpu-debug.c | 11 +-
drivers/leds/trigger/ledtrig-activity.c | 4 +-
drivers/leds/trigger/ledtrig-heartbeat.c | 4 +-
drivers/leds/trigger/ledtrig-panic.c | 3 +-
drivers/misc/bcm-vk/bcm_vk_dev.c | 6 +-
drivers/misc/ibmasm/heartbeat.c | 16 +-
drivers/misc/pvpanic/pvpanic.c | 14 +-
drivers/net/ipa/ipa_smp2p.c | 5 +-
drivers/parisc/power.c | 21 +-
drivers/power/reset/ltc2952-poweroff.c | 4 +-
drivers/remoteproc/remoteproc_core.c | 6 +-
drivers/s390/char/con3215.c | 38 +-
drivers/s390/char/con3270.c | 36 +-
drivers/s390/char/raw3270.c | 18 +
drivers/s390/char/raw3270.h | 1 +
drivers/s390/char/sclp_con.c | 30 +-
drivers/s390/char/sclp_vt220.c | 44 +--
drivers/s390/char/zcore.c | 5 +-
drivers/soc/bcm/brcmstb/pm/pm-arm.c | 18 +-
drivers/soc/tegra/ari-tegra186.c | 3 +-
drivers/staging/olpc_dcon/olpc_dcon.c | 6 +-
drivers/video/fbdev/hyperv_fb.c | 12 +-
include/linux/console.h | 2 +
include/linux/kmsg_dump.h | 7 +
include/linux/notifier.h | 8 +-
include/linux/panic.h | 3 -
include/linux/panic_notifier.h | 12 +-
include/linux/printk.h | 1 +
kernel/hung_task.c | 3 +-
kernel/kexec_core.c | 8 +-
kernel/notifier.c | 48 ++-
kernel/panic.c | 335 +++++++++++-------
kernel/printk/printk.c | 76 ++++
kernel/rcu/tree.c | 1 -
kernel/rcu/tree_stall.h | 3 +-
kernel/trace/trace.c | 59 +--
.../selftests/pstore/pstore_crash_test | 5 +-
74 files changed, 953 insertions(+), 486 deletions(-)

--
2.36.0


2022-04-28 05:59:09

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

The panic notifier infrastructure executes registered callbacks when
a panic event happens - such callbacks are executed in atomic context,
with interrupts and preemption disabled in the running CPU and all other
CPUs disabled. That said, mutexes in such context are not a good idea.

This patch replaces a regular mutex with a mutex_trylock safer approach;
given the nature of the mutex used in the driver, it should be pretty
uncommon being unable to acquire such mutex in the panic path, hence
no functional change should be observed (and if it is, that would be
likely a deadlock with the regular mutex).

Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
drivers/hwtracing/coresight/coresight-cpu-debug.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 8845ec4b4402..1874df7c6a73 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -380,9 +380,10 @@ static int debug_notifier_call(struct notifier_block *self,
int cpu;
struct debug_drvdata *drvdata;

- mutex_lock(&debug_lock);
+ /* Bail out if we can't acquire the mutex or the functionality is off */
+ if (!mutex_trylock(&debug_lock))
+ return NOTIFY_DONE;

- /* Bail out if the functionality is disabled */
if (!debug_enable)
goto skip_dump;

@@ -401,7 +402,7 @@ static int debug_notifier_call(struct notifier_block *self,

skip_dump:
mutex_unlock(&debug_lock);
- return 0;
+ return NOTIFY_DONE;
}

static struct notifier_block debug_notifier = {
--
2.36.0

2022-04-28 06:24:51

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

Currently we don't have a way to check if there are dumpers set,
except counting the list members maybe. This patch introduces a very
simple helper to provide this information, by just keeping track of
registered/unregistered kmsg dumpers. It's going to be used on the
panic path in the subsequent patch.

Notice that the spinlock guarding kmsg_dumpers list also guards
increment/decrement of the dumper's counter, but there's no need
for that when reading the counter in the panic path, since that is
an atomic path and there's no other (planned) user.

Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
include/linux/kmsg_dump.h | 7 +++++++
kernel/printk/printk.c | 14 ++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 906521c2329c..abea1974bff8 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -65,6 +65,8 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,

void kmsg_dump_rewind(struct kmsg_dump_iter *iter);

+bool kmsg_has_dumpers(void);
+
int kmsg_dump_register(struct kmsg_dumper *dumper);

int kmsg_dump_unregister(struct kmsg_dumper *dumper);
@@ -91,6 +93,11 @@ static inline void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
{
}

+static inline bool kmsg_has_dumpers(void)
+{
+ return false;
+}
+
static inline int kmsg_dump_register(struct kmsg_dumper *dumper)
{
return -EINVAL;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index da03c15ecc89..e3a1c429fbbc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3399,6 +3399,18 @@ EXPORT_SYMBOL(printk_timed_ratelimit);

static DEFINE_SPINLOCK(dump_list_lock);
static LIST_HEAD(dump_list);
+static int num_dumpers;
+
+/**
+ * kmsg_has_dumpers - inform if there is any kmsg dumper registered.
+ *
+ * Returns true if there's at least one registered dumper, or false
+ * if otherwise.
+ */
+bool kmsg_has_dumpers(void)
+{
+ return num_dumpers ? true : false;
+}

/**
* kmsg_dump_register - register a kernel log dumper.
@@ -3423,6 +3435,7 @@ int kmsg_dump_register(struct kmsg_dumper *dumper)
dumper->registered = 1;
list_add_tail_rcu(&dumper->list, &dump_list);
err = 0;
+ num_dumpers++;
}
spin_unlock_irqrestore(&dump_list_lock, flags);

@@ -3447,6 +3460,7 @@ int kmsg_dump_unregister(struct kmsg_dumper *dumper)
dumper->registered = 0;
list_del_rcu(&dumper->list);
err = 0;
+ num_dumpers--;
}
spin_unlock_irqrestore(&dump_list_lock, flags);
synchronize_rcu();
--
2.36.0

2022-04-28 07:02:57

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

Currently the panic notifiers from user mode linux don't follow
the convention for most of the other notifiers present in the
kernel (indentation, priority setting, numeric return).
More important, the priorities could be improved, since it's a
special case (userspace), hence we could run the notifiers earlier;
user mode linux shouldn't care much with other panic notifiers but
the ordering among the mconsole and arch notifier is important,
given that the arch one effectively triggers a core dump.

This patch fixes that by running the mconsole notifier as the first
panic notifier, followed by the architecture one (that coredumps).
Also, we remove a useless header inclusion.

Cc: Anton Ivanov <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Richard Weinberger <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
arch/um/drivers/mconsole_kern.c | 8 +++-----
arch/um/kernel/um_arch.c | 8 ++++----
2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 8ca67a692683..2ea0421bcc3f 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -11,7 +11,6 @@
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/module.h>
-#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/reboot.h>
#include <linux/sched/debug.h>
@@ -846,13 +845,12 @@ static int notify_panic(struct notifier_block *self, unsigned long unused1,

mconsole_notify(notify_socket, MCONSOLE_PANIC, message,
strlen(message) + 1);
- return 0;
+ return NOTIFY_DONE;
}

static struct notifier_block panic_exit_notifier = {
- .notifier_call = notify_panic,
- .next = NULL,
- .priority = 1
+ .notifier_call = notify_panic,
+ .priority = INT_MAX, /* run as soon as possible */
};

static int add_notifier(void)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 0760e24f2eba..4485b1a7c8e4 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -246,13 +246,13 @@ static int panic_exit(struct notifier_block *self, unsigned long unused1,
bust_spinlocks(0);
uml_exitcode = 1;
os_dump_core();
- return 0;
+
+ return NOTIFY_DONE;
}

static struct notifier_block panic_exit_notifier = {
- .notifier_call = panic_exit,
- .next = NULL,
- .priority = 0
+ .notifier_call = panic_exit,
+ .priority = INT_MAX - 1, /* run as 2nd notifier, won't return */
};

void uml_finishsetup(void)
--
2.36.0

2022-04-28 07:04:41

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 10/30] alpha: Clean-up the panic notifier code

The alpha panic notifier has some code issues, not following
the conventions of other notifiers. Also, it might halt the
machine but still it is set to run as early as possible, which
doesn't seem to be a good idea.

This patch cleans the code, and set the notifier to run as the
latest, following the same approach other architectures are doing.
Also, we remove the unnecessary include of a header already
included indirectly.

Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Richard Henderson <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
arch/alpha/kernel/setup.c | 36 +++++++++++++++---------------------
1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index b4fbbba30aa2..d88bdf852753 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -41,19 +41,11 @@
#include <linux/sysrq.h>
#include <linux/reboot.h>
#endif
-#include <linux/notifier.h>
#include <asm/setup.h>
#include <asm/io.h>
#include <linux/log2.h>
#include <linux/export.h>

-static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
-static struct notifier_block alpha_panic_block = {
- alpha_panic_event,
- NULL,
- INT_MAX /* try to do it first */
-};
-
#include <linux/uaccess.h>
#include <asm/hwrpb.h>
#include <asm/dma.h>
@@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
};
#endif

+static int alpha_panic_event(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+ /* If we are using SRM and serial console, just hard halt here. */
+ if (alpha_using_srm && srmcons_output)
+ __halt();
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block alpha_panic_block = {
+ .notifier_call = alpha_panic_event,
+ .priority = INT_MIN, /* may not return, do it last */
+};
+
void __init
setup_arch(char **cmdline_p)
{
@@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
.show = show_cpuinfo,
};

-
-static int
-alpha_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
-{
-#if 1
- /* FIXME FIXME FIXME */
- /* If we are using SRM and serial console, just hard halt here. */
- if (alpha_using_srm && srmcons_output)
- __halt();
-#endif
- return NOTIFY_DONE;
-}
-
static __init int add_pcspkr(void)
{
struct platform_device *pd;
--
2.36.0

2022-04-28 07:04:56

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

Currently we have a debug infrastructure in the notifiers file, but
it's very simple/limited. This patch extends it by:

(a) Showing all registered/unregistered notifiers' callback names;

(b) Adding a dynamic debug tuning to allow showing called notifiers'
function names. Notice that this should be guarded as a tunable since
it can flood the kernel log buffer.

Cc: Arjan van de Ven <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Xiaoming Ni <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---

We have some design decisions that worth discussing here:

(a) First of call, using C99 helps a lot to write clear and concise code, but
due to commit 4d94f910e79a ("Kbuild: use -Wdeclaration-after-statement") we
have a warning if mixing variable declarations with code. For this patch though,
doing that makes the code way clear, so decision was to add the debug code
inside brackets whenever this warning pops up. We can change that, but that'll
cause more ifdefs in the same function.

(b) In the symbol lookup helper function, we modify the parameter passed but
even more, we return it as well! This is unusual and seems unnecessary, but was
the strategy taken to allow embedding such function in the pr_debug() call.

Not doing that would likely requiring 3 symbol_name variables to avoid
concurrency (registering notifier A while calling notifier B) - we rely in
local variables as a serialization mechanism.

We're open for suggestions in case this design is not appropriate;
thanks in advance!

kernel/notifier.c | 48 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index ba005ebf4730..21032ebcde57 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -7,6 +7,22 @@
#include <linux/vmalloc.h>
#include <linux/reboot.h>

+#ifdef CONFIG_DEBUG_NOTIFIERS
+#include <linux/kallsyms.h>
+
+/*
+ * Helper to get symbol names in case DEBUG_NOTIFIERS is set.
+ * Return the modified parameter is a strategy used to achieve
+ * the pr_debug() functionality - with this, function is only
+ * executed if the dynamic debug tuning is effectively set.
+ */
+static inline char *notifier_name(struct notifier_block *nb, char *sym_name)
+{
+ lookup_symbol_name((unsigned long)(nb->notifier_call), sym_name);
+ return sym_name;
+}
+#endif
+
/*
* Notifier list for kernel code which wants to be called
* at shutdown. This is used to stop any idling DMA operations
@@ -34,20 +50,41 @@ static int notifier_chain_register(struct notifier_block **nl,
}
n->next = *nl;
rcu_assign_pointer(*nl, n);
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+ {
+ char sym_name[KSYM_NAME_LEN];
+
+ pr_info("notifiers: registered %s()\n",
+ notifier_name(n, sym_name));
+ }
+#endif
return 0;
}

static int notifier_chain_unregister(struct notifier_block **nl,
struct notifier_block *n)
{
+ int ret = -ENOENT;
+
while ((*nl) != NULL) {
if ((*nl) == n) {
rcu_assign_pointer(*nl, n->next);
- return 0;
+ ret = 0;
+ break;
}
nl = &((*nl)->next);
}
- return -ENOENT;
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+ if (!ret) {
+ char sym_name[KSYM_NAME_LEN];
+
+ pr_info("notifiers: unregistered %s()\n",
+ notifier_name(n, sym_name));
+ }
+#endif
+ return ret;
}

/**
@@ -80,6 +117,13 @@ static int notifier_call_chain(struct notifier_block **nl,
nb = next_nb;
continue;
}
+
+ {
+ char sym_name[KSYM_NAME_LEN];
+
+ pr_debug("notifiers: calling %s()\n",
+ notifier_name(nb, sym_name));
+ }
#endif
ret = nb->notifier_call(nb, val, v);

--
2.36.0

2022-04-28 07:16:48

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump

Since commit a11589563e96 ("x86/Hyper-V: Report crash register
data or kmsg before running crash kernel") Hyper-V forcibly sets
the kernel parameter "crash_kexec_post_notifiers"; with that, it
did enforce the execution of *all* panic notifiers before kdump.
The main reason behind that is that Hyper-V has an hypervisor
notification mechanism that has the ability of warning the
hypervisor when the guest panics.

Happens that after the panic notifiers refactor, we now have 3 lists
and a level mechanism that defines the ordering of the notifiers
execution with regards to kdump. And for Hyper-V, the specific
notifier to inform the hypervisor about a panic lies in the first
list, which *by default* is set to execute before kdump. Hence,
this patch removes the hardcoded setting, effectively reverting
the aforementioned commit.

One of the problems with the forced approach was greatly exposed by
commit d57d6fe5bf34 ("drivers: hv: log when enabling crash_kexec_post_notifiers")
which ended-up confusing the user that didn't expect the notifiers
to execute before kdump, since it's a user setting and wasn't
enabled by such user. With the patch hereby proposed, that kind
of issue doesn't happen anymore, the panic notifiers level is
well-documented and users can expect a predictable behavior.

Fixes: a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before running crash kernel")
Fixes: d57d6fe5bf34 ("drivers: hv: log when enabling crash_kexec_post_notifiers"
Cc: Andrea Parri (Microsoft) <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Stephen Brennan <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Tianyu Lan <[email protected]>
Cc: Wei Liu <[email protected]>
Tested-by: Fabio A M Martins <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---

Special thanks to Michael Kelley for the good information about the Hyper-V
panic path in email threads some months ago, and to Fabio for the testing
performed.

drivers/hv/hv_common.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..af59793de523 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -73,18 +73,6 @@ int __init hv_common_init(void)
{
int i;

- /*
- * Hyper-V expects to get crash register data or kmsg when
- * crash enlightment is available and system crashes. Set
- * crash_kexec_post_notifiers to be true to make sure that
- * calling crash enlightment interface before running kdump
- * kernel.
- */
- if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
- crash_kexec_post_notifiers = true;
- pr_info("Hyper-V: enabling crash_kexec_post_notifiers\n");
- }
-
/*
* Allocate the per-CPU state for the hypercall input arg.
* If this allocation fails, we will not be able to setup
--
2.36.0

2022-04-28 07:17:29

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

Currently we have 3 notifier lists in the panic path, which will
be wired in a way to allow the notifier callbacks to run in
different moments at panic time, in a subsequent patch.

But there is also an odd set of architecture calls hardcoded in
the end of panic path, after the restart machinery. They're
responsible for late time tunings / events, like enabling a stop
button (Sparc) or effectively stopping the machine (s390).

This patch introduces yet another notifier list to offer the
architectures a way to add callbacks in such late moment on
panic path without the need of ifdefs / hardcoded approaches.

Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
arch/s390/kernel/setup.c | 19 ++++++++++++++++++-
arch/sparc/kernel/setup_32.c | 27 +++++++++++++++++++++++----
arch/sparc/kernel/setup_64.c | 29 ++++++++++++++++++++++++-----
include/linux/panic_notifier.h | 1 +
kernel/panic.c | 19 +++++++------------
5 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index d860ac300919..d816b2045f1e 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -39,7 +39,6 @@
#include <linux/kernel_stat.h>
#include <linux/dma-map-ops.h>
#include <linux/device.h>
-#include <linux/notifier.h>
#include <linux/pfn.h>
#include <linux/ctype.h>
#include <linux/reboot.h>
@@ -51,6 +50,7 @@
#include <linux/start_kernel.h>
#include <linux/hugetlb.h>
#include <linux/kmemleak.h>
+#include <linux/panic_notifier.h>

#include <asm/boot_data.h>
#include <asm/ipl.h>
@@ -943,6 +943,20 @@ static void __init log_component_list(void)
}
}

+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void *unused)
+{
+ disabled_wait();
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+ .notifier_call = panic_event,
+};
+
/*
* Setup function called from init/main.c just after the banner
* was printed.
@@ -1058,4 +1072,7 @@ void __init setup_arch(char **cmdline_p)

/* Add system specific data to the random pool */
setup_randomness();
+
+ atomic_notifier_chain_register(&panic_post_reboot_list,
+ &post_reboot_panic_block);
}
diff --git a/arch/sparc/kernel/setup_32.c b/arch/sparc/kernel/setup_32.c
index c8e0dd99f370..4e2428972f76 100644
--- a/arch/sparc/kernel/setup_32.c
+++ b/arch/sparc/kernel/setup_32.c
@@ -34,6 +34,7 @@
#include <linux/kdebug.h>
#include <linux/export.h>
#include <linux/start_kernel.h>
+#include <linux/panic_notifier.h>
#include <uapi/linux/mount.h>

#include <asm/io.h>
@@ -51,6 +52,7 @@

#include "kernel.h"

+int stop_a_enabled = 1;
struct screen_info screen_info = {
0, 0, /* orig-x, orig-y */
0, /* unused */
@@ -293,6 +295,24 @@ void __init sparc32_start_kernel(struct linux_romvec *rp)
start_kernel();
}

+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void *unused)
+{
+ /* Make sure the user can actually press Stop-A (L1-A) */
+ stop_a_enabled = 1;
+ pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
+ "twice on console to return to the boot prom\n");
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+ .notifier_call = panic_event,
+};
+
void __init setup_arch(char **cmdline_p)
{
int i;
@@ -368,9 +388,10 @@ void __init setup_arch(char **cmdline_p)
paging_init();

smp_setup_cpu_possible_map();
-}

-extern int stop_a_enabled;
+ atomic_notifier_chain_register(&panic_post_reboot_list,
+ &post_reboot_panic_block);
+}

void sun_do_break(void)
{
@@ -384,8 +405,6 @@ void sun_do_break(void)
}
EXPORT_SYMBOL(sun_do_break);

-int stop_a_enabled = 1;
-
static int __init topology_init(void)
{
int i, ncpus, err;
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 48abee4eee29..9066c25ecc07 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -33,6 +33,7 @@
#include <linux/module.h>
#include <linux/start_kernel.h>
#include <linux/memblock.h>
+#include <linux/panic_notifier.h>
#include <uapi/linux/mount.h>

#include <asm/io.h>
@@ -62,6 +63,8 @@
#include "entry.h"
#include "kernel.h"

+int stop_a_enabled = 1;
+
/* Used to synchronize accesses to NatSemi SUPER I/O chip configure
* operations in asm/ns87303.h
*/
@@ -632,6 +635,24 @@ void __init alloc_irqstack_bootmem(void)
}
}

+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void *unused)
+{
+ /* Make sure the user can actually press Stop-A (L1-A) */
+ stop_a_enabled = 1;
+ pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
+ "twice on console to return to the boot prom\n");
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+ .notifier_call = panic_event,
+};
+
void __init setup_arch(char **cmdline_p)
{
/* Initialize PROM console and command line. */
@@ -691,9 +712,10 @@ void __init setup_arch(char **cmdline_p)
* allocate the IRQ stacks.
*/
alloc_irqstack_bootmem();
-}

-extern int stop_a_enabled;
+ atomic_notifier_chain_register(&panic_post_reboot_list,
+ &post_reboot_panic_block);
+}

void sun_do_break(void)
{
@@ -706,6 +728,3 @@ void sun_do_break(void)
prom_cmdline();
}
EXPORT_SYMBOL(sun_do_break);
-
-int stop_a_enabled = 1;
-EXPORT_SYMBOL(stop_a_enabled);
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 7912aacbc0e5..bcf6a5ea9d7f 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -8,6 +8,7 @@
extern struct atomic_notifier_head panic_hypervisor_list;
extern struct atomic_notifier_head panic_info_list;
extern struct atomic_notifier_head panic_pre_reboot_list;
+extern struct atomic_notifier_head panic_post_reboot_list;

extern bool crash_kexec_post_notifiers;

diff --git a/kernel/panic.c b/kernel/panic.c
index a9d43b98b05b..bf792102b43e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -78,6 +78,9 @@ EXPORT_SYMBOL(panic_info_list);
ATOMIC_NOTIFIER_HEAD(panic_pre_reboot_list);
EXPORT_SYMBOL(panic_pre_reboot_list);

+ATOMIC_NOTIFIER_HEAD(panic_post_reboot_list);
+EXPORT_SYMBOL(panic_post_reboot_list);
+
static long no_blink(int state)
{
return 0;
@@ -359,18 +362,10 @@ void panic(const char *fmt, ...)
reboot_mode = panic_reboot_mode;
emergency_restart();
}
-#ifdef __sparc__
- {
- extern int stop_a_enabled;
- /* Make sure the user can actually press Stop-A (L1-A) */
- stop_a_enabled = 1;
- pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
- "twice on console to return to the boot prom\n");
- }
-#endif
-#if defined(CONFIG_S390)
- disabled_wait();
-#endif
+
+ atomic_notifier_call_chain(&panic_post_reboot_list,
+ PANIC_NOTIFIER, buf);
+
pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);

/* Do not scroll important messages printed above */
--
2.36.0

2022-04-28 07:18:57

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

In the panic path we have a list of functions to be called, the panic
notifiers - such callbacks perform various actions in the machine's
last breath, and sometimes users want them to run before kdump. We
have the parameter "crash_kexec_post_notifiers" for that. When such
parameter is used, the function "crash_smp_send_stop()" is executed
to poweroff all secondary CPUs through the NMI-shootdown mechanism;
part of this process involves disabling virtualization features in
all CPUs (except the main one).

Now, in the emergency restart procedure we have also a way of
disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
what happens though is that in case we already NMI-disabled all CPUs,
the emergency restart fails due to a second addition of the same items
in the NMI list, as per the following log output:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
[...]
Rebooting in 2 seconds..
list_add double add: new=<addr1>, prev=<addr2>, next=<addr1>.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP PTI

In order to reproduce the problem, users just need to set the kernel
parameter "crash_kexec_post_notifiers" *without* kdump set in any
system with the VMX feature present.

Since there is no benefit in re-disabling VMX in all CPUs in case
it was already done, this patch prevents that by guarding the restart
routine against doubly issuing NMIs unnecessarily. Notice we still
need to disable VMX locally in the emergency restart.

Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path")
Cc: David P. Reed <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
arch/x86/include/asm/cpu.h | 1 +
arch/x86/kernel/crash.c | 8 ++++----
arch/x86/kernel/reboot.c | 14 ++++++++++++--
3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 86e5e4e26fcb..b6a9062d387f 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -36,6 +36,7 @@ extern int _debug_hotplug_cpu(int cpu, int action);
#endif
#endif

+extern bool crash_cpus_stopped;
int mwait_usable(const struct cpuinfo_x86 *);

unsigned int x86_family(unsigned int sig);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8326a8d1c5d..71dd1a990e8d 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -42,6 +42,8 @@
#include <asm/crash.h>
#include <asm/cmdline.h>

+bool crash_cpus_stopped;
+
/* Used while preparing memory map entries for second kernel */
struct crash_memmap_data {
struct boot_params *params;
@@ -108,9 +110,7 @@ void kdump_nmi_shootdown_cpus(void)
/* Override the weak function in kernel/panic.c */
void crash_smp_send_stop(void)
{
- static int cpus_stopped;
-
- if (cpus_stopped)
+ if (crash_cpus_stopped)
return;

if (smp_ops.crash_stop_other_cpus)
@@ -118,7 +118,7 @@ void crash_smp_send_stop(void)
else
smp_send_stop();

- cpus_stopped = 1;
+ crash_cpus_stopped = true;
}

#else
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index fa700b46588e..2fc42b8402ac 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -589,8 +589,18 @@ static void native_machine_emergency_restart(void)
int orig_reboot_type = reboot_type;
unsigned short mode;

- if (reboot_emergency)
- emergency_vmx_disable_all();
+ /*
+ * We can reach this point in the end of panic path, having
+ * NMI-disabled all secondary CPUs. This process involves
+ * disabling the CPU virtualization technologies, so if that
+ * is the case, we only miss disabling the local CPU VMX...
+ */
+ if (reboot_emergency) {
+ if (!crash_cpus_stopped)
+ emergency_vmx_disable_all();
+ else
+ cpu_emergency_vmxoff();
+ }

tboot_shutdown(TB_SHUTDOWN_REBOOT);

--
2.36.0

2022-04-28 08:02:01

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

The pvpanic driver relies on panic notifiers to execute a callback
on panic event. Such function is executed in atomic context - the
panic function disables local IRQs, preemption and all other CPUs
that aren't running the panic code.

With that said, it's dangerous to use regular spinlocks in such path,
as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
This patch fixes that by replacing regular spinlocks with the trylock
safer approach.

It also fixes an old comment (about a long gone framebuffer code) and
the notifier priority - we should execute hypervisor notifiers early,
deferring this way the panic action to the hypervisor, as expected by
the users that are setting up pvpanic.

Fixes: b3c0f8774668 ("misc/pvpanic: probe multiple instances")
Cc: Christophe JAILLET <[email protected]>
Cc: Mihai Carabas <[email protected]>
Cc: Shile Zhang <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Cc: zhenwei pi <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
drivers/misc/pvpanic/pvpanic.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/pvpanic/pvpanic.c b/drivers/misc/pvpanic/pvpanic.c
index 4b8f1c7d726d..049a12006348 100644
--- a/drivers/misc/pvpanic/pvpanic.c
+++ b/drivers/misc/pvpanic/pvpanic.c
@@ -34,7 +34,9 @@ pvpanic_send_event(unsigned int event)
{
struct pvpanic_instance *pi_cur;

- spin_lock(&pvpanic_lock);
+ if (!spin_trylock(&pvpanic_lock))
+ return;
+
list_for_each_entry(pi_cur, &pvpanic_list, list) {
if (event & pi_cur->capability & pi_cur->events)
iowrite8(event, pi_cur->base);
@@ -55,9 +57,13 @@ pvpanic_panic_notify(struct notifier_block *nb, unsigned long code, void *unused
return NOTIFY_DONE;
}

+/*
+ * Call our notifier very early on panic, deferring the
+ * action taken to the hypervisor.
+ */
static struct notifier_block pvpanic_panic_nb = {
.notifier_call = pvpanic_panic_notify,
- .priority = 1, /* let this called before broken drm_fb_helper() */
+ .priority = INT_MAX,
};

static void pvpanic_remove(void *param)
--
2.36.0

2022-04-28 08:08:53

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

Currently the parameter "panic_print" relies in a function called
directly on panic path; one of the flags the users can set for
panic_print triggers a console replay mechanism, to show the
entire kernel log buffer (from the beginning) in a panic event.

Two problems with that: the dual nature of the panic_print
isn't really appropriate, the function was originally meant
to allow users dumping system information on panic events,
and was "overridden" to also force a console flush of the full
kernel log buffer. It also turns the code a bit more complex
and duplicate than it needs to be.

This patch proposes 2 changes: first, we decouple panic_print
from the console flushing mechanism, in the form of a new kernel
core parameter (panic_console_replay); we kept the functionality
on panic_print to avoid userspace breakage, although we comment
in both code and documentation that this panic_print usage is
deprecated.

We converted panic_print function to a panic notifier too, adding
it on the panic informational notifier list, executed as the final
callback. This allows a more clear code and makes sense, as
panic_print_sys_info() is really a panic-time only function.
We also moved its code to kernel/printk.c, it seems to make more
sense given it's related to printing stuff.

Suggested-by: Petr Mladek <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 12 +++-
Documentation/admin-guide/sysctl/kernel.rst | 5 +-
include/linux/console.h | 2 +
include/linux/panic.h | 1 -
include/linux/printk.h | 1 +
kernel/panic.c | 51 +++------------
kernel/printk/printk.c | 62 +++++++++++++++++++
7 files changed, 87 insertions(+), 47 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8d3524060ce3..c99da8b2b216 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3791,6 +3791,14 @@
timeout < 0: reboot immediately
Format: <timeout>

+ panic_console_replay
+ [KNL] Force a kernel log replay in the console on
+ panic event. Notice that there is already a flush
+ mechanism for pending messages; this option is meant
+ for users that wish to replay the *full* buffer.
+ It deprecates the bit 5 setting on "panic_print",
+ both having the same functionality.
+
panic_notifiers_level=
[KNL] Set the panic notifiers execution order.
Format: <unsigned int>
@@ -3825,12 +3833,14 @@
bit 2: print timer info
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
- bit 5: print all printk messages in buffer
+ bit 5: print all printk messages in buffer (DEPRECATED)
bit 6: print all CPUs backtrace (if available in the arch)
*Be aware* that this option may print a _lot_ of lines,
so there are risks of losing older messages in the log.
Use this option carefully, maybe worth to setup a
bigger log buffer with "log_buf_len" along with this.
+ Also, notice that bit 5 was deprecated in favor of the
+ parameter "panic_console_replay".

panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
Format: <hex>[,nousertaint]
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 1144ea3229a3..17b293a0e566 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -763,10 +763,13 @@ bit 1 print system memory info
bit 2 print timer info
bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
-bit 5 print all printk messages in buffer
+bit 5 print all printk messages in buffer (DEPRECATED)
bit 6 print all CPUs backtrace (if available in the arch)
===== ============================================

+Notice that bit 5 was deprecated in favor of kernel core parameter
+"panic_console_replay" (see kernel-parameters.txt documentation).
+
So for example to print tasks and memory info on panic, user can::

echo 3 > /proc/sys/kernel/panic_print
diff --git a/include/linux/console.h b/include/linux/console.h
index 7cd758a4f44e..351c14f623ad 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -169,6 +169,8 @@ enum con_flush_mode {
CONSOLE_REPLAY_ALL,
};

+extern bool panic_console_replay;
+
extern int add_preferred_console(char *name, int idx, char *options);
extern void register_console(struct console *);
extern int unregister_console(struct console *);
diff --git a/include/linux/panic.h b/include/linux/panic.h
index f5844908a089..34175d0188d0 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -22,7 +22,6 @@ extern unsigned int sysctl_oops_all_cpu_backtrace;
#endif /* CONFIG_SMP */

extern int panic_timeout;
-extern unsigned long panic_print;
extern int panic_on_oops;
extern int panic_on_unrecovered_nmi;
extern int panic_on_io_nmi;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1522df223c0f..aee2e8ebd541 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -13,6 +13,7 @@
extern const char linux_banner[];
extern const char linux_proc_banner[];

+extern unsigned long panic_print;
extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */

#define PRINTK_MAX_SINGLE_HEADER_LEN 2
diff --git a/kernel/panic.c b/kernel/panic.c
index b7c055d4f87f..ff257bd8f81b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -68,15 +68,6 @@ unsigned int panic_notifiers_level = 2;
/* DEPRECATED in favor of panic_notifiers_level */
bool crash_kexec_post_notifiers;

-#define PANIC_PRINT_TASK_INFO 0x00000001
-#define PANIC_PRINT_MEM_INFO 0x00000002
-#define PANIC_PRINT_TIMER_INFO 0x00000004
-#define PANIC_PRINT_LOCK_INFO 0x00000008
-#define PANIC_PRINT_FTRACE_INFO 0x00000010
-#define PANIC_PRINT_ALL_PRINTK_MSG 0x00000020
-#define PANIC_PRINT_ALL_CPU_BT 0x00000040
-unsigned long panic_print;
-
ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
EXPORT_SYMBOL(panic_hypervisor_list);

@@ -168,33 +159,6 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
}
EXPORT_SYMBOL(nmi_panic);

-static void panic_print_sys_info(bool console_flush)
-{
- if (console_flush) {
- if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
- console_flush_on_panic(CONSOLE_REPLAY_ALL);
- return;
- }
-
- if (panic_print & PANIC_PRINT_ALL_CPU_BT)
- trigger_all_cpu_backtrace();
-
- if (panic_print & PANIC_PRINT_TASK_INFO)
- show_state();
-
- if (panic_print & PANIC_PRINT_MEM_INFO)
- show_mem(0, NULL);
-
- if (panic_print & PANIC_PRINT_TIMER_INFO)
- sysrq_timer_list_show();
-
- if (panic_print & PANIC_PRINT_LOCK_INFO)
- debug_show_all_locks();
-
- if (panic_print & PANIC_PRINT_FTRACE_INFO)
- ftrace_dump(DUMP_ALL);
-}
-
/*
* Helper that accumulates all console flushing routines executed on panic.
*/
@@ -218,7 +182,11 @@ static void console_flushing(void)
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);

- panic_print_sys_info(true);
+ /* In case users wish to replay the full log buffer... */
+ if (panic_console_replay) {
+ pr_warn("Replaying the log buffer from the beginning\n");
+ console_flush_on_panic(CONSOLE_REPLAY_ALL);
+ }
}

#define PN_HYPERVISOR_BIT 0
@@ -431,10 +399,8 @@ void panic(const char *fmt, ...)
crash_smp_send_stop();
panic_notifier_hypervisor_once(buf);

- if (panic_notifier_info_once(buf)) {
- panic_print_sys_info(false);
+ if (panic_notifier_info_once(buf))
kmsg_dump(KMSG_DUMP_PANIC);
- }

panic_notifier_pre_reboot_once(buf);

@@ -442,10 +408,8 @@ void panic(const char *fmt, ...)

panic_notifier_hypervisor_once(buf);

- if (panic_notifier_info_once(buf)) {
- panic_print_sys_info(false);
+ if (panic_notifier_info_once(buf))
kmsg_dump(KMSG_DUMP_PANIC);
- }

panic_notifier_pre_reboot_once(buf);

@@ -814,7 +778,6 @@ EXPORT_SYMBOL(__stack_chk_fail);
#endif

core_param(panic, panic_timeout, int, 0644);
-core_param(panic_print, panic_print, ulong, 0644);
core_param(pause_on_oops, pause_on_oops, int, 0644);
core_param(panic_on_warn, panic_on_warn, int, 0644);
core_param(panic_notifiers_level, panic_notifiers_level, uint, 0644);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e3a1c429fbbc..ad91d4c04246 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -35,6 +35,7 @@
#include <linux/memblock.h>
#include <linux/syscalls.h>
#include <linux/crash_core.h>
+#include <linux/panic_notifier.h>
#include <linux/ratelimit.h>
#include <linux/kmsg_dump.h>
#include <linux/syslog.h>
@@ -3234,6 +3235,61 @@ void __init console_init(void)
}
}

+#define PANIC_PRINT_TASK_INFO 0x00000001
+#define PANIC_PRINT_MEM_INFO 0x00000002
+#define PANIC_PRINT_TIMER_INFO 0x00000004
+#define PANIC_PRINT_LOCK_INFO 0x00000008
+#define PANIC_PRINT_FTRACE_INFO 0x00000010
+
+/* DEPRECATED - please use "panic_console_replay" */
+#define PANIC_PRINT_ALL_PRINTK_MSG 0x00000020
+
+#define PANIC_PRINT_ALL_CPU_BT 0x00000040
+
+unsigned long panic_print;
+bool panic_console_replay;
+
+static int panic_print_sys_info(struct notifier_block *self,
+ unsigned long ev, void *unused)
+{
+ if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+ trigger_all_cpu_backtrace();
+
+ if (panic_print & PANIC_PRINT_TASK_INFO)
+ show_state();
+
+ if (panic_print & PANIC_PRINT_MEM_INFO)
+ show_mem(0, NULL);
+
+ if (panic_print & PANIC_PRINT_TIMER_INFO)
+ sysrq_timer_list_show();
+
+ if (panic_print & PANIC_PRINT_LOCK_INFO)
+ debug_show_all_locks();
+
+ if (panic_print & PANIC_PRINT_FTRACE_INFO)
+ ftrace_dump(DUMP_ALL);
+
+ /*
+ * This is legacy/deprecated feature from panic_print,
+ * the console force flushing. We have now the parameter
+ * "panic_console_replay", but we need to keep the
+ * retro-compatibility with the old stuff...
+ */
+ if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
+ panic_console_replay = true;
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block panic_print_nb = {
+ .notifier_call = panic_print_sys_info,
+ .priority = INT_MIN, /* defer to run as late as possible */
+};
+
+core_param(panic_print, panic_print, ulong, 0644);
+core_param(panic_console_replay, panic_console_replay, bool, 0644);
+
/*
* Some boot consoles access data that is in the init section and which will
* be discarded after the initcalls have been run. To make sure that no code
@@ -3253,6 +3309,12 @@ static int __init printk_late_init(void)
struct console *con;
int ret;

+ /*
+ * Register the panic notifier to print user information
+ * in case the user have that set.
+ */
+ atomic_notifier_chain_register(&panic_info_list, &panic_print_nb);
+
for_each_console(con) {
if (!(con->flags & CON_BOOT))
continue;
--
2.36.0

2022-04-28 08:38:56

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 28/30] panic: Unexport crash_kexec_post_notifiers

There is no users anymore of this variable that requires
it to be "exported" in the headers; also, it was deprecated
by the kernel parameter "panic_notifiers_level".

Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
include/linux/panic.h | 2 --
include/linux/panic_notifier.h | 1 -
2 files changed, 3 deletions(-)

diff --git a/include/linux/panic.h b/include/linux/panic.h
index 34175d0188d0..d301db07a8af 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -34,8 +34,6 @@ extern int sysctl_panic_on_rcu_stall;
extern int sysctl_max_rcu_stall_to_panic;
extern int sysctl_panic_on_stackoverflow;

-extern bool crash_kexec_post_notifiers;
-
/*
* panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
* holds a CPU number which is executing panic() currently. A value of
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index b5041132321d..8fda7045e2f7 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -11,7 +11,6 @@ extern struct atomic_notifier_head panic_pre_reboot_list;
extern struct atomic_notifier_head panic_post_reboot_list;

bool panic_notifiers_before_kdump(void);
-extern bool crash_kexec_post_notifiers;

enum panic_notifier_val {
PANIC_UNUSED,
--
2.36.0

2022-04-28 08:49:49

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
is responsible for that. This makes sense, since we're turning off
such CPUs, putting them in an endless busy-wait loop.

Problem is that there is an alternative path for disabling CPUs,
in the form of function crash_smp_send_stop(), used for kexec/panic
paths. This functions relies in a SMP call that also triggers a
busy-wait loop [at machine_crash_nonpanic_core()], but *without*
disabling interrupts. This might lead to odd scenarios, like early
interrupts in the boot of kexec'd kernel or even interrupts in
other CPUs while the main one still works in the panic path and
assumes all secondary CPUs are (really!) off.

This patch mimics the ipi_cpu_stop() interrupt disable mechanism
in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
secondary CPUs in the kexec/panic path as well.

Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
arch/arm/kernel/machine_kexec.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..ef788ee00519 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
set_cpu_online(smp_processor_id(), false);
atomic_dec(&waiting_for_crash_ipi);

+ local_fiq_disable();
+ local_irq_disable();
+
while (1) {
cpu_relax();
wfe();
--
2.36.0

2022-04-28 13:23:40

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

[trimming massive CC list]

On Wed, 2022-04-27 at 19:49 -0300, Guilherme G. Piccoli wrote:
>
> Also, we remove a useless header inclusion.

I wouldn't say it's useless, generally we try not to rely on implicit
includes so much? And you at least now use NOTIFY_DONE from it.

Otherwise looks fine to me.

johannes

2022-04-28 16:45:36

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

Hi Guilherme,

On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
> The panic notifier infrastructure executes registered callbacks when
> a panic event happens - such callbacks are executed in atomic context,
> with interrupts and preemption disabled in the running CPU and all other
> CPUs disabled. That said, mutexes in such context are not a good idea.
>
> This patch replaces a regular mutex with a mutex_trylock safer approach;
> given the nature of the mutex used in the driver, it should be pretty
> uncommon being unable to acquire such mutex in the panic path, hence
> no functional change should be observed (and if it is, that would be
> likely a deadlock with the regular mutex).
>
> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
> Cc: Leo Yan <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>

How would you like to proceed with queuing this ? I am happy
either way. In case you plan to push this as part of this
series (I don't see any potential conflicts) :

Reviewed-by: Suzuki K Poulose <[email protected]>

2022-04-28 19:44:08

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

The notifiers infrastructure provides a way to pass an "id" to the
callbacks to determine what kind of event happened, i.e., what is
the reason behind they getting called.

The panic notifier currently pass 0, but this is soon to be
used in a multi-targeted notifier, so let's pass a meaningful
"id" over there.

Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
include/linux/panic_notifier.h | 5 +++++
kernel/panic.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 41e32483d7a7..07dced83a783 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;

extern bool crash_kexec_post_notifiers;

+enum panic_notifier_val {
+ PANIC_UNUSED,
+ PANIC_NOTIFIER = 0xDEAD,
+};
+
#endif /* _LINUX_PANIC_NOTIFIERS_H */
diff --git a/kernel/panic.c b/kernel/panic.c
index eb4dfb932c85..523bc9ccd0e9 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -287,7 +287,7 @@ void panic(const char *fmt, ...)
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
*/
- atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
+ atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);

panic_print_sys_info(false);

--
2.36.0

2022-04-29 12:28:48

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

This patch improves the panic/die notifiers in this driver by
making use of a passed "id" instead of comparing pointer
address; also, it removes an useless prototype declaration
and unnecessary header inclusion.

This is part of a panic notifiers refactor - this notifier in
the future will be moved to a new list, that encompass the
information notifiers only.

Fixes: 9eb60880d9a9 ("bus: brcmstb_gisb: add notifier handling")
Cc: Brian Norris <[email protected]>
Cc: Florian Fainelli <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
drivers/bus/brcmstb_gisb.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 183d5cc37d42..1ea7b015e225 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -19,7 +19,6 @@
#include <linux/pm.h>
#include <linux/kernel.h>
#include <linux/kdebug.h>
-#include <linux/notifier.h>

#ifdef CONFIG_MIPS
#include <asm/traps.h>
@@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, void *dev_id)
/*
* Dump out gisb errors on die or panic.
*/
-static int dump_gisb_error(struct notifier_block *self, unsigned long v,
- void *p);
-
-static struct notifier_block gisb_die_notifier = {
- .notifier_call = dump_gisb_error,
-};
-
-static struct notifier_block gisb_panic_notifier = {
- .notifier_call = dump_gisb_error,
-};
-
static int dump_gisb_error(struct notifier_block *self, unsigned long v,
void *p)
{
struct brcmstb_gisb_arb_device *gdev;
- const char *reason = "panic";
+ const char *reason = "die";

- if (self == &gisb_die_notifier)
- reason = "die";
+ if (v == PANIC_NOTIFIER)
+ reason = "panic";

/* iterate over each GISB arb registered handlers */
list_for_each_entry(gdev, &brcmstb_gisb_arb_device_list, next)
@@ -374,6 +362,14 @@ static int dump_gisb_error(struct notifier_block *self, unsigned long v,
return NOTIFY_DONE;
}

+static struct notifier_block gisb_die_notifier = {
+ .notifier_call = dump_gisb_error,
+};
+
+static struct notifier_block gisb_panic_notifier = {
+ .notifier_call = dump_gisb_error,
+};
+
static DEVICE_ATTR(gisb_arb_timeout, S_IWUSR | S_IRUGO,
gisb_arb_get_timeout, gisb_arb_set_timeout);

--
2.36.0

2022-04-29 12:39:27

by Guilherme G. Piccoli

[permalink] [raw]
Subject: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

This patch renames the panic_notifier_list to panic_pre_reboot_list;
the idea is that a subsequent patch will refactor the panic path
in order to better split the notifiers, running some of them very
early, some of them not so early [but still before kmsg_dump()] and
finally, the rest should execute late, after kdump. The latter ones
are now in the panic pre-reboot list - the name comes from the idea
that these notifiers execute before panic() attempts rebooting the
machine (if that option is set).

We also took the opportunity to clean-up useless header inclusions,
improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
and more important, change some priorities - we hereby set 2 notifiers
to run late in the list [iss_panic_event() and the IPMI panic_event()]
due to the risks they offer (may not return, for example).
Proper documentation is going to be provided in a subsequent patch,
that effectively refactors the panic path.

Cc: Alex Elder <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Corey Minyard <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: James Morse <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Wei Liu <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---

Notice that, with this name change, out-of-tree code that relies in the global
exported "panic_notifier_list" will fail to build. We could easily keep the
retro-compatibility by making the old symbol to still exist and point to the
pre_reboot list (or even, keep the old naming).

But our design choice was to allow the breakage, making users rethink their
notifiers, adding them in the list that fits best. If that wasn't a good
decision, we're open to change it, of course.
Thanks in advance for the review!

arch/alpha/kernel/setup.c | 4 ++--
arch/parisc/kernel/pdc_chassis.c | 3 +--
arch/powerpc/kernel/setup-common.c | 2 +-
arch/s390/kernel/ipl.c | 4 ++--
arch/um/drivers/mconsole_kern.c | 2 +-
arch/um/kernel/um_arch.c | 2 +-
arch/x86/xen/enlighten.c | 2 +-
arch/xtensa/platforms/iss/setup.c | 4 ++--
drivers/char/ipmi/ipmi_msghandler.c | 12 +++++++-----
drivers/edac/altera_edac.c | 3 +--
drivers/hv/vmbus_drv.c | 4 ++--
drivers/leds/trigger/ledtrig-panic.c | 3 +--
drivers/misc/ibmasm/heartbeat.c | 16 +++++++++-------
drivers/net/ipa/ipa_smp2p.c | 5 ++---
drivers/parisc/power.c | 4 ++--
drivers/remoteproc/remoteproc_core.c | 6 ++++--
drivers/s390/char/con3215.c | 2 +-
drivers/s390/char/con3270.c | 2 +-
drivers/s390/char/sclp_con.c | 2 +-
drivers/s390/char/sclp_vt220.c | 2 +-
drivers/staging/olpc_dcon/olpc_dcon.c | 6 ++++--
drivers/video/fbdev/hyperv_fb.c | 4 ++--
include/linux/panic_notifier.h | 2 +-
kernel/panic.c | 9 ++++-----
24 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index d88bdf852753..8ace0d7113b6 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -472,8 +472,8 @@ setup_arch(char **cmdline_p)
}

/* Register a call for panic conditions. */
- atomic_notifier_chain_register(&panic_notifier_list,
- &alpha_panic_block);
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
+ &alpha_panic_block);

#ifndef alpha_using_srm
/* Assume that we've booted from SRM if we haven't booted from MILO.
diff --git a/arch/parisc/kernel/pdc_chassis.c b/arch/parisc/kernel/pdc_chassis.c
index da154406d368..0fd8d87fb4f9 100644
--- a/arch/parisc/kernel/pdc_chassis.c
+++ b/arch/parisc/kernel/pdc_chassis.c
@@ -22,7 +22,6 @@
#include <linux/kernel.h>
#include <linux/panic_notifier.h>
#include <linux/reboot.h>
-#include <linux/notifier.h>
#include <linux/cache.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
@@ -135,7 +134,7 @@ void __init parisc_pdc_chassis_init(void)
PDC_CHASSIS_VER);

/* initialize panic notifier chain */
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&pdc_chassis_panic_block);

/* initialize reboot notifier chain */
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index d04b8bf8dbc7..3518bebc10ad 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -762,7 +762,7 @@ void __init setup_panic(void)

/* Low-level platform-specific routines that should run on panic */
if (ppc_md.panic)
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&ppc_panic_block);
}

diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 1cc85b8ff42e..4a88c5bb6e15 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -2034,7 +2034,7 @@ static int on_panic_notify(struct notifier_block *self,
unsigned long event, void *data)
{
do_panic();
- return NOTIFY_OK;
+ return NOTIFY_DONE;
}

static struct notifier_block on_panic_nb = {
@@ -2069,7 +2069,7 @@ void __init setup_ipl(void)
/* We have no info to copy */
break;
}
- atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
}

void s390_reset_system(void)
diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 2ea0421bcc3f..21c13b4e24a3 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -855,7 +855,7 @@ static struct notifier_block panic_exit_notifier = {

static int add_notifier(void)
{
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&panic_exit_notifier);
return 0;
}
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 4485b1a7c8e4..fc6e443299da 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -257,7 +257,7 @@ static struct notifier_block panic_exit_notifier = {

void uml_finishsetup(void)
{
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&panic_exit_notifier);

uml_postsetup();
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 30c6e986a6cd..d4f4de239a21 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -290,7 +290,7 @@ static struct notifier_block xen_panic_block = {

int xen_panic_handler_init(void)
{
- atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &xen_panic_block);
return 0;
}

diff --git a/arch/xtensa/platforms/iss/setup.c b/arch/xtensa/platforms/iss/setup.c
index d3433e1bb94e..eeeeb6cff6bd 100644
--- a/arch/xtensa/platforms/iss/setup.c
+++ b/arch/xtensa/platforms/iss/setup.c
@@ -13,7 +13,6 @@
*/
#include <linux/init.h>
#include <linux/kernel.h>
-#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/printk.h>
#include <linux/string.h>
@@ -53,6 +52,7 @@ iss_panic_event(struct notifier_block *this, unsigned long event, void *ptr)

static struct notifier_block iss_panic_block = {
.notifier_call = iss_panic_event,
+ .priority = INT_MIN, /* run as late as possible, may not return */
};

void __init platform_setup(char **p_cmdline)
@@ -81,5 +81,5 @@ void __init platform_setup(char **p_cmdline)
}
}

- atomic_notifier_chain_register(&panic_notifier_list, &iss_panic_block);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &iss_panic_block);
}
diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index c59265146e9c..6c4770949c01 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -25,7 +25,6 @@
#include <linux/slab.h>
#include <linux/ipmi.h>
#include <linux/ipmi_smi.h>
-#include <linux/notifier.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <linux/rcupdate.h>
@@ -5375,10 +5374,13 @@ static int ipmi_register_driver(void)
return rv;
}

+/*
+ * we should execute this panic callback late, since it involves
+ * a complex call-chain and panic() runs in atomic context.
+ */
static struct notifier_block panic_block = {
.notifier_call = panic_event,
- .next = NULL,
- .priority = 200 /* priority: INT_MAX >= x >= 0 */
+ .priority = INT_MIN + 1,
};

static int ipmi_init_msghandler(void)
@@ -5406,7 +5408,7 @@ static int ipmi_init_msghandler(void)
timer_setup(&ipmi_timer, ipmi_timeout, 0);
mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES);

- atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &panic_block);

initialized = true;

@@ -5438,7 +5440,7 @@ static void __exit cleanup_ipmi(void)
if (initialized) {
destroy_workqueue(remove_work_wq);

- atomic_notifier_chain_unregister(&panic_notifier_list,
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
&panic_block);

/*
diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index e7e8e624a436..4890e9cba6fb 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -16,7 +16,6 @@
#include <linux/kernel.h>
#include <linux/mfd/altera-sysmgr.h>
#include <linux/mfd/syscon.h>
-#include <linux/notifier.h>
#include <linux/of_address.h>
#include <linux/of_irq.h>
#include <linux/of_platform.h>
@@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
int dberror, err_addr;

edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&edac->panic_notifier);

/* Printout a message if uncorrectable error previously. */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 901b97034308..3717c323aa36 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1622,7 +1622,7 @@ static int vmbus_bus_init(void)
* Always register the vmbus unload panic notifier because we
* need to shut the VMbus channel connection on panic.
*/
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&hyperv_panic_vmbus_unload_block);

vmbus_request_offers();
@@ -2851,7 +2851,7 @@ static void __exit vmbus_exit(void)
* The vmbus panic notifier is always registered, hence we should
* also unconditionally unregister it here as well.
*/
- atomic_notifier_chain_unregister(&panic_notifier_list,
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
&hyperv_panic_vmbus_unload_block);

free_page((unsigned long)hv_panic_page);
diff --git a/drivers/leds/trigger/ledtrig-panic.c b/drivers/leds/trigger/ledtrig-panic.c
index 64abf2e91608..34fd5170723f 100644
--- a/drivers/leds/trigger/ledtrig-panic.c
+++ b/drivers/leds/trigger/ledtrig-panic.c
@@ -7,7 +7,6 @@

#include <linux/kernel.h>
#include <linux/init.h>
-#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/leds.h>
#include "../leds.h"
@@ -64,7 +63,7 @@ static long led_panic_blink(int state)

static int __init ledtrig_panic_init(void)
{
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&led_trigger_panic_nb);

led_trigger_register_simple("panic", &trigger);
diff --git a/drivers/misc/ibmasm/heartbeat.c b/drivers/misc/ibmasm/heartbeat.c
index 59c9a0d95659..d6acae88b722 100644
--- a/drivers/misc/ibmasm/heartbeat.c
+++ b/drivers/misc/ibmasm/heartbeat.c
@@ -8,7 +8,6 @@
* Author: Max Asböck <[email protected]>
*/

-#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include "ibmasm.h"
#include "dot_command.h"
@@ -24,7 +23,7 @@ static int suspend_heartbeats = 0;
* In the case of a panic the interrupt handler continues to work and thus
* continues to respond to heartbeats, making the service processor believe
* the OS is still running and thus preventing a reboot.
- * To prevent this from happening a callback is added the panic_notifier_list.
+ * To prevent this from happening a callback is added in a panic notifier list.
* Before responding to a heartbeat the driver checks if a panic has happened,
* if yes it suspends heartbeat, causing the service processor to reboot as
* expected.
@@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
{
suspend_heartbeats = 1;
- return 0;
+ return NOTIFY_DONE;
}

-static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
+static struct notifier_block panic_notifier = {
+ .notifier_call = panic_happened,
+};

void ibmasm_register_panic_notifier(void)
{
- atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
+ &panic_notifier);
}

void ibmasm_unregister_panic_notifier(void)
{
- atomic_notifier_chain_unregister(&panic_notifier_list,
- &panic_notifier);
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
+ &panic_notifier);
}


diff --git a/drivers/net/ipa/ipa_smp2p.c b/drivers/net/ipa/ipa_smp2p.c
index 211233612039..92cdf6e0637c 100644
--- a/drivers/net/ipa/ipa_smp2p.c
+++ b/drivers/net/ipa/ipa_smp2p.c
@@ -7,7 +7,6 @@
#include <linux/types.h>
#include <linux/device.h>
#include <linux/interrupt.h>
-#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/pm_runtime.h>
#include <linux/soc/qcom/smem.h>
@@ -138,13 +137,13 @@ static int ipa_smp2p_panic_notifier_register(struct ipa_smp2p *smp2p)
smp2p->panic_notifier.notifier_call = ipa_smp2p_panic_notifier;
smp2p->panic_notifier.priority = INT_MAX; /* Do it early */

- return atomic_notifier_chain_register(&panic_notifier_list,
+ return atomic_notifier_chain_register(&panic_pre_reboot_list,
&smp2p->panic_notifier);
}

static void ipa_smp2p_panic_notifier_unregister(struct ipa_smp2p *smp2p)
{
- atomic_notifier_chain_unregister(&panic_notifier_list,
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
&smp2p->panic_notifier);
}

diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
index 8512884de2cf..5bb0868f5f08 100644
--- a/drivers/parisc/power.c
+++ b/drivers/parisc/power.c
@@ -233,7 +233,7 @@ static int __init power_init(void)
}

/* Register a call for panic conditions. */
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&parisc_panic_block);

return 0;
@@ -243,7 +243,7 @@ static void __exit power_exit(void)
{
kthread_stop(power_task);

- atomic_notifier_chain_unregister(&panic_notifier_list,
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
&parisc_panic_block);

pdc_soft_power_button(0);
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c510125769b9..24799ff239e6 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2795,12 +2795,14 @@ static int rproc_panic_handler(struct notifier_block *nb, unsigned long event,
static void __init rproc_init_panic(void)
{
rproc_panic_nb.notifier_call = rproc_panic_handler;
- atomic_notifier_chain_register(&panic_notifier_list, &rproc_panic_nb);
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
+ &rproc_panic_nb);
}

static void __exit rproc_exit_panic(void)
{
- atomic_notifier_chain_unregister(&panic_notifier_list, &rproc_panic_nb);
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
+ &rproc_panic_nb);
}

static int __init remoteproc_init(void)
diff --git a/drivers/s390/char/con3215.c b/drivers/s390/char/con3215.c
index 192198bd3dc4..07379dd3f1f3 100644
--- a/drivers/s390/char/con3215.c
+++ b/drivers/s390/char/con3215.c
@@ -867,7 +867,7 @@ static int __init con3215_init(void)
raw3215[0] = NULL;
return -ENODEV;
}
- atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
register_reboot_notifier(&on_reboot_nb);
register_console(&con3215);
return 0;
diff --git a/drivers/s390/char/con3270.c b/drivers/s390/char/con3270.c
index 476202f3d8a0..e79bf3e7bde3 100644
--- a/drivers/s390/char/con3270.c
+++ b/drivers/s390/char/con3270.c
@@ -645,7 +645,7 @@ con3270_init(void)
condev->cline->len = 0;
con3270_create_status(condev);
condev->input = alloc_string(&condev->freemem, 80);
- atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
register_reboot_notifier(&on_reboot_nb);
register_console(&con3270);
return 0;
diff --git a/drivers/s390/char/sclp_con.c b/drivers/s390/char/sclp_con.c
index e5d947c763ea..7ca9d4c45d60 100644
--- a/drivers/s390/char/sclp_con.c
+++ b/drivers/s390/char/sclp_con.c
@@ -288,7 +288,7 @@ sclp_console_init(void)
timer_setup(&sclp_con_timer, sclp_console_timeout, 0);

/* enable printk-access to this driver */
- atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
register_reboot_notifier(&on_reboot_nb);
register_console(&sclp_console);
return 0;
diff --git a/drivers/s390/char/sclp_vt220.c b/drivers/s390/char/sclp_vt220.c
index a32f34a1c6d2..97cf9e290c28 100644
--- a/drivers/s390/char/sclp_vt220.c
+++ b/drivers/s390/char/sclp_vt220.c
@@ -838,7 +838,7 @@ sclp_vt220_con_init(void)
if (rc)
return rc;
/* Attach linux console */
- atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+ atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
register_reboot_notifier(&on_reboot_nb);
register_console(&sclp_vt220_console);
return 0;
diff --git a/drivers/staging/olpc_dcon/olpc_dcon.c b/drivers/staging/olpc_dcon/olpc_dcon.c
index 7284cb4ac395..cb50471f2246 100644
--- a/drivers/staging/olpc_dcon/olpc_dcon.c
+++ b/drivers/staging/olpc_dcon/olpc_dcon.c
@@ -653,7 +653,8 @@ static int dcon_probe(struct i2c_client *client, const struct i2c_device_id *id)
}

register_reboot_notifier(&dcon->reboot_nb);
- atomic_notifier_chain_register(&panic_notifier_list, &dcon_panic_nb);
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
+ &dcon_panic_nb);

return 0;

@@ -676,7 +677,8 @@ static int dcon_remove(struct i2c_client *client)
struct dcon_priv *dcon = i2c_get_clientdata(client);

unregister_reboot_notifier(&dcon->reboot_nb);
- atomic_notifier_chain_unregister(&panic_notifier_list, &dcon_panic_nb);
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
+ &dcon_panic_nb);

free_irq(DCON_IRQ, dcon);

diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index f3494b868a64..ec21e63592be 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -1253,7 +1253,7 @@ static int hvfb_probe(struct hv_device *hdev,
*/
par->hvfb_panic_nb.notifier_call = hvfb_on_panic;
par->hvfb_panic_nb.priority = INT_MIN + 10,
- atomic_notifier_chain_register(&panic_notifier_list,
+ atomic_notifier_chain_register(&panic_pre_reboot_list,
&par->hvfb_panic_nb);

return 0;
@@ -1276,7 +1276,7 @@ static int hvfb_remove(struct hv_device *hdev)
struct fb_info *info = hv_get_drvdata(hdev);
struct hvfb_par *par = info->par;

- atomic_notifier_chain_unregister(&panic_notifier_list,
+ atomic_notifier_chain_unregister(&panic_pre_reboot_list,
&par->hvfb_panic_nb);

par->update = false;
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 7364a346bcb0..7912aacbc0e5 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -5,9 +5,9 @@
#include <linux/notifier.h>
#include <linux/types.h>

-extern struct atomic_notifier_head panic_notifier_list;
extern struct atomic_notifier_head panic_hypervisor_list;
extern struct atomic_notifier_head panic_info_list;
+extern struct atomic_notifier_head panic_pre_reboot_list;

extern bool crash_kexec_post_notifiers;

diff --git a/kernel/panic.c b/kernel/panic.c
index 73ca1bc44e30..a9d43b98b05b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -69,16 +69,15 @@ EXPORT_SYMBOL_GPL(panic_timeout);
#define PANIC_PRINT_ALL_CPU_BT 0x00000040
unsigned long panic_print;

-ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
-
-EXPORT_SYMBOL(panic_notifier_list);
-
ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
EXPORT_SYMBOL(panic_hypervisor_list);

ATOMIC_NOTIFIER_HEAD(panic_info_list);
EXPORT_SYMBOL(panic_info_list);

+ATOMIC_NOTIFIER_HEAD(panic_pre_reboot_list);
+EXPORT_SYMBOL(panic_pre_reboot_list);
+
static long no_blink(int state)
{
return 0;
@@ -295,7 +294,7 @@ void panic(const char *fmt, ...)
*/
atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
- atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
+ atomic_notifier_call_chain(&panic_pre_reboot_list, PANIC_NOTIFIER, buf);

panic_print_sys_info(false);

--
2.36.0

2022-04-29 20:31:25

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

From: Guilherme G. Piccoli <[email protected]> Sent: Wednesday, April 27, 2022 3:49 PM
>
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. This makes sense, since we're turning off
> such CPUs, putting them in an endless busy-wait loop.
>
> Problem is that there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), used for kexec/panic
> paths. This functions relies in a SMP call that also triggers a

s/functions relies in/function relies on/

> busy-wait loop [at machine_crash_nonpanic_core()], but *without*
> disabling interrupts. This might lead to odd scenarios, like early
> interrupts in the boot of kexec'd kernel or even interrupts in
> other CPUs while the main one still works in the panic path and
> assumes all secondary CPUs are (really!) off.
>
> This patch mimics the ipi_cpu_stop() interrupt disable mechanism
> in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
> secondary CPUs in the kexec/panic path as well.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
> arch/arm/kernel/machine_kexec.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index f567032a09c0..ef788ee00519 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
> set_cpu_online(smp_processor_id(), false);
> atomic_dec(&waiting_for_crash_ipi);
>
> + local_fiq_disable();
> + local_irq_disable();
> +
> while (1) {
> cpu_relax();
> wfe();
> --
> 2.36.0

2022-04-29 23:30:26

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

On 29/04/2022 13:04, Max Filippov wrote:
> [...]
>> arch/xtensa/platforms/iss/setup.c | 4 ++--For xtensa:
>
> For xtensa:
> Acked-by: Max Filippov <[email protected]>
>

Perfect, thanks Max!
Cheers,


Guilherme

2022-04-30 02:33:51

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

On 28/04/2022 05:30, Johannes Berg wrote:
> [trimming massive CC list]
>
> On Wed, 2022-04-27 at 19:49 -0300, Guilherme G. Piccoli wrote:
>>
>> Also, we remove a useless header inclusion.
>
> I wouldn't say it's useless, generally we try not to rely on implicit
> includes so much? And you at least now use NOTIFY_DONE from it.
>
> Otherwise looks fine to me.
>
> johannes

Hi Johannes, thanks for your prompt response, and for clearing a bit the
*huge* CC list, it got really humongous...

I agree with you here - I missed that there's also a reboot notifier in
"mconsole_kern.c", so seems it makes sense to keep the header. I'll fix
that for V2, ok? If it was only a panic notifier I would consider it
useless maybe, but we have non-panic notifier as well heh

Cheers,


Guilherme

2022-04-30 14:06:11

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

From: Guilherme G. Piccoli <[email protected]> Sent: Wednesday, April 27, 2022 3:49 PM
>
> Currently we have a debug infrastructure in the notifiers file, but
> it's very simple/limited. This patch extends it by:
>
> (a) Showing all registered/unregistered notifiers' callback names;
>
> (b) Adding a dynamic debug tuning to allow showing called notifiers'
> function names. Notice that this should be guarded as a tunable since
> it can flood the kernel log buffer.
>
> Cc: Arjan van de Ven <[email protected]>
> Cc: Cong Wang <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Xiaoming Ni <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
>
> We have some design decisions that worth discussing here:
>
> (a) First of call, using C99 helps a lot to write clear and concise code, but

s/call/all/

> due to commit 4d94f910e79a ("Kbuild: use -Wdeclaration-after-statement") we
> have a warning if mixing variable declarations with code. For this patch though,
> doing that makes the code way clear, so decision was to add the debug code
> inside brackets whenever this warning pops up. We can change that, but that'll
> cause more ifdefs in the same function.
>
> (b) In the symbol lookup helper function, we modify the parameter passed but
> even more, we return it as well! This is unusual and seems unnecessary, but was
> the strategy taken to allow embedding such function in the pr_debug() call.
>
> Not doing that would likely requiring 3 symbol_name variables to avoid
> concurrency (registering notifier A while calling notifier B) - we rely in
> local variables as a serialization mechanism.
>
> We're open for suggestions in case this design is not appropriate;
> thanks in advance!
>
> kernel/notifier.c | 48 +++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index ba005ebf4730..21032ebcde57 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -7,6 +7,22 @@
> #include <linux/vmalloc.h>
> #include <linux/reboot.h>
>
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> +#include <linux/kallsyms.h>
> +
> +/*
> + * Helper to get symbol names in case DEBUG_NOTIFIERS is set.
> + * Return the modified parameter is a strategy used to achieve
> + * the pr_debug() functionality - with this, function is only
> + * executed if the dynamic debug tuning is effectively set.
> + */
> +static inline char *notifier_name(struct notifier_block *nb, char *sym_name)
> +{
> + lookup_symbol_name((unsigned long)(nb->notifier_call), sym_name);
> + return sym_name;
> +}
> +#endif
> +
> /*
> * Notifier list for kernel code which wants to be called
> * at shutdown. This is used to stop any idling DMA operations
> @@ -34,20 +50,41 @@ static int notifier_chain_register(struct notifier_block **nl,
> }
> n->next = *nl;
> rcu_assign_pointer(*nl, n);
> +
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> + {
> + char sym_name[KSYM_NAME_LEN];
> +
> + pr_info("notifiers: registered %s()\n",
> + notifier_name(n, sym_name));
> + }
> +#endif
> return 0;
> }
>
> static int notifier_chain_unregister(struct notifier_block **nl,
> struct notifier_block *n)
> {
> + int ret = -ENOENT;
> +
> while ((*nl) != NULL) {
> if ((*nl) == n) {
> rcu_assign_pointer(*nl, n->next);
> - return 0;
> + ret = 0;
> + break;
> }
> nl = &((*nl)->next);
> }
> - return -ENOENT;
> +
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> + if (!ret) {
> + char sym_name[KSYM_NAME_LEN];
> +
> + pr_info("notifiers: unregistered %s()\n",
> + notifier_name(n, sym_name));
> + }
> +#endif
> + return ret;
> }
>
> /**
> @@ -80,6 +117,13 @@ static int notifier_call_chain(struct notifier_block **nl,
> nb = next_nb;
> continue;
> }
> +
> + {
> + char sym_name[KSYM_NAME_LEN];
> +
> + pr_debug("notifiers: calling %s()\n",
> + notifier_name(nb, sym_name));
> + }
> #endif
> ret = nb->notifier_call(nb, val, v);
>
> --
> 2.36.0

2022-04-30 15:38:53

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

Thanks Marc and Michael for the review/discussion.

On 29/04/2022 15:20, Marc Zyngier wrote:
> [...]

> My expectations would be that, since we're getting here using an IPI,
> interrupts are already masked. So what reenabled them the first place?
>
> Thanks,
>
> M.
>

Marc, I did some investigation in the code (and tried/failed in the ARM
documentation as well heh), but this is still not 100% clear for me.

You're saying IPI calls disable IRQs/FIQs by default in the the target
CPUs? Where does it happen? I'm a bit confused if this a processor
mechanism, or it's in code.

Looking the smp_send_stop() in arch/arm/, it does IPI the CPUs, with the
flag IPI_CPU_STOP, eventually calling ipi_cpu_stop(), and the latter
does disable IRQ/FIQ in code - that's where I stole my code from.

But crash_smp_send_stop() is different, it seems to IPI the other CPUs
with the flag IPI_CALL_FUNC, which leads to calling
generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
as well? I couldn't find it.

Appreciate your clarifications about that, thanks again.
Cheers,


Guilherme

2022-05-01 00:01:25

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote:
> Thanks Marc and Michael for the review/discussion.
>
> On 29/04/2022 15:20, Marc Zyngier wrote:
> > [...]
>
> > My expectations would be that, since we're getting here using an IPI,
> > interrupts are already masked. So what reenabled them the first place?
> >
> > Thanks,
> >
> > M.
> >
>
> Marc, I did some investigation in the code (and tried/failed in the ARM
> documentation as well heh), but this is still not 100% clear for me.
>
> You're saying IPI calls disable IRQs/FIQs by default in the the target
> CPUs? Where does it happen? I'm a bit confused if this a processor
> mechanism, or it's in code.

When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
themselves interrupts, so IRQs will be masked while the IPI is being
processed. Therefore, there should be no need to re-disable the
already disabled interrupts.

> But crash_smp_send_stop() is different, it seems to IPI the other CPUs
> with the flag IPI_CALL_FUNC, which leads to calling
> generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
> as well? I couldn't find it.

It's buried in the architecture behaviour. When the CPU takes an
interrupt and jumps to the interrupt vector in the vectors page, it is
architecturally defined that interrupts will be disabled. If they
weren't architecturally disabled at this point, then as soon as the
first instruction is processed (at the interrupt vector, likely a
branch) the CPU would immediately take another jump to the interrupt
vector, and this process would continue indefinitely, making interrupt
handling utterly useless.

So, you won't find an explicit instruction in the code path from the
vectors to the IPI handler that disables interrupts - because it's
written into the architecture that this is what must happen.

IRQs are a lower priority than FIQs, so FIQs remain unmasked.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-05-02 01:53:03

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

On Fri, 29 Apr 2022 22:45:14 +0100,
"Russell King (Oracle)" <[email protected]> wrote:
>
> On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote:
> > Thanks Marc and Michael for the review/discussion.
> >
> > On 29/04/2022 15:20, Marc Zyngier wrote:
> > > [...]
> >
> > > My expectations would be that, since we're getting here using an IPI,
> > > interrupts are already masked. So what reenabled them the first place?
> > >
> > > Thanks,
> > >
> > > M.
> > >
> >
> > Marc, I did some investigation in the code (and tried/failed in the ARM
> > documentation as well heh), but this is still not 100% clear for me.
> >
> > You're saying IPI calls disable IRQs/FIQs by default in the the target
> > CPUs? Where does it happen? I'm a bit confused if this a processor
> > mechanism, or it's in code.
>
> When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
> themselves interrupts, so IRQs will be masked while the IPI is being
> processed. Therefore, there should be no need to re-disable the
> already disabled interrupts.
>
> > But crash_smp_send_stop() is different, it seems to IPI the other CPUs
> > with the flag IPI_CALL_FUNC, which leads to calling
> > generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
> > as well? I couldn't find it.
>
> It's buried in the architecture behaviour. When the CPU takes an
> interrupt and jumps to the interrupt vector in the vectors page, it is
> architecturally defined that interrupts will be disabled. If they
> weren't architecturally disabled at this point, then as soon as the
> first instruction is processed (at the interrupt vector, likely a
> branch) the CPU would immediately take another jump to the interrupt
> vector, and this process would continue indefinitely, making interrupt
> handling utterly useless.
>
> So, you won't find an explicit instruction in the code path from the
> vectors to the IPI handler that disables interrupts - because it's
> written into the architecture that this is what must happen.
>
> IRQs are a lower priority than FIQs, so FIQs remain unmasked.

Ah, you're of course right. That's one of the huge differences between
AArch32 and AArch64, where the former has per target mode masking
rules, and the later masks everything on entry...

M.

--
Without deviation from the norm, progress is not possible.

2022-05-02 04:38:41

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

On 27/04/2022 22:01, Xiaoming Ni wrote:
> [...]
> Duplicate Code.
>
> Is it better to use __func__ and %pS?
>
> pr_info("%s: %pS\n", __func__, n->notifier_call);
>
>

This is a great suggestion Xiaoming, much appreciated!
I feel like reinventing the wheel here - with your idea, code was super
clear and concise, very nice suggestion!!

The only 2 things that diverge from your idea: I'm using '%ps' (not
showing offsets) and also, kept the wording "(un)registered/calling",
not using __func__ - I feel it's a bit odd in the output.
OK for you?

I'm definitely using your idea in V2 heh
Cheers,


Guilherme

2022-05-02 07:05:46

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

On 28/04/2022 05:11, Suzuki K Poulose wrote:
> Hi Guilherme,
> [...]
> How would you like to proceed with queuing this ? I am happy
> either way. In case you plan to push this as part of this
> series (I don't see any potential conflicts) :
>
> Reviewed-by: Suzuki K Poulose <[email protected]>

Thanks for your review Suzuki, much appreciated!

About your question, I'm not sure yet - in case the core changes would
take a while (like if community find them polemic, require many changes,
etc) I might split this series in 2 parts, the fixes part vs the
improvements per se. Either way, a V2 is going to happen for sure, and
in that moment, I'll let you know what I think it's best.

But either way, any choice you prefer is fine by me as well (like if you
want to merge it now or postpone to get merged in the future), this is
not an urgent fix I think =)
Cheers,


Guilherme

2022-05-02 23:33:19

by Max Filippov

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

On Wed, Apr 27, 2022 at 3:55 PM Guilherme G. Piccoli
<[email protected]> wrote:
>
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
>
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.

[...]

> arch/xtensa/platforms/iss/setup.c | 4 ++--For xtensa:

For xtensa:
Acked-by: Max Filippov <[email protected]>

--
Thanks.
-- Max

2022-05-02 23:51:56

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

On 02/05/2022 12:38, Florian Fainelli wrote:
> [...]
>
> Acked-by: Florian Fainelli <[email protected]>
>
> Not sure if the Fixes tag is warranted however as this is a clean up,
> and not really fixing a bug.

Perfect, thanks Florian. I'll add your ACK and remove the fixes tag in V2.
Cheers,


Guilherme

2022-05-03 00:25:25

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers



On 4/27/2022 3:49 PM, Guilherme G. Piccoli wrote:
> This patch improves the panic/die notifiers in this driver by
> making use of a passed "id" instead of comparing pointer
> address; also, it removes an useless prototype declaration
> and unnecessary header inclusion.
>
> This is part of a panic notifiers refactor - this notifier in
> the future will be moved to a new list, that encompass the
> information notifiers only.
>
> Fixes: 9eb60880d9a9 ("bus: brcmstb_gisb: add notifier handling")
> Cc: Brian Norris <[email protected]>
> Cc: Florian Fainelli <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>

Acked-by: Florian Fainelli <[email protected]>

Not sure if the Fixes tag is warranted however as this is a clean up,
and not really fixing a bug.
--
Florian

2022-05-03 01:01:00

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

On 29/04/2022 18:45, Russell King (Oracle) wrote:
> [...]
>> Marc, I did some investigation in the code (and tried/failed in the ARM
>> documentation as well heh), but this is still not 100% clear for me.
>>
>> You're saying IPI calls disable IRQs/FIQs by default in the the target
>> CPUs? Where does it happen? I'm a bit confused if this a processor
>> mechanism, or it's in code.
>
> When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
> themselves interrupts, so IRQs will be masked while the IPI is being
> processed. Therefore, there should be no need to re-disable the
> already disabled interrupts.
>
>> But crash_smp_send_stop() is different, it seems to IPI the other CPUs
>> with the flag IPI_CALL_FUNC, which leads to calling
>> generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
>> as well? I couldn't find it.
>
> It's buried in the architecture behaviour. When the CPU takes an
> interrupt and jumps to the interrupt vector in the vectors page, it is
> architecturally defined that interrupts will be disabled. If they
> weren't architecturally disabled at this point, then as soon as the
> first instruction is processed (at the interrupt vector, likely a
> branch) the CPU would immediately take another jump to the interrupt
> vector, and this process would continue indefinitely, making interrupt
> handling utterly useless.
>
> So, you won't find an explicit instruction in the code path from the
> vectors to the IPI handler that disables interrupts - because it's
> written into the architecture that this is what must happen.
>
> IRQs are a lower priority than FIQs, so FIQs remain unmasked.
>

Thanks a lot for the *great* explanation Russell, much appreciated.
So, this leads to the both following questions:

a) Shall we then change the patch to only disable FIQs, since it's panic
path and we don't want secondary CPUs getting interrupted, but only
spinning quietly "forever"?

b) How about cleaning ipi_cpu_stop() then, by dropping the call to
local_irq_disable() there, to avoid the double IRQ disabling?

Thanks,


Guilherme

2022-05-03 01:19:03

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

On Wed, 27 Apr 2022 23:48:56 +0100,
"Guilherme G. Piccoli" <[email protected]> wrote:
>
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. This makes sense, since we're turning off
> such CPUs, putting them in an endless busy-wait loop.
>
> Problem is that there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), used for kexec/panic
> paths. This functions relies in a SMP call that also triggers a
> busy-wait loop [at machine_crash_nonpanic_core()], but *without*
> disabling interrupts. This might lead to odd scenarios, like early
> interrupts in the boot of kexec'd kernel or even interrupts in
> other CPUs while the main one still works in the panic path and
> assumes all secondary CPUs are (really!) off.
>
> This patch mimics the ipi_cpu_stop() interrupt disable mechanism
> in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
> secondary CPUs in the kexec/panic path as well.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
> arch/arm/kernel/machine_kexec.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index f567032a09c0..ef788ee00519 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
> set_cpu_online(smp_processor_id(), false);
> atomic_dec(&waiting_for_crash_ipi);
>
> + local_fiq_disable();
> + local_irq_disable();
> +

My expectations would be that, since we're getting here using an IPI,
interrupts are already masked. So what reenabled them the first place?

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2022-05-09 12:45:21

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

On 27/04/2022 19:48, Guilherme G. Piccoli wrote:
> In the panic path we have a list of functions to be called, the panic
> notifiers - such callbacks perform various actions in the machine's
> last breath, and sometimes users want them to run before kdump. We
> have the parameter "crash_kexec_post_notifiers" for that. When such
> parameter is used, the function "crash_smp_send_stop()" is executed
> to poweroff all secondary CPUs through the NMI-shootdown mechanism;
> part of this process involves disabling virtualization features in
> all CPUs (except the main one).
>
> Now, in the emergency restart procedure we have also a way of
> disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
> what happens though is that in case we already NMI-disabled all CPUs,
> the emergency restart fails due to a second addition of the same items
> in the NMI list, as per the following log output:
>
> sysrq: Trigger a crash
> Kernel panic - not syncing: sysrq triggered crash
> [...]
> Rebooting in 2 seconds..
> list_add double add: new=<addr1>, prev=<addr2>, next=<addr1>.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
>
> In order to reproduce the problem, users just need to set the kernel
> parameter "crash_kexec_post_notifiers" *without* kdump set in any
> system with the VMX feature present.
>
> Since there is no benefit in re-disabling VMX in all CPUs in case
> it was already done, this patch prevents that by guarding the restart
> routine against doubly issuing NMIs unnecessarily. Notice we still
> need to disable VMX locally in the emergency restart.
>
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
> Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path")
> Cc: David P. Reed <[email protected]>
> Cc: Hidehiro Kawai <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
> arch/x86/include/asm/cpu.h | 1 +
> arch/x86/kernel/crash.c | 8 ++++----
> arch/x86/kernel/reboot.c | 14 ++++++++++++--
> 3 files changed, 17 insertions(+), 6 deletions(-)
>

Hi Paolo / Sean / Vitaly, sorry for the ping.
But do you think this fix is OK from the VMX point-of-view?

I'd like to send a V2 of this set soon, so any review here is highly
appreciated!

Cheers,


Guilherme


2022-05-09 13:14:21

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

On 28/04/2022 05:11, Suzuki K Poulose wrote:
> Hi Guilherme,
>
> On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
>> The panic notifier infrastructure executes registered callbacks when
>> a panic event happens - such callbacks are executed in atomic context,
>> with interrupts and preemption disabled in the running CPU and all other
>> CPUs disabled. That said, mutexes in such context are not a good idea.
>>
>> This patch replaces a regular mutex with a mutex_trylock safer approach;
>> given the nature of the mutex used in the driver, it should be pretty
>> uncommon being unable to acquire such mutex in the panic path, hence
>> no functional change should be observed (and if it is, that would be
>> likely a deadlock with the regular mutex).
>>
>> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
>> Cc: Leo Yan <[email protected]>
>> Cc: Mathieu Poirier <[email protected]>
>> Cc: Mike Leach <[email protected]>
>> Cc: Suzuki K Poulose <[email protected]>
>> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> How would you like to proceed with queuing this ? I am happy
> either way. In case you plan to push this as part of this
> series (I don't see any potential conflicts) :
>
> Reviewed-by: Suzuki K Poulose <[email protected]>

Hi Suzuki, some other maintainers are taking the patches to their next
branches for example. I'm working on V2, and I guess in the end would be
nice to reduce the size of the series a bit.

So, do you think you could pick this one for your coresight/next branch
(or even for rc cycle, your call - this is really a fix)?
This way, I won't re-submit this one in V2, since it's gonna be merged
already in your branch.

Thanks in advance,


Guilherme

2022-05-09 14:19:11

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 10/30] alpha: Clean-up the panic notifier code

On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> The alpha panic notifier has some code issues, not following
> the conventions of other notifiers. Also, it might halt the
> machine but still it is set to run as early as possible, which
> doesn't seem to be a good idea.
>
> This patch cleans the code, and set the notifier to run as the
> latest, following the same approach other architectures are doing.
> Also, we remove the unnecessary include of a header already
> included indirectly.
>
> Cc: Ivan Kokshaysky <[email protected]>
> Cc: Matt Turner <[email protected]>
> Cc: Richard Henderson <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
> arch/alpha/kernel/setup.c | 36 +++++++++++++++---------------------
> 1 file changed, 15 insertions(+), 21 deletions(-)
>
> diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
> index b4fbbba30aa2..d88bdf852753 100644
> --- a/arch/alpha/kernel/setup.c
> +++ b/arch/alpha/kernel/setup.c
> @@ -41,19 +41,11 @@
> #include <linux/sysrq.h>
> #include <linux/reboot.h>
> #endif
> -#include <linux/notifier.h>
> #include <asm/setup.h>
> #include <asm/io.h>
> #include <linux/log2.h>
> #include <linux/export.h>
>
> -static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
> -static struct notifier_block alpha_panic_block = {
> - alpha_panic_event,
> - NULL,
> - INT_MAX /* try to do it first */
> -};
> -
> #include <linux/uaccess.h>
> #include <asm/hwrpb.h>
> #include <asm/dma.h>
> @@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
> };
> #endif
>
> +static int alpha_panic_event(struct notifier_block *this,
> + unsigned long event, void *ptr)
> +{
> + /* If we are using SRM and serial console, just hard halt here. */
> + if (alpha_using_srm && srmcons_output)
> + __halt();
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block alpha_panic_block = {
> + .notifier_call = alpha_panic_event,
> + .priority = INT_MIN, /* may not return, do it last */
> +};
> +
> void __init
> setup_arch(char **cmdline_p)
> {
> @@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
> .show = show_cpuinfo,
> };
>
> -
> -static int
> -alpha_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
> -{
> -#if 1
> - /* FIXME FIXME FIXME */
> - /* If we are using SRM and serial console, just hard halt here. */
> - if (alpha_using_srm && srmcons_output)
> - __halt();
> -#endif
> - return NOTIFY_DONE;
> -}
> -
> static __init int add_pcspkr(void)
> {
> struct platform_device *pd;


Hi folks, I'm updating Richard's email and re-sending the V1, any
reviews are greatly appreciated!

Cheers,


Guilherme

2022-05-09 14:22:53

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
>
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
>
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.
>
> Cc: Alexander Gordeev <[email protected]>
> Cc: Christian Borntraeger <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: Heiko Carstens <[email protected]>
> Cc: Sven Schnelle <[email protected]>
> Cc: Vasily Gorbik <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>

Hey S390/SPARC folks, sorry for the ping!

Any reviews on this V1 would be greatly appreciated, I'm working on V2
and seeking feedback in the non-reviewed patches.

Thanks in advance,


Guilherme

2022-05-09 15:57:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

I find the shortlog to be very confusing, the bug has nothing to do with disabling
VMX and I distinctly remember wrapping VMXOFF with exception fixup to prevent doom
if VMX is already disabled :-). The issue is really that nmi_shootdown_cpus() doesn't
play nice with being called twice.

On Wed, Apr 27, 2022, Guilherme G. Piccoli wrote:
> In the panic path we have a list of functions to be called, the panic
> notifiers - such callbacks perform various actions in the machine's
> last breath, and sometimes users want them to run before kdump. We
> have the parameter "crash_kexec_post_notifiers" for that. When such
> parameter is used, the function "crash_smp_send_stop()" is executed
> to poweroff all secondary CPUs through the NMI-shootdown mechanism;
> part of this process involves disabling virtualization features in
> all CPUs (except the main one).
>
> Now, in the emergency restart procedure we have also a way of
> disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
> what happens though is that in case we already NMI-disabled all CPUs,
> the emergency restart fails due to a second addition of the same items
> in the NMI list, as per the following log output:
>
> sysrq: Trigger a crash
> Kernel panic - not syncing: sysrq triggered crash
> [...]
> Rebooting in 2 seconds..
> list_add double add: new=<addr1>, prev=<addr2>, next=<addr1>.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI

Call stacks for the two callers would be very, very helpful.

> In order to reproduce the problem, users just need to set the kernel
> parameter "crash_kexec_post_notifiers" *without* kdump set in any
> system with the VMX feature present.
>
> Since there is no benefit in re-disabling VMX in all CPUs in case
> it was already done, this patch prevents that by guarding the restart
> routine against doubly issuing NMIs unnecessarily. Notice we still
> need to disable VMX locally in the emergency restart.
>
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
> Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path")
> Cc: David P. Reed <[email protected]>
> Cc: Hidehiro Kawai <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
> arch/x86/include/asm/cpu.h | 1 +
> arch/x86/kernel/crash.c | 8 ++++----
> arch/x86/kernel/reboot.c | 14 ++++++++++++--
> 3 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index 86e5e4e26fcb..b6a9062d387f 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -36,6 +36,7 @@ extern int _debug_hotplug_cpu(int cpu, int action);
> #endif
> #endif
>
> +extern bool crash_cpus_stopped;
> int mwait_usable(const struct cpuinfo_x86 *);
>
> unsigned int x86_family(unsigned int sig);
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index e8326a8d1c5d..71dd1a990e8d 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -42,6 +42,8 @@
> #include <asm/crash.h>
> #include <asm/cmdline.h>
>
> +bool crash_cpus_stopped;
> +
> /* Used while preparing memory map entries for second kernel */
> struct crash_memmap_data {
> struct boot_params *params;
> @@ -108,9 +110,7 @@ void kdump_nmi_shootdown_cpus(void)
> /* Override the weak function in kernel/panic.c */
> void crash_smp_send_stop(void)
> {
> - static int cpus_stopped;
> -
> - if (cpus_stopped)
> + if (crash_cpus_stopped)
> return;
>
> if (smp_ops.crash_stop_other_cpus)
> @@ -118,7 +118,7 @@ void crash_smp_send_stop(void)
> else
> smp_send_stop();
>
> - cpus_stopped = 1;
> + crash_cpus_stopped = true;

This feels like were just adding more duct tape to the mess. nmi_shootdown() is
still unsafe for more than one caller, and it takes a _lot_ of staring and searching
to understand that crash_smp_send_stop() is invoked iff CONFIG_KEXEC_CORE=y, i.e.
that it will call smp_ops.crash_stop_other_cpus() and not just smp_send_stop().

Rather than shared a flag between two relatively unrelated functions, what if we
instead disabling virtualization in crash_nmi_callback() and then turn the reboot
call into a nop if an NMI shootdown has already occurred? That will also add a
bit of documentation about multiple shootdowns not working.

And I believe there's also a lurking bug in native_machine_emergency_restart() that
can be fixed with cleanup. SVM can also block INIT and so should be disabled during
an emergency reboot.

The attached patches are compile tested only. If they seem sane, I'll post an
official mini series.

> }
>
> #else
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index fa700b46588e..2fc42b8402ac 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -589,8 +589,18 @@ static void native_machine_emergency_restart(void)
> int orig_reboot_type = reboot_type;
> unsigned short mode;
>
> - if (reboot_emergency)
> - emergency_vmx_disable_all();
> + /*
> + * We can reach this point in the end of panic path, having
> + * NMI-disabled all secondary CPUs. This process involves
> + * disabling the CPU virtualization technologies, so if that
> + * is the case, we only miss disabling the local CPU VMX...
> + */
> + if (reboot_emergency) {
> + if (!crash_cpus_stopped)
> + emergency_vmx_disable_all();
> + else
> + cpu_emergency_vmxoff();
> + }
>
> tboot_shutdown(TB_SHUTDOWN_REBOOT);
>
> --
> 2.36.0
>


Attachments:
(No filename) (5.70 kB)
0001-x86-crash-Disable-virt-in-core-NMI-crash-handler-to-.patch (6.64 kB)
0002-x86-reboot-Disable-virtualization-in-an-emergency-if.patch (2.73 kB)
Download all attachments

2022-05-09 16:17:27

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

Hi

On 09/05/2022 14:09, Guilherme G. Piccoli wrote:
> On 28/04/2022 05:11, Suzuki K Poulose wrote:
>> Hi Guilherme,
>>
>> On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
>>> The panic notifier infrastructure executes registered callbacks when
>>> a panic event happens - such callbacks are executed in atomic context,
>>> with interrupts and preemption disabled in the running CPU and all other
>>> CPUs disabled. That said, mutexes in such context are not a good idea.
>>>
>>> This patch replaces a regular mutex with a mutex_trylock safer approach;
>>> given the nature of the mutex used in the driver, it should be pretty
>>> uncommon being unable to acquire such mutex in the panic path, hence
>>> no functional change should be observed (and if it is, that would be
>>> likely a deadlock with the regular mutex).
>>>
>>> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
>>> Cc: Leo Yan <[email protected]>
>>> Cc: Mathieu Poirier <[email protected]>
>>> Cc: Mike Leach <[email protected]>
>>> Cc: Suzuki K Poulose <[email protected]>
>>> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>>
>> How would you like to proceed with queuing this ? I am happy
>> either way. In case you plan to push this as part of this
>> series (I don't see any potential conflicts) :
>>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>
> Hi Suzuki, some other maintainers are taking the patches to their next
> branches for example. I'm working on V2, and I guess in the end would be
> nice to reduce the size of the series a bit.
>
> So, do you think you could pick this one for your coresight/next branch
> (or even for rc cycle, your call - this is really a fix)?
> This way, I won't re-submit this one in V2, since it's gonna be merged
> already in your branch.

I have queued this to coresight/next.

Thanks
Suzuki

2022-05-09 16:34:23

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

On 09/05/2022 13:14, Suzuki K Poulose wrote:
> [...]>
> I have queued this to coresight/next.
>
> Thanks
> Suzuki


Thanks a lot Suzuki!

2022-05-10 14:53:43

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

On Wed 2022-04-27 19:48:59, Guilherme G. Piccoli wrote:
> The pvpanic driver relies on panic notifiers to execute a callback
> on panic event. Such function is executed in atomic context - the
> panic function disables local IRQs, preemption and all other CPUs
> that aren't running the panic code.
>
> With that said, it's dangerous to use regular spinlocks in such path,
> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
> This patch fixes that by replacing regular spinlocks with the trylock
> safer approach.

It seems that the lock is used just to manipulating a list. A super
safe solution would be to use the rcu API: rcu_add_rcu() and
list_del_rcu() under rcu_read_lock(). The spin lock will not be
needed and the list will always be valid.

The advantage would be that it will always call members that
were successfully added earlier. That said, I am not familiar
with pvpanic and am not sure if it is worth it.

> It also fixes an old comment (about a long gone framebuffer code) and
> the notifier priority - we should execute hypervisor notifiers early,
> deferring this way the panic action to the hypervisor, as expected by
> the users that are setting up pvpanic.

This should be done in a separate patch. It changes the behavior.
Also there might be a discussion whether it really should be
the maximal priority.

Best Regards,
Petr

2022-05-10 16:01:03

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

On 10/05/2022 09:14, Petr Mladek wrote:
> [...]
>> With that said, it's dangerous to use regular spinlocks in such path,
>> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
>> This patch fixes that by replacing regular spinlocks with the trylock
>> safer approach.
>
> It seems that the lock is used just to manipulating a list. A super
> safe solution would be to use the rcu API: rcu_add_rcu() and
> list_del_rcu() under rcu_read_lock(). The spin lock will not be
> needed and the list will always be valid.
>
> The advantage would be that it will always call members that
> were successfully added earlier. That said, I am not familiar
> with pvpanic and am not sure if it is worth it.
>
>> It also fixes an old comment (about a long gone framebuffer code) and
>> the notifier priority - we should execute hypervisor notifiers early,
>> deferring this way the panic action to the hypervisor, as expected by
>> the users that are setting up pvpanic.
>
> This should be done in a separate patch. It changes the behavior.
> Also there might be a discussion whether it really should be
> the maximal priority.
>
> Best Regards,
> Petr

Thanks for the review Petr. Patch was already merged - my goal was to be
concise, i.e., a patch per driver / module, so the patch kinda fixes
whatever I think is wrong with the driver with regards panic handling.

Do you think it worth to remove this patch from Greg's branch just to
split it in 2? Personally I think it's not worth, but opinions are welcome.

About the RCU part, this one really could be a new patch, a good
improvement patch - it makes sense to me, we can think about that after
the fixes I guess.

Cheers,


Guilherme

2022-05-10 17:52:55

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 10/30] alpha: Clean-up the panic notifier code

On Mon 2022-05-09 11:13:17, Guilherme G. Piccoli wrote:
> On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> > The alpha panic notifier has some code issues, not following
> > the conventions of other notifiers. Also, it might halt the
> > machine but still it is set to run as early as possible, which
> > doesn't seem to be a good idea.

Yeah, it is pretty strange behavior.

I looked into the history. This notifier was added into the alpha code
in 2.4.0-test2pre2. In this historic code, the default panic() code
either rebooted after a timeout or ended in a infinite loop. There
was not crasdump at that times.

The notifier allowed to change the behavior. There were 3 notifiers:

+ mips and mips64 ended with blinking in panic()
+ alpha did __halt() in this srm case

They both still do this. I guess that it is some historic behavior
that people using these architectures are used to.

Anyway, it makes sense to do this as the last notifier after
dumping other information.

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2022-05-10 19:35:59

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

On Thu, 28 Apr 2022 09:01:13 +0800
Xiaoming Ni <[email protected]> wrote:

> > +#ifdef CONFIG_DEBUG_NOTIFIERS
> > + {
> > + char sym_name[KSYM_NAME_LEN];
> > +
> > + pr_info("notifiers: registered %s()\n",
> > + notifier_name(n, sym_name));
> > + }
>
> Duplicate Code.
>
> Is it better to use __func__ and %pS?
>
> pr_info("%s: %pS\n", __func__, n->notifier_call);
>
>
> > +#endif

Also, don't sprinkle #ifdef in C code. Instead:

if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
pr_info("notifers: regsiter %ps()\n",
n->notifer_call);


Or define a print macro at the start of the C file that is a nop if it is
not defined, and use the macro.

-- Steve

2022-05-10 19:42:57

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

On Wed, 27 Apr 2022 19:49:17 -0300
"Guilherme G. Piccoli" <[email protected]> wrote:

> Currently we don't have a way to check if there are dumpers set,
> except counting the list members maybe. This patch introduces a very
> simple helper to provide this information, by just keeping track of
> registered/unregistered kmsg dumpers. It's going to be used on the
> panic path in the subsequent patch.

FYI, it is considered "bad form" to reference in the change log "this
patch". We know this is a patch. The change log should just talk about what
is being done. So can you reword your change logs (you do this is almost
every patch). Here's what I would reword the above to be:

Currently we don't have a way to check if there are dumpers set, except
perhaps by counting the list members. Introduce a very simple helper to
provide this information, by just keeping track of registered/unregistered
kmsg dumpers. This will simplify the refactoring of the panic path.


-- Steve

2022-05-10 20:01:09

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

On 10/05/2022 12:16, Petr Mladek wrote:
> [...]
> Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> All notifiers will be always called with PANIC_NOTIFIER.
>
> The @val parameter is normally used when the same notifier_list
> is used in different situations.
>
> But you are going to use it when the same notifier is used
> in more lists. This is normally distinguished by the @nh
> (atomic_notifier_head) parameter.
>
> IMHO, it is a bad idea. First, it would confuse people because
> it does not follow the original design of the parameters.
> Second, the related code must be touched anyway when
> the notifier is moved into another list so it does not
> help much.
>
> Or do I miss anything, please?
>
> Best Regards,
> Petr

Hi Petr, thanks for the review.

I'm not strong attached to this patch, so we could drop it and refactor
the code of next patches to use the @nh as identification - but
personally, I feel this parameter could be used to identify the list
that called such function, in other words, what is the event that
triggered the callback. Some notifiers are even declared with this
parameter called "ev", like the event that triggers the notifier.


You mentioned 2 cases:

(a) Same notifier_list used in different situations;

(b) Same *notifier callback* used in different lists;

Mine is case (b), right? Can you show me an example of case (a)? You can
see in the following patches (or grep the kernel) that people are using
this identification parameter to determine which kind of OOPS trigger
the callback to condition the execution of the function to specific
cases. IIUIC, this is more or less what I'm doing, but extending the
idea for panic notifiers.

Again, as a personal preference, it makes sense to me using id's VS
comparing pointers to differentiate events/callers.

Cheers,


Guilherme

2022-05-10 20:31:23

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

On Wed 2022-04-27 19:49:09, Guilherme G. Piccoli wrote:
> This patch improves the panic/die notifiers in this driver by
> making use of a passed "id" instead of comparing pointer
> address; also, it removes an useless prototype declaration
> and unnecessary header inclusion.
>
> This is part of a panic notifiers refactor - this notifier in
> the future will be moved to a new list, that encompass the
> information notifiers only.
>
> --- a/drivers/bus/brcmstb_gisb.c
> +++ b/drivers/bus/brcmstb_gisb.c
> @@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, void *dev_id)
> /*
> * Dump out gisb errors on die or panic.
> */
> -static int dump_gisb_error(struct notifier_block *self, unsigned long v,
> - void *p);
> -
> -static struct notifier_block gisb_die_notifier = {
> - .notifier_call = dump_gisb_error,
> -};
> -
> -static struct notifier_block gisb_panic_notifier = {
> - .notifier_call = dump_gisb_error,
> -};
> -
> static int dump_gisb_error(struct notifier_block *self, unsigned long v,
> void *p)
> {
> struct brcmstb_gisb_arb_device *gdev;
> - const char *reason = "panic";
> + const char *reason = "die";
>
> - if (self == &gisb_die_notifier)
> - reason = "die";
> + if (v == PANIC_NOTIFIER)
> + reason = "panic";

IMHO, the check of the @self parameter was the proper solution.

"gisb_die_notifier" list uses @val from enum die_val.
"gisb_panic_notifier" list uses @val from enum panic_notifier_val.

These are unrelated types. It might easily break when
someone defines the same constant also in enum die_val.

Best Regards,
Petr

2022-05-10 21:10:43

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

On Wed 2022-04-27 19:49:05, Guilherme G. Piccoli wrote:
> Currently the panic notifiers from user mode linux don't follow
> the convention for most of the other notifiers present in the
> kernel (indentation, priority setting, numeric return).
> More important, the priorities could be improved, since it's a
> special case (userspace), hence we could run the notifiers earlier;
> user mode linux shouldn't care much with other panic notifiers but
> the ordering among the mconsole and arch notifier is important,
> given that the arch one effectively triggers a core dump.

It is not clear to me why user mode linux should not care about
the other notifiers. It might be because I do not know much
about the user mode linux.

Is the because they always create core dump or are never running
in a hypervisor or ...?

AFAIK, the notifiers do many different things. For example, there
is a notifier that disables RCU watchdog, print some extra
information. Why none of them make sense here?

> This patch fixes that by running the mconsole notifier as the first
> panic notifier, followed by the architecture one (that coredumps).
> Also, we remove a useless header inclusion.


Best Regards,
Petr

2022-05-10 22:24:00

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

On Wed 2022-04-27 19:49:08, Guilherme G. Piccoli wrote:
> The notifiers infrastructure provides a way to pass an "id" to the
> callbacks to determine what kind of event happened, i.e., what is
> the reason behind they getting called.
>
> The panic notifier currently pass 0, but this is soon to be
> used in a multi-targeted notifier, so let's pass a meaningful
> "id" over there.
>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
> include/linux/panic_notifier.h | 5 +++++
> kernel/panic.c | 2 +-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index 41e32483d7a7..07dced83a783 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;
>
> extern bool crash_kexec_post_notifiers;
>
> +enum panic_notifier_val {
> + PANIC_UNUSED,
> + PANIC_NOTIFIER = 0xDEAD,
> +};

Hmm, this looks like a hack. PANIC_UNUSED will never be used.
All notifiers will be always called with PANIC_NOTIFIER.

The @val parameter is normally used when the same notifier_list
is used in different situations.

But you are going to use it when the same notifier is used
in more lists. This is normally distinguished by the @nh
(atomic_notifier_head) parameter.

IMHO, it is a bad idea. First, it would confuse people because
it does not follow the original design of the parameters.
Second, the related code must be touched anyway when
the notifier is moved into another list so it does not
help much.

Or do I miss anything, please?

Best Regards,
Petr

2022-05-11 03:18:35

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

On 09/05/2022 12:52, Sean Christopherson wrote:
> I find the shortlog to be very confusing, the bug has nothing to do with disabling
> VMX and I distinctly remember wrapping VMXOFF with exception fixup to prevent doom
> if VMX is already disabled :-). The issue is really that nmi_shootdown_cpus() doesn't
> play nice with being called twice.
>

Hey Sean, OK - I agree with you, the issue is really about the double
list addition.

> [...]
>
> Call stacks for the two callers would be very, very helpful.
> [...]

> This feels like were just adding more duct tape to the mess. nmi_shootdown() is
> still unsafe for more than one caller, and it takes a _lot_ of staring and searching
> to understand that crash_smp_send_stop() is invoked iff CONFIG_KEXEC_CORE=y, i.e.
> that it will call smp_ops.crash_stop_other_cpus() and not just smp_send_stop().
>
> Rather than shared a flag between two relatively unrelated functions, what if we
> instead disabling virtualization in crash_nmi_callback() and then turn the reboot
> call into a nop if an NMI shootdown has already occurred? That will also add a
> bit of documentation about multiple shootdowns not working.
>
> And I believe there's also a lurking bug in native_machine_emergency_restart() that
> can be fixed with cleanup. SVM can also block INIT and so should be disabled during
> an emergency reboot.
>
> The attached patches are compile tested only. If they seem sane, I'll post an
> official mini series.

Thanks Sean, it makes sense - my patch is more a "band-aid" whereas
yours fixes it in a more generic way. Confess I found the logic of your
patch complex, but as you said, it requires a *lot* of code analysis to
understand these multiple shutdown patches, the problem is complicated
by nature heh

I've tested your patch 0001 and it works well for all cases [0], so go
ahead and submit the miniseries, feel free to add:

Reported-and-tested-by: Guilherme G. Piccoli <[email protected]>


I've read patch 0002 and it makes sense to me as well, a good proactive
bug fix =)

With that said, I'll of course drop this one from V2 of this series.
Cheers,


Guilherme




[0]
A summary of my tests and the code paths that the panic shutdown take
depending on some conditions:

New function that disables VMX/SVM: cpu_crash_disable_virtualization()
[should be executed in every online CPU on shutdown)

The panic path triggers the following call stacks depending on kdump and
post_notifiers:


(1) kexec/kdump + !crash_kexec_post_notifiers
->machine_crash_shutdown()
----.crash_shutdown() <custom handler>
------native_machine_crash_shutdown() [all custom handlers except Xen PV
call the native generic function]
--------crash_smp_send_stop()
----------kdump_nmi_shootdown_cpus()
------------nmi_shootdown_cpus(kdump_nmi_callback)
--------------crash_nmi_callback()
----------------kdump_nmi_callback()
------------------cpu_crash_disable_virtualization()


(2) kexec/kdump + crash_kexec_post_notifiers
->crash_smp_send_stop()
----kdump_nmi_shootdown_cpus()
------nmi_shootdown_cpus(kdump_nmi_callback)
--------crash_nmi_callback()
----------kdump_nmi_callback()
------------cpu_crash_disable_virtualization()

After this path, will execute machine_crash_shutdown() but
crash_smp_send_stop()
is guarded against double execution. Also, emergency restart calls
emergency_vmx_disable_all() .


(3) !kexec/kdump + crash_kexec_post_notifiers

Same as (2)


(4) !kexec/kdump + !crash_kexec_post_notifiers
-> smp_send_stop()
----native_stop_other_cpus()
------apic_send_IPI_allbutself(REBOOT_VECTOR)
--------sysvec_reboot
----------cpu_emergency_vmxoff() <if the IPI approach succeeded, CPU
stopped here>

If not:
------register_stop_handler()
--------apic_send_IPI_allbutself(NMI_VECTOR)
----------smp_stop_nmi_callback()
------------cpu_emergency_vmxoff()

After that, emergency_vmx_disable_all() gets called in the emergency
restart path as well.

2022-05-11 23:20:19

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

On 11/05/2022 13:45, Heiko Carstens wrote:
> [...]
>>
>> Hey S390/SPARC folks, sorry for the ping!
>>
>> Any reviews on this V1 would be greatly appreciated, I'm working on V2
>> and seeking feedback in the non-reviewed patches.
>
> Sorry, missed that this is quite s390 specific. So, yes, this looks
> good to me and nice to see that one of the remaining CONFIG_S390 in
> common code will be removed!
>
> For the s390 bits:
> Acked-by: Heiko Carstens <[email protected]>

No need for apologies, I really appreciate your review!
Will add your Ack for V2 =)

Cheers,


Guilherme

2022-05-12 13:27:17

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

On 10/05/2022 11:28, Petr Mladek wrote:
> [...]
> It is not clear to me why user mode linux should not care about
> the other notifiers. It might be because I do not know much
> about the user mode linux.
>
> Is the because they always create core dump or are never running
> in a hypervisor or ...?
>
> AFAIK, the notifiers do many different things. For example, there
> is a notifier that disables RCU watchdog, print some extra
> information. Why none of them make sense here?
>

Hi Petr, my understanding is that UML is a form of running Linux as a
regular userspace process for testing purposes. With that said, as soon
as we exit in the error path, less "pollution" would happen, so users
can use GDB to debug the core dump for example.

In later patches of this series (when we split the panic notifiers in 3
lists) these UML notifiers run in the pre-reboot list, so they run after
the informational notifiers for example (in the default level).
But without the list split we cannot order properly, so my gut feeling
is that makes sense to run them rather earlier than later in the panic
process...

Maybe Anton / Johannes / Richard could give their opinions - appreciate
that, I'm not attached to the priority here, it's more about users'
common usage of UML I can think of...

Cheers,


Guilherme

2022-05-13 02:14:31

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

On Mon, May 09, 2022 at 11:16:10AM -0300, Guilherme G. Piccoli wrote:
> On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> > Currently we have 3 notifier lists in the panic path, which will
> > be wired in a way to allow the notifier callbacks to run in
> > different moments at panic time, in a subsequent patch.
> >
> > But there is also an odd set of architecture calls hardcoded in
> > the end of panic path, after the restart machinery. They're
> > responsible for late time tunings / events, like enabling a stop
> > button (Sparc) or effectively stopping the machine (s390).
> >
> > This patch introduces yet another notifier list to offer the
> > architectures a way to add callbacks in such late moment on
> > panic path without the need of ifdefs / hardcoded approaches.
> >
> > Cc: Alexander Gordeev <[email protected]>
> > Cc: Christian Borntraeger <[email protected]>
> > Cc: "David S. Miller" <[email protected]>
> > Cc: Heiko Carstens <[email protected]>
> > Cc: Sven Schnelle <[email protected]>
> > Cc: Vasily Gorbik <[email protected]>
> > Signed-off-by: Guilherme G. Piccoli <[email protected]>
>
> Hey S390/SPARC folks, sorry for the ping!
>
> Any reviews on this V1 would be greatly appreciated, I'm working on V2
> and seeking feedback in the non-reviewed patches.

Sorry, missed that this is quite s390 specific. So, yes, this looks
good to me and nice to see that one of the remaining CONFIG_S390 in
common code will be removed!

For the s390 bits:
Acked-by: Heiko Carstens <[email protected]>

2022-05-13 14:28:28

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 10/30] alpha: Clean-up the panic notifier code

On 10/05/2022 11:16, Petr Mladek wrote:
> [...]
> Yeah, it is pretty strange behavior.
>
> I looked into the history. This notifier was added into the alpha code
> in 2.4.0-test2pre2. In this historic code, the default panic() code
> either rebooted after a timeout or ended in a infinite loop. There
> was not crasdump at that times.
>
> The notifier allowed to change the behavior. There were 3 notifiers:
>
> + mips and mips64 ended with blinking in panic()
> + alpha did __halt() in this srm case
>
> They both still do this. I guess that it is some historic behavior
> that people using these architectures are used to.
>
> Anyway, it makes sense to do this as the last notifier after
> dumping other information.
>
> Reviewed-by: Petr Mladek <[email protected]>
>
> Best Regards,
> Petr

Thanks a bunch for the review - added your tag for V2 =)

2022-05-13 17:28:49

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

On 10/05/2022 14:40, Steven Rostedt wrote:
> On Wed, 27 Apr 2022 19:49:17 -0300
> "Guilherme G. Piccoli" <[email protected]> wrote:
>
>> Currently we don't have a way to check if there are dumpers set,
>> except counting the list members maybe. This patch introduces a very
>> simple helper to provide this information, by just keeping track of
>> registered/unregistered kmsg dumpers. It's going to be used on the
>> panic path in the subsequent patch.
>
> FYI, it is considered "bad form" to reference in the change log "this
> patch". We know this is a patch. The change log should just talk about what
> is being done. So can you reword your change logs (you do this is almost
> every patch). Here's what I would reword the above to be:
>
> Currently we don't have a way to check if there are dumpers set, except
> perhaps by counting the list members. Introduce a very simple helper to
> provide this information, by just keeping track of registered/unregistered
> kmsg dumpers. This will simplify the refactoring of the panic path.

Thanks for the hint, you're right - it's almost in all of my patches.
I'll reword all of them (except the ones already merged) to remove this
"bad form".

Cheers,


Guilherme

2022-05-14 00:42:43

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

On Wed, 2022-05-11 at 17:22 -0300, Guilherme G. Piccoli wrote:
> On 10/05/2022 11:28, Petr Mladek wrote:
> > [...]
> > It is not clear to me why user mode linux should not care about
> > the other notifiers. It might be because I do not know much
> > about the user mode linux.
> >
> > Is the because they always create core dump or are never running
> > in a hypervisor or ...?
> >
> > AFAIK, the notifiers do many different things. For example, there
> > is a notifier that disables RCU watchdog, print some extra
> > information. Why none of them make sense here?
> >
>
> Hi Petr, my understanding is that UML is a form of running Linux as a
> regular userspace process for testing purposes.

Correct.

> With that said, as soon
> as we exit in the error path, less "pollution" would happen, so users
> can use GDB to debug the core dump for example.
>
> In later patches of this series (when we split the panic notifiers in 3
> lists) these UML notifiers run in the pre-reboot list, so they run after
> the informational notifiers for example (in the default level).
> But without the list split we cannot order properly, so my gut feeling
> is that makes sense to run them rather earlier than later in the panic
> process...
>
> Maybe Anton / Johannes / Richard could give their opinions - appreciate
> that, I'm not attached to the priority here, it's more about users'
> common usage of UML I can think of...

It's hard to say ... In a sense I'm not sure it matters?

OTOH something like the ftrace dump notifier (kernel/trace/trace.c)
might still be useful to run before the mconsole and coredump ones, even
if you could probably use gdb to figure out the information.

Personally, I don't have a scenario where I'd care about the trace
buffers though, and most of the others I found would seem irrelevant
(drivers that aren't even compiled, hung tasks won't really happen since
we exit immediately, and similar.)

johannes

2022-05-16 10:18:28

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

On 13/05/2022 11:44, Johannes Berg wrote:
> [...]
>> Maybe Anton / Johannes / Richard could give their opinions - appreciate
>> that, I'm not attached to the priority here, it's more about users'
>> common usage of UML I can think of...
>
> It's hard to say ... In a sense I'm not sure it matters?
>
> OTOH something like the ftrace dump notifier (kernel/trace/trace.c)
> might still be useful to run before the mconsole and coredump ones, even
> if you could probably use gdb to figure out the information.
>
> Personally, I don't have a scenario where I'd care about the trace
> buffers though, and most of the others I found would seem irrelevant
> (drivers that aren't even compiled, hung tasks won't really happen since
> we exit immediately, and similar.)
>
> johannes

Thanks Johannes, I agree with you.

We don't have great ordering now, one thing we need to enforce is the
order between the 2 UML notifiers, and this patch is doing that..trying
to order against other callbacks like the ftrace dumper is messy in the
current code.

OTOH if this patch set is accepted at some point, we'll likely have 3
lists, and with that we can improve ordering a lot - this notifier for
instance would run in the pre-reboot list, *after* the ftrace dumper (if
a kmsg dumper is set).

So, my intention is to keep this patch as is for V2 (with some changes
Johannes suggested before), unless Petr or the other maintainers want
something different.
Cheers,


Guilherme

2022-05-16 19:43:08

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

On 16/05/2022 11:56, Petr Mladek wrote:
> [...]
> I really like both changes. Just please split it them into two
> patchset. I mean one patch for the new "panic_console_replay"
> parameter and 2nd that moves "printk_info" into the notifier.
>

OK sure, will do that in V2.
Thanks,


Guilherme

2022-05-17 00:17:23

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

> So, my reasoning here is: this notifier should fit the info list,
> definitely! But...it's very high risk for kdump. It deep dives into the
> regmap API (there are locks in such code) plus there is an (MM)IO write
> to the device and an ARM firmware call. So, despite the nature of this
> notifier _fits the informational list_, the _code is risky_ so we should
> avoid running it before a kdump.
>
> Now, we indeed have a chicken/egg problem: want to avoid it before
> kdump, BUT in case kdump is not set, kmsg_dump() (and console flushing,
> after your suggestion Petr) will run before it and not save collected
> information from EDAC PoV.

Would it be possible to have some global "kdump is configured + enabled" flag?

Then notifiers could make an informed choice on whether to deep dive to
get all the possible details (when there is no kdump) or just skim the high
level stuff (to maximize chance of getting a successful kdump).

-Tony

2022-05-17 01:04:58

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

On 16/05/2022 13:18, Luck, Tony wrote:
>> [...]
> Would it be possible to have some global "kdump is configured + enabled" flag?
>
> Then notifiers could make an informed choice on whether to deep dive to
> get all the possible details (when there is no kdump) or just skim the high
> level stuff (to maximize chance of getting a successful kdump).
>
> -Tony

Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?

With that, are you/Petr/Dinh OK in moving it for the info list?
Cheers,


Guilherme

2022-05-17 01:20:49

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

On 16/05/2022 11:45, Petr Mladek wrote:
> [...]
>
> The patch looks good to me. I would just suggest two changes.
>
> 1. I would rename the list to "panic_loop_list" instead of
> "panic_post_reboot_list".
>
> It will be more clear that it includes things that are
> needed before panic() enters the infinite loop.
>
>
> 2. I would move all the notifiers that enable blinking here.
>
> The blinking should be done only during the infinite
> loop when there is nothing else to do. If we enable
> earlier then it might disturb/break more important
> functionality (dumping information, reboot).
>

Perfect, I agree with you. I'll change both points in V2 =)

2022-05-17 02:11:08

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

On Wed 2022-04-27 19:49:19, Guilherme G. Piccoli wrote:
> Currently the parameter "panic_print" relies in a function called
> directly on panic path; one of the flags the users can set for
> panic_print triggers a console replay mechanism, to show the
> entire kernel log buffer (from the beginning) in a panic event.
>
> Two problems with that: the dual nature of the panic_print
> isn't really appropriate, the function was originally meant
> to allow users dumping system information on panic events,
> and was "overridden" to also force a console flush of the full
> kernel log buffer. It also turns the code a bit more complex
> and duplicate than it needs to be.
>
> This patch proposes 2 changes: first, we decouple panic_print
> from the console flushing mechanism, in the form of a new kernel
> core parameter (panic_console_replay); we kept the functionality
> on panic_print to avoid userspace breakage, although we comment
> in both code and documentation that this panic_print usage is
> deprecated.
>
> We converted panic_print function to a panic notifier too, adding
> it on the panic informational notifier list, executed as the final
> callback. This allows a more clear code and makes sense, as
> panic_print_sys_info() is really a panic-time only function.
> We also moved its code to kernel/printk.c, it seems to make more
> sense given it's related to printing stuff.

I really like both changes. Just please split it them into two
patchset. I mean one patch for the new "panic_console_replay"
parameter and 2nd that moves "printk_info" into the notifier.

Best Regards,
Petr

2022-05-17 02:23:33

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

On 16/05/2022 11:50, Petr Mladek wrote:
> [...]
> Shame on me that I do not care that much about the style of the commit
> message :-)
>
> Anyway, the code looks good to me. With the better commit message:
>
> Reviewed-by: Petr Mladek <[email protected]>
>

Heheh, cool - I'll add your tag and improve the message in V2.
Thanks,


Guilherme

2022-05-17 02:33:49

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

On Wed 2022-04-27 19:49:16, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
>
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
>
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.

The patch looks good to me. I would just suggest two changes.

1. I would rename the list to "panic_loop_list" instead of
"panic_post_reboot_list".

It will be more clear that it includes things that are
needed before panic() enters the infinite loop.


2. I would move all the notifiers that enable blinking here.

The blinking should be done only during the infinite
loop when there is nothing else to do. If we enable
earlier then it might disturb/break more important
functionality (dumping information, reboot).

Best Regards,
Petr

2022-05-17 03:36:36

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

On Wed 2022-05-11 17:03:51, Guilherme G. Piccoli wrote:
> On 10/05/2022 14:40, Steven Rostedt wrote:
> > On Wed, 27 Apr 2022 19:49:17 -0300
> > "Guilherme G. Piccoli" <[email protected]> wrote:
> >
> >> Currently we don't have a way to check if there are dumpers set,
> >> except counting the list members maybe. This patch introduces a very
> >> simple helper to provide this information, by just keeping track of
> >> registered/unregistered kmsg dumpers. It's going to be used on the
> >> panic path in the subsequent patch.
> >
> > FYI, it is considered "bad form" to reference in the change log "this
> > patch". We know this is a patch. The change log should just talk about what
> > is being done. So can you reword your change logs (you do this is almost
> > every patch). Here's what I would reword the above to be:
> >
> > Currently we don't have a way to check if there are dumpers set, except
> > perhaps by counting the list members. Introduce a very simple helper to
> > provide this information, by just keeping track of registered/unregistered
> > kmsg dumpers. This will simplify the refactoring of the panic path.
>
> Thanks for the hint, you're right - it's almost in all of my patches.
> I'll reword all of them (except the ones already merged) to remove this
> "bad form".

Shame on me that I do not care that much about the style of the commit
message :-)

Anyway, the code looks good to me. With the better commit message:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2022-05-17 04:43:06

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

On Wed 2022-04-27 19:49:15, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
>
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.
>
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
> int dberror, err_addr;
>
> edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_pre_reboot_list,

My understanding is that this notifier first prints info about ECC
errors and then triggers reboot. It might make sense to split it
into two notifiers.


> &edac->panic_notifier);
>
> /* Printout a message if uncorrectable error previously. */
> --- a/drivers/leds/trigger/ledtrig-panic.c
> +++ b/drivers/leds/trigger/ledtrig-panic.c
> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>
> static int __init ledtrig_panic_init(void)
> {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
> &led_trigger_panic_nb);

Blinking => should go to the last "post_reboot/loop" list.


>
> led_trigger_register_simple("panic", &trigger);
> --- a/drivers/misc/ibmasm/heartbeat.c
> +++ b/drivers/misc/ibmasm/heartbeat.c
> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
> static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
> {
> suspend_heartbeats = 1;
> - return 0;
> + return NOTIFY_DONE;
> }
>
> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
> +static struct notifier_block panic_notifier = {
> + .notifier_call = panic_happened,
> +};
>
> void ibmasm_register_panic_notifier(void)
> {
> - atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
> + &panic_notifier);

Same here. Blinking => should go to the last "post_reboot/loop" list.


> }
>
> void ibmasm_unregister_panic_notifier(void)
> {
> - atomic_notifier_chain_unregister(&panic_notifier_list,
> - &panic_notifier);
> + atomic_notifier_chain_unregister(&panic_pre_reboot_list,
> + &panic_notifier);
> }


The rest of the moved notifiers seem to fit well this "pre_reboot"
list.

Best Regards,
Petr

2022-05-17 06:43:26

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

On 10/05/2022 14:29, Steven Rostedt wrote:
> [...]
> Also, don't sprinkle #ifdef in C code. Instead:
>
> if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
> pr_info("notifers: regsiter %ps()\n",
> n->notifer_call);
>
>
> Or define a print macro at the start of the C file that is a nop if it is
> not defined, and use the macro.

Thanks, I'll go with the IS_ENABLED() idea in V2 - appreciate the hint.
Cheers,


Guilherme

2022-05-17 11:02:41

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

Thanks again for the review! Comments inline below:


On 16/05/2022 11:33, Petr Mladek wrote:
> [...]
>> --- a/drivers/edac/altera_edac.c
>> +++ b/drivers/edac/altera_edac.c
>> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
>> int dberror, err_addr;
>>
>> edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
>> - atomic_notifier_chain_register(&panic_notifier_list,
>> + atomic_notifier_chain_register(&panic_pre_reboot_list,
>
> My understanding is that this notifier first prints info about ECC
> errors and then triggers reboot. It might make sense to split it
> into two notifiers.

I disagree here - looping the maintainers for comments (CCing Dinh /
Tony). BTW, sorry for not having you on CC already Dinh, it was my mistake.

So, my reasoning here is: this notifier should fit the info list,
definitely! But...it's very high risk for kdump. It deep dives into the
regmap API (there are locks in such code) plus there is an (MM)IO write
to the device and an ARM firmware call. So, despite the nature of this
notifier _fits the informational list_, the _code is risky_ so we should
avoid running it before a kdump.

Now, we indeed have a chicken/egg problem: want to avoid it before
kdump, BUT in case kdump is not set, kmsg_dump() (and console flushing,
after your suggestion Petr) will run before it and not save collected
information from EDAC PoV.

My idea: I could call a second kmsg_dump() or at least a panic console
flush for within such notifier. Let me know what you think Petr (also
Dinh / Tony and all interested parties).


> [...]
>> --- a/drivers/leds/trigger/ledtrig-panic.c
>> +++ b/drivers/leds/trigger/ledtrig-panic.c
>> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>>
>> static int __init ledtrig_panic_init(void)
>> {
>> - atomic_notifier_chain_register(&panic_notifier_list,
>> + atomic_notifier_chain_register(&panic_pre_reboot_list,
>> &led_trigger_panic_nb);
>
> Blinking => should go to the last "post_reboot/loop" list.
> [...]
>> --- a/drivers/misc/ibmasm/heartbeat.c
>> +++ b/drivers/misc/ibmasm/heartbeat.c
>> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
>> static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
>> {
>> suspend_heartbeats = 1;
>> - return 0;
>> + return NOTIFY_DONE;
>> }
>>
>> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
>> +static struct notifier_block panic_notifier = {
>> + .notifier_call = panic_happened,
>> +};
>>
>> void ibmasm_register_panic_notifier(void)
>> {
>> - atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
>> + atomic_notifier_chain_register(&panic_pre_reboot_list,
>> + &panic_notifier);
>
> Same here. Blinking => should go to the last "post_reboot/loop" list.

Ack on both.

IBMasm is not blinking IIUC, but still fits properly the loop list. This
notifier would make a heartbeat mechanism stop, and once it's stopped,
service processor is allowed to reboot - that's my understanding.


Cheers,


Guilherme

2022-05-17 16:05:01

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

On Tue 2022-05-10 10:00:58, Guilherme G. Piccoli wrote:
> On 10/05/2022 09:14, Petr Mladek wrote:
> > [...]
> >> With that said, it's dangerous to use regular spinlocks in such path,
> >> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
> >> This patch fixes that by replacing regular spinlocks with the trylock
> >> safer approach.
> >
> > It seems that the lock is used just to manipulating a list. A super
> > safe solution would be to use the rcu API: rcu_add_rcu() and
> > list_del_rcu() under rcu_read_lock(). The spin lock will not be
> > needed and the list will always be valid.
> >
> > The advantage would be that it will always call members that
> > were successfully added earlier. That said, I am not familiar
> > with pvpanic and am not sure if it is worth it.
> >
> >> It also fixes an old comment (about a long gone framebuffer code) and
> >> the notifier priority - we should execute hypervisor notifiers early,
> >> deferring this way the panic action to the hypervisor, as expected by
> >> the users that are setting up pvpanic.
> >
> > This should be done in a separate patch. It changes the behavior.
> > Also there might be a discussion whether it really should be
> > the maximal priority.
> >
> > Best Regards,
> > Petr
>
> Thanks for the review Petr. Patch was already merged - my goal was to be
> concise, i.e., a patch per driver / module, so the patch kinda fixes
> whatever I think is wrong with the driver with regards panic handling.
>
> Do you think it worth to remove this patch from Greg's branch just to
> split it in 2? Personally I think it's not worth, but opinions are welcome.

No problem. It is not worth the effort.


> About the RCU part, this one really could be a new patch, a good
> improvement patch - it makes sense to me, we can think about that after
> the fixes I guess.

Yup.

Best Regards,
Petr

2022-05-17 18:33:15

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

On 17/05/2022 10:11, Petr Mladek wrote:
> [...]
>> You mentioned 2 cases:
>>
>> (a) Same notifier_list used in different situations;
>>
>> (b) Same *notifier callback* used in different lists;
>>
>> Mine is case (b), right? Can you show me an example of case (a)?
>
> There are many examples of case (a):
>
> [... snip ...]
> These all call the same list/chain in different situations.
> The situation is distinguished by @val.
>
>
>> You can see in the following patches (or grep the kernel) that people are using
>> this identification parameter to determine which kind of OOPS trigger
>> the callback to condition the execution of the function to specific
>> cases.
>
> Could you please show me some existing code for case (b)?
> I am not able to find any except in your patches.
>

Hi Petr, thanks for the examples - I agree with you. In the end, seems
I'm kind of abusing the API. This id is used to distinguish different
situations in which the callback is called, but in the same
"realm"/notifier list.

In my case I have different list calling the same callback and
(ab-)using the id to make distinction. I can rework the patches using
pointer comparison, it's fine =)

So, I'll drop this patch in V2.

> Anyway, the solution in 16th patch is bad, definitely.
> hv_die_panic_notify_crash() uses "val" to disinguish
> both:
>
> + "panic_notifier_list" vs "die_chain"
> + die_val when callen via "die_chain"
>
> The API around "die_chain" API is not aware of enum panic_notifier_val
> and the API using "panic_notifier_list" is not aware of enum die_val.
> As I said, it is mixing apples and oranges and it is error prone.
>

OK, I'll re-work that patch - there's more there to be changed, that one
is complex heheh

Cheers!

2022-05-17 19:12:06

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

On 17/05/2022 07:58, Petr Mladek wrote:
> [...]
>> Thanks for the review Petr. Patch was already merged - my goal was to be
>> concise, i.e., a patch per driver / module, so the patch kinda fixes
>> whatever I think is wrong with the driver with regards panic handling.
>>
>> Do you think it worth to remove this patch from Greg's branch just to
>> split it in 2? Personally I think it's not worth, but opinions are welcome.
>
> No problem. It is not worth the effort.
>

OK, perfect!

Cheers,


Guilherme

2022-05-17 20:09:04

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

On 17/05/2022 14:02, Luck, Tony wrote:
>> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
>> we run the code as-is? Does that make sense to you?
>
> The "skip" option sounds like it needs some special flag associated with
> an entry on the notifier chain. But there are other notifier chains ... so that
> sounds messy to me.
>
> Just all the notifiers in priority order. If any want to take different actions
> based on kdump status, change the code. That seems more flexible than
> an "all or nothing" approach by skipping.
>
> -Tony

I guess I've expressed myself in a poor way - sorry!

What I'm planning to do in the altera_edac notifier is:

if (kdump_is_set)
return;

/* regular code */

In other words: if the kdump is set, this notifier will be effectively a
nop (although it's gonna be called).

Lemme know your thoughts Tony, if that makes sense.
Thanks,


Guilherme

2022-05-17 20:57:27

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
> we run the code as-is? Does that make sense to you?

The "skip" option sounds like it needs some special flag associated with
an entry on the notifier chain. But there are other notifier chains ... so that
sounds messy to me.

Just all the notifiers in priority order. If any want to take different actions
based on kdump status, change the code. That seems more flexible than
an "all or nothing" approach by skipping.

-Tony

2022-05-18 03:20:28

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

On Mon 2022-05-16 13:33:51, Guilherme G. Piccoli wrote:
> On 16/05/2022 13:18, Luck, Tony wrote:
> >> [...]
> > Would it be possible to have some global "kdump is configured + enabled" flag?
> >
> > Then notifiers could make an informed choice on whether to deep dive to
> > get all the possible details (when there is no kdump) or just skim the high
> > level stuff (to maximize chance of getting a successful kdump).
> >
> > -Tony
>
> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?

I like this idea.

One small problem is that kexec_crash_loaded() has valid result
only under kexec_mutex. On the other hand, it should stay true
once loaded so that the small race window should be innocent.

> With that, are you/Petr/Dinh OK in moving it for the info list?

Sounds good to me.

Best Regards,
Petr

2022-05-18 03:26:16

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

On 17/05/2022 11:11, Petr Mladek wrote:
> [...]
>>> Then notifiers could make an informed choice on whether to deep dive to
>>> get all the possible details (when there is no kdump) or just skim the high
>>> level stuff (to maximize chance of getting a successful kdump).
>>>
>>> -Tony
>>
>> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?
>
> I like this idea.
>
> One small problem is that kexec_crash_loaded() has valid result
> only under kexec_mutex. On the other hand, it should stay true
> once loaded so that the small race window should be innocent.
>
>> With that, are you/Petr/Dinh OK in moving it for the info list?
>
> Sounds good to me.
>
> Best Regards,
> Petr

Perfect, I'll do that for V2 then =)

Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
we run the code as-is? Does that make sense to you?

I'll postpone it to run almost in the end of info list (last position is
for panic_print).

Thanks,


Guilherme

2022-05-18 03:37:38

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

On Tue 2022-05-10 13:16:54, Guilherme G. Piccoli wrote:
> On 10/05/2022 12:16, Petr Mladek wrote:
> > [...]
> > Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> > All notifiers will be always called with PANIC_NOTIFIER.
> >
> > The @val parameter is normally used when the same notifier_list
> > is used in different situations.
> >
> > But you are going to use it when the same notifier is used
> > in more lists. This is normally distinguished by the @nh
> > (atomic_notifier_head) parameter.
> >
> > IMHO, it is a bad idea. First, it would confuse people because
> > it does not follow the original design of the parameters.
> > Second, the related code must be touched anyway when
> > the notifier is moved into another list so it does not
> > help much.
> >
> > Or do I miss anything, please?
> >
> > Best Regards,
> > Petr
>
> Hi Petr, thanks for the review.
>
> I'm not strong attached to this patch, so we could drop it and refactor
> the code of next patches to use the @nh as identification - but
> personally, I feel this parameter could be used to identify the list
> that called such function, in other words, what is the event that
> triggered the callback. Some notifiers are even declared with this
> parameter called "ev", like the event that triggers the notifier.
>
>
> You mentioned 2 cases:
>
> (a) Same notifier_list used in different situations;
>
> (b) Same *notifier callback* used in different lists;
>
> Mine is case (b), right? Can you show me an example of case (a)?

There are many examples of case (a):

+ module_notify_list:
MODULE_STATE_LIVE, /* Normal state. */
MODULE_STATE_COMING, /* Full formed, running module_init. */
MODULE_STATE_GOING, /* Going away. */
MODULE_STATE_UNFORMED, /* Still setting it up. */


+ netdev_chain:

NETDEV_UP = 1, /* For now you can't veto a device up/down */
NETDEV_DOWN,
NETDEV_REBOOT, /* Tell a protocol stack a network interface
detected a hardware crash and restarted
- we can use this eg to kick tcp sessions
once done */
NETDEV_CHANGE, /* Notify device state change */
NETDEV_REGISTER,
NETDEV_UNREGISTER,
NETDEV_CHANGEMTU, /* notify after mtu change happened */
NETDEV_CHANGEADDR, /* notify after the address change */
NETDEV_PRE_CHANGEADDR, /* notify before the address change */
NETDEV_GOING_DOWN,
...

+ vt_notifier_list:

#define VT_ALLOCATE 0x0001 /* Console got allocated */
#define VT_DEALLOCATE 0x0002 /* Console will be deallocated */
#define VT_WRITE 0x0003 /* A char got output */
#define VT_UPDATE 0x0004 /* A bigger update occurred */
#define VT_PREWRITE 0x0005 /* A char is about to be written to the console */

+ die_chain:

DIE_OOPS = 1,
DIE_INT3,
DIE_DEBUG,
DIE_PANIC,
DIE_NMI,
DIE_DIE,
DIE_KERNELDEBUG,
...

These all call the same list/chain in different situations.
The situation is distinguished by @val.


> You can see in the following patches (or grep the kernel) that people are using
> this identification parameter to determine which kind of OOPS trigger
> the callback to condition the execution of the function to specific
> cases.

Could you please show me some existing code for case (b)?
I am not able to find any except in your patches.

Anyway, the solution in 16th patch is bad, definitely.
hv_die_panic_notify_crash() uses "val" to disinguish
both:

+ "panic_notifier_list" vs "die_chain"
+ die_val when callen via "die_chain"

The API around "die_chain" API is not aware of enum panic_notifier_val
and the API using "panic_notifier_list" is not aware of enum die_val.
As I said, it is mixing apples and oranges and it is error prone.

Best Regards,
Petr

2022-05-18 04:08:27

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

> What I'm planning to do in the altera_edac notifier is:
>
> if (kdump_is_set)
> return;

Yes. That's what I think should happen.

-Tony

2022-05-18 04:15:47

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

On 10/05/2022 12:28, Petr Mladek wrote:
> [...]
> IMHO, the check of the @self parameter was the proper solution.
>
> "gisb_die_notifier" list uses @val from enum die_val.
> "gisb_panic_notifier" list uses @val from enum panic_notifier_val.
>
> These are unrelated types. It might easily break when
> someone defines the same constant also in enum die_val.
>
> Best Regards,
> Petr

OK Petr, I'll drop this idea for V2 - will just remove the useless
header / prototype then. (CC Florian)


Cheers,


Guilherme