The kernel notifier infrastructure allows function callbacks to be
added in multiple lists, which are then called in the proper time,
like in a reboot or panic event. The panic_notifier_list specifically
contains the callbacks that are executed during a panic event. As any
other notifier list, the panic one has no filtering and all functions
previously registered are executed.
The kdump infrastructure, on the other hand, enables users to set
a crash kernel that is kexec'ed in a panic event, and vmcore/logs
are collected in such crash kernel. When kdump is set, by default
the panic notifiers are ignored - the kexec jumps to the crash kernel
before the list is checked and callbacks executed.
There are some cases though in which kdump users might want to
allow panic notifier callbacks to execute _before_ the kexec to
the crash kernel, for a variety of reasons - for example, users
may think kexec is very prone to fail and want to give a chance
to kmsg dumpers to run (and save logs using pstore), or maybe
some panic notifier is required to properly quiesce some hardware
that must be used to the crash kernel. For these cases, we have
the kernel parameter "crash_kexec_post_notifiers".
But there's a problem: currently it's an "all-or-nothing" situation,
the kdump user choice is either to execute all panic notifiers or
none of them. Given that panic notifiers may increase the risk of a
kdump failure, this is a tough decision and may affect the debug of
hard to reproduce bugs, if for some reason the user choice is to
enable panic notifiers, but kdump then fails.
So, this patch aims to ease this decision: we hereby introduce a filter
for the panic notifier list, in which users may select specifically
which callbacks they wish to run, allowing a safer kdump. The allowlist
should be provided using the parameter "panic_notifier_filter=a,b,..."
where a, b are valid callback names. Invalid symbols are discarded.
Currently up to 16 symbols may be passed in this list, we consider
that this numbers allows enough flexibility (and no matter what
architecture is used, at most 30 panic callbacks are registered).
In an experiment using a qemu x86 virtual machine, by default only
six callbacks are registered in the panic notifier list.
Once a valid callback name is provided in the list, such function
is allowed to be registered/unregistered in the panic_notifier_list;
all other panic callbacks are ignored. Notice that this filter is
only for the panic notifiers and has no effect in the other notifiers.
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---
Hi folks, thanks in advance for reviews/suggestions! Some design
decisions are worth to mention:
(a) I've played first with a list approach instead of a static
array, but noticed that..it'd require relying in memblock allocs,
since we need to set the filter pretty early in boot process. So,
decided to abandon that - a small array is cheap and much easier to
implement.
(b) The size of the array: I've counted < 30 panic notifiers for
any architecture, and in my experiment only six are registered by
default, in a qemu x86 guest. Also, doesn't make sense for the
filter users to add a bunch of callbacks to the list, or else
why are they using the filter in first place? So 16 seems a
good trade-off.
(c) Allowlist vs. denylist: this is something to consider. My
approach is that functions listed in the filter are *allowed*
to execute in the panic notifier, because this seems to me
the most appropriate use case - kdump users might want one or 2
callbacks to execute, and block all the other in a denylist
seems bummer. But I'm open to ideas, of course, if the denylist
approach seems more reasonable.
(d) I've also tested __setup() instead of early_param(), but
seems we don't catch all notifiers by then, so decided to set
the filter really early.
(e) Finally, an alternative would be to check the filter during
the notifier call chain, but I thought it is simple and likely
less invasive to just prevent registering the disallowed functions.
Cheers,
Guilherme
.../admin-guide/kernel-parameters.txt | 14 +++++-
include/linux/panic_notifier.h | 10 +++++
kernel/notifier.c | 42 +++++++++++++++++-
kernel/panic.c | 44 +++++++++++++++++++
4 files changed, 107 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2fba82431efb..2dc4e98823ae 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3727,13 +3727,25 @@
panic_on_warn panic() instead of WARN(). Useful to cause kdump
on a WARN().
+ panic_notifier_filter=[function-list]
+ Limit the functions registered by the panic notifier
+ infrastructure. This allowlist is composed by function
+ names, comma separated (invalid symbols are filtered
+ out). Such functionality is useful for kdump users
+ that set "crash_kexec_post_notifiers" in order to
+ execute panic notifiers, but at the same time wish to
+ have just a subset of notifiers, not all of them. The
+ list of functions is limited to 16 entries currently.
+
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
succeeds in any situation.
Note that this also increases risks of kdump failure,
because some panic notifiers can make the crashed
- kernel more unstable.
+ kernel more unstable. See the "panic_notifier_filter"
+ parameter to have more control of which notifiers to
+ execute.
parkbd.port= [HW] Parallel port number the keyboard adapter is
connected to, default is 0.
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 41e32483d7a7..c84b31c342fd 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -5,6 +5,16 @@
#include <linux/notifier.h>
#include <linux/types.h>
+/*
+ * The panic notifier filter infrastructure - each array element holds a
+ * function address, to be checked against panic_notifier register/unregister
+ * operations; these functions are allowed to be registered in the panic
+ * notifier list. This setting is useful for kdump, since users may want
+ * some panic notifiers to execute, but not all of them.
+ */
+extern unsigned int panic_nf_functions[];
+extern long panic_nf_count;
+
extern struct atomic_notifier_head panic_notifier_list;
extern bool crash_kexec_post_notifiers;
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b8251dc0bc0f..04cb9e956058 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/panic_notifier.h>
#include <linux/kdebug.h>
#include <linux/kprobes.h>
#include <linux/export.h>
@@ -127,12 +128,34 @@ static int notifier_call_chain_robust(struct notifier_block **nl,
* use a spinlock, and call_chain is synchronized by RCU (no locks).
*/
+/*
+ * The following helper is part of the panic notifier filter infrastructure;
+ * users can filter what functions they wish to allow being registered in the
+ * notifier system, restricted to the panic notifier. This is useful for kdump
+ * for example, when some notifiers are relevant but running all of them imposes
+ * risks to the kdump kernel reliability.
+ */
+static bool is_panic_notifier_filtered(struct notifier_block *n)
+{
+ int i;
+
+ for (i = 0; i < panic_nf_count; i++) {
+ if ((unsigned long)(n->notifier_call) == panic_nf_functions[i])
+ return true;
+ }
+
+ return false;
+}
+
/**
* atomic_notifier_chain_register - Add notifier to an atomic notifier chain
* @nh: Pointer to head of the atomic notifier chain
* @n: New entry in notifier chain
*
* Adds a notifier to an atomic notifier chain.
+ * If "panic_notifier_filter" is provided, we hereby filter the
+ * panic_notifier_list and only allow registering the functions
+ * that are present in the filter.
*
* Currently always returns zero.
*/
@@ -140,10 +163,16 @@ int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
struct notifier_block *n)
{
unsigned long flags;
- int ret;
+ int ret = 0;
spin_lock_irqsave(&nh->lock, flags);
+ if (unlikely(panic_nf_count) && nh == &panic_notifier_list)
+ if (!is_panic_notifier_filtered(n))
+ goto panic_filtered_out;
+
ret = notifier_chain_register(&nh->head, n);
+
+panic_filtered_out:
spin_unlock_irqrestore(&nh->lock, flags);
return ret;
}
@@ -155,6 +184,9 @@ EXPORT_SYMBOL_GPL(atomic_notifier_chain_register);
* @n: Entry to remove from notifier chain
*
* Removes a notifier from an atomic notifier chain.
+ * If "panic_notifier_filter" is provided, we hereby filter the
+ * panic_notifier_list and only allow unregistering the functions
+ * that are present in the filter.
*
* Returns zero on success or %-ENOENT on failure.
*/
@@ -162,10 +194,16 @@ int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
struct notifier_block *n)
{
unsigned long flags;
- int ret;
+ int ret = 0;
spin_lock_irqsave(&nh->lock, flags);
+ if (unlikely(panic_nf_count) && nh == &panic_notifier_list)
+ if (!is_panic_notifier_filtered(n))
+ goto panic_filtered_out;
+
ret = notifier_chain_unregister(&nh->head, n);
+
+panic_filtered_out:
spin_unlock_irqrestore(&nh->lock, flags);
synchronize_rcu();
return ret;
diff --git a/kernel/panic.c b/kernel/panic.c
index cefd7d82366f..c23fa2012be1 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -31,6 +31,7 @@
#include <linux/console.h>
#include <linux/bug.h>
#include <linux/ratelimit.h>
+#include <linux/kallsyms.h>
#include <linux/debugfs.h>
#include <asm/sections.h>
@@ -67,6 +68,16 @@ EXPORT_SYMBOL_GPL(panic_timeout);
#define PANIC_PRINT_ALL_PRINTK_MSG 0x00000020
unsigned long panic_print;
+/*
+ * Kernel has currently < 30 panic handlers no matter the arch,
+ * based on some code counting; so 16 items seems a good amount;
+ * users that are filtering panic notifiers shouldn't add all
+ * of them in theory, that doesn't make sense...
+ */
+#define PANIC_NF_MAX 16
+unsigned long panic_nf_functions[PANIC_NF_MAX];
+int panic_nf_count;
+
ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
EXPORT_SYMBOL(panic_notifier_list);
@@ -146,6 +157,39 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
}
EXPORT_SYMBOL(nmi_panic);
+static int __init panic_notifier_filter_setup(char *buf)
+{
+ char *func;
+ unsigned long addr;
+
+ if (!buf)
+ return -EINVAL;
+
+ while (buf) {
+ func = strsep(&buf, ",");
+ addr = kallsyms_lookup_name(func);
+
+ if (!addr) {
+ pr_info("panic_notifier_filter: invalid symbol %s\n", func);
+ continue;
+ }
+
+ if (panic_nf_count < PANIC_NF_MAX) {
+ panic_nf_functions[panic_nf_count] = addr;
+ panic_nf_count++;
+ pr_debug("panic_notifier_filter: added symbol %s\n", func);
+ } else {
+ pr_warn("panic_notifier_filter: exceeded maximum notifiers (%d), aborting\n",
+ PANIC_NF_MAX);
+ panic_nf_count = 0;
+ break;
+ }
+ }
+
+ return 0;
+}
+early_param("panic_notifier_filter", panic_notifier_filter_setup);
+
static void panic_print_sys_info(void)
{
if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
--
2.34.1
On 06/01/2022 13:27, Guilherme G. Piccoli wrote:
> [...]
I'm sorry, I've made a mistake in the header file - just sent a V2.
Please *ignore* this version.
Apologies,
Guilherme
On 07/01/2022 20:07, Andy Shevchenko wrote:
> [...]
> + if (!buf)
> + return -EINVAL;
>
>
> I also implied that you don’t need this condition.
>
> + while ((func = strsep(&buf, ","))) {
> + addr = kallsyms_lookup_name(func);
>
>
>
> +
>
>
> Btw, this is redundant blank line.
>
>
> + if (!addr) {
Thanks Andy, just submitted a V4 that incorporates your suggestions.
Cheers,
Guilherme