Subject: [PATCH RFC 0/5] dump_stack: Allow runtime updates of the hardware description

When the kernel emits a stack trace, typically it includes a hardware
description string, e.g.

Kernel panic - not syncing: sysrq triggered crash
CPU: 6 PID: 46433 Comm: bash Tainted: G W 6.7.0-rc2+ #83
> Hardware name: IBM,9040-MR9 POWER9 (architected) 0x4e2102 0xf000005 of:IBM,FW950.01 (VM950_047) hv:phyp pSeries
Call Trace:
dump_stack_lvl+0xc4/0x170 (unreliable)
panic+0x39c/0x584
sysrq_handle_crash+0x80/0xe0
__handle_sysrq+0x208/0x4bc
[...]

This string is a statically allocated buffer populated during boot by
arch code calling dump_stack_set_arch_desc(). For most platforms this
is sufficient.

But the string may become inaccurate on the IBM PowerVM platform due
to live migration between machine models and firmware versions. Stack
dumps emitted after a migration reflect the machine on which the
kernel booted, not necessarily the machine on which it is currently
running. This is potentially confusing for anyone investigating
kernel issues on the platform.

To address this, this series introduces a new function that safely
updates the hardware description string and updates the powerpc
pseries platform code to call it after a migration. The series also
includes changes addressing minor latent issues identified during the
implementation.

Platforms which do not need the new functionality remain unchanged.

For this initial version at least, the powerpc/pseries part includes
some "self-test" code that 1. verifies that reconstructing the
hardware description string late in boot matches the one that was
built earlier, and 2. fully exercises the update path before any
migrations occur. This could be dropped or made configurable in the
future.

Signed-off-by: Nathan Lynch <[email protected]>
---
Nathan Lynch (5):
dump_stack: Make arch description buffer __ro_after_init
dump_stack: Allow update of arch description string at runtime
powerpc/prom: Add CPU info to hardware description string later
powerpc/pseries: Prepare pseries_add_hw_description() for runtime use
powerpc/pseries: Update hardware description string after migration

arch/powerpc/kernel/prom.c | 12 +++--
arch/powerpc/platforms/pseries/mobility.c | 5 ++
arch/powerpc/platforms/pseries/pseries.h | 1 +
arch/powerpc/platforms/pseries/setup.c | 80 +++++++++++++++++++++++++++++--
include/linux/printk.h | 5 ++
lib/dump_stack.c | 57 ++++++++++++++++++++--
6 files changed, 146 insertions(+), 14 deletions(-)
---
base-commit: 44a1aad2fe6c10bfe0589d8047057b10a4c18a19
change-id: 20240111-update-dump-stack-arch-str-7f0880d23f30

Best regards,
--
Nathan Lynch <[email protected]>



Subject: [PATCH RFC 4/5] powerpc/pseries: Prepare pseries_add_hw_description() for runtime use

From: Nathan Lynch <[email protected]>

pseries_add_hw_description() will be used after boot to update the
hardware description string emitted in stack dumps. Remove the __init
and make it take a seq_buf * parameter instead of referencing
ppc_hw_desc directly.

Signed-off-by: Nathan Lynch <[email protected]>
---
arch/powerpc/platforms/pseries/setup.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index ecea85c74c43..9ae1951f8312 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -1007,7 +1007,7 @@ static void __init pSeries_cmo_feature_init(void)
pr_debug(" <- fw_cmo_feature_init()\n");
}

-static void __init pseries_add_hw_description(void)
+static void pseries_add_hw_description(struct seq_buf *sb)
{
struct device_node *dn;
const char *s;
@@ -1015,7 +1015,7 @@ static void __init pseries_add_hw_description(void)
dn = of_find_node_by_path("/openprom");
if (dn) {
if (of_property_read_string(dn, "model", &s) == 0)
- seq_buf_printf(&ppc_hw_desc, "of:%s ", s);
+ seq_buf_printf(sb, "of:%s ", s);

of_node_put(dn);
}
@@ -1023,7 +1023,7 @@ static void __init pseries_add_hw_description(void)
dn = of_find_node_by_path("/hypervisor");
if (dn) {
if (of_property_read_string(dn, "compatible", &s) == 0)
- seq_buf_printf(&ppc_hw_desc, "hv:%s ", s);
+ seq_buf_printf(sb, "hv:%s ", s);

of_node_put(dn);
return;
@@ -1031,7 +1031,7 @@ static void __init pseries_add_hw_description(void)

if (of_property_read_bool(of_root, "ibm,powervm-partition") ||
of_property_read_bool(of_root, "ibm,fw-net-version"))
- seq_buf_printf(&ppc_hw_desc, "hv:phyp ");
+ seq_buf_printf(sb, "hv:phyp ");
}

/*
@@ -1041,7 +1041,7 @@ static void __init pseries_init(void)
{
pr_debug(" -> pseries_init()\n");

- pseries_add_hw_description();
+ pseries_add_hw_description(&ppc_hw_desc);

#ifdef CONFIG_HVC_CONSOLE
if (firmware_has_feature(FW_FEATURE_LPAR))

--
2.43.0


Subject: [PATCH RFC 2/5] dump_stack: Allow update of arch description string at runtime

From: Nathan Lynch <[email protected]>

The IBM PowerVM platform (targeted by powerpc/pseries) exposes the
physical machine model and firmware version to partitions (guests),
and this information is used to populate the arch description string,
e.g.

IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 \
of:IBM,FW860.50 (SV860_146) hv:phyp pSeries

The platform supports live migration of partitions between different
machine models and firmware versions, so the arch description string
set at boot can become inaccurate, potentially misleading anyone who's
analyzing stack traces produced after a migration.

Introduce a RCU-guarded pointer to the current arch description
string, initializing it to the static buffer populated at boot. Add to
dump_stack_print_info() a RCU read-side critical section that accesses
the buffer through this pointer. The majority of architectures which
don't need to update the string after boot incur only an additional
indirection.

As for platforms which do need that ability, they can use
dump_stack_update_arch_desc(), which allocates and formats a new
buffer, updates the pointer, and if appropriate frees the previous
buffer.

Signed-off-by: Nathan Lynch <[email protected]>
---
include/linux/printk.h | 5 +++++
lib/dump_stack.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 8ef499ab3c1e..6138ae019d2a 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -187,6 +187,7 @@ u32 log_buf_len_get(void);
void log_buf_vmcoreinfo_setup(void);
void __init setup_log_buf(int early);
__printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
+__printf(1, 2) void dump_stack_update_arch_desc(const char *fmt, ...);
void dump_stack_print_info(const char *log_lvl);
void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
@@ -253,6 +254,10 @@ static inline __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...)
{
}

+static inline __printf(1, 2) void dump_stack_update_arch_desc(const char *fmt, ...)
+{
+}
+
static inline void dump_stack_print_info(const char *log_lvl)
{
}
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 1057f102f6f2..bd497e7797ee 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -8,15 +8,18 @@
#include <linux/buildid.h>
#include <linux/cache.h>
#include <linux/export.h>
+#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
#include <linux/smp.h>
+#include <linux/spinlock.h>
#include <linux/atomic.h>
#include <linux/kexec.h>
#include <linux/utsname.h>
#include <linux/stop_machine.h>

static char dump_stack_arch_desc_str[128] __ro_after_init;
+static const char *dump_stack_arch_desc_ptr = dump_stack_arch_desc_str;

/**
* dump_stack_set_arch_desc - set arch-specific str to show with task dumps
@@ -28,7 +31,7 @@ static char dump_stack_arch_desc_str[128] __ro_after_init;
* arch wants to make use of such an ID string, it should initialize this
* as soon as possible during boot.
*/
-void __init dump_stack_set_arch_desc(const char *fmt, ...)
+void dump_stack_set_arch_desc(const char *fmt, ...)
{
va_list args;

@@ -38,6 +41,45 @@ void __init dump_stack_set_arch_desc(const char *fmt, ...)
va_end(args);
}

+/**
+ * dump_stack_update_arch_desc() - Update the arch description string at runtime.
+ * @fmt: printf-style format string
+ * @...: arguments for the format string
+ *
+ * A runtime counterpart of dump_stack_set_arch_desc(). Arch code
+ * should use this when the arch description set at boot potentially
+ * has become inaccurate, such as after a guest migration.
+ *
+ * Context: May sleep.
+ */
+void dump_stack_update_arch_desc(const char *fmt, ...)
+{
+ static DEFINE_SPINLOCK(arch_desc_update_lock);
+ const char *old;
+ const char *new;
+ va_list args;
+
+ va_start(args, fmt);
+ new = kvasprintf(GFP_KERNEL, fmt, args);
+ va_end(args);
+
+ if (!new)
+ return;
+
+ spin_lock(&arch_desc_update_lock);
+ old = rcu_replace_pointer(dump_stack_arch_desc_ptr, new,
+ lockdep_is_held(&arch_desc_update_lock));
+ spin_unlock(&arch_desc_update_lock);
+
+ /*
+ * Avoid freeing the static buffer initialized during boot.
+ */
+ if (old == dump_stack_arch_desc_str)
+ return;
+
+ kfree_rcu_mightsleep(old);
+}
+
#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
#define BUILD_ID_FMT " %20phN"
#define BUILD_ID_VAL vmlinux_build_id
@@ -55,6 +97,8 @@ void __init dump_stack_set_arch_desc(const char *fmt, ...)
*/
void dump_stack_print_info(const char *log_lvl)
{
+ const char *arch_str;
+
printk("%sCPU: %d PID: %d Comm: %.20s %s%s %s %.*s" BUILD_ID_FMT "\n",
log_lvl, raw_smp_processor_id(), current->pid, current->comm,
kexec_crash_loaded() ? "Kdump: loaded " : "",
@@ -63,9 +107,11 @@ void dump_stack_print_info(const char *log_lvl)
(int)strcspn(init_utsname()->version, " "),
init_utsname()->version, BUILD_ID_VAL);

- if (dump_stack_arch_desc_str[0] != '\0')
- printk("%sHardware name: %s\n",
- log_lvl, dump_stack_arch_desc_str);
+ rcu_read_lock();
+ arch_str = rcu_dereference(dump_stack_arch_desc_ptr);
+ if (arch_str[0] != '\0')
+ printk("%sHardware name: %s\n", log_lvl, arch_str);
+ rcu_read_unlock();

print_worker_info(log_lvl, current);
print_stop_info(log_lvl, current);

--
2.43.0