2023-10-27 21:47:08

by Krister Johansen

[permalink] [raw]
Subject: [PATCH 0/2] Triggering a softlockup panic during SMP boot

Hi,
This pair of patches was the result of an unsuccessful attempt to set
softlockup_panic before SMP boot. The rationale for wanting to set this
parameter is that some of the VMs that my team runs will occasionally
get stuck while onlining the non-boot processors as part of SMP boot.

In the cases where this happens, we find out about it after the instance
successfully boots; however, the machines can get stuck for tens of
minutes at a time before finally completing onlining processors. Since
we pay per minute for many of these VMs there were two goals for setting
this value on boot: first, fail fast and hope that a subsequent boot
attempt will be successful. Second, a panic is a little easier to keep
track of, especially if we're scraping serial logs after the fact. In
essence, the goal is to trigger the failure earlier and hopefully get
more useful information for further debugging the problem as well.

While testing to make sure that this value was getting correctly set on
boot, I ran into a pair of surprises. First, when the softlockup_panic
parameter was migrated to a sysctl alias, it had the side effect of
setting the parameter value after SMP boot has occurred, when it used to
be set before this. Second, testing revealed that even though the
aliases were being correctly processed, the kernel was reporting the
commandline arguments as unrecognized. This generated a message in the
logs about an unrecognized parameter (even though it was) and the
parameter was passed as an environment variable to init.

The first patch ensures that aliased sysctl arguments are not reported
as unrecognized boot arguments.

The second patch moves the setting of softlockup_panic earlier in boot,
where it can take effect before SMP boot beings.

Thanks,

-K

Krister Johansen (2):
proc: sysctl: prevent aliased sysctls from getting passed to init
watchdog: move softlockup_panic back to early_param

fs/proc/proc_sysctl.c | 8 +++++++-
include/linux/sysctl.h | 6 ++++++
init/main.c | 4 ++++
kernel/watchdog.c | 7 +++++++
4 files changed, 24 insertions(+), 1 deletion(-)

--
2.25.1


2023-10-27 21:47:34

by Krister Johansen

[permalink] [raw]
Subject: [PATCH 1/2] proc: sysctl: prevent aliased sysctls from getting passed to init

The code that checks for unknown boot options is unaware of the sysctl
alias facility, which maps bootparams to sysctl values. If a user sets
an old value that has a valid alias, a message about an invalid
parameter will be printed during boot, and the parameter will get passed
to init. Fix by checking for the existence of aliased parameters in the
unknown boot parameter code. If an alias exists, don't return an error
or pass the value to init.

Signed-off-by: Krister Johansen <[email protected]>
Cc: [email protected]
Fixes: 0a477e1ae21b ("kernel/sysctl: support handling command line aliases")
---
fs/proc/proc_sysctl.c | 7 +++++++
include/linux/sysctl.h | 6 ++++++
init/main.c | 4 ++++
3 files changed, 17 insertions(+)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c88854df0b62..1c9635dddb70 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1592,6 +1592,13 @@ static const char *sysctl_find_alias(char *param)
return NULL;
}

+bool sysctl_is_alias(char *param)
+{
+ const char *alias = sysctl_find_alias(param);
+
+ return alias != NULL;
+}
+
/* Set sysctl value passed on kernel command line. */
static int process_sysctl_arg(char *param, char *val,
const char *unused, void *arg)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 09d7429d67c0..61b40ea81f4d 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -242,6 +242,7 @@ extern void __register_sysctl_init(const char *path, struct ctl_table *table,
extern struct ctl_table_header *register_sysctl_mount_point(const char *path);

void do_sysctl_args(void);
+bool sysctl_is_alias(char *param);
int do_proc_douintvec(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos,
int (*conv)(unsigned long *lvalp,
@@ -287,6 +288,11 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
static inline void do_sysctl_args(void)
{
}
+
+static inline bool sysctl_is_alias(char *param)
+{
+ return false;
+}
#endif /* CONFIG_SYSCTL */

int sysctl_max_threads(struct ctl_table *table, int write, void *buffer,
diff --git a/init/main.c b/init/main.c
index 436d73261810..e24b0780fdff 100644
--- a/init/main.c
+++ b/init/main.c
@@ -530,6 +530,10 @@ static int __init unknown_bootoption(char *param, char *val,
{
size_t len = strlen(param);

+ /* Handle params aliased to sysctls */
+ if (sysctl_is_alias(param))
+ return 0;
+
repair_env_string(param, val);

/* Handle obsolete-style parameters */
--
2.25.1

2023-10-27 21:57:41

by Krister Johansen

[permalink] [raw]
Subject: [PATCH 2/2] watchdog: move softlockup_panic back to early_param

Setting softlockup_panic from do_sysctl_args() causes it to take effect
later in boot. The lockup detector is enabled before SMP is brought
online, but do_sysctl_args runs afterwards. If a user wants to set
softlockup_panic on boot and have it trigger should a softlockup occur
during onlining of the non-boot processors, they could do this prior to
commit f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot
parameters to sysctl aliases"). However, after this commit the value
of softlockup_panic is set too late to be of help for this type of
problem. Restore the prior behavior.

Signed-off-by: Krister Johansen <[email protected]>
Cc: [email protected]
Fixes: f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases")
---
fs/proc/proc_sysctl.c | 1 -
kernel/watchdog.c | 7 +++++++
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 1c9635dddb70..de484195f49f 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1576,7 +1576,6 @@ static const struct sysctl_alias sysctl_aliases[] = {
{"hung_task_panic", "kernel.hung_task_panic" },
{"numa_zonelist_order", "vm.numa_zonelist_order" },
{"softlockup_all_cpu_backtrace", "kernel.softlockup_all_cpu_backtrace" },
- {"softlockup_panic", "kernel.softlockup_panic" },
{ }
};

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index d145305d95fe..5cd6d4e26915 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -283,6 +283,13 @@ static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
static DEFINE_PER_CPU(bool, softlockup_touch_sync);
static unsigned long soft_lockup_nmi_warn;

+static int __init softlockup_panic_setup(char *str)
+{
+ softlockup_panic = simple_strtoul(str, NULL, 0);
+ return 1;
+}
+__setup("softlockup_panic=", softlockup_panic_setup);
+
static int __init nowatchdog_setup(char *str)
{
watchdog_user_enabled = 0;
--
2.25.1

2023-10-27 22:06:18

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH 0/2] Triggering a softlockup panic during SMP boot

On Fri, Oct 27, 2023 at 02:46:26PM -0700, Krister Johansen wrote:
> Hi,
> This pair of patches was the result of an unsuccessful attempt to set
> softlockup_panic before SMP boot. The rationale for wanting to set this
> parameter is that some of the VMs that my team runs will occasionally
> get stuck while onlining the non-boot processors as part of SMP boot.
>
> In the cases where this happens, we find out about it after the instance
> successfully boots; however, the machines can get stuck for tens of
> minutes at a time before finally completing onlining processors. Since
> we pay per minute for many of these VMs there were two goals for setting
> this value on boot: first, fail fast and hope that a subsequent boot
> attempt will be successful. Second, a panic is a little easier to keep
> track of, especially if we're scraping serial logs after the fact. In
> essence, the goal is to trigger the failure earlier and hopefully get
> more useful information for further debugging the problem as well.
>
> While testing to make sure that this value was getting correctly set on
> boot, I ran into a pair of surprises. First, when the softlockup_panic
> parameter was migrated to a sysctl alias, it had the side effect of
> setting the parameter value after SMP boot has occurred, when it used to
> be set before this. Second, testing revealed that even though the
> aliases were being correctly processed, the kernel was reporting the
> commandline arguments as unrecognized. This generated a message in the
> logs about an unrecognized parameter (even though it was) and the
> parameter was passed as an environment variable to init.
>
> The first patch ensures that aliased sysctl arguments are not reported
> as unrecognized boot arguments.
>
> The second patch moves the setting of softlockup_panic earlier in boot,
> where it can take effect before SMP boot beings.

Sounds all great but I only got the cover letter, so may be resend?

Luis

2023-10-27 23:08:15

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH 0/2] Triggering a softlockup panic during SMP boot

On Fri, Oct 27, 2023 at 03:04:56PM -0700, Luis Chamberlain wrote:
> On Fri, Oct 27, 2023 at 02:46:26PM -0700, Krister Johansen wrote:
> > Hi,
> > This pair of patches was the result of an unsuccessful attempt to set
> > softlockup_panic before SMP boot. The rationale for wanting to set this
> > parameter is that some of the VMs that my team runs will occasionally
> > get stuck while onlining the non-boot processors as part of SMP boot.
> >
> > In the cases where this happens, we find out about it after the instance
> > successfully boots; however, the machines can get stuck for tens of
> > minutes at a time before finally completing onlining processors. Since
> > we pay per minute for many of these VMs there were two goals for setting
> > this value on boot: first, fail fast and hope that a subsequent boot
> > attempt will be successful. Second, a panic is a little easier to keep
> > track of, especially if we're scraping serial logs after the fact. In
> > essence, the goal is to trigger the failure earlier and hopefully get
> > more useful information for further debugging the problem as well.
> >
> > While testing to make sure that this value was getting correctly set on
> > boot, I ran into a pair of surprises. First, when the softlockup_panic
> > parameter was migrated to a sysctl alias, it had the side effect of
> > setting the parameter value after SMP boot has occurred, when it used to
> > be set before this. Second, testing revealed that even though the
> > aliases were being correctly processed, the kernel was reporting the
> > commandline arguments as unrecognized. This generated a message in the
> > logs about an unrecognized parameter (even though it was) and the
> > parameter was passed as an environment variable to init.
> >
> > The first patch ensures that aliased sysctl arguments are not reported
> > as unrecognized boot arguments.
> >
> > The second patch moves the setting of softlockup_panic earlier in boot,
> > where it can take effect before SMP boot beings.
>
> Sounds all great but I only got the cover letter, so may be resend?

Apologies, I'm not sure quite what went wrong there. I've resent the
patches to the people in the To: of the original messages, in an attempt
to avoid sending copies to everybody a second time.

The entire set seems to have made it to lore:

https://lore.kernel.org/linux-fsdevel/ZTw0CACF3jtT3%[email protected]/T/#r831972d73aad653c3b732e4e36e743cd53673847

If you still haven't got the copies, please let me know and I'll see
if there's something else I can do to get them to you.

Sorry about this. :/

-K

2023-11-01 19:11:19

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH 0/2] Triggering a softlockup panic during SMP boot

On Fri, Oct 27, 2023 at 02:46:26PM -0700, Krister Johansen wrote:
> Hi,
> This pair of patches was the result of an unsuccessful attempt to set
> softlockup_panic before SMP boot. The rationale for wanting to set this
> parameter is that some of the VMs that my team runs will occasionally
> get stuck while onlining the non-boot processors as part of SMP boot.
>
> In the cases where this happens, we find out about it after the instance
> successfully boots; however, the machines can get stuck for tens of
> minutes at a time before finally completing onlining processors. Since
> we pay per minute for many of these VMs there were two goals for setting
> this value on boot: first, fail fast and hope that a subsequent boot
> attempt will be successful. Second, a panic is a little easier to keep
> track of, especially if we're scraping serial logs after the fact. In
> essence, the goal is to trigger the failure earlier and hopefully get
> more useful information for further debugging the problem as well.
>
> While testing to make sure that this value was getting correctly set on
> boot, I ran into a pair of surprises. First, when the softlockup_panic
> parameter was migrated to a sysctl alias, it had the side effect of
> setting the parameter value after SMP boot has occurred, when it used to
> be set before this. Second, testing revealed that even though the
> aliases were being correctly processed, the kernel was reporting the
> commandline arguments as unrecognized. This generated a message in the
> logs about an unrecognized parameter (even though it was) and the
> parameter was passed as an environment variable to init.
>
> The first patch ensures that aliased sysctl arguments are not reported
> as unrecognized boot arguments.
>
> The second patch moves the setting of softlockup_panic earlier in boot,
> where it can take effect before SMP boot beings.

Thanks! Looks good, merged and will push to Linus soon.

Luis