2022-06-14 14:00:23

by Laurent Dufour

Subject: [PATCH v2 0/4] Extending NMI watchdog during LPM

When a partition is migrated, it becomes active as soon as it arrives on
the destination node, but much of its memory still has to be transferred
from the source node.

How much depends on the activity in the partition, but the more CPUs the
partition has, the more memory is likely to remain to be transferred. This
causes latency when accessing pages that have not been transferred yet, and
for large partitions it often triggers the NMI watchdog.

When the NMI watchdog fires, it dumps the stack of the CPU at the point
where it appears to be stuck. In this case the dump carries little
information, since the stall can happen during any kernel memory access.

In addition, the NMI interrupt mechanism is not safe and can trigger a
system dump if the interrupt is taken while MSR[RI]=0.

Depending on the LPAR size and load, it may be useful to extend the NMI
watchdog timeout during the LPM.

This is configurable through sysctl with the newly introduced variable
(specific to powerpc) lpm_nmi_watchdog_factor. This value represents the
percentage added to watchdog_thresh to set the NMI watchdog timeout during
an LPM.
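
As a worked example of that percentage, the arithmetic can be sketched in
plain user-space C. The helper name lpm_effective_thresh() is hypothetical
and not part of this series; it only mirrors the formula that patch 3/4 adds
to watchdog_calc_timeouts().

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: effective NMI watchdog
 * threshold in seconds, given the base threshold and the percentage
 * factor that is added on top of it.
 */
uint64_t lpm_effective_thresh(uint64_t thresh_s, uint64_t factor_pct)
{
	/* The factor is a percentage of the base threshold, added to it. */
	return thresh_s + (factor_pct * thresh_s) / 100;
}
```

With watchdog_thresh at 10 seconds, a factor of 200 gives
10 + (200 * 10) / 100 = 30 seconds, and a factor of 0 leaves the timeout at
10 seconds.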

Changes in v2:
- introduce a timer factor.

v1:
[PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
https://lore.kernel.org/linuxppc-dev/[email protected]/#r

Laurent Dufour (4):
powerpc/mobility: Wait for memory transfer to complete
watchdog: export watchdog_mutex and lockup_detector_reconfigure
powerpc/watchdog: introduce a LPM factor
pseries/mobility: Set NMI watchdog factor during LPM

Documentation/admin-guide/sysctl/kernel.rst | 12 +++
arch/powerpc/include/asm/nmi.h | 2 +
arch/powerpc/kernel/watchdog.c | 22 ++++-
arch/powerpc/platforms/pseries/mobility.c | 90 ++++++++++++++++++++-
include/linux/nmi.h | 3 +
kernel/watchdog.c | 6 +-
6 files changed, 129 insertions(+), 6 deletions(-)

--
2.36.1


2022-06-14 14:00:27

by Laurent Dufour

Subject: [PATCH v2 1/4] powerpc/mobility: Wait for memory transfer to complete

In pseries_migrate_partition(), loop until the memory transfer is
complete. This way the calling drmgr process will not exit early,
so callbacks run only once the migration is fully completed.

If the VASI state is read after the hypervisor has completed the
migration, the HCALL returns H_PARAMETER. We can safely assume that
the memory transfer is complete if this happens.

This will also allow the NMI watchdog state to be managed in the next
commits.

Signed-off-by: Laurent Dufour <[email protected]>
---
arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 78f3f74c7056..179bbd4ae881 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
return ret;
}

+static void wait_for_vasi_session_completed(u64 handle)
+{
+ unsigned long state = 0;
+ int ret;
+
+ pr_info("waiting for memory transfer to complete...\n");
+ /*
+ * Wait for transition from H_VASI_RESUMED to
+ * H_VASI_COMPLETED. Treat anything else as an error.
+ */
+ while (true) {
+ ret = poll_vasi_state(handle, &state);
+
+ /*
+ * If the memory transfer is already complete and the migration
+ * has been cleaned up by the hypervisor, H_PARAMETER is returned,
+ * which is translated to -EINVAL by poll_vasi_state().
+ */
+ if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
+ pr_info("memory transfer completed.\n");
+ break;
+ }
+
+ if (ret) {
+ pr_err("H_VASI_STATE return error (%d)\n", ret);
+ break;
+ }
+
+ if (state != H_VASI_RESUMED) {
+ pr_err("unexpected H_VASI_STATE result %lu\n", state);
+ break;
+ }
+
+ msleep(500);
+ }
+}
+
static void prod_single(unsigned int target_cpu)
{
long hvrc;
@@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
vas_migration_handler(VAS_SUSPEND);

ret = pseries_suspend(handle);
- if (ret == 0)
+ if (ret == 0) {
post_mobility_fixup();
- else
+ wait_for_vasi_session_completed(handle);
+ } else
pseries_cancel_migration(handle, ret);

vas_migration_handler(VAS_RESUME);
--
2.36.1

2022-06-14 14:01:25

by Laurent Dufour

Subject: [PATCH v2 4/4] pseries/mobility: Set NMI watchdog factor during LPM

During an LPM, while the memory transfer is in progress on the arrival side,
some latency is generated when accessing pages that have not been
transferred yet. Thus, the NMI watchdog may be triggered too frequently,
which increases the risk of taking an NMI interrupt in a bad place in the
kernel, leading to a kernel panic.

Disabling the hard lockup watchdog until the memory transfer is complete
could be too strong a workaround: some users would want this timeout to
eventually trigger if the system hangs, even during LPM.

Introduce a new sysctl variable, lpm_nmi_watchdog_factor. It allows a factor
to be applied to the NMI watchdog timeout during an LPM. Just before the
CPUs are stopped for the switchover sequence, the NMI watchdog timer is set
to watchdog_thresh + factor%.

A value of 0 has no effect. The default value is 200, meaning that the NMI
watchdog is set to 30s during LPM (based on a 10s watchdog_thresh value).
Once the memory transfer is complete, the factor is reset to 0.

Setting this value to a high number is like disabling the NMI watchdog
during an LPM.

Signed-off-by: Laurent Dufour <[email protected]>
---
Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
arch/powerpc/platforms/pseries/mobility.c | 48 +++++++++++++++++++++
2 files changed, 60 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index ddccd1077462..53701ed671de 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -485,6 +485,18 @@ When ``kptr_restrict`` is set to 2, kernel pointers printed using
%pK will be replaced with 0s regardless of privileges.


+lpm_nmi_watchdog_factor (PPC only)
+==================================
+
+Factor applied to the NMI watchdog timeout (only when ``nmi_watchdog`` is
+set to 1). This factor represents the percentage added to
+``watchdog_thresh`` when calculating the NMI watchdog timeout during a
+LPM. The soft lockup timeout is not impacted.
+
+A value of 0 means no change. The default value is 200, meaning the NMI
+watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
+
+
modprobe
========

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 179bbd4ae881..4284ceaf9060 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -48,6 +48,39 @@ struct update_props_workarea {
#define MIGRATION_SCOPE (1)
#define PRRN_SCOPE -2

+#ifdef CONFIG_PPC_WATCHDOG
+static unsigned int lpm_nmi_wd_factor = 200;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table lpm_nmi_wd_factor_ctl_table[] = {
+ {
+ .procname = "lpm_nmi_watchdog_factor",
+ .data = &lpm_nmi_wd_factor,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ },
+ {}
+};
+static struct ctl_table lpm_nmi_wd_factor_sysctl_root[] = {
+ {
+ .procname = "kernel",
+ .mode = 0555,
+ .child = lpm_nmi_wd_factor_ctl_table,
+ },
+ {}
+};
+
+static int __init register_lpm_nmi_wd_factor_sysctl(void)
+{
+ register_sysctl_table(lpm_nmi_wd_factor_sysctl_root);
+
+ return 0;
+}
+device_initcall(register_lpm_nmi_wd_factor_sysctl);
+#endif /* CONFIG_SYSCTL */
+#endif /* CONFIG_PPC_WATCHDOG */
+
static int mobility_rtas_call(int token, char *buf, s32 scope)
{
int rc;
@@ -702,6 +735,7 @@ static int pseries_suspend(u64 handle)
static int pseries_migrate_partition(u64 handle)
{
int ret;
+ unsigned int factor = lpm_nmi_wd_factor;

ret = wait_for_vasi_session_suspending(handle);
if (ret)
@@ -709,6 +743,13 @@ static int pseries_migrate_partition(u64 handle)

vas_migration_handler(VAS_SUSPEND);

+#ifdef CONFIG_PPC_WATCHDOG
+ if (factor) {
+ pr_info("Set the NMI watchdog factor to %u%%\n", factor);
+ watchdog_nmi_set_lpm_factor(factor);
+ }
+#endif /* CONFIG_PPC_WATCHDOG */
+
ret = pseries_suspend(handle);
if (ret == 0) {
post_mobility_fixup();
@@ -716,6 +757,13 @@ static int pseries_migrate_partition(u64 handle)
} else
pseries_cancel_migration(handle, ret);

+#ifdef CONFIG_PPC_WATCHDOG
+ if (factor) {
+ pr_info("Restoring NMI watchdog timer\n");
+ watchdog_nmi_set_lpm_factor(0);
+ }
+#endif /* CONFIG_PPC_WATCHDOG */
+
vas_migration_handler(VAS_RESUME);

return ret;
--
2.36.1

2022-06-14 14:01:42

by Laurent Dufour

Subject: [PATCH v2 3/4] powerpc/watchdog: introduce a LPM factor

Introduce a factor that applies to the NMI watchdog timeout.

This factor is a percentage added to the watchdog_thresh value. The value is
set under the protection of watchdog_mutex, and
lockup_detector_reconfigure() is called to recompute wd_panic_timeout_tb.

Once the factor is set, it remains in effect until it is set back to 0,
which means no impact.

Signed-off-by: Laurent Dufour <[email protected]>
---
arch/powerpc/include/asm/nmi.h | 2 ++
arch/powerpc/kernel/watchdog.c | 22 +++++++++++++++++++++-
2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index ea0e487f87b1..4eb894ef12a3 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -5,8 +5,10 @@
#ifdef CONFIG_PPC_WATCHDOG
extern void arch_touch_nmi_watchdog(void);
long soft_nmi_interrupt(struct pt_regs *regs);
+void watchdog_nmi_set_lpm_factor(u64 factor);
#else
static inline void arch_touch_nmi_watchdog(void) {}
+static void watchdog_nmi_set_lpm_factor(u64 factor) {}
#endif

#ifdef CONFIG_NMI_IPI
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 7d28b9553654..faaf5ba14d69 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
static cpumask_t wd_smp_cpus_stuck;
static u64 wd_smp_last_reset_tb;

+#ifdef CONFIG_PPC_PSERIES
+static u64 wd_factor;
+#endif
+
/*
* Try to take the exclusive watchdog action / NMI IPI / printing lock.
* wd_smp_lock must be held. If this fails, we should return and wait
@@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)

static void watchdog_calc_timeouts(void)
{
- wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
+ u64 threshold = watchdog_thresh;
+
+#ifdef CONFIG_PPC_PSERIES
+ threshold += (wd_factor * threshold) / 100;
+#endif
+
+ wd_panic_timeout_tb = threshold * ppc_tb_freq;

/* Have the SMP detector trigger a bit later */
wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
@@ -570,3 +580,13 @@ int __init watchdog_nmi_probe(void)
}
return 0;
}
+
+#ifdef CONFIG_PPC_PSERIES
+void watchdog_nmi_set_lpm_factor(u64 factor)
+{
+ mutex_lock(&watchdog_mutex);
+ wd_factor = factor;
+ lockup_detector_reconfigure();
+ mutex_unlock(&watchdog_mutex);
+}
+#endif
--
2.36.1

2022-06-14 14:38:57

by Laurent Dufour

Subject: [PATCH v2 2/4] watchdog: export watchdog_mutex and lockup_detector_reconfigure

In some circumstances it may be useful to reconfigure the watchdog
from inside the kernel.

On PowerPC, this may be helpful before and after an LPAR migration (LPM) is
initiated: because migration introduces some latency, the watchdog, and
especially the NMI watchdog, is expected to be triggered during this
operation. Reconfiguring the watchdog prevents this from happening too
frequently during LPM.

The watchdog_mutex is exported to allow some variables to be changed under
its protection and prevent any conflict.
The lockup_detector_reconfigure() function is exported and is expected to
be called under the protection of watchdog_mutex.

Signed-off-by: Laurent Dufour <[email protected]>
---
include/linux/nmi.h | 3 +++
kernel/watchdog.c | 6 +++---
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..84300fb0f90a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -122,6 +122,9 @@ int watchdog_nmi_probe(void);
int watchdog_nmi_enable(unsigned int cpu);
void watchdog_nmi_disable(unsigned int cpu);

+extern struct mutex watchdog_mutex;
+void lockup_detector_reconfigure(void);
+
/**
* touch_nmi_watchdog - restart NMI watchdog timeout.
*
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 20a7a55e62b6..0a67a2dd1258 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -27,7 +27,7 @@
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>

-static DEFINE_MUTEX(watchdog_mutex);
+DEFINE_MUTEX(watchdog_mutex);

#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
# define WATCHDOG_DEFAULT (SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
@@ -541,7 +541,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
return 0;
}

-static void lockup_detector_reconfigure(void)
+void lockup_detector_reconfigure(void)
{
cpus_read_lock();
watchdog_nmi_stop();
@@ -583,7 +583,7 @@ static __init void lockup_detector_setup(void)
}

#else /* CONFIG_SOFTLOCKUP_DETECTOR */
-static void lockup_detector_reconfigure(void)
+void lockup_detector_reconfigure(void)
{
cpus_read_lock();
watchdog_nmi_stop();
--
2.36.1

2022-06-21 17:02:01

by Nathan Lynch

Subject: Re: [PATCH v2 1/4] powerpc/mobility: Wait for memory transfer to complete

Laurent Dufour <[email protected]> writes:

> In pseries_migrate_partition(), loop until the memory transfer is
> complete. This way the calling drmgr process will not exit early,
> so callbacks run only once the migration is fully completed.
>
> If the VASI state is read after the hypervisor has completed the
> migration, the HCALL returns H_PARAMETER. We can safely assume that
> the memory transfer is complete if this happens.
>
> This will also allow the NMI watchdog state to be managed in the next
> commits.
>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 78f3f74c7056..179bbd4ae881 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
> return ret;
> }
>
> +static void wait_for_vasi_session_completed(u64 handle)
> +{
> + unsigned long state = 0;
> + int ret;
> +
> + pr_info("waiting for memory transfer to complete...\n");
> + /*
> + * Wait for transition from H_VASI_RESUMED to
> + * H_VASI_COMPLETED. Treat anything else as an error.

"Treat anything else as an error" does not match the code since there is
a special case for when the stream handle has expired. So that should be
dropped from this comment.

> + */
> + while (true) {
> + ret = poll_vasi_state(handle, &state);
> +
> + /*
> + * If the memory transfer is already complete and the migration
> + * has been cleaned up by the hypervisor, H_PARAMETER is returned,
> + * which is translated to -EINVAL by poll_vasi_state().
> + */
> + if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
> + pr_info("memory transfer completed.\n");
> + break;
> + }
> +
> + if (ret) {
> + pr_err("H_VASI_STATE return error (%d)\n", ret);
> + break;
> + }
> +
> + if (state != H_VASI_RESUMED) {
> + pr_err("unexpected H_VASI_STATE result %lu\n", state);
> + break;
> + }
> +
> + msleep(500);
> + }
> +}
> +
> static void prod_single(unsigned int target_cpu)
> {
> long hvrc;
> @@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
> vas_migration_handler(VAS_SUSPEND);
>
> ret = pseries_suspend(handle);
> - if (ret == 0)
> + if (ret == 0) {
> post_mobility_fixup();
> - else
> + wait_for_vasi_session_completed(handle);
> + } else
> pseries_cancel_migration(handle, ret);
>
> vas_migration_handler(VAS_RESUME);

While this may noticeably lengthen the time it takes for drmgr to return
from the system call, it seems like the right thing to do. The migration
should not be considered complete until the VASI stream poll yields a
"Complete" status. And we still need to add code to send gratuitous ARPs
through ibmveth interfaces while waiting for the transition, which would
likely build on this.

I believe the HMC and associated software can cope with the drmgr
command taking a longer time to return in cases where the partition
memory needs a while to completely sync to the destination.
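
For reference, the control flow of the polling loop can be modelled in user
space as below. This is only a sketch under stated assumptions: the state
values, the callback types, the fake pollers and the -EIO return for an
unexpected state are all hypothetical, and unlike the patch (whose function
returns void) the sketch returns a status so the outcome can be checked.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the hypervisor VASI states; the numeric values are assumed. */
enum vasi_state { VASI_RESUMED = 5, VASI_COMPLETED = 6 };

/* Hypothetical poller: returns 0 and fills *state, or a negative errno. */
typedef int (*poll_fn)(unsigned long *state);

/*
 * Returns 0 once the transfer is known to be complete, the poller's error
 * on failure, or -EIO (an arbitrary choice) on an unexpected state.
 */
int wait_for_completed(poll_fn poll_state, void (*sleep_fn)(void))
{
	unsigned long state = 0;

	while (true) {
		int ret = poll_state(&state);

		/* -EINVAL: the session is already gone, so treat it as done. */
		if (ret == -EINVAL || (ret == 0 && state == VASI_COMPLETED))
			return 0;
		if (ret)
			return ret;
		if (state != VASI_RESUMED)
			return -EIO;

		sleep_fn();	/* the patch sleeps with msleep(500) here */
	}
}

/* Fake pollers for experimentation: RESUMED twice, then COMPLETED. */
static int fake_calls;

int fake_poll_completes(unsigned long *state)
{
	*state = (fake_calls++ < 2) ? VASI_RESUMED : VASI_COMPLETED;
	return 0;
}

/* Fake poller modelling an expired stream handle. */
int fake_poll_expired(unsigned long *state)
{
	(void)state;
	return -EINVAL;
}

void no_sleep(void)
{
}
```

Both the "Completed" state and the expired-handle case end the loop
successfully; every other outcome ends it with an error.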

Apart from the small critique on the comment in
wait_for_vasi_session_completed(), this looks fine to me.

Reviewed-by: Nathan Lynch <[email protected]>

2022-06-22 09:42:03

by kernel test robot

Subject: Re: [PATCH v2 3/4] powerpc/watchdog: introduce a LPM factor

Hi Laurent,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linus/master v5.19-rc3 next-20220621]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Laurent-Dufour/Extending-NMI-watchdog-during-LPM/20220614-215716
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-mgcoge_defconfig (https://download.01.org/0day-ci/archive/20220622/[email protected]/config)
compiler: powerpc-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/368bca30c0737461c2ed32a788293018c25fc9c7
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Laurent-Dufour/Extending-NMI-watchdog-during-LPM/20220614-215716
git checkout 368bca30c0737461c2ed32a788293018c25fc9c7
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/kernel/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from arch/powerpc/kernel/traps.c:69:
>> arch/powerpc/include/asm/nmi.h:11:13: error: 'watchdog_nmi_set_lpm_factor' defined but not used [-Werror=unused-function]
11 | static void watchdog_nmi_set_lpm_factor(u64 factor) {}
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors


vim +/watchdog_nmi_set_lpm_factor +11 arch/powerpc/include/asm/nmi.h

4
5 #ifdef CONFIG_PPC_WATCHDOG
6 extern void arch_touch_nmi_watchdog(void);
7 long soft_nmi_interrupt(struct pt_regs *regs);
8 void watchdog_nmi_set_lpm_factor(u64 factor);
9 #else
10 static inline void arch_touch_nmi_watchdog(void) {}
> 11 static void watchdog_nmi_set_lpm_factor(u64 factor) {}
12 #endif
13

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-06-23 19:18:23

by Nathan Lynch

Subject: Re: [PATCH v2 4/4] pseries/mobility: Set NMI watchdog factor during LPM

Laurent Dufour <[email protected]> writes:
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 179bbd4ae881..4284ceaf9060 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -48,6 +48,39 @@ struct update_props_workarea {
> #define MIGRATION_SCOPE (1)
> #define PRRN_SCOPE -2
>
> +#ifdef CONFIG_PPC_WATCHDOG
> +static unsigned int lpm_nmi_wd_factor = 200;
> +
> +#ifdef CONFIG_SYSCTL
> +static struct ctl_table lpm_nmi_wd_factor_ctl_table[] = {
> + {
> + .procname = "lpm_nmi_watchdog_factor",

Assuming the basic idea is acceptable, I suggest making the user-visible
name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
apply this to other contexts in the future.

> + .data = &lpm_nmi_wd_factor,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_douintvec_minmax,
> + },
> + {}
> +};
> +static struct ctl_table lpm_nmi_wd_factor_sysctl_root[] = {
> + {
> + .procname = "kernel",
> + .mode = 0555,
> + .child = lpm_nmi_wd_factor_ctl_table,
> + },
> + {}
> +};
> +
> +static int __init register_lpm_nmi_wd_factor_sysctl(void)
> +{
> + register_sysctl_table(lpm_nmi_wd_factor_sysctl_root);
> +
> + return 0;
> +}
> +device_initcall(register_lpm_nmi_wd_factor_sysctl);
> +#endif /* CONFIG_SYSCTL */
> +#endif /* CONFIG_PPC_WATCHDOG */
> +
> static int mobility_rtas_call(int token, char *buf, s32 scope)
> {
> int rc;
> @@ -702,6 +735,7 @@ static int pseries_suspend(u64 handle)
> static int pseries_migrate_partition(u64 handle)
> {
> int ret;
> + unsigned int factor = lpm_nmi_wd_factor;
>
> ret = wait_for_vasi_session_suspending(handle);
> if (ret)
> @@ -709,6 +743,13 @@ static int pseries_migrate_partition(u64 handle)
>
> vas_migration_handler(VAS_SUSPEND);
>
> +#ifdef CONFIG_PPC_WATCHDOG
> + if (factor) {
> + pr_info("Set the NMI watchdog factor to %u%%\n", factor);
> + watchdog_nmi_set_lpm_factor(factor);
> + }
> +#endif /* CONFIG_PPC_WATCHDOG */
> +
> ret = pseries_suspend(handle);
> if (ret == 0) {
> post_mobility_fixup();
> @@ -716,6 +757,13 @@ static int pseries_migrate_partition(u64 handle)
> } else
> pseries_cancel_migration(handle, ret);
>
> +#ifdef CONFIG_PPC_WATCHDOG
> + if (factor) {
> + pr_info("Restoring NMI watchdog timer\n");
> + watchdog_nmi_set_lpm_factor(0);
> + }
> +#endif /* CONFIG_PPC_WATCHDOG */
> +

A couple more suggestions:

* Move the prints into a single statement in watchdog_nmi_set_lpm_factor().

* Add no-op versions of watchdog_nmi_set_lpm_factor for
!CONFIG_PPC_WATCHDOG so we can minimize the #ifdef here.

Otherwise this looks fine to me.
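
A minimal sketch of the no-op stub pattern, written as stand-alone C rather
than kernel code. FEATURE_WATCHDOG, set_lpm_factor() and migrate() are
hypothetical stand-ins for CONFIG_PPC_WATCHDOG,
watchdog_nmi_set_lpm_factor() and the migration path.

```c
#include <stdint.h>

#ifdef FEATURE_WATCHDOG
/* Real implementation, provided by another file when the feature is on. */
void set_lpm_factor(uint64_t factor);
#else
/*
 * No-op stub. Marking it "static inline" means GCC does not emit an
 * unused-function warning in files that include this declaration
 * without calling it.
 */
static inline void set_lpm_factor(uint64_t factor)
{
	(void)factor;
}
#endif

/* The caller needs no #ifdef of its own. */
int migrate(uint64_t factor)
{
	set_lpm_factor(factor);
	/* ... the migration work would happen here ... */
	set_lpm_factor(0);
	return 0;
}
```

With the stub in place, the call sites compile the same way whether or not
the feature is configured.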

2022-06-24 07:08:22

by Michael Ellerman

Subject: Re: [PATCH v2 2/4] watchdog: export watchdog_mutex and lockup_detector_reconfigure

Laurent Dufour <[email protected]> writes:
> In some circumstances it may be useful to reconfigure the watchdog
> from inside the kernel.
>
> On PowerPC, this may be helpful before and after an LPAR migration (LPM) is
> initiated: because migration introduces some latency, the watchdog, and
> especially the NMI watchdog, is expected to be triggered during this
> operation. Reconfiguring the watchdog prevents this from happening too
> frequently during LPM.
>
> The watchdog_mutex is exported to allow some variables to be changed under
> its protection and prevent any conflict.
> The lockup_detector_reconfigure() function is exported and is expected to
> be called under the protection of watchdog_mutex.
>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> include/linux/nmi.h | 3 +++
> kernel/watchdog.c | 6 +++---
> 2 files changed, 6 insertions(+), 3 deletions(-)

Is there a maintainer for kernel/watchdog.c ?

There's Wim & Guenter at linux-watchdog@vger but I think that's only for
drivers/watchdog?

Maybe we should Cc that list anyway?


> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 750c7f395ca9..84300fb0f90a 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -122,6 +122,9 @@ int watchdog_nmi_probe(void);
> int watchdog_nmi_enable(unsigned int cpu);
> void watchdog_nmi_disable(unsigned int cpu);
>
> +extern struct mutex watchdog_mutex;
> +void lockup_detector_reconfigure(void);

It would be preferable if we didn't export the mutex.

I think you could arrange that by ...

Renaming lockup_detector_reconfigure() to __lockup_detector_reconfigure()
and then adding a new lockup_detector_reconfigure() that is non-static and
takes the lock around __lockup_detector_reconfigure().
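
The wrapper pattern can be sketched in user space with a pthread mutex
standing in for the kernel mutex. All names here are hypothetical, and the
counter exists only so the effect is observable.

```c
#include <pthread.h>

/* The lock stays private to this file; nothing outside can touch it. */
static pthread_mutex_t watchdog_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts reconfigurations so the sketch has a visible effect. */
static int reconfigure_count;

/* Internal variant: the caller must already hold watchdog_lock. */
static void reconfigure_locked(void)
{
	reconfigure_count++;
}

/* Public entry point: takes the lock around the internal variant. */
void reconfigure(void)
{
	pthread_mutex_lock(&watchdog_lock);
	reconfigure_locked();
	pthread_mutex_unlock(&watchdog_lock);
}
```

Callers outside the file only ever see reconfigure(), so the lock itself
does not need to be exported.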

cheers

2022-06-24 09:01:47

by Laurent Dufour

Subject: Re: [PATCH v2 2/4] watchdog: export watchdog_mutex and lockup_detector_reconfigure

On 24/06/2022, 08:31:55, Michael Ellerman wrote:
> Laurent Dufour <[email protected]> writes:
>> In some circumstances it may be useful to reconfigure the watchdog
>> from inside the kernel.
>>
>> On PowerPC, this may be helpful before and after an LPAR migration (LPM) is
>> initiated: because migration introduces some latency, the watchdog, and
>> especially the NMI watchdog, is expected to be triggered during this
>> operation. Reconfiguring the watchdog prevents this from happening too
>> frequently during LPM.
>>
>> The watchdog_mutex is exported to allow some variables to be changed under
>> its protection and prevent any conflict.
>> The lockup_detector_reconfigure() function is exported and is expected to
>> be called under the protection of watchdog_mutex.
>>
>> Signed-off-by: Laurent Dufour <[email protected]>
>> ---
>> include/linux/nmi.h | 3 +++
>> kernel/watchdog.c | 6 +++---
>> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> Is there a maintainer for kernel/watchdog.c ?

Nothing clearly identified AFAICT.

I'll add the commit signers reported by get_maintainer.pl.

> There's Wim & Guenter at linux-watchdog@vger but I think that's only for
> drivers/watchdog?
>
> Maybe we should Cc that list anyway?

Yes, that's a good idea.

>
>
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index 750c7f395ca9..84300fb0f90a 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -122,6 +122,9 @@ int watchdog_nmi_probe(void);
>> int watchdog_nmi_enable(unsigned int cpu);
>> void watchdog_nmi_disable(unsigned int cpu);
>>
>> +extern struct mutex watchdog_mutex;
>> +void lockup_detector_reconfigure(void);
>
> It would be preferable if we didn't export the mutex.
>
> I think you could arrange that by ...
>
> Renaming lockup_detector_reconfigure() to __lockup_detector_reconfigure()
> and then adding a new lockup_detector_reconfigure() that is non-static and
> takes the lock around __lockup_detector_reconfigure().

Unfortunately, that will not be enough, because this mutex is also used to
protect wd_watchdog, to ensure it is not changed while another operation is
in progress.

I may try to find another way to protect that value, maybe using
WRITE_ONCE()/READ_ONCE(). Indeed, the only requirement is to read a stable
value in watchdog_calc_timeouts().
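
A user-space analogue of that idea might look like the sketch below. It uses
C11 relaxed atomics in place of the kernel's WRITE_ONCE()/READ_ONCE(), which
are not available outside the kernel; this is a stand-in rather than an
exact equivalent, and every name in it is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Percentage factor, written by one thread and read by another. */
static _Atomic uint64_t wd_factor_pct;

/* Writer side: publish a new factor without taking any lock. */
void set_factor(uint64_t pct)
{
	atomic_store_explicit(&wd_factor_pct, pct, memory_order_relaxed);
}

/* Reader side: take one snapshot, then compute only from that snapshot. */
uint64_t calc_timeout(uint64_t thresh_s)
{
	uint64_t factor = atomic_load_explicit(&wd_factor_pct,
					       memory_order_relaxed);

	return thresh_s + (factor * thresh_s) / 100;
}
```

The reader never sees a torn or half-updated value, which is the only
guarantee the timeout computation needs.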

Thanks,
Laurent.



2022-06-24 09:42:00

by Christoph Hellwig

Subject: Re: [PATCH v2 2/4] watchdog: export watchdog_mutex and lockup_detector_reconfigure

On Tue, Jun 14, 2022 at 03:54:12PM +0200, Laurent Dufour wrote:
> The watchdog_mutex is exported to allow some variable to be changed under
> its protection and prevent any conflict.
> The lockup_detector_reconfigure() function is exported and is expected to
> be called under the protection of watchdog_mutex.

Please provide an actual function accessor instead of directly touching
a global lock.

2022-06-24 13:02:39

by Laurent Dufour

Subject: Re: [PATCH v2 2/4] watchdog: export watchdog_mutex and lockup_detector_reconfigure

On 24/06/2022, 11:37:23, Christoph Hellwig wrote:
> On Tue, Jun 14, 2022 at 03:54:12PM +0200, Laurent Dufour wrote:
>> The watchdog_mutex is exported to allow some variable to be changed under
>> its protection and prevent any conflict.
>> The lockup_detector_reconfigure() function is exported and is expected to
>> be called under the protection of watchdog_mutex.
>
> Please provide an actual function accessor instead of directly touching
> a global lock.

Thanks Christoph,

I'll try not to touch that mutex; if that's not doable, I'll create a
function accessor as you suggest.

2022-06-24 14:16:50

by Laurent Dufour

Subject: Re: [PATCH v2 4/4] pseries/mobility: Set NMI watchdog factor during LPM

On 23/06/2022, 19:28:34, Nathan Lynch wrote:
> Laurent Dufour <[email protected]> writes:
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 179bbd4ae881..4284ceaf9060 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -48,6 +48,39 @@ struct update_props_workarea {
>> #define MIGRATION_SCOPE (1)
>> #define PRRN_SCOPE -2
>>
>> +#ifdef CONFIG_PPC_WATCHDOG
>> +static unsigned int lpm_nmi_wd_factor = 200;
>> +
>> +#ifdef CONFIG_SYSCTL
>> +static struct ctl_table lpm_nmi_wd_factor_ctl_table[] = {
>> + {
>> + .procname = "lpm_nmi_watchdog_factor",
>
> Assuming the basic idea is acceptable, I suggest making the user-visible
> name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
> apply this to other contexts in the future.

Fair enough. Indeed, I was wondering whether "lpm" is meaningful.

>
>> + .data = &lpm_nmi_wd_factor,
>> + .maxlen = sizeof(int),
>> + .mode = 0644,
>> + .proc_handler = proc_douintvec_minmax,
>> + },
>> + {}
>> +};
>> +static struct ctl_table lpm_nmi_wd_factor_sysctl_root[] = {
>> + {
>> + .procname = "kernel",
>> + .mode = 0555,
>> + .child = lpm_nmi_wd_factor_ctl_table,
>> + },
>> + {}
>> +};
>> +
>> +static int __init register_lpm_nmi_wd_factor_sysctl(void)
>> +{
>> + register_sysctl_table(lpm_nmi_wd_factor_sysctl_root);
>> +
>> + return 0;
>> +}
>> +device_initcall(register_lpm_nmi_wd_factor_sysctl);
>> +#endif /* CONFIG_SYSCTL */
>> +#endif /* CONFIG_PPC_WATCHDOG */
>> +
>> static int mobility_rtas_call(int token, char *buf, s32 scope)
>> {
>> int rc;
>> @@ -702,6 +735,7 @@ static int pseries_suspend(u64 handle)
>> static int pseries_migrate_partition(u64 handle)
>> {
>> int ret;
>> + unsigned int factor = lpm_nmi_wd_factor;
>>
>> ret = wait_for_vasi_session_suspending(handle);
>> if (ret)
>> @@ -709,6 +743,13 @@ static int pseries_migrate_partition(u64 handle)
>>
>> vas_migration_handler(VAS_SUSPEND);
>>
>> +#ifdef CONFIG_PPC_WATCHDOG
>> + if (factor) {
>> + pr_info("Set the NMI watchdog factor to %u%%\n", factor);
>> + watchdog_nmi_set_lpm_factor(factor);
>> + }
>> +#endif /* CONFIG_PPC_WATCHDOG */
>> +
>> ret = pseries_suspend(handle);
>> if (ret == 0) {
>> post_mobility_fixup();
>> @@ -716,6 +757,13 @@ static int pseries_migrate_partition(u64 handle)
>> } else
>> pseries_cancel_migration(handle, ret);
>>
>> +#ifdef CONFIG_PPC_WATCHDOG
>> + if (factor) {
>> + pr_info("Restoring NMI watchdog timer\n");
>> + watchdog_nmi_set_lpm_factor(0);
>> + }
>> +#endif /* CONFIG_PPC_WATCHDOG */
>> +
>
> A couple more suggestions:
>
> * Move the prints into a single statement in watchdog_nmi_set_lpm_factor().

You're right, that sounds like a better place.

>
> * Add no-op versions of watchdog_nmi_set_lpm_factor for
> !CONFIG_PPC_WATCHDOG so we can minimize the #ifdef here.
>

Furthermore, this breaks compilation when !CONFIG_PPC_WATCHDOG because
lpm_nmi_wd_factor is not defined. I'll rework that part.

> Otherwise this looks fine to me.

Thanks,
Laurent.