2020-09-24 00:10:59

by Prasad Sodagudi

Subject: [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints

During CPU hotplug stress testing, a couple of messages
continuously flood the console, which delays timer
migration. Delayed timer migration from the CPU being
hot-unplugged causes device instability with the watchdog.
So reduce the log level of a couple of prints in the CPU
hotplug flow.

Signed-off-by: Prasad Sodagudi <[email protected]>
---
arch/arm64/kernel/smp.c | 2 +-
kernel/irq/cpuhotplug.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 355ee9e..08da6e3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -338,7 +338,7 @@ void __cpu_die(unsigned int cpu)
 		pr_crit("CPU%u: cpu didn't die\n", cpu);
 		return;
 	}
-	pr_notice("CPU%u: shutdown\n", cpu);
+	pr_info("CPU%u: shutdown\n", cpu);

 	/*
 	 * Now that the dying CPU is beyond the point of no return w.r.t.
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 02236b1..82802e0 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -42,7 +42,7 @@ static inline bool irq_needs_fixup(struct irq_data *d)
 		 * If this happens then there was a missed IRQ fixup at some
 		 * point. Warn about it and enforce fixup.
 		 */
-		pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
+		pr_info("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
 			cpumask_pr_args(m), d->irq, cpu);
 		return true;
 	}
@@ -166,7 +166,7 @@ void irq_migrate_all_off_this_cpu(void)
 		raw_spin_unlock(&desc->lock);

 		if (affinity_broken) {
-			pr_warn_ratelimited("IRQ %u: no longer affine to CPU%u\n",
+			pr_info_ratelimited("IRQ %u: no longer affine to CPU%u\n",
 					    irq, smp_processor_id());
 		}
 	}
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2020-09-24 06:34:35

by Greg Kroah-Hartman

Subject: Re: [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints

On Wed, Sep 23, 2020 at 05:08:31PM -0700, Prasad Sodagudi wrote:
> During CPU hotplug stress testing, a couple of messages
> continuously flood the console, which delays timer
> migration. Delayed timer migration from the CPU being
> hot-unplugged causes device instability with the watchdog.
> So reduce the log level of a couple of prints in the CPU
> hotplug flow.
>
> Signed-off-by: Prasad Sodagudi <[email protected]>
> ---
> arch/arm64/kernel/smp.c | 2 +-
> kernel/irq/cpuhotplug.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 355ee9e..08da6e3 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -338,7 +338,7 @@ void __cpu_die(unsigned int cpu)
>  		pr_crit("CPU%u: cpu didn't die\n", cpu);
>  		return;
>  	}
> -	pr_notice("CPU%u: shutdown\n", cpu);
> +	pr_info("CPU%u: shutdown\n", cpu);
>
>  	/*
>  	 * Now that the dying CPU is beyond the point of no return w.r.t.
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 02236b1..82802e0 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -42,7 +42,7 @@ static inline bool irq_needs_fixup(struct irq_data *d)
>  		 * If this happens then there was a missed IRQ fixup at some
>  		 * point. Warn about it and enforce fixup.
>  		 */
> -		pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
> +		pr_info("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
>  			cpumask_pr_args(m), d->irq, cpu);
>  		return true;
>  	}
> @@ -166,7 +166,7 @@ void irq_migrate_all_off_this_cpu(void)
>  		raw_spin_unlock(&desc->lock);
>
>  		if (affinity_broken) {
> -			pr_warn_ratelimited("IRQ %u: no longer affine to CPU%u\n",
> +			pr_info_ratelimited("IRQ %u: no longer affine to CPU%u\n",
>  					    irq, smp_processor_id());
>  		}
>  	}
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

Reviewed-by: Greg Kroah-Hartman <[email protected]>

2020-09-24 18:10:35

by Thomas Gleixner

Subject: Re: [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints

On Wed, Sep 23 2020 at 17:08, Prasad Sodagudi wrote:
> During CPU hotplug stress testing, a couple of messages
> continuously flood the console, which delays timer
> migration. Delayed timer migration from the CPU being
> hot-unplugged causes device instability with the watchdog.
> So reduce the log level of a couple of prints in the CPU
> hotplug flow.

This is fixing the wrong end, really.

Timer migration can be delayed by other means as well.

The real problem is that the migration happens _after_ the CPU is
completely dead and the hotplug control thread is not guaranteed to
reach the timer migration state before timers are overdue at all.

There is a bunch of related problems, e.g. the interrupt migration
mechanism kicks in late as well.

I'm not against changing the log level per se, but the justification for
doing so is just bogus.

The more obvious question is whether these printks are useful at all
other than at the pr_debug() level.

Thanks,

tglx
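
As background for the log-level discussion in this thread, the following is a
minimal userspace sketch (not kernel code) of the rule printk applies when
deciding whether a message is written to the console: the message is emitted
only if its numeric level is lower than console_loglevel. The level numbers
match the kernel's KERN_* definitions; the enum names and the helper
reaches_console() are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/*
 * Severity numbers as used by the kernel's KERN_* levels.
 * Lower numbers are more severe. The LVL_* names are made up
 * for this sketch.
 */
enum loglevel {
	LVL_EMERG   = 0,
	LVL_ALERT   = 1,
	LVL_CRIT    = 2,
	LVL_ERR     = 3,
	LVL_WARNING = 4,	/* pr_warn()   */
	LVL_NOTICE  = 5,	/* pr_notice() */
	LVL_INFO    = 6,	/* pr_info()   */
	LVL_DEBUG   = 7		/* pr_debug()  */
};

/*
 * Hypothetical helper modelling the console filter: a message is
 * printed on the console only when its level is numerically lower
 * than the current console_loglevel.
 */
bool reaches_console(enum loglevel msg_level, int console_loglevel)
{
	return (int)msg_level < console_loglevel;
}
```

Under this rule, with console_loglevel set to 6, reaches_console(LVL_NOTICE, 6)
is true and reaches_console(LVL_INFO, 6) is false, so moving a print from
pr_notice() or pr_warn() to pr_info() keeps it off the console at that setting.
With the usual default of 7, pr_info() messages still reach the console and
only pr_debug()-level messages are filtered, which is relevant to the question
raised above about pr_debug().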