Watchdog use performance monitor of cpu clock cycle to generate NMI to detect hard lockup.
But when cpu's frequency changes, the event period will also change.
It's not as expected as the configration.
For example, set the NMI event handler period is 10 seconds when the cpu is 2.0GHz.
If the cpu changes to 800MHz, the period will be 10*(2000/800)=25 seconds.
So it may make hard lockup detect not work if the watchdog timeout is not long enough.
Now, set a notifier to listen to the cpu frequency change.
And dynamic re-config the NMI event to make the event period correct.
Signed-off-by: Pan Zhenjie <[email protected]>
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1d795df..78fc218 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -564,7 +564,10 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
int src_cpu, int dst_cpu);
extern u64 perf_event_read_value(struct perf_event *event,
u64 *enabled, u64 *running);
-
+#ifdef CONFIG_CPU_FREQ
+extern void perf_dynamic_adjust_period(struct perf_event *event,
+ u64 sample_period);
+#endif
struct perf_sample_data {
u64 type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7e0962e..bbe5f57 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -37,6 +37,7 @@
#include <linux/ftrace_event.h>
#include <linux/hw_breakpoint.h>
#include <linux/mm_types.h>
+#include <linux/math64.h>
#include "internal.h"
@@ -2428,6 +2429,44 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
}
}
+#ifdef CONFIG_CPU_FREQ
+static int perf_percpu_dynamic_adjust_period(void *info)
+{
+ struct perf_event *event = info;
+ s64 left;
+ u64 old_period = event->hw.sample_period;
+ u64 new_period = event->attr.sample_period;
+ u64 shift = 0;
+
+ /* precision is enough */
+ while (old_period > 0xF && new_period > 0xF) {
+ old_period >>= 1;
+ new_period >>= 1;
+ shift++;
+ }
+
+ event->pmu->stop(event, PERF_EF_UPDATE);
+
+ left = local64_read(&event->hw.period_left);
+ left = (s64)div64_u64(left * (event->attr.sample_period >> shift),
+ (event->hw.sample_period >> shift));
+ local64_set(&event->hw.period_left, left);
+
+ event->hw.sample_period = event->attr.sample_period;
+
+ event->pmu->start(event, PERF_EF_RELOAD);
+
+ return 0;
+}
+
+void perf_dynamic_adjust_period(struct perf_event *event, u64 sample_period)
+{
+ event->attr.sample_period = sample_period;
+ cpu_function_call(event->cpu, perf_percpu_dynamic_adjust_period, event);
+}
+EXPORT_SYMBOL_GPL(perf_dynamic_adjust_period);
+#endif
+
/*
* combine freq adjustment with unthrottling to avoid two passes over the
* events. At the same time, make sure, having freq events does not change
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4a94467..32f3391 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,6 +28,7 @@
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
#include <linux/perf_event.h>
+#include <linux/cpufreq.h>
int watchdog_enabled = 1;
int __read_mostly watchdog_thresh = 10;
@@ -470,6 +471,44 @@ static void watchdog_nmi_disable(unsigned int cpu)
}
return;
}
+
+#ifdef CONFIG_CPU_FREQ
+static int watchdog_cpufreq_transition(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ struct perf_event *event;
+ struct cpufreq_freqs *freq = data;
+
+ if (val == CPUFREQ_POSTCHANGE) {
+ event = per_cpu(watchdog_ev, freq->cpu);
+ perf_dynamic_adjust_period(event,
+ (u64)freq->new * 1000 * watchdog_thresh);
+ }
+
+ return 0;
+}
+
+static struct notifier_block watchdog_nb = {
+ .notifier_call = watchdog_cpufreq_transition,
+ .priority = 0,
+};
+
+static int __init watchdog_cpufreq(void)
+{
+ int ret;
+
+ ret = cpufreq_register_notifier(&watchdog_nb,
+ CPUFREQ_TRANSITION_NOTIFIER);
+
+ if (ret < 0)
+ pr_err("watchdog register CPU frequency notifier fail(%d)\n",
+ ret);
+
+ return ret;
+}
+late_initcall(watchdog_cpufreq);
+#endif
+
#else
static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
static void watchdog_nmi_disable(unsigned int cpu) { return; }
--
1.7.9.5
On Tue, 2013-04-16 at 06:57 +0000, Pan, Zhenjie wrote:
> Watchdog use performance monitor of cpu clock cycle to generate NMI to detect hard lockup.
> But when cpu's frequency changes, the event period will also change.
> It's not as expected as the configration.
> For example, set the NMI event handler period is 10 seconds when the cpu is 2.0GHz.
> If the cpu changes to 800MHz, the period will be 10*(2000/800)=25 seconds.
> So it may make hard lockup detect not work if the watchdog timeout is not long enough.
> Now, set a notifier to listen to the cpu frequency change.
> And dynamic re-config the NMI event to make the event period correct.
>
Urgh,. does this really matter.. all we really want is for that NMI to
hit eventually in the not too distant future. Does the frequency really
matter _that_ much?
Also, can't we simply pick an event that's invariant to the cpufreq
nonsense? Something like CPU_CLK_UNHALTED.REF -- or better the
fixed_ctr2 which nobody ever uses anyway.
On Thu, Apr 18, 2013 at 1:42 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2013-04-16 at 06:57 +0000, Pan, Zhenjie wrote:
>> Watchdog use performance monitor of cpu clock cycle to generate NMI to detect hard lockup.
>> But when cpu's frequency changes, the event period will also change.
>> It's not as expected as the configration.
>> For example, set the NMI event handler period is 10 seconds when the cpu is 2.0GHz.
>> If the cpu changes to 800MHz, the period will be 10*(2000/800)=25 seconds.
>> So it may make hard lockup detect not work if the watchdog timeout is not long enough.
>> Now, set a notifier to listen to the cpu frequency change.
>> And dynamic re-config the NMI event to make the event period correct.
>>
>
>
> Urgh,. does this really matter.. all we really want is for that NMI to
> hit eventually in the not too distant future. Does the frequency really
> matter _that_ much?
>
I agree, it does not really matter. Set the watchdog to a couple of minutes
and it should be fine, shouldn't it?
> Also, can't we simply pick an event that's invariant to the cpufreq
> nonsense? Something like CPU_CLK_UNHALTED.REF -- or better the
> fixed_ctr2 which nobody ever uses anyway.
>
You don't want to use fixed counter 2 for NMI watchdog because it's pinned.
No other counter can count this event. And it is very useful. I use it often.
On Thu, Apr 18, 2013 at 02:04:00PM +0200, Stephane Eranian wrote:
> On Thu, Apr 18, 2013 at 1:42 PM, Peter Zijlstra <[email protected]> wrote:
> > On Tue, 2013-04-16 at 06:57 +0000, Pan, Zhenjie wrote:
> >> Watchdog use performance monitor of cpu clock cycle to generate NMI to detect hard lockup.
> >> But when cpu's frequency changes, the event period will also change.
> >> It's not as expected as the configration.
> >> For example, set the NMI event handler period is 10 seconds when the cpu is 2.0GHz.
> >> If the cpu changes to 800MHz, the period will be 10*(2000/800)=25 seconds.
> >> So it may make hard lockup detect not work if the watchdog timeout is not long enough.
> >> Now, set a notifier to listen to the cpu frequency change.
> >> And dynamic re-config the NMI event to make the event period correct.
> >>
> >
> >
> > Urgh,. does this really matter.. all we really want is for that NMI to
> > hit eventually in the not too distant future. Does the frequency really
> > matter _that_ much?
> >
> I agree, it does not really matter. Set the watchdog to a couple of minutes
> and it should be fine, shouldn't it?
I believe it mattered to the Chrome folks. They want the watchdog to be as
tight as possible so the user experience isn't a hang but a quick reboot
instead. They like setting the watchdog to something like 2 seconds.
There was a patch a few months ago that tried to hack around this issue
and I suggested this approach as a better solution. I forgot what the
original problem was. Perhaps someone can jump in and explain the problem
being solved (other than the watchdog isn't always 10 seconds)?
Cheers,
Don
>
> > Also, can't we simply pick an event that's invariant to the cpufreq
> > nonsense? Something like CPU_CLK_UNHALTED.REF -- or better the
> > fixed_ctr2 which nobody ever uses anyway.
> >
> You don't want to use fixed counter 2 for NMI watchdog because it's pinned.
> No other counter can count this event. And it is very useful. I use it often.
> I believe it mattered to the Chrome folks. They want the watchdog to be as
> tight as possible so the user experience isn't a hang but a quick reboot
> instead. They like setting the watchdog to something like 2 seconds.
>
> There was a patch a few months ago that tried to hack around this issue and I
> suggested this approach as a better solution. I forgot what the original
> problem was. Perhaps someone can jump in and explain the problem being
> solved (other than the watchdog isn't always 10 seconds)?
>
> Cheers,
> Don
Yes, I also think the period is important sometimes.
As I mentioned before, the case I meet is:
When the system hang with interrupt disabled, we use NMI to detect.
Then it will find hard lockup and cause a panic.
Panic is very important for debug these kind of issues.
But if cpu frequency change, the period will be 2 times, 3 times even more.(if cpu can down from 2.0GHz to 200MHz, will be 10 times, it's a very big deviation)
This make watchdog reset happen before hard lockup detect.
Thanks
Pan Zhenjie
On Mon, Apr 22, 2013 at 12:50:34AM +0000, Pan, Zhenjie wrote:
> > I believe it mattered to the Chrome folks. They want the watchdog to be as
> > tight as possible so the user experience isn't a hang but a quick reboot
> > instead. They like setting the watchdog to something like 2 seconds.
> >
> > There was a patch a few months ago that tried to hack around this issue and I
> > suggested this approach as a better solution. I forgot what the original
> > problem was. Perhaps someone can jump in and explain the problem being
> > solved (other than the watchdog isn't always 10 seconds)?
> >
> > Cheers,
> > Don
>
> Yes, I also think the period is important sometimes.
> As I mentioned before, the case I meet is:
> When the system hang with interrupt disabled, we use NMI to detect.
> Then it will find hard lockup and cause a panic.
> Panic is very important for debug these kind of issues.
>
> But if cpu frequency change, the period will be 2 times, 3 times even more.(if cpu can down from 2.0GHz to 200MHz, will be 10 times, it's a very big deviation)
> This make watchdog reset happen before hard lockup detect.
So you are saying with the longer hard lockup delay, the iTCO_wdt is
firing before the hard lockup detector?
Cheers,
Don
On Mon, 2013-04-22 at 00:50 +0000, Pan, Zhenjie wrote:
> This make watchdog reset happen before hard lockup detect.
Doesn't your watchdog trigger an NMI you can use to print the panic?
ISTR some people (hi Don!) spending quite a lot of time to make this
work for some other platforms.
IIRC those things would fire an NMI at some point and then hard-reset
the machine not much later.. the difficulty was detecting this
'unclaimed' nmi and allowing drivers to register for it.
NMI_UNKNOWN and unknown_nmi_panic are the result of that.
> -----Original Message-----
> From: Don Zickus [mailto:[email protected]]
> Sent: Tuesday, April 23, 2013 2:59 AM
> To: Pan, Zhenjie
> Cc: Stephane Eranian; Peter Zijlstra; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Liu, Chuansheng; [email protected]
> Subject: Re: [PATCH v2] NMI: fix NMI period is not correct when cpu
> frequency changes issue.
>
> On Mon, Apr 22, 2013 at 12:50:34AM +0000, Pan, Zhenjie wrote:
> > > I believe it mattered to the Chrome folks. They want the watchdog to
> > > be as tight as possible so the user experience isn't a hang but a
> > > quick reboot instead. They like setting the watchdog to something like 2
> seconds.
> > >
> > > There was a patch a few months ago that tried to hack around this
> > > issue and I suggested this approach as a better solution. I forgot
> > > what the original problem was. Perhaps someone can jump in and
> > > explain the problem being solved (other than the watchdog isn't always
> 10 seconds)?
> > >
> > > Cheers,
> > > Don
> >
> > Yes, I also think the period is important sometimes.
> > As I mentioned before, the case I meet is:
> > When the system hang with interrupt disabled, we use NMI to detect.
> > Then it will find hard lockup and cause a panic.
> > Panic is very important for debug these kind of issues.
> >
> > But if cpu frequency change, the period will be 2 times, 3 times even
> > more.(if cpu can down from 2.0GHz to 200MHz, will be 10 times, it's a very
> big deviation) This make watchdog reset happen before hard lockup detect.
>
> So you are saying with the longer hard lockup delay, the iTCO_wdt is firing
> before the hard lockup detector?
>
> Cheers,
> Don
Give you a detail example:
0s 50s 60s 70s
|_____________________________________|___________|__________|
When 50s, a watchdog interrupt happen to inform watchdog daemon to update watchdog.
If watchdog daemon does not update watchdog in 10s, another watchdog interrupt will happen at 60s to cause a panic.
Then system will have 10s to do some dump.
At 70s, watchdog hardware reset happen.
But if interrupt is disabled at 60s, panic will be lost.
So we need NMI interrupt by performance monitor to detect hard lockup.
If the NMI period is 10s, it can guarantee that hard lockup will be detected before 70s.
But if the period is changed with cpu frequency, this will be not ensure.
Hope my explanation is clear.
BTW, I use intel_scu_watchdog(but looks have big difference with that in upstream), not iTCO_wdt.
Thanks
Pan Zhenjie
On Mon, Apr 22, 2013 at 10:37:36PM +0200, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 00:50 +0000, Pan, Zhenjie wrote:
> > This make watchdog reset happen before hard lockup detect.
>
> Doesn't your watchdog trigger an NMI you can use to print the panic?
>
> ISTR some people (hi Don!) spending quite a lot of time to make this
> work for some other platforms.
>
> IIRC those things would fire an NMI at some point and then hard-reset
> the machine not much later.. the difficulty was detecting this
> 'unclaimed' nmi and allowing drivers to register for it.
>
> NMI_UNKNOWN and unknown_nmi_panic are the result of that.
I think you are confusing the hard lockup detector watchdog (which uses
the perf counters) with a physical hardware watchdog (which just resets
the cpu if not kicked frequently; ie
drivers/watchdog/intel_scu_watchdog.c).
I believe what Zhenjie's problem is the hard lockup detector (ie
nmi_watchdog) becomes useless because sometimes it can correctly fire
before the hardware watchdog expires, other times it may not.
In order for the hard lockup detector to be useful, it should be reliable.
Today it isn't because it period inversely varies with cpu frequency.
I don't have a real issue with his patch. I was just concerned about the
frequency of the changes (10-15 times a second seems like a lot).
Cheers,
Don