The watchdog uses a performance monitor event counting CPU clock cycles to
generate the NMI that detects hard lockups. But when the CPU's frequency
changes, the event period changes with it and no longer matches the
configuration. For example, suppose the NMI event period is set to 10 seconds
while the CPU runs at 2.0 GHz. If the CPU drops to 800 MHz, the period becomes
10*(2000/800) = 25 seconds. Hard lockup detection can therefore stop working
if the watchdog timeout is not long enough. So register a notifier that
listens for CPU frequency changes and dynamically reconfigures the NMI event
to keep its period correct.
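For illustration, the arithmetic above as a stand-alone sketch (the 2.0 GHz
and 800 MHz figures are just the example values from above):

	#include <stdio.h>

	/* The watchdog programs the cycle counter once, based on the CPU
	 * frequency at setup time; if the frequency then drops, the
	 * wall-clock period stretches by old_freq/new_freq. */
	int main(void)
	{
		unsigned long long thresh = 10;       /* watchdog_thresh, seconds */
		unsigned long long cfg_khz = 2000000; /* 2.0 GHz at setup */
		unsigned long long new_khz = 800000;  /* 800 MHz after the change */
		unsigned long long cycles = cfg_khz * 1000 * thresh;

		printf("effective period: %.1f seconds\n",
		       (double)cycles / ((double)new_khz * 1000)); /* 25.0 */
		return 0;
	}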
Signed-off-by: Pan Zhenjie <[email protected]>
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1d795df..717fdac 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -564,7 +564,8 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
int src_cpu, int dst_cpu);
extern u64 perf_event_read_value(struct perf_event *event,
u64 *enabled, u64 *running);
-
+extern void perf_dynamic_adjust_period(struct perf_event *event,
+ u64 sample_period);
struct perf_sample_data {
u64 type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 59412d0..96596d1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -37,6 +37,7 @@
#include <linux/ftrace_event.h>
#include <linux/hw_breakpoint.h>
#include <linux/mm_types.h>
+#include <linux/math64.h>
#include "internal.h"
@@ -2428,6 +2429,42 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
}
}
+static int perf_percpu_dynamic_adjust_period(void *info)
+{
+ struct perf_event *event = (struct perf_event *)info;
+ s64 left;
+ u64 old_period = event->hw.sample_period;
+ u64 new_period = event->attr.sample_period;
+ u64 shift = 0;
+
+ /* precision is enough */
+ while (old_period > 0xF && new_period > 0xF) {
+ old_period >>= 1;
+ new_period >>= 1;
+ shift++;
+ }
+
+ event->pmu->stop(event, PERF_EF_UPDATE);
+
+ left = local64_read(&event->hw.period_left);
+ left = (s64)div64_u64(left * (event->attr.sample_period >> shift),
+ (event->hw.sample_period >> shift));
+ local64_set(&event->hw.period_left, left);
+
+ event->hw.sample_period = event->attr.sample_period;
+
+ event->pmu->start(event, PERF_EF_RELOAD);
+
+ return 0;
+}
+
+void perf_dynamic_adjust_period(struct perf_event *event, u64 sample_period)
+{
+ event->attr.sample_period = sample_period;
+ cpu_function_call(event->cpu, perf_percpu_dynamic_adjust_period, event);
+}
+EXPORT_SYMBOL_GPL(perf_dynamic_adjust_period);
+
/*
* combine freq adjustment with unthrottling to avoid two passes over the
* events. At the same time, make sure, having freq events does not change
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4a94467..34a953a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,6 +28,7 @@
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
#include <linux/perf_event.h>
+#include <linux/cpufreq.h>
int watchdog_enabled = 1;
int __read_mostly watchdog_thresh = 10;
@@ -470,6 +471,31 @@ static void watchdog_nmi_disable(unsigned int cpu)
}
return;
}
+
+static int watchdog_cpufreq_transition(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ struct perf_event *event;
+ struct cpufreq_freqs *freq = data;
+
+ if (val == CPUFREQ_POSTCHANGE) {
+ event = per_cpu(watchdog_ev, freq->cpu);
+ perf_dynamic_adjust_period(event,
+ (u64)freq->new * 1000 * watchdog_thresh);
+ }
+
+ return 0;
+}
+
+static int __init watchdog_cpufreq(void)
+{
+ static struct notifier_block watchdog_nb;
+ watchdog_nb.notifier_call = watchdog_cpufreq_transition;
+ cpufreq_register_notifier(&watchdog_nb, CPUFREQ_TRANSITION_NOTIFIER);
+
+ return 0;
+}
+late_initcall(watchdog_cpufreq);
#else
static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
static void watchdog_nmi_disable(unsigned int cpu) { return; }
--
1.7.9.5
On Mon, Apr 01, 2013 at 03:47:42AM +0000, Pan, Zhenjie wrote:
> The watchdog uses a performance monitor event counting CPU clock cycles to
> generate the NMI that detects hard lockups. But when the CPU's frequency
> changes, the event period changes with it and no longer matches the
> configuration. For example, suppose the NMI event period is set to 10 seconds
> while the CPU runs at 2.0 GHz. If the CPU drops to 800 MHz, the period becomes
> 10*(2000/800) = 25 seconds. Hard lockup detection can therefore stop working
> if the watchdog timeout is not long enough. So register a notifier that
> listens for CPU frequency changes and dynamically reconfigures the NMI event
> to keep its period correct.
The idea makes sense. A quick glance and things seem ok. How does it
look when you tested it? How often was the frequency changing? Or the
callback being called? That was always my concern, that the cpu frequency
was jumping around too much, causing thrashing by the watchdog.
Thanks for doing this work. I didn't realize how easy it is to hook into
the cpufreq notifier.
Cheers,
Don
>
> ...
On Mon, 1 Apr 2013 03:47:42 +0000 "Pan, Zhenjie" <[email protected]> wrote:
> The watchdog uses a performance monitor event counting CPU clock cycles to
> generate the NMI that detects hard lockups. But when the CPU's frequency
> changes, the event period changes with it and no longer matches the
> configuration. For example, suppose the NMI event period is set to 10 seconds
> while the CPU runs at 2.0 GHz. If the CPU drops to 800 MHz, the period becomes
> 10*(2000/800) = 25 seconds. Hard lockup detection can therefore stop working
> if the watchdog timeout is not long enough. So register a notifier that
> listens for CPU frequency changes and dynamically reconfigures the NMI event
> to keep its period correct.
>
> Signed-off-by: Pan Zhenjie <[email protected]>
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1d795df..717fdac 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -564,7 +564,8 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
> int src_cpu, int dst_cpu);
> extern u64 perf_event_read_value(struct perf_event *event,
> u64 *enabled, u64 *running);
> -
> +extern void perf_dynamic_adjust_period(struct perf_event *event,
> + u64 sample_period);
>
> struct perf_sample_data {
> u64 type;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 59412d0..96596d1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -37,6 +37,7 @@
> #include <linux/ftrace_event.h>
> #include <linux/hw_breakpoint.h>
> #include <linux/mm_types.h>
> +#include <linux/math64.h>
>
> #include "internal.h"
>
> @@ -2428,6 +2429,42 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
> }
> }
>
> +static int perf_percpu_dynamic_adjust_period(void *info)
> +{
> + struct perf_event *event = (struct perf_event *)info;
The cast of void * is unneeded and is somewhat undesirable, as it might
suppress valid warnings if the type of `info' is later changed.
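(That is, plain

	struct perf_event *event = info;

suffices; void * converts implicitly in C.)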
> + s64 left;
> + u64 old_period = event->hw.sample_period;
> + u64 new_period = event->attr.sample_period;
> + u64 shift = 0;
> +
> + /* precision is enough */
> + while (old_period > 0xF && new_period > 0xF) {
> + old_period >>= 1;
> + new_period >>= 1;
> + shift++;
> + }
> +
> + event->pmu->stop(event, PERF_EF_UPDATE);
> +
> + left = local64_read(&event->hw.period_left);
> + left = (s64)div64_u64(left * (event->attr.sample_period >> shift),
> + (event->hw.sample_period >> shift));
> + local64_set(&event->hw.period_left, left);
> +
> + event->hw.sample_period = event->attr.sample_period;
> +
> + event->pmu->start(event, PERF_EF_RELOAD);
> +
> + return 0;
> +}
>
> ...
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -28,6 +28,7 @@
> #include <asm/irq_regs.h>
> #include <linux/kvm_para.h>
> #include <linux/perf_event.h>
> +#include <linux/cpufreq.h>
>
> int watchdog_enabled = 1;
> int __read_mostly watchdog_thresh = 10;
> @@ -470,6 +471,31 @@ static void watchdog_nmi_disable(unsigned int cpu)
> }
> return;
> }
> +
> +static int watchdog_cpufreq_transition(struct notifier_block *nb,
> + unsigned long val, void *data)
> +{
> + struct perf_event *event;
> + struct cpufreq_freqs *freq = data;
> +
> + if (val == CPUFREQ_POSTCHANGE) {
> + event = per_cpu(watchdog_ev, freq->cpu);
> + perf_dynamic_adjust_period(event,
> + (u64)freq->new * 1000 * watchdog_thresh);
I think this will break the build if CONFIG_PERF_EVENTS=n and
CONFIG_LOCKUP_DETECTOR=y. I was able to create such a config for
powerpc. If I'm reading it correctly, CONFIG_PERF_EVENTS cannot be
disabled on x86_64? If so, what the heck?
> + }
> +
> + return 0;
> +}
> +
> +static int __init watchdog_cpufreq(void)
> +{
> + static struct notifier_block watchdog_nb;
> + watchdog_nb.notifier_call = watchdog_cpufreq_transition;
> + cpufreq_register_notifier(&watchdog_nb, CPUFREQ_TRANSITION_NOTIFIER);
> +
> + return 0;
> +}
> +late_initcall(watchdog_cpufreq);
Overall the patch looks desirable, but it increases the kernel size by
several hundred bytes when CONFIG_CPU_FREQ=n. It should produce no
code in this case! Take a look at the magic in
register_hotcpu_notifier(), the way in which it causes all the code to
be removed by the compiler in the CONFIG_HOTPLUG_CPU=n case. That
trick can be used here.
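For example, something like this sketch (the helper name is made up; the
exact shape is up to you):

	/* Compile the notifier away when CONFIG_CPU_FREQ=n, in the spirit of
	 * register_hotcpu_notifier(): the (void)(nb) reference keeps gcc
	 * quiet while letting it discard the callback and the notifier_block
	 * entirely. */
	#ifdef CONFIG_CPU_FREQ
	#define watchdog_cpufreq_register(nb) \
		cpufreq_register_notifier((nb), CPUFREQ_TRANSITION_NOTIFIER)
	#else
	#define watchdog_cpufreq_register(nb) ({ (void)(nb); 0; })
	#endif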
Also, your patch is a bit buggy - it left watchdog_nb.priority
uninitialized. Easily fixed with
static struct notifier_block watchdog_nb = {
.notifier_call = watchdog_cpufreq_transition,
.priority = ??,
};
and that will result in less code generation as well.
Finally, Don's (good) questions about this patch remain unanswered - please
do attend to that.
Thanks, Don.
First, I think how often the CPU frequency changes depends on the particular platform.
I tested it on an Atom Android platform.
When the system is idle (800 MHz) or very busy (2000 MHz), the frequency hardly changes, so those are the easy cases.
With user interaction such as dragging the screen with a finger, the CPU changes frequency about 10~15 times/second.
And I'll explain the patch a bit more:
1. The time left in the event is read and rescaled when the CPU changes frequency,
so stopping and restarting the performance counter does not corrupt the event period.
2. Changing the period involves only simple arithmetic plus enabling/disabling the
performance monitor through its registers, so I think 10~15 times/second will not hurt performance.
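To put concrete numbers on point 1 (just an illustration of the rescaling the
patch does, with made-up values):

	/* Illustration of the patch's rescaling: keep the same wall-clock
	 * time remaining across the frequency change. */
	left = div64_u64(left * new_period, old_period);

	/* Example: 1e9 cycles left of a 2e10-cycle period at 2.0 GHz is
	 * 0.5 s; it becomes 4e8 cycles of an 8e9-cycle period at 800 MHz,
	 * still 0.5 s. The shift loop in the patch just keeps the 64-bit
	 * multiply from overflowing. */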
Thanks
Pan Zhenjie
> -----Original Message-----
> From: Don Zickus [mailto:[email protected]]
> Sent: Thursday, April 04, 2013 2:00 AM
> To: Pan, Zhenjie
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Liu, Chuansheng; [email protected]
> Subject: Re: [PATCH] NMI: fix NMI period is not correct when cpu frequency
> changes issue.
>
> On Mon, Apr 01, 2013 at 03:47:42AM +0000, Pan, Zhenjie wrote:
> > ...
>
> The idea makes sense. A quick glance and things seem ok. How does it look
> when you tested it? How often was the frequency changing? Or the callback
> being called? That was always my concern, that the cpu frequency was jumping
> around too much, causing thrashing by the watchdog.
>
> Thanks for doing this work. I didn't realize how easy it is to hook into the
> cpufreq notifier.
>
> Cheers,
> Don
>
> > ...
Thanks for your detailed comments, Andrew.
Please see my comments below.
> -----Original Message-----
> From: Andrew Morton [mailto:[email protected]]
> Sent: Tuesday, April 16, 2013 7:31 AM
> To: Pan, Zhenjie
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; Liu,
> Chuansheng; [email protected]
> Subject: Re: [PATCH] NMI: fix NMI period is not correct when cpu frequency
> changes issue.
>
> On Mon, 1 Apr 2013 03:47:42 +0000 "Pan, Zhenjie" <[email protected]>
> wrote:
>
> > ...
> > +static int perf_percpu_dynamic_adjust_period(void *info) {
> > + struct perf_event *event = (struct perf_event *)info;
>
> The cast of void * is unneeded and is somewhat undesirable, as it might
> suppress valid warnings if the type of `info' is later changed.
I'll fix it.
>
> > ...
> > +static int watchdog_cpufreq_transition(struct notifier_block *nb,
> > + unsigned long val, void *data)
> > +{
> > + struct perf_event *event;
> > + struct cpufreq_freqs *freq = data;
> > +
> > + if (val == CPUFREQ_POSTCHANGE) {
> > + event = per_cpu(watchdog_ev, freq->cpu);
> > + perf_dynamic_adjust_period(event,
> > + (u64)freq->new * 1000 * watchdog_thresh);
>
> I think this will break the build if CONFIG_PERF_EVENTS=n and
> CONFIG_LOCKUP_DETECTOR=y. I was able to create such a config for
> powerpc. If I'm reading it correctly, CONFIG_PERF_EVENTS cannot be
> disabled on x86_64? If so, what the heck?
The two functions I added are inside #ifdef CONFIG_HARDLOCKUP_DETECTOR,
and HARDLOCKUP_DETECTOR depends on PERF_EVENTS:
config HARDLOCKUP_DETECTOR
def_bool y
depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
So that combination cannot happen, and the build should be safe.
>
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int __init watchdog_cpufreq(void) {
> > + static struct notifier_block watchdog_nb;
> > + watchdog_nb.notifier_call = watchdog_cpufreq_transition;
> > + cpufreq_register_notifier(&watchdog_nb,
> > +CPUFREQ_TRANSITION_NOTIFIER);
> > +
> > + return 0;
> > +}
> > +late_initcall(watchdog_cpufreq);
>
> Overall the patch looks desirable, but it increases the kernel size by several
> hundred bytes when CONFIG_CPU_FREQ=n. It should produce no code in
> this case! Take a look at the magic in register_hotcpu_notifier(), the way in
> which it causes all the code to be removed by the compiler in the
> CONFIG_HOTPLUG_CPU=n case. That trick can be used here.
I checked: when CONFIG_CPU_FREQ=n, cpufreq_register_notifier() becomes an empty stub.
So I think it will not increase the kernel size.
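The stub I mean is the static inline under !CONFIG_CPU_FREQ -- I'm quoting its
shape from memory, so treat it as illustrative:

	static inline int cpufreq_register_notifier(struct notifier_block *nb,
						    unsigned int list)
	{
		return 0;
	}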
>
> Also, your patch is a bit buggy - it left watchdog_nb.priority uninitialized.
> Easily fixed with
>
>
> static struct notifier_block watchdog_nb = {
> .notifier_call = watchdog_cpufreq_transition,
> .priority = ??,
> };
>
> and that will result in less code generation as well.
I'll fix it.
>
> Finally, Don's (good) questions about this patch remain unanswered - please
> do attend to that.
I've answered them in my reply to Don.
Thanks
Pan Zhenjie
On Tue, 16 Apr 2013 03:45:15 +0000 "Pan, Zhenjie" <[email protected]> wrote:
> > Overall the patch looks desirable, but it increases the kernel size by several
> > hundred bytes when CONFIG_CPU_FREQ=n. It should produce no code in
> > this case! Take a look at the magic in register_hotcpu_notifier(), the way in
> > which it causes all the code to be removed by the compiler in the
> > CONFIG_HOTPLUG_CPU=n case. That trick can be used here.
>
> I checked: when CONFIG_CPU_FREQ=n, cpufreq_register_notifier() becomes an empty stub.
> So I think it will not increase the kernel size.
I tested it. The patch adds ~350 bytes of dead code.
This is partly an infrastructural problem: unlike
register_hotcpu_notifier(), the cpufreq notifier code lacks the
infrastructure with which we can prevent this problem.
On Mon, 2013-04-15 at 16:30 -0700, Andrew Morton wrote:
> I think this will break the build if CONFIG_PERF_EVENTS=n and
> CONFIG_LOCKUP_DETECTOR=y. I was able to create such a config for
> powerpc. If I'm reading it correctly, CONFIG_PERF_EVENTS cannot be
> disabled on x86_64? If so, what the heck?
Frederic and Ingo made that happen... a long while ago Frederic
promised to fix it. I suppose it's never been quite important enough
to get around to :/
(Frederic: read this as a gentle prod to move this upward on the todo
list)
On Thu, Apr 18, 2013 at 01:32:09PM +0200, Peter Zijlstra wrote:
> On Mon, 2013-04-15 at 16:30 -0700, Andrew Morton wrote:
> > I think this will break the build if CONFIG_PERF_EVENTS=n and
> > CONFIG_LOCKUP_DETECTOR=y. I was able to create such a config for
> > powerpc. If I'm reading it correctly, CONFIG_PERF_EVENTS cannot be
> > disabled on x86_64? If so, what the heck?
>
> Frederic and Ingo made that happen... a long while ago Frederic
> promised to fix it. I suppose it's never been quite important enough
> to get around to :/
>
> (Frederic: read this as a gentle prod to move this upward on the todo
> list)
>
Sorry to answer so late on this.
The breakpoint code used by ptrace is what prevents perf events from ever
being built conditionally on x86.
This is because the breakpoint and perf event subsystems are interdependent
in some nasty ways:
* perf uses the breakpoint subsystem to create perf events on breakpoints
* ptrace uses the breakpoint subsystem, which in turn relies on perf to
create the breakpoints.
I don't feel very comfortable with this layout. Ideally, the breakpoint
subsystem should be a standalone layer which both perf and ptrace can use,
without forcing perf in as a midlayer the way we do currently.
So that's my point of view. And the upside is that we could then make x86
build without CONFIG_PERF_EVENTS.
But I discussed this with Ingo and he seemed to prefer that we keep the
current state, because perf is a handy interface for ptrace to use.
And he's arguably right. The interface is there and it's easy to use; it
provides all the task context management that is needed, plus all the
breakpoint code and callback support. It's not perfect, though: it would be
even nicer if we could pass arch information directly to breakpoint creation
instead of first converting it to the generic breakpoint interface's data.
But still: it's quite handy.
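To make the layering concrete, here is roughly how a ptrace-style breakpoint
ends up as a perf event underneath (a sketch of the in-kernel API as I
remember it; treat the details as illustrative):

	#include <linux/hw_breakpoint.h>
	#include <linux/perf_event.h>

	static struct perf_event *attach_exec_bp(struct task_struct *tsk,
						 unsigned long addr,
						 perf_overflow_handler_t handler)
	{
		struct perf_event_attr attr;

		hw_breakpoint_init(&attr);	/* breakpoint-type perf attr */
		attr.bp_addr = addr;
		attr.bp_len = sizeof(long);	/* x86 expects this for execute bps */
		attr.bp_type = HW_BREAKPOINT_X;

		/* The breakpoint layer hands off to perf here -- this call is
		 * where the interdependency discussed above lives. */
		return register_user_hw_breakpoint(&attr, handler, NULL, tsk);
	}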
Now the tradeoff is indeed this nasty interdependency between perf and
breakpoints. I wouldn't mind changing that and making the breakpoint
subsystem independent of perf, and in my opinion we probably should.
But I need to be sure that Ingo agrees with this approach, because it
requires quite some work and I really don't want to spend days of work and
weeks of patchset iteration if there is a good chance it eventually gets
pushed back.
Is there any x86 user who really doesn't care about perf events? Maybe some
users are burdened by the .text and .data that perf adds to the kernel image?
If some users really need this to change, that would certainly cut the
issue short.
Thanks.