2016-04-02 09:32:45

by Markus Trippelsdorf

Subject: "perf: interrupt took too long" messages even with perf_cpu_time_max_percent==0

Current git kernel sometimes shows:

perf: interrupt took too long (71 > 52), lowering kernel.perf_event_max_sample_rate to 300
perf: interrupt took too long (103 > 88), lowering kernel.perf_event_max_sample_rate to 300
perf: interrupt took too long (130 > 128), lowering kernel.perf_event_max_sample_rate to 300
perf: interrupt took too long (175 > 162), lowering kernel.perf_event_max_sample_rate to 300
perf: interrupt took too long (219 > 218), lowering kernel.perf_event_max_sample_rate to 300
...

when running e.g. "perf top" even when
/proc/sys/kernel/perf_cpu_time_max_percent is set to 0.

--
Markus


2016-04-02 11:00:28

by Peter Zijlstra

Subject: Re: "perf: interrupt took too long" messages even with perf_cpu_time_max_percent==0

On Sat, Apr 02, 2016 at 11:32:39AM +0200, Markus Trippelsdorf wrote:
> Current git kernel sometimes shows:
>
> perf: interrupt took too long (71 > 52), lowering kernel.perf_event_max_sample_rate to 300
> perf: interrupt took too long (103 > 88), lowering kernel.perf_event_max_sample_rate to 300
> perf: interrupt took too long (130 > 128), lowering kernel.perf_event_max_sample_rate to 300
> perf: interrupt took too long (175 > 162), lowering kernel.perf_event_max_sample_rate to 300
> perf: interrupt took too long (219 > 218), lowering kernel.perf_event_max_sample_rate to 300
> ...
>
> when running e.g. "perf top" even when
> /proc/sys/kernel/perf_cpu_time_max_percent is set to 0.


Ah, was 0 also meant to disable it?

Does the below help?

---
kernel/events/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8c3b35f2a269..21ba024c9ed1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -412,7 +412,8 @@ int perf_cpu_time_max_percent_handler(struct ctl_table *table, int write,
 	if (ret || !write)
 		return ret;

-	if (sysctl_perf_cpu_time_max_percent == 100) {
+	if (sysctl_perf_cpu_time_max_percent == 100 ||
+	    sysctl_perf_cpu_time_max_percent == 0) {
 		printk(KERN_WARNING
 		       "perf: Dynamic interrupt throttling disabled, can hang your system!\n");
 		WRITE_ONCE(perf_sample_allowed_ns, 0);
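
For readers following the patch, the changed condition can be sketched in user space roughly as below. This is only an illustration: perf_throttling_disabled() is a hypothetical helper name, not a kernel function, and it mirrors nothing more than the test in the patched handler.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space helper (not part of the kernel): mirrors the
 * condition in the patched perf_cpu_time_max_percent_handler().
 * With the patch, both 100 and 0 take the "throttling disabled" branch.
 */
bool perf_throttling_disabled(int percent)
{
	return percent == 100 || percent == 0;
}
```

Before the patch only 100 took that branch, so writing 0 left the throttling logic active.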

2016-04-02 11:17:31

by Markus Trippelsdorf

Subject: Re: "perf: interrupt took too long" messages even with perf_cpu_time_max_percent==0

On 2016.04.02 at 13:00 +0200, Peter Zijlstra wrote:
> On Sat, Apr 02, 2016 at 11:32:39AM +0200, Markus Trippelsdorf wrote:
> > Current git kernel sometimes shows:
> >
> > perf: interrupt took too long (71 > 52), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (103 > 88), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (130 > 128), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (175 > 162), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (219 > 218), lowering kernel.perf_event_max_sample_rate to 300
> > ...
> >
> > when running e.g. "perf top" even when
> > /proc/sys/kernel/perf_cpu_time_max_percent is set to 0.
>
>
> Ah, was 0 also meant to disable it?

Yes. From Documentation/sysctl/kernel.txt:

perf_cpu_time_max_percent:

Hints to the kernel how much CPU time it should be allowed to
use to handle perf sampling events. If the perf subsystem
is informed that its samples are exceeding this limit, it
will drop its sampling frequency to attempt to reduce its CPU
usage.

Some perf sampling happens in NMIs. If these samples
unexpectedly take too long to execute, the NMIs can become
stacked up next to each other so much that nothing else is
allowed to execute.

0: disable the mechanism. Do not monitor or correct perf's
sampling rate no matter how much CPU time it takes.

1-100: attempt to throttle perf's sample rate to this
percentage of CPU. Note: the kernel calculates an
"expected" length of each sample event. 100 here means
100% of that expected length. Even if this is set to
100, you may still see sample throttling if this
length is exceeded. Set to 0 if you truly do not care
how much CPU is consumed.
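
The semantics described in that excerpt can be modelled with a small user-space sketch. Everything here is an assumption made for illustration: the names model_allowed_ns() and model_should_throttle() are hypothetical, and the budget formula (sample period times percent, divided by 100) is one plausible reading of the "expected length" wording, not code taken from kernel/events/core.c.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical model: per-sample time budget in nanoseconds.
 * Returns 0 to mean "mechanism disabled", which the documentation
 * assigns to percent == 0. A zero sample rate is also treated as
 * disabled to avoid dividing by zero.
 */
uint64_t model_allowed_ns(unsigned int sample_rate, unsigned int percent)
{
	uint64_t period_ns;

	if (percent == 0 || sample_rate == 0)
		return 0;

	period_ns = MODEL_NSEC_PER_SEC / sample_rate; /* time between samples */
	return period_ns * percent / 100;              /* share of that period */
}

/* Hypothetical model: throttle only if enabled and over budget. */
int model_should_throttle(uint64_t avg_sample_ns, uint64_t allowed_ns)
{
	return allowed_ns != 0 && avg_sample_ns > allowed_ns;
}
```

Under this model a budget of 0 never throttles, which is the behaviour the documentation promises for a setting of 0.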

> Does the below help?

Thanks. I will test it later. But 91a612eea9a3 makes the assumption that only
sysctl_perf_cpu_time_max_percent==100 disables the feature also in
kernel/events/core.c.

--
Markus

2016-04-02 11:23:55

by Markus Trippelsdorf

Subject: Re: "perf: interrupt took too long" messages even with perf_cpu_time_max_percent==0

On 2016.04.02 at 13:17 +0200, Markus Trippelsdorf wrote:
> > Does the below help?
>
> Thanks. I will test it later. But 91a612eea9a3 makes the assumption that only
> sysctl_perf_cpu_time_max_percent==100 disables the feature also in
> kernel/events/core.c.

Please ignore the last sentence. It doesn't make sense.

--
Markus

2016-04-02 16:33:40

by Peter Zijlstra

Subject: Re: "perf: interrupt took too long" messages even with perf_cpu_time_max_percent==0

On Sat, Apr 02, 2016 at 01:17:26PM +0200, Markus Trippelsdorf wrote:
> On 2016.04.02 at 13:00 +0200, Peter Zijlstra wrote:

> > Ah, was 0 also meant to disable it?
>
> Yes. From Documentation/sysctl/kernel.txt:

Oh wow, we have documentation on this! Learn something new every day :-)

2016-04-13 10:03:32

by Markus Trippelsdorf

Subject: Re: "perf: interrupt took too long" messages even with perf_cpu_time_max_percent==0

On 2016.04.02 at 13:00 +0200, Peter Zijlstra wrote:
> On Sat, Apr 02, 2016 at 11:32:39AM +0200, Markus Trippelsdorf wrote:
> > Current git kernel sometimes shows:
> >
> > perf: interrupt took too long (71 > 52), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (103 > 88), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (130 > 128), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (175 > 162), lowering kernel.perf_event_max_sample_rate to 300
> > perf: interrupt took too long (219 > 218), lowering kernel.perf_event_max_sample_rate to 300
> > ...
> >
> > when running e.g. "perf top" even when
> > /proc/sys/kernel/perf_cpu_time_max_percent is set to 0.
>
>
> Ah, was 0 also meant to disable it?
>
> Does the below help?

Yes, it obviously fixes the issue.

--
Markus