2024-04-17 03:57:26

by Luo Gengkun

[permalink] [raw]
Subject: [PATCH RESEND v2 0/2] Fix small negative period being ignored

v1 -> v2:
1. Add reviewed by for perf/core: Fix small negative period being ignored
2. Add perf/core: Fix incorrected time diff in tick adjust period

Luo Gengkun (2):
perf/core: Fix small negative period being ignored
perf/core: Fix incorrected time diff in tick adjust period

include/linux/perf_event.h | 1 +
kernel/events/core.c | 21 +++++++++++++++++----
2 files changed, 18 insertions(+), 4 deletions(-)

--
2.34.1



2024-04-17 03:57:39

by Luo Gengkun

[permalink] [raw]
Subject: [PATCH RESEND v2 1/2] perf/core: Fix small negative period being ignored

In perf_adjust_period, we will first calculate period, and then use
this period to calculate delta. However, when delta is less than 0,
there will be a deviation compared to when delta is greater than or
equal to 0. For example, when delta is in the range of [-14,-1], the
range of delta = delta + 7 is between [-7,6], so the final value of
delta/8 is 0. Therefore, the impact of -1 and -2 will be ignored.
This is unacceptable when the target period is very short, because
we will lose a lot of samples.

Here are some tests and analyzes:
before:
# perf record -e cs -F 1000 ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

# perf script
...
a.out 396 257.956048: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.957891: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.959730: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.961545: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.963355: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.965163: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.966973: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.968785: 23 cs: ffffffff81f4eeec schedul>
a.out 396 257.970593: 23 cs: ffffffff81f4eeec schedul>
...

after:
# perf record -e cs -F 1000 ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

# perf script
...
a.out 395 59.338813: 11 cs: ffffffff81f4eeec schedul>
a.out 395 59.339707: 12 cs: ffffffff81f4eeec schedul>
a.out 395 59.340682: 13 cs: ffffffff81f4eeec schedul>
a.out 395 59.341751: 13 cs: ffffffff81f4eeec schedul>
a.out 395 59.342799: 12 cs: ffffffff81f4eeec schedul>
a.out 395 59.343765: 11 cs: ffffffff81f4eeec schedul>
a.out 395 59.344651: 11 cs: ffffffff81f4eeec schedul>
a.out 395 59.345539: 12 cs: ffffffff81f4eeec schedul>
a.out 395 59.346502: 13 cs: ffffffff81f4eeec schedul>
...

test.c

int main() {
for (int i = 0; i < 20000; i++)
usleep(10);

return 0;
}

# time ./a.out
real 0m1.583s
user 0m0.040s
sys 0m0.298s

The above results were tested on x86-64 qemu with KVM enabled using
test.c as test program. Ideally, we should have around 1500 samples,
but the previous algorithm had only about 500, whereas the modified
algorithm now has about 1400. Further more, the new version shows 1
sample per 0.001s, while the previous one is 1 sample per 0.002s.This
indicates that the new algorithm is more sensitive to small negative
values compared to old algorithm.

Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Luo Gengkun <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
---
kernel/events/core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 683dc086ef10..cad50d3439f1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4078,7 +4078,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
period = perf_calculate_period(event, nsec, count);

delta = (s64)(period - hwc->sample_period);
- delta = (delta + 7) / 8; /* low pass filter */
+ if (delta >= 0)
+ delta += 7;
+ else
+ delta -= 7;
+ delta /= 8; /* low pass filter */

sample_period = hwc->sample_period + delta;

--
2.34.1


2024-04-17 03:57:51

by Luo Gengkun

[permalink] [raw]
Subject: [PATCH RESEND v2 2/2] perf/core: Fix incorrected time diff in tick adjust period

Adrian found that there is a probability that the number of samples
is small, which is caused by the unreasonable large sampling period.

# taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
# perf script
...
test 865 265.377846: 16 cs: ffffffff832e927b schedule+0x2b
test 865 265.378900: 15 cs: ffffffff832e927b schedule+0x2b
test 865 265.379845: 14 cs: ffffffff832e927b schedule+0x2b
test 865 265.380770: 14 cs: ffffffff832e927b schedule+0x2b
test 865 265.381647: 15 cs: ffffffff832e927b schedule+0x2b
test 865 265.382638: 16 cs: ffffffff832e927b schedule+0x2b
test 865 265.383647: 16 cs: ffffffff832e927b schedule+0x2b
test 865 265.384704: 15 cs: ffffffff832e927b schedule+0x2b
test 865 265.385649: 14 cs: ffffffff832e927b schedule+0x2b
test 865 265.386578: 152 cs: ffffffff832e927b schedule+0x2b
test 865 265.396383: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.406183: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.415839: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.425445: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.435052: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.444708: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.454314: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.463970: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.473577: 154 cs: ffffffff832e927b schedule+0x2b
...

It seems that the Timer Interrupts is not coming every TICK_NSEC when
system is idle. For example, counter increase n during 2 * TICK_NSEC,
and it call perf_adjust_period using n and TICK_NSEC, so the final period
we calculated will be bigger than expected one. What's more, if the
the overflow time is larger than 2 * TICK_NSEC we cannot tune the period
using __perf_event_account_interrupt. To fix this problem, perf can
calculate the tick interval by itself.

Signed-off-by: Luo Gengkun <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 15 ++++++++++++---
2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index afb028c54f33..2708f1d0692c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -265,6 +265,7 @@ struct hw_perf_event {
* State for freq target events, see __perf_event_overflow() and
* perf_adjust_freq_unthr_context().
*/
+ u64 freq_tick_stamp;
u64 freq_time_stamp;
u64 freq_count_stamp;
#endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cad50d3439f1..0f2025d631aa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4112,7 +4112,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
{
struct perf_event *event;
struct hw_perf_event *hwc;
- u64 now, period = TICK_NSEC;
+ u64 now, period, tick_stamp;
s64 delta;

/*
@@ -4151,6 +4151,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
*/
event->pmu->stop(event, PERF_EF_UPDATE);

+ tick_stamp = perf_clock();
+ period = tick_stamp - hwc->freq_tick_stamp;
+ hwc->freq_tick_stamp = tick_stamp;
+
now = local64_read(&event->count);
delta = now - hwc->freq_count_stamp;
hwc->freq_count_stamp = now;
@@ -4162,8 +4166,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
* to perf_adjust_period() to avoid stopping it
* twice.
*/
- if (delta > 0)
- perf_adjust_period(event, period, delta, false);
+ if (delta > 0) {
+ /*
+ * we skip first tick adjust period
+ */
+ if (likely(period != tick_stamp))
+ perf_adjust_period(event, period, delta, false);
+ }

event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
next:
--
2.34.1


2024-05-06 11:28:29

by Luo Gengkun

[permalink] [raw]
Subject: Re: [PATCH RESEND v2 0/2] Fix small negative period being ignored

Ping.

The previous link is :
https://lore.kernel.org/all/[email protected]/

On 2024/4/17 19:54, Luo Gengkun wrote:
> v1 -> v2:
> 1. Add reviewed by for perf/core: Fix small negative period being ignored
> 2. Add perf/core: Fix incorrected time diff in tick adjust period
>
> Luo Gengkun (2):
> perf/core: Fix small negative period being ignored
> perf/core: Fix incorrected time diff in tick adjust period
>
> include/linux/perf_event.h | 1 +
> kernel/events/core.c | 21 +++++++++++++++++----
> 2 files changed, 18 insertions(+), 4 deletions(-)
>