2023-02-28 19:22:02

by Song Liu

[permalink] [raw]
Subject: [PATCH] perf: fix perf_event_context->time

Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
accurate time_enabled and time_running for an event. The difference between
ctx->timestamp and ctx->time is the among of time when the context is not
enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
it is not the case at the moment. To show this with drgn [1]:

drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
For help, type help(drgn).
...
>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
>>> ctx.timestamp * 1.0 / ctx.time
(double)1.0385869134111765
>>>

ctx->timestamp is about 4% larger than ctx.time. This issue causes time
read by perf_event_read_local() goes back in some cases.

Fix this by updating the condition for __update_context_time(ctx, false).
Specifically, it should only be called when we enable EVENT_TIME for the
ctx.

[1] drgn: https://github.com/osandov/drgn
Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time")
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 380476a934e8..67478f43e26e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3872,7 +3872,7 @@ ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type)
if (likely(!ctx->nr_events))
return;

- if (is_active ^ EVENT_TIME) {
+ if (!(is_active & EVENT_TIME)) {
/* start ctx time */
__update_context_time(ctx, false);
perf_cgroup_set_timestamp(cpuctx);
--
2.30.2



2023-03-01 22:29:55

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH] perf: fix perf_event_context->time

Hi Song,

On Tue, Feb 28, 2023 at 11:22 AM Song Liu <[email protected]> wrote:
>
> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
> accurate time_enabled and time_running for an event. The difference between
> ctx->timestamp and ctx->time is the among of time when the context is not
> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,

I'm not sure if it's correct. The timestamp can go when the context is disabled
for example, in ctx_resched() even if the NMI watchdog is enabled, right?

> it is not the case at the moment. To show this with drgn [1]:
>
> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
> For help, type help(drgn).
> ...
> >>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
> >>> ctx.timestamp * 1.0 / ctx.time
> (double)1.0385869134111765
> >>>
>
> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
> read by perf_event_read_local() goes back in some cases.

I don't think the difference between the ctx.time and the timestamp
itself is a problem. I think the problem is it CAN update the timestamp
(and the timeoffset) while the context is enabled. Then depending on
the timing, event times can return smaller values than before.

>
> Fix this by updating the condition for __update_context_time(ctx, false).
> Specifically, it should only be called when we enable EVENT_TIME for the
> ctx.

So this change fixes the condition to update the timestamp only if
it moves from disabled to enabled state.

>
> [1] drgn: https://github.com/osandov/drgn
> Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time")
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Signed-off-by: Song Liu <[email protected]>

Actually I got some similar reports lately but couldn't find time to
dig in deeply. Thanks for the fix!

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung


> ---
> kernel/events/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 380476a934e8..67478f43e26e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3872,7 +3872,7 @@ ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type)
> if (likely(!ctx->nr_events))
> return;
>
> - if (is_active ^ EVENT_TIME) {
> + if (!(is_active & EVENT_TIME)) {
> /* start ctx time */
> __update_context_time(ctx, false);
> perf_cgroup_set_timestamp(cpuctx);
> --
> 2.30.2
>

2023-03-01 23:16:40

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH] perf: fix perf_event_context->time



> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <[email protected]> wrote:
>
> Hi Song,
>
> On Tue, Feb 28, 2023 at 11:22 AM Song Liu <[email protected]> wrote:
>>
>> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
>> accurate time_enabled and time_running for an event. The difference between
>> ctx->timestamp and ctx->time is the among of time when the context is not
>> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
>
> I'm not sure if it's correct. The timestamp can go when the context is disabled
> for example, in ctx_resched() even if the NMI watchdog is enabled, right?

I think we do not disable EVENT_TIME for per cpu ctx?

>
>> it is not the case at the moment. To show this with drgn [1]:
>>
>> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
>> For help, type help(drgn).
>> ...
>>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
>>>>> ctx.timestamp * 1.0 / ctx.time
>> (double)1.0385869134111765
>>>>>
>>
>> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
>> read by perf_event_read_local() goes back in some cases.
>
> I don't think the difference between the ctx.time and the timestamp
> itself is a problem. I think the problem is it CAN update the timestamp
> (and the timeoffset) while the context is enabled. Then depending on
> the timing, event times can return smaller values than before.

For per cpu ctx, I think timeoffset should stay the same (may not be zero
though).

Thanks,
Song



2023-03-02 21:16:02

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH] perf: fix perf_event_context->time

On Wed, Mar 1, 2023 at 3:16 PM Song Liu <[email protected]> wrote:
>
>
>
> > On Mar 1, 2023, at 2:29 PM, Namhyung Kim <[email protected]> wrote:
> >
> > Hi Song,
> >
> > On Tue, Feb 28, 2023 at 11:22 AM Song Liu <[email protected]> wrote:
> >>
> >> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
> >> accurate time_enabled and time_running for an event. The difference between
> >> ctx->timestamp and ctx->time is the among of time when the context is not
> >> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
> >
> > I'm not sure if it's correct. The timestamp can go when the context is disabled
> > for example, in ctx_resched() even if the NMI watchdog is enabled, right?
>
> I think we do not disable EVENT_TIME for per cpu ctx?

I can see ctx_sched_out(ctx, EVENT_TIME) in some places.
Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is
cleared.

Thanks,
Namhyung

>
> >
> >> it is not the case at the moment. To show this with drgn [1]:
> >>
> >> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
> >> For help, type help(drgn).
> >> ...
> >>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
> >>>>> ctx.timestamp * 1.0 / ctx.time
> >> (double)1.0385869134111765
> >>>>>
> >>
> >> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
> >> read by perf_event_read_local() goes back in some cases.
> >
> > I don't think the difference between the ctx.time and the timestamp
> > itself is a problem. I think the problem is it CAN update the timestamp
> > (and the timeoffset) while the context is enabled. Then depending on
> > the timing, event times can return smaller values than before.
>
> For per cpu ctx, I think timeoffset should stay the same (may not be zero
> though).
>
> Thanks,
> Song
>
>

2023-03-03 01:22:00

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH] perf: fix perf_event_context->time



> On Mar 2, 2023, at 1:15 PM, Namhyung Kim <[email protected]> wrote:
>
> On Wed, Mar 1, 2023 at 3:16 PM Song Liu <[email protected]> wrote:
>>
>>
>>
>>> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <[email protected]> wrote:
>>>
>>> Hi Song,
>>>
>>> On Tue, Feb 28, 2023 at 11:22 AM Song Liu <[email protected]> wrote:
>>>>
>>>> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
>>>> accurate time_enabled and time_running for an event. The difference between
>>>> ctx->timestamp and ctx->time is the among of time when the context is not
>>>> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
>>>
>>> I'm not sure if it's correct. The timestamp can go when the context is disabled
>>> for example, in ctx_resched() even if the NMI watchdog is enabled, right?
>>
>> I think we do not disable EVENT_TIME for per cpu ctx?
>
> I can see ctx_sched_out(ctx, EVENT_TIME) in some places.
> Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is
> cleared.

Yeah, you are right. I missed this part.

Hi Peter,

Does this fix look good do you?

Thanks,
Song