2023-11-15 10:33:07

by Peter Zijlstra

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0


Namhyung, could you please take a look, you know how to operate this
cgroup stuff.

On Tue, Nov 14, 2023 at 11:20:32PM +0800, David Wang wrote:
>
>
> At 2023-11-14 12:31:12, "David Wang" <[email protected]> wrote:
> >Hi,
> >
> >I was profiling the kernel to identify changes between revisions under a fio benchmark load,
>
> >But I did not notice a significant change in perf tracing overhead based on the fio report,
> >which is why I am not sure whether this is a regression or an improvement...
> >I am just reporting the change.
> >
>
> I ran the same test on another disk which is almost idle all the time, and still, significant perf sample stats changes were detected:
>
> 6.7.0-rc1
>
> aio_write(49.340% 131501/266521)
> ext4_file_write_iter(94.567% 124356/131501)
> iomap_dio_rw(78.494% 97612/124356)
> __iomap_dio_rw(99.550% 97173/97612)
> iomap_dio_bio_iter(47.773% 46422/97173) <<----
> blk_finish_plug(29.931% 29085/97173) <<----
> iomap_iter(14.082% 13684/97173)
> kmalloc_trace(1.814% 1763/97173)
> kiocb_invalidate_pages(0.631% 613/97173)
> invalidate_inode_pages2_range(0.448% 435/97173)
> srso_return_thunk(0.320% 311/97173)
> blk_start_plug(0.167% 162/97173)
>
>
> 6.7.0-rc1 with f06cc667f79 reverted
>
> aio_write(49.071% 155873/317649)
> ext4_file_write_iter(95.211% 148409/155873)
> iomap_dio_rw(81.816% 121422/148409)
> __iomap_dio_rw(99.684% 121038/121422)
> iomap_dio_bio_iter(40.280% 48754/121038) <<---
> blk_finish_plug(36.760% 44494/121038) <<---
> iomap_iter(14.657% 17740/121038)
> kmalloc_trace(1.775% 2149/121038)
> kiocb_invalidate_pages(0.599% 725/121038)
> invalidate_inode_pages2_range(0.330% 399/121038)
> srso_return_thunk(0.263% 318/121038)
> blk_start_plug(0.169% 204/121038)
>
>
> I removed the 4 occurrences of the nr_cgroups check:
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 683dc086ef10..7583418e94d9 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -690,8 +690,8 @@ static void perf_ctx_disable(struct perf_event_context *ctx, bool cgroup)
> struct perf_event_pmu_context *pmu_ctx;
>
> list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> - if (cgroup && !pmu_ctx->nr_cgroups)
> - continue;
> + // if (cgroup && !pmu_ctx->nr_cgroups)
> + // continue;
> perf_pmu_disable(pmu_ctx->pmu);
> }
> }
> @@ -701,8 +701,8 @@ static void perf_ctx_enable(struct perf_event_context *ctx, bool cgroup)
> struct perf_event_pmu_context *pmu_ctx;
>
> list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> - if (cgroup && !pmu_ctx->nr_cgroups)
> - continue;
> + // if (cgroup && !pmu_ctx->nr_cgroups)
> + // continue;
> perf_pmu_enable(pmu_ctx->pmu);
> }
> }
> @@ -3307,8 +3307,8 @@ ctx_sched_out(struct perf_event_context *ctx, enum event_type_t event_type)
> is_active ^= ctx->is_active; /* changed bits */
>
> list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> - if (cgroup && !pmu_ctx->nr_cgroups)
> - continue;
> + // if (cgroup && !pmu_ctx->nr_cgroups)
> + // continue;
> __pmu_ctx_sched_out(pmu_ctx, is_active);
> }
> }
>
>
> The resulting profiling stats are similar to reverting f06cc667f79.
> I think there are cases where pmu_ctx->nr_cgroups is zero but there are still cgroup events within, so some samples are missed, causing the stats changes; just a guess.
>
>
> David Wang
>


2023-11-15 11:00:54

by David Wang

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0



At 2023-11-15 18:32:41, "Peter Zijlstra" <[email protected]> wrote:
>
>Namhyung, could you please take a look, you know how to operate this
>cgroup stuff.
>

More information: I ran the profiling on an 8-CPU machine with an ext4 filesystem on an SSD:

# mkdir /sys/fs/cgroup/mytest
# echo $$ > /sys/fs/cgroup/mytest/cgroup.procs
## Start profiling targeting cgroup /sys/fs/cgroup/mytest on another terminal
# fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1

I get the feeling that f06cc667f7990 decreases total samples by 10%~20% when profiling an IO benchmark within a cgroup.
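For reference, "targeting" the cgroup here means passing an fd of the cgroup directory as the pid argument of perf_event_open() together with PERF_FLAG_PID_CGROUP; such cgroup-scoped events have to be opened per CPU. A trimmed, illustrative sketch (not the exact code of my tool; error handling and the ring-buffer reading are omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    /* the cgroup v2 directory fd plays the role of the pid argument */
    int cgroup_fd = open("/sys/fs/cgroup/mytest", O_RDONLY);
    if (cgroup_fd < 0) {
        perror("open cgroup");
        return 1;
    }

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CPU_CLOCK;
    attr.freq = 1;
    attr.sample_freq = 777;              /* the frequency my tool happens to use */
    attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_CALLCHAIN;
    attr.sample_max_stack = 32;
    attr.wakeup_events = 16;

    /* cgroup events require cpu >= 0, so open one event per online CPU */
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        int fd = perf_event_open(&attr, cgroup_fd, cpu, -1,
                                 PERF_FLAG_PID_CGROUP);
        if (fd < 0)
            perror("perf_event_open");
        /* mmap the ring buffer on fd and consume PERF_RECORD_SAMPLE records */
    }
    return 0;
}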


David Wang

2023-11-15 15:46:31

by Namhyung Kim

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

Hello,

On Wed, Nov 15, 2023 at 2:32 AM Peter Zijlstra <[email protected]> wrote:
>
>
> Namhyung, could you please take a look, you know how to operate this
> cgroup stuff.

Sure.

>
> On Tue, Nov 14, 2023 at 11:20:32PM +0800, David Wang wrote:
> >
> >
> > At 2023-11-14 12:31:12, "David Wang" <[email protected]> wrote:
> > >Hi,
> > >
> > >I was making kernel profiling to identify kernel changes between revisions under the load of fio benchmark,
> >
> > >But I did not notice a significant change in the overhead of perf tracing based on the fio report,
> > >that is why I am not sure whether this is a regression or an improvement....
> > >Just report the change.

Do you mean f06cc667f79 ("perf: Optimize perf_cgroup_switch")?
Did you have cgroup events when you ran the benchmark?
Otherwise it should not make any difference.

> > >
> >
> > I run the same test on another disk which is almost idle all the time, and still, significant perf sample stats changes detected:
> >
> > 6.7.0-rc1
> >
> > aio_write(49.340% 131501/266521)
> > ext4_file_write_iter(94.567% 124356/131501)
> > iomap_dio_rw(78.494% 97612/124356)
> > __iomap_dio_rw(99.550% 97173/97612)
> > iomap_dio_bio_iter(47.773% 46422/97173) <<----
> > blk_finish_plug(29.931% 29085/97173) <<----
> > iomap_iter(14.082% 13684/97173)
> > kmalloc_trace(1.814% 1763/97173)
> > kiocb_invalidate_pages(0.631% 613/97173)
> > invalidate_inode_pages2_range(0.448% 435/97173)
> > srso_return_thunk(0.320% 311/97173)
> > blk_start_plug(0.167% 162/97173)
> >
> >
> > 6.7.0-rc1 with f06cc667f79 reverted
> >
> > aio_write(49.071% 155873/317649)
> > ext4_file_write_iter(95.211% 148409/155873)
> > iomap_dio_rw(81.816% 121422/148409)
> > __iomap_dio_rw(99.684% 121038/121422)
> > iomap_dio_bio_iter(40.280% 48754/121038) <<---
> > blk_finish_plug(36.760% 44494/121038) <<---
> > iomap_iter(14.657% 17740/121038)
> > kmalloc_trace(1.775% 2149/121038)
> > kiocb_invalidate_pages(0.599% 725/121038)
> > invalidate_inode_pages2_range(0.330% 399/121038)
> > srso_return_thunk(0.263% 318/121038)
> > blk_start_plug(0.169% 204/121038)

I cannot say whether it's good or not just by looking
at these numbers. Maybe fs/block folks know..

> >
> >
> > I remove the 4 occurrences for checking nr_cgroups:
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 683dc086ef10..7583418e94d9 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -690,8 +690,8 @@ static void perf_ctx_disable(struct perf_event_context *ctx, bool cgroup)
> > struct perf_event_pmu_context *pmu_ctx;
> >
> > list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> > - if (cgroup && !pmu_ctx->nr_cgroups)
> > - continue;
> > + // if (cgroup && !pmu_ctx->nr_cgroups)
> > + // continue;
> > perf_pmu_disable(pmu_ctx->pmu);
> > }
> > }
> > @@ -701,8 +701,8 @@ static void perf_ctx_enable(struct perf_event_context *ctx, bool cgroup)
> > struct perf_event_pmu_context *pmu_ctx;
> >
> > list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> > - if (cgroup && !pmu_ctx->nr_cgroups)
> > - continue;
> > + // if (cgroup && !pmu_ctx->nr_cgroups)
> > + // continue;
> > perf_pmu_enable(pmu_ctx->pmu);
> > }
> > }
> > @@ -3307,8 +3307,8 @@ ctx_sched_out(struct perf_event_context *ctx, enum event_type_t event_type)
> > is_active ^= ctx->is_active; /* changed bits */
> >
> > list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> > - if (cgroup && !pmu_ctx->nr_cgroups)
> > - continue;
> > + // if (cgroup && !pmu_ctx->nr_cgroups)
> > + // continue;
> > __pmu_ctx_sched_out(pmu_ctx, is_active);
> > }
> > }
> >
> >
> > The resulting profiling stats is similar to reverting f06cc667f79.

Yeah, that's because you effectively reverted the change. :)

> > I think there are some cases where pmu_ctx->nr_cgroups is zero but there is still cgroup event within, and some samples are missed, causing the stats changes, just a guess.

I don't know what makes you think so. Do you see
differences in the perf output? Any cgroup events having
smaller number of samples or counts?

Or was the fio benchmark degraded seriously?

Thanks,
Namhyung

2023-11-15 15:49:01

by Namhyung Kim

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

On Wed, Nov 15, 2023 at 3:00 AM David Wang <[email protected]> wrote:
>
>
>
> At 2023-11-15 18:32:41, "Peter Zijlstra" <[email protected]> wrote:
> >
> >Namhyung, could you please take a look, you know how to operate this
> >cgroup stuff.
> >
>
> More information, I run the profiling with 8cpu machine on a SSD with ext4 filesystem :
>
> # mkdir /sys/fs/cgroup/mytest
> # echo $$ > /sys/fs/cgroup/mytest/cgroup.procs
> ## Start profiling targeting cgroup /sys/fs/cgroup/mytest on another terminal
> # fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1
>
> I got a feeling that f06cc667f7990 would decrease total samples by 10%~20% when profiling IO benchmark within cgroup.

Oh sorry, I missed this message. Can you please share the
command line and the output?

Thanks,
Namhyung

2023-11-15 16:12:40

by David Wang

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0


At 2023-11-15 23:48:33, "Namhyung Kim" <[email protected]> wrote:
>On Wed, Nov 15, 2023 at 3:00 AM David Wang <[email protected]> wrote:
>>
>>
>>
>> At 2023-11-15 18:32:41, "Peter Zijlstra" <[email protected]> wrote:
>> >
>> >Namhyung, could you please take a look, you know how to operate this
>> >cgroup stuff.
>> >
>>
>> More information, I run the profiling with 8cpu machine on a SSD with ext4 filesystem :
>>
>> # mkdir /sys/fs/cgroup/mytest
>> # echo $$ > /sys/fs/cgroup/mytest/cgroup.procs
>> ## Start profiling targeting cgroup /sys/fs/cgroup/mytest on another terminal
>> # fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1
>>
>> I got a feeling that f06cc667f7990 would decrease total samples by 10%~20% when profiling IO benchmark within cgroup.
>
>Oh sorry, I missed this message. Can you please share the
>command line and the output?
>
I did not use perf... This is the part that makes the report less convincing: I am using a profiling tool of my own, as I mentioned in the first mail.
But I believe my profiling tool did detect some changes.

I am not experienced with the perf tool at all; it is too complicated a tool for me... But I think I can try it.

David Wang.

2023-11-15 16:25:28

by David Wang

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0



At 2023-11-15 23:45:38, "Namhyung Kim" <[email protected]> wrote:
>Hello,
>

>>
>> On Tue, Nov 14, 2023 at 11:20:32PM +0800, David Wang wrote:
>> >
>> >
>> > At 2023-11-14 12:31:12, "David Wang" <[email protected]> wrote:
>> > >Hi,
>> > >
>> > >I was making kernel profiling to identify kernel changes between revisions under the load of fio benchmark,
>> >
>> > >But I did not notice a significant change in the overhead of perf tracing based on the fio report,
>> > >that is why I am not sure whether this is a regression or an improvement....
>> > >Just report the change.
>
>Do you mean f06cc667f79 ("perf: Optimize perf_cgroup_switch")?

Yes

>Did you have cgroup events when you ran the benchmark?
>Otherwise it should not make any differences.
>

My test happened to trigger frequent context switches (IO bench), and my profiling was targeting a cgroup (v2) where fio runs. So yes, cgroup and context switches :)

>
>> > I think there are some cases where pmu_ctx->nr_cgroups is zero but there is still cgroup event within, and some samples are missed, causing the stats changes, just a guess.
>
>I don't know what makes you think so. Do you see
>differences in the perf output? Any cgroup events having
>smaller number of samples or counts?

I observed the total sample count decrease by 10%~20%; one report shows the sample count increasing from 266521 to 317649 after reverting the commit.
But I did not use the perf tool; I will try it.

>
>Or was the fio benchmark degraded seriously?
>

Nope, the benchmark did degrade while profiling was running, but no significant change was observed with vs. without the commit.



David Wang

2023-11-15 16:27:11

by Namhyung Kim

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

On Wed, Nov 15, 2023 at 8:12 AM David Wang <[email protected]> wrote:
>
>
> At 2023-11-15 23:48:33, "Namhyung Kim" <[email protected]> wrote:
> >On Wed, Nov 15, 2023 at 3:00 AM David Wang <[email protected]> wrote:
> >>
> >>
> >>
> >> At 2023-11-15 18:32:41, "Peter Zijlstra" <[email protected]> wrote:
> >> >
> >> >Namhyung, could you please take a look, you know how to operate this
> >> >cgroup stuff.
> >> >
> >>
> >> More information, I run the profiling with 8cpu machine on a SSD with ext4 filesystem :
> >>
> >> # mkdir /sys/fs/cgroup/mytest
> >> # echo $$ > /sys/fs/cgroup/mytest/cgroup.procs
> >> ## Start profiling targeting cgroup /sys/fs/cgroup/mytest on another terminal
> >> # fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1
> >>
> >> I got a feeling that f06cc667f7990 would decrease total samples by 10%~20% when profiling IO benchmark within cgroup.
> >
> >Oh sorry, I missed this message. Can you please share the
> >command line and the output?
> >
> I did not use perf..... This is the part where it is not quite convincing to report the change, I am using a profiling tool of my own as I mentioned in the first mail.....
> But I believe my profiling tools did detect some changes.

Oh.. ok. You didn't use perf.

Then what is your profiling tool? Where did you see
the 10%~20% drop in samples?

>
> I am not experienced with the perf-tool at all, too complicated a tool for me.... But I think I can try it.

I feel sorry about that. In most cases, just `perf record -a` and
then `perf report` would work well. :)

Thanks,
Namhyung

2023-11-16 04:09:44

by David Wang

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0


At 2023-11-16 00:26:06, "Namhyung Kim" <[email protected]> wrote:
>On Wed, Nov 15, 2023 at 8:12 AM David Wang <[email protected]> wrote:
>>
>>
>> At 2023-11-15 23:48:33, "Namhyung Kim" <[email protected]> wrote:
>> >On Wed, Nov 15, 2023 at 3:00 AM David Wang <[email protected]> wrote:
>> >>
>> >>
>> >>
>> >> At 2023-11-15 18:32:41, "Peter Zijlstra" <[email protected]> wrote:
>> >> >
>> >> >Namhyung, could you please take a look, you know how to operate this
>> >> >cgroup stuff.
>> >> >
>> >>
>> >> More information, I run the profiling with 8cpu machine on a SSD with ext4 filesystem :
>> >>
>> >> # mkdir /sys/fs/cgroup/mytest
>> >> # echo $$ > /sys/fs/cgroup/mytest/cgroup.procs
>> >> ## Start profiling targeting cgroup /sys/fs/cgroup/mytest on another terminal
>> >> # fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1
>> >>
>> >> I got a feeling that f06cc667f7990 would decrease total samples by 10%~20% when profiling IO benchmark within cgroup.


>
>Then what is your profiling tool? Where did you see
>the 10%~20% drop in samples?
>

I wrote a simple/raw tool just for profiling callchains, which uses perf_event_open with the following attr:
attr.type = PERF_TYPE_SOFTWARE;
attr.config = PERF_COUNT_SW_CPU_CLOCK;
attr.sample_freq = 777; // adjust it
attr.freq = 1;
attr.wakeup_events = 16;
attr.sample_type = PERF_SAMPLE_TID|PERF_SAMPLE_CALLCHAIN;
attr.sample_max_stack = 32;

The source code could be found here: https://github.com/zq-david-wang/linux-tools/tree/main/perf/profiler
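For completeness, the consumer side boils down to something like this (a simplified sketch rather than the tool's exact code: it only counts PERF_RECORD_SAMPLE records arriving on one of the fds opened with the attr above, and does not parse the record bodies, which can wrap around the ring buffer):

#include <linux/perf_event.h>
#include <poll.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define RING_PAGES 8    /* data area: must be a power-of-two number of pages */

/* count PERF_RECORD_SAMPLE records arriving on an already-opened sampling fd */
uint64_t count_samples(int perf_fd, int seconds)
{
    size_t page = sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, (RING_PAGES + 1) * page,
                      PROT_READ | PROT_WRITE, MAP_SHARED, perf_fd, 0);
    if (base == MAP_FAILED)
        return 0;

    struct perf_event_mmap_page *meta = (struct perf_event_mmap_page *)base;
    char *data = base + page;            /* data area follows the metadata page */
    uint64_t mask = RING_PAGES * page - 1;
    uint64_t nr_samples = 0;

    struct pollfd pfd = { .fd = perf_fd, .events = POLLIN };
    time_t end = time(NULL) + seconds;
    while (time(NULL) < end) {
        poll(&pfd, 1, 1000);             /* woken roughly every wakeup_events samples */

        uint64_t head = meta->data_head;
        __sync_synchronize();            /* pairs with the kernel's write barrier */
        uint64_t tail = meta->data_tail;

        while (tail < head) {
            struct perf_event_header *hdr =
                (struct perf_event_header *)(data + (tail & mask));
            if (hdr->type == PERF_RECORD_SAMPLE)
                nr_samples++;
            tail += hdr->size;           /* records are 8-byte aligned; the header never wraps */
        }
        __sync_synchronize();
        meta->data_tail = tail;          /* hand the consumed space back to the kernel */
    }
    munmap(base, (RING_PAGES + 1) * page);
    return nr_samples;
}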
>>
>> I am not experienced with the perf-tool at all, too complicated a tool for me.... But I think I can try it.
>
>I feel sorry about that. In most cases, just `perf record -a` and
>then `perf report` would work well. :)
>
Thanks for the information. I used the following command to profile with perf:
`./perf record -a -e cpu-clock -G mytest`
I have run several rounds of tests; before each test the system was rebooted. The perf output is:

On 6.7.0-rc1:
$ sudo ./perf record -a -e cpu-clock -G mytest
^C[ perf record: Woken up 527 times to write data ]
[ perf record: Captured and wrote 132.648 MB perf.data (2478745 samples) ]
---reboot
$ sudo ./perf record -a -e cpu-clock -G mytest
^C[ perf record: Woken up 473 times to write data ]
[ perf record: Captured and wrote 119.205 MB perf.data (2226994 samples) ]


On 6.7.0-rc1 with f06cc667f79909e9175460b167c277b7c64d3df0 reverted

$ sudo ./perf record -a -e cpu-clock -G mytest
^C[ perf record: Woken up 567 times to write data ]
[ perf record: Captured and wrote 142.771 MB perf.data (2668224 samples) ]
---reboot
$ sudo ./perf record -a -e cpu-clock -G mytest
^C[ perf record: Woken up 557 times to write data ]
[ perf record: Captured and wrote 140.604 MB perf.data (2627167 samples) ]


I also ran with `-F 777`, which is the same (somewhat arbitrary) number I used in my tool (just to compare with my tool).

On 6.7.0-rc1
$ sudo ./perf record -a -e cpu-clock -F 777 -G mytest
^C[ perf record: Woken up 93 times to write data ]
[ perf record: Captured and wrote 24.575 MB perf.data (455222 samples) ] (my tool had only ~359K samples, not stable)


On 6.7.0-rc1 with f06cc667f79909e9175460b167c277b7c64d3df0 reverted
$ sudo ./perf record -a -e cpu-clock -F 777 -G mytest
^C[ perf record: Woken up 98 times to write data ]
[ perf record: Captured and wrote 25.703 MB perf.data (476390 samples) ] (my tool had about ~446K samples, stable)


From the data I collected, I think two problems can be observed with f06cc667f79909e9175460b167c277b7c64d3df0:
1. Samples are missing.
2. Samples are unstable; the total sample count drifts a lot between tests.

Thanks
David


2023-11-17 21:12:00

by Namhyung Kim

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

On Wed, Nov 15, 2023 at 8:09 PM David Wang <[email protected]> wrote:
>
>
> At 2023-11-16 00:26:06, "Namhyung Kim" <[email protected]> wrote:
> >On Wed, Nov 15, 2023 at 8:12 AM David Wang <[email protected]> wrote:
> >>
> >>
> >> At 2023-11-15 23:48:33, "Namhyung Kim" <[email protected]> wrote:
> >> >On Wed, Nov 15, 2023 at 3:00 AM David Wang <[email protected]> wrote:
> >> >>
> >> >>
> >> >>
> >> >> At 2023-11-15 18:32:41, "Peter Zijlstra" <[email protected]> wrote:
> >> >> >
> >> >> >Namhyung, could you please take a look, you know how to operate this
> >> >> >cgroup stuff.
> >> >> >
> >> >>
> >> >> More information, I run the profiling with 8cpu machine on a SSD with ext4 filesystem :
> >> >>
> >> >> # mkdir /sys/fs/cgroup/mytest
> >> >> # echo $$ > /sys/fs/cgroup/mytest/cgroup.procs
> >> >> ## Start profiling targeting cgroup /sys/fs/cgroup/mytest on another terminal
> >> >> # fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1
> >> >>
> >> >> I got a feeling that f06cc667f7990 would decrease total samples by 10%~20% when profiling IO benchmark within cgroup.
>
>
> >
> >Then what is your profiling tool? Where did you see
> >the 10%~20% drop in samples?
> >
>
> I wrote a simple/raw tool just for profiling callchains, which use perf_event_open with following attr:
> attr.type = PERF_TYPE_SOFTWARE;
> attr.config = PERF_COUNT_SW_CPU_CLOCK;
> attr.sample_freq = 777; // adjust it
> attr.freq = 1;
> attr.wakeup_events = 16;
> attr.sample_type = PERF_SAMPLE_TID|PERF_SAMPLE_CALLCHAIN;
> attr.sample_max_stack = 32;
>
> The source code could be found here: https://github.com/zq-david-wang/linux-tools/tree/main/perf/profiler
> >>
> >> I am not experienced with the perf-tool at all, too complicated a tool for me.... But I think I can try it.
> >
> >I feel sorry about that. In most cases, just `perf record -a` and
> >then `perf report` would work well. :)
> >
> Thanks for the information, I use following command to profile with perf:
> `./perf record -a -e cpu-clock -G mytest`
> I have run several round of test, and before each test, the system was rebooted, and perf output is
>
> On 6.7.0-rc1:
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 527 times to write data ]
> [ perf record: Captured and wrote 132.648 MB perf.data (2478745 samples) ]
> ---reboot
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 473 times to write data ]
> [ perf record: Captured and wrote 119.205 MB perf.data (2226994 samples) ]
>
>
> On 6.7.0-rc1 with f06cc667f79909e9175460b167c277b7c64d3df0 reverted
>
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 567 times to write data ]
> [ perf record: Captured and wrote 142.771 MB perf.data (2668224 samples) ]
> ---reboot
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 557 times to write data ]
> [ perf record: Captured and wrote 140.604 MB perf.data (2627167 samples) ]
>
>
> I also run with `-F 777`, which is some random number I used in my tool, (just to compare with my tool )
>
> On 6.7.0-rc1
> $ sudo ./perf record -a -e cpu-clock -F 777 -G mytest
> ^C[ perf record: Woken up 93 times to write data ]
> [ perf record: Captured and wrote 24.575 MB perf.data (455222 samples) ] ( My tool have only ~359K samples, not stable)
>
>
> On 6.7.0-rc1 with f06cc667f79909e9175460b167c277b7c64d3df0 reverted
> $ sudo ./perf record -a -e cpu-clock -F 777 -G mytest
> ^C[ perf record: Woken up 98 times to write data ]
> [ perf record: Captured and wrote 25.703 MB perf.data (476390 samples) ] (My tool have about ~446K, stable)
>
>
> From the data I collected, I think two problem could be observed for f06cc667f79909e9175460b167c277b7c64d3df0
> 1. sample missing.
> 2. sample unstable, total sample count drift a lot between tests.

Hmm.. so the fio process was running in the background during
the profiling, right? But I'm not sure how you measured the same
amount of time. Probably you need to run this (for 10 seconds):

sudo perf record -a -G mytest -- sleep 10

And I guess you don't run the perf command in the target cgroup
which is good.

And is there any chance that it's improved because of the change?
Are the numbers in 6.7 better or worse?

Thanks,
Namhyung

2023-11-18 01:48:37

by David Wang

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0


At 2023-11-18 05:11:02, "Namhyung Kim" <[email protected]> wrote:
>On Wed, Nov 15, 2023 at 8:09 PM David Wang <[email protected]> wrote:
>>

>>
>>
>> From the data I collected, I think two problem could be observed for f06cc667f79909e9175460b167c277b7c64d3df0
>> 1. sample missing.
>> 2. sample unstable, total sample count drift a lot between tests.
>
>Hmm.. so the fio process was running in the background during
>the profiling, right? But I'm not sure how you measured the same
>amount of time. Probably you need to run this (for 10 seconds):
>
> sudo perf record -a -G mytest -- sleep 10
>
>And I guess you don't run the perf command in the target cgroup
>which is good.
>

Yes, the profiling process was not in the target cgroup.
I ran fio with `fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1`, which runs for 600 seconds.
There are drifts in the profiling report between runs. From the small amount of test data I collected (maybe not enough to draw a firm conclusion), I feel that when the commit is reverted, the expected total sample count is higher and the standard deviation is smaller.
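As a rough sanity check on that feeling, the mean and sample standard deviation over the four `perf record` totals from my earlier mail can be computed like this (only two runs per kernel, so this is indicative at best):

/* build with: cc stats.c -lm */
#include <math.h>
#include <stdio.h>

static void stats(const char *label, const double *x, int n)
{
    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++)
        mean += x[i] / n;
    for (int i = 0; i < n; i++)
        var += (x[i] - mean) * (x[i] - mean) / (n - 1);
    printf("%-20s mean=%.0f stddev=%.0f\n", label, mean, sqrt(var));
}

int main(void)
{
    double plain[]    = { 2478745, 2226994 };   /* 6.7.0-rc1 */
    double reverted[] = { 2668224, 2627167 };   /* f06cc66 reverted */

    stats("6.7.0-rc1:", plain, 2);
    stats("f06cc66 reverted:", reverted, 2);
    return 0;
}

That gives roughly a mean of 2.35M samples with stddev around 178k for 6.7.0-rc1, versus a mean of 2.65M with stddev around 29k with the commit reverted, which is what the higher-expectation, smaller-deviation impression is based on.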

>And is there any chance if it's improved because of the change?
>Are the numbers in 6.7 better or worse?
>
I have no idea whether the change in the expected total sample count is a bug or a fix, but the observation that the total sample count drifts a lot (a bigger standard deviation) is, I think, a bad thing.


Thanks
David Wang

2023-11-20 22:59:55

by Namhyung Kim

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

On Fri, Nov 17, 2023 at 5:48 PM David Wang <[email protected]> wrote:
>
>
> At 2023-11-18 05:11:02, "Namhyung Kim" <[email protected]> wrote:
> >On Wed, Nov 15, 2023 at 8:09 PM David Wang <[email protected]> wrote:
> >>
>
> >>
> >>
> >> From the data I collected, I think two problem could be observed for f06cc667f79909e9175460b167c277b7c64d3df0
> >> 1. sample missing.
> >> 2. sample unstable, total sample count drift a lot between tests.
> >
> >Hmm.. so the fio process was running in the background during
> >the profiling, right? But I'm not sure how you measured the same
> >amount of time. Probably you need to run this (for 10 seconds):
> >
> > sudo perf record -a -G mytest -- sleep 10
> >
> >And I guess you don't run the perf command in the target cgroup
> >which is good.
> >
>
> Yes profiling process was not in the target cgroup.
> I use fio with `fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1` which would run 600 seconds.
> There would be drifts in the profiling report between runs, from those small samples of test data I collected, maybe not enough to make a firm conclusion, I feel when the commit is reverted, the expectation for total sample count is higher and the standard deviation is smaller.
>
> >And is there any chance if it's improved because of the change?
> >Are the numbers in 6.7 better or worse?
> >
> I have no idea whether the change of expected total sample count a bug or a fix, but, the observed result that total sample count drift a lot (bigger standard deviation), I think , is a bad thing.

Right. Can you run perf stat to measure the number of context
switches and cgroup switches, then?

sudo perf stat -a -e context-switches,cgroup-switches -- sleep 10

Thanks,
Namhyung

2023-11-21 05:54:23

by David Wang

Subject: Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0


At 2023-11-21 06:59:18, "Namhyung Kim" <[email protected]> wrote:
>On Fri, Nov 17, 2023 at 5:48 PM David Wang <[email protected]> wrote:
>>

>> >And is there any chance if it's improved because of the change?
>> >Are the numbers in 6.7 better or worse?
>> >
>> I have no idea whether the change of expected total sample count a bug or a fix, but, the observed result that total sample count drift a lot (bigger standard deviation), I think , is a bad thing.
>
>Right. Can you run perf stat to measure the number of context
>switches and cgroup switches, then?
>
> sudo perf stat -a -e context-switches,cgroup-switches -- sleep 10
>

I upgraded to 6.7-rc2. Now I am testing v6.7-rc2 against v6.7-rc2 with f06cc66 reverted, using the following command:
$ for i in {1..6}; do sudo ./perf stat -a -e context-switches,cgroup-switches -G mytest -- sleep 10; sleep 1; done
while `fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=300 --numjobs=4 --time_based=1` was running in cgroup mytest.

v6.7-rc2
+----+------------------+-----------------+----------+
| | context-switches | cgroup-switches | ratio |
+----+------------------+-----------------+----------+
| #1 | 505168 | 503494 | 1:0.9967 |
| #2 | 503627 | 501932 | 1:0.9966 |
| #3 | 509352 | 507397 | 1:0.9962 |
| #4 | 510274 | 508357 | 1:0.9962 |
| #5 | 508872 | 507063 | 1:0.9964 |
| #6 | 506138 | 504205 | 1:0.9962 |
+----+------------------+-----------------+----------+

v6.7-rc2 with f06cc66 reverted:
+----+------------------+-----------------+----------+
| | context-switches | cgroup-switches | ratio |
+----+------------------+-----------------+----------+
| #1 | 462504 | 460733 | 1:0.9962 |
| #2 | 464717 | 463036 | 1:0.9964 |
| #3 | 462245 | 460361 | 1:0.9959 |
| #4 | 463522 | 461783 | 1:0.9962 |
| #5 | 459760 | 458005 | 1:0.9962 |
| #6 | 457863 | 456024 | 1:0.9960 |
+----+------------------+-----------------+----------+

The difference in the mean values is apparent, but no wider deviation is observed; maybe the larger standard deviation observed in my last mail was caused by other factors during the fio session.


Thanks
David Wang