2021-11-29 18:33:23

by Valentin Schneider

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On 29/11/21 13:15, Josef Bacik wrote:
> On Mon, Nov 29, 2021 at 06:03:24PM +0000, Valentin Schneider wrote:
>> Would you happen to have execution traces by any chance? If not I should be
>> able to get one out of that fsperf thingie.
>>
>
> I don't, if you want to tell me how I can do it right now. I've disabled
> everything on this box for now so it's literally just sitting there waiting to
> have things done to it. Thanks,
>

I see you have Ftrace enabled in your config, so that ought to do it:

trace-cmd record -e 'sched:*' -e 'cpu_idle' $your_test_cmd
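
If it helps to script that around the workload, a minimal sketch that wraps
the same trace-cmd invocation and dumps a text report next to the binary
trace; the file names are just placeholders:

#!/usr/bin/env python3
# Minimal sketch: record scheduler + idle events around an arbitrary test
# command and dump a text report next to the binary trace. The file names
# ("trace.dat", "trace.txt") are placeholders.
import subprocess
import sys

def record_and_report(test_cmd, dat="trace.dat", txt="trace.txt"):
    subprocess.run(["trace-cmd", "record",
                    "-e", "sched:*", "-e", "cpu_idle",
                    "-o", dat] + test_cmd, check=True)
    with open(txt, "w") as out:
        subprocess.run(["trace-cmd", "report", "-i", dat],
                       stdout=out, check=True)

if __name__ == "__main__":
    record_and_report(sys.argv[1:])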

> Josef


2021-11-29 22:40:44

by Josef Bacik

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On Mon, Nov 29, 2021 at 06:31:17PM +0000, Valentin Schneider wrote:
> On 29/11/21 13:15, Josef Bacik wrote:
> > On Mon, Nov 29, 2021 at 06:03:24PM +0000, Valentin Schneider wrote:
> >> Would you happen to have execution traces by any chance? If not I should be
> >> able to get one out of that fsperf thingie.
> >>
> >
> > I don't, if you want to tell me how I can do it right now. I've disabled
> > everything on this box for now so it's literally just sitting there waiting to
> > have things done to it. Thanks,
> >
>
> I see you have Ftrace enabled in your config, so that ought to do it:
>
> trace-cmd record -e 'sched:*' -e 'cpu_idle' $your_test_cmd
>

http://toxicpanda.com/performance/trace.dat

It's like 16MiB. Enjoy,

Josef

2021-11-30 00:26:38

by Valentin Schneider

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On 29/11/21 14:49, Josef Bacik wrote:
> On Mon, Nov 29, 2021 at 06:31:17PM +0000, Valentin Schneider wrote:
>> On 29/11/21 13:15, Josef Bacik wrote:
>> > On Mon, Nov 29, 2021 at 06:03:24PM +0000, Valentin Schneider wrote:
>> >> Would you happen to have execution traces by any chance? If not I should be
>> >> able to get one out of that fsperf thingie.
>> >>
>> >
>> > I don't, if you want to tell me how I can do it right now. I've disabled
>> > everything on this box for now so it's literally just sitting there waiting to
>> > have things done to it. Thanks,
>> >
>>
>> I see you have Ftrace enabled in your config, so that ought to do it:
>>
>> trace-cmd record -e 'sched:*' -e 'cpu_idle' $your_test_cmd
>>
>
> http://toxicpanda.com/performance/trace.dat
>
> it's like 16mib. Enjoy,
>

Neat, thanks!

Runqueue depth seems to be very rarely greater than 1, tasks with ~1ms
runtime and lots of sleeping (also bursty kworker activity with activations
of tens of µs), and some cores (Internet tells me that Xeon Bronze 3204
doesn't have SMT) spend most of their time idling. Not the most apocalyptic
task placement vs ILB selection, but the task activation patterns roughly
look like what I was thinking of - there might be hope for me yet.
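
For anyone who wants to poke at the same trace, a rough sketch that pulls
per-task activation lengths out of the text report; it assumes
'trace-cmd report' prints the prev_pid=/next_pid= fields for sched_switch
(which depends on the loaded plugins), so treat the parsing as fragile:

#!/usr/bin/env python3
# Rough sketch: per-task activation lengths from "trace-cmd report" text
# output. Assumes sched_switch lines carry the raw field names, e.g.
#   fio-1234 [002] 1234.567890: sched_switch: prev_comm=... prev_pid=1234
#     ... ==> next_comm=... next_pid=0 next_prio=120
# (whether that is what gets printed depends on the loaded plugins).
import re
import subprocess
from collections import defaultdict

LINE = re.compile(r"\[(\d+)\]\s+(\d+\.\d+): sched_switch:.*"
                  r"prev_pid=(\d+).*next_pid=(\d+)")

def activation_lengths(trace="trace.dat"):
    report = subprocess.run(["trace-cmd", "report", "-i", trace],
                            capture_output=True, text=True, check=True).stdout
    started = {}                  # (cpu, pid) -> timestamp of last switch-in
    lengths = defaultdict(list)   # pid -> activation lengths in seconds
    for line in report.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        cpu, ts = int(m[1]), float(m[2])
        prev_pid, next_pid = int(m[3]), int(m[4])
        if (cpu, prev_pid) in started:
            lengths[prev_pid].append(ts - started.pop((cpu, prev_pid)))
        started[(cpu, next_pid)] = ts
    return lengths

if __name__ == "__main__":
    for pid, runs in sorted(activation_lengths().items()):
        avg_us = sum(runs) / len(runs) * 1e6
        print(f"pid {pid}: {len(runs)} activations, avg {avg_us:.1f} us")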

I'll continue the headscratching after tomorrow's round of thinking juice.

> Josef

2021-12-03 12:03:42

by Valentin Schneider

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On 30/11/21 00:26, Valentin Schneider wrote:
> On 29/11/21 14:49, Josef Bacik wrote:
>> On Mon, Nov 29, 2021 at 06:31:17PM +0000, Valentin Schneider wrote:
>>> On 29/11/21 13:15, Josef Bacik wrote:
>>> > On Mon, Nov 29, 2021 at 06:03:24PM +0000, Valentin Schneider wrote:
>>> >> Would you happen to have execution traces by any chance? If not I should be
>>> >> able to get one out of that fsperf thingie.
>>> >>
>>> >
>>> > I don't, if you want to tell me how I can do it right now. I've disabled
>>> > everything on this box for now so it's literally just sitting there waiting to
>>> > have things done to it. Thanks,
>>> >
>>>
>>> I see you have Ftrace enabled in your config, so that ought to do it:
>>>
>>> trace-cmd record -e 'sched:*' -e 'cpu_idle' $your_test_cmd
>>>
>>
>> http://toxicpanda.com/performance/trace.dat
>>
>> it's like 16mib. Enjoy,
>>
>
> Neat, thanks!
>
> Runqueue depth seems to be very rarely greater than 1, tasks with ~1ms
> runtime and lots of sleeping (also bursty kworker activity with activations
> of tens of µs), and some cores (Internet tells me that Xeon Bronze 3204
> doesn't have SMT) spend most of their time idling. Not the most apocalyptic
> task placement vs ILB selection, but the task activation patterns roughly
> look like what I was thinking of - there might be hope for me yet.
>
> I'll continue the headscratching after tomorrow's round of thinking juice.
>

Could you give the 4 top patches, i.e. those above
8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
a try?

https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression

I gave that a quick test on the platform that caused me to write the patch
you bisected, and it looks like it didn't break the original fix. If the above
counter-measures aren't sufficient, I'll have to go poke at your
reproducers...

>> Josef

2021-12-03 19:00:37

by Josef Bacik

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On Fri, Dec 03, 2021 at 12:03:27PM +0000, Valentin Schneider wrote:
> On 30/11/21 00:26, Valentin Schneider wrote:
> > On 29/11/21 14:49, Josef Bacik wrote:
> >> On Mon, Nov 29, 2021 at 06:31:17PM +0000, Valentin Schneider wrote:
> >>> On 29/11/21 13:15, Josef Bacik wrote:
> >>> > On Mon, Nov 29, 2021 at 06:03:24PM +0000, Valentin Schneider wrote:
> >>> >> Would you happen to have execution traces by any chance? If not I should be
> >>> >> able to get one out of that fsperf thingie.
> >>> >>
> >>> >
> >>> > I don't, if you want to tell me how I can do it right now. I've disabled
> >>> > everything on this box for now so it's literally just sitting there waiting to
> >>> > have things done to it. Thanks,
> >>> >
> >>>
> >>> I see you have Ftrace enabled in your config, so that ought to do it:
> >>>
> >>> trace-cmd record -e 'sched:*' -e 'cpu_idle' $your_test_cmd
> >>>
> >>
> >> http://toxicpanda.com/performance/trace.dat
> >>
> >> it's like 16mib. Enjoy,
> >>
> >
> > Neat, thanks!
> >
> > Runqueue depth seems to be very rarely greater than 1, tasks with ~1ms
> > runtime and lots of sleeping (also bursty kworker activity with activations
> > of tens of µs), and some cores (Internet tells me that Xeon Bronze 3204
> > doesn't have SMT) spend most of their time idling. Not the most apocalyptic
> > task placement vs ILB selection, but the task activation patterns roughly
> > look like what I was thinking of - there might be hope for me yet.
> >
> > I'll continue the headscratching after tomorrow's round of thinking juice.
> >
>
> Could you give the 4 top patches, i.e. those above
> 8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
> a try?
>
> https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression
>
> I gave that a quick test on the platform that caused me to write the patch
> you bisected and looks like it didn't break the original fix. If the above
> counter-measures aren't sufficient, I'll have to go poke at your
> reproducers...
>

It's better, but still around a 6% regression. If I compare these patches to the
average of the last few days' worth of runs, you're 5% better than before, so
progress, but the regression isn't completely erased.

metric                  baseline       current         stdev      diff
======================================================================
write_io_kbytes           125000        125000             0     0.00%
read_clat_ns_p99               0             0             0     0.00%
write_bw_bytes          1.73e+08      1.74e+08    5370366.50     0.69%
read_iops                      0             0             0     0.00%
write_clat_ns_p50       18265.60      18150.40        345.21    -0.63%
read_io_kbytes                 0             0             0     0.00%
read_io_bytes                  0             0             0     0.00%
write_clat_ns_p99       84684.80      90316.80       6607.94     6.65%
read_bw_bytes                  0             0             0     0.00%
elapsed                        1             1             0     0.00%
write_lat_ns_min               0             0             0     0.00%
sys_cpu                    91.22         91.00          1.40    -0.24%
write_lat_ns_max               0             0             0     0.00%
read_lat_ns_min                0             0             0     0.00%
write_iops              42308.54      42601.71       1311.12     0.69%
read_lat_ns_max                0             0             0     0.00%
read_clat_ns_p50               0             0             0     0.00%
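
For reference, the diff column above appears to be plain
(current - baseline) / baseline; a small sketch of the same arithmetic with
a crude one-stdev check on the non-zero rows (by that crude measure even the
6.65% p99 delta is still within the reported stdev):

#!/usr/bin/env python3
# Sketch: recompute the diff column ((current - baseline) / baseline) and
# flag rows whose delta exceeds the reported stdev. Values are copied from
# the non-zero rows of the table above.
rows = {
    # metric:            (baseline, current,  stdev)
    "write_clat_ns_p99": (84684.80, 90316.80, 6607.94),
    "write_clat_ns_p50": (18265.60, 18150.40, 345.21),
    "write_iops":        (42308.54, 42601.71, 1311.12),
    "sys_cpu":           (91.22,    91.00,    1.40),
}

for metric, (base, cur, stdev) in rows.items():
    diff_pct = (cur - base) / base * 100.0
    verdict = "outside" if abs(cur - base) > stdev else "within"
    print(f"{metric:20s} {diff_pct:+6.2f}%  ({verdict} one stdev)")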

Thanks,

Josef

2021-12-06 09:48:25

by Valentin Schneider

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On 03/12/21 14:00, Josef Bacik wrote:
> On Fri, Dec 03, 2021 at 12:03:27PM +0000, Valentin Schneider wrote:
>> Could you give the 4 top patches, i.e. those above
>> 8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
>> a try?
>>
>> https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression
>>
>> I gave that a quick test on the platform that caused me to write the patch
>> you bisected and looks like it didn't break the original fix. If the above
>> counter-measures aren't sufficient, I'll have to go poke at your
>> reproducers...
>>
>
> It's better but still around 6% regression. If I compare these patches to the
> average of the last few days worth of runs you're 5% better than before, so
> progress but not completely erased.
>

Hmph, time for me to reproduce this locally then. Thanks!

2021-12-09 17:22:16

by Valentin Schneider

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On 06/12/21 09:48, Valentin Schneider wrote:
> On 03/12/21 14:00, Josef Bacik wrote:
>> On Fri, Dec 03, 2021 at 12:03:27PM +0000, Valentin Schneider wrote:
>>> Could you give the 4 top patches, i.e. those above
>>> 8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
>>> a try?
>>>
>>> https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression
>>>
>>> I gave that a quick test on the platform that caused me to write the patch
>>> you bisected and looks like it didn't break the original fix. If the above
>>> counter-measures aren't sufficient, I'll have to go poke at your
>>> reproducers...
>>>
>>
>> It's better but still around 6% regression. If I compare these patches to the
>> average of the last few days worth of runs you're 5% better than before, so
>> progress but not completely erased.
>>
>
> Hmph, time for me to reproduce this locally then. Thanks!

I carved a partition out of an Ampere eMAG's HDD to play with BTRFS
via fsperf; this is what I get for the bisected commit (the baseline is the
bisected patchset's immediate parent, aka v5.15-rc4) via a handful of
./fsperf -p before-regression -c btrfs -n 100 -t emptyfiles500k

write_clat_ns_p99 195395.92 198790.46 4797.01 1.74%
write_iops 17305.79 17471.57 250.66 0.96%

write_clat_ns_p99 195395.92 197694.06 4797.01 1.18%
write_iops 17305.79 17533.62 250.66 1.32%

write_clat_ns_p99 195395.92 197903.67 4797.01 1.28%
write_iops 17305.79 17519.71 250.66 1.24%

If I compare against tip/sched/core however:

write_clat_ns_p99 195395.92 202936.32 4797.01 3.86%
write_iops 17305.79 17065.46 250.66 -1.39%

write_clat_ns_p99 195395.92 204349.44 4797.01 4.58%
write_iops 17305.79 17097.79 250.66 -1.20%

write_clat_ns_p99 195395.92 204169.05 4797.01 4.49%
write_iops 17305.79 17112.29 250.66 -1.12%

tip/sched/core + my patches:

write_clat_ns_p99 195395.92 205721.60 4797.01 5.28%
write_iops 17305.79 16947.59 250.66 -2.07%

write_clat_ns_p99 195395.92 203358.04 4797.01 4.07%
write_iops 17305.79 16953.24 250.66 -2.04%

write_clat_ns_p99 195395.92 201830.40 4797.01 3.29%
write_iops 17305.79 17041.18 250.66 -1.53%

So tip/sched/core seems to have a much worse regression, and my patches
are making things worse on that system...

I've started a bisection to see where the above leads me; unfortunately,
this machine needs more babysitting than I thought, so it's gonna take a
while.
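
In case it saves anyone some babysitting, a sketch of the sort of helper
that could drive 'git bisect run', grading each kernel by the
write_clat_ns_p99 diff. It assumes fsperf prints rows shaped like the tables
in this thread (metric, baseline, current, stdev, diff%); the 5% cut-off is
arbitrary, and building/booting each bisect step still has to happen
elsewhere:

#!/usr/bin/env python3
# Sketch of a "git bisect run" helper: grade the checked-out kernel by the
# write_clat_ns_p99 diff that fsperf reports against a stored baseline.
# Assumes fsperf prints rows shaped like the tables in this thread, i.e.
#   write_clat_ns_p99   <baseline>   <current>   <stdev>   <diff%>
import subprocess
import sys

THRESHOLD_PCT = 5.0   # arbitrary cut-off for calling a kernel "bad"

def p99_diff_pct():
    out = subprocess.run(
        ["./fsperf", "-p", "before-regression", "-c", "btrfs",
         "-n", "100", "-t", "emptyfiles500k"],
        capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] == "write_clat_ns_p99":
            return float(fields[-1].rstrip("%"))
    raise RuntimeError("write_clat_ns_p99 not found in fsperf output")

if __name__ == "__main__":
    diff = p99_diff_pct()
    print(f"write_clat_ns_p99 diff vs baseline: {diff:+.2f}%")
    sys.exit(1 if diff > THRESHOLD_PCT else 0)   # 0 = good, 1 = bad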

@Josef any chance you could see if the above also applies to you? tip lives
at https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, though from
where my bisection is taking me it looks like you should see that against
Linus' tree as well.

Thanks,
Valentin

2021-12-09 19:16:26

by Josef Bacik

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On Thu, Dec 09, 2021 at 05:22:05PM +0000, Valentin Schneider wrote:
> On 06/12/21 09:48, Valentin Schneider wrote:
> > On 03/12/21 14:00, Josef Bacik wrote:
> >> On Fri, Dec 03, 2021 at 12:03:27PM +0000, Valentin Schneider wrote:
> >>> Could you give the 4 top patches, i.e. those above
> >>> 8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
> >>> a try?
> >>>
> >>> https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression
> >>>
> >>> I gave that a quick test on the platform that caused me to write the patch
> >>> you bisected and looks like it didn't break the original fix. If the above
> >>> counter-measures aren't sufficient, I'll have to go poke at your
> >>> reproducers...
> >>>
> >>
> >> It's better but still around 6% regression. If I compare these patches to the
> >> average of the last few days worth of runs you're 5% better than before, so
> >> progress but not completely erased.
> >>
> >
> > Hmph, time for me to reproduce this locally then. Thanks!
>
> I carved out a partition out of an Ampere eMAG's HDD to play with BTRFS
> via fsperf; this is what I get for the bisected commit (baseline is
> bisected patchset's immediate parent, aka v5.15-rc4) via a handful of
> ./fsperf -p before-regression -c btrfs -n 100 -t emptyfiles500k
>
> write_clat_ns_p99 195395.92 198790.46 4797.01 1.74%
> write_iops 17305.79 17471.57 250.66 0.96%
>
> write_clat_ns_p99 195395.92 197694.06 4797.01 1.18%
> write_iops 17305.79 17533.62 250.66 1.32%
>
> write_clat_ns_p99 195395.92 197903.67 4797.01 1.28%
> write_iops 17305.79 17519.71 250.66 1.24%
>
> If I compare against tip/sched/core however:
>
> write_clat_ns_p99 195395.92 202936.32 4797.01 3.86%
> write_iops 17305.79 17065.46 250.66 -1.39%
>
> write_clat_ns_p99 195395.92 204349.44 4797.01 4.58%
> write_iops 17305.79 17097.79 250.66 -1.20%
>
> write_clat_ns_p99 195395.92 204169.05 4797.01 4.49%
> write_iops 17305.79 17112.29 250.66 -1.12%
>
> tip/sched/core + my patches:
>
> write_clat_ns_p99 195395.92 205721.60 4797.01 5.28%
> write_iops 17305.79 16947.59 250.66 -2.07%
>
> write_clat_ns_p99 195395.92 203358.04 4797.01 4.07%
> write_iops 17305.79 16953.24 250.66 -2.04%
>
> write_clat_ns_p99 195395.92 201830.40 4797.01 3.29%
> write_iops 17305.79 17041.18 250.66 -1.53%
>
> So tip/sched/core seems to have a much worse regression, and my patches
> are making things worse on that system...
>
> I've started a bisection to see where the above leads me, unfortunately
> this machine needs more babysitting than I thought so it's gonna take a
> while.
>
> @Josef any chance you could see if the above also applies to you? tip lives
> at https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, though from
> where my bisection is taking me it looks like you should see that against
> Linus' tree as well.
>

This has made us all curious, so we're all fucking around with schbench to see
if we can make it show up without needing to use fsperf. Maybe that'll help
with the bisect, because I had to bisect twice to land on your patches, and I
only emailed when I could see the change right before and right after your
patch. It would not surprise me at all if there's something else here that's
causing us pain.
> Thanks,
> Valentin

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

Hi, this is your Linux kernel regression tracker speaking.

On 09.12.21 20:16, Josef Bacik wrote:
> On Thu, Dec 09, 2021 at 05:22:05PM +0000, Valentin Schneider wrote:
>> On 06/12/21 09:48, Valentin Schneider wrote:
>>> On 03/12/21 14:00, Josef Bacik wrote:
>>>> On Fri, Dec 03, 2021 at 12:03:27PM +0000, Valentin Schneider wrote:
>>>>> Could you give the 4 top patches, i.e. those above
>>>>> 8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
>>>>> a try?
>>>>>
>>>>> https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression
>>>>>
>>>>> I gave that a quick test on the platform that caused me to write the patch
>>>>> you bisected and looks like it didn't break the original fix. If the above
>>>>> counter-measures aren't sufficient, I'll have to go poke at your
>>>>> reproducers...
>>>>>
>>>>
>>>> It's better but still around 6% regression. If I compare these patches to the
>>>> average of the last few days worth of runs you're 5% better than before, so
>>>> progress but not completely erased.
>>>>
>>>
>>> Hmph, time for me to reproduce this locally then. Thanks!
>>
>> I carved out a partition out of an Ampere eMAG's HDD to play with BTRFS
>> via fsperf; this is what I get for the bisected commit (baseline is
>> bisected patchset's immediate parent, aka v5.15-rc4) via a handful of
>> ./fsperf -p before-regression -c btrfs -n 100 -t emptyfiles500k
>>
>> write_clat_ns_p99 195395.92 198790.46 4797.01 1.74%
>> write_iops 17305.79 17471.57 250.66 0.96%
>>
>> write_clat_ns_p99 195395.92 197694.06 4797.01 1.18%
>> write_iops 17305.79 17533.62 250.66 1.32%
>>
>> write_clat_ns_p99 195395.92 197903.67 4797.01 1.28%
>> write_iops 17305.79 17519.71 250.66 1.24%
>>
>> If I compare against tip/sched/core however:
>>
>> write_clat_ns_p99 195395.92 202936.32 4797.01 3.86%
>> write_iops 17305.79 17065.46 250.66 -1.39%
>>
>> write_clat_ns_p99 195395.92 204349.44 4797.01 4.58%
>> write_iops 17305.79 17097.79 250.66 -1.20%
>>
>> write_clat_ns_p99 195395.92 204169.05 4797.01 4.49%
>> write_iops 17305.79 17112.29 250.66 -1.12%
>>
>> tip/sched/core + my patches:
>>
>> write_clat_ns_p99 195395.92 205721.60 4797.01 5.28%
>> write_iops 17305.79 16947.59 250.66 -2.07%
>>
>> write_clat_ns_p99 195395.92 203358.04 4797.01 4.07%
>> write_iops 17305.79 16953.24 250.66 -2.04%
>>
>> write_clat_ns_p99 195395.92 201830.40 4797.01 3.29%
>> write_iops 17305.79 17041.18 250.66 -1.53%
>>
>> So tip/sched/core seems to have a much worse regression, and my patches
>> are making things worse on that system...
>>
>> I've started a bisection to see where the above leads me, unfortunately
>> this machine needs more babysitting than I thought so it's gonna take a
>> while.
>>
>> @Josef any chance you could see if the above also applies to you? tip lives
>> at https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, though from
>> where my bisection is taking me it looks like you should see that against
>> Linus' tree as well.
>>
>
> This has made us all curious, so we're all fucking around with schbench to see
> if we can make it show up without needing to use fsperf. Maybe that'll help
> with the bisect, because I had to bisect twice to land on your patches, and I
> only emailed when I could see the change right before and right after your
> patch. It would not surprise me at all if there's something else here that's
> causing us pain.

What's the status here? Just wondering, because there hasn't been any
activity in this thread for 11 days and the festive season is upon us.

Was the discussion moved elsewhere? Or is this still a mystery? And if
it is: how bad is it, does it need to be fixed before Linus releases 5.16?

Ciao, Thorsten

#regzbot poke

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them, and unfortunately
I will therefore sometimes get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave might thus send someone reading this down the wrong
rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC'd on
all further activities wrt this regression.


2021-12-22 16:07:45

by Valentin Schneider

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch


Hi,

On 22/12/21 13:42, Thorsten Leemhuis wrote:
> What's the status here? Just wondering, because there hasn't been any
> activity in this thread since 11 days and the festive season is upon us.
>
> Was the discussion moved elsewhere? Or is this still a mystery? And if
> it is: how bad is it, does it need to be fixed before Linus releases 5.16?
>

I got to the end of bisect #3 yesterday; the incriminated commit doesn't
seem to make much sense, but I've just re-tested it and there is a clear
regression between that commit and its parent (unlike bisects #1 and #2):

2127d22509aec3a83dffb2a3c736df7ba747a7ce mm, slub: fix two bugs in slab_debug_trace_open()
write_clat_ns_p99 195395.92 199638.20 4797.01 2.17%
write_iops 17305.79 17188.24 250.66 -0.68%

write_clat_ns_p99 195543.84 199996.70 5122.88 2.28%
write_iops 17300.61 17241.86 251.56 -0.34%

write_clat_ns_p99 195543.84 200724.48 5122.88 2.65%
write_iops 17300.61 17246.63 251.56 -0.31%

write_clat_ns_p99 195543.84 200445.41 5122.88 2.51%
write_iops 17300.61 17215.47 251.56 -0.49%

6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
write_clat_ns_p99 195395.92 197942.30 4797.01 1.30%
write_iops 17305.79 17246.56 250.66 -0.34%

write_clat_ns_p99 195543.84 196183.92 5122.88 0.33%
write_iops 17300.61 17310.33 251.56 0.06%

write_clat_ns_p99 195543.84 196990.71 5122.88 0.74%
write_iops 17300.61 17346.32 251.56 0.26%

write_clat_ns_p99 195543.84 196362.24 5122.88 0.42%
write_iops 17300.61 17315.71 251.56 0.09%

It's pure debug stuff and AFAICT is a correct fix...
@Josef, could you test that on your side?

> Ciao, Thorsten
>

2022-01-03 16:16:08

by Josef Bacik

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On Wed, Dec 22, 2021 at 04:07:35PM +0000, Valentin Schneider wrote:
>
> Hi,
>
> On 22/12/21 13:42, Thorsten Leemhuis wrote:
> > What's the status here? Just wondering, because there hasn't been any
> > activity in this thread since 11 days and the festive season is upon us.
> >
> > Was the discussion moved elsewhere? Or is this still a mystery? And if
> > it is: how bad is it, does it need to be fixed before Linus releases 5.16?
> >
>
> I got to the end of bisect #3 yesterday, the incriminated commit doesn't
> seem to make much sense but I've just re-tested it and there is a clear
> regression between that commit and its parent (unlike bisect #1 and #2):
>
> 2127d22509aec3a83dffb2a3c736df7ba747a7ce mm, slub: fix two bugs in slab_debug_trace_open()
> write_clat_ns_p99 195395.92 199638.20 4797.01 2.17%
> write_iops 17305.79 17188.24 250.66 -0.68%
>
> write_clat_ns_p99 195543.84 199996.70 5122.88 2.28%
> write_iops 17300.61 17241.86 251.56 -0.34%
>
> write_clat_ns_p99 195543.84 200724.48 5122.88 2.65%
> write_iops 17300.61 17246.63 251.56 -0.31%
>
> write_clat_ns_p99 195543.84 200445.41 5122.88 2.51%
> write_iops 17300.61 17215.47 251.56 -0.49%
>
> 6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
> write_clat_ns_p99 195395.92 197942.30 4797.01 1.30%
> write_iops 17305.79 17246.56 250.66 -0.34%
>
> write_clat_ns_p99 195543.84 196183.92 5122.88 0.33%
> write_iops 17300.61 17310.33 251.56 0.06%
>
> write_clat_ns_p99 195543.84 196990.71 5122.88 0.74%
> write_iops 17300.61 17346.32 251.56 0.26%
>
> write_clat_ns_p99 195543.84 196362.24 5122.88 0.42%
> write_iops 17300.61 17315.71 251.56 0.09%
>
> It's pure debug stuff and AFAICT is a correct fix...
> @Josef, could you test that on your side?

Sorry, holidays and all that. I see 0 difference between the two commits, and
no regression from baseline. It'll take me a few days to recover from the
holidays, but I'll put some more effort into actively debugging wtf is going on
here on my side since we're all having trouble pinning down what's going on.
Thanks,

Josef

2022-01-13 16:42:09

by Valentin Schneider

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On 03/01/22 11:16, Josef Bacik wrote:
> On Wed, Dec 22, 2021 at 04:07:35PM +0000, Valentin Schneider wrote:
>>
>> Hi,
>>
>> On 22/12/21 13:42, Thorsten Leemhuis wrote:
>> > What's the status here? Just wondering, because there hasn't been any
>> > activity in this thread since 11 days and the festive season is upon us.
>> >
>> > Was the discussion moved elsewhere? Or is this still a mystery? And if
>> > it is: how bad is it, does it need to be fixed before Linus releases 5.16?
>> >
>>
>> I got to the end of bisect #3 yesterday, the incriminated commit doesn't
>> seem to make much sense but I've just re-tested it and there is a clear
>> regression between that commit and its parent (unlike bisect #1 and #2):
>>
>> 2127d22509aec3a83dffb2a3c736df7ba747a7ce mm, slub: fix two bugs in slab_debug_trace_open()
>> write_clat_ns_p99 195395.92 199638.20 4797.01 2.17%
>> write_iops 17305.79 17188.24 250.66 -0.68%
>>
>> write_clat_ns_p99 195543.84 199996.70 5122.88 2.28%
>> write_iops 17300.61 17241.86 251.56 -0.34%
>>
>> write_clat_ns_p99 195543.84 200724.48 5122.88 2.65%
>> write_iops 17300.61 17246.63 251.56 -0.31%
>>
>> write_clat_ns_p99 195543.84 200445.41 5122.88 2.51%
>> write_iops 17300.61 17215.47 251.56 -0.49%
>>
>> 6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
>> write_clat_ns_p99 195395.92 197942.30 4797.01 1.30%
>> write_iops 17305.79 17246.56 250.66 -0.34%
>>
>> write_clat_ns_p99 195543.84 196183.92 5122.88 0.33%
>> write_iops 17300.61 17310.33 251.56 0.06%
>>
>> write_clat_ns_p99 195543.84 196990.71 5122.88 0.74%
>> write_iops 17300.61 17346.32 251.56 0.26%
>>
>> write_clat_ns_p99 195543.84 196362.24 5122.88 0.42%
>> write_iops 17300.61 17315.71 251.56 0.09%
>>
>> It's pure debug stuff and AFAICT is a correct fix...
>> @Josef, could you test that on your side?
>
> Sorry, holidays and all that. I see 0 difference between the two commits, and
> no regression from baseline. It'll take me a few days to recover from the
> holidays, but I'll put some more effort into actively debugging wtf is going on
> here on my side since we're all having trouble pinning down what's going
> on.

Humph, that's unfortunate... I just came back from my holidays, so I'll be
untangling my inbox for the next few days. Do keep us posted!

> Thanks,
>
> Josef

2022-01-13 16:57:54

by Roman Gushchin

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On Thu, Jan 13, 2022 at 04:41:57PM +0000, Valentin Schneider wrote:
> On 03/01/22 11:16, Josef Bacik wrote:
> > On Wed, Dec 22, 2021 at 04:07:35PM +0000, Valentin Schneider wrote:
> >>
> >> Hi,
> >>
> >> On 22/12/21 13:42, Thorsten Leemhuis wrote:
> >> > What's the status here? Just wondering, because there hasn't been any
> >> > activity in this thread since 11 days and the festive season is upon us.
> >> >
> >> > Was the discussion moved elsewhere? Or is this still a mystery? And if
> >> > it is: how bad is it, does it need to be fixed before Linus releases 5.16?
> >> >
> >>
> >> I got to the end of bisect #3 yesterday, the incriminated commit doesn't
> >> seem to make much sense but I've just re-tested it and there is a clear
> >> regression between that commit and its parent (unlike bisect #1 and #2):
> >>
> >> 2127d22509aec3a83dffb2a3c736df7ba747a7ce mm, slub: fix two bugs in slab_debug_trace_open()
> >> write_clat_ns_p99 195395.92 199638.20 4797.01 2.17%
> >> write_iops 17305.79 17188.24 250.66 -0.68%
> >>
> >> write_clat_ns_p99 195543.84 199996.70 5122.88 2.28%
> >> write_iops 17300.61 17241.86 251.56 -0.34%
> >>
> >> write_clat_ns_p99 195543.84 200724.48 5122.88 2.65%
> >> write_iops 17300.61 17246.63 251.56 -0.31%
> >>
> >> write_clat_ns_p99 195543.84 200445.41 5122.88 2.51%
> >> write_iops 17300.61 17215.47 251.56 -0.49%
> >>
> >> 6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
> >> write_clat_ns_p99 195395.92 197942.30 4797.01 1.30%
> >> write_iops 17305.79 17246.56 250.66 -0.34%
> >>
> >> write_clat_ns_p99 195543.84 196183.92 5122.88 0.33%
> >> write_iops 17300.61 17310.33 251.56 0.06%
> >>
> >> write_clat_ns_p99 195543.84 196990.71 5122.88 0.74%
> >> write_iops 17300.61 17346.32 251.56 0.26%
> >>
> >> write_clat_ns_p99 195543.84 196362.24 5122.88 0.42%
> >> write_iops 17300.61 17315.71 251.56 0.09%
> >>
> >> It's pure debug stuff and AFAICT is a correct fix...
> >> @Josef, could you test that on your side?
> >
> > Sorry, holidays and all that. I see 0 difference between the two commits, and
> > no regression from baseline. It'll take me a few days to recover from the
> > holidays, but I'll put some more effort into actively debugging wtf is going on
> > here on my side since we're all having trouble pinning down what's going
> > on.
>
> Humph, that's unfortunate... I just came back from my holidays, so I'll be
> untangling my inbox for the next few days. Do keep us posted!

I'm trying to bisect it independently and make sense of it too, thanks to Josef
for providing me a test setup. From the very first data I got yesterday,
the only thing I can say is that the data is very noisy, and I'm not totally
convinced that the regression is coming from the patch that was blamed
initially.

I hope to make more progress today/tomorrow, will keep you updated.

Thanks!

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

Hi, this is your Linux kernel regression tracker speaking. Top-posting
for once, to make this easily accessible to everyone.

FWIW, this is a gentle reminder that I'm still tracking this regression.
Afaics nothing happened in the last few weeks.

If the discussion continued somewhere else, please let me know; you can
do this directly or simply tell my regression tracking bot yourself by
sending a reply to this mail with a paragraph containing a regzbot
command like "#regzbot monitor
https://lore.kernel.org/r/[email protected]/"

If you think there are valid reasons to drop this regression from the
tracking, let me know; you can do this directly or simply tell my
regression tracking bot yourself by sending a reply to this mail with a
paragraph containing a regzbot command like "#regzbot invalid: Some
explanation" (without the quotes).

Anyway: I'm putting it on the back burner now to reduce the noise, as this
afaics is less important than other regressions:

#regzbot backburner: Culprit is hard to track down
#regzbot poke

You'll likely get two more mails like this after the next two merge
windows; then I'll drop it if I don't hear anything back.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply; it's in everyone's interest to set the public record
straight.


On 13.01.22 17:57, Roman Gushchin wrote:
> On Thu, Jan 13, 2022 at 04:41:57PM +0000, Valentin Schneider wrote:
>> On 03/01/22 11:16, Josef Bacik wrote:
>>> On Wed, Dec 22, 2021 at 04:07:35PM +0000, Valentin Schneider wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 22/12/21 13:42, Thorsten Leemhuis wrote:
>>>>> What's the status here? Just wondering, because there hasn't been any
>>>>> activity in this thread since 11 days and the festive season is upon us.
>>>>>
>>>>> Was the discussion moved elsewhere? Or is this still a mystery? And if
>>>>> it is: how bad is it, does it need to be fixed before Linus releases 5.16?
>>>>>
>>>>
>>>> I got to the end of bisect #3 yesterday, the incriminated commit doesn't
>>>> seem to make much sense but I've just re-tested it and there is a clear
>>>> regression between that commit and its parent (unlike bisect #1 and #2):
>>>>
>>>> 2127d22509aec3a83dffb2a3c736df7ba747a7ce mm, slub: fix two bugs in slab_debug_trace_open()
>>>> write_clat_ns_p99 195395.92 199638.20 4797.01 2.17%
>>>> write_iops 17305.79 17188.24 250.66 -0.68%
>>>>
>>>> write_clat_ns_p99 195543.84 199996.70 5122.88 2.28%
>>>> write_iops 17300.61 17241.86 251.56 -0.34%
>>>>
>>>> write_clat_ns_p99 195543.84 200724.48 5122.88 2.65%
>>>> write_iops 17300.61 17246.63 251.56 -0.31%
>>>>
>>>> write_clat_ns_p99 195543.84 200445.41 5122.88 2.51%
>>>> write_iops 17300.61 17215.47 251.56 -0.49%
>>>>
>>>> 6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
>>>> write_clat_ns_p99 195395.92 197942.30 4797.01 1.30%
>>>> write_iops 17305.79 17246.56 250.66 -0.34%
>>>>
>>>> write_clat_ns_p99 195543.84 196183.92 5122.88 0.33%
>>>> write_iops 17300.61 17310.33 251.56 0.06%
>>>>
>>>> write_clat_ns_p99 195543.84 196990.71 5122.88 0.74%
>>>> write_iops 17300.61 17346.32 251.56 0.26%
>>>>
>>>> write_clat_ns_p99 195543.84 196362.24 5122.88 0.42%
>>>> write_iops 17300.61 17315.71 251.56 0.09%
>>>>
>>>> It's pure debug stuff and AFAICT is a correct fix...
>>>> @Josef, could you test that on your side?
>>>
>>> Sorry, holidays and all that. I see 0 difference between the two commits, and
>>> no regression from baseline. It'll take me a few days to recover from the
>>> holidays, but I'll put some more effort into actively debugging wtf is going on
>>> here on my side since we're all having trouble pinning down what's going
>>> on.
>>
>> Humph, that's unfortunate... I just came back from my holidays, so I'll be
>> untangling my inbox for the next few days. Do keep us posted!
>
> I'm trying to bisect it independently and make sense of it too, thanks to Josef
> for providing me a test setup. From the very first data I've got yesterday,
> the only thing I can say the data is very noisy and I'm not totally convinced
> that the regression is coming from the patch which was blamed initially.
>
> I hope to make more progress today/tomorrow, will keep you updated.
>
> Thanks!
>

2022-02-18 16:16:29

by Josef Bacik

[permalink] [raw]
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On Fri, Feb 18, 2022 at 12:00:41PM +0100, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking. Top-posting
> for once, to make this easy accessible to everyone.
>
> FWIW, this is a gentle reminder that I'm still tracking this regression.
> Afaics nothing happened in the last few weeks.
>
> If the discussion continued somewhere else, please let me know; you can
> do this directly or simply tell my regression tracking bot yourself by
> sending a reply to this mail with a paragraph containing a regzbot
> command like "#regzbot monitor
> https://lore.kernel.org/r/[email protected]/"
>
> If you think there are valid reasons to drop this regressions from the
> tracking, let me know; you can do this directly or simply tell my
> regression tracking bot yourself by sending a reply to this mail with a
> paragraph containing a regzbot command like "#regzbot invalid: Some
> explanation" (without the quotes).
>
> Anyway: I'm putting it on back burner now to reduce the noise, as this
> afaics is less important than other regressions:
>
> #regzbot backburner: Culprit is hard to track down
> #regzbot poke
>
> You likely get two more mails like this after the next two merge
> windows, then I'll drop it if I don't here anything back.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
>
>

Roman and I sat down to mess with this some more and had some weird
observations.

On our Facebook internal boxes we couldn't reproduce it. If we disable all the
normal FB-specific stuff so the box is "quiet", 5.16 performs better. However,
these are all single-socket machines with stupidly high numbers of cores; my
local machine is 2 sockets with 6 cores each.

On my box it was actually pretty noisy when testing in isolation as well. In
the end I rigged up fsperf to run 1000 runs and graph each kernel on top of
each other. What came out was really strange.

1. The "good" kernel had a period for the first ~100 runs that were very low,
the p50 was ~9000ns, but after those first 100 runs it jumped up and was right
ontop of 5.16. This explains why it shows up on my overnight tests, the box
literally reboots and runs tests. So there's a "warmup" period for the
scheduler, once it's been hammered on enough it matches 5.16 exactly, otherwise
its faster at the beginning.

2. The regression essentially disappears when looking at the graphs over 1000
runs. The results are so jittery that this was the only way we could honestly
look at them and see anything. The only place the "regression" shows up is in
the write completion latency p99: there 5.15 ranges between 75000-85000 ns,
whereas 5.16 ranges between 80000 and 100000 ns. However, again, this is only
on my machine, and the p50 latencies and the actual bw_bytes are the same.

Given that this test is relatively bursty anyway, that we can't reproduce it
internally, and that 5.16 actually consistently performs better internally,
we've decided to drop this; it's simply too noisy to get a handle on and
actually call it a problem.

#regzbot invalid: test too noisy and the results aren't clear cut.

Thanks,

Josef