2022-08-31 07:25:21

by kernel test robot

Subject: d4252071b9: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression


Hi, please note that we have read this patch and understand it as a fix.
We also understand that, since the patch adds memory barriers, some
regression in the block I/O area is somewhat expected.

After further internal review, we still decided to report our findings
from our tests, for your information on how this patch could impact
performance in some cases. Please let us know if you have any concerns.

Thanks a lot!

below is the full report.


Greetings,

FYI, we noticed a -26.5% regression of fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec due to commit:


commit: d4252071b97d2027d246f6a82cbee4d52f618b47 ("add barriers to buffer_uptodate and set_buffer_uptodate")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fxmark
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
with following parameters:

disk: 1SSD
media: ssd
test: DWTL
fstype: ext4_no_jnl
directio: directio
cpufreq_governor: performance
ucode: 0xd000363

test-description: FxMark is a filesystem benchmark that tests multicore scalability.
test-url: https://github.com/sslab-gatech/fxmark

In addition to that, the commit also has significant impact on the following tests:


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp dirs to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/directio/disk/fstype/kconfig/media/rootfs/tbox_group/test/testcase/ucode:
gcc-11/performance/directio/1SSD/ext4_no_jnl/x86_64-rhel-8.3/ssd/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/DWTL/fxmark/0xd000363

commit:
e394ff83bb ("Merge tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux")
d4252071b9 ("add barriers to buffer_uptodate and set_buffer_uptodate")

e394ff83bbca1c72 d4252071b97d2027d246f6a82cb
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.03 +15.9% 0.04 ± 2% fxmark.ssd_ext4_no_jnl_DWTL_2_directio.secs
733363 -13.7% 632887 ± 2% fxmark.ssd_ext4_no_jnl_DWTL_2_directio.works/sec
1.70 ± 38% -46.9% 0.90 ± 32% fxmark.ssd_ext4_no_jnl_DWTL_36_directio.irq_util
0.00 ± 10% +57.1% 0.01 ± 12% fxmark.ssd_ext4_no_jnl_DWTL_36_directio.secs
0.15 ± 49% +84.3% 0.27 ± 11% fxmark.ssd_ext4_no_jnl_DWTL_36_directio.sys_sec
11.01 ± 46% +68.6% 18.56 ± 8% fxmark.ssd_ext4_no_jnl_DWTL_36_directio.sys_util
5084846 ± 11% -36.2% 3242280 ± 11% fxmark.ssd_ext4_no_jnl_DWTL_36_directio.works/sec
0.05 ± 5% +10.5% 0.06 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.real_sec
0.02 ± 6% +20.5% 0.03 ± 4% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.secs
0.09 ± 7% +18.2% 0.11 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.sys_sec
19.84 ± 6% -11.5% 17.57 ± 5% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.user_util
1105172 ± 6% -17.2% 914623 ± 4% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.works/sec
0.00 ± 8% +37.2% 0.01 ± 12% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.secs
0.22 ± 18% +67.9% 0.38 ± 11% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.sys_sec
11.60 ± 13% +49.8% 17.38 ± 9% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.sys_util
5081260 ± 8% -26.5% 3734460 ± 12% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec


#regzbot introduced: d4252071b9


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
config-5.19.0-13322-gd4252071b97d (167.13 kB)
job-script (8.21 kB)
job.yaml (5.69 kB)
reproduce (263.00 B)

2022-08-31 17:16:56

by Linus Torvalds

Subject: Re: d4252071b9: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression

On Wed, Aug 31, 2022 at 12:21 AM kernel test robot
<[email protected]> wrote:
>
> Hi, please note that we have read this patch and understand it as a fix.
> We also understand that, since the patch adds memory barriers, some
> regression in the block I/O area is somewhat expected.

Well, yes and no.

It's a memory ordering fix, but the memory ordering part is one that
should *not* have any actual impact on x86, because the addition of
smp_mb__before_atomic() should be a total no-op, and
"smp_load_acquire()" should only imply a compiler scheduling barrier.

IOW, it most definitely shouldn't cause something like this:

> FYI, we noticed a -26.5% regression of
> fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec

because at most it should have caused a tiny perturbation of the
instruction scheduling (and possibly register allocation, stack
spill differences, and instruction choice).
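[Editorial aside, for readers outside the kernel: the acquire/release pairing Linus describes can be sketched with a rough userspace C11 analogue. The names and the C11 atomics below are illustrative stand-ins, not the kernel's actual definitions; the point is that on x86 the acquire load compiles to a plain MOV and the pre-atomic barrier to nothing, so the ordering itself costs essentially nothing at the machine level.]

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative userspace analogue (NOT the kernel's code).  The flag set
 * must be ordered after the data writes it publishes, and the flag read
 * must be ordered before the dependent data reads.  On x86 both of these
 * orderings come for free at the instruction level. */
static atomic_ulong b_state;
static unsigned long b_data;          /* stands in for the buffer contents */
#define BH_UPTODATE (1UL << 0)

static void demo_set_buffer_uptodate(unsigned long data)
{
    b_data = data;                    /* fill the "buffer" first */
    /* release here plays the role of smp_mb__before_atomic() + set_bit() */
    atomic_fetch_or_explicit(&b_state, BH_UPTODATE, memory_order_release);
}

static int demo_buffer_uptodate(void)
{
    /* acquire here plays the role of smp_load_acquire() in buffer_uptodate() */
    return (atomic_load_explicit(&b_state, memory_order_acquire)
            & BH_UPTODATE) != 0;
}

static unsigned long demo_read_buffer(void)
{
    return b_data;  /* safe only after demo_buffer_uptodate() returned true */
}
```

A reader that sees the flag set via the acquire load is thereby guaranteed to see the data written before the release store, which is the whole contract of the pairing.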

Except there was a change there that isn't just about memory ordering:

> After further internal review, we still decided to report our findings
> from our tests, for your information on how this patch could impact
> performance in some cases. Please let us know if you have any concerns.

Oh, it's absolutely interesting and unexpected.

And I think the cause is obvious: our "set_buffer_uptodate()" *used*
to use the BUFFER_FNS() macro, which does that bit setting
conditionally.

And while that isn't actually correct in an "atomic op" situation, it
*is* fine in the case of set_buffer_uptodate(), since if the buffer
was already uptodate, any other CPU looking at that bit will not be
caring about what *this* CPU did.

IOW, if this CPU sees the bit as having ever been uptodate before,
then any barriers are irrelevant, because they are about the original
setting of 'uptodate', not the new one.

So I think we can just do this:

--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -137,12 +137,14 @@ BUFFER_FNS(Defer_Completion, defer_completion)

static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
{
- /*
- * make it consistent with folio_mark_uptodate
- * pairs with smp_load_acquire in buffer_uptodate
- */
- smp_mb__before_atomic();
- set_bit(BH_Uptodate, &bh->b_state);
+ if (!test_bit(BH_Uptodate, &bh->b_state)) {
+ /*
+ * make it consistent with folio_mark_uptodate
+ * pairs with smp_load_acquire in buffer_uptodate
+ */
+ smp_mb__before_atomic();
+ set_bit(BH_Uptodate, &bh->b_state);
+ }
}

static __always_inline void clear_buffer_uptodate(struct buffer_head *bh)

and re-introduce the original code (maybe extend that comment to talk
about this "only the first up-to-date matters" case).
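[Editorial aside: the effect of that test_bit() guard can be sketched in userspace C11. The counter below is purely illustrative instrumentation, not anything in the kernel; the point is that every caller after the first skips the atomic read-modify-write, so an already-set flag's cache line is no longer written over and over by concurrent callers.]

```c
#include <assert.h>
#include <stdatomic.h>

#define BH_UPTODATE (1UL << 0)

static atomic_ulong b_state;
static unsigned long rmw_count;   /* counts atomic RMWs, for illustration only */

/* Sketch of the fixed set_buffer_uptodate(): only perform the (expensive)
 * atomic RMW when the bit is not already set.  If another CPU set the bit
 * first, its barriers already published the data, so skipping ours is safe
 * for this particular flag. */
static void demo_set_uptodate(void)
{
    if (!(atomic_load_explicit(&b_state, memory_order_relaxed) & BH_UPTODATE)) {
        rmw_count++;    /* the expensive path: taken at most once here */
        atomic_fetch_or_explicit(&b_state, BH_UPTODATE, memory_order_release);
    }
}
```

Calling demo_set_uptodate() repeatedly leaves the flag set while the RMW counter stays at one, which mirrors the "only the first up-to-date matters" argument above.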

HOWEVER.

I'd love to hear if you have a clear profile change, and to see
exactly which set_buffer_uptodate() is *so* important. Honestly, I
didn't expect the buffer head functions to even really matter much any
more, with pretty much all IO being about the page cache..

Linus

2022-09-08 09:22:47

by Yin, Fengwei

Subject: Re: [LKP] Re: d4252071b9: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression

Hi Linus,

On 9/1/2022 12:46 AM, Linus Torvalds wrote:
> So I think we can just do this:
>
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -137,12 +137,14 @@ BUFFER_FNS(Defer_Completion, defer_completion)
>
> static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
> {
> - /*
> - * make it consistent with folio_mark_uptodate
> - * pairs with smp_load_acquire in buffer_uptodate
> - */
> - smp_mb__before_atomic();
> - set_bit(BH_Uptodate, &bh->b_state);
> + if (!test_bit(BH_Uptodate, &bh->b_state)) {
> + /*
> + * make it consistent with folio_mark_uptodate
> + * pairs with smp_load_acquire in buffer_uptodate
> + */
> + smp_mb__before_atomic();
> + set_bit(BH_Uptodate, &bh->b_state);
> + }
> }
>
> static __always_inline void clear_buffer_uptodate(struct buffer_head *bh)
>
> and re-introduce the original code (maybe extend that comment to talk
> about this "only the first up-to-date matters" case).
Test result:

commit:
e394ff83bbca1c72427b1feb5c6b9d4dad832f01 -> parent of bad commit
d4252071b97d2027d246f6a82cbee4d52f618b47 -> bad commit
6812880b7af46dc2a78ad2069e5279adcbfd5133 -> The fixing patch commit

e394ff83bbca1c72 d4252071b97d2027d246f6a82cb 6812880b7af46dc2a78ad2069e5
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
0.01 ± 3% +30.7% 0.01 ± 4% -2.5% 0.01 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.secs
0.14 ± 3% +29.3% 0.18 ± 5% -4.9% 0.13 ± 6% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.sys_sec
20.21 ± 4% +16.6% 23.58 -7.7% 18.66 ± 5% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.sys_util
3377886 ± 3% -23.4% 2586083 ± 4% +2.6% 3466796 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.works/sec

0.06 +15.9% 0.07 ± 2% +2.7% 0.06 fxmark.ssd_ext4_no_jnl_DWTL_2_directio.real_sec
0.03 +24.9% 0.04 ± 3% +3.4% 0.03 fxmark.ssd_ext4_no_jnl_DWTL_2_directio.secs
0.07 +23.8% 0.09 ± 5% -4.8% 0.07 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_2_directio.sys_sec
55.34 ± 3% +7.1% 59.26 ± 3% -7.3% 51.28 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_2_directio.sys_util
738881 -19.9% 592194 ± 3% -3.3% 714200 fxmark.ssd_ext4_no_jnl_DWTL_2_directio.works/sec

38.31 ± 3% -20.0% 30.64 ± 9% -15.8% 32.27 ± 8% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.idle_util
0.02 +30.0% 0.03 ± 5% +4.6% 0.02 fxmark.ssd_ext4_no_jnl_DWTL_4_directio.secs
0.08 ± 5% +32.0% 0.11 +16.0% 0.10 ± 12% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.sys_sec
41.65 ± 2% +22.1% 50.86 ± 4% +6.7% 44.43 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.sys_util
1163070 -22.9% 896752 ± 5% -4.4% 1111656 fxmark.ssd_ext4_no_jnl_DWTL_4_directio.works/sec

1.32 ± 18% -16.6% 1.10 ± 22% -22.0% 1.03 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.irq_util
0.00 ± 2% +25.1% 0.01 ± 8% -1.2% 0.00 ± 4% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.secs
0.24 ± 3% +37.5% 0.33 ± 4% +16.7% 0.28 ± 2% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.sys_sec
11.85 ± 4% +30.8% 15.50 ± 4% +21.7% 14.41 ± 6% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.sys_util
5031984 ± 2% -19.5% 4049295 ± 8% +1.3% 5099506 ± 4% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec

0.00 ± 3% +37.5% 0.01 ± 11% +13.8% 0.00 ± 9% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.secs
0.33 ± 9% +29.0% 0.43 ± 8% +12.0% 0.37 ± 8% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.sys_sec
12.11 ± 10% +21.9% 14.76 ± 6% +12.5% 13.63 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.sys_util
5533529 ± 4% -26.3% 4078851 ± 12% -11.5% 4896500 ± 8% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.works/sec

The test results look good (only the test with 72 processes doesn't restore
to the same level as the original result).

You may notice that the tests with 36 processes are not shown here. This is
because of an lkp script problem that we are working on.

We checked the test results of 36 processes manually:
e394ff83bbca1c72: avg = 4267358.820936666 -> the parent of bad commit
d4252071b97d2027: avg = 3821149.8479718883 -> the bad commit
6812880b7af46dc2: avg = 4724775.219265333 -> the fixing patch commit
They also look good.

>
> HOWEVER.
>
> I'd love to hear if you have a clear profile change, and to see
> exactly which set_buffer_uptodate() is *so* important. Honestly, I
> didn't expect the buffer head functions to even really matter much any
> more, with pretty much all IO being about the page cache..

Unfortunately, valid profile data has not been captured yet. We will keep
working on it and share it once we have it. Thanks.


Regards
Yin, Fengwei

2022-09-08 13:02:57

by Linus Torvalds

Subject: Re: [LKP] Re: d4252071b9: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression

On Thu, Sep 8, 2022 at 4:56 AM Yin, Fengwei <[email protected]> wrote:
>
> The test results look good (only the test with 72 processes doesn't
> restore to the same level as the original result).

Thank you. I've committed that fix (with a slight syntax change and a
full commit message).

On an entirely different note, I'll just have to admit to my
ignorance, and didn't know how you prefer your name in the "tested-by"
line. For all I know, you might prefer the family name first, with or
without a comma. But I ended up writing it Westernized as "Fengwei
Yin", which seems to be the most common form in other kernel commit
messages, and in some comments.

Maybe you don't care, or maybe you care deeply. If you do, just let me
know for future reference and I'll try to remember.

(Maybe you don't care deeply, but I remember the mess international
"spelling" of Finnish names used to be when 7-bit US-ASCII was the
norm, so I _try_ to make sure that we at least attempt to get names of
people right in the kernel.)

I only later noticed that you actually signed your email "Yin,
Fengwei", which is what makes me think I screwed up and is the reason
for this note.

Linus

2022-09-08 13:47:15

by Yin, Fengwei

Subject: Re: [LKP] Re: d4252071b9: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression

Hi Linus,

On 9/8/2022 8:14 PM, Linus Torvalds wrote:
> On Thu, Sep 8, 2022 at 4:56 AM Yin, Fengwei <[email protected]> wrote:
>>
>> The test results look good (only the test with 72 processes doesn't
>> restore to the same level as the original result).
>
> Thank you. I've committed that fix (with a slight syntax change and a
> full commit message).
>
> On an entirely different note, I'll just have to admit to my
> ignorance, and didn't know how you prefer your name in the "tested-by"
> line. For all I know, you might prefer the family name first, with or
> without a comma. But I ended up writing it Westernized as "Fengwei
> Yin", which seems to be the most common form in other kernel commit
> messages, and in some comments.
>
> Maybe you don't care, or maybe you care deeply. If you do, just let me
> know for future reference and I'll try to remember.
>
> (Maybe you don't care deeply, but I remember the mess international
> "spelling" of Finnish names used to be when 7-bit US-ASCII was the
> norm, so I _try_ to make sure that we at least attempt to get names of
> people right in the kernel.)
>
> I only later noticed that you actually signed your email "Yin,
> Fengwei", which is what makes me think I screwed up and is the reason
> for this note.

No worries. Either way works for me. Thanks.


Regards
Yin, Fengwei

>
> Linus