2024-01-22 09:59:42

by Baokun Li

Subject: [PATCH 0/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

This patchset follows Linus's suggestion to make the i_size_read/write
helpers be smp_load_acquire/store_release(), after which the extra smp_rmb
in filemap_read() is no longer needed, so it is removed.
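
The ordering the cover letter relies on can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions, not the kernel code: `struct fake_inode`, `fake_i_size_write()`, `fake_i_size_read()`, `fake_append()` and `fake_read()` are hypothetical names, and the C11 release store and acquire load stand in for smp_store_release() and smp_load_acquire(). The writer fills the data before publishing the new size, and a reader that observes the new size through the acquire load is then guaranteed to see that data without a separate read barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for an inode: a payload plus a published size. */
struct fake_inode {
    char data[64];
    _Atomic long long size;
};

/* Analogue of i_size_write(): the release store orders all earlier
 * writes (the data) before the size update becomes visible. */
static void fake_i_size_write(struct fake_inode *inode, long long size)
{
    atomic_store_explicit(&inode->size, size, memory_order_release);
}

/* Analogue of i_size_read(): the acquire load orders all later reads
 * (the data) after the size has been read. */
static long long fake_i_size_read(struct fake_inode *inode)
{
    return atomic_load_explicit(&inode->size, memory_order_acquire);
}

/* Writer: copy the data first, then publish the larger size.
 * Returns 0 on success, -1 if the data would not fit. */
static int fake_append(struct fake_inode *inode, const char *buf, size_t len)
{
    long long old = atomic_load_explicit(&inode->size, memory_order_relaxed);

    assert(inode != NULL && buf != NULL);
    if ((size_t)old + len > sizeof(inode->data))
        return -1;
    memcpy(inode->data + old, buf, len);
    fake_i_size_write(inode, old + (long long)len);
    return 0;
}

/* Reader: read the size, then copy at most that many bytes.
 * No explicit read barrier is needed between the two steps. */
static size_t fake_read(struct fake_inode *inode, char *out, size_t cap)
{
    long long size = fake_i_size_read(inode);
    size_t n = (size_t)size < cap ? (size_t)size : cap;

    memcpy(out, inode->data, n);
    return n;
}
```

In a real concurrent setting the writer and reader would run on different CPUs; the sketch only shows where the release and acquire operations sit relative to the data accesses.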

Functional tests were performed and no new problems were found.

Here are the results of unixbench tests based on 6.7.0-next-20240118 on
arm64. There is some degradation in the single-threaded run and some
improvement in the multi-threaded run, but overall the impact is not
significant.

### 72 CPUs in system; running 1 parallel copy of tests
System Benchmarks Index Values        | base    | patched | cmp    |
--------------------------------------|---------|---------|--------|
Dhrystone 2 using register variables  | 3635.06 | 3596.3  | -1.07% |
Double-Precision Whetstone            | 808.58  | 808.58  | 0.00%  |
Execl Throughput                      | 623.52  | 618.1   | -0.87% |
File Copy 1024 bufsize 2000 maxblocks | 1715.82 | 1668.58 | -2.75% |
File Copy 256 bufsize 500 maxblocks   | 1320.98 | 1250.16 | -5.36% |
File Copy 4096 bufsize 8000 maxblocks | 2639.36 | 2488.48 | -5.72% |
Pipe Throughput                       | 869.06  | 872.3   | 0.37%  |
Pipe-based Context Switching          | 106.26  | 117.22  | 10.31% |
Process Creation                      | 247.72  | 246.74  | -0.40% |
Shell Scripts (1 concurrent)          | 1234.98 | 1226    | -0.73% |
Shell Scripts (8 concurrent)          | 6893.96 | 6210.46 | -9.91% |
System Call Overhead                  | 493.72  | 494.28  | 0.11%  |
--------------------------------------|---------|---------|--------|
Total                                 | 1003.92 | 989.58  | -1.43% |

### 72 CPUs in system; running 72 parallel copies of tests
System Benchmarks Index Values        | base      | patched   | cmp    |
--------------------------------------|-----------|-----------|--------|
Dhrystone 2 using register variables  | 260471.88 | 258065.04 | -0.92% |
Double-Precision Whetstone            | 58212.32  | 58219.3   | 0.01%  |
Execl Throughput                      | 6954.7    | 7444.08   | 7.04%  |
File Copy 1024 bufsize 2000 maxblocks | 64244.74  | 64618.24  | 0.58%  |
File Copy 256 bufsize 500 maxblocks   | 89933.8   | 87026.38  | -3.23% |
File Copy 4096 bufsize 8000 maxblocks | 79808.14  | 81916.42  | 2.64%  |
Pipe Throughput                       | 62174.38  | 62389.74  | 0.35%  |
Pipe-based Context Switching          | 27239.28  | 27887.24  | 2.38%  |
Process Creation                      | 3551.28   | 3800.54   | 7.02%  |
Shell Scripts (1 concurrent)          | 19212.26  | 20749.34  | 8.00%  |
Shell Scripts (8 concurrent)          | 20842.02  | 21958.12  | 5.36%  |
System Call Overhead                  | 35328.24  | 35451.68  | 0.35%  |
--------------------------------------|-----------|-----------|--------|
Total                                 | 35592.42  | 36450.36  | 2.41%  |
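
For reference, the cmp column in the tables above is consistent with a plain relative change of the patched index against the base index. A minimal sketch of that arithmetic follows; `pct_change()` is a hypothetical helper name, not part of unixbench or of the patches.

```c
#include <assert.h>

/* Relative change of `patched` against `base`, in percent.
 * Positive means the patched kernel scored higher. */
static double pct_change(double base, double patched)
{
    assert(base != 0.0);
    return (patched - base) / base * 100.0;
}
```

For the single-copy Total row, pct_change(1003.92, 989.58) gives roughly -1.43, matching the table.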

Baokun Li (2):
  fs: make the i_size_read/write helpers be
    smp_load_acquire/store_release()
  Revert "mm/filemap: avoid buffered read/write race to read
    inconsistent data"

 include/linux/fs.h | 10 ++++++++--
 mm/filemap.c       |  9 ---------
 2 files changed, 8 insertions(+), 11 deletions(-)

--
2.31.1



2024-01-22 11:31:54

by Christian Brauner

Subject: Re: [PATCH 0/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

On Mon, 22 Jan 2024 17:45:34 +0800, Baokun Li wrote:
> This patchset follows Linus's suggestion to make the i_size_read/write
> helpers be smp_load_acquire/store_release(), after which the extra smp_rmb
> in filemap_read() is no longer needed, so it is removed.
>
> Functional tests were performed and no new problems were found.
>
> Here are the results of unixbench tests based on 6.7.0-next-20240118 on
> arm64, with some degradation in single-threading and some optimization in
> multi-threading, but overall the impact is not significant.
>
> [...]

Hm, we can certainly try but I wouldn't rule it out that someone will
complain about the "non-significant" degradation in single-threading.
We'll see. Let that performance bot chew on it for a bit as well.

But I agree that the smp_load_acquire()/smp_store_release() is clearer
than the open-coded smp_rmb().
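
To illustrate the difference in style, here is a rough userspace sketch using C11 atomics. It is not the kernel code: `published_size`, `payload` and the three functions are hypothetical names, and a relaxed load followed by an acquire fence is only an approximate analogue of a plain load followed by smp_rmb(). Both readers give the same guarantee to a caller that reads `payload` afterwards, but the second one states the ordering on the load itself.

```c
#include <assert.h>
#include <stdatomic.h>

static _Atomic long long published_size; /* stands in for i_size */
static int payload;                      /* stands in for the file data */

/* Writer side: store the payload, then publish the size with release
 * semantics so the payload write cannot be reordered after it. */
static void publish(int value, long long size)
{
    assert(size >= 0);
    payload = value;
    atomic_store_explicit(&published_size, size, memory_order_release);
}

/* Open-coded reader: a relaxed load followed by a separate fence.
 * The ordering requirement lives in the caller-visible fence. */
static long long read_size_open_coded(void)
{
    long long size = atomic_load_explicit(&published_size,
                                          memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return size;
}

/* Acquire reader: the ordering is attached to the load itself, so
 * callers do not need to remember a barrier. */
static long long read_size_acquire(void)
{
    return atomic_load_explicit(&published_size, memory_order_acquire);
}
```

The trade-off discussed in this thread follows from the second form: every caller of the helper pays for the acquire ordering, whether or not it goes on to read data that depends on the size.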

---

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()
https://git.kernel.org/vfs/vfs/c/7d7825fde8ba
[2/2] Revert "mm/filemap: avoid buffered read/write race to read inconsistent data"
https://git.kernel.org/vfs/vfs/c/83dfed690b90

2024-01-22 12:26:50

by Baokun Li

Subject: Re: [PATCH 0/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

On 2024/1/22 19:14, Christian Brauner wrote:
> On Mon, 22 Jan 2024 17:45:34 +0800, Baokun Li wrote:
>> This patchset follows Linus's suggestion to make the i_size_read/write
>> helpers be smp_load_acquire/store_release(), after which the extra smp_rmb
>> in filemap_read() is no longer needed, so it is removed.
>>
>> Functional tests were performed and no new problems were found.
>>
>> Here are the results of unixbench tests based on 6.7.0-next-20240118 on
>> arm64, with some degradation in single-threading and some optimization in
>> multi-threading, but overall the impact is not significant.
>>
>> [...]
> Hm, we can certainly try but I wouldn't rule it out that someone will
> complain about the "non-significant" degradation in single-threading.
> We'll see. Let that performance bot chew on it for a bit as well.
>
> But I agree that the smp_load_acquire()/smp_store_release() is clearer
> than the open-coded smp_rmb().
Thank you very much for applying this patch!

Adding barriers where none existed does introduce some performance
degradation. The multi-threaded test results here look pretty good;
it's just that the single-threaded results show a bit too much
degradation for Shell Scripts (8 concurrent). I've tracked down this
test item: it calls clone() and wait4() and then triggers i_size reads
and writes frequently, so the degradation here is as expected. I'm just
not sure whether anyone cares about this scenario.
> ---
>
> Applied to the vfs.misc branch of the vfs/vfs.git tree.
> Patches in the vfs.misc branch should appear in linux-next soon.
>
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
>
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
>
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
>
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs.misc
>
> [1/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()
> https://git.kernel.org/vfs/vfs/c/7d7825fde8ba
> [2/2] Revert "mm/filemap: avoid buffered read/write race to read inconsistent data"
> https://git.kernel.org/vfs/vfs/c/83dfed690b90
Thanks!
--
With Best Regards,
Baokun Li
.

2024-01-23 19:13:51

by Jan Kara

Subject: Re: [PATCH 0/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

On Mon 22-01-24 12:14:52, Christian Brauner wrote:
> On Mon, 22 Jan 2024 17:45:34 +0800, Baokun Li wrote:
> > This patchset follows Linus's suggestion to make the i_size_read/write
> > helpers be smp_load_acquire/store_release(), after which the extra smp_rmb
> > in filemap_read() is no longer needed, so it is removed.
> >
> > Functional tests were performed and no new problems were found.
> >
> > Here are the results of unixbench tests based on 6.7.0-next-20240118 on
> > arm64, with some degradation in single-threading and some optimization in
> > multi-threading, but overall the impact is not significant.
> >
> > [...]
>
> Hm, we can certainly try but I wouldn't rule it out that someone will
> complain about the "non-significant" degradation in single-threading.
> We'll see. Let that performance bot chew on it for a bit as well.

Yeah, over 5% regression in buffered read/write cost is a bit hard to
swallow. I somewhat wonder why this is so much - maybe people call
i_size_read() without thinking too much and now it becomes an atomic op
on arm? Also LKP tests only on x86 (where these changes are going to be
a no-op) and I'm not sure anybody else runs performance tests on
linux-next, even less so on ARM... So I'm not sure anybody will complain
until this gets into some distro (such as Android).

> But I agree that the smp_load_acquire()/smp_store_release() is clearer
> than the open-coded smp_rmb().

Agreed, conceptually this is nice and it will also silence some KCSAN
warnings about i_size updates vs reads.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-01-24 08:21:45

by Baokun Li

Subject: Re: [PATCH 0/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

On 2024/1/24 2:56, Jan Kara wrote:
> On Mon 22-01-24 12:14:52, Christian Brauner wrote:
>> On Mon, 22 Jan 2024 17:45:34 +0800, Baokun Li wrote:
>>> This patchset follows Linus's suggestion to make the i_size_read/write
>>> helpers be smp_load_acquire/store_release(), after which the extra smp_rmb
>>> in filemap_read() is no longer needed, so it is removed.
>>>
>>> Functional tests were performed and no new problems were found.
>>>
>>> Here are the results of unixbench tests based on 6.7.0-next-20240118 on
>>> arm64, with some degradation in single-threading and some optimization in
>>> multi-threading, but overall the impact is not significant.
>>>
>>> [...]
>> Hm, we can certainly try but I wouldn't rule it out that someone will
>> complain about the "non-significant" degradation in single-threading.
>> We'll see. Let that performance bot chew on it for a bit as well.
> Yeah, over 5% regression in buffered read/write cost is a bit hard to
> swallow. I somewhat wonder why this is so much - maybe people call
> i_size_read() without thinking too much and now it becomes an atomic op
> on arm? Also LKP tests only on x86 (where these changes are going to be
> a no-op) and I'm not sure anybody else runs performance tests on
> linux-next, even less so on ARM... So not sure anybody will complain until
> this gets into some distro (such as Android).
>
>> But I agree that the smp_load_acquire()/smp_store_release() is clearer
>> than the open-coded smp_rmb().
> Agreed, conceptually this is nice and it will also silence some KCSAN
> warnings about i_size updates vs reads.
>
> Honza
Hello Honza!

Are there any other performance tests you'd like to perform?
I can test it on my machine if you have any.

Cheers!
--
With Best Regards,
Baokun Li
.

2024-01-24 11:22:16

by Christian Brauner

Subject: Re: [PATCH 0/2] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

On Tue, Jan 23, 2024 at 07:56:22PM +0100, Jan Kara wrote:
> On Mon 22-01-24 12:14:52, Christian Brauner wrote:
> > On Mon, 22 Jan 2024 17:45:34 +0800, Baokun Li wrote:
> > > This patchset follows Linus's suggestion to make the i_size_read/write
> > > helpers be smp_load_acquire/store_release(), after which the extra smp_rmb
> > > in filemap_read() is no longer needed, so it is removed.
> > >
> > > Functional tests were performed and no new problems were found.
> > >
> > > Here are the results of unixbench tests based on 6.7.0-next-20240118 on
> > > arm64, with some degradation in single-threading and some optimization in
> > > multi-threading, but overall the impact is not significant.
> > >
> > > [...]
> >
> > Hm, we can certainly try but I wouldn't rule it out that someone will
> > complain about the "non-significant" degradation in single-threading.
> > We'll see. Let that performance bot chew on it for a bit as well.
>
> Yeah, over 5% regression in buffered read/write cost is a bit hard to
> swallow. I somewhat wonder why this is so much - maybe people call
> i_size_read() without thinking too much and now it becomes an atomic op
> on arm? Also LKP tests only on x86 (where these changes are going to be
> a no-op) and I'm not sure anybody else runs performance tests on
> linux-next, even less so on ARM... So not sure anybody will complain until
> this gets into some distro (such as Android).

The LKP thing does, iirc. We get reports from them quite often, but
there's no way to request a test on a specific branch and get a result
back within some timeframe (1 week would already be great). That's what
I'd really like.

Similarly, as with the build tests from the Intel build bot, it would
be nice if one could opt in to get notifications that no performance
regression did indeed happen.

>
> > But I agree that the smp_load_acquire()/smp_store_release() is clearer
> > than the open-coded smp_rmb().
>
> Agreed, conceptually this is nice and it will also silence some KCSAN
> warnings about i_size updates vs reads.
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR