2024-01-24 14:36:13

by Baokun Li

Subject: [PATCH v2 0/3] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

V1->V2:
Add patch 3 to fix an error when compiling code for 32-bit architectures
without CONFIG_SMP enabled.

This patchset follows Linus's suggestion to make the i_size_read/write
helpers be smp_load_acquire/store_release(). With that change the extra
smp_rmb() in filemap_read() is no longer needed, so it is removed. The
extra type checking in smp_load_acquire/smp_store_release for the
!CONFIG_SMP case is also removed to avoid compilation errors.

Functional tests were performed and no new problems were found.

Here are the results of unixbench tests based on 6.7.0-next-20240118 on
arm64. The single-copy run regresses slightly (-1.43% in total) and the
72-copy run improves slightly (+2.41% in total), so overall the impact is
not significant.

### 72 CPUs in system; running 1 parallel copy of tests
System Benchmarks Index Values        | base    | patched | cmp    |
--------------------------------------|---------|---------|--------|
Dhrystone 2 using register variables  | 3635.06 | 3596.3  | -1.07% |
Double-Precision Whetstone            | 808.58  | 808.58  | 0.00%  |
Execl Throughput                      | 623.52  | 618.1   | -0.87% |
File Copy 1024 bufsize 2000 maxblocks | 1715.82 | 1668.58 | -2.75% |
File Copy 256 bufsize 500 maxblocks   | 1320.98 | 1250.16 | -5.36% |
File Copy 4096 bufsize 8000 maxblocks | 2639.36 | 2488.48 | -5.72% |
Pipe Throughput                       | 869.06  | 872.3   | 0.37%  |
Pipe-based Context Switching          | 106.26  | 117.22  | 10.31% |
Process Creation                      | 247.72  | 246.74  | -0.40% |
Shell Scripts (1 concurrent)          | 1234.98 | 1226    | -0.73% |
Shell Scripts (8 concurrent)          | 6893.96 | 6210.46 | -9.91% |
System Call Overhead                  | 493.72  | 494.28  | 0.11%  |
--------------------------------------|---------|---------|--------|
Total                                 | 1003.92 | 989.58  | -1.43% |

### 72 CPUs in system; running 72 parallel copies of tests
System Benchmarks Index Values        | base      | patched   | cmp    |
--------------------------------------|-----------|-----------|--------|
Dhrystone 2 using register variables  | 260471.88 | 258065.04 | -0.92% |
Double-Precision Whetstone            | 58212.32  | 58219.3   | 0.01%  |
Execl Throughput                      | 6954.7    | 7444.08   | 7.04%  |
File Copy 1024 bufsize 2000 maxblocks | 64244.74  | 64618.24  | 0.58%  |
File Copy 256 bufsize 500 maxblocks   | 89933.8   | 87026.38  | -3.23% |
File Copy 4096 bufsize 8000 maxblocks | 79808.14  | 81916.42  | 2.64%  |
Pipe Throughput                       | 62174.38  | 62389.74  | 0.35%  |
Pipe-based Context Switching          | 27239.28  | 27887.24  | 2.38%  |
Process Creation                      | 3551.28   | 3800.54   | 7.02%  |
Shell Scripts (1 concurrent)          | 19212.26  | 20749.34  | 8.00%  |
Shell Scripts (8 concurrent)          | 20842.02  | 21958.12  | 5.36%  |
System Call Overhead                  | 35328.24  | 35451.68  | 0.35%  |
--------------------------------------|-----------|-----------|--------|
Total                                 | 35592.42  | 36450.36  | 2.41%  |
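
For readers checking the tables, the cmp column appears to be the relative
change of the patched index against the base index. The sketch below
assumes that definition; cmp_percent() is a hypothetical helper name and
is not part of unixbench or of this series.

```c
#include <assert.h>

/*
 * Relative change of `patched` against `base`, in percent.
 * Assumption: this is how the cmp column above was derived.
 */
static double cmp_percent(double base, double patched)
{
	assert(base != 0.0);	/* a zero baseline has no relative change */
	return (patched - base) / base * 100.0;
}
```

Feeding in the two Total rows, cmp_percent(1003.92, 989.58) and
cmp_percent(35592.42, 36450.36), should round to the -1.43% and 2.41%
reported above.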

Baokun Li (3):
  fs: make the i_size_read/write helpers be
    smp_load_acquire/store_release()
  Revert "mm/filemap: avoid buffered read/write race to read
    inconsistent data"
  asm-generic: remove extra type checking in acquire/release for non-SMP
    case

include/asm-generic/barrier.h | 2 --
include/linux/fs.h | 10 ++++++++--
mm/filemap.c | 9 ---------
3 files changed, 8 insertions(+), 13 deletions(-)

--
2.31.1



2024-01-24 14:36:24

by Baokun Li

Subject: [PATCH v2 1/3] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

In [Link] Linus mentions that acquire/release makes it clear which
_particular_ memory accesses are the ordered ones, and it's unlikely
to make any performance difference, so it's much better to pair up
the release->acquire ordering than have a "wmb->rmb" ordering.

=========================================================
update pagecache
folio_mark_uptodate(folio)
  smp_wmb()
  set_bit PG_uptodate

=== ↑↑↑ STLR ↑↑↑ === smp_store_release(&inode->i_size, i_size)

folio_test_uptodate(folio)
  test_bit PG_uptodate
  smp_rmb()

=== ↓↓↓ LDAR ↓↓↓ === smp_load_acquire(&inode->i_size)

copy_page_to_iter()
=========================================================

Calling smp_store_release() in i_size_write() ensures that the data
in the page and the PG_uptodate bit are updated before i_size is
updated, and calling smp_load_acquire() in i_size_read() ensures that
a reader will not see a newer i_size than the data in the page.
This avoids the buffered read-write inconsistencies caused by
Load-Load reordering.
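
The pairing can be illustrated in userspace with C11 atomics. This is
only a sketch under stated assumptions, not the kernel code:
struct fake_inode, fake_inode_init(), fake_write() and
fake_read_stale_bytes() are hypothetical names, and
memory_order_release/memory_order_acquire stand in for
smp_store_release()/smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define PAGE_BYTES 64

/* Hypothetical stand-in for an inode plus one page of page cache. */
struct fake_inode {
	char page[PAGE_BYTES];		/* plain data, published via i_size */
	_Atomic long long i_size;	/* plays the role of inode->i_size */
};

static void fake_inode_init(struct fake_inode *inode)
{
	memset(inode->page, 0, sizeof(inode->page));
	atomic_init(&inode->i_size, 0);
}

/* Writer side: fill the page first, then publish the size with release. */
static void fake_write(struct fake_inode *inode, long long len)
{
	assert(len >= 0 && len <= PAGE_BYTES);
	memset(inode->page, 'x', (size_t)len);
	atomic_store_explicit(&inode->i_size, len, memory_order_release);
}

/*
 * Reader side: load the size with acquire, then inspect only the bytes
 * below it. Returns how many of those bytes do not hold written data.
 */
static long long fake_read_stale_bytes(struct fake_inode *inode)
{
	long long size = atomic_load_explicit(&inode->i_size,
					      memory_order_acquire);
	long long stale = 0;

	for (long long i = 0; i < size; i++)
		if (inode->page[i] != 'x')
			stale++;
	return stale;
}
```

If fake_write() runs in one thread and fake_read_stale_bytes() in
another, a reader that observes the new size is also guaranteed to
observe the bytes written before the release store, so the stale count
is expected to be zero.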

Link: https://lore.kernel.org/r/CAHk-=wifOnmeJq+sn+2s-P46zw0SFEbw9BSCGgp2c5fYPtRPGw@mail.gmail.com/
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Baokun Li <[email protected]>
---
include/linux/fs.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6bb10bbd7035..1cc1f3f08107 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -907,7 +907,8 @@ static inline loff_t i_size_read(const struct inode *inode)
 	preempt_enable();
 	return i_size;
 #else
-	return inode->i_size;
+	/* Pairs with smp_store_release() in i_size_write() */
+	return smp_load_acquire(&inode->i_size);
 #endif
 }

@@ -929,7 +930,12 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
 	inode->i_size = i_size;
 	preempt_enable();
 #else
-	inode->i_size = i_size;
+	/*
+	 * Pairs with smp_load_acquire() in i_size_read() to ensure
+	 * changes related to inode size (such as page contents) are
+	 * visible before we see the changed inode size.
+	 */
+	smp_store_release(&inode->i_size, i_size);
 #endif
 }

--
2.31.1


2024-01-24 15:06:34

by Baokun Li

Subject: [PATCH v2 3/3] asm-generic: remove extra type checking in acquire/release for non-SMP case

If CONFIG_SMP is not enabled, smp_load_acquire/smp_store_release are
implemented as READ_ONCE/WRITE_ONCE plus barrier() and type checking.
READ_ONCE/WRITE_ONCE already check the type of the pointed-to object,
and compiletime_assert_atomic_type() then checks it more stringently on
top of that. The stricter check simply isn't relevant in the non-SMP
case, so remove the extra compiletime_assert_atomic_type() to avoid
compilation errors.
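
As a rough userspace illustration of why the stricter check can fail:
it accepts only objects whose size matches a native word type, which a
64-bit i_size does not on a 32-bit build. IS_NATIVE_WORD() below imitates
that test, and is_native_word_ilp32() is a hypothetical model of a 32-bit
ABI where long is 4 bytes. Both names are assumptions made for this
sketch, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace imitation of a native-word test: a type passes if its size
 * matches char, short, int or long on the build target.
 */
#define IS_NATIVE_WORD(t) \
	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

/* long passes on every target, by construction of the macro. */
static_assert(IS_NATIVE_WORD(long), "long must be a native word");

/*
 * Hypothetical model of a 32-bit ABI (char = 1, short = 2, int = 4,
 * long = 4 bytes): an 8-byte object is not a native word there.
 */
static int is_native_word_ilp32(size_t size)
{
	return size == 1 || size == 2 || size == 4;
}
```

Under this model is_native_word_ilp32(sizeof(int64_t)) is 0, which
corresponds to the case where the removed assertion rejects a 64-bit
value on a 32-bit non-SMP build.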

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Baokun Li <[email protected]>
---
include/asm-generic/barrier.h | 2 --
1 file changed, 2 deletions(-)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..0c0695763bea 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -193,7 +193,6 @@ do { \
 #ifndef smp_store_release
 #define smp_store_release(p, v) \
 do { \
-	compiletime_assert_atomic_type(*p); \
 	barrier(); \
 	WRITE_ONCE(*p, v); \
 } while (0)
@@ -203,7 +202,6 @@ do { \
 #define smp_load_acquire(p) \
 ({ \
 	__unqual_scalar_typeof(*p) ___p1 = READ_ONCE(*p); \
-	compiletime_assert_atomic_type(*p); \
 	barrier(); \
 	(typeof(*p))___p1; \
 })
--
2.31.1


2024-01-25 16:33:48

by Christian Brauner

Subject: Re: [PATCH v2 0/3] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

On Wed, 24 Jan 2024 22:28:54 +0800, Baokun Li wrote:
> V1->V2:
> Add patch 3 to fix an error when compiling code for 32-bit architectures
> without CONFIG_SMP enabled.
>
> This patchset follows Linus's suggestion to make the i_size_read/write
> helpers be smp_load_acquire/store_release(). With that change the extra
> smp_rmb() in filemap_read() is no longer needed, so it is removed. The
> extra type checking in smp_load_acquire/smp_store_release for the
> !CONFIG_SMP case is also removed to avoid compilation errors.
>
> [...]

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new reply to the original patch series, allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/3] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()
https://git.kernel.org/vfs/vfs/c/6238fe4d7cad
[2/3] Revert "mm/filemap: avoid buffered read/write race to read inconsistent data"
https://git.kernel.org/vfs/vfs/c/bf7aad3980da
[3/3] asm-generic: remove extra type checking in acquire/release for non-SMP case
https://git.kernel.org/vfs/vfs/c/e9cbdca0a243