2020-01-29 18:50:29

by Davidlohr Bueso

Subject: [PATCH] btrfs: optimize barrier usage for RMW atomics

Use smp_mb__after_atomic() instead of smp_mb() and avoid the
unnecessary barrier for non-LL/SC architectures, such as x86.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
fs/btrfs/btrfs_inode.h | 2 +-
fs/btrfs/file.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..54e0d2ae22cc 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -325,7 +325,7 @@ struct btrfs_dio_private {
static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
{
set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
- smp_mb();
+ smp_mb__after_atomic();
}

static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a16da274c9aa..ea79ab068079 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2143,7 +2143,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
}
atomic_inc(&root->log_batch);

- smp_mb();
+ smp_mb__after_atomic();
if (btrfs_inode_in_log(BTRFS_I(inode), fs_info->generation) ||
BTRFS_I(inode)->last_trans <= fs_info->last_trans_committed) {
/*
--
2.16.4


2020-01-29 19:08:56

by Nikolay Borisov

Subject: Re: [PATCH] btrfs: optimize barrier usage for RMW atomics



On 29.01.20 at 20:03, Davidlohr Bueso wrote:
> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
> unnecessary barrier for non LL/SC architectures, such as x86.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>


While on the topic of this I've been sitting on the following local
patch for about a year, care to review the barriers:




Attachments:
0001-btrfs-Fix-memory-ordering-of-unlocked-dio-reads-vs-t.patch (3.77 kB)

2020-01-29 19:43:57

by David Sterba

Subject: Re: [PATCH] btrfs: optimize barrier usage for RMW atomics

On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
> unnecessary barrier for non LL/SC architectures, such as x86.

So that's conflicting advice from what we got when discussing which
barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d, and the
memory is still fresh. My first idea was to take the
smp_mb__after_atomic and __before_atomic variants, but after discussion
with various people the plain smp_wmb/smp_rmb were suggested and used in
the end.

I can dig up the email threads and excerpts from IRC conversations;
maybe Nik has them at hand too. We do want to get rid of all unnecessary
and uncommented barriers in the btrfs code, so I appreciate your patch.

> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> fs/btrfs/btrfs_inode.h | 2 +-
> fs/btrfs/file.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4e12a477d32e..54e0d2ae22cc 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -325,7 +325,7 @@ struct btrfs_dio_private {
> static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
> {
> set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
> - smp_mb();
> + smp_mb__after_atomic();

In this case I think we should use the smp_wmb/smp_rmb pattern rather
than the full barrier.

> }
>
> static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a16da274c9aa..ea79ab068079 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2143,7 +2143,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> }
> atomic_inc(&root->log_batch);
>
> - smp_mb();
> + smp_mb__after_atomic();

That's the problem with uncommented barriers: it's not clear what they
relate to. In this case it's not the atomic_inc() above that would
justify __after_atomic. The patch that added it is years old, so any
change to that barrier would require deeper analysis.

> if (btrfs_inode_in_log(BTRFS_I(inode), fs_info->generation) ||
> BTRFS_I(inode)->last_trans <= fs_info->last_trans_committed) {
> /*

2020-01-29 19:46:05

by Davidlohr Bueso

Subject: Re: [PATCH] btrfs: optimize barrier usage for RMW atomics

On Wed, 29 Jan 2020, David Sterba wrote:

>On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
>> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
>> unnecessary barrier for non LL/SC architectures, such as x86.
>
>So that's conflicting advice from what we got when discussing which
>barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d and the
>memory is still fresh. My first idea was to take the
>smp_mb__after_atomic and __before_atomic variants and after discussion
>with various people the plain smp_wmb/smp_rmb were suggested and used in
>the end.

So the patch you mention deals with test_bit(), which is out of scope
for smp_mb__{before,after}_atomic() as it's not an RMW operation.
atomic_inc() and set_bit(), however, are meant to use these barriers.

>
>I can dig the email threads and excerpts from irc conversations, maybe
>Nik has them at hand too. We do want to get rid of all unnecessary and
>uncommented barriers in btrfs code, so I appreciate your patch.

Yeah, I struggled with the number of undocumented barriers and decided
not to go down that rabbit hole. This patch is only an equivalent of
what is currently there. When possible, getting rid of barriers is of
course better.

Thanks,
Davidlohr

2020-01-29 23:58:01

by Qu Wenruo

Subject: Re: [PATCH] btrfs: optimize barrier usage for RMW atomics



On 2020/1/30 3:25 AM, Davidlohr Bueso wrote:
> On Wed, 29 Jan 2020, David Sterba wrote:
>
>> On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
>>> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
>>> unnecessary barrier for non LL/SC architectures, such as x86.
>>
>> So that's conflicting advice from what we got when discussing which
>> barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d and the
>> memory is still fresh. My first idea was to take the
>> smp_mb__after_atomic and __before_atomic variants and after discussion
>> with various people the plain smp_wmb/smp_rmb were suggested and used in
>> the end.
>
> So the patch you mention deals with test_bit(), which is out of the scope
> of smp_mb__{before,after}_atomic() as it's not a RMW operation.
> atomic_inc()
> and set_bit() are, however, meant to use these barriers.

Exactly!
I'm still not convinced we should use a full barrier for test_bit(),
and I see no reason to use any barrier for test_bit() at all.
A memory barrier is only needed between two or more memory accesses, so
it should sit between set/clear_bit() and the other operations, not
around test_bit().

>
>>
>> I can dig the email threads and excerpts from irc conversations, maybe
>> Nik has them at hand too. We do want to get rid of all unnecessary and
>> uncommented barriers in btrfs code, so I appreciate your patch.
>
> Yeah, I struggled with the amount of undocumented barriers, and decided
> not to go down that rabbit hole. This patch is only an equivalent of
> what is currently there. When possible, getting rid of barriers is of
> course better.

BTW, is there any convincing method for properly examining memory
barriers?

I really find it hard to convince others, or even myself, when memory
barriers are involved.

Thanks,
Qu

>
> Thanks,
> Davidlohr

2020-01-30 08:19:45

by Nikolay Borisov

Subject: Re: [PATCH] btrfs: optimize barrier usage for RMW atomics



On 30.01.20 at 1:55, Qu Wenruo wrote:
>
>
> On 2020/1/30 3:25 AM, Davidlohr Bueso wrote:
>> On Wed, 29 Jan 2020, David Sterba wrote:
>>
>>> On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
>>>> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
>>>> unnecessary barrier for non LL/SC architectures, such as x86.
>>>
>>> So that's conflicting advice from what we got when discussing which
>>> barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d and the
>>> memory is still fresh. My first idea was to take the
>>> smp_mb__after_atomic and __before_atomic variants and after discussion
>>> with various people the plain smp_wmb/smp_rmb were suggested and used in
>>> the end.
>>
>> So the patch you mention deals with test_bit(), which is out of the scope
>> of smp_mb__{before,after}_atomic() as it's not a RMW operation.
>> atomic_inc()
>> and set_bit() are, however, meant to use these barriers.
>
> Exactly!
> I'm still not convinced to use full barrier for test_bit() and I see no
> reason to use any barrier for test_bit().
> All mb should only be needed between two or more memory access, thus mb
> should sit between set/clear_bit() and other operations, not around
> test_bit().
>
>>
>>>
>>> I can dig the email threads and excerpts from irc conversations, maybe
>>> Nik has them at hand too. We do want to get rid of all unnecessary and
>>> uncommented barriers in btrfs code, so I appreciate your patch.
>>
>> Yeah, I struggled with the amount of undocumented barriers, and decided
>> not to go down that rabbit hole. This patch is only an equivalent of
>> what is currently there. When possible, getting rid of barriers is of
>> course better.
>
> BTW, is there any convincing method to do proper mb examination?
>
> I really found it hard to convince others or even myself when mb is
> involved.

Yes there is: the LKMM. You can write a litmus test; check out
tools/memory-model in the kernel tree.
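For example, the message-passing pattern discussed in this thread can be written as a litmus test along these lines (modelled on the MP tests shipped under tools/memory-model/litmus-tests/; herd7 run against the kernel memory model checks whether the `exists` outcome is reachable):

```
C MP+wmb+rmb

(*
 * Can P1 observe the flag set but still read stale data?
 * The smp_wmb()/smp_rmb() pairing should forbid that outcome.
 *)

{}

P0(int *data, int *flag)
{
	WRITE_ONCE(*data, 1);
	smp_wmb();
	WRITE_ONCE(*flag, 1);
}

P1(int *data, int *flag)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*flag);
	smp_rmb();
	r1 = READ_ONCE(*data);
}

exists (1:r0=1 /\ 1:r1=0)
```

Run with `herd7 -conf linux-kernel.cfg`; if the tool reports the `exists` clause as never reachable, the barriers are doing their job for this pattern.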
>
> Thanks,
> Qu
>
>>
>> Thanks,
>> Davidlohr