LinuxLists.cc - [PATCH] mm/util: fix a data race in __vm_enough

2020-01-30 02:53:51

Subject: [PATCH] mm/util: fix a data race in __vm_enough_memory()

"vm_committed_as.count" could be accessed concurrently as reported by
KCSAN,

read to 0xffffffff923164f8 of 8 bytes by task 1268 on cpu 38:
__vm_enough_memory+0x43/0x280 mm/util.c:801
mmap_region+0x1b2/0xb90 mm/mmap.c:1726
do_mmap+0x45c/0x700
vm_mmap_pgoff+0xc0/0x130
vm_mmap+0x71/0x90
elf_map+0xa1/0x1b0
load_elf_binary+0x9de/0x2180
search_binary_handler+0xd8/0x2b0
__do_execve_file+0xb61/0x1080
__x64_sys_execve+0x5f/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe

write to 0xffffffff923164f8 of 8 bytes by task 1265 on cpu 41:
percpu_counter_add_batch+0x83/0xd0 lib/percpu_counter.c:91
exit_mmap+0x178/0x220 include/linux/mman.h:68
mmput+0x10e/0x270
flush_old_exec+0x572/0xfe0
load_elf_binary+0x467/0x2180
search_binary_handler+0xd8/0x2b0
__do_execve_file+0xb61/0x1080
__x64_sys_execve+0x5f/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Since only the read is lockless, fix it by using READ_ONCE() for it
to avoid any possible false warning due to load tearing.

Signed-off-by: Qian Cai <[email protected]>
---
mm/util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index 988d11e6c17c..58cd8f28651c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -798,7 +798,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
 	long allowed;
 
-	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
+	VM_WARN_ONCE(READ_ONCE(vm_committed_as.count) <
 			-(s64)vm_committed_as_batch * num_online_cpus(),
 			"memory commitment underflow");

--
2.21.0 (Apple Git-122.2)
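The race in the report can be made concrete with a userspace model of the counter. This is a sketch under stated assumptions, not kernel code: pthread_mutex_t stands in for the counter's spinlock, NR_CPUS_MODEL stands in for num_online_cpus(), READ_ONCE is a simplified volatile-cast stand-in for the kernel macro, and every *_model name is hypothetical. It loosely follows the pattern in the two traces: the writer folds a per-CPU delta into ->count under the lock, while the reader loads ->count without it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's READ_ONCE() (assumes GCC/Clang
 * __typeof__); the real kernel macro is more elaborate. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for the model */

/* Hypothetical userspace model of struct percpu_counter. */
struct pcpu_counter_model {
	pthread_mutex_t lock;            /* stands in for fbc->lock */
	int64_t count;                   /* shared, approximate total */
	int32_t counters[NR_CPUS_MODEL]; /* per-CPU deltas, indexed explicitly */
};

/* Small changes stay in the per-CPU slot; once the slot would reach
 * +/-batch, the whole delta is folded into ->count under the lock
 * (the kind of locked write KCSAN reported). */
static void model_add_batch(struct pcpu_counter_model *fbc, int cpu,
			    int64_t amount, int32_t batch)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	int64_t count = fbc->counters[cpu] + amount;

	if (count >= batch || count <= -batch) {
		pthread_mutex_lock(&fbc->lock);
		fbc->count += count;
		fbc->counters[cpu] = 0; /* delta now lives in ->count */
		pthread_mutex_unlock(&fbc->lock);
	} else {
		fbc->counters[cpu] = (int32_t)count;
	}
}

/* Lockless read of ->count, as in the patched check. */
static int64_t model_read(struct pcpu_counter_model *fbc)
{
	return READ_ONCE(fbc->count);
}

/* Exact total: ->count plus every per-CPU delta, taken under the lock. */
static int64_t model_sum(struct pcpu_counter_model *fbc)
{
	int64_t sum;

	pthread_mutex_lock(&fbc->lock);
	sum = fbc->count;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += fbc->counters[cpu];
	pthread_mutex_unlock(&fbc->lock);
	return sum;
}

/* Same shape as the VM_WARN_ONCE() condition in __vm_enough_memory(). */
static int commitment_underflow(struct pcpu_counter_model *fbc, int32_t batch)
{
	return model_read(fbc) < -(int64_t)batch * NR_CPUS_MODEL;
}
```

The gap between model_read() and model_sum() is the approximation a lockless read accepts: each per-CPU slot can hold up to batch - 1 in either direction, which is why the warning threshold scales with the batch size times the number of CPUs.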

2020-01-30 04:21:27

by Matthew Wilcox


Subject: Re: [PATCH] mm/util: fix a data race in __vm_enough_memory()

On Wed, Jan 29, 2020 at 09:51:33PM -0500, Qian Cai wrote:
> "vm_committed_as.count" could be accessed concurrently as reported by
> KCSAN,
>
> Since only the read is operating as lockless, fix it by using
> READ_ONLY() for it to avoid any possible false warning due to load

You mean READ_ONCE ...

>  {
>  	long allowed;
>
> -	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
> +	VM_WARN_ONCE(READ_ONCE(vm_committed_as.count) <
>  			-(s64)vm_committed_as_batch * num_online_cpus(),

I'm really not a fan of exposing the internals of a percpu_counter outside
the percpu_counter.h file. Why shouldn't this be fixed by putting the
READ_ONCE() inside percpu_counter_read()?
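Matthew's suggestion can be modelled in userspace as follows. This is a sketch under assumptions: READ_ONCE is a simplified volatile-cast stand-in for the kernel macro, and percpu_counter_model, percpu_counter_read_model and commitment_underflow are hypothetical names. The only point it shows is where the marked load lives.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's READ_ONCE() (assumption). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct percpu_counter; only ->count matters. */
struct percpu_counter_model {
	int64_t count;
};

/*
 * Matthew's suggestion, modelled: the marked load lives inside the
 * accessor, so every caller gets it and none touches ->count directly.
 */
static inline int64_t percpu_counter_read_model(struct percpu_counter_model *fbc)
{
	return READ_ONCE(fbc->count);
}

/* The caller keeps the shape of the original mm/util.c check. */
static int commitment_underflow(struct percpu_counter_model *committed,
				int32_t batch, int online_cpus)
{
	return percpu_counter_read_model(committed) <
	       -(int64_t)batch * online_cpus;
}
```

With this shape the counter's internals stay private; the trade-off Qian raises in his reply is that callers which already hold the lock also get the marked load.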

2020-01-30 11:51:53

by Qian Cai


Subject: Re: [PATCH] mm/util: fix a data race in __vm_enough_memory()

> On Jan 29, 2020, at 11:20 PM, Matthew Wilcox <[email protected]> wrote:
>
> I'm really not a fan of exposing the internals of a percpu_counter outside
> the percpu_counter.h file. Why shouldn't this be fixed by putting the
> READ_ONCE() inside percpu_counter_read()?

It is because not all callers suffer from a data race. For example, in __wb_update_bandwidth(), it is protected by a lock. I was a bit worried that blindly adding READ_ONCE() inside percpu_counter_read() might have unexpected side effects. For example, READ_ONCE() is unnecessary for a volatile variable. So I chose to keep the change minimal, at the cost of exposing a bit of internal detail, as you mentioned.

I have also copied the percpu maintainers to see whether they have any preference.

2020-01-30 12:37:49

by Marco Elver


Subject: Re: [PATCH] mm/util: fix a data race in __vm_enough_memory()

On Thu, 30 Jan 2020 at 12:50, Qian Cai <[email protected]> wrote:
>
> > On Jan 29, 2020, at 11:20 PM, Matthew Wilcox <[email protected]> wrote:
> >
> > I'm really not a fan of exposing the internals of a percpu_counter outside
> > the percpu_counter.h file. Why shouldn't this be fixed by putting the
> > READ_ONCE() inside percpu_counter_read()?
>
> It is because not all callers suffer from a data race. For example, in __wb_update_bandwidth(), it is protected by a lock. I was a bit worried that blindly adding READ_ONCE() inside percpu_counter_read() might have unexpected side effects. For example, READ_ONCE() is unnecessary for a volatile variable. So I chose to keep the change minimal, at the cost of exposing a bit of internal detail, as you mentioned.
>
> I have also copied the percpu maintainers to see whether they have any preference.

I would not add READ_ONCE() to percpu_counter_read(), given that the
writes (increments) are not atomic either, so not much is gained.

Notice that this is inside a WARN_ONCE, so you may argue that a data
race here doesn't matter to the correct behaviour of the system
(except if you have panic_on_warn on).

For the warning to trigger, vm_committed_as must decrease. Even if a
data race (assuming bad compiler optimizations) could somehow
accomplish this, the load or store would have to produce a transient
value that is less than a stable value. My hypothesis is that this is
very unlikely.
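The "transient value" in question is what the changelog calls load tearing. The sketch below only illustrates that mechanism; it says nothing about how likely a compiler is to split the load, which is the point Marco disputes. Assumptions: the 8-byte read is imagined as two 4-byte halves taken on either side of a write, the values (a counter moving from 0 to -1) are illustrative rather than taken from the report, and torn_read/would_warn are hypothetical helpers.

```c
#include <stdint.h>

/*
 * Simulates a torn 64-bit load: the high 32 bits come from one snapshot
 * of the variable and the low 32 bits from another. Purely illustrative;
 * it does not show that any compiler actually splits this load.
 */
static int64_t torn_read(int64_t hi_from, int64_t lo_from)
{
	uint64_t hi = (uint64_t)hi_from & 0xffffffff00000000ULL;
	uint64_t lo = (uint64_t)lo_from & 0x00000000ffffffffULL;

	/* Converting back to int64_t assumes two's complement (GCC/Clang). */
	return (int64_t)(hi | lo);
}

/* Same shape as the VM_WARN_ONCE() condition in __vm_enough_memory(). */
static int would_warn(int64_t committed, int32_t batch, int online_cpus)
{
	return committed < -(int64_t)batch * online_cpus;
}
```

If the high half is read after the counter moves from 0 to -1 and the low half before, the result is -4294967296, far below both stable values, so the check would fire even though neither 0 nor -1 crosses the threshold.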

Given that this is a WARN_ONCE, and that a transient decrease in the
value is unlikely, you may consider
'VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as)) <
...)'. That way you won't modify percpu_counter_read() and will still
catch unintended races elsewhere.

[ Note that the 'data_race()' macro is still only in -next, -tip, and -rcu. ]
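Marco's suggestion can be sketched in userspace as below. Assumptions: data_race is defined here as a plain pass-through (in the kernel it tells KCSAN that the race is intentional and adds no locking or ordering), warn_once_model is a simplified stand-in for VM_WARN_ONCE() whose semantics differ from the kernel's, and all *_model names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Stand-in for the kernel's data_race(): here it just evaluates the
 * expression. In the kernel it marks the access as an intentional race
 * so KCSAN does not report it; it adds no locking or ordering.
 */
#define data_race(expr) (expr)

/* Hypothetical stand-in for struct percpu_counter. */
struct percpu_counter_model {
	int64_t count;
};

/* Plain accessor, left unchanged, like percpu_counter_read(). */
static inline int64_t percpu_counter_read_model(struct percpu_counter_model *fbc)
{
	return fbc->count;
}

/* Minimal warn-once stand-in: prints the first time cond holds and
 * returns whether it printed on this call. */
static bool warn_once_model(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
		return true;
	}
	return false;
}

/* Marco's suggested shape: annotate only this racy read at the call site. */
static bool check_commitment(struct percpu_counter_model *committed,
			     int32_t batch, int online_cpus, bool *warned)
{
	return warn_once_model(
		data_race(percpu_counter_read_model(committed)) <
			-(int64_t)batch * online_cpus,
		warned, "memory commitment underflow");
}
```

Because the annotation sits at the call site, percpu_counter_read() itself stays a plain load, and KCSAN keeps reporting races on other, unannotated readers.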

Thanks,
-- Marco

2020-01-31 02:20:38

by Andrew Morton


Subject: Re: [PATCH] mm/util: fix a data race in __vm_enough_memory()

On Thu, 30 Jan 2020 13:35:18 +0100 Marco Elver <[email protected]> wrote:

> Given the fact this is a WARN_ONCE, and the fact that a transient
> decrease in the value is unlikely, you may consider
> 'VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as)) <
> ...)'. That way you won't modify percpu_counter_read and still catch
> unintended races elsewhere.
>

That, or add an alternative version of percpu_counter_read() to the
percpu API. A very carefully commented version!
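Andrew's alternative, a separate and carefully commented accessor, might look roughly like this. The name percpu_counter_read_lockless_model is invented for this sketch (no such function is implied to exist in the percpu API), pthread_mutex_t stands in for the counter's lock, and READ_ONCE is again a simplified stand-in for the kernel macro.

```c
#include <pthread.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's READ_ONCE() (assumption). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct percpu_counter. */
struct percpu_counter_model {
	pthread_mutex_t lock; /* stands in for fbc->lock */
	int64_t count;
};

/* Existing-style accessor, unchanged: a plain load. Fine for callers
 * that hold ->lock, which the writer also takes. */
static inline int64_t percpu_counter_read_model(struct percpu_counter_model *fbc)
{
	return fbc->count;
}

/*
 * Hypothetical lockless variant (name invented for this sketch).
 *
 * - May be called without ->lock while a writer updates ->count.
 * - READ_ONCE() asks the compiler for a single load that it will not
 *   fuse with other loads or repeat.
 * - Returns only a snapshot of ->count and ignores per-CPU deltas, so
 *   callers must tolerate an approximate value.
 */
static inline int64_t percpu_counter_read_lockless_model(struct percpu_counter_model *fbc)
{
	return READ_ONCE(fbc->count);
}

/* A caller that already holds the lock keeps using the plain accessor. */
static int64_t locked_snapshot(struct percpu_counter_model *fbc)
{
	int64_t v;

	pthread_mutex_lock(&fbc->lock);
	v = percpu_counter_read_model(fbc);
	pthread_mutex_unlock(&fbc->lock);
	return v;
}
```

Compared with putting READ_ONCE() inside percpu_counter_read(), this leaves locked callers untouched, which addresses Qian's concern, at the cost of a second accessor in the API.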

2020-01-31 02:23:43

by Qian Cai


Subject: Re: [PATCH] mm/util: fix a data race in __vm_enough_memory()

> On Jan 30, 2020, at 9:18 PM, Andrew Morton <[email protected]> wrote:
>
> That, or add an alternative version of percpu_counter_read() to the
> percpu API. A very carefully commented version!

I sent a patch that uses data_race(), which should be sufficient:

https://lore.kernel.org/linux-mm/[email protected]/