LinuxLists.cc - [PATCH] mm/page_counter: fix various data races

2020-01-29 10:54:04

Subject: [PATCH] mm/page_counter: fix various data races

The commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could
had memcg->memsw->watermark been accessed concurrently as reported by
KCSAN,

Reported by Kernel Concurrency Sanitizer on:
BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
try_charge+0x131/0xd50 mm/memcontrol.c:2405
__memcg_kmem_charge_memcg+0x58/0x140
__memcg_kmem_charge+0xcc/0x280
__alloc_pages_nodemask+0x1e1/0x450
alloc_pages_current+0xa6/0x120
pte_alloc_one+0x17/0xd0
__pte_alloc+0x3a/0x1f0
copy_p4d_range+0xc36/0x1990
copy_page_range+0x21d/0x360
dup_mmap+0x5f5/0x7a0
dup_mm+0xa2/0x240
copy_process+0x1b3f/0x3460
_do_fork+0xaa/0xa20
__x64_sys_clone+0x13b/0x170
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe

write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
try_charge+0x131/0xd50 mm/memcontrol.c:2405
mem_cgroup_try_charge+0x159/0x460
mem_cgroup_try_charge_delay+0x3d/0xa0
wp_page_copy+0x14d/0x930
do_wp_page+0x107/0x7b0
__handle_mm_fault+0xce6/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40

Since watermark could be compared or set to garbage due to load or
store tearing which would change the code logic, fix it by adding a pair
of READ_ONCE() and WRITE_ONCE() in those places.

Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
Signed-off-by: Qian Cai <[email protected]>
---
mm/page_counter.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_counter.c b/mm/page_counter.c
index de31470655f6..a17841150906 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -82,8 +82,8 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
* This is indeed racy, but we can live with some
* inaccuracy in the watermark.
*/
- if (new > c->watermark)
- c->watermark = new;
+ if (new > READ_ONCE(c->watermark))
+ WRITE_ONCE(c->watermark, new);
}
}

@@ -135,8 +135,8 @@ bool page_counter_try_charge(struct page_counter *counter,
* Just like with failcnt, we can live with some
* inaccuracy in the watermark.
*/
- if (new > c->watermark)
- c->watermark = new;
+ if (new > READ_ONCE(c->watermark))
+ WRITE_ONCE(c->watermark, new);
}
return true;

--
2.21.0 (Apple Git-122.2)

2020-01-29 10:59:53

by David Hildenbrand

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

On 29.01.20 11:52, Qian Cai wrote:
> The commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could
> had memcg->memsw->watermark been accessed concurrently as reported by
> KCSAN,
>
> Reported by Kernel Concurrency Sanitizer on:
> BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
>
> read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
> page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
> try_charge+0x131/0xd50 mm/memcontrol.c:2405
> __memcg_kmem_charge_memcg+0x58/0x140
> __memcg_kmem_charge+0xcc/0x280
> __alloc_pages_nodemask+0x1e1/0x450
> alloc_pages_current+0xa6/0x120
> pte_alloc_one+0x17/0xd0
> __pte_alloc+0x3a/0x1f0
> copy_p4d_range+0xc36/0x1990
> copy_page_range+0x21d/0x360
> dup_mmap+0x5f5/0x7a0
> dup_mm+0xa2/0x240
> copy_process+0x1b3f/0x3460
> _do_fork+0xaa/0xa20
> __x64_sys_clone+0x13b/0x170
> do_syscall_64+0x91/0xb47
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
> page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
> try_charge+0x131/0xd50 mm/memcontrol.c:2405
> mem_cgroup_try_charge+0x159/0x460
> mem_cgroup_try_charge_delay+0x3d/0xa0
> wp_page_copy+0x14d/0x930
> do_wp_page+0x107/0x7b0
> __handle_mm_fault+0xce6/0xd40
> handle_mm_fault+0xfc/0x2f0
> do_page_fault+0x263/0x6f9
> page_fault+0x34/0x40
>
> Since watermark could be compared or set to garbage due to load or
> store tearing which would change the code logic, fix it by adding a pair
> of READ_ONCE() and WRITE_ONCE() in those places.
>
> Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
> Signed-off-by: Qian Cai <[email protected]>
> ---
> mm/page_counter.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_counter.c b/mm/page_counter.c
> index de31470655f6..a17841150906 100644
> --- a/mm/page_counter.c
> +++ b/mm/page_counter.c
> @@ -82,8 +82,8 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
> * This is indeed racy, but we can live with some
> * inaccuracy in the watermark.
> */
> - if (new > c->watermark)
> - c->watermark = new;
> + if (new > READ_ONCE(c->watermark))
> + WRITE_ONCE(c->watermark, new);
> }
> }
>
> @@ -135,8 +135,8 @@ bool page_counter_try_charge(struct page_counter *counter,
> * Just like with failcnt, we can live with some
> * inaccuracy in the watermark.
> */
> - if (new > c->watermark)
> - c->watermark = new;
> + if (new > READ_ONCE(c->watermark))
> + WRITE_ONCE(c->watermark, new);

So, if this is racy, isn't it a problem that that "new" could suddenly
be < c->watermark (concurrent writer). So you would use the "higher"
watermark.

--
Thanks,

David / dhildenb

2020-01-29 11:00:39

by David Hildenbrand

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

On 29.01.20 11:57, David Hildenbrand wrote:
> On 29.01.20 11:52, Qian Cai wrote:
>> The commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could
>> had memcg->memsw->watermark been accessed concurrently as reported by
>> KCSAN,
>>
>> Reported by Kernel Concurrency Sanitizer on:
>> BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
>>
>> read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
>> page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
>> try_charge+0x131/0xd50 mm/memcontrol.c:2405
>> __memcg_kmem_charge_memcg+0x58/0x140
>> __memcg_kmem_charge+0xcc/0x280
>> __alloc_pages_nodemask+0x1e1/0x450
>> alloc_pages_current+0xa6/0x120
>> pte_alloc_one+0x17/0xd0
>> __pte_alloc+0x3a/0x1f0
>> copy_p4d_range+0xc36/0x1990
>> copy_page_range+0x21d/0x360
>> dup_mmap+0x5f5/0x7a0
>> dup_mm+0xa2/0x240
>> copy_process+0x1b3f/0x3460
>> _do_fork+0xaa/0xa20
>> __x64_sys_clone+0x13b/0x170
>> do_syscall_64+0x91/0xb47
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
>> page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
>> try_charge+0x131/0xd50 mm/memcontrol.c:2405
>> mem_cgroup_try_charge+0x159/0x460
>> mem_cgroup_try_charge_delay+0x3d/0xa0
>> wp_page_copy+0x14d/0x930
>> do_wp_page+0x107/0x7b0
>> __handle_mm_fault+0xce6/0xd40
>> handle_mm_fault+0xfc/0x2f0
>> do_page_fault+0x263/0x6f9
>> page_fault+0x34/0x40
>>
>> Since watermark could be compared or set to garbage due to load or
>> store tearing which would change the code logic, fix it by adding a pair
>> of READ_ONCE() and WRITE_ONCE() in those places.
>>
>> Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
>> Signed-off-by: Qian Cai <[email protected]>
>> ---
>> mm/page_counter.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/page_counter.c b/mm/page_counter.c
>> index de31470655f6..a17841150906 100644
>> --- a/mm/page_counter.c
>> +++ b/mm/page_counter.c
>> @@ -82,8 +82,8 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
>> * This is indeed racy, but we can live with some
>> * inaccuracy in the watermark.
>> */
>> - if (new > c->watermark)
>> - c->watermark = new;
>> + if (new > READ_ONCE(c->watermark))
>> + WRITE_ONCE(c->watermark, new);
>> }
>> }
>>
>> @@ -135,8 +135,8 @@ bool page_counter_try_charge(struct page_counter *counter,
>> * Just like with failcnt, we can live with some
>> * inaccuracy in the watermark.
>> */
>> - if (new > c->watermark)
>> - c->watermark = new;
>> + if (new > READ_ONCE(c->watermark))
>> + WRITE_ONCE(c->watermark, new);
>
> So, if this is racy, isn't it a problem that that "new" could suddenly
> be < c->watermark (concurrent writer). So you would use the "higher"
> watermark.

s/use/lose/ :)

--
Thanks,

David / dhildenb

2020-01-29 12:06:13

by Michal Hocko

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

On Wed 29-01-20 05:52:24, Qian Cai wrote:
> The commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could
> had memcg->memsw->watermark been accessed concurrently as reported by
> KCSAN,
>
> Reported by Kernel Concurrency Sanitizer on:
> BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
>
> read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
> page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
> try_charge+0x131/0xd50 mm/memcontrol.c:2405
> __memcg_kmem_charge_memcg+0x58/0x140
> __memcg_kmem_charge+0xcc/0x280
> __alloc_pages_nodemask+0x1e1/0x450
> alloc_pages_current+0xa6/0x120
> pte_alloc_one+0x17/0xd0
> __pte_alloc+0x3a/0x1f0
> copy_p4d_range+0xc36/0x1990
> copy_page_range+0x21d/0x360
> dup_mmap+0x5f5/0x7a0
> dup_mm+0xa2/0x240
> copy_process+0x1b3f/0x3460
> _do_fork+0xaa/0xa20
> __x64_sys_clone+0x13b/0x170
> do_syscall_64+0x91/0xb47
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
> page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
> try_charge+0x131/0xd50 mm/memcontrol.c:2405
> mem_cgroup_try_charge+0x159/0x460
> mem_cgroup_try_charge_delay+0x3d/0xa0
> wp_page_copy+0x14d/0x930
> do_wp_page+0x107/0x7b0
> __handle_mm_fault+0xce6/0xd40
> handle_mm_fault+0xfc/0x2f0
> do_page_fault+0x263/0x6f9
> page_fault+0x34/0x40
>
> Since watermark could be compared or set to garbage due to load or
> store tearing which would change the code logic, fix it by adding a pair
> of READ_ONCE() and WRITE_ONCE() in those places.

There is no actual problem and the report is false positive, right?
While compilers are free to do all sorts of stuff do we have any actual
proof that this particular path would ever be problematic.

I do not oppose to the patch. {READ,WRITE}_ONCE actually makes some
sense to me, but the changelog should be more clear on how serious this
is.

> Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
> Signed-off-by: Qian Cai <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/page_counter.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_counter.c b/mm/page_counter.c
> index de31470655f6..a17841150906 100644
> --- a/mm/page_counter.c
> +++ b/mm/page_counter.c
> @@ -82,8 +82,8 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
> * This is indeed racy, but we can live with some
> * inaccuracy in the watermark.
> */
> - if (new > c->watermark)
> - c->watermark = new;
> + if (new > READ_ONCE(c->watermark))
> + WRITE_ONCE(c->watermark, new);
> }
> }
>
> @@ -135,8 +135,8 @@ bool page_counter_try_charge(struct page_counter *counter,
> * Just like with failcnt, we can live with some
> * inaccuracy in the watermark.
> */
> - if (new > c->watermark)
> - c->watermark = new;
> + if (new > READ_ONCE(c->watermark))
> + WRITE_ONCE(c->watermark, new);
> }
> return true;
>
> --
> 2.21.0 (Apple Git-122.2)

--
Michal Hocko
SUSE Labs

2020-01-29 12:16:19

by Tetsuo Handa

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

On 2020/01/29 21:03, Michal Hocko wrote:
>> Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
>> Signed-off-by: Qian Cai <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>

Please include

Reported-by: [email protected]

for https://syzkaller.appspot.com/bug?id=744097b8b91cecd8b035a6f746bb12e4efc7669f .

By the way, can READ_ONCE()/WRITE_ONCE() really solve this warning?
The link above says read/write on the same location ( mm/page_counter.c:129 ).
I don't know how READ_ONCE()/WRITE_ONCE() can solve the race.

>
>> ---
>> mm/page_counter.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/page_counter.c b/mm/page_counter.c
>> index de31470655f6..a17841150906 100644
>> --- a/mm/page_counter.c
>> +++ b/mm/page_counter.c
>> @@ -82,8 +82,8 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
>> * This is indeed racy, but we can live with some
>> * inaccuracy in the watermark.
>> */
>> - if (new > c->watermark)
>> - c->watermark = new;
>> + if (new > READ_ONCE(c->watermark))
>> + WRITE_ONCE(c->watermark, new);
>> }
>> }
>>
>> @@ -135,8 +135,8 @@ bool page_counter_try_charge(struct page_counter *counter,
>> * Just like with failcnt, we can live with some
>> * inaccuracy in the watermark.
>> */
>> - if (new > c->watermark)
>> - c->watermark = new;
>> + if (new > READ_ONCE(c->watermark))
>> + WRITE_ONCE(c->watermark, new);
>> }
>> return true;
>>
>> --
>> 2.21.0 (Apple Git-122.2)
>

2020-01-29 12:23:12

by Marco Elver

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

On Wed, 29 Jan 2020 at 13:13, Tetsuo Handa
<[email protected]> wrote:
>
> On 2020/01/29 21:03, Michal Hocko wrote:
> >> Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
> >> Signed-off-by: Qian Cai <[email protected]>
> >
> > Acked-by: Michal Hocko <[email protected]>
>
> Please include
>
> Reported-by: [email protected]
>
> for https://syzkaller.appspot.com/bug?id=744097b8b91cecd8b035a6f746bb12e4efc7669f .
>
> By the way, can READ_ONCE()/WRITE_ONCE() really solve this warning?
> The link above says read/write on the same location ( mm/page_counter.c:129 ).
> I don't know how READ_ONCE()/WRITE_ONCE() can solve the race.

It avoids the *data* race, with *_ONCE telling the compiler to not
optimize the accesses in concurrency-unfriendly ways. Since *_ONCE is
used, it conveys clear intent that the code here is meant to be
concurrent, and KCSAN stops complaining (and assumes that the *logic*
is correct).

The race itself is still there, but as per comment in the file,
apparently fine and not a logic bug.

> >
> >> ---
> >> mm/page_counter.c | 8 ++++----
> >> 1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/page_counter.c b/mm/page_counter.c
> >> index de31470655f6..a17841150906 100644
> >> --- a/mm/page_counter.c
> >> +++ b/mm/page_counter.c
> >> @@ -82,8 +82,8 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
> >> * This is indeed racy, but we can live with some
> >> * inaccuracy in the watermark.
> >> */
> >> - if (new > c->watermark)
> >> - c->watermark = new;
> >> + if (new > READ_ONCE(c->watermark))
> >> + WRITE_ONCE(c->watermark, new);
> >> }
> >> }
> >>
> >> @@ -135,8 +135,8 @@ bool page_counter_try_charge(struct page_counter *counter,
> >> * Just like with failcnt, we can live with some
> >> * inaccuracy in the watermark.
> >> */
> >> - if (new > c->watermark)
> >> - c->watermark = new;
> >> + if (new > READ_ONCE(c->watermark))
> >> + WRITE_ONCE(c->watermark, new);
> >> }
> >> return true;
> >>
> >> --
> >> 2.21.0 (Apple Git-122.2)
> >
>

2020-01-29 12:26:59

by Qian Cai

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

> On Jan 29, 2020, at 7:13 AM, Tetsuo Handa <[email protected]> wrote:
>
> By the way, can READ_ONCE()/WRITE_ONCE() really solve this warning?
> The link above says read/write on the same location ( mm/page_counter.c:129 ).
> I don't know how READ_ONCE()/WRITE_ONCE() can solve the race.

That looks like a different one where it complains about c->failcnt++. I’ll send a separate patch for that.

2020-01-29 13:12:42

by Tetsuo Handa

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

On 2020/01/29 21:25, Qian Cai wrote:
>
>
>> On Jan 29, 2020, at 7:13 AM, Tetsuo Handa <[email protected]> wrote:
>>
>> By the way, can READ_ONCE()/WRITE_ONCE() really solve this warning?
>> The link above says read/write on the same location ( mm/page_counter.c:129 ).
>> I don't know how READ_ONCE()/WRITE_ONCE() can solve the race.
>
> That looks like a different one where it complains about c->failcnt++. I’ll send a separate patch for that.
>

Ah, then this patch is meant for mm/page_counter.c:138 versus page_counter.c:139 race which was
closed as invalid at https://syzkaller.appspot.com/bug?id=871391ec080746185a2dd437c54d75fcd1ef0ef8 .

2020-01-29 13:49:01

by Tetsuo Handa

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

On 2020/01/29 21:21, Marco Elver wrote:
>> By the way, can READ_ONCE()/WRITE_ONCE() really solve this warning?
>> The link above says read/write on the same location ( mm/page_counter.c:129 ).
>> I don't know how READ_ONCE()/WRITE_ONCE() can solve the race.
>
> It avoids the *data* race, with *_ONCE telling the compiler to not
> optimize the accesses in concurrency-unfriendly ways. Since *_ONCE is
> used, it conveys clear intent that the code here is meant to be
> concurrent, and KCSAN stops complaining (and assumes that the *logic*
> is correct).

I see. Unlike c->failcnt++ which involves read-modify-write, *_ONCE() can be used for
simple read (like c->watermark) or simple write (like c->watermark = new) case.

2020-02-11 12:43:58

by Qian Cai

[permalink] [raw]

Subject: Re: [PATCH] mm/page_counter: fix various data races

> On Jan 29, 2020, at 7:03 AM, Michal Hocko <[email protected]> wrote:
>
> Acked-by: Michal Hocko <[email protected]>

Andrew, could you pick this as well if have not done so?