2020-02-06 03:51:25

by Qian Cai

Subject: [PATCH] mm/memcontrol: fix a data race in scan count

struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
accessed concurrently as noticed by KCSAN,

BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size

write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
mem_cgroup_update_lru_size+0x11c/0x1d0
mem_cgroup_update_lru_size at mm/memcontrol.c:1266
isolate_lru_pages+0x6a9/0xf30
shrink_active_list+0x123/0xcc0
shrink_lruvec+0x8fd/0x1380
shrink_node+0x317/0xd80
do_try_to_free_pages+0x1f7/0xa10
try_to_free_pages+0x26c/0x5e0
__alloc_pages_slowpath+0x458/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x170/0x700
__handle_mm_fault+0xc9f/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40

read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
lruvec_lru_size+0xbb/0x270
mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
(inlined by) lruvec_lru_size at mm/vmscan.c:326
shrink_lruvec+0x1d0/0x1380
shrink_node+0x317/0xd80
do_try_to_free_pages+0x1f7/0xa10
try_to_free_pages+0x26c/0x5e0
__alloc_pages_slowpath+0x458/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_current+0xa6/0x120
alloc_slab_page+0x3b1/0x540
allocate_slab+0x70/0x660
new_slab+0x46/0x70
___slab_alloc+0x4ad/0x7d0
__slab_alloc+0x43/0x70
kmem_cache_alloc+0x2c3/0x420
getname_flags+0x4c/0x230
getname+0x22/0x30
do_sys_openat2+0x205/0x3b0
do_sys_open+0x9a/0xf0
__x64_sys_openat+0x62/0x80
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported by Kernel Concurrency Sanitizer on:
CPU: 95 PID: 50964 Comm: cc1 Tainted: G W O L 5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The write is done under lru_lock, but the read is lockless. The scan
count is used to determine how aggressively the anon and file LRU
lists should be scanned. Load tearing could produce a bogus size and
make the heuristic inefficient, so fix it by adding READ_ONCE() to
the read.
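
For context, a trimmed sketch of the two sides of the race (function
and field names follow mm/memcontrol.c and include/linux/memcontrol.h,
but the bodies are reduced to the relevant accesses; this is
illustrative only, not the literal source):

/* Writer: isolate_lru_pages() calls this with the node's lru_lock held. */
void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
				int zid, int nr_pages)
{
	struct mem_cgroup_per_node *mz;

	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
	/* Plain update; serialized against other writers by lru_lock. */
	mz->lru_zone_size[zid][lru] += nr_pages;
}

/*
 * Reader: lruvec_lru_size() calls this without lru_lock, so it can
 * race with the update above.
 */
static inline unsigned long
mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,
			     int zone_idx)
{
	struct mem_cgroup_per_node *mz;

	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
	/* READ_ONCE() keeps the compiler from tearing or repeating the load. */
	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
}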

Signed-off-by: Qian Cai <[email protected]>
---
include/linux/memcontrol.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7a0a1a5c8d5..e8734dabbc61 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -533,7 +533,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
struct mem_cgroup_per_node *mz;

mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- return mz->lru_zone_size[zone_idx][lru];
+ return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
}

void mem_cgroup_handle_over_high(void);
--
2.21.0 (Apple Git-122.2)


2020-02-10 04:29:03

by Andrew Morton

Subject: Re: [PATCH] mm/memcontrol: fix a data race in scan count

On Wed, 5 Feb 2020 22:49:45 -0500 Qian Cai <[email protected]> wrote:

> struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
> accessed concurrently as noticed by KCSAN,
>
> ...
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 95 PID: 50964 Comm: cc1 Tainted: G W O L 5.5.0-next-20200204+ #6
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
>
> The write is done under lru_lock, but the read is lockless. The scan
> count is used to determine how aggressively the anon and file LRU
> lists should be scanned. Load tearing could produce a bogus size and
> make the heuristic inefficient, so fix it by adding READ_ONCE() to
> the read.
>
> ...
>
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -533,7 +533,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
> struct mem_cgroup_per_node *mz;
>
> mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> - return mz->lru_zone_size[zone_idx][lru];
> + return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
> }

I worry about the readability/maintainability of these things. A naive
reader who comes upon this code will wonder "why the heck is it using
READ_ONCE?". A possibly lengthy trawl through the git history will
reveal the reason but that's rather unkind. Wouldn't a simple

/* modified under lru_lock, so use READ_ONCE */

improve the situation?
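
For concreteness, something along these lines (just a sketch of where
the comment would sit; the exact wording is up to the author):

	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
	/* modified under lru_lock, so use READ_ONCE */
	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);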


2020-02-10 04:47:51

by Qian Cai

Subject: Re: [PATCH] mm/memcontrol: fix a data race in scan count



> On Feb 9, 2020, at 11:28 PM, Andrew Morton <[email protected]> wrote:
>
> I worry about the readability/maintainability of these things. A naive
> reader who comes upon this code will wonder "why the heck is it using
> READ_ONCE?". A possibly lengthy trawl through the git history will
> reveal the reason but that's rather unkind. Wouldn't a simple
>
> /* modified under lru_lock, so use READ_ONCE */
>
> improve the situation?

Sure. I just don't remember many places in the existing code that put
comments on READ_ONCE() and WRITE_ONCE(), kernel/locking/osq_lock.c
and kernel/rcu/srcutree.c for example, but I suppose every subsystem
could be different.