pgdat->kswapd_classzone_idx could be accessed concurrently in
wakeup_kswapd(). Plain writes and reads without any lock protection
result in data races. Fix them by adding a pair of READ_ONCE() and
WRITE_ONCE(), which also saves a branch (the compiler might well
optimize the original code in unintended ways anyway). The data races
were reported by KCSAN:
BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd
write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13:
wakeup_kswapd+0xf1/0x400
wakeup_kswapd at mm/vmscan.c:3967
wake_all_kswapds+0x59/0xc0
wake_all_kswapds at mm/page_alloc.c:4241
__alloc_pages_slowpath+0xdcc/0x1290
__alloc_pages_slowpath at mm/page_alloc.c:4512
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x16e/0x6f0
__handle_mm_fault+0xcd5/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
1 lock held by mtest01/7454:
#0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at:
do_page_fault+0x143/0x6f9
do_user_addr_fault at arch/x86/mm/fault.c:1405
(inlined by) do_page_fault at arch/x86/mm/fault.c:1539
irq event stamp: 6944085
count_memcg_event_mm+0x1a6/0x270
count_memcg_event_mm+0x119/0x270
__do_softirq+0x34c/0x57c
irq_exit+0xa2/0xc0
read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38:
wakeup_kswapd+0xc8/0x400
wake_all_kswapds+0x59/0xc0
__alloc_pages_slowpath+0xdcc/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x16e/0x6f0
__handle_mm_fault+0xcd5/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
1 lock held by mtest01/7472:
#0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at:
do_page_fault+0x143/0x6f9
irq event stamp: 6793561
count_memcg_event_mm+0x1a6/0x270
count_memcg_event_mm+0x119/0x270
__do_softirq+0x34c/0x57c
irq_exit+0xa2/0xc0
Signed-off-by: Qian Cai <[email protected]>
---
mm/vmscan.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 876370565455..400950734e5a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
return;
pgdat = zone->zone_pgdat;
- if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
- pgdat->kswapd_classzone_idx = classzone_idx;
- else
- pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
- classzone_idx);
+ if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
+ READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
+ WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
+
pgdat->kswapd_order = max(pgdat->kswapd_order, order);
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
--
1.8.3.1
On Tue, Feb 25, 2020 at 11:55:26AM -0500, Qian Cai wrote:
> - if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
> - pgdat->kswapd_classzone_idx = classzone_idx;
> - else
> - pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
> - classzone_idx);
> + if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
> + READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
> + WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
> +
> pgdat->kswapd_order = max(pgdat->kswapd_order, order);
Doesn't this line have the exact same problem you're "solving" above?
Also, why would you do this crazy "f(READ_ONCE(x)) || g(READ_ONCE(x))"?
Surely you should be stashing READ_ONCE(x) into a local variable?
And there are a _lot_ of places which access kswapd_classzone_idx
without a lock. Are you sure this patch is sufficient?
> On Feb 25, 2020, at 8:29 PM, Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Feb 25, 2020 at 11:55:26AM -0500, Qian Cai wrote:
>> - if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
>> - pgdat->kswapd_classzone_idx = classzone_idx;
>> - else
>> - pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
>> - classzone_idx);
>> + if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
>> + READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
>> + WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
>> +
>> pgdat->kswapd_order = max(pgdat->kswapd_order, order);
>
> Doesn't this line have the exact same problem you're "solving" above?
Good catch. I missed that.
>
> Also, why would you do this crazy "f(READ_ONCE(x)) || g(READ_ONCE(x))"?
> Surely you should be stashing READ_ONCE(x) into a local variable?
Not a bad idea.
>
> And there are a _lot_ of places which access kswapd_classzone_idx
> without a lock. Are you sure this patch is sufficient?
I am not sure, but KCSAN did not complain about other places after an
extended period of testing, so I will look into them once other places
show up later.
On Tue, 25 Feb 2020 11:55:26 -0500 Qian Cai <[email protected]> wrote:
> pgdat->kswapd_classzone_idx could be accessed concurrently in
> wakeup_kswapd(). Plain writes and reads without any lock protection
> result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
> well as saving a branch (compilers might well optimize the original code
> in an unintentional way anyway). The data races were reported by KCSAN,
>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
> return;
> pgdat = zone->zone_pgdat;
>
> - if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
> - pgdat->kswapd_classzone_idx = classzone_idx;
> - else
> - pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
> - classzone_idx);
> + if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
> + READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
> + WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
> +
> pgdat->kswapd_order = max(pgdat->kswapd_order, order);
> if (!waitqueue_active(&pgdat->kswapd_wait))
> return;
This is very partial, isn't it? The above code itself is racy against
other code which manipulates ->kswapd_classzone_idx, and the
manipulation in allow_direct_reclaim() is performed by threads other
than kswapd, so it needs the READ_ONCE treatment too and is still racy
with that?
I guess occasional races here don't really matter, but a grossly wrong
read from load tearing might matter. In which case shouldn't we be
defending against them in all cases where non-kswapd threads read this
field?
> On Feb 25, 2020, at 9:11 PM, Andrew Morton <[email protected]> wrote:
>
> On Tue, 25 Feb 2020 11:55:26 -0500 Qian Cai <[email protected]> wrote:
>
>> pgdat->kswapd_classzone_idx could be accessed concurrently in
>> wakeup_kswapd(). Plain writes and reads without any lock protection
>> result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
>> well as saving a branch (compilers might well optimize the original code
>> in an unintentional way anyway). The data races were reported by KCSAN,
>>
>> ...
>>
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
>> return;
>> pgdat = zone->zone_pgdat;
>>
>> - if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
>> - pgdat->kswapd_classzone_idx = classzone_idx;
>> - else
>> - pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
>> - classzone_idx);
>> + if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
>> + READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
>> + WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
>> +
>> pgdat->kswapd_order = max(pgdat->kswapd_order, order);
>> if (!waitqueue_active(&pgdat->kswapd_wait))
>> return;
>
> This is very partial, isn't it? The above code itself is racy against
> other code which manipulates ->kswapd_classzone_idx, and the
> manipulation in allow_direct_reclaim() is performed by threads other
> than kswapd, so it needs the READ_ONCE treatment too and is still racy
> with that?
Right, I suppose allow_direct_reclaim() could use some treatment too.
>
> I guess occasional races here don't really matter, but a grossly wrong
> read from load tearing might matter. In which case shouldn't we be
> defending against them in all cases where non-kswapd threads read this
> field?