2020-02-26 04:00:00

by Qian Cai

Subject: [PATCH v2] mm/vmscan: fix data races at kswapd_classzone_idx

pgdat->kswapd_classzone_idx could be accessed concurrently in
wakeup_kswapd(). Plain writes and reads without any lock protection
result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
well as saving a branch (compilers might well optimize the original code
in an unintentional way anyway). While at it, also take care of
pgdat->kswapd_order and non-kswapd threads in allow_direct_reclaim().
The data races were reported by KCSAN:

BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd

write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13:
wakeup_kswapd+0xf1/0x400
wakeup_kswapd at mm/vmscan.c:3967
wake_all_kswapds+0x59/0xc0
wake_all_kswapds at mm/page_alloc.c:4241
__alloc_pages_slowpath+0xdcc/0x1290
__alloc_pages_slowpath at mm/page_alloc.c:4512
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x16e/0x6f0
__handle_mm_fault+0xcd5/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40

1 lock held by mtest01/7454:
#0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at:
do_page_fault+0x143/0x6f9
do_user_addr_fault at arch/x86/mm/fault.c:1405
(inlined by) do_page_fault at arch/x86/mm/fault.c:1539
irq event stamp: 6944085
count_memcg_event_mm+0x1a6/0x270
count_memcg_event_mm+0x119/0x270
__do_softirq+0x34c/0x57c
irq_exit+0xa2/0xc0

read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38:
wakeup_kswapd+0xc8/0x400
wake_all_kswapds+0x59/0xc0
__alloc_pages_slowpath+0xdcc/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x16e/0x6f0
__handle_mm_fault+0xcd5/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40

1 lock held by mtest01/7472:
#0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at:
do_page_fault+0x143/0x6f9
irq event stamp: 6793561
count_memcg_event_mm+0x1a6/0x270
count_memcg_event_mm+0x119/0x270
__do_softirq+0x34c/0x57c
irq_exit+0xa2/0xc0

Signed-off-by: Qian Cai <[email protected]>
---

v2: Use a temp variable and take care of kswapd_order per Matthew.
    Take care of allow_direct_reclaim() as well.
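The allow_direct_reclaim() hunk replaces an unconditional min() store with a store that only happens when the index has to drop. A minimal userspace sketch, assuming simplified volatile-based stand-ins for READ_ONCE()/WRITE_ONCE() (the kernel macros are more elaborate) and hypothetical names (`clamp_old`, `clamp_new`, `clamps_agree`, a trimmed `enum zone_type`), checks that both forms leave the same value for every index, including the MAX_NR_ZONES value that wakeup_kswapd() treats specially:

```c
#include <stdbool.h>

/* Simplified stand-ins (assumption): a volatile access makes the compiler
 * emit exactly one load or store of the object. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical, trimmed zone list; only the ordering matters here. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

/* Pre-patch: the field is always rewritten with min(idx, ZONE_NORMAL). */
static enum zone_type clamp_old(enum zone_type idx)
{
	return idx < ZONE_NORMAL ? idx : ZONE_NORMAL;
}

/* Post-patch: one marked load, and a marked store only when lowering. */
static void clamp_new(enum zone_type *idx)
{
	if (READ_ONCE(*idx) > ZONE_NORMAL)
		WRITE_ONCE(*idx, ZONE_NORMAL);
}

/* Single-threaded check over every possible index value. */
static bool clamps_agree(void)
{
	for (int i = 0; i <= MAX_NR_ZONES; i++) {
		enum zone_type v = (enum zone_type)i;

		clamp_new(&v);
		if (v != clamp_old((enum zone_type)i))
			return false;
	}
	return true;
}
```

Skipping the store when the value is already at or below ZONE_NORMAL also means the non-kswapd thread does not write the field at all in that case.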

mm/vmscan.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 876370565455..e61cc71b8915 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3136,8 +3136,9 @@ static bool allow_direct_reclaim(pg_data_t *pgdat)
 
 	/* kswapd must be awake if processes are being throttled */
 	if (!wmark_ok && waitqueue_active(&pgdat->kswapd_wait)) {
-		pgdat->kswapd_classzone_idx = min(pgdat->kswapd_classzone_idx,
-						  (enum zone_type)ZONE_NORMAL);
+		if (READ_ONCE(pgdat->kswapd_classzone_idx) > ZONE_NORMAL)
+			WRITE_ONCE(pgdat->kswapd_classzone_idx, ZONE_NORMAL);
+
 		wake_up_interruptible(&pgdat->kswapd_wait);
 	}
 
@@ -3953,20 +3954,23 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
 		   enum zone_type classzone_idx)
 {
 	pg_data_t *pgdat;
+	enum zone_type curr_idx;
 
 	if (!managed_zone(zone))
 		return;
 
 	if (!cpuset_zone_allowed(zone, gfp_flags))
 		return;
+
 	pgdat = zone->zone_pgdat;
+	curr_idx = READ_ONCE(pgdat->kswapd_classzone_idx);
+
+	if (curr_idx == MAX_NR_ZONES || curr_idx < classzone_idx)
+		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
+
+	if (READ_ONCE(pgdat->kswapd_order) < order)
+		WRITE_ONCE(pgdat->kswapd_order, order);
 
-	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
-		pgdat->kswapd_classzone_idx = classzone_idx;
-	else
-		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
-						  classzone_idx);
-	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
 
--
2.21.0 (Apple Git-122.2)
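The wakeup_kswapd() hunk folds the old if/else into a single condition on a snapshot taken with READ_ONCE(). A minimal userspace sketch, assuming simplified volatile-based stand-ins for READ_ONCE()/WRITE_ONCE() (the kernel macros are more elaborate) and hypothetical names (`struct pgdat_model`, `update_old`, `update_new`, `updates_agree`, a trimmed `enum zone_type`), checks that the two forms leave the same state for every input when run single-threaded:

```c
#include <stdbool.h>

/* Simplified stand-ins (assumption): one volatile load or store each. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical, trimmed zone list; only the ordering matters here. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

/* Hypothetical stand-in for the two pg_data_t fields touched by the patch. */
struct pgdat_model {
	enum zone_type kswapd_classzone_idx;
	int kswapd_order;
};

/* Pre-patch update: plain accesses, unconditional stores. */
static void update_old(struct pgdat_model *p, int order, enum zone_type idx)
{
	if (p->kswapd_classzone_idx == MAX_NR_ZONES)
		p->kswapd_classzone_idx = idx;
	else
		p->kswapd_classzone_idx = p->kswapd_classzone_idx > idx ?
					  p->kswapd_classzone_idx : idx;
	p->kswapd_order = p->kswapd_order > order ? p->kswapd_order : order;
}

/* Post-patch update: one marked snapshot, marked store only when raising. */
static void update_new(struct pgdat_model *p, int order, enum zone_type idx)
{
	enum zone_type curr_idx = READ_ONCE(p->kswapd_classzone_idx);

	if (curr_idx == MAX_NR_ZONES || curr_idx < idx)
		WRITE_ONCE(p->kswapd_classzone_idx, idx);
	if (READ_ONCE(p->kswapd_order) < order)
		WRITE_ONCE(p->kswapd_order, order);
}

/* Exhaustive single-threaded comparison over a small input space. */
static bool updates_agree(void)
{
	for (int cur = 0; cur <= MAX_NR_ZONES; cur++)
		for (int idx = 0; idx < MAX_NR_ZONES; idx++)
			for (int co = 0; co < 4; co++)
				for (int o = 0; o < 4; o++) {
					struct pgdat_model a = { (enum zone_type)cur, co };
					struct pgdat_model b = a;

					update_old(&a, o, (enum zone_type)idx);
					update_new(&b, o, (enum zone_type)idx);
					if (a.kswapd_classzone_idx != b.kswapd_classzone_idx ||
					    a.kswapd_order != b.kswapd_order)
						return false;
				}
	return true;
}
```

This only shows sequential equivalence. The marked accesses address the concurrent case by stopping the compiler from tearing, fusing or re-loading the accesses; they do not turn the check-then-store into an atomic max, so two concurrent wakers can both pass the check and the smaller index can be stored last.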


2020-02-26 04:06:41

by Matthew Wilcox

Subject: Re: [PATCH v2] mm/vmscan: fix data races at kswapd_classzone_idx

On Tue, Feb 25, 2020 at 10:58:27PM -0500, Qian Cai wrote:
> pgdat->kswapd_classzone_idx could be accessed concurrently in
> wakeup_kswapd(). Plain writes and reads without any lock protection
> result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
> well as saving a branch (compilers might well optimize the original code
> in an unintentional way anyway). While at it, also take care of
> pgdat->kswapd_order and non-kswapd threads in allow_direct_reclaim().

I don't understand why the usages of kswapd_classzone_idx in kswapd() and
kswapd_try_to_sleep() don't need changing too? kswapd_classzone_idx()
looks safe to me, but I'm prone to missing stupid things that compilers
are allowed to do.
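Matthew's worry is that a plain read may turn into more than one load. The sketch below is a userspace model loosely based on a kswapd_classzone_idx()-style helper, assuming a simplified volatile-based READ_ONCE() and hypothetical names (`struct pgdat_model`, `classzone_idx_plain`, `classzone_idx_once`); it contrasts a body that reads the field twice with one that reads it once into a local:

```c
/* Simplified stand-in (assumption): exactly one volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, trimmed zone list; MAX_NR_ZONES means "no request". */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

struct pgdat_model {
	enum zone_type kswapd_classzone_idx;
};

/* Plain form: the source reads the field twice and a compiler is free to
 * keep both loads, so a concurrent store of MAX_NR_ZONES between the
 * comparison and the return could make this return the sentinel. */
static enum zone_type classzone_idx_plain(struct pgdat_model *p,
					  enum zone_type prev)
{
	if (p->kswapd_classzone_idx == MAX_NR_ZONES)
		return prev;
	return p->kswapd_classzone_idx;
}

/* Snapshot form: one marked load, so the comparison and the returned
 * value always refer to the same observed value. */
static enum zone_type classzone_idx_once(struct pgdat_model *p,
					 enum zone_type prev)
{
	enum zone_type curr_idx = READ_ONCE(p->kswapd_classzone_idx);

	return curr_idx == MAX_NR_ZONES ? prev : curr_idx;
}
```

Single-threaded, both return the same values; the difference only matters when another thread stores MAX_NR_ZONES between the two loads of the plain form.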

2020-02-26 11:50:23

by Qian Cai

Subject: Re: [PATCH v2] mm/vmscan: fix data races at kswapd_classzone_idx

> On Feb 25, 2020, at 11:06 PM, Matthew Wilcox <[email protected]> wrote:
>
> I don't understand why the usages of kswapd_classzone_idx in kswapd() and
> kswapd_try_to_sleep() don't need changing too? kswapd_classzone_idx()
> looks safe to me, but I'm prone to missing stupid things that compilers
> are allowed to do.

I am not sure. Although it looks possible on paper, I am wondering why KCSAN has not triggered it yet, since that path seems rather common. I have been stress testing those areas with KCSAN for a few months now, but it might just be that I missed the report in the first place.

I’ll keep running some tests to confirm it, but until that happens or someone else confirms that it can happen, I’ll leave it out of this version. We can always submit an incremental patch later if necessary.

2020-02-26 13:51:19

by Qian Cai

Subject: Re: [PATCH v2] mm/vmscan: fix data races at kswapd_classzone_idx

On Tue, 2020-02-25 at 20:06 -0800, Matthew Wilcox wrote:
> On Tue, Feb 25, 2020 at 10:58:27PM -0500, Qian Cai wrote:
> > pgdat->kswapd_classzone_idx could be accessed concurrently in
> > wakeup_kswapd(). Plain writes and reads without any lock protection
> > result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
> > well as saving a branch (compilers might well optimize the original code
> > in an unintentional way anyway). While at it, also take care of
> > pgdat->kswapd_order and non-kswapd threads in allow_direct_reclaim().
>
> I don't understand why the usages of kswapd_classzone_idx in kswapd() and
> kswapd_try_to_sleep() don't need changing too? kswapd_classzone_idx()
> looks safe to me, but I'm prone to missing stupid things that compilers
> are allowed to do.

Right, I did capture the race this time. I'll post a v3.

[  924.803628][ T6299] BUG: KCSAN: data-race in kswapd / wakeup_kswapd
[  924.809949][ T6299]
[  924.812170][ T6299] write to 0xffff90973ffff2dc of 4 bytes by task 820 on cpu 6:
[  924.819630][ T6299]  kswapd+0x27c/0x8d0
[  924.823509][ T6299]  kthread+0x1e0/0x200
[  924.827471][ T6299]  ret_from_fork+0x27/0x50
[  924.831774][ T6299]
[  924.833987][ T6299] read to 0xffff90973ffff2dc of 4 bytes by task 6299 on cpu 0:
[  924.841442][ T6299]  wakeup_kswapd+0xf3/0x450
[  924.845838][ T6299]  wake_all_kswapds+0x59/0xc0
[  924.850409][ T6299]  __alloc_pages_slowpath+0xdcc/0x1290
[  924.855769][ T6299]  __alloc_pages_nodemask+0x3bb/0x450
[  924.861040][ T6299]  alloc_pages_vma+0x8a/0x2c0
[  924.865612][ T6299]  do_anonymous_page+0x170/0x700
[  924.870443][ T6299]  __handle_mm_fault+0xc9f/0xd00
[  924.875276][ T6299]  handle_mm_fault+0xfc/0x2f0
[  924.879849][ T6299]  do_page_fault+0x263/0x6f9
[  924.884334][ T6299]  page_fault+0x34/0x40
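The report shows kswapd() storing to the field while wakeup_kswapd() reads it. A hypothetical sketch of marking the kswapd side, assuming that kswapd snapshots the pending order and class zone index and then resets them to 0 and MAX_NR_ZONES; the names and structure (`struct kswapd_request`, `consume_request`, `struct pgdat_model`) are illustrative, not taken from the v3 patch, and the READ_ONCE()/WRITE_ONCE() macros are simplified volatile-based stand-ins:

```c
/* Simplified stand-ins (assumption): one volatile load or store each. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical, trimmed zone list; MAX_NR_ZONES means "no request". */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

struct pgdat_model {
	enum zone_type kswapd_classzone_idx;
	int kswapd_order;
};

/* What kswapd picked up from the wakers (hypothetical helper type). */
struct kswapd_request {
	int order;
	enum zone_type classzone_idx;
};

/* Hypothetical kswapd-side consumer: snapshot the pending request with
 * marked loads, then reset the fields with marked stores, so every access
 * on both the waker and the kswapd side is marked. */
static struct kswapd_request consume_request(struct pgdat_model *p,
					     enum zone_type prev)
{
	struct kswapd_request r;
	enum zone_type curr_idx = READ_ONCE(p->kswapd_classzone_idx);

	r.order = READ_ONCE(p->kswapd_order);
	r.classzone_idx = curr_idx == MAX_NR_ZONES ? prev : curr_idx;

	/* Reset to "no pending request". */
	WRITE_ONCE(p->kswapd_order, 0);
	WRITE_ONCE(p->kswapd_classzone_idx, MAX_NR_ZONES);
	return r;
}
```

Marking the accesses removes the data race that KCSAN reports, but it does not make the snapshot-and-reset atomic: a wakeup that lands between the loads and the stores can still be overwritten by the reset.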