From: Qian Cai
To: akpm@linux-foundation.org
Cc: elver@google.com, willy@infradead.org, linux-kernel@vger.kernel.org, Qian Cai
Subject: [PATCH v3] mm/vmscan: fix data races at kswapd_classzone_idx
Date: Wed, 26 Feb 2020 15:37:52 -0500
Message-Id: <1582749472-5171-1-git-send-email-cai@lca.pw>

pgdat->kswapd_classzone_idx can be accessed concurrently in
wakeup_kswapd(). Plain writes and reads without any lock protection
result in data races. Fix them by annotating the accesses with
READ_ONCE()/WRITE_ONCE() and restructuring the update to save a branch
(compilers might well optimize the original code in an unintended way
anyway). While at it, also annotate pgdat->kswapd_order, the accesses in
kswapd() and kswapd_try_to_sleep(), and the non-kswapd writer in
allow_direct_reclaim().
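A minimal userspace sketch of the update pattern the patch adopts in wakeup_kswapd(): each shared field is loaded once into a local, compared, and stored only when the pending request has to grow. READ_ONCE()/WRITE_ONCE() below are simplified volatile-cast stand-ins for the kernel macros, and enum zone_type, struct fake_pgdat and record_wakeup() are hypothetical names for illustration only. The annotations do not make the load/compare/store sequence atomic; two racing wakers can still lose one update, as with the original code. What changes is that each access happens exactly once, and in the kernel KCSAN does not report races between such marked accesses.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE()
 * (assumption: the real kernel macros handle more cases). A volatile
 * access forces exactly one load or store, so the compiler cannot
 * re-read, fuse or drop it. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical zone list; the real enum depends on the kernel config. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

/* Hypothetical stand-in for the two pg_data_t fields touched here. */
struct fake_pgdat {
	enum zone_type kswapd_classzone_idx;
	int kswapd_order;
};

/* Same shape as the patched wakeup_kswapd() update: MAX_NR_ZONES means
 * "no pending request", otherwise only ever raise the index and order. */
static void record_wakeup(struct fake_pgdat *pgdat, int order,
			  enum zone_type classzone_idx)
{
	enum zone_type curr_idx = READ_ONCE(pgdat->kswapd_classzone_idx);

	if (curr_idx == MAX_NR_ZONES || curr_idx < classzone_idx)
		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);

	if (READ_ONCE(pgdat->kswapd_order) < order)
		WRITE_ONCE(pgdat->kswapd_order, order);
}
```

Starting from { MAX_NR_ZONES, 0 }, a call with order 2 and ZONE_DMA32 records both values; a later call with ZONE_DMA leaves the index alone because it never shrinks.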
The data races were reported by KCSAN:

BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd

write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13:
 wakeup_kswapd+0xf1/0x400
 wakeup_kswapd at mm/vmscan.c:3967
 wake_all_kswapds+0x59/0xc0
 wake_all_kswapds at mm/page_alloc.c:4241
 __alloc_pages_slowpath+0xdcc/0x1290
 __alloc_pages_slowpath at mm/page_alloc.c:4512
 __alloc_pages_nodemask+0x3bb/0x450
 alloc_pages_vma+0x8a/0x2c0
 do_anonymous_page+0x16e/0x6f0
 __handle_mm_fault+0xcd5/0xd40
 handle_mm_fault+0xfc/0x2f0
 do_page_fault+0x263/0x6f9
 page_fault+0x34/0x40

1 lock held by mtest01/7454:
 #0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9
 do_user_addr_fault at arch/x86/mm/fault.c:1405
 (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
irq event stamp: 6944085
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38:
 wakeup_kswapd+0xc8/0x400
 wake_all_kswapds+0x59/0xc0
 __alloc_pages_slowpath+0xdcc/0x1290
 __alloc_pages_nodemask+0x3bb/0x450
 alloc_pages_vma+0x8a/0x2c0
 do_anonymous_page+0x16e/0x6f0
 __handle_mm_fault+0xcd5/0xd40
 handle_mm_fault+0xfc/0x2f0
 do_page_fault+0x263/0x6f9
 page_fault+0x34/0x40

1 lock held by mtest01/7472:
 #0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9
irq event stamp: 6793561
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

BUG: KCSAN: data-race in kswapd / wakeup_kswapd

write to 0xffff90973ffff2dc of 4 bytes by task 820 on cpu 6:
 kswapd+0x27c/0x8d0
 kthread+0x1e0/0x200
 ret_from_fork+0x27/0x50

read to 0xffff90973ffff2dc of 4 bytes by task 6299 on cpu 0:
 wakeup_kswapd+0xf3/0x450
 wake_all_kswapds+0x59/0xc0
 __alloc_pages_slowpath+0xdcc/0x1290
 __alloc_pages_nodemask+0x3bb/0x450
 alloc_pages_vma+0x8a/0x2c0
 do_anonymous_page+0x170/0x700
 __handle_mm_fault+0xc9f/0xd00
 handle_mm_fault+0xfc/0x2f0
 do_page_fault+0x263/0x6f9
 page_fault+0x34/0x40

Signed-off-by: Qian Cai
---

v3: Take care of kswapd() and kswapd_try_to_sleep() too.
v2: Use a temp variable and take care of kswapd_order per Matthew.
    Take care of allow_direct_reclaim() as well.

 mm/vmscan.c | 45 ++++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f14c8c6069a6..4c8a1cdccbba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3136,8 +3136,9 @@ static bool allow_direct_reclaim(pg_data_t *pgdat)
 
 	/* kswapd must be awake if processes are being throttled */
 	if (!wmark_ok && waitqueue_active(&pgdat->kswapd_wait)) {
-		pgdat->kswapd_classzone_idx = min(pgdat->kswapd_classzone_idx,
-						(enum zone_type)ZONE_NORMAL);
+		if (READ_ONCE(pgdat->kswapd_classzone_idx) > ZONE_NORMAL)
+			WRITE_ONCE(pgdat->kswapd_classzone_idx, ZONE_NORMAL);
+
 		wake_up_interruptible(&pgdat->kswapd_wait);
 	}
 
@@ -3769,9 +3770,9 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 static enum zone_type kswapd_classzone_idx(pg_data_t *pgdat,
 					   enum zone_type prev_classzone_idx)
 {
-	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
-		return prev_classzone_idx;
-	return pgdat->kswapd_classzone_idx;
+	enum zone_type curr_idx = READ_ONCE(pgdat->kswapd_classzone_idx);
+
+	return curr_idx == MAX_NR_ZONES ? prev_classzone_idx : curr_idx;
 }
 
 static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
@@ -3815,8 +3816,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
 	 * the previous request that slept prematurely.
 	 */
 	if (remaining) {
-		pgdat->kswapd_classzone_idx = kswapd_classzone_idx(pgdat, classzone_idx);
-		pgdat->kswapd_order = max(pgdat->kswapd_order, reclaim_order);
+		WRITE_ONCE(pgdat->kswapd_classzone_idx,
+			   kswapd_classzone_idx(pgdat, classzone_idx));
+
+		if (READ_ONCE(pgdat->kswapd_order) < reclaim_order)
+			WRITE_ONCE(pgdat->kswapd_order, reclaim_order);
 	}
 
 	finish_wait(&pgdat->kswapd_wait, &wait);
@@ -3893,12 +3897,12 @@ static int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
-	pgdat->kswapd_order = 0;
-	pgdat->kswapd_classzone_idx = MAX_NR_ZONES;
+	WRITE_ONCE(pgdat->kswapd_order, 0);
+	WRITE_ONCE(pgdat->kswapd_classzone_idx, MAX_NR_ZONES);
 	for ( ; ; ) {
 		bool ret;
 
-		alloc_order = reclaim_order = pgdat->kswapd_order;
+		alloc_order = reclaim_order = READ_ONCE(pgdat->kswapd_order);
 		classzone_idx = kswapd_classzone_idx(pgdat, classzone_idx);
 
 kswapd_try_sleep:
@@ -3906,10 +3910,10 @@ static int kswapd(void *p)
 					classzone_idx);
 
 		/* Read the new order and classzone_idx */
-		alloc_order = reclaim_order = pgdat->kswapd_order;
+		alloc_order = reclaim_order = READ_ONCE(pgdat->kswapd_order);
 		classzone_idx = kswapd_classzone_idx(pgdat, classzone_idx);
-		pgdat->kswapd_order = 0;
-		pgdat->kswapd_classzone_idx = MAX_NR_ZONES;
+		WRITE_ONCE(pgdat->kswapd_order, 0);
+		WRITE_ONCE(pgdat->kswapd_classzone_idx, MAX_NR_ZONES);
 
 		ret = try_to_freeze();
 		if (kthread_should_stop())
@@ -3953,20 +3957,23 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
 		   enum zone_type classzone_idx)
 {
 	pg_data_t *pgdat;
+	enum zone_type curr_idx;
 
 	if (!managed_zone(zone))
 		return;
 
 	if (!cpuset_zone_allowed(zone, gfp_flags))
 		return;
+
 	pgdat = zone->zone_pgdat;
+	curr_idx = READ_ONCE(pgdat->kswapd_classzone_idx);
+
+	if (curr_idx == MAX_NR_ZONES || curr_idx < classzone_idx)
+		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
+
+	if (READ_ONCE(pgdat->kswapd_order) < order)
+		WRITE_ONCE(pgdat->kswapd_order, order);
 
-	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
-		pgdat->kswapd_classzone_idx = classzone_idx;
-	else
-		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
-						  classzone_idx);
-	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
 
-- 
1.8.3.1
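The reshaped kswapd_classzone_idx() helper is worth spelling out: the original tested pgdat->kswapd_classzone_idx against MAX_NR_ZONES and then read the field a second time for the return value, so the test and the result could come from two different loads while other CPUs were writing it. The patched helper takes one snapshot into curr_idx and uses it for both. A userspace sketch of that single-load shape follows; READ_ONCE() is again a simplified volatile-cast stand-in, and enum zone_type, struct fake_pgdat and pending_classzone_idx() are hypothetical names for illustration.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's READ_ONCE() (assumption: the
 * real macro handles more cases). Forces a single load of x. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical zone list; the real enum depends on the kernel config. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

/* Hypothetical stand-in for the pg_data_t field being read. */
struct fake_pgdat {
	enum zone_type kswapd_classzone_idx;
};

/* Same shape as the patched kswapd_classzone_idx(): one load into
 * curr_idx, so the MAX_NR_ZONES test and the returned value agree. */
static enum zone_type pending_classzone_idx(struct fake_pgdat *pgdat,
					    enum zone_type prev_classzone_idx)
{
	enum zone_type curr_idx = READ_ONCE(pgdat->kswapd_classzone_idx);

	return curr_idx == MAX_NR_ZONES ? prev_classzone_idx : curr_idx;
}
```

With the field at MAX_NR_ZONES ("no pending request") the caller's previous index is returned; otherwise the snapshot itself is.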