Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp1306374ybv; Thu, 6 Feb 2020 01:36:19 -0800 (PST) X-Google-Smtp-Source: APXvYqyWALemWJyNo1tP+qe2aUdoYhD/Kfi22g8e4cQb/WMuyJQj8td64ZulVTYFqP7OXCg1tiW5 X-Received: by 2002:aca:ab52:: with SMTP id u79mr5973202oie.145.1580981779853; Thu, 06 Feb 2020 01:36:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1580981779; cv=none; d=google.com; s=arc-20160816; b=uB0LVRQAvdzpp/dyW4ssabkobmE8TOKpLMjMdYi0vWFTkPPQxuxTN+OfdqAeefHDpu 2SMiN4nIlp8OB1zT9lNJO40ZcVntGMy3nYYOJCwrUk44BYdFya7nU2PXRSF/RPrrB6Ly ec0DmJWV0aV51CR9bkx8paVYpVioJr8cc3NfTzK9Bu8ySlftR9SGLe3JessULITcwQcr cfj2aIUhfU+BwZPiFzZ22UEqjRZIqFf+qUVrecO6+fOpUmic6VgGMqpsmduYEXwzlGSV fcaYzWNmO7gYQrRrCGu8t2Nmf94Uu9ROIixAydDgYFbObFau10FeVEYadRz5DS3dZTJj ckug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=T7LLoJdBq3PsLsJT9YfIwC6CJwfW2ySnDmWnUHjFzX8=; b=SG/dskXsc5BUw9Jtwf6WMzbeWJTVfCuZB3Ia9ckfj6BwnRWrrNynXeEqRIeLdgLfEu I94sncsq0mfLhgS/+VqFXD+dM7rue//sSAxHDcfbh6c5i57qA0CsK8A5TkpwAcapjoAz 7IMebkdN5VO0AM54NJxtMB9POQKKRYRtt6+39ok550sztN1gQX0Yqez4QlMpY1lnODdw 63zt5DYXQ2KnubBZb/usAkuq7XHJDaKQIAFIF/+/uolofE7uQGaUE3dYg8orSgn+1A4I BQs/CqgA4AKiyOd6dvkIuH/nQ7wr9w1vzQK2Sd9SB1eZnMeAoGrR4n9NXhXFS8oNbuVK khPg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m19si1642022oig.91.2020.02.06.01.36.07; Thu, 06 Feb 2020 01:36:19 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728248AbgBFJEk (ORCPT + 99 others); Thu, 6 Feb 2020 04:04:40 -0500 Received: from mx2.suse.de ([195.135.220.15]:44876 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726452AbgBFJEj (ORCPT ); Thu, 6 Feb 2020 04:04:39 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 0B3A3ADC2; Thu, 6 Feb 2020 09:04:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 6B79A1E0E31; Thu, 6 Feb 2020 10:04:36 +0100 (CET) Date: Thu, 6 Feb 2020 10:04:36 +0100 From: Jan Kara To: John Hubbard Cc: Qian Cai , akpm@linux-foundation.org, ira.weiny@intel.com, dan.j.williams@intel.com, jack@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH -next] mm: mark a intentional data race in page_zonenum() Message-ID: <20200206090436.GF14001@quack2.suse.cz> References: <20200206035235.2537-1-cai@lca.pw> <480a7dde-f678-c07b-2231-4da8e0a38753@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <480a7dde-f678-c07b-2231-4da8e0a38753@nvidia.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 05-02-20 20:50:34, John Hubbard wrote: > On 2/5/20 7:52 PM, Qian Cai wrote: > > The commit 07d802699528 ("mm: devmap: refactor 1-based refcounting for > > ZONE_DEVICE pages") introduced a data race as page->flags could be > > Hi, > > I really don't think so. This "race" was there long before that commit. > Anyway, more below: > > > accessed concurrently as noticied by KCSAN, > > > > BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page > > > > write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3: > > page_cpupid_xchg_last+0x51/0x80 > > page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11) > > wp_page_reuse+0x3e/0xc0 > > wp_page_reuse at mm/memory.c:2453 > > do_wp_page+0x472/0x7b0 > > do_wp_page at mm/memory.c:2798 > > __handle_mm_fault+0xcb0/0xd00 > > handle_pte_fault at mm/memory.c:4049 > > (inlined by) __handle_mm_fault at mm/memory.c:4163 > > handle_mm_fault+0xfc/0x2f0 > > handle_mm_fault at mm/memory.c:4200 > > do_page_fault+0x263/0x6f9 > > do_user_addr_fault at arch/x86/mm/fault.c:1465 > > (inlined by) do_page_fault at arch/x86/mm/fault.c:1539 > > page_fault+0x34/0x40 > > > > read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69: > > put_page+0x15a/0x1f0 > > page_zonenum at include/linux/mm.h:923 > > (inlined by) is_zone_device_page at include/linux/mm.h:929 > > (inlined by) page_is_devmap_managed at include/linux/mm.h:948 > > (inlined by) put_page at include/linux/mm.h:1023 > > wp_page_copy+0x571/0x930 > > wp_page_copy at mm/memory.c:2615 > > do_wp_page+0x107/0x7b0 > > __handle_mm_fault+0xcb0/0xd00 > > handle_mm_fault+0xfc/0x2f0 > > do_page_fault+0x263/0x6f9 > > page_fault+0x34/0x40 > > > > Reported by Kernel Concurrency Sanitizer on: > > CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6 > > Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 > > > > Both the read and write are done only with the non-exclusive mmap_sem > > held. Since the read only check for a specific bit in the flag, even if > > Perhaps a clearer explanation is that the read of the page flags is > always looking at a bit range (zone number: up to 3 bits) that is not > being written to by the writer. Well, that's not quite true because we do store full long there. You're right that we do that with cmpxchg() and modify only some bits in that word but that may be too difficult for the sanitizer to find out. > > load tearing happens, it will be harmless, so just mark it as an > > intentional data races using the data_race() macro. > > > > Signed-off-by: Qian Cai > > --- > > include/linux/mm.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 52269e56c514..cafccad584c2 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf); > > static inline enum zone_type page_zonenum(const struct page *page) > > { > > - return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK; > > + return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK); > > > I don't know about this. Lots of the kernel is written to do this sort > of thing, and adding a load of "data_race()" everywhere is...well, I'm not > sure if it's really the best way. I wonder: could we maybe teach this > kcsan thing to understand a few of the key idioms, particularly about page > flags, instead of annotating all over the place? So in this particular case, store tearing is non-issue because we use atomic operation to store the value in page_cpupid_xchg_last(). I think it would make some sense to use READ_ONCE(page->flags) here to prevent compiler from loading page->flags several times - I have hard time finding a reason why a compiler would want to do that but conceptually that protection makes sense, it is for free performance wise, and will still allow KCSAN to find a race in case we ever grow a place that modifies page's zone non-atomically (which might be a real problem). And it should also silence the KCSAN warning AFAIU. Honza -- Jan Kara SUSE Labs, CR