Date: Tue, 24 Mar 2020 08:55:41 -0300
From: Jason Gunthorpe
To: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
Cc: Mike Kravetz, akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
    linux-kernel@vger.kernel.org, arei.gonglei@huawei.com,
    weidong.huang@huawei.com, weifuqiang@huawei.com, kvm@vger.kernel.org,
    linux-mm@kvack.org, Matthew Wilcox, Sean Christopherson,
    stable@vger.kernel.org
Subject: Re: [PATCH v2] mm/hugetlb: fix a addressing exception caused by huge_pte_offset()
Message-ID: <20200324115541.GH20941@ziepe.ca>
References: <1582342427-230392-1-git-send-email-longpeng2@huawei.com>
    <51a25d55-de49-4c0a-c994-bf1a8cfc8638@oracle.com>
    <20200323160955.GY20941@ziepe.ca>
    <69055395-e7e5-a8e2-7f3e-f61607149318@oracle.com>
    <20200323180706.GC20941@ziepe.ca>
    <88698dd7-eb87-4b0b-7ba7-44ef6eab6a6c@oracle.com>
    <20200323225225.GF20941@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 24, 2020 at 10:37:49AM +0800, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>
>
> On 2020/3/24 6:52, Jason Gunthorpe wrote:
> > On Mon, Mar 23, 2020 at 01:35:07PM -0700, Mike Kravetz wrote:
> >> On 3/23/20 11:07 AM, Jason Gunthorpe wrote:
> >>> On Mon, Mar 23, 2020 at 10:27:48AM -0700, Mike Kravetz wrote:
> >>>
> >>>>>      pgd = pgd_offset(mm, addr);
> >>>>> -    if (!pgd_present(*pgd))
> >>>>> +    if (!pgd_present(READ_ONCE(*pgd)))
> >>>>>          return NULL;
> >>>>>      p4d = p4d_offset(pgd, addr);
> >>>>> -    if (!p4d_present(*p4d))
> >>>>> +    if (!p4d_present(READ_ONCE(*p4d)))
> >>>>>          return NULL;
> >>>>>
> >>>>>      pud = pud_offset(p4d, addr);
> >>>>
> >>>> One would argue that pgd and p4d can not change from present to !present
> >>>> during the execution of this code.  To me, that seems like the issue which
> >>>> would cause an issue.  Of course, I could be missing something.
> >>>
> >>> This I am not sure of, I think it must be true under the read side of
> >>> the mmap_sem, but probably not guaranteed under RCU..
> >>>
> >>> In any case, it doesn't matter, the fact that *p4d can change at all
> >>> is problematic. Unwinding the above inlines we get:
> >>>
> >>>   p4d = p4d_offset(pgd, addr);
> >>>   if (!p4d_present(*p4d))
> >>>           return NULL;
> >>>   pud = (pud_t *)p4d_page_vaddr(*p4d) + pud_index(addr);
> >>>
> >>> According to our memory model the compiler/CPU is free to execute this
> >>> as:
> >>>
> >>>   p4d = p4d_offset(pgd, addr);
> >>>   p4d_for_vaddr = *p4d;
> >>>   if (!p4d_present(*p4d))
> >>>           return NULL;
> >>>   pud = (pud_t *)p4d_page_vaddr(p4d_for_vaddr) + pud_index(addr);
> >>>
> >>
> >> Wow!  How do you know this?
> >> You don't need to answer :)
> >
> > It says explicitly in Documentation/memory-barriers.txt - see
> > section COMPILER BARRIER:
> >
> >  (*) The compiler is within its rights to reorder loads and stores
> >      to the same variable, and in some cases, the CPU is within its
> >      rights to reorder loads to the same variable.  This means that
> >      the following code:
> >
> >         a[0] = x;
> >         a[1] = x;
> >
> >      Might result in an older value of x stored in a[1] than in a[0].
> >
> > It also says READ_ONCE puts things in program order, but we don't use
> > READ_ONCE inside pud_offset(), so it doesn't help us.
> >
> > Best answer is to code things so there is exactly one dereference of
> > the pointer protected by READ_ONCE. Very clear to read, very safe.
> >
> > Maybe Longpeng can rework the patch around these principles?
>
>
> Thanks Jason and Mike, I learned a lot from your analysis.
>
> So... the patch should look like this?

Yes, the pattern looks right.

The commit message should reference the COMPILER BARRIER section quoted
above and explain that dereferencing the entries more than once is a
data race, so we must consolidate all the reads of each entry into one
single place.

Also, since CH moved all the get_user_pages_fast code out of the arch
code, many (perhaps all) archs can drop their arch-specific versions of
this routine. This is really just a specialized version of gup_fast's
algorithm.

(Also, the arch versions seem to differ: why do some return actual ptes
rather than NULL?)

Jason
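What follows is a minimal userspace sketch of the "exactly one dereference of the pointer, protected by READ_ONCE" rule from the thread. It is not Longpeng's patch and not the kernel's huge_pte_offset(). Several pieces are hypothetical stand-ins chosen only to make the pattern compile and run outside the kernel:

- the READ_ONCE macro is a simplified version of the kernel's;
- the pgd_t/p4d_t/pud_t structs, the TOY_PRESENT bit encoding and the toy_pud_lookup() helper are invented for this example.

Each entry is copied into a local exactly once. Both the present check and the next-level table address are then derived from that single copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE(): a single
 * volatile load that the compiler may not repeat or re-issue later.
 * Uses the GCC/Clang __typeof__ extension; not the kernel definition. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical toy encoding: bit 0 = present, the remaining bits hold the
 * address of the next-level table. Not any real architecture's layout. */
#define TOY_PRESENT ((uintptr_t)1)
#define TOY_ENTRIES 4

typedef struct { uintptr_t val; } pgd_t;
typedef struct { uintptr_t val; } p4d_t;
typedef struct { uintptr_t val; } pud_t;

/* Toy walk: pgd entry -> p4d table -> pud table.
 * Each entry is loaded once into a local snapshot; the present test and
 * the next-level pointer both come from that snapshot, so a concurrent
 * update of the entry cannot make the two disagree. */
static pud_t *toy_pud_lookup(pgd_t *pgdp, size_t p4d_idx, size_t pud_idx)
{
    pgd_t pgd;
    p4d_t p4d;
    p4d_t *p4dp;

    assert(p4d_idx < TOY_ENTRIES && pud_idx < TOY_ENTRIES);

    pgd.val = READ_ONCE(pgdp->val);        /* the only load of *pgdp */
    if (!(pgd.val & TOY_PRESENT))
        return NULL;
    p4dp = (p4d_t *)(pgd.val & ~TOY_PRESENT) + p4d_idx;

    p4d.val = READ_ONCE(p4dp->val);        /* the only load of *p4dp */
    if (!(p4d.val & TOY_PRESENT))
        return NULL;
    return (pud_t *)(p4d.val & ~TOY_PRESENT) + pud_idx;
}
```

How to use it: point a pgd entry at a p4d table, and point one p4d slot at a pud table. toy_pud_lookup() then returns the requested pud slot, or NULL as soon as it meets a non-present entry.

The racy shape discussed above would instead test p4dp->val and then read p4dp->val again to compute the table address. Those two independent loads may observe different values. Copying into a local first closes that window.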