Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp3293024ybb; Sun, 22 Mar 2020 20:44:01 -0700 (PDT) X-Google-Smtp-Source: ADFU+vvs/dthNPM6SKJN5HBD+pwyaNPtJ9QSdw7+TlmjL8AjyVQgCH3kHaChHR3MAoxvlv9gime5 X-Received: by 2002:a9d:bf5:: with SMTP id 108mr17438734oth.260.1584935041369; Sun, 22 Mar 2020 20:44:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584935041; cv=none; d=google.com; s=arc-20160816; b=O7dv74RVK9pakG4uo+NiL4V/kcQGzUwhzUurqijmRCBWyV81yDtJk506cAnbyttjJE 1KoyQs+/3tqg49Eh1qIs9of7OfxskYrELb4uDHwgF4iCmCKcvEtbuC9Wq1LS3iWERSY3 g/huT72Xb6h51OzGcya2QulStsdG/Z7UHrryNHckr4zKvWGTJFy+V6jZxM4rznnOZSHh 8AJeqfLqLKVqKXLM0pTJWagONCBnpLr2H4K2nqNko9AwQU8Zz5+vubQn5+UD5pTz4tcx GN2pc/lmdwOD3x2LSiQTvx9cz/GA2qGxTEF6nW7ja1o/DKCZ7Pf60ulQTcL2o9o/yMOS Wi/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=m1Ouh3KU4a6TeZHRHHa2s+H/qrWDeiqINComiPhsqSA=; b=cEqcRDY6w55yPykOKSBbwvEPmiz/NdqoQC3qEKXndsDwOyXAU8gNfQ1aCXAynrRzsk KFb4T6KiCH1V8dG56z7rGKS/ifSLhxLsd15tnJjR/wWeqdA5DuLw95YNCVQn1DsrIT2y ypDCfVNhLWXPJDRvEfr5hBi8tKtZKXnlNwy+uY83kITo6xoV2vGs76S54o4Cf+Js9R+v Fy0U85RnPWzG1/hxd8VnjZ27Tb69Ux4WAHmoM4btzi0xe0xk8ab0EkJJyL8r9vQyD92k 2HsZHh6b5oz1ppQ0eVR/pqtPZLl3Nk4U7qfWEI+GpA0BsU1iPeGaqwK72fPrf4aTz9DL QBkQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g19si7499232otn.219.2020.03.22.20.43.48; Sun, 22 Mar 2020 20:44:01 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727124AbgCWDnV (ORCPT + 99 others); Sun, 22 Mar 2020 23:43:21 -0400 Received: from szxga07-in.huawei.com ([45.249.212.35]:50272 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726979AbgCWDnV (ORCPT ); Sun, 22 Mar 2020 23:43:21 -0400 Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 72D3B7BDC1BD720A4460; Mon, 23 Mar 2020 11:43:16 +0800 (CST) Received: from [10.173.228.124] (10.173.228.124) by smtp.huawei.com (10.3.19.202) with Microsoft SMTP Server (TLS) id 14.3.487.0; Mon, 23 Mar 2020 11:43:10 +0800 Subject: Re: [PATCH v2] mm/hugetlb: fix a addressing exception caused by huge_pte_offset() To: Mike Kravetz CC: , , , , , , , , Matthew Wilcox , "Sean Christopherson" , References: <1582342427-230392-1-git-send-email-longpeng2@huawei.com> <51a25d55-de49-4c0a-c994-bf1a8cfc8638@oracle.com> <5700f44e-9df9-1b12-bc29-68e0463c2860@huawei.com> From: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" Message-ID: Date: Mon, 23 Mar 2020 11:43:09 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.173.228.124] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020/3/23 10:54, Mike Kravetz wrote: > On 3/22/20 7:03 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote: >> >> >> On 2020/3/22 7:38, Mike Kravetz wrote: >>> On 2/21/20 7:33 PM, Longpeng(Mike) wrote: >>>> From: Longpeng >>>> >>>> Our machine encountered a panic(addressing exception) after run >>>> for a long time and the calltrace is: [snip] >>>> >>>> We can avoid this race by read the pud only once. What's more, we also use >>>> READ_ONCE to access the entries for safe(e.g. avoid the compilier mischief) >>>> >>>> Cc: Matthew Wilcox >>>> Cc: Sean Christopherson >>>> Cc: Mike Kravetz >>>> Cc: stable@vger.kernel.org >>>> Signed-off-by: Longpeng >>> >>> Andrew dropped this patch from his tree which caused me to go back and >>> look at the status of this patch/issue. >>> >>> It is pretty obvious that code in the current huge_pte_offset routine >>> is racy. I checked out the assembly code produced by my compiler and >>> verified that the line, >>> >>> if (pud_huge(*pud) || !pud_present(*pud)) >>> >>> does actually dereference *pud twice. So, the value could change between >>> those two dereferences. Longpeng (Mike) could easlily recreate the issue >>> if he put a delay between the two dereferences. I believe the only >>> reservations/concerns about the patch below was the use of READ_ONCE(). >>> Is that correct? >>> >> Hi Mike, >> >> It seems I've missed your another mail in my client, I found it here >> (https://lkml.org/lkml/2020/2/27/1927) just now. >> >> I think we have reached an agreement that the pud/pmd need READ_ONCE in >> huge_pte_offset() and disagreement is whether the pgd/p4d also need READ_ONCE, >> right ? > > Correct. > > Sorry, I did not reply to the mail thread with more context. > >>> Are there any objections to the patch if the READ_ONCE() calls are removed? >>> >> Because the pgd/p4g are only accessed and dereferenced once here, so some guys >> want to remove it. >> >> But we must make sure they are *really* accessed once, in other words, this >> makes we need to care about both the implementation of pgd_present/p4d_present >> and the behavior of any compiler, for example: >> >> ''' >> static inline int func(int val) >> { >> return subfunc1(val) & subfunc2(val); >> } >> >> func(*p); // int *p >> ''' >> We must make sure there's no strange compiler to generate an assemble code that >> access and dereference 'p' more than once. >> >> I've not found any backwards with READ_ONCE here. However, if you also agree to >> remove READ_ONCE around pgd/p4d, I'll do. >> > > I would like to remove the READ_ONCE calls and move the patch forward. It > does address a real issue you are seeing. > > To be honest, I am more worried about the races in lookup_address_in_pgd() > than using or not using READ_ONCE for pgd/p4d in this patch. > I had the same worry, we've discussed in another thread (https://lkml.org/lkml/2020/2/20/1182) where I asked you `Is it possible the pud changes from pud_huge() to pud_none() while another CPU is walking the pagetable` and you thought it's possible. The reason why I didn't do something in lookup_address_in_pgd together is just because I haven't went into trouble caused by it yet. > I have not looked closely at the generated code for lookup_address_in_pgd. > It appears that it would dereference p4d, pud and pmd multiple times. Sean > seemed to think there was something about the calling context that would > make issues like those seen with huge_pte_offset less likely to happen. I > do not know if this is accurate or not. > > Let's remove the two READ_ONCE calls and move this patch forward. We can > look closer at lookup_address_in_pgd and generate another patch if that needs > to be fixed as well. > OK, I'll remove them in v3. I'll do some fault injection or add some delays in lookup_address_in_pgd to test if it can work well. > Thanks > --- Regards, Longpeng(Mike)