Message-ID: <70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com>
Date: Tue, 5 Dec 2023 14:43:01 +0000
Subject: Re: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean
 contiguous PTEs
To: James Houghton, Steve Capper, Will Deacon, Andrew Morton
Cc: Mike Kravetz, Muchun Song, Anshuman Khandual, Catalin Marinas,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20231204172646.2541916-1-jthoughton@google.com>
From: Ryan Roberts
In-Reply-To: <20231204172646.2541916-1-jthoughton@google.com>

On 04/12/2023 17:26, James Houghton wrote:
> It is currently possible for a userspace application to enter a page
> fault loop when using HugeTLB pages implemented with contiguous PTEs
> when HAFDBS is not available. This happens because:
>   1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
>      (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).

Hi James,

Do you know how this happens?

AFAIK, this is the set of valid bit combinations, and
PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
to understand how this is happening and prevent it?

/*
 * PTE bits configuration in the presence of hardware Dirty Bit Management
 * (PTE_WRITE == PTE_DBM):
 *
 * Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
 *   0      0      |   1           0          0
 *   0      1      |   1           1          0
 *   1      0      |   1           0          1
 *   1      1      |   0           1          x
 *
 * When hardware DBM is not present, the software PTE_DIRTY bit is updated via
 * the page fault mechanism. Checking the dirty status of a pte becomes:
 *
 *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
 */

Thanks,
Ryan

>   2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
>      the memory access on a system without HAFDBS, we will get a page
>      fault.
>   3. HugeTLB will check if it needs to update the dirty bits on the PTE.
>      For contiguous PTEs, it will check to see if the pgprot bits need
>      updating. In this case, HugeTLB wants to write a sequence of
>      sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
>      to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
>      so it thinks no update is necessary.
>
> Please see this[1] reproducer.
>
> I think (though I may be wrong) that both step (1) and step (3) are
> buggy.
>
> The first patch in this series fixes step (3); instead of checking if
> pte_dirty is matching in __cont_access_flags_changed, check pte_hw_dirty
> and pte_sw_dirty separately.
>
> The second patch in this series makes step (1) less likely to occur.
> Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> PTE with the following steps (showing the relevant VMA flags and pgprot
> bits):
>   i.   Create a valid, writable contiguous PTE.
>          VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>          VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>          PTE pgprot bits: PTE_DIRTY | PTE_WRITE
>   ii.  mprotect the VMA to PROT_NONE.
>          VMA vmflags:     VM_SHARED
>          VMA pgprot bits: PTE_RDONLY
>          PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
>   iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
>          VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>          VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>          PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
>
> Applying either one of the two patches in this patchset will fix the
> particular issue with HugeTLB pages implemented with contiguous PTEs.
> It's possible that only one of these patches should be taken, or that
> the right fix is something else entirely.
>
> [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
>
> James Houghton (2):
>   arm64: hugetlb: Distinguish between hw and sw dirtiness in
>     __cont_access_flags_changed
>   arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
>
>  arch/arm64/include/asm/pgtable.h | 6 ++++++
>  arch/arm64/mm/hugetlbpage.c      | 5 ++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
>
> base-commit: 645a9a454fdb7e698a63a275edca6a17ef97afc4
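
For illustration only, here is a minimal user-space sketch of the bit logic
discussed above. It is not taken from either patch. The bit positions are
made up (they are not the real arm64 encodings), and toy_pte_modify() and
pte_sw_dirty_hw_clean() are hypothetical helpers that model only what the
quoted steps i-iii describe: a protection change replaces PTE_RDONLY and
PTE_WRITE with the new VMA protection bits and leaves PTE_DIRTY alone.
Folding hw-dirtiness into PTE_DIRTY before the change is an assumption of
this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Illustrative bit positions only; NOT the real arm64 encodings. */
#define PTE_RDONLY ((pteval_t)1 << 0)
#define PTE_WRITE  ((pteval_t)1 << 1)	/* same bit as PTE_DBM in the table */
#define PTE_DIRTY  ((pteval_t)1 << 2)	/* software dirty bit */

/* Hardware-dirty: writable and not read-only (last row of the table). */
static bool pte_hw_dirty(pteval_t pte)
{
	return (pte & PTE_WRITE) && !(pte & PTE_RDONLY);
}

/* Software-dirty: PTE_DIRTY is set. */
static bool pte_sw_dirty(pteval_t pte)
{
	return (pte & PTE_DIRTY) != 0;
}

/* Combined check from the comment: PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY). */
static bool pte_dirty(pteval_t pte)
{
	return pte_sw_dirty(pte) || pte_hw_dirty(pte);
}

/* The combination under discussion: PTE_DIRTY | PTE_WRITE | PTE_RDONLY. */
static bool pte_sw_dirty_hw_clean(pteval_t pte)
{
	return pte_sw_dirty(pte) && (pte & PTE_WRITE) && (pte & PTE_RDONLY);
}

/*
 * Hypothetical model of a protection change. Assumption: hw-dirtiness is
 * first folded into PTE_DIRTY, then the protection bits are replaced by
 * newprot. Nothing restores hw-dirtiness afterwards.
 */
static pteval_t toy_pte_modify(pteval_t pte, pteval_t newprot)
{
	const pteval_t mask = PTE_RDONLY | PTE_WRITE;

	/* In this model newprot may carry protection bits only. */
	assert((newprot & ~mask) == 0);

	if (pte_hw_dirty(pte))
		pte |= PTE_DIRTY;

	return (pte & ~mask) | newprot;
}
```

Starting from PTE_DIRTY | PTE_WRITE (step i), calling
toy_pte_modify(pte, PTE_RDONLY) and then
toy_pte_modify(pte, PTE_RDONLY | PTE_WRITE) reproduces the PTE bits listed
for steps ii and iii. The final value satisfies pte_dirty() but not
pte_hw_dirty(), which is why a check based on pte_dirty() alone cannot tell
it apart from a hw-dirty PTE.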