From: James Houghton
Date: Tue, 5 Dec 2023 09:54:44 -0800
Subject: Re: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs
To: Ryan Roberts
Cc: Steve Capper, Will Deacon, Andrew Morton, Mike Kravetz, Muchun Song, Anshuman Khandual, Catalin Marinas, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20231204172646.2541916-1-jthoughton@google.com> <70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com>
In-Reply-To: <70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com>

On Tue, Dec 5, 2023 at 6:43 AM Ryan Roberts wrote:
>
> On 04/12/2023 17:26, James Houghton wrote:
> > It is currently possible for a userspace application to enter a page
> > fault loop when using HugeTLB pages implemented with contiguous PTEs
> > when HAFDBS is not available. This happens because:
> > 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
> >    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
>
> Hi James,
>
> Do you know how this happens?

Hi Ryan,

Thanks for taking a look! I do understand why this is happening. There
is an explanation in the reproducer[1] and also in this cover letter
(though I realize I could have been a little clearer). See below.

> AFAIK, this is the set of valid bit combinations, and
> PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
> to understand how this is happening and prevent it?
>
> /*
>  * PTE bits configuration in the presence of hardware Dirty Bit Management
>  * (PTE_WRITE == PTE_DBM):
>  *
>  * Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
>  *   0      0      |   1           0          0
>  *   0      1      |   1           1          0
>  *   1      0      |   1           0          1
>  *   1      1      |   0           1          x
>  *
>  * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
>  * the page fault mechanism. Checking the dirty status of a pte becomes:
>  *
>  *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
>  */

Thanks for pointing this out. So (1) is definitely a bug.
The second patch in this series makes it impossible to create such a PTE
via pte_modify() (by forcing sw-dirty PTEs to be hw-dirty as well).

> > The second patch in this series makes step (1) less likely to occur.

It makes it impossible to create this invalid set of bits via
pte_modify(). Assuming all PTE pgprot updates are done via the proper
interfaces, patch #2 might actually make this invalid bit combination
impossible to produce (that's certainly the goal). So perhaps language
stronger than "less likely" is appropriate.

Here's the sequence of events to trigger this bug, via mprotect():

> > Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> > PTE with the following steps (showing the relevant VMA flags and pgprot
> > bits):
> > i.   Create a valid, writable contiguous PTE.
> >        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
> >        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >        PTE pgprot bits: PTE_DIRTY | PTE_WRITE
> > ii.  mprotect the VMA to PROT_NONE.
> >        VMA vmflags:     VM_SHARED
> >        VMA pgprot bits: PTE_RDONLY
> >        PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
> > iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
> >        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
> >        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >        PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

With patch #2, the PTE pgprot bits in step iii become PTE_DIRTY |
PTE_WRITE (hw-dirtiness is set, as the PTE is sw-dirty).

Thanks!

> > [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
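
For readers who want to trace steps i to iii by hand, the bit transitions
can be modeled in a few lines of userspace C. This is only a toy sketch,
not the kernel's code: the MODEL_* bit positions are made up, and
model_pte_modify() is a hypothetical stand-in for pte_modify() that
handles only the three bits discussed in this thread. The force_hw_dirty
flag models the effect patch #2 is described as having (a sw-dirty,
writable PTE is made hw-dirty again), which is an assumption drawn from
the description above rather than from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bit positions for illustration only; NOT the kernel's definitions. */
#define MODEL_PTE_RDONLY (UINT64_C(1) << 0)
#define MODEL_PTE_WRITE  (UINT64_C(1) << 1)
#define MODEL_PTE_DIRTY  (UINT64_C(1) << 2) /* software dirty bit */

/* Bits that a protection change takes from the new pgprot in this model. */
#define MODEL_PROT_MASK (MODEL_PTE_RDONLY | MODEL_PTE_WRITE)

/* hw-dirty in the sense of the quoted comment: PTE_WRITE && !PTE_RDONLY. */
static inline bool model_hw_dirty(uint64_t pte)
{
	return (pte & MODEL_PTE_WRITE) != 0 && (pte & MODEL_PTE_RDONLY) == 0;
}

static inline bool model_sw_dirty(uint64_t pte)
{
	return (pte & MODEL_PTE_DIRTY) != 0;
}

/* The combination discussed in the thread: PTE_DIRTY | PTE_RDONLY | PTE_WRITE. */
static inline bool model_sw_dirty_hw_clean(uint64_t pte)
{
	return model_sw_dirty(pte) && (pte & MODEL_PTE_WRITE) != 0 &&
	       (pte & MODEL_PTE_RDONLY) != 0;
}

/*
 * Hypothetical stand-in for pte_modify(). With force_hw_dirty set, it
 * models the behaviour described for patch #2 (an assumption).
 */
static inline uint64_t model_pte_modify(uint64_t pte, uint64_t newprot,
					bool force_hw_dirty)
{
	/* Carry hardware dirtiness into the software bit before it is lost. */
	if (model_hw_dirty(pte))
		pte |= MODEL_PTE_DIRTY;

	/* Replace the protection bits with those of the new pgprot. */
	pte = (pte & ~MODEL_PROT_MASK) | (newprot & MODEL_PROT_MASK);

	/* Modeled fix: a sw-dirty, writable PTE becomes hw-dirty again. */
	if (force_hw_dirty && model_sw_dirty(pte) && (pte & MODEL_PTE_WRITE) != 0)
		pte &= ~MODEL_PTE_RDONLY;

	return pte;
}

/* Replays steps i to iii and returns the PTE bits after step iii. */
static inline uint64_t model_mprotect_cycle(bool force_hw_dirty)
{
	/* Step i: valid, writable, dirty PTE. */
	uint64_t pte = MODEL_PTE_DIRTY | MODEL_PTE_WRITE;

	/* Step ii: PROT_NONE, where the VMA pgprot carries only RDONLY. */
	pte = model_pte_modify(pte, MODEL_PTE_RDONLY, force_hw_dirty);
	assert(pte == (MODEL_PTE_DIRTY | MODEL_PTE_RDONLY));

	/* Step iii: back to PROT_READ | PROT_WRITE. */
	return model_pte_modify(pte, MODEL_PTE_RDONLY | MODEL_PTE_WRITE,
				force_hw_dirty);
}
```

Calling model_mprotect_cycle(false) ends in the sw-dirty, hw-clean
combination from step iii, while model_mprotect_cycle(true) ends in
MODEL_PTE_DIRTY | MODEL_PTE_WRITE, matching the bits listed above for
the patched case.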