Subject: Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
From: Ryan Roberts
To: David Hildenbrand, Catalin Marinas, Will Deacon, Joey Gouly,
 Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu,
 Mike Rapoport, Shivansh Vij
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Date: Thu, 25 Apr 2024 11:37:42 +0100
References: <20240424111017.3160195-1-ryan.roberts@arm.com>
 <20240424111017.3160195-2-ryan.roberts@arm.com>

On 25/04/2024 11:29, Ryan Roberts wrote:
> On 25/04/2024 10:16, David Hildenbrand wrote:
>> On 24.04.24 13:10, Ryan Roberts wrote:
>>> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
>>> for SW use when the PTE is valid. This is a waste of those precious SW
>>> bits since PTE_PROT_NONE can only ever be set when valid is clear.
>>> Instead let's overlay it on what would be a HW bit if valid was set.
>>>
>>> We need to be careful about which HW bit to choose since some of them
>>> must be preserved; when pte_present() is true (as it is for a
>>> PTE_PROT_NONE pte), it is legitimate for the core to call various
>>> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
>>> accessors that are private to the arch which must continue to be
>>> honoured, e.g. pte_user(), pte_user_exec() etc.
>>>
>>> So we choose to overlay PTE_UXN; this effectively means that whenever a
>>> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
>>> false, which is obviously always correct.
>>>
>>> As a result of this change, we must shuffle the layout of the
>>> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
>>> overlapping with any other field. Consequently, there is no way
>>> to keep the `type` field contiguous without conflicting with
>>> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
>>> let's move PMD_PRESENT_INVALID to bit 60.
>>
>> A note that some archs split/re-combine type and/or offset, to make use
>> of every bit possible :) But that's mostly relevant for 32bit.
>>
>> (and as long as PFNs can still fit into the swp offset for migration
>> entries etc.)
>
> Yeah, I considered splitting the type or offset field to avoid moving
> PMD_PRESENT_INVALID, but thought it was better to avoid the extra mask
> and shift.

Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
ptes; it would be cleaner to have one bit that defines "present" when valid
is clear (similar to PTE_PROT_NONE today), then another bit which is only
defined when "present && !valid" and which tells us whether this is
PTE_PROT_NONE or PMD_PRESENT_INVALID (I don't think you can ever have both
at the same time?). But there is a problem with this:
__split_huge_pmd_locked() calls pmdp_invalidate() for a pmd before it
determines that it is pmd_present(). So PMD_PRESENT_INVALID can be set in a
swap pte today. That feels wrong to me, but I was trying to avoid the whole
thing unravelling so didn't pursue it.

>
>>
>>>
>>> In the end, this frees up bit 58 for future use as a proper SW bit (e.g.
>>> soft-dirty or uffd-wp).
>>
>> I was briefly confused about how you would use these bits as SW bits for
>> swap PTEs (which you can't as they overlay the type). See below regarding
>> bit 3.
>>
>> I would have said here "proper SW bit for present PTEs".
>
> Yes; I'll clarify in the next version.
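
As an aside for anyone following along: the reason the PTE_UXN overlay is
safe can be seen from the shape of the accessors. Below is a simplified,
self-contained sketch (plain casts and typedefs instead of the kernel's
_AT() and pte_t plumbing, paraphrased from arch/arm64/include/asm/pgtable.h
rather than quoted verbatim), showing that a PROT_NONE pte still counts as
present while pte_user_exec() naturally reports false for it:

typedef unsigned long pteval_t;
typedef struct { pteval_t pte; } pte_t;
#define pte_val(p)	((p).pte)

#define PTE_VALID	((pteval_t)1 << 0)
#define PTE_UXN		((pteval_t)1 << 54)	/* HW bit: user execute-never */
#define PTE_PROT_NONE	PTE_UXN			/* overlaid; only when !PTE_VALID */

/* Present means valid, or invalid-but-still-tracked (PROT_NONE). */
#define pte_present(p)	 (!!(pte_val(p) & (PTE_VALID | PTE_PROT_NONE)))

/* A PROT_NONE pte has PTE_UXN set, so this is always false for it. */
#define pte_user_exec(p) (!(pte_val(p) & PTE_UXN))

The same argument would apply to any HW bit whose "set" value is the
conservative answer for an inaccessible page, which is what makes UXN a
natural choice here.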
>
>>
>>>
>>> Signed-off-by: Ryan Roberts
>>> ---
>>>  arch/arm64/include/asm/pgtable-prot.h |  4 ++--
>>>  arch/arm64/include/asm/pgtable.h      | 16 +++++++++-------
>>>  2 files changed, 11 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>>> index dd9ee67d1d87..ef952d69fd04 100644
>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>> @@ -18,14 +18,14 @@
>>>  #define PTE_DIRTY		(_AT(pteval_t, 1) << 55)
>>>  #define PTE_SPECIAL		(_AT(pteval_t, 1) << 56)
>>>  #define PTE_DEVMAP		(_AT(pteval_t, 1) << 57)
>>> -#define PTE_PROT_NONE		(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>>> +#define PTE_PROT_NONE		(PTE_UXN)		 /* Reuse PTE_UXN; only when !PTE_VALID */
>>>
>>>  /*
>>>   * This bit indicates that the entry is present i.e. pmd_page()
>>>   * still points to a valid huge page in memory even if the pmd
>>>   * has been invalidated.
>>>   */
>>> -#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>>> +#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 60) /* only when !PMD_SECT_VALID */
>>>
>>>  #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>>>  #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index afdd56d26ad7..23aabff4fa6f 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -1248,20 +1248,22 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>>>   * Encode and decode a swap entry:
>>>   *	bits 0-1:	present (must be zero)
>>>   *	bits 2:		remember PG_anon_exclusive
>>> - *	bits 3-7:	swap type
>>> - *	bits 8-57:	swap offset
>>> - *	bit  58:	PTE_PROT_NONE (must be zero)
>>
>> Reading this patch alone: what happened to bit 3? Please mention that it
>> will be used as a swap pte metadata bit (uffd-wp).
>
> Will do. It's all a bit arbitrary though. I could have put offset in 3-52,
> and then 53 would have been spare for uffd-wp. I'm not sure there is any
> advantage to either option.
>
>>
>>> + *	bits 4-53:	swap offset
>>
>> So we'll still have 50 bits for the offset, good. We could even use 61-63
>> if ever required to store bigger PFNs.
>
> yep, or more sw bits.
>
>>
>> LGTM
>>
>> Reviewed-by: David Hildenbrand
>
> Thanks!
>
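
As a footnote for anyone reading along, here is a rough sketch of how the
swap encode/decode helpers would fall out of the new layout. The bit
positions are inferred from the discussion above (offset in bits 4-53, the
5-bit type in the remaining contiguous run at bits 55-59, with bits 0-3, 54
and 60 reserved); they are illustrative, not copied from the patch:

typedef struct { unsigned long val; } swp_entry_t;

/* Inferred from the layout above: offset in bits 4-53, type in 55-59. */
#define __SWP_TYPE_SHIFT	55
#define __SWP_TYPE_MASK		((1UL << 5) - 1)
#define __SWP_OFFSET_SHIFT	4
#define __SWP_OFFSET_MASK	((1UL << 50) - 1)

#define __swp_type(x)	(((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
#define __swp_offset(x)	(((x).val >> __SWP_OFFSET_SHIFT) & __SWP_OFFSET_MASK)
#define __swp_entry(type, offset) ((swp_entry_t){ \
	(((unsigned long)(type) & __SWP_TYPE_MASK) << __SWP_TYPE_SHIFT) | \
	(((unsigned long)(offset) & __SWP_OFFSET_MASK) << __SWP_OFFSET_SHIFT) })

Any entry built this way keeps bits 0-1 (present), bit 54 (PTE_PROT_NONE)
and bit 60 (PMD_PRESENT_INVALID) clear, which is the invariant the patch
relies on.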