Received: by 2002:a05:7412:1e0b:b0:fc:a2b0:25d7 with SMTP id kr11csp1061812rdb; Fri, 16 Feb 2024 04:25:54 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCX2kMjaheu5zL0CVF/3wZOkmnZzbtuO9cw3+BrItfjE6MObv8UEtKZhQBOw6LaeryfsT59Obap5kS3H449qIUF9eZQUF5kFBxxuYI91pA== X-Google-Smtp-Source: AGHT+IHhMHj/c3GlK8Ko2CuL7PwAQY1yRmRqHeUKwCizGC8IHBOoIgsG2ohZ2fNYDLFzLZq2++KI X-Received: by 2002:a0c:e24b:0:b0:68c:5027:4cf9 with SMTP id x11-20020a0ce24b000000b0068c50274cf9mr4164810qvl.62.1708086354287; Fri, 16 Feb 2024 04:25:54 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1708086354; cv=pass; d=google.com; s=arc-20160816; b=jS9ntwxDFPe82dSBAR2Z1lgKthAn1bsi9hXfBzBxeNHEMYwgbz3JU7YzXSM03UAIWF BRkYW3J11lwWQsGxxf7aDZSle5ht2oXRGqvVPgTA2dsHqVaG1dJ0wlOUVVqoTWpljbnz eT6HDBuBipBo5a+7zSSZ7m5EuuAhLkXIDJMXk7c3RUJ+n1CEyKF+6mB67dUdTUZRBUqr utjKS3JFqmYOy+35DPYRvxaS1bGv8bCcfCMJWiRpNsG1Jbgr13wZvPej/aaQz+zIUhGl ZoYCHmVus/tg76+mA6A5toGcl9tvu3iMZuaSFCBXpHxv+brdfWOkyR/QdEOrjPVo56yH wb4A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date; bh=N1j6xa/DEYDK9Xa48dRWHFL+VGsg2PnWGyeXNci4jq8=; fh=r8G2JENgcRo/RTe4xvZO7IzPLZ7yvHykzgF2Da/HHrw=; b=vQnvX2jzFWYeSzqbu9xVkrSu1flrqAnSVSO/k0Kc0gfMUwfUe2bnJiuEBPQDSg72QY bSwh10LMCoUO+RpJNPU6B+UwFWsG58V18/4Ip/DMLCkz0Ym3PEayCO+r1k+duuJ25Ri0 q8w2wnXP4s+ZF0zgm7cdCsdtyCwyR/rhzmsgLAZg/PvUdwOI2Qt3kw39bIRdZoxRVC2V YQR5H5l4V/lVgZfwZDIHqwdWiwMy5GbQPOVNuK9Qncbf+VJIttNVc4/p301jkCD9ZPh0 a3k02Uv+kghpg1Z98rk/R1GGyMxkYZsVXHoFmxuiLNl15zb8NAASQ2EQ4Ms92NE6k9Mq WbZw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-68565-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-68565-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id js14-20020a0562142aae00b0068f2ac4c8b6si2702772qvb.517.2024.02.16.04.25.54 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 04:25:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-68565-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-68565-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-68565-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 078E61C226CA for ; Fri, 16 Feb 2024 12:25:54 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id C6F4E78B5B; Fri, 16 Feb 2024 12:25:47 +0000 (UTC) Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 52B4C7867D for ; Fri, 16 Feb 2024 12:25:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708086347; cv=none; b=Uv890ExD0oxQsVt0qnUWcNIYfaqcLD5lnd5ZhkHJWcmrdEIru4XgXTwOcu5sfTWJJr0ajUB/mdRV8G5kigcH/EzWtTjPy4XlRTxjyhI/TTn2Hve1Wyr80vF0uvX8SKcukoItWmrTxt/3fsdg8o31ewTfQNKEUpxNITr6b+vteic= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708086347; c=relaxed/simple; bh=a80o+y9k9+JZ2PbTxRTtyyupdJ2wSlHG1q/Azlo/+3c=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=UQSWBCGryhwSJ4ATV2i57ieySAQdkA9fA2bYVtzJYwAhjpk7uOUcidxYp9hxmtIViKGSAiuRBmBkyYimtSZ7MmAhUZufS7sFgSc+26LM3ekrzHAl0PAc2NwFXb72k+u4o1/SFD2Htr7YUn2Sh5hfaO7oyC+pmUEM4n0UV1eBvjQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id F0177C433C7; Fri, 16 Feb 2024 12:25:41 +0000 (UTC) Date: Fri, 16 Feb 2024 12:25:39 +0000 From: Catalin Marinas To: Ryan Roberts Cc: Will Deacon , Ard Biesheuvel , Marc Zyngier , James Morse , Andrey Ryabinin , Andrew Morton , Matthew Wilcox , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-arm-kernel@lists.infradead.org, x86@kernel.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings Message-ID: References: <20240215103205.2607016-1-ryan.roberts@arm.com> <20240215103205.2607016-13-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240215103205.2607016-13-ryan.roberts@arm.com> On Thu, Feb 15, 2024 at 10:31:59AM +0000, Ryan Roberts wrote: > arch/arm64/mm/contpte.c | 285 +++++++++++++++++++++++++++++++ Nitpick: I think most symbols in contpte.c can be EXPORT_SYMBOL_GPL(). We don't expect them to be used by random out of tree modules. In fact, do we expect them to end up in modules at all? Most seem to be called from the core mm code. > +#define ptep_get_lockless ptep_get_lockless > +static inline pte_t ptep_get_lockless(pte_t *ptep) > +{ > + pte_t pte = __ptep_get(ptep); > + > + if (likely(!pte_valid_cont(pte))) > + return pte; > + > + return contpte_ptep_get_lockless(ptep); > +} [...] > +pte_t contpte_ptep_get_lockless(pte_t *orig_ptep) > +{ > + /* > + * Gather access/dirty bits, which may be populated in any of the ptes > + * of the contig range. We may not be holding the PTL, so any contiguous > + * range may be unfolded/modified/refolded under our feet. Therefore we > + * ensure we read a _consistent_ contpte range by checking that all ptes > + * in the range are valid and have CONT_PTE set, that all pfns are > + * contiguous and that all pgprots are the same (ignoring access/dirty). > + * If we find a pte that is not consistent, then we must be racing with > + * an update so start again. If the target pte does not have CONT_PTE > + * set then that is considered consistent on its own because it is not > + * part of a contpte range. > +*/ I can't get my head around this lockless API. Maybe it works fine (and may have been discussed already) but we should document what the races are, why it works, what the memory ordering requirements are. For example, the generic (well, x86 PAE) ptep_get_lockless() only needs to ensure that the low/high 32 bits of a pte are consistent and there are some ordering rules on how these are updated. Does the arm64 implementation only need to be correct w.r.t. the access/dirty bits? Since we can read orig_ptep atomically, I assume the only other updates from unfolding would set the dirty/access bits. > + > + pgprot_t orig_prot; > + unsigned long pfn; > + pte_t orig_pte; > + pgprot_t prot; > + pte_t *ptep; > + pte_t pte; > + int i; > + > +retry: > + orig_pte = __ptep_get(orig_ptep); > + > + if (!pte_valid_cont(orig_pte)) > + return orig_pte; > + > + orig_prot = pte_pgprot(pte_mkold(pte_mkclean(orig_pte))); > + ptep = contpte_align_down(orig_ptep); > + pfn = pte_pfn(orig_pte) - (orig_ptep - ptep); > + > + for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) { > + pte = __ptep_get(ptep); > + prot = pte_pgprot(pte_mkold(pte_mkclean(pte))); We don't have any ordering guarantees in how the ptes in this range are read or written in the contpte_set_ptes() and the fold/unfold functions. We might not need them given all the other checks below but it's worth adding a comment. > + > + if (!pte_valid_cont(pte) || > + pte_pfn(pte) != pfn || > + pgprot_val(prot) != pgprot_val(orig_prot)) > + goto retry; I think this also needs some comment. I get the !pte_valid_cont() check to attempt retrying when racing with unfolding. Are the other checks needed to detect re-folding with different protection or pfn? > + > + if (pte_dirty(pte)) > + orig_pte = pte_mkdirty(orig_pte); > + > + if (pte_young(pte)) > + orig_pte = pte_mkyoung(orig_pte); > + } After writing the comments above, I think I figured out that the whole point of this loop is to check that the ptes in the contig range are still consistent and the only variation allowed is the dirty/young state to be passed to the orig_pte returned. The original pte may have been updated by the time this loop finishes but I don't think it matters, it wouldn't be any different than reading a single pte and returning it while it is being updated. If you can make this easier to parse (in a few years time) with an additional patch adding some more comments, that would be great. For this patch: Reviewed-by: Catalin Marinas -- Catalin