Received: by 2002:a05:7412:3b8b:b0:fc:a2b0:25d7 with SMTP id nd11csp3042606rdb; Tue, 13 Feb 2024 05:24:49 -0800 (PST) X-Google-Smtp-Source: AGHT+IGGEzud7/5CwijLQkBDZw+5JY4fDGzFyuv5UNQljhuMMPSHmncOQbsfNOfo2PUzZJ51pwe2 X-Received: by 2002:a05:6214:508c:b0:68c:b9a8:a41b with SMTP id kk12-20020a056214508c00b0068cb9a8a41bmr4463346qvb.6.1707830689342; Tue, 13 Feb 2024 05:24:49 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707830689; cv=pass; d=google.com; s=arc-20160816; b=j8o5NfkRyJzXk9LCoudpvCQz28ON6wQQ1HHIrSZes+127V5IKSAGx+WxkxZ9NYSakZ DhCwewHf3gxTEemsTxboyx03zoXykit7UmELLz+igNhjZvobPwyhyo00IQ8cT3berZyK qoyjiGA3qGvj2WLsAOJ4eL8BGTrhvlIaGddqcS5Q0f9iS/UtVjEMhZQxwfkx5bt4fi2G dH9z/2DLzFWQVOzPJ1fr7kM6FspfM6Pnc6dxsF93VKwxdzTzPyFwf1hgtW3JBfYKsDpG fOK6atwoT85nMP5OE9+8WRVZfJtd+MYr3tf1KGDkZyO9ncb5W/eRV9Uy1TvCc9dZB0fj 2iYA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id; bh=1wRUPa9qFr1PAhaunGCETjUroWP6uh2Kn1bbKJpy/no=; fh=5cdrzkG3gL5ASLnmyvYR+wSyAcsh3QoFNDOz4mwfPWk=; b=SSpI69ReqsFY7vSrhfbANIZw+iz3fvTSlvAArpA6fPUqM4mTVL/feGP9ftiQQ6DqGG OlQki0MrSzrA31EKzib4KzxcvHZRdKWdKmJsuG16+j0kQanELYZ169qgrL2JEfwJvkE2 92hpAvBt2XxKFU4V/Ngz1L+dbaDgBCqAV2Z4fG0/6h1rNC1lT2Rjj3qePmHtlvi5Vw1z 5420O9ZlY51E3/upQoXstqL3V72Gyt0bUayP5tGGZObtxQrMW4xZdDbrzmQhAMygppHa PmOgqSuSQiJ2SBfLA0fpR860qlftwji39CPmuVKkWYNHcCh92Pe+HnV42V3WzGjmmIMm K5Iw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-63581-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-63581-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com X-Forwarded-Encrypted: i=2; AJvYcCXccrhyo+pVGOJZCf61QAugktP2FTTtjjrTJHF8erD2Z3gC2zFVsLnomvExYNBGU2dsMP3y2+FhTFoDU/DVXiTe+FCaQ0zoHVxdnVB5UQ== Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id c5-20020a056214224500b0068ef75c950bsi401392qvc.124.2024.02.13.05.24.49 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Feb 2024 05:24:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-63581-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-63581-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-63581-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id EB6B11C21C89 for ; Tue, 13 Feb 2024 13:24:48 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 96EE655C18; Tue, 13 Feb 2024 13:24:43 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6066A55C04 for ; Tue, 13 Feb 2024 13:24:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707830683; cv=none; b=gwb9HKKVtUMWYLyrnwltKq6/NHzROfOiHokk292OFM1FAr6HRxg+a57gH1pYXXR4aSPQ3g9gO1SDSaCUX214bTiBMPjGnUoWE4SxAcKYfI+mE+1YfDOviVMKeO7g8H9LLG57+Lt8Bh8VB3QaOEbGJ+d+h/lnQ/dl6aHYjrJLCqo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707830683; c=relaxed/simple; bh=gF9SlWYZzoKeXO5N0lntgur2mMdXnes++Vt7sVfrFwM=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=YwC+IbyF151PaxBNZa3rxq7amVNgMT3Y4rNcDZc+5qU5FSI8at13J9it/il63mr6V79+/1z4e6qwElu2IrkCs+YIHVpDuL+lKyOj0JixriVDz+KkSjozSWDkCgN9Z7Z1GW0kaulWOLtGOVctWsaj/2UEOuhvdGnz2s5LonLCEKY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B29DADA7; Tue, 13 Feb 2024 05:25:21 -0800 (PST) Received: from [10.1.36.184] (XHFQ2J9959.cambridge.arm.com [10.1.36.184]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D643C3F762; Tue, 13 Feb 2024 05:24:36 -0800 (PST) Message-ID: Date: Tue, 13 Feb 2024 13:24:35 +0000 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings Content-Language: en-GB To: David Hildenbrand , Mark Rutland Cc: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , James Morse , Andrey Ryabinin , Andrew Morton , Matthew Wilcox , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi , Nicholas Piggin , Christophe Leroy , "Aneesh Kumar K.V" , "Naveen N. Rao" , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-arm-kernel@lists.infradead.org, x86@kernel.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240202080756.1453939-1-ryan.roberts@arm.com> <20240202080756.1453939-20-ryan.roberts@arm.com> <64395ae4-3a7d-45dd-8f1d-ea6b232829c5@arm.com> <41499621-482f-455b-9f68-b43ea8052557@redhat.com> <1d302d7a-50ab-4ab4-b049-75ed4a71a87d@arm.com> <99e2a92c-f2a2-4e1e-8ce2-08caae2cb7e4@redhat.com> From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On 13/02/2024 13:22, David Hildenbrand wrote: > On 13.02.24 14:20, Ryan Roberts wrote: >> On 13/02/2024 13:13, David Hildenbrand wrote: >>> On 13.02.24 14:06, Ryan Roberts wrote: >>>> On 13/02/2024 12:19, David Hildenbrand wrote: >>>>> On 13.02.24 13:06, Ryan Roberts wrote: >>>>>> On 12/02/2024 20:38, Ryan Roberts wrote: >>>>>>> [...] >>>>>>> >>>>>>>>>>> +static inline bool mm_is_user(struct mm_struct *mm) >>>>>>>>>>> +{ >>>>>>>>>>> +    /* >>>>>>>>>>> +     * Don't attempt to apply the contig bit to kernel mappings, >>>>>>>>>>> because >>>>>>>>>>> +     * dynamically adding/removing the contig bit can cause page >>>>>>>>>>> faults. >>>>>>>>>>> +     * These racing faults are ok for user space, since they get >>>>>>>>>>> serialized >>>>>>>>>>> +     * on the PTL. But kernel mappings can't tolerate faults. >>>>>>>>>>> +     */ >>>>>>>>>>> +    return mm != &init_mm; >>>>>>>>>>> +} >>>>>>>>>> >>>>>>>>>> We also have the efi_mm as a non-user mm, though I don't think we >>>>>>>>>> manipulate >>>>>>>>>> that while it is live, and I'm not sure if that needs any special >>>>>>>>>> handling. >>>>>>>>> >>>>>>>>> Well we never need this function in the hot (order-0 folio) path, so I >>>>>>>>> think I >>>>>>>>> could add a check for efi_mm here with performance implication. It's >>>>>>>>> probably >>>>>>>>> safest to explicitly exclude it? What do you think? >>>>>>>> >>>>>>>> Oops: This should have read "I think I could add a check for efi_mm here >>>>>>>> *without* performance implication" >>>>>>> >>>>>>> It turns out that efi_mm is only defined when CONFIG_EFI is enabled. I >>>>>>> can do >>>>>>> this: >>>>>>> >>>>>>> return mm != &init_mm && (!IS_ENABLED(CONFIG_EFI) || mm != &efi_mm); >>>>>>> >>>>>>> Is that acceptable? This is my preference, but nothing else outside of efi >>>>>>> references this symbol currently. >>>>>>> >>>>>>> Or perhaps I can convince myself that its safe to treat efi_mm like >>>>>>> userspace. >>>>>>> There are a couple of things that need to be garanteed for it to be safe: >>>>>>> >>>>>>>      - The PFNs of present ptes either need to have an associated struct >>>>>>> page or >>>>>>>        need to have the PTE_SPECIAL bit set (either pte_mkspecial() or >>>>>>>        pte_mkdevmap()) >>>>>>> >>>>>>>      - Live mappings must either be static (no changes that could cause >>>>>>> fold/unfold >>>>>>>        while live) or the system must be able to tolerate a temporary fault >>>>>>> >>>>>>> Mark suggests efi_mm is not manipulated while live, so that meets the latter >>>>>>> requirement, but I'm not sure about the former? >>>>>> >>>>>> I've gone through all the efi code, and conclude that, as Mark suggests, the >>>>>> mappings are indeed static. And additionally, the ptes are populated using >>>>>> only >>>>>> the _private_ ptep API, so there is no issue here. As just discussed with >>>>>> Mark, >>>>>> my prefereence is to not make any changes to code, and just add a comment >>>>>> describing why efi_mm is safe. >>>>>> >>>>>> Details: >>>>>> >>>>>> * Registered with ptdump >>>>>>        * ptep_get_lockless() >>>>>> * efi_create_mapping -> create_pgd_mapping … -> init_pte: >>>>>>        * __ptep_get() >>>>>>        * __set_pte() >>>>>> * efi_memattr_apply_permissions -> efi_set_mapping_permissions … -> >>>>>> set_permissions >>>>>>        * __ptep_get() >>>>>>        * __set_pte() >>>>> >>>>> Sound good. We could add some VM_WARN_ON if we ever get the efi_mm via the >>>>> "official" APIs. >>>> >>>> We could, but that would lead to the same linkage issue, which I'm trying to >>>> avoid in the first place: >>>> >>>> VM_WARN_ON(IS_ENABLED(CONFIG_EFI) && mm == efi_mm); >>>> >>>> This creates new source code dependencies, which I would rather avoid if >>>> possible. >>> >>> Just a thought, you could have a is_efi_mm() function that abstracts all that. >>> >>> diff --git a/include/linux/efi.h b/include/linux/efi.h >>> index c74f47711f0b..152f5fa66a2a 100644 >>> --- a/include/linux/efi.h >>> +++ b/include/linux/efi.h >>> @@ -692,6 +692,15 @@ extern struct efi { >>>     extern struct mm_struct efi_mm; >>>   +static inline void is_efi_mm(struct mm_struct *mm) >>> +{ >>> +#ifdef CONFIG_EFI >>> +       return mm == &efi_mm; >>> +#else >>> +       return false; >>> +#endif >>> +} >>> + >>>   static inline int >>>   efi_guidcmp (efi_guid_t left, efi_guid_t right) >>>   { >>> >>> >> >> That would definitely work, but in that case, I might as well just check for it >> in mm_is_user() (and personally I would change the name to mm_is_efi()): >> >> >> static inline bool mm_is_user(struct mm_struct *mm) >> { >>     return mm != &init_mm && !mm_is_efi(mm); >> } >> >> Any objections? >> > > Nope :) Maybe slap in an "unlikely()", because efi_mm *is* unlikely to show up. Deal >