Message-ID: <3ee07020-74d9-4f13-a3d0-4924a1aa69c6@arm.com>
Date: Mon, 29 Apr 2024 14:23:35 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
From: Ryan Roberts
To: Catalin Marinas
Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel,
 Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
References: <20240424111017.3160195-1-ryan.roberts@arm.com>
 <20240424111017.3160195-2-ryan.roberts@arm.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 29/04/2024 14:01, Ryan Roberts wrote:
> On 29/04/2024 13:38, Catalin Marinas wrote:
>> On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote:
>>> On 26/04/2024 15:48, Catalin Marinas wrote:
>>>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote:
>>>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
>>>>> ptes; it would be cleaner to have one bit that defines "present" when valid is
>>>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined
>>>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or
>>>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?).
>>>>
>>>> I think this makes sense, maybe rename the above to PTE_PRESENT_INVALID
>>>> and use it for both ptes and pmds.
>>>
>>> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in
>>> core-mm so will now fix that before I can validate my change. See
>>> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
>>>
>>> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead
>>> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1)
>>> with both PTE_WRITE=0 and PTE_RDONLY=0.
>>>
>>> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit
>>> modification", this is not a problem as the pte is invalid, so the HW doesn't
>>> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability
>>> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination
>>> that we now repurpose for PROT_NONE.
>>
>> Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be
>> set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination
>> for PAGE_NONE (bar the kernel mappings).
>
> Yes I guess that works.
> I personally prefer my proposal because it is more
> intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But
> if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is
> preferable, then I'll go with that.

Ignore this - I looked at your proposed approach and agree it's better. I'll use
`PTE_USER|PTE_UXN==0b01`. Posting shortly...

>
>>
>> For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID
>> means pte_protnone(). For pmds, however, we can end up with
>> pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_*
>> permissions encoded into a valid pmd. That's where a dedicated
>> PTE_PROT_NONE bit helped.
>
> Yes agreed.
>
>>
>> Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*()
>> first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since
>> the pmd is present, it goes and checks pmd_protnone() which returns
>> true, ending up on do_huge_pmd_numa_page() path. Maybe some locks help
>> but it looks fragile to rely on them.
>>
>> So I think for protnone we need to check some other bits (like USER and
>> UXN) in addition to PTE_PRESENT_INVALID.
>
> Yes 100% agree. But using PTE_WRITE|PTE_RDONLY==0b00 is just as valid for that
> purpose, I think?
>
>>
>>> This will subtly change behaviour in an edge case though. Imagine:
>>>
>>> pte_t pte;
>>>
>>> pte = pte_modify(pte, PAGE_NONE);
>>> pte = pte_mkwrite_novma(pte);
>>> WARN_ON(pte_protnone(pte));
>>>
>>> Should that warning fire or not? Previously, because we had a dedicated bit for
>>> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me
>>> it's more intuitive if it doesn't fire. Regardless there is no core code that
>>> ever does this. Once you have a protnone pte, it's terminal - nothing ever
>>> modifies it with these helpers AFAICS.
>>
>> I don't think any core code should try to make a PAGE_NONE pte
>> writeable.
>
> I looked at some other arches; some (at least alpha and hexagon) will not fire
> this warning because they have R and W bits and 0b00 means NONE. Others (x86)
> will fire it because they have an explicit NONE bit and don't remove it on
> permission change. So I conclude it's UB and fine to do either.
>
>>
>>> Personally I think this is a nice tidy up that saves a SW bit in both present
>>> and swap ptes. What do you think? (I'll just post the series if it's easier to
>>> provide feedback in that context).
>>
>> It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as
>> it doesn't affect the pmd case I mentioned above.
>>
>>>>> But there is a problem with this: __split_huge_pmd_locked() calls
>>>>> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So
>>>>> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me,
>>>>> but was trying to avoid the whole thing unravelling so didn't pursue.
>>>>
>>>> Maybe what's wrong is the arm64 implementation setting this bit on a
>>>> swap/migration pmd (though we could handle this in the core code as
>>>> well, it depends what the other architectures do). The only check for
>>>> the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed
>>>> into the pmd_present() check. I think it is currently broken as
>>>> pmd_present() can return true for a swap pmd after pmd_mkinvalid().
>>>
>>> I've posted a fix here:
>>> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>
>>> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd.
>>
>> I agree, thanks.
>>
>
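
For readers following the thread, the pmd concern and the USER/UXN check discussed above can be sketched in standalone C. This is only an illustration under stated assumptions: the SK_* macros, the sk_* helpers and the bit positions are invented for the sketch and are not the kernel's definitions or the layout of the eventual patch, and it assumes that `PTE_USER|PTE_UXN == 0b01` means PTE_USER clear with PTE_UXN set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only so the sketch compiles. */
#define SK_PTE_VALID           (UINT64_C(1) << 0)
#define SK_PTE_USER            (UINT64_C(1) << 6)
#define SK_PTE_UXN             (UINT64_C(1) << 54)
#define SK_PTE_PRESENT_INVALID (UINT64_C(1) << 59)

typedef uint64_t sk_entry_t; /* stands in for both pte and pmd values */

/* Present from the SW point of view, but invalid for the HW walker. */
static inline bool sk_present_invalid(sk_entry_t e)
{
	return (e & (SK_PTE_VALID | SK_PTE_PRESENT_INVALID)) ==
	       SK_PTE_PRESENT_INVALID;
}

/* Models pmd_mkinvalid(): clear VALID, set PRESENT_INVALID, keep the rest. */
static inline sk_entry_t sk_mkinvalid(sk_entry_t e)
{
	return (e & ~SK_PTE_VALID) | SK_PTE_PRESENT_INVALID;
}

/* Naive protnone test: present-invalid alone. Fine for ptes, not for pmds. */
static inline bool sk_protnone_naive(sk_entry_t e)
{
	return sk_present_invalid(e);
}

/* Protnone test that also requires USER=0 and UXN=1 (assumed 0b01 reading). */
static inline bool sk_protnone(sk_entry_t e)
{
	return sk_present_invalid(e) &&
	       !(e & SK_PTE_USER) && (e & SK_PTE_UXN);
}

/* Walks through the scenario from the thread using the helpers above. */
void sk_selfcheck(void)
{
	/* A valid user mapping, e.g. a huge pmd about to be split. */
	sk_entry_t user_map = SK_PTE_VALID | SK_PTE_USER | SK_PTE_UXN;
	sk_entry_t splitting = sk_mkinvalid(user_map);

	/* The naive test misreports the invalidated pmd as protnone. */
	assert(sk_protnone_naive(splitting));
	/* Checking USER/UXN as well avoids that false positive. */
	assert(!sk_protnone(splitting));

	/* A genuine PROT_NONE style entry: present-invalid, USER=0, UXN=1. */
	sk_entry_t none = SK_PTE_PRESENT_INVALID | SK_PTE_UXN;
	assert(sk_protnone(none));
}
```

Calling sk_selfcheck() exercises the case raised for pmds: an entry that went through sk_mkinvalid() satisfies the naive test but not the one that also looks at USER and UXN, which is why the extra bits are needed once the dedicated PTE_PROT_NONE bit is gone.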