by Will Deacon

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On Tue, Apr 30, 2024 at 03:02:21PM +0100, Ryan Roberts wrote:
> On 30/04/2024 14:30, Will Deacon wrote:
> > On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
> >> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> >> index dd9ee67d1d87..de62e6881154 100644
> >> --- a/arch/arm64/include/asm/pgtable-prot.h
> >> +++ b/arch/arm64/include/asm/pgtable-prot.h
> >> @@ -18,14 +18,7 @@
> >> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
> >> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
> >> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
> >> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> >> -
> >> -/*
> >> - * This bit indicates that the entry is present i.e. pmd_page()
> >> - * still points to a valid huge page in memory even if the pmd
> >> - * has been invalidated.
> >> - */
> >> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
> >> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
> >
> > So this now overlaps with AttrIndx[3] if FEAT_AIE is implemented. Although
> > this shouldn't matter on the face of things because it's only used for
> > invalid entries, we originally moved the PROT_NONE bit from 2 to 57 back
> > in 3676f9ef5481 ("arm64: Move PTE_PROT_NONE higher up") because it was
> > possible to change the memory type for PROT_NONE mappings via some
> > drivers.
>
> I'm not sure I follow your argument.
>
> 1. We don't support FEAT_AIE (currently) so AttrIndx[3] is always going to be 0
> for valid ptes. Drivers are only calling our helpers (e.g.
> pgprot_writecombine(), right?) and those only know how to set AttrIndx[2:0].

Sure, but we might want to use it in future and chucking that out for the
sake of uffd doesn't seem like an obviously worthwhile trade-off to me.

> 2. PMD_PRESENT_INVALID was already occupying bit 59. So wouldn't the same shape
> of concern apply there too for PMDs that have been invalidated, where the driver
> then comes along and changes the memory type? (Perhaps because
> PMD_PRESENT_INVALID is only set while the PTL is held this can't happen).

I was mainly thinking of the PROT_NONE case, to be honest with you. I
struggle to envisage how a driver could sensibly mess with the memory
type for anonymous mappings, let alone huge pages! But perhaps I just
lack imagination :)

> 3. I had this same vague concern about confusion due to overlapping bit 59,
> which is why in the next patch, I'm moving it to the NS bit.
>
> Experience tells me that when I'm arguing confidently with someone who is much
> more expert than me, then I'm using wrong... so what have I missed? :)
>
> >
> > Moving the field to the NS bit (as you do later in the series) resolves
> > this, but the architecture currently says that the NS bit is RES0. How
> > can we guarantee that it won't be repurposed by hardware in future?
>
> Well it remains free for use in valid entries of course.

I think that's what I'm actually questioning! RES0 doesn't mean that
tomorrow's whizz-bang CPU extension isn't allowed to use it, but that's
a guarantee that we need if we're going to use it for our own purposes.

> So I guess you are asking how to guarantee we won't also need to be able
> to modify it on the fly for PROT_NONE entries? I don't have a definite
> answer, but I've been working on the assumption that the architecture
> introducing a feature that is only needed in states where NS is not needed
> is unlikely (so using that bit for the feature is also unlikely). And then
> needing to manipulate that feature dyanically for PROT_NONE mappings is
> even less likely.

The architects are quite good at inventing unlikely features :) SVE
blowing the sigcontext comes to mind. I think we should seek
clarification that the NS bit won't be allocated in the future if we are
going to use it for our own stuff.

> If all else fails we could move it to nG (bit 11) to free up bit 5. But that
> requires a bit more fiddling with the swap pte format.

Oh, cunning, I hadn't thought of that. I think that's probably a better
approach if the NS bit isn't guaranteed to be left alone by the
architecture.

> >> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> >> */
> >> static inline int pte_protnone(pte_t pte)
> >> {
> >> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
> >> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
> >> }
> >
> > Why do we need to check pte_user_*() here? Isn't PROT_NONE the only case
> > in which a pte will have PTE_INVALID set?
>
> I guess for *ptes* this is technically correct. But I was trying to make the
> format generic and reusable for *pmds* too. (pmd_protnone() wraps
> pte_protnone()). For pmds, PTE_INVALID also represents invalid-but-present PMDs
> (i.e. pmds on which pmd_mkinvalid() has been called).
>
> The intention is that PTE_INVALID indicates "present but not valid in HW". And
> (!pte_user(pte) && !pte_user_exec(pte)) indicates the PROT_NONE permission.

Ok, but it does mean the compiler can't emit a nice TBNZ instruction for
the pte macro. Can you either seperate out the pmd/pte versions of the
macro or just add a comment along the lines of what you said above, please?

Cheers,

Will

2024-04-30 15:39:41

by Ryan Roberts

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30/04/2024 16:04, Will Deacon wrote:
> On Tue, Apr 30, 2024 at 03:02:21PM +0100, Ryan Roberts wrote:
>> On 30/04/2024 14:30, Will Deacon wrote:
>>> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>>>> index dd9ee67d1d87..de62e6881154 100644
>>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>>> @@ -18,14 +18,7 @@
>>>> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
>>>> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
>>>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
>>>> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>>>> -
>>>> -/*
>>>> - * This bit indicates that the entry is present i.e. pmd_page()
>>>> - * still points to a valid huge page in memory even if the pmd
>>>> - * has been invalidated.
>>>> - */
>>>> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>>>> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
>>>
>>> So this now overlaps with AttrIndx[3] if FEAT_AIE is implemented. Although
>>> this shouldn't matter on the face of things because it's only used for
>>> invalid entries, we originally moved the PROT_NONE bit from 2 to 57 back
>>> in 3676f9ef5481 ("arm64: Move PTE_PROT_NONE higher up") because it was
>>> possible to change the memory type for PROT_NONE mappings via some
>>> drivers.
>>
>> I'm not sure I follow your argument.
>>
>> 1. We don't support FEAT_AIE (currently) so AttrIndx[3] is always going to be 0
>> for valid ptes. Drivers are only calling our helpers (e.g.
>> pgprot_writecombine(), right?) and those only know how to set AttrIndx[2:0].
>
> Sure, but we might want to use it in future and chucking that out for the
> sake of uffd doesn't seem like an obviously worthwhile trade-off to me.

Totally agree, which is why I move it in the next patch. I was just commenting
that its not a problem for this intermediate state between patches because the
kernel doesn't support FEAT_AIE today.

>
>> 2. PMD_PRESENT_INVALID was already occupying bit 59. So wouldn't the same shape
>> of concern apply there too for PMDs that have been invalidated, where the driver
>> then comes along and changes the memory type? (Perhaps because
>> PMD_PRESENT_INVALID is only set while the PTL is held this can't happen).
>
> I was mainly thinking of the PROT_NONE case, to be honest with you. I
> struggle to envisage how a driver could sensibly mess with the memory
> type for anonymous mappings, let alone huge pages! But perhaps I just
> lack imagination :)
>
>> 3. I had this same vague concern about confusion due to overlapping bit 59,
>> which is why in the next patch, I'm moving it to the NS bit.
>>
>> Experience tells me that when I'm arguing confidently with someone who is much
>> more expert than me, then I'm using wrong... so what have I missed? :)

Sorry that was meant to say "I'm *usually* wrong" in case it wasn't obvious.

>>
>>>
>>> Moving the field to the NS bit (as you do later in the series) resolves
>>> this, but the architecture currently says that the NS bit is RES0. How
>>> can we guarantee that it won't be repurposed by hardware in future?
>>
>> Well it remains free for use in valid entries of course.
>
> I think that's what I'm actually questioning!

Sorry I'm not sure if we are talking cross-purposes... PTE_INVALID is only
overlayed on the NS bit when PTE_VALID=0. So the NS bit remains 0 for valid PTEs
today, and in future, any new feature control within the bit could be set as
required for valid ptes.

But I *think* you are concerned about the possibility that any future feature
control that occupies that bit could also require persisting in pte/pmd even
when its invalidated (i.e. pmd_mkinvalid(pmd) or pte_modify(pte, PAGE_NONE))?

> RES0 doesn't mean that
> tomorrow's whizz-bang CPU extension isn't allowed to use it, but that's
> a guarantee that we need if we're going to use it for our own purposes.
>
>> So I guess you are asking how to guarantee we won't also need to be able
>> to modify it on the fly for PROT_NONE entries? I don't have a definite
>> answer, but I've been working on the assumption that the architecture
>> introducing a feature that is only needed in states where NS is not needed
>> is unlikely (so using that bit for the feature is also unlikely). And then
>> needing to manipulate that feature dyanically for PROT_NONE mappings is
>> even less likely.
>
> The architects are quite good at inventing unlikely features :) SVE
> blowing the sigcontext comes to mind. I think we should seek
> clarification that the NS bit won't be allocated in the future if we are
> going to use it for our own stuff.

OK, clarification sought; the architects are *not* willing to upgrade the res0
to "IGNORED"...

>
>> If all else fails we could move it to nG (bit 11) to free up bit 5. But that
>> requires a bit more fiddling with the swap pte format.
>
> Oh, cunning, I hadn't thought of that. I think that's probably a better
> approach if the NS bit isn't guaranteed to be left alone by the
> architecture.

..So I'll change patch 2 to move the bit to nG (bit 11). Does that work for
you? I *think* an alternative could be the contig bit. But nG feels safest to me
- I'd have to think a lot harder about contig bit.

>
>>>> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>>>> */
>>>> static inline int pte_protnone(pte_t pte)
>>>> {
>>>> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
>>>> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
>>>> }
>>>
>>> Why do we need to check pte_user_*() here? Isn't PROT_NONE the only case
>>> in which a pte will have PTE_INVALID set?
>>
>> I guess for *ptes* this is technically correct. But I was trying to make the
>> format generic and reusable for *pmds* too. (pmd_protnone() wraps
>> pte_protnone()). For pmds, PTE_INVALID also represents invalid-but-present PMDs
>> (i.e. pmds on which pmd_mkinvalid() has been called).
>>
>> The intention is that PTE_INVALID indicates "present but not valid in HW". And
>> (!pte_user(pte) && !pte_user_exec(pte)) indicates the PROT_NONE permission.
>
> Ok, but it does mean the compiler can't emit a nice TBNZ instruction for
> the pte macro. Can you either seperate out the pmd/pte versions of the
> macro or just add a comment along the lines of what you said above, please?

I'll add a comment; I'd rather not have the implementations diverge unless there
is a clear performance advantage.

Thanks for the review!

>
> Cheers,
>
> Will