2022-05-09 04:48:19

by Hyeonggon Yoo

Subject: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

__set_pages_np() not only clears _PAGE_PRESENT and _PAGE_RW, but also
clears _PAGE_GLOBAL to avoid confusing _PAGE_GLOBAL as _PAGE_PROTNONE
when the PTE is not present.

Common usage for __set_pages_p() is to call it after __set_pages_np().
Therefore calling __set_pages_p() after __set_pages_np() clears
_PAGE_GLOBAL, making it unable to be globally shared in the TLB.

As they are called by set_direct_map_{invalid,default}_noflush(),
pages in direct map cannot be globally shared in TLB after being used by
vmalloc, secretmem, and hibernation.

So set PAGE_KERNEL instead of __pgprot(_PAGE_PRESENT | _PAGE_RW) in
__set_pages_p().

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
arch/x86/mm/pat/set_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index abf5ed76e4b7..fcb6147c4cd4 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2177,7 +2177,7 @@ static int __set_pages_p(struct page *page, int numpages)
struct cpa_data cpa = { .vaddr = &tempaddr,
.pgd = NULL,
.numpages = numpages,
- .mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+ .mask_set = PAGE_KERNEL,
.mask_clr = __pgprot(0),
.flags = 0};

--
2.32.0



2022-05-09 16:13:31

by Tom Lendacky

Subject: Re: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

On 5/6/22 00:19, Hyeonggon Yoo wrote:
> __set_pages_np() not only clears _PAGE_PRESENT and _PAGE_RW, but also
> clears _PAGE_GLOBAL to avoid confusing _PAGE_GLOBAL as _PAGE_PROTNONE
> when the PTE is not present.
>
> Common usage for __set_pages_p() is to call it after __set_pages_np().
> Therefore calling __set_pages_p() after __set_pages_np() clears
> _PAGE_GLOBAL, making it unable to be globally shared in the TLB.
>
> As they are called by set_direct_map_{invalid,default}_noflush(),
> pages in direct map cannot be globally shared in TLB after being used by
> vmalloc, secretmem, and hibernation.
>
> So set PAGE_KERNEL instead of __pgprot(_PAGE_PRESENT | _PAGE_RW) in
> __set_pages_p().
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
> ---
> arch/x86/mm/pat/set_memory.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index abf5ed76e4b7..fcb6147c4cd4 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2177,7 +2177,7 @@ static int __set_pages_p(struct page *page, int numpages)
> struct cpa_data cpa = { .vaddr = &tempaddr,
> .pgd = NULL,
> .numpages = numpages,
> - .mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> + .mask_set = PAGE_KERNEL,

With SME/SEV, this will also (unintentionally) set the encryption bit, so
I don't think this is correct.

Thanks,
Tom

> .mask_clr = __pgprot(0),
> .flags = 0};
>

2022-05-10 06:22:06

by Edgecombe, Rick P

Subject: Re: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

On Fri, 2022-05-06 at 14:19 +0900, Hyeonggon Yoo wrote:
> __set_pages_np() not only clears _PAGE_PRESENT and _PAGE_RW, but also
> clears _PAGE_GLOBAL to avoid confusing _PAGE_GLOBAL as _PAGE_PROTNONE
> when the PTE is not present.
>
> Common usage for __set_pages_p() is to call it after
> __set_pages_np().
> Therefore calling __set_pages_p() after __set_pages_np() clears
> _PAGE_GLOBAL, making it unable to be globally shared in the TLB.
>
> As they are called by set_direct_map_{invalid,default}_noflush(),
> pages in direct map cannot be globally shared in TLB after being used
> by
> vmalloc, secretmem, and hibernation.
>
> So set PAGE_KERNEL instead of __pgprot(_PAGE_PRESENT | _PAGE_RW) in
> __set_pages_p().

Nice find. I think we also can't always set PAGE_KERNEL because of the
PTI code. It sometimes wants the direct map to be non-global.

Maybe something like this?
(_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL) & __default_kernel_pte_mask

That would add back in a little of the "default global" behavior that
was removed in d1440b2, but I think it should be ok because it is
limited to the direct map. Otherwise, I wonder if the existing global
clearing logic is really needed.

2022-05-10 15:26:40

by Hyeonggon Yoo

Subject: Re: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

On Mon, May 09, 2022 at 11:06:22AM -0500, Tom Lendacky wrote:
> On 5/6/22 00:19, Hyeonggon Yoo wrote:
> > __set_pages_np() not only clears _PAGE_PRESENT and _PAGE_RW, but also
> > clears _PAGE_GLOBAL to avoid confusing _PAGE_GLOBAL as _PAGE_PROTNONE
> > when the PTE is not present.
> >
> > Common usage for __set_pages_p() is to call it after __set_pages_np().
> > Therefore calling __set_pages_p() after __set_pages_np() clears
> > _PAGE_GLOBAL, making it unable to be globally shared in the TLB.
> >
> > As they are called by set_direct_map_{invalid,default}_noflush(),
> > pages in direct map cannot be globally shared in TLB after being used by
> > vmalloc, secretmem, and hibernation.
> >
> > So set PAGE_KERNEL instead of __pgprot(_PAGE_PRESENT | _PAGE_RW) in
> > __set_pages_p().
> >
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
> > ---
> > arch/x86/mm/pat/set_memory.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index abf5ed76e4b7..fcb6147c4cd4 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -2177,7 +2177,7 @@ static int __set_pages_p(struct page *page, int numpages)
> > struct cpa_data cpa = { .vaddr = &tempaddr,
> > .pgd = NULL,
> > .numpages = numpages,
> > - .mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > + .mask_set = PAGE_KERNEL,
>
> With SME/SEV, this will also (unintentionally) set the encryption bit, so I
> don't think this is correct.
>

Thank you for catching this.
It seems PAGE_KERNEL was too much for __set_pages_p().

I think __pgprot_mask(_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL) would be correct.
Thoughts?

> Thanks,
> Tom
>
> > .mask_clr = __pgprot(0),
> > .flags = 0};

--
Thanks,
Hyeonggon

2022-05-10 15:49:14

by Hyeonggon Yoo

Subject: Re: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

On Tue, May 10, 2022 at 12:47:11AM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-05-06 at 14:19 +0900, Hyeonggon Yoo wrote:
> > __set_pages_np() not only clears _PAGE_PRESENT and _PAGE_RW, but also
> > clears _PAGE_GLOBAL to avoid confusing _PAGE_GLOBAL as _PAGE_PROTNONE
> > when the PTE is not present.
> >
> > Common usage for __set_pages_p() is to call it after
> > __set_pages_np().
> > Therefore calling __set_pages_p() after __set_pages_np() clears
> > _PAGE_GLOBAL, making it unable to be globally shared in the TLB.
> >
> > As they are called by set_direct_map_{invalid,default}_noflush(),
> > pages in direct map cannot be globally shared in TLB after being used
> > by
> > vmalloc, secretmem, and hibernation.
> >
> > So set PAGE_KERNEL instead of __pgprot(_PAGE_PRESENT | _PAGE_RW) in
> > __set_pages_p().
>
> Nice find. I think we can't always set PAGE_KERNEL also because of the
> PTI code. It sometimes wants the direct map to be non global.
>

Thanks for review!

IIUC __pgprot_mask() already clears _PAGE_GLOBAL from PAGE_KERNEL
when PTI is used.

#define __pgprot_mask(x) __pgprot((x) & __default_kernel_pte_mask)
#define PAGE_KERNEL __pgprot_mask(__PAGE_KERNEL | _ENC)

But yeah, it seems PAGE_KERNEL is too much for this.

> Maybe something like this?
> (_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL) & __default_kernel_pte_mask
>

What about __pgprot_mask(_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL)?

> That would add back in a little of the "default global" behavior that
> was removed in d1440b2, but I think it should be ok because it is
> limited to the direct map.

> Otherwise, I wonder if the existing global
> clearing logic is really needed.

I think it's still needed. pte_present() returning true due to _PAGE_PROTNONE
after __set_pages_np() simply doesn't make sense. No?

--
Thanks,
Hyeonggon

2022-05-10 17:38:31

by Dave Hansen

Subject: Re: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

On 5/10/22 06:35, Tom Lendacky wrote:
> I'm wondering if adding a specific helper that takes a boolean to
> indicate whether to set the global flag would be best. I'll let some of
> the MM maintainers comment about that.

First of all, I'm not positive that _PAGE_BIT_PROTNONE is ever used for
kernel mappings. This would all get a lot easier if we decided that
_PAGE_BIT_PROTNONE is only for userspace mappings and we don't have to
worry about it when _PAGE_USER is clear.

Second, the number of places that do these
__set_pages_p()/__set_pages_np() pairs is pretty limited. Some of them
are *quite* unambiguous over whether they are dealing with the direct map:

> int set_direct_map_invalid_noflush(struct page *page)
> {
> return __set_pages_np(page, 1);
> }
>
> int set_direct_map_default_noflush(struct page *page)
> {
> return __set_pages_p(page, 1);
> }

which would make it patently obvious whether __set_pages_p() should
restore the global bit. That would have been a problem in the "old" PTI
days where _some_ of the direct map was exposed to Meltdown. I don't
think we have any of those mappings left, though. They're all aliases
like text and cpu_entry_area.

It would be nice if someone could look into unraveling
_PAGE_BIT_PROTNONE. We could even probably move it to another bit for
kernel mappings if we actually need it (I'm not convinced we do).

2022-05-10 20:44:37

by Tom Lendacky

Subject: Re: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

On 5/10/22 06:57, Hyeonggon Yoo wrote:
> On Mon, May 09, 2022 at 11:06:22AM -0500, Tom Lendacky wrote:
>> On 5/6/22 00:19, Hyeonggon Yoo wrote:
>>> __set_pages_np() not only clears _PAGE_PRESENT and _PAGE_RW, but also
>>> clears _PAGE_GLOBAL to avoid confusing _PAGE_GLOBAL as _PAGE_PROTNONE
>>> when the PTE is not present.
>>>
>>> Common usage for __set_pages_p() is to call it after __set_pages_np().
>>> Therefore calling __set_pages_p() after __set_pages_np() clears
>>> _PAGE_GLOBAL, making it unable to be globally shared in the TLB.
>>>
>>> As they are called by set_direct_map_{invalid,default}_noflush(),
>>> pages in direct map cannot be globally shared in TLB after being used by
>>> vmalloc, secretmem, and hibernation.
>>>
>>> So set PAGE_KERNEL instead of __pgprot(_PAGE_PRESENT | _PAGE_RW) in
>>> __set_pages_p().
>>>
>>> Signed-off-by: Hyeonggon Yoo <[email protected]>
>>> ---
>>> arch/x86/mm/pat/set_memory.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>> index abf5ed76e4b7..fcb6147c4cd4 100644
>>> --- a/arch/x86/mm/pat/set_memory.c
>>> +++ b/arch/x86/mm/pat/set_memory.c
>>> @@ -2177,7 +2177,7 @@ static int __set_pages_p(struct page *page, int numpages)
>>> struct cpa_data cpa = { .vaddr = &tempaddr,
>>> .pgd = NULL,
>>> .numpages = numpages,
>>> - .mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW),
>>> + .mask_set = PAGE_KERNEL,
>>
>> With SME/SEV, this will also (unintentionally) set the encryption bit, so I
>> don't think this is correct.
>>
>
> Thank you for catching this.
> It seems PAGE_KERNEL was too much for __set_pages_p().
>
> I think __pgprot_mask(_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL) would be correct.
> Thoughts?

That should work, but you are counting on this function being called in a
specific situation. Should this function ever be called from some other
place in the kernel in the future, it might inadvertently apply the
global setting when it shouldn't.

I'm wondering if adding a specific helper that takes a boolean to indicate
whether to set the global flag would be best. I'll let some of the MM
maintainers comment about that.

Thanks,
Tom

>
>> Thanks,
>> Tom
>>
>>> .mask_clr = __pgprot(0),
>>> .flags = 0};
>

2022-05-10 21:39:54

by Edgecombe, Rick P

Subject: Re: [PATCH] x86/mm/cpa: set PAGE_KERNEL in __set_pages_p()

On Tue, 2022-05-10 at 20:50 +0900, Hyeonggon Yoo wrote:
> Thanks for review!
>
> IIUC __pgprot_mask() already clears _PAGE_GLOBAL from PAGE_KERNEL
> when PTI is used.
>
> #define __pgprot_mask(x) __pgprot((x) &
> __default_kernel_pte_mask)
> #define PAGE_KERNEL __pgprot_mask(__PAGE_KERNEL | _ENC)
>
> But yeah, it seems PAGE_KERNEL is too much for this.

Oh, yep, you're right.

>
> > Maybe something like this?
> > (_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL) &
> > __default_kernel_pte_mask
> >
>
> What about __pgprot_mask(_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL)?
>
> > That would add back in a little of the "default global" behavior
> > that
> > was removed in d1440b2, but I think it should be ok because it is
> > limited to the direct map.
> > Otherwise, I wonder if the existing global
> > clearing logic is really needed.
>
> I think it's still needed. pte_present() returning true due to
> _PAGE_PROTNONE
> after __set_pages_np() simply doesn't make sense. No?

Yea, I just meant that (I think) we don't need PROTNONE in kernel PTEs,
yet the kernel is forced to do weird stuff in CPA so that it can share
the pte helpers with userspace.

I have not thought this all the way through, but say we declared that
PROT_NONE doesn't exist in kernel pte's, then pte_present() could be
like:

static inline int pte_present(pte_t a)
{
	pteval_t val = pte_flags(a);
	bool kernel_p = (val & (_PAGE_USER | _PAGE_PRESENT)) ==
			_PAGE_PRESENT;
	bool user_p = (val & _PAGE_USER) &&
		      (val & (_PAGE_PRESENT | _PAGE_PROTNONE));

	/*
	 * PROT_NONE does not exist for kernel PTEs, but global and
	 * not present can appear in other cases. Check each
	 * differently depending on the user bit.
	 */
	return kernel_p || user_p;
}

Not sure where else it ties in. I think that is what Dave was getting
at in the other thread - to figure out all the places that need to
change to remove the meaning from kernel PTEs.

Re-setting global seems valid too.

2022-05-11 08:55:45

by Hyeonggon Yoo

Subject: Is _PAGE_PROTNONE set only for user mappings?

On Tue, May 10, 2022 at 07:39:30AM -0700, Dave Hansen wrote:
> On 5/10/22 06:35, Tom Lendacky wrote:
> > I'm wondering if adding a specific helper that takes a boolean to
> > indicate whether to set the global flag would be best. I'll let some of
> > the MM maintainers comment about that.
>
> First of all, I'm not positive that _PAGE_BIT_PROTNONE is ever used for
> kernel mappings. This would all get a lot easier if we decided that
> _PAGE_BIT_PROTNONE is only for userspace mappings and we don't have to
> worry about it when _PAGE_USER is clear.

After quickly skimming the code, it seems the place that actually sets
_PAGE_PROTNONE is mm/mmap.c's protection_map:

> /* description of effects of mapping type and prot in current implementation.
> * this is due to the limited x86 page protection hardware. The expected
> * behavior is in parens:
> *
> * map_type prot
> * PROT_NONE PROT_READ PROT_WRITE PROT_EXEC
> * MAP_SHARED r: (no) no r: (yes) yes r: (no) yes r: (no) yes
> * w: (no) no w: (no) no w: (yes) yes w: (no) no
> * x: (no) no x: (no) yes x: (no) yes x: (yes) yes
> *
> * MAP_PRIVATE r: (no) no r: (yes) yes r: (no) yes r: (no) yes
> * w: (no) no w: (no) no w: (copy) copy w: (no) no
> * x: (no) no x: (no) yes x: (no) yes x: (yes) yes
> *
> */
> pgprot_t protection_map[16] = {
> __P000, __P001, __P010, __P011, __P100, __P101, __P110, __P111,
> __S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111
> };

Where __P000 and __S000 are PAGE_NONE (_PAGE_ACCESSED | _PAGE_PROTNONE).

And protection_map is accessed via:
> pgprot_t vm_get_page_prot(unsigned long vm_flags)
> {
> pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags &
> (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
> pgprot_val(arch_vm_get_page_prot(vm_flags)));
>
> return arch_filter_pgprot(ret);
> }
> EXPORT_SYMBOL(vm_get_page_prot);

I guess it's only set for processes' VMAs, provided no caller is abusing
vm_get_page_prot() for kernel mappings.

But yeah, a quick guess alone is not convincing.
Let's Cc people working on mm.

If the kernel never uses _PAGE_PROTNONE for kernel mappings, it should be
fine not to clear _PAGE_GLOBAL in __change_page_attr() in the first place
when the address is not a user address, because nothing will confuse
_PAGE_GLOBAL with _PAGE_PROTNONE for a kernel address. Right?

>
> Second, the number of places that do these
> __set_pages_p()/__set_pages_np() pairs is pretty limited. Some of them
> are *quite* unambiguous over whether they are dealing with the direct map:
>
> > int set_direct_map_invalid_noflush(struct page *page)
> > {
> > return __set_pages_np(page, 1);
> > }
> >
> > int set_direct_map_default_noflush(struct page *page)
> > {
> > return __set_pages_p(page, 1);
> > }
>
> which would make it patently obvious whether __set_pages_p() should
> restore the global bit. That would have been a problem in the "old" PTI
> days where _some_ of the direct map was exposed to Meltdown. I don't
> think we have any of those mappings left, though. They're all aliases
> like text and cpu_entry_area.
>
> It would be nice if someone could look into unraveling
> _PAGE_BIT_PROTNONE. We could even probably move it to another bit for
> kernel mappings if we actually need it (I'm not convinced we do).

--
Thanks,
Hyeonggon

2022-05-14 01:55:43

by Hyeonggon Yoo

Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On Thu, May 12, 2022 at 11:37:48AM +0100, Mel Gorman wrote:
> On Wed, May 11, 2022 at 02:20:45PM +0900, Hyeonggon Yoo wrote:
> > > pgprot_t vm_get_page_prot(unsigned long vm_flags)
> > > {
> > > pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags &
> > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
> > > pgprot_val(arch_vm_get_page_prot(vm_flags)));
> > >
> > > return arch_filter_pgprot(ret);
> > > }
> > > EXPORT_SYMBOL(vm_get_page_prot);
> >
> > I guess it's only set for processes' VMA if no caller is abusing
> > vm_get_page_prot() for kernel mappings.
> >
> > But yeah, just quick guessing does not make us convinced.
> > Let's Cc people working on mm.
> >
> > If kernel never uses _PAGE_PROTNONE for kernel mappings, it's just okay
> > not to clear _PAGE_GLOBAL at first in __change_page_attr() if it's not user address,
> > because no user will confuse _PAGE_GLOBAL as _PAGE_PROTNONE if it's kernel
> > address. right?
> >
>
> I'm not aware of a case where _PAGE_BIT_PROTNONE is used for a kernel
> address expecting PROT_NONE semantics instead of the global bit. NUMA
> Balancing is not going to accidentally treat a kernel address as if it's
> a NUMA hinting fault. By the time a fault is determining if a PTE access
> is a numa hinting fault or accessing a PROT_NONE region, it has been
> established that it is a userspace address backed by a valid VMA.
>

Thanks Mel. IIUC, do_kern_addr_fault() in arch/x86/mm/fault.c does not
expect _PAGE_PROTNONE instead of _PAGE_GLOBAL either. I want to make it clear
in the code that _PAGE_PROTNONE is only used for user mappings.

How does below look?

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..f8d02b91a90c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -35,7 +35,10 @@
#define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4

/* If _PAGE_BIT_PRESENT is clear, we use these: */
-/* - if the user mapped it with PROT_NONE; pte_present gives true */
+/*
+ * if the user mapped it with PROT_NONE; pte_present gives true
+ * this is only used for user mappings (with _PAGE_BIT_USER)
+ */
#define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL

#define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
@@ -115,7 +118,8 @@
#define _PAGE_DEVMAP (_AT(pteval_t, 0))
#endif

-#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \
+ (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE))

/*
* Set of bits not changed in pte_modify. The pte's

--
Thanks,
Hyeonggon

2022-05-14 03:07:56

by Mel Gorman

Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On Wed, May 11, 2022 at 02:20:45PM +0900, Hyeonggon Yoo wrote:
> > pgprot_t vm_get_page_prot(unsigned long vm_flags)
> > {
> > pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags &
> > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
> > pgprot_val(arch_vm_get_page_prot(vm_flags)));
> >
> > return arch_filter_pgprot(ret);
> > }
> > EXPORT_SYMBOL(vm_get_page_prot);
>
> I guess it's only set for processes' VMA if no caller is abusing
> vm_get_page_prot() for kernel mappings.
>
> But yeah, just quick guessing does not make us convinced.
> Let's Cc people working on mm.
>
> If kernel never uses _PAGE_PROTNONE for kernel mappings, it's just okay
> not to clear _PAGE_GLOBAL at first in __change_page_attr() if it's not user address,
> because no user will confuse _PAGE_GLOBAL as _PAGE_PROTNONE if it's kernel
> address. right?
>

I'm not aware of a case where _PAGE_BIT_PROTNONE is used for a kernel
address expecting PROT_NONE semantics instead of the global bit. NUMA
Balancing is not going to accidentally treat a kernel address as if it's
a NUMA hinting fault. By the time a fault is determining if a PTE access
is a numa hinting fault or accessing a PROT_NONE region, it has been
established that it is a userspace address backed by a valid VMA.

--
Mel Gorman
SUSE Labs

2022-05-16 20:23:38

by Dave Hansen

Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On 5/12/22 22:33, Hyeonggon Yoo wrote:
> Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c
> expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear
> in the code that _PAGE_PROTNONE is only used for user mappings.
>
> How does below look?
>
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 40497a9020c6..f8d02b91a90c 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -35,7 +35,10 @@
> #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
>
> /* If _PAGE_BIT_PRESENT is clear, we use these: */
> -/* - if the user mapped it with PROT_NONE; pte_present gives true */
> +/*
> + * if the user mapped it with PROT_NONE; pte_present gives true
> + * this is only used for user mappings (with _PAGE_BIT_USER)
> + */
> #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL
>
> #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
> @@ -115,7 +118,8 @@
> #define _PAGE_DEVMAP (_AT(pteval_t, 0))
> #endif
>
> -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
> +#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \
> + (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE))
>
> /*
> * Set of bits not changed in pte_modify. The pte's

I don't like the idea of _PAGE_BIT_USER being so implicit. It is
something kernel users should explicitly know they are messing with.

I was thinking of something more along the lines of taking the
set_memory.c code and ensuring that it never sets (or even observes)
_PAGE_BIT_GLOBAL on a _PAGE_USER mapping. There was also a question of
if set_memory.c is ever used on userspace mappings. It would be good to
validate whether it's possible in-tree today and if not, enforce that
_PAGE_USER PTEs should never even be observed with set_memory.c.

The arch/x86/mm/dump_pagetables.c code is also a reasonable place to put
assumptions about the page tables since it walks *everything* when asked.

2022-05-17 03:07:51

by Mel Gorman

Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On Fri, May 13, 2022 at 02:33:38PM +0900, Hyeonggon Yoo wrote:
> On Thu, May 12, 2022 at 11:37:48AM +0100, Mel Gorman wrote:
> > On Wed, May 11, 2022 at 02:20:45PM +0900, Hyeonggon Yoo wrote:
> > > > pgprot_t vm_get_page_prot(unsigned long vm_flags)
> > > > {
> > > > pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags &
> > > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
> > > > pgprot_val(arch_vm_get_page_prot(vm_flags)));
> > > >
> > > > return arch_filter_pgprot(ret);
> > > > }
> > > > EXPORT_SYMBOL(vm_get_page_prot);
> > >
> > > I guess it's only set for processes' VMA if no caller is abusing
> > > vm_get_page_prot() for kernel mappings.
> > >
> > > But yeah, just quick guessing does not make us convinced.
> > > Let's Cc people working on mm.
> > >
> > > If kernel never uses _PAGE_PROTNONE for kernel mappings, it's just okay
> > > not to clear _PAGE_GLOBAL at first in __change_page_attr() if it's not user address,
> > > because no user will confuse _PAGE_GLOBAL as _PAGE_PROTNONE if it's kernel
> > > address. right?
> > >
> >
> > I'm not aware of a case where _PAGE_BIT_PROTNONE is used for a kernel
> > address expecting PROT_NONE semantics instead of the global bit. NUMA
> > Balancing is not going to accidentally treat a kernel address as if it's
> > a NUMA hinting fault. By the time a fault is determining if a PTE access
> > is a numa hinting fault or accessing a PROT_NONE region, it has been
> > established that it is a userspace address backed by a valid VMA.
> >
>
> Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c
> expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear
> in the code that _PAGE_PROTNONE is only used for user mappings.
>
> How does below look?
>

I've no strong objections. I worry that this could somehow be used to
set _PAGE_USER on a kernel mapping, and maybe a comment would be more
appropriate. However, I'm failing to imagine how NUMA balancing could be
fooled into doing that.

--
Mel Gorman
SUSE Labs

2022-05-23 08:18:44

by Hyeonggon Yoo

Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote:
> On 5/12/22 22:33, Hyeonggon Yoo wrote:
> > Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c
> > expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear
> > in the code that _PAGE_PROTNONE is only used for user mappings.
> >
> > How does below look?
> >
> > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> > index 40497a9020c6..f8d02b91a90c 100644
> > --- a/arch/x86/include/asm/pgtable_types.h
> > +++ b/arch/x86/include/asm/pgtable_types.h
> > @@ -35,7 +35,10 @@
> > #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
> >
> > /* If _PAGE_BIT_PRESENT is clear, we use these: */
> > -/* - if the user mapped it with PROT_NONE; pte_present gives true */
> > +/*
> > + * if the user mapped it with PROT_NONE; pte_present gives true
> > + * this is only used for user mappings (with _PAGE_BIT_USER)
> > + */
> > #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL
> >
> > #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
> > @@ -115,7 +118,8 @@
> > #define _PAGE_DEVMAP (_AT(pteval_t, 0))
> > #endif
> >
> > -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
> > +#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \
> > + (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE))
> >
> > /*
> > * Set of bits not changed in pte_modify. The pte's
>
> I don't like the idea of _PAGE_BIT_USER being so implicit. It is
> something kernel users should know explicitly that they are messing with.
>

Sounds reasonable. Better explicit than implicit.

> I was thinking of something more along the lines of taking the
> set_memory.c code and ensuring that it never sets (or even observes)
> _PAGE_BIT_GLOBAL on a _PAGE_USER mapping.

Yeah that would be a bit more explicit solution.

> There was also a question of
> if set_memory.c is ever used on userspace mappings. It would be good to
> validate whether it's possible in-tree today and if not, enforce that
> _PAGE_USER PTEs should never even be observed with set_memory.c.

Simply adding dump_stack() tells me that the kernel on my machine does not use
set_memory.c for userspace mappings, but hmm, I'll take a look.

> The arch/x86/mm/dump_pagetables.c code is also a reasonable place to put
> assumptions about the page tables since it walks *everything* when asked.

Thanks for that information!

Thanks,
Hyeonggon

2022-05-25 12:18:31

by Sean Christopherson

Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On Sun, May 22, 2022, Hyeonggon Yoo wrote:
> On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote:
> > I was thinking of something more along the lines of taking the
> > set_memory.c code and ensuring that it never sets (or even observes)
> > _PAGE_BIT_GLOBAL on a _PAGE_USER mapping.
>
> Yeah that would be a bit more explicit solution.
>
> > There was also a question of
> > if set_memory.c is ever used on userspace mappings. It would be good to
> > validate whether it's possible in-tree today and if not, enforce that
> > _PAGE_USER PTEs should never even be observed with set_memory.c.
>
> Simply adding dump_stack() tells me my kernel on my machine does not use
> set_memory.c for userspace mappings but Hmm I'll take a look.

vc_slow_virt_to_phys() uses lookup_address_in_pgd() with user mappings, but that
code is all but guaranteed to be buggy, e.g. doesn't guard against concurrent
modifications to user mappings.

show_fault_oops() can also call into lookup_address_in_pgd() with a user mapping,
though at that point the kernel has bigger problems since it's executing from user
memory.

And up until commits 44187235cbcc ("KVM: x86/mmu: fix potential races when walking
host page table") and 643d95aac59a ("Revert "x86/mm: Introduce lookup_address_in_mm()""),
KVM had a similar bug.

Generally speaking, set_memory.c is not equipped to play nice with user mappings.
It mostly "works", but there are races galore. IMO, hardening set_memory.c to scream
if it's handed a user address or encounters _PAGE_USER PTEs would be a very good thing.

2022-05-27 12:08:24

by Hyeonggon Yoo

Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On Tue, May 24, 2022 at 08:22:30PM +0000, Sean Christopherson wrote:
> On Sun, May 22, 2022, Hyeonggon Yoo wrote:
> > On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote:
> > > I was thinking of something more along the lines of taking the
> > > set_memory.c code and ensuring that it never sets (or even observes)
> > > _PAGE_BIT_GLOBAL on a _PAGE_USER mapping.
> >
> > Yeah that would be a bit more explicit solution.
> >
> > > There was also a question of
> > > if set_memory.c is ever used on userspace mappings. It would be good to
> > > validate whether it's possible in-tree today and if not, enforce that
> > > _PAGE_USER PTEs should never even be observed with set_memory.c.
> >
> > Simply adding dump_stack() tells me my kernel on my machine does not use
> > set_memory.c for userspace mappings but Hmm I'll take a look.
>
> vc_slow_virt_to_phys() uses lookup_address_in_pgd() with user mappings, but that
> code is all but guaranteed to be buggy, e.g. doesn't guard against concurrent
> modifications to user mappings.
>
> show_fault_oops() can also call into lookup_address_in_pgd() with a user mapping,
> though at that point the kernel has bigger problems since it's executing from user
> memory.
>
> And up until commits 44187235cbcc ("KVM: x86/mmu: fix potential races when walking
> host page table") and 643d95aac59a ("Revert "x86/mm: Introduce lookup_address_in_mm()""),
> KVM had a similar bug.

Thanks for your helpful insight.

I was curious whether the set_memory*() helpers are used for user mappings.
From a quick look, ptrace() and uprobes (which need to update an application's
text) use kmap + memcpy or replace_page() instead of the set_memory*() API.

_lookup_address_cpa() uses init_mm.pgd when cpa.pgd is not specified,
and the only place that passes a pgd is the EFI subsystem (efi_mm.pgd), which
is not userspace.

So it is *obvious* that the set_memory*() functions should not be used
for user mappings, because that would result in updating only init_mm's
page table.

Therefore, answering the first question ('do we really need to unset
_PAGE_GLOBAL when clearing _PAGE_PRESENT in set_memory.c, to avoid confusing
it with _PAGE_PROTNONE?'):

we don't need to consider PROT_NONE semantics in set_memory.c,
because we don't (and shouldn't) change user mappings there, and
_PAGE_PROTNONE is not used for kernel mappings.

> Generally speaking, set_memory.c is not equipped to play nice with user mappings.
> It mostly "works", but there are races galore. IMO, hardening set_memory.c to scream
> if it's handed a user address or encounters _PAGE_USER PTEs would be a very good thing.

I agree that would be a good thing too.

Thanks,
Hyeonggon

2022-05-30 04:11:38

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote:
> On 5/12/22 22:33, Hyeonggon Yoo wrote:
> > Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c
> > expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear
> > in the code that _PAGE_PROTNONE is only used for user mappings.
> >
> > How does below look?
> >
> > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> > index 40497a9020c6..f8d02b91a90c 100644
> > --- a/arch/x86/include/asm/pgtable_types.h
> > +++ b/arch/x86/include/asm/pgtable_types.h
> > @@ -35,7 +35,10 @@
> > #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
> >
> > /* If _PAGE_BIT_PRESENT is clear, we use these: */
> > -/* - if the user mapped it with PROT_NONE; pte_present gives true */
> > +/*
> > + * if the user mapped it with PROT_NONE; pte_present gives true
> > + * this is only used for user mappings (with _PAGE_BIT_USER)
> > + */
> > #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL
> >
> > #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
> > @@ -115,7 +118,8 @@
> > #define _PAGE_DEVMAP (_AT(pteval_t, 0))
> > #endif
> >
> > -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
> > +#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \
> > + (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE))
> >
> > /*
> > * Set of bits not changed in pte_modify. The pte's
>
> I don't like the idea of _PAGE_BIT_USER being so implicit. It is
> something kernel users should know explicitly that they are messing with.
>
> I was thinking of something more along the lines of taking the
> set_memory.c code and ensuring that it never sets (or even observes)
> _PAGE_BIT_GLOBAL on a _PAGE_USER mapping. There was also a question of
> if set_memory.c is ever used on userspace mappings. It would be good to
> validate whether it's possible in-tree today and if not, enforce that
> _PAGE_USER PTEs should never even be observed with set_memory.c.

Writing code I'm a bit confused:
commit d1440b23c922d8 ("x86/mm: Factor out pageattr
_PAGE_GLOBAL setting") says:

"This unconditional setting of _PAGE_GLOBAL is a problem when we have
PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
not."

Is this sentence no longer valid with PTI, i.e. would unconditionally
setting _PAGE_GLOBAL on the kernel side be okay regardless of PTI?

I'm wondering because my earlier idea was "Let's not clear _PAGE_GLOBAL in
set_memory.c for kernel mappings, and make pmd/pte_present() not get confused
when _PAGE_USER is not set".

But you don't like that, as it's a bit implicit.
Then I wonder - how do we know when to set _PAGE_GLOBAL again?

> The arch/x86/mm/dump_pagetables.c code is also a reasonable place to put
> assumptions about the page tables since it walks *everything* when asked.

2022-06-03 05:11:36

by Dave Hansen

[permalink] [raw]
Subject: Re: Is _PAGE_PROTNONE set only for user mappings?

On 5/29/22 03:32, Hyeonggon Yoo wrote:
> On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote:
> Writing code I'm a bit confused:
> commit d1440b23c922d8 ("x86/mm: Factor out pageattr
> _PAGE_GLOBAL setting") says:
>
> "This unconditional setting of _PAGE_GLOBAL is a problem when we have
> PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
> not."
>
> Is this sentence no longer valid with PTI, i.e. would unconditionally
> setting _PAGE_GLOBAL on the kernel side be okay regardless of PTI?

I believe it's still valid.

IIRC, there are three cases:

1. No KPTI. All kernel mappings are _PAGE_GLOBAL. Basically, for
present mappings, if _PAGE_USER is clear, _PAGE_GLOBAL is set.
2. KPTI with PCID hardware support (or in a few other cases): The kernel
image is mostly non-global. Anything mapped into userspace *is*
marked global, like entry text.
3. KPTI without PCIDs: Basically case #2, but with more of the kernel
image left global.

So, not only are there different KPTI modes, there are also different parts
of the kernel that require different _PAGE_GLOBAL behavior.

pti_kernel_image_global_ok() in arch/x86/mm/pti.c explains it pretty well.