2018-02-08 12:56:36

by Kirill A. Shutemov

Subject: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

AMD SME claims one bit of the physical address to indicate whether the
page is encrypted or not. To achieve that, we clear the bit out of
__PHYSICAL_MASK.

The capability to adjust __PHYSICAL_MASK is required beyond AMD SME.
For instance, the upcoming Intel Multi-Key Total Memory Encryption
(MKTME) needs it as well.

Let's factor it out into a separate feature with its own Kconfig option.

It also reduces the overhead of AMD SME. It saves more than 3k of .text
on defconfig + AMD_MEM_ENCRYPT:

add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)

We will need to revisit this once we have infrastructure to patch
constants in code; this is a good candidate for it.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/Kconfig | 4 ++++
arch/x86/boot/compressed/pagetable.c | 3 +++
arch/x86/include/asm/page_types.h | 8 +++++++-
arch/x86/mm/mem_encrypt.c | 3 +++
arch/x86/mm/pgtable.c | 5 +++++
5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b52cdf48ad26..ffd9ef3f6ca6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -332,6 +332,9 @@ config ARCH_SUPPORTS_UPROBES
config FIX_EARLYCON_MEM
def_bool y

+config DYNAMIC_PHYSICAL_MASK
+ bool
+
config PGTABLE_LEVELS
int
default 5 if X86_5LEVEL
@@ -1469,6 +1472,7 @@ config ARCH_HAS_MEM_ENCRYPT
config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+ select DYNAMIC_PHYSICAL_MASK
---help---
Say yes to enable support for the encryption of system memory.
This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index b5e5e02f8cde..4318ac0af815 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -16,6 +16,9 @@
#define __pa(x) ((unsigned long)(x))
#define __va(x) ((void *)((unsigned long)(x)))

+/* No need for an adjustable __PHYSICAL_MASK during the decompression phase */
+#undef CONFIG_DYNAMIC_PHYSICAL_MASK
+
/*
* The pgtable.h and mm/ident_map.c includes make use of the SME related
* information which is not used in the compressed image support. Un-define
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 1e53560a84bb..c85e15010f48 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -17,7 +17,6 @@
#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))

-#define __PHYSICAL_MASK ((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))
#define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1)

/* Cast *PAGE_MASK to a signed type so that it is sign-extended if
@@ -55,6 +54,13 @@

#ifndef __ASSEMBLY__

+#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
+extern phys_addr_t physical_mask;
+#define __PHYSICAL_MASK physical_mask
+#else
+#define __PHYSICAL_MASK ((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#endif
+
extern int devmem_is_allowed(unsigned long pagenr);

extern unsigned long max_low_pfn_mapped;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 1a53071e2e17..18954f97f3da 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -999,6 +999,7 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
/* SEV state cannot be controlled by a command line option */
sme_me_mask = me_mask;
sev_enabled = true;
+ physical_mask &= ~sme_me_mask;
return;
}

@@ -1033,4 +1034,6 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
sme_me_mask = 0;
else
sme_me_mask = active_by_default ? me_mask : 0;
+
+ physical_mask &= ~sme_me_mask;
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 004abf9ebf12..a4dfe85f2fd8 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -7,6 +7,11 @@
#include <asm/fixmap.h>
#include <asm/mtrr.h>

+#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
+phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+EXPORT_SYMBOL(physical_mask);
+#endif
+
#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)

#ifdef CONFIG_HIGHPTE
--
2.15.1



2018-02-14 04:11:56

by Tom Lendacky

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
> AMD SME claims one bit from physical address to indicate whether the
> page is encrypted or not. To achieve that we clear out the bit from
> __PHYSICAL_MASK.

I was actually working on a suggestion by Linus to use one of the software
page table bits to indicate encryption and translate that to the hardware
bit when writing the actual page table entry. With that, __PHYSICAL_MASK
would go back to its original definition.

Thanks,
Tom

>
> The capability to adjust __PHYSICAL_MASK is required beyond AMD SME.
> For instance, the upcoming Intel Multi-Key Total Memory Encryption
> (MKTME) needs it as well.
>
> Let's factor it out into a separate feature with its own Kconfig option.
>
> It also reduces the overhead of AMD SME. It saves more than 3k of .text
> on defconfig + AMD_MEM_ENCRYPT:
>
> add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)
>
> We will need to revisit this once we have infrastructure to patch
> constants in code; this is a good candidate for it.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> arch/x86/Kconfig | 4 ++++
> arch/x86/boot/compressed/pagetable.c | 3 +++
> arch/x86/include/asm/page_types.h | 8 +++++++-
> arch/x86/mm/mem_encrypt.c | 3 +++
> arch/x86/mm/pgtable.c | 5 +++++
> 5 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b52cdf48ad26..ffd9ef3f6ca6 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -332,6 +332,9 @@ config ARCH_SUPPORTS_UPROBES
> config FIX_EARLYCON_MEM
> def_bool y
>
> +config DYNAMIC_PHYSICAL_MASK
> + bool
> +
> config PGTABLE_LEVELS
> int
> default 5 if X86_5LEVEL
> @@ -1469,6 +1472,7 @@ config ARCH_HAS_MEM_ENCRYPT
> config AMD_MEM_ENCRYPT
> bool "AMD Secure Memory Encryption (SME) support"
> depends on X86_64 && CPU_SUP_AMD
> + select DYNAMIC_PHYSICAL_MASK
> ---help---
> Say yes to enable support for the encryption of system memory.
> This requires an AMD processor that supports Secure Memory
> diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
> index b5e5e02f8cde..4318ac0af815 100644
> --- a/arch/x86/boot/compressed/pagetable.c
> +++ b/arch/x86/boot/compressed/pagetable.c
> @@ -16,6 +16,9 @@
> #define __pa(x) ((unsigned long)(x))
> #define __va(x) ((void *)((unsigned long)(x)))
>
> +/* No need for an adjustable __PHYSICAL_MASK during the decompression phase */
> +#undef CONFIG_DYNAMIC_PHYSICAL_MASK
> +
> /*
> * The pgtable.h and mm/ident_map.c includes make use of the SME related
> * information which is not used in the compressed image support. Un-define
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index 1e53560a84bb..c85e15010f48 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -17,7 +17,6 @@
> #define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
> #define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
>
> -#define __PHYSICAL_MASK ((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))
> #define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1)
>
> /* Cast *PAGE_MASK to a signed type so that it is sign-extended if
> @@ -55,6 +54,13 @@
>
> #ifndef __ASSEMBLY__
>
> +#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
> +extern phys_addr_t physical_mask;
> +#define __PHYSICAL_MASK physical_mask
> +#else
> +#define __PHYSICAL_MASK ((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
> +#endif
> +
> extern int devmem_is_allowed(unsigned long pagenr);
>
> extern unsigned long max_low_pfn_mapped;
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 1a53071e2e17..18954f97f3da 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -999,6 +999,7 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
> /* SEV state cannot be controlled by a command line option */
> sme_me_mask = me_mask;
> sev_enabled = true;
> + physical_mask &= ~sme_me_mask;
> return;
> }
>
> @@ -1033,4 +1034,6 @@ void __init __nostackprotector sme_enable(struct boot_params *bp)
> sme_me_mask = 0;
> else
> sme_me_mask = active_by_default ? me_mask : 0;
> +
> + physical_mask &= ~sme_me_mask;
> }
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 004abf9ebf12..a4dfe85f2fd8 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -7,6 +7,11 @@
> #include <asm/fixmap.h>
> #include <asm/mtrr.h>
>
> +#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
> +phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> +EXPORT_SYMBOL(physical_mask);
> +#endif
> +
> #define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)
>
> #ifdef CONFIG_HIGHPTE
>

2018-02-14 04:22:18

by Kirill A. Shutemov

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On Tue, Feb 13, 2018 at 10:10:22PM -0600, Tom Lendacky wrote:
> On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
> > AMD SME claims one bit from physical address to indicate whether the
> > page is encrypted or not. To achieve that we clear out the bit from
> > __PHYSICAL_MASK.
>
> I was actually working on a suggestion by Linus to use one of the software
> page table bits to indicate encryption and translate that to the hardware
> bit when writing the actual page table entry. With that, __PHYSICAL_MASK
> would go back to its original definition.

But you would need to mask it when reading the pfn from a page table entry,
right? I expect that to have more overhead than this approach.

And software bits are valuable. Do we still have a spare one for this?

--
Kirill A. Shutemov

2018-02-14 04:58:44

by Tom Lendacky

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On 2/13/2018 10:21 PM, Kirill A. Shutemov wrote:
> On Tue, Feb 13, 2018 at 10:10:22PM -0600, Tom Lendacky wrote:
>> On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
>>> AMD SME claims one bit from physical address to indicate whether the
>>> page is encrypted or not. To achieve that we clear out the bit from
>>> __PHYSICAL_MASK.
>>
>> I was actually working on a suggestion by Linus to use one of the software
>> page table bits to indicate encryption and translate that to the hardware
>> bit when writing the actual page table entry. With that, __PHYSICAL_MASK
>> would go back to its original definition.
>
> But you would need to mask it on reading of pfn from page table entry,
> right? I expect it to have more overhead than this one.

When reading back an entry, it would translate the hardware bit position
back to the software bit position. The suggestion for changing it was
to make _PAGE_ENC a constant that is not tied to the sme_me_mask.

See https://marc.info/?l=linux-kernel&m=151017622615894&w=2

>
> And software bits are valuable. Do we still have a spare one for this?

I was looking at possibly using bit 57 (_PAGE_BIT_SOFTW5).

Thanks,
Tom

>

2018-02-14 07:31:18

by Kai Huang

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On Tue, 2018-02-13 at 22:57 -0600, Tom Lendacky wrote:
> On 2/13/2018 10:21 PM, Kirill A. Shutemov wrote:
> > On Tue, Feb 13, 2018 at 10:10:22PM -0600, Tom Lendacky wrote:
> > > On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
> > > > AMD SME claims one bit from physical address to indicate
> > > > whether the
> > > > page is encrypted or not. To achieve that we clear out the bit
> > > > from
> > > > __PHYSICAL_MASK.
> > >
> > > I was actually working on a suggestion by Linus to use one of the
> > > software
> > > page table bits to indicate encryption and translate that to the
> > > hardware
> > > bit when writing the actual page table entry. With that,
> > > __PHYSICAL_MASK
> > > would go back to its original definition.
> >
> > But you would need to mask it on reading of pfn from page table
> > entry,
> > right? I expect it to have more overhead than this one.
>
> When reading back an entry it would translate the hardware bit
> position
> back to the software bit position. The suggestion for changing it
> was
> to make _PAGE_ENC a constant and not tied to the sme_me_mask.
>
> See https://marc.info/?l=linux-kernel&m=151017622615894&w=2
>
> >
> > And software bits are valuable. Do we still have a spare one for
> > this?
>
> I was looking at possibly using bit 57 (_PAGE_BIT_SOFTW5).

But MK-TME supports up to 15 bits (architecturally) as the keyID. How is
this supposed to work with MK-TME?

Thanks,
-Kai
>
> Thanks,
> Tom
>
> >

2018-02-14 09:03:42

by Kirill A. Shutemov

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On Wed, Feb 14, 2018 at 08:30:20PM +1300, Kai Huang wrote:
> On Tue, 2018-02-13 at 22:57 -0600, Tom Lendacky wrote:
> > On 2/13/2018 10:21 PM, Kirill A. Shutemov wrote:
> > > On Tue, Feb 13, 2018 at 10:10:22PM -0600, Tom Lendacky wrote:
> > > > On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
> > > > > AMD SME claims one bit from physical address to indicate
> > > > > whether the
> > > > > page is encrypted or not. To achieve that we clear out the bit
> > > > > from
> > > > > __PHYSICAL_MASK.
> > > >
> > > > I was actually working on a suggestion by Linus to use one of the
> > > > software
> > > > page table bits to indicate encryption and translate that to the
> > > > hardware
> > > > bit when writing the actual page table entry. With that,
> > > > __PHYSICAL_MASK
> > > > would go back to its original definition.
> > >
> > > But you would need to mask it on reading of pfn from page table
> > > entry,
> > > right? I expect it to have more overhead than this one.
> >
> > When reading back an entry it would translate the hardware bit
> > position
> > back to the software bit position. The suggestion for changing it
> > was
> > to make _PAGE_ENC a constant and not tied to the sme_me_mask.

But is it really constant? I thought it was enumerated at boot time.
Could we run into a problem on future AMD CPUs?

In case of MKTME the bits we need to clear are not constant. Depends on
CPU and BIOS settings.

By making _PAGE_ENC constant we would effectively lower the maximum physical
address space the kernel can handle, regardless of whether the system has SME
enabled. I can imagine some people wouldn't be happy about that.

And I think it would collide with 5-level paging.

I would leave it as a variable for now and revisit this once we have
infrastructure to patch constants in kernel text.

--
Kirill A. Shutemov

2018-02-14 14:32:31

by Tom Lendacky

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On 2/14/2018 1:30 AM, Kai Huang wrote:
> On Tue, 2018-02-13 at 22:57 -0600, Tom Lendacky wrote:
>> On 2/13/2018 10:21 PM, Kirill A. Shutemov wrote:
>>> On Tue, Feb 13, 2018 at 10:10:22PM -0600, Tom Lendacky wrote:
>>>> On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
>>>>> AMD SME claims one bit from physical address to indicate
>>>>> whether the
>>>>> page is encrypted or not. To achieve that we clear out the bit
>>>>> from
>>>>> __PHYSICAL_MASK.
>>>>
>>>> I was actually working on a suggestion by Linus to use one of the
>>>> software
>>>> page table bits to indicate encryption and translate that to the
>>>> hardware
>>>> bit when writing the actual page table entry. With that,
>>>> __PHYSICAL_MASK
>>>> would go back to its original definition.
>>>
>>> But you would need to mask it on reading of pfn from page table
>>> entry,
>>> right? I expect it to have more overhead than this one.
>>
>> When reading back an entry it would translate the hardware bit
>> position
>> back to the software bit position. The suggestion for changing it
>> was
>> to make _PAGE_ENC a constant and not tied to the sme_me_mask.
>>
>> See https://marc.info/?l=linux-kernel&m=151017622615894&w=2
>>
>>>
>>> And software bits are valuable. Do we still have a spare one for
>>> this?
>>
>> I was looking at possibly using bit 57 (_PAGE_BIT_SOFTW5).
>
> But MK-TME supports up to 15 bits (architecturally) as the keyID. How is
> this supposed to work with MK-TME?

I didn't know about MK-TME when I first started looking at this. Is there
any way to still use just the single bit to indicate encryption and then
have logic that provides the proper bits when actually writing to the page
table? I'm not sure what it would take, but it might be worth looking
into.

Thanks,
Tom

>
> Thanks,
> -Kai
>>
>> Thanks,
>> Tom
>>
>>>

2018-02-14 15:11:11

by Tom Lendacky

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On 2/14/2018 3:02 AM, Kirill A. Shutemov wrote:
> On Wed, Feb 14, 2018 at 08:30:20PM +1300, Kai Huang wrote:
>> On Tue, 2018-02-13 at 22:57 -0600, Tom Lendacky wrote:
>>> On 2/13/2018 10:21 PM, Kirill A. Shutemov wrote:
>>>> On Tue, Feb 13, 2018 at 10:10:22PM -0600, Tom Lendacky wrote:
>>>>> On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
>>>>>> AMD SME claims one bit from physical address to indicate
>>>>>> whether the
>>>>>> page is encrypted or not. To achieve that we clear out the bit
>>>>>> from
>>>>>> __PHYSICAL_MASK.
>>>>>
>>>>> I was actually working on a suggestion by Linus to use one of the
>>>>> software
>>>>> page table bits to indicate encryption and translate that to the
>>>>> hardware
>>>>> bit when writing the actual page table entry. With that,
>>>>> __PHYSICAL_MASK
>>>>> would go back to its original definition.
>>>>
>>>> But you would need to mask it on reading of pfn from page table
>>>> entry,
>>>> right? I expect it to have more overhead than this one.
>>>
>>> When reading back an entry it would translate the hardware bit
>>> position
>>> back to the software bit position. The suggestion for changing it
>>> was
>>> to make _PAGE_ENC a constant and not tied to the sme_me_mask.
>
> But is it really constant? I thought it's enumerated at boot-time.
> Can we step onto a problem for future AMD CPUs?

_PAGE_ENC would be constant and it would be translated to the actual bit
that was enumerated at boot-time when writing the page table entry and
translated back to _PAGE_ENC when reading the page table entry.

>
> In case of MKTME the bits we need to clear are not constant. Depends on
> CPU and BIOS settings.
>
> By making _PAGE_ENC constant we would effectively lower maximum physical
> address space the kernel can handle, regardless if the system has SME
> enabled. I can imagine some people wouldn't be happy about this.

I don't see how this would lower the maximum physical address space the
kernel could handle. Bit 57 is part of the reserved page table flag
bits and if SME is not enabled the hardware bits are never used.

What I do see as a problem is a kernel built with support for SME, and
therefore _PAGE_ENC is not zero, but SME has not been enabled by the BIOS
or mem_encrypt=off is specified. In this case you can never be certain
that the translation from software bit to hardware bit and back is
correct. Take for example, pmd_bad(). Here, _KERNPG_TABLE would have a
non-zero _PAGE_ENC or'd into it. When written to a page table entry when
SME is not enabled/active, the actual hardware encryption bit would not be
set. When reading back the value, since the hardware encryption bit is
not set, the translation to set _PAGE_ENC bit won't be done and the
comparison to _KERNPG_TABLE would fail. Of course we could just eliminate
_PAGE_ENC from the comparison...

>
> And I think it would collide with 5-level paging.

Does 5-level paging remove bit 57 from the reserved flags?

>
> I would leave it as variable for now and look on this later once we would
> have infrastructure to patch constants in kernel text.

If the MK-TME support is going to use the same approach of including the
mask/bits in _PAGE_ENC at run time, then maybe it would be best to get that
in first and then look to see if something could be done along the lines
of what Linus suggests or with the patchable constants.

Thanks,
Tom

>

2018-02-14 20:01:09

by Kirill A. Shutemov

Subject: Re: [PATCH] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

On Wed, Feb 14, 2018 at 09:09:05AM -0600, Tom Lendacky wrote:
> On 2/14/2018 3:02 AM, Kirill A. Shutemov wrote:
> > On Wed, Feb 14, 2018 at 08:30:20PM +1300, Kai Huang wrote:
> >> On Tue, 2018-02-13 at 22:57 -0600, Tom Lendacky wrote:
> >>> On 2/13/2018 10:21 PM, Kirill A. Shutemov wrote:
> >>>> On Tue, Feb 13, 2018 at 10:10:22PM -0600, Tom Lendacky wrote:
> >>>>> On 2/8/2018 6:55 AM, Kirill A. Shutemov wrote:
> >>>>>> AMD SME claims one bit from physical address to indicate
> >>>>>> whether the
> >>>>>> page is encrypted or not. To achieve that we clear out the bit
> >>>>>> from
> >>>>>> __PHYSICAL_MASK.
> >>>>>
> >>>>> I was actually working on a suggestion by Linus to use one of the
> >>>>> software
> >>>>> page table bits to indicate encryption and translate that to the
> >>>>> hardware
> >>>>> bit when writing the actual page table entry. With that,
> >>>>> __PHYSICAL_MASK
> >>>>> would go back to its original definition.
> >>>>
> >>>> But you would need to mask it on reading of pfn from page table
> >>>> entry,
> >>>> right? I expect it to have more overhead than this one.
> >>>
> >>> When reading back an entry it would translate the hardware bit
> >>> position
> >>> back to the software bit position. The suggestion for changing it
> >>> was
> >>> to make _PAGE_ENC a constant and not tied to the sme_me_mask.
> >
> > But is it really constant? I thought it's enumerated at boot-time.
> > Can we step onto a problem for future AMD CPUs?
>
> _PAGE_ENC would be constant and it would be translated to the actual bit
> that was enumerated at boot-time when writing the page table entry and
> translated back to _PAGE_ENC when reading the page table entry.

I'm confused. I thought that _PAGE_ENC would be the actual bit that
indicates encryption.

If we need translation, I don't see how pte_pfn() can be implemented
efficiently. Getting the pfn from a page table entry is the path I'm worried
about. It's used all over the kernel, and overhead there comes at a cost.

Could you write a prototype of such a patch? I'm having a hard time grasping
your idea.

> > In case of MKTME the bits we need to clear are not constant. Depends on
> > CPU and BIOS settings.
> >
> > By making _PAGE_ENC constant we would effectively lower maximum physical
> > address space the kernel can handle, regardless if the system has SME
> > enabled. I can imagine some people wouldn't be happy about this.
>
> I don't see how this would lower the maximum physical address space the
> kernel could handle. Bit 57 is part of the reserved page table flag
> bits and if SME is not enabled the hardware bits are never used.

My statement came from confusion about _PAGE_ENC value.

> What I do see as a problem is a kernel built with support for SME, and
> therefore _PAGE_ENC is not zero, but SME has not been enabled by the BIOS
> or mem_encrypt=off is specified. In this case you can never be certain
> that the translation from software bit to hardware bit and back is
> correct. Take for example, pmd_bad(). Here, _KERNPG_TABLE would have a
> non-zero _PAGE_ENC or'd into it. When written to a page table entry when
> SME is not enabled/active, the actual hardware encryption bit would not be
> set. When reading back the value, since the hardware encryption bit is
> not set, the translation to set _PAGE_ENC bit won't be done and the
> comparison to _KERNPG_TABLE would fail. Of course we could just eliminate
> _PAGE_ENC from the comparison...
>
> >
> > And I think it would collide with 5-level paging.
>
> Does 5-level paging remove bit 57 from the reserved flags?

No. The same confusion.

> > I would leave it as variable for now and look on this later once we would
> > have infrastructure to patch constants in kernel text.
>
> If the MK-TME support is going to use the same approach to include the
> mask/bits real time in _PAGE_ENC then maybe it would be best to get that
> in first and then look to see if something could be done along the lines
> of what Linus suggests or with the patchable constants.

--
Kirill A. Shutemov