Hello,
While addressing community's review comments about how PTI treats Global
bit in kernel page table, I noticed most part of the direct mapping in
the kernel page table has the Global bit set, while I think the intent
is to unset Global bit when PTI is on for the majority mappings.
e.g. on my desktop with i7-7700K/32G memory, the following part has
Global bit set according to /sys/kernel/debug/page_tables/kernel:
---[ Low Kernel Mapping ]---
... ...
0xffff888140000000-0xffff888840000000 28G RW PSE GLB NX pud
... ...
There are also many other parts in the direct mapping that have Global
bit set, I just showed the biggest part.
Leaving these entries with Global bit set doesn't look good, as Global
means when switching from kernel to user mode, this entry, if has
been used in kernel mode, can remain in TLB and user can potentially
make use of this entry to speculatively access memory that they shouldn't
access, reducing the effect of clearing kernel part in user page tables.
I tracked this issue to commit c164fbb40c43f("x86/mm: thread
pgprot_t through init_memory_mapping()") where it used
__PAGE_KERNEL_LARGE instead of PAGE_KERNEL_LARGE for PUD mapping.
__PAGE_KERNEL_LARGE is a raw value, it doesn't get the mask of
__default_kernel_pte_mask, hence leaving Global bit on these PUD
mappings(and mappings that are split from these PUD mappings).
While to see if this left Global bit can really make bad things happen,
I wrote a test code based on secret.c from:
https://github.com/IAIK/meltdown.
In less than 10 tries, the physical_reader program can recover the
secret strings set in the secret2.c program so I think this issue should
be fixed. I run the two programs like this:
# start secret2 from one terminal, bind it to cpu2
$ sudo taskset -c 2 ./secret2
[+] Secret: Burn after reading this string, it is a secret stringce and kernel
[+] Physical address of secret: 0x120121000
[+] Exit with Ctrl+C if you are done reading the secret
# in another terminal, run physical_reader. make sure to bind to the
# same cpu so that it can see the TLB for the page where secret string
# resides:
$ sudo taskset -c 2 ./physical_reader 0x120121000 0xffff888000000000
[+] Physical address : 0x120121000
[+] Physical offset : 0xffff888000000000
[+] Reading virtual address: 0xffff888120121000
)
Burn after reading this string, it is a secret stringce and kern^C
This patch fixes this problem by using proper mask when setting PUD
mappings.
Aaron Lu (1):
x86/mm: Use proper mask when setting PUD mapping
arch/x86/mm/init_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.37.1
commit c164fbb40c43f("x86/mm: thread pgprot_t through
init_memory_mapping()") mistakenly used __pgprot() which doesn't
respect __default_kernel_pte_mask when setting PUD mapping, fix it
by using __pgprot_mask().
Fixes: c164fbb40c43f("x86/mm: thread pgprot_t through init_memory_mapping()")
Signed-off-by: Aaron Lu <[email protected]>
---
arch/x86/mm/init_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 39c5246964a9..a7238f17df47 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -645,7 +645,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
pages++;
spin_lock(&init_mm.page_table_lock);
- prot = __pgprot(pgprot_val(prot) | __PAGE_KERNEL_LARGE);
+ prot = __pgprot_mask(pgprot_val(prot) | __PAGE_KERNEL_LARGE);
set_pte_init((pte_t *)pud,
pfn_pte((paddr & PUD_MASK) >> PAGE_SHIFT,
--
2.37.1
On Thu, Aug 18, 2022 at 7:30 PM Aaron Lu <[email protected]> wrote:
>
> - prot = __pgprot(pgprot_val(prot) | __PAGE_KERNEL_LARGE);
> + prot = __pgprot_mask(pgprot_val(prot) | __PAGE_KERNEL_LARGE);
The patch looks "ObviouslyCorrect(tm)" to me, but I have to admit that
I absolutely hate how we use the pte helpers for all the levels.
It gets even worse when we do that
set_pte_init((pte_t *)pud,
pfn_pte((paddr & PUD_MASK) >> PAGE_SHIFT,
prot),
init);
on the next lines, and I don't understand why this doesn't use
"set_pud_init()" here.
It's probably something obvious, like "using set_pud_init() would mean
that we'd have to cast the *second* argument instead, because we don't
have a pfd_pud() function".
But it all makes me go a bit eww, and also makes me suspect I am
missing something else too, and that my "this looks
ObviouslyCorrect(tm)" is thus worthless.
Also, I don't understand why we use that __PAGE_KERNEL_LARGE at all.
We already have a valid set of protection bits, that had gotten
properly masked previously.
Isn't the only bit we actually want to set "_PAGE_PSE"?
IOW, I get the feeling that that patch should instead just be
- prot = __pgprot(pgprot_val(prot) | __PAGE_KERNEL_LARGE);
+ prot = __pgprot(pgprot_val(prot) | _PAGE_PSE);
and we should never need to mask anything off at all with
__pgprot_mask() - the bug was really that we set way too many bits.
But again, I *also* have the feeling that I'm missing something important.
Ingo, Thomas, any others who know this code better than me by now - comments?
Linus
On Fri, Aug 19, 2022 at 10:30:01AM +0800, Aaron Lu wrote:
> commit c164fbb40c43f("x86/mm: thread pgprot_t through
> init_memory_mapping()") mistakenly used __pgprot() which doesn't
> respect __default_kernel_pte_mask when setting PUD mapping, fix it
> by using __pgprot_mask().
>
> Fixes: c164fbb40c43f("x86/mm: thread pgprot_t through init_memory_mapping()")
Nit, you need a space before the '(' character or the linux-next scripts
will complain :(
Also, you cc:ed [email protected] and lkml. As this is public, no need to
copy [email protected] at all.
thanks,
greg k-h