Remove a redundant check on kernel code's PMD _PAGE_PRESENT attribute
before fix up.

The current process looks like this:

	pmd in [0, _text)
		unset _PAGE_PRESENT
	pmd in [_text, _end]
		if (_PAGE_PRESENT)
			fix up delta
	pmd in (_end, 512)
		unset _PAGE_PRESENT

level2_kernel_pgt is compiled with _PAGE_PRESENT set, so the check is
redundant.

Signed-off-by: Wei Yang <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Kirill A. Shutemov <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Steve Wahl <[email protected]>
---
v3: refine the change log per Kirill's comment
v2: adjust the change log to emphasize the redundant check
---
arch/x86/kernel/head64.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a817ed0724d1..bac33ec19aa2 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	/* fixup pages that are part of the kernel image */
 	for (; i <= pmd_index((unsigned long)_end); i++)
-		if (pmd[i] & _PAGE_PRESENT)
-			pmd[i] += load_delta;
+		pmd[i] += load_delta;
 	/* invalidate pages after the kernel image */
 	for (; i < PTRS_PER_PMD; i++)
--
2.34.1
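
For readers following the change log, the three loops can be modelled in user space. This is only a sketch under stated assumptions: the mock_* names, the 512-entry table, and the bit-0 present flag are illustrative stand-ins, not the kernel's definitions, and the indexes of _text and _end are passed in as plain numbers. It compares the guarded and unguarded fixup on a table whose entries all start out present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; they mirror x86-64 values but are assumptions here. */
#define MOCK_PTRS_PER_PMD 512
#define MOCK_PAGE_PRESENT 0x1ULL
#define MOCK_PMD_SHIFT    21

typedef uint64_t mock_pmd_t;

/* Fill every slot with a present 2 MiB mapping (a fully populated table). */
static void mock_fill_present(mock_pmd_t *pmd)
{
	for (size_t i = 0; i < MOCK_PTRS_PER_PMD; i++)
		pmd[i] = ((mock_pmd_t)i << MOCK_PMD_SHIFT) | MOCK_PAGE_PRESENT;
}

/*
 * Walk the table in the three phases from the change log.
 * first/last stand in for the PMD indexes of _text and _end (hypothetical
 * inputs). If check_present is nonzero, the middle loop keeps the old guard.
 */
static void mock_fixup(mock_pmd_t *pmd, size_t first, size_t last,
		       uint64_t load_delta, int check_present)
{
	size_t i;

	assert(first <= last && last < MOCK_PTRS_PER_PMD);

	/* invalidate pages before the kernel image */
	for (i = 0; i < first; i++)
		pmd[i] &= ~MOCK_PAGE_PRESENT;

	/* fixup pages that are part of the kernel image */
	for (; i <= last; i++)
		if (!check_present || (pmd[i] & MOCK_PAGE_PRESENT))
			pmd[i] += load_delta;

	/* invalidate pages after the kernel image */
	for (; i < MOCK_PTRS_PER_PMD; i++)
		pmd[i] &= ~MOCK_PAGE_PRESENT;
}

/* Return 1 if both variants leave identical tables for a fully present input. */
static int mock_variants_agree(size_t first, size_t last, uint64_t load_delta)
{
	static mock_pmd_t a[MOCK_PTRS_PER_PMD], b[MOCK_PTRS_PER_PMD];

	mock_fill_present(a);
	mock_fill_present(b);
	mock_fixup(a, first, last, load_delta, 1);
	mock_fixup(b, first, last, load_delta, 0);

	for (size_t i = 0; i < MOCK_PTRS_PER_PMD; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}
```

In this model the two variants agree only because the input table has no non-present entry inside [first, last]; the sketch says nothing about whether that holds in the real boot path.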
On Thu, May 23, 2024 at 12:35:39PM +0000, Wei Yang wrote:
> Remove a redundant check on kernel code's PMD _PAGE_PRESENT attribute
> before fix up.
>
> Current process looks like this:
>
> pmd in [0, _text)
> unset _PAGE_PRESENT
> pmd in [_text, _end]
> if (_PAGE_PRESENT)
> fix up delta
> pmd in (_end, 512)
> unset _PAGE_PRESENT
>
> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
> redundant
>
> Signed-off-by: Wei Yang <[email protected]>
> CC: Thomas Gleixner <[email protected]>
> CC: Kirill A. Shutemov <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Steve Wahl <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
--
Kiryl Shutsemau / Kirill A. Shutemov
On 5/23/24 05:35, Wei Yang wrote:
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>
>  	/* fixup pages that are part of the kernel image */
>  	for (; i <= pmd_index((unsigned long)_end); i++)
> -		if (pmd[i] & _PAGE_PRESENT)
> -			pmd[i] += load_delta;
> +		pmd[i] += load_delta;
So, I think this is correct. But, man, I wish folks would go through
the git history and make it clear that they understand _how_ the code
got the way it is.
I suspect that the original _PAGE_PRESENT check wasn't even necessary if
cleanup_highmap() really did fix things up. But this commit:
2aa85f246c18 ("x86/boot/64: Make level2_kernel_pgt pages invalid
outside kernel area")
tweaked things to actively clear out PMDs that weren't populated in
Kirill's original loop. It didn't touch the _PAGE_PRESENT check. But
it certainly did imply that the PMD doesn't have any holes in it and
there's nothing in the middle that needs _PAGE_PRESENT cleared.
> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
> redundant
This isn't super reassuring. It also depends on nothing having munged
the page tables up to this point. The code is also a bit cruel in that
it manipulates two different sets of PMDs with the same 'pmd' variable.
Also, is this comment still accurate after '2aa85f246c18'?
> * Fixup the kernel text+data virtual addresses. Note that
> * we might write invalid pmds, when the kernel is relocated
> * cleanup_highmap() fixes this up along with the mappings
> * beyond _end.
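
The concern about holes can be made concrete with a small sketch of what each variant does to a single entry. The sketch_* names and constants are hypothetical, and the example assumes load_delta is a multiple of 2 MiB so that its bit 0 is clear; that alignment is an assumption of this illustration, not something stated in the thread.

```c
#include <stdint.h>

#define SKETCH_PAGE_PRESENT 0x1ULL
#define SKETCH_PMD_SIZE     (1ULL << 21)	/* 2 MiB, assumed */

/* Old behaviour: only present entries are relocated. */
static uint64_t sketch_fixup_guarded(uint64_t entry, uint64_t load_delta)
{
	if (entry & SKETCH_PAGE_PRESENT)
		entry += load_delta;
	return entry;
}

/* Patched behaviour: every entry in the kernel range is relocated. */
static uint64_t sketch_fixup_unguarded(uint64_t entry, uint64_t load_delta)
{
	return entry + load_delta;
}

/* Nonzero if the entry has the present bit set. */
static int sketch_is_present(uint64_t entry)
{
	return (entry & SKETCH_PAGE_PRESENT) != 0;
}
```

For a present entry the two helpers return the same value. For an all-zero entry the guarded helper leaves it at zero, while the unguarded one stores load_delta; under the alignment assumption that value still has the present bit clear, so the entry stays non-present but is no longer zero.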
On Mon, Jun 03, 2024 at 11:50:06AM -0700, Dave Hansen wrote:
>On 5/23/24 05:35, Wei Yang wrote:
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>>
>>  	/* fixup pages that are part of the kernel image */
>>  	for (; i <= pmd_index((unsigned long)_end); i++)
>> -		if (pmd[i] & _PAGE_PRESENT)
>> -			pmd[i] += load_delta;
>> +		pmd[i] += load_delta;
>
>So, I think this is correct. But, man, I wish folks would go through
>the git history and make it clear that they understand _how_ the code
>got the way it is.
>
Dave
Thanks for your comment.
My first version listed the historical changes, but Thomas thought they
were not relevant, so I removed those descriptions.
https://lkml.org/lkml/2024/3/23/350
>I suspect that the original _PAGE_PRESENT check wasn't even necessary if
>cleanup_highmap() really did fix things up. But this commit:
>
> 2aa85f246c18 ("x86/boot/64: Make level2_kernel_pgt pages invalid
> outside kernel area")
>
>tweaked things to actively clear out PMDs that weren't populated in
>Kirill's original loop. It didn't touch the _PAGE_PRESENT check. But
>it certainly did imply that the PMD doesn't have any holes in it and
>there's nothing in the middle that needs _PAGE_PRESENT cleared.
>
As I mentioned in my first version, the original code was introduced by
commit 1ab60e0f72f7 ("[PATCH] x86-64: Relocatable Kernel Support").
The reason for the _PAGE_PRESENT check is that, at that time,
level2_kernel_pgt was defined as:
NEXT_PAGE(level2_kernel_pgt)
	/* 40MB kernel mapping. The kernel code cannot be bigger than that.
	   When you change this change KERNEL_TEXT_SIZE in page.h too. */
	/* (2^48-(2*1024*1024*1024)-((2^39)*511)-((2^30)*510)) = 0 */
	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC|_PAGE_GLOBAL,
		KERNEL_TEXT_SIZE/PMD_SIZE)
	/* Module mapping starts here */
	.fill	(PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0
Now it looks like this:
SYM_DATA_START_PAGE_ALIGNED(level2_kernel_pgt)
	/*
	 * Kernel high mapping.
	 *
	 * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
	 * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
	 * 512 MiB otherwise.
	 *
	 * (NOTE: after that starts the module area, see MODULES_VADDR.)
	 *
	 * This table is eventually used by the kernel during normal runtime.
	 * Care must be taken to clear out undesired bits later, like _PAGE_RW
	 * or _PAGE_GLOBAL in some cases.
	 */
	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
SYM_DATA_END(level2_kernel_pgt)
The difference is that in the original version, not every entry of
level2_kernel_pgt was defined with _PAGE_PRESENT set. I didn't dig into
which commit expanded level2_kernel_pgt to cover the full table, but I
think the check has been redundant since that point.
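
To put numbers on the two definitions, here is a rough sketch that counts how many slots a PMDS(...) of a given size would populate. The demo_* names are hypothetical, the 2 MiB PMD size and 512 slots are assumed x86-64 values, and the model simply zeroes every slot past the populated range; it does not model anything else that may follow PMDS in the real table.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PTRS_PER_PMD 512
#define DEMO_PMD_SIZE     (2ULL << 20)	/* 2 MiB, assumed */
#define DEMO_PAGE_PRESENT 0x1ULL

/*
 * Populate the first image_size / PMD_SIZE slots as present and zero the
 * rest, roughly what PMDS(...) followed by zero fill would produce.
 */
static void demo_build_table(uint64_t *pmd, uint64_t image_size)
{
	size_t populated = (size_t)(image_size / DEMO_PMD_SIZE);

	for (size_t i = 0; i < DEMO_PTRS_PER_PMD; i++)
		pmd[i] = i < populated ? (i * DEMO_PMD_SIZE) | DEMO_PAGE_PRESENT : 0;
}

/* Count how many slots have the present bit set for a given mapping size. */
static size_t demo_count_present(uint64_t image_size)
{
	uint64_t pmd[DEMO_PTRS_PER_PMD];
	size_t n = 0;

	demo_build_table(pmd, image_size);
	for (size_t i = 0; i < DEMO_PTRS_PER_PMD; i++)
		n += (pmd[i] & DEMO_PAGE_PRESENT) != 0;
	return n;
}
```

With the 40MB mapping from the old comment, only 40MB / 2MB = 20 slots are populated in this model and the remaining 492 are zero. With the sizes quoted in the current comment, 512 MiB populates 256 slots and 1 GiB populates all 512. What matters for the patch is whether every slot between _text and _end falls inside the populated range.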
>> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
>> redundant
>
>This isn't super reassuring. It also depends on nothing having munged
>the page tables up to this point. The code is also a bit cruel in that
>it manipulates two different sets of PMDs with the same 'pmd' variable.
>
>Also, is this comment still accurate after '2aa85f246c18'?
>
>> * Fixup the kernel text+data virtual addresses. Note that
>> * we might write invalid pmds, when the kernel is relocated
>> * cleanup_highmap() fixes this up along with the mappings
>> * beyond _end.
It sounds like this comment is no longer necessary. Would you prefer
that I remove it in the next version of this patch?
--
Wei Yang
Help you, Help me
On Mon, Jun 03, 2024 at 11:50:06AM -0700, Dave Hansen wrote:
>On 5/23/24 05:35, Wei Yang wrote:
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>>
>>  	/* fixup pages that are part of the kernel image */
>>  	for (; i <= pmd_index((unsigned long)_end); i++)
>> -		if (pmd[i] & _PAGE_PRESENT)
>> -			pmd[i] += load_delta;
>> +		pmd[i] += load_delta;
>
>So, I think this is correct. But, man, I wish folks would go through
>the git history and make it clear that they understand _how_ the code
>got the way it is.
>
>I suspect that the original _PAGE_PRESENT check wasn't even necessary if
>cleanup_highmap() really did fix things up. But this commit:
>
> 2aa85f246c18 ("x86/boot/64: Make level2_kernel_pgt pages invalid
> outside kernel area")
>
>tweaked things to actively clear out PMDs that weren't populated in
>Kirill's original loop. It didn't touch the _PAGE_PRESENT check. But
>it certainly did imply that the PMD doesn't have any holes in it and
>there's nothing in the middle that needs _PAGE_PRESENT cleared.
>
>> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
>> redundant
>
>This isn't super reassuring. It also depends on nothing having munged
>the page tables up to this point. The code is also a bit cruel in that
>it manipulates two different sets of PMDs with the same 'pmd' variable.
>
>Also, is this comment still accurate after '2aa85f246c18'?
>
>> * Fixup the kernel text+data virtual addresses. Note that
>> * we might write invalid pmds, when the kernel is relocated
>> * cleanup_highmap() fixes this up along with the mappings
>> * beyond _end.
Hi, Dave
Do you have any other suggestions? What should I do next?
--
Wei Yang
Help you, Help me