hi,
while browsing the page table setup code, I noticed the x86_64 head
code might not need the identity mappings at all.
It seems it's ok to switch it off completely from the beginning,
unless I'm missing something.
wbr,
jirka
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/kernel/head64.c | 10 ----------
arch/x86/kernel/head_64.S | 9 ++++-----
2 files changed, 4 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2d2673c..620a9c3 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -27,13 +27,6 @@
#include <asm/trampoline.h>
#include <asm/bios_ebda.h>
-static void __init zap_identity_mappings(void)
-{
- pgd_t *pgd = pgd_offset_k(0UL);
- pgd_clear(pgd);
- __flush_tlb_all();
-}
-
/* Don't add a printk in there. printk relies on the PDA which is not initialized
yet. */
static void __init clear_bss(void)
@@ -74,9 +67,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
/* clear bss before set_intr_gate with early_idt_handler */
clear_bss();
- /* Make NULL pointers segfault */
- zap_identity_mappings();
-
/* Cleanup the over mapped high alias */
cleanup_highmap();
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 239046b..c55e6fa 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -341,13 +341,12 @@ ENTRY(name)
.data
/*
- * This default setting generates an ident mapping at address 0x100000
- * and a mapping for the kernel that precisely maps virtual address
- * 0xffffffff80000000 to physical address 0x000000. (always using
- * 2Mbyte large pages provided by PAE mode)
+ * This default setting generates a mapping for the kernel that
+ * precisely maps virtual address 0xffffffff80000000 to physical
+ * address 0x000000. (always using 2Mbyte large pages provided
+ * by PAE mode)
*/
NEXT_PAGE(init_level4_pgt)
- .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
.org init_level4_pgt + L4_PAGE_OFFSET*8, 0
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
.org init_level4_pgt + L4_START_KERNEL*8, 0
--
1.7.1
Jiri Olsa <[email protected]> writes:
> hi,
>
> while browsing the page table setup code, I noticed the x86_64 head
> code might not need the identity mappings at all.
> It seems it's ok to switch it off completely from the beginning,
> unless I'm missing something.
Have you tested it?
I expect you will find that we need the identity mapping because
before we load this page table we are running with virt==phys
and we need the identity mapping retained in the new page table
so we can get to the instruction after movq %rax, %cr3.
Eric
On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
> Jiri Olsa <[email protected]> writes:
>
> > hi,
> >
> > while browsing the page table setup code, I noticed the x86_64 head
> > code might not need the identity mappings at all.
> > It seems it's ok to switch it off completely from the beginning,
> > unless I'm missing something.
>
> Have you tested it?
yes, I booted it with no problem
>
> I expect you will find that we need the identity mapping because
> before we load this page table we are running with virt==phys
> and we need the identity mapping retained in the new page table
> so we can get to the instruction after movq %rax, %cr3.
well, right after the page table setup, there's the following code,
which switches to the kernel-map addresses 0xffffffff80000000+
movq %rax, %cr3
/* Ensure I am executing from virtual addresses */
movq $1f, %rax
jmp *%rax
1:
and I found no other identity mapping usage after this point
jirka
On 02/11/2011 08:07 PM, Jiri Olsa wrote:
> On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
>> Jiri Olsa <[email protected]> writes:
>>
>>> hi,
>>>
>>> while browsing the page table setup code, I noticed the x86_64 head
>>> code might not need the identity mappings at all.
>>> It seems it's ok to switch it off completely from the beginning,
>>> unless I'm missing something.
>>
>> Have you tested it?
>
> yes, I booted it with no problem
>
Hi Jiri, just wondering: does hibernation still work after that?
Also, Xen may or may not need it, and walk_pgd_level
seems to use it as well, no?
--
Cyrill
On Fri, Feb 11, 2011 at 12:07 PM, Jiri Olsa <[email protected]> wrote:
> On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
>> Jiri Olsa <[email protected]> writes:
>>
>> > hi,
>> >
>> > while browsing the page table setup code, I noticed the x86_64 head
>> > code might not need the identity mappings at all.
>> > It seems it's ok to switch it off completely from the beginning,
>> > unless I'm missing something.
>>
>> Have you tested it?
>
> yes, I booted it with no problem
The only reason this doesn't crash is because the identity mappings
provided by the boot code are marked as global, and therefore might
not be flushed by simply loading cr3. The cpu can evict TLB entries
at any time though, so it's a bad idea to run without the identity
mappings even for the brief moment before jumping to the virtual
address.
--
Brian Gerst
On Fri, Feb 11, 2011 at 12:59:47PM -0500, Brian Gerst wrote:
> On Fri, Feb 11, 2011 at 12:07 PM, Jiri Olsa <[email protected]> wrote:
> > On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
> >> Jiri Olsa <[email protected]> writes:
> >>
> >> > hi,
> >> >
> >> > while browsing the page table setup code, I noticed the x86_64 head
> >> > code might not need the identity mappings at all.
> > >> > It seems it's ok to switch it off completely from the beginning,
> >> > unless I'm missing something.
> >>
> >> Have you tested it?
> >
> > yes, I booted it with no problem
>
> The only reason this doesn't crash is because the identity mappings
> provided by the boot code are marked as global, and therefore might
> not be flushed by simply loading cr3. The cpu can evict TLB entries
> at any time though, so it's a bad idea to run without the identity
> mappings even for the brief moment before jumping to the virtual
> address.
I added code for flushing whole TLB (including global pages) and it
still boots (attached).
I'm sorry if I'm missing something obvious (probably the TLB flushing
code is wrong), but I'd like to understand this part.
What instruction/action would require the identity mapping,
after the page table is set?
thanks (and again sry for noise :) )
jirka
---
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c55e6fa..073f489 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -165,6 +165,13 @@ ENTRY(secondary_startup_64)
movl $(X86_CR4_PAE | X86_CR4_PGE), %eax
movq %rax, %cr4
+ /* invalidate whole TLB */
+ movq %cr4, %rax
+ movq %rax, %rdx
+ andq $~X86_CR4_PGE, %rax
+ movq %rax, %cr4
+ movq %rdx, %cr4
+
/* Setup early boot stage 4 level pagetables. */
movq $(init_level4_pgt - __START_KERNEL_map), %rax
addq phys_base(%rip), %rax
On Fri, Feb 11, 2011 at 2:13 PM, Jiri Olsa <[email protected]> wrote:
> On Fri, Feb 11, 2011 at 12:59:47PM -0500, Brian Gerst wrote:
>> On Fri, Feb 11, 2011 at 12:07 PM, Jiri Olsa <[email protected]> wrote:
>> > On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
>> >> Jiri Olsa <[email protected]> writes:
>> >>
>> >> > hi,
>> >> >
>> >> > while browsing the page table setup code, I noticed the x86_64 head
>> >> > code might not need the identity mappings at all.
>> >> > It seems it's ok to switch it off completely from the beginning,
>> >> > unless I'm missing something.
>> >>
>> >> Have you tested it?
>> >
>> > yes, I booted it with no problem
>>
>> The only reason this doesn't crash is because the identity mappings
>> provided by the boot code are marked as global, and therefore might
>> not be flushed by simply loading cr3. The cpu can evict TLB entries
>> at any time though, so it's a bad idea to run without the identity
>> mappings even for the brief moment before jumping to the virtual
>> address.
>
> I added code for flushing whole TLB (including global pages) and it
> still boots (attached).
>
> I'm sorry if I'm missing something obvious (probably the TLB flushing
> code is wrong), but I'd like to understand this part.
>
> What instruction/action would require the identity mapping,
> after the page table is set?
>
> thanks (and again sry for noise :) )
> jirka
>
>
> ---
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index c55e6fa..073f489 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -165,6 +165,13 @@ ENTRY(secondary_startup_64)
> movl $(X86_CR4_PAE | X86_CR4_PGE), %eax
> movq %rax, %cr4
>
> + /* invalidate whole TLB */
> + movq %cr4, %rax
> + movq %rax, %rdx
> + andq $~X86_CR4_PGE, %rax
> + movq %rax, %cr4
> + movq %rdx, %cr4
> +
> /* Setup early boot stage 4 level pagetables. */
> movq $(init_level4_pgt - __START_KERNEL_map), %rax
> addq phys_base(%rip), %rax
>
The way you have it, it will immediately reload the global identity
entry into the TLB when it executes the next instruction, because cr3
is still pointing to the old pagetables. Disable PGE during or
immediately after the load of cr3 to make sure the global identity
entries are flushed.
--
Brian Gerst
On Fri, Feb 11, 2011 at 03:19:11PM -0500, Brian Gerst wrote:
> On Fri, Feb 11, 2011 at 2:13 PM, Jiri Olsa <[email protected]> wrote:
> > On Fri, Feb 11, 2011 at 12:59:47PM -0500, Brian Gerst wrote:
> >> On Fri, Feb 11, 2011 at 12:07 PM, Jiri Olsa <[email protected]> wrote:
> >> > On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
> >> >> Jiri Olsa <[email protected]> writes:
> >> >>
> >> >> > hi,
> >> >> >
> >> >> > while browsing the page table setup code, I noticed the x86_64 head
> >> >> > code might not need the identity mappings at all.
> > >> >> > It seems it's ok to switch it off completely from the beginning,
> >> >> > unless I'm missing something.
> >> >>
> >> >> Have you tested it?
> >> >
> >> > yes, I booted it with no problem
> >>
> >> The only reason this doesn't crash is because the identity mappings
> >> provided by the boot code are marked as global, and therefore might
> > >> not be flushed by simply loading cr3. The cpu can evict TLB entries
> >> at any time though, so it's a bad idea to run without the identity
> >> mappings even for the brief moment before jumping to the virtual
> >> address.
> >
> > I added code for flushing whole TLB (including global pages) and it
> > still boots (attached).
> >
> > I'm sorry if I'm missing something obvious (probably the TLB flushing
> > code is wrong), but I'd like to understand this part.
> >
> > What instruction/action would require the identity mapping,
> > after the page table is set?
> >
> > thanks (and again sry for noise :) )
> > jirka
> >
> >
> > ---
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index c55e6fa..073f489 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -165,6 +165,13 @@ ENTRY(secondary_startup_64)
> >        movl    $(X86_CR4_PAE | X86_CR4_PGE), %eax
> >        movq    %rax, %cr4
> >
> > +       /* invalidate whole TLB */
> > +       movq %cr4, %rax
> > +       movq %rax, %rdx
> > +       andq $~X86_CR4_PGE, %rax
> > +       movq %rax, %cr4
> > +       movq %rdx, %cr4
> > +
> >        /* Setup early boot stage 4 level pagetables. */
> >        movq    $(init_level4_pgt - __START_KERNEL_map), %rax
> >        addq    phys_base(%rip), %rax
> >
>
> The way you have it, it will immediately reload the global identity
> entry into the TLB when it executes the next instruction, because cr3
> is still pointing to the old pagetables. Disable PGE during or
> immediately after the load of cr3 to make sure the global identity
> entries are flushed.
you're right, when I put it after setting cr3 it crashed
but I still don't understand what instruction took it down..?
thanks,
jirka
On Fri, Feb 11, 2011 at 3:40 PM, Jiri Olsa <[email protected]> wrote:
> On Fri, Feb 11, 2011 at 03:19:11PM -0500, Brian Gerst wrote:
>> On Fri, Feb 11, 2011 at 2:13 PM, Jiri Olsa <[email protected]> wrote:
>> > On Fri, Feb 11, 2011 at 12:59:47PM -0500, Brian Gerst wrote:
>> >> On Fri, Feb 11, 2011 at 12:07 PM, Jiri Olsa <[email protected]> wrote:
>> >> > On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
>> >> >> Jiri Olsa <[email protected]> writes:
>> >> >>
>> >> >> > hi,
>> >> >> >
>> >> >> > while browsing the page table setup code, I noticed the x86_64 head
>> >> >> > code might not need the identity mappings at all.
>> >> >> > It seems it's ok to switch it off completely from the beginning,
>> >> >> > unless I'm missing something.
>> >> >>
>> >> >> Have you tested it?
>> >> >
>> >> > yes, I booted it with no problem
>> >>
>> >> The only reason this doesn't crash is because the identity mappings
>> >> provided by the boot code are marked as global, and therefore might
>> >> not be flushed by simply loading cr3. The cpu can evict TLB entries
>> >> at any time though, so it's a bad idea to run without the identity
>> >> mappings even for the brief moment before jumping to the virtual
>> >> address.
>> >
>> > I added code for flushing whole TLB (including global pages) and it
>> > still boots (attached).
>> >
>> > I'm sorry if I'm missing something obvious (probably the TLB flushing
>> > code is wrong), but I'd like to understand this part.
>> >
>> > What instruction/action would require the identity mapping,
>> > after the page table is set?
>> >
>> > thanks (and again sry for noise :) )
>> > jirka
>> >
>> >
>> > ---
>> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
>> > index c55e6fa..073f489 100644
>> > --- a/arch/x86/kernel/head_64.S
>> > +++ b/arch/x86/kernel/head_64.S
>> > @@ -165,6 +165,13 @@ ENTRY(secondary_startup_64)
>> > movl $(X86_CR4_PAE | X86_CR4_PGE), %eax
>> > movq %rax, %cr4
>> >
>> > + /* invalidate whole TLB */
>> > + movq %cr4, %rax
>> > + movq %rax, %rdx
>> > + andq $~X86_CR4_PGE, %rax
>> > + movq %rax, %cr4
>> > + movq %rdx, %cr4
>> > +
>> > /* Setup early boot stage 4 level pagetables. */
>> > movq $(init_level4_pgt - __START_KERNEL_map), %rax
>> > addq phys_base(%rip), %rax
>> >
>>
>> The way you have it, it will immediately reload the global identity
>> entry into the TLB when it executes the next instruction, because cr3
>> is still pointing to the old pagetables. Disable PGE during or
>> immediately after the load of cr3 to make sure the global identity
>> entries are flushed.
>
> you're right, when I put it after setting cr3 it crashed
> but I still don't understand what instruction took it down..?
The instruction immediately following the flush (doesn't matter what
it is). You flushed the entry for the page you were executing from,
so the cpu has to reload the entry for that page so that it can fetch
the next instruction. Since that page doesn't exist in the new page
table, it crashes.
Getting back to the original patch. Here is the relevant text from
the Intel System Programming Guide:
------------------------------
4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches
...
The processor is always free to invalidate additional entries in the
TLBs and paging-structure caches. The following are some examples:
...
MOV to CR3 may invalidate TLB entries for global pages.
------------------------------
So even if it just so happens to work on your particular cpu, it is
not guaranteed to always work.
--
Brian Gerst
On Fri, Feb 11, 2011 at 04:59:31PM -0500, Brian Gerst wrote:
> On Fri, Feb 11, 2011 at 3:40 PM, Jiri Olsa <[email protected]> wrote:
> > On Fri, Feb 11, 2011 at 03:19:11PM -0500, Brian Gerst wrote:
> >> On Fri, Feb 11, 2011 at 2:13 PM, Jiri Olsa <[email protected]> wrote:
> >> > On Fri, Feb 11, 2011 at 12:59:47PM -0500, Brian Gerst wrote:
> >> >> On Fri, Feb 11, 2011 at 12:07 PM, Jiri Olsa <[email protected]> wrote:
> >> >> > On Fri, Feb 11, 2011 at 08:46:41AM -0800, Eric W. Biederman wrote:
> >> >> >> Jiri Olsa <[email protected]> writes:
> >> >> >>
> >> >> >> > hi,
> >> >> >> >
> >> >> >> > while browsing the page table setup code, I noticed the x86_64 head
> >> >> >> > code might not need the identity mappings at all.
> > >> >> >> > It seems it's ok to switch it off completely from the beginning,
> >> >> >> > unless I'm missing something.
> >> >> >>
> >> >> >> Have you tested it?
> >> >> >
> >> >> > yes, I booted it with no problem
> >> >>
> >> >> The only reason this doesn't crash is because the identity mappings
> >> >> provided by the boot code are marked as global, and therefore might
> > >> >> not be flushed by simply loading cr3. The cpu can evict TLB entries
> >> >> at any time though, so it's a bad idea to run without the identity
> >> >> mappings even for the brief moment before jumping to the virtual
> >> >> address.
> >> >
> >> > I added code for flushing whole TLB (including global pages) and it
> >> > still boots (attached).
> >> >
> >> > I'm sorry if I'm missing something obvious (probably the TLB flushing
> >> > code is wrong), but I'd like to understand this part.
> >> >
> >> > What instruction/action would require the identity mapping,
> >> > after the page table is set?
> >> >
> >> > thanks (and again sry for noise :) )
> >> > jirka
> >> >
> >> >
> >> > ---
> >> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> >> > index c55e6fa..073f489 100644
> >> > --- a/arch/x86/kernel/head_64.S
> >> > +++ b/arch/x86/kernel/head_64.S
> >> > @@ -165,6 +165,13 @@ ENTRY(secondary_startup_64)
> > >> >        movl    $(X86_CR4_PAE | X86_CR4_PGE), %eax
> > >> >        movq    %rax, %cr4
> > >> >
> > >> > +       /* invalidate whole TLB */
> > >> > +       movq %cr4, %rax
> > >> > +       movq %rax, %rdx
> > >> > +       andq $~X86_CR4_PGE, %rax
> > >> > +       movq %rax, %cr4
> > >> > +       movq %rdx, %cr4
> > >> > +
> > >> >        /* Setup early boot stage 4 level pagetables. */
> > >> >        movq    $(init_level4_pgt - __START_KERNEL_map), %rax
> > >> >        addq    phys_base(%rip), %rax
> >> >
> >>
> >> The way you have it, it will immediately reload the global identity
> >> entry into the TLB when it executes the next instruction, because cr3
> > >> is still pointing to the old pagetables. Disable PGE during or
> >> immediately after the load of cr3 to make sure the global identity
> >> entries are flushed.
> >
> > you're right, when I put it after setting cr3 it crashed
> > but I still don't understand what instruction took it down..?
>
> The instruction immediately following the flush (doesn't matter what
> it is). You flushed the entry for the page you were executing from,
> so the cpu has to reload the entry for that page so that it can fetch
> the next instruction. Since that page doesn't exist in the new page
> table, it crashes.
oops, rip is the one I missed... cool :)
thanks for explanation,
jirka
>
> Getting back to the original patch. Here is the relevant text from
> the Intel System Programming Guide:
> ------------------------------
> 4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches
> ...
> The processor is always free to invalidate additional entries in the
> TLBs and paging-structure caches. The following are some examples:
> ...
> MOV to CR3 may invalidate TLB entries for global pages.
> ------------------------------
>
> So even if it just so happens to work on your particular cpu, it is
> not guaranteed to always work.
>
> --
> Brian Gerst