2014-07-22 15:36:35

by Konstantin Khlebnikov

Subject: [PATCH 1/2] ARM: LPAE: load upper bits of early TTBR0/TTBR1

This patch fixes booting when the idmap pgd lies above 4 GB. Commit
4756dcbfd37 mostly fixed this, but it failed to load the upper bits.

Also this fixes adding TTBR1_OFFSET to TTBR1: if the lower part overflows,
the carry must be added to the upper part.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Cyril Chemparathy <[email protected]>
Cc: Vitaly Andrianov <[email protected]>
---
arch/arm/mm/proc-v7-3level.S | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 22e3ad6..f0481dd 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -140,12 +140,11 @@ ENDPROC(cpu_v7_set_pte_ext)
mov \tmp, \ttbr1, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
mov \ttbr1, \ttbr1, lsl #ARCH_PGD_SHIFT @ lower bits
addls \ttbr1, \ttbr1, #TTBR1_OFFSET
- mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
+ adcls \tmp, \tmp, #0
+ mcrr p15, 1, \ttbr1, \tmp, c2 @ load TTBR1
mov \tmp, \ttbr0, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
mov \ttbr0, \ttbr0, lsl #ARCH_PGD_SHIFT @ lower bits
- mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
- mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
- mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
+ mcrr p15, 0, \ttbr0, \tmp, c2 @ load TTBR0
.endm

/*
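[Editor's note: the carry fix above can be sketched in plain C. This is a hypothetical user-space model, not kernel code; the TTBR1_OFFSET value is illustrative and the ARCH_PGD_SHIFT repacking is omitted.]

```c
#include <stdint.h>

#define TTBR1_OFFSET 0x4000u   /* illustrative value, not the kernel's */

/* Model of the fixed sequence: split a 64-bit pgd address into the two
 * 32-bit halves that mcrr expects, adding TTBR1_OFFSET to the lower half
 * and propagating the carry into the upper half -- the job of the
 * adcls instruction added by the patch. */
static void ttbr1_halves(uint64_t phys, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)phys;
    *hi = (uint32_t)(phys >> 32);
    uint32_t sum = *lo + TTBR1_OFFSET;
    if (sum < *lo)    /* lower word overflowed: add the carry */
        (*hi)++;
    *lo = sum;
}
```

Dropping the carry, as the old code effectively did, would load a TTBR1 pointing 4 GB below the real pgd whenever the addition wraps.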


2014-07-22 15:36:41

by Konstantin Khlebnikov

Subject: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

The idmap layout combines both physical and virtual addresses.
Everything works fine if RAM physically lies below PAGE_OFFSET.
Otherwise idmap starts punching huge holes in the virtual memory layout.
It maps RAM in 2 MiB sections, but when it allocates a new pmd page it
cuts out 1 GiB at once.

This patch makes a copy of all affected pmds from init_mm.
Only a few (usually one) 2 MiB sections will be lost.
This does not eliminate the problem, but makes it 512 times less likely.

Usually the idmap is used only for a short period. For example, when booting a
secondary cpu, __turn_mmu_on (executed at physical addresses) jumps to the
virtual address of __secondary_switched, which calls secondary_start_kernel,
which in turn calls cpu_switch_mm and switches from init_mm to a normal pgd.
So everything will be fine as long as these functions don't intersect with the idmap.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
arch/arm/mm/idmap.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 8e0e52e..dcd023c 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -25,6 +25,9 @@ static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
pr_warning("Failed to allocate identity pmd.\n");
return;
}
+ if (!pud_none(*pud))
+ memcpy(pmd, pmd_offset(pud, 0),
+ PTRS_PER_PMD * sizeof(pmd_t));
pud_populate(&init_mm, pud, pmd);
pmd += pmd_index(addr);
} else
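[Editor's note: the three added lines implement a copy-before-populate pattern. A toy user-space model (hypothetical, reduced to a single pud slot) of why the copy matters:]

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_PMD 512        /* LPAE: 512 pmd entries per 1 GiB pud slot */

typedef uint64_t pmd_t;

/* Toy model of the patch: before a freshly allocated pmd table replaces an
 * already-populated one, copy the old entries across.  Only the entries the
 * idmap then overwrites are lost, instead of the whole 1 GiB region. */
static pmd_t *populate_with_copy(const pmd_t *old, unsigned idx, pmd_t section)
{
    pmd_t *pmd = calloc(PTRS_PER_PMD, sizeof(pmd_t));
    if (old)                    /* pud was not none: preserve init_mm's pmds */
        memcpy(pmd, old, PTRS_PER_PMD * sizeof(pmd_t));
    pmd[idx] = section;         /* the idmap's own 2 MiB section */
    return pmd;
}
```

Without the memcpy, every pre-existing entry in the replaced table reads back as zero, which is the "1 GiB cut at once" described in the commit message.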

2014-07-28 18:13:24

by Will Deacon

Subject: Re: [PATCH 1/2] ARM: LPAE: load upper bits of early TTBR0/TTBR1

On Tue, Jul 22, 2014 at 04:36:23PM +0100, Konstantin Khlebnikov wrote:
> This patch fixes booting when the idmap pgd lies above 4 GB. Commit
> 4756dcbfd37 mostly fixed this, but it failed to load the upper bits.
>
> Also this fixes adding TTBR1_OFFSET to TTBR1: if the lower part overflows,
> the carry must be added to the upper part.
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> Cc: Cyril Chemparathy <[email protected]>
> Cc: Vitaly Andrianov <[email protected]>
> ---
> arch/arm/mm/proc-v7-3level.S | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 22e3ad6..f0481dd 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -140,12 +140,11 @@ ENDPROC(cpu_v7_set_pte_ext)
> mov \tmp, \ttbr1, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
> mov \ttbr1, \ttbr1, lsl #ARCH_PGD_SHIFT @ lower bits
> addls \ttbr1, \ttbr1, #TTBR1_OFFSET
> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
> + adcls \tmp, \tmp, #0
> + mcrr p15, 1, \ttbr1, \tmp, c2 @ load TTBR1
> mov \tmp, \ttbr0, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
> mov \ttbr0, \ttbr0, lsl #ARCH_PGD_SHIFT @ lower bits
> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
> + mcrr p15, 0, \ttbr0, \tmp, c2 @ load TTBR0

I must admit, the code you are removing here looks really strange. Was there
a badly resolved conflict somewhere along the way? It would be nice to see
if your fix (which seems ok to me) was actually present in the mailing list
posting of the patch that ended in the above mess.

Will

2014-07-28 18:15:25

by Will Deacon

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Tue, Jul 22, 2014 at 04:36:35PM +0100, Konstantin Khlebnikov wrote:
> The idmap layout combines both physical and virtual addresses.
> Everything works fine if RAM physically lies below PAGE_OFFSET.
> Otherwise idmap starts punching huge holes in the virtual memory layout.
> It maps RAM in 2 MiB sections, but when it allocates a new pmd page it
> cuts out 1 GiB at once.
>
> This patch makes a copy of all affected pmds from init_mm.
> Only a few (usually one) 2 MiB sections will be lost.
> This does not eliminate the problem, but makes it 512 times less likely.

I'm struggling to understand your commit message, but making a problem `512
times less likely' does sound like a bit of a hack to me. Can't we fix this
properly instead?

Will

2014-07-28 18:25:18

by Konstantin Khlebnikov

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 10:14 PM, Will Deacon <[email protected]> wrote:
> On Tue, Jul 22, 2014 at 04:36:35PM +0100, Konstantin Khlebnikov wrote:
>> The idmap layout combines both physical and virtual addresses.
>> Everything works fine if RAM physically lies below PAGE_OFFSET.
>> Otherwise idmap starts punching huge holes in the virtual memory layout.
>> It maps RAM in 2 MiB sections, but when it allocates a new pmd page it
>> cuts out 1 GiB at once.
>>
>> This patch makes a copy of all affected pmds from init_mm.
>> Only a few (usually one) 2 MiB sections will be lost.
>> This does not eliminate the problem, but makes it 512 times less likely.
>
> I'm struggling to understand your commit message, but making a problem `512
> times less likely' does sound like a bit of a hack to me. Can't we fix this
> properly instead?

Yep, my comment sucks.

Usually idmap looks like this:

|0x00000000 -- <chunk of physical memory in identity mapping> --- | TASK_SIZE -- <kernel space vm layout> --- 0xFFFFFFFF |

But when that physical memory chunk starts at 0xE8000000 or even
0xF2000000, everything becomes very complicated.

>
> Will
>

2014-07-28 18:41:02

by Konstantin Khlebnikov

Subject: Re: [PATCH 1/2] ARM: LPAE: load upper bits of early TTBR0/TTBR1

On Mon, Jul 28, 2014 at 10:12 PM, Will Deacon <[email protected]> wrote:
> On Tue, Jul 22, 2014 at 04:36:23PM +0100, Konstantin Khlebnikov wrote:
>> This patch fixes booting when the idmap pgd lies above 4 GB. Commit
>> 4756dcbfd37 mostly fixed this, but it failed to load the upper bits.
>>
>> Also this fixes adding TTBR1_OFFSET to TTBR1: if the lower part overflows,
>> the carry must be added to the upper part.
>>
>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>> Cc: Cyril Chemparathy <[email protected]>
>> Cc: Vitaly Andrianov <[email protected]>
>> ---
>> arch/arm/mm/proc-v7-3level.S | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>> index 22e3ad6..f0481dd 100644
>> --- a/arch/arm/mm/proc-v7-3level.S
>> +++ b/arch/arm/mm/proc-v7-3level.S
>> @@ -140,12 +140,11 @@ ENDPROC(cpu_v7_set_pte_ext)
>> mov \tmp, \ttbr1, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
>> mov \ttbr1, \ttbr1, lsl #ARCH_PGD_SHIFT @ lower bits
>> addls \ttbr1, \ttbr1, #TTBR1_OFFSET
>> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
>> + adcls \tmp, \tmp, #0
>> + mcrr p15, 1, \ttbr1, \tmp, c2 @ load TTBR1
>> mov \tmp, \ttbr0, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
>> mov \ttbr0, \ttbr0, lsl #ARCH_PGD_SHIFT @ lower bits
>> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
>> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
>> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
>> + mcrr p15, 0, \ttbr0, \tmp, c2 @ load TTBR0
>
> I must admit, the code you are removing here looks really strange. Was there
> a badly resolved conflict somewhere along the way? It would be nice to see
> if your fix (which seems ok to me) was actually present in the mailing list
> posting of the patch that ended in the above mess.

Nope, no merge conflicts; the source is in the original patch:
https://lkml.org/lkml/2012/9/11/346

That mess is completely harmless; this code is used only once at boot.
I don't have that email any more, so replying to it isn't trivial for me.

>
> Will
>

2014-07-28 18:41:42

by Will Deacon

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 07:25:14PM +0100, Konstantin Khlebnikov wrote:
> On Mon, Jul 28, 2014 at 10:14 PM, Will Deacon <[email protected]> wrote:
> > On Tue, Jul 22, 2014 at 04:36:35PM +0100, Konstantin Khlebnikov wrote:
> >> The idmap layout combines both physical and virtual addresses.
> >> Everything works fine if RAM physically lies below PAGE_OFFSET.
> >> Otherwise idmap starts punching huge holes in the virtual memory layout.
> >> It maps RAM in 2 MiB sections, but when it allocates a new pmd page it
> >> cuts out 1 GiB at once.
> >>
> >> This patch makes a copy of all affected pmds from init_mm.
> >> Only a few (usually one) 2 MiB sections will be lost.
> >> This does not eliminate the problem, but makes it 512 times less likely.
> >
> > I'm struggling to understand your commit message, but making a problem `512
> > times less likely' does sound like a bit of a hack to me. Can't we fix this
> > properly instead?
>
> Yep, my comment sucks.
>
> Usually idmap looks like this:
>
> |0x00000000 -- <chunk of physical memory in identity mapping> --- | TASK_SIZE -- <kernel space vm layout> --- 0xFFFFFFFF |
>
> But when that physical memory chunk starts at 0xE8000000 or even
> 0xF2000000, everything becomes very complicated.

Why? As long as we don't clobber the kernel text (which would require
PHYS_OFFSET to be at a really weird alignment and very close to
PAGE_OFFSET), then you should be alright. Sure, you'll lose things like your
stack and the vmalloc area etc, but you're running in the idmap, so don't
use those things.

soft_restart is an example of code that deals with these issues. Which code
is causing you problems?

Will

2014-07-28 18:48:14

by Will Deacon

Subject: Re: [PATCH 1/2] ARM: LPAE: load upper bits of early TTBR0/TTBR1

On Mon, Jul 28, 2014 at 07:40:58PM +0100, Konstantin Khlebnikov wrote:
> On Mon, Jul 28, 2014 at 10:12 PM, Will Deacon <[email protected]> wrote:
> > On Tue, Jul 22, 2014 at 04:36:23PM +0100, Konstantin Khlebnikov wrote:
> >> This patch fixes booting when the idmap pgd lies above 4 GB. Commit
> >> 4756dcbfd37 mostly fixed this, but it failed to load the upper bits.
> >>
> >> Also this fixes adding TTBR1_OFFSET to TTBR1: if the lower part overflows,
> >> the carry must be added to the upper part.
> >>
> >> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> >> Cc: Cyril Chemparathy <[email protected]>
> >> Cc: Vitaly Andrianov <[email protected]>
> >> ---
> >> arch/arm/mm/proc-v7-3level.S | 7 +++----
> >> 1 file changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> >> index 22e3ad6..f0481dd 100644
> >> --- a/arch/arm/mm/proc-v7-3level.S
> >> +++ b/arch/arm/mm/proc-v7-3level.S
> >> @@ -140,12 +140,11 @@ ENDPROC(cpu_v7_set_pte_ext)
> >> mov \tmp, \ttbr1, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
> >> mov \ttbr1, \ttbr1, lsl #ARCH_PGD_SHIFT @ lower bits
> >> addls \ttbr1, \ttbr1, #TTBR1_OFFSET
> >> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
> >> + adcls \tmp, \tmp, #0
> >> + mcrr p15, 1, \ttbr1, \tmp, c2 @ load TTBR1
> >> mov \tmp, \ttbr0, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
> >> mov \ttbr0, \ttbr0, lsl #ARCH_PGD_SHIFT @ lower bits
> >> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
> >> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
> >> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
> >> + mcrr p15, 0, \ttbr0, \tmp, c2 @ load TTBR0
> >
> > I must admit, the code you are removing here looks really strange. Was there
> > a badly resolved conflict somewhere along the way? It would be nice to see
> > if your fix (which seems ok to me) was actually present in the mailing list
> > posting of the patch that ended in the above mess.
>
> Nope, no merge conflicts; the source is in the original patch:
> https://lkml.org/lkml/2012/9/11/346
>
> That mess is completely harmless; this code is used only once at boot.
> I don't have that email any more, so replying to it isn't trivial for me.

How bizarre. Also, Cyril doesn't work for TI anymore (his email is
bouncing), so it's tricky to know what he meant here.

Your patch looks better than what we currently have though. Have you managed
to test it on a keystone platform (I don't have one)?

Will

2014-07-28 18:57:05

by Konstantin Khlebnikov

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 10:41 PM, Will Deacon <[email protected]> wrote:
> On Mon, Jul 28, 2014 at 07:25:14PM +0100, Konstantin Khlebnikov wrote:
>> On Mon, Jul 28, 2014 at 10:14 PM, Will Deacon <[email protected]> wrote:
>> > On Tue, Jul 22, 2014 at 04:36:35PM +0100, Konstantin Khlebnikov wrote:
>> >> The idmap layout combines both physical and virtual addresses.
>> >> Everything works fine if RAM physically lies below PAGE_OFFSET.
>> >> Otherwise idmap starts punching huge holes in the virtual memory layout.
>> >> It maps RAM in 2 MiB sections, but when it allocates a new pmd page it
>> >> cuts out 1 GiB at once.
>> >>
>> >> This patch makes a copy of all affected pmds from init_mm.
>> >> Only a few (usually one) 2 MiB sections will be lost.
>> >> This does not eliminate the problem, but makes it 512 times less likely.
>> >
>> > I'm struggling to understand your commit message, but making a problem `512
>> > times less likely' does sound like a bit of a hack to me. Can't we fix this
>> > properly instead?
>>
>> Yep, my comment sucks.
>>
>> Usually idmap looks like this:
>>
>> |0x00000000 -- <chunk of physical memory in identity mapping> --- | TASK_SIZE -- <kernel space vm layout> --- 0xFFFFFFFF |
>>
>> But when that physical memory chunk starts at 0xE8000000 or even
>> 0xF2000000, everything becomes very complicated.
>
> Why? As long as we don't clobber the kernel text (which would require
> PHYS_OFFSET to be at a really weird alignment and very close to
> PAGE_OFFSET), then you should be alright. Sure, you'll lose things like your
> stack and the vmalloc area etc, but you're running in the idmap, so don't
> use those things.

Yep, we have a piece of hardware with a really weirdly aligned PHYS_OFFSET;
almost all RAM is above 4 GB, and we were "lucky" enough to run into the trouble.

It seems keystone has all memory above 4 GB, but a small piece is mapped below
especially for booting. I suppose it's below PAGE_OFFSET, so Cyril
hadn't seen that problem.

Also, I've seen a comment somewhere in the code which says that the idmap pgd is
always below 4 GB, which isn't quite true. Moreover, I did some experiments with
mapping RAM to random places in qemu and saw that the kernel cannot boot if
PHYS_OFFSET isn't aligned to 128 MB, which is strange.
So it seems there are plenty of bugs around.

>
> soft_restart is an example of code that deals with these issues. Which code
> is causing you problems?

That was booting of secondary cpus, all of them.

>
> Will

2014-07-28 19:06:51

by Will Deacon

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 07:57:00PM +0100, Konstantin Khlebnikov wrote:
> On Mon, Jul 28, 2014 at 10:41 PM, Will Deacon <[email protected]> wrote:
> > On Mon, Jul 28, 2014 at 07:25:14PM +0100, Konstantin Khlebnikov wrote:
> >> On Mon, Jul 28, 2014 at 10:14 PM, Will Deacon <[email protected]> wrote:
> >> > On Tue, Jul 22, 2014 at 04:36:35PM +0100, Konstantin Khlebnikov wrote:
> >> >> The idmap layout combines both physical and virtual addresses.
> >> >> Everything works fine if RAM physically lies below PAGE_OFFSET.
> >> >> Otherwise idmap starts punching huge holes in the virtual memory layout.
> >> >> It maps RAM in 2 MiB sections, but when it allocates a new pmd page it
> >> >> cuts out 1 GiB at once.
> >> >>
> >> >> This patch makes a copy of all affected pmds from init_mm.
> >> >> Only a few (usually one) 2 MiB sections will be lost.
> >> >> This does not eliminate the problem, but makes it 512 times less likely.
> >> >
> >> > I'm struggling to understand your commit message, but making a problem `512
> >> > times less likely' does sound like a bit of a hack to me. Can't we fix this
> >> > properly instead?
> >>
> >> Yep, my comment sucks.
> >>
> >> Usually idmap looks like this:
> >>
> >> |0x00000000 -- <chunk of physical memory in identity mapping> --- | TASK_SIZE -- <kernel space vm layout> --- 0xFFFFFFFF |
> >>
> >> But when that physical memory chunk starts at 0xE8000000 or even
> >> 0xF2000000, everything becomes very complicated.
> >
> > Why? As long as we don't clobber the kernel text (which would require
> > PHYS_OFFSET to be at a really weird alignment and very close to
> > PAGE_OFFSET), then you should be alright. Sure, you'll lose things like your
> > stack and the vmalloc area etc, but you're running in the idmap, so don't
> > use those things.
>
> Yep, we have a piece of hardware with a really weirdly aligned PHYS_OFFSET;
> almost all RAM is above 4 GB, and we were "lucky" enough to run into the trouble.

> It seems keystone has all memory above 4 GB, but a small piece is mapped below
> especially for booting. I suppose it's below PAGE_OFFSET, so Cyril
> hadn't seen that problem.

Sure, I remember when the keystone support was merged. There's a low alias
of memory which isn't I/O coherent, so when we enable the MMU we rewrite the
page tables to use the high alias (which is also broken, as I pointed out on
the list today).

> Also, I've seen a comment somewhere in the code which says that the idmap pgd is
> always below 4 GB, which isn't quite true. Moreover, I did some experiments with
> mapping RAM to random places in qemu and saw that the kernel cannot boot if
> PHYS_OFFSET isn't aligned to 128 MB, which is strange.
> So it seems there are plenty of bugs around.
>
> >
> > soft_restart is an example of code that deals with these issues. Which code
> > is causing you problems?
>
> That was booting of secondary cpus, all of them.

Ok. I think we need more specifics to progress with this patch. It's not
clear to me what's going wrong or why your platform is causing the issues
you're seeing.

Will

2014-07-28 19:13:19

by Russell King - ARM Linux

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 10:57:00PM +0400, Konstantin Khlebnikov wrote:
> Also, I've seen a comment somewhere in the code which says that the idmap pgd is
> always below 4 GB, which isn't quite true.

"idmap" means "identity map". It's a region which maps the same value of
physical address to the same value of virtual address.

Since the virtual address space is limited to 4GB, there is /no way/ that
the physical address can be above 4GB, and it still be called an
"identity map".

The reason for this is that code in the identity map will be fetched with
the MMU off. While this code is running, it will enable the MMU using the
identity map page table pointer, and the CPU must see _no_ change in the
instructions/data fetched from that region. It will then branch to the
kernel proper, and the kernel will then switch away from the identity page
table.

Once the kernel has switched away from the identity page table, interrupts
and other exceptions can then be taken on the CPU - but not before.
Neither is it expected that the CPU will access any devices until it has
switched away from the identity page table.

What this also means is that the code in the identity map must remain
visible in the low 4GB of physical address space.
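[Editor's note: the constraint described above reduces to a one-line check. A sketch, not kernel code:]

```c
#include <stdint.h>

/* An identity map requires virt == phys.  Since virtual addresses are
 * 32-bit, identity-mapped code must live in the low 4GB of the physical
 * address space -- exactly the point made above. */
static int can_identity_map(uint64_t phys)
{
    return phys < (1ULL << 32);
}
```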

> Moreover, I did some experiments with
> mapping RAM to random places in qemu and saw that the kernel cannot boot if
> PHYS_OFFSET isn't aligned to 128 MB, which is strange.

That is intentional. PHYS_OFFSET has a number of restrictions, one of
them is indeed that the physical offset /will/ be aligned to 128MB.
That was decided after looking at the platforms we have and is now
fixed at that value with platform-breaking consequences if it needs
changing.
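[Editor's note: the alignment rule can be expressed as the following sketch; the kernel's actual enforcement lives elsewhere.]

```c
#include <stdint.h>

/* PHYS_OFFSET must sit on a 128 MB boundary. */
static int phys_offset_ok(uint64_t phys_offset)
{
    return (phys_offset & ((128ULL << 20) - 1)) == 0;
}
```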

--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

2014-07-28 19:29:42

by Konstantin Khlebnikov

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 11:13 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Mon, Jul 28, 2014 at 10:57:00PM +0400, Konstantin Khlebnikov wrote:
>> Also, I've seen a comment somewhere in the code which says that the idmap pgd is
>> always below 4 GB, which isn't quite true.
>
> "idmap" means "identity map". It's a region which maps the same value of
> physical address to the same value of virtual address.
>
> Since the virtual address space is limited to 4GB, there is /no way/ that
> the physical address can be above 4GB, and it still be called an
> "identity map".
>
> The reason for this is that code in the identity map will be fetched with
> the MMU off. While this code is running, it will enable the MMU using the
> identity map page table pointer, and the CPU must see _no_ change in the
> instructions/data fetched from that region. It will then branch to the
> kernel proper, and the kernel will then switch away from the identity page
> table.
>
> Once the kernel has switched away from the identity page table, interrupts
> and other exceptions can then be taken on the CPU - but not before.
> Neither is it expected that the CPU will access any devices until it has
> switched away from the identity page table.
>
> What this also means is that the code in the identity map must remain
> visible in the low 4GB of physical address space.

Ok, before switching from the identity mapping to the normal mapping, the kernel must
switch the instruction pointer from a physical address to a virtual one. It's a long jump
out of the idmap code section to some normal kernel function,
__secondary_switched in my case. And both the physical source and the virtual
destination addresses must be present in one and the same mapping (the idmap).

The idmap pgd starts as a copy of the pgd from init_mm; it points to the same pmd pages.
When it populates the identity mapping for that special code section it allocates
new pmd pages (each replacing an entry that covers 1 GiB of the vm layout), but only a few
of their entries are filled. In the case of a 3/1 GB split, if the idmap touches something
above 3 GB it kills the whole kernel vm layout, and as a result the kernel cannot
jump to any virtual address.
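[Editor's note: the collision can be modelled as below. A hypothetical sketch; the 3/1 GB split and the PAGE_OFFSET value are assumptions, not taken from the thread's hardware.]

```c
#include <stdint.h>

#define PAGE_OFFSET  0xC0000000u   /* assumed 3/1 GB split */
#define PGDIR_SHIFT  30            /* each LPAE pgd entry maps 1 GiB */

/* If the idmap's physical address falls in the same (or a higher) 1 GiB
 * pgd slot as the kernel's virtual layout, populating a fresh pmd table
 * there wipes the kernel mappings -- the failure described above. */
static int idmap_clobbers_kernel(uint32_t phys)
{
    return (phys >> PGDIR_SHIFT) >= (PAGE_OFFSET >> PGDIR_SHIFT);
}
```

This is why 0xE8000000 and 0xF2000000 are problem cases: both land in the pgd slot that holds the kernel itself.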

>
>> Moreover, I did some experiments with
>> mapping RAM to random places in qemu and saw that the kernel cannot boot if
>> PHYS_OFFSET isn't aligned to 128 MB, which is strange.
>
> That is intentional. PHYS_OFFSET has a number of restrictions, one of
> them is indeed that the physical offset /will/ be aligned to 128MB.
> That was decided after looking at the platforms we have and is now
> fixed at that value with platform-breaking consequences if it needs
> changing.
>

2014-07-28 19:34:16

by Konstantin Khlebnikov

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 11:29 PM, Konstantin Khlebnikov
<[email protected]> wrote:
> On Mon, Jul 28, 2014 at 11:13 PM, Russell King - ARM Linux
> <[email protected]> wrote:
>> On Mon, Jul 28, 2014 at 10:57:00PM +0400, Konstantin Khlebnikov wrote:
>>> Also, I've seen a comment somewhere in the code which says that the idmap pgd is
>>> always below 4 GB, which isn't quite true.
>>
>> "idmap" means "identity map". It's a region which maps the same value of
>> physical address to the same value of virtual address.
>>
>> Since the virtual address space is limited to 4GB, there is /no way/ that
>> the physical address can be above 4GB, and it still be called an
>> "identity map".
>>
>> The reason for this is that code in the identity map will be fetched with
>> the MMU off. While this code is running, it will enable the MMU using the
>> identity map page table pointer, and the CPU must see _no_ change in the
>> instructions/data fetched from that region. It will then branch to the
>> kernel proper, and the kernel will then switch away from the identity page
>> table.
>>
>> Once the kernel has switched away from the identity page table, interrupts
>> and other exceptions can then be taken on the CPU - but not before.
>> Neither is it expected that the CPU will access any devices until it has
>> switched away from the identity page table.
>>
>> What this also means is that the code in the identity map must remain
>> visible in the low 4GB of physical address space.
>
> Ok, before switching from the identity mapping to the normal mapping, the kernel must
> switch the instruction pointer from a physical address to a virtual one. It's a long jump
> out of the idmap code section to some normal kernel function,
> __secondary_switched in my case. And both the physical source and the virtual
> destination addresses must be present in one and the same mapping (the idmap).
>
> The idmap pgd starts as a copy of the pgd from init_mm; it points to the same pmd pages.
> When it populates the identity mapping for that special code section it allocates
> new pmd pages (each replacing an entry that covers 1 GiB of the vm layout), but only a few
> of their entries are filled. In the case of a 3/1 GB split, if the idmap touches something
> above 3 GB it kills the whole kernel vm layout, and as a result the kernel cannot
> jump to any virtual address.

Oh, my bad. That was a response to Will Deacon's:

> Ok. I think we need more specifics to progress with this patch. It's not
> clear to me what's going wrong or why your platform is causing the issues
> you're seeing.

>
>>
>>> Moreover, I did some experiments with
>>> mapping RAM to random places in qemu and saw that the kernel cannot boot if
>>> PHYS_OFFSET isn't aligned to 128 MB, which is strange.
>>
>> That is intentional. PHYS_OFFSET has a number of restrictions, one of
>> them is indeed that the physical offset /will/ be aligned to 128MB.
>> That was decided after looking at the platforms we have and is now
>> fixed at that value with platform-breaking consequences if it needs
>> changing.
>>

2014-07-28 19:43:05

by Russell King - ARM Linux

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 11:29:39PM +0400, Konstantin Khlebnikov wrote:
> Ok, before switching from the identity mapping to the normal mapping, the kernel must
> switch the instruction pointer from a physical address to a virtual one.

"switch the instruction pointer from a physical address to a virtual one."

There's no such distinction for the instruction pointer.

There is the mode where the MMU is disabled. The CPU fetches from the
address in the program counter. The instructions it fetches will
instruct it to perform operations to enable the MMU.

The CPU itself continues fetching instructions (with an unpredictable
number of instructions in the CPU's pipeline) across this boundary.

Hence, it is _impossible_ to know which exact instructions are fetched
with the MMU off, and which are fetched with the MMU on.

This is why we need the identity mapping: so that the CPU's instruction
fetches can continue unaffected by turning the MMU on.

Before the MMU is on, the CPU is fetching from an untranslated (== physical)
view of the address space, which is limited to 4GB in size.

After the MMU is on, the CPU is fetching from a translated (== virtual)
view of the address space, which is also limited to 4GB in size.

> It's a long jump
> out of the idmap code section to some normal kernel function,
> __secondary_switched in my case. And both the physical source and the virtual
> destination addresses must be present in one and the same mapping (the idmap).

... one which the code already caters for.

> The idmap pgd starts as a copy of the pgd from init_mm; it points to the same pmd pages.
> When it populates the identity mapping for that special code section it allocates
> new pmd pages (each replacing an entry that covers 1 GiB of the vm layout), but only a few
> of their entries are filled. In the case of a 3/1 GB split, if the idmap touches something
> above 3 GB it kills the whole kernel vm layout, and as a result the kernel cannot
> jump to any virtual address.

It doesn't matter, provided the kernel text and data in the virtual
address space are not overwritten by the identity mapping. If it
ends up in the vmalloc or IO space, that should not be a problem.

It also should not be a problem if the physical memory starts at
PAGE_OFFSET (iow, PHYS_OFFSET == PAGE_OFFSET). Indeed, we have
some platforms where that is true.

--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

2014-07-28 19:57:19

by Konstantin Khlebnikov

Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 11:42 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Mon, Jul 28, 2014 at 11:29:39PM +0400, Konstantin Khlebnikov wrote:
>> Ok, before switching from the identity mapping to the normal mapping, the kernel must
>> switch the instruction pointer from a physical address to a virtual one.
>
> "switch the instruction pointer from a physical address to a virtual one."
>
> There's no such distinction for the instruction pointer.

I know. I mean "logically".

>
> There is the mode where the MMU is disabled. The CPU fetches from the
> address in the program counter. The instructions it fetches will
> instruct it to perform operations to enable the MMU.
>
> The CPU itself continues fetching instructions (with an unpredictable
> number of instructions in the CPU's pipeline) across this boundary.
>
> Hence, it is _impossible_ to know which exact instructions are fetched
> with the MMU off, and which are fetched with the MMU on.
>
> This is why we need the identity mapping: so that the CPU's instruction
> fetches can continue unaffected by turning the MMU on.
>
> Before the MMU is on, the CPU is fetching from an untranslated (== physical)
> view of the address space, which is limited to 4GB in size.
>
> After the MMU is on, the CPU is fetching from a translated (== virtual)
> view of the address space, which is also limited to 4GB in size.

Sorry, but do I really look that dumb? Maybe it's true; it's almost
midnight on my side.

>
>> It's a long jump
>> out idmap code section to some normal kernel function.
>> __secondary_switched in my case. And both physical source and virtual
>> destination addresses must present in one same mapping (idmap).
>
> ... one which the code already caters for.
>
>> The idmap pgd starts as a copy of the pgd from init_mm; it points to the same pmd pages.
>> When it populates the identity mapping for that special code section it allocates
>> new pmd pages (each replacing an entry that covers 1 GiB of the vm layout), but only a few
>> of their entries are filled. In the case of a 3/1 GB split, if the idmap touches something
>> above 3 GB it kills the whole kernel vm layout, and as a result the kernel cannot
>> jump to any virtual address.
>
> It doesn't matter, provided the kernel text and data in the virtual
> address space are not overwritten by the identity mapping. If it
> ends up in the vmalloc or IO space, that should not be a problem.

In my case it has been overwritten.
And it always happens when PHYS_OFFSET >= PAGE_OFFSET,
because with LPAE the idmap always overwrites 1GB at once.

>
> It also should not be a problem if the physical memory starts at
> PAGE_OFFSET (iow, PHYS_OFFSET == PAGE_OFFSET). Indeed, we have
> some platforms where that is true.
>
> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.

2014-07-29 10:57:33

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Mon, Jul 28, 2014 at 11:57:16PM +0400, Konstantin Khlebnikov wrote:
> On Mon, Jul 28, 2014 at 11:42 PM, Russell King - ARM Linux
> <[email protected]> wrote:
> > On Mon, Jul 28, 2014 at 11:29:39PM +0400, Konstantin Khlebnikov wrote:
> >> Ok, before switching from identity mapping to normal mapping kernel must
> >> switch instruction pointer from physical address to virtual.
> >
> > "switch instruction pointer from physical address to virtual."
> >
> > There's no such distinction for the instruction pointer.
>
> I know. I mean "logically".
>
...
>
> Sorry, but do I really look so dumb? Maybe it's true; it's almost
> midnight on my side.

When you use language which suggests a lack of understanding, then
I will explain things.

> > It doesn't matter, provided the kernel text and data in the virtual
> > address space are not overwritten by the identity mapping. If it
> > ends up in the vmalloc or IO space, that should not be a problem.
>
> In my case it has been overwritten.
> And it always happens when PHYS_OFFSET >= PAGE_OFFSET,
> because with LPAE the idmap always overwrites 1GB at once.

Right, I now see what you're getting at. Here's a better description
which I've used when committing your patch. I've only taken patch 2
at the present time.



On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2
(pmd) entries map 2MiB.

When the identity mapping is created on LPAE, the pgd pointers are copied
from the swapper_pg_dir. If we find that we need to modify the contents
of a pmd, we allocate a new empty pmd table and insert it into the
appropriate 1GB slot, before then filling it with the identity mapping.

However, if the 1GB slot covers the kernel lowmem mappings, we obliterate
those mappings.

When replacing a PMD, first copy the old PMD contents to the new PMD, so
that we preserve the existing mappings in the 1GiB region, particularly
the mappings of the kernel itself.
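In C terms, the fix described above amounts to seeding the newly allocated pmd table from the old one before the idmap entries are written. A rough sketch follows (an array-based model with simplified types, not the kernel's actual code):

```c
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PMD 512   /* 512 entries x 2MB = 1GB per pmd table */
typedef uint64_t pmd_t;    /* simplified stand-in for the kernel type */

/* Start the new pmd table as a copy of the old one, so the kernel mappings
 * already present in this 1GB slot survive; the idmap then overwrites only
 * the few 2MB sections it actually needs. */
static void idmap_seed_pmd(pmd_t *new_pmd, const pmd_t *old_pmd)
{
    memcpy(new_pmd, old_pmd, PTRS_PER_PMD * sizeof(pmd_t));
}
```

With this seeding in place, at most the handful of 2MB sections the idmap touches are lost from the 1GB slot, instead of the whole slot.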


--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

2014-07-29 11:16:54

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 1/2] ARM: LPAE: load upper bits of early TTBR0/TTBR1

On Mon, Jul 28, 2014 at 07:47:41PM +0100, Will Deacon wrote:
> On Mon, Jul 28, 2014 at 07:40:58PM +0100, Konstantin Khlebnikov wrote:
> > On Mon, Jul 28, 2014 at 10:12 PM, Will Deacon <[email protected]> wrote:
> > > On Tue, Jul 22, 2014 at 04:36:23PM +0100, Konstantin Khlebnikov wrote:
> > >> This patch fixes booting when idmap pgd lays above 4gb. Commit
> > >> 4756dcbfd37 mostly had fixed this, but it'd failed to load upper bits.
> > >>
> > >> Also this fixes adding TTBR1_OFFSET to TTRR1: if lower part overflows
> > >> carry flag must be added to the upper part.
> > >>
> > >> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> > >> Cc: Cyril Chemparathy <[email protected]>
> > >> Cc: Vitaly Andrianov <[email protected]>
> > >> ---
> > >> arch/arm/mm/proc-v7-3level.S | 7 +++----
> > >> 1 file changed, 3 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> > >> index 22e3ad6..f0481dd 100644
> > >> --- a/arch/arm/mm/proc-v7-3level.S
> > >> +++ b/arch/arm/mm/proc-v7-3level.S
> > >> @@ -140,12 +140,11 @@ ENDPROC(cpu_v7_set_pte_ext)
> > >> mov \tmp, \ttbr1, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
> > >> mov \ttbr1, \ttbr1, lsl #ARCH_PGD_SHIFT @ lower bits
> > >> addls \ttbr1, \ttbr1, #TTBR1_OFFSET
> > >> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
> > >> + adcls \tmp, \tmp, #0
> > >> + mcrr p15, 1, \ttbr1, \tmp, c2 @ load TTBR1
> > >> mov \tmp, \ttbr0, lsr #(32 - ARCH_PGD_SHIFT) @ upper bits
> > >> mov \ttbr0, \ttbr0, lsl #ARCH_PGD_SHIFT @ lower bits
> > >> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
> > >> - mcrr p15, 1, \ttbr1, \zero, c2 @ load TTBR1
> > >> - mcrr p15, 0, \ttbr0, \zero, c2 @ load TTBR0
> > >> + mcrr p15, 0, \ttbr0, \tmp, c2 @ load TTBR0
> > >
> > > I must admit, the code you are removing here looks really strange. Was there
> > > a badly resolved conflict somewhere along the way? It would be nice to see
> > > if your fix (which seems ok to me) was actually present in the mailing list
> > > posting of the patch that ended in the above mess.
> >
> > Nope, no merge conflicts; the source is in the original patch:
> > https://lkml.org/lkml/2012/9/11/346
> >
> > That mess is completely harmless; this code is used only once at boot.
> > I don't have that email, so replying isn't trivial for me.
>
> How bizarre. Also, Cyril doesn't work for TI anymore (his email is
> bouncing), so it's tricky to know what he meant here.
>
> Your patch looks better than what we currently have though. Have you managed
> to test it on a keystone platform (I don't have one)?

Given that:

Acked-by: Will Deacon <[email protected]>

Will

2014-07-29 12:29:51

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [PATCH 1/2] ARM: LPAE: load upper bits of early TTBR0/TTBR1

On Mon, Jul 28, 2014 at 10:47 PM, Will Deacon <[email protected]> wrote:
> On Mon, Jul 28, 2014 at 07:40:58PM +0100, Konstantin Khlebnikov wrote:
>> On Mon, Jul 28, 2014 at 10:12 PM, Will Deacon <[email protected]> wrote:
>> > On Tue, Jul 22, 2014 at 04:36:23PM +0100, Konstantin Khlebnikov wrote:
>> >> ...
>> >
>> > I must admit, the code you are removing here looks really strange. Was there
>> > a badly resolved conflict somewhere along the way? It would be nice to see
>> > if your fix (which seems ok to me) was actually present in the mailing list
>> > posting of the patch that ended in the above mess.
>>
>> Nope, no merge conflicts; the source is in the original patch:
>> https://lkml.org/lkml/2012/9/11/346
>>
>> That mess is completely harmless; this code is used only once at boot.
>> I don't have that email, so replying isn't trivial for me.
>
> How bizarre. Also, Cyril doesn't work for TI anymore (his email is
> bouncing), so it's tricky to know what he meant here.
>
> Your patch looks better than what we currently have though. Have you managed
> to test it on a keystone platform (I don't have one)?

No, I don't have one either, and I don't have direct access to the
platform where the problem was found. I've debugged this in a patched qemu.


Also, this commit makes wrong assumptions about the idmap pgd address
(at least in the commit message).

Probably the suspend/resume path is affected too. I'll try to check it.

commit f3db3f4389dbd9a8c2b4477f37a6ebddfd670ad8
Author: Mahesh Sivasubramanian <[email protected]>
Date: Fri Nov 8 23:25:20 2013 +0100

ARM: 7885/1: Save/Restore 64-bit TTBR registers on LPAE suspend/resume

LPAE enabled kernels use the 64-bit version of TTBR0 and TTBR1
registers. If we're running an LPAE kernel, fill the upper half
of TTBR0 with 0 because we're setting it to the idmap here (the
idmap is guaranteed to be < 4Gb) and fully restore TTBR1 instead
of just restoring the lower 32 bits. Failure to do so can cause
failures on resume from suspend when these registers are only
half restored.

Signed-off-by: Mahesh Sivasubramanian <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Russell King <[email protected]>


>
> Will

2014-07-29 12:37:10

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: LPAE: reduce damage caused by idmap to virtual memory layout

On Tue, Jul 29, 2014 at 2:57 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Mon, Jul 28, 2014 at 11:57:16PM +0400, Konstantin Khlebnikov wrote:
>> On Mon, Jul 28, 2014 at 11:42 PM, Russell King - ARM Linux
>> <[email protected]> wrote:
>> > On Mon, Jul 28, 2014 at 11:29:39PM +0400, Konstantin Khlebnikov wrote:
>> >> Ok, before switching from identity mapping to normal mapping kernel must
>> >> switch instruction pointer from physical address to virtual.
>> >
>> > "switch instruction pointer from physical address to virtual."
>> >
>> > There's no such distinction for the instruction pointer.
>>
>> I know. I mean "logically".
>>
> ...
>>
>> Sorry, but do I really look so dumb? Maybe it's true; it's almost
>> midnight on my side.
>
> When you use language which suggests a lack of understanding, then
> I will explain things.
>
>> > It doesn't matter, provided the kernel text and data in the virtual
>> > address space are not overwritten by the identity mapping. If it
>> > ends up in the vmalloc or IO space, that should not be a problem.
>>
>> In my case it has been overwritten.
>> And it always happens when PHYS_OFFSET >= PAGE_OFFSET,
>> because with LPAE the idmap always overwrites 1GB at once.
>
> Right, I now see what you're getting at. Here's a better description
> which I've used when committing your patch. I've only taken patch 2
> at the present time.
>
>

Ok. Thank you for your time.

> ...