2018-05-02 16:09:25

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation

startup_64() copies kernel (including .data section) to the new place.
It's required for safe in-place decompression.

This is a problem if the original place is referenced: by mistake I've
put 'top_pgtable' into .data section and the address is loaded into CR3.
If the original place gets overwritten during image decompression the
kernel will crash and the machine will be rebooted.

Move 'top_pgtable' into .pgtable section where the rest of page tables
are. This section is not subject for relocation.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
---
arch/x86/boot/compressed/head_64.S | 8 ++++++++
arch/x86/boot/compressed/pgtable_64.c | 4 +---
2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fca012baba19..c433c21703e6 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -649,3 +649,11 @@ boot_stack_end:
.balign 4096
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
+
+/*
+ * The page table is going to be used instead of page table in the trampoline
+ * memory.
+ */
+ .global top_pgtable
+top_pgtable:
+ .fill PAGE_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 32af1cbcd903..3a0578f54550 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -25,10 +25,8 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
/*
* The page table is going to be used instead of page table in the trampoline
* memory.
- *
- * It must not be in BSS as BSS is cleared after cleanup_trampoline().
*/
-static char top_pgtable[PAGE_SIZE] __aligned(PAGE_SIZE) __section(.data);
+extern char *top_pgtable;

/*
* Trampoline address will be printed by extract_kernel() for debugging
--
2.17.0



2018-05-03 03:44:19

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation

On Wed, 2 May 2018, Kirill A. Shutemov wrote:

> startup_64() copies kernel (including .data section) to the new place.
> It's required for safe in-place decompression.
>
> This is a problem if the original place is referenced: by mistake I've
> put 'top_pgtable' into .data section and the address is loaded into CR3.
> If the original place gets overwritten during image decompression the
> kernel will crash and the machine will be rebooted.
>
> Move 'top_pgtable' into .pgtable section where the rest of page tables
> are. This section is not subject for relocation.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")

Thanks for the Cc, Kirill, which I presume was because I'd mentioned
to you that I was unable to boot 4.17-rc on laptop or workstation.

Which is still so with 4.17-rc3, and I'm sorry to say still so with this
patch: even if I add the fix which I think this patch needs, see below.

I did bisect on Monday, and the first bad was your commit 194a9749c73d
"x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G".
(Cc'ing Dave since his PTI Global work was my other suspect, but that's
off the hook - if I revert just 194a9749c73d then I have no trouble.)

Am I really the only one getting immediate reboot on x86_64?
Perhaps everyone else has machines with 5-level page tables now ?-)

I've looked at the changes a little, and tried a few things (hoping to
avoid a long back and forth describing and trying things for you); but
no success yet, and rather out of my depth with these changes - I've
not had to delve into boot/compressed before.

(I did briefly get excited by the
trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET
in cleanup_trampoline(), which lacks a "/ sizeof(unsigned long)";
but since ...PGTABLE_OFFSET is 0 anyway, that's nothing but cosmetic.)

Hugh

> ---
> arch/x86/boot/compressed/head_64.S | 8 ++++++++
> arch/x86/boot/compressed/pgtable_64.c | 4 +---
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index fca012baba19..c433c21703e6 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -649,3 +649,11 @@ boot_stack_end:
> .balign 4096
> pgtable:
> .fill BOOT_PGT_SIZE, 1, 0
> +
> +/*
> + * The page table is going to be used instead of page table in the trampoline
> + * memory.
> + */
> + .global top_pgtable
> +top_pgtable:
> + .fill PAGE_SIZE, 1, 0
> diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
> index 32af1cbcd903..3a0578f54550 100644
> --- a/arch/x86/boot/compressed/pgtable_64.c
> +++ b/arch/x86/boot/compressed/pgtable_64.c
> @@ -25,10 +25,8 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
> /*
> * The page table is going to be used instead of page table in the trampoline
> * memory.
> - *
> - * It must not be in BSS as BSS is cleared after cleanup_trampoline().
> */
> -static char top_pgtable[PAGE_SIZE] __aligned(PAGE_SIZE) __section(.data);
> +extern char *top_pgtable;

Doesn't that need to be extern char top_pgtable[] ?

>
> /*
> * Trampoline address will be printed by extract_kernel() for debugging
> --
> 2.17.0

2018-05-03 08:40:16

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation

On Wed, May 02, 2018 at 08:42:30PM -0700, Hugh Dickins wrote:
> On Wed, 2 May 2018, Kirill A. Shutemov wrote:
>
> > startup_64() copies kernel (including .data section) to the new place.
> > It's required for safe in-place decompression.
> >
> > This is a problem if the original place is referenced: by mistake I've
> > put 'top_pgtable' into .data section and the address is loaded into CR3.
> > If the original place gets overwritten during image decompression the
> > kernel will crash and the machine will be rebooted.
> >
> > Move 'top_pgtable' into .pgtable section where the rest of page tables
> > are. This section is not subject for relocation.
> >
> > Signed-off-by: Kirill A. Shutemov <[email protected]>
> > Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
>
> Thanks for the Cc, Kirill, which I presume was because I'd mentioned
> to you that I was unable to boot 4.17-rc on laptop or workstation.

Right.

> Which is still so with 4.17-rc3, and I'm sorry to say still so with this
> patch: even if I add the fix which I think this patch needs, see below.

Hm.. Could you share your config?

IIRC, you use legacy boot. What bootloader?

> I did bisect on Monday, and the first bad was your commit 194a9749c73d
> "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G".
> (Cc'ing Dave since his PTI Global work was my other suspect, but that's
> off the hook - if I revert just 194a9749c73d then I have no trouble.)
>
> Am I really the only one getting immediate reboot on x86_64?

There was one more thread:

http://lkml.kernel.org/r/[email protected]

But no firm conclusion, only blaming GCC without a good reason.
BTW, what compiler do you use?

> Perhaps everyone else has machines with 5-level page tables now ?-)

No :)

> I've looked at the changes a little, and tried a few things (hoping to
> avoid a long back and forth describing and trying things for you); but
> no success yet, and rather out of my depth with these changes - I've
> not had to delve into boot/compressed before.
>
> (I did briefly get excited by the
> trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET
> in cleanup_trampoline(), which lacks a "/ sizeof(unsigned long)";
> but since ...PGTABLE_OFFSET is 0 anyway, that's nothing but cosmetic.)

It worth fixing anyway. Thanks for pointing it out.

> > ---
> > arch/x86/boot/compressed/head_64.S | 8 ++++++++
> > arch/x86/boot/compressed/pgtable_64.c | 4 +---
> > 2 files changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index fca012baba19..c433c21703e6 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -649,3 +649,11 @@ boot_stack_end:
> > .balign 4096
> > pgtable:
> > .fill BOOT_PGT_SIZE, 1, 0
> > +
> > +/*
> > + * The page table is going to be used instead of page table in the trampoline
> > + * memory.
> > + */
> > + .global top_pgtable
> > +top_pgtable:
> > + .fill PAGE_SIZE, 1, 0
> > diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
> > index 32af1cbcd903..3a0578f54550 100644
> > --- a/arch/x86/boot/compressed/pgtable_64.c
> > +++ b/arch/x86/boot/compressed/pgtable_64.c
> > @@ -25,10 +25,8 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
> > /*
> > * The page table is going to be used instead of page table in the trampoline
> > * memory.
> > - *
> > - * It must not be in BSS as BSS is cleared after cleanup_trampoline().
> > */
> > -static char top_pgtable[PAGE_SIZE] __aligned(PAGE_SIZE) __section(.data);
> > +extern char *top_pgtable;
>
> Doesn't that need to be extern char top_pgtable[] ?

Ouch. That's embarrassing.

So in my case the top_pgtable is zero and apparently it's good enough
place to put the page table. It boots :P

The patch is bogus and I still don't understand what is going on.

Could you please check if bypassing cleanup_trampoline() altogether fixes
the issue for you:

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fca012baba19..73821ac626f6 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -367,16 +367,6 @@ trampoline_return:
/* Restore the stack, the 32-bit trampoline uses its own stack */
leaq boot_stack_end(%rbx), %rsp

- /*
- * cleanup_trampoline() would restore trampoline memory.
- *
- * RSI holds real mode data and needs to be preserved across
- * this function call.
- */
- pushq %rsi
- call cleanup_trampoline
- popq %rsi
-
/* Zero EFLAGS */
pushq $0
popfq
--
Kirill A. Shutemov

2018-05-03 10:52:35

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation

On Thu, May 03, 2018 at 08:38:49AM +0000, Kirill A. Shutemov wrote:
> The patch is bogus and I still don't understand what is going on.

I think I found the issue. Could you check the patch below:

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fca012baba19..86169ae1c536 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -370,10 +370,13 @@ trampoline_return:
/*
* cleanup_trampoline() would restore trampoline memory.
*
+ * RDI is address of the page table to use (if required).
+ *
* RSI holds real mode data and needs to be preserved across
* this function call.
*/
pushq %rsi
+ leaq top_pgtable(%rbx), %rdi
call cleanup_trampoline
popq %rsi

@@ -647,5 +650,14 @@ boot_stack_end:
*/
.section ".pgtable","a",@nobits
.balign 4096
+ .global pgtable
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
+
+/*
+ * The page table is going to be used instead of page table in the trampoline
+ * memory.
+ */
+ .global top_pgtable
+top_pgtable:
+ .fill PAGE_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 32af1cbcd903..a362fa0b849c 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -22,14 +22,6 @@ struct paging_config {
/* Buffer to preserve trampoline memory */
static char trampoline_save[TRAMPOLINE_32BIT_SIZE];

-/*
- * The page table is going to be used instead of page table in the trampoline
- * memory.
- *
- * It must not be in BSS as BSS is cleared after cleanup_trampoline().
- */
-static char top_pgtable[PAGE_SIZE] __aligned(PAGE_SIZE) __section(.data);
-
/*
* Trampoline address will be printed by extract_kernel() for debugging
* purposes.
@@ -134,7 +126,7 @@ struct paging_config paging_prepare(void)
return paging_config;
}

-void cleanup_trampoline(void)
+void cleanup_trampoline(void *pgtable)
{
void *trampoline_pgtable;

@@ -145,8 +137,8 @@ void cleanup_trampoline(void)
* if it's there.
*/
if ((void *)__native_read_cr3() == trampoline_pgtable) {
- memcpy(top_pgtable, trampoline_pgtable, PAGE_SIZE);
- native_write_cr3((unsigned long)top_pgtable);
+ memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
+ native_write_cr3((unsigned long)pgtable);
}

/* Restore trampoline memory */
--
Kirill A. Shutemov

2018-05-03 17:20:51

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation

On Thu, 3 May 2018, Kirill A. Shutemov wrote:

> On Thu, May 03, 2018 at 08:38:49AM +0000, Kirill A. Shutemov wrote:
> > The patch is bogus and I still don't understand what is going on.
>
> I think I found the issue. Could you check the patch below:

Sorry, no good on either machine, immediate reboot as before.
I'm gathering the info you asked, will send privately in an hour or so.

Hugh

>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index fca012baba19..86169ae1c536 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -370,10 +370,13 @@ trampoline_return:
> /*
> * cleanup_trampoline() would restore trampoline memory.
> *
> + * RDI is address of the page table to use (if required).
> + *
> * RSI holds real mode data and needs to be preserved across
> * this function call.
> */
> pushq %rsi
> + leaq top_pgtable(%rbx), %rdi
> call cleanup_trampoline
> popq %rsi
>
> @@ -647,5 +650,14 @@ boot_stack_end:
> */
> .section ".pgtable","a",@nobits
> .balign 4096
> + .global pgtable
> pgtable:
> .fill BOOT_PGT_SIZE, 1, 0
> +
> +/*
> + * The page table is going to be used instead of page table in the trampoline
> + * memory.
> + */
> + .global top_pgtable
> +top_pgtable:
> + .fill PAGE_SIZE, 1, 0
> diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
> index 32af1cbcd903..a362fa0b849c 100644
> --- a/arch/x86/boot/compressed/pgtable_64.c
> +++ b/arch/x86/boot/compressed/pgtable_64.c
> @@ -22,14 +22,6 @@ struct paging_config {
> /* Buffer to preserve trampoline memory */
> static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
>
> -/*
> - * The page table is going to be used instead of page table in the trampoline
> - * memory.
> - *
> - * It must not be in BSS as BSS is cleared after cleanup_trampoline().
> - */
> -static char top_pgtable[PAGE_SIZE] __aligned(PAGE_SIZE) __section(.data);
> -
> /*
> * Trampoline address will be printed by extract_kernel() for debugging
> * purposes.
> @@ -134,7 +126,7 @@ struct paging_config paging_prepare(void)
> return paging_config;
> }
>
> -void cleanup_trampoline(void)
> +void cleanup_trampoline(void *pgtable)
> {
> void *trampoline_pgtable;
>
> @@ -145,8 +137,8 @@ void cleanup_trampoline(void)
> * if it's there.
> */
> if ((void *)__native_read_cr3() == trampoline_pgtable) {
> - memcpy(top_pgtable, trampoline_pgtable, PAGE_SIZE);
> - native_write_cr3((unsigned long)top_pgtable);
> + memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
> + native_write_cr3((unsigned long)pgtable);
> }
>
> /* Restore trampoline memory */
> --
> Kirill A. Shutemov
>

2018-06-21 15:19:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation


* Hugh Dickins <[email protected]> wrote:

> On Thu, 3 May 2018, Kirill A. Shutemov wrote:
>
> > On Thu, May 03, 2018 at 08:38:49AM +0000, Kirill A. Shutemov wrote:
> > > The patch is bogus and I still don't understand what is going on.
> >
> > I think I found the issue. Could you check the patch below:
>
> Sorry, no good on either machine, immediate reboot as before.
> I'm gathering the info you asked, will send privately in an hour or so.

Hi, any update on this bug?

Kirill: mind (re-)sending a series of any resulting patches that you have pending?

If there's no fix, is there any list of SHA1's to revert?

Thanks,

Ingo

2018-06-21 15:28:32

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation

On Thu, Jun 21, 2018 at 03:18:05PM +0000, Ingo Molnar wrote:
>
> * Hugh Dickins <[email protected]> wrote:
>
> > On Thu, 3 May 2018, Kirill A. Shutemov wrote:
> >
> > > On Thu, May 03, 2018 at 08:38:49AM +0000, Kirill A. Shutemov wrote:
> > > > The patch is bogus and I still don't understand what is going on.
> > >
> > > I think I found the issue. Could you check the patch below:
> >
> > Sorry, no good on either machine, immediate reboot as before.
> > I'm gathering the info you asked, will send privately in an hour or so.
>
> Hi, any update on this bug?
>
> Kirill: mind (re-)sending a series of any resulting patches that you have pending?
>
> If there's no fix, is there any list of SHA1's to revert?

The root cause of the bug was fixed in v4.17 by

5c9b0b1c4988 ("x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()").

The only pending patch I have at the moment is

http://lkml.kernel.org/r/[email protected]

Do you want me to resend it again?

--
Kirill A. Shutemov

2018-06-21 16:08:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Exclude 'top_pgtable' from relocation


* Kirill A. Shutemov <[email protected]> wrote:

> > Kirill: mind (re-)sending a series of any resulting patches that you have pending?
>
> The root cause of the bug was fixed in v4.17 by
>
> 5c9b0b1c4988 ("x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()").

Ok, cool!

> The only pending patch I have at the moment is
>
> http://lkml.kernel.org/r/[email protected]
>
> Do you want me to resend it again?

No need, I think Thomas is handling that one.

Thanks,

Ingo