2019-04-19 18:36:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

Breaking thread because this one got too big.

On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> There are two approach to fix it, detect if the systab is mapped, and
> avoid reading it if not.

Ok, so tglx and I discussed this situation which is slowly getting out
of hand with all the tinkering.

So, here's what we should do - scream loudly now if some of this doesn't
make any sense.

1. Junichi's patch should get the systab check above added and sent to
5.1 so that at least some EFI kexecing can work with 5.1

2. Then, the fact whether the kernel has been kexec'ed and which
addresses it should use early, should all be passed through boot_params
which is either setup by kexec(1) or by the first kernel itself, in the
kexec_file_load() case.

> the systab region is not mapped by the identity mapping provided by
> kexec.

3. Then that needs to be fixed in the first kernel as it is a
shortcoming of us starting to parse systab very early. It is the kexec
setup code's problem not the early compressed stage's problem that the
EFI systab is not mapped.

Anything else I've forgotten? Anything I've misrepresented?

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


2019-04-19 18:37:26

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On Fri, Apr 19, 2019 at 06:50:14PM +0800, Baoquan He wrote:
> Talked with Kairui privately just now. Seems Junichi's patch need add
> this systab mapping. Since the systab region is not mapped on some
> machines. Those machine don't have this issue because they got systab
> region luckily coverred by 1 GB page mapping in 1st kernel before
> kexec jumping.

You don't have to repeat all I that - I know what the problem is. Read
what I said again: it is too late for 5.1 to do any involved surgery.

> > 2. Then, the fact whether the kernel has been kexec'ed and which
> > addresses it should use early, should all be passed through boot_params
> > which is either setup by kexec(1) or by the first kernel itself, in the
> > kexec_file_load() case.
>
> Seems no better way to check if it's kexec-ed kernel, except of the
> setup data checking of kexec-ed kernel.

Why does that "seem" so?

Read again what I said: "should all be passed through boot_params".
Which means, boot_params should be extended with a field of a flag to
say: "this is a kexec'ed kernel".

If it "seems" then it should be made to not "seem" but to work properly.

> Yeah, adding the systab mapping looks good. Kairui put it in
> decompressing stage just because he wants to cover the case in which the
> old kernel kexec jumping to 2nd kernel. Now it seems not very
> reasonable, we also have the new kernel kexec jumping to old 2nd kernel.

I don't think we can guarantee kexec between old<->new kernel to always
work. Otherwise, we can forget all development and improvements of new
kernel.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-04-19 18:46:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On Fri, Apr 19, 2019 at 07:20:06PM +0800, Kairui Song wrote:
> Thanks for the declaration Bao, I can verify on the machine I have,
> the issue still exist without kaslr. Currently, we read rsdp in early
> code and fill in boot_params unconditional, so it will read from the
> systab anyway.

Yes, and in the future, info required by the kexec'ed kernel - like the
EFI systab address or even whether the kernel has been kexec'ed or comes
from cold boot - should be passed in boot_params. So that we don't have
to do all that ugly dancing in early code.

> Yes, kexec only cover RAM in the ident map it prepared for second
> kernel, but the systab could be in reserved region, so if it didn't
> fall into the 1G padding by accident it will fail when reading from
> it. Fix in early code could make sure 2nd kernel always work. Or
> should we treat it specially in kexec mapping prepare code?

Yes, we should. As I said, this is not early boot code's problem but the
kexec setup code's problem.

If the new kernel cannot get RSDP that early, then it should fail the
same way it failed before. That early RDSP parsing was added for the
movable regions thing working with KASLR.

If it can't get a RDSP for whatever reason, then if KASLR selects
a region overlapping with the movable regions, then it is the old
behavior.

Ok?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-04-19 18:46:53

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On Fri, Apr 19, 2019 at 01:28:01PM +0200, Borislav Petkov wrote:
> Read again what I said: "should all be passed through boot_params".
> Which means, boot_params should be extended with a field of a flag to
> say: "this is a kexec'ed kernel".

And by that I mean similar to the XLF_EFI_KEXEC mechanism. The first
kernel or kexec(1) should prepare the info needed by the kexec'ed
kernel.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-04-19 18:55:19

by Baoquan He

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On 04/19/19 at 01:28pm, Borislav Petkov wrote:
> Why does that "seem" so?
>
> Read again what I said: "should all be passed through boot_params".
> Which means, boot_params should be extended with a field of a flag to
> say: "this is a kexec'ed kernel".
>
> If it "seems" then it should be made to not "seem" but to work properly.

No objection to extending with fields of a flag to mark kexec'ed kernel.
Or kdump kernel either. We now check kdump kernel by /proc/vmcore.

>
> > Yeah, adding the systab mapping looks good. Kairui put it in
> > decompressing stage just because he wants to cover the case in which the
> > old kernel kexec jumping to 2nd kernel. Now it seems not very
> > reasonable, we also have the new kernel kexec jumping to old 2nd kernel.
>
> I don't think we can guarantee kexec between old<->new kernel to always
> work. Otherwise, we can forget all development and improvements of new
> kernel.

I personally agree with this. Very earlier, we tried to remove the
896 MB limitation of crashkernel=xM, to extend it to 4G or the whole
RAM, but rejcted by linus since he worried it could break the old kernel
kdumping. I may not remember well.

2019-04-19 19:10:18

by Baoquan He

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On 04/19/19 at 12:17pm, Borislav Petkov wrote:
> Breaking thread because this one got too big.
>
> On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> > There are two approach to fix it, detect if the systab is mapped, and
> > avoid reading it if not.
>
> Ok, so tglx and I discussed this situation which is slowly getting out
> of hand with all the tinkering.
>
> So, here's what we should do - scream loudly now if some of this doesn't
> make any sense.
>
> 1. Junichi's patch should get the systab check above added and sent to
> 5.1 so that at least some EFI kexecing can work with 5.1

Talked with Kairui privately just now. Seems Junichi's patch need add
this systab mapping. Since the systab region is not mapped on some
machines. Those machine don't have this issue because they got systab
region luckily coverred by 1 GB page mapping in 1st kernel before
kexec jumping.

This issue should happen whether it is KASLR kernel or not KASLR kernel.

>
> 2. Then, the fact whether the kernel has been kexec'ed and which
> addresses it should use early, should all be passed through boot_params
> which is either setup by kexec(1) or by the first kernel itself, in the
> kexec_file_load() case.

Seems no better way to check if it's kexec-ed kernel, except of the
setup data checking of kexec-ed kernel.

It may happen in both kexec_load or kexec_file_load, since we build
ident mapping of kexec for RAM in 1st kernel.

>
> > the systab region is not mapped by the identity mapping provided by
> > kexec.
>
> 3. Then that needs to be fixed in the first kernel as it is a
> shortcoming of us starting to parse systab very early. It is the kexec
> setup code's problem not the early compressed stage's problem that the
> EFI systab is not mapped.

Yeah, adding the systab mapping looks good. Kairui put it in
decompressing stage just because he wants to cover the case in which the
old kernel kexec jumping to 2nd kernel. Now it seems not very
reasonable, we also have the new kernel kexec jumping to old 2nd kernel.

Thanks
Baoquan

2019-04-19 19:10:23

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH] x86/boot: Disable RSDP parsing temporarily

Ok,

thinking about this more, we believe it is too late in the release cycle
to keep experimenting so the only thing left to do is the below.

This should bring the situation back to what it was before, at 5.0
times, and we'll have plenty of time now to address and properly fix all
the outstanding issues.

---
From: Borislav Petkov <[email protected]>

The original intention to move RDSP parsing very early, before KASLR
does its ranges selection, was to accommodate movable memory regions
machines (CONFIG_MEMORY_HOTREMOVE) to still be able to do memory
hotplug.

However, that broke kexec'ing a kernel on EFI machines because depending
on where the EFI systab was mapped, on at least one machine it isn't
present in the kexec mapping of the second kernel, leading to a triple
fault in the early code.

Fixing this properly requires significantly involved surgery and we
cannot allow ourselves to do that, that close to the merge window.

So disable the RSDP parsing code temporarily until it is fixed properly
in the next release cycle.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chao Fan <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: Kees Cook <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: x86-ml <[email protected]>
---
arch/x86/boot/compressed/misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c0d6c560df69..5a237e8dbf8d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -352,7 +352,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
boot_params->hdr.loadflags &= ~KASLR_FLAG;

/* Save RSDP address for later use. */
- boot_params->acpi_rsdp_addr = get_rsdp_addr();
+ /* boot_params->acpi_rsdp_addr = get_rsdp_addr(); */

sanitize_boot_params(boot_params);

--
2.21.0


--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-04-19 19:13:38

by Baoquan He

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On 04/19/19 at 06:50pm, Baoquan He wrote:
> On 04/19/19 at 12:17pm, Borislav Petkov wrote:
> > Breaking thread because this one got too big.
> >
> > On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> > > There are two approach to fix it, detect if the systab is mapped, and
> > > avoid reading it if not.
> >
> > Ok, so tglx and I discussed this situation which is slowly getting out
> > of hand with all the tinkering.
> >
> > So, here's what we should do - scream loudly now if some of this doesn't
> > make any sense.
> >
> > 1. Junichi's patch should get the systab check above added and sent to
> > 5.1 so that at least some EFI kexecing can work with 5.1
>
> Talked with Kairui privately just now. Seems Junichi's patch need add
> this systab mapping. Since the systab region is not mapped on some
> machines. Those machine don't have this issue because they got systab
> region luckily coverred by 1 GB page mapping in 1st kernel before
> kexec jumping.
>
> This issue should happen whether it is KASLR kernel or not KASLR kernel.
>
> >
> > 2. Then, the fact whether the kernel has been kexec'ed and which
> > addresses it should use early, should all be passed through boot_params
> > which is either setup by kexec(1) or by the first kernel itself, in the
> > kexec_file_load() case.
>
> Seems no better way to check if it's kexec-ed kernel, except of the
> setup data checking of kexec-ed kernel.
>
> It may happen in both kexec_load or kexec_file_load, since we build
> ident mapping of kexec for RAM in 1st kernel.
>
> >
> > > the systab region is not mapped by the identity mapping provided by
> > > kexec.
> >
> > 3. Then that needs to be fixed in the first kernel as it is a
> > shortcoming of us starting to parse systab very early. It is the kexec
> > setup code's problem not the early compressed stage's problem that the
> > EFI systab is not mapped.
>
> Yeah, adding the systab mapping looks good. Kairui put it in
^ in 1st kernel
> decompressing stage just because he wants to cover the case in which the
> old kernel kexec jumping to 2nd kernel. Now it seems not very
> reasonable, we also have the new kernel kexec jumping to old 2nd kernel.
>
> Thanks
> Baoquan

2019-04-19 19:32:43

by Kairui Song

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On Fri, Apr 19, 2019 at 7:34 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Apr 19, 2019 at 07:20:06PM +0800, Kairui Song wrote:
> > Thanks for the declaration Bao, I can verify on the machine I have,
> > the issue still exist without kaslr. Currently, we read rsdp in early
> > code and fill in boot_params unconditional, so it will read from the
> > systab anyway.
>
> Yes, and in the future, info required by the kexec'ed kernel - like the
> EFI systab address or even whether the kernel has been kexec'ed or comes
> from cold boot - should be passed in boot_params. So that we don't have
> to do all that ugly dancing in early code.
>
> > Yes, kexec only cover RAM in the ident map it prepared for second
> > kernel, but the systab could be in reserved region, so if it didn't
> > fall into the 1G padding by accident it will fail when reading from
> > it. Fix in early code could make sure 2nd kernel always work. Or
> > should we treat it specially in kexec mapping prepare code?
>
> Yes, we should. As I said, this is not early boot code's problem but the
> kexec setup code's problem.
>
> If the new kernel cannot get RSDP that early, then it should fail the
> same way it failed before. That early RDSP parsing was added for the
> movable regions thing working with KASLR.
>
> If it can't get a RDSP for whatever reason, then if KASLR selects
> a region overlapping with the movable regions, then it is the old
> behavior.
>
> Ok?
>

OK. And then fix the mapping issue in 1st kernel is the right way,
I'll skip the update for the early code mapping thing.


--
Best Regards,
Kairui Song

2019-04-19 20:15:44

by Kairui Song

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On Fri, Apr 19, 2019 at 6:50 PM Baoquan He <[email protected]> wrote:
>
> On 04/19/19 at 12:17pm, Borislav Petkov wrote:
> > Breaking thread because this one got too big.
> >
> > On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> > > There are two approach to fix it, detect if the systab is mapped, and
> > > avoid reading it if not.
> >
> > Ok, so tglx and I discussed this situation which is slowly getting out
> > of hand with all the tinkering.
> >
> > So, here's what we should do - scream loudly now if some of this doesn't
> > make any sense.
> >
> > 1. Junichi's patch should get the systab check above added and sent to
> > 5.1 so that at least some EFI kexecing can work with 5.1
>
> Talked with Kairui privately just now. Seems Junichi's patch need add
> this systab mapping. Since the systab region is not mapped on some
> machines. Those machine don't have this issue because they got systab
> region luckily coverred by 1 GB page mapping in 1st kernel before
> kexec jumping.
>
> This issue should happen whether it is KASLR kernel or not KASLR kernel.

Thanks for the declaration Bao, I can verify on the machine I have,
the issue still exist without kaslr. Currently, we read rsdp in early
code and fill in boot_params unconditional, so it will read from the
systab anyway.

>
> >
> > 2. Then, the fact whether the kernel has been kexec'ed and which
> > addresses it should use early, should all be passed through boot_params
> > which is either setup by kexec(1) or by the first kernel itself, in the
> > kexec_file_load() case.
>
> Seems no better way to check if it's kexec-ed kernel, except of the
> setup data checking of kexec-ed kernel.
>
> It may happen in both kexec_load or kexec_file_load, since we build
> ident mapping of kexec for RAM in 1st kernel.

For kexec_file_load newer kernel will fill in the acpi_rsdp in
boot_params so it bypassed the kexec_get_rsdp_addr (which will read
from systab). The problem is not fixed, systab mapping still missing,
but not likely to happen with kexec_file_load on newer kernel.

>
> >
> > > the systab region is not mapped by the identity mapping provided by
> > > kexec.
> >
> > 3. Then that needs to be fixed in the first kernel as it is a
> > shortcoming of us starting to parse systab very early. It is the kexec
> > setup code's problem not the early compressed stage's problem that the
> > EFI systab is not mapped.
>
> Yeah, adding the systab mapping looks good. Kairui put it in
> decompressing stage just because he wants to cover the case in which the
> old kernel kexec jumping to 2nd kernel. Now it seems not very
> reasonable, we also have the new kernel kexec jumping to old 1nd kernel.

Yes, kexec only cover RAM in the ident map it prepared for second
kernel, but the systab could be in reserved region, so if it didn't
fall into the 1G padding by accident it will fail when reading from
it. Fix in early code could make sure 2nd kernel always work. Or
should we treat it specially in kexec mapping prepare code?

>
> Thanks
> Baoquan
--
Best Regards,
Kairui Song

Subject: [tip:x86/urgent] x86/boot: Disable RSDP parsing temporarily

Commit-ID: 36f0c423552dacaca152324b8e9bda42a6d88865
Gitweb: https://git.kernel.org/tip/36f0c423552dacaca152324b8e9bda42a6d88865
Author: Borislav Petkov <[email protected]>
AuthorDate: Fri, 19 Apr 2019 15:40:14 +0200
Committer: Borislav Petkov <[email protected]>
CommitDate: Mon, 22 Apr 2019 11:36:43 +0200

x86/boot: Disable RSDP parsing temporarily

The original intention to move RDSP parsing very early, before KASLR
does its ranges selection, was to accommodate movable memory regions
machines (CONFIG_MEMORY_HOTREMOVE) to still be able to do memory
hotplug.

However, that broke kexec'ing a kernel on EFI machines because depending
on where the EFI systab was mapped, on at least one machine it isn't
present in the kexec mapping of the second kernel, leading to a triple
fault in the early code.

Fixing this properly requires significantly involved surgery and we
cannot allow ourselves to do that, that close to the merge window.

So disable the RSDP parsing code temporarily until it is fixed properly
in the next release cycle.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chao Fan <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: Kees Cook <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c0d6c560df69..5a237e8dbf8d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -352,7 +352,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
boot_params->hdr.loadflags &= ~KASLR_FLAG;

/* Save RSDP address for later use. */
- boot_params->acpi_rsdp_addr = get_rsdp_addr();
+ /* boot_params->acpi_rsdp_addr = get_rsdp_addr(); */

sanitize_boot_params(boot_params);

2019-04-22 16:18:23

by Baoquan He

[permalink] [raw]
Subject: Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it

On 04/19/19 at 01:36pm, Borislav Petkov wrote:
> On Fri, Apr 19, 2019 at 01:28:01PM +0200, Borislav Petkov wrote:
> > Read again what I said: "should all be passed through boot_params".
> > Which means, boot_params should be extended with a field of a flag to
> > say: "this is a kexec'ed kernel".
>
> And by that I mean similar to the XLF_EFI_KEXEC mechanism. The first
> kernel or kexec(1) should prepare the info needed by the kexec'ed
> kernel.

We have set the loader type to '0x0D << 4' for kexec specifically, in both
kexec_load and kexec_file_load. We can check this to identify if it's
kexec-ed kernel or not.

Update patch with it?

static void *bzImage64_load(struct kimage *image, char *kernel,
unsigned long kernel_len, char *initrd,
unsigned long initrd_len, char *cmdline,
unsigned long cmdline_len)
{

...
/* bootloader info. Do we need a separate ID for kexec kernel loader? */
params->hdr.type_of_loader = 0x0D << 4;
...

}