2024-03-01 18:58:40

by Borislav Petkov

Subject: [RFC PATCH 0/2] x86/kexec: Revert 5level dynamic switching

From: "Borislav Petkov (AMD)" <[email protected]>

Hi,

I think this is silly: the first and the second (kexec'ed) kernel would
both either have CONFIG_X86_5LEVEL enabled or not, so why would they
differ?

And this'll become even more the case in the future.

So remove this silly stuff.

Unless there's a valid use case, which I'm willing to hear. Those commit
messages don't say anything about it.

Thx.

Borislav Petkov (AMD) (2):
Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level
only kernel"
Revert "x86/boot: Add xloadflags bits to check for 5-level paging
support"

arch/x86/boot/header.S | 12 +-----------
arch/x86/include/uapi/asm/bootparam.h | 2 --
arch/x86/kernel/kexec-bzimage64.c | 5 -----
3 files changed, 1 insertion(+), 18 deletions(-)

--
2.43.0



2024-03-01 19:22:23

by Borislav Petkov

Subject: [RFC PATCH 2/2] Revert "x86/boot: Add xloadflags bits to check for 5-level paging support"

From: "Borislav Petkov (AMD)" <[email protected]>

This reverts commit f2d08c5d3bcf3f7ef788af122b57a919efa1e9d0.

This whole dynamic switching support is silly. I don't see a use case
where one would use an old kernel with CONFIG_X86_5LEVEL disabled to
kexec into. I.e., you use pretty much the same kernel.

But I'm open to corrections.

Commit message of

f2d08c5d3bcf ("x86/boot: Add xloadflags bits to check for 5-level paging support")

claims:

The flags will be used by the kernel kexec subsystem and the userspace
kexec tools.

but they're nowhere to be found in kexec tools:

[ ~/src/kexec-tools> git describe
v2.0.28-4-g6ee2ac1bf739
[ ~/src/kexec-tools> git grep XLF_5LEVEL
[ ~/src/kexec-tools>

Zap it all.

Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/boot/header.S | 12 +-----------
arch/x86/include/uapi/asm/bootparam.h | 2 --
2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index a1bbedd989e4..0f261224acef 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -364,17 +364,7 @@ xloadflags:
# define XLF4 0
#endif

-#ifdef CONFIG_X86_64
-#ifdef CONFIG_X86_5LEVEL
-#define XLF56 (XLF_5LEVEL|XLF_5LEVEL_ENABLED)
-#else
-#define XLF56 XLF_5LEVEL
-#endif
-#else
-#define XLF56 0
-#endif
-
- .word XLF0 | XLF1 | XLF23 | XLF4 | XLF56
+ .word XLF0 | XLF1 | XLF23 | XLF4

cmdline_size: .long COMMAND_LINE_SIZE-1 #length of the command line,
#added with boot protocol
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 4a38e7917756..b53b524f6ed2 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -22,8 +22,6 @@
#define XLF_EFI_HANDOVER_32 (1<<2)
#define XLF_EFI_HANDOVER_64 (1<<3)
#define XLF_EFI_KEXEC (1<<4)
-#define XLF_5LEVEL (1<<5)
-#define XLF_5LEVEL_ENABLED (1<<6)

#ifndef __ASSEMBLY__

--
2.43.0
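For readers less familiar with header.S, the preprocessor block this patch removes can be modelled as a plain C function. This is a minimal sketch, not kernel code: `xlf56()` and its two boolean parameters are hypothetical stand-ins for the `CONFIG_X86_64` and `CONFIG_X86_5LEVEL` build options, and the bit values are the ones `bootparam.h` defines for the two flags being deleted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values from arch/x86/include/uapi/asm/bootparam.h (removed by this patch). */
#define XLF_5LEVEL         (1 << 5)
#define XLF_5LEVEL_ENABLED (1 << 6)

/*
 * Hypothetical runtime model of the removed XLF56 #ifdef block:
 * every 64-bit kernel sets XLF_5LEVEL; a 64-bit kernel built with
 * CONFIG_X86_5LEVEL additionally sets XLF_5LEVEL_ENABLED; a 32-bit
 * kernel sets neither bit.
 */
static uint16_t xlf56(bool config_x86_64, bool config_x86_5level)
{
	if (!config_x86_64)
		return 0;
	if (config_x86_5level)
		return XLF_5LEVEL | XLF_5LEVEL_ENABLED;
	return XLF_5LEVEL;
}
```

With both options set the function yields 0x60 (bits 5 and 6); a 64-bit kernel built without CONFIG_X86_5LEVEL still advertises bit 5 alone, which is the bit the kexec_file_load check in patch 1 looks at.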


2024-03-01 19:26:11

by Borislav Petkov

Subject: [RFC PATCH 1/2] Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel"

From: "Borislav Petkov (AMD)" <[email protected]>

This reverts commit ee338b9ee2822e65a85750da6129946c14962410.

This whole dynamic switching support is silly. I don't see a use case
where one would use an old kernel with CONFIG_X86_5LEVEL disabled to
kexec into. I.e., you use pretty much the same kernel.

But I'm open to corrections.

Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/kernel/kexec-bzimage64.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index cde167b0ea92..4f2e47338b7f 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -375,11 +375,6 @@ static int bzImage64_probe(const char *buf, unsigned long len)
return ret;
}

- if (!(header->xloadflags & XLF_5LEVEL) && pgtable_l5_enabled()) {
- pr_err("bzImage cannot handle 5-level paging mode.\n");
- return ret;
- }
-
/* I've got a bzImage */
pr_debug("It's a relocatable bzImage64\n");
ret = 0;
--
2.43.0
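The check being reverted is a single condition in `bzImage64_probe()`. The sketch below restates it as a stand-alone predicate so its truth table is easy to see; `bzimage_5level_ok()` is a hypothetical name, and its `running_l5` parameter stands in for the kernel's `pgtable_l5_enabled()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define XLF_5LEVEL (1 << 5) /* value from bootparam.h */

/*
 * Hypothetical stand-alone model of the removed probe check.
 * Returns false (image rejected) only when the running kernel uses
 * 5-level paging and the image's xloadflags lack XLF_5LEVEL.
 */
static bool bzimage_5level_ok(uint16_t xloadflags, bool running_l5)
{
	/* Same condition as the reverted hunk, negated: true means "accept". */
	return !(!(xloadflags & XLF_5LEVEL) && running_l5);
}
```

Only one combination is rejected: a running kernel in 5-level mode loading an image without XLF_5LEVEL. A running kernel in 4-level mode accepts any image.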


2024-03-04 11:04:07

by Baoquan He

Subject: Re: [RFC PATCH 1/2] Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel"

On 03/01/24 at 07:56pm, Borislav Petkov wrote:
> From: "Borislav Petkov (AMD)" <[email protected]>
>
> This reverts commit ee338b9ee2822e65a85750da6129946c14962410.
>
> This whole dynamic switching support is silly. I don't see a use case
> where one would use an old kernel with CONFIG_X86_5LEVEL disabled to
> kexec into. I.e., you use pretty much the same kernel.

It's not true. A customer may want to load a different kernel if they
have done a lot of testing and trust that kdump kernel, or for
debugging. The same goes for kexec rebooting into a 2nd kernel. We don't
require kexec/kdump to use the same kernel as the 1st kernel. With the
failure and the message, the user can take measures to avoid that. It's
better than having the failure show up only when jumping to the
kexec/kdump kernel.

I remember we had a use case where a customer used a kdump kernel
different from the 1st kernel, though I don't remember why.

>
> But I'm open to corrections.
>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> ---
> arch/x86/kernel/kexec-bzimage64.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> index cde167b0ea92..4f2e47338b7f 100644
> --- a/arch/x86/kernel/kexec-bzimage64.c
> +++ b/arch/x86/kernel/kexec-bzimage64.c
> @@ -375,11 +375,6 @@ static int bzImage64_probe(const char *buf, unsigned long len)
> return ret;
> }
>
> - if (!(header->xloadflags & XLF_5LEVEL) && pgtable_l5_enabled()) {
> - pr_err("bzImage cannot handle 5-level paging mode.\n");
> - return ret;
> - }
> -
> /* I've got a bzImage */
> pr_debug("It's a relocatable bzImage64\n");
> ret = 0;
> --
> 2.43.0
>


2024-03-04 11:13:39

by Borislav Petkov

Subject: Re: [RFC PATCH 1/2] Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel"

On Mon, Mar 04, 2024 at 06:51:26PM +0800, Baoquan He wrote:
> It's not true. A customer may want to load a different kernel if

"may want" is one of those hypothetical things which we don't do. If we
have to support everything a customer *may* want, then the kernel will
be madness.

Also, you do realize that the kernel doesn't care about "customers",
right?

And the question is, how *sensible* is such a use case?

In my experience, not at all. You simply take the same kernel or a very
similar one and kexec it.

> they have done a lot of testing and trust that kdump kernel, or for
> debugging.

Yes, and those kernels will have 5level too. Practically, distros must
enable 5level support in their kernels in order to support modern hw.

> The same goes for kexec rebooting into a 2nd kernel. We don't require
> kexec/kdump to use the same kernel as the 1st kernel. With the
> failure and the message, the user can take measures to avoid that. It's
> better than having the failure show up only when jumping to the
> kexec/kdump kernel.

I can't parse that example.

Btw, kexec-tools doesn't use those XLF_5LEVEL* flag bits either, which
basically means we don't really need them.

> I remember we had a use case where a customer used a kdump kernel
> different from the 1st kernel, though I don't remember why.

See above.

And that customer can still use the old distro kernels which have those
flags.

The point here is, going forward, 5level becomes ubiquitous and will be
even more tightly integrated in the kernel so that it'll become just
another default feature which is either there or not.

So the distinction is going away and the flags can go too.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-05 03:44:03

by Baoquan He

Subject: Re: [RFC PATCH 1/2] Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel"

On 03/04/24 at 12:11pm, Borislav Petkov wrote:
> On Mon, Mar 04, 2024 at 06:51:26PM +0800, Baoquan He wrote:
> > It's not true. A customer may want to load a different kernel if
>
> "may want" is one of those hypothetical things which we don't do. If we
> have to support everything a customer *may* want, then the kernel will
> be madness.
>
> Also, you do realize that the kernel doesn't care about "customers",
> right?

I guess you mean the upstream kernel doesn't care about 'customers'.
Downstream kernels do care about customers.

>
> And the question is, how *sensible* is such a use case?
>
> In my experience, not at all. You simply take the same kernel or a very
> similar one and kexec it.

Hmm, upstream and downstream have different views here. For distro
kernels, we need a lot of testing to make sure a kernel is trustworthy
as a kdump kernel. Here, 'a lot of testing' means a long list of use
cases for kexec/kdump. Please see the file below from the CentOS
kexec-tools package:

https://git.centos.org/rpms/kexec-tools/blob/bb7919506eba39a2b7277c8d36fe1774f9c33428/f/SOURCES/supported-kdump-targets.txt

And the kdump kernel doesn't have to be the same kernel as the 1st kernel.
I can give several examples:

1) In some releases, Nvidia or AMD GPUs don't work well when kexec/kdump
jumps to the 2nd kernel. In that case, we want to use the newer kernel
as the 1st kernel. We also want to deploy a kdump kernel to capture the
vmcore for analysis once a corruption is encountered. Then the old
kernel, which has been tested and proven to work well, can be
configured as the 2nd kernel.

2) In Red Hat's internal testing, we also run debug kernels, and a debug
kernel requires much more memory to boot and run than a normal kernel;
e.g., KASAN's shadow memory eats up 1/8 of system memory. In this case,
we run the debug kernel but configure a normal (non-debug) kernel as the
kdump kernel.

And the original purpose of the kexec feature Eric developed was to let
kernel developers jump into a new and different kernel. We never require
users to use the currently running kernel for kexec/kdump. But we do
need to explain why a kernel can't be used as a kdump kernel when it
differs from the currently running kernel, e.g., the kdump kernel is too
old, or, as in this 5-level case, jumping from 5-level to 4-level will
fail.

>
> > they have done a lot of testing and trust that kdump kernel, or for
> > debugging.
>
> Yes, and those kernels will have 5level too. Practically, distros must
> enable 5level support in their kernels in order to support modern hw.
>
> > The same goes for kexec rebooting into a 2nd kernel. We don't require
> > kexec/kdump to use the same kernel as the 1st kernel. With the
> > failure and the message, the user can take measures to avoid that. It's
> > better than having the failure show up only when jumping to the
> > kexec/kdump kernel.
>
> I can't parse that example.
>
> Btw, kexec-tools doesn't use those XLF_5LEVEL* flag bits either, which
> basically means we don't really need them.

No, it's not true. kexec-tools not checking it means the kexec_load
interface doesn't check the flag. But the flag is set in xloadflags and
checked by kexec_file_load. As we know, kexec_file_load implements most
of its code in the kernel. At that time, people were debating whether to
keep adding new features to the kexec_load interface.

In patch 1, you are removing the flag check for the kexec_file_load
interface, which RHEL/Fedora use by default.

> > I remember we had a use case where a customer used a kdump kernel
> > different from the 1st kernel, though I don't remember why.
>
> See above.
>
> And that customer can still use the old distro kernels which have those
> flags.

These two patches cover two parts of the work. One is marking whether
the kernel itself supports 5-level paging. The other is: if the running
kernel is capable of 5-level paging, check whether the kernel being
loaded is capable of 5-level too. The 2nd part is executed when the
kexec_file_load interface is invoked with 'kexec -s'.

If we remove the check, and people want to jump from a new kernel to an
old kernel where the 5-level code hasn't been added, or where
CONFIG_X86_5LEVEL is unset on purpose, nothing will fail or print a
message at load time; the 2nd kernel will just silently fail to boot.
E.g., say the upcoming RHEL10 is based on an upstream kernel without the
flag check, and people want to kexec/kdump from RHEL10 into an old RHEL7
kernel. It may be an extreme case, but it illustrates the scenario.
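Only the kexec_file_load path performs this check in-kernel; per the grep earlier in the thread, kexec-tools does not look at the flag on the kexec_load path. As an illustration of what such a userspace check could look like, the sketch below reads xloadflags straight from a bzImage buffer. It assumes the setup-header offsets from the x86 boot protocol documentation ("HdrS" magic at 0x202, protocol version at 0x206, xloadflags at 0x236, present from protocol 2.12); `image_has_xlf_5level()` is a hypothetical helper, not part of kexec-tools.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Setup-header offsets within the bzImage file (x86 boot protocol). */
#define HDR_MAGIC_OFF   0x202 /* "HdrS" */
#define HDR_VERSION_OFF 0x206 /* boot protocol version, little-endian */
#define XLOADFLAGS_OFF  0x236 /* xloadflags, protocol 2.12+ */
#define XLF_5LEVEL      (1 << 5)

/* Read a little-endian 16-bit value. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: returns 1 if the image advertises XLF_5LEVEL,
 * 0 if it does not (including pre-2.12 images, which have no
 * xloadflags field), and -1 if the buffer is not a bzImage header.
 */
static int image_has_xlf_5level(const unsigned char *buf, size_t len)
{
	if (len < XLOADFLAGS_OFF + 2)
		return -1;
	if (memcmp(buf + HDR_MAGIC_OFF, "HdrS", 4) != 0)
		return -1;
	if (get_le16(buf + HDR_VERSION_OFF) < 0x020c)
		return 0;
	return (get_le16(buf + XLOADFLAGS_OFF) & XLF_5LEVEL) ? 1 : 0;
}
```

A loader on the kexec_load path could, for example, warn about or refuse an image for which this returns 0 while the running kernel uses 5-level paging, roughly mirroring what the in-kernel kexec_file_load check does.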

> The point here is, going forward, 5level becomes ubiquitous and will be
> even more tightly integrated in the kernel so that it'll become just
> another default feature which is either there or not.
>
> So the distinction is going away and the flags can go too.

I understand this and it makes sense to me; the existing code needs to
match realistic usage. I will check with our QE and support engineers to
see how far the target kexec/kdump kernel is allowed to be from the 1st
kernel in our supported or possible use cases. If no use case is
affected, we can remove the flags and the check. I will report back once
I get feedback.

Thanks
Baoquan


2024-03-05 11:55:55

by Borislav Petkov

Subject: Re: [RFC PATCH 1/2] Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel"

On Tue, Mar 05, 2024 at 11:43:01AM +0800, Baoquan He wrote:
> I guess you mean the upstream kernel doesn't care about 'customers'.
> Downstream kernels do care about customers.

You know very well what I mean. You're at Red Hat; I was at SUSE for
a decade. You know exactly what the distinction is.

> Hmm, upstream and downstream have different views here. For distro
> kernels, we need a lot of testing to make sure a kernel is trustworthy
> as a kdump kernel. Here, 'a lot of testing' means a long list of use
> cases for kexec/kdump. Please see the file below from the CentOS
> kexec-tools package:
>
> https://git.centos.org/rpms/kexec-tools/blob/bb7919506eba39a2b7277c8d36fe1774f9c33428/f/SOURCES/supported-kdump-targets.txt
>
> And the kdump kernel doesn't have to be the same kernel as the 1st kernel.

This "example" basically proves my point. None of those dump targets
talk about architecture support - this is all drivers.

> I can give several examples:
>
> 1) In some releases, Nvidia or AMD GPUs don't work well when kexec/kdump
> jumps to the 2nd kernel. In that case, we want to use the newer kernel
> as the 1st kernel. We also want to deploy a kdump kernel to capture the
> vmcore for analysis once a corruption is encountered. Then the old
> kernel, which has been tested and proven to work well, can be
> configured as the 2nd kernel.

Same as above - nothing to do with architecture support. Both kernels
can and will have 5level support because you won't do two kernel images:
one with and one without 5level.

> E.g., the kdump kernel is too old, or, as in this 5-level case, jumping
> from 5-level to 4-level will fail.

Since when has 5level support been present upstream?

$ git describe 6fb895692a034
v4.11-rc1-97-g6fb895692a03

There's no sensible kdump use case where you jump between 4.12 *and*
6.10, depending on when we revert this.

> No, it's not true. kexec-tools not checking it

No, it is true. kexec-tools does *NOT* use those flags. Vs

"The flags will be used by the kernel kexec subsystem and the userspace
kexec tools."

from f2d08c5d3bcf3f7ef788af122b57a919efa1e9d0.

> If we remove the check, and people want to jump from a new kernel to an
> old kernel where the 5-level code hasn't been added, or where
> CONFIG_X86_5LEVEL is unset on purpose, nothing will fail or print a
> message at load time; the 2nd kernel will just silently fail to boot.
> E.g., say the upcoming RHEL10 is based on an upstream kernel without the
> flag check, and people want to kexec/kdump from RHEL10 into an old RHEL7
> kernel. It may be an extreme case, but it illustrates the scenario.

That is the only valid reason you've given until now. Yes, that makes
sense - the removal of those flags should go together with the removal
of CONFIG_X86_5LEVEL and making this feature unconditional.

Because, practically, that config item is enabled on every relevant
x86 kernel config out there. It would be silly if not.

/me puts on TODO.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-06 04:02:22

by Baoquan He

Subject: Re: [RFC PATCH 1/2] Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel"

On 03/05/24 at 12:55pm, Borislav Petkov wrote:
> On Tue, Mar 05, 2024 at 11:43:01AM +0800, Baoquan He wrote:
.....
>
> > If we remove the check, and people want to jump from a new kernel to an
> > old kernel where the 5-level code hasn't been added, or where
> > CONFIG_X86_5LEVEL is unset on purpose, nothing will fail or print a
> > message at load time; the 2nd kernel will just silently fail to boot.
> > E.g., say the upcoming RHEL10 is based on an upstream kernel without the
> > flag check, and people want to kexec/kdump from RHEL10 into an old RHEL7
> > kernel. It may be an extreme case, but it illustrates the scenario.
>
> That is the only valid reason you've given until now. Yes, that makes
> sense - the removal of those flags should go together with the removal
> of CONFIG_X86_5LEVEL and making this feature unconditional.

Please forgive my clumsy wording.

>
> Because, practically, that config item is enabled on every relevant
> x86 kernel config out there. It would be silly if not.

I agree. Thanks for looking into this.