2018-08-29 14:17:55

by Baoquan He

Subject: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

This was suggested by Kirill several months ago. I worked out several
patches to fix it, then got interrupted by other issues. Now I have
sorted them out and am posting them for review.

The current upstream kernel supports 5-level paging mode and can
dynamically choose the paging mode during bootup according to the kernel
image, the hardware and the kernel parameter settings. This flexibility
brings several issues for kexec/kdump:
1)
Switching between paging modes requires changes in the target kernel.
This means you cannot kexec() a 4-level paging kernel from a 5-level
paging kernel if the 4-level paging kernel doesn't include those changes.

2)
Switching from a 5-level paging kernel to a 4-level paging kernel will
fail if kexec() puts the kernel image above 64TiB of memory.

3)
Kdump jumping has the same issue as 2). This requires us to reserve
crashkernel only below 64TB; otherwise jumping from a 5-level to a
4-level kernel will fail.

4)
The current kexec_load interface puts the kernel at the top of system
RAM. This also needs to be restricted to below 64TB. This is not an
issue for the kexec_file_load interface, however, since it puts the
kernel at the top of the lowest 4GB. I once planned to unify the two
interfaces so that both put the kernel at the top of system RAM, because
we have been using the old kexec_load for longer, and it is still more
widely used than kexec_file_load. But the change touches too many lines
of code, and people didn't seem to like it. So I have decided to give up
on unifying them, leave things as they are, and add the restriction for
kexec_load in the kexec_tools utility. The unifying patches are:

[PATCH v7 0/4] resource: Use list_head to link sibling resource
http://lkml.kernel.org/r/[email protected]

Note:
Issues 1) and 2) need to be fixed in the kernel for the kexec_file_load
interface. Meanwhile, 1), 2), and 4) need to be fixed in the user space
kexec_tools utility; I will post the user space patches later. Issue 3)
can only be fixed in the kernel.
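A minimal sketch of the restriction described in 4): a top-down placement helper that never goes above 64 TiB. The helper name, the single contiguous RAM range and the "0 means no fit" convention are assumptions for illustration only; the real kexec_tools placement code walks the actual memory map and is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* 64 TiB, the 4-level limit discussed in this thread (2^46 bytes). */
#define SZ_64T (1ULL << 46)

/*
 * Hypothetical helper, not kexec_tools code: return the highest
 * address at which a segment of `size` bytes, aligned to `align`,
 * fits below both the top of RAM and the 64 TiB cap.
 * Assumes one contiguous RAM range; returns 0 if the segment does not fit.
 */
static uint64_t place_segment_top_down(uint64_t ram_top, uint64_t size,
                                       uint64_t align)
{
	uint64_t limit = ram_top < SZ_64T ? ram_top : SZ_64T;

	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
	if (size > limit)
		return 0;
	/* Round the candidate start address down to the alignment. */
	return (limit - size) & ~(align - 1);
}
```

On a machine with 128 TiB of RAM, a 16 MiB segment would land just below 64 TiB instead of just below 128 TiB.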

Baoquan He (3):
x86/boot: Add bit fields into xloadflags for 5-level kernel checking
x86/kexec/64: Error out if try to jump to old 4-level kernel from
5-level kernel
x86/kdump/64: Change the upper limit of crashkernel reservation

arch/x86/boot/header.S | 12 +++++++++++-
arch/x86/include/uapi/asm/bootparam.h | 2 ++
arch/x86/kernel/kexec-bzimage64.c | 5 +++++
arch/x86/kernel/setup.c | 18 ++++++++++++++----
4 files changed, 32 insertions(+), 5 deletions(-)

--
2.13.6



2018-08-29 14:18:12

by Baoquan He

Subject: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

Add two bit fields, XLF_5LEVEL and XLF_5LEVEL_ENABLED, for 5-level kernel
checking.
Bit XLF_5LEVEL indicates whether this kernel contains the 5-level related
code.
Bit XLF_5LEVEL_ENABLED indicates whether CONFIG_X86_5LEVEL=y is set.

They are used in a later patch to check whether the kexec/kdump kernel
is loaded in the right place.

Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/boot/header.S | 12 +++++++++++-
arch/x86/include/uapi/asm/bootparam.h | 2 ++
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 850b8762e889..be19f4199727 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -419,7 +419,17 @@ xloadflags:
# define XLF4 0
#endif

- .word XLF0 | XLF1 | XLF23 | XLF4
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_5LEVEL
+#define XLF56 (XLF_5LEVEL|XLF_5LEVEL_ENABLED)
+#else
+#define XLF56 XLF_5LEVEL
+#endif
+#else
+#define XLF56 0
+#endif
+
+ .word XLF0 | XLF1 | XLF23 | XLF4 | XLF56

cmdline_size: .long COMMAND_LINE_SIZE-1 #length of the command line,
#added with boot protocol
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index a06cbf019744..d76b2773d0c4 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -29,6 +29,8 @@
#define XLF_EFI_HANDOVER_32 (1<<2)
#define XLF_EFI_HANDOVER_64 (1<<3)
#define XLF_EFI_KEXEC (1<<4)
+#define XLF_5LEVEL (1<<5)
+#define XLF_5LEVEL_ENABLED (1<<6)

#ifndef __ASSEMBLY__

--
2.13.6
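As a rough illustration of how a loader might interpret the two new bits (values taken from the bootparam.h hunk above), the following classifies a kernel image from its xloadflags. The enum and function names are hypothetical. Per the header.S hunk, a 32-bit build sets neither bit, a 64-bit build without CONFIG_X86_5LEVEL sets only XLF_5LEVEL, and a 64-bit build with it sets both.

```c
#include <stdint.h>

/* Bit values as proposed in this patch for bootparam.h. */
#define XLF_5LEVEL         (1 << 5)
#define XLF_5LEVEL_ENABLED (1 << 6)

enum l5_support {
	L5_NO_CODE,       /* image predates the 5-level boot code */
	L5_CODE_DISABLED, /* 5-level code present, CONFIG_X86_5LEVEL=n */
	L5_ENABLED,       /* CONFIG_X86_5LEVEL=y */
};

/* Sketch: classify a bzImage from its setup-header xloadflags field. */
static enum l5_support classify_xloadflags(uint16_t xloadflags)
{
	if (!(xloadflags & XLF_5LEVEL))
		return L5_NO_CODE;
	if (!(xloadflags & XLF_5LEVEL_ENABLED))
		return L5_CODE_DISABLED;
	return L5_ENABLED;
}
```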


2018-08-29 14:18:51

by Baoquan He

Subject: [PATCH 2/3] x86/kexec/64: Error out if try to jump to old 4-level kernel from 5-level kernel

In relocate_kernel(), the CR4.LA57 flag is set before the kexec jump if
the kernel has 5-level paging enabled. Then boot/compressed/head_64.S
checks whether the booting kernel is in 4-level or 5-level paging mode
and handles it accordingly. However, an old kernel that doesn't contain
the 5-level code doesn't know how to cope with this, and a #GP is
triggered.

Instead of triggering a #GP during kexec kernel boot, error out at kexec
load time if we find we are trying to jump to an old 4-level kernel from
a 5-level kernel.

Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/kernel/kexec-bzimage64.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 7326078eaa7a..f5fe94ee209a 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -316,6 +316,11 @@ static int bzImage64_probe(const char *buf, unsigned long len)
return ret;
}

+ if (!(header->xloadflags & XLF_5LEVEL) && pgtable_l5_enabled()) {
+ pr_err("Can not jump to old 4-level kernel from 5-level kernel.\n");
+ return ret;
+ }
+
/* I've got a bzImage */
pr_debug("It's a relocatable bzImage64\n");
ret = 0;
--
2.13.6
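The new bzImage64_probe() condition can be exercised outside the kernel with a small stand-in. In this sketch the running_l5 parameter replaces pgtable_l5_enabled(), the function name is hypothetical, and we assume ret still holds -ENOEXEC when the new check runs, as it does on the probe's earlier error paths.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define XLF_5LEVEL (1 << 5)

/*
 * User-space sketch of the check added to bzImage64_probe():
 * refuse a target image that lacks the 5-level boot code when the
 * running kernel uses 5-level paging.
 */
static int check_5level_compat(uint16_t xloadflags, bool running_l5)
{
	if (!(xloadflags & XLF_5LEVEL) && running_l5)
		return -ENOEXEC; /* assumed value of ret at this point */
	return 0;
}
```

Doing the check at load time turns a #GP during the kexec boot into an ordinary error returned to the loader.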


2018-08-29 14:18:51

by Baoquan He

Subject: [PATCH 3/3] x86/kdump/64: Change the upper limit of crashkernel reservation

Restrict kdump to reserve crashkernel only below 64TB. The kdump jump
may be from a 5-level to a 4-level kernel, and if the 5-level kernel puts
the kdump kernel above 64TB, the jump will fail. Since the crashkernel
reservation is done during the 1st kernel's bootup, there's no way to
detect the paging mode of the kdump kernel at that time.

Hence change the upper limmit of crashkernel reservation to 64TB
on x86_64.

Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/kernel/setup.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8fe740e22030..1a6389096782 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -451,16 +451,26 @@ static void __init memblock_x86_reserve_range_setup_data(void)
#define CRASH_ALIGN (16 << 20)

/*
- * Keep the crash kernel below this limit. On 32 bits earlier kernels
- * would limit the kernel to the low 512 MiB due to mapping restrictions.
- * On 64bit, old kexec-tools need to under 896MiB.
+ * Keep the crash kernel below this limit.
+ *
+ * On 32 bits earlier kernels would limit the kernel to the low
+ * 512 MiB due to mapping restrictions.
+ *
+ * On 64bit, old kexec-tools need to be under 896MiB. Later versions
+ * support putting the kernel above 4G, up to the top of system RAM.
+ * Here the kdump kernel needs to be restricted to below 64TB, which
+ * is the upper limit of system RAM in 4-level paging mode. Since
+ * the kdump jump could be from 5-level to 4-level, it will fail if
+ * the kernel is put above 64TB, and there's no way to detect the
+ * paging mode of the kernel that will be loaded for dumping while
+ * the 1st kernel boots up.
*/
#ifdef CONFIG_X86_32
# define CRASH_ADDR_LOW_MAX (512 << 20)
# define CRASH_ADDR_HIGH_MAX (512 << 20)
#else
# define CRASH_ADDR_LOW_MAX (896UL << 20)
-# define CRASH_ADDR_HIGH_MAX MAXMEM
+# define CRASH_ADDR_HIGH_MAX (1ULL << 46)
#endif

static int __init reserve_crashkernel_low(void)
--
2.13.6
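Writing `<` where `<<` is meant is an easy one-character slip in a limit macro like this: `(1ULL < 46)` is a comparison that evaluates to 1, not to 64 TiB. A compile-time assertion is one cheap guard. The macro and function names below are illustrative only, not part of setup.c.

```c
#include <stdint.h>

/* Comparison, not a shift: this expression evaluates to 1. */
#define HIGH_MAX_COMPARISON (1ULL < 46)
/* Left shift: 2^46 bytes, i.e. 64 TiB. */
#define HIGH_MAX_SHIFT      (1ULL << 46)

/* Fails the build if the shift ever stops meaning 64 TiB. */
_Static_assert(HIGH_MAX_SHIFT == 64ULL * 1024 * 1024 * 1024 * 1024,
	       "1ULL << 46 must be 64 TiB");

static uint64_t high_max_comparison(void) { return HIGH_MAX_COMPARISON; }
static uint64_t high_max_shift(void)      { return HIGH_MAX_SHIFT; }
```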


2018-08-30 13:52:18

by Kirill A. Shutemov

Subject: Re: [PATCH 3/3] x86/kdump/64: Change the upper limit of crashkernel reservation

On Wed, Aug 29, 2018 at 10:16:24PM +0800, Baoquan He wrote:
> Restrict kdump to only reserve crashkernel below 64TB. Since the kdump
> jumping may be from 5-level to 4-level, and the kdump kernel is put
> above 64TB in 5-level kernel, then the jumping will fail. And the
> crashkernel reservation is done during the 1st kernel bootup, there's
> no way to detect the paging mode of kdump kernel at that time.
>
> Hence change the upper limmit of crashkernel reservation to 64TB

s/limmit/limit/

--
Kirill A. Shutemov

2018-08-30 14:00:59

by Kirill A. Shutemov

Subject: Re: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

On Wed, Aug 29, 2018 at 10:16:21PM +0800, Baoquan He wrote:
> This was suggested by Kirill several months ago, I worked out several
> patches to fix, then interrupted by other issues. So sort them out
> now and post for reviewing.

Thanks for doing this.

> The current upstream kernel supports 5-level paging mode and supports
> dynamically choosing paging mode during bootup according to kernel
> image, hardware and kernel parameter setting. This flexibility brings
> several issues for kexec/kdump:
> 1)
> Switching between paging modes, requires changes into target kernel.
> It means you cannot kexec() 4-level paging kernel from 5-level paging
> kernel if 4-level paging kernel doesn't include changes.
>
> 2)
> Switching from 5-level paging to 4-level paging kernel would fail, if
> kexec() put kernel image above 64TiB of memory.

I'm not entirely sure that 64TiB is the limit here. Technically, 4-level
paging allows to address 256TiB in 1-to-1 mapping. We just don't have
machines with that wide physical address space (which don't support
5-level paging too).

What is your reasoning about 64TiB limit?

--
Kirill A. Shutemov

2018-08-30 14:14:13

by Baoquan He

Subject: Re: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

On 08/30/18 at 04:58pm, Kirill A. Shutemov wrote:
> On Wed, Aug 29, 2018 at 10:16:21PM +0800, Baoquan He wrote:
> > 2)
> > Switching from 5-level paging to 4-level paging kernel would fail, if
> > kexec() put kernel image above 64TiB of memory.
>
> I'm not entirely sure that 64TiB is the limit here. Technically, 4-level
> paging allows to address 256TiB in 1-to-1 mapping. We just don't have
> machines with that wide physical address space (which don't support
> 5-level paging too).

Hmm, AFAIK, MAX_PHYSMEM_BITS limits the maximum address space that
physical RAM can be mapped to. We have 256TB for the whole address
space in 4-level paging, which includes both user space and kernel
space, so it might not allow the entire 256TB for the direct mapping.
The direct mapping is only for mapping physical RAM, and kexec/kdump
only cares about the physical RAM space and loads the images inside
it.

# define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)

Not sure if my understanding is right, please correct me if I am wrong.

Thanks
Baoquan
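The numbers in this exchange follow directly from powers of two. A quick check, with max_physmem_bits() mirroring the quoted macro (its boolean parameter stands in for pgtable_l5_enabled()): 46 bits give 64 TiB, the 48-bit virtual address space of 4-level paging gives the 256 TiB Kirill mentions, and 52 bits give 4096 TiB (4 PiB).

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the quoted MAX_PHYSMEM_BITS definition. */
static unsigned int max_physmem_bits(bool l5_enabled)
{
	return l5_enabled ? 52 : 46;
}

/* Size, in TiB (2^40 bytes), of an address space `bits` wide. */
static uint64_t space_tib(unsigned int bits)
{
	return (1ULL << bits) >> 40;
}
```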

2018-08-30 14:16:00

by Baoquan He

Subject: Re: [PATCH 3/3] x86/kdump/64: Change the upper limit of crashkernel reservation

On 08/30/18 at 04:50pm, Kirill A. Shutemov wrote:
> On Wed, Aug 29, 2018 at 10:16:24PM +0800, Baoquan He wrote:
> > Restrict kdump to only reserve crashkernel below 64TB. Since the kdump
> > jumping may be from 5-level to 4-level, and the kdump kernel is put
> > above 64TB in 5-level kernel, then the jumping will fail. And the
> > crashkernel reservation is done during the 1st kernel bootup, there's
> > no way to detect the paging mode of kdump kernel at that time.
> >
> > Hence change the upper limmit of crashkernel reservation to 64TB
>
> s/limmit/limit/

Thanks, will change.

2018-08-30 14:30:18

by Kirill A. Shutemov

Subject: Re: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

On Thu, Aug 30, 2018 at 02:12:02PM +0000, Baoquan He wrote:
> On 08/30/18 at 04:58pm, Kirill A. Shutemov wrote:
> > On Wed, Aug 29, 2018 at 10:16:21PM +0800, Baoquan He wrote:
> > > 2)
> > > Switching from 5-level paging to 4-level paging kernel would fail, if
> > > kexec() put kernel image above 64TiB of memory.
> >
> > I'm not entirely sure that 64TiB is the limit here. Technically, 4-level
> > paging allows to address 256TiB in 1-to-1 mapping. We just don't have
> > machines with that wide physical address space (which don't support
> > 5-level paging too).
>
> Hmm, afaik, the MAX_PHYSMEM_BITS limits the maximum address space
> which physical RAM can mapped to. We have 256TB for the whole address
> space for 4-level paging, that includes user space and kernel space,
> it might not allow 256TB entirely for the direct mapping.
> And the direct mapping is only for physical RAM mapping, and
> kexec/kdump only cares about the physical RAM space and load them
> inside.
>
> # define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
>
> Not sure if my understanding is right, please correct me if I am wrong.

IIRC, we only care about the place where kexec puts the kernel before it
gets decompressed. After decompression, the kernel will be put into the
right spot.

Decompression is done in early boot, where we use a 1-to-1 mapping (not
the usual kernel virtual memory layout). All 256TiB should be reachable.

All that said, I think it's safer to stick with 64TiB.

For the whole patchset:

Acked-by: Kirill A. Shutemov <[email protected]>

--
Kirill A. Shutemov

2018-08-30 14:59:32

by Baoquan He

Subject: Re: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

On 08/30/18 at 05:27pm, Kirill A. Shutemov wrote:
> IIRC, we only care about the place kexec puts the kernel before it gets
> decompressed. After the decompression kernel will be put into the right
> spot.
>
> Decompression is done in early boot where we use 1-to-1 mapping (not a
> usual kernel virtual memory layout). All 256TiB should be reachable.

My understanding that is although it's 1:1 identity mapping, it still
has to be inside the available physical RAM region. I don't remember what
the old code did, but now in __startup_64() you can see a check like the
one below, and at that point it's still identity mapping.

/* Is the address too large? */
if (physaddr >> MAX_PHYSMEM_BITS)
for (;;);

Thanks
Baoquan
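The quoted __startup_64() test rejects any address with a bit set at or above MAX_PHYSMEM_BITS. The stand-alone version below (function name hypothetical; the kernel spins in `for (;;)` rather than returning) shows that with 46 bits the first rejected address is exactly 64 TiB, while with 52 bits that address passes. It models only this one check, not the decompression code in arch/x86/boot/compressed that runs before it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the quoted check: a non-zero shift result means the
 * address has bits at or above max_physmem_bits, i.e. it is too large.
 */
static bool physaddr_too_large(uint64_t physaddr, unsigned int max_physmem_bits)
{
	return (physaddr >> max_physmem_bits) != 0;
}
```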

2018-08-30 15:04:22

by Baoquan He

Subject: Re: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

On 08/30/18 at 10:57pm, Baoquan He wrote:
> My understanding that is although it's 1:1 identity mapping, it still
~is that~ , sorry, typo
> has to be inside available physical RAM region. I don't remember what
> the old code did, now in __startup_64(), you can see that there's a
> check like below, and at this time, it's still identity mapping.
>
> /* Is the address too large? */
> if (physaddr >> MAX_PHYSMEM_BITS)
> for (;;);
>
> Thanks
> Baoquan

2018-09-02 20:46:40

by Kirill A. Shutemov

Subject: Re: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

On Thu, Aug 30, 2018 at 10:57:51PM +0800, Baoquan He wrote:
> On 08/30/18 at 05:27pm, Kirill A. Shutemov wrote:
> > IIRC, we only care about the place kexec puts the kernel before it gets
> > decompressed. After the decompression kernel will be put into the right
> > spot.
> >
> > Decompression is done in early boot where we use 1-to-1 mapping (not a
> > usual kernel virtual memory layout). All 256TiB should be reachable.
>
> My understanding that is although it's 1:1 identity mapping, it still
> has to be inside available physical RAM region. I don't remember what
> the old code did, now in __startup_64(),

I'm talking about the code that runs before __startup_64(), in
arch/x86/boot/compressed. Physical memory starts at virtual address 0
there, without PAGE_OFFSET.

> you can see that there's a check like below, and at this time, it's
> still identity mapping.
>
> /* Is the address too large? */
> if (physaddr >> MAX_PHYSMEM_BITS)
> for (;;);
>
> Thanks
> Baoquan

--
Kirill A. Shutemov

2018-09-04 02:43:42

by H. Peter Anvin

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

I don't understand why there is any reason not to always enter the target
kernel in 4-level mode. There certainly is no point whatsoever in having two
xloadflags: the only thing that could possibly matter is whether or not the
kernel in question *can* be entered in 5-level mode should that ever be necessary.

-hpa

2018-09-04 03:45:58

by Baoquan He

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

Hi hpa,

Thanks for looking into this.

On 09/03/18 at 07:41pm, H. Peter Anvin wrote:
> I don't understand why there is any reason not to always enter the target
> kernel in 4-level mode. There certainly is no point whatsoever in having two
> xloadflags: the only thing that could possibly matter is whether or not the
> kernel in question *can* be entered in 5-level mode should that ever be necessary.

This patchset is only meant to fix kexec/kdump issues. We never prevent
the kernel from booting in 4-level mode from firmware as the 1st kernel.
However, there are issues when jumping from a 1st kernel in 5-level mode
to the 2nd kernel. The reasons are:

1) In arch/x86/kernel/relocate_kernel_64.S, we set X86_CR4_LA57 in CR4
if the 1st kernel is in 5-level mode. Then in
arch/x86/boot/compressed/head_64.S, paging_prepare() is called to decide
whether 5-level mode will be enabled and to prepare the trampoline. If
the kexec/kdump kernel is expected to be in 4-level mode, e.g. with
'no5lvl' specified, it can still handle this well. But an old kernel
without this 5-level code ignores the fact that X86_CR4_LA57 has been
set in CR4 and proceeds anyway, and then a #GP is triggered. That's why
XLF_5LEVEL is used as a marker.

2) For the kexec_load interface, the kexec_tools utility puts the
kernel/initrd at the top of system RAM. If the 1st kernel is in 5-level
mode and the kexec-ed kernel has "CONFIG_X86_5LEVEL=n", we have to detect
this and load the kernel below 64TB, since the kexec-ed kernel will
definitely run in 4-level mode. Putting the kernel above 64TB will make
the kexec-ed kernel fail to boot. That's why *XLF_5LEVEL_ENABLED* is
needed.

Thanks
Baoquan
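The two points above can be combined into a single loader-side decision. This is a hypothetical sketch of what a kexec_load user could do, not kexec_tools code: it assumes one contiguous RAM range ending at ram_top, uses the bit values from patch 1/3, and cannot see a 'no5lvl' on the target's command line, so it reasons only from the build-time flags.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define XLF_5LEVEL         (1 << 5)
#define XLF_5LEVEL_ENABLED (1 << 6)
#define SZ_64T             (1ULL << 46)

/*
 * Hypothetical policy: on success, *max_addr receives the highest
 * address the loader may use for the target kernel.
 */
static int kexec_placement_limit(uint16_t xloadflags, bool running_l5,
                                 uint64_t ram_top, uint64_t *max_addr)
{
	*max_addr = ram_top;
	if (!running_l5)
		return 0;		/* nothing extra to enforce in this sketch */
	if (!(xloadflags & XLF_5LEVEL))
		return -ENOEXEC;	/* point 1): target cannot cope with LA57 */
	if (!(xloadflags & XLF_5LEVEL_ENABLED) && ram_top > SZ_64T)
		*max_addr = SZ_64T;	/* point 2): target will run 4-level */
	return 0;
}
```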

2018-09-04 04:16:08

by H. Peter Anvin

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/03/18 20:44, Baoquan He wrote:
>
> 1) in arch/x86/kernel/relocate_kernel_64.S, we set X86_CR4_LA57 into cr4
> if the 1st kernel is in 5-level mode. Then in
> arch/x86/boot/compressed/head_64.S, paging_prepare() is called to decide
> if 5-level mode will be enabled, and prepare the trampoline. If
> kexec/kdump kernel is expected to be in 4-level, e.g with 'nolv5'
> specified, it still can handle well. But for the old kernel w/o these
> 5-level codes, it will ignore the fact that X86_CR4_LA57 has been set
> in CR4 and proceed anyway, then #GP is triggered. That's why XLF_5LEVEL
> is used to mark.
>

That's what I'm saying, don't do that. Always jump into the second kernel in
4-level mode, i.e. X86_CR4_LA57 unset. That's the only sane thing.

-hpa

2018-09-04 05:22:50

by Baoquan He

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/03/18 at 09:13pm, H. Peter Anvin wrote:
> On 09/03/18 20:44, Baoquan He wrote:
> >
> > 1) in arch/x86/kernel/relocate_kernel_64.S, we set X86_CR4_LA57 into cr4
> > if the 1st kernel is in 5-level mode. Then in
> > arch/x86/boot/compressed/head_64.S, paging_prepare() is called to decide
> > if 5-level mode will be enabled, and prepare the trampoline. If
> > kexec/kdump kernel is expected to be in 4-level, e.g with 'nolv5'
> > specified, it still can handle well. But for the old kernel w/o these
> > 5-level codes, it will ignore the fact that X86_CR4_LA57 has been set
> > in CR4 and proceed anyway, then #GP is triggered. That's why XLF_5LEVEL
> > is used to mark.
> >
>
> That's what I'm saying, don't do that. Always jump into the second kernel in
> 4-level mode, i.e. X86_CR4_LA57 unset. That's the only sane thing.

Well, this might not be advisable. Kexec has been a formal feature in
our distro, and our customers usually use it to reboot high-end servers
because those machines may take an hour to boot up from firmware.
5-level paging may also be supported very soon. If people want to do a
fast reboot from the current kernel in 5-level mode and expect the 2nd
kernel to be in 5-level mode too, always entering the 2nd kernel in
4-level mode might be unacceptable.

Thanks
Baoquan

2018-09-04 05:51:28

by H. Peter Anvin

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/03/18 22:20, Baoquan He wrote:
> On 09/03/18 at 09:13pm, H. Peter Anvin wrote:
>> That's what I'm saying, don't do that. Always jump into the second kernel in
>> 4-level mode, i.e. X86_CR4_LA57 unset. That's the only sane thing.
>
> Well, this might not be suggested. Kexec has been a formal feature in
> our distro, our customers usually use it to reboot high end servers
> because those machines may take one hour to boot up from firmware. And
> 5-level may be also supported very soon, if people want to do a fast
> reboot from the current kernel in 5-level, and expect to see it's in
> 5-level too in the 2nd kernel, this always kexec jumping to the 2nd
> kernel in 4-level mode might be unaccepted.
>

That makes no sense. I'm talking about *entering* the kernel; the second
kernel should switch to 5-level mode as necessary.

-hpa


2018-09-04 06:08:16

by Baoquan He

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/03/18 at 10:46pm, H. Peter Anvin wrote:
> On 09/03/18 22:20, Baoquan He wrote:
> > Well, this might not be suggested. Kexec has been a formal feature in
> > our distro, our customers usually use it to reboot high end servers
> > because those machines may take one hour to boot up from firmware. And
> > 5-level may be also supported very soon, if people want to do a fast
> > reboot from the current kernel in 5-level, and expect to see it's in
> > 5-level too in the 2nd kernel, this always kexec jumping to the 2nd
> > kernel in 4-level mode might be unaccepted.
> >
>
> That makes no sense. I'm talking about *entering* the kernel; the second
> kernel should switch to 5-level mode as necessary.

OK, I didn't get your point. I forget what difficulty led Kirill to
take this approach. With your approach, we would never have the chance to
put the kernel above 64TB, even when jumping from a 5-level kernel to a
5-level kernel.

Hi Kirill,

Could you help explain why the current implementation was chosen?

Thanks
Baoquan


2018-09-04 06:40:04

by H. Peter Anvin

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/03/18 23:06, Baoquan He wrote:
>>
>> That makes no sense. I'm talking about *entering* the kernel; the second
>> kernel should switch to 5-level mode as necessary.
>
> OK, I didn't get your point. I forget what difficulty led Kirill to
> take the current approach. If we always enter the 2nd kernel in 4-level
> mode, we will never have a chance to put the kernel above 64TB, even
> when jumping from a 5-level kernel to a 5-level kernel.
>

It sounds like you have no intent of doing that anyway? Now, that is
something one could use an xloadflag for, as I previously stated: "this kernel
supports being entered in 5-level mode."

-hpa


2018-09-04 07:18:36

by Baoquan He

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/03/18 at 11:36pm, H. Peter Anvin wrote:
> On 09/03/18 23:06, Baoquan He wrote:
> >>
> >> That makes no sense. I'm talking about *entering* the kernel; the second
> >> kernel should switch to 5-level mode as necessary.
> >
> > OK, I didn't get your point. I forget what difficulty led Kirill to
> > take the current approach. If we always enter the 2nd kernel in 4-level
> > mode, we will never have a chance to put the kernel above 64TB, even
> > when jumping from a 5-level kernel to a 5-level kernel.
> >
>
> It sounds like you have no intent of doing that anyway? Now, that is
> something one could use an xloadflag for, as I previously stated: "this kernel
> supports being entered in 5-level mode."

I am willing to take any better approach. We may need to clarify why
that one was not taken; not sure if Kirill still has the details.

Thanks
Baoquan


2018-09-04 08:44:15

by Kirill A. Shutemov

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On Mon, Sep 03, 2018 at 10:46:33PM -0700, H. Peter Anvin wrote:
> On 09/03/18 22:20, Baoquan He wrote:
> > On 09/03/18 at 09:13pm, H. Peter Anvin wrote:
> >> On 09/03/18 20:44, Baoquan He wrote:
> >>>
> >>> 1) In arch/x86/kernel/relocate_kernel_64.S, we set X86_CR4_LA57 in CR4
> >>> if the 1st kernel is in 5-level mode. Then in
> >>> arch/x86/boot/compressed/head_64.S, paging_prepare() is called to decide
> >>> whether 5-level mode will be enabled, and to prepare the trampoline. If
> >>> the kexec/kdump kernel is expected to run in 4-level mode, e.g. with
> >>> 'no5lvl' specified, it can still handle this well. But an old kernel
> >>> without this 5-level code ignores the fact that X86_CR4_LA57 has been
> >>> set in CR4 and proceeds anyway, and then a #GP is triggered. That's why
> >>> XLF_5LEVEL is used to mark kernels that can handle it.
> >>>
> >>
> >> That's what I'm saying, don't do that. Always jump into the second kernel in
> >> 4-level mode, i.e. X86_CR4_LA57 unset. That's the only sane thing.
> >
> > Well, this might not be advisable. Kexec is an officially supported
> > feature in our distro; our customers usually use it to reboot high-end
> > servers, because those machines may take an hour to boot through
> > firmware. 5-level paging may also be supported very soon. If people
> > want a fast reboot from a current kernel running in 5-level mode and
> > expect the 2nd kernel to be in 5-level mode too, always jumping into
> > the 2nd kernel in 4-level mode might be unacceptable.
> >
>
> That makes no sense. I'm talking about *entering* the kernel; the second
> kernel should switch to 5-level mode as necessary.

Switching between 4- and 5-level paging modes (in either direction)
requires disabling paging. It means the code that does the switching has
to be below 4G, otherwise we would lose control.

We handle the switching correctly in the kernel decompression code, but
not on the kexec caller side.

XLF_5LEVEL indicates that the kernel decompression code can deal with
switching between paging modes and that it's safe to jump there in
5-level paging mode.

As an alternative, we can change kexec to switch to 4-level paging mode
before starting the new kernel. Not sure how hard it will be.

--
Kirill A. Shutemov

2018-09-05 04:08:54

by H. Peter Anvin

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/04/18 01:42, Kirill A. Shutemov wrote:
>
> Switching between 4- and 5-level paging modes (in either direction)
> requires disabling paging. It means the code that does the switching has
> to be below 4G, otherwise we would lose control.
>
> We handle the switching correctly in the kernel decompression code, but
> not on the kexec caller side.
>
> XLF_5LEVEL indicates that the kernel decompression code can deal with
> switching between paging modes and that it's safe to jump there in
> 5-level paging mode.
>
> As an alternative, we can change kexec to switch to 4-level paging mode
> before starting the new kernel. Not sure how hard it will be.
>

Have a flag saying entering in 5-level mode is fine. However, you really
should support returning to 4-level mode in kexec. It is *much* easier to do
on the caller side as you have total control of memory allocation there.

-hpa

2018-09-05 08:04:36

by Thomas Gleixner

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On Tue, 4 Sep 2018, H. Peter Anvin wrote:

> On 09/04/18 01:42, Kirill A. Shutemov wrote:
> >
> > Switching between 4- and 5-level paging modes (in either direction)
> > requires disabling paging. It means the code that does the switching has
> > to be below 4G, otherwise we would lose control.
> >
> > We handle the switching correctly in the kernel decompression code, but
> > not on the kexec caller side.
> >
> > XLF_5LEVEL indicates that the kernel decompression code can deal with
> > switching between paging modes and that it's safe to jump there in
> > 5-level paging mode.
> >
> > As an alternative, we can change kexec to switch to 4-level paging mode
> > before starting the new kernel. Not sure how hard it will be.
> >
>
> Have a flag saying entering in 5-level mode is fine. However, you really
> should support returning to 4-level mode in kexec. It is *much* easier to do
> on the caller side as you have total control of memory allocation there.

Works for a regular kexec, but not for starting a crash kernel....

Thanks,

tglx

2018-09-26 07:56:47

by Baoquan He

Subject: Re: [PATCH 1/3] x86/boot: Add bit fields into xloadflags for 5-level kernel checking

On 09/05/18 at 10:02am, Thomas Gleixner wrote:
> On Tue, 4 Sep 2018, H. Peter Anvin wrote:
>
> > On 09/04/18 01:42, Kirill A. Shutemov wrote:
> > >
> > > Switching between 4- and 5-level paging modes (in either direction)
> > > requires disabling paging. It means the code that does the switching has
> > > to be below 4G, otherwise we would lose control.
> > >
> > > We handle the switching correctly in the kernel decompression code, but
> > > not on the kexec caller side.
> > >
> > > XLF_5LEVEL indicates that the kernel decompression code can deal with
> > > switching between paging modes and that it's safe to jump there in
> > > 5-level paging mode.
> > >
> > > As an alternative, we can change kexec to switch to 4-level paging mode
> > > before starting the new kernel. Not sure how hard it will be.
> > >
> >
> > Have a flag saying entering in 5-level mode is fine. However, you really
> > should support returning to 4-level mode in kexec. It is *much* easier to do
> > on the caller side as you have total control of memory allocation there.
>
> Works for a regular kexec, but not for starting a crash kernel....

Agreed, it's not appropriate to do this after the normal kernel has
crashed and is preparing to jump to the 2nd kernel.

Can this patchset be merged? I will post patches for the kexec-tools
utility, since it will make use of these flags.

Thanks
Baoquan