2019-10-18 13:43:14

by Lianbo Jiang

Subject: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Kdump kernel will reuse the first 640k region because of some reasons,
for example: the trampline and conventional PC system BIOS region may
require to allocate memory in this area. Obviously, kdump kernel will
also overwrite the first 640k region, therefore, kernel has to copy
the contents of the first 640k area to a backup area, which is done in
purgatory(), because vmcore may need the old memory. When vmcore is
dumped, kdump kernel will read the old memory from the backup area of
the first 640k area.

Basically, the main reason should be clear, kernel does not correctly
handle the first 640k region when SME is active, which causes that
kernel does not properly copy these old memory to the backup area in
purgatory(). Therefore, kdump kernel reads out the incorrect contents
from the backup area when dumping vmcore. Finally, the phenomenon is
as follow:

[root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values

KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore [PARTIAL DUMP]
CPUS: 128
DATE: Thu Sep 19 08:31:18 2019
UPTIME: 00:01:21
LOAD AVERAGE: 0.16, 0.07, 0.02
TASKS: 1343
NODENAME: amd-ethanol
RELEASE: 5.3.0-rc7+
VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
MACHINE: x86_64 (2195 Mhz)
MEMORY: 127.9 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 9789
COMMAND: "bash"
TASK: "ffff89711894ae80 [THREAD_INFO: ffff89711894ae80]"
CPU: 83
STATE: TASK_RUNNING (PANIC)

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
crash>

BTW: I also tried to fix the above problem in purgatory(), but there
are too many restricts in purgatory() context, for example: i can't
allocate new memory to create the identity mapping page table for SME
situation.

Currently, there are two places where the first 640k area is needed,
the first one is in the find_trampoline_placement(), another one is
in the reserve_real_mode(), and their content doesn't matter.

To avoid the above error, when the crashkernel kernel command line
option is specified, lets reserve the remaining low 1MiB memory(
after reserving real mode memroy) so that the allocated memory does
not fall into the low 1MiB area, which makes us not to copy the first
640k content to a backup region in purgatory(). This indicates that
it does not need to be included in crash dumps or used for anything
execept the processor trampolines that must live in the low 1MiB.

In addition, also need to clean all the code related to the backup
region later.

Signed-off-by: Lianbo Jiang <[email protected]>
---
arch/x86/realmode/init.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..1f0492830f2c 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -34,6 +34,17 @@ void __init reserve_real_mode(void)

 	memblock_reserve(mem, size);
 	set_real_mode_mem(mem);
+
+#ifdef CONFIG_KEXEC_CORE
+	/*
+	 * When the crashkernel option is specified, only use the low
+	 * 1MiB for the real mode trampoline.
+	 */
+	if (strstr(boot_command_line, "crashkernel=")) {
+		memblock_reserve(0, 1<<20);
+		pr_info("Reserving the low 1MiB of memory for crashkernel\n");
+	}
+#endif /* CONFIG_KEXEC_CORE */
}

static void __init setup_real_mode(void)
--
2.17.1
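
For readers following the reasoning in the commit message, here is a minimal user-space sketch of why reserving [0, 1MiB) keeps later allocations out of low memory. It is a toy model, not memblock: struct toy_memblock, toy_reserve() and toy_alloc_lowest() are hypothetical names, the allocator is a simple lowest-address-first search (the real memblock placement policy differs), and the trampoline address used in the usage note is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RESERVED 8
#define ALLOC_FAILED UINT64_MAX

struct range { uint64_t start, end; };	/* half-open: [start, end) */

struct toy_memblock {
	uint64_t mem_end;			/* usable memory is [0, mem_end) */
	struct range reserved[MAX_RESERVED];
	size_t nr_reserved;
};

/* Record [base, base + size) as unavailable to later allocations. */
static void toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	assert(mb->nr_reserved < MAX_RESERVED);
	mb->reserved[mb->nr_reserved++] = (struct range){ base, base + size };
}

/*
 * Find the lowest address where [addr, addr + size) overlaps no reserved
 * range, reserve it, and return it. Returns ALLOC_FAILED if nothing fits.
 */
static uint64_t toy_alloc_lowest(struct toy_memblock *mb, uint64_t size)
{
	uint64_t addr = 0;
	bool moved = true;

	while (moved) {
		moved = false;
		for (size_t i = 0; i < mb->nr_reserved; i++) {
			const struct range *r = &mb->reserved[i];

			if (addr < r->end && r->start < addr + size) {
				addr = r->end;	/* overlap: skip past this range */
				moved = true;
			}
		}
	}
	if (addr + size > mb->mem_end)
		return ALLOC_FAILED;
	toy_reserve(mb, addr, size);
	return addr;
}
```

With only a (hypothetical) trampoline range at 0x9d000 reserved, a 4KiB request in this model lands at address 0, inside the low 1MiB. Once toy_reserve(mb, 0, 1 << 20) has also been called, the same request lands at 1MiB or above, so nothing besides the trampoline lives in the low region.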


2019-10-22 12:05:59

by Borislav Petkov

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On Thu, Oct 17, 2019 at 05:43:45PM +0800, Lianbo Jiang wrote:
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Put that as a Link: below.

> Kdump kernel will reuse the first 640k region because of some reasons,

s/ of some reasons//

> for example: the trampline and conventional PC system BIOS region may

spellcheck: s/trampline/trampoline/

I see two more typos in here and if you had a spellchecker enabled in
your editor where you write the commit message, you'll see them too.
Please use one.

> require to allocate memory in this area. Obviously, kdump kernel will
> also overwrite the first 640k region,

Well, it is not obvious to me. Please be more specific: why would the
kdump kernel do that?

> therefore, kernel has to copy
> the contents of the first 640k area to a backup area, which is done in
> purgatory(), because vmcore may need the old memory. When vmcore is
> dumped, kdump kernel will read the old memory from the backup area of
> the first 640k area.
>
> Basically, the main reason should be clear, kernel does not correctly
> handle the first 640k region when SME is active,

If you mention the actual reason here, that sentence would be clearer:

"When SME is enabled in the first kernel, the kdump kernel must access
the first kernel's memory with the encryption bit set."

Something like that.

> which causes that
> kernel does not properly copy these old memory to the backup area in
> purgatory(). Therefore, kdump kernel reads out the incorrect contents

s/incorrect/encrypted/

> from the backup area when dumping vmcore. Finally, the phenomenon is

phenomenon?

> as follow:
>
> [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
> WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
>
> KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
> DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore [PARTIAL DUMP]
> CPUS: 128
> DATE: Thu Sep 19 08:31:18 2019
> UPTIME: 00:01:21
> LOAD AVERAGE: 0.16, 0.07, 0.02
> TASKS: 1343
> NODENAME: amd-ethanol
> RELEASE: 5.3.0-rc7+
> VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
> MACHINE: x86_64 (2195 Mhz)
> MEMORY: 127.9 GB
> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
> PID: 9789
> COMMAND: "bash"
> TASK: "ffff89711894ae80 [THREAD_INFO: ffff89711894ae80]"
> CPU: 83
> STATE: TASK_RUNNING (PANIC)
>
> crash> kmem -s|grep -i invalid
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> crash>

I fail to see what that's trying to tell me? You have invalid pointers?

> BTW: I also tried to fix the above problem in purgatory(), but there
> are too many restricts in purgatory() context, for example: i can't
> allocate new memory to create the identity mapping page table for SME
> situation.

This paragraph belongs under the "---" line below.

> Currently, there are two places where the first 640k area is needed,
> the first one is in the find_trampoline_placement(), another one is
> in the reserve_real_mode(), and their content doesn't matter.
>
> To avoid the above error, when the crashkernel kernel command line
> option is specified, lets reserve the remaining low 1MiB memory(
> after reserving real mode memroy) so that the allocated memory does
> not fall into the low 1MiB area, which makes us not to copy the first
> 640k content to a backup region in purgatory(). This indicates that
> it does not need to be included in crash dumps or used for anything
> execept the processor trampolines that must live in the low 1MiB.
>
> In addition, also need to clean all the code related to the backup
> region later.

Ditto.

> Signed-off-by: Lianbo Jiang <[email protected]>
> ---
> arch/x86/realmode/init.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 7dce39c8c034..1f0492830f2c 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -34,6 +34,17 @@ void __init reserve_real_mode(void)
>
> memblock_reserve(mem, size);
> set_real_mode_mem(mem);
> +
> +#ifdef CONFIG_KEXEC_CORE
> + /*
> + * When the crashkernel option is specified, only use the low
> + * 1MiB for the real mode trampoline.
> + */
> + if (strstr(boot_command_line, "crashkernel=")) {
> + memblock_reserve(0, 1<<20);
> + pr_info("Reserving the low 1MiB of memory for crashkernel\n");
> + }
> +#endif /* CONFIG_KEXEC_CORE */

This ifdeffery needs to be a function in kernel/kexec_core.c which is
called by reserve_real_mode(), instead.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-10-22 14:06:29

by Dave Anderson

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified


---- Original Message -----

> >
> > [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
> > WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
> >
> > KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
> > DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore [PARTIAL DUMP]
> > CPUS: 128
> > DATE: Thu Sep 19 08:31:18 2019
> > UPTIME: 00:01:21
> > LOAD AVERAGE: 0.16, 0.07, 0.02
> > TASKS: 1343
> > NODENAME: amd-ethanol
> > RELEASE: 5.3.0-rc7+
> > VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
> > MACHINE: x86_64 (2195 Mhz)
> > MEMORY: 127.9 GB
> > PANIC: "Kernel panic - not syncing: sysrq triggered crash"
> > PID: 9789
> > COMMAND: "bash"
> > TASK: "ffff89711894ae80 [THREAD_INFO: ffff89711894ae80]"
> > CPU: 83
> > STATE: TASK_RUNNING (PANIC)
> >
> > crash> kmem -s|grep -i invalid
> > kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> > kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> > crash>
>
> I fail to see what that's trying to tell me? You have invalid pointers?

Correct, because the pointer values are encrypted. The command is walking through the
singly-linked list of free objects in a slab from the dma-kmalloc-512 slab cache. The
slab memory had been allocated from low memory, and because of the problem at hand,
it was copied to the vmcore in its encrypted state.

Dave
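
The failure mode described above can be sketched in user space. This is a toy model under stated assumptions, not the crash utility's or the kernel's slab code: the slab is a small byte buffer, free objects link to each other by offset rather than by pointer, and a byte-wise XOR stands in for "reading the memory without the right encryption handling". All names (slab_init, scramble, freelist_is_valid) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE  64u			/* toy object size */
#define NR_OBJS   4u
#define SLAB_SIZE (OBJ_SIZE * NR_OBJS)

/*
 * Build a toy slab: each free object stores the offset of the next free
 * object in its first 8 bytes; UINT64_MAX terminates the list.
 */
static void slab_init(uint8_t *slab)
{
	memset(slab, 0, SLAB_SIZE);
	for (uint64_t i = 0; i < NR_OBJS; i++) {
		uint64_t next = (i + 1 < NR_OBJS) ? (i + 1) * OBJ_SIZE : UINT64_MAX;

		memcpy(slab + i * OBJ_SIZE, &next, sizeof(next));
	}
}

/* Stand-in for data captured in its encrypted state: XOR every byte. */
static void scramble(uint8_t *slab)
{
	for (size_t i = 0; i < SLAB_SIZE; i++)
		slab[i] ^= 0xA6;
}

/*
 * Walk the singly-linked freelist. A link is valid only if it is inside
 * the slab and aligned to the object size; the step bound catches cycles.
 */
static bool freelist_is_valid(const uint8_t *slab)
{
	uint64_t off = 0;

	assert(slab != NULL);
	for (unsigned int steps = 0; steps <= NR_OBJS; steps++) {
		if (off == UINT64_MAX)
			return true;	/* reached the terminator */
		if (off >= SLAB_SIZE || off % OBJ_SIZE != 0)
			return false;	/* "invalid freepointer" */
		memcpy(&off, slab + off, sizeof(off));
	}
	return false;
}
```

In this model a freshly initialized slab passes the walk, the scrambled copy fails at the very first link, and scrambling again (which undoes the XOR) makes it pass once more. That mirrors the report: the data structure was intact in the first kernel, and only the copy in the vmcore is unreadable.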

2019-10-23 07:47:27

by Borislav Petkov

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On Wed, Oct 23, 2019 at 01:35:09PM +0800, lijiang wrote:
> Would you mind if i improve this patch as follow? Thanks.

Yap, looks good to me.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-10-23 07:47:32

by Borislav Petkov

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On Wed, Oct 23, 2019 at 01:23:33PM +0800, lijiang wrote:
> Kdump kernel will reuse the first 640k region because the real mode
> trampoline has to work in this area. When the vmcore is dumped, the
> old memory in this area may be accessed, therefore, kernel has to
> copy the contents of the first 640k area to a backup region so that
> kdump kernel can read the old memory from the backup area of the
> first 640k area, which is done in the purgatory().

That sounds better. :)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-10-23 09:22:54

by Lianbo Jiang

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On 2019-10-23 15:46, Borislav Petkov wrote:
> On Wed, Oct 23, 2019 at 01:35:09PM +0800, lijiang wrote:
>> Would you mind if i improve this patch as follow? Thanks.
>
> Yap, looks good to me.
>
Thanks for your comment.

OK. I will post this one and the third patch in this series later.

Thanks.
Lianbo


> Thx.
>

2019-10-23 11:57:24

by Lianbo Jiang

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On 2019-10-22 16:30, Borislav Petkov wrote:
> On Thu, Oct 17, 2019 at 05:43:45PM +0800, Lianbo Jiang wrote:
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
>
Thanks for your comment.

> Put that as a Link: below.
>
Looks better. OK.

>> Kdump kernel will reuse the first 640k region because of some reasons,
>
> s/ of some reasons//
>
>> for example: the trampline and conventional PC system BIOS region may
>
> spellcheck: s/trampline/trampoline/
>
> I see two more typos in here and if you had a spellchecker enabled in
> your editor where you write the commit message, you'll see them too.
> Please use one.
>
Good point. I just enabled the spell checker in vim and it works well
now. Thanks. :-)

>> require to allocate memory in this area. Obviously, kdump kernel will
>> also overwrite the first 640k region,
>
> Well, it is not obvious to me. Please be more specific: why would the
> kdump kernel do that?
>
Kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the
old memory in this area may be accessed, therefore, kernel has to
copy the contents of the first 640k area to a backup region so that
kdump kernel can read the old memory from the backup area of the
first 640k area, which is done in the purgatory().

>> therefore, kernel has to copy
>> the contents of the first 640k area to a backup area, which is done in
>> purgatory(), because vmcore may need the old memory. When vmcore is
>> dumped, kdump kernel will read the old memory from the backup area of
>> the first 640k area.
>>
>> Basically, the main reason should be clear, kernel does not correctly
>> handle the first 640k region when SME is active,
>
> If you mention the actual reason here, that sentence would be clearer:
>
> "When SME is enabled in the first kernel, the kdump kernel must access
> the first kernel's memory with the encryption bit set."
>
> Something like that.
>
Looks good.

>> which causes that
>> kernel does not properly copy these old memory to the backup area in
>> purgatory(). Therefore, kdump kernel reads out the incorrect contents
>
> s/incorrect/encrypted/
>
Exactly.

>> from the backup area when dumping vmcore. Finally, the phenomenon is
>
> phenomenon?
>
Finally, it caused the following errors.

>> as follow:
>>
>> [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
>> WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
>>
>> KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
>> DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore [PARTIAL DUMP]
>> CPUS: 128
>> DATE: Thu Sep 19 08:31:18 2019
>> UPTIME: 00:01:21
>> LOAD AVERAGE: 0.16, 0.07, 0.02
>> TASKS: 1343
>> NODENAME: amd-ethanol
>> RELEASE: 5.3.0-rc7+
>> VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
>> MACHINE: x86_64 (2195 Mhz)
>> MEMORY: 127.9 GB
>> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>> PID: 9789
>> COMMAND: "bash"
>> TASK: "ffff89711894ae80 [THREAD_INFO: ffff89711894ae80]"
>> CPU: 83
>> STATE: TASK_RUNNING (PANIC)
>>
>> crash> kmem -s|grep -i invalid
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> crash>
>
> I fail to see what that's trying to tell me? You have invalid pointers?
>
Yes. When the crash tool parses the vmcore, it hits the errors above
because it reads invalid pointers.

>> BTW: I also tried to fix the above problem in purgatory(), but there
>> are too many restricts in purgatory() context, for example: i can't
>> allocate new memory to create the identity mapping page table for SME
>> situation.
>
> This paragraph belongs under the "---" line below.
>
OK. Thanks.

>> Currently, there are two places where the first 640k area is needed,
>> the first one is in the find_trampoline_placement(), another one is
>> in the reserve_real_mode(), and their content doesn't matter.
>>
>> To avoid the above error, when the crashkernel kernel command line
>> option is specified, lets reserve the remaining low 1MiB memory(
>> after reserving real mode memroy) so that the allocated memory does
>> not fall into the low 1MiB area, which makes us not to copy the first
>> 640k content to a backup region in purgatory(). This indicates that
>> it does not need to be included in crash dumps or used for anything
>> execept the processor trampolines that must live in the low 1MiB.
>>
>> In addition, also need to clean all the code related to the backup
>> region later.
>
> Ditto.
>
>> Signed-off-by: Lianbo Jiang <[email protected]>
>> ---
>> arch/x86/realmode/init.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index 7dce39c8c034..1f0492830f2c 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -34,6 +34,17 @@ void __init reserve_real_mode(void)
>>
>> memblock_reserve(mem, size);
>> set_real_mode_mem(mem);
>> +
>> +#ifdef CONFIG_KEXEC_CORE
>> + /*
>> + * When the crashkernel option is specified, only use the low
>> + * 1MiB for the real mode trampoline.
>> + */
>> + if (strstr(boot_command_line, "crashkernel=")) {
>> + memblock_reserve(0, 1<<20);
>> + pr_info("Reserving the low 1MiB of memory for crashkernel\n");
>> + }
>> +#endif /* CONFIG_KEXEC_CORE */
>
> This ifdeffery needs to be a function in kernel/kexec_core.c which is
> called by reserve_real_mode(), instead.
>
Understood. I will improve it later.

Thanks.
Lianbo
> Thx.
>

2019-10-23 12:24:26

by Lianbo Jiang

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On 2019-10-22 16:30, Borislav Petkov wrote:
> This ifdeffery needs to be a function in kernel/kexec_core.c which is
> called by reserve_real_mode(), instead.

Would you mind if i improve this patch as follow? Thanks.

From 5804abec62279585f374d78ace1250505c44c6b7 Mon Sep 17 00:00:00 2001
From: Lianbo Jiang <[email protected]>
Date: Wed, 23 Oct 2019 11:27:04 +0800
Subject: [PATCH] x86/kdump: always reserve the low 1MiB when the crashkernel
option is specified

Kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the
old memory in this area may be accessed, therefore, kernel has to
copy the contents of the first 640k area to a backup region so that
kdump kernel can read the old memory from the backup area of the
first 640k area, which is done in the purgatory().

However, the current copying of the first 640k area runs into problems
when SME is enabled: the kernel does not properly copy this old memory
to the backup area in purgatory(), so the kdump kernel reads out
encrypted contents. When SME is enabled in the first kernel, the kdump
kernel must access the first kernel's memory with the encryption bit
set. Please refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

As a result, the crash tool gets invalid pointers when parsing the
vmcore:

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified,
let's reserve the remaining low 1MiB memory (after reserving real mode
memory) so that the allocated memory does not fall into the low 1MiB
area. The first 640k content then no longer needs to be copied to a
backup region in purgatory(), and the low 1MiB does not need to be
included in crash dumps or used for anything except the processor
trampolines that must live in the low 1MiB.

Signed-off-by: Lianbo Jiang <[email protected]>
---
BTW: I also tried to fix the above problem in purgatory(), but there
are too many restrictions in the purgatory() context. For example, I
can't allocate new memory to create the identity mapping page table
for the SME case.

Currently, there are two places where the first 640k area is needed:
find_trampoline_placement() and reserve_real_mode(). Their content
doesn't matter.

In addition, the code related to the backup region also needs to be
cleaned up later.

arch/x86/realmode/init.c | 2 ++
include/linux/kexec.h | 2 ++
kernel/kexec_core.c | 13 +++++++++++++
3 files changed, 17 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..064cc79a015d 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -3,6 +3,7 @@
#include <linux/slab.h>
#include <linux/memblock.h>
#include <linux/mem_encrypt.h>
+#include <linux/kexec.h>

#include <asm/set_memory.h>
#include <asm/pgtable.h>
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)

 	memblock_reserve(mem, size);
 	set_real_mode_mem(mem);
+	kexec_reserve_low_1MiB();
}

static void __init setup_real_mode(void)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2e43a4..30acf1d738bc 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
extern void crash_kexec(struct pt_regs *);
int kexec_should_crash(struct task_struct *);
int kexec_crash_loaded(void);
+void kexec_reserve_low_1MiB(void);
void crash_save_cpu(struct pt_regs *regs, int cpu);
extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);

@@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
static inline void crash_kexec(struct pt_regs *regs) { }
static inline int kexec_should_crash(struct task_struct *p) { return 0; }
static inline int kexec_crash_loaded(void) { return 0; }
+static inline void kexec_reserve_low_1MiB(void) { }
#define kexec_in_progress false
#endif /* CONFIG_KEXEC_CORE */

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 15d70a90b50d..5bd89f1fee42 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -37,6 +37,7 @@
#include <linux/compiler.h>
#include <linux/hugetlb.h>
#include <linux/frame.h>
+#include <linux/memblock.h>

#include <asm/page.h>
#include <asm/sections.h>
@@ -70,6 +71,18 @@ struct resource crashk_low_res = {
.desc = IORES_DESC_CRASH_KERNEL
};

+/*
+ * When the crashkernel option is specified, only use the low
+ * 1MiB for the real mode trampoline.
+ */
+void kexec_reserve_low_1MiB(void)
+{
+	if (strstr(boot_command_line, "crashkernel=")) {
+		memblock_reserve(0, 1<<20);
+		pr_info("Reserving the low 1MiB of memory for crashkernel\n");
+	}
+}
+
int kexec_should_crash(struct task_struct *p)
{
/*
--
2.17.1

2019-10-24 23:09:00

by [email protected]

Subject: RE: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

I don't find the corresponding patch in the v5 patchset, so I comment here.

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of lijiang
> Sent: Wednesday, October 23, 2019 2:35 PM
> To: Borislav Petkov <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the
> crashkernel option is specified
>
> On 2019-10-22 16:30, Borislav Petkov wrote:
> > This ifdeffery needs to be a function in kernel/kexec_core.c which is
> > called by reserve_real_mode(), instead.
>
> Would you mind if i improve this patch as follow? Thanks.
>
> From 5804abec62279585f374d78ace1250505c44c6b7 Mon Sep 17 00:00:00 2001
> From: Lianbo Jiang <[email protected]>
> Date: Wed, 23 Oct 2019 11:27:04 +0800
> Subject: [PATCH] x86/kdump: always reserve the low 1MiB when the crashkernel
> option is specified
>
> Kdump kernel will reuse the first 640k region because the real mode
> trampoline has to work in this area. When the vmcore is dumped, the
> old memory in this area may be accessed, therefore, kernel has to
> copy the contents of the first 640k area to a backup region so that
> kdump kernel can read the old memory from the backup area of the
> first 640k area, which is done in the purgatory().
>
> But, the current handling of copying the first 640k area runs into
> problems when SME is enabled, kernel does not properly copy these
> old memory to the backup area in the purgatory(), thereby, kdump
> kernel reads out the encrypted contents, because the kdump kernel
> must access the first kernel's memory with the encryption bit set
> when SME is enabled in the first kernel. Please refer to this link:
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
>
> Finally, it causes the following errors, and the crash tool gets
> invalid pointers when parsing the vmcore.
>
> crash> kmem -s|grep -i invalid
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> crash>
>
> To avoid the above errors, when the crashkernel option is specified,
> lets reserve the remaining low 1MiB memory(after reserving real mode
> memory) so that the allocated memory does not fall into the low 1MiB
> area, which makes us not to copy the first 640k content to a backup
> region in purgatory(). This indicates that it does not need to be
> included in crash dumps or used for anything except the processor
> trampolines that must live in the low 1MiB.
>
> Signed-off-by: Lianbo Jiang <[email protected]>
> ---
> BTW:I also tried to fix the above problem in purgatory(), but there
> are too many restricts in purgatory() context, for example: i can't
> allocate new memory to create the identity mapping page table for
> SME situation.
>
> Currently, there are two places where the first 640k area is needed,
> the first one is in the find_trampoline_placement(), another one is
> in the reserve_real_mode(), and their content doesn't matter.
>
> In addition, also need to clean all the code related to the backup
> region later.
>
> arch/x86/realmode/init.c | 2 ++
> include/linux/kexec.h | 2 ++
> kernel/kexec_core.c | 13 +++++++++++++
> 3 files changed, 17 insertions(+)
>
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 7dce39c8c034..064cc79a015d 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -3,6 +3,7 @@
> #include <linux/slab.h>
> #include <linux/memblock.h>
> #include <linux/mem_encrypt.h>
> +#include <linux/kexec.h>
>
> #include <asm/set_memory.h>
> #include <asm/pgtable.h>
> @@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
>
> memblock_reserve(mem, size);
> set_real_mode_mem(mem);
> + kexec_reserve_low_1MiB();
> }
>
> static void __init setup_real_mode(void)
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 1776eb2e43a4..30acf1d738bc 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
> extern void crash_kexec(struct pt_regs *);
> int kexec_should_crash(struct task_struct *);
> int kexec_crash_loaded(void);
> +void kexec_reserve_low_1MiB(void);
> void crash_save_cpu(struct pt_regs *regs, int cpu);
> extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
>
> @@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
> static inline void crash_kexec(struct pt_regs *regs) { }
> static inline int kexec_should_crash(struct task_struct *p) { return 0; }
> static inline int kexec_crash_loaded(void) { return 0; }
> +static inline void kexec_reserve_low_1MiB(void) { }
> #define kexec_in_progress false
> #endif /* CONFIG_KEXEC_CORE */
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 15d70a90b50d..5bd89f1fee42 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -37,6 +37,7 @@
> #include <linux/compiler.h>
> #include <linux/hugetlb.h>
> #include <linux/frame.h>
> +#include <linux/memblock.h>
>
> #include <asm/page.h>
> #include <asm/sections.h>
> @@ -70,6 +71,18 @@ struct resource crashk_low_res = {
> .desc = IORES_DESC_CRASH_KERNEL
> };
>
> +/*
> + * When the crashkernel option is specified, only use the low
> + * 1MiB for the real mode trampoline.
> + */
> +void kexec_reserve_low_1MiB(void)
> +{
> + if (strstr(boot_command_line, "crashkernel=")) {

strstr() also matches, for example, ANYEXTRACHARACTERScrashkernel=ANYEXTRACHARACTERS.

Is it enough to use cmdline_find_option_bool()?

> + memblock_reserve(0, 1<<20);
> + pr_info("Reserving the low 1MiB of memory for crashkernel\n");
> + }
> +}
> +
> int kexec_should_crash(struct task_struct *p)
> {
> /*
> --
> 2.17.1
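
The strstr() concern raised above can be illustrated with a small user-space sketch. It is not kernel code and does not reproduce the behavior of cmdline_find_option_bool(); has_crashkernel_param() is a hypothetical helper that only accepts "crashkernel=" when it starts a whitespace-delimited token and is followed by a non-empty value. Quoted arguments are deliberately not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Check used in the patch: true if "crashkernel=" appears anywhere. */
static bool has_crashkernel_naive(const char *cmdline)
{
	assert(cmdline != NULL);
	return strstr(cmdline, "crashkernel=") != NULL;
}

/*
 * Hypothetical stricter check: the match must begin a token (start of
 * string or preceded by a space/tab) and must carry a non-empty value.
 */
static bool has_crashkernel_param(const char *cmdline)
{
	static const char opt[] = "crashkernel=";
	const size_t len = sizeof(opt) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);
	while ((p = strstr(p, opt)) != NULL) {
		bool at_start = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
		bool has_value = p[len] != '\0' && p[len] != ' ' && p[len] != '\t';

		if (at_start && has_value)
			return true;
		p += len;	/* keep scanning after this occurrence */
	}
	return false;
}
```

For a command line such as "root=/dev/sda1 nocrashkernel=1", the naive check reports a match while the token-aware one does not; both accept "root=/dev/sda1 crashkernel=256M quiet".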

2019-10-24 23:17:00

by Borislav Petkov

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On Thu, Oct 24, 2019 at 08:13:25AM +0000, [email protected] wrote:
> I don't find the corresponding patch in the v5 patchset, so I comment here.

You don't?

https://lore.kernel.org/lkml/[email protected]/

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-10-25 09:56:22

by Lianbo Jiang

Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

On 2019-10-24 16:13, [email protected] wrote:
> I don't find the corresponding patch in the v5 patchset, so I comment here.
>
Thanks for your comment.

>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of lijiang
>> Sent: Wednesday, October 23, 2019 2:35 PM
>> To: Borislav Petkov <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]
>> Subject: Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the
>> crashkernel option is specified
>>
>> On 2019-10-22 16:30, Borislav Petkov wrote:
>>> This ifdeffery needs to be a function in kernel/kexec_core.c which is
>>> called by reserve_real_mode(), instead.
>>
>> Would you mind if i improve this patch as follow? Thanks.
>>
>> From 5804abec62279585f374d78ace1250505c44c6b7 Mon Sep 17 00:00:00 2001
>> From: Lianbo Jiang <[email protected]>
>> Date: Wed, 23 Oct 2019 11:27:04 +0800
>> Subject: [PATCH] x86/kdump: always reserve the low 1MiB when the crashkernel
>> option is specified
>>
>> Kdump kernel will reuse the first 640k region because the real mode
>> trampoline has to work in this area. When the vmcore is dumped, the
>> old memory in this area may be accessed, therefore, kernel has to
>> copy the contents of the first 640k area to a backup region so that
>> kdump kernel can read the old memory from the backup area of the
>> first 640k area, which is done in the purgatory().
>>
>> But, the current handling of copying the first 640k area runs into
>> problems when SME is enabled, kernel does not properly copy these
>> old memory to the backup area in the purgatory(), thereby, kdump
>> kernel reads out the encrypted contents, because the kdump kernel
>> must access the first kernel's memory with the encryption bit set
>> when SME is enabled in the first kernel. Please refer to this link:
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
>>
>> Finally, it causes the following errors, and the crash tool gets
>> invalid pointers when parsing the vmcore.
>>
>> crash> kmem -s|grep -i invalid
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> crash>
>>
>> To avoid the above errors, when the crashkernel option is specified,
>> reserve the rest of the low 1MiB (after the real mode memory has
>> been reserved) so that no allocation falls into the low 1MiB area.
>> Then the first 640k no longer has to be copied to a backup region in
>> purgatory(): the low 1MiB does not need to be included in crash
>> dumps or used for anything except the processor trampolines that
>> must live there.
>>
>> Signed-off-by: Lianbo Jiang <[email protected]>
>> ---
>> BTW: I also tried to fix the above problem in purgatory(), but the
>> purgatory() context is too restrictive. For example, I can't
>> allocate new memory to create the identity mapping page table for
>> the SME case.
>>
>> Currently, the first 640k area is needed in two places: in
>> find_trampoline_placement() and in reserve_real_mode(). Their
>> content doesn't matter.
>>
>> In addition, all the code related to the backup region needs to be
>> cleaned up later.
>>
>> arch/x86/realmode/init.c | 2 ++
>> include/linux/kexec.h | 2 ++
>> kernel/kexec_core.c | 13 +++++++++++++
>> 3 files changed, 17 insertions(+)
>>
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index 7dce39c8c034..064cc79a015d 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -3,6 +3,7 @@
>> #include <linux/slab.h>
>> #include <linux/memblock.h>
>> #include <linux/mem_encrypt.h>
>> +#include <linux/kexec.h>
>>
>> #include <asm/set_memory.h>
>> #include <asm/pgtable.h>
>> @@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
>>
>> memblock_reserve(mem, size);
>> set_real_mode_mem(mem);
>> + kexec_reserve_low_1MiB();
>> }
>>
>> static void __init setup_real_mode(void)
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 1776eb2e43a4..30acf1d738bc 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
>> extern void crash_kexec(struct pt_regs *);
>> int kexec_should_crash(struct task_struct *);
>> int kexec_crash_loaded(void);
>> +void kexec_reserve_low_1MiB(void);
>> void crash_save_cpu(struct pt_regs *regs, int cpu);
>> extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
>>
>> @@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
>> static inline void crash_kexec(struct pt_regs *regs) { }
>> static inline int kexec_should_crash(struct task_struct *p) { return 0; }
>> static inline int kexec_crash_loaded(void) { return 0; }
>> +static inline void kexec_reserve_low_1MiB(void) { }
>> #define kexec_in_progress false
>> #endif /* CONFIG_KEXEC_CORE */
>>
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 15d70a90b50d..5bd89f1fee42 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -37,6 +37,7 @@
>> #include <linux/compiler.h>
>> #include <linux/hugetlb.h>
>> #include <linux/frame.h>
>> +#include <linux/memblock.h>
>>
>> #include <asm/page.h>
>> #include <asm/sections.h>
>> @@ -70,6 +71,18 @@ struct resource crashk_low_res = {
>> .desc = IORES_DESC_CRASH_KERNEL
>> };
>>
>> +/*
>> + * When the crashkernel option is specified, only use the low
>> + * 1MiB for the real mode trampoline.
>> + */
>> +void kexec_reserve_low_1MiB(void)
>> +{
>> + if (strstr(boot_command_line, "crashkernel=")) {
>
> strstr() also matches, for example, ANYEXTRACHARACTERScrashkernel=ANYEXTRACHARACTERS.
>
> Would it be enough to use cmdline_find_option_bool()?
>
cmdline_find_option_bool() looks for a boolean option, but crashkernel is not
a boolean option, so using it here might look odd. Would it be better to use
cmdline_find_option() instead?

+#include <asm/cmdline.h>

 void __init kexec_reserve_low_1MiB(void)
 {
-	if (strstr(boot_command_line, "crashkernel=")) {
+	char buffer[4];
+
+	if (cmdline_find_option(boot_command_line, "crashkernel=",
+				buffer, sizeof(buffer))) {
 		memblock_reserve(0, 1<<20);
 		pr_info("Reserving the low 1MiB of memory for crashkernel\n");
 	}

There is no need to parse the crashkernel arguments here (their syntax can be
complicated), so a small buffer should be enough. What's your opinion?

Thanks
Lianbo

>> + memblock_reserve(0, 1<<20);
>> + pr_info("Reserving the low 1MiB of memory for crashkernel\n");
>> + }
>> +}
>> +
>> int kexec_should_crash(struct task_struct *p)
>> {
>> /*
>> --
>> 2.17.1
>
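
To make the strstr() concern raised in this message concrete, here is a
minimal userspace sketch that only accepts "crashkernel=" when it starts a
space-delimited word of the command line. The helper name has_option() is
hypothetical and is not a kernel API; it only illustrates why whole-word
matching behaves differently from a plain substring search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not a kernel API): returns true only if
 * "<name>=" starts a space-delimited word of cmdline.
 */
bool has_option(const char *cmdline, const char *name)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	while (*p) {
		/* Skip the separators in front of the next word. */
		while (*p == ' ')
			p++;
		/* A hit needs the full name followed directly by '='. */
		if (strncmp(p, name, len) == 0 && p[len] == '=')
			return true;
		/* No hit: move past the rest of this word. */
		while (*p && *p != ' ')
			p++;
	}
	return false;
}
```

With this helper, "ro crashkernel=256M quiet" is accepted, while
"ro nocrashkernel=256M" is rejected even though strstr() would still find
"crashkernel=" inside it. As far as I can tell from arch/x86/lib/cmdline.c,
cmdline_find_option() takes the option name without the trailing '=' and
returns the argument length, or -1 when the option is absent, so a caller
would likely test for "< 0" rather than for a non-zero value. Please
double-check that against the tree you build on.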

2019-10-25 19:10:40

by kernel test robot

Subject: Re: [PATCH] x86/kdump: always reserve the low 1MiB when the crashkernel

Hi lijiang,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.4-rc4 next-20191024]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/lijiang/x86-kdump-always-reserve-the-low-1MiB-when-the-crashkernel/20191025-030439
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f116b96685a046a89c25d4a6ba2da489145c8888
config: i386-defconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-14) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> WARNING: vmlinux.o(.text+0xe39b7): Section mismatch in reference from the function kexec_reserve_low_1MiB() to the variable .init.data:boot_command_line
The function kexec_reserve_low_1MiB() references
the variable __initdata boot_command_line.
This is often because kexec_reserve_low_1MiB lacks a __initdata
annotation or the annotation of boot_command_line is wrong.
--
>> WARNING: vmlinux.o(.text+0xe39d0): Section mismatch in reference from the function kexec_reserve_low_1MiB() to the function .meminit.text:memblock_reserve()
The function kexec_reserve_low_1MiB() references
the function __meminit memblock_reserve().
This is often because kexec_reserve_low_1MiB lacks a __meminit
annotation or the annotation of memblock_reserve is wrong.
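
One common way to resolve this kind of section mismatch is to annotate the
referencing function with __init, which is consistent with the
"void __init kexec_reserve_low_1MiB(void)" form shown in the follow-up
snippet earlier in the thread. The sketch below is a userspace illustration
only: __init and __initdata are defined as empty stand-ins, and
fake_memblock_reserve(), reserved_bytes and reserve_low_1mib() are
hypothetical names, not kernel symbols.

```c
#include <stddef.h>
#include <string.h>

/*
 * Stand-ins so the sketch builds outside the kernel. In the kernel,
 * __init and __initdata come from <linux/init.h> and place code and data
 * in sections that are discarded once boot has finished.
 */
#define __init
#define __initdata

/* Mimics boot_command_line, which the warning reports as __initdata. */
char boot_command_line[] __initdata = "ro crashkernel=256M quiet";

/* Hypothetical stand-in for memblock_reserve(); it records the size. */
size_t reserved_bytes;

int __init fake_memblock_reserve(size_t base, size_t size)
{
	(void)base;
	reserved_bytes = size;
	return 0;
}

/*
 * This function reads init-only data and calls an init-only helper, so
 * it carries __init itself. Caller and callee then live in the same
 * class of section and are discarded together.
 */
void __init reserve_low_1mib(void)
{
	if (strstr(boot_command_line, "crashkernel="))
		fake_memblock_reserve(0, 1 << 20);
}
```

In a real kernel build the annotation matters because a function left in
.text could still be called after the init sections have been freed, which
is the situation modpost is warning about.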

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (1.88 kB)
.config.gz (27.49 kB)

2019-10-25 19:18:25

by Lianbo Jiang

Subject: Re: [PATCH] x86/kdump: always reserve the low 1MiB when the crashkernel

On 2019-10-25 06:12, kbuild test robot wrote:
> Hi lijiang,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on linus/master]
> [cannot apply to v5.4-rc4 next-20191024]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url: https://github.com/0day-ci/linux/commits/lijiang/x86-kdump-always-reserve-the-low-1MiB-when-the-crashkernel/20191025-030439
> base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f116b96685a046a89c25d4a6ba2da489145c8888
> config: i386-defconfig (attached as .config)
> compiler: gcc-7 (Debian 7.4.0-14) 7.4.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
>>> WARNING: vmlinux.o(.text+0xe39b7): Section mismatch in reference from the function kexec_reserve_low_1MiB() to the variable .init.data:boot_command_line
> The function kexec_reserve_low_1MiB() references
> the variable __initdata boot_command_line.
> This is often because kexec_reserve_low_1MiB lacks a __initdata
> annotation or the annotation of boot_command_line is wrong.
> --
>>> WARNING: vmlinux.o(.text+0xe39d0): Section mismatch in reference from the function kexec_reserve_low_1MiB() to the function .meminit.text:memblock_reserve()
> The function kexec_reserve_low_1MiB() references
> the function __meminit memblock_reserve().
> This is often because kexec_reserve_low_1MiB lacks a __meminit
> annotation or the annotation of memblock_reserve is wrong.
>
These warnings have been fixed in patch v5; please refer to that latest version.

Thanks.
Lianbo

> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
>