2019-10-07 07:12:36

by Lianbo Jiang

Subject: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

The kdump kernel reuses the first 640K region for several reasons; for
example, the trampoline and the conventional PC system BIOS region may
need to allocate memory in this area. Because the kdump kernel will
overwrite the first 640K region, the kernel has to copy the contents of
the first 640K area to a backup area, which is done in purgatory(),
since the vmcore may need the old memory. When the vmcore is dumped,
the kdump kernel reads the old memory of the first 640K area from the
backup area.
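To make the backup scheme just described concrete, here is a minimal user-space sketch. It assumes a toy model in which the old kernel's physical memory is a byte array and the backup area is a separate buffer; `BACKUP_SRC_END`, `struct old_mem`, `copy_backup_region_sim()` and `read_old_mem()` are hypothetical names that model, but are not, the kernel's purgatory and vmcore code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: size of the low region that purgatory backs up (640K). */
#define BACKUP_SRC_END (640u * 1024u)

/* Toy model of the crashed (1st) kernel's memory. */
struct old_mem {
	uint8_t *phys;   /* simulated physical memory, starting at address 0 */
	size_t size;     /* bytes in phys; must be >= BACKUP_SRC_END */
	uint8_t *backup; /* backup area inside the crashkernel reservation */
};

/* Purgatory's job in this model: save the low 640K before the kdump
 * kernel is allowed to reuse (and overwrite) it. */
static void copy_backup_region_sim(struct old_mem *m)
{
	assert(m->size >= BACKUP_SRC_END);
	memcpy(m->backup, m->phys, BACKUP_SRC_END);
}

/* The vmcore reader's job in this model: old bytes below 640K come from
 * the backup area, everything else from its original location. */
static uint8_t read_old_mem(const struct old_mem *m, size_t paddr)
{
	assert(paddr < m->size);
	if (paddr < BACKUP_SRC_END)
		return m->backup[paddr];
	return m->phys[paddr];
}
```

Call copy_backup_region_sim() once, let the "kdump kernel" scribble over the low 640K of phys, and read_old_mem() still returns the pre-crash bytes. In this model, a faulty copy step (the SME problem described next) would make every read below 640K return wrong data while memory above 640K still reads correctly.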

The root cause is that the kernel does not correctly handle the first
640K region when SME is active, so purgatory() does not properly copy
the old memory to the backup area. As a result, the kdump kernel reads
incorrect contents from the backup area when dumping the vmcore. The
symptom is as follows:

[root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values

KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore [PARTIAL DUMP]
CPUS: 128
DATE: Thu Sep 19 08:31:18 2019
UPTIME: 00:01:21
LOAD AVERAGE: 0.16, 0.07, 0.02
TASKS: 1343
NODENAME: amd-ethanol
RELEASE: 5.3.0-rc7+
VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
MACHINE: x86_64 (2195 Mhz)
MEMORY: 127.9 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 9789
COMMAND: "bash"
TASK: "ffff89711894ae80 [THREAD_INFO: ffff89711894ae80]"
CPU: 83
STATE: TASK_RUNNING (PANIC)

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
crash>

BTW: I also tried to fix the above problem in purgatory(), but there
are too many restrictions in the purgatory() context; for example, I
can't allocate new memory to create the identity-mapping page table
for the SME case.

Currently, the first 640K area is needed in two places,
find_trampoline_placement() and reserve_real_mode(), and their content
doesn't matter. To avoid the above error, let's occupy the remaining
memory of the first 640K region (except for the trampoline and real
mode) so that allocated memory does not fall into the first 640K area
when SME is active. Then we don't have to worry about whether the
kernel can correctly copy the contents of the first 640K area to a
backup region in purgatory().
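The effect of the proposed reservation can be illustrated with a toy bottom-up allocator: once [0, 640K) is marked reserved, the lowest address it can hand out is 640K. This is a minimal sketch assuming a flat list of reserved ranges; `struct range`, `is_free()`, `alloc_bottom_up()` and `SZ_640K` are hypothetical names, not memblock's API, and overlapping reservations (as with the real mode area) are simply tolerated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_640K (640ull * 1024)

/* Hypothetical reserved range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if [addr, addr + size) overlaps none of the reserved ranges. */
static bool is_free(const struct range *rsv, size_t n, uint64_t addr,
		    uint64_t size)
{
	for (size_t i = 0; i < n; i++)
		if (addr < rsv[i].end && rsv[i].start < addr + size)
			return false;
	return true;
}

/* Lowest align-aligned address below limit where size bytes fit without
 * touching a reservation, or UINT64_MAX if there is none. */
static uint64_t alloc_bottom_up(const struct range *rsv, size_t n,
				uint64_t size, uint64_t align, uint64_t limit)
{
	assert(align != 0);
	for (uint64_t addr = 0; addr + size <= limit; addr += align)
		if (is_free(rsv, n, addr, size))
			return addr;
	return UINT64_MAX;
}
```

With only page 0 reserved, a 4K allocation lands at 0x1000; after adding { 0, SZ_640K } to the list, the same request returns 640K, i.e. nothing new can fall into the low 640K.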

Signed-off-by: Lianbo Jiang <[email protected]>
---
Changes since v1:
1. Improve patch log
2. Change the checking condition from sme_active() to sme_active()
&& strstr(boot_command_line, "crashkernel=")

arch/x86/kernel/setup.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 77ea96b794bd..bdb1a02a84fd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)

reserve_real_mode();

+ if (sme_active() && strstr(boot_command_line, "crashkernel="))
+ memblock_reserve(0, 640*1024);
+
trim_platform_memory_ranges();
trim_low_memory_range();

--
2.17.1


2019-10-07 09:34:56

by Dave Young

Subject: Re: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

Hi Lianbo,
On 10/07/19 at 03:08pm, Lianbo Jiang wrote:
> [...]
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 77ea96b794bd..bdb1a02a84fd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)
>
> reserve_real_mode();
>
> + if (sme_active() && strstr(boot_command_line, "crashkernel="))
> + memblock_reserve(0, 640*1024);
> +

It seems you missed the comment about "unconditionally do it"; checking
only the crashkernel param looks better.

Also, I noticed that reserve_crashkernel() is called after initmem_init();
I'm not sure whether memblock_reserve() is good enough in early code
before initmem_init().

> trim_platform_memory_ranges();
> trim_low_memory_range();
>
> --
> 2.17.1
>

Thanks
Dave

2019-10-07 11:54:59

by Lianbo Jiang

Subject: Re: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

On 2019-10-07 17:33, Dave Young wrote:
> Hi Lianbo,
> On 10/07/19 at 03:08pm, Lianbo Jiang wrote:
>> [...]
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 77ea96b794bd..bdb1a02a84fd 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)
>>
>> reserve_real_mode();
>>
>> + if (sme_active() && strstr(boot_command_line, "crashkernel="))
>> + memblock_reserve(0, 640*1024);
>> +
>
> It seems you missed the comment about "unconditionally do it"; checking
> only the crashkernel param looks better.
>
If so, copying the first 640K to a backup region is no longer needed, and
I should post a patch series to remove copy_backup_region(). Any ideas?

> Also, I noticed that reserve_crashkernel() is called after initmem_init();
> I'm not sure whether memblock_reserve() is good enough in early code
> before initmem_init().
>
The first (zero) page and the real mode area are also reserved before
initmem_init(), and that seems to have worked well so far.

Thanks.
Lianbo

>> trim_platform_memory_ranges();
>> trim_low_memory_range();
>>
>> --
>> 2.17.1
>>
>
> Thanks
> Dave
>

2019-10-07 17:14:10

by Eric W. Biederman

Subject: Re: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

lijiang <[email protected]> writes:

> On 2019-10-07 17:33, Dave Young wrote:
>> Hi Lianbo,
>> On 10/07/19 at 03:08pm, Lianbo Jiang wrote:
>>> [...]
>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>> index 77ea96b794bd..bdb1a02a84fd 100644
>>> --- a/arch/x86/kernel/setup.c
>>> +++ b/arch/x86/kernel/setup.c
>>> @@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)
>>>
>>> reserve_real_mode();
>>>
>>> + if (sme_active() && strstr(boot_command_line, "crashkernel="))
>>> + memblock_reserve(0, 640*1024);
>>> +
>>
>> It seems you missed the comment about "unconditionally do it"; checking
>> only the crashkernel param looks better.
>>
> If so, copying the first 640K to a backup region is no longer needed, and
> I should post a patch series to remove copy_backup_region(). Any ideas?
>
>> Also, I noticed that reserve_crashkernel() is called after initmem_init();
>> I'm not sure whether memblock_reserve() is good enough in early code
>> before initmem_init().
>>
> The first (zero) page and the real mode area are also reserved before
> initmem_init(), and that seems to have worked well so far.
>
> Thanks.
> Lianbo

This has only been boot tested, but I think this is about what we need.

I feel like I haven't found and deleted all of the backup region code.

I think it is important to have the reservation code in reserve_real_mode(),
as the logic is fundamentally intertwined.

Eric


From: "Eric W. Biederman" <[email protected]>
Date: Mon, 7 Oct 2019 11:57:24 -0500
Subject: [PATCH] x86/kexec: Always reserve the low 1MiB

When the crashkernel kernel command line option is specified, always
reserve the low 1MiB. That way it does not need to be included in
crash dumps or used for anything except the processor trampolines that
must live in the low 1MiB.

The current handling of copying the low 1MiB runs into problems when
SME is active. So just simplify everything and make it unnecessary
to do anything with the low 1MiB.

This comes at a cost of 640KiB. But when crash kernels need 32MiB or
more to run, this isn't much more, and it makes everything much more
reliable.
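The cost argument above is simple arithmetic. A small sketch, assuming (as on a conventional PC memory map) that the range from 640KiB to 1MiB is the legacy video/ROM hole rather than usable RAM, which would explain why reserving the whole low 1MiB gives up only 640KiB of RAM; `low_1m_ram_cost()` and `cost_basis_points()` are hypothetical helpers.

```c
#include <stdint.h>

#define KiB (1024ull)
#define MiB (1024ull * KiB)

/* Assumption: [640K, 1M) is the legacy video/ROM hole, which is not usable
 * RAM, so reserving [0, 1M) gives up only the 640K of conventional RAM. */
static uint64_t low_1m_ram_cost(void)
{
	uint64_t reserved = 1ull << 20;              /* memblock_reserve(0, 1<<20) */
	uint64_t legacy_hole = 1 * MiB - 640 * KiB;  /* 384 KiB, assumed non-RAM */

	return reserved - legacy_hole;
}

/* Cost in basis points (1/100 of a percent) of a crash kernel
 * reservation of crash_size bytes, rounded down. */
static uint64_t cost_basis_points(uint64_t crash_size)
{
	return low_1m_ram_cost() * 10000 / crash_size;
}
```

For a 32MiB reservation, cost_basis_points(32 * MiB) evaluates to 195, so the extra 640KiB is just under 2% of the crash kernel's memory, and a smaller share of any larger reservation.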

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/x86/include/asm/kexec.h | 4 ----
arch/x86/kernel/crash.c | 19 -------------------
arch/x86/purgatory/purgatory.c | 15 ---------------
arch/x86/realmode/init.c | 10 ++++++++++
4 files changed, 10 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..e36307ac324d 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
# define KEXEC_ARCH KEXEC_ARCH_X86_64
#endif

-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END (640 * 1024UL - 1) /* 640K */
-
/*
* This function is responsible for capturing register states if coming
* via panic otherwise just fix up the ss and sp if coming via kernel
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..dc4773d2f4a6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -409,31 +409,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
return ret;
}

-static int determine_backup_region(struct resource *res, void *arg)
-{
- struct kimage *image = arg;
-
- image->arch.backup_src_start = res->start;
- image->arch.backup_src_sz = resource_size(res);
-
- /* Expecting only one range for backup region */
- return 1;
-}
-
int crash_load_segments(struct kimage *image)
{
int ret;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
.buf_max = ULONG_MAX, .top_down = false };

- /*
- * Determine and load a segment for backup area. First 640K RAM
- * region is backup source
- */
-
- ret = walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
- image, determine_backup_region);
-
/* Zero or postive return values are ok */
if (ret < 0)
return ret;
diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
index 3b95410ff0f8..448de04703ba 100644
--- a/arch/x86/purgatory/purgatory.c
+++ b/arch/x86/purgatory/purgatory.c
@@ -22,20 +22,6 @@ u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(.kexec-purgatory);

struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX] __section(.kexec-purgatory);

-/*
- * On x86, second kernel requries first 640K of memory to boot. Copy
- * first 640K to a backup region in reserved memory range so that second
- * kernel can use first 640K.
- */
-static int copy_backup_region(void)
-{
- if (purgatory_backup_dest) {
- memcpy((void *)purgatory_backup_dest,
- (void *)purgatory_backup_src, purgatory_backup_sz);
- }
- return 0;
-}
-
static int verify_sha256_digest(void)
{
struct kexec_sha_region *ptr, *end;
@@ -66,7 +52,6 @@ void purgatory(void)
for (;;)
;
}
- copy_backup_region();
}

/*
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..76c680ad23a1 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -34,6 +34,16 @@ void __init reserve_real_mode(void)

memblock_reserve(mem, size);
set_real_mode_mem(mem);
+
+#ifdef CONFIG_KEXEC_CORE
+ /* When crashkernel is specified only use the low 1MiB for the
+ * real mode trampolines.
+ */
+ if (strstr(boot_command_line, "crashkernel=")) {
+ memblock_reserve(0, 1<<20);
+ pr_info("Reserving low 1MiB of memory for crashkernel\n");
+ }
+#endif /* CONFIG_KEXEC_CORE */
}

static void __init setup_real_mode(void)
--
2.20.1


2019-10-08 02:45:34

by Baoquan He

Subject: Re: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

On 10/07/19 at 12:12pm, Eric W. Biederman wrote:
> [...]
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 7dce39c8c034..76c680ad23a1 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -34,6 +34,16 @@ void __init reserve_real_mode(void)
>
> memblock_reserve(mem, size);
> set_real_mode_mem(mem);
> +
> +#ifdef CONFIG_KEXEC_CORE
> + /* When crashkernel is specified only use the low 1MiB for the
> + * real mode trampolines.
> + */
> + if (strstr(boot_command_line, "crashkernel=")) {
> + memblock_reserve(0, 1<<20);
> + pr_info("Reserving low 1MiB of memory for crashkernel\n");
> + }

Reserving the low 1M looks good to me. Memblock-reserved pages won't
enter the buddy allocator unless they are explicitly freed later with
memblock_free().
> +#endif /* CONFIG_KEXEC_CORE */
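The memblock point above can be pictured with a toy model of the hand-off from memblock to the page allocator: only pages not covered by a reservation are released, so a reserved low 1MiB (256 pages of 4KiB) never reaches the buddy allocator. This is a simplified sketch; `struct range`, `page_reserved()`, `pages_released()` and `PAGE_SZ` are hypothetical, and the model ignores holes, NUMA nodes and memblock's real data structures. Freeing the range (in the real kernel, with memblock_free()) corresponds here to dropping it from the reserved list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096ull

/* Hypothetical reserved range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if any byte of page frame pfn lies in a reserved range. */
static bool page_reserved(const struct range *rsv, size_t n, uint64_t pfn)
{
	uint64_t addr = pfn * PAGE_SZ;

	for (size_t i = 0; i < n; i++)
		if (addr < rsv[i].end && rsv[i].start < addr + PAGE_SZ)
			return true;
	return false;
}

/* Pages of [0, mem_end) released to the page (buddy) allocator at
 * hand-off time in this model: every page that is not reserved. */
static uint64_t pages_released(uint64_t mem_end, const struct range *rsv,
			       size_t n)
{
	uint64_t count = 0;

	for (uint64_t pfn = 0; pfn < mem_end / PAGE_SZ; pfn++)
		if (!page_reserved(rsv, n, pfn))
			count++;
	return count;
}
```

For 16MiB of memory (4096 pages), reserving { 0, 1 << 20 } leaves 3840 pages for the buddy allocator; with the reservation dropped, all 4096 are released.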

I doubt this patch works when booting the kdump kernel, because the low
1MB is not passed to the kdump kernel as system RAM. Please check the
code below.

/* Prepare memory map for crash dump kernel */
int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
{
......

/* Add first 640K segment */
ei.addr = image->arch.backup_src_start;
ei.size = image->arch.backup_src_sz;
ei.type = E820_TYPE_RAM;
add_e820_entry(params, &ei);

......
}

You can see that image->arch.backup_src_start/backup_src_sz are now left
as zero. Lianbo will run a test to check.

Thanks
Baoquan

2019-10-08 02:59:26

by Baoquan He

Subject: Re: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

On 10/08/19 at 10:44am, Baoquan He wrote:
> [...]
> I doubt this patch works when booting the kdump kernel, because the low
> 1MB is not passed to the kdump kernel as system RAM. Please check the
> code below.
>
> /* Prepare memory map for crash dump kernel */
> int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
> {
> ......
>
> /* Add first 640K segment */
> ei.addr = image->arch.backup_src_start;
> ei.size = image->arch.backup_src_sz;
> ei.type = E820_TYPE_RAM;
> add_e820_entry(params, &ei);
>
> ......
> }

The current code digs out a 640K block from the crashkernel region, which
we call the backup region, and copies the low 640K into it. The low 640K
is then added to the kdump kernel as system RAM, and the backup region is
mapped to [0, 640K] of the 1st kernel's vmcore ELF file. So we can
memblock-reserve the low 1MB, add it to the kdump kernel as system RAM in
crash_setup_memmap_entries(), and then clean up all the other old
backup-related code, based on Eric's patch.

2019-10-08 03:19:36

by Lianbo Jiang

[permalink] [raw]
Subject: Re: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

On 2019-10-08 01:12, Eric W. Biederman wrote:
> lijiang <[email protected]> writes:
>
>> On 2019-10-07 17:33, Dave Young wrote:
>>> Hi Lianbo,
>>> On 10/07/19 at 03:08pm, Lianbo Jiang wrote:
>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
>>>>
>>>> The kdump kernel reuses the first 640k region for several reasons;
>>>> for example, the trampoline and the conventional PC system BIOS region
>>>> may need memory allocated in this area. Since the kdump kernel also
>>>> overwrites the first 640k region, the kernel has to copy the contents
>>>> of the first 640k area to a backup area, which is done in purgatory(),
>>>> because vmcore may need the old memory. When vmcore is dumped, the
>>>> kdump kernel reads the old memory of the first 640k area from the
>>>> backup area.
>>>>
>>>> The root cause is that the kernel does not correctly handle the first
>>>> 640k region when SME is active, so it does not properly copy the old
>>>> memory to the backup area in purgatory(). As a result, the kdump kernel
>>>> reads incorrect contents from the backup area when dumping vmcore. The
>>>> symptom is as follows:
>>>>
>>>> [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
>>>> WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
>>>>
>>>> KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
>>>> DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore [PARTIAL DUMP]
>>>> CPUS: 128
>>>> DATE: Thu Sep 19 08:31:18 2019
>>>> UPTIME: 00:01:21
>>>> LOAD AVERAGE: 0.16, 0.07, 0.02
>>>> TASKS: 1343
>>>> NODENAME: amd-ethanol
>>>> RELEASE: 5.3.0-rc7+
>>>> VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
>>>> MACHINE: x86_64 (2195 Mhz)
>>>> MEMORY: 127.9 GB
>>>> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>>>> PID: 9789
>>>> COMMAND: "bash"
>>>> TASK: "ffff89711894ae80 [THREAD_INFO: ffff89711894ae80]"
>>>> CPU: 83
>>>> STATE: TASK_RUNNING (PANIC)
>>>>
>>>> crash> kmem -s|grep -i invalid
>>>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>>>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>>>> crash>
>>>>
>>>> BTW: I also tried to fix the above problem in purgatory(), but there
>>>> are too many restrictions in the purgatory() context; for example, I
>>>> can't allocate new memory to create the identity-mapping page table
>>>> needed when SME is active.
>>>>
>>>> Currently, the first 640k area is needed in two places:
>>>> find_trampoline_placement() and reserve_real_mode(), and the contents
>>>> there don't matter. To avoid the above error, let's occupy the remaining
>>>> memory of the first 640k region (except for the trampoline and real
>>>> mode) so that allocated memory does not fall into the first 640k area
>>>> when SME is active. Then we no longer need to worry about whether the
>>>> kernel correctly copies the contents of the first 640k area to a backup
>>>> region in purgatory().
>>>>
>>>> Signed-off-by: Lianbo Jiang <[email protected]>
>>>> ---
>>>> Changes since v1:
>>>> 1. Improve patch log
>>>> 2. Change the checking condition from sme_active() to sme_active()
>>>> && strstr(boot_command_line, "crashkernel=")
>>>>
>>>> arch/x86/kernel/setup.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>>> index 77ea96b794bd..bdb1a02a84fd 100644
>>>> --- a/arch/x86/kernel/setup.c
>>>> +++ b/arch/x86/kernel/setup.c
>>>> @@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)
>>>>
>>>> reserve_real_mode();
>>>>
>>>> + if (sme_active() && strstr(boot_command_line, "crashkernel="))
>>>> + memblock_reserve(0, 640*1024);
>>>> +
>>>
>>> It seems you missed the comment about doing it "unconditionally";
>>> checking only the crashkernel param looks better.
>>>
>> If so, copying the first 640k to a backup region is no longer needed, and
>> I should post a patch series to remove copy_backup_region(). Any ideas?
>>
>>> Also, I noticed that reserve_crashkernel() is called after initmem_init();
>>> I'm not sure whether memblock_reserve() is good enough in early code
>>> before initmem_init().
>>>
>> The first page (page zero) and the real mode area are also reserved before
>> initmem_init(), and that seems to have worked well so far.
>>
>> Thanks.
>> Lianbo
>
> This has only been boot tested but I think this is about what we need.
>
> I feel like I haven't found and deleted all of the backup region code.
>
No worries, I will check the backup-related code.

In addition, I will test and improve your draft patch, and post the
results here.

Thanks.
Lianbo

> I think it is important to have the reservation code in reserve_real_mode()
> as the logic is fundamentally intertwined.
>
> Eric
> >
> From: "Eric W. Biederman" <[email protected]>
> Date: Mon, 7 Oct 2019 11:57:24 -0500
> Subject: [PATCH] x86/kexec: Always reserve the low 1MiB
>
> When the crashkernel kernel command line option is specified, always
> reserve the low 1MiB. That way it does not need to be included
> in crash dumps or used for anything except the processor trampolines
> that must live in the low 1MiB.
>
> The current handling of copying the low 1MiB runs into problems when
> SME is active. So just simplify everything and make it unnecessary
> to do anything with the low 1MiB.
>
> This comes at a cost of 640KiB, but since crash kernels need 32MiB or
> more to run, this isn't much more, and it makes everything much more
> reliable.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> arch/x86/include/asm/kexec.h | 4 ----
> arch/x86/kernel/crash.c | 19 -------------------
> arch/x86/purgatory/purgatory.c | 15 ---------------
> arch/x86/realmode/init.c | 10 ++++++++++
> 4 files changed, 10 insertions(+), 38 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 5e7d6b46de97..e36307ac324d 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -66,10 +66,6 @@ struct kimage;
> # define KEXEC_ARCH KEXEC_ARCH_X86_64
> #endif
>
> -/* Memory to backup during crash kdump */
> -#define KEXEC_BACKUP_SRC_START (0UL)
> -#define KEXEC_BACKUP_SRC_END (640 * 1024UL - 1) /* 640K */
> -
> /*
> * This function is responsible for capturing register states if coming
> * via panic otherwise just fix up the ss and sp if coming via kernel
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index eb651fbde92a..dc4773d2f4a6 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -409,31 +409,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
> return ret;
> }
>
> -static int determine_backup_region(struct resource *res, void *arg)
> -{
> - struct kimage *image = arg;
> -
> - image->arch.backup_src_start = res->start;
> - image->arch.backup_src_sz = resource_size(res);
> -
> - /* Expecting only one range for backup region */
> - return 1;
> -}
> -
> int crash_load_segments(struct kimage *image)
> {
> int ret;
> struct kexec_buf kbuf = { .image = image, .buf_min = 0,
> .buf_max = ULONG_MAX, .top_down = false };
>
> - /*
> - * Determine and load a segment for backup area. First 640K RAM
> - * region is backup source
> - */
> -
> - ret = walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
> - image, determine_backup_region);
> -
> /* Zero or postive return values are ok */
> if (ret < 0)
> return ret;
> diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
> index 3b95410ff0f8..448de04703ba 100644
> --- a/arch/x86/purgatory/purgatory.c
> +++ b/arch/x86/purgatory/purgatory.c
> @@ -22,20 +22,6 @@ u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(.kexec-purgatory);
>
> struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX] __section(.kexec-purgatory);
>
> -/*
> - * On x86, second kernel requries first 640K of memory to boot. Copy
> - * first 640K to a backup region in reserved memory range so that second
> - * kernel can use first 640K.
> - */
> -static int copy_backup_region(void)
> -{
> - if (purgatory_backup_dest) {
> - memcpy((void *)purgatory_backup_dest,
> - (void *)purgatory_backup_src, purgatory_backup_sz);
> - }
> - return 0;
> -}
> -
> static int verify_sha256_digest(void)
> {
> struct kexec_sha_region *ptr, *end;
> @@ -66,7 +52,6 @@ void purgatory(void)
> for (;;)
> ;
> }
> - copy_backup_region();
> }
>
> /*
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 7dce39c8c034..76c680ad23a1 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -34,6 +34,16 @@ void __init reserve_real_mode(void)
>
> memblock_reserve(mem, size);
> set_real_mode_mem(mem);
> +
> +#ifdef CONFIG_KEXEC_CORE
> + /* When crashkernel is specified only use the low 1MiB for the
> + * real mode trampolines.
> + */
> + if (strstr(boot_command_line, "crashkernel=")) {
> + memblock_reserve(0, 1<<20);
> + pr_info("Reserving low 1MiB of memory for crashkernel\n");
> + }
> +#endif /* CONFIG_KEXEC_CORE */
> }
>
> static void __init setup_real_mode(void)
>