2012-02-29 17:21:38

by Eugene Surovegin

[permalink] [raw]
Subject: [PATCH] kdump: force page alignment for per-CPU crash notes.

Per-CPU allocations are not guaranteed to be physically contiguous.
However, kdump kernel and user-space code assumes that per-CPU
memory, used for saving CPU registers on crash, is.
This can cause corrupted /proc/vmcore in some cases - the main
symptom being huge ELF note section.

Force page alignment for note_buf_t to ensure that this assumption holds.

Signed-off-by: Eugene Surovegin <[email protected]>
CC: Eric Biederman <[email protected]>
CC: Vivek Goyal <[email protected]>
CC: kexec-list <[email protected]>
---
kernel/kexec.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 7b08867..e641b5c 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1232,8 +1232,13 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)

static int __init crash_notes_memory_init(void)
{
- /* Allocate memory for saving cpu registers. */
- crash_notes = alloc_percpu(note_buf_t);
+ /* Allocate memory for saving cpu registers.
+ * Force page alignment to avoid crossing physical page boundary -
+ * kexec-tools and kernel /proc/vmcore handler assume these per-CPU
+ * chunks are physically contiguous.
+ */
+ crash_notes = (note_buf_t __percpu *)__alloc_percpu(sizeof(note_buf_t),
+ PAGE_SIZE);
if (!crash_notes) {
printk("Kexec: Memory allocation for saving cpu register"
" states failed\n");
--
1.7.9.1


2012-03-01 01:18:15

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.

On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> Per-CPU allocations are not guaranteed to be physically contiguous.
> However, kdump kernel and user-space code assumes that per-CPU
> memory, used for saving CPU registers on crash, is.
> This can cause corrupted /proc/vmcore in some cases - the main
> symptom being huge ELF note section.
>
> Force page alignment for note_buf_t to ensure that this assumption holds.

Ouch. I'm surprised there is an allocation on crash, perhaps
it could at least be done earlier? And am I right in thinking
that this change increases the likely hood that the allocation
could fail?

>
> Signed-off-by: Eugene Surovegin <[email protected]>
> CC: Eric Biederman <[email protected]>
> CC: Vivek Goyal <[email protected]>
> CC: kexec-list <[email protected]>
> ---
> kernel/kexec.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 7b08867..e641b5c 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1232,8 +1232,13 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>
> static int __init crash_notes_memory_init(void)
> {
> - /* Allocate memory for saving cpu registers. */
> - crash_notes = alloc_percpu(note_buf_t);
> + /* Allocate memory for saving cpu registers.
> + * Force page alignment to avoid crossing physical page boundary -
> + * kexec-tools and kernel /proc/vmcore handler assume these per-CPU
> + * chunks are physically contiguous.
> + */
> + crash_notes = (note_buf_t __percpu *)__alloc_percpu(sizeof(note_buf_t),
> + PAGE_SIZE);
> if (!crash_notes) {
> printk("Kexec: Memory allocation for saving cpu register"
> " states failed\n");
> --
> 1.7.9.1
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
>

2012-03-01 01:32:06

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.

On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>
> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> > > However, kdump kernel and user-space code assumes that per-CPU
> > > memory, used for saving CPU registers on crash, is.
> > > This can cause corrupted /proc/vmcore in some cases - the main
> > > symptom being huge ELF note section.
> > >
> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >
> > Ouch. I'm surprised there is an allocation on crash, perhaps
> > it could at least be done earlier? And am I right in thinking
> > that this change increases the likely hood that the allocation
> > could fail?
> >
>
> I'm not following. This allocation is done on start-up, not on crash.
> If you cannot allocate this much memory on system boot, I'm not sure what
> else you can do on this system....

Sorry, my eyes deceived me. You are correct and I agree.

Is it the case that note_buf_t is never larger than PAGE_SIZE?
If so I your patch looks good to me.

2012-03-01 01:40:27

by Eugene Surovegin

[permalink] [raw]
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.

On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>>
>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>> > > However, kdump kernel and user-space code assumes that per-CPU
>> > > memory, used for saving CPU registers on crash, is.
>> > > This can cause corrupted /proc/vmcore in some cases - the main
>> > > symptom being huge ELF note section.
>> > >
>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>> >
>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>> > it could at least be done earlier? And am I right in thinking
>> > that this change increases the likely hood that the allocation
>> > could fail?
>> >
>>
>> I'm not following. This allocation is done on start-up, not on crash.
>> If you cannot allocate this much memory on system boot, I'm not sure what
>> else you can do on this system....
>
> Sorry, my eyes deceived me. You are correct and I agree.
>
> Is it the case that note_buf_t is never larger than PAGE_SIZE?
> If so I your patch looks good to me.

Currently, maximum note size is hardcoded in kexec-tools to 1024
(MAX_NOTE_BYTES).
Usually it's way less. IIRC on x86_64 it's 336 bytes.

--
Eugene

2012-03-01 01:52:01

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.

From: Eugene Surovegin <[email protected]>
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
Date: Wed, 29 Feb 2012 17:39:55 -0800

> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
>> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>>>
>>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>>> > > However, kdump kernel and user-space code assumes that per-CPU
>>> > > memory, used for saving CPU registers on crash, is.
>>> > > This can cause corrupted /proc/vmcore in some cases - the main
>>> > > symptom being huge ELF note section.
>>> > >
>>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>>> >
>>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>>> > it could at least be done earlier? And am I right in thinking
>>> > that this change increases the likely hood that the allocation
>>> > could fail?
>>> >
>>>
>>> I'm not following. This allocation is done on start-up, not on crash.
>>> If you cannot allocate this much memory on system boot, I'm not sure what
>>> else you can do on this system....
>>
>> Sorry, my eyes deceived me. You are correct and I agree.
>>
>> Is it the case that note_buf_t is never larger than PAGE_SIZE?
>> If so I your patch looks good to me.
>
> Currently, maximum note size is hardcoded in kexec-tools to 1024
> (MAX_NOTE_BYTES).
> Usually it's way less. IIRC on x86_64 it's 336 bytes.
>

This is elf_prstatus and I guess it's mostly equal to registers.

crash> p sizeof(struct elf_prstatus)
$3 = 336
crash> ptype struct elf_prstatus
type = struct elf_prstatus {
struct elf_siginfo pr_info;
short int pr_cursig;
long unsigned int pr_sigpend;
long unsigned int pr_sighold;
pid_t pr_pid;
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
struct timeval pr_utime;
struct timeval pr_stime;
struct timeval pr_cutime;
struct timeval pr_cstime;
elf_gregset_t pr_reg; <-- this
int pr_fpvalid;
}

What kinds of architecture does have so many registers? It's just my
interest. Or possibly other kinds of notes is written here?

Thanks.
HATAYAMA, Daisuke

2012-03-01 01:57:26

by Eugene Surovegin

[permalink] [raw]
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.

On Wed, Feb 29, 2012 at 5:51 PM, HATAYAMA Daisuke
<[email protected]> wrote:
> From: Eugene Surovegin <[email protected]>
> Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
> Date: Wed, 29 Feb 2012 17:39:55 -0800
>
>> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
>>> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>>>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>>>>
>>>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>>>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>>>> > > However, kdump kernel and user-space code assumes that per-CPU
>>>> > > memory, used for saving CPU registers on crash, is.
>>>> > > This can cause corrupted /proc/vmcore in some cases - the main
>>>> > > symptom being huge ELF note section.
>>>> > >
>>>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>>>> >
>>>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>>>> > it could at least be done earlier? And am I right in thinking
>>>> > that this change increases the likely hood that the allocation
>>>> > could fail?
>>>> >
>>>>
>>>> I'm not following. This allocation is done on start-up, not on crash.
>>>> If you cannot allocate this much memory on system boot, I'm not sure what
>>>> else you can do on this system....
>>>
>>> Sorry, my eyes deceived me. You are correct and I agree.
>>>
>>> Is it the case that note_buf_t is never larger than PAGE_SIZE?
>>> If so I your patch looks good to me.
>>
>> Currently, maximum note size is hardcoded in kexec-tools to 1024
>> (MAX_NOTE_BYTES).
>> Usually it's way less. IIRC on x86_64 it's 336 bytes.
>>
>
> This is elf_prstatus and I guess it's mostly equal to registers.
>
> crash> p sizeof(struct elf_prstatus)
> $3 = 336
> crash> ptype struct elf_prstatus
> type = struct elf_prstatus {
> ? ?struct elf_siginfo pr_info;
> ? ?short int pr_cursig;
> ? ?long unsigned int pr_sigpend;
> ? ?long unsigned int pr_sighold;
> ? ?pid_t pr_pid;
> ? ?pid_t pr_ppid;
> ? ?pid_t pr_pgrp;
> ? ?pid_t pr_sid;
> ? ?struct timeval pr_utime;
> ? ?struct timeval pr_stime;
> ? ?struct timeval pr_cutime;
> ? ?struct timeval pr_cstime;
> ? ?elf_gregset_t pr_reg; <-- this
> ? ?int pr_fpvalid;
> }
>
> What kinds of architecture does have so many registers? It's just my
> interest. Or possibly other kinds of notes is written here?

I'm not sure about other archs, but we don't write there anything
except for 'elf_prstatus' and sentinel "final" note.

--
Eugene

2012-03-01 02:50:43

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.

On Thu, Mar 01, 2012 at 10:51:26AM +0900, HATAYAMA Daisuke wrote:
> From: Eugene Surovegin <[email protected]>
> Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
> Date: Wed, 29 Feb 2012 17:39:55 -0800
>
> > On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
> >> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> >>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
> >>>
> >>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> >>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> >>> > > However, kdump kernel and user-space code assumes that per-CPU
> >>> > > memory, used for saving CPU registers on crash, is.
> >>> > > This can cause corrupted /proc/vmcore in some cases - the main
> >>> > > symptom being huge ELF note section.
> >>> > >
> >>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >>> >
> >>> > Ouch. I'm surprised there is an allocation on crash, perhaps
> >>> > it could at least be done earlier? And am I right in thinking
> >>> > that this change increases the likely hood that the allocation
> >>> > could fail?
> >>> >
> >>>
> >>> I'm not following. This allocation is done on start-up, not on crash.
> >>> If you cannot allocate this much memory on system boot, I'm not sure what
> >>> else you can do on this system....
> >>
> >> Sorry, my eyes deceived me. You are correct and I agree.
> >>
> >> Is it the case that note_buf_t is never larger than PAGE_SIZE?
> >> If so I your patch looks good to me.
> >
> > Currently, maximum note size is hardcoded in kexec-tools to 1024
> > (MAX_NOTE_BYTES).
> > Usually it's way less. IIRC on x86_64 it's 336 bytes.

I presume that MAX_NOTE_BYTES was chosen to be large to
minimise the chance that it would ever need to be changed.

> This is elf_prstatus and I guess it's mostly equal to registers.
>
> crash> p sizeof(struct elf_prstatus)
> $3 = 336
> crash> ptype struct elf_prstatus
> type = struct elf_prstatus {
> struct elf_siginfo pr_info;
> short int pr_cursig;
> long unsigned int pr_sigpend;
> long unsigned int pr_sighold;
> pid_t pr_pid;
> pid_t pr_ppid;
> pid_t pr_pgrp;
> pid_t pr_sid;
> struct timeval pr_utime;
> struct timeval pr_stime;
> struct timeval pr_cutime;
> struct timeval pr_cstime;
> elf_gregset_t pr_reg; <-- this
> int pr_fpvalid;
> }
>
> What kinds of architecture does have so many registers? It's just my
> interest. Or possibly other kinds of notes is written here?

The winner seems to be ia64 with 128.

2012-03-01 02:53:32

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.

On Wed, Feb 29, 2012 at 05:39:55PM -0800, Eugene Surovegin wrote:
> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
> > On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> >> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
> >>
> >> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> >> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> >> > > However, kdump kernel and user-space code assumes that per-CPU
> >> > > memory, used for saving CPU registers on crash, is.
> >> > > This can cause corrupted /proc/vmcore in some cases - the main
> >> > > symptom being huge ELF note section.
> >> > >
> >> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >> >
> >> > Ouch. I'm surprised there is an allocation on crash, perhaps
> >> > it could at least be done earlier? And am I right in thinking
> >> > that this change increases the likely hood that the allocation
> >> > could fail?
> >> >
> >>
> >> I'm not following. This allocation is done on start-up, not on crash.
> >> If you cannot allocate this much memory on system boot, I'm not sure what
> >> else you can do on this system....
> >
> > Sorry, my eyes deceived me. You are correct and I agree.
> >
> > Is it the case that note_buf_t is never larger than PAGE_SIZE?
> > If so I your patch looks good to me.
>
> Currently, maximum note size is hardcoded in kexec-tools to 1024
> (MAX_NOTE_BYTES).
> Usually it's way less. IIRC on x86_64 it's 336 bytes.

Ok, understood.

Reviewed-by: Simon Horman <[email protected]>