Per-CPU allocations are not guaranteed to be physically contiguous.
However, kdump kernel and user-space code assumes that per-CPU
memory, used for saving CPU registers on crash, is.
This can cause corrupted /proc/vmcore in some cases - the main
symptom being huge ELF note section.
Force page alignment for note_buf_t to ensure that this assumption holds.
Signed-off-by: Eugene Surovegin <[email protected]>
CC: Eric Biederman <[email protected]>
CC: Vivek Goyal <[email protected]>
CC: kexec-list <[email protected]>
---
kernel/kexec.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 7b08867..e641b5c 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1232,8 +1232,13 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
static int __init crash_notes_memory_init(void)
{
- /* Allocate memory for saving cpu registers. */
- crash_notes = alloc_percpu(note_buf_t);
+ /* Allocate memory for saving cpu registers.
+ * Force page alignment to avoid crossing physical page boundary -
+ * kexec-tools and kernel /proc/vmcore handler assume these per-CPU
+ * chunks are physically contiguous.
+ */
+ crash_notes = (note_buf_t __percpu *)__alloc_percpu(sizeof(note_buf_t),
+ PAGE_SIZE);
if (!crash_notes) {
printk("Kexec: Memory allocation for saving cpu register"
" states failed\n");
--
1.7.9.1
On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> Per-CPU allocations are not guaranteed to be physically contiguous.
> However, kdump kernel and user-space code assumes that per-CPU
> memory, used for saving CPU registers on crash, is.
> This can cause corrupted /proc/vmcore in some cases - the main
> symptom being huge ELF note section.
>
> Force page alignment for note_buf_t to ensure that this assumption holds.
Ouch. I'm surprised there is an allocation on crash, perhaps
it could at least be done earlier? And am I right in thinking
that this change increases the likely hood that the allocation
could fail?
>
> Signed-off-by: Eugene Surovegin <[email protected]>
> CC: Eric Biederman <[email protected]>
> CC: Vivek Goyal <[email protected]>
> CC: kexec-list <[email protected]>
> ---
> kernel/kexec.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 7b08867..e641b5c 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1232,8 +1232,13 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>
> static int __init crash_notes_memory_init(void)
> {
> - /* Allocate memory for saving cpu registers. */
> - crash_notes = alloc_percpu(note_buf_t);
> + /* Allocate memory for saving cpu registers.
> + * Force page alignment to avoid crossing physical page boundary -
> + * kexec-tools and kernel /proc/vmcore handler assume these per-CPU
> + * chunks are physically contiguous.
> + */
> + crash_notes = (note_buf_t __percpu *)__alloc_percpu(sizeof(note_buf_t),
> + PAGE_SIZE);
> if (!crash_notes) {
> printk("Kexec: Memory allocation for saving cpu register"
> " states failed\n");
> --
> 1.7.9.1
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
>
On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>
> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> > > However, kdump kernel and user-space code assumes that per-CPU
> > > memory, used for saving CPU registers on crash, is.
> > > This can cause corrupted /proc/vmcore in some cases - the main
> > > symptom being huge ELF note section.
> > >
> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >
> > Ouch. I'm surprised there is an allocation on crash, perhaps
> > it could at least be done earlier? And am I right in thinking
> > that this change increases the likely hood that the allocation
> > could fail?
> >
>
> I'm not following. This allocation is done on start-up, not on crash.
> If you cannot allocate this much memory on system boot, I'm not sure what
> else you can do on this system....
Sorry, my eyes deceived me. You are correct and I agree.
Is it the case that note_buf_t is never larger than PAGE_SIZE?
If so I your patch looks good to me.
On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>>
>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>> > > However, kdump kernel and user-space code assumes that per-CPU
>> > > memory, used for saving CPU registers on crash, is.
>> > > This can cause corrupted /proc/vmcore in some cases - the main
>> > > symptom being huge ELF note section.
>> > >
>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>> >
>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>> > it could at least be done earlier? And am I right in thinking
>> > that this change increases the likely hood that the allocation
>> > could fail?
>> >
>>
>> I'm not following. This allocation is done on start-up, not on crash.
>> If you cannot allocate this much memory on system boot, I'm not sure what
>> else you can do on this system....
>
> Sorry, my eyes deceived me. You are correct and I agree.
>
> Is it the case that note_buf_t is never larger than PAGE_SIZE?
> If so I your patch looks good to me.
Currently, maximum note size is hardcoded in kexec-tools to 1024
(MAX_NOTE_BYTES).
Usually it's way less. IIRC on x86_64 it's 336 bytes.
--
Eugene
From: Eugene Surovegin <[email protected]>
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
Date: Wed, 29 Feb 2012 17:39:55 -0800
> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
>> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>>>
>>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>>> > > However, kdump kernel and user-space code assumes that per-CPU
>>> > > memory, used for saving CPU registers on crash, is.
>>> > > This can cause corrupted /proc/vmcore in some cases - the main
>>> > > symptom being huge ELF note section.
>>> > >
>>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>>> >
>>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>>> > it could at least be done earlier? And am I right in thinking
>>> > that this change increases the likely hood that the allocation
>>> > could fail?
>>> >
>>>
>>> I'm not following. This allocation is done on start-up, not on crash.
>>> If you cannot allocate this much memory on system boot, I'm not sure what
>>> else you can do on this system....
>>
>> Sorry, my eyes deceived me. You are correct and I agree.
>>
>> Is it the case that note_buf_t is never larger than PAGE_SIZE?
>> If so I your patch looks good to me.
>
> Currently, maximum note size is hardcoded in kexec-tools to 1024
> (MAX_NOTE_BYTES).
> Usually it's way less. IIRC on x86_64 it's 336 bytes.
>
This is elf_prstatus and I guess it's mostly equal to registers.
crash> p sizeof(struct elf_prstatus)
$3 = 336
crash> ptype struct elf_prstatus
type = struct elf_prstatus {
struct elf_siginfo pr_info;
short int pr_cursig;
long unsigned int pr_sigpend;
long unsigned int pr_sighold;
pid_t pr_pid;
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
struct timeval pr_utime;
struct timeval pr_stime;
struct timeval pr_cutime;
struct timeval pr_cstime;
elf_gregset_t pr_reg; <-- this
int pr_fpvalid;
}
What kinds of architecture does have so many registers? It's just my
interest. Or possibly other kinds of notes is written here?
Thanks.
HATAYAMA, Daisuke
On Wed, Feb 29, 2012 at 5:51 PM, HATAYAMA Daisuke
<[email protected]> wrote:
> From: Eugene Surovegin <[email protected]>
> Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
> Date: Wed, 29 Feb 2012 17:39:55 -0800
>
>> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
>>> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>>>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
>>>>
>>>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>>>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>>>> > > However, kdump kernel and user-space code assumes that per-CPU
>>>> > > memory, used for saving CPU registers on crash, is.
>>>> > > This can cause corrupted /proc/vmcore in some cases - the main
>>>> > > symptom being huge ELF note section.
>>>> > >
>>>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>>>> >
>>>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>>>> > it could at least be done earlier? And am I right in thinking
>>>> > that this change increases the likely hood that the allocation
>>>> > could fail?
>>>> >
>>>>
>>>> I'm not following. This allocation is done on start-up, not on crash.
>>>> If you cannot allocate this much memory on system boot, I'm not sure what
>>>> else you can do on this system....
>>>
>>> Sorry, my eyes deceived me. You are correct and I agree.
>>>
>>> Is it the case that note_buf_t is never larger than PAGE_SIZE?
>>> If so I your patch looks good to me.
>>
>> Currently, maximum note size is hardcoded in kexec-tools to 1024
>> (MAX_NOTE_BYTES).
>> Usually it's way less. IIRC on x86_64 it's 336 bytes.
>>
>
> This is elf_prstatus and I guess it's mostly equal to registers.
>
> crash> p sizeof(struct elf_prstatus)
> $3 = 336
> crash> ptype struct elf_prstatus
> type = struct elf_prstatus {
> ? ?struct elf_siginfo pr_info;
> ? ?short int pr_cursig;
> ? ?long unsigned int pr_sigpend;
> ? ?long unsigned int pr_sighold;
> ? ?pid_t pr_pid;
> ? ?pid_t pr_ppid;
> ? ?pid_t pr_pgrp;
> ? ?pid_t pr_sid;
> ? ?struct timeval pr_utime;
> ? ?struct timeval pr_stime;
> ? ?struct timeval pr_cutime;
> ? ?struct timeval pr_cstime;
> ? ?elf_gregset_t pr_reg; <-- this
> ? ?int pr_fpvalid;
> }
>
> What kinds of architecture does have so many registers? It's just my
> interest. Or possibly other kinds of notes is written here?
I'm not sure about other archs, but we don't write there anything
except for 'elf_prstatus' and sentinel "final" note.
--
Eugene
On Thu, Mar 01, 2012 at 10:51:26AM +0900, HATAYAMA Daisuke wrote:
> From: Eugene Surovegin <[email protected]>
> Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
> Date: Wed, 29 Feb 2012 17:39:55 -0800
>
> > On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
> >> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> >>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
> >>>
> >>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> >>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> >>> > > However, kdump kernel and user-space code assumes that per-CPU
> >>> > > memory, used for saving CPU registers on crash, is.
> >>> > > This can cause corrupted /proc/vmcore in some cases - the main
> >>> > > symptom being huge ELF note section.
> >>> > >
> >>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >>> >
> >>> > Ouch. I'm surprised there is an allocation on crash, perhaps
> >>> > it could at least be done earlier? And am I right in thinking
> >>> > that this change increases the likely hood that the allocation
> >>> > could fail?
> >>> >
> >>>
> >>> I'm not following. This allocation is done on start-up, not on crash.
> >>> If you cannot allocate this much memory on system boot, I'm not sure what
> >>> else you can do on this system....
> >>
> >> Sorry, my eyes deceived me. You are correct and I agree.
> >>
> >> Is it the case that note_buf_t is never larger than PAGE_SIZE?
> >> If so I your patch looks good to me.
> >
> > Currently, maximum note size is hardcoded in kexec-tools to 1024
> > (MAX_NOTE_BYTES).
> > Usually it's way less. IIRC on x86_64 it's 336 bytes.
I presume that MAX_NOTE_BYTES was chosen to be large to
minimise the chance that it would ever need to be changed.
> This is elf_prstatus and I guess it's mostly equal to registers.
>
> crash> p sizeof(struct elf_prstatus)
> $3 = 336
> crash> ptype struct elf_prstatus
> type = struct elf_prstatus {
> struct elf_siginfo pr_info;
> short int pr_cursig;
> long unsigned int pr_sigpend;
> long unsigned int pr_sighold;
> pid_t pr_pid;
> pid_t pr_ppid;
> pid_t pr_pgrp;
> pid_t pr_sid;
> struct timeval pr_utime;
> struct timeval pr_stime;
> struct timeval pr_cutime;
> struct timeval pr_cstime;
> elf_gregset_t pr_reg; <-- this
> int pr_fpvalid;
> }
>
> What kinds of architecture does have so many registers? It's just my
> interest. Or possibly other kinds of notes is written here?
The winner seems to be ia64 with 128.
On Wed, Feb 29, 2012 at 05:39:55PM -0800, Eugene Surovegin wrote:
> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <[email protected]> wrote:
> > On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> >> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <[email protected]> wrote:
> >>
> >> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> >> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> >> > > However, kdump kernel and user-space code assumes that per-CPU
> >> > > memory, used for saving CPU registers on crash, is.
> >> > > This can cause corrupted /proc/vmcore in some cases - the main
> >> > > symptom being huge ELF note section.
> >> > >
> >> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >> >
> >> > Ouch. I'm surprised there is an allocation on crash, perhaps
> >> > it could at least be done earlier? And am I right in thinking
> >> > that this change increases the likely hood that the allocation
> >> > could fail?
> >> >
> >>
> >> I'm not following. This allocation is done on start-up, not on crash.
> >> If you cannot allocate this much memory on system boot, I'm not sure what
> >> else you can do on this system....
> >
> > Sorry, my eyes deceived me. You are correct and I agree.
> >
> > Is it the case that note_buf_t is never larger than PAGE_SIZE?
> > If so I your patch looks good to me.
>
> Currently, maximum note size is hardcoded in kexec-tools to 1024
> (MAX_NOTE_BYTES).
> Usually it's way less. IIRC on x86_64 it's 336 bytes.
Ok, understood.
Reviewed-by: Simon Horman <[email protected]>