2015-08-03 12:50:53

by Baoquan He

[permalink] [raw]
Subject: [Patch v2] align crash_notes allocation to make it be inside one physical page

People reported that crash_notes in /proc/vmcore were corrupted and
this cause crash kdump failure. With code debugging and log we got
the root cause. This is because percpu variable crash_notes are
allocated in 2 vmalloc pages. Currently percpu is based on vmalloc
by default. Vmalloc can't guarantee 2 continuous vmalloc pages
are also on 2 continuous physical pages. So when 1st kernel exports
the starting address and size of crash_notes through sysfs like below:

/sys/devices/system/cpu/cpux/crash_notes
/sys/devices/system/cpu/cpux/crash_notes_size

kdump kernel use them to get the content of crash_notes. However the
2nd part may not be in the next neighbouring physical page as we
expected if crash_notes are allocated accross 2 vmalloc pages. That's
why nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be very huge in
update_note_header_size_elf64() and cause note header merging failure
or some warnings.

In this patch change to call __alloc_percpu() to passed in the align
value by rounding crash_notes_size up to the nearest power of two.
This make sure the crash_notes is allocated inside one physical page
since sizeof(note_buf_t) in all ARCHS is smaller than PAGE_SIZE.
Meanwhile add a WARN_ON in case it grows to be bigger than PAGE_SIZE
in the future.

v1->v2:
Minfei mentioned percpu can't take align value bigger then PAGE_SIZE.
So limit align to be PAGE_SIZE at most.

Signed-off-by: Baoquan He <[email protected]>
---
kernel/kexec.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c10..9f4b070 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
static int __init crash_notes_memory_init(void)
{
/* Allocate memory for saving cpu registers. */
- crash_notes = alloc_percpu(note_buf_t);
+ size_t size, align;
+ int order;
+
+ size = sizeof(note_buf_t);
+ order = get_count_order(size);
+ align = min_t(size_t, (1<<order), PAGE_SIZE);
+
+ WARN_ON(size > PAGE_SIZE);
+
+ crash_notes = __alloc_percpu(size, align);
if (!crash_notes) {
pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
return -ENOMEM;
--
2.1.0


2015-08-03 22:04:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [Patch v2] align crash_notes allocation to make it be inside one physical page

On Mon, 3 Aug 2015 20:50:43 +0800 Baoquan He <[email protected]> wrote:

> People reported that crash_notes in /proc/vmcore were corrupted and
> this cause crash kdump failure. With code debugging and log we got
> the root cause. This is because percpu variable crash_notes are
> allocated in 2 vmalloc pages. Currently percpu is based on vmalloc
> by default. Vmalloc can't guarantee 2 continuous vmalloc pages
> are also on 2 continuous physical pages. So when 1st kernel exports
> the starting address and size of crash_notes through sysfs like below:
>
> /sys/devices/system/cpu/cpux/crash_notes
> /sys/devices/system/cpu/cpux/crash_notes_size
>
> kdump kernel use them to get the content of crash_notes. However the
> 2nd part may not be in the next neighbouring physical page as we
> expected if crash_notes are allocated accross 2 vmalloc pages. That's
> why nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be very huge in
> update_note_header_size_elf64() and cause note header merging failure
> or some warnings.
>
> In this patch change to call __alloc_percpu() to passed in the align
> value by rounding crash_notes_size up to the nearest power of two.
> This make sure the crash_notes is allocated inside one physical page
> since sizeof(note_buf_t) in all ARCHS is smaller than PAGE_SIZE.
> Meanwhile add a WARN_ON in case it grows to be bigger than PAGE_SIZE
> in the future.
>
> ...
>
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> static int __init crash_notes_memory_init(void)
> {
> /* Allocate memory for saving cpu registers. */
> - crash_notes = alloc_percpu(note_buf_t);
> + size_t size, align;
> + int order;
> +
> + size = sizeof(note_buf_t);
> + order = get_count_order(size);
> + align = min_t(size_t, (1<<order), PAGE_SIZE);
> +
> + WARN_ON(size > PAGE_SIZE);
> +
> + crash_notes = __alloc_percpu(size, align);

A code comment would be helpful - the reason for this code's existence
is otherwise utterly unobvious.

I think it can be done this way:

align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);


I never noticed get_count_order() before. afaict it does the same as
order_base_2(), except get_count_order() generates better code and has
a ridiculous name.

And I think the WARN_ON can be replaced with a
BUILD_BUG_ON(sizeof>PAGE_SIZE)? That would avoid adding runtime
overhead.

2015-08-03 23:02:00

by Baoquan He

[permalink] [raw]
Subject: Re: [Patch v2] align crash_notes allocation to make it be inside one physical page

Hi Andrew,

Thanks a lot for your reviewing and suggestiong.

On 08/03/15 at 03:04pm, Andrew Morton wrote:
> On Mon, 3 Aug 2015 20:50:43 +0800 Baoquan He <[email protected]> wrote:
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> > static int __init crash_notes_memory_init(void)
> > {
> > /* Allocate memory for saving cpu registers. */
> > - crash_notes = alloc_percpu(note_buf_t);
> > + size_t size, align;
> > + int order;
> > +
> > + size = sizeof(note_buf_t);
> > + order = get_count_order(size);
> > + align = min_t(size_t, (1<<order), PAGE_SIZE);
> > +
> > + WARN_ON(size > PAGE_SIZE);
> > +
> > + crash_notes = __alloc_percpu(size, align);
>
> A code comment would be helpful - the reason for this code's existence
> is otherwise utterly unobvious.

Will add in new post.

>
> I think it can be done this way:
>
> align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
>
>
> I never noticed get_count_order() before. afaict it does the same as
> order_base_2(), except get_count_order() generates better code and has
> a ridiculous name.

OK, will change the code as you suggested.

>
> And I think the WARN_ON can be replaced with a
> BUILD_BUG_ON(sizeof>PAGE_SIZE)? That would avoid adding runtime
> overhead.

I am not sure about this. BUILD_BUG_ON will break kernel compiling.
Before we got the root cause several work around fix were introduced to
skip this kind of crash_note.

c4082f3 vmcore: continue vmcore initialization if PT_NOTE is found empty
38dfac8 vmcore: prevent PT_NOTE p_memsz overflow during header update

That means if (sizeof(note_buf_t)>PAGE_SIZE) really happened, normal
kernel works well, kdump kernel can work but we will lose those
crash_notes. And if on one certain ARCH sizeof(note_buf_t) is bigger
than PAGE_SIZE, the design here must be changed to avoid using percpu
variable or adjust their note_buf_t. That may take a not short time to
discuss and review. Comparing with this it may be better to tolerate the
dumping vmcore with uncomplete crash_notes for a while until new design
is taken.

Thanks
Baoquan

2015-08-11 06:33:38

by Baoquan He

[permalink] [raw]
Subject: Re: [Patch v2] align crash_notes allocation to make it be inside one physical page

Hi Andrew,

On 08/03/15 at 03:04pm, Andrew Morton wrote:
> On Mon, 3 Aug 2015 20:50:43 +0800 Baoquan He <[email protected]> wrote:
> And I think the WARN_ON can be replaced with a
> BUILD_BUG_ON(sizeof>PAGE_SIZE)? That would avoid adding runtime
> overhead.

Rethink about this, you are right. Using BUILD_BUG_ON is better.
Anyone who found this compiling break should check if his/her code
changes increase the crash_notes size. If possible that increase need be
avoidable. Otherwise he should report this to upstream why it's
unavoidable to increase crash_notes size, then let's consider the
redesign the crash_notes data structure.

So I will use BUILD_BUG_ON and repost.

Thanks
Baoquan
>

2015-08-11 07:56:16

by Minfei Huang

[permalink] [raw]
Subject: Re: [Patch v2] align crash_notes allocation to make it be inside one physical page

On 08/11/15 at 02:33pm, Baoquan He wrote:
> Hi Andrew,
>
> On 08/03/15 at 03:04pm, Andrew Morton wrote:
> > On Mon, 3 Aug 2015 20:50:43 +0800 Baoquan He <[email protected]> wrote:
> > And I think the WARN_ON can be replaced with a
> > BUILD_BUG_ON(sizeof>PAGE_SIZE)? That would avoid adding runtime
> > overhead.
>
> Rethink about this, you are right. Using BUILD_BUG_ON is better.
> Anyone who found this compiling break should check if his/her code
> changes increase the crash_notes size. If possible that increase need be
> avoidable. Otherwise he should report this to upstream why it's
> unavoidable to increase crash_notes size, then let's consider the
> redesign the crash_notes data structure.
>
> So I will use BUILD_BUG_ON and repost.

Baoquan.

If the size of notes never be exceeded to PAGE_SIZE, I think we can
revert below patch, since the situation which describes in patch does
not happen.

commit 38dfac843cb6d7be1874888839817404a15a6b3c
Author: Greg Pearson <[email protected]>
Date: Mon Feb 10 14:25:36 2014 -0800

vmcore: prevent PT_NOTE p_memsz overflow during header update

What do you think about this?

Thanks
Minfei

2015-08-11 08:42:21

by Baoquan He

[permalink] [raw]
Subject: Re: [Patch v2] align crash_notes allocation to make it be inside one physical page

Hi,

On 08/11/15 at 04:01pm, Minfei Huang wrote:
> Baoquan.
>
> If the size of notes never be exceeded to PAGE_SIZE, I think we can
> revert below patch, since the situation which describes in patch does
> not happen.
>
> commit 38dfac843cb6d7be1874888839817404a15a6b3c
> Author: Greg Pearson <[email protected]>
> Date: Mon Feb 10 14:25:36 2014 -0800
>
> vmcore: prevent PT_NOTE p_memsz overflow during header update
>
> What do you think about this?

Yeah, I am fine with this reverting. If people all agree with this I can
post patch to revert this. In fact I am eagerer to revert below commit
since it's a littble bit confusing after crash_notes crossing 2
pages bug is fixed. Below commit is a work around fix , but it's too much.
I am willing to hear people's idea.

commit 34b47764297130b21aaeb4cc6119bb811814b8e3
Author: WANG Chao <[email protected]>
Date: Tue Feb 17 13:46:01 2015 -0800

vmcore: fix PT_NOTE n_namesz, n_descsz overflow issue

Thanks
Baoquan