Currently, update_note_header_size_elf64() and
update_note_header_size_elf32() will add the size
of a PT_NOTE entry to real_sz even if that causes real_sz
to exceeds max_sz. This patch corrects the while loop logic
in those routines to ensure that does not happen.
One possible negative side effect of exceeding the max_sz
limit is an allocation failure in merge_note_headers_elf64()
or merge_note_headers_elf32() which would produce console
output such as the following while booting the crash
kernel.
vmalloc: allocation failure: 14076997632 bytes
swapper/0: page allocation failure: order:0, mode:0x80d2
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-gbp1 #7
ffffffff817dcc30 ffff88003025fc28 ffffffff815bdb0b ffff88003025fcb0
ffffffff8113b3d0 ffffffff817dcc30 ffff88003025fc48 ffffc90000000018
ffff88003025fcc0 ffff88003025fc60 ffff88003025fc80 ffff88002b5df980
Call Trace:
[<ffffffff815bdb0b>] dump_stack+0x19/0x1b
[<ffffffff8113b3d0>] warn_alloc_failed+0xf0/0x160
[<ffffffff81a1267d>] ? merge_note_headers_elf64.constprop.9+0x116/0x24a
[<ffffffff8116d34e>] __vmalloc_node_range+0x19e/0x250
[<ffffffff81210454>] ? read_from_oldmem.part.0+0xa4/0xe0
[<ffffffff8116d4ec>] vmalloc_user+0x4c/0x70
[<ffffffff81a1267d>] ? merge_note_headers_elf64.constprop.9+0x116/0x24a
[<ffffffff81a1267d>] merge_note_headers_elf64.constprop.9+0x116/0x24a
[<ffffffff81a12cce>] vmcore_init+0x2d4/0x76c
[<ffffffff8120f9f0>] ? kcore_update_ram+0x1f0/0x1f0
[<ffffffff81063b92>] ? walk_system_raange+0x112/0x130
[<ffffffff81a129fa>] ? merge_note_headers_elf32.constprop.8+0x249/0x249
[<ffffffff810020e2>] do_one_initcall+0xe2/0x190
[<ffffffff819e20c4>] kernel_init_freeable+0x17c/0x207
[<ffffffff819e18d0>] ? do_early_param+0x88/0x88
[<ffffffff815a0d20>] ? rest_init+0x80/0x80
[<ffffffff815a0d2e>] kernel_init+0xe/0x180
[<ffffffff815cd8ac>] ret_from_fork+0x7c/0xb0
[<ffffffff815a0d20>] ? rest_init+0x80/0x80
Kdump: vmcore not initialized
kdump: dump target is /dev/sda4
kdump: saving to /sysroot//var/crash/127.0.0.1-2014.01.28-13:58:52/
kdump: saving vmcore-dmesg.txt
Cannot open /proc/vmcore: No such file or directory
kdump: saving vmcore-dmesg.txt failed
kdump: saving vmcore
kdump: saving vmcore failed
This type of failure has been seen on a four socket prototype
system with certain memory configurations. Most PT_NOTE sections
have a single entry similar to:
n_namesz = 0x5
n_descsz = 0x150
n_type = 0x1
Occasionally, a second entry is encountered with very
large n_namesz and n_descsz sizes:
n_namesz = 0x80000008
n_descsz = 0x510ae163
n_type = 0x80000008
Not yet sure of the source of these extra entries, they
seem bogus, but they shouldn't cause crash dump to fail.
Signed-off-by: Greg Pearson <[email protected]>
---
fs/proc/vmcore.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 2ca7ba0..90a4469 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -468,12 +468,13 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
return rc;
}
nhdr_ptr = notes_section;
- while (real_sz < max_sz) {
- if (nhdr_ptr->n_namesz == 0)
- break;
+ while (nhdr_ptr->n_namesz != 0) {
sz = sizeof(Elf64_Nhdr) +
((nhdr_ptr->n_namesz + 3) & ~3) +
((nhdr_ptr->n_descsz + 3) & ~3);
+ /* Silently drop further PT_NOTE entries */
+ if ((real_sz + sz) > max_sz)
+ break;
real_sz += sz;
nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
}
@@ -648,12 +649,13 @@ static int __init update_note_header_size_elf32(const Elf32_Ehdr *ehdr_ptr)
return rc;
}
nhdr_ptr = notes_section;
- while (real_sz < max_sz) {
- if (nhdr_ptr->n_namesz == 0)
- break;
+ while (nhdr_ptr->n_namesz != 0) {
sz = sizeof(Elf32_Nhdr) +
((nhdr_ptr->n_namesz + 3) & ~3) +
((nhdr_ptr->n_descsz + 3) & ~3);
+ /* Silently drop further PT_NOTE entries */
+ if ((real_sz + sz) > max_sz)
+ break;
real_sz += sz;
nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
}
--
1.8.3.2
On Fri, 31 Jan 2014 16:06:06 -0700 Greg Pearson <[email protected]> wrote:
> Currently, update_note_header_size_elf64() and
> update_note_header_size_elf32() will add the size
> of a PT_NOTE entry to real_sz even if that causes real_sz
> to exceeds max_sz. This patch corrects the while loop logic
> in those routines to ensure that does not happen.
>
> One possible negative side effect of exceeding the max_sz
> limit is an allocation failure in merge_note_headers_elf64()
> or merge_note_headers_elf32() which would produce console
> output such as the following while booting the crash
> kernel.
>
> ...
>
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -468,12 +468,13 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
> return rc;
> }
> nhdr_ptr = notes_section;
> - while (real_sz < max_sz) {
> - if (nhdr_ptr->n_namesz == 0)
> - break;
> + while (nhdr_ptr->n_namesz != 0) {
> sz = sizeof(Elf64_Nhdr) +
> ((nhdr_ptr->n_namesz + 3) & ~3) +
> ((nhdr_ptr->n_descsz + 3) & ~3);
> + /* Silently drop further PT_NOTE entries */
> + if ((real_sz + sz) > max_sz)
> + break;
What are the implications of silently dropping some notes? Should we
warn when it occurs?
> real_sz += sz;
> nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
> }
On Fri, 31 Jan 2014 16:06:06 -0700 Greg Pearson <[email protected]> wrote:
> Currently, update_note_header_size_elf64() and
> update_note_header_size_elf32() will add the size
> of a PT_NOTE entry to real_sz even if that causes real_sz
> to exceeds max_sz. This patch corrects the while loop logic
> in those routines to ensure that does not happen.
>
> ...
>
> Occasionally, a second entry is encountered with very
> large n_namesz and n_descsz sizes:
>
> n_namesz = 0x80000008
> n_descsz = 0x510ae163
> n_type = 0x80000008
Hang on.
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -468,12 +468,13 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
> return rc;
> }
> nhdr_ptr = notes_section;
> - while (real_sz < max_sz) {
> - if (nhdr_ptr->n_namesz == 0)
> - break;
> + while (nhdr_ptr->n_namesz != 0) {
> sz = sizeof(Elf64_Nhdr) +
> ((nhdr_ptr->n_namesz + 3) & ~3) +
> ((nhdr_ptr->n_descsz + 3) & ~3);
> + /* Silently drop further PT_NOTE entries */
> + if ((real_sz + sz) > max_sz)
> + break;
If we are encountering notes with these crazy sizes then what is
preventing (real_sx + sz) from wrapping through zero, which would
defeat this check?
On 01/31/2014 04:16 PM, Andrew Morton wrote:
> On Fri, 31 Jan 2014 16:06:06 -0700 Greg Pearson <[email protected]> wrote:
>
>> Currently, update_note_header_size_elf64() and
>> update_note_header_size_elf32() will add the size
>> of a PT_NOTE entry to real_sz even if that causes real_sz
>> to exceeds max_sz. This patch corrects the while loop logic
>> in those routines to ensure that does not happen.
>>
>> ...
>>
>> Occasionally, a second entry is encountered with very
>> large n_namesz and n_descsz sizes:
>>
>> n_namesz = 0x80000008
>> n_descsz = 0x510ae163
>> n_type = 0x80000008
> Hang on.
>
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -468,12 +468,13 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>> return rc;
>> }
>> nhdr_ptr = notes_section;
>> - while (real_sz < max_sz) {
>> - if (nhdr_ptr->n_namesz == 0)
>> - break;
>> + while (nhdr_ptr->n_namesz != 0) {
>> sz = sizeof(Elf64_Nhdr) +
>> ((nhdr_ptr->n_namesz + 3) & ~3) +
>> ((nhdr_ptr->n_descsz + 3) & ~3);
>> + /* Silently drop further PT_NOTE entries */
>> + if ((real_sz + sz) > max_sz)
>> + break;
> If we are encountering notes with these crazy sizes then what is
> preventing (real_sx + sz) from wrapping through zero, which would
> defeat this check?
>
>
As far as I know the only consequence of dropping a PT_NOTE entry is
that it would not be available in the crash dump for use in debugging.
I'm not sure how important this data might be for triage. I'm guessing
that in cases where one of these strange PT_NOTE entries shows up with a
size that causes an overflow it probably isn't even a real PT_NOTE entry
so dropping it won't matter, but that's a guess at this point since I'm
still trying to figure out how the bogus entries were created.
I think the wrap case is handled ok by the current code since "real_sz"
and "sz" are both declared as u64, while the "n_namesz" and "n_descsz"
fields are declared as u32. This is true for both the elf32 and elf64 case.
Adding a pr_warn() is probably a good idea, then I can remove the comment.
--
Greg
On Sat, 1 Feb 2014 01:07:29 +0000 "Pearson, Greg" <[email protected]> wrote:
> As far as I know the only consequence of dropping a PT_NOTE entry is
> that it would not be available in the crash dump for use in debugging.
> I'm not sure how important this data might be for triage. I'm guessing
> that in cases where one of these strange PT_NOTE entries shows up with a
> size that causes an overflow it probably isn't even a real PT_NOTE entry
> so dropping it won't matter, but that's a guess at this point since I'm
> still trying to figure out how the bogus entries were created.
Can we detect the crazy-huge notes, skip them and then proceed with
the following sanely-sized ones?
Andrew Morton <[email protected]> writes:
> On Sat, 1 Feb 2014 01:07:29 +0000 "Pearson, Greg" <[email protected]> wrote:
>
>> As far as I know the only consequence of dropping a PT_NOTE entry is
>> that it would not be available in the crash dump for use in debugging.
>> I'm not sure how important this data might be for triage. I'm guessing
>> that in cases where one of these strange PT_NOTE entries shows up with a
>> size that causes an overflow it probably isn't even a real PT_NOTE entry
>> so dropping it won't matter, but that's a guess at this point since I'm
>> still trying to figure out how the bogus entries were created.
>
> Can we detect the crazy-huge notes, skip them and then proceed with
> the following sanely-sized ones?
The only way we can have following sanely-sized notes is if they are in
a separate note segment (one of our extensions for kdump and
/proc/vmcore merges them together).
Eric
On Sun, Feb 02, 2014 at 02:25:25PM -0800, Eric W. Biederman wrote:
> Andrew Morton <[email protected]> writes:
>
> > On Sat, 1 Feb 2014 01:07:29 +0000 "Pearson, Greg" <[email protected]> wrote:
> >
> >> As far as I know the only consequence of dropping a PT_NOTE entry is
> >> that it would not be available in the crash dump for use in debugging.
> >> I'm not sure how important this data might be for triage. I'm guessing
> >> that in cases where one of these strange PT_NOTE entries shows up with a
> >> size that causes an overflow it probably isn't even a real PT_NOTE entry
> >> so dropping it won't matter, but that's a guess at this point since I'm
> >> still trying to figure out how the bogus entries were created.
> >
> > Can we detect the crazy-huge notes, skip them and then proceed with
> > the following sanely-sized ones?
>
> The only way we can have following sanely-sized notes is if they are in
> a separate note segment (one of our extensions for kdump and
> /proc/vmcore merges them together).
This processing is happening before we have merged ELF notes. Previous
kernel/kexec-tools prepared per cpu PT_NOTE type ELF note. One for
each cpu. And by default it prepares only one ELF note per PT_NOTE. So
there should not be more notes in the same PT_NOTE.
Also even if there are, n_namesz and n_descsz values seem so high that
after skipping these nothing valid should be after that.
So I will not be too worried about skipping seemingly corrupted ELf
notes. I think giving a warning makes sense though. Is somebody
overwriting the memory area in kenrel reserved for per cpu PT_NOTE.
Thanks
Vivek
On 02/03/2014 08:47 AM, Vivek Goyal wrote:
> On Sun, Feb 02, 2014 at 02:25:25PM -0800, Eric W. Biederman wrote:
>> Andrew Morton <[email protected]> writes:
>>
>>> On Sat, 1 Feb 2014 01:07:29 +0000 "Pearson, Greg" <[email protected]> wrote:
>>>
>>>> As far as I know the only consequence of dropping a PT_NOTE entry is
>>>> that it would not be available in the crash dump for use in debugging.
>>>> I'm not sure how important this data might be for triage. I'm guessing
>>>> that in cases where one of these strange PT_NOTE entries shows up with a
>>>> size that causes an overflow it probably isn't even a real PT_NOTE entry
>>>> so dropping it won't matter, but that's a guess at this point since I'm
>>>> still trying to figure out how the bogus entries were created.
>>> Can we detect the crazy-huge notes, skip them and then proceed with
>>> the following sanely-sized ones?
>> The only way we can have following sanely-sized notes is if they are in
>> a separate note segment (one of our extensions for kdump and
>> /proc/vmcore merges them together).
> This processing is happening before we have merged ELF notes. Previous
> kernel/kexec-tools prepared per cpu PT_NOTE type ELF note. One for
> each cpu. And by default it prepares only one ELF note per PT_NOTE. So
> there should not be more notes in the same PT_NOTE.
>
> Also even if there are, n_namesz and n_descsz values seem so high that
> after skipping these nothing valid should be after that.
>
> So I will not be too worried about skipping seemingly corrupted ELf
> notes. I think giving a warning makes sense though. Is somebody
> overwriting the memory area in kenrel reserved for per cpu PT_NOTE.
I haven't figured out the cause of the strange second PT_NOTE entries
yet, but I suspect some type of memory corruption.
I'll re-cut the patch and add a pr_warn() when we drop an entry.
--
Greg
>
> Thanks
> Vivek
On Mon, Feb 03, 2014 at 04:57:56PM +0000, Pearson, Greg wrote:
[..]
> > So I will not be too worried about skipping seemingly corrupted ELf
> > notes. I think giving a warning makes sense though. Is somebody
> > overwriting the memory area in kenrel reserved for per cpu PT_NOTE.
>
> I haven't figured out the cause of the strange second PT_NOTE entries
> yet, but I suspect some type of memory corruption.
Hi Greg,
May be put some printk() in crash_save_cpu() and make sure that at
creation time notes are being created properly. That will atleast
make clear whether elf notes were not created properly or they got
corrupted later.
Thanks
Vivek