2013-03-05 07:04:58

by Hatayama, Daisuke

Subject: [PATCH v2 00/20] kdump, vmcore: support mmap() on /proc/vmcore

Currently, reads of /proc/vmcore are done by read_oldmem(), which calls
ioremap/iounmap for each single page. For example, if memory is 1GB,
ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
times. This causes a significant performance degradation.

In particular, the main intended user of this mmap() is makedumpfile,
which not only reads memory from /proc/vmcore but also does other
processing like filtering, compression and I/O work. The page table
updates and the TLB flushes that follow make such processing much
slower, though I have yet to write the patch for makedumpfile and to
confirm how much it improves.

To address the issue, this patch set implements mmap() on /proc/vmcore
to improve read performance. My simple benchmark shows an improvement
from 200 [MiB/sec] to over 50.0 [GiB/sec].
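
For illustration, here is a minimal user-space sketch of the intended
usage (not part of this series; the 4 MiB window size and the checksum
loop are arbitrary, and the real makedumpfile change is still in the
TODO below):

/*
 * Hypothetical user-space sketch (not part of this series): walk the
 * dump through mmap() on /proc/vmcore in page-aligned windows instead
 * of read(2).  The window size and the checksum loop are arbitrary.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define WINDOW (4UL << 20)			/* 4 MiB mapping window */

int main(void)
{
	int fd = open("/proc/vmcore", O_RDONLY);
	off_t total, off = 0;
	uint64_t sum = 0;

	if (fd < 0)
		return 1;
	total = lseek(fd, 0, SEEK_END);
	while (off < total) {
		size_t len = total - off;
		char *p;

		if (len > WINDOW)
			len = WINDOW;
		/* off stays page-aligned because WINDOW is a multiple of it */
		p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
		if (p == MAP_FAILED)
			return 1;
		for (size_t i = 0; i < len; i++)	/* consume the data */
			sum += (unsigned char)p[i];
		munmap(p, len);
		off += len;
	}
	printf("checksum %llu over %lld bytes\n",
	       (unsigned long long)sum, (long long)total);
	close(fd);
	return 0;
}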

ChangeLog
=========

v1 => v2)

- Clean up the existing code: use e_phoff and remove the assumption
about consecutive PT_NOTE entries.
=> See PATCH 01, 02.

- Fix a potential bug where the ELF header size is not included in the
exported vmcoreinfo size.
=> See Patch 03.

- Divide the patch modifying read_vmcore() into two: a clean-up and the
primary code change.
=> See Patch 9, 10.

- Place ELF note segments on page-size boundaries in the 1st kernel
instead of copying them into a buffer in the 2nd kernel.
=> See Patch 11, 12, 13, 14, 16.

Benchmark
=========

No change from the previous patch series. See the previous results
here:

https://lkml.org/lkml/2013/2/14/89

TODO
====

- fix makedumpfile to use mmap() on /proc/vmcore and benchmark it to
confirm whether we can see enough performance improvement. The idea
is described here:
http://lists.infradead.org/pipermail/kexec/2013-February/007982.html

- fix the crash utility and makedumpfile to support the NT_VMCORE_PAD
note type. Neither tool distinguishes notes of the same type under
different note names, which does not conform to the ELF specification;
currently both read the NT_VMCORE_PAD note type as NT_VMCORE_DEBUGINFO
(a rough sketch of the intended check follows below).
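
The following is only a rough sketch of the consumer-side check,
assuming an Elf64_Nhdr note layout; the helper names are made up and
the NT_* values are the ones introduced by this series:

/*
 * Rough consumer-side sketch (hypothetical helpers, not from either
 * tool): identify a note by its name *and* type instead of the type
 * alone, so NT_VMCORE_PAD is no longer misread as NT_VMCORE_DEBUGINFO.
 */
#include <elf.h>
#include <string.h>

#ifndef NT_VMCORE_DEBUGINFO
#define NT_VMCORE_DEBUGINFO 0	/* introduced by this series */
#endif
#ifndef NT_VMCORE_PAD
#define NT_VMCORE_PAD       7	/* introduced by this series */
#endif

static int note_is(const Elf64_Nhdr *nhdr, const char *name, unsigned int type)
{
	const char *nname = (const char *)(nhdr + 1);

	return nhdr->n_type == type &&
	       nhdr->n_namesz == strlen(name) + 1 &&
	       memcmp(nname, name, nhdr->n_namesz) == 0;
}

/* accept only the real vmcoreinfo note, skip the padding note */
static int is_vmcoreinfo(const Elf64_Nhdr *nhdr)
{
	if (note_is(nhdr, "VMCOREINFO", NT_VMCORE_PAD))
		return 0;
	return note_is(nhdr, "VMCOREINFO", NT_VMCORE_DEBUGINFO);
}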

Test
====

This patch set is based on v3.9-rc1.

Tested on x86-64 and x86-32, in both 1GB and over-4GB memory environments.

---

HATAYAMA Daisuke (20):
vmcore: introduce mmap_vmcore()
vmcore: count holes generated by round-up operation for vmcore size
vmcore: round-up offset of vmcore object in page-size boundary
vmcore: check if vmcore objects satisfy mmap()'s page-size boundary requirement
vmcore: check NT_VMCORE_PAD as a mark indicating the end of ELF note buffer
kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
elf: introduce NT_VMCORE_PAD type
kexec, elf: introduce NT_VMCORE_DEBUGINFO note type
kexec: allocate vmcoreinfo note buffer on page-size boundary
vmcore: allocate per-cpu crash_notes objects on page-size boundary
vmcore: read buffers for vmcore objects copied from old memory
vmcore: clean up read_vmcore()
vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
vmcore: copy non page-size aligned head and tail pages in 2nd kernel
vmcore, procfs: introduce a flag to distinguish objects copied in 2nd kernel
vmcore: round up buffer size of ELF headers by PAGE_SIZE
vmcore: allocate buffer for ELF headers on page-size alignment
vmcore, sysfs: export ELF note segment size instead of vmcoreinfo data size
vmcore: rearrange program headers without assuming consecutive PT_NOTE entries
vmcore: refer to e_phoff member explicitly


arch/s390/include/asm/kexec.h | 7
fs/proc/vmcore.c | 577 ++++++++++++++++++++++++++++++++---------
include/linux/kexec.h | 16 +
include/linux/proc_fs.h | 8 -
include/uapi/linux/elf.h | 5
kernel/kexec.c | 47 ++-
kernel/ksysfs.c | 2
7 files changed, 505 insertions(+), 157 deletions(-)

--
Thanks.
HATAYAMA, Daisuke


2013-03-05 07:05:10

by Hatayama, Daisuke

Subject: [PATCH v2 02/20] vmcore: rearrange program headers without assuming consecutive PT_NOTE entries

The current code assumes that all PT_NOTE headers are placed at the
beginning of the program header table and that they are consecutive.
But the assumption could be broken by future changes to either
kexec-tools or the 1st kernel. This patch removes the assumption and
rearranges the program headers so that the following conditions are
satisfied:

- the PT_NOTE entry is unique and placed as the first entry,

- the relative order of the program headers is unchanged during this
rearrangement; they are only shifted towards higher positions,

- the unused part at the end of the program header table is filled
with 0.

Also, this patch handles the exceptional case where the number of
PT_NOTE entries is somehow 0: in that case, the function returns
immediately.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 92 +++++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index abf4f01..b5c9e33 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -251,8 +251,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
struct list_head *vc_list)
{
- int i, nr_ptnote=0, rc=0;
- char *tmp;
+ int i, j, nr_ptnote=0, i_ptnote, rc=0;
Elf64_Ehdr *ehdr_ptr;
Elf64_Phdr phdr, *phdr_ptr;
Elf64_Nhdr *nhdr_ptr;
@@ -302,6 +301,39 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
kfree(notes_section);
}

+ if (nr_ptnote == 0)
+ goto out;
+
+ phdr_ptr = (Elf64_Phdr *)(elfptr + ehdr_ptr->e_phoff);
+
+ /* Remove unwanted PT_NOTE program headers. */
+
+ /* - 1st pass shifts non-PT_NOTE entries until the first
+ PT_NOTE entry. */
+ i_ptnote = -1;
+ for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
+ if (phdr_ptr[i].p_type == PT_NOTE) {
+ i_ptnote = i;
+ break;
+ }
+ }
+ BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
+ memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf64_Phdr));
+
+ /* - 2nd pass moves the remaining non-PT_NOTE entries under
+ the first PT_NOTE entry. */
+ for (i = j = i_ptnote + 1; i < ehdr_ptr->e_phnum; i++) {
+ if (phdr_ptr[i].p_type != PT_NOTE) {
+ memmove(phdr_ptr + j, phdr_ptr + i,
+ sizeof(Elf64_Phdr));
+ j++;
+ }
+ }
+
+ /* - Finally, fill unused part with 0. */
+ memset(phdr_ptr + ehdr_ptr->e_phnum - (nr_ptnote - 1), 0,
+ (nr_ptnote - 1) * sizeof(Elf64_Phdr));
+
/* Prepare merged PT_NOTE program header. */
phdr.p_type = PT_NOTE;
phdr.p_flags = 0;
@@ -313,18 +345,14 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
phdr.p_align = 0;

/* Add merged PT_NOTE program header*/
- tmp = elfptr + ehdr_ptr->e_phoff;
- memcpy(tmp, &phdr, sizeof(phdr));
- tmp += sizeof(phdr);
+ memcpy(phdr_ptr, &phdr, sizeof(Elf64_Phdr));

- /* Remove unwanted PT_NOTE program headers. */
- i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
- *elfsz = *elfsz - i;
- memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
+ *elfsz = *elfsz - (nr_ptnote - 1) * sizeof(Elf64_Phdr);

/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;

+out:
return 0;
}

@@ -332,8 +360,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
struct list_head *vc_list)
{
- int i, nr_ptnote=0, rc=0;
- char *tmp;
+ int i, j, nr_ptnote=0, i_ptnote, rc=0;
Elf32_Ehdr *ehdr_ptr;
Elf32_Phdr phdr, *phdr_ptr;
Elf32_Nhdr *nhdr_ptr;
@@ -383,6 +410,39 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
kfree(notes_section);
}

+ if (nr_ptnote == 0)
+ goto out;
+
+ phdr_ptr = (Elf32_Phdr *)(elfptr + ehdr_ptr->e_phoff);
+
+ /* Remove unwanted PT_NOTE program headers. */
+
+ /* - 1st pass shifts non-PT_NOTE entries until the first
+ PT_NOTE entry. */
+ i_ptnote = -1;
+ for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
+ if (phdr_ptr[i].p_type == PT_NOTE) {
+ i_ptnote = i;
+ break;
+ }
+ }
+ BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
+ memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf32_Phdr));
+
+ /* - 2nd pass moves the remaining non-PT_NOTE entries under
+ the first PT_NOTE entry. */
+ for (i = j = i_ptnote + 1; i < ehdr_ptr->e_phnum; i++) {
+ if (phdr_ptr[i].p_type != PT_NOTE) {
+ memmove(phdr_ptr + j, phdr_ptr + i,
+ sizeof(Elf32_Phdr));
+ j++;
+ }
+ }
+
+ /* - Finally, fill unused part with 0. */
+ memset(phdr_ptr + ehdr_ptr->e_phnum - (nr_ptnote - 1), 0,
+ (nr_ptnote - 1) * sizeof(Elf32_Phdr));
+
/* Prepare merged PT_NOTE program header. */
phdr.p_type = PT_NOTE;
phdr.p_flags = 0;
@@ -394,18 +454,14 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
phdr.p_align = 0;

/* Add merged PT_NOTE program header*/
- tmp = elfptr + ehdr_ptr->e_phoff;
- memcpy(tmp, &phdr, sizeof(phdr));
- tmp += sizeof(phdr);
+ memcpy(phdr_ptr, &phdr, sizeof(Elf32_Phdr));

- /* Remove unwanted PT_NOTE program headers. */
- i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
- *elfsz = *elfsz - i;
- memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));
+ *elfsz = *elfsz - (nr_ptnote - 1) * sizeof(Elf32_Phdr);

/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;

+out:
return 0;
}

2013-03-05 07:05:16

by Hatayama, Daisuke

Subject: [PATCH v2 03/20] vmcore, sysfs: export ELF note segment size instead of vmcoreinfo data size

The p_memsz member of the PT_NOTE program header entry needs to hold
the size of the corresponding ELF note segment. Currently, vmcoreinfo
exports the size of the data part only. If vmcoreinfo reaches
vmcoreinfo_max_size, then in merge_note_headers_elf{32,64}() the empty
ELF note header cannot be found, or a buffer overrun can happen.

Note: kexec-tools assigns PAGE_SIZE to p_memsz for other ELF note
types. For the same reason, the same issue occurs there if the actual
ELF note data exceeds (PAGE_SIZE - 2 * KEXEC_NOTE_HEAD_BYTES).
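
For reference, a small sketch of the size arithmetic involved,
mirroring the current VMCOREINFO_NOTE_SIZE definition (the helper macro
and the standalone program are illustrative only, not exported code):

/*
 * Illustrative arithmetic only (mirrors the current VMCOREINFO_NOTE_SIZE
 * definition): the PT_NOTE segment size that p_memsz must cover is note
 * header + padded name + data + terminating header, not the data alone.
 */
#include <elf.h>
#include <stdio.h>

#define ALIGN4(x)		(((x) + 3UL) & ~3UL)
#define VMCOREINFO_BYTES	4096UL
#define VMCOREINFO_NOTE_NAME	"VMCOREINFO"

int main(void)
{
	unsigned long head = sizeof(Elf64_Nhdr);	/* KEXEC_NOTE_HEAD_BYTES */
	unsigned long name = ALIGN4(sizeof(VMCOREINFO_NOTE_NAME));
	unsigned long data = VMCOREINFO_BYTES;		/* vmcoreinfo_max_size */
	unsigned long note_sz = head + name + data + head;

	printf("note segment size %lu > data size %lu\n", note_sz, data);
	return 0;
}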

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

kernel/ksysfs.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 6ada93c..97d2763 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -126,7 +126,7 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
{
return sprintf(buf, "%lx %x\n",
paddr_vmcoreinfo_note(),
- (unsigned int)vmcoreinfo_max_size);
+ (unsigned int)sizeof(vmcoreinfo_note));
}
KERNEL_ATTR_RO(vmcoreinfo);

2013-03-05 07:05:27

by Hatayama, Daisuke

Subject: [PATCH v2 05/20] vmcore: round up buffer size of ELF headers by PAGE_SIZE

To satisfy mmap()'s page-size boundary requirement, round up the buffer
size of the ELF headers to PAGE_SIZE. The resulting value becomes the
offset of the ELF note segments and is assigned to the unique PT_NOTE
program header entry.

Also, the code that assumed the previous size of the ELF headers is
updated to use this new rounded-up value.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 1b02d01..c511cf4 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -340,7 +340,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
phdr.p_flags = 0;
note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
- phdr.p_offset = note_off;
+ phdr.p_offset = roundup(note_off, PAGE_SIZE);
phdr.p_vaddr = phdr.p_paddr = 0;
phdr.p_filesz = phdr.p_memsz = phdr_sz;
phdr.p_align = 0;
@@ -353,6 +353,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;

+ *elfsz = roundup(*elfsz, PAGE_SIZE);
out:
return 0;
}
@@ -449,7 +450,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
phdr.p_flags = 0;
note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
- phdr.p_offset = note_off;
+ phdr.p_offset = roundup(note_off, PAGE_SIZE);
phdr.p_vaddr = phdr.p_paddr = 0;
phdr.p_filesz = phdr.p_memsz = phdr_sz;
phdr.p_align = 0;
@@ -462,6 +463,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;

+ *elfsz = roundup(*elfsz, PAGE_SIZE);
out:
return 0;
}
@@ -482,9 +484,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */

/* First program header is PT_NOTE header. */
- vmcore_off = ehdr_ptr->e_phoff +
- (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
- phdr_ptr->p_memsz; /* Note sections */
+ vmcore_off = phdr_ptr->p_offset + roundup(phdr_ptr->p_memsz,
+ PAGE_SIZE);

for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)
@@ -519,9 +520,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */

/* First program header is PT_NOTE header. */
- vmcore_off = ehdr_ptr->e_phoff +
- (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
- phdr_ptr->p_memsz; /* Note sections */
+ vmcore_off = phdr_ptr->p_offset + roundup(phdr_ptr->p_memsz,
+ PAGE_SIZE);

for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)

2013-03-05 07:05:33

by Hatayama, Daisuke

Subject: [PATCH v2 06/20] vmcore, procfs: introduce a flag to distinguish objects copied in 2nd kernel

Parts of the dump target memory are copied into the 2nd kernel if they
don't satisfy mmap()'s page-size boundary requirement. To distinguish
such copied objects from usual old memory, the flag
MEM_TYPE_CURRENT_KERNEL is introduced. If this flag is set, the object
is considered to have been copied into a buffer on the 2nd kernel.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

include/linux/proc_fs.h | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 8307f2f..11dd592 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -97,11 +97,17 @@ struct kcore_list {
int type;
};

+#define MEM_TYPE_CURRENT_KERNEL 0x1
+
struct vmcore {
struct list_head list;
- unsigned long long paddr;
+ union {
+ unsigned long long paddr;
+ char *buf;
+ };
unsigned long long size;
loff_t offset;
+ unsigned int flag;
};

#ifdef CONFIG_PROC_FS

2013-03-05 07:05:39

by Hatayama, Daisuke

Subject: [PATCH v2 07/20] vmcore: copy non page-size aligned head and tail pages in 2nd kernel

Due to mmap()'s requirement, pages that do not start or end on a
page-size aligned address have to be copied into the 2nd kernel and
mapped to user-space from there.

For example, see the map below:

00000000-0001ffff : reserved
00010000-0009f7ff : System RAM
0009f800-0009ffff : reserved

where the System RAM ends at 0x9f800, which is not page-size
aligned. This map is divided into two parts:

00010000-0009efff
0009f000-0009f7ff

where the first part is kept in old memory and the second one is copied
into a buffer on the 2nd kernel.

This kind of non-page-size-aligned area can always occur, since any
part of System RAM can be converted into a reserved area at runtime.

Without this copying, i.e. if the non page-size aligned pages of old
memory were remapped directly, mmap() would have to export memory that
is not part of the dump target to user-space; in the above example this
is the reserved region 0x9f800-0xa0000.
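
A small standalone sketch of the split for the PT_LOAD region above
(PAGE_SIZE and the start/end values are assumed here; the real code
works on phdr->p_offset and p_memsz):

/*
 * Illustrative sketch of the split this patch performs for one PT_LOAD
 * region, using the System RAM range from the example above.
 */
#include <stdio.h>

#define PAGE_SIZE	4096ULL
#define ROUNDUP(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define ROUNDDOWN(x)	((x) & ~(PAGE_SIZE - 1))

int main(void)
{
	unsigned long long start = 0x10000, end = 0x9f800;	/* System RAM */

	if (start & (PAGE_SIZE - 1))		/* unaligned head: copy */
		printf("copy head  %#llx-%#llx\n", start, ROUNDUP(start) - 1);
	if (ROUNDUP(start) < ROUNDDOWN(end))	/* aligned middle: old memory */
		printf("keep       %#llx-%#llx\n",
		       ROUNDUP(start), ROUNDDOWN(end) - 1);
	if (end & (PAGE_SIZE - 1))		/* unaligned tail: copy */
		printf("copy tail  %#llx-%#llx\n", ROUNDDOWN(end), end - 1);
	return 0;
}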

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 172 insertions(+), 20 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index c511cf4..6b071b4 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -474,11 +474,10 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
size_t elfsz,
struct list_head *vc_list)
{
- int i;
+ int i, rc;
Elf64_Ehdr *ehdr_ptr;
Elf64_Phdr *phdr_ptr;
loff_t vmcore_off;
- struct vmcore *new;

ehdr_ptr = (Elf64_Ehdr *)elfptr;
phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
@@ -488,20 +487,97 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
PAGE_SIZE);

for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+ u64 start, end, rest;
+
if (phdr_ptr->p_type != PT_LOAD)
continue;

- /* Add this contiguous chunk of memory to vmcore list.*/
- new = get_new_element();
- if (!new)
- return -ENOMEM;
- new->paddr = phdr_ptr->p_offset;
- new->size = phdr_ptr->p_memsz;
- list_add_tail(&new->list, vc_list);
+ start = phdr_ptr->p_offset;
+ end = phdr_ptr->p_offset + phdr_ptr->p_memsz;
+ rest = phdr_ptr->p_memsz;
+
+ if (start & ~PAGE_MASK) {
+ u64 paddr, len;
+ char *buf;
+ struct vmcore *new;
+
+ paddr = start;
+ len = min(roundup(start,PAGE_SIZE), end) - start;
+
+ buf = (char *)get_zeroed_page(GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+ rc = read_from_oldmem(buf + (start & ~PAGE_MASK), len,
+ &paddr, 0);
+ if (rc < 0) {
+ free_pages((unsigned long)buf, 0);
+ return rc;
+ }
+
+ new = get_new_element();
+ if (!new) {
+ free_pages((unsigned long)buf, 0);
+ return -ENOMEM;
+ }
+ new->flag |= MEM_TYPE_CURRENT_KERNEL;
+ new->size = PAGE_SIZE;
+ new->buf = buf;
+ list_add_tail(&new->list, vc_list);
+
+ rest -= len;
+ }
+
+ if (rest > 0 &&
+ roundup(start, PAGE_SIZE) < rounddown(end, PAGE_SIZE)) {
+ u64 paddr, len;
+ struct vmcore *new;
+
+ paddr = roundup(start, PAGE_SIZE);
+ len =rounddown(end,PAGE_SIZE)-roundup(start,PAGE_SIZE);
+
+ new = get_new_element();
+ if (!new)
+ return -ENOMEM;
+ new->paddr = paddr;
+ new->size = len;
+ list_add_tail(&new->list, vc_list);
+
+ rest -= len;
+ }
+
+ if (rest > 0) {
+ u64 paddr, len;
+ char *buf;
+ struct vmcore *new;
+
+ paddr = rounddown(end, PAGE_SIZE);
+ len = end - rounddown(end, PAGE_SIZE);
+
+ buf = (char *)get_zeroed_page(GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+ rc = read_from_oldmem(buf, len, &paddr, 0);
+ if (rc < 0) {
+ free_pages((unsigned long)buf, 0);
+ return rc;
+ }
+
+ new = get_new_element();
+ if (!new) {
+ free_pages((unsigned long)buf, 0);
+ return -ENOMEM;
+ }
+ new->flag |= MEM_TYPE_CURRENT_KERNEL;
+ new->size = PAGE_SIZE;
+ new->buf = buf;
+ list_add_tail(&new->list, vc_list);
+
+ rest -= len;
+ }

/* Update the program header offset. */
phdr_ptr->p_offset = vmcore_off;
- vmcore_off = vmcore_off + phdr_ptr->p_memsz;
+ vmcore_off +=roundup(end,PAGE_SIZE)-rounddown(start,PAGE_SIZE);
}
return 0;
}
@@ -510,11 +586,10 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
size_t elfsz,
struct list_head *vc_list)
{
- int i;
+ int i, rc;
Elf32_Ehdr *ehdr_ptr;
Elf32_Phdr *phdr_ptr;
loff_t vmcore_off;
- struct vmcore *new;

ehdr_ptr = (Elf32_Ehdr *)elfptr;
phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
@@ -524,20 +599,97 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
PAGE_SIZE);

for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+ u64 start, end, rest;
+
if (phdr_ptr->p_type != PT_LOAD)
continue;

- /* Add this contiguous chunk of memory to vmcore list.*/
- new = get_new_element();
- if (!new)
- return -ENOMEM;
- new->paddr = phdr_ptr->p_offset;
- new->size = phdr_ptr->p_memsz;
- list_add_tail(&new->list, vc_list);
+ start = phdr_ptr->p_offset;
+ end = phdr_ptr->p_offset + phdr_ptr->p_memsz;
+ rest = phdr_ptr->p_memsz;
+
+ if (start & ~PAGE_MASK) {
+ u64 paddr, len;
+ char *buf;
+ struct vmcore *new;
+
+ paddr = start;
+ len = min(roundup(start,PAGE_SIZE), end) - start;
+
+ buf = (char *)get_zeroed_page(GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+ rc = read_from_oldmem(buf + (start & ~PAGE_MASK), len,
+ &paddr, 0);
+ if (rc < 0) {
+ free_pages((unsigned long)buf, 0);
+ return rc;
+ }
+
+ new = get_new_element();
+ if (!new) {
+ free_pages((unsigned long)buf, 0);
+ return -ENOMEM;
+ }
+ new->flag |= MEM_TYPE_CURRENT_KERNEL;
+ new->size = PAGE_SIZE;
+ new->buf = buf;
+ list_add_tail(&new->list, vc_list);
+
+ rest -= len;
+ }
+
+ if (rest > 0 &&
+ roundup(start, PAGE_SIZE) < rounddown(end, PAGE_SIZE)) {
+ u64 paddr, len;
+ struct vmcore *new;
+
+ paddr = roundup(start, PAGE_SIZE);
+ len =rounddown(end,PAGE_SIZE)-roundup(start,PAGE_SIZE);
+
+ new = get_new_element();
+ if (!new)
+ return -ENOMEM;
+ new->paddr = paddr;
+ new->size = len;
+ list_add_tail(&new->list, vc_list);
+
+ rest -= len;
+ }
+
+ if (rest > 0) {
+ u64 paddr, len;
+ char *buf;
+ struct vmcore *new;
+
+ paddr = rounddown(end, PAGE_SIZE);
+ len = end - rounddown(end, PAGE_SIZE);
+
+ buf = (char *)get_zeroed_page(GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+ rc = read_from_oldmem(buf, len, &paddr, 0);
+ if (rc < 0) {
+ free_pages((unsigned long)buf, 0);
+ return rc;
+ }
+
+ new = get_new_element();
+ if (!new) {
+ free_pages((unsigned long)buf, 0);
+ return -ENOMEM;
+ }
+ new->flag |= MEM_TYPE_CURRENT_KERNEL;
+ new->size = PAGE_SIZE;
+ new->buf = buf;
+ list_add_tail(&new->list, vc_list);
+
+ rest -= len;
+ }

/* Update the program header offset */
phdr_ptr->p_offset = vmcore_off;
- vmcore_off = vmcore_off + phdr_ptr->p_memsz;
+ vmcore_off +=roundup(end,PAGE_SIZE)-rounddown(start,PAGE_SIZE);
}
return 0;
}

2013-03-05 07:05:43

by Hatayama, Daisuke

Subject: [PATCH v2 08/20] vmcore: modify vmcore clean-up function to free buffer on 2nd kernel

If the flag MEM_TYPE_CURRENT_KERNEL is set, the object has been copied
into a buffer on the 2nd kernel, so the clean-up function needs to free
it.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 6b071b4..e4751ce 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -926,6 +926,10 @@ void vmcore_cleanup(void)
struct vmcore *m;

m = list_entry(pos, struct vmcore, list);
+
+ if (m->flag & MEM_TYPE_CURRENT_KERNEL)
+ free_pages((unsigned long)m->buf, get_order(m->size));
+
list_del(&m->list);
kfree(m);
}

2013-03-05 07:05:55

by Hatayama, Daisuke

Subject: [PATCH v2 10/20] vmcore: read buffers for vmcore objects copied from old memory

If the flag MEM_TYPE_CURRENT_KERNEL is set, the object has been copied
into a buffer on the 2nd kernel and read_vmcore() reads from that
buffer. If the flag is not set, read_vmcore() reads old memory as
usual.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 09b7f34..c8899dc 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -158,10 +158,17 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
tsz = m->offset + m->size - *fpos;
if (buflen < tsz)
tsz = buflen;
- start = m->paddr + *fpos - m->offset;
- tmp = read_from_oldmem(buffer, tsz, &start, 1);
- if (tmp < 0)
- return tmp;
+ if (m->flag & MEM_TYPE_CURRENT_KERNEL) {
+ if (copy_to_user(buffer,
+ m->buf + *fpos - m->offset,
+ tsz))
+ return -EFAULT;
+ } else {
+ start = m->paddr + *fpos - m->offset;
+ tmp = read_from_oldmem(buffer, tsz, &start, 1);
+ if (tmp < 0)
+ return tmp;
+ }
buflen -= tsz;
*fpos += tsz;
buffer += tsz;

2013-03-05 07:05:49

by Hatayama, Daisuke

Subject: [PATCH v2 09/20] vmcore: clean up read_vmcore()

Clean up read_vmcore(). The part handling objects in vmcore_list can be
written in the same form as the part handling the ELF headers. This
change removes duplicated and complicated code, making it clearer what
is done there.

Also, map_offset_to_paddr() is no longer used after this change, so
remove it.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 68 ++++++++++++++++--------------------------------------
1 files changed, 20 insertions(+), 48 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index e4751ce..09b7f34 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -118,27 +118,6 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
return read;
}

-/* Maps vmcore file offset to respective physical address in memroy. */
-static u64 map_offset_to_paddr(loff_t offset, struct list_head *vc_list,
- struct vmcore **m_ptr)
-{
- struct vmcore *m;
- u64 paddr;
-
- list_for_each_entry(m, vc_list, list) {
- u64 start, end;
- start = m->offset;
- end = m->offset + m->size - 1;
- if (offset >= start && offset <= end) {
- paddr = m->paddr + offset - start;
- *m_ptr = m;
- return paddr;
- }
- }
- *m_ptr = NULL;
- return 0;
-}
-
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
@@ -147,8 +126,8 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
{
ssize_t acc = 0, tmp;
size_t tsz;
- u64 start, nr_bytes;
- struct vmcore *curr_m = NULL;
+ u64 start;
+ struct vmcore *m;

if (buflen == 0 || *fpos >= vmcore_size)
return 0;
@@ -174,33 +153,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}

- start = map_offset_to_paddr(*fpos, &vmcore_list, &curr_m);
- if (!curr_m)
- return -EINVAL;
-
- while (buflen) {
- tsz = min_t(size_t, buflen, PAGE_SIZE - (start & ~PAGE_MASK));
-
- /* Calculate left bytes in current memory segment. */
- nr_bytes = (curr_m->size - (start - curr_m->paddr));
- if (tsz > nr_bytes)
- tsz = nr_bytes;
-
- tmp = read_from_oldmem(buffer, tsz, &start, 1);
- if (tmp < 0)
- return tmp;
- buflen -= tsz;
- *fpos += tsz;
- buffer += tsz;
- acc += tsz;
- if (start >= (curr_m->paddr + curr_m->size)) {
- if (curr_m->list.next == &vmcore_list)
- return acc; /*EOF*/
- curr_m = list_entry(curr_m->list.next,
- struct vmcore, list);
- start = curr_m->paddr;
+ list_for_each_entry(m, &vmcore_list, list) {
+ if (*fpos < m->offset + m->size) {
+ tsz = m->offset + m->size - *fpos;
+ if (buflen < tsz)
+ tsz = buflen;
+ start = m->paddr + *fpos - m->offset;
+ tmp = read_from_oldmem(buffer, tsz, &start, 1);
+ if (tmp < 0)
+ return tmp;
+ buflen -= tsz;
+ *fpos += tsz;
+ buffer += tsz;
+ acc += tsz;
+
+ /* leave now if filled buffer already */
+ if (buflen == 0)
+ return acc;
}
}
+
return acc;
}

2013-03-05 07:05:59

by Hatayama, Daisuke

Subject: [PATCH v2 11/20] vmcore: allocate per-cpu crash_notes objects on page-size boundary

To satisfy mmap()'s page-size boundary requirement, allocate the
per-cpu crash_notes objects on a page-size boundary.

/proc/vmcore on the 2nd kernel checks whether each note object is
allocated on a page-size boundary. If some object does not satisfy the
page-size boundary requirement, /proc/vmcore does not provide the
mmap() interface.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

kernel/kexec.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index bddd3d7..d1f365e 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1234,7 +1234,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
static int __init crash_notes_memory_init(void)
{
/* Allocate memory for saving cpu registers. */
- crash_notes = alloc_percpu(note_buf_t);
+ crash_notes = __alloc_percpu(roundup(sizeof(note_buf_t), PAGE_SIZE),
+ PAGE_SIZE);
if (!crash_notes) {
printk("Kexec: Memory allocation for saving cpu register"
" states failed\n");

2013-03-05 07:06:06

by Hatayama, Daisuke

Subject: [PATCH v2 12/20] kexec: allocate vmcoreinfo note buffer on page-size boundary

To satisfy mmap()'s page-size boundary requirement, specify the aligned
attribute on the vmcoreinfo_note object so that it is allocated on a
page-size boundary.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

include/linux/kexec.h | 6 ++++--
kernel/kexec.c | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d2e6927..5113570 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -185,8 +185,10 @@ extern struct kimage *kexec_crash_image;
#define VMCOREINFO_BYTES (4096)
#define VMCOREINFO_NOTE_NAME "VMCOREINFO"
#define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4)
-#define VMCOREINFO_NOTE_SIZE (KEXEC_NOTE_HEAD_BYTES*2 + VMCOREINFO_BYTES \
- + VMCOREINFO_NOTE_NAME_BYTES)
+#define VMCOREINFO_NOTE_SIZE ALIGN(KEXEC_NOTE_HEAD_BYTES*2 \
+ +VMCOREINFO_BYTES \
+ +VMCOREINFO_NOTE_NAME_BYTES, \
+ PAGE_SIZE)

/* Location of a reserved region to hold the crash kernel.
*/
diff --git a/kernel/kexec.c b/kernel/kexec.c
index d1f365e..195de6d 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -43,7 +43,7 @@ note_buf_t __percpu *crash_notes;

/* vmcoreinfo stuff */
static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
-u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
+u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4] __aligned(PAGE_SIZE);
size_t vmcoreinfo_size;
size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);

2013-03-05 07:06:19

by Hatayama, Daisuke

Subject: [PATCH v2 14/20] elf: introduce NT_VMCORE_PAD type

The NT_VMCORE_PAD type is introduced to make both the crash_notes and
vmcoreinfo_note buffers satisfy mmap()'s page-size boundary requirement
by padding them with notes of this type.

The purpose of this type is just to align the buffer to a page-size
boundary; its contents have no meaning and are fully filled with zero.

This note type belongs to the "VMCOREINFO" name space, and its value in
this name space is 7. The values from 1 to 4 are not chosen because
there are corresponding note types using the same numbers in the "CORE"
name space, and the crash utility and makedumpfile don't distinguish
note types by name space at all. The remaining value 5 has somehow not
been used since the v2.4.0 kernel, despite the fact that NT_AUXV is
defined as 6; it looks as if something deliberately avoids 5, so 5 is
simply not chosen here, to be conservative.

With this change, gdb and binutils keep working without any change, but
makedumpfile and the crash utility need changes to distinguish the two
note types in the "VMCOREINFO" name space.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

include/uapi/linux/elf.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index b869904..9753e4c 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -402,6 +402,7 @@ typedef struct elf64_shdr {
* Notes exported from /proc/vmcore, belonging to "VMCOREINFO" name.
*/
#define NT_VMCORE_DEBUGINFO 0 /* vmcore system kernel's debuginfo */
+#define NT_VMCORE_PAD 7 /* vmcore padding of note segments */

/* Note header in a PT_NOTE section */
typedef struct elf32_note {

2013-03-05 07:06:15

by Hatayama, Daisuke

Subject: [PATCH v2 13/20] kexec, elf: introduce NT_VMCORE_DEBUGINFO note type

This patch introduces NT_VMCORE_DEBUGINFO as a unique note type in the
"VMCOREINFO" note name space; this note type has had no symbolic name
so far. The name indicates that it is the vmcoreinfo note type that
contains the system kernel's debug information.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

include/uapi/linux/elf.h | 4 ++++
kernel/kexec.c | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 8072d35..b869904 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -398,6 +398,10 @@ typedef struct elf64_shdr {
#define NT_METAG_CBUF 0x500 /* Metag catch buffer registers */
#define NT_METAG_RPIPE 0x501 /* Metag read pipeline state */

+/*
+ * Notes exported from /proc/vmcore, belonging to "VMCOREINFO" name.
+ */
+#define NT_VMCORE_DEBUGINFO 0 /* vmcore system kernel's debuginfo */

/* Note header in a PT_NOTE section */
typedef struct elf32_note {
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 195de6d..6597b82 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1438,8 +1438,8 @@ static void update_vmcoreinfo_note(void)

if (!vmcoreinfo_size)
return;
- buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, 0, vmcoreinfo_data,
- vmcoreinfo_size);
+ buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_DEBUGINFO,
+ vmcoreinfo_data, vmcoreinfo_size);
final_note(buf);
}

2013-03-05 07:06:26

by Hatayama, Daisuke

Subject: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

Fill both the crash_notes and vmcoreinfo_note buffers with NT_VMCORE_PAD
notes so that they satisfy mmap()'s page-size boundary requirement.

So far, the end of the note segments has been marked by a zero-filled
ELF note header. Instead, this patch writes an NT_VMCORE_PAD note at
the end of the note segments, padding them up to the next page-size
boundary.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

arch/s390/include/asm/kexec.h | 7 ++++--
include/linux/kexec.h | 12 ++++++-----
kernel/kexec.c | 46 ++++++++++++++++++++++++++---------------
3 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 694bcd6..2a531ce 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -41,8 +41,8 @@
/*
* Size for s390x ELF notes per CPU
*
- * Seven notes plus zero note at the end: prstatus, fpregset, timer,
- * tod_cmp, tod_reg, control regs, and prefix
+ * Seven notes plus note with NT_VMCORE_PAD type at the end: prstatus,
+ * fpregset, timer, tod_cmp, tod_reg, control regs, and prefix
*/
#define KEXEC_NOTE_BYTES \
(ALIGN(sizeof(struct elf_note), 4) * 8 + \
@@ -53,7 +53,8 @@
ALIGN(sizeof(u64), 4) + \
ALIGN(sizeof(u32), 4) + \
ALIGN(sizeof(u64) * 16, 4) + \
- ALIGN(sizeof(u32), 4) \
+ ALIGN(sizeof(u32), 4) + \
+ VMCOREINFO_NOTE_NAME_BYTES \
)

/* Provide a dummy definition to avoid build failures. */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 5113570..6592935 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -47,14 +47,16 @@
#define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
#define KEXEC_CORE_NOTE_DESC_BYTES ALIGN(sizeof(struct elf_prstatus), 4)
/*
- * The per-cpu notes area is a list of notes terminated by a "NULL"
- * note header. For kdump, the code in vmcore.c runs in the context
- * of the second kernel to combine them into one note.
+ * The per-cpu notes area is a list of notes terminated by a note
+ * header with NT_VMCORE_PAD type. For kdump, the code in vmcore.c
+ * runs in the context of the second kernel to combine them into one
+ * note.
*/
#ifndef KEXEC_NOTE_BYTES
#define KEXEC_NOTE_BYTES ( (KEXEC_NOTE_HEAD_BYTES * 2) + \
KEXEC_CORE_NOTE_NAME_BYTES + \
- KEXEC_CORE_NOTE_DESC_BYTES )
+ KEXEC_CORE_NOTE_DESC_BYTES + \
+ VMCOREINFO_NOTE_NAME_BYTES)
#endif

/*
@@ -187,7 +189,7 @@ extern struct kimage *kexec_crash_image;
#define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4)
#define VMCOREINFO_NOTE_SIZE ALIGN(KEXEC_NOTE_HEAD_BYTES*2 \
+VMCOREINFO_BYTES \
- +VMCOREINFO_NOTE_NAME_BYTES, \
+ +VMCOREINFO_NOTE_NAME_BYTES*2, \
PAGE_SIZE)

/* Location of a reserved region to hold the crash kernel.
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 6597b82..fbdc0f0 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -40,6 +40,7 @@

/* Per cpu memory for storing cpu states in case of system crash. */
note_buf_t __percpu *crash_notes;
+static size_t crash_notes_size = ALIGN(sizeof(note_buf_t), PAGE_SIZE);

/* vmcoreinfo stuff */
static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
@@ -1177,6 +1178,7 @@ unlock:
return ret;
}

+/* If @data is NULL, fill @buf with 0 in @data_len bytes. */
static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
size_t data_len)
{
@@ -1189,26 +1191,36 @@ static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
buf += (sizeof(note) + 3)/4;
memcpy(buf, name, note.n_namesz);
buf += (note.n_namesz + 3)/4;
- memcpy(buf, data, note.n_descsz);
+ if (data)
+ memcpy(buf, data, note.n_descsz);
+ else
+ memset(buf, 0, note.n_descsz);
buf += (note.n_descsz + 3)/4;

return buf;
}

-static void final_note(u32 *buf)
+static void final_note(u32 *buf, size_t buf_len, size_t data_len)
{
- struct elf_note note;
+ size_t used_bytes, pad_hdr_size;

- note.n_namesz = 0;
- note.n_descsz = 0;
- note.n_type = 0;
- memcpy(buf, &note, sizeof(note));
+ pad_hdr_size = KEXEC_NOTE_HEAD_BYTES + VMCOREINFO_NOTE_NAME_BYTES;
+
+ /*
+ * keep space for ELF note header and "VMCOREINFO" name to
+ * terminate ELF segment by NT_VMCORE_PAD note.
+ */
+ BUG_ON(data_len + pad_hdr_size > buf_len);
+
+ used_bytes = data_len + pad_hdr_size;
+ append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_PAD, NULL,
+ roundup(used_bytes, PAGE_SIZE) - used_bytes);
}

void crash_save_cpu(struct pt_regs *regs, int cpu)
{
struct elf_prstatus prstatus;
- u32 *buf;
+ u32 *buf, *buf_end;

if ((cpu < 0) || (cpu >= nr_cpu_ids))
return;
@@ -1226,16 +1238,15 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
memset(&prstatus, 0, sizeof(prstatus));
prstatus.pr_pid = current->pid;
elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
- buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
- &prstatus, sizeof(prstatus));
- final_note(buf);
+ buf_end = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
+ &prstatus, sizeof(prstatus));
+ final_note(buf_end, crash_notes_size, (buf_end - buf) * sizeof(u32));
}

static int __init crash_notes_memory_init(void)
{
/* Allocate memory for saving cpu registers. */
- crash_notes = __alloc_percpu(roundup(sizeof(note_buf_t), PAGE_SIZE),
- PAGE_SIZE);
+ crash_notes = __alloc_percpu(crash_notes_size, PAGE_SIZE);
if (!crash_notes) {
printk("Kexec: Memory allocation for saving cpu register"
" states failed\n");
@@ -1434,13 +1445,14 @@ int __init parse_crashkernel_low(char *cmdline,

static void update_vmcoreinfo_note(void)
{
- u32 *buf = vmcoreinfo_note;
+ u32 *buf = vmcoreinfo_note, *buf_end;

if (!vmcoreinfo_size)
return;
- buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_DEBUGINFO,
- vmcoreinfo_data, vmcoreinfo_size);
- final_note(buf);
+ buf_end = append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_DEBUGINFO,
+ vmcoreinfo_data, vmcoreinfo_size);
+ final_note(buf_end, sizeof(vmcoreinfo_note),
+ (buf_end - buf) * sizeof(u32));
}

void crash_save_vmcoreinfo(void)

2013-03-05 07:06:36

by Hatayama, Daisuke

Subject: [PATCH v2 17/20] vmcore: check if vmcore objects satisfy mmap()'s page-size boundary requirement

If some vmcore object doesn't satisfy the page-size boundary
requirement, remap_pfn_range() fails to remap it to user-space.

The only objects that possibly fail the requirement are the ELF note
segments. The memory chunks corresponding to PT_LOAD entries are
guaranteed to satisfy the page-size boundary requirement by the copy
from old memory into buffers on the 2nd kernel done elsewhere in this
series.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index e432946..5582aaa 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -38,6 +38,8 @@ static u64 vmcore_size;

static struct proc_dir_entry *proc_vmcore = NULL;

+static bool support_mmap_vmcore;
+
/*
* Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
* The called function has to take care of module refcounting.
@@ -897,6 +899,7 @@ static int __init parse_crash_elf_headers(void)
static int __init vmcore_init(void)
{
int rc = 0;
+ struct vmcore *m;

/* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
if (!(is_vmcore_usable()))
@@ -907,6 +910,25 @@ static int __init vmcore_init(void)
return rc;
}

+ /* If some object doesn't satisfy PAGE_SIZE boundary
+ * requirement, mmap_vmcore() is not exported to
+ * user-space. */
+ support_mmap_vmcore = true;
+ list_for_each_entry(m, &vmcore_list, list) {
+ u64 paddr;
+
+ if (m->flag & MEM_TYPE_CURRENT_KERNEL)
+ paddr = (u64)__pa(m->buf);
+ else
+ paddr = m->paddr;
+
+ if ((m->offset & ~PAGE_MASK) || (paddr & ~PAGE_MASK)
+ || (m->size & ~PAGE_MASK)) {
+ support_mmap_vmcore = false;
+ break;
+ }
+ }
+
proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
if (proc_vmcore)
proc_vmcore->size = vmcore_size;

2013-03-05 07:06:31

by Hatayama, Daisuke

Subject: [PATCH v2 16/20] vmcore: check NT_VMCORE_PAD as a mark indicating the end of ELF note buffer

Modern kernels mark the end of the ELF note buffer with an
NT_VMCORE_PAD type note in order to make the buffer satisfy mmap()'s
page-size boundary requirement. This patch stops reading each buffer
when the note being read has the NT_VMCORE_PAD type.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index c8899dc..e432946 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -259,12 +259,24 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
}
nhdr_ptr = notes_section;
for (j = 0; j < max_sz; j += sz) {
+ char *name;
+
+ /* Old kernel marks the end of ELF note buffer
+ * with empty header. */
if (nhdr_ptr->n_namesz == 0)
break;
sz = sizeof(Elf64_Nhdr) +
((nhdr_ptr->n_namesz + 3) & ~3) +
((nhdr_ptr->n_descsz + 3) & ~3);
real_sz += sz;
+
+ /* Modern kernel marks the end of ELF note
+ * buffer with NT_VMCORE_PAD type note. */
+ name = (char *)(nhdr_ptr + 1);
+ if (strncmp(name, VMCOREINFO_NOTE_NAME,
+ sizeof(VMCOREINFO_NOTE_NAME)) == 0
+ && nhdr_ptr->n_type == NT_VMCORE_PAD)
+ break;
nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
}

@@ -369,12 +381,24 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
}
nhdr_ptr = notes_section;
for (j = 0; j < max_sz; j += sz) {
+ char *name;
+
+ /* Old kernel marks the end of ELF note buffer
+ * with empty header. */
if (nhdr_ptr->n_namesz == 0)
break;
sz = sizeof(Elf32_Nhdr) +
((nhdr_ptr->n_namesz + 3) & ~3) +
((nhdr_ptr->n_descsz + 3) & ~3);
real_sz += sz;
+
+ /* Modern kernel marks the end of ELF note
+ * buffer with NT_VMCORE_PAD type note. */
+ name = (char *)(nhdr_ptr + 1);
+ if (strncmp(name, VMCOREINFO_NOTE_NAME,
+ sizeof(VMCOREINFO_NOTE_NAME)) == 0
+ && nhdr_ptr->n_type == NT_VMCORE_PAD)
+ break;
nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
}

2013-03-05 07:06:47

by Hatayama, Daisuke

Subject: [PATCH v2 19/20] vmcore: count holes generated by round-up operation for vmcore size

The previous patch changed the offset of each vmcore object with a
round-up operation. The vmcore size must account for the holes this
creates.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 6660cd5..709d21a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -195,7 +195,7 @@ static struct vmcore* __init get_new_element(void)
return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
}

-static u64 __init get_vmcore_size_elf64(char *elfptr)
+static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
{
int i;
u64 size;
@@ -204,15 +204,15 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)

ehdr_ptr = (Elf64_Ehdr *)elfptr;
phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
- size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
+ size = elfsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
- size += phdr_ptr->p_memsz;
+ size += roundup(phdr_ptr->p_memsz, PAGE_SIZE);
phdr_ptr++;
}
return size;
}

-static u64 __init get_vmcore_size_elf32(char *elfptr)
+static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
{
int i;
u64 size;
@@ -221,9 +221,9 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)

ehdr_ptr = (Elf32_Ehdr *)elfptr;
phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
- size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
+ size = elfsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
- size += phdr_ptr->p_memsz;
+ size += roundup(phdr_ptr->p_memsz, PAGE_SIZE);
phdr_ptr++;
}
return size;
@@ -878,14 +878,14 @@ static int __init parse_crash_elf_headers(void)
return rc;

/* Determine vmcore size. */
- vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+ vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
} else if (e_ident[EI_CLASS] == ELFCLASS32) {
rc = parse_crash_elf32_headers();
if (rc)
return rc;

/* Determine vmcore size. */
- vmcore_size = get_vmcore_size_elf32(elfcorebuf);
+ vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
} else {
pr_warn("Warning: Core image elf header is not sane\n");
return -EINVAL;

2013-03-05 07:06:52

by Hatayama, Daisuke

Subject: [PATCH v2 20/20] vmcore: introduce mmap_vmcore()

This patch introduces mmap_vmcore().

If the flag MEM_TYPE_CURRENT_KERNEL is set, the buffer on the 2nd
kernel is remapped; if not set, the corresponding area in old memory is
remapped.

Neither writable nor executable mappings are permitted, even via
mprotect(). A non-writable mapping is also a requirement of
remap_pfn_range() when mapping linear pages onto non-consecutive
physical pages; see is_cow_mapping().

On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is an unsigned long, which is 32 bits wide on
x86-32: at most 2^32 page frames of 4KB each, i.e. 16TB, can be
addressed.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 709d21a..9433ef0 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -185,9 +185,81 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}

+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+{
+ size_t size = vma->vm_end - vma->vm_start;
+ u64 start, end, len, tsz;
+ struct vmcore *m;
+
+ if (!support_mmap_vmcore)
+ return -ENODEV;
+
+ start = (u64)vma->vm_pgoff << PAGE_SHIFT;
+ end = start + size;
+
+ if (size > vmcore_size || end > vmcore_size)
+ return -EINVAL;
+
+ if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+ return -EPERM;
+
+ vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+ len = 0;
+
+ if (start < elfcorebuf_sz) {
+ u64 pfn;
+
+ tsz = elfcorebuf_sz - start;
+ if (size < tsz)
+ tsz = size;
+ pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
+ if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
+ vma->vm_page_prot))
+ return -EAGAIN;
+ size -= tsz;
+ start += tsz;
+ len += tsz;
+
+ if (size == 0)
+ return 0;
+ }
+
+ list_for_each_entry(m, &vmcore_list, list) {
+ if (start < m->offset + m->size) {
+ u64 pfn = 0;
+
+ tsz = m->offset + m->size - start;
+ if (size < tsz)
+ tsz = size;
+ if (m->flag & MEM_TYPE_CURRENT_KERNEL) {
+ pfn = __pa(m->buf + start - m->offset)
+ >> PAGE_SHIFT;
+ } else {
+ pfn = (m->paddr + (start - m->offset))
+ >> PAGE_SHIFT;
+ }
+ if (remap_pfn_range(vma, vma->vm_start + len, pfn, tsz,
+ vma->vm_page_prot)) {
+ do_munmap(vma->vm_mm, vma->vm_start, len);
+ return -EAGAIN;
+ }
+ size -= tsz;
+ start += tsz;
+ len += tsz;
+
+ if (size == 0)
+ return 0;
+ }
+ }
+
+ return 0;
+}
+
static const struct file_operations proc_vmcore_operations = {
.read = read_vmcore,
.llseek = default_llseek,
+ .mmap = mmap_vmcore,
};

static struct vmcore* __init get_new_element(void)

2013-03-05 07:07:33

by Hatayama, Daisuke

Subject: [PATCH v2 18/20] vmcore: round-up offset of vmcore object in page-size boundary

To satisfy mmap()'s page-size boundary requirement, round up the offset
of each vmcore object to a page-size boundary; each offset is connected
to a user-space virtual address through the mapping established by
mmap().

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 18 ++++++++----------
1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 5582aaa..6660cd5 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -700,7 +700,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
}

/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf64(char *elfptr,
+static void __init set_vmcore_list_offsets_elf64(char *elfptr, size_t elfsz,
struct list_head *vc_list)
{
loff_t vmcore_off;
@@ -710,17 +710,16 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
ehdr_ptr = (Elf64_Ehdr *)elfptr;

/* Skip Elf header and program headers. */
- vmcore_off = ehdr_ptr->e_phoff +
- (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
+ vmcore_off = elfsz;

list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
- vmcore_off += m->size;
+ vmcore_off += roundup(m->size, PAGE_SIZE);
}
}

/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf32(char *elfptr,
+static void __init set_vmcore_list_offsets_elf32(char *elfptr, size_t elfsz,
struct list_head *vc_list)
{
loff_t vmcore_off;
@@ -730,12 +729,11 @@ static void __init set_vmcore_list_offsets_elf32(char *elfptr,
ehdr_ptr = (Elf32_Ehdr *)elfptr;

/* Skip Elf header and program headers. */
- vmcore_off = ehdr_ptr->e_phoff +
- (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
+ vmcore_off = elfsz;

list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
- vmcore_off += m->size;
+ vmcore_off += roundup(m->size, PAGE_SIZE);
}
}

@@ -795,7 +793,7 @@ static int __init parse_crash_elf64_headers(void)
get_order(elfcorebuf_sz_orig));
return rc;
}
- set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+ set_vmcore_list_offsets_elf64(elfcorebuf, elfcorebuf_sz, &vmcore_list);
return 0;
}

@@ -855,7 +853,7 @@ static int __init parse_crash_elf32_headers(void)
get_order(elfcorebuf_sz_orig));
return rc;
}
- set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+ set_vmcore_list_offsets_elf32(elfcorebuf, elfcorebuf_sz, &vmcore_list);
return 0;
}

2013-03-05 07:05:24

by Hatayama, Daisuke

Subject: [PATCH v2 04/20] vmcore: allocate buffer for ELF headers on page-size alignment

Allocate the buffer for ELF headers on a page-size aligned boundary to
satisfy the mmap() requirement. For this, __get_free_pages() is used
instead of kmalloc().

Also, a later patch will decrease the actually used buffer size for the
ELF headers, so it's necessary to keep the original buffer size and the
actually used buffer size separately: elfcorebuf_sz_orig keeps the
original one and elfcorebuf_sz the actually used one.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 30 +++++++++++++++++++++---------
1 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b5c9e33..1b02d01 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -31,6 +31,7 @@ static LIST_HEAD(vmcore_list);
/* Stores the pointer to the buffer containing kernel elf core headers. */
static char *elfcorebuf;
static size_t elfcorebuf_sz;
+static size_t elfcorebuf_sz_orig;

/* Total size of vmcore file. */
static u64 vmcore_size;
@@ -610,26 +611,31 @@ static int __init parse_crash_elf64_headers(void)

/* Read in all elf headers. */
elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf64_Phdr);
- elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+ elfcorebuf_sz_orig = elfcorebuf_sz;
+ elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(elfcorebuf_sz_orig));
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
if (rc < 0) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}

/* Merge all PT_NOTE headers into one. */
rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
&vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
@@ -665,26 +671,31 @@ static int __init parse_crash_elf32_headers(void)

/* Read in all elf headers. */
elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf32_Phdr);
- elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+ elfcorebuf_sz_orig = elfcorebuf_sz;
+ elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(elfcorebuf_sz));
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
if (rc < 0) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}

/* Merge all PT_NOTE headers into one. */
rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
&vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
@@ -766,7 +777,8 @@ void vmcore_cleanup(void)
list_del(&m->list);
kfree(m);
}
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
elfcorebuf = NULL;
}
EXPORT_SYMBOL_GPL(vmcore_cleanup);

2013-03-05 07:05:06

by Hatayama, Daisuke

Subject: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

The code around /proc/vmcore currently assumes that the program header
table immediately follows the ELF header. But future changes in
kexec-tools or the 1st kernel could break that assumption. To avoid the
worst case, refer to the e_phoff member, which holds the file offset of
the program header table.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---

fs/proc/vmcore.c | 40 ++++++++++++++++++++--------------------
1 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b870f74..abf4f01 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -221,8 +221,8 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
Elf64_Phdr *phdr_ptr;

ehdr_ptr = (Elf64_Ehdr *)elfptr;
- phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
- size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
+ phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
+ size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
size += phdr_ptr->p_memsz;
phdr_ptr++;
@@ -238,8 +238,8 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
Elf32_Phdr *phdr_ptr;

ehdr_ptr = (Elf32_Ehdr *)elfptr;
- phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
- size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
+ phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
+ size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
size += phdr_ptr->p_memsz;
phdr_ptr++;
@@ -259,7 +259,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
u64 phdr_sz = 0, note_off;

ehdr_ptr = (Elf64_Ehdr *)elfptr;
- phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
+ phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
int j;
void *notes_section;
@@ -305,7 +305,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
/* Prepare merged PT_NOTE program header. */
phdr.p_type = PT_NOTE;
phdr.p_flags = 0;
- note_off = sizeof(Elf64_Ehdr) +
+ note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
phdr.p_offset = note_off;
phdr.p_vaddr = phdr.p_paddr = 0;
@@ -313,14 +313,14 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
phdr.p_align = 0;

/* Add merged PT_NOTE program header*/
- tmp = elfptr + sizeof(Elf64_Ehdr);
+ tmp = elfptr + ehdr_ptr->e_phoff;
memcpy(tmp, &phdr, sizeof(phdr));
tmp += sizeof(phdr);

/* Remove unwanted PT_NOTE program headers. */
i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
*elfsz = *elfsz - i;
- memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
+ memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));

/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -340,7 +340,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
u64 phdr_sz = 0, note_off;

ehdr_ptr = (Elf32_Ehdr *)elfptr;
- phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
+ phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
int j;
void *notes_section;
@@ -386,7 +386,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
/* Prepare merged PT_NOTE program header. */
phdr.p_type = PT_NOTE;
phdr.p_flags = 0;
- note_off = sizeof(Elf32_Ehdr) +
+ note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
phdr.p_offset = note_off;
phdr.p_vaddr = phdr.p_paddr = 0;
@@ -394,14 +394,14 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
phdr.p_align = 0;

/* Add merged PT_NOTE program header*/
- tmp = elfptr + sizeof(Elf32_Ehdr);
+ tmp = elfptr + ehdr_ptr->e_phoff;
memcpy(tmp, &phdr, sizeof(phdr));
tmp += sizeof(phdr);

/* Remove unwanted PT_NOTE program headers. */
i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
*elfsz = *elfsz - i;
- memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
+ memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));

/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -422,10 +422,10 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
struct vmcore *new;

ehdr_ptr = (Elf64_Ehdr *)elfptr;
- phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
+ phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */

/* First program header is PT_NOTE header. */
- vmcore_off = sizeof(Elf64_Ehdr) +
+ vmcore_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
phdr_ptr->p_memsz; /* Note sections */

@@ -459,10 +459,10 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
struct vmcore *new;

ehdr_ptr = (Elf32_Ehdr *)elfptr;
- phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
+ phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */

/* First program header is PT_NOTE header. */
- vmcore_off = sizeof(Elf32_Ehdr) +
+ vmcore_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
phdr_ptr->p_memsz; /* Note sections */

@@ -496,7 +496,7 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
ehdr_ptr = (Elf64_Ehdr *)elfptr;

/* Skip Elf header and program headers. */
- vmcore_off = sizeof(Elf64_Ehdr) +
+ vmcore_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);

list_for_each_entry(m, vc_list, list) {
@@ -516,7 +516,7 @@ static void __init set_vmcore_list_offsets_elf32(char *elfptr,
ehdr_ptr = (Elf32_Ehdr *)elfptr;

/* Skip Elf header and program headers. */
- vmcore_off = sizeof(Elf32_Ehdr) +
+ vmcore_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);

list_for_each_entry(m, vc_list, list) {
@@ -553,7 +553,7 @@ static int __init parse_crash_elf64_headers(void)
}

/* Read in all elf headers. */
- elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
+ elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf64_Phdr);
elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
if (!elfcorebuf)
return -ENOMEM;
@@ -608,7 +608,7 @@ static int __init parse_crash_elf32_headers(void)
}

/* Read in all elf headers. */
- elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
+ elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf32_Phdr);
elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
if (!elfcorebuf)
return -ENOMEM;

2013-03-05 07:37:42

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

On 2013-03-02 16:35, HATAYAMA Daisuke wrote:
> Code around /proc/vmcore currently assumes program header table is
> next to ELF header. But future change can break the assumption on
> kexec-tools and the 1st kernel. To avoid worst case, now refer to
> e_phoff member that indicates position of program header table in
> file-offset.

Reviewed-by: Zhang Yanfei <[email protected]>

>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> fs/proc/vmcore.c | 40 ++++++++++++++++++++--------------------
> 1 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index b870f74..abf4f01 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -221,8 +221,8 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
> Elf64_Phdr *phdr_ptr;
>
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> - size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
> + size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
> for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> size += phdr_ptr->p_memsz;
> phdr_ptr++;
> @@ -238,8 +238,8 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
> Elf32_Phdr *phdr_ptr;
>
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> - size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
> + size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
> for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> size += phdr_ptr->p_memsz;
> phdr_ptr++;
> @@ -259,7 +259,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> u64 phdr_sz = 0, note_off;
>
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> int j;
> void *notes_section;
> @@ -305,7 +305,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> /* Prepare merged PT_NOTE program header. */
> phdr.p_type = PT_NOTE;
> phdr.p_flags = 0;
> - note_off = sizeof(Elf64_Ehdr) +
> + note_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> phdr.p_offset = note_off;
> phdr.p_vaddr = phdr.p_paddr = 0;
> @@ -313,14 +313,14 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> phdr.p_align = 0;
>
> /* Add merged PT_NOTE program header*/
> - tmp = elfptr + sizeof(Elf64_Ehdr);
> + tmp = elfptr + ehdr_ptr->e_phoff;
> memcpy(tmp, &phdr, sizeof(phdr));
> tmp += sizeof(phdr);
>
> /* Remove unwanted PT_NOTE program headers. */
> i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
> *elfsz = *elfsz - i;
> - memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
> + memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
>
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
> @@ -340,7 +340,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> u64 phdr_sz = 0, note_off;
>
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> int j;
> void *notes_section;
> @@ -386,7 +386,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> /* Prepare merged PT_NOTE program header. */
> phdr.p_type = PT_NOTE;
> phdr.p_flags = 0;
> - note_off = sizeof(Elf32_Ehdr) +
> + note_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
> phdr.p_offset = note_off;
> phdr.p_vaddr = phdr.p_paddr = 0;
> @@ -394,14 +394,14 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> phdr.p_align = 0;
>
> /* Add merged PT_NOTE program header*/
> - tmp = elfptr + sizeof(Elf32_Ehdr);
> + tmp = elfptr + ehdr_ptr->e_phoff;
> memcpy(tmp, &phdr, sizeof(phdr));
> tmp += sizeof(phdr);
>
> /* Remove unwanted PT_NOTE program headers. */
> i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
> *elfsz = *elfsz - i;
> - memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
> + memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));
>
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
> @@ -422,10 +422,10 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> struct vmcore *new;
>
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
>
> /* First program header is PT_NOTE header. */
> - vmcore_off = sizeof(Elf64_Ehdr) +
> + vmcore_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
> phdr_ptr->p_memsz; /* Note sections */
>
> @@ -459,10 +459,10 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> struct vmcore *new;
>
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
>
> /* First program header is PT_NOTE header. */
> - vmcore_off = sizeof(Elf32_Ehdr) +
> + vmcore_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
> phdr_ptr->p_memsz; /* Note sections */
>
> @@ -496,7 +496,7 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
>
> /* Skip Elf header and program headers. */
> - vmcore_off = sizeof(Elf64_Ehdr) +
> + vmcore_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
>
> list_for_each_entry(m, vc_list, list) {
> @@ -516,7 +516,7 @@ static void __init set_vmcore_list_offsets_elf32(char *elfptr,
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
>
> /* Skip Elf header and program headers. */
> - vmcore_off = sizeof(Elf32_Ehdr) +
> + vmcore_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
>
> list_for_each_entry(m, vc_list, list) {
> @@ -553,7 +553,7 @@ static int __init parse_crash_elf64_headers(void)
> }
>
> /* Read in all elf headers. */
> - elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> + elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf64_Phdr);
> elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> if (!elfcorebuf)
> return -ENOMEM;
> @@ -608,7 +608,7 @@ static int __init parse_crash_elf32_headers(void)
> }
>
> /* Read in all elf headers. */
> - elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> + elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf32_Phdr);
> elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> if (!elfcorebuf)
> return -ENOMEM;
>
>

2013-03-05 08:38:47

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 02/20] vmcore: rearrange program headers without assuming consequtive PT_NOTE entries

On 2013-03-02 16:35, HATAYAMA Daisuke wrote:
> Current code assumes all PT_NOTE headers are placed at the beginning
> of the program header table and that they are consecutive. But the assumption
> could be broken by future changes on either kexec-tools or the 1st
> kernel. This patch removes the assumption and rearranges program
> headers so that the following conditions are satisfied:
>
> - PT_NOTE entry is unique at the first entry,
>
> - the order of program headers is unchanged during this
> rearrangement; only their positions are shifted in the positive
> direction.
>
> - the unused part left at the bottom of the program header table is
> filled with 0.
>
> Also, this patch handles one exceptional case where the number of PT_NOTE
> entries is somehow 0; the function then returns immediately.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> fs/proc/vmcore.c | 92 +++++++++++++++++++++++++++++++++++++++++++-----------
> 1 files changed, 74 insertions(+), 18 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index abf4f01..b5c9e33 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -251,8 +251,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
> static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> struct list_head *vc_list)
> {
> - int i, nr_ptnote=0, rc=0;
> - char *tmp;
> + int i, j, nr_ptnote=0, i_ptnote, rc=0;

After applying the patch, "j" is defined twice.

251 static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
252 struct list_head *vc_list)
253 {
254 int i, j, nr_ptnote=0, i_ptnote, rc=0;
255 Elf64_Ehdr *ehdr_ptr;
256 Elf64_Phdr phdr, *phdr_ptr;
257 Elf64_Nhdr *nhdr_ptr;
258 u64 phdr_sz = 0, note_off;
259
260 ehdr_ptr = (Elf64_Ehdr *)elfptr;
261 phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
262 for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
263 int j;
264 void *notes_section;
265 struct vmcore *new;


line 254 and 263.


> Elf64_Ehdr *ehdr_ptr;
> Elf64_Phdr phdr, *phdr_ptr;
> Elf64_Nhdr *nhdr_ptr;
> @@ -302,6 +301,39 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> kfree(notes_section);
> }
>
> + if (nr_ptnote == 0)
> + goto out;
> +
> + phdr_ptr = (Elf64_Phdr *)(elfptr + ehdr_ptr->e_phoff);
> +
> + /* Remove unwanted PT_NOTE program headers. */
> +
> + /* - 1st pass shifts non-PT_NOTE entries until the first
> + PT_NOTE entry. */
> + i_ptnote = -1;
> + for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
> + if (phdr_ptr[i].p_type == PT_NOTE) {
> + i_ptnote = i;
> + break;
> + }
> + }
> + BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
> + memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf64_Phdr));

Is there any problem with this move? How many bytes does memmove copy
in each step of its loop?

If i_ptnote == 2, we have

-------------------------------------
| PT_LOAD 1 | PT_LOAD 2 | PT_NOTE 1 |
-------------------------------------

-->

-------------------------------------
| | PT_LOAD 1 | PT_LOAD 2 |
-------------------------------------

right? During the move, does PT_LOAD 1 overwrite the original PT_LOAD 2?

> +
> + /* - 2nd pass moves the remaining non-PT_NOTE entries under
> + the first PT_NOTE entry. */
> + for (i = j = i_ptnote + 1; i < ehdr_ptr->e_phnum; i++) {
> + if (phdr_ptr[i].p_type != PT_NOTE) {
> + memmove(phdr_ptr + j, phdr_ptr + i,
> + sizeof(Elf64_Phdr));
> + j++;
> + }
> + }
> +
> + /* - Finally, fill unused part with 0. */
> + memset(phdr_ptr + ehdr_ptr->e_phnum - (nr_ptnote - 1), 0,
> + (nr_ptnote - 1) * sizeof(Elf64_Phdr));
> +
> /* Prepare merged PT_NOTE program header. */
> phdr.p_type = PT_NOTE;
> phdr.p_flags = 0;
> @@ -313,18 +345,14 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> phdr.p_align = 0;
>
> /* Add merged PT_NOTE program header*/
> - tmp = elfptr + ehdr_ptr->e_phoff;
> - memcpy(tmp, &phdr, sizeof(phdr));
> - tmp += sizeof(phdr);
> + memcpy(phdr_ptr, &phdr, sizeof(Elf64_Phdr));
>
> - /* Remove unwanted PT_NOTE program headers. */
> - i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
> - *elfsz = *elfsz - i;
> - memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
> + *elfsz = *elfsz - (nr_ptnote - 1) * sizeof(Elf64_Phdr);
>
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>
> +out:
> return 0;
> }
>
> @@ -332,8 +360,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> struct list_head *vc_list)
> {
> - int i, nr_ptnote=0, rc=0;
> - char *tmp;
> + int i, j, nr_ptnote=0, i_ptnote, rc=0;
> Elf32_Ehdr *ehdr_ptr;
> Elf32_Phdr phdr, *phdr_ptr;
> Elf32_Nhdr *nhdr_ptr;
> @@ -383,6 +410,39 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> kfree(notes_section);
> }
>
> + if (nr_ptnote == 0)
> + goto out;
> +
> + phdr_ptr = (Elf32_Phdr *)(elfptr + ehdr_ptr->e_phoff);
> +
> + /* Remove unwanted PT_NOTE program headers. */
> +
> + /* - 1st pass shifts non-PT_NOTE entries until the first
> + PT_NOTE entry. */
> + i_ptnote = -1;
> + for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
> + if (phdr_ptr[i].p_type == PT_NOTE) {
> + i_ptnote = i;
> + break;
> + }
> + }
> + BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
> + memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf32_Phdr));
> +
> + /* - 2nd pass moves the remaining non-PT_NOTE entries under
> + the first PT_NOTE entry. */
> + for (i = j = i_ptnote + 1; i < ehdr_ptr->e_phnum; i++) {
> + if (phdr_ptr[i].p_type != PT_NOTE) {
> + memmove(phdr_ptr + j, phdr_ptr + i,
> + sizeof(Elf32_Phdr));
> + j++;
> + }
> + }
> +
> + /* - Finally, fill unused part with 0. */
> + memset(phdr_ptr + ehdr_ptr->e_phnum - (nr_ptnote - 1), 0,
> + (nr_ptnote - 1) * sizeof(Elf32_Phdr));
> +
> /* Prepare merged PT_NOTE program header. */
> phdr.p_type = PT_NOTE;
> phdr.p_flags = 0;
> @@ -394,18 +454,14 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> phdr.p_align = 0;
>
> /* Add merged PT_NOTE program header*/
> - tmp = elfptr + ehdr_ptr->e_phoff;
> - memcpy(tmp, &phdr, sizeof(phdr));
> - tmp += sizeof(phdr);
> + memcpy(phdr_ptr, &phdr, sizeof(Elf32_Phdr));
>
> - /* Remove unwanted PT_NOTE program headers. */
> - i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
> - *elfsz = *elfsz - i;
> - memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));
> + *elfsz = *elfsz - (nr_ptnote - 1) * sizeof(Elf32_Phdr);
>
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>
> +out:
> return 0;
> }
>
>
>

2013-03-05 09:03:02

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v2 02/20] vmcore: rearrange program headers without assuming consequtive PT_NOTE entries

From: Zhang Yanfei <[email protected]>
Subject: Re: [PATCH v2 02/20] vmcore: rearrange program headers without assuming consequtive PT_NOTE entries
Date: Tue, 5 Mar 2013 16:36:53 +0800

> On 2013-03-02 16:35, HATAYAMA Daisuke wrote:
>> Current code assumes all PT_NOTE headers are placed at the beginning
>> of the program header table and that they are consecutive. But the assumption
>> could be broken by future changes on either kexec-tools or the 1st
>> kernel. This patch removes the assumption and rearranges program
>> headers as the following conditions are satisfied:
>>
>> - PT_NOTE entry is unique at the first entry,
>>
>> - the order of program headers are unchanged during this
>> rearrangement, only their positions are changed in positive
>> direction.
>>
>> - unused part that occurs in the bottom of program headers are filled
>> with 0.
>>
>> Also, this patch adds one exceptional case where the number of PT_NOTE
>> entries is somehow 0. Then, immediately go out of the function.
>>
>> Signed-off-by: HATAYAMA Daisuke <[email protected]>
>> ---
>>
>> fs/proc/vmcore.c | 92 +++++++++++++++++++++++++++++++++++++++++++-----------
>> 1 files changed, 74 insertions(+), 18 deletions(-)
>>
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index abf4f01..b5c9e33 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -251,8 +251,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
>> static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> struct list_head *vc_list)
>> {
>> - int i, nr_ptnote=0, rc=0;
>> - char *tmp;
>> + int i, j, nr_ptnote=0, i_ptnote, rc=0;
>
> After applying the patch, there are two "j" defined.
>
> 251 static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> 252 struct list_head *vc_list)
> 253 {
> 254 int i, j, nr_ptnote=0, i_ptnote, rc=0;
> 255 Elf64_Ehdr *ehdr_ptr;
> 256 Elf64_Phdr phdr, *phdr_ptr;
> 257 Elf64_Nhdr *nhdr_ptr;
> 258 u64 phdr_sz = 0, note_off;
> 259
> 260 ehdr_ptr = (Elf64_Ehdr *)elfptr;
> 261 phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
> 262 for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> 263 int j;
> 264 void *notes_section;
> 265 struct vmcore *new;
>
>
> line 254 and 263.
>

I had already noticed during development that the name of the inner j
is not ideal, but I didn't make a patch for it; it's beyond the scope
of this patch series.

I'll replace the outer j with another counter variable such as k.

>
>> Elf64_Ehdr *ehdr_ptr;
>> Elf64_Phdr phdr, *phdr_ptr;
>> Elf64_Nhdr *nhdr_ptr;
>> @@ -302,6 +301,39 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> kfree(notes_section);
>> }
>>
>> + if (nr_ptnote == 0)
>> + goto out;
>> +
>> + phdr_ptr = (Elf64_Phdr *)(elfptr + ehdr_ptr->e_phoff);
>> +
>> + /* Remove unwanted PT_NOTE program headers. */
>> +
>> + /* - 1st pass shifts non-PT_NOTE entries until the first
>> + PT_NOTE entry. */
>> + i_ptnote = -1;
>> + for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
>> + if (phdr_ptr[i].p_type == PT_NOTE) {
>> + i_ptnote = i;
>> + break;
>> + }
>> + }
>> + BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
>> + memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf64_Phdr));
>
> is there any problem with this move? What is the batch bytes for every loop
> of memmove?
>
> If i_ptnote == 2, we have
>
> -------------------------------------
> | PT_LOAD 1 | PT_LOAD 2 | PT_NOTE 1 |
> -------------------------------------
>
> -->
>
> -------------------------------------
> | | PT_LOAD 1 | PT_LOAD 2 |
> -------------------------------------
>
> right? In the move, Does PT_LOAD 1 overwrite the original PT_LOAD 2?
>

Right and yes, see man memmove and man memcpy, and please compare
them.
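
To make the comparison concrete, here is a tiny user-space test program
(illustration only, not part of the series; the struct is a simplified
stand-in for Elf64_Phdr). memmove() copies as if through a temporary
buffer, so the overlapping one-slot shift keeps both PT_LOAD entries,
which memcpy() does not guarantee:

#include <stdio.h>
#include <string.h>

#define PT_LOAD 1
#define PT_NOTE 4

struct phdr { int p_type; };	/* simplified stand-in for Elf64_Phdr */

int main(void)
{
	struct phdr phdrs[3] = { { PT_LOAD }, { PT_LOAD }, { PT_NOTE } };
	int i_ptnote = 2, i;

	/* 1st pass: shift the two non-PT_NOTE entries one slot right;
	 * source and destination overlap by one entry. */
	memmove(phdrs + 1, phdrs, i_ptnote * sizeof(struct phdr));

	for (i = 0; i < 3; i++)
		printf("phdrs[%d].p_type = %d\n", i, phdrs[i].p_type);
	/* Prints p_type = 1 for all three slots: both PT_LOAD entries
	 * survive; slot 0 still holds a stale copy of the first entry. */
	return 0;
}

The stale copy left in slot 0 is exactly what the merged PT_NOTE
program header overwrites afterwards.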

Thanks.
HATAYAMA, Daisuke

2013-03-05 09:31:08

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 03/20] vmcore, sysfs: export ELF note segment size instead of vmcoreinfo data size

On 2013-03-02 16:36, HATAYAMA Daisuke wrote:
> p_memsz member of program header entry with PT_NOTE type needs to have
> size of the corresponding ELF note segment. Currently, vmcoreinfo
> exports the data part only. If vmcoreinfo reaches vmcoreinfo_max_size, then
> in merge_note_headers_elf{32,64}, empty ELF note header cannot be
> found or buffer overrun can happen.
>
> Note: kexec-tools assigns PAGE_SIZE to p_memsz for other ELF note
> types. Due to the above reason, the same issue occurs if actual ELF
> note data exceeds (PAGE_SIZE - 2 * KEXEC_NOTE_HEAD_BYTES).
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> kernel/ksysfs.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index 6ada93c..97d2763 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -126,7 +126,7 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
> {
> return sprintf(buf, "%lx %x\n",
> paddr_vmcoreinfo_note(),
> - (unsigned int)vmcoreinfo_max_size);
> + (unsigned int)sizeof(vmcoreinfo_note));
> }
> KERNEL_ATTR_RO(vmcoreinfo);

Reviewed-by: Zhang Yanfei <[email protected]>

>
>
>

2013-03-05 09:37:24

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 02/20] vmcore: rearrange program headers without assuming consequtive PT_NOTE entries

On 2013-03-05 17:02, HATAYAMA Daisuke wrote:
> From: Zhang Yanfei <[email protected]>
> Subject: Re: [PATCH v2 02/20] vmcore: rearrange program headers without assuming consequtive PT_NOTE entries
> Date: Tue, 5 Mar 2013 16:36:53 +0800
>
>> On 2013-03-02 16:35, HATAYAMA Daisuke wrote:
>>> Current code assumes all PT_NOTE headers are placed at the beginning
>>> of the program header table and that they are consecutive. But the assumption
>>> could be broken by future changes on either kexec-tools or the 1st
>>> kernel. This patch removes the assumption and rearranges program
>>> headers as the following conditions are satisfied:
>>>
>>> - PT_NOTE entry is unique at the first entry,
>>>
>>> - the order of program headers are unchanged during this
>>> rearrangement, only their positions are changed in positive
>>> direction.
>>>
>>> - unused part that occurs in the bottom of program headers are filled
>>> with 0.
>>>
>>> Also, this patch adds one exceptional case where the number of PT_NOTE
>>> entries is somehow 0. Then, immediately go out of the function.
>>>
>>> Signed-off-by: HATAYAMA Daisuke <[email protected]>
>>> ---
>>>
>>> fs/proc/vmcore.c | 92 +++++++++++++++++++++++++++++++++++++++++++-----------
>>> 1 files changed, 74 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>>> index abf4f01..b5c9e33 100644
>>> --- a/fs/proc/vmcore.c
>>> +++ b/fs/proc/vmcore.c
>>> @@ -251,8 +251,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
>>> static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>>> struct list_head *vc_list)
>>> {
>>> - int i, nr_ptnote=0, rc=0;
>>> - char *tmp;
>>> + int i, j, nr_ptnote=0, i_ptnote, rc=0;
>>
>> After applying the patch, there are two "j" defined.
>>
>> 251 static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> 252 struct list_head *vc_list)
>> 253 {
>> 254 int i, j, nr_ptnote=0, i_ptnote, rc=0;
>> 255 Elf64_Ehdr *ehdr_ptr;
>> 256 Elf64_Phdr phdr, *phdr_ptr;
>> 257 Elf64_Nhdr *nhdr_ptr;
>> 258 u64 phdr_sz = 0, note_off;
>> 259
>> 260 ehdr_ptr = (Elf64_Ehdr *)elfptr;
>> 261 phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>> 262 for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>> 263 int j;
>> 264 void *notes_section;
>> 265 struct vmcore *new;
>>
>>
>> line 254 and 263.
>>
>
> I've already noticed the name of the inner j is never best in meaning
> under development but I didn't make patch for it; it's beyond the
> scope of this patch series.
>
> I'll replace the outer j by another incremental variable like k.
>
>>
>>> Elf64_Ehdr *ehdr_ptr;
>>> Elf64_Phdr phdr, *phdr_ptr;
>>> Elf64_Nhdr *nhdr_ptr;
>>> @@ -302,6 +301,39 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>>> kfree(notes_section);
>>> }
>>>
>>> + if (nr_ptnote == 0)
>>> + goto out;
>>> +
>>> + phdr_ptr = (Elf64_Phdr *)(elfptr + ehdr_ptr->e_phoff);
>>> +
>>> + /* Remove unwanted PT_NOTE program headers. */
>>> +
>>> + /* - 1st pass shifts non-PT_NOTE entries until the first
>>> + PT_NOTE entry. */
>>> + i_ptnote = -1;
>>> + for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
>>> + if (phdr_ptr[i].p_type == PT_NOTE) {
>>> + i_ptnote = i;
>>> + break;
>>> + }
>>> + }
>>> + BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
>>> + memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf64_Phdr));
>>
>> is there any problem with this move? What is the batch bytes for every loop
>> of memmove?
>>
>> if i_ptnode == 2, so we have
>>
>> -------------------------------------
>> | PT_LOAD 1 | PT_LOAD 2 | PT_NOTE 1 |
>> -------------------------------------
>>
>> -->
>>
>> -------------------------------------
>> | | PT_LOAD 1 | PT_LOAD 2 |
>> -------------------------------------
>>
>> right? In the move, Does PT_LOAD 1 overwrite the original PT_LOAD 2?
>>
>
> Right and yes, see man memmove and man memcpy, and please compare
> them.
>

I see. memmove always handles the copy correctly even when dest and src overlap.

Thanks
Zhang

2013-03-06 00:09:34

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v2 03/20] vmcore, sysfs: export ELF note segment size instead of vmcoreinfo data size

From: HATAYAMA Daisuke <[email protected]>
Subject: [PATCH v2 03/20] vmcore, sysfs: export ELF note segment size instead of vmcoreinfo data size
Date: Sat, 2 Mar 2013 17:36:05 +0900

> p_memsz member of program header entry with PT_NOTE type needs to have
> size of the corresponding ELF note segment. Currently, vmcoreinfo
> exports the data part only. If vmcoreinfo reaches vmcoreinfo_max_size, then
> in merge_note_headers_elf{32,64}, empty ELF note header cannot be
> found or buffer overrun can happen.

Sorry, I noticed this "buffer overrun can happen" was completely
wrong. In merge_note_headers_elf{32,64}, the size is being checked to
avoid buffer overrun.

int j;
void *notes_section;
struct vmcore *new;
u64 offset, max_sz, sz, real_sz = 0;
...
for (j = 0; j < max_sz; j += sz) {
if (nhdr_ptr->n_namesz == 0)
break;
sz = sizeof(Elf32_Nhdr) +
((nhdr_ptr->n_namesz + 3) & ~3) +
((nhdr_ptr->n_descsz + 3) & ~3);
real_sz += sz;
nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
}

But a later patch changes the terminator of ELF note segments from the
null note header to the NT_VMCORE_PAD note type. It's important to
export the whole buffer for ELF note segments, not the data part only.
This patch description doesn't explain that, and I'll add the
explanation in the next version.

Also, j has int type here while the other variables it is compared
with have u64 type. This is inconsistent, and in fact j is redundant
because real_sz already serves its purpose exactly. I'll replace the
for statement with a while statement in an additional clean-up patch,
roughly as:

while (real_sz < max_sz) {
..
}
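
For reference, the whole loop might then read roughly as follows (my
reading of the proposal, shown for the Elf32 variant quoted above; the
surrounding declarations of sz, real_sz, max_sz and nhdr_ptr are
assumed unchanged):

	/* Sketch only: walk the note buffer until real_sz reaches max_sz
	 * or an empty note name marks the end. */
	while (real_sz < max_sz) {
		if (nhdr_ptr->n_namesz == 0)
			break;
		sz = sizeof(Elf32_Nhdr) +
			((nhdr_ptr->n_namesz + 3) & ~3) +
			((nhdr_ptr->n_descsz + 3) & ~3);
		real_sz += sz;
		nhdr_ptr = (Elf32_Nhdr *)((char *)nhdr_ptr + sz);
	}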

Thanks.
HATAYAMA, Daisuke

2013-03-06 07:00:53

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 04/20] vmcore: allocate buffer for ELF headers on page-size alignment

On 2013-03-02 16:36, HATAYAMA Daisuke wrote:
> Allocate the buffer for ELF headers on a page-size aligned boundary to
> satisfy mmap() requirement. For this, __get_free_pages() is used
> instead of kmalloc().
>
> Also, a later patch will decrease the actually used buffer size for ELF
> headers, so it's necessary to keep original buffer size and actually
> used buffer size separately. elfcorebuf_sz_orig keeps the original one
> and elfcorebuf_sz the actually used one.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> fs/proc/vmcore.c | 30 +++++++++++++++++++++---------
> 1 files changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index b5c9e33..1b02d01 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -31,6 +31,7 @@ static LIST_HEAD(vmcore_list);
> /* Stores the pointer to the buffer containing kernel elf core headers. */
> static char *elfcorebuf;
> static size_t elfcorebuf_sz;
> +static size_t elfcorebuf_sz_orig;
>
> /* Total size of vmcore file. */
> static u64 vmcore_size;
> @@ -610,26 +611,31 @@ static int __init parse_crash_elf64_headers(void)
>
> /* Read in all elf headers. */
> elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf64_Phdr);
> - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> + elfcorebuf_sz_orig = elfcorebuf_sz;
> + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> + get_order(elfcorebuf_sz_orig));
> if (!elfcorebuf)
> return -ENOMEM;
> addr = elfcorehdr_addr;
> rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> if (rc < 0) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
>
> /* Merge all PT_NOTE headers into one. */
> rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
> &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
> @@ -665,26 +671,31 @@ static int __init parse_crash_elf32_headers(void)
>
> /* Read in all elf headers. */
> elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf32_Phdr);
> - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> + elfcorebuf_sz_orig = elfcorebuf_sz;
> + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> + get_order(elfcorebuf_sz));

Why not elfcorebuf_sz_orig here?

> if (!elfcorebuf)
> return -ENOMEM;
> addr = elfcorehdr_addr;
> rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> if (rc < 0) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
>
> /* Merge all PT_NOTE headers into one. */
> rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
> &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
> @@ -766,7 +777,8 @@ void vmcore_cleanup(void)
> list_del(&m->list);
> kfree(m);
> }
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> elfcorebuf = NULL;
> }
> EXPORT_SYMBOL_GPL(vmcore_cleanup);
>
>

2013-03-06 09:15:11

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v2 04/20] vmcore: allocate buffer for ELF headers on page-size alignment

From: Zhang Yanfei <[email protected]>
Subject: Re: [PATCH v2 04/20] vmcore: allocate buffer for ELF headers on page-size alignment
Date: Wed, 6 Mar 2013 14:57:50 +0800

> On 2013-03-02 16:36, HATAYAMA Daisuke wrote:
>> Allocate the buffer for ELF headers on a page-size aligned boundary to
>> satisfy mmap() requirement. For this, __get_free_pages() is used
>> instead of kmalloc().
>>
>> Also, later patch will decrease actually used buffer size for ELF
>> headers, so it's necessary to keep original buffer size and actually
>> used buffer size separately. elfcorebuf_sz_orig keeps the original one
>> and elfcorebuf_sz the actually used one.
>>
>> Signed-off-by: HATAYAMA Daisuke <[email protected]>
>> ---
>>
>> fs/proc/vmcore.c | 30 +++++++++++++++++++++---------
>> 1 files changed, 21 insertions(+), 9 deletions(-)
>>

....

>> @@ -665,26 +671,31 @@ static int __init parse_crash_elf32_headers(void)
>>
>> /* Read in all elf headers. */
>> elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf32_Phdr);
>> - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
>> + elfcorebuf_sz_orig = elfcorebuf_sz;
>> + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
>> + get_order(elfcorebuf_sz));
>
> Why not elfcorebuf_sz_orig here?
>

Thanks. I'll fix this.
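
Spelled out as a sketch (using the patch's variable names, not a
replacement for the fix): the allocation and every free must pass
get_order() of the same original size, because elfcorebuf_sz shrinks
later while the allocated pages do not:

	elfcorebuf_sz_orig = elfcorebuf_sz;
	elfcorebuf = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
					      get_order(elfcorebuf_sz_orig));
	if (!elfcorebuf)
		return -ENOMEM;

	/* ... elfcorebuf_sz may be reduced by merge_note_headers_elf32() ... */

	/* The free must mirror the allocation, so it also uses the
	 * original size, never the shrunken elfcorebuf_sz. */
	free_pages((unsigned long)elfcorebuf, get_order(elfcorebuf_sz_orig));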

Thanks.
HATAYAMA, Daisuke

2013-03-06 15:51:59

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 05/20] vmcore: round up buffer size of ELF headers by PAGE_SIZE

2013/3/2 HATAYAMA Daisuke <[email protected]>:
> To satisfy mmap() page-size boundary requirement, round up buffer size
> of ELF headers by PAGE_SIZE. The resulting value becomes offset of ELF
~~~~~~~~~~~
ELF header? or ELF program headers?

> note segments and it's assigned in unique PT_NOTE program header
> entry.
>
> Also, the parts that assumed the previous ELF headers' size are replaced by
> this new rounded-up value.
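
As a rough worked example with made-up numbers (PAGE_SIZE of 4096
assumed; the offsets are illustrative only, not taken from the patch):

	/* Hypothetical offsets, PAGE_SIZE == 4096 (0x1000): */
	note_off = 0x540;                            /* end of ELF header + merged phdrs */
	phdr.p_offset = roundup(note_off, PAGE_SIZE);  /* 0x1000: note segment starts here */
	/* with phdr.p_memsz == 0x1830: */
	vmcore_off = phdr.p_offset +
		     roundup(phdr.p_memsz, PAGE_SIZE); /* 0x1000 + 0x2000 = 0x3000 */
	/* so the first PT_LOAD data begins on the page boundary at 0x3000 */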

Reviewed-by: Zhang Yanfei <[email protected]>

>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> fs/proc/vmcore.c | 16 ++++++++--------
> 1 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 1b02d01..c511cf4 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -340,7 +340,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> phdr.p_flags = 0;
> note_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> - phdr.p_offset = note_off;
> + phdr.p_offset = roundup(note_off, PAGE_SIZE);
> phdr.p_vaddr = phdr.p_paddr = 0;
> phdr.p_filesz = phdr.p_memsz = phdr_sz;
> phdr.p_align = 0;
> @@ -353,6 +353,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>
> + *elfsz = roundup(*elfsz, PAGE_SIZE);
> out:
> return 0;
> }
> @@ -449,7 +450,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> phdr.p_flags = 0;
> note_off = ehdr_ptr->e_phoff +
> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
> - phdr.p_offset = note_off;
> + phdr.p_offset = roundup(note_off, PAGE_SIZE);
> phdr.p_vaddr = phdr.p_paddr = 0;
> phdr.p_filesz = phdr.p_memsz = phdr_sz;
> phdr.p_align = 0;
> @@ -462,6 +463,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>
> + *elfsz = roundup(*elfsz, PAGE_SIZE);
> out:
> return 0;
> }
> @@ -482,9 +484,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
>
> /* First program header is PT_NOTE header. */
> - vmcore_off = ehdr_ptr->e_phoff +
> - (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
> - phdr_ptr->p_memsz; /* Note sections */
> + vmcore_off = phdr_ptr->p_offset + roundup(phdr_ptr->p_memsz,
> + PAGE_SIZE);
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> if (phdr_ptr->p_type != PT_LOAD)
> @@ -519,9 +520,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
>
> /* First program header is PT_NOTE header. */
> - vmcore_off = ehdr_ptr->e_phoff +
> - (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
> - phdr_ptr->p_memsz; /* Note sections */
> + vmcore_off = phdr_ptr->p_offset + roundup(phdr_ptr->p_memsz,
> + PAGE_SIZE);
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> if (phdr_ptr->p_type != PT_LOAD)
>

2013-03-06 16:03:59

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 06/20] vmcore, procfs: introduce a flag to distinguish objects copied in 2nd kernel

2013/3/2 HATAYAMA Daisuke <[email protected]>:
> Part of the dump target memory is copied into the 2nd kernel if it
> doesn't satisfy mmap()'s page-size boundary requirement. To
> distinguish such copied objects from usual old memory, a flag
> MEM_TYPE_CURRENT_KERNEL is introduced. If this flag is set, the object
> is considered to have been copied into a buffer in the 2nd kernel.
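
For illustration, a sketch of how a read path might branch on this flag
(dst, off and len are hypothetical names; read_from_oldmem() is the
existing helper):

	if (m->flag & MEM_TYPE_CURRENT_KERNEL) {
		/* Object was copied into a buffer in the 2nd kernel;
		 * serve it directly from that buffer. */
		memcpy(dst, m->buf + off, len);
	} else {
		/* Object still lives in old memory; go through the
		 * existing ioremap-based path. */
		u64 paddr = m->paddr + off;

		read_from_oldmem(dst, len, &paddr, 0);
	}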

Reviewed-by: Zhang Yanfei <[email protected]>

>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> include/linux/proc_fs.h | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> index 8307f2f..11dd592 100644
> --- a/include/linux/proc_fs.h
> +++ b/include/linux/proc_fs.h
> @@ -97,11 +97,17 @@ struct kcore_list {
> int type;
> };
>
> +#define MEM_TYPE_CURRENT_KERNEL 0x1
> +
> struct vmcore {
> struct list_head list;
> - unsigned long long paddr;
> + union {
> + unsigned long long paddr;
> + char *buf;
> + };
> unsigned long long size;
> loff_t offset;
> + unsigned int flag;
> };
>
> #ifdef CONFIG_PROC_FS
>

2013-03-07 10:13:05

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

On 2013-03-02 16:37, HATAYAMA Daisuke wrote:
> Fill both crash_notes and vmcoreinfo_note buffers by NT_VMCORE_PAD
> note type to make them satisfy mmap()'s page-size boundary
> requirement.
>
> So far, the end of the note segments has been marked by a zero-filled
> ELF note header. Instead, this patch writes a NT_VMCORE_PAD note at the
> end of each note segment, padding it out to the page-size boundary.


In the code below, it seems that you assign the name "VMCOREINFO" to
the NT_VMCORE_PAD note type, right? This is kind of weird, I think;
this name has already been used for the NT_VMCORE_DEBUGINFO note. Why
not something like "VMCOREPAD" or "PAD"?

>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> arch/s390/include/asm/kexec.h | 7 ++++--
> include/linux/kexec.h | 12 ++++++-----
> kernel/kexec.c | 46 ++++++++++++++++++++++++++---------------
> 3 files changed, 40 insertions(+), 25 deletions(-)
>
> diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
> index 694bcd6..2a531ce 100644
> --- a/arch/s390/include/asm/kexec.h
> +++ b/arch/s390/include/asm/kexec.h
> @@ -41,8 +41,8 @@
> /*
> * Size for s390x ELF notes per CPU
> *
> - * Seven notes plus zero note at the end: prstatus, fpregset, timer,
> - * tod_cmp, tod_reg, control regs, and prefix
> + * Seven notes plus note with NT_VMCORE_PAD type at the end: prstatus,
> + * fpregset, timer, tod_cmp, tod_reg, control regs, and prefix
> */
> #define KEXEC_NOTE_BYTES \
> (ALIGN(sizeof(struct elf_note), 4) * 8 + \
> @@ -53,7 +53,8 @@
> ALIGN(sizeof(u64), 4) + \
> ALIGN(sizeof(u32), 4) + \
> ALIGN(sizeof(u64) * 16, 4) + \
> - ALIGN(sizeof(u32), 4) \
> + ALIGN(sizeof(u32), 4) + \
> + VMCOREINFO_NOTE_NAME_BYTES \
> )
>
> /* Provide a dummy definition to avoid build failures. */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 5113570..6592935 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -47,14 +47,16 @@
> #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
> #define KEXEC_CORE_NOTE_DESC_BYTES ALIGN(sizeof(struct elf_prstatus), 4)
> /*
> - * The per-cpu notes area is a list of notes terminated by a "NULL"
> - * note header. For kdump, the code in vmcore.c runs in the context
> - * of the second kernel to combine them into one note.
> + * The per-cpu notes area is a list of notes terminated by a note
> + * header with NT_VMCORE_PAD type. For kdump, the code in vmcore.c
> + * runs in the context of the second kernel to combine them into one
> + * note.
> */
> #ifndef KEXEC_NOTE_BYTES
> #define KEXEC_NOTE_BYTES ( (KEXEC_NOTE_HEAD_BYTES * 2) + \
> KEXEC_CORE_NOTE_NAME_BYTES + \
> - KEXEC_CORE_NOTE_DESC_BYTES )
> + KEXEC_CORE_NOTE_DESC_BYTES + \
> + VMCOREINFO_NOTE_NAME_BYTES)
> #endif
>
> /*
> @@ -187,7 +189,7 @@ extern struct kimage *kexec_crash_image;
> #define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4)
> #define VMCOREINFO_NOTE_SIZE ALIGN(KEXEC_NOTE_HEAD_BYTES*2 \
> +VMCOREINFO_BYTES \
> - +VMCOREINFO_NOTE_NAME_BYTES, \
> + +VMCOREINFO_NOTE_NAME_BYTES*2, \
> PAGE_SIZE)
>
> /* Location of a reserved region to hold the crash kernel.
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 6597b82..fbdc0f0 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -40,6 +40,7 @@
>
> /* Per cpu memory for storing cpu states in case of system crash. */
> note_buf_t __percpu *crash_notes;
> +static size_t crash_notes_size = ALIGN(sizeof(note_buf_t), PAGE_SIZE);
>
> /* vmcoreinfo stuff */
> static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
> @@ -1177,6 +1178,7 @@ unlock:
> return ret;
> }
>
> +/* If @data is NULL, fill @buf with 0 in @data_len bytes. */
> static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
> size_t data_len)
> {
> @@ -1189,26 +1191,36 @@ static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
> buf += (sizeof(note) + 3)/4;
> memcpy(buf, name, note.n_namesz);
> buf += (note.n_namesz + 3)/4;
> - memcpy(buf, data, note.n_descsz);
> + if (data)
> + memcpy(buf, data, note.n_descsz);
> + else
> + memset(buf, 0, note.n_descsz);
> buf += (note.n_descsz + 3)/4;
>
> return buf;
> }
>
> -static void final_note(u32 *buf)
> +static void final_note(u32 *buf, size_t buf_len, size_t data_len)
> {
> - struct elf_note note;
> + size_t used_bytes, pad_hdr_size;
>
> - note.n_namesz = 0;
> - note.n_descsz = 0;
> - note.n_type = 0;
> - memcpy(buf, &note, sizeof(note));
> + pad_hdr_size = KEXEC_NOTE_HEAD_BYTES + VMCOREINFO_NOTE_NAME_BYTES;
> +
> + /*
> + * keep space for ELF note header and "VMCOREINFO" name to
> + * terminate ELF segment by NT_VMCORE_PAD note.
> + */
> + BUG_ON(data_len + pad_hdr_size > buf_len);
> +
> + used_bytes = data_len + pad_hdr_size;
> + append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_PAD, NULL,
> + roundup(used_bytes, PAGE_SIZE) - used_bytes);
> }
>
> void crash_save_cpu(struct pt_regs *regs, int cpu)
> {
> struct elf_prstatus prstatus;
> - u32 *buf;
> + u32 *buf, *buf_end;
>
> if ((cpu < 0) || (cpu >= nr_cpu_ids))
> return;
> @@ -1226,16 +1238,15 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> memset(&prstatus, 0, sizeof(prstatus));
> prstatus.pr_pid = current->pid;
> elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
> - buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
> - &prstatus, sizeof(prstatus));
> - final_note(buf);
> + buf_end = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
> + &prstatus, sizeof(prstatus));
> + final_note(buf_end, crash_notes_size, (buf_end - buf) * sizeof(u32));
> }
>
> static int __init crash_notes_memory_init(void)
> {
> /* Allocate memory for saving cpu registers. */
> - crash_notes = __alloc_percpu(roundup(sizeof(note_buf_t), PAGE_SIZE),
> - PAGE_SIZE);
> + crash_notes = __alloc_percpu(crash_notes_size, PAGE_SIZE);
> if (!crash_notes) {
> printk("Kexec: Memory allocation for saving cpu register"
> " states failed\n");
> @@ -1434,13 +1445,14 @@ int __init parse_crashkernel_low(char *cmdline,
>
> static void update_vmcoreinfo_note(void)
> {
> - u32 *buf = vmcoreinfo_note;
> + u32 *buf = vmcoreinfo_note, *buf_end;
>
> if (!vmcoreinfo_size)
> return;
> - buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_DEBUGINFO,
> - vmcoreinfo_data, vmcoreinfo_size);
> - final_note(buf);
> + buf_end = append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_DEBUGINFO,
> + vmcoreinfo_data, vmcoreinfo_size);
> + final_note(buf_end, sizeof(vmcoreinfo_note),
> + (buf_end - buf) * sizeof(u32));
> }
>
> void crash_save_vmcoreinfo(void)
>
>

2013-03-08 01:55:29

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

From: Zhang Yanfei <[email protected]>
Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
Date: Thu, 7 Mar 2013 18:11:30 +0800

> On 2013-03-02 16:37, HATAYAMA Daisuke wrote:
>> Fill both crash_notes and vmcoreinfo_note buffers by NT_VMCORE_PAD
>> note type to make them satisfy mmap()'s page-size boundary
>> requirement.
>>
>> So far, end of note segments has been marked by zero-filled elf
>> header. Instead, this patch writes NT_VMCORE_PAD note in the end of
>> note segments until the offset on page-size boundary.
>
>
> In the codes below, it seems that you assign name "VMCOREINFO" for
> note type NT_VMCORE_PAD, right? This is kind of weird, I think. This
> name has been used for NT_VMCORE_DEBUGINFO note already. Why not something
> like "VMCOREPAD" or "PAD"?
>

It looks like you are confusing name and type, or are not aware of the
distinction. The name is a namespace, and within that namespace there
are multiple note types, each with its corresponding data. In other
words, the data corresponding to two types differ if they belong to
different namespaces, even if the type integers coincide.

The "VMCOREINFO" name represents information exported from
/proc/vmcore that is used in kdump framework. In this sense,
NT_VMCORE_PAD that is specific for /proc/vmcore and kdump framework,
should belong to the "VMCOREINFO" name.

Thanks.
HATAYAMA, Daisuke

2013-03-08 13:02:53

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

2013/3/8 HATAYAMA Daisuke <[email protected]>:
> From: Zhang Yanfei <[email protected]>
> Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
> Date: Thu, 7 Mar 2013 18:11:30 +0800
>
>> On 2013-03-02 16:37, HATAYAMA Daisuke wrote:
>>> Fill both crash_notes and vmcoreinfo_note buffers by NT_VMCORE_PAD
>>> note type to make them satisfy mmap()'s page-size boundary
>>> requirement.
>>>
>>> So far, end of note segments has been marked by zero-filled elf
>>> header. Instead, this patch writes NT_VMCORE_PAD note in the end of
>>> note segments until the offset on page-size boundary.
>>
>>
>> In the codes below, it seems that you assign name "VMCOREINFO" for
>> note type NT_VMCORE_PAD, right? This is kind of weird, I think. This
>> name has been used for NT_VMCORE_DEBUGINFO note already. Why not something
>> like "VMCOREPAD" or "PAD"?
>>
>
> It looks you are confusing or don't know name and type. The name is
> namespace and in the namespace, there are multiple note types, each of
> which has the corresponding data. In other words, data corresponding
> to types differ if they belong to differnet name space even if
> integers of the types are coincide with.

Yes, I knew this. Just as the spec says, "a program must recognize both
the name and the type to recognize a descriptor." But I cannot
understand where your word "namespace" came from; I think you are
complicating simple things here.

With a type alone, we cannot recognize a descriptor, because "multiple
interpretations of a single type value may exist", so we have to
combine the name and the type. If both the name and the type of two
descriptors are the same, we can say they are the same descriptor. If
either of them (name or type) differs, the two descriptors are
different and the two notes carry different data.

If I am wrong, please correct me.

>
> The "VMCOREINFO" name represents information exported from
> /proc/vmcore that is used in kdump framework. In this sense,
> NT_VMCORE_PAD that is specific for /proc/vmcore and kdump framework,
> should belong to the "VMCOREINFO" name.

I cannot fully understand the explanation of the name. Does the name
really have this meaning? Is there any authoritative document? I had
always thought we were free to choose the name ourselves!

>
> Thanks.
> HATAYAMA, Daisuke
>

2013-03-09 03:47:01

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

From: Yanfei Zhang <[email protected]>
Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
Date: Fri, 8 Mar 2013 21:02:50 +0800

> 2013/3/8 HATAYAMA Daisuke <[email protected]>:
>> From: Zhang Yanfei <[email protected]>
>> Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
>> Date: Thu, 7 Mar 2013 18:11:30 +0800
>>
>>> On 2013-03-02 16:37, HATAYAMA Daisuke wrote:
>>>> Fill both crash_notes and vmcoreinfo_note buffers by NT_VMCORE_PAD
>>>> note type to make them satisfy mmap()'s page-size boundary
>>>> requirement.
>>>>
>>>> So far, end of note segments has been marked by zero-filled elf
>>>> header. Instead, this patch writes NT_VMCORE_PAD note in the end of
>>>> note segments until the offset on page-size boundary.
>>>
>>>
>>> In the codes below, it seems that you assign name "VMCOREINFO" for
>>> note type NT_VMCORE_PAD, right? This is kind of weird, I think. This
>>> name has been used for NT_VMCORE_DEBUGINFO note already. Why not something
>>> like "VMCOREPAD" or "PAD"?
>>>
>>
>> It looks you are confusing or don't know name and type. The name is
>> namespace and in the namespace, there are multiple note types, each of
>> which has the corresponding data. In other words, data corresponding
>> to types differ if they belong to different name space even if
>> integers of the types are coincide with.
>
> Yes, I knew this. Just as the spec said " a program must recognize both the name
> and the type to recognize a descriptor.". But I cannot understand what your word
> "namespace" came from? I think you complicate simple things here.
>
> Only with a type, we cannot recognize a descriptor, because "multiple
> interpretations of
> a single type value may exist", So we should combine the name and the type
> together. If both the name and type of two descriptors are the same,
> we could say we
> have two same descriptors. If one of them (type or name) are
> different, we say the
> two descriptors are different and the two notes have different data.
>
> If I am wrong, please correct me.

??? I think you're saying the same thing here as my explanation above.

Although the term ''namespace'' never occurs in the ELF specification,
it seems to me a standard way to represent identical values as distinct
ones by pairing them with an additional element such as a name.

Well, formally, it is represented simply as tuples, i.e. a product set.
For example, suppose sets S and S' and define a new set S x S' by

S x S' := { (s, s') | s in S, s' in S' }

with equality on S x S' defined as usual:

(s1, s1') == (s2, s2') iff s1 == s2 and s1' == s2'.

In ELF, S is the names and S' is the types. There's no other formal
meaning there.
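
Put into code, the point is that a consumer matches the (name, type)
pair, never the type alone; a minimal sketch (the helper name is mine,
not from the series):

/* A note is identified by the pair (name, type), not by type alone. */
static int note_is(const Elf64_Nhdr *nhdr, const char *name, unsigned int type)
{
	const char *n_name = (const char *)(nhdr + 1);

	return nhdr->n_type == type &&
	       nhdr->n_namesz == strlen(name) + 1 &&
	       memcmp(n_name, name, nhdr->n_namesz) == 0;
}

/*
 * NT_VMCORE_DEBUGINFO and NT_VMCORE_PAD can then share the "VMCOREINFO"
 * name and still be told apart:
 *
 *   note_is(nhdr, "VMCOREINFO", NT_VMCORE_DEBUGINFO)
 *   note_is(nhdr, "VMCOREINFO", NT_VMCORE_PAD)
 */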

>>
>> The "VMCOREINFO" name represents information exported from
>> /proc/vmcore that is used in kdump framework. In this sense,
>> NT_VMCORE_PAD that is specific for /proc/vmcore and kdump framework,
>> should belong to the "VMCOREINFO" name.
>
> I cannot understand the name explanation totally. Does the name really
> have this meaning? Is there any authentic document? I was always thinking we
> could feel free to name a name by ourselves!

Of course, it's up to you to decide how to name notes within this
mechanism. But naming matters for ease of managing note types. In
addition to the formal definition above, it's important to consider
what a name gives us: readability, telling us that the note types
belonging to one name are handled in common in the sense of that name.
This is separate from the formal definition above.

It's certainly possible to distinguish notes by giving them names only
and not types. For example, imagine there are 27 new notes that have
different names but all have 0 as their type.

name type
"SOME_NOTE_A" 0
"SOME_NOTE_B" 0
...
"SOME_NOTE_Z" 0

Also, for example,

name type
"SOME_NOTE" 0 => NT_SOME_NOTE_A
"SOME_NOTE" 1 => NT_SOME_NOTE_B
...
"SOME_NOTE" 26 => NT_SOME_NOTE_Z

In the former case, it *looks to me* that the space of types is not
used effectively, and the space of names is not consumed efficiently.

After all, it amounts to individual preference about naming. I cannot
say anything more.

Thanks.
HATAYAMA, Daisuke

2013-03-10 02:35:25

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

On 2013-03-09 11:46, HATAYAMA Daisuke wrote:
> From: Yanfei Zhang <[email protected]>
> Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
> Date: Fri, 8 Mar 2013 21:02:50 +0800
>
>> 2013/3/8 HATAYAMA Daisuke <[email protected]>:
>>> From: Zhang Yanfei <[email protected]>
>>> Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
>>> Date: Thu, 7 Mar 2013 18:11:30 +0800
>>>
>>>> On 2013-03-02 16:37, HATAYAMA Daisuke wrote:
>>>>> Fill both crash_notes and vmcoreinfo_note buffers by NT_VMCORE_PAD
>>>>> note type to make them satisfy mmap()'s page-size boundary
>>>>> requirement.
>>>>>
>>>>> So far, end of note segments has been marked by zero-filled elf
>>>>> header. Instead, this patch writes NT_VMCORE_PAD note in the end of
>>>>> note segments until the offset on page-size boundary.
>>>>
>>>>
>>>> In the codes below, it seems that you assign name "VMCOREINFO" for
>>>> note type NT_VMCORE_PAD, right? This is kind of weird, I think. This
>>>> name has been used for NT_VMCORE_DEBUGINFO note already. Why not something
>>>> like "VMCOREPAD" or "PAD"?
>>>>
>>>
>>> It looks you are confusing or don't know name and type. The name is
>>> namespace and in the namespace, there are multiple note types, each of
>>> which has the corresponding data. In other words, data corresponding
>>> to types differ if they belong to different name space even if
>>> integers of the types are coincide with.
>>
>> Yes, I knew this. Just as the spec said " a program must recognize both the name
>> and the type to recognize a descriptor.". But I cannot understand what your word
>> "namespace" came from? I think you complicate simple things here.
>>
>> Only with a type, we cannot recognize a descriptor, because "multiple
>> interpretations of
>> a single type value may exist", So we should combine the name and the type
>> together. If both the name and type of two descriptors are the same,
>> we could say we
>> have two same descriptors. If one of them (type or name) are
>> different, we say the
>> two descriptors are different and the two notes have different data.
>>
>> If I am wrong, please correct me.
>
> ??? I think you're saying here the same thing as my explanation above.
>
> Although the term ''name space'' never occurs in ELF, it seems to me a
> standard technique to make equal values distinct by pairing them with an
> additional element, here the name.
>
> Formally, it is simply represented as tuples, i.e. a product set. For
> example, suppose sets S and S' and define the new set S x S' by
>
> S x S' := { (s, s') | s in S, s' in S' }
>
> where equality on S x S' is defined as usual:
>
> (s1, s1') == (s2, s2') iff s1 == s2 and s1' == s2'.
>
> In ELF, S is names and S' is types. There's no other formal meaning
> there.
>
>>>
>>> The "VMCOREINFO" name represents information exported from
>>> /proc/vmcore that is used in kdump framework. In this sense,
>>> NT_VMCORE_PAD that is specific for /proc/vmcore and kdump framework,
>>> should belong to the "VMCOREINFO" name.
>>
>> I cannot understand the name explanation at all. Does the name really
>> have this meaning? Is there any authoritative document? I always thought
>> we were free to choose the name by ourselves!
>
> Of course, you are free to decide how to name notes within the
> mechanism. But naming matters for ease of managing note types. In
> addition to the above formal definition, it's important to consider
> what a name gives us: readability, telling us that note types belonging
> to a single name are treated in common with respect to that name. This
> is apart from the formal definition above.
>
> It's certainly possible to distinguish notes by giving names only and
> not giving types. For example, imagine there are new 27 notes and they
> have different names but have 0 as type.
>
> name type
> "SOME_NOTE_A" 0
> "SOME_NOTE_B" 0
> ...
> "SOME_NOTE_Z" 0
>
> Also, for example,
>
> name type
> "SOME_NOTE" 0 => NT_SOME_NOTE_A
> "SOME_NOTE" 1 => NT_SOME_NOTE_B
> ...
> "SOME_NOTE" 26 => NT_SOME_NOTE_Z
>
> For the former case, it *looks to me* that the type space is not used
> effectively, and it *looks to me* that the name space is not consumed
> efficiently.
>
> After all, it amounts to individual preference about naming. I cannot
> say anything more.
>

I see. I know what you mean now. I was just surprised and puzzled by your
"namespace" concept.

Other than the name of NT_VMCORE_PAD, no complaints about the code.

Reviewed-by: Zhang Yanfei <[email protected]>
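
For readers following this subthread, a simplified sketch of the padding
scheme under discussion (not the patch itself; the helper name is
invented, NT_VMCORE_PAD comes from this patch set, and the buffer is
assumed pre-zeroed and large enough): one NT_VMCORE_PAD note is appended
whose descriptor fills the note buffer up to the next page boundary,
under the "VMCOREINFO" name chosen in the patch.

#define PAD_NOTE_NAME "VMCOREINFO"

/* buf_sz is assumed page-aligned; used is the number of bytes already
 * occupied by real notes (4-byte aligned, as ELF notes are). */
static void append_vmcore_pad_note(char *buf, size_t used, size_t buf_sz)
{
	Elf64_Nhdr *note = (Elf64_Nhdr *)(buf + used);
	char *name = (char *)(note + 1);
	size_t overhead = sizeof(*note) + roundup(sizeof(PAD_NOTE_NAME), 4);

	note->n_namesz = sizeof(PAD_NOTE_NAME);
	/* Everything after the header and the padded name becomes the
	 * pad payload, so the note ends exactly at the page boundary. */
	note->n_descsz = buf_sz - used - overhead;
	note->n_type   = NT_VMCORE_PAD;
	memcpy(name, PAD_NOTE_NAME, sizeof(PAD_NOTE_NAME));
	/* The descriptor bytes themselves are left zeroed. */
}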

2013-03-10 06:48:07

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

On 2013/03/05 15:35, Zhang Yanfei wrote:
> On 2013/03/02 16:35, HATAYAMA Daisuke wrote:
>> The code around /proc/vmcore currently assumes that the program header
>> table is located right after the ELF header. But a future change in
>> kexec-tools or the 1st kernel could break that assumption. To avoid the
>> worst case, refer to the e_phoff member, which indicates the file offset
>> of the program header table.
>
> Reviewed-by: Zhang Yanfei <[email protected]>
>
>>
>> Signed-off-by: HATAYAMA Daisuke <[email protected]>
>> ---
>>
>> fs/proc/vmcore.c | 40 ++++++++++++++++++++--------------------
>> 1 files changed, 20 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index b870f74..abf4f01 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -221,8 +221,8 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
>> Elf64_Phdr *phdr_ptr;
>>
>> ehdr_ptr = (Elf64_Ehdr *)elfptr;
>> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
>> - size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
>> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>> + size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
>> for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>> size += phdr_ptr->p_memsz;
>> phdr_ptr++;
>> @@ -238,8 +238,8 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
>> Elf32_Phdr *phdr_ptr;
>>
>> ehdr_ptr = (Elf32_Ehdr *)elfptr;
>> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
>> - size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
>> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>> + size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
>> for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>> size += phdr_ptr->p_memsz;
>> phdr_ptr++;
>> @@ -259,7 +259,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> u64 phdr_sz = 0, note_off;
>>
>> ehdr_ptr = (Elf64_Ehdr *)elfptr;
>> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
>> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>> int j;
>> void *notes_section;
>> @@ -305,7 +305,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> /* Prepare merged PT_NOTE program header. */
>> phdr.p_type = PT_NOTE;
>> phdr.p_flags = 0;
>> - note_off = sizeof(Elf64_Ehdr) +
>> + note_off = ehdr_ptr->e_phoff +
>> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
>> phdr.p_offset = note_off;
>> phdr.p_vaddr = phdr.p_paddr = 0;
>> @@ -313,14 +313,14 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> phdr.p_align = 0;
>>
>> /* Add merged PT_NOTE program header*/
>> - tmp = elfptr + sizeof(Elf64_Ehdr);
>> + tmp = elfptr + ehdr_ptr->e_phoff;
>> memcpy(tmp, &phdr, sizeof(phdr));
>> tmp += sizeof(phdr);
>>
>> /* Remove unwanted PT_NOTE program headers. */
>> i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
>> *elfsz = *elfsz - i;
>> - memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
>> + memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
>>
>> /* Modify e_phnum to reflect merged headers. */
>> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>> @@ -340,7 +340,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>> u64 phdr_sz = 0, note_off;
>>
>> ehdr_ptr = (Elf32_Ehdr *)elfptr;
>> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
>> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>> int j;
>> void *notes_section;
>> @@ -386,7 +386,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>> /* Prepare merged PT_NOTE program header. */
>> phdr.p_type = PT_NOTE;
>> phdr.p_flags = 0;
>> - note_off = sizeof(Elf32_Ehdr) +
>> + note_off = ehdr_ptr->e_phoff +
>> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
>> phdr.p_offset = note_off;
>> phdr.p_vaddr = phdr.p_paddr = 0;
>> @@ -394,14 +394,14 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>> phdr.p_align = 0;
>>
>> /* Add merged PT_NOTE program header*/
>> - tmp = elfptr + sizeof(Elf32_Ehdr);
>> + tmp = elfptr + ehdr_ptr->e_phoff;
>> memcpy(tmp, &phdr, sizeof(phdr));
>> tmp += sizeof(phdr);
>>
>> /* Remove unwanted PT_NOTE program headers. */
>> i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
>> *elfsz = *elfsz - i;
>> - memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
>> + memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));
>>
>> /* Modify e_phnum to reflect merged headers. */
>> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>> @@ -422,10 +422,10 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
>> struct vmcore *new;
>>
>> ehdr_ptr = (Elf64_Ehdr *)elfptr;
>> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
>> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
>>
>> /* First program header is PT_NOTE header. */
>> - vmcore_off = sizeof(Elf64_Ehdr) +
>> + vmcore_off = ehdr_ptr->e_phoff +
>> (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
>> phdr_ptr->p_memsz; /* Note sections */
>>
>> @@ -459,10 +459,10 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>> struct vmcore *new;
>>
>> ehdr_ptr = (Elf32_Ehdr *)elfptr;
>> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
>> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
>>
>> /* First program header is PT_NOTE header. */
>> - vmcore_off = sizeof(Elf32_Ehdr) +
>> + vmcore_off = ehdr_ptr->e_phoff +
>> (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
>> phdr_ptr->p_memsz; /* Note sections */
>>
>> @@ -496,7 +496,7 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
>> ehdr_ptr = (Elf64_Ehdr *)elfptr;
>>
>> /* Skip Elf header and program headers. */
>> - vmcore_off = sizeof(Elf64_Ehdr) +
>> + vmcore_off = ehdr_ptr->e_phoff +
>> (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
>>
>> list_for_each_entry(m, vc_list, list) {
>> @@ -516,7 +516,7 @@ static void __init set_vmcore_list_offsets_elf32(char *elfptr,
>> ehdr_ptr = (Elf32_Ehdr *)elfptr;
>>
>> /* Skip Elf header and program headers. */
>> - vmcore_off = sizeof(Elf32_Ehdr) +
>> + vmcore_off = ehdr_ptr->e_phoff +
>> (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
>>
>> list_for_each_entry(m, vc_list, list) {
>> @@ -553,7 +553,7 @@ static int __init parse_crash_elf64_headers(void)
>> }
>>
>> /* Read in all elf headers. */
>> - elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
>> + elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf64_Phdr);

One minor suggestion.

Previously, when the code assumed that the program headers immediately
follow the ELF header, it used

elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);

to calculate the combined size of the ELF header and the program headers.

This patch drops that assumption and uses ehdr.e_phoff to locate the
program headers. But it will read unrelated contents into elfcorebuf if
the program headers do not immediately follow the ELF header. So could
the code be:

elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
addr = elfcorehdr_addr + ehdr.e_phoff;
memcpy(elfcorebuf, &ehdr, sizeof(Elf64_Ehdr));
read_from_oldmem(elfcorebuf + sizeof(Elf64_Ehdr),
                 elfcorebuf_sz - sizeof(Elf64_Ehdr), &addr, 0);
((Elf64_Ehdr *)elfcorebuf)->e_phoff = sizeof(Elf64_Ehdr);
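
To make the suggestion concrete, a rough, self-contained sketch (an
illustration under the thread's assumptions, not the merged code): the
function name read_elf64_headers() is invented, the ELF header is assumed
to have already been read into *ehdr by the caller, and the file-scope
elfcorebuf/elfcorebuf_sz variables and read_from_oldmem() helper of
fs/proc/vmcore.c are assumed to be in scope.

static int __init read_elf64_headers(const Elf64_Ehdr *ehdr)
{
	u64 addr;
	ssize_t rc;

	elfcorebuf_sz = sizeof(Elf64_Ehdr) +
			ehdr->e_phnum * sizeof(Elf64_Phdr);
	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
	if (!elfcorebuf)
		return -ENOMEM;

	/* The ELF header itself was already read by the caller. */
	memcpy(elfcorebuf, ehdr, sizeof(Elf64_Ehdr));

	/* The program header table may live anywhere in the old memory
	 * image; follow e_phoff instead of assuming it is adjacent. */
	addr = elfcorehdr_addr + ehdr->e_phoff;
	rc = read_from_oldmem(elfcorebuf + sizeof(Elf64_Ehdr),
			      elfcorebuf_sz - sizeof(Elf64_Ehdr), &addr, 0);
	if (rc < 0) {
		kfree(elfcorebuf);
		return rc;
	}

	/* In the copied buffer the table now follows the header directly. */
	((Elf64_Ehdr *)elfcorebuf)->e_phoff = sizeof(Elf64_Ehdr);
	return 0;
}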

>> elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
>> if (!elfcorebuf)
>> return -ENOMEM;
>> @@ -608,7 +608,7 @@ static int __init parse_crash_elf32_headers(void)
>> }
>>
>> /* Read in all elf headers. */
>> - elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
>> + elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf32_Phdr);
>> elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
>> if (!elfcorebuf)
>> return -ENOMEM;
>>
>>
>> _______________________________________________
>> kexec mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/kexec
>>
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec

2013-03-10 07:12:04

by Zhang Yanfei

[permalink] [raw]
Subject: Re: [PATCH v2 07/20] vmcore: copy non page-size aligned head and tail pages in 2nd kernel

On 2013/03/02 16:36, HATAYAMA Daisuke wrote:
> Due to the mmap() requirement, pages that do not start or end on a
> page-size aligned address need to be copied into the 2nd kernel and
> mapped to user-space from there.
>
> For example, see the map below:
>
> 00000000-0001ffff : reserved
> 00010000-0009f7ff : System RAM
> 0009f800-0009ffff : reserved
>
> where the System RAM ends at 0x9f800, which is not page-size
> aligned. This map is divided into two parts:
>
> 00010000-0009dfff

00010000-0009efff

> 0009f000-0009f7ff
>
> and the first one is kept in old memory while the 2nd one is copied into
> a buffer in the 2nd kernel.
>
> This kind of non-page-size-aligned area can always occur, since any
> part of System RAM can be converted into a reserved area at runtime.
>
> Without copying like this, i.e. if non-page-size-aligned pages in old
> memory were remapped directly, mmap() would have to export memory that
> is not a dump target to user-space. In the above example this is the
> reserved range 0x9f800-0xa0000.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> fs/proc/vmcore.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 files changed, 172 insertions(+), 20 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index c511cf4..6b071b4 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -474,11 +474,10 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> size_t elfsz,
> struct list_head *vc_list)
> {
> - int i;
> + int i, rc;
> Elf64_Ehdr *ehdr_ptr;
> Elf64_Phdr *phdr_ptr;
> loff_t vmcore_off;
> - struct vmcore *new;
>
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
> phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
> @@ -488,20 +487,97 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> PAGE_SIZE);
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> + u64 start, end, rest;
> +
> if (phdr_ptr->p_type != PT_LOAD)
> continue;
>
> - /* Add this contiguous chunk of memory to vmcore list.*/
> - new = get_new_element();
> - if (!new)
> - return -ENOMEM;
> - new->paddr = phdr_ptr->p_offset;
> - new->size = phdr_ptr->p_memsz;
> - list_add_tail(&new->list, vc_list);
> + start = phdr_ptr->p_offset;
> + end = phdr_ptr->p_offset + phdr_ptr->p_memsz;
> + rest = phdr_ptr->p_memsz;
> +
> + if (start & ~PAGE_MASK) {
> + u64 paddr, len;
> + char *buf;
> + struct vmcore *new;
> +
> + paddr = start;
> + len = min(roundup(start,PAGE_SIZE), end) - start;
> +
> + buf = (char *)get_zeroed_page(GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> + rc = read_from_oldmem(buf + (start & ~PAGE_MASK), len,
> + &paddr, 0);
> + if (rc < 0) {
> + free_pages((unsigned long)buf, 0);
> + return rc;
> + }
> +
> + new = get_new_element();
> + if (!new) {
> + free_pages((unsigned long)buf, 0);
> + return -ENOMEM;
> + }
> + new->flag |= MEM_TYPE_CURRENT_KERNEL;
> + new->size = PAGE_SIZE;
> + new->buf = buf;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
> +
> + if (rest > 0 &&
> + roundup(start, PAGE_SIZE) < rounddown(end, PAGE_SIZE)) {
> + u64 paddr, len;
> + struct vmcore *new;
> +
> + paddr = roundup(start, PAGE_SIZE);
> + len =rounddown(end,PAGE_SIZE)-roundup(start,PAGE_SIZE);
> +
> + new = get_new_element();
> + if (!new)
> + return -ENOMEM;
> + new->paddr = paddr;
> + new->size = len;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
> +
> + if (rest > 0) {
> + u64 paddr, len;
> + char *buf;
> + struct vmcore *new;
> +
> + paddr = rounddown(end, PAGE_SIZE);
> + len = end - rounddown(end, PAGE_SIZE);
> +
> + buf = (char *)get_zeroed_page(GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> + rc = read_from_oldmem(buf, len, &paddr, 0);
> + if (rc < 0) {
> + free_pages((unsigned long)buf, 0);
> + return rc;
> + }
> +
> + new = get_new_element();
> + if (!new) {
> + free_pages((unsigned long)buf, 0);
> + return -ENOMEM;
> + }
> + new->flag |= MEM_TYPE_CURRENT_KERNEL;
> + new->size = PAGE_SIZE;
> + new->buf = buf;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
>
> /* Update the program header offset. */
> phdr_ptr->p_offset = vmcore_off;
> - vmcore_off = vmcore_off + phdr_ptr->p_memsz;
> + vmcore_off +=roundup(end,PAGE_SIZE)-rounddown(start,PAGE_SIZE);

Here the code changes phdr_ptr->p_offset to a new page-size aligned offset.
But it seems that phdr_ptr->p_paddr is still the non-page-size-aligned
physical address? Does a mismatch between a PT_LOAD segment and the
physical memory occur?

Or will makedumpfile later check phdr_ptr->p_paddr to see whether it is
page-size aligned, and use phdr_ptr->p_memsz to get the real memory size,
excluding the padding?
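
As an editorial aside, a worked example of the split being discussed,
using the addresses from the commit message above and assuming
PAGE_SIZE == 4096; the roundup/rounddown macros here mirror the kernel's
semantics, and the program itself is only an illustration:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define roundup(x, y)	((((x) + (y) - 1) / (y)) * (y))
#define rounddown(x, y)	(((x) / (y)) * (y))

int main(void)
{
	unsigned long start = 0x10000, end = 0x9f800;	/* System RAM */

	/* head: empty here, since start is already page aligned */
	printf("head   : %#lx-%#lx\n", start, roundup(start, PAGE_SIZE));
	/* middle: kept in old memory and mapped directly */
	printf("middle : %#lx-%#lx\n", roundup(start, PAGE_SIZE),
	       rounddown(end, PAGE_SIZE));
	/* tail: 0x9f000-0x9f800, copied into a 2nd-kernel page */
	printf("tail   : %#lx-%#lx\n", rounddown(end, PAGE_SIZE), end);
	/* the file offset advances by the fully page-rounded span */
	printf("vmcore_off += %#lx\n",
	       roundup(end, PAGE_SIZE) - rounddown(start, PAGE_SIZE));
	return 0;
}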

> }
> return 0;
> }
> @@ -510,11 +586,10 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> size_t elfsz,
> struct list_head *vc_list)
> {
> - int i;
> + int i, rc;
> Elf32_Ehdr *ehdr_ptr;
> Elf32_Phdr *phdr_ptr;
> loff_t vmcore_off;
> - struct vmcore *new;
>
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
> phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
> @@ -524,20 +599,97 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> PAGE_SIZE);
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> + u64 start, end, rest;
> +
> if (phdr_ptr->p_type != PT_LOAD)
> continue;
>
> - /* Add this contiguous chunk of memory to vmcore list.*/
> - new = get_new_element();
> - if (!new)
> - return -ENOMEM;
> - new->paddr = phdr_ptr->p_offset;
> - new->size = phdr_ptr->p_memsz;
> - list_add_tail(&new->list, vc_list);
> + start = phdr_ptr->p_offset;
> + end = phdr_ptr->p_offset + phdr_ptr->p_memsz;
> + rest = phdr_ptr->p_memsz;
> +
> + if (start & ~PAGE_MASK) {
> + u64 paddr, len;
> + char *buf;
> + struct vmcore *new;
> +
> + paddr = start;
> + len = min(roundup(start,PAGE_SIZE), end) - start;
> +
> + buf = (char *)get_zeroed_page(GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> + rc = read_from_oldmem(buf + (start & ~PAGE_MASK), len,
> + &paddr, 0);
> + if (rc < 0) {
> + free_pages((unsigned long)buf, 0);
> + return rc;
> + }
> +
> + new = get_new_element();
> + if (!new) {
> + free_pages((unsigned long)buf, 0);
> + return -ENOMEM;
> + }
> + new->flag |= MEM_TYPE_CURRENT_KERNEL;
> + new->size = PAGE_SIZE;
> + new->buf = buf;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
> +
> + if (rest > 0 &&
> + roundup(start, PAGE_SIZE) < rounddown(end, PAGE_SIZE)) {
> + u64 paddr, len;
> + struct vmcore *new;
> +
> + paddr = roundup(start, PAGE_SIZE);
> + len =rounddown(end,PAGE_SIZE)-roundup(start,PAGE_SIZE);
> +
> + new = get_new_element();
> + if (!new)
> + return -ENOMEM;
> + new->paddr = paddr;
> + new->size = len;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
> +
> + if (rest > 0) {
> + u64 paddr, len;
> + char *buf;
> + struct vmcore *new;
> +
> + paddr = rounddown(end, PAGE_SIZE);
> + len = end - rounddown(end, PAGE_SIZE);
> +
> + buf = (char *)get_zeroed_page(GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> + rc = read_from_oldmem(buf, len, &paddr, 0);
> + if (rc < 0) {
> + free_pages((unsigned long)buf, 0);
> + return rc;
> + }
> +
> + new = get_new_element();
> + if (!new) {
> + free_pages((unsigned long)buf, 0);
> + return -ENOMEM;
> + }
> + new->flag |= MEM_TYPE_CURRENT_KERNEL;
> + new->size = PAGE_SIZE;
> + new->buf = buf;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
>
> /* Update the program header offset */
> phdr_ptr->p_offset = vmcore_off;
> - vmcore_off = vmcore_off + phdr_ptr->p_memsz;
> + vmcore_off +=roundup(end,PAGE_SIZE)-rounddown(start,PAGE_SIZE);
> }
> return 0;
> }
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
>

2013-03-11 00:28:10

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v2 07/20] vmcore: copy non page-size aligned head and tail pages in 2nd kernel

From: Zhang Yanfei <[email protected]>
Subject: Re: [PATCH v2 07/20] vmcore: copy non page-size aligned head and tail pages in 2nd kernel
Date: Sun, 10 Mar 2013 14:16:51 +0800

> On 2013/03/02 16:36, HATAYAMA Daisuke wrote:
<cut>
>> /* Update the program header offset. */
>> phdr_ptr->p_offset = vmcore_off;
>> - vmcore_off = vmcore_off + phdr_ptr->p_memsz;
>> + vmcore_off +=roundup(end,PAGE_SIZE)-rounddown(start,PAGE_SIZE);
>
> Here the code changes phdr_ptr->p_offset to a new page-size aligned offset.
> But it seems that phdr_ptr->p_paddr is still the non-page-size-aligned
> physical address? Does a mismatch between a PT_LOAD segment and the
> physical memory occur?
>
> Or will makedumpfile later check phdr_ptr->p_paddr to see whether it is
> page-size aligned, and use phdr_ptr->p_memsz to get the real memory size,
> excluding the padding?
>

No, it doesn't happen. p_offset is made page-size aligned to satisfy the
requirement on the mapping between user-space linear addresses and file
offsets. The requirement on physical addresses is handled internally in
vmcore_list, which is transparent to user-space.
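
For readers of the thread, that transparency comes from the offset lookup
in the read/mmap path; a condensed sketch modeled on the existing
read_vmcore() loop (the helper name find_vmcore() is invented):

/* Each vmcore element covers [m->offset, m->offset + m->size) of the
 * file; it backs that range either with a 2nd-kernel buffer (m->buf,
 * for the copied head/tail pages) or with old memory at m->paddr, so
 * p_paddr in the exported headers never has to change. */
static struct vmcore *find_vmcore(struct list_head *vc_list, loff_t fpos)
{
	struct vmcore *m;

	list_for_each_entry(m, vc_list, list) {
		if (fpos < m->offset + m->size)
			return m;	/* serve from m->buf or m->paddr */
	}
	return NULL;
}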

Thanks.
HATAYAMA, Daisuke

2013-03-11 00:31:45

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

From: Zhang Yanfei <[email protected]>
Subject: Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly
Date: Sun, 10 Mar 2013 14:46:31 +0800

> On 2013/03/05 15:35, Zhang Yanfei wrote:
>> On 2013/03/02 16:35, HATAYAMA Daisuke wrote:
<cut>
>
> One minor suggestion.
>
> Previously, when the code assumes program headers are following immediately
> the ELF header, it uses
>
> elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
>
> to calculate the size of ELF header and ELF program headers
>
> This patch avoids the assumption, and uses ehdr.e_phoff to get the program
> headers' address. But it will read unrelated contents into elfcorebuf if
> program headers are not following immediately the ELF header. So could the
> code be:
>
> elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> addr = elfcorehdr_addr + ehdr.e_phoff;
> memcpy(elfcorebuf, &ehdr, sizeof(Elf64_Ehdr));
> read_from_oldmem(elfcorebuf + sizeof(Elf64_Ehdr), elfcorebuf_sz -
> sizeof(Elf64_Ehdr), &addr, 0);
> ((Elf64_Ehdr *)elfcorebuf)->e_phoff = sizeof(Elf64_Ehdr);

Thanks. This is not a minor suggestion; it is critical. My code is
completely broken. In ELF, segments and headers other than the ELF header
can occur at any position, which means the program header table can come
after the segments. So, on terabyte systems, e_phoff can be larger than a
terabyte.

Sorry, this was due to my carelessness.

Thanks.
HATAYAMA, Daisuke

2013-03-11 17:37:12

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

On Mon, Mar 11, 2013 at 09:31:41AM +0900, HATAYAMA Daisuke wrote:
> From: Zhang Yanfei <[email protected]>
> Subject: Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly
> Date: Sun, 10 Mar 2013 14:46:31 +0800
>
> > On 2013/03/05 15:35, Zhang Yanfei wrote:
> >> On 2013/03/02 16:35, HATAYAMA Daisuke wrote:
> <cut>
> >
> > One minor suggestion.
> >
> > Previously, when the code assumes program headers are following immediately
> > the ELF header, it uses
> >
> > elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> >
> > to calculate the size of ELF header and ELF program headers
> >
> > This patch avoids the assumption, and uses ehdr.e_phoff to get the program
> > headers' address. But it will read unrelated contents into elfcorebuf if
> > program headers are not following immediately the ELF header. So could the
> > code be:
> >
> > elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> > addr = elfcorehdr_addr + ehdr.e_phoff;
> > memcpy(elfcorebuf, &ehdr, sizeof(Elf64_Ehdr));
> > read_from_oldmem(elfcorebuf + sizeof(Elf64_Ehdr), elfcorebuf_sz -
> > sizeof(Elf64_Ehdr), &addr, 0);
> > ((Elf64_Ehdr *)elfcorebuf)->e_phoff = sizeof(Elf64_Ehdr);
>
> Thanks. This is not minor suggestion. This is critical. My code is
> completely broken. On ELF, segments and headers other than ELF header
> can occur in any positions. This means program header table can occur
> after segments. So, on terabyte systems, e_phoff can be more than
> terabytes.

Agreed. It is safer not to copy all the bits up to e_phoff.

Thanks
Vivek