Currently, reading /proc/vmcore is done by read_oldmem(), which calls
ioremap/iounmap for every single page. For example, if memory is 1GB,
ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
times. This causes significant performance degradation.
In particular, the main user of this mmap() is makedumpfile, which not
only reads memory from /proc/vmcore but also does other processing
such as filtering, compression and I/O work.
To address the issue, this patch set implements mmap() on /proc/vmcore
to improve read performance.
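For illustration only, here is a minimal user-space sketch (not part of
this patch set) of how a dump tool could read one window of
/proc/vmcore via mmap() instead of read(); the offset and length
values are hypothetical and would normally be derived from the ELF
headers read from /proc/vmcore first:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	off_t offset = 0x200000;   /* hypothetical page-aligned file offset */
	size_t length = 16 << 20;  /* hypothetical 16MB window */
	int fd = open("/proc/vmcore", O_RDONLY);
	void *p;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* One mmap() replaces many read() calls and their per-page
	 * ioremap/iounmap cost in the kernel. */
	p = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, offset);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* ... filter/compress/write out the window here ... */
	munmap(p, length);
	close(fd);
	return 0;
}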
Benchmark
=========
Below are two benchmarks on terabyte-scale memory systems. Both show
about 40 seconds on a 2TB system, almost equal to the performance of
the experimental kernel-side memory filtering.
- makedumpfile mmap() benchmark, by Jingbai Ma
https://lkml.org/lkml/2013/3/27/19
- makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
https://lkml.org/lkml/2013/3/26/914
ChangeLog
=========
v4 => v5)
- Rebase onto 3.10-rc1.
- Introduce remap_vmalloc_range_partial() in order to remap vmalloc
  memory into part of a vma.
- Allocate the buffer for the ELF note segment in the 2nd kernel with
  vmalloc(). Use remap_vmalloc_range_partial() to remap the memory to
  userspace.
v3 => v4)
- Rebase onto 3.9-rc7.
- Drop clean-up patches orthogonal to the main topic of this patch set.
- Copy ELF note segments into the 2nd kernel just as in v1. Allocate
  vmcore objects per page. => See [PATCH 5/8]
- Map memory referenced by a PT_LOAD entry directly even if the start
  or end of the region doesn't fit on a page boundary, instead of
  copying it as in the previous v3. As a result, holes outside OS
  memory become visible from /proc/vmcore. => See [PATCH 7/8]
v2 => v3)
- Rebase onto 3.9-rc3.
- Copy program headers separately from e_phoff into the ELF note
  segment buffer. Now there's no risk of allocating huge memory if the
  program header table is positioned after the memory segments.
- Add a cleanup patch that removes an unnecessary variable.
- Fix wrongly using the variable holding the buffer size that is
  adjusted at runtime; use the variable holding the original buffer
  size instead.
v1 => v2)
- Clean up the existing code: use e_phoff, and remove the assumption
  on PT_NOTE entries.
- Fix a potential bug where the ELF header size was not included in
  the exported vmcoreinfo size.
- Divide the patch modifying read_vmcore() into two: a clean-up and
  the primary code change.
- Put ELF note segments on a page-size boundary in the 1st kernel
  instead of copying them into a buffer in the 2nd kernel.
Test
====
This patch set is based on v3.10-rc1 and has been tested on x86_64 and
x86_32, both with 1GB and with 5GB (over 4GB) memory configurations.
---
HATAYAMA Daisuke (8):
vmcore: support mmap() on /proc/vmcore
vmcore: calculate vmcore file size from buffer size and total size of vmcore objects
vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory
vmalloc: introduce remap_vmalloc_range_partial
vmalloc: make find_vm_area check in range
vmcore: clean up read_vmcore()
vmcore: allocate buffer for ELF headers on page-size alignment
fs/proc/vmcore.c | 491 ++++++++++++++++++++++++++++++++---------------
include/linux/vmalloc.h | 4
mm/vmalloc.c | 65 ++++--
3 files changed, 386 insertions(+), 174 deletions(-)
--
Thanks.
HATAYAMA, Daisuke
Allocate ELF headers on a page-size boundary using __get_free_pages()
instead of kmalloc().
A later patch will merge PT_NOTE entries into a single unique one and
decrease the buffer size actually used. Keep the original buffer size
in the variable elfcorebuf_sz_orig so the buffer can be freed later,
and keep the actually used buffer size, rounded up to a page-size
boundary, in the variable elfcorebuf_sz.
The size of the part of the ELF buffer exported from /proc/vmcore is
elfcorebuf_sz.
The range [elfcorebuf_sz, elfcorebuf_sz_orig], corresponding to the
merged and removed PT_NOTE entries, is filled with 0.
Use the size of the ELF headers as the initial offset value in
set_vmcore_list_offsets_elf{64,32} and
process_ptload_program_headers_elf{64,32} in order to indicate that
the offset includes the holes up to the page boundary.
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
fs/proc/vmcore.c | 80 ++++++++++++++++++++++++++++++------------------------
1 files changed, 45 insertions(+), 35 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 17f7e08..69e1198 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -32,6 +32,7 @@ static LIST_HEAD(vmcore_list);
/* Stores the pointer to the buffer containing kernel elf core headers. */
static char *elfcorebuf;
static size_t elfcorebuf_sz;
+static size_t elfcorebuf_sz_orig;
/* Total size of vmcore file. */
static u64 vmcore_size;
@@ -214,7 +215,7 @@ static struct vmcore* __init get_new_element(void)
return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
}
-static u64 __init get_vmcore_size_elf64(char *elfptr)
+static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
{
int i;
u64 size;
@@ -223,7 +224,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
ehdr_ptr = (Elf64_Ehdr *)elfptr;
phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
- size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
+ size = elfsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
size += phdr_ptr->p_memsz;
phdr_ptr++;
@@ -231,7 +232,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
return size;
}
-static u64 __init get_vmcore_size_elf32(char *elfptr)
+static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
{
int i;
u64 size;
@@ -240,7 +241,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
ehdr_ptr = (Elf32_Ehdr *)elfptr;
phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
- size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
+ size = elfsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
size += phdr_ptr->p_memsz;
phdr_ptr++;
@@ -308,7 +309,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
phdr.p_flags = 0;
note_off = sizeof(Elf64_Ehdr) +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
- phdr.p_offset = note_off;
+ phdr.p_offset = roundup(note_off, PAGE_SIZE);
phdr.p_vaddr = phdr.p_paddr = 0;
phdr.p_filesz = phdr.p_memsz = phdr_sz;
phdr.p_align = 0;
@@ -322,6 +323,8 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
*elfsz = *elfsz - i;
memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
+ memset(elfptr + *elfsz, 0, i);
+ *elfsz = roundup(*elfsz, PAGE_SIZE);
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -389,7 +392,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
phdr.p_flags = 0;
note_off = sizeof(Elf32_Ehdr) +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
- phdr.p_offset = note_off;
+ phdr.p_offset = roundup(note_off, PAGE_SIZE);
phdr.p_vaddr = phdr.p_paddr = 0;
phdr.p_filesz = phdr.p_memsz = phdr_sz;
phdr.p_align = 0;
@@ -403,6 +406,8 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
*elfsz = *elfsz - i;
memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
+ memset(elfptr + *elfsz, 0, i);
+ *elfsz = roundup(*elfsz, PAGE_SIZE);
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -426,9 +431,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
/* First program header is PT_NOTE header. */
- vmcore_off = sizeof(Elf64_Ehdr) +
- (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
- phdr_ptr->p_memsz; /* Note sections */
+ vmcore_off = elfsz + roundup(phdr_ptr->p_memsz, PAGE_SIZE);
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)
@@ -463,9 +466,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
/* First program header is PT_NOTE header. */
- vmcore_off = sizeof(Elf32_Ehdr) +
- (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
- phdr_ptr->p_memsz; /* Note sections */
+ vmcore_off = elfsz + roundup(phdr_ptr->p_memsz, PAGE_SIZE);
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)
@@ -487,7 +488,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
}
/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf64(char *elfptr,
+static void __init set_vmcore_list_offsets_elf64(char *elfptr, size_t elfsz,
struct list_head *vc_list)
{
loff_t vmcore_off;
@@ -497,8 +498,7 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
ehdr_ptr = (Elf64_Ehdr *)elfptr;
/* Skip Elf header and program headers. */
- vmcore_off = sizeof(Elf64_Ehdr) +
- (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
+ vmcore_off = elfsz;
list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
@@ -507,7 +507,7 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
}
/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf32(char *elfptr,
+static void __init set_vmcore_list_offsets_elf32(char *elfptr, size_t elfsz,
struct list_head *vc_list)
{
loff_t vmcore_off;
@@ -517,8 +517,7 @@ static void __init set_vmcore_list_offsets_elf32(char *elfptr,
ehdr_ptr = (Elf32_Ehdr *)elfptr;
/* Skip Elf header and program headers. */
- vmcore_off = sizeof(Elf32_Ehdr) +
- (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
+ vmcore_off = elfsz;
list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
@@ -554,30 +553,35 @@ static int __init parse_crash_elf64_headers(void)
}
/* Read in all elf headers. */
- elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
- elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+ elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
+ elfcorebuf_sz = elfcorebuf_sz_orig;
+ elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(elfcorebuf_sz_orig));
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
+ rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
if (rc < 0) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
/* Merge all PT_NOTE headers into one. */
rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
&vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
- set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+ set_vmcore_list_offsets_elf64(elfcorebuf, elfcorebuf_sz, &vmcore_list);
return 0;
}
@@ -609,30 +613,35 @@ static int __init parse_crash_elf32_headers(void)
}
/* Read in all elf headers. */
- elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
- elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+ elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
+ elfcorebuf_sz = elfcorebuf_sz_orig;
+ elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(elfcorebuf_sz_orig));
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
+ rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
if (rc < 0) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
/* Merge all PT_NOTE headers into one. */
rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
&vmcore_list);
if (rc) {
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
return rc;
}
- set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+ set_vmcore_list_offsets_elf32(elfcorebuf, elfcorebuf_sz, &vmcore_list);
return 0;
}
@@ -657,14 +666,14 @@ static int __init parse_crash_elf_headers(void)
return rc;
/* Determine vmcore size. */
- vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+ vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
} else if (e_ident[EI_CLASS] == ELFCLASS32) {
rc = parse_crash_elf32_headers();
if (rc)
return rc;
/* Determine vmcore size. */
- vmcore_size = get_vmcore_size_elf32(elfcorebuf);
+ vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
} else {
pr_warn("Warning: Core image elf header is not sane\n");
return -EINVAL;
@@ -711,7 +720,8 @@ void vmcore_cleanup(void)
list_del(&m->list);
kfree(m);
}
- kfree(elfcorebuf);
+ free_pages((unsigned long)elfcorebuf,
+ get_order(elfcorebuf_sz_orig));
elfcorebuf = NULL;
}
EXPORT_SYMBOL_GPL(vmcore_cleanup);
Rewrite the part of read_vmcore() that reads objects in vmcore_list in
the same way as the part reading the ELF headers, which removes some
duplicated and redundant code.
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
fs/proc/vmcore.c | 68 ++++++++++++++++--------------------------------------
1 files changed, 20 insertions(+), 48 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 69e1198..48886e6 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -119,27 +119,6 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
return read;
}
-/* Maps vmcore file offset to respective physical address in memroy. */
-static u64 map_offset_to_paddr(loff_t offset, struct list_head *vc_list,
- struct vmcore **m_ptr)
-{
- struct vmcore *m;
- u64 paddr;
-
- list_for_each_entry(m, vc_list, list) {
- u64 start, end;
- start = m->offset;
- end = m->offset + m->size - 1;
- if (offset >= start && offset <= end) {
- paddr = m->paddr + offset - start;
- *m_ptr = m;
- return paddr;
- }
- }
- *m_ptr = NULL;
- return 0;
-}
-
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
@@ -148,8 +127,8 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
{
ssize_t acc = 0, tmp;
size_t tsz;
- u64 start, nr_bytes;
- struct vmcore *curr_m = NULL;
+ u64 start;
+ struct vmcore *m = NULL;
if (buflen == 0 || *fpos >= vmcore_size)
return 0;
@@ -175,33 +154,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}
- start = map_offset_to_paddr(*fpos, &vmcore_list, &curr_m);
- if (!curr_m)
- return -EINVAL;
-
- while (buflen) {
- tsz = min_t(size_t, buflen, PAGE_SIZE - (start & ~PAGE_MASK));
-
- /* Calculate left bytes in current memory segment. */
- nr_bytes = (curr_m->size - (start - curr_m->paddr));
- if (tsz > nr_bytes)
- tsz = nr_bytes;
-
- tmp = read_from_oldmem(buffer, tsz, &start, 1);
- if (tmp < 0)
- return tmp;
- buflen -= tsz;
- *fpos += tsz;
- buffer += tsz;
- acc += tsz;
- if (start >= (curr_m->paddr + curr_m->size)) {
- if (curr_m->list.next == &vmcore_list)
- return acc; /*EOF*/
- curr_m = list_entry(curr_m->list.next,
- struct vmcore, list);
- start = curr_m->paddr;
+ list_for_each_entry(m, &vmcore_list, list) {
+ if (*fpos < m->offset + m->size) {
+ tsz = m->offset + m->size - *fpos;
+ if (buflen < tsz)
+ tsz = buflen;
+ start = m->paddr + *fpos - m->offset;
+ tmp = read_from_oldmem(buffer, tsz, &start, 1);
+ if (tmp < 0)
+ return tmp;
+ buflen -= tsz;
+ *fpos += tsz;
+ buffer += tsz;
+ acc += tsz;
+
+ /* leave now if filled buffer already */
+ if (buflen == 0)
+ return acc;
}
}
+
return acc;
}
Currently, __find_vmap_area searches for the kernel VM area starting
at a given address. This patch changes this behavior so that it
searches for the kernel VM area to which the address belongs. This
change is needed by remap_vmalloc_range_partial, introduced in a later
patch, which can receive any position within a kernel VM area as its
target address.
This patch changes the condition (addr > va->va_start) to the
equivalent (addr >= va->va_end) by taking advantage of the fact that
kernel VM areas are non-overlapping.
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
mm/vmalloc.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d365724..3875fa2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -292,7 +292,7 @@ static struct vmap_area *__find_vmap_area(unsigned long addr)
va = rb_entry(n, struct vmap_area, rb_node);
if (addr < va->va_start)
n = n->rb_left;
- else if (addr > va->va_start)
+ else if (addr >= va->va_end)
n = n->rb_right;
else
return va;
The reason why we don't allocate the ELF note segment in the 1st
kernel (old memory) on a page boundary is to keep backward
compatibility for old kernels, and because doing so would waste a
non-trivial amount of memory by rounding each buffer up to a page
boundary, since most of the buffers are in per-cpu areas.
ELF notes are per-cpu, so the total size of the ELF note segments
depends on the number of CPUs. The current maximum number of CPUs on
x86_64 is 5192, and there's already a system with 4192 CPUs from SGI,
where the total size amounts to 1MB. This can be larger in the near
future, or possibly even now on another architecture that has a larger
note size per CPU. Thus, to avoid the case where memory allocation for
a large block fails, we allocate the ELF note segment buffer in
vmalloc memory.
This patch adds the elfnotesegbuf and elfnotesegbuf_sz variables to
keep the pointer to the ELF note segment buffer and its size. There's
no longer a vmcore object corresponding to the ELF note segment in
vmcore_list. Accordingly, read_vmcore() gets a new case for the ELF
note segment, and set_vmcore_list_offsets_elf{64,32}() and other
helper functions start calculating offsets from the sum of the size of
the ELF headers and the size of the ELF note segment.
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
fs/proc/vmcore.c | 225 ++++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 165 insertions(+), 60 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 48886e6..795efd2 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -34,6 +34,9 @@ static char *elfcorebuf;
static size_t elfcorebuf_sz;
static size_t elfcorebuf_sz_orig;
+static char *elfnotesegbuf;
+static size_t elfnotesegbuf_sz;
+
/* Total size of vmcore file. */
static u64 vmcore_size;
@@ -154,6 +157,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}
+ /* Read Elf note segment */
+ if (*fpos < elfcorebuf_sz + elfnotesegbuf_sz) {
+ void *kaddr;
+
+ tsz = elfcorebuf_sz + elfnotesegbuf_sz - *fpos;
+ if (buflen < tsz)
+ tsz = buflen;
+ kaddr = elfnotesegbuf + *fpos - elfcorebuf_sz;
+ if (copy_to_user(buffer, kaddr, tsz))
+ return -EFAULT;
+ buflen -= tsz;
+ *fpos += tsz;
+ buffer += tsz;
+ acc += tsz;
+
+ /* leave now if filled buffer already */
+ if (buflen == 0)
+ return acc;
+ }
+
list_for_each_entry(m, &vmcore_list, list) {
if (*fpos < m->offset + m->size) {
tsz = m->offset + m->size - *fpos;
@@ -221,23 +244,18 @@ static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
return size;
}
-/* Merges all the PT_NOTE headers into one. */
-static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
- struct list_head *vc_list)
+static int __init process_note_headers_elf64(const Elf64_Ehdr *ehdr_ptr,
+ int *nr_ptnotep, u64 *phdr_szp,
+ char *notesegp)
{
int i, nr_ptnote=0, rc=0;
- char *tmp;
- Elf64_Ehdr *ehdr_ptr;
- Elf64_Phdr phdr, *phdr_ptr;
+ Elf64_Phdr *phdr_ptr = (Elf64_Phdr*)(ehdr_ptr + 1);
Elf64_Nhdr *nhdr_ptr;
- u64 phdr_sz = 0, note_off;
+ u64 phdr_sz = 0;
- ehdr_ptr = (Elf64_Ehdr *)elfptr;
- phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
int j;
void *notes_section;
- struct vmcore *new;
u64 offset, max_sz, sz, real_sz = 0;
if (phdr_ptr->p_type != PT_NOTE)
continue;
@@ -262,20 +280,62 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
real_sz += sz;
nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
}
-
- /* Add this contiguous chunk of notes section to vmcore list.*/
- new = get_new_element();
- if (!new) {
- kfree(notes_section);
- return -ENOMEM;
+ if (notesegp) {
+ offset = phdr_ptr->p_offset;
+ rc = read_from_oldmem(notesegp + phdr_sz, real_sz,
+ &offset, 0);
+ if (rc < 0) {
+ kfree(notes_section);
+ return rc;
+ }
}
- new->paddr = phdr_ptr->p_offset;
- new->size = real_sz;
- list_add_tail(&new->list, vc_list);
phdr_sz += real_sz;
kfree(notes_section);
}
+ if (nr_ptnotep)
+ *nr_ptnotep = nr_ptnote;
+ if (phdr_szp)
+ *phdr_szp = phdr_sz;
+
+ return 0;
+}
+
+/* Merges all the PT_NOTE headers into one. */
+static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
+ char **notesegptr, size_t *notesegsz,
+ struct list_head *vc_list)
+{
+ int i, nr_ptnote=0, rc=0;
+ char *tmp;
+ Elf64_Ehdr *ehdr_ptr;
+ Elf64_Phdr phdr;
+ u64 phdr_sz = 0, note_off;
+ struct vm_struct *vm;
+
+ ehdr_ptr = (Elf64_Ehdr *)elfptr;
+
+ /* The first path calculates the number of PT_NOTE entries and
+ * total size of ELF note segment. */
+ rc = process_note_headers_elf64(ehdr_ptr, &nr_ptnote, &phdr_sz, NULL);
+ if (rc < 0)
+ return rc;
+
+ *notesegsz = roundup(phdr_sz, PAGE_SIZE);
+ *notesegptr = vzalloc(*notesegsz);
+ if (!*notesegptr)
+ return -ENOMEM;
+
+ vm = find_vm_area(*notesegptr);
+ BUG_ON(!vm);
+ vm->flags |= VM_USERMAP;
+
+ /* The second path copies the ELF note segment in the ELF note
+ * segment buffer. */
+ rc = process_note_headers_elf64(ehdr_ptr, NULL, NULL, *notesegptr);
+ if (rc < 0)
+ return rc;
+
/* Prepare merged PT_NOTE program header. */
phdr.p_type = PT_NOTE;
phdr.p_flags = 0;
@@ -304,23 +364,18 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
return 0;
}
-/* Merges all the PT_NOTE headers into one. */
-static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
- struct list_head *vc_list)
+static int __init process_note_headers_elf32(const Elf32_Ehdr *ehdr_ptr,
+ int *nr_ptnotep, u64 *phdr_szp,
+ char *notesegp)
{
int i, nr_ptnote=0, rc=0;
- char *tmp;
- Elf32_Ehdr *ehdr_ptr;
- Elf32_Phdr phdr, *phdr_ptr;
+ Elf32_Phdr *phdr_ptr = (Elf32_Phdr*)(ehdr_ptr + 1);
Elf32_Nhdr *nhdr_ptr;
- u64 phdr_sz = 0, note_off;
+ u64 phdr_sz = 0;
- ehdr_ptr = (Elf32_Ehdr *)elfptr;
- phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
int j;
void *notes_section;
- struct vmcore *new;
u64 offset, max_sz, sz, real_sz = 0;
if (phdr_ptr->p_type != PT_NOTE)
continue;
@@ -345,20 +400,62 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
real_sz += sz;
nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
}
-
- /* Add this contiguous chunk of notes section to vmcore list.*/
- new = get_new_element();
- if (!new) {
- kfree(notes_section);
- return -ENOMEM;
+ if (notesegp) {
+ offset = phdr_ptr->p_offset;
+ rc = read_from_oldmem(notesegp + phdr_sz, real_sz,
+ &offset, 0);
+ if (rc < 0) {
+ kfree(notes_section);
+ return rc;
+ }
}
- new->paddr = phdr_ptr->p_offset;
- new->size = real_sz;
- list_add_tail(&new->list, vc_list);
phdr_sz += real_sz;
kfree(notes_section);
}
+ if (nr_ptnotep)
+ *nr_ptnotep = nr_ptnote;
+ if (phdr_szp)
+ *phdr_szp = phdr_sz;
+
+ return 0;
+}
+
+/* Merges all the PT_NOTE headers into one. */
+static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
+ char **notesegptr, size_t *notesegsz,
+ struct list_head *vc_list)
+{
+ int i, nr_ptnote=0, rc=0;
+ char *tmp;
+ Elf32_Ehdr *ehdr_ptr;
+ Elf32_Phdr phdr;
+ u64 phdr_sz = 0, note_off;
+ struct vm_struct *vm;
+
+ ehdr_ptr = (Elf32_Ehdr *)elfptr;
+
+ /* The first path calculates the number of PT_NOTE entries and
+ * total size of ELF note segment. */
+ rc = process_note_headers_elf32(ehdr_ptr, &nr_ptnote, &phdr_sz, NULL);
+ if (rc < 0)
+ return rc;
+
+ *notesegsz = roundup(phdr_sz, PAGE_SIZE);
+ *notesegptr = vzalloc(*notesegsz);
+ if (!*notesegptr)
+ return -ENOMEM;
+
+ vm = find_vm_area(*notesegptr);
+ BUG_ON(!vm);
+ vm->flags |= VM_USERMAP;
+
+ /* The second path copies the ELF note segment in the ELF note
+ * segment buffer. */
+ rc = process_note_headers_elf32(ehdr_ptr, NULL, NULL, *notesegptr);
+ if (rc < 0)
+ return rc;
+
/* Prepare merged PT_NOTE program header. */
phdr.p_type = PT_NOTE;
phdr.p_flags = 0;
@@ -391,6 +488,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
* the new offset fields of exported program headers. */
static int __init process_ptload_program_headers_elf64(char *elfptr,
size_t elfsz,
+ size_t elfnotesegsz,
struct list_head *vc_list)
{
int i;
@@ -402,8 +500,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
ehdr_ptr = (Elf64_Ehdr *)elfptr;
phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
- /* First program header is PT_NOTE header. */
- vmcore_off = elfsz + roundup(phdr_ptr->p_memsz, PAGE_SIZE);
+ /* Skip Elf header, program headers and Elf note segment. */
+ vmcore_off = elfsz + elfnotesegsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)
@@ -426,6 +524,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
static int __init process_ptload_program_headers_elf32(char *elfptr,
size_t elfsz,
+ size_t elfnotesegsz,
struct list_head *vc_list)
{
int i;
@@ -437,8 +536,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
ehdr_ptr = (Elf32_Ehdr *)elfptr;
phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
- /* First program header is PT_NOTE header. */
- vmcore_off = elfsz + roundup(phdr_ptr->p_memsz, PAGE_SIZE);
+ /* Skip Elf header, program headers and Elf note segment. */
+ vmcore_off = elfsz + elfnotesegsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)
@@ -460,17 +559,15 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
}
/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf64(char *elfptr, size_t elfsz,
+static void __init set_vmcore_list_offsets_elf64(size_t elfsz,
+ size_t elfnotesegsz,
struct list_head *vc_list)
{
loff_t vmcore_off;
- Elf64_Ehdr *ehdr_ptr;
struct vmcore *m;
- ehdr_ptr = (Elf64_Ehdr *)elfptr;
-
- /* Skip Elf header and program headers. */
- vmcore_off = elfsz;
+ /* Skip Elf header, program headers and Elf note segment. */
+ vmcore_off = elfsz + elfnotesegsz;
list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
@@ -479,17 +576,15 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr, size_t elfsz,
}
/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf32(char *elfptr, size_t elfsz,
+static void __init set_vmcore_list_offsets_elf32(size_t elfsz,
+ size_t elfnotesegsz,
struct list_head *vc_list)
{
loff_t vmcore_off;
- Elf32_Ehdr *ehdr_ptr;
struct vmcore *m;
- ehdr_ptr = (Elf32_Ehdr *)elfptr;
-
- /* Skip Elf header and program headers. */
- vmcore_off = elfsz;
+ /* Skip Elf header, program headers and Elf note segment. */
+ vmcore_off = elfsz + elfnotesegsz;
list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
@@ -540,20 +635,24 @@ static int __init parse_crash_elf64_headers(void)
}
/* Merge all PT_NOTE headers into one. */
- rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
+ rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz,
+ &elfnotesegbuf, &elfnotesegbuf_sz,
+ &vmcore_list);
if (rc) {
free_pages((unsigned long)elfcorebuf,
get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
- &vmcore_list);
+ elfnotesegbuf_sz,
+ &vmcore_list);
if (rc) {
free_pages((unsigned long)elfcorebuf,
get_order(elfcorebuf_sz_orig));
return rc;
}
- set_vmcore_list_offsets_elf64(elfcorebuf, elfcorebuf_sz, &vmcore_list);
+ set_vmcore_list_offsets_elf64(elfcorebuf_sz, elfnotesegbuf_sz,
+ &vmcore_list);
return 0;
}
@@ -600,20 +699,24 @@ static int __init parse_crash_elf32_headers(void)
}
/* Merge all PT_NOTE headers into one. */
- rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
+ rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz,
+ &elfnotesegbuf, &elfnotesegbuf_sz,
+ &vmcore_list);
if (rc) {
free_pages((unsigned long)elfcorebuf,
get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
- &vmcore_list);
+ elfnotesegbuf_sz,
+ &vmcore_list);
if (rc) {
free_pages((unsigned long)elfcorebuf,
get_order(elfcorebuf_sz_orig));
return rc;
}
- set_vmcore_list_offsets_elf32(elfcorebuf, elfcorebuf_sz, &vmcore_list);
+ set_vmcore_list_offsets_elf32(elfcorebuf_sz, elfnotesegbuf_sz,
+ &vmcore_list);
return 0;
}
@@ -692,6 +795,8 @@ void vmcore_cleanup(void)
list_del(&m->list);
kfree(m);
}
+ vfree(elfnotesegbuf);
+ elfnotesegbuf = NULL;
free_pages((unsigned long)elfcorebuf,
get_order(elfcorebuf_sz_orig));
elfcorebuf = NULL;
We want to allocate the ELF note segment buffer in the 2nd kernel in
vmalloc space and remap it to user-space, in order to reduce the risk
that memory allocation fails on systems with a huge number of CPUs and
hence a huge ELF note segment that exceeds the 11-order block size.
Although remap_vmalloc_range already exists for the purpose of
remapping vmalloc memory to user-space, it requires the user-space
range to be specified via a vma. mmap() on /proc/vmcore needs to remap
a range across multiple objects, so an interface that requires the vma
to cover the full range is problematic.
This patch introduces remap_vmalloc_range_partial, which receives the
user-space range as a pair of base address and size and can be used
for the mmap-on-/proc/vmcore case.
remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
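For illustration only, a condensed sketch (not code from this series)
of how a hypothetical driver mmap handler could use the new interface
to map a vzalloc'ed, VM_USERMAP-marked buffer at an offset inside the
vma; example_buf, example_buf_sz and EXAMPLE_HDR_LEN are made-up names:

#include <linux/mm.h>
#include <linux/vmalloc.h>

static char *example_buf;         /* assumed vzalloc()'ed, VM_USERMAP set */
static size_t example_buf_sz;     /* page-aligned size of example_buf */
#define EXAMPLE_HDR_LEN PAGE_SIZE /* hypothetical header part of the vma */

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	size_t len = vma->vm_end - vma->vm_start;

	if (len < EXAMPLE_HDR_LEN || len - EXAMPLE_HDR_LEN > example_buf_sz)
		return -EINVAL;

	/* Unlike remap_vmalloc_range(), the target user address need not
	 * be vma->vm_start and the size need not cover the whole vma. */
	return remap_vmalloc_range_partial(vma, vma->vm_start + EXAMPLE_HDR_LEN,
					   example_buf, len - EXAMPLE_HDR_LEN);
}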
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
include/linux/vmalloc.h | 4 +++
mm/vmalloc.c | 63 +++++++++++++++++++++++++++++++++--------------
2 files changed, 48 insertions(+), 19 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 7d5773a..dd0a2c8 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -82,6 +82,10 @@ extern void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
extern void vunmap(const void *addr);
+extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
+ unsigned long uaddr, void *kaddr,
+ unsigned long size);
+
extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
void vmalloc_sync_all(void);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3875fa2..d9a9f4f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2148,42 +2148,44 @@ finished:
}
/**
- * remap_vmalloc_range - map vmalloc pages to userspace
- * @vma: vma to cover (map full range of vma)
- * @addr: vmalloc memory
- * @pgoff: number of pages into addr before first page to map
+ * remap_vmalloc_range_partial - map vmalloc pages to userspace
+ * @vma: vma to cover
+ * @uaddr: target user address to start at
+ * @kaddr: virtual address of vmalloc kernel memory
+ * @size: size of map area
*
* Returns: 0 for success, -Exxx on failure
*
- * This function checks that addr is a valid vmalloc'ed area, and
- * that it is big enough to cover the vma. Will return failure if
- * that criteria isn't met.
+ * This function checks that @kaddr is a valid vmalloc'ed area,
+ * and that it is big enough to cover the range starting at
+ * @uaddr in @vma. Will return failure if that criteria isn't
+ * met.
*
* Similar to remap_pfn_range() (see mm/memory.c)
*/
-int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
- unsigned long pgoff)
+int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
+ void *kaddr, unsigned long size)
{
struct vm_struct *area;
- unsigned long uaddr = vma->vm_start;
- unsigned long usize = vma->vm_end - vma->vm_start;
- if ((PAGE_SIZE-1) & (unsigned long)addr)
+ size = PAGE_ALIGN(size);
+
+ if (((PAGE_SIZE-1) & (unsigned long)uaddr) ||
+ ((PAGE_SIZE-1) & (unsigned long)kaddr))
return -EINVAL;
- area = find_vm_area(addr);
+ area = find_vm_area(kaddr);
if (!area)
return -EINVAL;
if (!(area->flags & VM_USERMAP))
return -EINVAL;
- if (usize + (pgoff << PAGE_SHIFT) > area->size - PAGE_SIZE)
+ if (kaddr + size > area->addr + area->size)
return -EINVAL;
- addr += pgoff << PAGE_SHIFT;
do {
- struct page *page = vmalloc_to_page(addr);
+ struct page *page = vmalloc_to_page(kaddr);
int ret;
ret = vm_insert_page(vma, uaddr, page);
@@ -2191,14 +2193,37 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
return ret;
uaddr += PAGE_SIZE;
- addr += PAGE_SIZE;
- usize -= PAGE_SIZE;
- } while (usize > 0);
+ kaddr += PAGE_SIZE;
+ size -= PAGE_SIZE;
+ } while (size > 0);
vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
return 0;
}
+EXPORT_SYMBOL(remap_vmalloc_range_partial);
+
+/**
+ * remap_vmalloc_range - map vmalloc pages to userspace
+ * @vma: vma to cover (map full range of vma)
+ * @addr: vmalloc memory
+ * @pgoff: number of pages into addr before first page to map
+ *
+ * Returns: 0 for success, -Exxx on failure
+ *
+ * This function checks that addr is a valid vmalloc'ed area, and
+ * that it is big enough to cover the vma. Will return failure if
+ * that criteria isn't met.
+ *
+ * Similar to remap_pfn_range() (see mm/memory.c)
+ */
+int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
+ unsigned long pgoff)
+{
+ return remap_vmalloc_range_partial(vma, vma->vm_start,
+ addr + (pgoff << PAGE_SHIFT),
+ vma->vm_end - vma->vm_start);
+}
EXPORT_SYMBOL(remap_vmalloc_range);
/*
Treat memory chunks referenced by PT_LOAD program header entries on
page-size boundaries in vmcore_list. Formally, for each range [start,
end], we set the corresponding vmcore object in vmcore_list to
[rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
This change affects the layout of /proc/vmcore. The gaps generated by
the rearrangement become newly visible to applications as holes.
Concretely, they are the two ranges [rounddown(start, PAGE_SIZE),
start] and [end, roundup(end, PAGE_SIZE)].
Suppose the variable m points at a vmcore object in vmcore_list, and
the variable phdr points at the PT_LOAD program header that m
corresponds to. Then, pictorially:
  m->offset                     +---------------+
                                |     hole      |
  phdr->p_offset =              +---------------+
   m->offset + (paddr - start)  |               |\
                                | kernel memory | phdr->p_memsz
                                |               |/
                                +---------------+
                                |     hole      |
  m->offset + m->size           +---------------+
where m->offset and m->offset + m->size are always page-size aligned.
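A worked example may help; this is a sketch with made-up values, as a
kernel-style fragment assuming the rounddown()/roundup() helpers and a
4KB PAGE_SIZE:

/* Hypothetical PT_LOAD values, PAGE_SIZE == 4KB. */
u64 paddr = 0x1000234;                          /* phdr->p_offset (physical) */
u64 memsz = 0x2800;                             /* phdr->p_memsz */
u64 start = rounddown(paddr, PAGE_SIZE);        /* 0x1000000 */
u64 end   = roundup(paddr + memsz, PAGE_SIZE);  /* 0x1003000 */
u64 size  = end - start;                        /* 0x3000, becomes m->size */
/* leading hole:  paddr - start         = 0x234
 * trailing hole: end - (paddr + memsz) = 0x5cc
 * exported p_offset becomes vmcore_off + (paddr - start). */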
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
fs/proc/vmcore.c | 30 ++++++++++++++++++++++--------
1 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 795efd2..eb7ff29 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -504,20 +504,27 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
vmcore_off = elfsz + elfnotesegsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+ u64 paddr, start, end, size;
+
if (phdr_ptr->p_type != PT_LOAD)
continue;
+ paddr = phdr_ptr->p_offset;
+ start = rounddown(paddr, PAGE_SIZE);
+ end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
+ size = end - start;
+
/* Add this contiguous chunk of memory to vmcore list.*/
new = get_new_element();
if (!new)
return -ENOMEM;
- new->paddr = phdr_ptr->p_offset;
- new->size = phdr_ptr->p_memsz;
+ new->paddr = start;
+ new->size = size;
list_add_tail(&new->list, vc_list);
/* Update the program header offset. */
- phdr_ptr->p_offset = vmcore_off;
- vmcore_off = vmcore_off + phdr_ptr->p_memsz;
+ phdr_ptr->p_offset = vmcore_off + (paddr - start);
+ vmcore_off = vmcore_off + size;
}
return 0;
}
@@ -540,20 +547,27 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
vmcore_off = elfsz + elfnotesegsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+ u64 paddr, start, end, size;
+
if (phdr_ptr->p_type != PT_LOAD)
continue;
+ paddr = phdr_ptr->p_offset;
+ start = rounddown(paddr, PAGE_SIZE);
+ end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
+ size = end - start;
+
/* Add this contiguous chunk of memory to vmcore list.*/
new = get_new_element();
if (!new)
return -ENOMEM;
- new->paddr = phdr_ptr->p_offset;
- new->size = phdr_ptr->p_memsz;
+ new->paddr = start;
+ new->size = size;
list_add_tail(&new->list, vc_list);
/* Update the program header offset */
- phdr_ptr->p_offset = vmcore_off;
- vmcore_off = vmcore_off + phdr_ptr->p_memsz;
+ phdr_ptr->p_offset = vmcore_off + (paddr - start);
+ vmcore_off = vmcore_off + size;
}
return 0;
}
The previous patches newly added holes before each chunk of memory, and
the holes need to be counted in the vmcore file size. There are two
ways to calculate the file size:
1) suppose p is a pointer to the last program header entry of PT_LOAD
   type; then the size is roundup(p->p_offset + p->p_memsz, PAGE_SIZE), or
2) calculate the sum of the sizes of the buffers for the ELF header,
   program headers, ELF note segment and the objects in vmcore_list.
Although 1) is more direct and simpler than 2), 2) seems better in
that it reflects the internal object structure of /proc/vmcore. Thus,
this patch changes get_vmcore_size_elf{64,32} so that they calculate
the size in the way of 2).
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
fs/proc/vmcore.c | 40 ++++++++++++++++++----------------------
1 files changed, 18 insertions(+), 22 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index eb7ff29..ad6da17 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -210,36 +210,28 @@ static struct vmcore* __init get_new_element(void)
return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
}
-static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
+static u64 __init get_vmcore_size_elf64(size_t elfsz, size_t elfnotesegsz,
+ struct list_head *vc_list)
{
- int i;
u64 size;
- Elf64_Ehdr *ehdr_ptr;
- Elf64_Phdr *phdr_ptr;
+ struct vmcore *m;
- ehdr_ptr = (Elf64_Ehdr *)elfptr;
- phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
- size = elfsz;
- for (i = 0; i < ehdr_ptr->e_phnum; i++) {
- size += phdr_ptr->p_memsz;
- phdr_ptr++;
+ size = elfsz + elfnotesegsz;
+ list_for_each_entry(m, vc_list, list) {
+ size += m->size;
}
return size;
}
-static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
+static u64 __init get_vmcore_size_elf32(size_t elfsz, size_t elfnotesegsz,
+ struct list_head *vc_list)
{
- int i;
u64 size;
- Elf32_Ehdr *ehdr_ptr;
- Elf32_Phdr *phdr_ptr;
+ struct vmcore *m;
- ehdr_ptr = (Elf32_Ehdr *)elfptr;
- phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
- size = elfsz;
- for (i = 0; i < ehdr_ptr->e_phnum; i++) {
- size += phdr_ptr->p_memsz;
- phdr_ptr++;
+ size = elfsz + elfnotesegsz;
+ list_for_each_entry(m, vc_list, list) {
+ size += m->size;
}
return size;
}
@@ -755,14 +747,18 @@ static int __init parse_crash_elf_headers(void)
return rc;
/* Determine vmcore size. */
- vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
+ vmcore_size = get_vmcore_size_elf64(elfcorebuf_sz,
+ elfnotesegbuf_sz,
+ &vmcore_list);
} else if (e_ident[EI_CLASS] == ELFCLASS32) {
rc = parse_crash_elf32_headers();
if (rc)
return rc;
/* Determine vmcore size. */
- vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
+ vmcore_size = get_vmcore_size_elf32(elfcorebuf_sz,
+ elfnotesegbuf_sz,
+ &vmcore_list);
} else {
pr_warn("Warning: Core image elf header is not sane\n");
return -EINVAL;
This patch introduces mmap_vmcore().
Don't permit writable or executable mappings even with mprotect(),
because this mmap() is aimed at reading crash dump memory.
A non-writable mapping is also a requirement of remap_pfn_range() when
mapping linear pages onto non-consecutive physical pages; see
is_cow_mapping().
Set the VM_MIXEDMAP flag so that memory can be remapped by
remap_pfn_range() and by remap_vmalloc_range_partial() at the same
time for a single vma. do_munmap() can then correctly clean up a vma
partially remapped by the two functions in the abnormal case. See
zap_pte_range(), vm_normal_page() and their comments for details.
On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is a 32-bit unsigned long on x86-32.
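The 16TB figure follows from simple arithmetic (a sketch, assuming 4KB
pages):

/* A 32-bit pfn can index 2^32 page frames of 4KB each:
 * 2^32 * 2^12 = 2^44 bytes = 16TB of addressable physical memory. */
unsigned long long x86_32_mmap_limit = (1ULL << 32) * 4096ULL; /* 16TB */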
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
fs/proc/vmcore.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 86 insertions(+), 0 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index ad6da17..d4b88f6 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -20,6 +20,7 @@
#include <linux/init.h>
#include <linux/crash_dump.h>
#include <linux/list.h>
+#include <linux/vmalloc.h>
#include <asm/uaccess.h>
#include <asm/io.h>
#include "internal.h"
@@ -200,9 +201,94 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+{
+ size_t size = vma->vm_end - vma->vm_start;
+ u64 start, end, len, tsz;
+ struct vmcore *m;
+
+ start = (u64)vma->vm_pgoff << PAGE_SHIFT;
+ end = start + size;
+
+ if (size > vmcore_size || end > vmcore_size)
+ return -EINVAL;
+
+ if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+ return -EPERM;
+
+ vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+ vma->vm_flags |= VM_MIXEDMAP;
+
+ len = 0;
+
+ if (start < elfcorebuf_sz) {
+ u64 pfn;
+
+ tsz = elfcorebuf_sz - start;
+ if (size < tsz)
+ tsz = size;
+ pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
+ if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
+ vma->vm_page_prot))
+ return -EAGAIN;
+ size -= tsz;
+ start += tsz;
+ len += tsz;
+
+ if (size == 0)
+ return 0;
+ }
+
+ if (start < elfcorebuf_sz + elfnotesegbuf_sz) {
+ void *kaddr;
+
+ tsz = elfcorebuf_sz + elfnotesegbuf_sz - start;
+ if (size < tsz)
+ tsz = size;
+ kaddr = elfnotesegbuf + start - elfcorebuf_sz;
+ if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
+ kaddr, tsz)) {
+ do_munmap(vma->vm_mm, vma->vm_start, len);
+ return -EAGAIN;
+ }
+ size -= tsz;
+ start += tsz;
+ len += tsz;
+
+ if (size == 0)
+ return 0;
+ }
+
+ list_for_each_entry(m, &vmcore_list, list) {
+ if (start < m->offset + m->size) {
+ u64 paddr = 0;
+
+ tsz = m->offset + m->size - start;
+ if (size < tsz)
+ tsz = size;
+ paddr = m->paddr + start - m->offset;
+ if (remap_pfn_range(vma, vma->vm_start + len,
+ paddr >> PAGE_SHIFT, tsz,
+ vma->vm_page_prot)) {
+ do_munmap(vma->vm_mm, vma->vm_start, len);
+ return -EAGAIN;
+ }
+ size -= tsz;
+ start += tsz;
+ len += tsz;
+
+ if (size == 0)
+ return 0;
+ }
+ }
+
+ return 0;
+}
+
static const struct file_operations proc_vmcore_operations = {
.read = read_vmcore,
.llseek = default_llseek,
+ .mmap = mmap_vmcore,
};
static struct vmcore* __init get_new_element(void)
On Tue, May 14, 2013 at 10:57:12AM +0900, HATAYAMA Daisuke wrote:
> Allocate ELF headers on page-size boundary using __get_free_pages()
> instead of kmalloc().
>
> Later patch will merge PT_NOTE entries into a single unique one and
> decrease the buffer size actually used. Keep original buffer size in
> variable elfcorebuf_sz_orig to kfree the buffer later and actually
> used buffer size with rounded up to page-size boundary in variable
> elfcorebuf_sz separately.
>
> The size of part of the ELF buffer exported from /proc/vmcore is
> elfcorebuf_sz.
>
> The merged, removed PT_NOTE entries, i.e. the range [elfcorebuf_sz,
> elfcorebuf_sz_orig], is filled with 0.
>
> Use size of the ELF headers as an initial offset value in
> set_vmcore_list_offsets_elf{64,32} and
> process_ptload_program_headers_elf{64,32} in order to indicate that
> the offset includes the holes towards the page boundary.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
Looks good to me.
Acked-by: Vivek Goyal <[email protected]>
Vivek
> ---
>
> fs/proc/vmcore.c | 80 ++++++++++++++++++++++++++++++------------------------
> 1 files changed, 45 insertions(+), 35 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 17f7e08..69e1198 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -32,6 +32,7 @@ static LIST_HEAD(vmcore_list);
> /* Stores the pointer to the buffer containing kernel elf core headers. */
> static char *elfcorebuf;
> static size_t elfcorebuf_sz;
> +static size_t elfcorebuf_sz_orig;
>
> /* Total size of vmcore file. */
> static u64 vmcore_size;
> @@ -214,7 +215,7 @@ static struct vmcore* __init get_new_element(void)
> return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
> }
>
> -static u64 __init get_vmcore_size_elf64(char *elfptr)
> +static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
> {
> int i;
> u64 size;
> @@ -223,7 +224,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
>
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
> phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> - size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
> + size = elfsz;
> for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> size += phdr_ptr->p_memsz;
> phdr_ptr++;
> @@ -231,7 +232,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
> return size;
> }
>
> -static u64 __init get_vmcore_size_elf32(char *elfptr)
> +static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
> {
> int i;
> u64 size;
> @@ -240,7 +241,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
>
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
> phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> - size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
> + size = elfsz;
> for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> size += phdr_ptr->p_memsz;
> phdr_ptr++;
> @@ -308,7 +309,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> phdr.p_flags = 0;
> note_off = sizeof(Elf64_Ehdr) +
> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> - phdr.p_offset = note_off;
> + phdr.p_offset = roundup(note_off, PAGE_SIZE);
> phdr.p_vaddr = phdr.p_paddr = 0;
> phdr.p_filesz = phdr.p_memsz = phdr_sz;
> phdr.p_align = 0;
> @@ -322,6 +323,8 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
> *elfsz = *elfsz - i;
> memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
> + memset(elfptr + *elfsz, 0, i);
> + *elfsz = roundup(*elfsz, PAGE_SIZE);
>
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
> @@ -389,7 +392,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> phdr.p_flags = 0;
> note_off = sizeof(Elf32_Ehdr) +
> (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
> - phdr.p_offset = note_off;
> + phdr.p_offset = roundup(note_off, PAGE_SIZE);
> phdr.p_vaddr = phdr.p_paddr = 0;
> phdr.p_filesz = phdr.p_memsz = phdr_sz;
> phdr.p_align = 0;
> @@ -403,6 +406,8 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
> *elfsz = *elfsz - i;
> memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
> + memset(elfptr + *elfsz, 0, i);
> + *elfsz = roundup(*elfsz, PAGE_SIZE);
>
> /* Modify e_phnum to reflect merged headers. */
> ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
> @@ -426,9 +431,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
>
> /* First program header is PT_NOTE header. */
> - vmcore_off = sizeof(Elf64_Ehdr) +
> - (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
> - phdr_ptr->p_memsz; /* Note sections */
> + vmcore_off = elfsz + roundup(phdr_ptr->p_memsz, PAGE_SIZE);
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> if (phdr_ptr->p_type != PT_LOAD)
> @@ -463,9 +466,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
>
> /* First program header is PT_NOTE header. */
> - vmcore_off = sizeof(Elf32_Ehdr) +
> - (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
> - phdr_ptr->p_memsz; /* Note sections */
> + vmcore_off = elfsz + roundup(phdr_ptr->p_memsz, PAGE_SIZE);
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> if (phdr_ptr->p_type != PT_LOAD)
> @@ -487,7 +488,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> }
>
> /* Sets offset fields of vmcore elements. */
> -static void __init set_vmcore_list_offsets_elf64(char *elfptr,
> +static void __init set_vmcore_list_offsets_elf64(char *elfptr, size_t elfsz,
> struct list_head *vc_list)
> {
> loff_t vmcore_off;
> @@ -497,8 +498,7 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
>
> /* Skip Elf header and program headers. */
> - vmcore_off = sizeof(Elf64_Ehdr) +
> - (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
> + vmcore_off = elfsz;
>
> list_for_each_entry(m, vc_list, list) {
> m->offset = vmcore_off;
> @@ -507,7 +507,7 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
> }
>
> /* Sets offset fields of vmcore elements. */
> -static void __init set_vmcore_list_offsets_elf32(char *elfptr,
> +static void __init set_vmcore_list_offsets_elf32(char *elfptr, size_t elfsz,
> struct list_head *vc_list)
> {
> loff_t vmcore_off;
> @@ -517,8 +517,7 @@ static void __init set_vmcore_list_offsets_elf32(char *elfptr,
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
>
> /* Skip Elf header and program headers. */
> - vmcore_off = sizeof(Elf32_Ehdr) +
> - (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
> + vmcore_off = elfsz;
>
> list_for_each_entry(m, vc_list, list) {
> m->offset = vmcore_off;
> @@ -554,30 +553,35 @@ static int __init parse_crash_elf64_headers(void)
> }
>
> /* Read in all elf headers. */
> - elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> + elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> + elfcorebuf_sz = elfcorebuf_sz_orig;
> + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> + get_order(elfcorebuf_sz_orig));
> if (!elfcorebuf)
> return -ENOMEM;
> addr = elfcorehdr_addr;
> - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> + rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
> if (rc < 0) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
>
> /* Merge all PT_NOTE headers into one. */
> rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
> &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> - set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
> + set_vmcore_list_offsets_elf64(elfcorebuf, elfcorebuf_sz, &vmcore_list);
> return 0;
> }
>
> @@ -609,30 +613,35 @@ static int __init parse_crash_elf32_headers(void)
> }
>
> /* Read in all elf headers. */
> - elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> + elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> + elfcorebuf_sz = elfcorebuf_sz_orig;
> + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> + get_order(elfcorebuf_sz_orig));
> if (!elfcorebuf)
> return -ENOMEM;
> addr = elfcorehdr_addr;
> - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> + rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
> if (rc < 0) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
>
> /* Merge all PT_NOTE headers into one. */
> rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
> &vmcore_list);
> if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> return rc;
> }
> - set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
> + set_vmcore_list_offsets_elf32(elfcorebuf, elfcorebuf_sz, &vmcore_list);
> return 0;
> }
>
> @@ -657,14 +666,14 @@ static int __init parse_crash_elf_headers(void)
> return rc;
>
> /* Determine vmcore size. */
> - vmcore_size = get_vmcore_size_elf64(elfcorebuf);
> + vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
> } else if (e_ident[EI_CLASS] == ELFCLASS32) {
> rc = parse_crash_elf32_headers();
> if (rc)
> return rc;
>
> /* Determine vmcore size. */
> - vmcore_size = get_vmcore_size_elf32(elfcorebuf);
> + vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
> } else {
> pr_warn("Warning: Core image elf header is not sane\n");
> return -EINVAL;
> @@ -711,7 +720,8 @@ void vmcore_cleanup(void)
> list_del(&m->list);
> kfree(m);
> }
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> + get_order(elfcorebuf_sz_orig));
> elfcorebuf = NULL;
> }
> EXPORT_SYMBOL_GPL(vmcore_cleanup);
On Tue, May 14, 2013 at 10:57:17AM +0900, HATAYAMA Daisuke wrote:
> Rewrite part of read_vmcore() that reads objects in vmcore_list in the
> same way as part reading ELF headers, by which some duplicated and
> redundant codes are removed.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
Looks good to me.
Acked-by: Vivek Goyal <[email protected]>
Vivek
> ---
>
> fs/proc/vmcore.c | 68 ++++++++++++++++--------------------------------------
> 1 files changed, 20 insertions(+), 48 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 69e1198..48886e6 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -119,27 +119,6 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
> return read;
> }
>
> -/* Maps vmcore file offset to respective physical address in memroy. */
> -static u64 map_offset_to_paddr(loff_t offset, struct list_head *vc_list,
> - struct vmcore **m_ptr)
> -{
> - struct vmcore *m;
> - u64 paddr;
> -
> - list_for_each_entry(m, vc_list, list) {
> - u64 start, end;
> - start = m->offset;
> - end = m->offset + m->size - 1;
> - if (offset >= start && offset <= end) {
> - paddr = m->paddr + offset - start;
> - *m_ptr = m;
> - return paddr;
> - }
> - }
> - *m_ptr = NULL;
> - return 0;
> -}
> -
> /* Read from the ELF header and then the crash dump. On error, negative value is
> * returned otherwise number of bytes read are returned.
> */
> @@ -148,8 +127,8 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
> {
> ssize_t acc = 0, tmp;
> size_t tsz;
> - u64 start, nr_bytes;
> - struct vmcore *curr_m = NULL;
> + u64 start;
> + struct vmcore *m = NULL;
>
> if (buflen == 0 || *fpos >= vmcore_size)
> return 0;
> @@ -175,33 +154,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
> return acc;
> }
>
> - start = map_offset_to_paddr(*fpos, &vmcore_list, &curr_m);
> - if (!curr_m)
> - return -EINVAL;
> -
> - while (buflen) {
> - tsz = min_t(size_t, buflen, PAGE_SIZE - (start & ~PAGE_MASK));
> -
> - /* Calculate left bytes in current memory segment. */
> - nr_bytes = (curr_m->size - (start - curr_m->paddr));
> - if (tsz > nr_bytes)
> - tsz = nr_bytes;
> -
> - tmp = read_from_oldmem(buffer, tsz, &start, 1);
> - if (tmp < 0)
> - return tmp;
> - buflen -= tsz;
> - *fpos += tsz;
> - buffer += tsz;
> - acc += tsz;
> - if (start >= (curr_m->paddr + curr_m->size)) {
> - if (curr_m->list.next == &vmcore_list)
> - return acc; /*EOF*/
> - curr_m = list_entry(curr_m->list.next,
> - struct vmcore, list);
> - start = curr_m->paddr;
> + list_for_each_entry(m, &vmcore_list, list) {
> + if (*fpos < m->offset + m->size) {
> + tsz = m->offset + m->size - *fpos;
> + if (buflen < tsz)
> + tsz = buflen;
> + start = m->paddr + *fpos - m->offset;
> + tmp = read_from_oldmem(buffer, tsz, &start, 1);
> + if (tmp < 0)
> + return tmp;
> + buflen -= tsz;
> + *fpos += tsz;
> + buffer += tsz;
> + acc += tsz;
> +
> + /* leave now if filled buffer already */
> + if (buflen == 0)
> + return acc;
> }
> }
> +
> return acc;
> }
>
On Tue, May 14, 2013 at 10:57:23AM +0900, HATAYAMA Daisuke wrote:
> Currently, __find_vmap_area searches for the kernel VM area starting
> at a given address. This patch changes this behavior so that it
> searches for the kernel VM area to which the address belongs. This
> change is needed by remap_vmalloc_range_partial to be introduced in
> later patch that receives any position of kernel VM area as target
> address.
>
> This patch changes the condition (addr > va->va_start) to the
> equivalent (addr >= va->va_end) by taking advantage of the fact that
> each kernel VM area is non-overlapping.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
This will require an ack from the mm folks. CCing some of them.
Thanks
Vivek
> ---
>
> mm/vmalloc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d365724..3875fa2 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -292,7 +292,7 @@ static struct vmap_area *__find_vmap_area(unsigned long addr)
> va = rb_entry(n, struct vmap_area, rb_node);
> if (addr < va->va_start)
> n = n->rb_left;
> - else if (addr > va->va_start)
> + else if (addr >= va->va_end)
> n = n->rb_right;
> else
> return va;
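For illustration, here is a minimal user-space sketch (not the kernel's
rbtree code; names and types are made up) of the difference between
looking up the area that starts at an address and the area that
contains it, relying on the ranges being non-overlapping:

#include <stddef.h>

struct area {
	unsigned long start;	/* va_start */
	unsigned long end;	/* va_end, exclusive */
};

/* Old semantics: only an exact match of the start address is found. */
static struct area *find_starting_at(struct area *v, size_t n,
				     unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr == v[i].start)
			return &v[i];
	return NULL;
}

/* New semantics: an address anywhere inside a range finds that range;
 * with non-overlapping ranges, "not below start" and "below end" is
 * exactly containment. */
static struct area *find_containing(struct area *v, size_t n,
				    unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= v[i].start && addr < v[i].end)
			return &v[i];
	return NULL;
}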
On Tue, May 14, 2013 at 10:57:29AM +0900, HATAYAMA Daisuke wrote:
> We want to allocate the ELF note segment buffer in the 2nd kernel in
> vmalloc space and remap it to user-space in order to reduce the risk
> that memory allocation fails on a system with a huge number of CPUs and
> hence a huge ELF note segment that exceeds the 11-order block size.
>
> Although there is already remap_vmalloc_range for the purpose of
> remapping vmalloc memory to user-space, it requires the user-space
> range to be specified via a vma that covers the full range. mmap on
> /proc/vmcore needs to remap a range across multiple objects, so an
> interface that requires the vma to cover the full range is problematic.
>
> This patch introduces remap_vmalloc_range_partial, which receives the
> user-space range as a pair of base address and size and can be used for
> the mmap-on-/proc/vmcore case.
>
> remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
This also needs an ACK from the VM folks. CCing some of them.
Thanks
Vivek
> ---
>
> include/linux/vmalloc.h | 4 +++
> mm/vmalloc.c | 63 +++++++++++++++++++++++++++++++++--------------
> 2 files changed, 48 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 7d5773a..dd0a2c8 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -82,6 +82,10 @@ extern void *vmap(struct page **pages, unsigned int count,
> unsigned long flags, pgprot_t prot);
> extern void vunmap(const void *addr);
>
> +extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
> + unsigned long uaddr, void *kaddr,
> + unsigned long size);
> +
> extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
> unsigned long pgoff);
> void vmalloc_sync_all(void);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 3875fa2..d9a9f4f6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2148,42 +2148,44 @@ finished:
> }
>
> /**
> - * remap_vmalloc_range - map vmalloc pages to userspace
> - * @vma: vma to cover (map full range of vma)
> - * @addr: vmalloc memory
> - * @pgoff: number of pages into addr before first page to map
> + * remap_vmalloc_range_partial - map vmalloc pages to userspace
> + * @vma: vma to cover
> + * @uaddr: target user address to start at
> + * @kaddr: virtual address of vmalloc kernel memory
> + * @size: size of map area
> *
> * Returns: 0 for success, -Exxx on failure
> *
> - * This function checks that addr is a valid vmalloc'ed area, and
> - * that it is big enough to cover the vma. Will return failure if
> - * that criteria isn't met.
> + * This function checks that @kaddr is a valid vmalloc'ed area,
> + * and that it is big enough to cover the range starting at
> + * @uaddr in @vma. Will return failure if that criteria isn't
> + * met.
> *
> * Similar to remap_pfn_range() (see mm/memory.c)
> */
> -int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
> - unsigned long pgoff)
> +int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
> + void *kaddr, unsigned long size)
> {
> struct vm_struct *area;
> - unsigned long uaddr = vma->vm_start;
> - unsigned long usize = vma->vm_end - vma->vm_start;
>
> - if ((PAGE_SIZE-1) & (unsigned long)addr)
> + size = PAGE_ALIGN(size);
> +
> + if (((PAGE_SIZE-1) & (unsigned long)uaddr) ||
> + ((PAGE_SIZE-1) & (unsigned long)kaddr))
> return -EINVAL;
>
> - area = find_vm_area(addr);
> + area = find_vm_area(kaddr);
> if (!area)
> return -EINVAL;
>
> if (!(area->flags & VM_USERMAP))
> return -EINVAL;
>
> - if (usize + (pgoff << PAGE_SHIFT) > area->size - PAGE_SIZE)
> + if (kaddr + size > area->addr + area->size)
> return -EINVAL;
>
> - addr += pgoff << PAGE_SHIFT;
> do {
> - struct page *page = vmalloc_to_page(addr);
> + struct page *page = vmalloc_to_page(kaddr);
> int ret;
>
> ret = vm_insert_page(vma, uaddr, page);
> @@ -2191,14 +2193,37 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
> return ret;
>
> uaddr += PAGE_SIZE;
> - addr += PAGE_SIZE;
> - usize -= PAGE_SIZE;
> - } while (usize > 0);
> + kaddr += PAGE_SIZE;
> + size -= PAGE_SIZE;
> + } while (size > 0);
>
> vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
>
> return 0;
> }
> +EXPORT_SYMBOL(remap_vmalloc_range_partial);
> +
> +/**
> + * remap_vmalloc_range - map vmalloc pages to userspace
> + * @vma: vma to cover (map full range of vma)
> + * @addr: vmalloc memory
> + * @pgoff: number of pages into addr before first page to map
> + *
> + * Returns: 0 for success, -Exxx on failure
> + *
> + * This function checks that addr is a valid vmalloc'ed area, and
> + * that it is big enough to cover the vma. Will return failure if
> + * that criteria isn't met.
> + *
> + * Similar to remap_pfn_range() (see mm/memory.c)
> + */
> +int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
> + unsigned long pgoff)
> +{
> + return remap_vmalloc_range_partial(vma, vma->vm_start,
> + addr + (pgoff << PAGE_SHIFT),
> + vma->vm_end - vma->vm_start);
> +}
> EXPORT_SYMBOL(remap_vmalloc_range);
>
> /*
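To show how the new interface is meant to be used, here is a hedged
sketch of a hypothetical mmap handler that maps part of a vmalloc'ed,
VM_USERMAP-marked buffer at an offset inside the vma; the handler,
buffer and sizes are invented for illustration and are not part of this
series:

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Hypothetical example: notes_buf is assumed to be vzalloc'ed (hence
 * page aligned), notes_sz a PAGE_SIZE multiple, and the backing
 * vm_struct marked VM_USERMAP, as done in the vmcore patches. */
static char *notes_buf;
static size_t notes_sz;

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	size_t len = vma->vm_end - vma->vm_start;
	unsigned long header_len = PAGE_SIZE;	/* space mapped before the notes */

	if (len < header_len + notes_sz)
		return -EINVAL;

	/* Map the notes buffer starting one page into the vma; the rest
	 * of the vma can be covered by other mappings, which is exactly
	 * what a full-vma interface could not express. */
	return remap_vmalloc_range_partial(vma, vma->vm_start + header_len,
					   notes_buf, notes_sz);
}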
On Tue, May 14, 2013 at 10:57:35AM +0900, HATAYAMA Daisuke wrote:
> The reasons why we don't allocate the ELF note segment in the 1st
> kernel (old memory) on a page boundary are to keep backward
> compatibility with old kernels, and that doing so would waste a
> non-trivial amount of memory due to the round-up needed to fit the
> memory to a page boundary, since most of the buffers are in the
> per-cpu area.
>
> ELF notes are per-cpu, so the total size of the ELF note segments
> depends on the number of CPUs. The current maximum number of CPUs on
> x86_64 is 5192, and there is already a system with 4192 CPUs at SGI,
> where the total size amounts to 1MB. This can become larger in the near
> future, or possibly even now on another architecture with a larger note
> size per CPU. Thus, to avoid the case where memory allocation for a
> large block fails, we allocate the ELF note segment buffer in vmalloc
> memory.
>
> This patch adds the elfnotesegbuf and elfnotesegbuf_sz variables to
> keep a pointer to the ELF note segment buffer and its size. There is no
> longer a vmcore object corresponding to the ELF note segment in
> vmcore_list. Accordingly, read_vmcore() gets a new case for the ELF
> note segment, and set_vmcore_list_offsets_elf{64,32}() and the other
> helper functions start calculating offsets from the sum of the sizes of
> the ELF headers and the ELF note segment.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
>
> fs/proc/vmcore.c | 225 ++++++++++++++++++++++++++++++++++++++++--------------
> 1 files changed, 165 insertions(+), 60 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 48886e6..795efd2 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -34,6 +34,9 @@ static char *elfcorebuf;
> static size_t elfcorebuf_sz;
> static size_t elfcorebuf_sz_orig;
>
> +static char *elfnotesegbuf;
> +static size_t elfnotesegbuf_sz;
How about calling these just elfnotes_buf and elfnotes_sz?
[..]
> +/* Merges all the PT_NOTE headers into one. */
> +static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> + char **notesegptr, size_t *notesegsz,
> + struct list_head *vc_list)
> +{
> + int i, nr_ptnote=0, rc=0;
> + char *tmp;
> + Elf64_Ehdr *ehdr_ptr;
> + Elf64_Phdr phdr;
> + u64 phdr_sz = 0, note_off;
> + struct vm_struct *vm;
> +
> + ehdr_ptr = (Elf64_Ehdr *)elfptr;
> +
> + /* The first path calculates the number of PT_NOTE entries and
> + * total size of ELF note segment. */
> + rc = process_note_headers_elf64(ehdr_ptr, &nr_ptnote, &phdr_sz, NULL);
> + if (rc < 0)
> + return rc;
> +
> + *notesegsz = roundup(phdr_sz, PAGE_SIZE);
> + *notesegptr = vzalloc(*notesegsz);
> + if (!*notesegptr)
> + return -ENOMEM;
> +
> + vm = find_vm_area(*notesegptr);
> + BUG_ON(!vm);
> + vm->flags |= VM_USERMAP;
> +
> + /* The second path copies the ELF note segment in the ELF note
> + * segment buffer. */
> + rc = process_note_headers_elf64(ehdr_ptr, NULL, NULL, *notesegptr);
So the same function process_note_headers_elf64() is doing two different
things based on the parameters passed. Please create two new functions
to do the two different things and name them appropriately.
Say
get_elf_note_number_and_size()
copy_elf_notes()
> + if (rc < 0)
> + return rc;
> +
> /* Prepare merged PT_NOTE program header. */
> phdr.p_type = PT_NOTE;
> phdr.p_flags = 0;
> @@ -304,23 +364,18 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> return 0;
> }
>
> -/* Merges all the PT_NOTE headers into one. */
> -static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> - struct list_head *vc_list)
> +static int __init process_note_headers_elf32(const Elf32_Ehdr *ehdr_ptr,
> + int *nr_ptnotep, u64 *phdr_szp,
> + char *notesegp)
Can you please describe the function parameters at the beginning of the
function in a comment? Things are getting a little confusing now.
What does notesegp signify? phdr_szp could be simply *phdr_sz,
nr_ptnotesp could be *nr_notes. Please simplify the naming a bit.
Seems too twisted to me.
Thanks
Vivek
On Tue, May 14, 2013 at 10:57:35AM +0900, HATAYAMA Daisuke wrote:
[..]
> +/* Merges all the PT_NOTE headers into one. */
> +static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> + char **notesegptr, size_t *notesegsz,
> + struct list_head *vc_list)
> +{
Given that we are copying notes in the second kernel, we are not using
vc_list in merge_note_headers() any more. So remove vc_list from the
parameter list here.
For local parameters we could simply use notes_buf (instead of
notesegptr) and notes_sz (instead of notesegsz). It seems more readable.
Thanks
Vivek
On Tue, May 14, 2013 at 10:57:40AM +0900, HATAYAMA Daisuke wrote:
> Treat memory chunks referenced by PT_LOAD program header entries in
> page-size boundary in vmcore_list. Formally, for each range [start,
> end], we set up the corresponding vmcore object in vmcore_list to
> [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
>
> This change affects layout of /proc/vmcore. The gaps generated by the
> rearrangement are newly made visible to applications as
> holes. Concretely, they are two ranges [rounddown(start, PAGE_SIZE),
> start] and [end, roundup(end, PAGE_SIZE)].
>
> Suppose variable m points at a vmcore object in vmcore_list, and
> variable phdr points at the program header of PT_LOAD type the
> variable m corresponds to. Then, pictorially:
>
> m->offset +---------------+
> | hole |
> phdr->p_offset = +---------------+
> m->offset + (paddr - start) | |\
> | kernel memory | phdr->p_memsz
> | |/
> +---------------+
> | hole |
> m->offset + m->size +---------------+
>
> where m->offset and m->offset + m->size are always page-size aligned.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
Looks good to me. I think this patch could be higher up in the series.
Acked-by: Vivek Goyal <[email protected]>
Vivek
> ---
>
> fs/proc/vmcore.c | 30 ++++++++++++++++++++++--------
> 1 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 795efd2..eb7ff29 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -504,20 +504,27 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> vmcore_off = elfsz + elfnotesegsz;
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> + u64 paddr, start, end, size;
> +
> if (phdr_ptr->p_type != PT_LOAD)
> continue;
>
> + paddr = phdr_ptr->p_offset;
> + start = rounddown(paddr, PAGE_SIZE);
> + end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
> + size = end - start;
> +
> /* Add this contiguous chunk of memory to vmcore list.*/
> new = get_new_element();
> if (!new)
> return -ENOMEM;
> - new->paddr = phdr_ptr->p_offset;
> - new->size = phdr_ptr->p_memsz;
> + new->paddr = start;
> + new->size = size;
> list_add_tail(&new->list, vc_list);
>
> /* Update the program header offset. */
> - phdr_ptr->p_offset = vmcore_off;
> - vmcore_off = vmcore_off + phdr_ptr->p_memsz;
> + phdr_ptr->p_offset = vmcore_off + (paddr - start);
> + vmcore_off = vmcore_off + size;
> }
> return 0;
> }
> @@ -540,20 +547,27 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> vmcore_off = elfsz + elfnotesegsz;
>
> for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> + u64 paddr, start, end, size;
> +
> if (phdr_ptr->p_type != PT_LOAD)
> continue;
>
> + paddr = phdr_ptr->p_offset;
> + start = rounddown(paddr, PAGE_SIZE);
> + end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
> + size = end - start;
> +
> /* Add this contiguous chunk of memory to vmcore list.*/
> new = get_new_element();
> if (!new)
> return -ENOMEM;
> - new->paddr = phdr_ptr->p_offset;
> - new->size = phdr_ptr->p_memsz;
> + new->paddr = start;
> + new->size = size;
> list_add_tail(&new->list, vc_list);
>
> /* Update the program header offset */
> - phdr_ptr->p_offset = vmcore_off;
> - vmcore_off = vmcore_off + phdr_ptr->p_memsz;
> + phdr_ptr->p_offset = vmcore_off + (paddr - start);
> + vmcore_off = vmcore_off + size;
> }
> return 0;
> }
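As a worked example of the arithmetic above (numbers are made up,
assuming a 4KB page size), a chunk that starts and ends off page
boundaries gets a page-aligned vmcore object and an adjusted p_offset:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define rounddown(x, y)	(((x) / (y)) * (y))
#define roundup(x, y)	((((x) + (y) - 1) / (y)) * (y))

int main(void)
{
	unsigned long paddr = 0x1000f00;	/* phdr->p_offset: physical start of the chunk */
	unsigned long memsz = 0x2200;		/* phdr->p_memsz */
	unsigned long vmcore_off = 0x4000;	/* current file offset */

	unsigned long start = rounddown(paddr, PAGE_SIZE);	/* 0x1000000 */
	unsigned long end = roundup(paddr + memsz, PAGE_SIZE);	/* 0x1004000 */
	unsigned long size = end - start;			/* 0x4000: 4 pages */

	printf("new->paddr      = %#lx\n", start);
	printf("new->size       = %#lx\n", size);
	printf("phdr->p_offset  = %#lx\n", vmcore_off + (paddr - start)); /* 0x4f00 */
	printf("next vmcore_off = %#lx\n", vmcore_off + size);            /* 0x8000 */
	return 0;
}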
On Tue, May 14, 2013 at 10:57:46AM +0900, HATAYAMA Daisuke wrote:
> The previous patches newly added holes before each chunk of memory,
> and the holes need to be counted in the vmcore file size. There are two
> ways to count the file size:
>
> 1) suppose p is a pointer to the last program header entry of PT_LOAD
> type; then the file size is roundup(p->p_offset + p->p_memsz,
> PAGE_SIZE), or
>
> 2) calculate the sum of the sizes of the buffers for the ELF header,
> program headers, ELF note segment and the objects in vmcore_list.
>
> Although 1) is more direct and simpler than 2), 2) seems better in that
> it reflects the internal object structure of /proc/vmcore. Thus, this
> patch changes get_vmcore_size_elf{64, 32} so that they calculate the
> size in the way of 2).
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
Looks good to me.
Acked-by: Vivek Goyal <[email protected]>
Vivek
>
> fs/proc/vmcore.c | 40 ++++++++++++++++++----------------------
> 1 files changed, 18 insertions(+), 22 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index eb7ff29..ad6da17 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -210,36 +210,28 @@ static struct vmcore* __init get_new_element(void)
> return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
> }
>
> -static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
> +static u64 __init get_vmcore_size_elf64(size_t elfsz, size_t elfnotesegsz,
> + struct list_head *vc_list)
> {
> - int i;
> u64 size;
> - Elf64_Ehdr *ehdr_ptr;
> - Elf64_Phdr *phdr_ptr;
> + struct vmcore *m;
>
> - ehdr_ptr = (Elf64_Ehdr *)elfptr;
> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> - size = elfsz;
> - for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> - size += phdr_ptr->p_memsz;
> - phdr_ptr++;
> + size = elfsz + elfnotesegsz;
> + list_for_each_entry(m, vc_list, list) {
> + size += m->size;
> }
> return size;
> }
>
> -static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
> +static u64 __init get_vmcore_size_elf32(size_t elfsz, size_t elfnotesegsz,
> + struct list_head *vc_list)
> {
> - int i;
> u64 size;
> - Elf32_Ehdr *ehdr_ptr;
> - Elf32_Phdr *phdr_ptr;
> + struct vmcore *m;
>
> - ehdr_ptr = (Elf32_Ehdr *)elfptr;
> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> - size = elfsz;
> - for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> - size += phdr_ptr->p_memsz;
> - phdr_ptr++;
> + size = elfsz + elfnotesegsz;
> + list_for_each_entry(m, vc_list, list) {
> + size += m->size;
> }
> return size;
> }
> @@ -755,14 +747,18 @@ static int __init parse_crash_elf_headers(void)
> return rc;
>
> /* Determine vmcore size. */
> - vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
> + vmcore_size = get_vmcore_size_elf64(elfcorebuf_sz,
> + elfnotesegbuf_sz,
> + &vmcore_list);
> } else if (e_ident[EI_CLASS] == ELFCLASS32) {
> rc = parse_crash_elf32_headers();
> if (rc)
> return rc;
>
> /* Determine vmcore size. */
> - vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
> + vmcore_size = get_vmcore_size_elf32(elfcorebuf_sz,
> + elfnotesegbuf_sz,
> + &vmcore_list);
> } else {
> pr_warn("Warning: Core image elf header is not sane\n");
> return -EINVAL;
(2013/05/15 1:47), Vivek Goyal wrote:
> On Tue, May 14, 2013 at 10:57:46AM +0900, HATAYAMA Daisuke wrote:
>> The previous patches newly added holes before each chunk of memory,
>> and the holes need to be counted in the vmcore file size. There are
>> two ways to count the file size:
>>
>> 1) suppose p is a pointer to the last program header entry of PT_LOAD
>> type; then the file size is roundup(p->p_offset + p->p_memsz,
>> PAGE_SIZE), or
This part was wrong. It should have been:
1) suppose m is a pointer to the last vmcore object in vmcore_list; then
the file size is (m->offset + m->size), or
I'll correct it this way in the next patch.
Note that there is no functional change; only the description is wrong here.
--
Thanks.
HATAYAMA, Daisuke
(2013/05/15 0:35), Vivek Goyal wrote:
> On Tue, May 14, 2013 at 10:57:35AM +0900, HATAYAMA Daisuke wrote:
>> The reasons why we don't allocate the ELF note segment in the 1st
>> kernel (old memory) on a page boundary are to keep backward
>> compatibility with old kernels, and that doing so would waste a
>> non-trivial amount of memory due to the round-up needed to fit the
>> memory to a page boundary, since most of the buffers are in the
>> per-cpu area.
>>
>> ELF notes are per-cpu, so the total size of the ELF note segments
>> depends on the number of CPUs. The current maximum number of CPUs on
>> x86_64 is 5192, and there is already a system with 4192 CPUs at SGI,
>> where the total size amounts to 1MB. This can become larger in the
>> near future, or possibly even now on another architecture with a
>> larger note size per CPU. Thus, to avoid the case where memory
>> allocation for a large block fails, we allocate the ELF note segment
>> buffer in vmalloc memory.
>>
>> This patch adds the elfnotesegbuf and elfnotesegbuf_sz variables to
>> keep a pointer to the ELF note segment buffer and its size. There is
>> no longer a vmcore object corresponding to the ELF note segment in
>> vmcore_list. Accordingly, read_vmcore() gets a new case for the ELF
>> note segment, and set_vmcore_list_offsets_elf{64,32}() and the other
>> helper functions start calculating offsets from the sum of the sizes
>> of the ELF headers and the ELF note segment.
>>
>> Signed-off-by: HATAYAMA Daisuke <[email protected]>
>> ---
>>
>> fs/proc/vmcore.c | 225 ++++++++++++++++++++++++++++++++++++++++--------------
>> 1 files changed, 165 insertions(+), 60 deletions(-)
>>
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 48886e6..795efd2 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -34,6 +34,9 @@ static char *elfcorebuf;
>> static size_t elfcorebuf_sz;
>> static size_t elfcorebuf_sz_orig;
>>
>> +static char *elfnotesegbuf;
>> +static size_t elfnotesegbuf_sz;
>
> How about calling these just elfnotes_buf and elfnotes_sz?
>
> [..]
>> +/* Merges all the PT_NOTE headers into one. */
>> +static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> + char **notesegptr, size_t *notesegsz,
>> + struct list_head *vc_list)
>> +{
>> + int i, nr_ptnote=0, rc=0;
>> + char *tmp;
>> + Elf64_Ehdr *ehdr_ptr;
>> + Elf64_Phdr phdr;
>> + u64 phdr_sz = 0, note_off;
>> + struct vm_struct *vm;
>> +
>> + ehdr_ptr = (Elf64_Ehdr *)elfptr;
>> +
>> + /* The first path calculates the number of PT_NOTE entries and
>> + * total size of ELF note segment. */
>> + rc = process_note_headers_elf64(ehdr_ptr, &nr_ptnote, &phdr_sz, NULL);
>> + if (rc < 0)
>> + return rc;
>> +
>> + *notesegsz = roundup(phdr_sz, PAGE_SIZE);
>> + *notesegptr = vzalloc(*notesegsz);
>> + if (!*notesegptr)
>> + return -ENOMEM;
>> +
>> + vm = find_vm_area(*notesegptr);
>> + BUG_ON(!vm);
>> + vm->flags |= VM_USERMAP;
>> +
>> + /* The second path copies the ELF note segment in the ELF note
>> + * segment buffer. */
>> + rc = process_note_headers_elf64(ehdr_ptr, NULL, NULL, *notesegptr);
>
> So the same function process_note_headers_elf64() is doing two different
> things based on the parameters passed. Please create two new functions
> to do the two different things and name them appropriately.
>
> Say
>
> get_elf_note_number_and_size()
> copy_elf_notes()
I see. Similar to the other functions, 32-bit and 64-bit versions are
needed. So I'll give them the symbols:
get_note_number_and_size_elf64()
copy_notes_elf64()
and their elf32 counterparts, as sketched below.
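A rough, illustrative sketch of how the 64-bit pair could look (an
assumption based on the existing merge logic in this series, not the
final patch; it assumes the program headers immediately follow the ELF
header in the buffer, as in the merged elfcorebuf):

static int __init get_note_number_and_size_elf64(const Elf64_Ehdr *ehdr_ptr,
						 int *nr_ptnote, u64 *sz_ptnote)
{
	int i;
	Elf64_Phdr *phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);

	*nr_ptnote = 0;
	*sz_ptnote = 0;
	/* First pass: count PT_NOTE entries and sum up their sizes. */
	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
		if (phdr_ptr->p_type != PT_NOTE)
			continue;
		*nr_ptnote += 1;
		*sz_ptnote += phdr_ptr->p_memsz;
	}
	return 0;
}

static int __init copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
{
	int i;
	Elf64_Phdr *phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);

	/* Second pass: copy each note segment from old memory into the
	 * vmalloc'ed buffer, one after another (userbuf == 0: kernel buffer). */
	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
		u64 offset;
		ssize_t rc;

		if (phdr_ptr->p_type != PT_NOTE)
			continue;
		offset = phdr_ptr->p_offset;
		rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
		if (rc < 0)
			return rc;
		notes_buf += phdr_ptr->p_memsz;
	}
	return 0;
}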
>
>
>> + if (rc < 0)
>> + return rc;
>> +
>> /* Prepare merged PT_NOTE program header. */
>> phdr.p_type = PT_NOTE;
>> phdr.p_flags = 0;
>> @@ -304,23 +364,18 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>> return 0;
>> }
>>
>> -/* Merges all the PT_NOTE headers into one. */
>> -static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>> - struct list_head *vc_list)
>> +static int __init process_note_headers_elf32(const Elf32_Ehdr *ehdr_ptr,
>> + int *nr_ptnotep, u64 *phdr_szp,
>> + char *notesegp)
>
> Can you please describe the function parameters at the beginning of the
> function in a comment? Things are getting a little confusing now.
>
> What does notesegp signify? phdr_szp could be simply *phdr_sz,
> nr_ptnotesp could be *nr_notes. Please simplify the naming a bit.
> Seems too twisted to me.
I see. I'll reflect that in addition to your other comments.
--
Thanks.
HATAYAMA, Daisuke