Hello Vivek,
Sorry for the late reply. It took me some time to implement the page fault
mechanism to be used for zfcpdump.
So here is the 5th version of the patch set. It also includes the
required changes for s390 for the new mmap support.
First some answers to outstanding questions:
> Ok, zfcpdump is a problem because HSA memory region is in addition to
> regular memory address space.
>
> Yep trying to figure out unused memory region in first kernel and mapping
> HSA to that is little ugly. But you know s390 better, so you decide
> whether you want to take that path or not.
I would prefer the solution with the arch dependent ELF header access
functions as we already discussed (see this patch set).
> How about if we introduce another function to read notes, say
> arch_read_crash_notes(). So arch_read_from_crash_header() is meant
> to get just ELF headers (ELF header and PT_LOAD, PT_NOTE type elf headers)
> and arch_read_crash_notes() gets the data as pointed by PT_NOTE elf
> header.
I added that function.
Besides that, as mentioned above, I implemented mmap support for zfcpdump
by adding a fault handler function to vmcore. I also made some other minor
changes (see change log).
In this patch series I did not include the previously discussed ELF header
swap trick patch, because with the ELF header read functions that patch
is currently not necessary.
Best Regards,
Michael
ChangeLog
=========
v4 => v5)
- Add weak function elfcorehdr_read_notes() to read ELF notes
- Rename weak functions for ELF header access and use "vmcorehdr_" prefix
- Generic vmcore code calls elfcorehdr_alloc() if elfcorehdr= is not specified
- Add vmcore fault handler for mmap of non-resident memory regions
- Add weak function remap_oldmem_pfn_range() to be used by zfcpdump for mmap
v3 => v4)
- Rebase to 3.10-rc2 + vmcore mmap patches v8
v2 => v3)
- Get rid of ELFCORE_ADDR_NEWMEM
- Make read_from_crash_header() only read from kernel
- Move read_from_crash_header() to weak function arch_read_from_crash_header()
- Implement read_from_crash_header() strong function for s390
- Set elfcorehdr_addr to address of new ELF header
v1 => v2)
- Rebase 3.10-rc2 + vmcore mmap patches
- Introduced arch_get/free_crash_header() and ELFCORE_ADDR_NEWMEM
Feature Description
===================
For s390 we want to use /proc/vmcore for our SCSI stand-alone
dump (zfcpdump). We have hypervisor support that saves the first
HSA_SIZE bytes of memory into a hypervisor-owned memory area (HSA)
before the kdump kernel is booted. When the kdump kernel starts,
it is restricted to use only HSA_SIZE bytes.
The advantages of this mechanism are:
* No crashkernel memory has to be defined in the old kernel.
* Early boot problems (before kexec_load has been done) can be dumped.
* Non-Linux systems can be dumped.
We modify the s390 copy_oldmem_page() function to read from the HSA memory
if memory below HSA_SIZE bytes is requested.
Since we cannot use the kexec tool to load the kernel in this scenario,
we have to build the ELF header in the 2nd (kdump/new) kernel.
So with the following patch set we would like to introduce a new
mechanism that allows the ELF header for /proc/vmcore to be created
in 2nd kernel memory.
The following steps are done during zfcpdump execution:
1. Production system crashes
2. User boots a SCSI disk that has been prepared with the zfcpdump tool
3. Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
4. Boot loader loads kernel into low memory area
5. Kernel boots and uses only HSA_SIZE bytes of memory
6. Kernel saves registers of non-boot CPUs
7. Kernel does memory detection for dump memory map
8. Kernel creates ELF header for /proc/vmcore
9. /proc/vmcore uses this header for initialization
10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
- copy_oldmem_page() copies from HSA for memory below HSA_SIZE
- copy_oldmem_page() copies from real memory for memory above HSA_SIZE
Jan Willeke (1):
s390/kdump/mmap: Implement remap_oldmem_pfn_range for s390
Michael Holzheu (4):
kdump: Introduce ELF header in new memory feature
s390/kdump: Use ELF header in new memory feature
vmcore/mmap: Introduce remap_oldmem_pfn_range()
s390/kdump: Use vmcore for zfcpdump
arch/s390/Kconfig | 3 +-
arch/s390/include/asm/sclp.h | 1 +
arch/s390/kernel/crash_dump.c | 203 +++++++++++++++++++++++++++++++++---------
drivers/s390/char/zcore.c | 6 +-
fs/proc/vmcore.c | 157 +++++++++++++++++++++++++++-----
include/linux/crash_dump.h | 8 ++
6 files changed, 314 insertions(+), 64 deletions(-)
--
1.8.1.6
Currently for s390 we create the ELF core header in the 2nd kernel
with a small trick: we relocate the addresses in the ELF header in
such a way that, to the /proc/vmcore code, the header appears to be
in 1st kernel (old) memory and read_from_oldmem() returns the
correct data. This allows the /proc/vmcore code to use the ELF
header in the 2nd kernel.
This patch replaces the old mechanism with the new and much
cleaner function call override feature that now officially allows
creating the ELF core header in the 2nd kernel.
To use the new feature, the following functions have to be defined
by the architecture backend code to read from new memory:
* elfcorehdr_alloc: Return the address and size of the ELF header
* elfcorehdr_free: Free the memory of the ELF header
* elfcorehdr_read: Read from ELF header
* elfcorehdr_read_notes: Read from ELF notes
Signed-off-by: Michael Holzheu <[email protected]>
---
fs/proc/vmcore.c | 63 +++++++++++++++++++++++++++++++++++++---------
include/linux/crash_dump.h | 5 ++++
2 files changed, 56 insertions(+), 12 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 0f6db52..221f84b 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -123,6 +123,36 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
return read;
}
+/*
+ * Architectures may override this function to allocate ELF header in 2nd kernel
+ */
+int __weak elfcorehdr_alloc(void)
+{
+ return -ENODEV;
+}
+
+/*
+ * Architectures may override this function to free header
+ */
+void __weak elfcorehdr_free(void)
+{}
+
+/*
+ * Architectures may override this function to read from ELF header
+ */
+ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ return read_from_oldmem(buf, count, ppos, 0);
+}
+
+/*
+ * Architectures may override this function to read from notes sections
+ */
+ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
+{
+ return read_from_oldmem(buf, count, ppos, 0);
+}
+
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
@@ -322,7 +352,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
notes_section = kmalloc(max_sz, GFP_KERNEL);
if (!notes_section)
return -ENOMEM;
- rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
if (rc < 0) {
kfree(notes_section);
return rc;
@@ -409,7 +439,8 @@ static int __init copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
if (phdr_ptr->p_type != PT_NOTE)
continue;
offset = phdr_ptr->p_offset;
- rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
+ &offset);
if (rc < 0)
return rc;
notes_buf += phdr_ptr->p_memsz;
@@ -510,7 +541,7 @@ static int __init update_note_header_size_elf32(const Elf32_Ehdr *ehdr_ptr)
notes_section = kmalloc(max_sz, GFP_KERNEL);
if (!notes_section)
return -ENOMEM;
- rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
if (rc < 0) {
kfree(notes_section);
return rc;
@@ -597,7 +628,8 @@ static int __init copy_notes_elf32(const Elf32_Ehdr *ehdr_ptr, char *notes_buf)
if (phdr_ptr->p_type != PT_NOTE)
continue;
offset = phdr_ptr->p_offset;
- rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
+ &offset);
if (rc < 0)
return rc;
notes_buf += phdr_ptr->p_memsz;
@@ -793,7 +825,7 @@ static int __init parse_crash_elf64_headers(void)
addr = elfcorehdr_addr;
/* Read Elf header */
- rc = read_from_oldmem((char*)&ehdr, sizeof(Elf64_Ehdr), &addr, 0);
+ rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf64_Ehdr), &addr);
if (rc < 0)
return rc;
@@ -820,7 +852,7 @@ static int __init parse_crash_elf64_headers(void)
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
+ rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
if (rc < 0)
goto fail;
@@ -849,7 +881,7 @@ static int __init parse_crash_elf32_headers(void)
addr = elfcorehdr_addr;
/* Read Elf header */
- rc = read_from_oldmem((char*)&ehdr, sizeof(Elf32_Ehdr), &addr, 0);
+ rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf32_Ehdr), &addr);
if (rc < 0)
return rc;
@@ -875,7 +907,7 @@ static int __init parse_crash_elf32_headers(void)
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
+ rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
if (rc < 0)
goto fail;
@@ -902,7 +934,7 @@ static int __init parse_crash_elf_headers(void)
int rc=0;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(e_ident, EI_NIDENT, &addr, 0);
+ rc = elfcorehdr_read(e_ident, EI_NIDENT, &addr);
if (rc < 0)
return rc;
if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
@@ -935,10 +967,17 @@ static int __init vmcore_init(void)
{
int rc = 0;
- /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
- if (!(is_vmcore_usable()))
- return rc;
+ /*
+ * If elfcorehdr= has not been passed in cmdline, try to get the
+ * header from 2nd kernel, then capture the dump.
+ */
+ if (!(is_vmcore_usable())) {
+ rc = elfcorehdr_alloc();
+ if (rc)
+ return rc;
+ }
rc = parse_crash_elf_headers();
+ elfcorehdr_free();
if (rc) {
pr_warn("Kdump: vmcore not initialized\n");
return rc;
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 37e4f8d..dd0f434 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -12,6 +12,11 @@
extern unsigned long long elfcorehdr_addr;
extern unsigned long long elfcorehdr_size;
+extern int __weak elfcorehdr_alloc(void);
+extern void __weak elfcorehdr_free(void);
+extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
+extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+
extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
--
1.8.1.6
For zfcpdump we can't map the HSA storage because it is only available
via a read interface. Therefore, for the new vmcore mmap feature we
have to introduce a new mechanism that creates mappings on demand.
This patch introduces a new architecture function remap_oldmem_pfn_range()
that should be used to create mappings with remap_pfn_range() for oldmem
areas that can be directly mapped. For zfcpdump this is everything except
the HSA memory. For the areas that are not mapped by
remap_oldmem_pfn_range(), a new generic vmcore fault handler
mmap_vmcore_fault() is called. This handler does the following:
* Check if /proc/vmcore page cache page is already available
* If yes:
- Return that page
* If no:
- Allocate new page
- Fill page using __vmcore_read()
- Add new page to page cache
Signed-off-by: Michael Holzheu <[email protected]>
---
fs/proc/vmcore.c | 94 ++++++++++++++++++++++++++++++++++++++++++----
include/linux/crash_dump.h | 3 ++
2 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 221f84b..259a639 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -21,6 +21,7 @@
#include <linux/crash_dump.h>
#include <linux/list.h>
#include <linux/vmalloc.h>
+#include <linux/pagemap.h>
#include <asm/uaccess.h>
#include <asm/io.h>
#include "internal.h"
@@ -153,11 +154,36 @@ ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
return read_from_oldmem(buf, count, ppos, 0);
}
+/*
+ * Architectures may override this function to map oldmem
+ */
+int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range(vma, from, pfn, size, prot);
+}
+
+/*
+ * Copy to either kernel or user space
+ */
+static int copy_to(void *target, void *src, size_t size, int userbuf)
+{
+ if (userbuf) {
+ if (copy_to_user(target, src, size))
+ return -EFAULT;
+ return 0;
+ } else {
+ memcpy(target, src, size);
+ return 0;
+ }
+}
+
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
-static ssize_t read_vmcore(struct file *file, char __user *buffer,
- size_t buflen, loff_t *fpos)
+static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
+ int userbuf)
{
ssize_t acc = 0, tmp;
size_t tsz;
@@ -174,7 +200,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
/* Read ELF core header */
if (*fpos < elfcorebuf_sz) {
tsz = min(elfcorebuf_sz - (size_t)*fpos, buflen);
- if (copy_to_user(buffer, elfcorebuf + *fpos, tsz))
+ if (copy_to(buffer, elfcorebuf + *fpos, tsz, userbuf))
return -EFAULT;
buflen -= tsz;
*fpos += tsz;
@@ -192,7 +218,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
- if (copy_to_user(buffer, kaddr, tsz))
+ if (copy_to(buffer, kaddr, tsz, userbuf))
return -EFAULT;
buflen -= tsz;
*fpos += tsz;
@@ -208,7 +234,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
if (*fpos < m->offset + m->size) {
tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);
start = m->paddr + *fpos - m->offset;
- tmp = read_from_oldmem(buffer, tsz, &start, 1);
+ tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
if (tmp < 0)
return tmp;
buflen -= tsz;
@@ -225,6 +251,56 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}
+static ssize_t read_vmcore(struct file *file, char __user *buffer,
+ size_t buflen, loff_t *fpos)
+{
+ return __read_vmcore(buffer, buflen, fpos, 1);
+}
+
+/*
+ * The vmcore fault handler uses the page cache and fills data using the
+ * standard __vmcore_read() function.
+ */
+static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ struct address_space *mapping = vma->vm_private_data;
+ pgoff_t index = vmf->pgoff;
+ struct page *page;
+ loff_t src;
+ char *buf;
+ int rc;
+
+find_page:
+ page = find_lock_page(mapping, index);
+ if (page) {
+ unlock_page(page);
+ rc = VM_FAULT_MINOR;
+ } else {
+ page = page_cache_alloc_cold(mapping);
+ if (!page)
+ return VM_FAULT_OOM;
+ rc = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
+ if (rc) {
+ page_cache_release(page);
+ if (rc == -EEXIST)
+ goto find_page;
+ /* Probably ENOMEM for radix tree node */
+ return VM_FAULT_OOM;
+ }
+ buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
+ src = index << PAGE_CACHE_SHIFT;
+ __read_vmcore(buf, PAGE_SIZE, &src, 0);
+ unlock_page(page);
+ rc = VM_FAULT_MAJOR;
+ }
+ vmf->page = page;
+ return rc;
+}
+
+static const struct vm_operations_struct vmcore_mmap_ops = {
+ .fault = mmap_vmcore_fault,
+};
+
static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
{
size_t size = vma->vm_end - vma->vm_start;
@@ -242,6 +318,8 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
vma->vm_flags |= VM_MIXEDMAP;
+ vma->vm_ops = &vmcore_mmap_ops;
+ vma->vm_private_data = file->f_mapping;
len = 0;
@@ -283,9 +361,9 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
tsz = min_t(size_t, m->offset + m->size - start, size);
paddr = m->paddr + start - m->offset;
- if (remap_pfn_range(vma, vma->vm_start + len,
- paddr >> PAGE_SHIFT, tsz,
- vma->vm_page_prot))
+ if (remap_oldmem_pfn_range(vma, vma->vm_start + len,
+ paddr >> PAGE_SHIFT, tsz,
+ vma->vm_page_prot))
goto fail;
size -= tsz;
start += tsz;
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index dd0f434..c0818de 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -16,6 +16,9 @@ extern int __weak elfcorehdr_alloc(void);
extern void __weak elfcorehdr_free(void);
extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+extern int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot);
extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
--
1.8.1.6
From: Jan Willeke <[email protected]>
This patch introduces the s390-specific way to map pages from oldmem.
The memory area below OLDMEM_SIZE is mapped with the offset OLDMEM_BASE.
All other old memory is mapped directly.
Signed-off-by: Jan Willeke <[email protected]>
Signed-off-by: Michael Holzheu <[email protected]>
---
arch/s390/kernel/crash_dump.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index a17f8d4..12ccc3c 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -56,6 +56,32 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
}
/*
+ * Remap "oldmem"
+ *
+ * For the kdump reserved memory this function performs a swap operation:
+ * [0 - OLDMEM_SIZE] is mapped to [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE]
+ */
+int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ unsigned long size_old;
+ int rc;
+
+ if (pfn < OLDMEM_SIZE >> PAGE_SHIFT) {
+ size_old = min(size, OLDMEM_SIZE - (pfn << PAGE_SHIFT));
+ rc = remap_pfn_range(vma, from,
+ pfn + (OLDMEM_BASE >> PAGE_SHIFT),
+ size_old, prot);
+ if (rc || size == size_old)
+ return rc;
+ size -= size_old;
+ from += size_old;
+ pfn += size_old >> PAGE_SHIFT;
+ }
+ return remap_pfn_range(vma, from, pfn, size, prot);
+}
+
+/*
* Copy memory from old kernel
*/
int copy_from_oldmem(void *dest, void *src, size_t count)
--
1.8.1.6
This patch replaces the old relocate mechanism with the new
arch function call override mechanism that allows creating the ELF
core header in the 2nd kernel.
Signed-off-by: Michael Holzheu <[email protected]>
---
arch/s390/kernel/crash_dump.c | 70 ++++++++++++++++++++++++++-----------------
1 file changed, 43 insertions(+), 27 deletions(-)
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index f703d91..a17f8d4 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -22,6 +22,11 @@
#define PTR_DIFF(x, y) ((unsigned long)(((char *) (x)) - ((unsigned long) (y))))
/*
+ * Pointer to ELF header in new kernel
+ */
+static void *elfcorehdr_newmem;
+
+/*
* Copy one page from "oldmem"
*
* For the kdump reserved memory this functions performs a swap operation:
@@ -325,14 +330,6 @@ static int get_mem_chunk_cnt(void)
}
/*
- * Relocate pointer in order to allow vmcore code access the data
- */
-static inline unsigned long relocate(unsigned long addr)
-{
- return OLDMEM_BASE + addr;
-}
-
-/*
* Initialize ELF loads (new kernel)
*/
static int loads_init(Elf64_Phdr *phdr, u64 loads_offset)
@@ -383,7 +380,7 @@ static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
ptr = nt_vmcoreinfo(ptr);
memset(phdr, 0, sizeof(*phdr));
phdr->p_type = PT_NOTE;
- phdr->p_offset = relocate(notes_offset);
+ phdr->p_offset = notes_offset;
phdr->p_filesz = (unsigned long) PTR_SUB(ptr, ptr_start);
phdr->p_memsz = phdr->p_filesz;
return ptr;
@@ -392,7 +389,7 @@ static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
/*
* Create ELF core header (new kernel)
*/
-static void s390_elf_corehdr_create(char **elfcorebuf, size_t *elfcorebuf_sz)
+int elfcorehdr_alloc(void)
{
Elf64_Phdr *phdr_notes, *phdr_loads;
int mem_chunk_cnt;
@@ -400,6 +397,8 @@ static void s390_elf_corehdr_create(char **elfcorebuf, size_t *elfcorebuf_sz)
u32 alloc_size;
u64 hdr_off;
+ if (!OLDMEM_BASE)
+ return -ENOENT;
mem_chunk_cnt = get_mem_chunk_cnt();
alloc_size = 0x1000 + get_cpu_cnt() * 0x300 +
@@ -417,27 +416,44 @@ static void s390_elf_corehdr_create(char **elfcorebuf, size_t *elfcorebuf_sz)
ptr = notes_init(phdr_notes, ptr, ((unsigned long) hdr) + hdr_off);
/* Init loads */
hdr_off = PTR_DIFF(ptr, hdr);
- loads_init(phdr_loads, ((unsigned long) hdr) + hdr_off);
- *elfcorebuf_sz = hdr_off;
- *elfcorebuf = (void *) relocate((unsigned long) hdr);
- BUG_ON(*elfcorebuf_sz > alloc_size);
+ loads_init(phdr_loads, hdr_off);
+ elfcorehdr_addr = (unsigned long long) hdr;
+ elfcorehdr_size = (unsigned long long) hdr_off;
+ elfcorehdr_newmem = hdr;
+ BUG_ON(elfcorehdr_size > alloc_size);
+ return 0;
}
/*
- * Create kdump ELF core header in new kernel, if it has not been passed via
- * the "elfcorehdr" kernel parameter
+ * Free ELF core header (new kernel)
*/
-static int setup_kdump_elfcorehdr(void)
+void elfcorehdr_free(void)
{
- size_t elfcorebuf_sz;
- char *elfcorebuf;
-
- if (!OLDMEM_BASE || is_kdump_kernel())
- return -EINVAL;
- s390_elf_corehdr_create(&elfcorebuf, &elfcorebuf_sz);
- elfcorehdr_addr = (unsigned long long) elfcorebuf;
- elfcorehdr_size = elfcorebuf_sz;
- return 0;
+ if (!elfcorehdr_newmem)
+ return;
+ vfree(elfcorehdr_newmem);
+ elfcorehdr_newmem = NULL;
}
-subsys_initcall(setup_kdump_elfcorehdr);
+/*
+ * Read from ELF header
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ void *src = (void *)(unsigned long)*ppos;
+
+ if (elfcorehdr_newmem)
+ memcpy(buf, src, count);
+ else
+ copy_from_oldmem(buf, src, count);
+ *ppos += count;
+ return count;
+}
+
+/*
+ * Read from ELF notes data
+ */
+ssize_t elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
+{
+ return elfcorehdr_read(buf, count, ppos);
+}
--
1.8.1.6
This patch modifies the s390 copy_oldmem_page() and remap_oldmem_pfn_range()
functions for zfcpdump to read from the HSA memory if memory below HSA_SIZE
bytes is requested. Otherwise real memory is used.
Signed-off-by: Michael Holzheu <[email protected]>
---
arch/s390/Kconfig | 3 +-
arch/s390/include/asm/sclp.h | 1 +
arch/s390/kernel/crash_dump.c | 113 ++++++++++++++++++++++++++++++++++++------
drivers/s390/char/zcore.c | 6 +--
4 files changed, 103 insertions(+), 20 deletions(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 2c9789d..a0d78f1 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -516,6 +516,7 @@ config CRASH_DUMP
bool "kernel crash dumps"
depends on 64BIT && SMP
select KEXEC
+ select ZFCPDUMP
help
Generate crash dump after being started by kexec.
Crash dump kernels are loaded in the main kernel with kexec-tools
@@ -526,7 +527,7 @@ config CRASH_DUMP
config ZFCPDUMP
def_bool n
prompt "zfcpdump support"
- select SMP
+ depends on SMP
help
Select this option if you want to build an zfcpdump enabled kernel.
Refer to <file:Documentation/s390/zfcpdump.txt> for more details on this.
diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 06a1361..7dc7f9c 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -56,5 +56,6 @@ bool sclp_has_linemode(void);
bool sclp_has_vt220(void);
int sclp_pci_configure(u32 fid);
int sclp_pci_deconfigure(u32 fid);
+int memcpy_hsa(void *dest, unsigned long src, size_t count, int mode);
#endif /* _ASM_S390_SCLP_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 12ccc3c..ee0770d 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -16,6 +16,7 @@
#include <asm/os_info.h>
#include <asm/elf.h>
#include <asm/ipl.h>
+#include <asm/sclp.h>
#define PTR_ADD(x, y) (((char *) (x)) + ((unsigned long) (y)))
#define PTR_SUB(x, y) (((char *) (x)) - ((unsigned long) (y)))
@@ -27,21 +28,36 @@
static void *elfcorehdr_newmem;
/*
+ * For pages below ZFCPDUMP_HSA_SIZE memory from the HSA is copied. Otherwise
+ * real memory copy is used.
+ */
+static ssize_t copy_oldmem_page_zfcpdump(char *buf, size_t csize,
+ unsigned long src, int userbuf)
+{
+ if (src < ZFCPDUMP_HSA_SIZE) {
+ if (memcpy_hsa(buf, src, csize, userbuf) < 0)
+ return -EINVAL;
+ } else {
+ if (userbuf)
+ copy_to_user_real((void __force __user *) buf,
+ (void *) src, csize);
+ else
+ memcpy_real(buf, (void *) src, csize);
+ }
+ return csize;
+}
+
+/*
* Copy one page from "oldmem"
*
* For the kdump reserved memory this functions performs a swap operation:
* - [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE] is mapped to [0 - OLDMEM_SIZE].
* - [0 - OLDMEM_SIZE] is mapped to [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE]
*/
-ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
- size_t csize, unsigned long offset, int userbuf)
-{
- unsigned long src;
+static ssize_t copy_oldmem_page_kdump(char *buf, size_t csize,
+ unsigned long src, int userbuf)
- if (!csize)
- return 0;
-
- src = (pfn << PAGE_SHIFT) + offset;
+{
if (src < OLDMEM_SIZE)
src += OLDMEM_BASE;
else if (src > OLDMEM_BASE &&
@@ -56,13 +72,31 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
}
/*
- * Remap "oldmem"
+ * Copy one page from "oldmem"
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
+ unsigned long offset, int userbuf)
+{
+ unsigned long src;
+ if (!csize)
+ return 0;
+ src = (pfn << PAGE_SHIFT) + offset;
+
+ if (OLDMEM_BASE)
+ return copy_oldmem_page_kdump(buf, csize, src, userbuf);
+ else
+ return copy_oldmem_page_zfcpdump(buf, csize, src, userbuf);
+}
+
+/*
+ * Remap "oldmem" for kdump
*
* For the kdump reserved memory this functions performs a swap operation:
* [0 - OLDMEM_SIZE] is mapped to [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE]
*/
-int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static int remap_oldmem_pfn_range_kdump(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot)
{
unsigned long size_old;
int rc;
@@ -82,6 +116,43 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
}
/*
+ * Remap "oldmem" for zfcpdump
+ *
+ * We only map available memory above ZFCPDUMP_HSA_SIZE. Memory below
+ * ZFCPDUMP_HSA_SIZE is read on demand using the copy_oldmem_page() function.
+ */
+static int remap_oldmem_pfn_range_zfcpdump(struct vm_area_struct *vma,
+ unsigned long from,
+ unsigned long pfn,
+ unsigned long size, pgprot_t prot)
+{
+ unsigned long size_hsa;
+
+ if (pfn < ZFCPDUMP_HSA_SIZE >> PAGE_SHIFT) {
+ size_hsa = min(size, ZFCPDUMP_HSA_SIZE - (pfn << PAGE_SHIFT));
+ if (size == size_hsa)
+ return 0;
+ size -= size_hsa;
+ from += size_hsa;
+ pfn += size_hsa >> PAGE_SHIFT;
+ }
+ return remap_pfn_range(vma, from, pfn, size, prot);
+}
+
+/*
+ * Remap "oldmem" for kdump or zfcpdump
+ */
+int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ if (OLDMEM_BASE)
+ return remap_oldmem_pfn_range_kdump(vma, from, pfn, size, prot);
+ else
+ return remap_oldmem_pfn_range_zfcpdump(vma, from, pfn, size,
+ prot);
+}
+
+/*
* Copy memory from old kernel
*/
int copy_from_oldmem(void *dest, void *src, size_t count)
@@ -89,11 +160,21 @@ int copy_from_oldmem(void *dest, void *src, size_t count)
unsigned long copied = 0;
int rc;
- if ((unsigned long) src < OLDMEM_SIZE) {
- copied = min(count, OLDMEM_SIZE - (unsigned long) src);
- rc = memcpy_real(dest, src + OLDMEM_BASE, copied);
- if (rc)
- return rc;
+ if (OLDMEM_BASE) {
+ if ((unsigned long) src < OLDMEM_SIZE) {
+ copied = min(count, OLDMEM_SIZE - (unsigned long) src);
+ rc = memcpy_real(dest, src + OLDMEM_BASE, copied);
+ if (rc)
+ return rc;
+ }
+ } else {
+ if ((unsigned long) src < ZFCPDUMP_HSA_SIZE) {
+ copied = min(count,
+ ZFCPDUMP_HSA_SIZE - (unsigned long) src);
+ rc = memcpy_hsa(dest, (unsigned long) src, copied, 0);
+ if (rc)
+ return rc;
+ }
}
return memcpy_real(dest + copied, src + copied, count - copied);
}
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index 9e5e146..794820a 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -30,8 +30,8 @@
#define TRACE(x...) debug_sprintf_event(zcore_dbf, 1, x)
-#define TO_USER 0
-#define TO_KERNEL 1
+#define TO_USER 1
+#define TO_KERNEL 0
#define CHUNK_INFO_SIZE 34 /* 2 16-byte char, each followed by blank */
enum arch_id {
@@ -73,7 +73,7 @@ static struct ipl_parameter_block *ipl_block;
* @count: Size of buffer, which should be copied
* @mode: Either TO_KERNEL or TO_USER
*/
-static int memcpy_hsa(void *dest, unsigned long src, size_t count, int mode)
+int memcpy_hsa(void *dest, unsigned long src, size_t count, int mode)
{
int offs, blk_num;
static char buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
--
1.8.1.6
On Sat, 8 Jun 2013 11:38:00 +0400
Maxim Uvarov <[email protected]> wrote:
[snip]
> > +static int copy_to(void *target, void *src, size_t size, int
> > userbuf) +{
> > + if (userbuf) {
> > + if (copy_to_user(target, src, size))
> > + return -EFAULT;
> > + return 0;
> > + } else {
> > + memcpy(target, src, size);
> > + return 0;
> > + }
> > +}
> >
>
> It is better to do return 0 in the end of function.
Sure, I will change that.
Thanks!
Michael
2013/6/8 Michael Holzheu <[email protected]>:
<cut>
> @@ -935,10 +967,17 @@ static int __init vmcore_init(void)
> {
> int rc = 0;
>
> - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> - if (!(is_vmcore_usable()))
> - return rc;
> + /*
> + * If elfcorehdr= has not been passed in cmdline, try to get the
> + * header from 2nd kernel, then capture the dump.
> + */
> + if (!(is_vmcore_usable())) {
> + rc = elfcorehdr_alloc();
> + if (rc)
> + return rc;
> + }
Here you removed the comment in the 1st path. Please keep it.
Thanks.
HATAYAMA, Daisuke
2013/6/8 Michael Holzheu <[email protected]>:
<cut>
> @@ -417,27 +416,44 @@ static void s390_elf_corehdr_create(char **elfcorebuf, size_t *elfcorebuf_sz)
> ptr = notes_init(phdr_notes, ptr, ((unsigned long) hdr) + hdr_off);
> /* Init loads */
> hdr_off = PTR_DIFF(ptr, hdr);
> - loads_init(phdr_loads, ((unsigned long) hdr) + hdr_off);
> - *elfcorebuf_sz = hdr_off;
> - *elfcorebuf = (void *) relocate((unsigned long) hdr);
> - BUG_ON(*elfcorebuf_sz > alloc_size);
> + loads_init(phdr_loads, hdr_off);
> + elfcorehdr_addr = (unsigned long long) hdr;
> + elfcorehdr_size = (unsigned long long) hdr_off;
> + elfcorehdr_newmem = hdr;
> + BUG_ON(elfcorehdr_size > alloc_size);
> + return 0;
> }
>
> /*
> - * Create kdump ELF core header in new kernel, if it has not been passed via
> - * the "elfcorehdr" kernel parameter
> + * Free ELF core header (new kernel)
> */
> -static int setup_kdump_elfcorehdr(void)
> +void elfcorehdr_free(void)
> {
> - size_t elfcorebuf_sz;
> - char *elfcorebuf;
> -
> - if (!OLDMEM_BASE || is_kdump_kernel())
> - return -EINVAL;
> - s390_elf_corehdr_create(&elfcorebuf, &elfcorebuf_sz);
> - elfcorehdr_addr = (unsigned long long) elfcorebuf;
> - elfcorehdr_size = elfcorebuf_sz;
> - return 0;
> + if (!elfcorehdr_newmem)
> + return;
> + vfree(elfcorehdr_newmem);
kfree is correct here?
Thanks.
HATAYAMA, Daisuke
2013/6/8 Michael Holzheu <[email protected]>:
<cut>
> @@ -225,6 +251,56 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
> return acc;
> }
>
> +static ssize_t read_vmcore(struct file *file, char __user *buffer,
> + size_t buflen, loff_t *fpos)
> +{
> + return __read_vmcore(buffer, buflen, fpos, 1);
> +}
> +
> +/*
> + * The vmcore fault handler uses the page cache and fills data using the
> + * standard __vmcore_read() function.
> + */
> +static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> + struct address_space *mapping = vma->vm_private_data;
> + pgoff_t index = vmf->pgoff;
> + struct page *page;
> + loff_t src;
> + char *buf;
> + int rc;
> +
> +find_page:
> + page = find_lock_page(mapping, index);
> + if (page) {
> + unlock_page(page);
> + rc = VM_FAULT_MINOR;
> + } else {
> + page = page_cache_alloc_cold(mapping);
> + if (!page)
> + return VM_FAULT_OOM;
> + rc = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
> + if (rc) {
> + page_cache_release(page);
> + if (rc == -EEXIST)
> + goto find_page;
> + /* Probably ENOMEM for radix tree node */
> + return VM_FAULT_OOM;
> + }
> + buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> + src = index << PAGE_CACHE_SHIFT;
> + __read_vmcore(buf, PAGE_SIZE, &src, 0);
> + unlock_page(page);
> + rc = VM_FAULT_MAJOR;
> + }
> + vmf->page = page;
> + return rc;
> +}
How about reusing find_or_create_page()?
Thanks.
HATAYAMA, Daisuke
On Mon, 10 Jun 2013 22:36:57 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> > -static int setup_kdump_elfcorehdr(void)
> > +void elfcorehdr_free(void)
> > {
> > - size_t elfcorebuf_sz;
> > - char *elfcorebuf;
> > -
> > - if (!OLDMEM_BASE || is_kdump_kernel())
> > - return -EINVAL;
> > - s390_elf_corehdr_create(&elfcorebuf, &elfcorebuf_sz);
> > - elfcorehdr_addr = (unsigned long long) elfcorebuf;
> > - elfcorehdr_size = elfcorebuf_sz;
> > - return 0;
> > + if (!elfcorehdr_newmem)
> > + return;
> > + vfree(elfcorehdr_newmem);
>
> kfree is correct here?
You are right, I fixed that!
Thanks!
Michael
On Mon, 10 Jun 2013 22:35:30 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> 2013/6/8 Michael Holzheu <[email protected]>:
> <cut>
> > @@ -935,10 +967,17 @@ static int __init vmcore_init(void)
> > {
> > int rc = 0;
> >
> > - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> > - if (!(is_vmcore_usable()))
> > - return rc;
> > + /*
> > + * If elfcorehdr= has not been passed in cmdline, try to get the
> > + * header from 2nd kernel, then capture the dump.
> > + */
> > + if (!(is_vmcore_usable())) {
> > + rc = elfcorehdr_alloc();
> > + if (rc)
> > + return rc;
> > + }
>
> Here you removed the comment in the first patch. Please keep it.
Perhaps something like the following:
/*
* If elfcorehdr= has been passed in cmdline, then capture the dump.
* Otherwise, try to get the header from the 2nd kernel.
*/
Michael
On Mon, 10 Jun 2013 22:40:24 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> > +static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > +{
> > + struct address_space *mapping = vma->vm_private_data;
> > + pgoff_t index = vmf->pgoff;
> > + struct page *page;
> > + loff_t src;
> > + char *buf;
> > + int rc;
> > +
> > +find_page:
> > + page = find_lock_page(mapping, index);
> > + if (page) {
> > + unlock_page(page);
> > + rc = VM_FAULT_MINOR;
> > + } else {
> > + page = page_cache_alloc_cold(mapping);
> > + if (!page)
> > + return VM_FAULT_OOM;
> > + rc = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
> > + if (rc) {
> > + page_cache_release(page);
> > + if (rc == -EEXIST)
> > + goto find_page;
> > + /* Probably ENOMEM for radix tree node */
> > + return VM_FAULT_OOM;
> > + }
> > + buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> > + src = index << PAGE_CACHE_SHIFT;
> > + __read_vmcore(buf, PAGE_SIZE, &src, 0);
> > + unlock_page(page);
> > + rc = VM_FAULT_MAJOR;
> > + }
> > + vmf->page = page;
> > + return rc;
> > +}
>
> How about reusing find_or_create_page()?
Hmmm, how would I then know whether I have to fill the page with
__read_vmcore() or not?
Best Regards,
Michael
On Mon, 10 Jun 2013 22:40:24 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> 2013/6/8 Michael Holzheu <[email protected]>:
> <cut>
> > @@ -225,6 +251,56 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
> > return acc;
> > }
> >
> > +static ssize_t read_vmcore(struct file *file, char __user *buffer,
> > + size_t buflen, loff_t *fpos)
> > +{
> > + return __read_vmcore(buffer, buflen, fpos, 1);
> > +}
> > +
> > +/*
> > + * The vmcore fault handler uses the page cache and fills data using the
> > + * standard __vmcore_read() function.
> > + */
> > +static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > +{
> > + struct address_space *mapping = vma->vm_private_data;
> > + pgoff_t index = vmf->pgoff;
> > + struct page *page;
> > + loff_t src;
> > + char *buf;
> > + int rc;
> > +
> > +find_page:
> > + page = find_lock_page(mapping, index);
> > + if (page) {
> > + unlock_page(page);
> > + rc = VM_FAULT_MINOR;
> > + } else {
> > + page = page_cache_alloc_cold(mapping);
> > + if (!page)
> > + return VM_FAULT_OOM;
> > + rc = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
> > + if (rc) {
> > + page_cache_release(page);
> > + if (rc == -EEXIST)
> > + goto find_page;
> > + /* Probably ENOMEM for radix tree node */
> > + return VM_FAULT_OOM;
> > + }
> > + buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> > + src = index << PAGE_CACHE_SHIFT;
> > + __read_vmcore(buf, PAGE_SIZE, &src, 0);
> > + unlock_page(page);
> > + rc = VM_FAULT_MAJOR;
> > + }
> > + vmf->page = page;
> > + return rc;
> > +}
>
> How about reusing find_or_create_page()?
The function would then look like the following:
static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct address_space *mapping = vma->vm_private_data;
pgoff_t index = vmf->pgoff;
struct page *page;
loff_t src;
char *buf;
page = find_or_create_page(mapping, index, GFP_KERNEL);
if (!page)
return VM_FAULT_OOM;
src = index << PAGE_CACHE_SHIFT;
buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
__read_vmcore(buf, PAGE_SIZE, &src, 0);
unlock_page(page);
vmf->page = page;
return 0;
}
I agree that this makes the function simpler, but then we would fill the
page again even if it has already been filled, correct?
But since normally only one process uses /proc/vmcore this might be
acceptable.
BTW: I also removed the VM_FAULT_MAJOR/MINOR because I think the fault
handler should return 0 if a page has been found.
Michael
2013/6/11 Michael Holzheu <[email protected]>:
> On Mon, 10 Jun 2013 22:40:24 +0900
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> 2013/6/8 Michael Holzheu <[email protected]>:
>> <cut>
>> > @@ -225,6 +251,56 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
>> > return acc;
>> > }
>> >
>> > +static ssize_t read_vmcore(struct file *file, char __user *buffer,
>> > + size_t buflen, loff_t *fpos)
>> > +{
>> > + return __read_vmcore(buffer, buflen, fpos, 1);
>> > +}
>> > +
>> > +/*
>> > + * The vmcore fault handler uses the page cache and fills data using the
>> > + * standard __vmcore_read() function.
>> > + */
>> > +static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>> > +{
>> > + struct address_space *mapping = vma->vm_private_data;
>> > + pgoff_t index = vmf->pgoff;
>> > + struct page *page;
>> > + loff_t src;
>> > + char *buf;
>> > + int rc;
>> > +
>> > +find_page:
>> > + page = find_lock_page(mapping, index);
>> > + if (page) {
>> > + unlock_page(page);
>> > + rc = VM_FAULT_MINOR;
>> > + } else {
>> > + page = page_cache_alloc_cold(mapping);
>> > + if (!page)
>> > + return VM_FAULT_OOM;
>> > + rc = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
>> > + if (rc) {
>> > + page_cache_release(page);
>> > + if (rc == -EEXIST)
>> > + goto find_page;
>> > + /* Probably ENOMEM for radix tree node */
>> > + return VM_FAULT_OOM;
>> > + }
>> > + buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
>> > + src = index << PAGE_CACHE_SHIFT;
>> > + __read_vmcore(buf, PAGE_SIZE, &src, 0);
>> > + unlock_page(page);
>> > + rc = VM_FAULT_MAJOR;
>> > + }
>> > + vmf->page = page;
>> > + return rc;
>> > +}
>>
>> How about reusing find_or_create_page()?
>
> The function would then look like the following:
>
> static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> {
> struct address_space *mapping = vma->vm_private_data;
> pgoff_t index = vmf->pgoff;
> struct page *page;
> loff_t src;
> char *buf;
>
> page = find_or_create_page(mapping, index, GFP_KERNEL);
> if (!page)
> return VM_FAULT_OOM;
> src = index << PAGE_CACHE_SHIFT;
> buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> __read_vmcore(buf, PAGE_SIZE, &src, 0);
> unlock_page(page);
> vmf->page = page;
> return 0;
> }
>
> I agree that this makes the function simpler but we have to copy
> the page also if it has already been filled, correct?
>
You can use the PG_uptodate flag for that purpose.
> But since normally only one process uses /proc/vmcore this might be
> acceptable.
>
> BTW: I also removed the VM_FAULT_MAJOR/MINOR because I think the fault
> handler should return 0 if a page has been found.
>
> Michael
>
Thanks.
HATAYAMA, Daisuke
2013/6/11 Michael Holzheu <[email protected]>:
> On Mon, 10 Jun 2013 22:40:24 +0900
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> 2013/6/8 Michael Holzheu <[email protected]>:
>> <cut>
>> > @@ -225,6 +251,56 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
>> > return acc;
>> > }
>> >
>> > +static ssize_t read_vmcore(struct file *file, char __user *buffer,
>> > + size_t buflen, loff_t *fpos)
>> > +{
>> > + return __read_vmcore(buffer, buflen, fpos, 1);
>> > +}
>> > +
>> > +/*
>> > + * The vmcore fault handler uses the page cache and fills data using the
>> > + * standard __vmcore_read() function.
>> > + */
>> > +static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>> > +{
>> > + struct address_space *mapping = vma->vm_private_data;
>> > + pgoff_t index = vmf->pgoff;
>> > + struct page *page;
>> > + loff_t src;
>> > + char *buf;
>> > + int rc;
>> > +
>> > +find_page:
>> > + page = find_lock_page(mapping, index);
>> > + if (page) {
>> > + unlock_page(page);
>> > + rc = VM_FAULT_MINOR;
>> > + } else {
>> > + page = page_cache_alloc_cold(mapping);
>> > + if (!page)
>> > + return VM_FAULT_OOM;
>> > + rc = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
>> > + if (rc) {
>> > + page_cache_release(page);
>> > + if (rc == -EEXIST)
>> > + goto find_page;
>> > + /* Probably ENOMEM for radix tree node */
>> > + return VM_FAULT_OOM;
>> > + }
>> > + buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
>> > + src = index << PAGE_CACHE_SHIFT;
>> > + __read_vmcore(buf, PAGE_SIZE, &src, 0);
>> > + unlock_page(page);
>> > + rc = VM_FAULT_MAJOR;
>> > + }
>> > + vmf->page = page;
>> > + return rc;
>> > +}
>>
>> How about reusing find_or_create_page()?
>
> The function would then look like the following:
>
> static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> {
> struct address_space *mapping = vma->vm_private_data;
> pgoff_t index = vmf->pgoff;
> struct page *page;
> loff_t src;
> char *buf;
>
> page = find_or_create_page(mapping, index, GFP_KERNEL);
> if (!page)
> return VM_FAULT_OOM;
> src = index << PAGE_CACHE_SHIFT;
> buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> __read_vmcore(buf, PAGE_SIZE, &src, 0);
You should check the return value of __read_vmcore() since it can fail.
In that case, I think returning VM_FAULT_SIGBUS is correct.
> unlock_page(page);
> vmf->page = page;
> return 0;
> }
Thanks.
HATAYAMA, Daisuke
(2013/06/08 1:56), Michael Holzheu wrote:
<cut>
> @@ -82,6 +116,43 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
> }
>
> /*
> + * Remap "oldmem" for zfcpdump
> + *
> + * We only map available memory above ZFCPDUMP_HSA_SIZE. Memory below
> + * ZFCPDUMP_HSA_SIZE is read on demand using the copy_oldmem_page() function.
> + */
> +static int remap_oldmem_pfn_range_zfcpdump(struct vm_area_struct *vma,
> + unsigned long from,
> + unsigned long pfn,
> + unsigned long size, pgprot_t prot)
> +{
> + unsigned long size_hsa;
> +
> + if (pfn < ZFCPDUMP_HSA_SIZE >> PAGE_SHIFT) {
> + size_hsa = min(size, OLDMEM_SIZE - (pfn << PAGE_SHIFT));
ZFCPDUMP_HSA_SIZE?
--
Thanks.
HATAYAMA, Daisuke
On Tue, 11 Jun 2013 21:42:15 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> 2013/6/11 Michael Holzheu <[email protected]>:
> > On Mon, 10 Jun 2013 22:40:24 +0900
> > HATAYAMA Daisuke <[email protected]> wrote:
> >
> >> 2013/6/8 Michael Holzheu <[email protected]>:
> >> <cut>
> >> > @@ -225,6 +251,56 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
> >> > return acc;
> >> > }
> >> >
> >> > +static ssize_t read_vmcore(struct file *file, char __user *buffer,
> >> > + size_t buflen, loff_t *fpos)
> >> > +{
> >> > + return __read_vmcore(buffer, buflen, fpos, 1);
> >> > +}
> >> > +
> >> > +/*
> >> > + * The vmcore fault handler uses the page cache and fills data using the
> >> > + * standard __vmcore_read() function.
> >> > + */
> >> > +static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >> > +{
> >> > + struct address_space *mapping = vma->vm_private_data;
> >> > + pgoff_t index = vmf->pgoff;
> >> > + struct page *page;
> >> > + loff_t src;
> >> > + char *buf;
> >> > + int rc;
> >> > +
> >> > +find_page:
> >> > + page = find_lock_page(mapping, index);
> >> > + if (page) {
> >> > + unlock_page(page);
> >> > + rc = VM_FAULT_MINOR;
> >> > + } else {
> >> > + page = page_cache_alloc_cold(mapping);
> >> > + if (!page)
> >> > + return VM_FAULT_OOM;
> >> > + rc = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
> >> > + if (rc) {
> >> > + page_cache_release(page);
> >> > + if (rc == -EEXIST)
> >> > + goto find_page;
> >> > + /* Probably ENOMEM for radix tree node */
> >> > + return VM_FAULT_OOM;
> >> > + }
> >> > + buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> >> > + src = index << PAGE_CACHE_SHIFT;
> >> > + __read_vmcore(buf, PAGE_SIZE, &src, 0);
> >> > + unlock_page(page);
> >> > + rc = VM_FAULT_MAJOR;
> >> > + }
> >> > + vmf->page = page;
> >> > + return rc;
> >> > +}
> >>
> >> How about reusing find_or_create_page()?
> >
> > The function would then look like the following:
> >
> > static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > {
> > struct address_space *mapping = vma->vm_private_data;
> > pgoff_t index = vmf->pgoff;
> > struct page *page;
> > loff_t src;
> > char *buf;
> >
> > page = find_or_create_page(mapping, index, GFP_KERNEL);
> > if (!page)
> > return VM_FAULT_OOM;
> > src = index << PAGE_CACHE_SHIFT;
> > buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> > __read_vmcore(buf, PAGE_SIZE, &src, 0);
> > unlock_page(page);
> > vmf->page = page;
> > return 0;
> > }
> >
> > I agree that this makes the function simpler but we have to copy
> > the page also if it has already been filled, correct?
> >
>
> You can use for the purpose PG_uptodate flag.
Thanks for that hint! So together with your other comment regarding
error checking for __read_vmcore() the function would look like the
following:
static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct address_space *mapping = vma->vm_private_data;
pgoff_t index = vmf->pgoff;
struct page *page;
loff_t src;
char *buf;
page = find_or_create_page(mapping, index, GFP_KERNEL);
if (!page)
return VM_FAULT_OOM;
if (!PageUptodate(page)) {
src = index << PAGE_CACHE_SHIFT;
buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
if (__read_vmcore(buf, PAGE_SIZE, &src, 0) < 0) {
unlock_page(page);
return VM_FAULT_SIGBUS;
}
SetPageUptodate(page);
}
unlock_page(page);
vmf->page = page;
return 0;
}
Perhaps one open issue remains:
Can we remove the page from the page cache if __read_vmcore() fails?
Thanks!
Michael
On Wed, 12 Jun 2013 08:47:52 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> (2013/06/08 1:56), Michael Holzheu wrote:
[snip]
> > +static int remap_oldmem_pfn_range_zfcpdump(struct vm_area_struct *vma,
> > + unsigned long from,
> > + unsigned long pfn,
> > + unsigned long size, pgprot_t prot)
> > +{
> > + unsigned long size_hsa;
> > +
> > + if (pfn < ZFCPDUMP_HSA_SIZE >> PAGE_SHIFT) {
> > + size_hsa = min(size, OLDMEM_SIZE - (pfn << PAGE_SHIFT));
>
> ZFCPDUMP_HSA_SIZE?
Correct!
Michael
(2013/06/12 18:13), Michael Holzheu wrote:
> On Tue, 11 Jun 2013 21:42:15 +0900
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> 2013/6/11 Michael Holzheu <[email protected]>:
>>> On Mon, 10 Jun 2013 22:40:24 +0900
>>> HATAYAMA Daisuke <[email protected]> wrote:
>>>
>>>> 2013/6/8 Michael Holzheu <[email protected]>:
<cut>
> static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> {
> struct address_space *mapping = vma->vm_private_data;
> pgoff_t index = vmf->pgoff;
> struct page *page;
> loff_t src;
> char *buf;
>
> page = find_or_create_page(mapping, index, GFP_KERNEL);
> if (!page)
> return VM_FAULT_OOM;
> if (!PageUptodate(page)) {
> src = index << PAGE_CACHE_SHIFT;
> buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> if (__read_vmcore(buf, PAGE_SIZE, &src, 0) < 0) {
> unlock_page(page);
> return VM_FAULT_SIGBUS;
> }
> SetPageUptodate(page);
> }
> unlock_page(page);
> vmf->page = page;
> return 0;
> }
>
> Perhaps one open issue remains:
>
> Can we remove the page from the page cache if __read_vmcore() fails?
>
Yes, use page_cache_release() after unlocking the page like:
if (__read_vmcore(buf, PAGE_SIZE, &src, 0) < 0) {
unlock_page(page);
+ page_cache_release(page);
return VM_FAULT_SIGBUS;
}
BTW, you now keep file->f_mapping in vma->vm_private_data, but the vma already has the file object in its vma->vm_file member. You can get the mapping via vma->vm_file->f_mapping without needing vma->vm_private_data.
--
Thanks.
HATAYAMA, Daisuke
(2013/06/12 18:13), Michael Holzheu wrote:
> On Tue, 11 Jun 2013 21:42:15 +0900
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> 2013/6/11 Michael Holzheu <[email protected]>:
>>> On Mon, 10 Jun 2013 22:40:24 +0900
>>> HATAYAMA Daisuke <[email protected]> wrote:
>>>
>>>> 2013/6/8 Michael Holzheu <[email protected]>:
<cut>
>
> Thanks for that hint! So together with your other comment regarding
> error checking for __read_vmcore() the function would look like the
> following:
>
> static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> {
> struct address_space *mapping = vma->vm_private_data;
> pgoff_t index = vmf->pgoff;
> struct page *page;
> loff_t src;
> char *buf;
>
> page = find_or_create_page(mapping, index, GFP_KERNEL);
> if (!page)
> return VM_FAULT_OOM;
> if (!PageUptodate(page)) {
> src = index << PAGE_CACHE_SHIFT;
> buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> if (__read_vmcore(buf, PAGE_SIZE, &src, 0) < 0) {
> unlock_page(page);
> return VM_FAULT_SIGBUS;
> }
> SetPageUptodate(page);
> }
> unlock_page(page);
> vmf->page = page;
> return 0;
> }
>
> Perhaps one open issue remains:
>
> Can we remove the page from the page cache if __read_vmcore() fails?
Sorry, I overlooked the case where __read_vmcore() returns -ENOMEM, since it uses ioremap() internally, in which page table allocation happens. The fault handler should return VM_FAULT_OOM in that case.
--
Thanks.
HATAYAMA, Daisuke
On Thu, 13 Jun 2013 10:32:48 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> > Perhaps one open issue remains:
> >
> > Can we remove the page from the page cache if __read_vmcore() fails?
> >
>
> Yes, use page_cache_release() after unlocking the page like:
>
> if (__read_vmcore(buf, PAGE_SIZE, &src, 0) < 0) {
> unlock_page(page);
> + page_cache_release(page);
> return VM_FAULT_SIGBUS;
> }
>
> BTW, you now keep file->f_mapping in vma->vm_private_data, but the vma already has the file object in its vma->vm_file member. You can get the mapping by vma->vm_file->f_mapping without necessity of vma->vm_private_data.
Hello Hatayama,
Here the new function:
static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct address_space *mapping = vma->vm_file->f_mapping;
pgoff_t index = vmf->pgoff;
struct page *page;
loff_t src;
char *buf;
int rc;
page = find_or_create_page(mapping, index, GFP_KERNEL);
if (!page)
return VM_FAULT_OOM;
if (!PageUptodate(page)) {
src = index << PAGE_CACHE_SHIFT;
buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
rc = __read_vmcore(buf, PAGE_SIZE, &src, 0);
if (rc < 0) {
unlock_page(page);
page_cache_release(page);
return (rc == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
}
SetPageUptodate(page);
}
unlock_page(page);
vmf->page = page;
return 0;
}
Thanks for all the constructive feedback!
Best Regards,
Michael
On Fri, Jun 07, 2013 at 06:55:57PM +0200, Michael Holzheu wrote:
[..]
> @@ -935,10 +967,17 @@ static int __init vmcore_init(void)
> {
> int rc = 0;
>
> - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> - if (!(is_vmcore_usable()))
> - return rc;
> + /*
> + * If elfcorehdr= has not been passed in cmdline, try to get the
> + * header from 2nd kernel, then capture the dump.
> + */
> + if (!(is_vmcore_usable())) {
> + rc = elfcorehdr_alloc();
> + if (rc)
> + return rc;
> + }
Hi Michael,
Patch description says that elfcorehdr_alloc() returns the addr and
size of the ELF headers, but that does not seem to be the case here. Has
it been modified in later patches?
Also, would it be better if we always call elfcorehdr_alloc() and then
check is_vmcore_usable()?
Something like:
elfcorehdr_addr = elfcorehdr_alloc()
if (elfcorehdr_addr < 0)
return elfcorehdr_addr
if (!(is_vmcore_usable()))
return error
Thanks
Vivek
On Fri, Jun 07, 2013 at 06:55:56PM +0200, Michael Holzheu wrote:
[..]
> In this patch series I did not include the discussed ELF header swap trick
> patch because with the ELF header read functions this patch currently is
> not necessary.
Michael,
It would be good to do this change, at least in a separate patch series, so
that we have a common mechanism across arches and understanding the code
becomes easier.
Thanks
Vivek
On Fri, Jun 07, 2013 at 06:55:58PM +0200, Michael Holzheu wrote:
[..]
> /*
> - * Create kdump ELF core header in new kernel, if it has not been passed via
> - * the "elfcorehdr" kernel parameter
> + * Free ELF core header (new kernel)
> */
> -static int setup_kdump_elfcorehdr(void)
> +void elfcorehdr_free(void)
> {
I was hoping that we would pass the value returned by elfcorehdr_alloc()
here, something along the lines of kmalloc() and kfree().
elfcorehdr_addr = elfcorehdr_alloc();
...
...
elfcorehdr_free(elfcorehdr_addr);
The only odd part here is that the arch will not set elfcorehdr_addr = NULL, as
there are functions like is_kdump_kernel() which depend on it being set.
That might be a separate cleanup, though, and we can put a comment after
elfcorehdr_free() and explain that oddity.
Thanks
Vivek
On Fri, Jun 07, 2013 at 06:56:00PM +0200, Michael Holzheu wrote:
> From: Jan Willeke <[email protected]>
>
> This patch introduces the s390 specific way to map pages from oldmem.
> The memory area below OLDMEM_SIZE is mapped with offset OLDMEM_BASE.
> The other old memory is mapped directly.
>
If we get rid of the swap logic in copy_oldmem_page(), quite a bit of code
can be cleaned up. One has to special-case only zfcpdump to read from HSA
memory.
Anyway, this sounds like an s390-specific cleanup for a separate series.
Thanks
Vivek
> Signed-off-by: Jan Willeke <[email protected]>
> Signed-off-by: Michael Holzheu <[email protected]>
> ---
> arch/s390/kernel/crash_dump.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
> index a17f8d4..12ccc3c 100644
> --- a/arch/s390/kernel/crash_dump.c
> +++ b/arch/s390/kernel/crash_dump.c
> @@ -56,6 +56,32 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> }
>
> /*
> + * Remap "oldmem"
> + *
> + * For the kdump reserved memory this functions performs a swap operation:
> + * [0 - OLDMEM_SIZE] is mapped to [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE]
> + */
> +int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
> + unsigned long pfn, unsigned long size, pgprot_t prot)
> +{
> + unsigned long size_old;
> + int rc;
> +
> + if (pfn < OLDMEM_SIZE >> PAGE_SHIFT) {
> + size_old = min(size, OLDMEM_SIZE - (pfn << PAGE_SHIFT));
> + rc = remap_pfn_range(vma, from,
> + pfn + (OLDMEM_BASE >> PAGE_SHIFT),
> + size_old, prot);
> + if (rc || size == size_old)
> + return rc;
> + size -= size_old;
> + from += size_old;
> + pfn += size_old >> PAGE_SHIFT;
> + }
> + return remap_pfn_range(vma, from, pfn, size, prot);
> +}
> +
> +/*
> * Copy memory from old kernel
> */
> int copy_from_oldmem(void *dest, void *src, size_t count)
> --
> 1.8.1.6
On Fri, 14 Jun 2013 14:54:54 -0400
Vivek Goyal <[email protected]> wrote:
> On Fri, Jun 07, 2013 at 06:55:56PM +0200, Michael Holzheu wrote:
>
> [..]
> > In this patch series I did not include the discussed ELF header swap trick
> > patch because with the ELF header read functions this patch currently is
> > not necessary.
>
> Michael,
>
> Would be good to this change atleast in a separate patch series so that
> we have common mechanism across arches and understanding code becomes
> easier.
Ok fine, we can do that in a separate patch series on top.
Michael
On Fri, 14 Jun 2013 14:54:02 -0400
Vivek Goyal <[email protected]> wrote:
> On Fri, Jun 07, 2013 at 06:55:57PM +0200, Michael Holzheu wrote:
>
> [..]
> > @@ -935,10 +967,17 @@ static int __init vmcore_init(void)
> > {
> > int rc = 0;
> >
> > - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> > - if (!(is_vmcore_usable()))
> > - return rc;
> > + /*
> > + * If elfcorehdr= has not been passed in cmdline, try to get the
> > + * header from 2nd kernel, then capture the dump.
> > + */
> > + if (!(is_vmcore_usable())) {
> > + rc = elfcorehdr_alloc();
> > + if (rc)
> > + return rc;
> > + }
>
> Hi Michael,
>
> Patch description says that elfcorehdr_alloc() returns the addr and
> size of elf headers. But that does not seem to be the case here. Has
> it been modified in later patches.
Sorry, that is a relic of one of my previous experiments, where I tried
to implement elfcorehdr_alloc() similar to the way you suggest it now.
Because elfcorehdr_addr is a global variable, I decided not to pass
it to the functions. But of course I can change that again if you prefer.
> Also will it be better if we call elfcorehdr_alloc() always and then
> check for is_vmcore_usable().
>
> Something like.
>
> elfcorehdr_addr = elfcorehdr_alloc()
> if (elfcorehdr_addr < )
> return elfcorehdr_addr
>
> if (!(is_vmcore_usable()))
> return error
Ok, but then I think elfcorehdr_alloc() should also return
the elfcorehdr_size.
So what about the following patch:
---
fs/proc/vmcore.c | 65 +++++++++++++++++++++++++++++++++++++--------
include/linux/crash_dump.h | 6 ++++
2 files changed, 60 insertions(+), 11 deletions(-)
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -123,6 +123,36 @@ static ssize_t read_from_oldmem(char *bu
return read;
}
+/*
+ * Architectures may override this function to allocate ELF header in 2nd kernel
+ */
+int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
+{
+ return 0;
+}
+
+/*
+ * Architectures may override this function to free header
+ */
+void __weak elfcorehdr_free(unsigned long long addr)
+{}
+
+/*
+ * Architectures may override this function to read from ELF header
+ */
+ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ return read_from_oldmem(buf, count, ppos, 0);
+}
+
+/*
+ * Architectures may override this function to read from notes sections
+ */
+ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
+{
+ return read_from_oldmem(buf, count, ppos, 0);
+}
+
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
@@ -322,7 +352,7 @@ static int __init update_note_header_siz
notes_section = kmalloc(max_sz, GFP_KERNEL);
if (!notes_section)
return -ENOMEM;
- rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
if (rc < 0) {
kfree(notes_section);
return rc;
@@ -409,7 +439,8 @@ static int __init copy_notes_elf64(const
if (phdr_ptr->p_type != PT_NOTE)
continue;
offset = phdr_ptr->p_offset;
- rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
+ &offset);
if (rc < 0)
return rc;
notes_buf += phdr_ptr->p_memsz;
@@ -510,7 +541,7 @@ static int __init update_note_header_siz
notes_section = kmalloc(max_sz, GFP_KERNEL);
if (!notes_section)
return -ENOMEM;
- rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
if (rc < 0) {
kfree(notes_section);
return rc;
@@ -597,7 +628,8 @@ static int __init copy_notes_elf32(const
if (phdr_ptr->p_type != PT_NOTE)
continue;
offset = phdr_ptr->p_offset;
- rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
+ &offset);
if (rc < 0)
return rc;
notes_buf += phdr_ptr->p_memsz;
@@ -793,7 +825,7 @@ static int __init parse_crash_elf64_head
addr = elfcorehdr_addr;
/* Read Elf header */
- rc = read_from_oldmem((char*)&ehdr, sizeof(Elf64_Ehdr), &addr, 0);
+ rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf64_Ehdr), &addr);
if (rc < 0)
return rc;
@@ -820,7 +852,7 @@ static int __init parse_crash_elf64_head
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
+ rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
if (rc < 0)
goto fail;
@@ -849,7 +881,7 @@ static int __init parse_crash_elf32_head
addr = elfcorehdr_addr;
/* Read Elf header */
- rc = read_from_oldmem((char*)&ehdr, sizeof(Elf32_Ehdr), &addr, 0);
+ rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf32_Ehdr), &addr);
if (rc < 0)
return rc;
@@ -875,7 +907,7 @@ static int __init parse_crash_elf32_head
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
+ rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
if (rc < 0)
goto fail;
@@ -902,7 +934,7 @@ static int __init parse_crash_elf_header
int rc=0;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(e_ident, EI_NIDENT, &addr, 0);
+ rc = elfcorehdr_read(e_ident, EI_NIDENT, &addr);
if (rc < 0)
return rc;
if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
@@ -935,7 +967,14 @@ static int __init vmcore_init(void)
{
int rc = 0;
- /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
+ /* Allow architectures to allocate ELF header in 2nd kernel */
+ rc = elfcorehdr_alloc(&elfcorehdr_addr, &elfcorehdr_size);
+ if (rc)
+ return rc;
+ /*
+ * If elfcorehdr= has been passed in cmdline or created in 2nd kernel,
+ * then capture the dump.
+ */
if (!(is_vmcore_usable()))
return rc;
rc = parse_crash_elf_headers();
@@ -943,7 +982,11 @@ static int __init vmcore_init(void)
pr_warn("Kdump: vmcore not initialized\n");
return rc;
}
-
+ elfcorehdr_free(elfcorehdr_addr);
+ /*
+ * elfcorehdr_addr must not be set to NULL here to keep
+ * is_kdump_kernel() working.
+ */
proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
if (proc_vmcore)
proc_vmcore->size = vmcore_size;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -12,6 +12,12 @@
extern unsigned long long elfcorehdr_addr;
extern unsigned long long elfcorehdr_size;
+extern int __weak elfcorehdr_alloc(unsigned long long *addr,
+ unsigned long long *size);
+extern void __weak elfcorehdr_free(unsigned long long addr);
+extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
+extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+
extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
On Fri, Jun 21, 2013 at 04:17:03PM +0200, Michael Holzheu wrote:
> On Fri, 14 Jun 2013 14:54:02 -0400
> Vivek Goyal <[email protected]> wrote:
>
> > On Fri, Jun 07, 2013 at 06:55:57PM +0200, Michael Holzheu wrote:
> >
> > [..]
> > > @@ -935,10 +967,17 @@ static int __init vmcore_init(void)
> > > {
> > > int rc = 0;
> > >
> > > - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> > > - if (!(is_vmcore_usable()))
> > > - return rc;
> > > + /*
> > > + * If elfcorehdr= has not been passed in cmdline, try to get the
> > > + * header from 2nd kernel, then capture the dump.
> > > + */
> > > + if (!(is_vmcore_usable())) {
> > > + rc = elfcorehdr_alloc();
> > > + if (rc)
> > > + return rc;
> > > + }
> >
> > Hi Michael,
> >
> > Patch description says that elfcorehdr_alloc() returns the addr and
> > size of elf headers. But that does not seem to be the case here. Has
> > it been modified in later patches.
>
> Sorry, that is a relic of one of my previous experiments where I tried
> to implement elfcorehdr_alloc() similar to the way you suggest it now.
> Because elfcorehdr_addr is a global variable, I decided not to pass
> it to the functions. But of course I can change that again if you
> prefer.
>
> > Also will it be better if we call elfcorehdr_alloc() always and then
> > check for is_vmcore_usable().
> >
> > Something like.
> >
> > elfcorehdr_addr = elfcorehdr_alloc()
> > if (elfcorehdr_addr < 0)
> > return elfcorehdr_addr
> >
> > if (!(is_vmcore_usable()))
> > return error
>
> Ok, but then I think elfcorehdr_alloc() should also return
> the elfcorehdr_size.
>
> So what about the following patch:
This patch looks good to me. There are a few ugly pieces w.r.t. not
setting elfcorehdr_addr to NULL after calling free(), as there are other
pieces dependent on elfcorehdr_addr. I think sometime later we can
have a separate variable to track whether this is a kdump kernel
and remove the dependency on elfcorehdr_addr being set.
So I am fine with this patch.
Vivek
> ---
> fs/proc/vmcore.c | 65 +++++++++++++++++++++++++++++++++++++--------
> include/linux/crash_dump.h | 6 ++++
> 2 files changed, 60 insertions(+), 11 deletions(-)
>
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -123,6 +123,36 @@ static ssize_t read_from_oldmem(char *bu
> return read;
> }
>
> +/*
> + * Architectures may override this function to allocate ELF header in 2nd kernel
> + */
> +int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
> +{
> + return 0;
> +}
> +
> +/*
> + * Architectures may override this function to free header
> + */
> +void __weak elfcorehdr_free(unsigned long long addr)
> +{}
> +
> +/*
> + * Architectures may override this function to read from ELF header
> + */
> +ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> +{
> + return read_from_oldmem(buf, count, ppos, 0);
> +}
> +
> +/*
> + * Architectures may override this function to read from notes sections
> + */
> +ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
> +{
> + return read_from_oldmem(buf, count, ppos, 0);
> +}
> +
> /* Read from the ELF header and then the crash dump. On error, negative value is
> * returned otherwise number of bytes read are returned.
> */
> @@ -322,7 +352,7 @@ static int __init update_note_header_siz
> notes_section = kmalloc(max_sz, GFP_KERNEL);
> if (!notes_section)
> return -ENOMEM;
> - rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
> + rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
> if (rc < 0) {
> kfree(notes_section);
> return rc;
> @@ -409,7 +439,8 @@ static int __init copy_notes_elf64(const
> if (phdr_ptr->p_type != PT_NOTE)
> continue;
> offset = phdr_ptr->p_offset;
> - rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
> + rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
> + &offset);
> if (rc < 0)
> return rc;
> notes_buf += phdr_ptr->p_memsz;
> @@ -510,7 +541,7 @@ static int __init update_note_header_siz
> notes_section = kmalloc(max_sz, GFP_KERNEL);
> if (!notes_section)
> return -ENOMEM;
> - rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
> + rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
> if (rc < 0) {
> kfree(notes_section);
> return rc;
> @@ -597,7 +628,8 @@ static int __init copy_notes_elf32(const
> if (phdr_ptr->p_type != PT_NOTE)
> continue;
> offset = phdr_ptr->p_offset;
> - rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
> + rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
> + &offset);
> if (rc < 0)
> return rc;
> notes_buf += phdr_ptr->p_memsz;
> @@ -793,7 +825,7 @@ static int __init parse_crash_elf64_head
> addr = elfcorehdr_addr;
>
> /* Read Elf header */
> - rc = read_from_oldmem((char*)&ehdr, sizeof(Elf64_Ehdr), &addr, 0);
> + rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf64_Ehdr), &addr);
> if (rc < 0)
> return rc;
>
> @@ -820,7 +852,7 @@ static int __init parse_crash_elf64_head
> if (!elfcorebuf)
> return -ENOMEM;
> addr = elfcorehdr_addr;
> - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
> + rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
> if (rc < 0)
> goto fail;
>
> @@ -849,7 +881,7 @@ static int __init parse_crash_elf32_head
> addr = elfcorehdr_addr;
>
> /* Read Elf header */
> - rc = read_from_oldmem((char*)&ehdr, sizeof(Elf32_Ehdr), &addr, 0);
> + rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf32_Ehdr), &addr);
> if (rc < 0)
> return rc;
>
> @@ -875,7 +907,7 @@ static int __init parse_crash_elf32_head
> if (!elfcorebuf)
> return -ENOMEM;
> addr = elfcorehdr_addr;
> - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
> + rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
> if (rc < 0)
> goto fail;
>
> @@ -902,7 +934,7 @@ static int __init parse_crash_elf_header
> int rc=0;
>
> addr = elfcorehdr_addr;
> - rc = read_from_oldmem(e_ident, EI_NIDENT, &addr, 0);
> + rc = elfcorehdr_read(e_ident, EI_NIDENT, &addr);
> if (rc < 0)
> return rc;
> if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
> @@ -935,7 +967,14 @@ static int __init vmcore_init(void)
> {
> int rc = 0;
>
> - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> + /* Allow architectures to allocate ELF header in 2nd kernel */
> + rc = elfcorehdr_alloc(&elfcorehdr_addr, &elfcorehdr_size);
> + if (rc)
> + return rc;
> + /*
> + * If elfcorehdr= has been passed in cmdline or created in 2nd kernel,
> + * then capture the dump.
> + */
> if (!(is_vmcore_usable()))
> return rc;
> rc = parse_crash_elf_headers();
> @@ -943,7 +982,11 @@ static int __init vmcore_init(void)
> pr_warn("Kdump: vmcore not initialized\n");
> return rc;
> }
> -
> + elfcorehdr_free(elfcorehdr_addr);
> + /*
> + * elfcorehdr_addr must not be set to NULL here to keep
> + * is_kdump_kernel() working.
> + */
> proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
> if (proc_vmcore)
> proc_vmcore->size = vmcore_size;
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -12,6 +12,12 @@
> extern unsigned long long elfcorehdr_addr;
> extern unsigned long long elfcorehdr_size;
>
> +extern int __weak elfcorehdr_alloc(unsigned long long *addr,
> + unsigned long long *size);
> +extern void __weak elfcorehdr_free(unsigned long long addr);
> +extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
> +extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
> +
> extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
> unsigned long, int);
>
On Thu, Jun 27, 2013 at 03:32:02PM -0400, Vivek Goyal wrote:
> On Fri, Jun 21, 2013 at 04:17:03PM +0200, Michael Holzheu wrote:
> > On Fri, 14 Jun 2013 14:54:02 -0400
> > Vivek Goyal <[email protected]> wrote:
> >
> > > On Fri, Jun 07, 2013 at 06:55:57PM +0200, Michael Holzheu wrote:
> > >
> > > [..]
> > > > @@ -935,10 +967,17 @@ static int __init vmcore_init(void)
> > > > {
> > > > int rc = 0;
> > > >
> > > > - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> > > > - if (!(is_vmcore_usable()))
> > > > - return rc;
> > > > + /*
> > > > + * If elfcorehdr= has not been passed in cmdline, try to get the
> > > > + * header from 2nd kernel, then capture the dump.
> > > > + */
> > > > + if (!(is_vmcore_usable())) {
> > > > + rc = elfcorehdr_alloc();
> > > > + if (rc)
> > > > + return rc;
> > > > + }
> > >
> > > Hi Michael,
> > >
> > > Patch description says that elfcorehdr_alloc() returns the addr and
> > > size of elf headers. But that does not seem to be the case here. Has
> > > it been modified in later patches.
> >
> > Sorry, that is a relic of one of my previous experiments where I tried
> > to implement elfcorehdr_alloc() similar to the way you suggest it now.
> > Because elfcorehdr_addr is a global variable, I decided to not pass
> > it in the functions. But of course I can change that again if you prefer
> > that.
> >
> > > Also will it be better if we call elfcorehdr_alloc() always and then
> > > check for is_vmcore_usable().
> > >
> > > Something like.
> > >
> > > elfcorehdr_addr = elfcorehdr_alloc()
> > > if (elfcorehdr_addr < 0)
> > > return elfcorehdr_addr
> > >
> > > if (!(is_vmcore_usable()))
> > > return error
> >
> > Ok, but then I think elfcorehdr_alloc() should also return
> > the elfcorehdr_size.
> >
> > So what about the following patch:
>
> This patch looks good to me. There are a few ugly pieces w.r.t. not
> setting elfcorehdr_addr to NULL after calling free(), as there are other
> pieces dependent on elfcorehdr_addr. I think sometime later we can
> have a separate variable to track whether this is a kdump kernel
> and remove the dependency on elfcorehdr_addr being set.
>
>
> So I am fine with this patch.
I would like to endorse this whole vmcore series. It greatly speeds the
writing of a system dump.
In my recent tests these patches cut 16% to 58% off the time to create a
dump, depending on the size of the system.
On an idle system with 117G of memory, for example, the patches reduced
the time from 4.8 minutes to 2.0 minutes. (The savings are less
pronounced on bigger systems.)
-Cliff
>
> Vivek
>
> > ---
> > fs/proc/vmcore.c | 65 +++++++++++++++++++++++++++++++++++++--------
> > include/linux/crash_dump.h | 6 ++++
> > 2 files changed, 60 insertions(+), 11 deletions(-)
> >
> > --- a/fs/proc/vmcore.c
> > +++ b/fs/proc/vmcore.c
> > @@ -123,6 +123,36 @@ static ssize_t read_from_oldmem(char *bu
> > return read;
> > }
> >
> > +/*
> > + * Architectures may override this function to allocate ELF header in 2nd kernel
> > + */
> > +int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
> > +{
> > + return 0;
> > +}
> > +
> > +/*
> > + * Architectures may override this function to free header
> > + */
> > +void __weak elfcorehdr_free(unsigned long long addr)
> > +{}
> > +
> > +/*
> > + * Architectures may override this function to read from ELF header
> > + */
> > +ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> > +{
> > + return read_from_oldmem(buf, count, ppos, 0);
> > +}
> > +
> > +/*
> > + * Architectures may override this function to read from notes sections
> > + */
> > +ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
> > +{
> > + return read_from_oldmem(buf, count, ppos, 0);
> > +}
> > +
> > /* Read from the ELF header and then the crash dump. On error, negative value is
> > * returned otherwise number of bytes read are returned.
> > */
> > @@ -322,7 +352,7 @@ static int __init update_note_header_siz
> > notes_section = kmalloc(max_sz, GFP_KERNEL);
> > if (!notes_section)
> > return -ENOMEM;
> > - rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
> > + rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
> > if (rc < 0) {
> > kfree(notes_section);
> > return rc;
> > @@ -409,7 +439,8 @@ static int __init copy_notes_elf64(const
> > if (phdr_ptr->p_type != PT_NOTE)
> > continue;
> > offset = phdr_ptr->p_offset;
> > - rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
> > + rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
> > + &offset);
> > if (rc < 0)
> > return rc;
> > notes_buf += phdr_ptr->p_memsz;
> > @@ -510,7 +541,7 @@ static int __init update_note_header_siz
> > notes_section = kmalloc(max_sz, GFP_KERNEL);
> > if (!notes_section)
> > return -ENOMEM;
> > - rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
> > + rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
> > if (rc < 0) {
> > kfree(notes_section);
> > return rc;
> > @@ -597,7 +628,8 @@ static int __init copy_notes_elf32(const
> > if (phdr_ptr->p_type != PT_NOTE)
> > continue;
> > offset = phdr_ptr->p_offset;
> > - rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
> > + rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
> > + &offset);
> > if (rc < 0)
> > return rc;
> > notes_buf += phdr_ptr->p_memsz;
> > @@ -793,7 +825,7 @@ static int __init parse_crash_elf64_head
> > addr = elfcorehdr_addr;
> >
> > /* Read Elf header */
> > - rc = read_from_oldmem((char*)&ehdr, sizeof(Elf64_Ehdr), &addr, 0);
> > + rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf64_Ehdr), &addr);
> > if (rc < 0)
> > return rc;
> >
> > @@ -820,7 +852,7 @@ static int __init parse_crash_elf64_head
> > if (!elfcorebuf)
> > return -ENOMEM;
> > addr = elfcorehdr_addr;
> > - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
> > + rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
> > if (rc < 0)
> > goto fail;
> >
> > @@ -849,7 +881,7 @@ static int __init parse_crash_elf32_head
> > addr = elfcorehdr_addr;
> >
> > /* Read Elf header */
> > - rc = read_from_oldmem((char*)&ehdr, sizeof(Elf32_Ehdr), &addr, 0);
> > + rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf32_Ehdr), &addr);
> > if (rc < 0)
> > return rc;
> >
> > @@ -875,7 +907,7 @@ static int __init parse_crash_elf32_head
> > if (!elfcorebuf)
> > return -ENOMEM;
> > addr = elfcorehdr_addr;
> > - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
> > + rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
> > if (rc < 0)
> > goto fail;
> >
> > @@ -902,7 +934,7 @@ static int __init parse_crash_elf_header
> > int rc=0;
> >
> > addr = elfcorehdr_addr;
> > - rc = read_from_oldmem(e_ident, EI_NIDENT, &addr, 0);
> > + rc = elfcorehdr_read(e_ident, EI_NIDENT, &addr);
> > if (rc < 0)
> > return rc;
> > if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
> > @@ -935,7 +967,14 @@ static int __init vmcore_init(void)
> > {
> > int rc = 0;
> >
> > - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> > + /* Allow architectures to allocate ELF header in 2nd kernel */
> > + rc = elfcorehdr_alloc(&elfcorehdr_addr, &elfcorehdr_size);
> > + if (rc)
> > + return rc;
> > + /*
> > + * If elfcorehdr= has been passed in cmdline or created in 2nd kernel,
> > + * then capture the dump.
> > + */
> > if (!(is_vmcore_usable()))
> > return rc;
> > rc = parse_crash_elf_headers();
> > @@ -943,7 +982,11 @@ static int __init vmcore_init(void)
> > pr_warn("Kdump: vmcore not initialized\n");
> > return rc;
> > }
> > -
> > + elfcorehdr_free(elfcorehdr_addr);
> > + /*
> > + * elfcorehdr_addr must not be set to NULL here to keep
> > + * is_kdump_kernel() working.
> > + */
> > proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
> > if (proc_vmcore)
> > proc_vmcore->size = vmcore_size;
> > --- a/include/linux/crash_dump.h
> > +++ b/include/linux/crash_dump.h
> > @@ -12,6 +12,12 @@
> > extern unsigned long long elfcorehdr_addr;
> > extern unsigned long long elfcorehdr_size;
> >
> > +extern int __weak elfcorehdr_alloc(unsigned long long *addr,
> > + unsigned long long *size);
> > +extern void __weak elfcorehdr_free(unsigned long long addr);
> > +extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
> > +extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
> > +
> > extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
> > unsigned long, int);
> >
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
--
Cliff Wickman
SGI
[email protected]
(651) 683-3824
On Thu, Jun 27, 2013 at 03:32:02PM -0400, Vivek Goyal wrote:
> On Fri, Jun 21, 2013 at 04:17:03PM +0200, Michael Holzheu wrote:
> > On Fri, 14 Jun 2013 14:54:02 -0400
> > Vivek Goyal <[email protected]> wrote:
> >
> > > On Fri, Jun 07, 2013 at 06:55:57PM +0200, Michael Holzheu wrote:
> > >
> > > [..]
> > > > @@ -935,10 +967,17 @@ static int __init vmcore_init(void)
> > > > {
> > > > int rc = 0;
> > > >
> > > > - /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
> > > > - if (!(is_vmcore_usable()))
> > > > - return rc;
> > > > + /*
> > > > + * If elfcorehdr= has not been passed in cmdline, try to get the
> > > > + * header from 2nd kernel, then capture the dump.
> > > > + */
> > > > + if (!(is_vmcore_usable())) {
> > > > + rc = elfcorehdr_alloc();
> > > > + if (rc)
> > > > + return rc;
> > > > + }
> > >
> > > Hi Michael,
> > >
> > > Patch description says that elfcorehdr_alloc() returns the addr and
> > > size of elf headers. But that does not seem to be the case here. Has
> > > it been modified in later patches.
> >
> > Sorry, that is a relic of one of my previous experiments where I tried
> > to implement elfcorehdr_alloc() similar to the way you suggest it now.
> > Because elfcorehdr_addr is a global variable, I decided to not pass
> > it in the functions. But of course I can change that again if you prefer
> > that.
> >
> > > Also will it be better if we call elfcorehdr_alloc() always and then
> > > check for is_vmcore_usable().
> > >
> > > Something like.
> > >
> > > elfcorehdr_addr = elfcorehdr_alloc()
> > > if (elfcorehdr_addr < 0)
> > > return elfcorehdr_addr
> > >
> > > if (!(is_vmcore_usable()))
> > > return error
> >
> > Ok, but then I think elfcorehdr_alloc() should also return
> > the elfcorehdr_size.
> >
> > So what about the following patch:
>
> This patch looks good to me. There are a few ugly pieces w.r.t. not
> setting elfcorehdr_addr to NULL after calling free(), as there are other
> pieces dependent on elfcorehdr_addr. I think sometime later we can
> have a separate variable to track whether this is a kdump kernel
> and remove the dependency on elfcorehdr_addr being set.
Hi Michael,
Thinking more about it, I think let us clean up this little ugly
bit too so that future changes become easy.
The current convention is that elfcorehdr_addr and elfcorehdr_size are
already set by arch code by the time vmcore.c starts reading them. Can't
s390 allocate the ELF headers in early boot code and set elfcorehdr_addr?
Then we don't have to call elfcorehdr_alloc().
And once we are done with reading the headers, we can call
elfcorehdr_free(), and s390 could free the memory and set elfcorehdr_addr
to ELFCORE_ADDR_ERR and elfcorehdr_size to 0. That would signify that one
can no longer read the ELF headers and that they have been freed.
is_kdump_kernel() will continue to work, as elfcorehdr_addr is
ELFCORE_ADDR_ERR. And that will mean that the ELF headers either were not
readable/usable to begin with or have been freed by now.
Thanks
Vivek
On Thu, 27 Jun 2013 16:23:34 -0400
Vivek Goyal <[email protected]> wrote:
> On Thu, Jun 27, 2013 at 03:32:02PM -0400, Vivek Goyal wrote:
> > On Fri, Jun 21, 2013 at 04:17:03PM +0200, Michael Holzheu wrote:
> > > On Fri, 14 Jun 2013 14:54:02 -0400
> > > Vivek Goyal <[email protected]> wrote:
[snip]
> Thinking more about it, I think let us clean up this little ugly
> bit too so that future changes become easy.
>
> Current convention is that elfcorehdr_addr and elfcorehdr_size are
> already set by arch code by the time vmcore.c starts reading it. Can't
> s390 allocate elf headers in early boot code and set elfcorehdr_addr? Then
> we don't have to call elfcorehdr_alloc().
>
> And once we are done with reading headers, we can call elfcorehdr_free()
> and s390 could free memory and set elfcorehdr_addr to ELFCORE_ADDR_ERR
> and elfcorehdr_size=0. That would signify that one can not try to read
> elf headers now and it must have been freed.
>
> is_kdump_kernel() will continue to work as elfcorehdr_addr is
> ELFCORE_ADDR_ERR. And that will mean that either elfcorehdr were not
> readable/usable to begin with or they have been freed now.
Hello Vivek,
We would like to keep the alloc/free symmetry as you have suggested in a
previous mail. This also has the advantage that we do not have to rely
on the ordering of init calls.
Wouldn't it be sufficient to just set elfcorehdr_addr to ELFCORE_ADDR_ERR
after elfcorehdr_free() and remove the comment?
So the code would look like the following:
static int __init vmcore_init(void)
{
int rc = 0;
/* Allow architectures to allocate ELF header in 2nd kernel */
rc = elfcorehdr_alloc(&elfcorehdr_addr, &elfcorehdr_size);
if (rc)
return rc;
/*
* If elfcorehdr= has been passed in cmdline or created in 2nd kernel,
* then capture the dump.
*/
if (!(is_vmcore_usable()))
return rc;
rc = parse_crash_elf_headers();
if (rc) {
pr_warn("Kdump: vmcore not initialized\n");
return rc;
}
elfcorehdr_free(elfcorehdr_addr);
elfcorehdr_addr = ELFCORE_ADDR_ERR;
proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
if (proc_vmcore)
proc_vmcore->size = vmcore_size;
return 0;
}
This looks clean to me.
What do you think?
Best Regards,
Michael
On Fri, Jun 28, 2013 at 10:15:52AM +0200, Michael Holzheu wrote:
> On Thu, 27 Jun 2013 16:23:34 -0400
> Vivek Goyal <[email protected]> wrote:
>
> > On Thu, Jun 27, 2013 at 03:32:02PM -0400, Vivek Goyal wrote:
> > > On Fri, Jun 21, 2013 at 04:17:03PM +0200, Michael Holzheu wrote:
> > > > On Fri, 14 Jun 2013 14:54:02 -0400
> > > > Vivek Goyal <[email protected]> wrote:
>
> [snip]
>
> > Thinking more about it, I think let us clean up this little ugly
> > bit too so that future changes become easy.
> >
> > Current convention is that elfcorehdr_addr and elfcorehdr_size are
> > already set by arch code by the time vmcore.c starts reading it. Can't
> > s390 allocate elf headers in early boot code and set elfcorehdr_addr? Then
> > we don't have to call elfcorehdr_alloc().
> >
> > And once we are done with reading headers, we can call elfcorehdr_free()
> > and s390 could free memory and set elfcorehdr_addr to ELFCORE_ADDR_ERR
> > and elfcorehdr_size=0. That would signify that one can not try to read
> > elf headers now and it must have been freed.
> >
> > is_kdump_kernel() will continue to work as elfcorehdr_addr is
> > ELFCORE_ADDR_ERR. And that will mean that either elfcorehdr were not
> > readable/usable to begin with or they have been freed now.
>
> Hello Vivek,
>
> We would like to keep the alloc/free symmetry as you have suggested in a
> previous mail. This also has the advantage that we do not have to rely
> on the ordering of init calls.
>
> Wouldn't it be sufficient to just set elfcorehdr_addr to ELFCORE_ADDR_ERR
> after elfcorehdr_free() and remove the comment?
>
Hi Michael,
This has only one problem, and that is the initialization semantics
of elfcorehdr_addr.
So far we expected it to be initialized very early in the boot process,
and once it is set, any component in the system can figure out whether it
is a kdump kernel (is_kdump_kernel()) and do kdump-specific things.
But if we move initialization of this variable a little later, then it
might be a problem for early users of is_kdump_kernel(). Though right
now I don't see drivers making use of it; only arch-specific early
boot code seems to use it.
So either we can stick to the existing semantics of initializing headers
early, or we could create a separate variable for is_kdump_kernel() which
is set in early boot, and then one can delay initialization of
elfcorehdr_addr in vmcore_init().
Given that I don't see any users of is_kdump_kernel() in the
arch-independent directories, and assuming that you will tackle all early
users of is_kdump_kernel() in s390, I am fine even with your patch
below.
Thanks
Vivek
> So the code would look like the following:
>
> static int __init vmcore_init(void)
> {
> int rc = 0;
>
> /* Allow architectures to allocate ELF header in 2nd kernel */
> rc = elfcorehdr_alloc(&elfcorehdr_addr, &elfcorehdr_size);
> if (rc)
> return rc;
> /*
> * If elfcorehdr= has been passed in cmdline or created in 2nd kernel,
> * then capture the dump.
> */
> if (!(is_vmcore_usable()))
> return rc;
> rc = parse_crash_elf_headers();
> if (rc) {
> pr_warn("Kdump: vmcore not initialized\n");
> return rc;
> }
> elfcorehdr_free(elfcorehdr_addr);
> elfcorehdr_addr = ELFCORE_ADDR_ERR;
>
> proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
> if (proc_vmcore)
> proc_vmcore->size = vmcore_size;
> return 0;
> }
>
> This looks clean to me.
>
> What do you think?
>
> Best Regards,
> Michael
On Mon, 1 Jul 2013 13:37:27 -0400
Vivek Goyal <[email protected]> wrote:
> On Fri, Jun 28, 2013 at 10:15:52AM +0200, Michael Holzheu wrote:
> > On Thu, 27 Jun 2013 16:23:34 -0400
> > Vivek Goyal <[email protected]> wrote:
> >
> > > On Thu, Jun 27, 2013 at 03:32:02PM -0400, Vivek Goyal wrote:
> > > > On Fri, Jun 21, 2013 at 04:17:03PM +0200, Michael Holzheu wrote:
> > > > > On Fri, 14 Jun 2013 14:54:02 -0400
> > > > > Vivek Goyal <[email protected]> wrote:
> >
> > [snip]
> >
> > > Thinking more about it, I think let us clean up this little ugly
> > > bit too so that future changes become easy.
> > >
> > > Current convention is that elfcorehdr_addr and elfcorehdr_size are
> > > already set by arch code by the time vmcore.c starts reading it. Can't
> > > s390 allocate elf headers in early boot code and set elfcorehdr_addr? Then
> > > we don't have to call elfcorehdr_alloc().
> > >
> > > And once we are done with reading headers, we can call elfcorehdr_free()
> > > and s390 could free memory and set elfcorehdr_addr to ELFCORE_ADDR_ERR
> > > and elfcorehdr_size=0. That would signify that one can not try to read
> > > elf headers now and it must have been freed.
> > >
> > > is_kdump_kernel() will continue to work as elfcorehdr_addr is
> > > ELFCORE_ADDR_ERR. And that will mean that either elfcorehdr were not
> > > readable/usable to begin with or they have been freed now.
> >
> > Hello Vivek,
> >
> > We would like to keep the alloc/free symmetry as you have suggested in a
> > previous mail. This also has the advantage that we do not have to rely
> > on the ordering of init calls.
> >
> > Wouldn't it be sufficient to just set elfcorehdr_addr to ELFCORE_ADDR_ERR
> > after elfcorehdr_free() and remove the comment?
> >
>
> Hi Michael,
>
> This has only one problem, and that is the initialization semantics
> of elfcorehdr_addr.
>
> So far we expected it to be initialized very early in boot process and
> once this is set, any component in the system could figure out if it
> is kdump kernel (is_kdump_kernel()) and do kdump specific things.
>
> But if we move initialization of this variable a little later, then it
> might be a problem for early users of is_kdump_kernel(). Though right
> now I don't see drivers making use of it and only arch specific early
> boot up code seems to have it.
>
> So either we can stick to existing semantics of initializing headers
> early or we could create a separate variable for is_kdump_kernel() which
> is set in early boot and then one can delay initialization of
> elfcorehdr_addr in vmcore_init().
The latter makes much sense to me. This would also make the s390 code
easier to read, since we could then replace some early kdump checks
with the official is_kdump_kernel() function.
Perhaps we can do that as an additional patch after we are done with
this patch series.
>
> Given the fact that I don't see any users of is_kdump_kernel() in arch
> independent directory, and I am assuming that you will tackle all early
> users of is_kdump_kernel() in s390, I will be fine even with your patch
> below.
Ok, great. I will then send you version 6 of the patch set with all the
discussed changes.
Thanks!
Michael