2013-07-23 15:49:05

by Michael Holzheu

[permalink] [raw]
Subject: [PATCH v8 0/6] kdump: Allow ELF header creation in new kernel

Hello Andrew,

Here the new kdump patch series as discussed with Vivek and Hatayama
during the last months.

I adjusted the code to v3.11-rc2 where the following patches have been
integrated:

https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=5a74953ff56aa870d6913ef4d81934f5c620c59d
https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=191a2fa0a8d2bbb64c98f9b1976fcb37ee5eae6b

Vivek has accepted the patch series. So would you integrate it into 3.12?

Best Regards,
Michael

ChangeLog
=========
v7 => v8)

- Rebase to v3.11-rc2
- Add patch 5/6: vmcore: Enable /proc/vmcore mmap for s390

v6 => v7)

- Rebase to v3.11-rc1
- Return VM_FAULT_SIGBUS in fault handler for non s390
- Use __va() for buffer in fault handler

v5 => v6)

- Set elfcorehdr_addr to ELFCORE_ADDR_ERR after elfcorehdr_free()
- Fix OLDMEM_SIZE/ZFCPDUMP_HSA_SIZE confusion
- Remove return VM_FAULT_MAJOR/MINOR
- Use find_or_create_page() in mmap_vmcore_fault()
- Use kfree instead of vfree in elfcorehdr_free()

v4 => v5)

- Add weak function elfcorehdr_read_notes() to read ELF notes
- Rename weak functions for ELF header access and use "vmcorehdr_" prefix
- Generic vmcore code calls elfcorehdr_alloc() if elfcorehdr= is not specified
- Add vmcore fault handler for mmap of non-resident memory regions
- Add weak function remap_oldmem_pfn_range() to be used by zfcpdump for mmap

v3 => v4)

- Rebase to 3.10-rc2 + vmcore mmap patches v8

v2 => v3)

- Get rid of ELFCORE_ADDR_NEWMEM
- Make read_from_crash_header() only read from kernel
- Move read_from_crash_header() to weak function arch_read_from_crash_header()
- Implement read_from_crash_header() strong function for s390
- Set elfcorehdr_addr to address of new ELF header

v1 => v2)

- Rebase 3.10-rc2 + vmcore mmap patches
- Introduced arch_get/free_crash_header() and ELFCORE_ADDR_NEWMEM

Feature Description
===================
For s390 we want to use /proc/vmcore for our SCSI stand-alone
dump (zfcpdump). We have support where the first HSA_SIZE bytes are
saved into a hypervisor owned memory area (HSA) before the kdump
kernel is booted. When the kdump kernel starts, it is restricted
to use only HSA_SIZE bytes.

The advantages of this mechanism are:

* No crashkernel memory has to be defined in the old kernel.
* Early boot problems (before kexec_load has been done) can be dumped
* Non-Linux systems can be dumped.

We modify the s390 copy_oldmem_page() function to read from the HSA memory
if memory below HSA_SIZE bytes is requested.

Since we cannot use the kexec tool to load the kernel in this scenario,
we have to build the ELF header in the 2nd (kdump/new) kernel.

So with the following patch set we would like to introduce the new
function that the ELF header for /proc/vmcore can be created in the 2nd
kernel memory.

The following steps are done during zfcpdump execution:

1. Production system crashes
2. User boots a SCSI disk that has been prepared with the zfcpdump tool
3. Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
4. Boot loader loads kernel into low memory area
5. Kernel boots and uses only HSA_SIZE bytes of memory
6. Kernel saves registers of non-boot CPUs
7. Kernel does memory detection for dump memory map
8. Kernel creates ELF header for /proc/vmcore
9. /proc/vmcore uses this header for initialization
10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
- copy_oldmem_page() copies from HSA for memory below HSA_SIZE
- copy_oldmem_page() copies from real memory for memory above HSA_SIZE

Jan Willeke (1):
s390/vmcore: Implement remap_oldmem_pfn_range for s390

Michael Holzheu (5):
vmcore: Introduce ELF header in new memory feature
s390/vmcore: Use ELF header in new memory feature
vmcore: Introduce remap_oldmem_pfn_range()
vmcore: Enable /proc/vmcore mmap for s390
s390/vmcore: Use vmcore for zfcpdump

arch/s390/Kconfig | 3 +-
arch/s390/include/asm/sclp.h | 1 +
arch/s390/kernel/crash_dump.c | 219 ++++++++++++++++++++++++++++++++++--------
drivers/s390/char/zcore.c | 6 +-
fs/proc/vmcore.c | 154 +++++++++++++++++++++++++----
include/linux/crash_dump.h | 9 ++
6 files changed, 329 insertions(+), 63 deletions(-)

--
1.8.2.3


2013-07-23 15:48:10

by Michael Holzheu

[permalink] [raw]
Subject: [PATCH v8 3/6] vmcore: Introduce remap_oldmem_pfn_range()

For zfcpdump we can't map the HSA storage because it is only available
via a read interface. Therefore, for the new vmcore mmap feature we have
introduce a new mechanism to create mappings on demand.

This patch introduces a new architecture function remap_oldmem_pfn_range()
that should be used to create mappings with remap_pfn_range() for oldmem
areas that can be directly mapped. For zfcpdump this is everything besides
of the HSA memory. For the areas that are not mapped by remap_oldmem_pfn_range()
a generic vmcore a new generic vmcore fault handler mmap_vmcore_fault()
is called.

This handler works as follows:

* Get already available or new page from page cache (find_or_create_page)
* Check if /proc/vmcore page is filled with data (PageUptodate)
* If yes:
Return that page
* If no:
Fill page using __vmcore_read(), set PageUptodate, and return page

Signed-off-by: Michael Holzheu <[email protected]>
Acked-by: Vivek Goyal <[email protected]>
cc: HATAYAMA Daisuke <[email protected]>
---
fs/proc/vmcore.c | 91 ++++++++++++++++++++++++++++++++++++++++++----
include/linux/crash_dump.h | 3 ++
2 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 02cb3ff..3f6cf0e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -21,6 +21,7 @@
#include <linux/crash_dump.h>
#include <linux/list.h>
#include <linux/vmalloc.h>
+#include <linux/pagemap.h>
#include <asm/uaccess.h>
#include <asm/io.h>
#include "internal.h"
@@ -153,11 +154,35 @@ ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
return read_from_oldmem(buf, count, ppos, 0);
}

+/*
+ * Architectures may override this function to map oldmem
+ */
+int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range(vma, from, pfn, size, prot);
+}
+
+/*
+ * Copy to either kernel or user space
+ */
+static int copy_to(void *target, void *src, size_t size, int userbuf)
+{
+ if (userbuf) {
+ if (copy_to_user(target, src, size))
+ return -EFAULT;
+ } else {
+ memcpy(target, src, size);
+ }
+ return 0;
+}
+
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
-static ssize_t read_vmcore(struct file *file, char __user *buffer,
- size_t buflen, loff_t *fpos)
+static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
+ int userbuf)
{
ssize_t acc = 0, tmp;
size_t tsz;
@@ -174,7 +199,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
/* Read ELF core header */
if (*fpos < elfcorebuf_sz) {
tsz = min(elfcorebuf_sz - (size_t)*fpos, buflen);
- if (copy_to_user(buffer, elfcorebuf + *fpos, tsz))
+ if (copy_to(buffer, elfcorebuf + *fpos, tsz, userbuf))
return -EFAULT;
buflen -= tsz;
*fpos += tsz;
@@ -192,7 +217,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,

tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
- if (copy_to_user(buffer, kaddr, tsz))
+ if (copy_to(buffer, kaddr, tsz, userbuf))
return -EFAULT;
buflen -= tsz;
*fpos += tsz;
@@ -208,7 +233,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
if (*fpos < m->offset + m->size) {
tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);
start = m->paddr + *fpos - m->offset;
- tmp = read_from_oldmem(buffer, tsz, &start, 1);
+ tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
if (tmp < 0)
return tmp;
buflen -= tsz;
@@ -225,6 +250,55 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}

+static ssize_t read_vmcore(struct file *file, char __user *buffer,
+ size_t buflen, loff_t *fpos)
+{
+ return __read_vmcore(buffer, buflen, fpos, 1);
+}
+
+/*
+ * The vmcore fault handler uses the page cache and fills data using the
+ * standard __vmcore_read() function.
+ *
+ * On s390 the fault handler is used for memory regions that can't be mapped
+ * directly with remap_pfn_range().
+ */
+static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+#ifdef CONFIG_S390
+ struct address_space *mapping = vma->vm_file->f_mapping;
+ pgoff_t index = vmf->pgoff;
+ struct page *page;
+ loff_t offset;
+ char *buf;
+ int rc;
+
+ page = find_or_create_page(mapping, index, GFP_KERNEL);
+ if (!page)
+ return VM_FAULT_OOM;
+ if (!PageUptodate(page)) {
+ offset = (loff_t) index << PAGE_CACHE_SHIFT;
+ buf = __va((page_to_pfn(page) << PAGE_SHIFT));
+ rc = __read_vmcore(buf, PAGE_SIZE, &offset, 0);
+ if (rc < 0) {
+ unlock_page(page);
+ page_cache_release(page);
+ return (rc == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
+ }
+ SetPageUptodate(page);
+ }
+ unlock_page(page);
+ vmf->page = page;
+ return 0;
+#else
+ return VM_FAULT_SIGBUS;
+#endif
+}
+
+static const struct vm_operations_struct vmcore_mmap_ops = {
+ .fault = mmap_vmcore_fault,
+};
+
/**
* alloc_elfnotes_buf - allocate buffer for ELF note segment in
* vmalloc memory
@@ -271,6 +345,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)

vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
vma->vm_flags |= VM_MIXEDMAP;
+ vma->vm_ops = &vmcore_mmap_ops;

len = 0;

@@ -312,9 +387,9 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)

tsz = min_t(size_t, m->offset + m->size - start, size);
paddr = m->paddr + start - m->offset;
- if (remap_pfn_range(vma, vma->vm_start + len,
- paddr >> PAGE_SHIFT, tsz,
- vma->vm_page_prot))
+ if (remap_oldmem_pfn_range(vma, vma->vm_start + len,
+ paddr >> PAGE_SHIFT, tsz,
+ vma->vm_page_prot))
goto fail;
size -= tsz;
start += tsz;
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 6571f82..fe68a5a 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -17,6 +17,9 @@ extern int __weak elfcorehdr_alloc(unsigned long long *addr,
extern void __weak elfcorehdr_free(unsigned long long addr);
extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+extern int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot);

extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
--
1.8.2.3

2013-07-23 15:48:09

by Michael Holzheu

[permalink] [raw]
Subject: [PATCH v8 6/6] s390/vmcore: Use vmcore for zfcpdump

This patch modifies the s390 copy_oldmem_page() and remap_oldmem_pfn_range()
function for zfcpdump to read from the HSA memory if memory below HSA_SIZE
bytes is requested. Otherwise real memory is used.

Signed-off-by: Michael Holzheu <[email protected]>
---
arch/s390/Kconfig | 3 +-
arch/s390/include/asm/sclp.h | 1 +
arch/s390/kernel/crash_dump.c | 122 +++++++++++++++++++++++++++++++++++-------
drivers/s390/char/zcore.c | 6 +--
4 files changed, 110 insertions(+), 22 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 22f75b5..f88bdac 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -514,6 +514,7 @@ config CRASH_DUMP
bool "kernel crash dumps"
depends on 64BIT && SMP
select KEXEC
+ select ZFCPDUMP
help
Generate crash dump after being started by kexec.
Crash dump kernels are loaded in the main kernel with kexec-tools
@@ -524,7 +525,7 @@ config CRASH_DUMP
config ZFCPDUMP
def_bool n
prompt "zfcpdump support"
- select SMP
+ depends on SMP
help
Select this option if you want to build an zfcpdump enabled kernel.
Refer to <file:Documentation/s390/zfcpdump.txt> for more details on this.
diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 06a1361..7dc7f9c 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -56,5 +56,6 @@ bool sclp_has_linemode(void);
bool sclp_has_vt220(void);
int sclp_pci_configure(u32 fid);
int sclp_pci_deconfigure(u32 fid);
+int memcpy_hsa(void *dest, unsigned long src, size_t count, int mode);

#endif /* _ASM_S390_SCLP_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 3e77615..c84f33d 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -16,6 +16,7 @@
#include <asm/os_info.h>
#include <asm/elf.h>
#include <asm/ipl.h>
+#include <asm/sclp.h>

#define PTR_ADD(x, y) (((char *) (x)) + ((unsigned long) (y)))
#define PTR_SUB(x, y) (((char *) (x)) - ((unsigned long) (y)))
@@ -69,22 +70,41 @@ static ssize_t copy_page_real(void *buf, void *src, size_t csize)
static void *elfcorehdr_newmem;

/*
- * Copy one page from "oldmem"
+ * Copy one page from zfcpdump "oldmem"
+ *
+ * For pages below ZFCPDUMP_HSA_SIZE memory from the HSA is copied. Otherwise
+ * real memory copy is used.
+ */
+static ssize_t copy_oldmem_page_zfcpdump(char *buf, size_t csize,
+ unsigned long src, int userbuf)
+{
+ int rc;
+
+ if (src < ZFCPDUMP_HSA_SIZE) {
+ rc = memcpy_hsa(buf, src, csize, userbuf);
+ } else {
+ if (userbuf)
+ rc = copy_to_user_real((void __force __user *) buf,
+ (void *) src, csize);
+ else
+ rc = memcpy_real(buf, (void *) src, csize);
+ }
+ return rc ? rc : csize;
+}
+
+/*
+ * Copy one page from kdump "oldmem"
*
* For the kdump reserved memory this functions performs a swap operation:
* - [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE] is mapped to [0 - OLDMEM_SIZE].
* - [0 - OLDMEM_SIZE] is mapped to [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE]
*/
-ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
- size_t csize, unsigned long offset, int userbuf)
+static ssize_t copy_oldmem_page_kdump(char *buf, size_t csize,
+ unsigned long src, int userbuf)
+
{
- unsigned long src;
int rc;

- if (!csize)
- return 0;
-
- src = (pfn << PAGE_SHIFT) + offset;
if (src < OLDMEM_SIZE)
src += OLDMEM_BASE;
else if (src > OLDMEM_BASE &&
@@ -95,17 +115,35 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
(void *) src, csize);
else
rc = copy_page_real(buf, (void *) src, csize);
- return (rc == 0) ? csize : rc;
+ return (rc == 0) ? rc : csize;
}

/*
- * Remap "oldmem"
+ * Copy one page from "oldmem"
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
+ unsigned long offset, int userbuf)
+{
+ unsigned long src;
+
+ if (!csize)
+ return 0;
+ src = (pfn << PAGE_SHIFT) + offset;
+ if (OLDMEM_BASE)
+ return copy_oldmem_page_kdump(buf, csize, src, userbuf);
+ else
+ return copy_oldmem_page_zfcpdump(buf, csize, src, userbuf);
+}
+
+/*
+ * Remap "oldmem" for kdump
*
* For the kdump reserved memory this functions performs a swap operation:
* [0 - OLDMEM_SIZE] is mapped to [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE]
*/
-int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static int remap_oldmem_pfn_range_kdump(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot)
{
unsigned long size_old;
int rc;
@@ -125,6 +163,43 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
}

/*
+ * Remap "oldmem" for zfcpdump
+ *
+ * We only map available memory above ZFCPDUMP_HSA_SIZE. Memory below
+ * ZFCPDUMP_HSA_SIZE is read on demand using the copy_oldmem_page() function.
+ */
+static int remap_oldmem_pfn_range_zfcpdump(struct vm_area_struct *vma,
+ unsigned long from,
+ unsigned long pfn,
+ unsigned long size, pgprot_t prot)
+{
+ unsigned long size_hsa;
+
+ if (pfn < ZFCPDUMP_HSA_SIZE >> PAGE_SHIFT) {
+ size_hsa = min(size, ZFCPDUMP_HSA_SIZE - (pfn << PAGE_SHIFT));
+ if (size == size_hsa)
+ return 0;
+ size -= size_hsa;
+ from += size_hsa;
+ pfn += size_hsa >> PAGE_SHIFT;
+ }
+ return remap_pfn_range(vma, from, pfn, size, prot);
+}
+
+/*
+ * Remap "oldmem" for kdump or zfcpdump
+ */
+int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ if (OLDMEM_BASE)
+ return remap_oldmem_pfn_range_kdump(vma, from, pfn, size, prot);
+ else
+ return remap_oldmem_pfn_range_zfcpdump(vma, from, pfn, size,
+ prot);
+}
+
+/*
* Copy memory from old kernel
*/
int copy_from_oldmem(void *dest, void *src, size_t count)
@@ -132,11 +207,21 @@ int copy_from_oldmem(void *dest, void *src, size_t count)
unsigned long copied = 0;
int rc;

- if ((unsigned long) src < OLDMEM_SIZE) {
- copied = min(count, OLDMEM_SIZE - (unsigned long) src);
- rc = memcpy_real(dest, src + OLDMEM_BASE, copied);
- if (rc)
- return rc;
+ if (OLDMEM_BASE) {
+ if ((unsigned long) src < OLDMEM_SIZE) {
+ copied = min(count, OLDMEM_SIZE - (unsigned long) src);
+ rc = memcpy_real(dest, src + OLDMEM_BASE, copied);
+ if (rc)
+ return rc;
+ }
+ } else {
+ if ((unsigned long) src < ZFCPDUMP_HSA_SIZE) {
+ copied = min(count,
+ ZFCPDUMP_HSA_SIZE - (unsigned long) src);
+ rc = memcpy_hsa(dest, (unsigned long) src, copied, 0);
+ if (rc)
+ return rc;
+ }
}
return memcpy_real(dest + copied, src + copied, count - copied);
}
@@ -466,7 +551,8 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
u32 alloc_size;
u64 hdr_off;

- if (!OLDMEM_BASE)
+ /* If we are not in kdump or zfcpdump mode return */
+ if (!OLDMEM_BASE && ipl_info.type != IPL_TYPE_FCP_DUMP)
return 0;
/* If elfcorehdr= has been passed via cmdline, we use that one */
if (elfcorehdr_addr != ELFCORE_ADDR_MAX)
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index 9e5e146..794820a 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -30,8 +30,8 @@

#define TRACE(x...) debug_sprintf_event(zcore_dbf, 1, x)

-#define TO_USER 0
-#define TO_KERNEL 1
+#define TO_USER 1
+#define TO_KERNEL 0
#define CHUNK_INFO_SIZE 34 /* 2 16-byte char, each followed by blank */

enum arch_id {
@@ -73,7 +73,7 @@ static struct ipl_parameter_block *ipl_block;
* @count: Size of buffer, which should be copied
* @mode: Either TO_KERNEL or TO_USER
*/
-static int memcpy_hsa(void *dest, unsigned long src, size_t count, int mode)
+int memcpy_hsa(void *dest, unsigned long src, size_t count, int mode)
{
int offs, blk_num;
static char buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
--
1.8.2.3

2013-07-23 15:48:32

by Michael Holzheu

[permalink] [raw]
Subject: [PATCH v8 5/6] vmcore: Enable /proc/vmcore mmap for s390

The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" allows
now to use mmap also on s390.

So enable mmap for s390 again.

Signed-off-by: Michael Holzheu <[email protected]>
---
fs/proc/vmcore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 3f6cf0e..532808e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -327,7 +327,7 @@ static inline char *alloc_elfnotes_buf(size_t notes_sz)
* regions in the 1st kernel pointed to by PT_LOAD entries) into
* virtually contiguous user-space in ELF layout.
*/
-#if defined(CONFIG_MMU) && !defined(CONFIG_S390)
+#ifdef CONFIG_MMU
static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
{
size_t size = vma->vm_end - vma->vm_start;
--
1.8.2.3

2013-07-23 15:48:34

by Michael Holzheu

[permalink] [raw]
Subject: [PATCH v8 2/6] s390/vmcore: Use ELF header in new memory feature

This patch now exchanges the old relocate mechanism with the new
arch function call override mechanism that allows to create the ELF
core header in the 2nd kernel.

Signed-off-by: Michael Holzheu <[email protected]>
---
arch/s390/kernel/crash_dump.c | 81 ++++++++++++++++++++++++++++---------------
1 file changed, 54 insertions(+), 27 deletions(-)

diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index d8f3556..0c9a897 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -64,6 +64,11 @@ static ssize_t copy_page_real(void *buf, void *src, size_t csize)
}

/*
+ * Pointer to ELF header in new kernel
+ */
+static void *elfcorehdr_newmem;
+
+/*
* Copy one page from "oldmem"
*
* For the kdump reserved memory this functions performs a swap operation:
@@ -368,14 +373,6 @@ static int get_mem_chunk_cnt(void)
}

/*
- * Relocate pointer in order to allow vmcore code access the data
- */
-static inline unsigned long relocate(unsigned long addr)
-{
- return OLDMEM_BASE + addr;
-}
-
-/*
* Initialize ELF loads (new kernel)
*/
static int loads_init(Elf64_Phdr *phdr, u64 loads_offset)
@@ -426,7 +423,7 @@ static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
ptr = nt_vmcoreinfo(ptr);
memset(phdr, 0, sizeof(*phdr));
phdr->p_type = PT_NOTE;
- phdr->p_offset = relocate(notes_offset);
+ phdr->p_offset = notes_offset;
phdr->p_filesz = (unsigned long) PTR_SUB(ptr, ptr_start);
phdr->p_memsz = phdr->p_filesz;
return ptr;
@@ -435,7 +432,7 @@ static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
/*
* Create ELF core header (new kernel)
*/
-static void s390_elf_corehdr_create(char **elfcorebuf, size_t *elfcorebuf_sz)
+int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
{
Elf64_Phdr *phdr_notes, *phdr_loads;
int mem_chunk_cnt;
@@ -443,6 +440,11 @@ static void s390_elf_corehdr_create(char **elfcorebuf, size_t *elfcorebuf_sz)
u32 alloc_size;
u64 hdr_off;

+ if (!OLDMEM_BASE)
+ return 0;
+ /* If elfcorehdr= has been passed via cmdline, we use that one */
+ if (elfcorehdr_addr != ELFCORE_ADDR_MAX)
+ return 0;
mem_chunk_cnt = get_mem_chunk_cnt();

alloc_size = 0x1000 + get_cpu_cnt() * 0x300 +
@@ -460,27 +462,52 @@ static void s390_elf_corehdr_create(char **elfcorebuf, size_t *elfcorebuf_sz)
ptr = notes_init(phdr_notes, ptr, ((unsigned long) hdr) + hdr_off);
/* Init loads */
hdr_off = PTR_DIFF(ptr, hdr);
- loads_init(phdr_loads, ((unsigned long) hdr) + hdr_off);
- *elfcorebuf_sz = hdr_off;
- *elfcorebuf = (void *) relocate((unsigned long) hdr);
- BUG_ON(*elfcorebuf_sz > alloc_size);
+ loads_init(phdr_loads, hdr_off);
+ *addr = (unsigned long long) hdr;
+ elfcorehdr_newmem = hdr;
+ *size = (unsigned long long) hdr_off;
+ BUG_ON(elfcorehdr_size > alloc_size);
+ return 0;
}

/*
- * Create kdump ELF core header in new kernel, if it has not been passed via
- * the "elfcorehdr" kernel parameter
+ * Free ELF core header (new kernel)
*/
-static int setup_kdump_elfcorehdr(void)
+void elfcorehdr_free(unsigned long long addr)
{
- size_t elfcorebuf_sz;
- char *elfcorebuf;
-
- if (!OLDMEM_BASE || is_kdump_kernel())
- return -EINVAL;
- s390_elf_corehdr_create(&elfcorebuf, &elfcorebuf_sz);
- elfcorehdr_addr = (unsigned long long) elfcorebuf;
- elfcorehdr_size = elfcorebuf_sz;
- return 0;
+ if (!elfcorehdr_newmem)
+ return;
+ kfree((void *)(unsigned long)addr);
+}
+
+/*
+ * Read from ELF header
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ void *src = (void *)(unsigned long)*ppos;
+
+ src = elfcorehdr_newmem ? src : src - OLDMEM_BASE;
+ memcpy(buf, src, count);
+ *ppos += count;
+ return count;
}

-subsys_initcall(setup_kdump_elfcorehdr);
+/*
+ * Read from ELF notes data
+ */
+ssize_t elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
+{
+ void *src = (void *)(unsigned long)*ppos;
+ int rc;
+
+ if (elfcorehdr_newmem) {
+ memcpy(buf, src, count);
+ } else {
+ rc = copy_from_oldmem(buf, src, count);
+ if (rc)
+ return rc;
+ }
+ *ppos += count;
+ return count;
+}
--
1.8.2.3

2013-07-23 15:49:03

by Michael Holzheu

[permalink] [raw]
Subject: [PATCH v8 1/6] vmcore: Introduce ELF header in new memory feature

Currently for s390 we create the ELF core header in the 2nd kernel
with a small trick. We relocate the addresses in the ELF header in
a way that for the /proc/vmcore code it seems to be in the 1st kernel
(old) memory and the read_from_oldmem() returns the correct data.
This allows the /proc/vmcore code to use the ELF header in the
2nd kernel.

This patch now exchanges the old mechanism with the new and much
cleaner function call override feature that now offcially allows to
create the ELF core header in the 2nd kernel.

To use the new feature the following function have to be defined
by the architecture backend code to read from new memory:

* elfcorehdr_alloc: Allocate ELF header
* elfcorehdr_free: Free the memory of the ELF header
* elfcorehdr_read: Read from ELF header
* elfcorehdr_read_notes: Read from ELF notes

Signed-off-by: Michael Holzheu <[email protected]>
Acked-by: Vivek Goyal <[email protected]>
---
fs/proc/vmcore.c | 61 ++++++++++++++++++++++++++++++++++++++--------
include/linux/crash_dump.h | 6 +++++
2 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a1a16eb..02cb3ff 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -123,6 +123,36 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
return read;
}

+/*
+ * Architectures may override this function to allocate ELF header in 2nd kernel
+ */
+int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
+{
+ return 0;
+}
+
+/*
+ * Architectures may override this function to free header
+ */
+void __weak elfcorehdr_free(unsigned long long addr)
+{}
+
+/*
+ * Architectures may override this function to read from ELF header
+ */
+ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ return read_from_oldmem(buf, count, ppos, 0);
+}
+
+/*
+ * Architectures may override this function to read from notes sections
+ */
+ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
+{
+ return read_from_oldmem(buf, count, ppos, 0);
+}
+
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
@@ -357,7 +387,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
notes_section = kmalloc(max_sz, GFP_KERNEL);
if (!notes_section)
return -ENOMEM;
- rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
if (rc < 0) {
kfree(notes_section);
return rc;
@@ -444,7 +474,8 @@ static int __init copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
if (phdr_ptr->p_type != PT_NOTE)
continue;
offset = phdr_ptr->p_offset;
- rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
+ &offset);
if (rc < 0)
return rc;
notes_buf += phdr_ptr->p_memsz;
@@ -536,7 +567,7 @@ static int __init update_note_header_size_elf32(const Elf32_Ehdr *ehdr_ptr)
notes_section = kmalloc(max_sz, GFP_KERNEL);
if (!notes_section)
return -ENOMEM;
- rc = read_from_oldmem(notes_section, max_sz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_section, max_sz, &offset);
if (rc < 0) {
kfree(notes_section);
return rc;
@@ -623,7 +654,8 @@ static int __init copy_notes_elf32(const Elf32_Ehdr *ehdr_ptr, char *notes_buf)
if (phdr_ptr->p_type != PT_NOTE)
continue;
offset = phdr_ptr->p_offset;
- rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+ rc = elfcorehdr_read_notes(notes_buf, phdr_ptr->p_memsz,
+ &offset);
if (rc < 0)
return rc;
notes_buf += phdr_ptr->p_memsz;
@@ -810,7 +842,7 @@ static int __init parse_crash_elf64_headers(void)
addr = elfcorehdr_addr;

/* Read Elf header */
- rc = read_from_oldmem((char*)&ehdr, sizeof(Elf64_Ehdr), &addr, 0);
+ rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf64_Ehdr), &addr);
if (rc < 0)
return rc;

@@ -837,7 +869,7 @@ static int __init parse_crash_elf64_headers(void)
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
+ rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
if (rc < 0)
goto fail;

@@ -866,7 +898,7 @@ static int __init parse_crash_elf32_headers(void)
addr = elfcorehdr_addr;

/* Read Elf header */
- rc = read_from_oldmem((char*)&ehdr, sizeof(Elf32_Ehdr), &addr, 0);
+ rc = elfcorehdr_read((char *)&ehdr, sizeof(Elf32_Ehdr), &addr);
if (rc < 0)
return rc;

@@ -892,7 +924,7 @@ static int __init parse_crash_elf32_headers(void)
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
- rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
+ rc = elfcorehdr_read(elfcorebuf, elfcorebuf_sz_orig, &addr);
if (rc < 0)
goto fail;

@@ -919,7 +951,7 @@ static int __init parse_crash_elf_headers(void)
int rc=0;

addr = elfcorehdr_addr;
- rc = read_from_oldmem(e_ident, EI_NIDENT, &addr, 0);
+ rc = elfcorehdr_read(e_ident, EI_NIDENT, &addr);
if (rc < 0)
return rc;
if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
@@ -952,7 +984,14 @@ static int __init vmcore_init(void)
{
int rc = 0;

- /* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
+ /* Allow architectures to allocate ELF header in 2nd kernel */
+ rc = elfcorehdr_alloc(&elfcorehdr_addr, &elfcorehdr_size);
+ if (rc)
+ return rc;
+ /*
+ * If elfcorehdr= has been passed in cmdline or created in 2nd kernel,
+ * then capture the dump.
+ */
if (!(is_vmcore_usable()))
return rc;
rc = parse_crash_elf_headers();
@@ -960,6 +999,8 @@ static int __init vmcore_init(void)
pr_warn("Kdump: vmcore not initialized\n");
return rc;
}
+ elfcorehdr_free(elfcorehdr_addr);
+ elfcorehdr_addr = ELFCORE_ADDR_ERR;

proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
if (proc_vmcore)
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 37e4f8d..6571f82 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -12,6 +12,12 @@
extern unsigned long long elfcorehdr_addr;
extern unsigned long long elfcorehdr_size;

+extern int __weak elfcorehdr_alloc(unsigned long long *addr,
+ unsigned long long *size);
+extern void __weak elfcorehdr_free(unsigned long long addr);
+extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
+extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+
extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);

--
1.8.2.3

2013-07-23 15:49:01

by Michael Holzheu

[permalink] [raw]
Subject: [PATCH v8 4/6] s390/vmcore: Implement remap_oldmem_pfn_range for s390

From: Jan Willeke <[email protected]>

This patch introduces the s390 specific way to map pages from oldmem.
The memory area below OLDMEM_SIZE is mapped with offset OLDMEM_BASE.
The other old memory is mapped directly.

Signed-off-by: Jan Willeke <[email protected]>
Signed-off-by: Michael Holzheu <[email protected]>
---
arch/s390/kernel/crash_dump.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 0c9a897..3e77615 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -99,6 +99,32 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
}

/*
+ * Remap "oldmem"
+ *
+ * For the kdump reserved memory this functions performs a swap operation:
+ * [0 - OLDMEM_SIZE] is mapped to [OLDMEM_BASE - OLDMEM_BASE + OLDMEM_SIZE]
+ */
+int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ unsigned long size_old;
+ int rc;
+
+ if (pfn < OLDMEM_SIZE >> PAGE_SHIFT) {
+ size_old = min(size, OLDMEM_SIZE - (pfn << PAGE_SHIFT));
+ rc = remap_pfn_range(vma, from,
+ pfn + (OLDMEM_BASE >> PAGE_SHIFT),
+ size_old, prot);
+ if (rc || size == size_old)
+ return rc;
+ size -= size_old;
+ from += size_old;
+ pfn += size_old >> PAGE_SHIFT;
+ }
+ return remap_pfn_range(vma, from, pfn, size, prot);
+}
+
+/*
* Copy memory from old kernel
*/
int copy_from_oldmem(void *dest, void *src, size_t count)
--
1.8.2.3