2010-12-14 09:53:00

by suzuki

[permalink] [raw]
Subject: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

Hi all,

This is series of patches implementing an infrastructure for capturing the core
of an application without disrupting its process semantics.

The infrastructure makes use of the freezer subsystem in kernel to freeze the
threads and then collect the information to generate the core.

The interface is provided by a /proc/pid/core file, reading which can give the
ELF formatted core of the process with "pid". The interface supports "seek"
operation on the fd, allowing the dumper to have control on the data that is
being dumped. Also it allows the user to store the dump at any location.

The current implementation supports both native as well as the compat ELF
tasks.

An open() call to the /proc/pid/core will try to freeze the threads in the
process and the read() requests will dynamically generate the contents for the
core file. The ELF header & Program Headers are stored in a kernel buffer to
allow us to map the fpos to the required data section.

In case a thread is not frozen within a time interval, after issuing the freeze
request, we fill the register state information with 0's to indicate we could
not capture the data.

A close() would kick the threads out of the refrigerator().


The implementation reuses some of the existing ELF core generation code by
exporting them. Some of the code common to both native and compat ELF class
support has been moved to a common place, elfcore-common.c. Also some of the
reusable functions, specific to the ELF class handling, has been made global,
after renaming the compat version of the same.

We also added a new API -elf_core_copy_extra_phdrs() -for "reading" the arch
specific program headers, versus the existing elf_core_write_extra_phdrs().

Patches 1 to 9 deals with re-arranging the ELF code to be reusable by the
infrastructure.

Patches 10 to 21 implements the infrastructure.

TODO: Add support for collecting the arch specific notes, currently used only
by Cell platform.

Please let me know your review comments / thoughts.

arch/ia64/kernel/elfcore.c | 34 ++
arch/um/sys-i386/elfcore.c | 32 ++
fs/Makefile | 1
fs/binfmt_elf.c | 162 +------------
fs/compat_binfmt_elf.c | 7
fs/elfcore-common.c | 138 +++++++++++
fs/proc/Makefile | 2
fs/proc/base.c | 1
fs/proc/gencore-compat-elf.c | 61 ++++
fs/proc/gencore-elf.c | 479+++++++++++++++++++++++++++++++++++++++
fs/proc/gencore.c | 297++++++++++++++++++++++++
fs/proc/gencore.h | 71 +++++
fs/proc/internal.h | 1
include/linux/elfcore-internal.h | 73 +++++
include/linux/elfcore.h | 3
include/linux/freezer.h | 12
kernel/elfcore.c | 6
kernel/power/process.c | 9
18 files changed, 1241 insertions(+), 148 deletions(-)


Thanks

Suzuki


2010-12-14 09:56:27

by suzuki

[permalink] [raw]
Subject: [Patch 1/21] Reuse freezable() predicate

Export the freezable() predicate.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
include/linux/freezer.h | 12 ++++++++++++
kernel/power/process.c | 9 ---------
2 files changed, 12 insertions(+), 9 deletions(-)

Index: linux-2.6.36-rc7/include/linux/freezer.h
===================================================================
--- linux-2.6.36-rc7.orig/include/linux/freezer.h
+++ linux-2.6.36-rc7/include/linux/freezer.h
@@ -8,6 +8,18 @@

#ifdef CONFIG_FREEZER
/*
+ * Check if the task is freezeable ?
+ */
+static inline int freezeable(struct task_struct * p)
+{
+ if ((p == current) ||
+ (p->flags & PF_NOFREEZE) ||
+ (p->exit_state != 0))
+ return 0;
+ return 1;
+}
+
+/*
* Check if a process has been frozen
*/
static inline int frozen(struct task_struct *p)
Index: linux-2.6.36-rc7/kernel/power/process.c
===================================================================
--- linux-2.6.36-rc7.orig/kernel/power/process.c
+++ linux-2.6.36-rc7/kernel/power/process.c
@@ -22,15 +22,6 @@
*/
#define TIMEOUT (20 * HZ)

-static inline int freezeable(struct task_struct * p)
-{
- if ((p == current) ||
- (p->flags & PF_NOFREEZE) ||
- (p->exit_state != 0))
- return 0;
- return 1;
-}
-
static int try_to_freeze_tasks(bool sig_only)
{
struct task_struct *g, *p;

2010-12-14 09:57:45

by suzuki

[permalink] [raw]
Subject: [Patch 2/21] Create elfcore-common.c for ELF class independent core generation helpers

Move the common code, shared by both native and compat ELF core generation code
to a single instance.These functions could be re-used later for application
core dump infrastructure.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/Makefile | 1
fs/binfmt_elf.c | 129 ---------------------------------------
fs/elfcore-common.c | 126 ++++++++++++++++++++++++++++++++++++++
include/linux/elfcore-internal.h | 35 ++++++++++
4 files changed, 163 insertions(+), 128 deletions(-)

Index: linux-2.6.36-rc7/fs/binfmt_elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/binfmt_elf.c
+++ linux-2.6.36-rc7/fs/binfmt_elf.c
@@ -46,6 +46,7 @@ static unsigned long elf_map(struct file
* don't even try.
*/
#ifdef CONFIG_ELF_CORE
+#include <linux/elfcore-internal.h>
static int elf_core_dump(struct coredump_params *cprm);
#else
#define elf_core_dump NULL
@@ -1087,98 +1088,6 @@ out:
* Jeremy Fitzhardinge <[email protected]>
*/

-/*
- * Decide what to dump of a segment, part, all or none.
- */
-static unsigned long vma_dump_size(struct vm_area_struct *vma,
- unsigned long mm_flags)
-{
-#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
-
- /* The vma can be set up to tell us the answer directly. */
- if (vma->vm_flags & VM_ALWAYSDUMP)
- goto whole;
-
- /* Hugetlb memory check */
- if (vma->vm_flags & VM_HUGETLB) {
- if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
- goto whole;
- if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
- goto whole;
- }
-
- /* Do not dump I/O mapped devices or special mappings */
- if (vma->vm_flags & (VM_IO | VM_RESERVED))
- return 0;
-
- /* By default, dump shared memory if mapped from an anonymous file. */
- if (vma->vm_flags & VM_SHARED) {
- if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0 ?
- FILTER(ANON_SHARED) : FILTER(MAPPED_SHARED))
- goto whole;
- return 0;
- }
-
- /* Dump segments that have been written to. */
- if (vma->anon_vma && FILTER(ANON_PRIVATE))
- goto whole;
- if (vma->vm_file == NULL)
- return 0;
-
- if (FILTER(MAPPED_PRIVATE))
- goto whole;
-
- /*
- * If this looks like the beginning of a DSO or executable mapping,
- * check for an ELF header. If we find one, dump the first page to
- * aid in determining what was mapped here.
- */
- if (FILTER(ELF_HEADERS) &&
- vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ)) {
- u32 __user *header = (u32 __user *) vma->vm_start;
- u32 word;
- mm_segment_t fs = get_fs();
- /*
- * Doing it this way gets the constant folded by GCC.
- */
- union {
- u32 cmp;
- char elfmag[SELFMAG];
- } magic;
- BUILD_BUG_ON(SELFMAG != sizeof word);
- magic.elfmag[EI_MAG0] = ELFMAG0;
- magic.elfmag[EI_MAG1] = ELFMAG1;
- magic.elfmag[EI_MAG2] = ELFMAG2;
- magic.elfmag[EI_MAG3] = ELFMAG3;
- /*
- * Switch to the user "segment" for get_user(),
- * then put back what elf_core_dump() had in place.
- */
- set_fs(USER_DS);
- if (unlikely(get_user(word, header)))
- word = 0;
- set_fs(fs);
- if (word == magic.cmp)
- return PAGE_SIZE;
- }
-
-#undef FILTER
-
- return 0;
-
-whole:
- return vma->vm_end - vma->vm_start;
-}
-
-/* An ELF note in memory */
-struct memelfnote
-{
- const char *name;
- int type;
- unsigned int datasz;
- void *data;
-};
-
static int notesize(struct memelfnote *en)
{
int sz;
@@ -1256,16 +1165,6 @@ static void fill_elf_note_phdr(struct el
return;
}

-static void fill_note(struct memelfnote *note, const char *name, int type,
- unsigned int sz, void *data)
-{
- note->name = name;
- note->type = type;
- note->datasz = sz;
- note->data = data;
- return;
-}
-
/*
* fill up all the fields in prstatus from the given task struct, except
* registers which need to be filled up separately.
@@ -1812,32 +1711,6 @@ static void free_note_info(struct elf_no

#endif

-static struct vm_area_struct *first_vma(struct task_struct *tsk,
- struct vm_area_struct *gate_vma)
-{
- struct vm_area_struct *ret = tsk->mm->mmap;
-
- if (ret)
- return ret;
- return gate_vma;
-}
-/*
- * Helper function for iterating across a vma list. It ensures that the caller
- * will visit `gate_vma' prior to terminating the search.
- */
-static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
- struct vm_area_struct *gate_vma)
-{
- struct vm_area_struct *ret;
-
- ret = this_vma->vm_next;
- if (ret)
- return ret;
- if (this_vma == gate_vma)
- return NULL;
- return gate_vma;
-}
-
static void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
elf_addr_t e_shoff, int segs)
{
Index: linux-2.6.36-rc7/fs/elfcore-common.c
===================================================================
--- /dev/null
+++ linux-2.6.36-rc7/fs/elfcore-common.c
@@ -0,0 +1,131 @@
+/*
+ * Copyright 1993, 1994: Eric Youngdale ([email protected]).
+ *
+ * Initial write up for ELF_CORE :
+ * Modelled on fs/exec.c:aout_core_dump()
+ * Jeremy Fitzhardinge <[email protected]>
+ *
+ * Moved common routines used by both native and compat ELF core generation
+ * from binfmt_elf.c to a single instance.
+ * Suzuki K. Poulose <[email protected]>
+ */
+
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/elf.h>
+#include <asm/uaccess.h>
+#include <asm/param.h>
+#include <asm/page.h>
+
+#include <linux/elfcore-internal.h>
+
+struct vm_area_struct *first_vma(struct task_struct *tsk,
+ struct vm_area_struct *gate_vma)
+{
+ struct vm_area_struct *ret = tsk->mm->mmap;
+
+ if (ret)
+ return ret;
+ return gate_vma;
+}
+/*
+ * Helper function for iterating across a vma list. It ensures that the caller
+ * will visit `gate_vma' prior to terminating the search.
+ */
+struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
+ struct vm_area_struct *gate_vma)
+{
+ struct vm_area_struct *ret;
+
+ ret = this_vma->vm_next;
+ if (ret)
+ return ret;
+ if (this_vma == gate_vma)
+ return NULL;
+ return gate_vma;
+}
+
+/*
+ * Decide what to dump of a segment, part, all or none.
+ */
+unsigned long vma_dump_size(struct vm_area_struct *vma,
+ unsigned long mm_flags)
+{
+#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
+
+ /* The vma can be set up to tell us the answer directly. */
+ if (vma->vm_flags & VM_ALWAYSDUMP)
+ goto whole;
+
+ /* Hugetlb memory check */
+ if (vma->vm_flags & VM_HUGETLB) {
+ if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
+ goto whole;
+ if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
+ goto whole;
+ }
+
+ /* Do not dump I/O mapped devices or special mappings */
+ if (vma->vm_flags & (VM_IO | VM_RESERVED))
+ return 0;
+
+ /* By default, dump shared memory if mapped from an anonymous file. */
+ if (vma->vm_flags & VM_SHARED) {
+ if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0 ?
+ FILTER(ANON_SHARED) : FILTER(MAPPED_SHARED))
+ goto whole;
+ return 0;
+ }
+
+ /* Dump segments that have been written to. */
+ if (vma->anon_vma && FILTER(ANON_PRIVATE))
+ goto whole;
+ if (vma->vm_file == NULL)
+ return 0;
+
+ if (FILTER(MAPPED_PRIVATE))
+ goto whole;
+
+ /*
+ * If this looks like the beginning of a DSO or executable mapping,
+ * check for an ELF header. If we find one, dump the first page to
+ * aid in determining what was mapped here.
+ */
+ if (FILTER(ELF_HEADERS) &&
+ vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ)) {
+ u32 __user *header = (u32 __user *) vma->vm_start;
+ u32 word;
+ mm_segment_t fs = get_fs();
+ /*
+ * Doing it this way gets the constant folded by GCC.
+ */
+ union {
+ u32 cmp;
+ char elfmag[SELFMAG];
+ } magic;
+ BUILD_BUG_ON(SELFMAG != sizeof word);
+ magic.elfmag[EI_MAG0] = ELFMAG0;
+ magic.elfmag[EI_MAG1] = ELFMAG1;
+ magic.elfmag[EI_MAG2] = ELFMAG2;
+ magic.elfmag[EI_MAG3] = ELFMAG3;
+ /*
+ * Switch to the user "segment" for get_user(),
+ * then put back what elf_core_dump() had in place.
+ */
+ set_fs(USER_DS);
+ if (unlikely(get_user(word, header)))
+ word = 0;
+ set_fs(fs);
+ if (word == magic.cmp)
+ return PAGE_SIZE;
+ }
+
+#undef FILTER
+
+ return 0;
+
+whole:
+ return vma->vm_end - vma->vm_start;
+}
Index: linux-2.6.36-rc7/include/linux/elfcore-internal.h
===================================================================
--- /dev/null
+++ linux-2.6.36-rc7/include/linux/elfcore-internal.h
@@ -0,0 +1,44 @@
+/*
+ * Common routines for native and compat elf core generation.
+ *
+ * Copyright 1993, 1994: Eric Youngdale ([email protected]).
+ *
+ * Modelled on fs/exec.c:aout_core_dump()
+ * Jeremy Fitzhardinge <[email protected]>
+ *
+ * Moved the common routines from binfmt_elf.c to elfcore-common.c
+ * - Suzuki K. Poulose <[email protected]>
+ */
+
+#ifndef __ELF_CORE_INTERNAL_H
+#define __ELF_CORE_INTERNAL_H
+
+#ifdef __KERNEL__
+
+/* An ELF note in memory */
+struct memelfnote
+{
+ const char *name;
+ int type;
+ unsigned int datasz;
+ void *data;
+};
+
+static inline void fill_note(struct memelfnote *note, const char *name,
+ int type, unsigned int sz, void *data)
+{
+ note->name = name;
+ note->type = type;
+ note->datasz = sz;
+ note->data = data;
+}
+
+extern struct vm_area_struct *first_vma(struct task_struct *tsk,
+ struct vm_area_struct *gate_vma);
+extern struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
+ struct vm_area_struct *gate_vma);
+extern unsigned long vma_dump_size(struct vm_area_struct *vma,
+ unsigned long mm_flags);
+
+#endif
+#endif
Index: linux-2.6.36-rc7/fs/Makefile
===================================================================
--- linux-2.6.36-rc7.orig/fs/Makefile
+++ linux-2.6.36-rc7/fs/Makefile
@@ -42,6 +42,7 @@ obj-y += binfmt_script.o

obj-$(CONFIG_BINFMT_ELF) += binfmt_elf.o
obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o
+obj-$(CONFIG_ELF_CORE) += elfcore-common.o
obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o
obj-$(CONFIG_BINFMT_SOM) += binfmt_som.o
obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o

2010-12-14 10:00:07

by suzuki

[permalink] [raw]
Subject: [Patch 3/21] Make vma_dump_size() generic

vma_dump_size calculates the file size of a vma area in the core file. It
assumes the vma belongs to the "current". Make it generic to work for any task.
This will be reused by application core dump infrastructure.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/binfmt_elf.c | 6 +++---
fs/elfcore-common.c | 21 ++++++++++++++-------
include/linux/elfcore-internal.h | 4 ++--
3 files changed, 19 insertions(+), 12 deletions(-)

Index: linux-2.6.36-rc7/fs/binfmt_elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/binfmt_elf.c
+++ linux-2.6.36-rc7/fs/binfmt_elf.c
@@ -1735,7 +1735,7 @@ static size_t elf_core_vma_data_size(str

for (vma = first_vma(current, gate_vma); vma != NULL;
vma = next_vma(vma, gate_vma))
- size += vma_dump_size(vma, mm_flags);
+ size += vma_dump_size(current, vma, mm_flags);
return size;
}

@@ -1860,7 +1860,7 @@ static int elf_core_dump(struct coredump
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = vma_dump_size(vma, cprm->mm_flags);
+ phdr.p_filesz = vma_dump_size(current, vma, cprm->mm_flags);
phdr.p_memsz = vma->vm_end - vma->vm_start;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -1895,7 +1895,7 @@ static int elf_core_dump(struct coredump
unsigned long addr;
unsigned long end;

- end = vma->vm_start + vma_dump_size(vma, cprm->mm_flags);
+ end = vma->vm_start + vma_dump_size(current, vma, cprm->mm_flags);

for (addr = vma->vm_start; addr < end; addr += PAGE_SIZE) {
struct page *page;
Index: linux-2.6.36-rc7/fs/elfcore-common.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/elfcore-common.c
+++ linux-2.6.36-rc7/fs/elfcore-common.c
@@ -50,8 +50,8 @@ struct vm_area_struct *next_vma(struct v
/*
* Decide what to dump of a segment, part, all or none.
*/
-unsigned long vma_dump_size(struct vm_area_struct *vma,
- unsigned long mm_flags)
+unsigned long vma_dump_size(struct task_struct *p, struct vm_area_struct *vma,
+ unsigned long mm_flags)
{
#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))

@@ -97,7 +97,6 @@ unsigned long vma_dump_size(struct vm_ar
vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ)) {
u32 __user *header = (u32 __user *) vma->vm_start;
u32 word;
- mm_segment_t fs = get_fs();
/*
* Doing it this way gets the constant folded by GCC.
*/
@@ -114,10 +113,18 @@ unsigned long vma_dump_size(struct vm_ar
* Switch to the user "segment" for get_user(),
* then put back what elf_core_dump() had in place.
*/
- set_fs(USER_DS);
- if (unlikely(get_user(word, header)))
- word = 0;
- set_fs(fs);
+ if (p->group_leader == current->group_leader) {
+ mm_segment_t fs = get_fs();
+ set_fs(USER_DS);
+ if (unlikely(get_user(word, header)))
+ word = 0;
+ set_fs(fs);
+ } else {
+ int bytes = access_process_vm(p, (unsigned long)header,
+ &word, sizeof (word), 0);
+ if (bytes != sizeof(word))
+ word = 0;
+ }
if (word == magic.cmp)
return PAGE_SIZE;
}
Index: linux-2.6.36-rc7/include/linux/elfcore-internal.h
===================================================================
--- linux-2.6.36-rc7.orig/include/linux/elfcore-internal.h
+++ linux-2.6.36-rc7/include/linux/elfcore-internal.h
@@ -37,8 +37,8 @@ extern struct vm_area_struct *first_vma(
struct vm_area_struct *gate_vma);
extern struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
struct vm_area_struct *gate_vma);
-extern unsigned long vma_dump_size(struct vm_area_struct *vma,
- unsigned long mm_flags);
+extern unsigned long vma_dump_size(struct task_struct *task,
+ struct vm_area_struct *vma, unsigned long mm_flags);

#endif
#endif

2010-12-14 10:01:30

by suzuki

[permalink] [raw]
Subject: [Patch 4/21] Make fill_psinfo generic

fill_psinfo() fills the NT_PRPSINFO note for the core. NT_PRPSINFO stores the
command line of the process, which is stored at from mm->arg_start. Make
fill_psinfo reusable by supporting other tasks. Use access_process_vm() to read
the command line args for non-current task.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/binfmt_elf.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

Index: linux-2.6.36-rc7/fs/binfmt_elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/binfmt_elf.c
+++ linux-2.6.36-rc7/fs/binfmt_elf.c
@@ -1211,9 +1211,20 @@ static int fill_psinfo(struct elf_prpsin
len = mm->arg_end - mm->arg_start;
if (len >= ELF_PRARGSZ)
len = ELF_PRARGSZ-1;
- if (copy_from_user(&psinfo->pr_psargs,
- (const char __user *)mm->arg_start, len))
- return -EFAULT;
+ /*
+ * If we are dumping core for "another task"
+ * we can't use copy_from_user().
+ */
+ if (p->group_leader == current->group_leader) {
+ if (copy_from_user(&psinfo->pr_psargs,
+ (const char __user *)mm->arg_start, len))
+ return -EFAULT;
+ } else {
+ int bytes = access_process_vm(p, mm->arg_start,
+ &psinfo->pr_psargs, len, 0);
+ if (bytes != len)
+ return -EFAULT;
+ }
for(i = 0; i < len; i++)
if (psinfo->pr_psargs[i] == 0)
psinfo->pr_psargs[i] = ' ';

2010-12-14 10:03:11

by suzuki

[permalink] [raw]
Subject: [Patch 5/21] Rename compat versions of the reusable core generation routines

Rename the ELF class specific functions reusable for the application core dump
infrastructure. The compat ELF class routines are prepended with compat_ .

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/compat_binfmt_elf.c | 7 +++++++
1 file changed, 7 insertions(+)

Index: linux-2.6.36-rc7/fs/compat_binfmt_elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/compat_binfmt_elf.c
+++ linux-2.6.36-rc7/fs/compat_binfmt_elf.c
@@ -127,6 +127,13 @@ static void cputime_to_compat_timeval(co
#define init_elf_binfmt init_compat_elf_binfmt
#define exit_elf_binfmt exit_compat_elf_binfmt

+/* Rename the functions that may be reused */
+#define fill_elf_header compat_fill_elf_header
+#define fill_psinfo compat_fill_psinfo
+#define fill_prstatus compat_fill_prstatus
+#define fill_extnum_info compat_fill_extnum_info
+#define fill_auxv_note compat_fill_auxv_note
+
/*
* We share all the actual code with the native (64-bit) version.
*/

2010-12-14 10:04:25

by suzuki

[permalink] [raw]
Subject: [Patch 6/21] Export the reusable ELF core generation routines

Export the ELF class specific core generation helper functions via
elfcore-internal.h

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/binfmt_elf.c | 10 +++++-----
include/linux/elfcore-internal.h | 29 +++++++++++++++++++++++++++++
2 files changed, 34 insertions(+), 5 deletions(-)

Index: linux-2.6.36-rc7/fs/binfmt_elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/binfmt_elf.c
+++ linux-2.6.36-rc7/fs/binfmt_elf.c
@@ -1129,7 +1129,7 @@ static int writenote(struct memelfnote *
}
#undef DUMP_WRITE

-static void fill_elf_header(struct elfhdr *elf, int segs,
+void fill_elf_header(struct elfhdr *elf, int segs,
u16 machine, u32 flags, u8 osabi)
{
memset(elf, 0, sizeof(*elf));
@@ -1169,7 +1169,7 @@ static void fill_elf_note_phdr(struct el
* fill up all the fields in prstatus from the given task struct, except
* registers which need to be filled up separately.
*/
-static void fill_prstatus(struct elf_prstatus *prstatus,
+void fill_prstatus(struct elf_prstatus *prstatus,
struct task_struct *p, long signr)
{
prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
@@ -1199,7 +1199,7 @@ static void fill_prstatus(struct elf_prs
cputime_to_timeval(p->signal->cstime, &prstatus->pr_cstime);
}

-static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
+int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
struct mm_struct *mm)
{
const struct cred *cred;
@@ -1253,7 +1253,7 @@ static int fill_psinfo(struct elf_prpsin
return 0;
}

-static void fill_auxv_note(struct memelfnote *note, struct mm_struct *mm)
+void fill_auxv_note(struct memelfnote *note, struct mm_struct *mm)
{
elf_addr_t *auxv = (elf_addr_t *) mm->saved_auxv;
int i = 0;
@@ -1722,7 +1722,7 @@ static void free_note_info(struct elf_no

#endif

-static void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
+void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
elf_addr_t e_shoff, int segs)
{
elf->e_shoff = e_shoff;
Index: linux-2.6.36-rc7/include/linux/elfcore-internal.h
===================================================================
--- linux-2.6.36-rc7.orig/include/linux/elfcore-internal.h
+++ linux-2.6.36-rc7/include/linux/elfcore-internal.h
@@ -15,6 +15,8 @@

#ifdef __KERNEL__

+#include <linux/elfcore.h>
+
/* An ELF note in memory */
struct memelfnote
{
@@ -40,5 +42,32 @@ extern struct vm_area_struct *next_vma(s
extern unsigned long vma_dump_size(struct task_struct *task,
struct vm_area_struct *vma, unsigned long mm_flags);

+/* Core dump generation routines for native ELF class */
+extern void fill_elf_header(struct elfhdr *elf, int segs,
+ u16 machine, u32 flags, u8 osabi);
+extern void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr,
+ elf_addr_t e_shoff, int segs);
+extern void fill_prstatus(struct elf_prstatus *prstatus,
+ struct task_struct *p, long signr);
+extern int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
+ struct mm_struct *mm);
+extern void fill_auxv_note(struct memelfnote *note, struct mm_struct *mm);
+
+/* compat ELF class variant of the above helpers*/
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+#include <linux/elfcore-compat.h>
+
+extern void compat_fill_extnum_info(struct elf32_hdr *elf, struct elf32_shdr *shdr,
+ Elf32_Addr e_shoff, int segs);
+extern void compat_fill_prstatus(struct compat_elf_prstatus *prstatus,
+ struct task_struct *p, long signr);
+extern int compat_fill_psinfo(struct compat_elf_prpsinfo *psinfo, struct task_struct *p,
+ struct mm_struct *mm);
+extern void compat_fill_auxv_note(struct memelfnote *note, struct mm_struct *mm);
+
+extern void compat_fill_elf_header(struct elf32_hdr *elf, int segs,
+ u16 machine, u32 flags, u8 osabi);
+#endif /* CONFIG_COMPAT_BINFMT_ELF */
+
#endif
#endif

2010-12-14 10:05:51

by suzuki

[permalink] [raw]
Subject: [Patch 7/21] Define API for reading arch specif Program Headers for Core

The binfmt ELF defines APIs for the "arch" specific Program headers to be
written to the ELF core. Define the "read" (or copy) variants of the APIs, to
collect it for the application core dump(which is a read based approach).

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
include/linux/elfcore.h | 3 +++
kernel/elfcore.c | 6 ++++++
2 files changed, 9 insertions(+)

Index: linux-2.6.36-rc7/include/linux/elfcore.h
===================================================================
--- linux-2.6.36-rc7.orig/include/linux/elfcore.h
+++ linux-2.6.36-rc7/include/linux/elfcore.h
@@ -163,6 +163,9 @@ extern int
elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
unsigned long limit);
extern int
+elf_core_copy_extra_phdrs(char *buf, loff_t offset, size_t *size,
+ unsigned long limit);
+extern int
elf_core_write_extra_data(struct file *file, size_t *size, unsigned long limit);
extern size_t elf_core_extra_data_size(void);

Index: linux-2.6.36-rc7/kernel/elfcore.c
===================================================================
--- linux-2.6.36-rc7.orig/kernel/elfcore.c
+++ linux-2.6.36-rc7/kernel/elfcore.c
@@ -16,6 +16,12 @@ int __weak elf_core_write_extra_phdrs(st
return 1;
}

+int __weak elf_core_copy_extra_phdrs(char *buf, loff_t offset, size_t *size,
+ unsigned long limit)
+{
+ return 1;
+}
+
int __weak elf_core_write_extra_data(struct file *file, size_t *size,
unsigned long limit)
{

2010-12-14 10:08:24

by suzuki

[permalink] [raw]
Subject: [Patch 8/21] ia64 Implementation of elf_core_copy_extra_phdrs()

ia64 impelementation for elf_core_copy_extra_phdrs(). Adapted from
elf_core_write_extra_phdrs().

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
arch/ia64/kernel/elfcore.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

Index: linux-2.6.36-rc7/arch/ia64/kernel/elfcore.c
===================================================================
--- linux-2.6.36-rc7.orig/arch/ia64/kernel/elfcore.c
+++ linux-2.6.36-rc7/arch/ia64/kernel/elfcore.c
@@ -42,6 +42,40 @@ int elf_core_write_extra_phdrs(struct fi
return 1;
}

+int elf_core_copy_extra_phdrs(char *bufp, loff_t offset, size_t *size,
+ unsigned long limit)
+{
+ const struct elf_phdr *const gate_phdrs =
+ (const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
+ int i;
+ Elf64_Off ofs = 0;
+ struct elf_phdr *phdr = (struct elf_phdr *) bufp;
+
+ for (i = 0; i < GATE_EHDR->e_phnum; ++i) {
+ *size += sizeof(*phdr);
+ if (*size > limit)
+ return 0;
+
+ *phdr = gate_phdrs[i];
+
+ if (phdr->p_type == PT_LOAD) {
+ phdr->p_memsz = PAGE_ALIGN(phdr->p_memsz);
+ phdr->p_filesz = phdr->p_memsz;
+ if (ofs == 0) {
+ ofs = phdr->p_offset = *offset;
+ offset += phdr->p_filesz;
+ } else {
+ phdr->p_offset = ofs;
+ }
+ } else {
+ phdr->p_offset += ofs;
+ }
+ phdr->p_paddr = 0; /* match other core phdrs */
+ phdr++;
+ }
+ return 1;
+}
+
int elf_core_write_extra_data(struct file *file, size_t *size,
unsigned long limit)
{

2010-12-14 10:09:51

by suzuki

[permalink] [raw]
Subject: [Patch 9/21] UML (i386) Implementation of elf_core_copy_extra_phdrs()

elf_core_copy_extra_phdrs() for UML. Adapted from elf_core_write_extra_phdrs()

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
arch/um/sys-i386/elfcore.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

Index: linux-2.6.36-rc7/arch/um/sys-i386/elfcore.c
===================================================================
--- linux-2.6.36-rc7.orig/arch/um/sys-i386/elfcore.c
+++ linux-2.6.36-rc7/arch/um/sys-i386/elfcore.c
@@ -41,6 +41,38 @@ int elf_core_write_extra_phdrs(struct fi
return 1;
}

+int elf_core_copy_extra_phdrs(char *bufp, loff_t offset, size_t *size,
+ unsigned long limit)
+{
+ if (vsyscall_ehdr) {
+ const struct elfhdr *const ehdrp =
+ (struct elfhdr *) vsyscall_ehdr;
+ const struct elf_phdr *const phdrp =
+ (const struct elf_phdr *) (vsyscall_ehdr + ehdrp->e_phoff);
+ struct elf_phdr *phdr = (struct elf_phdr *)bufp;
+ int i;
+ Elf32_Off ofs = 0;
+
+ for (i = 0; i < ehdrp->e_phnum; ++i) {
+ *size += sizeof(*phdr);
+ if (*size > limit)
+ return 0;
+
+ *phdr = phdrp[i];
+
+ if (phdr.p_type == PT_LOAD) {
+ ofs = phdr.p_offset = offset;
+ offset += phdr.p_filesz;
+ } else {
+ phdr.p_offset += ofs;
+ }
+ phdr.p_paddr = 0; /* match other core phdrs */
+ phdr++;
+ }
+ }
+ return 1;
+}
+
int elf_core_write_extra_data(struct file *file, size_t *size,
unsigned long limit)
{

2010-12-14 10:10:58

by suzuki

[permalink] [raw]
Subject: [Patch 10/21] Create /proc/pid/core entry

Create the /proc/PID/core entry.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N.Mavinakayanahalli <[email protected]>
---
fs/proc/Makefile | 1 +
fs/proc/base.c | 1 +
fs/proc/gencore.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/proc/internal.h | 1 +
4 files changed, 51 insertions(+)

Index: linux-2.6.36-rc7/fs/proc/Makefile
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/Makefile
+++ linux-2.6.36-rc7/fs/proc/Makefile
@@ -19,6 +19,7 @@ proc-y += stat.o
proc-y += uptime.o
proc-y += version.o
proc-y += softirqs.o
+proc-$(CONFIG_ELF_CORE) += gencore.o
proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o
Index: linux-2.6.36-rc7/fs/proc/base.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/base.c
+++ linux-2.6.36-rc7/fs/proc/base.c
@@ -2735,6 +2735,7 @@ static const struct pid_entry tgid_base_
#endif
#ifdef CONFIG_ELF_CORE
REG("coredump_filter", S_IRUGO|S_IWUSR, proc_coredump_filter_operations),
+ REG("core", S_IRUSR, proc_gen_core_operations),
#endif
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, proc_tgid_io_accounting),
Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- /dev/null
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -0,0 +1,48 @@
+/*
+ * Application core dump
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ *
+ * Authors:
+ * Ananth N.Mavinakayanahalli <[email protected]>
+ * Suzuki K. Poulose <[email protected]>
+ */
+
+#include <linux/seq_file.h>
+#include "internal.h"
+
+static ssize_t read_gencore(struct file *file, char __user *buffer,
+ size_t buflen, loff_t *fpos)
+{
+ return 0;
+}
+
+static int release_gencore(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static int open_gencore(struct inode *inode, struct file *filp)
+{
+ return 0;
+}
+
+const struct file_operations proc_gen_core_operations = {
+ .open = open_gencore,
+ .read = read_gencore,
+ .release = release_gencore,
+};
Index: linux-2.6.36-rc7/fs/proc/internal.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/internal.h
+++ linux-2.6.36-rc7/fs/proc/internal.h
@@ -60,6 +60,7 @@ extern const struct file_operations proc
extern const struct file_operations proc_pagemap_operations;
extern const struct file_operations proc_net_operations;
extern const struct inode_operations proc_net_inode_operations;
+extern const struct file_operations proc_gen_core_operations;

void proc_init_inodecache(void);

2010-12-14 10:11:55

by suzuki

[permalink] [raw]
Subject: [Patch 11/21] Track the core generation requests

Keep track of the core generation requests. Concurrent core generation requests
for the same target process are not allowed.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N.Mavinakayanahalli <[email protected]>
---
fs/proc/gencore.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/proc/gencore.h | 12 +++++++
2 files changed, 95 insertions(+), 2 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -24,21 +24,102 @@

#include <linux/seq_file.h>
#include "internal.h"
+#include "gencore.h"
+
+static LIST_HEAD(core_list);
+static DEFINE_MUTEX(core_mutex);
+
+/* Called with core_mutex held */
+static struct core_proc* get_core_proc(struct task_struct *t)
+{
+ struct core_proc *cp;
+
+ list_for_each_entry(cp, &core_list, list) {
+ if (cp->task == t->group_leader)
+ return cp;
+ }
+ return NULL;
+}

static ssize_t read_gencore(struct file *file, char __user *buffer,
size_t buflen, loff_t *fpos)
{
- return 0;
+ struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct core_proc *cp;
+ ssize_t ret = 0;
+
+ if (!task)
+ return -EIO;
+
+ mutex_lock(&core_mutex);
+ cp = get_core_proc(task);
+ if (!cp) {
+ mutex_unlock(&core_mutex);
+ ret = -EIO;
+ goto out;
+ }
+ mutex_unlock(&core_mutex);
+
+out:
+ put_task_struct(task);
+ return ret;
}

static int release_gencore(struct inode *inode, struct file *file)
{
+ struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct core_proc *cp;
+
+ if (!task)
+ return -EIO;
+
+ mutex_lock(&core_mutex);
+ cp = get_core_proc(task);
+ if (cp) {
+ list_del(&cp->list);
+ kfree(cp);
+ }
+ mutex_unlock(&core_mutex);
+ put_task_struct(task);
return 0;
}

+/*
+ * Validate if the call is valid. We also need to prevent >1 open
+ * of the same file.
+ */
static int open_gencore(struct inode *inode, struct file *filp)
{
- return 0;
+ struct task_struct *task = get_proc_task(inode);
+ struct core_proc *cp;
+ int ret = 0;
+
+ if (!task)
+ return -ENOENT;
+
+ mutex_lock(&core_mutex);
+ cp = get_core_proc(task);
+ if (cp) {
+ ret = -EALREADY;
+ mutex_unlock(&core_mutex);
+ goto out;
+ }
+
+ cp = kzalloc(sizeof (*cp), GFP_KERNEL);
+ if (!cp) {
+ ret = -ENOMEM;
+ mutex_unlock(&core_mutex);
+ goto out;
+ }
+
+ cp->task = task->group_leader;
+ INIT_LIST_HEAD(&cp->list);
+ list_add(&cp->list, &core_list);
+ mutex_unlock(&core_mutex);
+
+out:
+ put_task_struct(task);
+ return ret;
}

const struct file_operations proc_gen_core_operations = {
Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- /dev/null
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -0,0 +1,12 @@
+#ifndef __GEN_CORE_H
+#define __GEN_CORE_H
+
+#include <linux/list.h>
+#include <linux/sched.h>
+
+struct core_proc {
+ struct list_head list;
+ struct task_struct *task;
+};
+
+#endif

2010-12-14 10:13:38

by suzuki

[permalink] [raw]
Subject: [Patch 12/21] Check if the process is an ELF executable

Validate if the process is an ELF exec. This will be later extended to identify
if the task is a native ELF or a compat ELF.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/proc/gencore.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -23,6 +23,8 @@
*/

#include <linux/seq_file.h>
+#include <linux/elf.h>
+
#include "internal.h"
#include "gencore.h"

@@ -85,6 +87,30 @@ static int release_gencore(struct inode
}

/*
+ * Determine whether the task is an ELF executable.
+ * Returns
+ * < 0 - Non-ELF
+ * 0 - Native ELF Executable
+ */
+static int get_elf_class(struct task_struct *task)
+{
+ struct elfhdr hdr;
+ int ret = 0;
+
+ ret = kernel_read(task->mm->exe_file, 0, (char*)&hdr, sizeof(hdr));
+ if (ret < 0)
+ return ret;
+
+ /* Verify the ELF magic on the exe_file */
+ if (memcmp(hdr.e_ident, ELFMAG, SELFMAG))
+ return -EINVAL;
+ if (elf_check_arch(&hdr))
+ return 0;
+
+ return -EINVAL;
+}
+
+/*
* Validate if the call is valid. We also need to prevent >1 open
* of the same file.
*/
@@ -93,10 +119,15 @@ static int open_gencore(struct inode *in
struct task_struct *task = get_proc_task(inode);
struct core_proc *cp;
int ret = 0;
+ int elf_class;

if (!task)
return -ENOENT;

+ elf_class = get_elf_class(task);
+ if (elf_class < 0)
+ return elf_class;
+
mutex_lock(&core_mutex);
cp = get_core_proc(task);
if (cp) {

2010-12-14 10:14:57

by suzuki

[permalink] [raw]
Subject: [Patch 13/21] Freeze / Thaw threads

All the threads in the target process are issued a freeze request on an "open()".
They are woken up when we release the fd.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N.Mavinakayanahalli <[email protected]>
---
fs/proc/gencore.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -24,6 +24,7 @@

#include <linux/seq_file.h>
#include <linux/elf.h>
+#include <linux/freezer.h>

#include "internal.h"
#include "gencore.h"
@@ -70,11 +71,20 @@ out:
static int release_gencore(struct inode *inode, struct file *file)
{
struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct task_struct *t;
struct core_proc *cp;

if (!task)
return -EIO;

+ /* Thaw all frozen threads */
+ t = task;
+ read_lock(&tasklist_lock);
+ do {
+ thaw_process(t);
+ } while ((t = next_thread(t)) != task);
+ read_unlock(&tasklist_lock);
+
mutex_lock(&core_mutex);
cp = get_core_proc(task);
if (cp) {
@@ -117,6 +127,7 @@ static int get_elf_class(struct task_str
static int open_gencore(struct inode *inode, struct file *filp)
{
struct task_struct *task = get_proc_task(inode);
+ struct task_struct *t;
struct core_proc *cp;
int ret = 0;
int elf_class;
@@ -148,6 +159,21 @@ static int open_gencore(struct inode *in
list_add(&cp->list, &core_list);
mutex_unlock(&core_mutex);

+ /* freeze the processes */
+ t = task;
+ read_lock(&tasklist_lock);
+ do {
+ if (frozen(t) || !freezeable(t) || freezing(t))
+ continue;
+
+ if (freeze_task(t, true))
+ continue;
+
+ if (task_is_stopped_or_traced(t))
+ continue;
+ } while ((t = next_thread(t)) != task);
+ read_unlock(&tasklist_lock);
+
out:
put_task_struct(task);
return ret;

2010-12-14 10:15:59

by suzuki

[permalink] [raw]
Subject: [Patch 14/21] Create ELF header

Build the ELF header on the fly for the very first read request. The ELF Header,
Program Headers are stored in a buffer for processing future read() requests.

gencore-elf.c contains the ELF class specific functions.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N.Mavinakayanahalli <[email protected]>

---
fs/proc/Makefile | 2
fs/proc/gencore-elf.c | 177 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/proc/gencore.c | 5 +
fs/proc/gencore.h | 6 +
4 files changed, 189 insertions(+), 1 deletion(-)

Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.h
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -7,6 +7,12 @@
struct core_proc {
struct list_head list;
struct task_struct *task;
+ void *shdr; /* elf_shdr, in case nphdrs > PN_XNUM */
+ char *elf_buf; /* buffer for elf_hdr + phdrs + notes */
+ size_t elf_buflen; /* size of elf_buf */
+ size_t nphdrs; /* number of phdrs */
};

+extern ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
+ size_t buflen, loff_t *foffset);
#endif
Index: linux-2.6.36-rc7/fs/proc/gencore-elf.c
===================================================================
--- /dev/null
+++ linux-2.6.36-rc7/fs/proc/gencore-elf.c
@@ -0,0 +1,177 @@
+/*
+ * Application core dump - ELF Class specific code
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ */
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/freezer.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+#include <linux/elfcore-internal.h>
+
+#ifdef CORE_DUMP_USE_REGSET
+#include <linux/regset.h>
+#endif
+
+#include "gencore.h"
+
+static void get_elfhdr_size(struct core_proc *cp)
+{
+ struct vm_area_struct *gate_vma;
+ int segs;
+
+ segs = cp->task->mm->map_count;
+ segs += elf_core_extra_phdrs();
+
+ gate_vma = get_gate_vma(cp->task);
+ if (gate_vma != NULL)
+ segs++;
+
+ /* One phdr for PT_NOTE */
+ segs++;
+
+ cp->nphdrs = segs;
+ cp->elf_buflen = sizeof(struct elfhdr) +
+ (cp->nphdrs * sizeof(struct elf_phdr));
+ cp->elf_buflen = roundup(cp->elf_buflen, ELF_EXEC_PAGESIZE);
+
+ return;
+}
+
+/*
+ * Fill the elf_hdr and the phdrs
+ * Returns 0 on success. On error, returns errno
+ */
+static int create_elf_header(struct core_proc *cp)
+{
+ struct elfhdr *elf = (struct elfhdr *)cp->elf_buf;
+ struct elf_phdr *note;
+ struct vm_area_struct *vma, *gate_vma = get_gate_vma(cp->task);
+ char *bufp;
+ off_t dataoff, offset;
+ short e_phnum = (cp->nphdrs > PN_XNUM ? PN_XNUM : cp->nphdrs);
+ size_t exphdrs_sz = 0;
+ unsigned long limit = elf_core_extra_phdrs() * sizeof(struct elf_phdr);
+
+#ifdef CORE_DUMP_USE_REGSET
+ const struct user_regset_view *view = task_user_regset_view(cp->task);
+
+ fill_elf_header(elf, e_phnum, view->e_machine, view->e_flags,
+ view->ei_osabi);
+#else
+ fill_elf_header(elf, e_phnum, ELF_ARCH, ELF_CORE_EFLAGS, ELF_OSABI);
+#endif
+ offset = sizeof(struct elfhdr);
+ bufp = cp->elf_buf + offset;
+ dataoff = offset + (cp->nphdrs * sizeof(struct elf_phdr));
+
+ /* Setup ELF PT_NOTE */
+ note = (struct elf_phdr*)bufp;
+ bufp += sizeof(struct elf_phdr);
+ offset += sizeof(struct elf_phdr);
+ note->p_type = PT_NOTE;
+ note->p_offset = dataoff;
+ note->p_vaddr = 0;
+ note->p_paddr = 0;
+ /* TODO: Needs to be populated with the size of the notes section */
+ note->p_memsz = 0;
+ note->p_flags = 0;
+ note->p_align = 0;
+
+ dataoff = cp->elf_buflen;
+
+ /* Write the phdrs for memory segments */
+ down_read(&cp->task->mm->mmap_sem);
+ for (vma = first_vma(cp->task, gate_vma); vma != NULL;
+ vma = next_vma(vma, gate_vma)) {
+ struct elf_phdr *phdr = (struct elf_phdr *)bufp;
+
+ bufp += sizeof(struct elf_phdr);
+ offset += sizeof(struct elf_phdr);
+
+ phdr->p_type = PT_LOAD;
+ phdr->p_offset = dataoff;
+ phdr->p_vaddr = vma->vm_start;
+ phdr->p_paddr = 0;
+ phdr->p_filesz = vma_dump_size(cp->task, vma, cp->task->mm->flags);
+ phdr->p_memsz = vma->vm_end - vma->vm_start;
+ phdr->p_flags = (vma->vm_flags & VM_READ) ? PF_R : 0;
+ if (vma->vm_flags & VM_WRITE)
+ phdr->p_flags |= PF_W;
+ if (vma->vm_flags & VM_EXEC)
+ phdr->p_flags |= PF_X;
+ phdr->p_align = ELF_EXEC_PAGESIZE;
+
+ dataoff += phdr->p_filesz;
+ }
+ up_read(&cp->task->mm->mmap_sem);
+
+ if (!elf_core_copy_extra_phdrs(bufp, dataoff, &exphdrs_sz, limit))
+ return -EIO;
+ bufp += exphdrs_sz;
+ dataoff += elf_core_extra_data_size();
+
+ if (e_phnum == PN_XNUM) {
+ cp->shdr = kzalloc(sizeof(struct elf_shdr), GFP_KERNEL);
+ if (!cp->shdr)
+ return -ENOMEM;
+ fill_extnum_info(elf, (struct elf_shdr *)cp->shdr,
+ dataoff, cp->nphdrs);
+ dataoff += sizeof(struct elf_shdr);
+ }
+
+ return 0;
+}
+
+ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
+ size_t buflen, loff_t *fpos)
+{
+ ssize_t ret = 0;
+
+ if (!cp->elf_buf) {
+ get_elfhdr_size(cp);
+
+ cp->elf_buf = kzalloc(cp->elf_buflen, GFP_KERNEL);
+ if (!cp->elf_buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = create_elf_header(cp);
+ if (ret < 0)
+ goto out;
+ }
+
+ if (*fpos < cp->elf_buflen) {
+ size_t bcp = cp->elf_buflen - *fpos;
+
+ bcp = (bcp < buflen) ? bcp : buflen;
+ if (copy_to_user(buffer, (cp->elf_buf + *fpos), bcp)) {
+ ret = -EFAULT;
+ goto out;
+ } else {
+ ret = bcp;
+ *fpos += bcp;
+ buflen -= bcp;
+ buffer += bcp;
+ }
+ }
+
+out:
+ return ret;
+}
Index: linux-2.6.36-rc7/fs/proc/Makefile
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/Makefile
+++ linux-2.6.36-rc7/fs/proc/Makefile
@@ -19,7 +19,7 @@ proc-y += stat.o
proc-y += uptime.o
proc-y += version.o
proc-y += softirqs.o
-proc-$(CONFIG_ELF_CORE) += gencore.o
+proc-$(CONFIG_ELF_CORE) += gencore.o gencore-elf.o
proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o
Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -63,6 +63,8 @@ static ssize_t read_gencore(struct file
}
mutex_unlock(&core_mutex);

+ ret = elf_read_gencore(cp, buffer, buflen, fpos);
+
out:
put_task_struct(task);
return ret;
@@ -89,6 +91,9 @@ static int release_gencore(struct inode
cp = get_core_proc(task);
if (cp) {
list_del(&cp->list);
+ if (cp->shdr)
+ kfree(cp->shdr);
+ kfree(cp->elf_buf);
kfree(cp);
}
mutex_unlock(&core_mutex);

2010-12-14 10:17:24

by suzuki

[permalink] [raw]
Subject: [Patch 15/21] Collect ELF Core notes data

Collect the PT_NOTE information for the core.

There are two "process wide" notes. NT_PRPSINFO and NT_AUXV. These are captured
in the core_proc structure.

Each thread gets a NT_PRSTATUS note, which will contain the GPR contents. A
thread may have additional notes depending on the other register sets used by it.
Uses struct elf_thread_core_info to capture the thread specific information.

fill_thread_core_info() fills in the notes for a thread.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N. Mavinakayanahalli <[email protected]>

---
fs/proc/gencore-elf.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++++-
fs/proc/gencore.c | 67 ++++++++++++++++--
fs/proc/gencore.h | 28 +++++++
3 files changed, 268 insertions(+), 10 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.h
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -3,16 +3,44 @@

#include <linux/list.h>
#include <linux/sched.h>
+#include <linux/elfcore.h>
+#include <linux/elfcore-internal.h>
+
+struct elf_thread_core_info {
+ unsigned short num_notes; /* Number of notes for this thread */
+ struct elf_thread_core_info *next;
+ struct task_struct *task;
+ struct elf_prstatus prstatus;
+ struct memelfnote notes[0];
+};

struct core_proc {
struct list_head list;
struct task_struct *task;
void *shdr; /* elf_shdr, in case nphdrs > PN_XNUM */
char *elf_buf; /* buffer for elf_hdr + phdrs + notes */
+ struct elf_thread_core_info *tinfo;
+ struct memelfnote psinfo;
+ struct memelfnote auxv;
+ struct elf_prpsinfo prpsinfo;
size_t elf_buflen; /* size of elf_buf */
size_t nphdrs; /* number of phdrs */
+ size_t notes_size;
};

+#ifdef CORE_DUMP_USE_REGSET
+#include <linux/regset.h>
+
+static inline int get_max_regsets(struct task_struct *task)
+{
+ const struct user_regset_view *view = task_user_regset_view(task);
+ return view->n;
+}
+
+#else
+#define get_max_regsets(task) 3 /* GPR, FP, XFP? */
+#endif
+
extern ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
size_t buflen, loff_t *foffset);
#endif
Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -25,6 +25,7 @@
#include <linux/seq_file.h>
#include <linux/elf.h>
#include <linux/freezer.h>
+#include <linux/sched.h>

#include "internal.h"
#include "gencore.h"
@@ -70,6 +71,39 @@ out:
return ret;
}

+static void free_notes_data(struct elf_thread_core_info *tinfo)
+{
+ int i;
+
+ for (i = 1; i < tinfo->num_notes; i++)
+ if (tinfo->notes[i].data) {
+ kfree(tinfo->notes[i].data);
+ tinfo->notes[i].data = NULL;
+ }
+}
+
+static void cleanup_cp(struct core_proc *cp)
+{
+ struct elf_thread_core_info *tmp, *tinfo = cp->tinfo;
+
+ mutex_lock(&core_mutex);
+ list_del(&cp->list);
+ mutex_unlock(&core_mutex);
+
+ if (tinfo) {
+ do {
+ tmp = tinfo;
+ tinfo = tinfo->next;
+ free_notes_data(tmp);
+ kfree(tmp);
+ } while (tinfo != NULL);
+ }
+ if (cp->shdr)
+ kfree(cp->shdr);
+ kfree(cp->elf_buf);
+ kfree(cp);
+}
+
static int release_gencore(struct inode *inode, struct file *file)
{
struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
@@ -89,14 +123,11 @@ static int release_gencore(struct inode

mutex_lock(&core_mutex);
cp = get_core_proc(task);
- if (cp) {
- list_del(&cp->list);
- if (cp->shdr)
- kfree(cp->shdr);
- kfree(cp->elf_buf);
- kfree(cp);
- }
mutex_unlock(&core_mutex);
+
+ if (cp)
+ cleanup_cp(cp);
+
put_task_struct(task);
return 0;
}
@@ -133,9 +164,11 @@ static int open_gencore(struct inode *in
{
struct task_struct *task = get_proc_task(inode);
struct task_struct *t;
+ struct elf_thread_core_info *tinfo = NULL;
struct core_proc *cp;
+ int elf_class, max_regset, i;
int ret = 0;
- int elf_class;
+

if (!task)
return -ENOENT;
@@ -164,10 +197,28 @@ static int open_gencore(struct inode *in
list_add(&cp->list, &core_list);
mutex_unlock(&core_mutex);

+ max_regset = get_max_regsets(task);
+
+ for (i = 0; i < get_nr_threads(task); i++) {
+ tinfo = kzalloc(offsetof(struct elf_thread_core_info,
+ notes[max_regset]), GFP_KERNEL);
+ if (unlikely(!tinfo)) {
+ cleanup_cp(cp);
+ ret = -ENOMEM;
+ goto out;
+ }
+ tinfo->next = cp->tinfo;
+ cp->tinfo = tinfo;
+ }
+
/* freeze the processes */
t = task;
read_lock(&tasklist_lock);
do {
+ if (tinfo) {
+ tinfo->task = t;
+ tinfo = tinfo->next;
+ }
if (frozen(t) || !freezeable(t) || freezing(t))
continue;

Index: linux-2.6.36-rc7/fs/proc/gencore-elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore-elf.c
+++ linux-2.6.36-rc7/fs/proc/gencore-elf.c
@@ -30,6 +30,158 @@

#include "gencore.h"

+static int notesize(struct memelfnote *men)
+{
+ int size = sizeof(struct elf_note);
+
+ size += roundup(strlen(men->name) + 1, 4);
+ size += roundup(men->datasz, 4);
+
+ return size;
+}
+
+/* Store the note in the header buffer */
+static char *storenote(struct memelfnote *men, char *bufp)
+{
+ struct elf_note *en = (struct elf_note *)bufp;
+
+ en->n_namesz = strlen(men->name) + 1;
+ en->n_descsz = men->datasz;
+ en->n_type = men->type;
+ bufp = (char *) (en + 1);
+
+ memcpy(bufp, men->name, en->n_namesz);
+ bufp = (char *) roundup((unsigned long)bufp + en->n_namesz, 4);
+
+ memcpy(bufp, men->data, men->datasz);
+ bufp = (char *) roundup((unsigned long)bufp + men->datasz, 4);
+
+ return bufp;
+}
+
+#ifdef CORE_DUMP_USE_REGSET
+static void do_thread_regset_writeback(struct task_struct *task,
+ const struct user_regset *regset)
+{
+ if (regset->writeback)
+ regset->writeback(task, regset, 1);
+}
+
+static int fill_thread_core_info(struct elf_thread_core_info *tinfo,
+ struct core_proc *cp)
+{
+ unsigned int i;
+ const struct user_regset_view *view = task_user_regset_view(tinfo->task);
+
+ fill_prstatus(&tinfo->prstatus, tinfo->task, 0);
+
+ do_thread_regset_writeback(tinfo->task, &view->regsets[0]);
+ (void) view->regsets[0].get(tinfo->task, &view->regsets[0],
+ 0, sizeof(tinfo->prstatus.pr_reg),
+ &tinfo->prstatus.pr_reg, NULL);
+ fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
+ sizeof(tinfo->prstatus), &tinfo->prstatus);
+ cp->notes_size += notesize(&tinfo->notes[0]);
+ tinfo->num_notes = view->n;
+
+ for (i = 1; i < view->n; i++) {
+ const struct user_regset *regset = &view->regsets[i];
+
+ do_thread_regset_writeback(tinfo->task, regset);
+ if (regset->core_note_type &&
+ (!regset->active || regset->active(tinfo->task, regset))) {
+ int ret;
+ size_t size = regset->n * regset->size;
+ void *data = kzalloc(size, GFP_KERNEL);
+ if (!unlikely(data))
+ return 0;
+ ret = regset->get(tinfo->task, regset,
+ 0, size, data, NULL);
+ if (unlikely(ret))
+ kfree(data);
+ else {
+ if (regset->core_note_type != NT_PRFPREG)
+ fill_note(&tinfo->notes[i], "LINUX",
+ regset->core_note_type,
+ size, data);
+ else {
+ tinfo->prstatus.pr_fpvalid = 1;
+ fill_note(&tinfo->notes[i], "CORE",
+ NT_PRFPREG, size, data);
+ }
+ cp->notes_size += notesize(&tinfo->notes[i]);
+ }
+ }
+ }
+ return 1;
+}
+#else
+static int fill_thread_core_info(struct elf_thread_core_info *tinfo,
+ struct core_proc *cp)
+{
+ elf_fpregset_t fpu, *pfpu;
+#ifdef ELF_CORE_COPY_XFPREGS
+ elf_fpxregset_t xfpu, *pxfpu;
+#endif
+
+ fill_prstatus(&tinfo->prstatus, t->task, 0);
+ elf_core_copy_task_regs(t->task, &tinfo->prstatus.pr_reg);
+ fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
+ sizeof(t->prstatus), &t->prstatus);
+
+ cp->notes_size += notesize(&tinfo->notes[0]);
+ tinfo->num_notes = 1;
+
+ if (tinfo->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(tinfo->task,
+ NULL, &fpu)) {
+ pfpu = kzalloc(sizeof(*pfpu), GFP_KERNEL);
+ if (pfpu == NULL)
+ return 0;
+ memcpy(pfpu, &fpu, sizeof(fpu));
+ fill_note(&tinfo->notes[tinfo->num_notes], "CORE", NT_PRFPREG,
+ sizeof(*pfpu), pfpu);
+ cp->notes_size += notesize(&tinfo->notes[tinfo->num_notes]);
+ tinfo->num_notes++;
+ }
+#ifdef ELF_CORE_COPY_XFPREGS
+ if (elf_core_copy_task_xfpregs(tinfo->task, &xfpu)) {
+ pxfpu = kzalloc(sizeof(*pxfpu), GFP_KERNEL);
+ if (!pxfpu)
+ return 0;
+ memcpy(pxfpu, &xfpu, sizeof(xfpu));
+ fill_note(&tinfo->notes[tinfo->num_notes], "LINUX",
+ ELF_CORE_XFPREG_TYPE, sizeof(*pxfpu), pxfpu);
+ cp->notes_size += notesize(&tinfo->notes[tinfo->num_notes]);
+ tinfo->num_notes++;
+ }
+#endif
+ return 1;
+}
+#endif
+
+/* Returns 0 on error, 1 on success */
+static int collect_notes(struct core_proc *cp)
+{
+ struct elf_thread_core_info *tinfo;
+
+ /* Fill the 2 process wide notes */
+ fill_psinfo(&cp->prpsinfo, cp->task, cp->task->mm);
+ fill_note(&cp->psinfo, "CORE", NT_PRPSINFO,
+ sizeof(struct elf_prpsinfo), &cp->prpsinfo);
+ cp->notes_size += notesize(&cp->psinfo);
+
+ fill_auxv_note(&cp->auxv, cp->task->mm);
+ cp->notes_size += notesize(&cp->auxv);
+
+ tinfo = cp->tinfo;
+ while (tinfo != NULL) {
+ if (!fill_thread_core_info(tinfo, cp))
+ return 0;
+ tinfo = tinfo->next;
+ }
+ return 1;
+}
+
static void get_elfhdr_size(struct core_proc *cp)
{
struct vm_area_struct *gate_vma;
@@ -47,7 +199,9 @@ static void get_elfhdr_size(struct core_

cp->nphdrs = segs;
cp->elf_buflen = sizeof(struct elfhdr) +
- (cp->nphdrs * sizeof(struct elf_phdr));
+ (cp->nphdrs * sizeof(struct elf_phdr)) +
+ cp->notes_size;
+
cp->elf_buflen = roundup(cp->elf_buflen, ELF_EXEC_PAGESIZE);

return;
@@ -62,11 +216,13 @@ static int create_elf_header(struct core
struct elfhdr *elf = (struct elfhdr *)cp->elf_buf;
struct elf_phdr *note;
struct vm_area_struct *vma, *gate_vma = get_gate_vma(cp->task);
+ struct elf_thread_core_info *tinfo;
char *bufp;
off_t dataoff, offset;
short e_phnum = (cp->nphdrs > PN_XNUM ? PN_XNUM : cp->nphdrs);
size_t exphdrs_sz = 0;
unsigned long limit = elf_core_extra_phdrs() * sizeof(struct elf_phdr);
+ int first = 1;

#ifdef CORE_DUMP_USE_REGSET
const struct user_regset_view *view = task_user_regset_view(cp->task);
@@ -88,7 +244,7 @@ static int create_elf_header(struct core
note->p_offset = dataoff;
note->p_vaddr = 0;
note->p_paddr = 0;
- /* TODO: Needs to be populated with the size of the notes section */
+ note->p_filesz = cp->notes_size;
note->p_memsz = 0;
note->p_flags = 0;
note->p_align = 0;
@@ -134,6 +290,22 @@ static int create_elf_header(struct core
dataoff, cp->nphdrs);
dataoff += sizeof(struct elf_shdr);
}
+ /* Store the notes */
+ tinfo = cp->tinfo;
+ do {
+ int i;
+
+ bufp = storenote(&tinfo->notes[0], bufp);
+ if (first) {
+ bufp = storenote(&cp->psinfo, bufp);
+ bufp = storenote(&cp->auxv, bufp);
+ }
+ for (i = 1; i < tinfo->num_notes; i++)
+ if (tinfo->notes[i].data != NULL)
+ bufp = storenote(&tinfo->notes[i], bufp);
+ first = 0;
+ tinfo = tinfo->next;
+ } while (tinfo != NULL);

return 0;
}
@@ -143,6 +315,13 @@ ssize_t elf_read_gencore(struct core_pro
{
ssize_t ret = 0;

+ if (!cp->notes_size) {
+ if (!collect_notes(cp)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+
if (!cp->elf_buf) {
get_elfhdr_size(cp);

2010-12-14 10:18:53

by suzuki

[permalink] [raw]
Subject: [Patch 16/21] Wait for threads to freeze

The threads may not have entered the "refrigerator()" by the time we start
generating the NOTE sections, which can result in invalid register states.
Wait for a short interval of time for the threads to freeze. The register
information of the threads which are not frozen at the end of the interval,
is filled with 0's, to indicate we couldn't capture the information.

Signed-off-by: Suzuki K. Poulose <[email protected]>

---
fs/proc/gencore-elf.c | 20 ++++++++++++++++++++
fs/proc/gencore.c | 42 ++++++++++++++++++++++++++++++++++++++++++
fs/proc/gencore.h | 3 +++
3 files changed, 65 insertions(+)

Index: linux-2.6.36-rc7/fs/proc/gencore-elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore-elf.c
+++ linux-2.6.36-rc7/fs/proc/gencore-elf.c
@@ -159,6 +159,21 @@ static int fill_thread_core_info(struct
}
#endif

+/*
+ * zero_thread_state :
+ * For threads, which are not frozen, we cannot capture the register state.
+ * We zero them out.
+ */
+static void zero_thread_state(struct elf_thread_core_info *tinfo)
+{
+ int i;
+
+ memset(&tinfo->prstatus.pr_reg, 0, sizeof(tinfo->prstatus.pr_reg));
+ for (i = 1; i < tinfo->num_notes; i++)
+ if (tinfo->notes[i].data)
+ memset(tinfo->notes[i].data, 0, tinfo->notes[i].datasz);
+}
+
/* Returns 0 on error, 1 on success */
static int collect_notes(struct core_proc *cp)
{
@@ -173,10 +188,15 @@ static int collect_notes(struct core_pro
fill_auxv_note(&cp->auxv, cp->task->mm);
cp->notes_size += notesize(&cp->auxv);

+ /* Make sure threads are frozen, before we start collecting notes */
+ try_to_freeze_core_threads(cp);
+
tinfo = cp->tinfo;
while (tinfo != NULL) {
if (!fill_thread_core_info(tinfo, cp))
return 0;
+ if (!tinfo->frozen)
+ zero_thread_state(tinfo);
tinfo = tinfo->next;
}
return 1;
Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.h
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -7,6 +7,7 @@
#include <linux/elfcore-internal.h>

struct elf_thread_core_info {
+ unsigned short frozen;
unsigned short num_notes; /* Number of notes for this thread */
struct elf_thread_core_info *next;
struct task_struct *task;
@@ -28,6 +29,8 @@ struct core_proc {
size_t notes_size;
};

+extern void try_to_freeze_core_threads(struct core_proc *cp);
+
#ifdef CORE_DUMP_USE_REGSET
#include <linux/regset.h>

Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -30,6 +30,8 @@
#include "internal.h"
#include "gencore.h"

+#define FREEZE_TIMEOUT (20 * HZ)
+
static LIST_HEAD(core_list);
static DEFINE_MUTEX(core_mutex);

@@ -45,6 +47,46 @@ static struct core_proc* get_core_proc(s
return NULL;
}

+/* Borrowed from kernel/cgroup_freezer.c */
+static int is_task_frozen_enough(struct task_struct *task)
+{
+ return frozen(task) ||
+ (task_is_stopped_or_traced(task) && freezing(task));
+}
+/*
+ * Wait for a short interval, to let the threads freeze.
+ * If "current" is generating the core, we should skip
+ * it.
+ * Based on try_to_freeze_tasks, kernel/power/process.c
+ */
+void try_to_freeze_core_threads(struct core_proc *cp)
+{
+ unsigned long end_time;
+ struct elf_thread_core_info *tinfo;
+ unsigned todo;
+
+ end_time = jiffies + FREEZE_TIMEOUT;
+
+retry:
+ tinfo = cp->tinfo;
+ todo = 0;
+ while (tinfo != NULL) {
+ if ((tinfo->task != current) && !tinfo->frozen) {
+ if (is_task_frozen_enough(tinfo->task))
+ tinfo->frozen = 1;
+ else
+ todo++;
+ }
+ tinfo = tinfo->next;
+ }
+
+ if (!todo || time_after(jiffies, end_time))
+ return;
+
+ schedule();
+ goto retry;
+}
+
static ssize_t read_gencore(struct file *file, char __user *buffer,
size_t buflen, loff_t *fpos)
{

2010-12-14 10:20:26

by suzuki

[permalink] [raw]
Subject: [Patch 17/21] Calculate the size of the core file

Calculate the size of the core file

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N. Mavinakayanahalli <[email protected]>

---
fs/proc/gencore-elf.c | 6 ++++++
fs/proc/gencore.h | 1 +
2 files changed, 7 insertions(+)

Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.h
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -27,6 +27,7 @@ struct core_proc {
size_t elf_buflen; /* size of elf_buf */
size_t nphdrs; /* number of phdrs */
size_t notes_size;
+ size_t size;
};

extern void try_to_freeze_core_threads(struct core_proc *cp);
Index: linux-2.6.36-rc7/fs/proc/gencore-elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore-elf.c
+++ linux-2.6.36-rc7/fs/proc/gencore-elf.c
@@ -310,6 +310,9 @@ static int create_elf_header(struct core
dataoff, cp->nphdrs);
dataoff += sizeof(struct elf_shdr);
}
+
+ cp->size = dataoff;
+
/* Store the notes */
tinfo = cp->tinfo;
do {
@@ -371,6 +374,9 @@ ssize_t elf_read_gencore(struct core_pro
}
}

+ if (*fpos > cp->size)
+ goto out;
+
out:
return ret;
}

2010-12-14 10:21:52

by suzuki

[permalink] [raw]
Subject: [Patch 18/21] Generate the data sections for ELF Core

Generate the "data" for the memory regions. Also write down the section header
if we have, number of phdrs > PN_XNUM.

The vma areas are read, page by page using access_process_vm() without an
mmap_sem. If there are active threads, then we may miss a vma if it is removed
while we are doing the read.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N. Mavinakayanahalli <[email protected]>
---
fs/proc/gencore-elf.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 87 insertions(+), 3 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/gencore-elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore-elf.c
+++ linux-2.6.36-rc7/fs/proc/gencore-elf.c
@@ -333,10 +333,29 @@ static int create_elf_header(struct core
return 0;
}

+/*
+ * Verify if the fpos asked for in read is valid.
+ * Returns the phdr corresponding to offset, else NULL.
+ */
+static struct elf_phdr *get_pos_elfphdr(struct core_proc *cp, loff_t pos)
+{
+ struct elfhdr *elf_hdr = (struct elfhdr *)cp->elf_buf;
+ struct elf_phdr *phdr = (struct elf_phdr*)(cp->elf_buf + elf_hdr->e_phoff);
+ int i;
+
+ for (i = 0; i < cp->nphdrs; i++, phdr++) {
+ unsigned long end = phdr->p_offset + phdr->p_filesz;
+ if ((pos >= phdr->p_offset) && (pos < end))
+ return phdr;
+ }
+ return NULL;
+}
+
ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
size_t buflen, loff_t *fpos)
{
- ssize_t ret = 0;
+ ssize_t ret = 0, acc = 0;
+ struct elfhdr *elf_hdr = (struct elfhdr *)cp->elf_buf;

if (!cp->notes_size) {
if (!collect_notes(cp)) {
@@ -367,16 +386,81 @@ ssize_t elf_read_gencore(struct core_pro
ret = -EFAULT;
goto out;
} else {
- ret = bcp;
+ acc = bcp;
*fpos += bcp;
buflen -= bcp;
buffer += bcp;
}
}

+
if (*fpos > cp->size)
- goto out;
+ goto done;
+
+ /*
+ * Read from the vma segments
+ * a. verify if the *fpos is within a phdr
+ * b. Use access_process_vm() to get data page by page
+ * c. copy_to_user into user buffer
+ */
+
+ while (buflen) {
+ size_t bufsz, offset, bytes;
+ char *readbuf;
+ struct elf_phdr *phdr = get_pos_elfphdr(cp, *fpos);
+
+ if (!phdr)
+ break;
+
+ bufsz = (buflen > PAGE_SIZE) ? PAGE_SIZE : buflen;
+ readbuf = kmalloc(bufsz, GFP_KERNEL);
+ if (!readbuf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ offset = *fpos - phdr->p_offset;
+ bytes = access_process_vm(cp->task, (phdr->p_vaddr + offset),
+ readbuf, bufsz, 0);
+ if (!bytes) {
+ ret = -EIO;
+ goto out;
+ }
+ if (copy_to_user(buffer, readbuf, bytes)) {
+ ret = -EFAULT;
+ kfree(readbuf);
+ goto out;
+ } else
+ acc += bytes;
+
+ kfree(readbuf);
+ buflen -= bytes;
+ buffer += bytes;
+ *fpos += bytes;
+ }
+
+ /* Fill extnum section header if present */
+ if (buflen &&
+ elf_hdr->e_shoff &&
+ (*fpos >= elf_hdr->e_shoff) &&
+ (*fpos < (elf_hdr->e_shoff + sizeof(struct elf_shdr)))) {
+
+ off_t offset = *fpos - elf_hdr->e_shoff;
+ size_t shdrsz = sizeof(struct elf_shdr) - offset;
+
+ shdrsz = (buflen < shdrsz) ? buflen : shdrsz;
+ if (copy_to_user(buffer, ((char *)cp->shdr) + offset, shdrsz)) {
+ ret = -EFAULT;
+ goto out;
+ } else {
+ acc += shdrsz;
+ buflen -= shdrsz;
+ buffer += shdrsz;
+ }
+ }

+done:
+ ret = acc;
out:
return ret;
}

2010-12-14 10:24:18

by suzuki

[permalink] [raw]
Subject: [Patch 19/21] Identify the ELF class of the process

Identify the ELF class of the process to native or compat.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/proc/gencore.c | 17 +++++++++++++----
fs/proc/gencore.h | 3 +++
2 files changed, 16 insertions(+), 4 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -106,7 +106,8 @@ static ssize_t read_gencore(struct file
}
mutex_unlock(&core_mutex);

- ret = elf_read_gencore(cp, buffer, buflen, fpos);
+ if (cp->elf_class == ELF_CLASS_NATIVE)
+ ret = elf_read_gencore(cp, buffer, buflen, fpos);

out:
put_task_struct(task);
@@ -177,8 +178,8 @@ static int release_gencore(struct inode
/*
* Determine whether the task is an ELF executable.
* Returns
- * < 0 - Non-ELF
- * 0 - Native ELF Executable
+ * -EINVAL - On non ELF execs
+ * Returns the ELF_CLASS_NATIVE or ELF_CLASS_COMPAT
*/
static int get_elf_class(struct task_struct *task)
{
@@ -193,8 +194,15 @@ static int get_elf_class(struct task_str
if (memcmp(hdr.e_ident, ELFMAG, SELFMAG))
return -EINVAL;
if (elf_check_arch(&hdr))
- return 0;
+ return ELF_CLASS_NATIVE;

+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ {
+ struct elf32_hdr *chdr = (struct elf32_hdr *)&hdr;
+ if (compat_elf_check_arch(chdr))
+ return ELF_CLASS_COMPAT;
+ }
+#endif
return -EINVAL;
}

@@ -238,6 +246,7 @@ static int open_gencore(struct inode *in
INIT_LIST_HEAD(&cp->list);
list_add(&cp->list, &core_list);
mutex_unlock(&core_mutex);
+ cp->elf_class = elf_class;

max_regset = get_max_regsets(task);

Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.h
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -6,6 +6,8 @@
#include <linux/elfcore.h>
#include <linux/elfcore-internal.h>

+enum { ELF_CLASS_NATIVE, ELF_CLASS_COMPAT };
+
struct elf_thread_core_info {
unsigned short frozen;
unsigned short num_notes; /* Number of notes for this thread */
@@ -28,6 +30,7 @@ struct core_proc {
size_t nphdrs; /* number of phdrs */
size_t notes_size;
size_t size;
+ unsigned char elf_class;
};

extern void try_to_freeze_core_threads(struct core_proc *cp);

2010-12-14 10:26:09

by suzuki

[permalink] [raw]
Subject: [Patch 20/21] Add supporting for compat ELF class data structures

Redefine the structures to include the compat class variant of the data
structures.

gencore-elf.c will be reused for the compat ELF class support. Hence, define
macros to get the right member for each class.

Signed-off-by: Suzuki K. Poulose <[email protected]>

---
fs/proc/gencore-elf.c | 37 +++++++++++++++++++++++++------------
fs/proc/gencore.h | 18 ++++++++++++++++--
2 files changed, 41 insertions(+), 14 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/gencore-elf.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore-elf.c
+++ linux-2.6.36-rc7/fs/proc/gencore-elf.c
@@ -16,6 +16,8 @@
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
* Copyright (C) IBM Corporation, 2010
+ *
+ * This code is reused in gencore-compat-elf.c for compat ELF class support.
*/
#include <linux/mm.h>
#include <linux/slab.h>
@@ -30,6 +32,17 @@

#include "gencore.h"

+#ifndef et_prstatus
+/* prstatus field for native elf, in elf_thread_core_info*/
+#define et_prstatus u.prstatus
+#endif
+
+#ifndef cp_prpsinfo
+/* prpsinfo field for native elf, in core_proc */
+#define cp_prpsinfo u.prpsinfo
+#endif
+
+
static int notesize(struct memelfnote *men)
{
int size = sizeof(struct elf_note);
@@ -73,14 +86,14 @@ static int fill_thread_core_info(struct
unsigned int i;
const struct user_regset_view *view = task_user_regset_view(tinfo->task);

- fill_prstatus(&tinfo->prstatus, tinfo->task, 0);
+ fill_prstatus(&tinfo->et_prstatus, tinfo->task, 0);

do_thread_regset_writeback(tinfo->task, &view->regsets[0]);
(void) view->regsets[0].get(tinfo->task, &view->regsets[0],
- 0, sizeof(tinfo->prstatus.pr_reg),
- &tinfo->prstatus.pr_reg, NULL);
+ 0, sizeof(tinfo->et_prstatus.pr_reg),
+ &tinfo->et_prstatus.pr_reg, NULL);
fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
- sizeof(tinfo->prstatus), &tinfo->prstatus);
+ sizeof(tinfo->et_prstatus), &tinfo->et_prstatus);
cp->notes_size += notesize(&tinfo->notes[0]);
tinfo->num_notes = view->n;

@@ -105,7 +118,7 @@ static int fill_thread_core_info(struct
regset->core_note_type,
size, data);
else {
- tinfo->prstatus.pr_fpvalid = 1;
+ tinfo->et_prstatus.pr_fpvalid = 1;
fill_note(&tinfo->notes[i], "CORE",
NT_PRFPREG, size, data);
}
@@ -124,15 +137,15 @@ static int fill_thread_core_info(struct
elf_fpxregset_t xfpu, *pxfpu;
#endif

- fill_prstatus(&tinfo->prstatus, t->task, 0);
- elf_core_copy_task_regs(t->task, &tinfo->prstatus.pr_reg);
+ fill_prstatus(&tinfo->et_prstatus, t->task, 0);
+ elf_core_copy_task_regs(t->task, &tinfo->et_prstatus.pr_reg);
fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
- sizeof(t->prstatus), &t->prstatus);
+ sizeof(t->et_prstatus), &t->et_prstatus);

cp->notes_size += notesize(&tinfo->notes[0]);
tinfo->num_notes = 1;

- if (tinfo->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(tinfo->task,
+ if (tinfo->et_prstatus.pr_fpvalid = elf_core_copy_task_fpregs(tinfo->task,
NULL, &fpu)) {
pfpu = kzalloc(sizeof(*pfpu), GFP_KERNEL);
if (pfpu == NULL)
@@ -168,7 +181,7 @@ static void zero_thread_state(struct elf
{
int i;

- memset(&tinfo->prstatus.pr_reg, 0, sizeof(tinfo->prstatus.pr_reg));
+ memset(&tinfo->et_prstatus.pr_reg, 0, sizeof(tinfo->et_prstatus.pr_reg));
for (i = 1; i < tinfo->num_notes; i++)
if (tinfo->notes[i].data)
memset(tinfo->notes[i].data, 0, tinfo->notes[i].datasz);
@@ -180,9 +193,9 @@ static int collect_notes(struct core_pro
struct elf_thread_core_info *tinfo;

/* Fill the 2 process wide notes */
- fill_psinfo(&cp->prpsinfo, cp->task, cp->task->mm);
+ fill_psinfo(&cp->cp_prpsinfo, cp->task, cp->task->mm);
fill_note(&cp->psinfo, "CORE", NT_PRPSINFO,
- sizeof(struct elf_prpsinfo), &cp->prpsinfo);
+ sizeof(struct elf_prpsinfo), &cp->cp_prpsinfo);
cp->notes_size += notesize(&cp->psinfo);

fill_auxv_note(&cp->auxv, cp->task->mm);
Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.h
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -6,6 +6,10 @@
#include <linux/elfcore.h>
#include <linux/elfcore-internal.h>

+#ifdef CONFIG_COMPAT_BINFMT_ELF
+#include <linux/elfcore-compat.h>
+#endif
+
enum { ELF_CLASS_NATIVE, ELF_CLASS_COMPAT };

struct elf_thread_core_info {
@@ -13,7 +17,12 @@ struct elf_thread_core_info {
unsigned short num_notes; /* Number of notes for this thread */
struct elf_thread_core_info *next;
struct task_struct *task;
- struct elf_prstatus prstatus;
+ union {
+ struct elf_prstatus prstatus;
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ struct compat_elf_prstatus compat_prstatus;
+#endif
+ } u;
struct memelfnote notes[0];
};

@@ -25,7 +34,12 @@ struct core_proc {
struct elf_thread_core_info *tinfo;
struct memelfnote psinfo;
struct memelfnote auxv;
- struct elf_prpsinfo prpsinfo;
+ union {
+ struct elf_prpsinfo prpsinfo;
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ struct compat_elf_prpsinfo compat_prpsinfo;
+#endif
+ } u;
size_t elf_buflen; /* size of elf_buf */
size_t nphdrs; /* number of phdrs */
size_t notes_size;

2010-12-14 10:27:30

by suzuki

[permalink] [raw]
Subject: [Patch 21/21] Compat ELF class Core generation support

Add full support for the compat ELF class objects. The gencore-compat-elf.c
reuses the gencore-elf.c, by renaming the ABI structures and functions.
(Inspired by compat_binfmt_elf.c)

Signed-off-by: Suzuki K. Poulose <[email protected]>

---
fs/proc/Makefile | 1
fs/proc/gencore-compat-elf.c | 61 +++++++++++++++++++++++++++++++++++++++++++
fs/proc/gencore.c | 4 ++
fs/proc/gencore.h | 4 ++
4 files changed, 70 insertions(+)

Index: linux-2.6.36-rc7/fs/proc/gencore.h
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.h
+++ linux-2.6.36-rc7/fs/proc/gencore.h
@@ -64,4 +64,8 @@ static inline int get_max_regsets(struc

extern ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
size_t buflen, loff_t *foffset);
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+extern ssize_t compat_elf_read_gencore(struct core_proc *cp, char __user *buffer,
+ size_t buflen, loff_t *foffset);
+#endif
#endif
Index: linux-2.6.36-rc7/fs/proc/gencore.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/gencore.c
+++ linux-2.6.36-rc7/fs/proc/gencore.c
@@ -108,6 +108,10 @@ static ssize_t read_gencore(struct file

if (cp->elf_class == ELF_CLASS_NATIVE)
ret = elf_read_gencore(cp, buffer, buflen, fpos);
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ else if (cp->elf_class == ELF_CLASS_COMPAT)
+ ret = compat_elf_read_gencore(cp, buffer, buflen, fpos);
+#endif

out:
put_task_struct(task);
Index: linux-2.6.36-rc7/fs/proc/gencore-compat-elf.c
===================================================================
--- /dev/null
+++ linux-2.6.36-rc7/fs/proc/gencore-compat-elf.c
@@ -0,0 +1,61 @@
+/*
+ * Application core dump - compat ELF Class specific code
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ *
+ * We use macros to rename the types and functions in gencore-elf.c to
+ * the compat versions
+ *
+ */
+
+#include <linux/elfcore-compat.h>
+#include "gencore.h"
+
+#undef elfhdr
+#undef elf_phdr
+#undef elf_shdr
+#undef elf_note
+#undef elf_addr_t
+#define elfhdr elf32_hdr
+#define elf_phdr elf32_phdr
+#define elf_shdr elf32_shdr
+#define elf_note elf32_note
+#define elf_addr_t Elf32_Addr
+
+#undef ELF_CLASS
+#define ELF_CLASS ELF_CASS32
+
+#define elf_prstatus compat_elf_prstatus
+#define elf_prpsinfo compat_elf_prpsinfo
+
+/* Functions used from binfmt_elf core code */
+#define fill_psinfo compat_fill_psinfo
+#define fill_prstatus compat_fill_prstatus
+#define fill_extnum_info compat_fill_extnum_info
+#define fill_auxv_note compat_fill_auxv_note
+#define fill_elf_header compat_fill_elf_header
+/*
+ * Define the fileds in elf_thread_core_info and core_proc
+ * for compat ELF class
+ */
+#define et_prstatus u.compat_prstatus
+#define cp_prpsinfo u.compat_prpsinfo
+
+/* Rename the core function to compat_ */
+#define elf_read_gencore compat_elf_read_gencore
+
+#include "gencore-elf.c"
Index: linux-2.6.36-rc7/fs/proc/Makefile
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/Makefile
+++ linux-2.6.36-rc7/fs/proc/Makefile
@@ -20,6 +20,7 @@ proc-y += uptime.o
proc-y += version.o
proc-y += softirqs.o
proc-$(CONFIG_ELF_CORE) += gencore.o gencore-elf.o
+proc-$(CONFIG_COMPAT_BINFMT_ELF) += gencore-compat-elf.o
proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o

2010-12-14 10:37:09

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 10/21] Create /proc/pid/core entry

于 2010年12月14日 18:11, Suzuki K. Poulose 写道:
> Create the /proc/PID/core entry.
>
> Signed-off-by: Suzuki K. Poulose<[email protected]>
> Signed-off-by: Ananth N.Mavinakayanahalli<[email protected]>

Can we have a new Kconfig for this? Instead of reusing CONFIG_ELF_CORE.
You have to make it depend on CONFIG_FREEZER, I think.

Thanks.

2010-12-14 10:52:03

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [Patch 11/21] Track the core generation requests

On Tue, Dec 14, 2010 at 03:42:04PM +0530, Suzuki K. Poulose wrote:
> --- /dev/null
> +++ linux-2.6.36-rc7/fs/proc/gencore.h

> +#include <linux/sched.h>

struct task_struct; is enough.

> +struct core_proc {
> + struct list_head list;
> + struct task_struct *task;
> +};

2010-12-14 10:54:13

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Tue, Dec 14, 2010 at 03:22:59PM +0530, Suzuki K. Poulose wrote:
> The interface is provided by a /proc/pid/core file, reading which can give the

What happens when process opens /proc/self/core ?
What happens when process opens /proc/self/core and coredumps before close(2).

2010-12-14 14:59:16

by suzuki

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Tue, 14 Dec 2010 12:54:05 +0200
Alexey Dobriyan <[email protected]> wrote:

> On Tue, Dec 14, 2010 at 03:22:59PM +0530, Suzuki K. Poulose wrote:
> > The interface is provided by a /proc/pid/core file, reading which can give the
>
> What happens when process opens /proc/self/core ?
The current thread will be skipped for freeze(), and all the others will be
frozen.We fill in the register states with 0's for the active threads.

> What happens when process opens /proc/self/core and coredumps before close(2).

Thanks for pointing this out. With the current code, this causes a hang for the
current process, waiting for the other threads to clear. I will think about
a solution to this making use of the mm->core_state information.


Thanks a lot for the review.

Suzuki

2010-12-14 15:02:13

by suzuki

[permalink] [raw]
Subject: Re: [Patch 10/21] Create /proc/pid/core entry

On Tue, 14 Dec 2010 18:36:36 +0800
Cong Wang <[email protected]> wrote:

> 于 2010年12月14日 18:11, Suzuki K. Poulose 写道:
> > Create the /proc/PID/core entry.
> >
> > Signed-off-by: Suzuki K. Poulose<[email protected]>
> > Signed-off-by: Ananth N.Mavinakayanahalli<[email protected]>
>
> Can we have a new Kconfig for this? Instead of reusing CONFIG_ELF_CORE.
> You have to make it depend on CONFIG_FREEZER, I think.
Completely agree with your suggestion. I will rework this in the next iteration.

Thanks
Suzuki

2010-12-14 15:50:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Tue, Dec 14, 2010 at 1:52 AM, Suzuki K. Poulose <[email protected]> wrote:
>
> This is series of patches implementing an infrastructure for capturing the core
> of an application without disrupting its process semantics.
>
> The infrastructure makes use of the freezer subsystem in kernel to freeze the
> threads and then collect the information to generate ?the core.

This seems to be a fundamentally flawed approach.

>From a security standpoint, it looks like a total disaster. A frozen
process is really hard to get rid of, so it looks like an obvious DoS
attack to just create lots of processes, then sneakily freeze them
all, and then laugh at the poor system admin who has no idea what's
going on. While frozen, the things are basically unkillable but look
entirely normal, no?

Linus

2010-12-14 15:55:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Patch 4/21] Make fill_psinfo generic

On Tue, Dec 14, 2010 at 2:01 AM, Suzuki K. Poulose <[email protected]> wrote:
> + ? ? ? /*
> + ? ? ? ?* If we are dumping core for "another task"
> + ? ? ? ?* we can't use copy_from_user().
> + ? ? ? ?*/
> + ? ? ? if (p->group_leader == current->group_leader) {

This is totally bogus. "group_leader" has absolutely nothing to do
with whether the process shares the address space. It's about signal
sharing etc.

The way to check whether they share the same address space is to check
whether "mm" is the same.

Linus

2010-12-14 16:00:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 3/21] Make vma_dump_size() generic

Suzuki, I didn't read this series. Still I have some concerns after
the very brief look...

On 12/14, Suzuki K. Poulose wrote:
>
> @@ -114,10 +113,18 @@ unsigned long vma_dump_size(struct vm_ar
> * Switch to the user "segment" for get_user(),
> * then put back what elf_core_dump() had in place.
> */
> - set_fs(USER_DS);
> - if (unlikely(get_user(word, header)))
> - word = 0;
> - set_fs(fs);
> + if (p->group_leader == current->group_leader) {

same_thread_group() ?

> + mm_segment_t fs = get_fs();
> + set_fs(USER_DS);
> + if (unlikely(get_user(word, header)))
> + word = 0;
> + set_fs(fs);
> + } else {
> + int bytes = access_process_vm(p, (unsigned long)header,
> + &word, sizeof (word), 0);

Well. This adds a little pessimization. It is possible that the
"normal" coredump dumps the CLONE_VM process which doesn't below
to current's thread group.

May be you can compare ->mm's, but I am not sure.

Oleg.

2010-12-14 16:01:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 4/21] Make fill_psinfo generic

On 12/14, Suzuki K. Poulose wrote:
>
> + if (p->group_leader == current->group_leader) {
> + if (copy_from_user(&psinfo->pr_psargs,
> + (const char __user *)mm->arg_start, len))
> + return -EFAULT;
> + } else {
> + int bytes = access_process_vm(p, mm->arg_start,
> + &psinfo->pr_psargs, len, 0);

The same.

Oleg.

2010-12-14 16:04:17

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

Hello,

On 12/14/2010 04:49 PM, Linus Torvalds wrote:
> On Tue, Dec 14, 2010 at 1:52 AM, Suzuki K. Poulose <[email protected]> wrote:
>>
>> This is series of patches implementing an infrastructure for capturing the core
>> of an application without disrupting its process semantics.
>>
>> The infrastructure makes use of the freezer subsystem in kernel to freeze the
>> threads and then collect the information to generate the core.
>
> This seems to be a fundamentally flawed approach.
>
>>From a security standpoint, it looks like a total disaster. A frozen
> process is really hard to get rid of, so it looks like an obvious DoS
> attack to just create lots of processes, then sneakily freeze them
> all, and then laugh at the poor system admin who has no idea what's
> going on. While frozen, the things are basically unkillable but look
> entirely normal, no?

I think a better way would be adding a ptrace attach which is nestable
and doesn't have the nasty side effect caused by the interactions
between the implicit SIGSTOP and group stop. As a preparation step, I
posted a patchset to cleanup the interactions between ptrace and group
stop which is being reviewed. Once we have a nestable ptrace attach,
we should be able to simply adapt gcore(1) to use it and write out
core dump from userland.

Thanks.

--
tejun

2010-12-14 16:04:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 5/21] Rename compat versions of the reusable core generation routines

On 12/14, Suzuki K. Poulose wrote:
>
> Rename the ELF class specific functions reusable for the application core dump
> infrastructure. The compat ELF class routines are prepended with compat_ .
>
> Signed-off-by: Suzuki K. Poulose <[email protected]>
> ---
> fs/compat_binfmt_elf.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Index: linux-2.6.36-rc7/fs/compat_binfmt_elf.c
> ===================================================================
> --- linux-2.6.36-rc7.orig/fs/compat_binfmt_elf.c
> +++ linux-2.6.36-rc7/fs/compat_binfmt_elf.c
> @@ -127,6 +127,13 @@ static void cputime_to_compat_timeval(co
> #define init_elf_binfmt init_compat_elf_binfmt
> #define exit_elf_binfmt exit_compat_elf_binfmt
>
> +/* Rename the functions that may be reused */
> +#define fill_elf_header compat_fill_elf_header
> +#define fill_psinfo compat_fill_psinfo
> +#define fill_prstatus compat_fill_prstatus
> +#define fill_extnum_info compat_fill_extnum_info
> +#define fill_auxv_note compat_fill_auxv_note

Can't undestand.... Afaics, the kernel can't be compiled
with this change until the next patches actually implement
compat_xxx ?

This is not bisect friendly.

Oleg.

2010-12-14 16:11:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 11/21] Track the core generation requests

On 12/14, Suzuki K. Poulose wrote:
>
> Concurrent core generation requests
> for the same target process are not allowed.

OK, but

> +static struct core_proc* get_core_proc(struct task_struct *t)
> +{
> + struct core_proc *cp;
> +
> + list_for_each_entry(cp, &core_list, list) {
> + if (cp->task == t->group_leader)
> + return cp;
> + }
> + return NULL;
> +}
> ...
> static int open_gencore(struct inode *inode, struct file *filp)
> {
> - return 0;
> + struct task_struct *task = get_proc_task(inode);
> + struct core_proc *cp;
> + int ret = 0;
> +
> + if (!task)
> + return -ENOENT;
> +
> + mutex_lock(&core_mutex);
> + cp = get_core_proc(task);
> + if (cp) {
> + ret = -EALREADY;

I don't think this can work.

The task can change ->group_leader. This means 2 or more
open_gencore() can succeed if this task execs in between.

Oleg.

2010-12-14 16:12:29

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 12/21] Check if the process is an ELF executable

On 12/14, Suzuki K. Poulose wrote:
>
> + */
> +static int get_elf_class(struct task_struct *task)
> +{
> + struct elfhdr hdr;
> + int ret = 0;
> +
> + ret = kernel_read(task->mm->exe_file, 0, (char*)&hdr, sizeof(hdr));

->exe_file can be NULL.

Oleg.

2010-12-14 16:20:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Tue, Dec 14, 2010 at 8:03 AM, Tejun Heo <[email protected]> wrote:
>
> I think a better way would be adding a ptrace attach which is nestable
> and doesn't have the nasty side effect caused by the interactions
> between the implicit SIGSTOP and group stop.

Don't we already nest at least to some degree? At least you can strace
a gdb session or another strace, no?

Linus

2010-12-14 16:24:38

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 13/21] Freeze / Thaw threads

On 12/14, Suzuki K. Poulose wrote:
>
> @@ -148,6 +159,21 @@ static int open_gencore(struct inode *in
> list_add(&cp->list, &core_list);
> mutex_unlock(&core_mutex);
>
> + /* freeze the processes */
> + t = task;
> + read_lock(&tasklist_lock);
> + do {
> + if (frozen(t) || !freezeable(t) || freezing(t))
> + continue;
> +
> + if (freeze_task(t, true))
> + continue;
> +
> + if (task_is_stopped_or_traced(t))
> + continue;
> + } while ((t = next_thread(t)) != task);
> + read_unlock(&tasklist_lock);

Ooooh. Sorry, I dislike this approach very much.

Firstly, I can't understand why this can't race with the kernel
doing freeze/thaw.

But the main problem is: these series adds the new and nice way
to create the unkillable processes. This can't be good.

Say, until you close this file, even oom-killer can't kill it.
It is easy to DoS the system. Just fork the new processes which
use the memory and freeze them.

And, given that we have to handle task_is_stopped_or_traced()
tasks correctly anyway, I think gencore should rely on STOPPED/
TRACED and doesn't use freezer at all.

> + } while ((t = next_thread(t)) != task);

while_each_thread()

Oleg.

2010-12-14 16:29:17

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

Hello, Linus.

On 12/14/2010 05:19 PM, Linus Torvalds wrote:
> On Tue, Dec 14, 2010 at 8:03 AM, Tejun Heo <[email protected]> wrote:
>>
>> I think a better way would be adding a ptrace attach which is nestable
>> and doesn't have the nasty side effect caused by the interactions
>> between the implicit SIGSTOP and group stop.
>
> Don't we already nest at least to some degree? At least you can strace
> a gdb session or another strace, no?

Yes, it depends on what the 'nesting' means. I was thinking about
gcore'ing or attching gdb to an already strace'd process. This might
not be strictly necessary but for ptrace attach to be transparent, we
need another PTRACE_* call anyway as PTRACE_ATTACH implies SIGSTOP and
having nestable attach would be beneficial in several use cases
including using debugging tools nested and allowing userland
checkpointing of debugging sessions.

Thanks.

--
tejun

2010-12-14 16:31:11

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 14/21] Create ELF header

On 12/14, Suzuki K. Poulose wrote:
>
> +static void get_elfhdr_size(struct core_proc *cp)
> +{
> + struct vm_area_struct *gate_vma;
> + int segs;
> +
> + segs = cp->task->mm->map_count;

But this is unsafe. What if cp->task was stopped when
open_gencore() was called? In this case it can be killed/dead.

cp->task->mm can be NULL. In fact, cp->task can point to nothing.

Any usage of cp->task is not safe, afaics.

Oleg.

2010-12-14 16:44:29

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 15/21] Collect ELF Core notes data

On 12/14, Suzuki K. Poulose wrote:
>
> @@ -164,10 +197,28 @@ static int open_gencore(struct inode *in
> list_add(&cp->list, &core_list);
> mutex_unlock(&core_mutex);
>
> + max_regset = get_max_regsets(task);
> +
> + for (i = 0; i < get_nr_threads(task); i++) {
> + tinfo = kzalloc(offsetof(struct elf_thread_core_info,
> + notes[max_regset]), GFP_KERNEL);
> + if (unlikely(!tinfo)) {
> + cleanup_cp(cp);
> + ret = -ENOMEM;
> + goto out;
> + }
> + tinfo->next = cp->tinfo;
> + cp->tinfo = tinfo;
> + }

This looks strange... Obviously we can't trust get_nr_threads()
before this process is stopped/frozen.

Just noticed... release_gencore() does thaw_process() for each
thread. But, again, if it was stopped we can race with SIGCONT
and exit. IOW, -EIO is possible. Who will thaw other threads?
Also, we can probably race with exec, but I am not sure.

Oleg.

2010-12-14 16:49:18

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [Patch 16/21] Wait for threads to freeze

On 12/14, Suzuki K. Poulose wrote:
>
> + * zero_thread_state :
> + * For threads, which are not frozen, we cannot capture the register state.

Indeed, but

> + if (!tinfo->frozen)
> + zero_thread_state(tinfo);

Again, we can race with TASK_STOPPED/SIGCONT. And
is_task_frozen_enough() thins task_is_stopped_or_traced() is fine.

Oleg.

2010-12-15 01:10:30

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Tue, 14 Dec 2010 15:22:59 +0530
"Suzuki K. Poulose" <[email protected]> wrote:

> Hi all,
>
> This is series of patches implementing an infrastructure for capturing the core
> of an application without disrupting its process semantics.
>
> The infrastructure makes use of the freezer subsystem in kernel to freeze the
> threads and then collect the information to generate the core.
>
> The interface is provided by a /proc/pid/core file, reading which can give the
> ELF formatted core of the process with "pid". The interface supports "seek"
> operation on the fd, allowing the dumper to have control on the data that is
> being dumped. Also it allows the user to store the dump at any location.
>
> The current implementation supports both native as well as the compat ELF
> tasks.
>
> An open() call to the /proc/pid/core will try to freeze the threads in the
> process and the read() requests will dynamically generate the contents for the
> core file. The ELF header & Program Headers are stored in a kernel buffer to
> allow us to map the fpos to the required data section.
>
> In case a thread is not frozen within a time interval, after issuing the freeze
> request, we fill the register state information with 0's to indicate we could
> not capture the data.
>
> A close() would kick the threads out of the refrigerator().
>
>
> The implementation reuses some of the existing ELF core generation code by
> exporting them. Some of the code common to both native and compat ELF class
> support has been moved to a common place, elfcore-common.c. Also some of the
> reusable functions, specific to the ELF class handling, has been made global,
> after renaming the compat version of the same.
>
> We also added a new API -elf_core_copy_extra_phdrs() -for "reading" the arch
> specific program headers, versus the existing elf_core_write_extra_phdrs().
>
> Patches 1 to 9 deals with re-arranging the ELF code to be reusable by the
> infrastructure.
>
> Patches 10 to 21 implements the infrastructure.
>
> TODO: Add support for collecting the arch specific notes, currently used only
> by Cell platform.
>
> Please let me know your review comments / thoughts.
>

Your purpose of this patch is to debug an application without attaching to gdb
or take coredump by gcore ?

IIUC, "freeze" is a bit dangerous because no one can ends the application while
it's freezed and there is no information "it's frozen" via usaual user commands
as 'ps' or 'top'.

Can you add a new freeze state where the application can get SIGKILL,
at least ? and show task's state as "frozen" in some way ? as
task_state_array[] shows it in /proc/<pid>/status

Thanks,
-Kame



2010-12-15 02:22:48

by suzuki

[permalink] [raw]
Subject: Re: [Patch 4/21] Make fill_psinfo generic

On Tue, 14 Dec 2010 07:55:00 -0800
Linus Torvalds <[email protected]> wrote:

> On Tue, Dec 14, 2010 at 2:01 AM, Suzuki K. Poulose <[email protected]> wrote:
> > + ? ? ? /*
> > + ? ? ? ?* If we are dumping core for "another task"
> > + ? ? ? ?* we can't use copy_from_user().
> > + ? ? ? ?*/
> > + ? ? ? if (p->group_leader == current->group_leader) {
>
> This is totally bogus. "group_leader" has absolutely nothing to do
> with whether the process shares the address space. It's about signal
> sharing etc.
>
> The way to check whether they share the same address space is to check
> whether "mm" is the same.
>

OK. I will make use of the mm here.

Thanks
Suzuki

2010-12-15 02:30:34

by suzuki

[permalink] [raw]
Subject: Re: [Patch 5/21] Rename compat versions of the reusable core generation routines

On Tue, 14 Dec 2010 16:57:08 +0100
Oleg Nesterov <[email protected]> wrote:

> On 12/14, Suzuki K. Poulose wrote:
> >
> > Rename the ELF class specific functions reusable for the application core dump
> > infrastructure. The compat ELF class routines are prepended with compat_ .
> >
> > Signed-off-by: Suzuki K. Poulose <[email protected]>
> > ---
> > fs/compat_binfmt_elf.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > Index: linux-2.6.36-rc7/fs/compat_binfmt_elf.c
> > ===================================================================
> > --- linux-2.6.36-rc7.orig/fs/compat_binfmt_elf.c
> > +++ linux-2.6.36-rc7/fs/compat_binfmt_elf.c
> > @@ -127,6 +127,13 @@ static void cputime_to_compat_timeval(co
> > #define init_elf_binfmt init_compat_elf_binfmt
> > #define exit_elf_binfmt exit_compat_elf_binfmt
> >
> > +/* Rename the functions that may be reused */
> > +#define fill_elf_header compat_fill_elf_header
> > +#define fill_psinfo compat_fill_psinfo
> > +#define fill_prstatus compat_fill_prstatus
> > +#define fill_extnum_info compat_fill_extnum_info
> > +#define fill_auxv_note compat_fill_auxv_note
>
> Can't undestand.... Afaics, the kernel can't be compiled
> with this change until the next patches actually implement
> compat_xxx ?

The compat_binfmt_elf.c shares the actual ELF handling code from binfmt_elf.c (by
#includ ing it). This patch renames the functions so that we can later export it.
The "#include binfmt_elf.c" comes later in the compat_binfmt_elf.c. So we are fine.
>
> This is not bisect friendly.
>
> Oleg.
>

2010-12-15 05:24:07

by suzuki

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Wed, 15 Dec 2010 10:04:37 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> On Tue, 14 Dec 2010 15:22:59 +0530
> "Suzuki K. Poulose" <[email protected]> wrote:
>
> > Hi all,
> >
> > This is series of patches implementing an infrastructure for capturing the core
> > of an application without disrupting its process semantics.
> >
> > The infrastructure makes use of the freezer subsystem in kernel to freeze the
> > threads and then collect the information to generate the core.
> >
> > The interface is provided by a /proc/pid/core file, reading which can give the
> > ELF formatted core of the process with "pid". The interface supports "seek"
> > operation on the fd, allowing the dumper to have control on the data that is
> > being dumped. Also it allows the user to store the dump at any location.
> >
> > The current implementation supports both native as well as the compat ELF
> > tasks.
> >
> > An open() call to the /proc/pid/core will try to freeze the threads in the
> > process and the read() requests will dynamically generate the contents for the
> > core file. The ELF header & Program Headers are stored in a kernel buffer to
> > allow us to map the fpos to the required data section.
> >
> > In case a thread is not frozen within a time interval, after issuing the freeze
> > request, we fill the register state information with 0's to indicate we could
> > not capture the data.
> >
> > A close() would kick the threads out of the refrigerator().
> >
> >
> > The implementation reuses some of the existing ELF core generation code by
> > exporting them. Some of the code common to both native and compat ELF class
> > support has been moved to a common place, elfcore-common.c. Also some of the
> > reusable functions, specific to the ELF class handling, has been made global,
> > after renaming the compat version of the same.
> >
> > We also added a new API -elf_core_copy_extra_phdrs() -for "reading" the arch
> > specific program headers, versus the existing elf_core_write_extra_phdrs().
> >
> > Patches 1 to 9 deals with re-arranging the ELF code to be reusable by the
> > infrastructure.
> >
> > Patches 10 to 21 implements the infrastructure.
> >
> > TODO: Add support for collecting the arch specific notes, currently used only
> > by Cell platform.
> >
> > Please let me know your review comments / thoughts.
> >
>
> Your purpose of this patch is to debug an application without attaching to gdb
> or take coredump by gcore ?

The purpose is to take the coredump in a more reliable way without affecting
the process semantics.
>
> IIUC, "freeze" is a bit dangerous because no one can ends the application while
> it's freezed and there is no information "it's frozen" via usaual user commands
> as 'ps' or 'top'.
>
> Can you add a new freeze state where the application can get SIGKILL,
> at least ? and show task's state as "frozen" in some way ? as
> task_state_array[] shows it in /proc/<pid>/status
I will investigate this approach.

Thanks
Suzuki

2010-12-15 05:34:51

by suzuki

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Tue, 14 Dec 2010 07:49:37 -0800
Linus Torvalds <[email protected]> wrote:

> On Tue, Dec 14, 2010 at 1:52 AM, Suzuki K. Poulose <[email protected]> wrote:
> >
> > This is series of patches implementing an infrastructure for capturing the core
> > of an application without disrupting its process semantics.
> >
> > The infrastructure makes use of the freezer subsystem in kernel to freeze the
> > threads and then collect the information to generate ?the core.
>
> This seems to be a fundamentally flawed approach.
>
> From a security standpoint, it looks like a total disaster. A frozen
> process is really hard to get rid of, so it looks like an obvious DoS
> attack to just create lots of processes, then sneakily freeze them
> all, and then laugh at the poor system admin who has no idea what's
> going on. While frozen, the things are basically unkillable but look
> entirely normal, no?

You are right. We need a simple mechanism to hold the threads, so that we
could collect the register information of the process without affecting its
process semantics (eg, signals etc.). The suggestion by Kamezawa-san -a
freeze state variant which allows SIGKILL - is one possibility.

I'd be very glad not using the freezer if there is a neat way to accomplish
this without the undesired side effects. Tejun's ptrace enhancement would
still require a userland program to control it(gcore); something contained
in the kernel would be ideal.


Thanks

Suzuki

>
> Linus

2010-12-15 07:20:48

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 18/21] Generate the data sections for ELF Core

于 2010年12月14日 18:22, Suzuki K. Poulose 写道:
...
> + /*
> + * Read from the vma segments
> + * a. verify if the *fpos is within a phdr
> + * b. Use access_process_vm() to get data page by page
> + * c. copy_to_user into user buffer
> + */


This kind of comments is useless, "code tells you how, comments tell you why."


> +
> + while (buflen) {
> + size_t bufsz, offset, bytes;
> + char *readbuf;
> + struct elf_phdr *phdr = get_pos_elfphdr(cp, *fpos);
> +
> + if (!phdr)
> + break;
> +
> + bufsz = (buflen> PAGE_SIZE) ? PAGE_SIZE : buflen;
> + readbuf = kmalloc(bufsz, GFP_KERNEL);
> + if (!readbuf) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + offset = *fpos - phdr->p_offset;
> + bytes = access_process_vm(cp->task, (phdr->p_vaddr + offset),
> + readbuf, bufsz, 0);
> + if (!bytes) {
> + ret = -EIO;
> + goto out;


Why you don't kfree(readbuf) here?

> + }
> + if (copy_to_user(buffer, readbuf, bytes)) {
> + ret = -EFAULT;
> + kfree(readbuf);
> + goto out;
> + } else
> + acc += bytes;
> +
> + kfree(readbuf);
> + buflen -= bytes;
> + buffer += bytes;
> + *fpos += bytes;
> + }
> +
> + /* Fill extnum section header if present */
> + if (buflen&&
> + elf_hdr->e_shoff&&
> + (*fpos>= elf_hdr->e_shoff)&&
> + (*fpos< (elf_hdr->e_shoff + sizeof(struct elf_shdr)))) {


This indention seems ugly.

> +
> + off_t offset = *fpos - elf_hdr->e_shoff;

Are you sure it is 'off_t' not 'loff_t'?

> + size_t shdrsz = sizeof(struct elf_shdr) - offset;
> +
> + shdrsz = (buflen< shdrsz) ? buflen : shdrsz;
> + if (copy_to_user(buffer, ((char *)cp->shdr) + offset, shdrsz)) {
> + ret = -EFAULT;
> + goto out;
> + } else {
> + acc += shdrsz;
> + buflen -= shdrsz;
> + buffer += shdrsz;
> + }
> + }
>
> +done:
> + ret = acc;
> out:
> return ret;
> }

Thanks.

2010-12-15 09:38:10

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

Hello, Suzuki.

On 12/15/2010 06:34 AM, Suzuki K. Poulose wrote:
> I'd be very glad not using the freezer if there is a neat way to
> accomplish this without the undesired side effects. Tejun's ptrace
> enhancement would still require a userland program to control
> it(gcore); something contained in the kernel would be ideal.

Why is using gcore a bad thing? If we make ptrace avoid the implicit
SIGSTOP, the side effects of ptrace would be the same as using freezer
but with the benefit that it's properly integrated to the process
model and job control.

Thanks.

--
tejun

2010-12-15 11:26:21

by suzuki

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Wed, 15 Dec 2010 10:37:48 +0100
Tejun Heo <[email protected]> wrote:

> Hello, Suzuki.
>
> On 12/15/2010 06:34 AM, Suzuki K. Poulose wrote:
> > I'd be very glad not using the freezer if there is a neat way to
> > accomplish this without the undesired side effects. Tejun's ptrace
> > enhancement would still require a userland program to control
> > it(gcore); something contained in the kernel would be ideal.
>
> Why is using gcore a bad thing? If we make ptrace avoid the implicit
> SIGSTOP, the side effects of ptrace would be the same as using freezer
> but with the benefit that it's properly integrated to the process
> model and job control.

The advantages of the new approach are :

1) A process can trigger a core synchronously, upon an event, say a signal
handler and continue from there. gcore would require a fork(), which is not
safe to use from a signal handler.

2) We can seek to only the data we need

Thanks
Suzuki

2010-12-15 11:51:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

> 1) A process can trigger a core synchronously, upon an event, say a signal
> handler and continue from there. gcore would require a fork(), which is not
> safe to use from a signal handler.

Why is it not safe for a signal?

>From the kernel side it should be. glibc may do some funky things
on fork, but you could always call the syscall directly.

-Andi
--
[email protected] -- Speaking for myself only.

2010-12-15 12:46:38

by suzuki

[permalink] [raw]
Subject: Re: [Patch 18/21] Generate the data sections for ELF Core

On Wed, 15 Dec 2010 15:19:54 +0800
Cong Wang <[email protected]> wrote:

> 于 2010年12月14日 18:22, Suzuki K. Poulose 写道:
> ...
> > + /*
> > + * Read from the vma segments
> > + * a. verify if the *fpos is within a phdr
> > + * b. Use access_process_vm() to get data page by page
> > + * c. copy_to_user into user buffer
> > + */
>
>
> This kind of comments is useless, "code tells you how, comments tell you why."
>
>
> > +
> > + while (buflen) {
> > + size_t bufsz, offset, bytes;
> > + char *readbuf;
> > + struct elf_phdr *phdr = get_pos_elfphdr(cp, *fpos);
> > +
> > + if (!phdr)
> > + break;
> > +
> > + bufsz = (buflen> PAGE_SIZE) ? PAGE_SIZE : buflen;
> > + readbuf = kmalloc(bufsz, GFP_KERNEL);
> > + if (!readbuf) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + offset = *fpos - phdr->p_offset;
> > + bytes = access_process_vm(cp->task, (phdr->p_vaddr + offset),
> > + readbuf, bufsz, 0);
> > + if (!bytes) {
> > + ret = -EIO;
> > + goto out;
>
>
> Why you don't kfree(readbuf) here?
>

I will add that.
> > + }
> > + if (copy_to_user(buffer, readbuf, bytes)) {
> > + ret = -EFAULT;
> > + kfree(readbuf);
> > + goto out;
> > + } else
> > + acc += bytes;
> > +
> > + kfree(readbuf);
> > + buflen -= bytes;
> > + buffer += bytes;
> > + *fpos += bytes;
> > + }
> > +
> > + /* Fill extnum section header if present */
> > + if (buflen&&
> > + elf_hdr->e_shoff&&
> > + (*fpos>= elf_hdr->e_shoff)&&
> > + (*fpos< (elf_hdr->e_shoff + sizeof(struct elf_shdr)))) {
>
>
> This indention seems ugly.

I will fix this.
>
> > +
> > + off_t offset = *fpos - elf_hdr->e_shoff;
>
> Are you sure it is 'off_t' not 'loff_t'?
>

You are right. I will change it
> > + size_t shdrsz = sizeof(struct elf_shdr) - offset;
> > +
> > + shdrsz = (buflen< shdrsz) ? buflen : shdrsz;
> > + if (copy_to_user(buffer, ((char *)cp->shdr) + offset, shdrsz)) {
> > + ret = -EFAULT;
> > + goto out;
> > + } else {
> > + acc += shdrsz;
> > + buflen -= shdrsz;
> > + buffer += shdrsz;
> > + }
> > + }
> >
> > +done:
> > + ret = acc;
> > out:
> > return ret;
> > }
>

Thank You !

Suzuki
> Thanks.

2010-12-16 02:14:28

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 18/21] Generate the data sections for ELF Core

于 2010年12月15日 20:46, Suzuki K. Poulose 写道:
> On Wed, 15 Dec 2010 15:19:54 +0800
> Cong Wang<[email protected]> wrote:
>
>> 于 2010年12月14日 18:22, Suzuki K. Poulose 写道:
...
>>> +
>>> + off_t offset = *fpos - elf_hdr->e_shoff;
>>
>> Are you sure it is 'off_t' not 'loff_t'?
>>
>
> You are right. I will change it
>>> + size_t shdrsz = sizeof(struct elf_shdr) - offset;
>>> +

If you use loff_t, then this line needs to be changed too.
size_t could be shorter than loff_t.

Thanks.

2010-12-16 07:57:11

by suzuki

[permalink] [raw]
Subject: Re: [RFC] [Patch 0/21] Non disruptive application core dump infrastructure

On Wed, 15 Dec 2010 12:51:54 +0100
Andi Kleen <[email protected]> wrote:

> > 1) A process can trigger a core synchronously, upon an event, say a signal
> > handler and continue from there. gcore would require a fork(), which is not
> > safe to use from a signal handler.
>
> Why is it not safe for a signal?
>
> From the kernel side it should be. glibc may do some funky things
> on fork, but you could always call the syscall directly.

I came across the following links,

http://sourceware.org/bugzilla/show_bug.cgi?id=4737
https://www.opengroup.org/austin/docs/austin_445.txt

which seem to suggest that fork() is not async-safe due to malloc and the
fork handlers. But a direct fork syscall by-passing glibc should be safe.

Thanks
Suzuki

>
> -Andi