2013-10-04 10:30:41

by Janani Venkataraman1

[permalink] [raw]
Subject: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

Hi all,

The following series implements an infrastructure for capturing the core of an
application without disrupting its process.

So ideally what we are trying to do is to export the infrastructure using
/proc/pid/core. Reading the file would give an ELF Format core-dump at that
instant non-disruptively, without sending signals.

This would involve basically three operations:

1) Holding the threads of a process without sending a signal (SIGSTOP). At this
point we can collect the register set snapshot and collect other information
required to create the ELF header. The above operation could be initiated with
the open() call.

2) Once the ELF header is created, read() can return the CORE DUMP data
including, the process memory page-by-page, based on the fpos (file position).

3) The threads could be released upon a close().

We discussed various approaches for the implemenation in the post given below.
-https://lkml.org/lkml/2013/9/3/122

This series is based on the Task work add approach. We didn't adopt the CRIU
approch because of the following reasons:

* It is not upstream yet.

* There are concerns about the security of the dump.

* It involves a lot of changes and this approach provides a UNIX style
interface.

Task work add

task_work_add() is an interface and an API. The task work add will run any
queued work before returning to user space from the kernel. So that work is
guaranteed to be done before user space can run again. So basically it queues a
work for a task which is guaranteed to be executed when the task returns from
kernel space to user space.

* Exploit this function to hold the threads when they are returning to the
user space.

* Wait until all the threads of the process to be dumped, reach task_work_add.

* Once all the threads have reached, the dump is taken and they are released.

TODO:

* A mechanism to know when all the threads have reached the task added.

* A way to handle a case when one of the threads of the task to be dumped
is blocked in the kernel.

* We could also add the infrastructure under a config option,
say:CONFIG_ELF_GENCORE

* The current implementation doesn't wait for the threads to reach
wait_for_completion(). Hence there is no guarantee of collecting the
'register set' reliably. We will address this issue in the next version.
This is a prototype implementation to get reviews and comments.

Patches 1 to 8 deals with re-arranging the ELF code to be reusable by the
infrastructure.

Patches 9 to 19 implements the infrastructure.

Please let me know your reviews and comments.

Janani Venkataraman (19):
Create elfcore-common.c for ELF class independent core generation helpers
Make vma_dump_size() generic
Make fill_psinfo generic
Rename compat versions of the reusable core generation routines
Export the reusable ELF core generation routines
Define API for reading arch specif Program Headers for Core
ia64 impelementation for elf_core_copy_extra_phdrs()
elf_core_copy_extra_phdrs() for UML
Create /proc/pid/core entry
Track the core generation requests
Check if the process is an ELF executable
Hold the threads using task_work_add
Create ELF Header
Create ELF Core notes Data
Calculate the size of the core file
Generate the data sections for ELF Core
Identify the ELF class of the process
Adding support for compat ELF class data structures
Compat ELF class core generation support


arch/ia64/kernel/elfcore.c | 34 +++
arch/x86/um/elfcore.c | 32 +++
fs/Makefile | 1
fs/binfmt_elf.c | 190 ++--------------
fs/compat_binfmt_elf.c | 7 +
fs/elfcore-common.c | 169 ++++++++++++++
fs/proc/Makefile | 2
fs/proc/base.c | 2
fs/proc/gencore-compat-elf.c | 62 +++++
fs/proc/gencore-elf.c | 458 ++++++++++++++++++++++++++++++++++++++
fs/proc/gencore.c | 262 ++++++++++++++++++++++
fs/proc/gencore.h | 74 ++++++
fs/proc/internal.h | 1
include/linux/elfcore-internal.h | 72 ++++++
include/linux/elfcore.h | 3
kernel/elfcore.c | 6
16 files changed, 1209 insertions(+), 166 deletions(-)
create mode 100644 fs/elfcore-common.c
create mode 100644 fs/proc/gencore-compat-elf.c
create mode 100644 fs/proc/gencore-elf.c
create mode 100644 fs/proc/gencore.c
create mode 100644 fs/proc/gencore.h
create mode 100644 include/linux/elfcore-internal.h

--
Janani


2013-10-04 10:30:48

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 01/19] Create elfcore-common.c for ELF class independent core generation helpers

From:Suzuki K. Poulose <[email protected]>

Move the common code, shared by both native and compat ELF core generation code
to a single instance.These functions could be re-used later for application
core dump infrastructure.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/Makefile | 1
fs/binfmt_elf.c | 156 -------------------------------------
fs/elfcore-common.c | 161 ++++++++++++++++++++++++++++++++++++++
include/linux/elfcore-internal.h | 46 +++++++++++
4 files changed, 209 insertions(+), 155 deletions(-)
create mode 100644 fs/elfcore-common.c
create mode 100644 include/linux/elfcore-internal.h

diff --git a/fs/Makefile b/fs/Makefile
index 4fe6df3..70e7add 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_BINFMT_MISC) += binfmt_misc.o
obj-$(CONFIG_BINFMT_SCRIPT) += binfmt_script.o
obj-$(CONFIG_BINFMT_ELF) += binfmt_elf.o
obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o
+obj-$(CONFIG_ELF_CORE) += elfcore-common.o
obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o
obj-$(CONFIG_BINFMT_SOM) += binfmt_som.o
obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 100edcc..6c8df47 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -37,7 +37,7 @@
#include <asm/uaccess.h>
#include <asm/param.h>
#include <asm/page.h>
-
+#include <linux/elfcore-internal.h>
#ifndef user_long_t
#define user_long_t long
#endif
@@ -1095,124 +1095,6 @@ out:
* Jeremy Fitzhardinge <[email protected]>
*/

-/*
- * The purpose of always_dump_vma() is to make sure that special kernel mappings
- * that are useful for post-mortem analysis are included in every core dump.
- * In that way we ensure that the core dump is fully interpretable later
- * without matching up the same kernel and hardware config to see what PC values
- * meant. These special mappings include - vDSO, vsyscall, and other
- * architecture specific mappings
- */
-static bool always_dump_vma(struct vm_area_struct *vma)
-{
- /* Any vsyscall mappings? */
- if (vma == get_gate_vma(vma->vm_mm))
- return true;
- /*
- * arch_vma_name() returns non-NULL for special architecture mappings,
- * such as vDSO sections.
- */
- if (arch_vma_name(vma))
- return true;
-
- return false;
-}
-
-/*
- * Decide what to dump of a segment, part, all or none.
- */
-static unsigned long vma_dump_size(struct vm_area_struct *vma,
- unsigned long mm_flags)
-{
-#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
-
- /* always dump the vdso and vsyscall sections */
- if (always_dump_vma(vma))
- goto whole;
-
- if (vma->vm_flags & VM_DONTDUMP)
- return 0;
-
- /* Hugetlb memory check */
- if (vma->vm_flags & VM_HUGETLB) {
- if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
- goto whole;
- if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
- goto whole;
- return 0;
- }
-
- /* Do not dump I/O mapped devices or special mappings */
- if (vma->vm_flags & VM_IO)
- return 0;
-
- /* By default, dump shared memory if mapped from an anonymous file. */
- if (vma->vm_flags & VM_SHARED) {
- if (file_inode(vma->vm_file)->i_nlink == 0 ?
- FILTER(ANON_SHARED) : FILTER(MAPPED_SHARED))
- goto whole;
- return 0;
- }
-
- /* Dump segments that have been written to. */
- if (vma->anon_vma && FILTER(ANON_PRIVATE))
- goto whole;
- if (vma->vm_file == NULL)
- return 0;
-
- if (FILTER(MAPPED_PRIVATE))
- goto whole;
-
- /*
- * If this looks like the beginning of a DSO or executable mapping,
- * check for an ELF header. If we find one, dump the first page to
- * aid in determining what was mapped here.
- */
- if (FILTER(ELF_HEADERS) &&
- vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ)) {
- u32 __user *header = (u32 __user *) vma->vm_start;
- u32 word;
- mm_segment_t fs = get_fs();
- /*
- * Doing it this way gets the constant folded by GCC.
- */
- union {
- u32 cmp;
- char elfmag[SELFMAG];
- } magic;
- BUILD_BUG_ON(SELFMAG != sizeof word);
- magic.elfmag[EI_MAG0] = ELFMAG0;
- magic.elfmag[EI_MAG1] = ELFMAG1;
- magic.elfmag[EI_MAG2] = ELFMAG2;
- magic.elfmag[EI_MAG3] = ELFMAG3;
- /*
- * Switch to the user "segment" for get_user(),
- * then put back what elf_core_dump() had in place.
- */
- set_fs(USER_DS);
- if (unlikely(get_user(word, header)))
- word = 0;
- set_fs(fs);
- if (word == magic.cmp)
- return PAGE_SIZE;
- }
-
-#undef FILTER
-
- return 0;
-
-whole:
- return vma->vm_end - vma->vm_start;
-}
-
-/* An ELF note in memory */
-struct memelfnote
-{
- const char *name;
- int type;
- unsigned int datasz;
- void *data;
-};

static int notesize(struct memelfnote *en)
{
@@ -1291,16 +1173,6 @@ static void fill_elf_note_phdr(struct elf_phdr *phdr, int sz, loff_t offset)
return;
}

-static void fill_note(struct memelfnote *note, const char *name, int type,
- unsigned int sz, void *data)
-{
- note->name = name;
- note->type = type;
- note->datasz = sz;
- note->data = data;
- return;
-}
-
/*
* fill up all the fields in prstatus from the given task struct, except
* registers which need to be filled up separately.
@@ -1974,32 +1846,6 @@ static void free_note_info(struct elf_note_info *info)

#endif

-static struct vm_area_struct *first_vma(struct task_struct *tsk,
- struct vm_area_struct *gate_vma)
-{
- struct vm_area_struct *ret = tsk->mm->mmap;
-
- if (ret)
- return ret;
- return gate_vma;
-}
-/*
- * Helper function for iterating across a vma list. It ensures that the caller
- * will visit `gate_vma' prior to terminating the search.
- */
-static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
- struct vm_area_struct *gate_vma)
-{
- struct vm_area_struct *ret;
-
- ret = this_vma->vm_next;
- if (ret)
- return ret;
- if (this_vma == gate_vma)
- return NULL;
- return gate_vma;
-}
-
static void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
elf_addr_t e_shoff, int segs)
{
diff --git a/fs/elfcore-common.c b/fs/elfcore-common.c
new file mode 100644
index 0000000..4886f15
--- /dev/null
+++ b/fs/elfcore-common.c
@@ -0,0 +1,161 @@
+/*
+ *
+ * Copyright 1993, 1994: Eric Youngdale ([email protected]).
+ *
+ * Initial write up for ELF_CORE :
+ * Modelled on fs/exec.c:aout_core_dump()
+ * Jeremy Fitzhardinge <[email protected]>
+ *
+ * Moved common routines used by both native and compat ELF core generation
+ * from binfmt_elf.c to a single instance.
+ * Suzuki K. Poulose <[email protected]>
+ */
+
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/elf.h>
+#include <asm/uaccess.h>
+#include <asm/param.h>
+#include <asm/page.h>
+#include <linux/elfcore-internal.h>
+
+static struct vm_area_struct *first_vma(struct task_struct *tsk,
+ struct vm_area_struct *gate_vma)
+{
+ struct vm_area_struct *ret = tsk->mm->mmap;
+
+ if (ret)
+ return ret;
+ return gate_vma;
+}
+
+/*
+ * Helper function for iterating across a vma list. It ensures that the caller
+ * will visit `gate_vma' prior to terminating the search.
+ */
+static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
+ struct vm_area_struct *gate_vma)
+{
+ struct vm_area_struct *ret;
+
+ ret = this_vma->vm_next;
+ if (ret)
+ return ret;
+ if (this_vma == gate_vma)
+ return NULL;
+ return gate_vma;
+}
+
+/*
+ * The purpose of always_dump_vma() is to make sure that special kernel mappings
+ * that are useful for post-mortem analysis are included in every core dump.
+ * In that way we ensure that the core dump is fully interpretable later
+ * without matching up the same kernel and hardware config to see what PC values
+ * meant. These special mappings include - vDSO, vsyscall, and other
+ * architecture specific mappings
+ */
+static bool always_dump_vma(struct vm_area_struct *vma)
+{
+ /* Any vsyscall mappings? */
+ if (vma == get_gate_vma(vma->vm_mm))
+ return true;
+ /*
+ * arch_vma_name() returns non-NULL for special architecture mappings,
+ * such as vDSO sections.
+ */
+ if (arch_vma_name(vma))
+ return true;
+
+ return false;
+}
+
+/*
+ * Decide what to dump of a segment, part, all or none.
+ */
+static unsigned long vma_dump_size(struct vm_area_struct *vma,
+ unsigned long mm_flags)
+{
+#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
+
+ /* always dump the vdso and vsyscall sections */
+ if (always_dump_vma(vma))
+ goto whole;
+
+ if (vma->vm_flags & VM_DONTDUMP)
+ return 0;
+
+ /* Hugetlb memory check */
+ if (vma->vm_flags & VM_HUGETLB) {
+ if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
+ goto whole;
+ if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
+ goto whole;
+ return 0;
+ }
+
+ /* Do not dump I/O mapped devices or special mappings */
+ if (vma->vm_flags & VM_IO)
+ return 0;
+
+ /* By default, dump shared memory if mapped from an anonymous file. */
+ if (vma->vm_flags & VM_SHARED) {
+ if (file_inode(vma->vm_file)->i_nlink == 0 ?
+ FILTER(ANON_SHARED) : FILTER(MAPPED_SHARED))
+ goto whole;
+ return 0;
+ }
+
+ /* Dump segments that have been written to. */
+ if (vma->anon_vma && FILTER(ANON_PRIVATE))
+ goto whole;
+ if (vma->vm_file == NULL)
+ return 0;
+
+ if (FILTER(MAPPED_PRIVATE))
+ goto whole;
+
+ /*
+ * If this looks like the beginning of a DSO or executable mapping,
+ * check for an ELF header. If we find one, dump the first page to
+ * aid in determining what was mapped here.
+ */
+ if (FILTER(ELF_HEADERS) &&
+ vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ)) {
+ u32 __user *header = (u32 __user *) vma->vm_start;
+
+ u32 word;
+ mm_segment_t fs = get_fs();
+ /*
+ * Doing it this way gets the constant folded by GCC.
+ */
+ union {
+ u32 cmp;
+ char elfmag[SELFMAG];
+ } magic;
+ BUILD_BUG_ON(SELFMAG != sizeof word);
+ magic.elfmag[EI_MAG0] = ELFMAG0;
+ magic.elfmag[EI_MAG1] = ELFMAG1;
+ magic.elfmag[EI_MAG2] = ELFMAG2;
+ magic.elfmag[EI_MAG3] = ELFMAG3;
+ /*
+ * Switch to the user "segment" for get_user(),
+ * then put back what elf_core_dump() had in place.
+ */
+ set_fs(USER_DS);
+ if (unlikely(get_user(word, header)))
+ word = 0;
+ set_fs(fs);
+ if (word == magic.cmp)
+ return PAGE_SIZE;
+ }
+
+#undef FILTER
+
+ return 0;
+
+whole:
+ return vma->vm_end - vma->vm_start;
+}
+
diff --git a/include/linux/elfcore-internal.h b/include/linux/elfcore-internal.h
new file mode 100644
index 0000000..f6354f0
--- /dev/null
+++ b/include/linux/elfcore-internal.h
@@ -0,0 +1,46 @@
+/*
+ * Common routines for native and compat elf core generation.
+ *
+ * Copyright 1993, 1994: Eric Youngdale ([email protected]).
+ *
+ * Modelled on fs/exec.c:aout_core_dump()
+ * Jeremy Fitzhardinge <[email protected]>
+ *
+ * Moved the common routines from binfmt_elf.c to elfcore-common.c
+ * - Suzuki K. Poulose <[email protected]>
+ */
+
+#ifndef __ELF_CORE_INTERNAL_H
+#define __ELF_CORE_INTERNAL_H
+
+#ifdef __KERNEL__
+
+/* An ELF note in memory */
+struct memelfnote
+{
+ const char *name;
+ int type;
+ unsigned int datasz;
+ void *data;
+};
+
+static void fill_note(struct memelfnote *note, const char *name, int type,
+ unsigned int sz, void *data)
+{
+ note->name = name;
+ note->type = type;
+ note->datasz = sz;
+ note->data = data;
+ return;
+}
+
+extern struct vm_area_struct *first_vma(struct task_struct *tsk,
+ struct vm_area_struct *gate_vma);
+extern struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
+ struct vm_area_struct *gate_vma);
+extern unsigned long vma_dump_size(struct vm_area_struct *vma,
+ unsigned long mm_flags);
+
+#endif
+#endif
+

2013-10-04 10:31:10

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 02/19] Make vma_dump_size() generic

From:Suzuki K. Poulose <[email protected]>

vma_dump_size calculates the file size of a vma area in the core file. It
assumes the vma belongs to the "current". Make it generic to work for any task.
This will be reused by application core dump infrastructure.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/binfmt_elf.c | 6 +++---
fs/elfcore-common.c | 24 ++++++++++++++++--------
include/linux/elfcore-internal.h | 4 ++--
3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6c8df47..107bbd3 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1870,7 +1870,7 @@ static size_t elf_core_vma_data_size(struct vm_area_struct *gate_vma,

for (vma = first_vma(current, gate_vma); vma != NULL;
vma = next_vma(vma, gate_vma))
- size += vma_dump_size(vma, mm_flags);
+ size += vma_dump_size(current, vma, mm_flags);
return size;
}

@@ -1994,7 +1994,7 @@ static int elf_core_dump(struct coredump_params *cprm)
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = vma_dump_size(vma, cprm->mm_flags);
+ phdr.p_filesz = vma_dump_size(current, vma, cprm->mm_flags);
phdr.p_memsz = vma->vm_end - vma->vm_start;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -2029,7 +2029,7 @@ static int elf_core_dump(struct coredump_params *cprm)
unsigned long addr;
unsigned long end;

- end = vma->vm_start + vma_dump_size(vma, cprm->mm_flags);
+ end = vma->vm_start + vma_dump_size(current, vma, cprm->mm_flags);

for (addr = vma->vm_start; addr < end; addr += PAGE_SIZE) {
struct page *page;
diff --git a/fs/elfcore-common.c b/fs/elfcore-common.c
index 4886f15..48f1a9d 100644
--- a/fs/elfcore-common.c
+++ b/fs/elfcore-common.c
@@ -21,7 +21,7 @@
#include <asm/page.h>
#include <linux/elfcore-internal.h>

-static struct vm_area_struct *first_vma(struct task_struct *tsk,
+struct vm_area_struct *first_vma(struct task_struct *tsk,
struct vm_area_struct *gate_vma)
{
struct vm_area_struct *ret = tsk->mm->mmap;
@@ -35,7 +35,7 @@ static struct vm_area_struct *first_vma(struct task_struct *tsk,
* Helper function for iterating across a vma list. It ensures that the caller
* will visit `gate_vma' prior to terminating the search.
*/
-static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
+struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
struct vm_area_struct *gate_vma)
{
struct vm_area_struct *ret;
@@ -56,7 +56,7 @@ static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
* meant. These special mappings include - vDSO, vsyscall, and other
* architecture specific mappings
*/
-static bool always_dump_vma(struct vm_area_struct *vma)
+bool always_dump_vma(struct vm_area_struct *vma)
{
/* Any vsyscall mappings? */
if (vma == get_gate_vma(vma->vm_mm))
@@ -74,7 +74,7 @@ static bool always_dump_vma(struct vm_area_struct *vma)
/*
* Decide what to dump of a segment, part, all or none.
*/
-static unsigned long vma_dump_size(struct vm_area_struct *vma,
+unsigned long vma_dump_size(struct task_struct *p, struct vm_area_struct *vma,
unsigned long mm_flags)
{
#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
@@ -143,10 +143,18 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
* Switch to the user "segment" for get_user(),
* then put back what elf_core_dump() had in place.
*/
- set_fs(USER_DS);
- if (unlikely(get_user(word, header)))
- word = 0;
- set_fs(fs);
+ if (p->mm == current->mm) {
+ mm_segment_t fs = get_fs();
+ set_fs(USER_DS);
+ if (unlikely(get_user(word, header)))
+ word = 0;
+ set_fs(fs);
+ } else {
+ int bytes = access_process_vm(p, (unsigned long)header,
+ &word, sizeof (word), 0);
+ if (bytes != sizeof(word))
+ word = 0;
+ }
if (word == magic.cmp)
return PAGE_SIZE;
}
diff --git a/include/linux/elfcore-internal.h b/include/linux/elfcore-internal.h
index f6354f0..9e9cd93 100644
--- a/include/linux/elfcore-internal.h
+++ b/include/linux/elfcore-internal.h
@@ -38,9 +38,9 @@ extern struct vm_area_struct *first_vma(struct task_struct *tsk,
struct vm_area_struct *gate_vma);
extern struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
struct vm_area_struct *gate_vma);
-extern unsigned long vma_dump_size(struct vm_area_struct *vma,
+extern unsigned long vma_dump_size(struct task_struct *task,
+ struct vm_area_struct *vma,
unsigned long mm_flags);

#endif
#endif
-

2013-10-04 10:31:18

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 03/19] Make fill_psinfo generic

From:Suzuki K. Poulose <[email protected]>

fill_psinfo() fills the NT_PRPSINFO note for the core. NT_PRPSINFO stores the
command line of the process, which is stored at from mm->arg_start. Make
fill_psinfo reusable by supporting other tasks. Use access_process_vm() to read
the command line args for non-current task.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/binfmt_elf.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 107bbd3..4eee017 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1222,9 +1222,21 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
len = mm->arg_end - mm->arg_start;
if (len >= ELF_PRARGSZ)
len = ELF_PRARGSZ-1;
- if (copy_from_user(&psinfo->pr_psargs,
- (const char __user *)mm->arg_start, len))
- return -EFAULT;
+ /*
+ * If we are dumping core for "another task"
+ * we can't use copy_from_user().
+ */
+ if (p->mm == current->mm) {
+ if (copy_from_user(&psinfo->pr_psargs,
+ (const char __user *)mm->arg_start, len))
+ return -EFAULT;
+ } else {
+ int bytes = access_process_vm(p, mm->arg_start,
+ &psinfo->pr_psargs, len, 0);
+ if (bytes != len)
+ return -EFAULT;
+ }
+
for(i = 0; i < len; i++)
if (psinfo->pr_psargs[i] == 0)
psinfo->pr_psargs[i] = ' ';

2013-10-04 10:31:23

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 04/19] Rename compat versions of the reusable core generation routines

From:Suzuki K. Poulose <[email protected]>

Rename the ELF class specific functions reusable for the application core dump
infrastructure. The compat ELF class routines are prepended with compat_ .

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/compat_binfmt_elf.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index a81147e..9a288aa 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -134,6 +134,13 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
#define init_elf_binfmt init_compat_elf_binfmt
#define exit_elf_binfmt exit_compat_elf_binfmt

+/* Rename the functions that may be reused */
+#define fill_elf_header compat_fill_elf_header
+#define fill_psinfo compat_fill_psinfo
+#define fill_prstatus compat_fill_prstatus
+#define fill_extnum_info compat_fill_extnum_info
+#define fill_auxv_note compat_fill_auxv_note
+
/*
* We share all the actual code with the native (64-bit) version.
*/

2013-10-04 10:31:33

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 05/19] Export the reusable ELF core generation routines

From:Suzuki K. Poulose <[email protected]>

Export the ELF class specific core generation helper functions via
elfcore-internal.h

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/binfmt_elf.c | 10 +++++-----
include/linux/elfcore-internal.h | 28 +++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 4eee017..3f13203 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1137,7 +1137,7 @@ static int writenote(struct memelfnote *men, struct file *file,
}
#undef DUMP_WRITE

-static void fill_elf_header(struct elfhdr *elf, int segs,
+void fill_elf_header(struct elfhdr *elf, int segs,
u16 machine, u32 flags)
{
memset(elf, 0, sizeof(*elf));
@@ -1177,7 +1177,7 @@ static void fill_elf_note_phdr(struct elf_phdr *phdr, int sz, loff_t offset)
* fill up all the fields in prstatus from the given task struct, except
* registers which need to be filled up separately.
*/
-static void fill_prstatus(struct elf_prstatus *prstatus,
+void fill_prstatus(struct elf_prstatus *prstatus,
struct task_struct *p, long signr)
{
prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
@@ -1210,7 +1210,7 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
cputime_to_timeval(p->signal->cstime, &prstatus->pr_cstime);
}

-static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
+int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
struct mm_struct *mm)
{
const struct cred *cred;
@@ -1265,7 +1265,7 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
return 0;
}

-static void fill_auxv_note(struct memelfnote *note, struct mm_struct *mm)
+void fill_auxv_note(struct memelfnote *note, struct mm_struct *mm)
{
elf_addr_t *auxv = (elf_addr_t *) mm->saved_auxv;
int i = 0;
@@ -1858,7 +1858,7 @@ static void free_note_info(struct elf_note_info *info)

#endif

-static void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
+void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
elf_addr_t e_shoff, int segs)
{
elf->e_shoff = e_shoff;
diff --git a/include/linux/elfcore-internal.h b/include/linux/elfcore-internal.h
index 9e9cd93..8c5323a 100644
--- a/include/linux/elfcore-internal.h
+++ b/include/linux/elfcore-internal.h
@@ -14,7 +14,7 @@
#define __ELF_CORE_INTERNAL_H

#ifdef __KERNEL__
-
+#include <linux/elfcore.h>
/* An ELF note in memory */
struct memelfnote
{
@@ -42,5 +42,31 @@ extern unsigned long vma_dump_size(struct task_struct *task,
struct vm_area_struct *vma,
unsigned long mm_flags);

+/* Core dump generation routines for native ELF class */
+extern void fill_elf_header(struct elfhdr *elf, int segs,
+ u16 machine, u32 flags);
+extern void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr,
+ elf_addr_t e_shoff, int segs);
+extern void fill_prstatus(struct elf_prstatus *prstatus,
+ struct task_struct *p, long signr);
+extern int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
+ struct mm_struct *mm);
+extern void fill_auxv_note(struct memelfnote *note, struct mm_struct *mm);
+
+/* compat ELF class variant of the above helpers*/
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+#include <linux/elfcore-compat.h>
+
+extern void compat_fill_extnum_info(struct elf32_hdr *elf, struct elf32_shdr *shdr,
+ Elf32_Addr e_shoff, int segs);
+extern void compat_fill_prstatus(struct compat_elf_prstatus *prstatus,
+ struct task_struct *p, long signr);
+extern int compat_fill_psinfo(struct compat_elf_prpsinfo *psinfo, struct task_struct *p,
+ struct mm_struct *mm);
+extern void compat_fill_auxv_note(struct memelfnote *note, struct mm_struct *mm);
+extern void compat_fill_elf_header(struct elf32_hdr *elf, int segs,
+ u16 machine, u32 flags);
+#endif /* CONFIG_COMPAT_BINFMT_ELF */
+
#endif
#endif

2013-10-04 10:31:42

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 06/19] Define API for reading arch specif Program Headers for Core

From:Suzuki K. Poulose <[email protected]>

The binfmt ELF defines APIs for the "arch" specific Program headers to be
written to the ELF core. Define the "read" (or copy) variants of the APIs, to
collect it for the application core dump(which is a read based approach).

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
include/linux/elfcore.h | 3 +++
kernel/elfcore.c | 6 ++++++
2 files changed, 9 insertions(+)

diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
index cdd3d13..cf13b45 100644
--- a/include/linux/elfcore.h
+++ b/include/linux/elfcore.h
@@ -66,6 +66,9 @@ extern int
elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
unsigned long limit);
extern int
+elf_core_copy_extra_phdrs(char *buf, loff_t offset, size_t *size,
+ unsigned long limit);
+extern int
elf_core_write_extra_data(struct file *file, size_t *size, unsigned long limit);
extern size_t elf_core_extra_data_size(void);

diff --git a/kernel/elfcore.c b/kernel/elfcore.c
index ff915ef..f422c6c 100644
--- a/kernel/elfcore.c
+++ b/kernel/elfcore.c
@@ -16,6 +16,12 @@ int __weak elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *
return 1;
}

+int __weak elf_core_copy_extra_phdrs(char *buf, loff_t offset, size_t *size,
+ unsigned long limit)
+{
+ return 1;
+}
+
int __weak elf_core_write_extra_data(struct file *file, size_t *size,
unsigned long limit)
{

2013-10-04 10:31:47

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 07/19] ia64 impelementation for elf_core_copy_extra_phdrs()

From:Suzuki K. Poulose <[email protected]>

ia64 impelementation for elf_core_copy_extra_phdrs(). Adapted from
elf_core_write_extra_phdrs().

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
arch/ia64/kernel/elfcore.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/arch/ia64/kernel/elfcore.c b/arch/ia64/kernel/elfcore.c
index bac1639..64c8acf 100644
--- a/arch/ia64/kernel/elfcore.c
+++ b/arch/ia64/kernel/elfcore.c
@@ -42,6 +42,40 @@ int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
return 1;
}

+int elf_core_copy_extra_phdrs(char *bufp, loff_t offset, size_t *size,
+ unsigned long limit)
+{
+ const struct elf_phdr *const gate_phdrs =
+ (const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
+ int i;
+ Elf64_Off ofs = 0;
+ struct elf_phdr *phdr = (struct elf_phdr *) bufp;
+
+ for (i = 0; i < GATE_EHDR->e_phnum; ++i) {
+ *size += sizeof(*phdr);
+ if (*size > limit)
+ return 0;
+
+ *phdr = gate_phdrs[i];
+
+ if (phdr->p_type == PT_LOAD) {
+ phdr->p_memsz = PAGE_ALIGN(phdr->p_memsz);
+ phdr->p_filesz = phdr->p_memsz;
+ if (ofs == 0) {
+ ofs = phdr->p_offset = *offset;
+ offset += phdr->p_filesz;
+ } else {
+ phdr->p_offset = ofs;
+ }
+ } else {
+ phdr->p_offset += ofs;
+ }
+ phdr->p_paddr = 0; /* match other core phdrs */
+ phdr++;
+ }
+ return 1;
+}
+
int elf_core_write_extra_data(struct file *file, size_t *size,
unsigned long limit)
{

2013-10-04 10:32:08

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 11/19] Check if the process is an ELF executable

From:Suzuki K. Poulose <[email protected]>

Validate if the process is an ELF exec. This will be later extended to identify
if the task is a native ELF or a compat ELF.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/proc/gencore.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/fs/proc/gencore.c b/fs/proc/gencore.c
index 580a8b5..5f56910 100644
--- a/fs/proc/gencore.c
+++ b/fs/proc/gencore.c
@@ -22,6 +22,7 @@
* Suzuki K. Poulose <[email protected]>
*/

+#include <linux/elf.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
#include "internal.h"
@@ -85,6 +86,29 @@ static int release_gencore(struct inode *inode, struct file *file)
return 0;
}

+ /*
+ * Determine whether the task is an ELF executable.
+ * Returns
+ * < 0 - Non-ELF
+ * 0 - Native ELF Executable
+ */
+static int get_elf_class(struct task_struct *task)
+{
+ struct elfhdr hdr;
+ int ret = 0;
+ ret = kernel_read(task->mm->exe_file, 0, (char*)&hdr, sizeof(hdr));
+ if (ret < 0)
+ return ret;
+
+ /* Verify the ELF magic on the exe_file */
+ if (memcmp(hdr.e_ident, ELFMAG, SELFMAG))
+ return -EINVAL;
+ if (elf_check_arch(&hdr))
+ return 0;
+
+ return -EINVAL;
+}
+
/*
* Validate if the call is valid. We also need to prevent >1 open
* of the same file.
@@ -93,10 +117,15 @@ static int open_gencore(struct inode *inode, struct file *filp)
{
struct task_struct *task = get_proc_task(inode);
struct core_proc *cp;
+ int elf_class;
int ret = 0;
if (!task)
return -ENOENT;

+ elf_class = get_elf_class(task);
+ if (elf_class < 0)
+ return elf_class;
+
mutex_lock(&core_mutex);
cp = get_core_proc(task);
mutex_unlock(&core_mutex);

2013-10-04 10:32:12

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 10/19] Track the core generation requests

From:Suzuki K.Poulose <[email protected]>

Keep track of the core generation requests. Concurrent core generation requests
for the same target process are not allowed.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N.Mavinakayanahalli <[email protected]>
---
fs/proc/gencore.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/proc/gencore.h | 12 +++++++
2 files changed, 96 insertions(+), 2 deletions(-)
create mode 100644 fs/proc/gencore.h

diff --git a/fs/proc/gencore.c b/fs/proc/gencore.c
index 115f1e4..580a8b5 100644
--- a/fs/proc/gencore.c
+++ b/fs/proc/gencore.c
@@ -23,23 +23,105 @@
*/

#include <linux/seq_file.h>
+#include <linux/slab.h>
#include "internal.h"
+#include "gencore.h"
+
+static LIST_HEAD(core_list);
+static DEFINE_MUTEX(core_mutex);
+
+/* Called with core_mutex held */
+static struct core_proc* get_core_proc(struct task_struct *t)
+{
+ struct core_proc *cp;
+
+ list_for_each_entry(cp, &core_list, list) {
+ if (cp->task == t->group_leader)
+ return cp;
+ }
+ return NULL;
+}

static ssize_t read_gencore(struct file *file, char __user *buffer,
size_t buflen, loff_t *fpos)
{
- return 0;
+ struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct core_proc *cp;
+ ssize_t ret = 0;
+
+ if (!task)
+ return -EIO;
+
+ mutex_lock(&core_mutex);
+ cp = get_core_proc(task);
+ if (!cp) {
+ mutex_unlock(&core_mutex);
+ ret = -EIO;
+ goto out;
+ }
+ mutex_unlock(&core_mutex);
+
+out:
+ put_task_struct(task);
+ return ret;
}

static int release_gencore(struct inode *inode, struct file *file)
{
+ struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct core_proc *cp;
+
+ if (!task)
+ return -EIO;
+
+ mutex_lock(&core_mutex);
+ cp = get_core_proc(task);
+ if (cp) {
+ list_del(&cp->list);
+ kfree(cp);
+ }
+ mutex_unlock(&core_mutex);
+ put_task_struct(task);
return 0;
}

+/*
+ * Validate if the call is valid. We also need to prevent >1 open
+ * of the same file.
+ */
static int open_gencore(struct inode *inode, struct file *filp)
{
- return 0;
+ struct task_struct *task = get_proc_task(inode);
+ struct core_proc *cp;
+ int ret = 0;
+ if (!task)
+ return -ENOENT;
+
+ mutex_lock(&core_mutex);
+ cp = get_core_proc(task);
+ mutex_unlock(&core_mutex);
+ if (cp) {
+ ret = -EALREADY;
+ goto out;
+ }
+
+ cp = kzalloc(sizeof (*cp), GFP_KERNEL);
+ if (!cp) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ cp->task = task->group_leader;
+ INIT_LIST_HEAD(&cp->list);
+ mutex_lock(&core_mutex);
+ list_add(&cp->list, &core_list);
+ mutex_unlock(&core_mutex);
+
+out:
+ put_task_struct(task);
+ return ret;
}
+
const struct file_operations proc_gen_core_operations = {
.open = open_gencore,
.read = read_gencore,
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
new file mode 100644
index 0000000..c98fddf
--- /dev/null
+++ b/fs/proc/gencore.h
@@ -0,0 +1,12 @@
+#ifndef __GEN_CORE_H
+#define __GEN_CORE_H
+
+#include <linux/list.h>
+#include <linux/sched.h>
+
+struct core_proc {
+ struct list_head list;
+ struct task_struct *task;
+};
+
+#endif

2013-10-04 10:32:26

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 13/19] Create ELF Header

From:Suzuki K. Poulose <[email protected]>

Build the ELF header on the fly for the very first read request. The ELF Header,
Program Headers are stored in a buffer for processing future read() requests.

gencore-elf.c contains the ELF class specific functions.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N.Mavinakayanahalli <[email protected]>
---
fs/proc/Makefile | 2 -
fs/proc/gencore-elf.c | 181 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/proc/gencore.h | 7 ++
3 files changed, 189 insertions(+), 1 deletion(-)
create mode 100644 fs/proc/gencore-elf.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index a456c22..d231ac3 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -23,7 +23,7 @@ proc-y += version.o
proc-y += softirqs.o
proc-y += namespaces.o
proc-y += self.o
-proc-$(CONFIG_ELF_CORE) += gencore.o
+proc-$(CONFIG_ELF_CORE) += gencore.o gencore-elf.o
proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o
diff --git a/fs/proc/gencore-elf.c b/fs/proc/gencore-elf.c
new file mode 100644
index 0000000..0a245f0
--- /dev/null
+++ b/fs/proc/gencore-elf.c
@@ -0,0 +1,181 @@
+/*
+ * Application core dump - ELF Class specific code
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ *
+ * Authors:
+ * Suzuki K. Poulose <[email protected]>
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/freezer.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+#include <linux/elfcore-internal.h>
+
+#ifdef CORE_DUMP_USE_REGSET
+#include <linux/regset.h>
+#endif
+
+#include "gencore.h"
+
+static void get_elfhdr_size(struct core_proc *cp)
+{
+ struct vm_area_struct *gate_vma;
+ int segs;
+
+ segs = cp->task->mm->map_count;
+ segs += elf_core_extra_phdrs();
+
+ gate_vma = get_gate_vma(cp->task->mm);
+ if (gate_vma != NULL)
+ segs++;
+
+ /* One phdr for PT_NOTE */
+ segs++;
+
+ cp->nphdrs = segs;
+ cp->elf_buflen = sizeof(struct elfhdr) +
+ (cp->nphdrs * sizeof(struct elf_phdr));
+ cp->elf_buflen = roundup(cp->elf_buflen, ELF_EXEC_PAGESIZE);
+
+ return;
+}
+
+/*
+ * Fill the elf_hdr and the phdrs
+ * Returns 0 on success. On error, returns errno
+ */
+static int create_elf_header(struct core_proc *cp)
+{
+ struct elfhdr *elf = (struct elfhdr *)cp->elf_buf;
+ struct elf_phdr *note;
+ struct vm_area_struct *vma, *gate_vma = get_gate_vma(cp->task->mm);
+ char *bufp;
+ off_t dataoff, offset;
+ short e_phnum = (cp->nphdrs > PN_XNUM ? PN_XNUM : cp->nphdrs);
+ size_t exphdrs_sz = 0;
+ unsigned long limit = elf_core_extra_phdrs() * sizeof(struct elf_phdr);
+
+#ifdef CORE_DUMP_USE_REGSET
+ const struct user_regset_view *view = task_user_regset_view(cp->task);
+
+ fill_elf_header(elf, e_phnum, view->e_machine, view->e_flags);
+#else
+ fill_elf_header(elf, e_phnum, ELF_ARCH, ELF_CORE_EFLAGS);
+#endif
+ offset = sizeof(struct elfhdr);
+ bufp = cp->elf_buf + offset;
+ dataoff = offset + (cp->nphdrs * sizeof(struct elf_phdr));
+
+ /* Setup ELF PT_NOTE */
+ note = (struct elf_phdr*)bufp;
+ bufp += sizeof(struct elf_phdr);
+ offset += sizeof(struct elf_phdr);
+ note->p_type = PT_NOTE;
+ note->p_offset = dataoff;
+ note->p_vaddr = 0;
+ note->p_paddr = 0;
+ /* TODO: Needs to be populated with the size of the notes section */
+ note->p_memsz = 0;
+ note->p_flags = 0;
+ note->p_align = 0;
+
+ dataoff = cp->elf_buflen;
+
+ /* Write the phdrs for memory segments */
+ down_read(&cp->task->mm->mmap_sem);
+ for (vma = first_vma(cp->task, gate_vma); vma != NULL;
+ vma = next_vma(vma, gate_vma)) {
+ struct elf_phdr *phdr = (struct elf_phdr *)bufp;
+
+ bufp += sizeof(struct elf_phdr);
+ offset += sizeof(struct elf_phdr);
+
+ phdr->p_type = PT_LOAD;
+ phdr->p_offset = dataoff;
+ phdr->p_vaddr = vma->vm_start;
+ phdr->p_paddr = 0;
+ phdr->p_filesz = vma_dump_size(cp->task, vma, cp->task->mm->flags);
+ phdr->p_memsz = vma->vm_end - vma->vm_start;
+ phdr->p_flags = (vma->vm_flags & VM_READ) ? PF_R : 0;
+ if (vma->vm_flags & VM_WRITE)
+ phdr->p_flags |= PF_W;
+ if (vma->vm_flags & VM_EXEC)
+ phdr->p_flags |= PF_X;
+ phdr->p_align = ELF_EXEC_PAGESIZE;
+ if (!(phdr->p_flags & (PF_R | PF_W | PF_X)))
+ phdr->p_filesz=0;
+ dataoff += phdr->p_filesz;
+ }
+ up_read(&cp->task->mm->mmap_sem);
+
+ if (!elf_core_copy_extra_phdrs(bufp, dataoff, &exphdrs_sz, limit))
+ return -EIO;
+ bufp += exphdrs_sz;
+ dataoff += elf_core_extra_data_size();
+
+ if (e_phnum == PN_XNUM) {
+ cp->shdr = kzalloc(sizeof(struct elf_shdr), GFP_KERNEL);
+ if (!cp->shdr)
+ return -ENOMEM;
+ fill_extnum_info(elf, (struct elf_shdr *)cp->shdr,
+ dataoff, cp->nphdrs);
+ dataoff += sizeof(struct elf_shdr);
+ }
+
+ return 0;
+}
+
+ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
+ size_t buflen, loff_t *fpos)
+{
+ ssize_t ret = 0;
+
+ if (!cp->elf_buf) {
+ get_elfhdr_size(cp);
+
+ cp->elf_buf = kzalloc(cp->elf_buflen, GFP_KERNEL);
+ if (!cp->elf_buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = create_elf_header(cp);
+ if (ret < 0)
+ goto out;
+ }
+
+ if (*fpos < cp->elf_buflen) {
+ size_t bcp = cp->elf_buflen - *fpos;
+
+ bcp = (bcp < buflen) ? bcp : buflen;
+ if (copy_to_user(buffer, (cp->elf_buf + *fpos), bcp)) {
+ ret = -EFAULT;
+ goto out;
+ } else {
+ ret = bcp;
+ *fpos += bcp;
+ buflen -= bcp;
+ buffer += bcp;
+ }
+ }
+out:
+ return ret;
+}
+
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
index 1a88e24..6c1d57c 100644
--- a/fs/proc/gencore.h
+++ b/fs/proc/gencore.h
@@ -11,6 +11,13 @@ struct core_proc {
struct task_struct *task;
struct completion hold;
struct callback_head twork;
+ void *shdr; /* elf_shdr, in case nphdrs > PN_XNUM */
+ char *elf_buf; /* buffer for elf_hdr + phdrs + notes */
+ size_t elf_buflen; /* size of elf_buf */
+ size_t nphdrs; /* number of phdrs */
};

+extern ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
+ size_t buflen, loff_t *foffset);
+
#endif

2013-10-04 10:32:37

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 14/19] Create ELF Core notes Data

From:Suzuki K. Poulose <[email protected]>

Collect the PT_NOTE information for the core.

There are two "process wide" notes. NT_PRPSINFO and NT_AUXV. These are captured
in the core_proc structure.

Each thread gets a NT_PRSTATUS note, which will contain the GPR contents. A
thread may have additional notes depending on the other register sets used by it.
Uses struct elf_thread_core_info to capture the thread specific information.

fill_thread_core_info() fills in the notes for a thread.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N. Mavinakayanahalli <[email protected]>
---
fs/proc/gencore-elf.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++-
fs/proc/gencore.c | 56 +++++++++++++++
fs/proc/gencore.h | 28 ++++++++
3 files changed, 260 insertions(+), 4 deletions(-)

diff --git a/fs/proc/gencore-elf.c b/fs/proc/gencore-elf.c
index 0a245f0..6e97f6a 100644
--- a/fs/proc/gencore-elf.c
+++ b/fs/proc/gencore-elf.c
@@ -34,6 +34,157 @@

#include "gencore.h"

+static int notesize(struct memelfnote *men)
+{
+ int size = sizeof(struct elf_note);
+
+ size += roundup(strlen(men->name) + 1, 4);
+ size += roundup(men->datasz, 4);
+
+ return size;
+}
+
+/* Store the note in the header buffer */
+static char *storenote(struct memelfnote *men, char *bufp)
+{
+ struct elf_note *en = (struct elf_note *)bufp;
+
+ en->n_namesz = strlen(men->name) + 1;
+ en->n_descsz = men->datasz;
+ en->n_type = men->type;
+ bufp = (char *) (en + 1);
+
+ memcpy(bufp, men->name, en->n_namesz);
+ bufp = (char *) roundup((unsigned long)bufp + en->n_namesz, 4);
+
+ memcpy(bufp, men->data, men->datasz);
+ bufp = (char *) roundup((unsigned long)bufp + men->datasz, 4);
+
+ return bufp;
+}
+
+#ifdef CORE_DUMP_USE_REGSET
+static void do_thread_regset_writeback(struct task_struct *task,
+ const struct user_regset *regset)
+{
+ if (regset->writeback)
+ regset->writeback(task, regset, 1);
+}
+
+static int fill_thread_core_info(struct elf_thread_core_info *tinfo,
+ struct core_proc *cp)
+{
+ unsigned int i;
+ const struct user_regset_view *view = task_user_regset_view(tinfo->task);
+
+ fill_prstatus(&tinfo->prstatus, tinfo->task, 0);
+
+ do_thread_regset_writeback(tinfo->task, &view->regsets[0]);
+ (void) view->regsets[0].get(tinfo->task, &view->regsets[0],
+ 0, sizeof(tinfo->prstatus.pr_reg),
+ &tinfo->prstatus.pr_reg, NULL);
+ fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
+ sizeof(tinfo->prstatus), &tinfo->prstatus);
+ cp->notes_size += notesize(&tinfo->notes[0]);
+ tinfo->num_notes = view->n;
+
+ for (i = 1; i < view->n; i++) {
+ const struct user_regset *regset = &view->regsets[i];
+
+ do_thread_regset_writeback(tinfo->task, regset);
+ if (regset->core_note_type &&
+ (!regset->active || regset->active(tinfo->task, regset))) {
+ int ret;
+ size_t size = regset->n * regset->size;
+ void *data = kzalloc(size, GFP_KERNEL);
+ if (!unlikely(data))
+ return 0;
+ ret = regset->get(tinfo->task, regset,
+ 0, size, data, NULL);
+ if (unlikely(ret))
+ kfree(data);
+ else {
+ if (regset->core_note_type != NT_PRFPREG)
+ fill_note(&tinfo->notes[i], "LINUX",
+ regset->core_note_type,
+ size, data);
+ else {
+ tinfo->prstatus.pr_fpvalid = 1;
+ fill_note(&tinfo->notes[i], "CORE",
+ NT_PRFPREG, size, data);
+ }
+ cp->notes_size += notesize(&tinfo->notes[i]);
+ }
+ }
+ }
+ return 1;
+}
+#else
+static int fill_thread_core_info(struct elf_thread_core_info *tinfo,
+ struct core_proc *cp)
+{
+ elf_fpregset_t fpu, *pfpu;
+#ifdef ELF_CORE_COPY_XFPREGS
+ elf_fpxregset_t xfpu, *pxfpu;
+#endif
+
+ fill_prstatus(&tinfo->prstatus, t->task, 0);
+ elf_core_copy_task_regs(t->task, &tinfo->prstatus.pr_reg);
+ fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
+ sizeof(t->prstatus), &t->prstatus);
+ cp->notes_size += notesize(&tinfo->notes[0]);
+ tinfo->num_notes = 1;
+
+ if (tinfo->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(tinfo->task,
+ NULL, &fpu)) {
+ pfpu = kzalloc(sizeof(*pfpu), GFP_KERNEL);
+ if (pfpu == NULL)
+ return 0;
+ memcpy(pfpu, &fpu, sizeof(fpu));
+ fill_note(&tinfo->notes[tinfo->num_notes], "CORE", NT_PRFPREG,
+ sizeof(*pfpu), pfpu);
+ cp->notes_size += notesize(&tinfo->notes[tinfo->num_notes]);
+ tinfo->num_notes++;
+ }
+#ifdef ELF_CORE_COPY_XFPREGS
+ if (elf_core_copy_task_xfpregs(tinfo->task, &xfpu)) {
+ pxfpu = kzalloc(sizeof(*pxfpu), GFP_KERNEL);
+ if (!pxfpu)
+ return 0;
+ memcpy(pxfpu, &xfpu, sizeof(xfpu));
+ fill_note(&tinfo->notes[tinfo->num_notes], "LINUX",
+ ELF_CORE_XFPREG_TYPE, sizeof(*pxfpu), pxfpu);
+ cp->notes_size += notesize(&tinfo->notes[tinfo->num_notes]);
+ tinfo->num_notes++;
+ }
+#endif
+ return 1;
+}
+#endif
+
+/* Returns 0 on error, 1 on success */
+static int collect_notes(struct core_proc *cp)
+{
+ struct elf_thread_core_info *tinfo;
+
+ /* Fill the 2 process wide notes */
+ fill_psinfo(&cp->prpsinfo, cp->task, cp->task->mm);
+ fill_note(&cp->psinfo, "CORE", NT_PRPSINFO,
+ sizeof(struct elf_prpsinfo), &cp->prpsinfo);
+ cp->notes_size += notesize(&cp->psinfo);
+
+ fill_auxv_note(&cp->auxv, cp->task->mm);
+ cp->notes_size += notesize(&cp->auxv);
+
+ tinfo = cp->tinfo;
+ while (tinfo != NULL) {
+ if (!fill_thread_core_info(tinfo, cp))
+ return 0;
+ tinfo = tinfo->next;
+ }
+ return 1;
+}
+
static void get_elfhdr_size(struct core_proc *cp)
{
struct vm_area_struct *gate_vma;
@@ -51,7 +202,7 @@ static void get_elfhdr_size(struct core_proc *cp)

cp->nphdrs = segs;
cp->elf_buflen = sizeof(struct elfhdr) +
- (cp->nphdrs * sizeof(struct elf_phdr));
+ (cp->nphdrs * sizeof(struct elf_phdr)) + cp->notes_size;
cp->elf_buflen = roundup(cp->elf_buflen, ELF_EXEC_PAGESIZE);

return;
@@ -66,11 +217,13 @@ static int create_elf_header(struct core_proc *cp)
struct elfhdr *elf = (struct elfhdr *)cp->elf_buf;
struct elf_phdr *note;
struct vm_area_struct *vma, *gate_vma = get_gate_vma(cp->task->mm);
+ struct elf_thread_core_info *tinfo;
char *bufp;
off_t dataoff, offset;
short e_phnum = (cp->nphdrs > PN_XNUM ? PN_XNUM : cp->nphdrs);
size_t exphdrs_sz = 0;
unsigned long limit = elf_core_extra_phdrs() * sizeof(struct elf_phdr);
+ int first = 1;

#ifdef CORE_DUMP_USE_REGSET
const struct user_regset_view *view = task_user_regset_view(cp->task);
@@ -91,7 +244,7 @@ static int create_elf_header(struct core_proc *cp)
note->p_offset = dataoff;
note->p_vaddr = 0;
note->p_paddr = 0;
- /* TODO: Needs to be populated with the size of the notes section */
+ note->p_filesz = cp->notes_size;
note->p_memsz = 0;
note->p_flags = 0;
note->p_align = 0;
@@ -138,6 +291,22 @@ static int create_elf_header(struct core_proc *cp)
dataoff, cp->nphdrs);
dataoff += sizeof(struct elf_shdr);
}
+ /* Store the notes */
+ tinfo = cp->tinfo;
+ do {
+ int i;
+
+ bufp = storenote(&tinfo->notes[0], bufp);
+ if (first) {
+ bufp = storenote(&cp->psinfo, bufp);
+ bufp = storenote(&cp->auxv, bufp);
+ }
+ for (i = 1; i < tinfo->num_notes; i++)
+ if (tinfo->notes[i].data != NULL)
+ bufp = storenote(&tinfo->notes[i], bufp);
+ first = 0;
+ tinfo = tinfo->next;
+ } while (tinfo != NULL);

return 0;
}
@@ -147,6 +316,13 @@ ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
{
ssize_t ret = 0;

+ if (!cp->notes_size) {
+ if (!collect_notes(cp)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+
if (!cp->elf_buf) {
get_elfhdr_size(cp);

diff --git a/fs/proc/gencore.c b/fs/proc/gencore.c
index d741f18..463bedd 100644
--- a/fs/proc/gencore.c
+++ b/fs/proc/gencore.c
@@ -29,7 +29,7 @@
#include <linux/slab.h>
#include "internal.h"
#include "gencore.h"
-
+#include <linux/sched.h>
static LIST_HEAD(core_list);
static DEFINE_MUTEX(core_mutex);

@@ -69,6 +69,39 @@ out:
return ret;
}

+static void free_notes_data(struct elf_thread_core_info *tinfo)
+{
+ int i;
+
+ for (i = 1; i < tinfo->num_notes; i++)
+ if (tinfo->notes[i].data) {
+ kfree(tinfo->notes[i].data);
+ tinfo->notes[i].data = NULL;
+ }
+}
+
+static void cleanup_cp(struct core_proc *cp)
+{
+ struct elf_thread_core_info *tmp, *tinfo = cp->tinfo;
+
+ mutex_lock(&core_mutex);
+ list_del(&cp->list);
+ mutex_unlock(&core_mutex);
+
+ if (tinfo) {
+ do {
+ tmp = tinfo;
+ tinfo = tinfo->next;
+ free_notes_data(tmp);
+ kfree(tmp);
+ } while (tinfo != NULL);
+ }
+ if (cp->shdr)
+ kfree(cp->shdr);
+ kfree(cp->elf_buf);
+ kfree(cp);
+}
+
static void gencore_work(struct callback_head *open_work)
{
/* TODO A method to know when all the threads have reached here */
@@ -143,7 +176,8 @@ static int open_gencore(struct inode *inode, struct file *filp)
struct task_struct *task = get_proc_task(inode);
struct core_proc *cp;
struct task_struct *t;
- int elf_class;
+ struct elf_thread_core_info *tinfo = NULL;
+ int elf_class, max_regset, i;
int ret = 0;
if (!task)
return -ENOENT;
@@ -171,12 +205,30 @@ static int open_gencore(struct inode *inode, struct file *filp)
mutex_lock(&core_mutex);
list_add(&cp->list, &core_list);
mutex_unlock(&core_mutex);
+ max_regset = get_max_regsets(task);
+
+ for (i = 0; i < get_nr_threads(task); i++) {
+ tinfo = kzalloc(offsetof(struct elf_thread_core_info,
+ notes[max_regset]), GFP_KERNEL);
+ if (unlikely(!tinfo)) {
+ cleanup_cp(cp);
+ ret = -ENOMEM;
+ goto out;
+ }
+ tinfo->next = cp->tinfo;
+ cp->tinfo = tinfo;
+ }
+
init_completion(&cp->hold);
/* Adding the work for all the threads except current */
t = cp->task;
init_task_work(&cp->twork, gencore_work);
read_lock(&tasklist_lock);
do {
+ if (tinfo) {
+ tinfo->task = t;
+ tinfo = tinfo->next;
+ }
if (t != current)
task_work_add(t, &cp->twork, true);
} while_each_thread(cp->task, t);
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
index 6c1d57c..e508417 100644
--- a/fs/proc/gencore.h
+++ b/fs/proc/gencore.h
@@ -5,6 +5,16 @@
#include <linux/list.h>
#include <linux/sched.h>
#include <linux/completion.h>
+#include <linux/elfcore.h>
+#include <linux/elfcore-internal.h>
+
+struct elf_thread_core_info {
+ unsigned short num_notes; /* Number of notes for this thread */
+ struct elf_thread_core_info *next;
+ struct task_struct *task;
+ struct elf_prstatus prstatus;
+ struct memelfnote notes[0];
+};

struct core_proc {
struct list_head list;
@@ -13,10 +23,28 @@ struct core_proc {
struct callback_head twork;
void *shdr; /* elf_shdr, in case nphdrs > PN_XNUM */
char *elf_buf; /* buffer for elf_hdr + phdrs + notes */
+ struct elf_thread_core_info *tinfo;
+ struct memelfnote psinfo;
+ struct memelfnote auxv;
+ struct elf_prpsinfo prpsinfo;
size_t elf_buflen; /* size of elf_buf */
size_t nphdrs; /* number of phdrs */
+ size_t notes_size;
};

+#ifdef CORE_DUMP_USE_REGSET
+#include <linux/regset.h>
+
+static inline int get_max_regsets(struct task_struct *task)
+{
+ const struct user_regset_view *view = task_user_regset_view(task);
+ return view->n;
+}
+
+#else
+#define get_max_regsets(task) 3 /* GPR, FP, XFP? */
+#endif
+
extern ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
size_t buflen, loff_t *foffset);

2013-10-04 10:32:44

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 15/19] Calculate the size of the core file

From:Suzuki K. Poulose <[email protected]>

Calculate the size of the core file

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N. Mavinakayanahalli <[email protected]>
---
fs/proc/gencore-elf.c | 6 ++++++
fs/proc/gencore.h | 1 +
2 files changed, 7 insertions(+)

diff --git a/fs/proc/gencore-elf.c b/fs/proc/gencore-elf.c
index 6e97f6a..efd1722 100644
--- a/fs/proc/gencore-elf.c
+++ b/fs/proc/gencore-elf.c
@@ -291,6 +291,9 @@ static int create_elf_header(struct core_proc *cp)
dataoff, cp->nphdrs);
dataoff += sizeof(struct elf_shdr);
}
+
+ cp->size = dataoff;
+
/* Store the notes */
tinfo = cp->tinfo;
do {
@@ -351,6 +354,9 @@ ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
buffer += bcp;
}
}
+ if (*fpos > cp->size)
+ goto out;
+
out:
return ret;
}
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
index e508417..f6822f2 100644
--- a/fs/proc/gencore.h
+++ b/fs/proc/gencore.h
@@ -30,6 +30,7 @@ struct core_proc {
size_t elf_buflen; /* size of elf_buf */
size_t nphdrs; /* number of phdrs */
size_t notes_size;
+ size_t size;
};

#ifdef CORE_DUMP_USE_REGSET

2013-10-04 10:33:02

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 16/19] Generate the data sections for ELF Core

From:Suzuki K. Poulose <[email protected]>

Generate the "data" for the memory regions. Also write down the section header
if we have, number of phdrs > PN_XNUM.

The vma areas are read, page by page using access_process_vm() without an
mmap_sem. If there are active threads, then we may miss a vma if it is removed
while we are doing the read.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N. Mavinakayanahalli <[email protected]>
---
fs/proc/gencore-elf.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 88 insertions(+), 2 deletions(-)

diff --git a/fs/proc/gencore-elf.c b/fs/proc/gencore-elf.c
index efd1722..244e5bb 100644
--- a/fs/proc/gencore-elf.c
+++ b/fs/proc/gencore-elf.c
@@ -314,10 +314,29 @@ static int create_elf_header(struct core_proc *cp)
return 0;
}

+/*
+ * Verify if the fpos asked for in read is valid.
+ * Returns the phdr corresponding to offset, else NULL.
+ */
+static struct elf_phdr *get_pos_elfphdr(struct core_proc *cp, loff_t pos)
+{
+ struct elfhdr *elf_hdr = (struct elfhdr *)cp->elf_buf;
+ struct elf_phdr *phdr = (struct elf_phdr*)(cp->elf_buf + elf_hdr->e_phoff);
+ int i;
+
+ for (i = 0; i < cp->nphdrs; i++, phdr++) {
+ unsigned long end = phdr->p_offset + phdr->p_filesz;
+ if ((pos >= phdr->p_offset) && (pos < end) && phdr->p_filesz)
+ return phdr;
+ }
+ return NULL;
+}
+
ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
size_t buflen, loff_t *fpos)
{
- ssize_t ret = 0;
+ ssize_t ret = 0, acc = 0;
+ struct elfhdr *elf_hdr = (struct elfhdr *)cp->elf_buf;

if (!cp->notes_size) {
if (!collect_notes(cp)) {
@@ -349,13 +368,80 @@ ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
goto out;
} else {
ret = bcp;
+ acc = bcp;
*fpos += bcp;
buflen -= bcp;
buffer += bcp;
}
}
+
if (*fpos > cp->size)
- goto out;
+ goto done;
+
+ /*
+ * Read from the vma segments
+ * a. verify if the *fpos is within a phdr
+ * b. Use access_process_vm() to get data page by page
+ * c. copy_to_user into user buffer
+ */
+
+ while (buflen) {
+ size_t bufsz, offset, bytes;
+ char *readbuf;
+ struct elf_phdr *phdr = get_pos_elfphdr(cp, *fpos);
+
+ if (!phdr)
+ break;
+
+ bufsz = (buflen > PAGE_SIZE) ? PAGE_SIZE : buflen;
+ readbuf = kmalloc(bufsz, GFP_KERNEL);
+ if (!readbuf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ offset = *fpos - phdr->p_offset;
+ bytes = access_process_vm(cp->task, (phdr->p_vaddr + offset),
+ readbuf, bufsz, 0);
+ if (!bytes) {
+ ret = -EIO;
+ goto out;
+ }
+ if (copy_to_user(buffer, readbuf, bytes)) {
+ ret = -EFAULT;
+ kfree(readbuf);
+ goto out;
+ } else
+ acc += bytes;
+
+ kfree(readbuf);
+ buflen -= bytes;
+ buffer += bytes;
+ *fpos += bytes;
+ }
+
+ /* Fill extnum section header if present */
+ if (buflen &&
+ elf_hdr->e_shoff &&
+ (*fpos >= elf_hdr->e_shoff) &&
+ (*fpos < (elf_hdr->e_shoff + sizeof(struct elf_shdr)))) {
+
+ off_t offset = *fpos - elf_hdr->e_shoff;
+ size_t shdrsz = sizeof(struct elf_shdr) - offset;
+
+ shdrsz = (buflen < shdrsz) ? buflen : shdrsz;
+ if (copy_to_user(buffer, ((char *)cp->shdr) + offset, shdrsz)) {
+ ret = -EFAULT;
+ goto out;
+ } else {
+ acc += shdrsz;
+ buflen -= shdrsz;
+ buffer += shdrsz;
+ }
+ }
+
+done:
+ ret = acc;

out:
return ret;

2013-10-04 10:33:14

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 17/19] Identify the ELF class of the process

From:Suzuki K. Poulose <[email protected]>

Identify the ELF class of the process to native or compat.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/proc/gencore.c | 17 ++++++++++++++---
fs/proc/gencore.h | 3 +++
2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/proc/gencore.c b/fs/proc/gencore.c
index 463bedd..3e0db30 100644
--- a/fs/proc/gencore.c
+++ b/fs/proc/gencore.c
@@ -64,6 +64,9 @@ static ssize_t read_gencore(struct file *file, char __user *buffer,
}
mutex_unlock(&core_mutex);

+ if (cp->elf_class == ELF_CLASS_NATIVE)
+ ret = elf_read_gencore(cp, buffer, buflen, fpos);
+
out:
put_task_struct(task);
return ret;
@@ -147,8 +150,8 @@ static int release_gencore(struct inode *inode, struct file *file)
/*
* Determine whether the task is an ELF executable.
* Returns
- * < 0 - Non-ELF
- * 0 - Native ELF Executable
+ * -EINVAL - On non ELF execs
+ * Returns the ELF_CLASS_NATIVE or ELF_CLASS_COMPAT
*/
static int get_elf_class(struct task_struct *task)
{
@@ -162,7 +165,14 @@ static int get_elf_class(struct task_struct *task)
if (memcmp(hdr.e_ident, ELFMAG, SELFMAG))
return -EINVAL;
if (elf_check_arch(&hdr))
- return 0;
+ return ELF_CLASS_NATIVE;
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ {
+ struct elf32_hdr *chdr = (struct elf32_hdr *)&hdr;
+ if (compat_elf_check_arch(chdr))
+ return ELF_CLASS_COMPAT;
+ }
+#endif

return -EINVAL;
}
@@ -206,6 +216,7 @@ static int open_gencore(struct inode *inode, struct file *filp)
list_add(&cp->list, &core_list);
mutex_unlock(&core_mutex);
max_regset = get_max_regsets(task);
+ cp->elf_class = elf_class;

for (i = 0; i < get_nr_threads(task); i++) {
tinfo = kzalloc(offsetof(struct elf_thread_core_info,
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
index f6822f2..d7a5fb3 100644
--- a/fs/proc/gencore.h
+++ b/fs/proc/gencore.h
@@ -8,6 +8,8 @@
#include <linux/elfcore.h>
#include <linux/elfcore-internal.h>

+enum { ELF_CLASS_NATIVE, ELF_CLASS_COMPAT };
+
struct elf_thread_core_info {
unsigned short num_notes; /* Number of notes for this thread */
struct elf_thread_core_info *next;
@@ -31,6 +33,7 @@ struct core_proc {
size_t nphdrs; /* number of phdrs */
size_t notes_size;
size_t size;
+ unsigned char elf_class;
};

#ifdef CORE_DUMP_USE_REGSET

2013-10-04 10:33:29

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 18/19] Adding support for compat ELF class data structures

From:Suzuki K. Poulose <[email protected]>

gencore-elf.c will be reused for the compat ELF class support. Hence, define
macros to get the right member for each class.

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/proc/gencore-elf.c | 31 ++++++++++++++++++++-----------
fs/proc/gencore.h | 16 +++++++++++++++-
2 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/fs/proc/gencore-elf.c b/fs/proc/gencore-elf.c
index 244e5bb..3c6b67a 100644
--- a/fs/proc/gencore-elf.c
+++ b/fs/proc/gencore-elf.c
@@ -33,6 +33,15 @@
#endif

#include "gencore.h"
+#ifndef et_prstatus
+/* prstatus field for native elf, in elf_thread_core_info*/
+#define et_prstatus u.prstatus
+#endif
+
+#ifndef cp_prpsinfo
+/* prpsinfo field for native elf, in core_proc */
+#define cp_prpsinfo u.prpsinfo
+#endif

static int notesize(struct memelfnote *men)
{
@@ -77,14 +86,14 @@ static int fill_thread_core_info(struct elf_thread_core_info *tinfo,
unsigned int i;
const struct user_regset_view *view = task_user_regset_view(tinfo->task);

- fill_prstatus(&tinfo->prstatus, tinfo->task, 0);
+ fill_prstatus(&tinfo->et_prstatus, tinfo->task, 0);

do_thread_regset_writeback(tinfo->task, &view->regsets[0]);
(void) view->regsets[0].get(tinfo->task, &view->regsets[0],
- 0, sizeof(tinfo->prstatus.pr_reg),
- &tinfo->prstatus.pr_reg, NULL);
+ 0, sizeof(tinfo->et_prstatus.pr_reg),
+ &tinfo->et_prstatus.pr_reg, NULL);
fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
- sizeof(tinfo->prstatus), &tinfo->prstatus);
+ sizeof(tinfo->et_prstatus), &tinfo->et_prstatus);
cp->notes_size += notesize(&tinfo->notes[0]);
tinfo->num_notes = view->n;

@@ -109,7 +118,7 @@ static int fill_thread_core_info(struct elf_thread_core_info *tinfo,
regset->core_note_type,
size, data);
else {
- tinfo->prstatus.pr_fpvalid = 1;
+ tinfo->et_prstatus.pr_fpvalid = 1;
fill_note(&tinfo->notes[i], "CORE",
NT_PRFPREG, size, data);
}
@@ -128,14 +137,14 @@ static int fill_thread_core_info(struct elf_thread_core_info *tinfo,
elf_fpxregset_t xfpu, *pxfpu;
#endif

- fill_prstatus(&tinfo->prstatus, t->task, 0);
- elf_core_copy_task_regs(t->task, &tinfo->prstatus.pr_reg);
+ fill_prstatus(&tinfo->et_prstatus, t->task, 0);
+ elf_core_copy_task_regs(t->task, &tinfo->et_prstatus.pr_reg);
fill_note(&tinfo->notes[0], "CORE", NT_PRSTATUS,
- sizeof(t->prstatus), &t->prstatus);
+ sizeof(t->et_prstatus), &t->et_prstatus);
cp->notes_size += notesize(&tinfo->notes[0]);
tinfo->num_notes = 1;

- if (tinfo->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(tinfo->task,
+ if (tinfo->et_prstatus.pr_fpvalid = elf_core_copy_task_fpregs(tinfo->task,
NULL, &fpu)) {
pfpu = kzalloc(sizeof(*pfpu), GFP_KERNEL);
if (pfpu == NULL)
@@ -168,9 +177,9 @@ static int collect_notes(struct core_proc *cp)
struct elf_thread_core_info *tinfo;

/* Fill the 2 process wide notes */
- fill_psinfo(&cp->prpsinfo, cp->task, cp->task->mm);
+ fill_psinfo(&cp->cp_prpsinfo, cp->task, cp->task->mm);
fill_note(&cp->psinfo, "CORE", NT_PRPSINFO,
- sizeof(struct elf_prpsinfo), &cp->prpsinfo);
+ sizeof(struct elf_prpsinfo), &cp->cp_prpsinfo);
cp->notes_size += notesize(&cp->psinfo);

fill_auxv_note(&cp->auxv, cp->task->mm);
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
index d7a5fb3..60972a2 100644
--- a/fs/proc/gencore.h
+++ b/fs/proc/gencore.h
@@ -7,6 +7,9 @@
#include <linux/completion.h>
#include <linux/elfcore.h>
#include <linux/elfcore-internal.h>
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+#include <linux/elfcore-compat.h>
+#endif

enum { ELF_CLASS_NATIVE, ELF_CLASS_COMPAT };

@@ -15,6 +18,12 @@ struct elf_thread_core_info {
struct elf_thread_core_info *next;
struct task_struct *task;
struct elf_prstatus prstatus;
+ union {
+ struct elf_prstatus prstatus;
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ struct compat_elf_prstatus compat_prstatus;
+#endif
+ } u;
struct memelfnote notes[0];
};

@@ -28,7 +37,12 @@ struct core_proc {
struct elf_thread_core_info *tinfo;
struct memelfnote psinfo;
struct memelfnote auxv;
- struct elf_prpsinfo prpsinfo;
+ union {
+ struct elf_prpsinfo prpsinfo;
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ struct compat_elf_prpsinfo compat_prpsinfo;
+#endif
+ } u;
size_t elf_buflen; /* size of elf_buf */
size_t nphdrs; /* number of phdrs */
size_t notes_size;

2013-10-04 10:33:36

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 19/19] Compat ELF class core generation support

From:Suzuki K. Poulose <[email protected]>

Add full support for the compat ELF class objects. The gencore-compat-elf.c
reuses the gencore-elf.c, by renaming the ABI structures and functions.
(Inspired by compat_binfmt_elf.c)

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/proc/Makefile | 1 +
fs/proc/gencore-compat-elf.c | 62 ++++++++++++++++++++++++++++++++++++++++++
fs/proc/gencore.c | 4 +++
fs/proc/gencore.h | 5 +++
4 files changed, 72 insertions(+)
create mode 100644 fs/proc/gencore-compat-elf.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index d231ac3..d8e17f2 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -24,6 +24,7 @@ proc-y += softirqs.o
proc-y += namespaces.o
proc-y += self.o
proc-$(CONFIG_ELF_CORE) += gencore.o gencore-elf.o
+proc-$(CONFIG_COMPAT_BINFMT_ELF) += gencore-compat-elf.o
proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o
diff --git a/fs/proc/gencore-compat-elf.c b/fs/proc/gencore-compat-elf.c
new file mode 100644
index 0000000..8fd11be
--- /dev/null
+++ b/fs/proc/gencore-compat-elf.c
@@ -0,0 +1,62 @@
+/*
+ * Application core dump - compat ELF Class specific code
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 2111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ *
+ * We use macros to rename the types and functions in gencore-elf.c o
+ * the compat versions
+ *
+ */
+
+#include <linux/elfcore-compat.h>
+#include "gencore.h"
+
+#undef elfhdr
+#undef elf_phdr
+#undef elf_shdr
+#undef elf_note
+#undef elf_addr_t
+#define elfhdr elf32_hdr
+#define elf_phdr elf32_phdr
+#define elf_shdr elf32_shdr
+#define elf_note elf32_note
+#define elf_addr_t Elf32_Addr
+
+#undef ELF_CLASS
+#define ELF_CLASS ELF_CASS32
+
+#define elf_prstatus compat_elf_prstatus
+#define elf_prpsinfo compat_elf_prpsinfo
+
+/* Functions used from binfmt_elf core code */
+#define fill_psinfo compat_fill_psinfo
+#define fill_prstatus compat_fill_prstatus
+#define fill_extnum_info compat_fill_extnum_info
+#define fill_auxv_note compat_fill_auxv_note
+#define fill_elf_header compat_fill_elf_header
+
+/*
+ * Define the fileds in elf_thread_core_info and core_proc
+ * for compat ELF class
+ */
+#define et_prstatus u.compat_prstatus
+#define cp_prpsinfo u.compat_prpsinfo
+
+/* Rename the core function to compat_ */
+#define elf_read_gencore compat_elf_read_gencore
+
+#include "gencore-elf.c"
+
diff --git a/fs/proc/gencore.c b/fs/proc/gencore.c
index 3e0db30..92be778 100644
--- a/fs/proc/gencore.c
+++ b/fs/proc/gencore.c
@@ -66,6 +66,10 @@ static ssize_t read_gencore(struct file *file, char __user *buffer,

if (cp->elf_class == ELF_CLASS_NATIVE)
ret = elf_read_gencore(cp, buffer, buflen, fpos);
+#ifdef CONFIG_COMPAT_BINFMT_ELF
+ else if (cp->elf_class == ELF_CLASS_COMPAT)
+ ret = compat_elf_read_gencore(cp, buffer, buflen, fpos);
+#endif

out:
put_task_struct(task);
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
index 60972a2..840863b 100644
--- a/fs/proc/gencore.h
+++ b/fs/proc/gencore.h
@@ -66,4 +66,9 @@ static inline int get_max_regsets(struct task_struct *task)
extern ssize_t elf_read_gencore(struct core_proc *cp, char __user *buffer,
size_t buflen, loff_t *foffset);

+#ifdef CONFIG_COMPAT_BINFMT_ELF
+extern ssize_t compat_elf_read_gencore(struct core_proc *cp, char __user *buffer,
+ size_t buflen, loff_t *foffset);
+#endif
+
#endif

2013-10-04 10:34:30

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 12/19] Hold the threads using task_work_add

From:Janani Venkataraman <[email protected]>

Hold the threads in a killable state, for collection of register set.
This was implemented in the past using the freezer system. The freezer functions in
kernel essentially help start and stop sets of tasks and this approach
exploited the existing freezer subsystem kernel interface effectively to
quiesce all the threads of the application before triggering the core dump.
This approach was not accepted due to the potential Dos attack. Also the
community discussed that "freeze" is a bit dangerous because an application
which is frozen cannot be ended and while it's frozen and there is no
information "its frozen" via usual user commands as 'ps' or 'top'.

In this method we assign work to the threads of the task, which will make
them wait in a killable state until the operation is complete, using
'completion'. Once the dump is complete we release the threads.

The following are yet to be implemented:

1) A check to ensure all the threads have entered the work that was queued
2) Handling of threads blocked in kernel.These will not reach the work
queued and hence the dump may be delayed. If the process is in a non
running state we can probably take the dump as it would not lead to
inconsistency. We would like to community views on the same.

Signed-off-by: Janani Venkataraman <[email protected] >
Signed-off-by: Suzuki K. Poulose <[email protected]>
---
fs/proc/gencore.c | 42 +++++++++++++++++++++++++++++++++++++++---
fs/proc/gencore.h | 4 ++++
2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/proc/gencore.c b/fs/proc/gencore.c
index 5f56910..d741f18 100644
--- a/fs/proc/gencore.c
+++ b/fs/proc/gencore.c
@@ -20,8 +20,10 @@
* Authors:
* Ananth N.Mavinakayanahalli <[email protected]>
* Suzuki K. Poulose <[email protected]>
+ * Janani Venkataraman <[email protected]>
*/

+#include <linux/task_work.h>
#include <linux/elf.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
@@ -67,22 +69,45 @@ out:
return ret;
}

+static void gencore_work(struct callback_head *open_work)
+{
+ /* TODO A method to know when all the threads have reached here */
+ struct core_proc *cp = container_of(open_work, struct core_proc, twork);
+ /* If a thread is exiting we could let it go */
+ if (current->flags & PF_EXITING)
+ return;
+ wait_for_completion_killable(&cp->hold);
+}
+
static int release_gencore(struct inode *inode, struct file *file)
{
struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
struct core_proc *cp;
+ struct task_struct *t;

if (!task)
return -EIO;

mutex_lock(&core_mutex);
cp = get_core_proc(task);
- if (cp) {
+ if (cp)
list_del(&cp->list);
- kfree(cp);
- }
+ else
+ return -ENOENT;
+
+ complete_all(&cp->hold);
mutex_unlock(&core_mutex);
+ /* Cancelling the work added */
+ t = task;
+ read_lock(&tasklist_lock);
+ do {
+ if (t != current)
+ task_work_cancel(t, gencore_work);
+ } while_each_thread(cp->task, t);
+
+ read_unlock(&tasklist_lock);
put_task_struct(task);
+ kfree(cp);
return 0;
}

@@ -117,6 +142,7 @@ static int open_gencore(struct inode *inode, struct file *filp)
{
struct task_struct *task = get_proc_task(inode);
struct core_proc *cp;
+ struct task_struct *t;
int elf_class;
int ret = 0;
if (!task)
@@ -145,6 +171,16 @@ static int open_gencore(struct inode *inode, struct file *filp)
mutex_lock(&core_mutex);
list_add(&cp->list, &core_list);
mutex_unlock(&core_mutex);
+ init_completion(&cp->hold);
+ /* Adding the work for all the threads except current */
+ t = cp->task;
+ init_task_work(&cp->twork, gencore_work);
+ read_lock(&tasklist_lock);
+ do {
+ if (t != current)
+ task_work_add(t, &cp->twork, true);
+ } while_each_thread(cp->task, t);
+ read_unlock(&tasklist_lock);

out:
put_task_struct(task);
diff --git a/fs/proc/gencore.h b/fs/proc/gencore.h
index c98fddf..1a88e24 100644
--- a/fs/proc/gencore.h
+++ b/fs/proc/gencore.h
@@ -1,12 +1,16 @@
#ifndef __GEN_CORE_H
#define __GEN_CORE_H

+#include <linux/types.h>
#include <linux/list.h>
#include <linux/sched.h>
+#include <linux/completion.h>

struct core_proc {
struct list_head list;
struct task_struct *task;
+ struct completion hold;
+ struct callback_head twork;
};

#endif

2013-10-04 10:32:10

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 09/19] Create /proc/pid/core entry

From:Ananth N.Mavinakayanahalli <[email protected]>

Create the /proc/PID/core entry.

Signed-off-by: Suzuki K. Poulose <[email protected]>
Signed-off-by: Ananth N.Mavinakayanahalli <[email protected]>
---
fs/proc/Makefile | 1 +
fs/proc/base.c | 2 ++
fs/proc/gencore.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/proc/internal.h | 1 +
4 files changed, 52 insertions(+)
create mode 100644 fs/proc/gencore.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index ab30716..a456c22 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -23,6 +23,7 @@ proc-y += version.o
proc-y += softirqs.o
proc-y += namespaces.o
proc-y += self.o
+proc-$(CONFIG_ELF_CORE) += gencore.o
proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1485e38..da5a212 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2640,6 +2640,8 @@ static const struct pid_entry tgid_base_stuff[] = {
#endif
#ifdef CONFIG_ELF_CORE
REG("coredump_filter", S_IRUGO|S_IWUSR, proc_coredump_filter_operations),
+ REG("core", S_IRUSR, proc_gen_core_operations),
+
#endif
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUSR, proc_tgid_io_accounting),
diff --git a/fs/proc/gencore.c b/fs/proc/gencore.c
new file mode 100644
index 0000000..115f1e4
--- /dev/null
+++ b/fs/proc/gencore.c
@@ -0,0 +1,48 @@
+/*
+ * Application core dump
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010, 2013
+ *
+ * Authors:
+ * Ananth N.Mavinakayanahalli <[email protected]>
+ * Suzuki K. Poulose <[email protected]>
+ */
+
+#include <linux/seq_file.h>
+#include "internal.h"
+
+static ssize_t read_gencore(struct file *file, char __user *buffer,
+ size_t buflen, loff_t *fpos)
+{
+ return 0;
+}
+
+static int release_gencore(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static int open_gencore(struct inode *inode, struct file *filp)
+{
+ return 0;
+}
+const struct file_operations proc_gen_core_operations = {
+ .open = open_gencore,
+ .read = read_gencore,
+ .release = release_gencore,
+};
+
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 651d09a..47538f2 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -228,6 +228,7 @@ extern const struct file_operations proc_ns_dir_operations;
*/
extern const struct file_operations proc_net_operations;
extern const struct inode_operations proc_net_inode_operations;
+extern const struct file_operations proc_gen_core_operations;

#ifdef CONFIG_NET
extern int proc_net_init(void);

2013-10-04 10:35:09

by Janani Venkataraman1

[permalink] [raw]
Subject: [PATCH 08/19] elf_core_copy_extra_phdrs() for UML

From:Suzuki K. Poulose <[email protected]>

elf_core_copy_extra_phdrs() for UML. Adapted from elf_core_write_extra_phdrs()

Signed-off-by: Suzuki K. Poulose <[email protected]>
---
arch/x86/um/elfcore.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/arch/x86/um/elfcore.c b/arch/x86/um/elfcore.c
index 6bb49b6..edecfce 100644
--- a/arch/x86/um/elfcore.c
+++ b/arch/x86/um/elfcore.c
@@ -41,6 +41,38 @@ int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
return 1;
}

+int elf_core_copy_extra_phdrs(char *bufp, loff_t offset, size_t *size,
+ unsigned long limit)
+{
+ if (vsyscall_ehdr) {
+ const struct elfhdr *const ehdrp =
+ (struct elfhdr *) vsyscall_ehdr;
+ const struct elf_phdr *const phdrp =
+ (const struct elf_phdr *) (vsyscall_ehdr + ehdrp->e_phoff);
+ struct elf_phdr *phdr = (struct elf_phdr *)bufp;
+ int i;
+ Elf32_Off ofs = 0;
+
+ for (i = 0; i < ehdrp->e_phnum; ++i) {
+ *size += sizeof(*phdr);
+ if (*size > limit)
+ return 0;
+
+ *phdr = phdrp[i];
+
+ if (phdr.p_type == PT_LOAD) {
+ ofs = phdr.p_offset = offset;
+ offset += phdr.p_filesz;
+ } else {
+ phdr.p_offset += ofs;
+ }
+ phdr.p_paddr = 0; /* match other core phdrs */
+ phdr++;
+ }
+ }
+ return 1;
+}
+
int elf_core_write_extra_data(struct file *file, size_t *size,
unsigned long limit)
{

2013-10-04 10:39:27

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

On 10/04/2013 02:30 PM, Janani Venkataraman wrote:
> Hi all,
>

> This series is based on the Task work add approach. We didn't adopt the CRIU
> approch because of the following reasons:
>
> * It is not upstream yet.

It is, starting from criu-v0.7 + linux-3.11

> * There are concerns about the security of the dump.

Can you elaborate on this? Is it fixable in CRIU at all?

> * It involves a lot of changes and this approach provides a UNIX style
> interface.

Can you also shed more light on this -- what changes do you mean?

2013-10-04 13:44:24

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

On Fri, Oct 04, 2013 at 04:00:12PM +0530, Janani Venkataraman wrote:
> Hi all,
>
> The following series implements an infrastructure for capturing the core of an
> application without disrupting its process.

The problem is that gcore et.al. have to stop the process briefly
to attach and then use the pid mmap ptrace interfaces, right?

Couldn't they just use the new process_vm_readv() syscalls instead?
AFAIK those do not require ptrace.

Then this could be all done in user space.

Or are there some specific races with this approach?

-Andi

2013-10-07 06:07:56

by suzuki

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

On 10/04/2013 07:14 PM, Andi Kleen wrote:
> On Fri, Oct 04, 2013 at 04:00:12PM +0530, Janani Venkataraman wrote:
>> Hi all,
>>
>> The following series implements an infrastructure for capturing the core of an
>> application without disrupting its process.
>
> The problem is that gcore et.al. have to stop the process briefly
> to attach and then use the pid mmap ptrace interfaces, right?
>
Correct.

> Couldn't they just use the new process_vm_readv() syscalls instead?
> AFAIK those do not require ptrace.
>
We need the register set and hence would need a ptrace.

> Then this could be all done in user space.
>
> Or are there some specific races with this approach?
>
Cheers
Suzuki

2013-10-07 14:06:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

On 10/07, Suzuki K. Poulose wrote:
>
> On 10/04/2013 07:14 PM, Andi Kleen wrote:
>
> > Couldn't they just use the new process_vm_readv() syscalls instead?
> > AFAIK those do not require ptrace.
> >
> We need the register set and hence would need a ptrace.

Or the task itself can dump its memory/registers/whatever in responce
to the request from dumper.

gencore_work() is only used to freeze the process, but it can do more?

Oleg.

2013-10-07 18:10:23

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

> > Couldn't they just use the new process_vm_readv() syscalls instead?
> > AFAIK those do not require ptrace.
> >
> We need the register set and hence would need a ptrace.

But the kernel needs to stop to to read the registers.

Do you have data how much the latency difference is between
an optimized ptrace reader (using PTRACE_GETREGSET) vs the kernel ?

-Andi

2013-10-07 19:03:34

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

Hello,

On Fri, Oct 04, 2013 at 02:38:43PM +0400, Pavel Emelyanov wrote:
> > * It is not upstream yet.
>
> It is, starting from criu-v0.7 + linux-3.11
>
> > * There are concerns about the security of the dump.
>
> Can you elaborate on this? Is it fixable in CRIU at all?
>
> > * It involves a lot of changes and this approach provides a UNIX style
> > interface.
>
> Can you also shed more light on this -- what changes do you mean?

Yeah, I'd like to hear more too. It doesn't make much sense to me to
add something completely new if it can be served mostly by the
existing infrastructure. Also, what do you mean by "disruption"? You
mentioned signal but PTRACE_SEIZE is completely transparent
w.r.t. signals. If you mean without stopping the target process's
execution, what are you trying to use the dumping for and how much
gain are we talking about? Also, isn't it kinda mandatory to stop the
process to get a consistent dump? What am I missing here?

Thanks.

--
tejun

2013-10-08 00:24:05

by Ryan Mallon

[permalink] [raw]
Subject: Re: [PATCH 02/19] Make vma_dump_size() generic

On 04/10/13 20:30, Janani Venkataraman wrote:
> From:Suzuki K. Poulose <[email protected]>
>
> vma_dump_size calculates the file size of a vma area in the core file. It
> assumes the vma belongs to the "current". Make it generic to work for any task.
> This will be reused by application core dump infrastructure.
>
> Signed-off-by: Suzuki K. Poulose <[email protected]>
> ---

> -static unsigned long vma_dump_size(struct vm_area_struct *vma,
> +unsigned long vma_dump_size(struct task_struct *p, struct vm_area_struct *vma,
> unsigned long mm_flags)
> {
> #define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
> @@ -143,10 +143,18 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
> * Switch to the user "segment" for get_user(),
> * then put back what elf_core_dump() had in place.
> */
> - set_fs(USER_DS);
> - if (unlikely(get_user(word, header)))
> - word = 0;
> - set_fs(fs);
> + if (p->mm == current->mm) {
> + mm_segment_t fs = get_fs();

It looks like you missed the removal of the old:

mm_segment_t fs = get_fs();

above? Just below if (FILTER(ELF_HEADERS)).

~Ryan

2013-10-08 03:54:22

by Janani Venkataraman1

[permalink] [raw]
Subject: Re: [PATCH 02/19] Make vma_dump_size() generic





From: Ryan Mallon <[email protected]>
To: Janani Venkataraman1/India/IBM@IBMIN,
[email protected],
Cc: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected]
Date: 10/08/2013 05:52 AM
Subject: Re: [PATCH 02/19] Make vma_dump_size() generic



On 04/10/13 20:30, Janani Venkataraman wrote:
> From:Suzuki K. Poulose <[email protected]>
>
> vma_dump_size calculates the file size of a vma area in the core file. It
> assumes the vma belongs to the "current". Make it generic to work for any
task.
> This will be reused by application core dump infrastructure.
>
> Signed-off-by: Suzuki K. Poulose <[email protected]>
> ---

> -static unsigned long vma_dump_size(struct vm_area_struct *vma,
> +unsigned long vma_dump_size(struct task_struct *p, struct vm_area_struct
*vma,
> unsigned long
mm_flags)
> {
> #define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
> @@ -143,10 +143,18 @@ static unsigned long vma_dump_size(struct
vm_area_struct *vma,
> * Switch to the user "segment" for get_user(),
> * then put back what elf_core_dump() had in
place.
> */
> - set_fs(USER_DS);
> - if (unlikely(get_user(word, header)))
> - word = 0;
> - set_fs(fs);
> + if (p->mm == current->mm) {
> + mm_segment_t fs = get_fs();

It looks like you missed the removal of the old:

mm_segment_t fs = get_fs();

above? Just below if (FILTER(ELF_HEADERS)).

Yes thats correct. I will remove it. Thanks for pointing it out.

Thanks
Janani

~Ryan


2013-10-08 10:14:04

by Janani Venkataraman1

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()





From: Pavel Emelyanov <[email protected]>
To: Janani Venkataraman1/India/IBM@IBMIN,
Cc: <[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>,
<[email protected]>, <[email protected]>
Date: 10/04/2013 04:08 PM
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump
infrastructure using task_work_add()



On 10/04/2013 02:30 PM, Janani Venkataraman wrote:
> Hi all,
>

> This series is based on the Task work add approach. We didn't adopt the
CRIU
> approch because of the following reasons:
>
> * It is not upstream yet.

It is, starting from criu-v0.7 + linux-3.11

> * There are concerns about the security of the dump.

Can you elaborate on this? Is it fixable in CRIU at all?

> * It involves a lot of changes and this approach provides a UNIX style
> interface.

Can you also shed more light on this -- what changes do you mean?

We had a prototype ready earlier using the freezer approach.
http://lwn.net/Articles/419756/

We made a couple of minor changes to it and implemented using task work
add.
We wanted to know what the community felt about this approach.

Also in the previous RFD, Andi Kleen had mentioned a concern on the
security with
respect to the daemon approach for a self dump in CRIU.

Thanks.
Janani

2013-10-08 10:16:45

by Janani Venkataraman1

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()





From: Tejun Heo <[email protected]>
To: Pavel Emelyanov <[email protected]>,
Cc: Janani Venkataraman1/India/IBM@IBMIN,
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected]
Date: 10/08/2013 12:26 AM
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump
infrastructure using task_work_add()
Sent by: Tejun Heo <[email protected]>



Hello,

On Fri, Oct 04, 2013 at 02:38:43PM +0400, Pavel Emelyanov wrote:
> > * It is not upstream yet.
>
> It is, starting from criu-v0.7 + linux-3.11
>
> > * There are concerns about the security of the dump.
>
> Can you elaborate on this? Is it fixable in CRIU at all?
>
> > * It involves a lot of changes and this approach provides a UNIX style
> > interface.
>
> Can you also shed more light on this -- what changes do you mean?

Yeah, I'd like to hear more too. It doesn't make much sense to me to
add something completely new if it can be served mostly by the
existing infrastructure. Also, what do you mean by "disruption"? You
mentioned signal but PTRACE_SEIZE is completely transparent
w.r.t. signals. If you mean without stopping the target process's
execution, what are you trying to use the dumping for and how much
gain are we talking about? Also, isn't it kinda mandatory to stop the
process to get a consistent dump? What am I missing here?

By disruption we do mean not using signals. PTRACE_SEIZE doesn't use
signals,
but the concern is, a seize can't be done on oneself and we are looking at
also
a self dump.

Thanks.
Janani

Thanks.

--
tejun


2013-10-09 08:58:27

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

On 10/08/2013 02:12 PM, Janani Venkataraman1 wrote:
>
>
>
>
> From: Pavel Emelyanov <[email protected]>
> To: Janani Venkataraman1/India/IBM@IBMIN,
> Cc: <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>
> Date: 10/04/2013 04:08 PM
> Subject: Re: [RFC] [PATCH 00/19] Non disruptive application core dump
> infrastructure using task_work_add()
>
>
>
> On 10/04/2013 02:30 PM, Janani Venkataraman wrote:
>> Hi all,
>>
>
>> This series is based on the Task work add approach. We didn't adopt the
> CRIU
>> approch because of the following reasons:
>>
>> * It is not upstream yet.
>
> It is, starting from criu-v0.7 + linux-3.11
>
>> * There are concerns about the security of the dump.
>
> Can you elaborate on this? Is it fixable in CRIU at all?
>
>> * It involves a lot of changes and this approach provides a UNIX style
>> interface.
>
> Can you also shed more light on this -- what changes do you mean?
>
> We had a prototype ready earlier using the freezer approach.
> http://lwn.net/Articles/419756/
>
> We made a couple of minor changes to it and implemented using task work
> add.
> We wanted to know what the community felt about this approach.
>
> Also in the previous RFD, Andi Kleen had mentioned a concern on the
> security with
> respect to the daemon approach for a self dump in CRIU.

We have this thing addressed -- when one requests a self-dump from criu daemon
the latter

a) gets pid to dump from SO_PEERCRED, thus requester cannot just send some other's pid
b) doesn't dump tasks that belong to user other than the one who requested the dump

What other security concerns do you have? We're also interested in addressing them.

> Thanks.
> Janani
>
> .
>