2007-02-16 13:34:51

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 0/4] coredump: core dump masking support v3

Hi,

This patch series is version 3 of the core dump masking feature,
which provides a per-process flag not to dump anonymous shared
memory segments.

In this version, /proc/<pid>/coredump_omit_anonymous_shared file
is provided as an interface instead of the previous
/proc/<pid>/core_flags. If you have written a non-zero value to the
file, anonymous shared memory segments of the process not to be
dumped.

Another change of this version is removal of kernel.core_flags_enables
sysctl which enables/disables core dump flags globally. The purpose
of this sysctl is for the system administrator to force all anonymous
memory to be dumped. But ulimit -c 0 breaks it. So this sysctl is
not helpful for the current linux.

This patch series can be applied against 2.6.20-mm1.
The supported core file formats are ELF and ELF-FDPIC. ELF has been
tested, but ELF-FDPIC has not been build and tested because I don't
have the test environment.


Background:
Some software programs share huge memory among hundreds of
processes. If a failure occurs on one of these processes, they can
be signaled by a monitoring process to generate core files and
restart the service. However, it can develop into a system-wide
failure such as system slow down for a long time and disk space
shortage because the total size of the core files is very huge!

To avoid the above situation we can limit the core file size by
setrlimit(2) or ulimit(1). But this method can lose important data
such as stack because core dumping is terminated halfway.
So I suggest keeping shared memory segments from being dumped for
particular processes. Because the shared memory attached to processes
is common in them, we don't need to dump the shared memory every time.


Usage:
Get all shared memory segments of pid 1234 not to dump:

$ echo 1 > /proc/1234/coredump_omit_anonymous_shared

When a new process is created, the process inherits the flag status
from its parent. It is useful to set the core dump flags before the
program runs. For example:

$ echo 1 > /proc/self/coredump_omit_anonymous_shared
$ ./some_program


ChangeLog:
v3:
- remove `/proc/<pid>/core_flags' proc entry
- add `/proc/<pid>/coredump_anonymous_shared' as a named flag
- remove kernel.core_flags_enable sysctl parameter

v2:
http://groups.google.com/group/linux.kernel/browse_frm/thread/cb254465971d4a42/
http://groups.google.com/group/linux.kernel/browse_frm/thread/da78f2702e06fa11/
- rename `coremask' to `core_flags'
- change `core_flags' member in mm_struct to a bit field
next to `dumpable'
- introduce a global spin lock to protect adjacent two bit fields
(core_flags and dumpable) from race condition
- fix a bug that the generated core file can be corrupted when
core dumping and updating core_flags occur concurrently
- add kernel.core_flags_enable sysctl parameter to enable/disable
flags in /proc/<pid>/core_flags
- support ELF-FDPIC binary format, but not tested

v1:
http://groups.google.com/group/linux.kernel/browse_frm/thread/1381fc54d716e3e6/

--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory
E-mail: [email protected]



2007-02-16 13:39:40

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 1/4] coredump: add an interface to control the core dump routine

This patch adds an interface to set/reset a flag which determines
anonymous shared memory segments should be dumped or not when a core
file is generated.

/proc/<pid>/coredump_omit_anonymous_shared file is provided to access
the flag. You can change the flag status for a particular process by
writing to or reading from the file.

The flag status is inherited to the child process when it is created.

The flag is stored into coredump_omit_anon_shared member of mm_struct,
which shares bytes with dumpable member because these two are adjacent
bit fields. In order to avoid write-write race between the two, we use
a global spin lock.

smp_wmb() at updating dumpable is removed because set_dumpable()
includes a pair of spin lock and unlock which has the effect of
memory barrier.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/exec.c | 10 ++--
fs/proc/base.c | 99 ++++++++++++++++++++++++++++++++++++++++
include/linux/sched.h | 33 +++++++++++++
kernel/fork.c | 3 +
kernel/sys.c | 62 ++++++++-----------------
security/commoncap.c | 2
security/dummy.c | 2
7 files changed, 164 insertions(+), 47 deletions(-)

Index: linux-2.6.20-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.20-mm1.orig/fs/proc/base.c
+++ linux-2.6.20-mm1/fs/proc/base.c
@@ -74,6 +74,7 @@
#include <linux/poll.h>
#include <linux/nsproxy.h>
#include <linux/oom.h>
+#include <linux/elf.h>
#include "internal.h"

/* NOTE:
@@ -1753,6 +1754,100 @@ static const struct inode_operations pro

#endif

+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+static ssize_t proc_coredump_omit_anon_shared_read(struct file *file,
+ char __user *buf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct mm_struct *mm;
+ char buffer[PROC_NUMBUF];
+ size_t len;
+ loff_t __ppos = *ppos;
+ int ret;
+
+ ret = -ESRCH;
+ if (!task)
+ goto out_no_task;
+
+ ret = 0;
+ mm = get_task_mm(task);
+ if (!mm)
+ goto out_no_mm;
+
+ len = snprintf(buffer, sizeof(buffer), "%u\n",
+ mm->coredump_omit_anon_shared);
+ if (__ppos >= len)
+ goto out;
+ if (count > len - __ppos)
+ count = len - __ppos;
+
+ ret = -EFAULT;
+ if (copy_to_user(buf, buffer + __ppos, count))
+ goto out;
+
+ ret = count;
+ *ppos = __ppos + count;
+
+ out:
+ mmput(mm);
+ out_no_mm:
+ put_task_struct(task);
+ out_no_task:
+ return ret;
+}
+
+static ssize_t proc_coredump_omit_anon_shared_write(struct file *file,
+ const char __user *buf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct task_struct *task;
+ struct mm_struct *mm;
+ char buffer[PROC_NUMBUF], *end;
+ unsigned int val;
+ int ret;
+
+ ret = -EFAULT;
+ memset(buffer, 0, sizeof(buffer));
+ if (count > sizeof(buffer) - 1)
+ count = sizeof(buffer) - 1;
+ if (copy_from_user(buffer, buf, count))
+ goto out_no_task;
+
+ ret = -EINVAL;
+ val = (unsigned int)simple_strtoul(buffer, &end, 0);
+ if (*end == '\n')
+ end++;
+ if (end - buffer == 0)
+ goto out_no_task;
+
+ ret = -ESRCH;
+ task = get_proc_task(file->f_dentry->d_inode);
+ if (!task)
+ goto out_no_task;
+
+ ret = end - buffer;
+ mm = get_task_mm(task);
+ if (!mm)
+ goto out_no_mm;
+
+ set_coredump_omit_anon_shared(mm, (val != 0));
+
+ mmput(mm);
+ out_no_mm:
+ put_task_struct(task);
+ out_no_task:
+ return ret;
+}
+
+static struct file_operations proc_coredump_omit_anon_shared_operations = {
+ .read = proc_coredump_omit_anon_shared_read,
+ .write = proc_coredump_omit_anon_shared_write,
+};
+#endif
+
/*
* /proc/self:
*/
@@ -1972,6 +2067,10 @@ static struct pid_entry tgid_base_stuff[
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
#endif
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+ REG("coredump_omit_anonymous_shared", S_IRUGO|S_IWUSR,
+ coredump_omit_anon_shared),
+#endif
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, pid_io_accounting),
#endif
Index: linux-2.6.20-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/sched.h
+++ linux-2.6.20-mm1/include/linux/sched.h
@@ -366,7 +366,13 @@ struct mm_struct {
unsigned int token_priority;
unsigned int last_interval;

+ /*
+ * Writing to these bit fields can cause race condition. To avoid
+ * the race, use dump_bits_lock. You may also use set_dumpable() or
+ * set_coredump_*() macros to set a value.
+ */
unsigned char dumpable:2;
+ unsigned char coredump_omit_anon_shared:1; /* don't dump anon shm */

/* coredumping support */
int core_waiters;
@@ -1721,6 +1727,33 @@ static inline void inc_syscw(struct task
}
#endif

+#include <linux/elf.h>
+/*
+ * These macros are used to protect dumpable and coredump_omit_anon_shared bit
+ * fields in mm_struct from write race between the two.
+ */
+extern spinlock_t dump_bits_lock;
+#define __set_dump_bits(dest, val) \
+ do { \
+ spin_lock(&dump_bits_lock); \
+ (dest) = (val); \
+ spin_unlock(&dump_bits_lock); \
+ } while (0)
+
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+# define set_dumpable(mm, val) \
+ __set_dump_bits((mm)->dumpable, val)
+# define set_coredump_omit_anon_shared(mm, val) \
+ __set_dump_bits((mm)->coredump_omit_anon_shared, val)
+#else
+# define set_dumpable(mm, val) \
+ do { \
+ (mm)->dumpable = (val); \
+ smp_wmb(); \
+ } while (0)
+# define set_coredump_omit_anon_shared(mm, val)
+#endif
+
#endif /* __KERNEL__ */

#endif
Index: linux-2.6.20-mm1/fs/exec.c
===================================================================
--- linux-2.6.20-mm1.orig/fs/exec.c
+++ linux-2.6.20-mm1/fs/exec.c
@@ -62,6 +62,8 @@ int core_uses_pid;
char core_pattern[128] = "core";
int suid_dumpable = 0;

+DEFINE_SPINLOCK(dump_bits_lock);
+
EXPORT_SYMBOL(suid_dumpable);
/* The maximal length of core_pattern is also specified in sysctl.c */

@@ -853,9 +855,9 @@ int flush_old_exec(struct linux_binprm *
current->sas_ss_sp = current->sas_ss_size = 0;

if (current->euid == current->uid && current->egid == current->gid)
- current->mm->dumpable = 1;
+ set_dumpable(current->mm, 1);
else
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

name = bprm->filename;

@@ -883,7 +885,7 @@ int flush_old_exec(struct linux_binprm *
file_permission(bprm->file, MAY_READ) ||
(bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP)) {
suid_keys(current);
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
}

/* An exec changes our domain. We are no longer part of the thread
@@ -1477,7 +1479,7 @@ int do_coredump(long signr, int exit_cod
flag = O_EXCL; /* Stop rewrite attacks */
current->fsuid = 0; /* Dump root private */
}
- mm->dumpable = 0;
+ set_dumpable(mm, 0);

retval = coredump_wait(exit_code);
if (retval < 0)
Index: linux-2.6.20-mm1/kernel/fork.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/fork.c
+++ linux-2.6.20-mm1/kernel/fork.c
@@ -333,6 +333,9 @@ static struct mm_struct * mm_init(struct
atomic_set(&mm->mm_count, 1);
init_rwsem(&mm->mmap_sem);
INIT_LIST_HEAD(&mm->mmlist);
+ /* don't need to use set_coredump_omit_anon_shared() */
+ mm->coredump_omit_anon_shared =
+ (current->mm) ? current->mm->coredump_omit_anon_shared : 0;
mm->core_waiters = 0;
mm->nr_ptes = 0;
set_mm_counter(mm, file_rss, 0);
Index: linux-2.6.20-mm1/kernel/sys.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/sys.c
+++ linux-2.6.20-mm1/kernel/sys.c
@@ -1022,10 +1022,8 @@ asmlinkage long sys_setregid(gid_t rgid,
else
return -EPERM;
}
- if (new_egid != old_egid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (new_egid != old_egid)
+ set_dumpable(current->mm, suid_dumpable);
if (rgid != (gid_t) -1 ||
(egid != (gid_t) -1 && egid != old_rgid))
current->sgid = new_egid;
@@ -1052,16 +1050,12 @@ asmlinkage long sys_setgid(gid_t gid)
return retval;

if (capable(CAP_SETGID)) {
- if (old_egid != gid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (old_egid != gid)
+ set_dumpable(current->mm, suid_dumpable);
current->gid = current->egid = current->sgid = current->fsgid = gid;
} else if ((gid == current->gid) || (gid == current->sgid)) {
- if (old_egid != gid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (old_egid != gid)
+ set_dumpable(current->mm, suid_dumpable);
current->egid = current->fsgid = gid;
}
else
@@ -1089,10 +1083,8 @@ static int set_user(uid_t new_ruid, int

switch_uid(new_user);

- if (dumpclear) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (dumpclear)
+ set_dumpable(current->mm, suid_dumpable);
current->uid = new_ruid;
return 0;
}
@@ -1145,10 +1137,8 @@ asmlinkage long sys_setreuid(uid_t ruid,
if (new_ruid != old_ruid && set_user(new_ruid, new_euid != old_euid) < 0)
return -EAGAIN;

- if (new_euid != old_euid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (new_euid != old_euid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsuid = current->euid = new_euid;
if (ruid != (uid_t) -1 ||
(euid != (uid_t) -1 && euid != old_ruid))
@@ -1195,10 +1185,8 @@ asmlinkage long sys_setuid(uid_t uid)
} else if ((uid != current->uid) && (uid != new_suid))
return -EPERM;

- if (old_euid != uid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (old_euid != uid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsuid = current->euid = uid;
current->suid = new_suid;

@@ -1240,10 +1228,8 @@ asmlinkage long sys_setresuid(uid_t ruid
return -EAGAIN;
}
if (euid != (uid_t) -1) {
- if (euid != current->euid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (euid != current->euid)
+ set_dumpable(current->mm, suid_dumpable);
current->euid = euid;
}
current->fsuid = current->euid;
@@ -1290,10 +1276,8 @@ asmlinkage long sys_setresgid(gid_t rgid
return -EPERM;
}
if (egid != (gid_t) -1) {
- if (egid != current->egid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (egid != current->egid)
+ set_dumpable(current->mm, suid_dumpable);
current->egid = egid;
}
current->fsgid = current->egid;
@@ -1336,10 +1320,8 @@ asmlinkage long sys_setfsuid(uid_t uid)
if (uid == current->uid || uid == current->euid ||
uid == current->suid || uid == current->fsuid ||
capable(CAP_SETUID)) {
- if (uid != old_fsuid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (uid != old_fsuid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsuid = uid;
}

@@ -1365,10 +1347,8 @@ asmlinkage long sys_setfsgid(gid_t gid)
if (gid == current->gid || gid == current->egid ||
gid == current->sgid || gid == current->fsgid ||
capable(CAP_SETGID)) {
- if (gid != old_fsgid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (gid != old_fsgid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsgid = gid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
@@ -2163,7 +2143,7 @@ asmlinkage long sys_prctl(int option, un
error = -EINVAL;
break;
}
- current->mm->dumpable = arg2;
+ set_dumpable(current->mm, arg2);
break;

case PR_SET_UNALIGN:
Index: linux-2.6.20-mm1/security/commoncap.c
===================================================================
--- linux-2.6.20-mm1.orig/security/commoncap.c
+++ linux-2.6.20-mm1/security/commoncap.c
@@ -244,7 +244,7 @@ void cap_bprm_apply_creds (struct linux_

if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
!cap_issubset (new_permitted, current->cap_permitted)) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
if (!capable(CAP_SETUID)) {
Index: linux-2.6.20-mm1/security/dummy.c
===================================================================
--- linux-2.6.20-mm1.orig/security/dummy.c
+++ linux-2.6.20-mm1/security/dummy.c
@@ -130,7 +130,7 @@ static void dummy_bprm_free_security (st
static void dummy_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
{
if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

if ((unsafe & ~LSM_UNSAFE_PTRACE_CAP) && !capable(CAP_SETUID)) {
bprm->e_uid = current->uid;


2007-02-16 13:40:37

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory

This patch enables to omit anonymous shared memory from an ELF
formatted core file when it is generated.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/binfmt_elf.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)

Index: linux-2.6.20-mm1/fs/binfmt_elf.c
===================================================================
--- linux-2.6.20-mm1.orig/fs/binfmt_elf.c
+++ linux-2.6.20-mm1/fs/binfmt_elf.c
@@ -1181,7 +1181,7 @@ static int dump_seek(struct file *file,
*
* I think we should skip something. But I am not sure how. H.J.
*/
-static int maydump(struct vm_area_struct *vma)
+static int maydump(struct vm_area_struct *vma, unsigned int omit_anon_shared)
{
/* The vma can be set up to tell us the answer directly. */
if (vma->vm_flags & VM_ALWAYSDUMP)
@@ -1191,9 +1191,15 @@ static int maydump(struct vm_area_struct
if (vma->vm_flags & (VM_IO | VM_RESERVED))
return 0;

- /* Dump shared memory only if mapped from an anonymous file. */
- if (vma->vm_flags & VM_SHARED)
- return vma->vm_file->f_path.dentry->d_inode->i_nlink == 0;
+ /*
+ * Dump shared memory only if mapped from an anonymous file and
+ * /proc/<pid>/coredump_omit_anonymous_shared flag is not set.
+ */
+ if (vma->vm_flags & VM_SHARED) {
+ if (vma->vm_file->f_path.dentry->d_inode->i_nlink)
+ return 0;
+ return omit_anon_shared == 0;
+ }

/* If it hasn't been written to, don't write it out */
if (!vma->anon_vma)
@@ -1491,6 +1497,7 @@ static int elf_core_dump(long signr, str
#endif
int thread_status_size = 0;
elf_addr_t *auxv;
+ unsigned int omit_anon_shared = 0;

/*
* We no longer stop all VM operations.
@@ -1629,6 +1636,7 @@ static int elf_core_dump(long signr, str
}

dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
+ omit_anon_shared = current->mm->coredump_omit_anon_shared;

/* Write program headers for segments dump */
for (vma = first_vma(current, gate_vma); vma != NULL;
@@ -1642,7 +1650,7 @@ static int elf_core_dump(long signr, str
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = maydump(vma) ? sz : 0;
+ phdr.p_filesz = maydump(vma, omit_anon_shared) ? sz : 0;
phdr.p_memsz = sz;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -1685,7 +1693,7 @@ static int elf_core_dump(long signr, str
vma = next_vma(vma, gate_vma)) {
unsigned long addr;

- if (!maydump(vma))
+ if (!maydump(vma, omit_anon_shared))
continue;

for (addr = vma->vm_start;


2007-02-16 13:41:31

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

This patch enables to omit anonymous shared memory from an ELF-FDPIC
formatted core file when it is generated.

The debug messages from maydump() in fs/binfmt_elf_fdpic.c are changed
appropriately so that we can know what kind of memory segments are
dumped or not.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/binfmt_elf_fdpic.c | 37 +++++++++++++++++++++++++------------
1 files changed, 25 insertions(+), 12 deletions(-)

Index: linux-2.6.20-mm1/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6.20-mm1.orig/fs/binfmt_elf_fdpic.c
+++ linux-2.6.20-mm1/fs/binfmt_elf_fdpic.c
@@ -1168,7 +1168,7 @@ static int dump_seek(struct file *file,
*
* I think we should skip something. But I am not sure how. H.J.
*/
-static int maydump(struct vm_area_struct *vma)
+static int maydump(struct vm_area_struct *vma, unsigned int omit_anon_shared)
{
/* Do not dump I/O mapped devices or special mappings */
if (vma->vm_flags & (VM_IO | VM_RESERVED)) {
@@ -1184,15 +1184,22 @@ static int maydump(struct vm_area_struct
return 0;
}

- /* Dump shared memory only if mapped from an anonymous file. */
+ /*
+ * Dump shared memory only if mapped from an anonymous file and
+ * /proc/<pid>/coredump_omit_anonymous_shared flag is not set.
+ */
if (vma->vm_flags & VM_SHARED) {
- if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0) {
+ if (vma->vm_file->f_path.dentry->d_inode->i_nlink) {
kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
+ return 0;
+ }
+ if (omit_anon_shared) {
+ kdcore("%08lx: %08lx: no (anon-share)", vma->vm_start, vma->vm_flags);
+ return 0;
+ } else {
+ kdcore("%08lx: %08lx: yes (anon-share)", vma->vm_start, vma->vm_flags);
return 1;
}
-
- kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
- return 0;
}

#ifdef CONFIG_MMU
@@ -1444,14 +1451,15 @@ static int elf_dump_thread_status(long s
*/
#ifdef CONFIG_MMU
static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
- size_t *size, unsigned long *limit)
+ size_t *size, unsigned long *limit,
+ unsigned int omit_anon_shared)
{
struct vm_area_struct *vma;

for (vma = current->mm->mmap; vma; vma = vma->vm_next) {
unsigned long addr;

- if (!maydump(vma))
+ if (!maydump(vma, omit_anon_shared))
continue;

for (addr = vma->vm_start;
@@ -1499,14 +1507,15 @@ end_coredump:
*/
#ifndef CONFIG_MMU
static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
- size_t *size, unsigned long *limit)
+ size_t *size, unsigned long *limit,
+ unsigned int omit_anon_shared)
{
struct vm_list_struct *vml;

for (vml = current->mm->context.vmlist; vml; vml = vml->next) {
struct vm_area_struct *vma = vml->vma;

- if (!maydump(vma))
+ if (!maydump(vma, omit_anon_shared))
continue;

if ((*size += PAGE_SIZE) > *limit)
@@ -1557,6 +1566,7 @@ static int elf_fdpic_core_dump(long sign
struct vm_list_struct *vml;
#endif
elf_addr_t *auxv;
+ unsigned int omit_anon_shared = 0;

/*
* We no longer stop all VM operations.
@@ -1694,6 +1704,8 @@ static int elf_fdpic_core_dump(long sign
/* Page-align dumped data */
dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

+ omit_anon_shared = current->mm->coredump_omit_anon_shared;
+
/* write program headers for segments dump */
for (
#ifdef CONFIG_MMU
@@ -1715,7 +1727,7 @@ static int elf_fdpic_core_dump(long sign
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = maydump(vma) ? sz : 0;
+ phdr.p_filesz = maydump(vma, omit_anon_shared) ? sz : 0;
phdr.p_memsz = sz;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -1749,7 +1761,8 @@ static int elf_fdpic_core_dump(long sign

DUMP_SEEK(dataoff);

- if (elf_fdpic_dump_segments(file, current->mm, &size, &limit) < 0)
+ if (elf_fdpic_dump_segments(file, current->mm, &size, &limit,
+ omit_anon_shared) < 0)
goto end_coredump;

#ifdef ELF_CORE_WRITE_EXTRA_DATA


2007-02-16 13:42:32

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 4/4] coredump: documentation for proc entry

This patch adds the documentation for
/proc/<pid>/coredump_omit_anonymous_shared.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
Documentation/filesystems/proc.txt | 38 +++++++++++++++++++++++++++
1 files changed, 38 insertions(+)

Index: linux-2.6.20-mm1/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.20-mm1.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.20-mm1/Documentation/filesystems/proc.txt
@@ -41,6 +41,7 @@ Table of Contents
2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
2.13 /proc/<pid>/oom_score - Display current oom-killer score
+ 2.14 /proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator

------------------------------------------------------------------------------
Preface
@@ -1982,6 +1983,43 @@ This file can be used to check the curre
any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which
process should be killed in an out-of-memory situation.

+2.14 /proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator
+---------------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory.
+
+The /proc/<pid>/coredump_omit_anonymous_shared is a flag which enables you to
+omit anonymous shared memory segments from a core file when it is generated.
+When the <pid> process is dumped, the core dump routine decides whether a
+given memory segment should be dumped into a core file or not based on the
+type of the memory segment and the flag.
+
+If you have written a non-zero value to this proc file, anonymous shared
+memory segments are not dumped. There are three types of anonymous shared
+memory:
+
+ - IPC shared memory
+ - the memory segments created by mmap(2) with MAP_ANONYMOUS and MAP_SHARED
+ flags
+ - the memory segments created by mmap(2) with MAP_SHARED flag, and the
+ mapped file has already been unlinked
+
+Because current core dump routine doesn't distinguish these segments, you can
+only choose either dumping all anonymous shared memory segments or not.
+
+If you don't want to dump all shared memory segments attached to pid 1234,
+write 0 to the process's proc file.
+
+ $ echo 1 > /proc/1234/coredump_omit_anonymous_shared
+
+When a new process is created, the process inherits the flag status from its
+parent. It is useful to set the flag before the program runs.
+For example:
+
+ $ echo 1 > /proc/self/coredump_omit_anonymous_shared
+ $ ./some_program
+
------------------------------------------------------------------------------
Summary
------------------------------------------------------------------------------


2007-02-16 15:06:07

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Kawai, Hidehiro <[email protected]> wrote:

> static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
> - size_t *size, unsigned long *limit)
> + size_t *size, unsigned long *limit,
> + unsigned int omit_anon_shared)

Why are you passing it as an extra argument when you could just use
mm->coredump_omit_anon_shared here:

> + if (!maydump(vma, omit_anon_shared))

And here:

> + phdr.p_filesz = maydump(vma, omit_anon_shared) ? sz : 0;

Actually, I think I would just pass the mm pointer you have into maydump() and
let that dereference it here:

> + if (omit_anon_shared) {

which would then be:

if (mm->coredump_omit_anon_shared) {

Then the calls to maydump() would be:

if (!maydump(vma, mm))

and:

phdr.p_filesz = maydump(vma, current->mm) ? sz : 0;

I wouldn't worry, were I you, about the setting changing whilst it's being
used. If you are worried about that, you can probably do some locking against
that.

David

2007-02-16 15:09:33

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

Kawai, Hidehiro <[email protected]> wrote:

> To avoid the above situation we can limit the core file size by
> setrlimit(2) or ulimit(1). But this method can lose important data
> such as stack because core dumping is terminated halfway.
> So I suggest keeping shared memory segments from being dumped for
> particular processes.

A better way might be to place the shared memory segments last if that's
possible (I'm not sure ELF supports out-of-order segments).

> Because the shared memory attached to processes is common in them, we don't
> need to dump the shared memory every time.

So there's no guarantee that they'll be dumped at all... I'm not sure there's
any way around that, however.

David

2007-02-16 17:18:39

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

On Fri, Feb 16, 2007 at 03:05:35PM +0000, David Howells wrote:
> Actually, I think I would just pass the mm pointer you have into maydump() and
> let that dereference it here:
>
> > + if (omit_anon_shared) {
>
> which would then be:
>
> if (mm->coredump_omit_anon_shared) {

How about:
if (vma->vm_mm->coredump_omit_anon_shared) {

Then the calls to maydump() would be unchanged:

2007-02-16 20:09:28

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Robin Holt <[email protected]> wrote:

> How about:
> if (vma->vm_mm->coredump_omit_anon_shared) {
>
> Then the calls to maydump() would be unchanged:

VMAs are a shared resource under NOMMU conditions.

David

2007-02-20 09:45:36

by Hidehiro Kawai

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Hi,

Thank you for your comments.

David Howells wrote:

>> static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
>>- size_t *size, unsigned long *limit)
>>+ size_t *size, unsigned long *limit,
>>+ unsigned int omit_anon_shared)
>
> Why are you passing it as an extra argument when you could just use
> mm->coredump_omit_anon_shared here:
>
>>+ if (!maydump(vma, omit_anon_shared))

> I wouldn't worry, were I you, about the setting changing whilst it's being
> used. If you are worried about that, you can probably do some locking against
> that.

Core dumping is separated two phases, one is the phase of writing
headers, the other is the phase of writing memory segments. If the
coredump_omit_anon_shared setting is changed between these two phases,
a corrupted core file will be generated because the offsets written
in headers don't match their bodies. So we need to use the same
setting in both phases.

I think that locking makes codes complex and generates overhead.
So I wouldn't like to use lock as far as possible. I think passing
the flag as an extra argument is the simplest implementation to
avoid the core file corruption.

Thanks,
--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory

2007-02-20 09:48:34

by Hidehiro Kawai

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

Hi,

David Howells wrote:

> Kawai, Hidehiro <[email protected]> wrote:
>
>>To avoid the above situation we can limit the core file size by
>>setrlimit(2) or ulimit(1). But this method can lose important data
>>such as stack because core dumping is terminated halfway.
>>So I suggest keeping shared memory segments from being dumped for
>>particular processes.
>
> A better way might be to place the shared memory segments last if that's
> possible (I'm not sure ELF supports out-of-order segments).

Placing the shared memory segments last and limiting by ulimit -c
is one of the solutions. But there is no guarantee that the memory
segments other than anonymous shared memory are always dumped.
So your idea cannot alternate my suggesting feature.
But if your idea has a merit which my idea doesn't have, I try to
consider coexistence of both idea.


>>Because the shared memory attached to processes is common in them, we don't
>>need to dump the shared memory every time.
>
> So there's no guarantee that they'll be dumped at all... I'm not sure there's
> any way around that, however.

Indeed. However, some people don't want to dump anonymous shared memory
at all. Taking into account this requirement, I don't guarantee that.
But this feature allows per-process setting. So if you want to dump
the shared memory at least once, you can manage to do that in userland.

Thanks,
--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory

2007-02-20 10:58:42

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Kawai, Hidehiro <[email protected]> wrote:

> Core dumping is separated two phases, one is the phase of writing
> headers, the other is the phase of writing memory segments. If the
> coredump_omit_anon_shared setting is changed between these two phases,
> a corrupted core file will be generated because the offsets written
> in headers don't match their bodies. So we need to use the same
> setting in both phases.

Hmmm... Okay.

> I think that locking makes codes complex and generates overhead.
> So I wouldn't like to use lock as far as possible. I think passing
> the flag as an extra argument is the simplest implementation to
> avoid the core file corruption.

Actually, I don't think the locking is that hard or that complex.

int do_coredump(long signr, int exit_code, struct pt_regs * regs)
{
<setup vars>

down_read(&coredump_settings_sem);

...

fail:
up_read(&coredump_settings_sem);
return retval;
}

And:

static ssize_t proc_coredump_omit_anon_shared_write(struct file *file,
const char __user *buf,
size_t count,
loff_t *ppos)
{
<setup vars>

down_write(&coredump_settings_sem);

...

out_no_task:
up_write(&coredump_settings_sem);
return ret;
}

The same could be applied to all controls that change the coredumping
variables, in particular the sysctl for core_pattern could be wrapped so as to
remove one of the reliances on lock_kernel() and the lock_kernel pair could be
removed from do_coredump().

David

2007-02-20 12:56:36

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

On Tue, Feb 20, 2007 at 10:58:17AM +0000, David Howells wrote:
> Kawai, Hidehiro <[email protected]> wrote:
> Actually, I don't think the locking is that hard or that complex.
>
> int do_coredump(long signr, int exit_code, struct pt_regs * regs)
> {
> <setup vars>
>
> down_read(&coredump_settings_sem);
>
> ...
>
> fail:
> up_read(&coredump_settings_sem);
> return retval;
> }
>
> And:
>
> static ssize_t proc_coredump_omit_anon_shared_write(struct file *file,
> const char __user *buf,
> size_t count,
> loff_t *ppos)
> {
> <setup vars>
>
> down_write(&coredump_settings_sem);

If the dump has started, do we want to change this to a trylock and skip
changing the value if it is already locked?

Thanks,
Robin

2007-02-21 10:01:16

by Hidehiro Kawai

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Hi,

Thank you for your reply.

David Howells wrote:

>>I think that locking makes codes complex and generates overhead.
>>So I wouldn't like to use lock as far as possible. I think passing
>>the flag as an extra argument is the simplest implementation to
>>avoid the core file corruption.
>
> Actually, I don't think the locking is that hard or that complex.
>
> int do_coredump(long signr, int exit_code, struct pt_regs * regs)
> {
> <setup vars>
>
> down_read(&coredump_settings_sem);
>
> ...
>
> fail:
> up_read(&coredump_settings_sem);
> return retval;
> }
>
> And:
>
> static ssize_t proc_coredump_omit_anon_shared_write(struct file *file,
> const char __user *buf,
> size_t count,
> loff_t *ppos)
> {
> <setup vars>
>
> down_write(&coredump_settings_sem);
>
> ...
>
> out_no_task:
> up_write(&coredump_settings_sem);
> return ret;
> }

Is coredump_setting_sem a global semaphore? If so, it prevents concurrent
core dumping. Additionally, while some process is dumped, writing to
coredump_omit_anon_shared of unrelated process will be blocked.

So we should use per process or per mm locking. But where should we
place the variable for locking? Because we don't want to increase the
struct size, we might want to add another bit field in mm_struct:

struct mm_struct {
...
unsigned char dumpable:2;
unsigned char coredump_in_progress:1; /* this */
unsigned char coredump_omit_anon_shared:1;
...

And we use it to determine whether core dumping is in progress or not:

int do_coredump(long signr, int exit_code, struct pt_regs * regs)
{
<setup vars>

spin_lock(&dump_bits_lock);
current->mm->coredump_in_progress = 1;
spin_unlock(&dump_bits_lock);
...

Additionally:

static ssize_t proc_coredump_omit_anon_shared_write(struct file *file,
const char __user *buf,
size_t count,
loff_t *ppos)
{
<setup vars>

ret = -EBUSY;
spin_lock(&dump_bits_lock);
if (mm->coredump_in_progress) {
spin_unlock(&dump_bits_lock);
goto out;
}
mm->coredump_omit_anon_shared = (val != 0);
spin_unlock(&dump_bits_lock);
...

Do you think which is better this method or passing argument method
used by my patchset?
Or, are there even better way else?

--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory

2007-02-21 11:33:56

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Kawai, Hidehiro <[email protected]> wrote:

> Is coredump_setting_sem a global semaphore? If so, it prevents concurrent
> core dumping.

No, it doesn't. Look again:

int do_coredump(long signr, int exit_code, struct pt_regs * regs)
{
<setup vars>

>>>> down_read(&coredump_settings_sem);

> Additionally, while some process is dumped, writing to
> coredump_omit_anon_shared of unrelated process will be blocked.

Yes, but that's probably reasonable. How likely (a) is a process to coredump,
and (b) is a coredump to occur simultaneously with someone altering their
settings?

David

2007-02-21 11:54:44

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

On Wed, Feb 21, 2007 at 11:33:31AM +0000, David Howells wrote:
> Kawai, Hidehiro <[email protected]> wrote:
>
> > Is coredump_setting_sem a global semaphore? If so, it prevents concurrent
> > core dumping.
>
> No, it doesn't. Look again:
>
> int do_coredump(long signr, int exit_code, struct pt_regs * regs)
> {
> <setup vars>
>
> >>>> down_read(&coredump_settings_sem);
>
> > Additionally, while some process is dumped, writing to
> > coredump_omit_anon_shared of unrelated process will be blocked.
>
> Yes, but that's probably reasonable. How likely (a) is a process to coredump,
> and (b) is a coredump to occur simultaneously with someone altering their
> settings?

And (c) altering the setting during a core dump going to produce an
unusable core dump. I don't think the locking is that difficult to add
and it just makes sense. I would venture a guess that it will take less
time to actually do the locking than to continue arguing it is not needed
when it clearly appears it is needed for even a small number of cases.

Thanks,
Robin

2007-02-22 05:33:48

by Hidehiro Kawai

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Hi David and Robin,

Thank you for your reply.

Robin Holt wrote:
> On Wed, Feb 21, 2007 at 11:33:31AM +0000, David Howells wrote:
>
>>Kawai, Hidehiro <[email protected]> wrote:
>>
>>>Is coredump_setting_sem a global semaphore? If so, it prevents concurrent
>>>core dumping.
>>
>>No, it doesn't. Look again:
>>
>> int do_coredump(long signr, int exit_code, struct pt_regs * regs)
>> {
>> <setup vars>
>>
>> >>>> down_read(&coredump_settings_sem);

Oh, I'm sorry. I have overlooked it. There is no problem.


>>>Additionally, while some process is dumped, writing to
>>>coredump_omit_anon_shared of unrelated process will be blocked.
>>
>>Yes, but that's probably reasonable. How likely (a) is a process to coredump,
>>and (b) is a coredump to occur simultaneously with someone altering their
>>settings?
>
> And (c) altering the setting during a core dump going to produce an
> unusable core dump. I don't think the locking is that difficult to add
> and it just makes sense. I would venture a guess that it will take less
> time to actually do the locking than to continue arguing it is not needed
> when it clearly appears it is needed for even a small number of cases.

Okay, the probability that the process is blocked in the proc handler seems
to be small. But I'm not sure if problems never occur in enterprise use.
So I'd like to use down_write_trylock() as Robin said before. And if it
fails to acquire the lock, it returns EBUSY immediately.
Do you have any comments?

Thanks,
--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory

2007-02-22 11:48:44

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory

Kawai, Hidehiro <[email protected]> wrote:

> Okay, the probability that the process is blocked in the proc handler seems
> to be small. But I'm not sure if problems never occur in enterprise use.
> So I'd like to use down_write_trylock() as Robin said before. And if it
> fails to acquire the lock, it returns EBUSY immediately.
> Do you have any comments?

That's probably reasonable, but that means userspace then has to handle EBUSY.

David

2007-02-24 03:32:50

by Markus Gutschke

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

Kawai, Hidehiro wrote:
> This patch series is version 3 of the core dump masking feature,
> which provides a per-process flag not to dump anonymous shared
> memory segments.

I just wanted to remind you that you need to be careful about dumping
the [vdso] segment no matter whether you omit other segments. I didn't
actually try running your patches, and if the kernel doesn't actually
consider this segment anonymous and shared, things might already work
fine as is.

In any case, you can check with "readelf -a", if the [vdso] segment is
there. And you will find that if you forget to dump it, "gdb" can no
longer give you stack traces on call chains that involve signal handlers.

As an alternative to your kernel patch, you could achieve the same goal
in user space, by linking my coredumper
http://code.google.com/p/google-coredumper/ into your binaries and
setting up appropriate signal handlers. An equivalent patch for
selectively omitting memory regions would be trivial to add. While this
does give you more flexibility, it of course has the drawback of
requiring you to change your applications, so there still is some
benefit in a kernelspace solution.


Markus

2007-02-24 10:03:07

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

Markus Gutschke <[email protected]> wrote:

> As an alternative to your kernel patch, you could achieve the same goal in
> user space, by linking my coredumper

How does it work when you can't actually get back to userspace to have
userspace do the coredump? You still have to handle the userspace equivalents
of double/triple faults.

David

2007-02-24 11:39:18

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

> Kawai, Hidehiro wrote:
> >This patch series is version 3 of the core dump masking feature,
> >which provides a per-process flag not to dump anonymous shared
> >memory segments.
>
> I just wanted to remind you that you need to be careful about dumping
> the [vdso] segment no matter whether you omit other segments. I didn't
> actually try running your patches, and if the kernel doesn't actually
> consider this segment anonymous and shared, things might already work
> fine as is.
>
> In any case, you can check with "readelf -a", if the [vdso] segment is
> there. And you will find that if you forget to dump it, "gdb" can no
> longer give you stack traces on call chains that involve signal handlers.
>
> As an alternative to your kernel patch, you could achieve the same goal
> in user space, by linking my coredumper
> http://code.google.com/p/google-coredumper/ into your binaries and
> setting up appropriate signal handlers. An equivalent patch for
> selectively omitting memory regions would be trivial to add. While this
> does give you more flexibility, it of course has the drawback of
> requiring you to change your applications, so there still is some
> benefit in a kernelspace solution.

"We are too lazy to change 0.01% of apps that actually need it" is not
good enough reason to push the feature into kernel, I'd say.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-02-24 20:02:10

by Markus Gutschke

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

David Howells wrote:
> How does it work when you can't actually get back to userspace to have
> userspace do the coredump? You still have to handle the userspace equivalents
> of double/triple faults.

My experience shows that there are only very rare occurrences of
situations where you cannot get back into userspace to do the coredump,
and the coredumper tries very hard never to cause additional faults.

While I am sure you could construct scenarios where this would happen,
realistically the only one I have run into were stack overflows, and
they can be handled by carefully setting up an alternate stack for
signal handlers -- just make sure the entire stack is already dirtied
before you run out of memory (or, turn of overcommitting).


Markus

2007-02-26 11:49:48

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

Markus Gutschke <[email protected]> wrote:

> > How does it work when you can't actually get back to userspace to have
> > userspace do the coredump? You still have to handle the userspace
> > equivalents of double/triple faults.
>
> My experience shows that there are only very rare occurrences of situations
> where you cannot get back into userspace to do the coredump, and the
> coredumper tries very hard never to cause additional faults.

So what? If they can occur, you have to handle them.

> While I am sure you could construct scenarios where this would happen,
> realistically the only one I have run into were stack overflows, and they can
> be handled by carefully setting up an alternate stack for signal handlers --
> just make sure the entire stack is already dirtied before you run out of
> memory (or, turn of overcommitting).

Duff SIGSEGV or SIGBUS signal handlers are just as realistic. All that takes
is for someone to make a programming error. Remember: error paths are the
least frequently tested.

And any time you say "by carefully setting up" you can guarantee someone's
going to do it wrong.

David

2007-02-26 12:02:08

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3


> > While I am sure you could construct scenarios where this would happen,
> > realistically the only one I have run into were stack overflows, and they can
> > be handled by carefully setting up an alternate stack for signal handlers --
> > just make sure the entire stack is already dirtied before you run out of
> > memory (or, turn of overcommitting).
>
> Duff SIGSEGV or SIGBUS signal handlers are just as realistic. All that takes
> is for someone to make a programming error. Remember: error paths are the
> least frequently tested.
>
> And any time you say "by carefully setting up" you can guarantee someone's
> going to do it wrong.

By same argument, we should just give up the coredumping in kernel. It
is rarely tested, so someone will just get it wrong.

Remember: we are having people with huge apps, and therefore huge
coredumps. They want to hack a kernel in ugly way to make their dumps
smaller.

...but there's better solution for them, create (& debug) userland
coredumping library. No need to hack a kernel.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-02-26 12:42:28

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 0/4] coredump: core dump masking support v3

Pavel Machek <[email protected]> wrote:

> By same argument, we should just give up the coredumping in kernel. It
> is rarely tested, so someone will just get it wrong.

Not so rare... Lots of people still use Evolution after all:-)

However, to counter your point, I'll point out that there's just one main code
path to do coredumping in the kernel (for ELF; there are other binfmts that do
coredumping too), as opposed to lots of applications all trying to set up
coredumping themselves.

> Remember: we are having people with huge apps, and therefore huge
> coredumps. They want to hack a kernel in ugly way to make their dumps
> smaller.

MMAP_NODUMP or MADV_NODUMP might be better. Let the apps mark out for
themselves what they want.

> ...but there's better solution for them, create (& debug) userland
> coredumping library. No need to hack a kernel.

Actually, a better way to do this may be similar the way I assume Windows to
work. On fatal fault: start up a debugger and PT_ATTACH corpse to it. The
debugger could then be something that'll just save the core. No need to make
sure you link in the core dumping library which might not catch the event if
it's bad enough as it's working from *inside* the program, and so is subject
to being corrupted by beserk code.

David