2007-05-25 12:57:09

by Hidehiro Kawai

Subject: [PATCH 0/7] coredump: core dump masking support v5

Hi,

This patch series is version 5 of the core dump masking feature,
which controls which VMAs should be dumped based on their
memory types and per-process flags.

I adopted most of Andrew's suggestions on the previous version.
He also suggested using a system call instead of the /proc/<pid>/
interface, but I decided to keep the latter because adding a new
system call with a pid argument would have a big impact on the
kernel.


You can access the per-process flags via /proc/<pid>/coredump_filter
interface. coredump_filter represents a bitmask of memory types,
and if a bit is set, VMAs of corresponding memory type are written
into a core file when the process is dumped. The bitmask is
inherited from the parent process when a process is created.
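
For illustration only (this is not part of the patch series), a userspace
helper for the interface described above might look like the sketch below.
The function names filter_write() and filter_read() are hypothetical. The
sketch assumes the file accepts a "0x"-prefixed value on write and returns
bare hexadecimal digits on read, which matches the handlers in patch 3/7.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Write `mask` to a coredump_filter-style file.
 * Returns 0 on success, -1 on error.
 */
static int filter_write(const char *path, unsigned int mask)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	/* The write handler parses with base 0, so "0x..." is accepted. */
	if (fprintf(f, "0x%x\n", mask) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Read the mask back. The read handler prints bare hex digits,
 * so the value is parsed with base 16.
 * Returns 0 on success, -1 on error.
 */
static int filter_read(const char *path, unsigned int *mask)
{
	char buf[32];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	*mask = (unsigned int)strtoul(buf, &end, 16);
	return end == buf ? -1 : 0;
}
```

Calling filter_write("/proc/self/coredump_filter", 0x7) before starting a
child program would let the child inherit that mask, assuming the
inheritance behavior described above.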

The original purpose is to avoid a long system slowdown when many
processes that share a huge shared memory segment are dumped at
the same time. To achieve this, this patch series adds the
ability to suppress dumping of anonymous shared memory for specified
processes. In this version, three other memory types are also
supported.

Here are the coredump_filter bits:
bit 0: anonymous private memory
bit 1: anonymous shared memory
bit 2: file-backed private memory
bit 3: file-backed shared memory

The default value of coredump_filter is 0x3 (bits 0 and 1 set), so
by default the new core dump routine behaves the same as the
conventional one.
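
As a small illustration of the bit assignments listed above (not taken from
the patches), the following sketch decodes a coredump_filter value in
userspace. The FILTER_* names and both functions are hypothetical; only the
bit numbers and the 0x3 default come from this mail.

```c
#include <stdio.h>

/* User-visible bit assignments as listed in the cover letter. */
enum {
	FILTER_ANON_PRIVATE = 1u << 0,	/* anonymous private memory */
	FILTER_ANON_SHARED  = 1u << 1,	/* anonymous shared memory */
	FILTER_FILE_PRIVATE = 1u << 2,	/* file-backed private memory */
	FILTER_FILE_SHARED  = 1u << 3	/* file-backed shared memory */
};

#define FILTER_DEFAULT 0x3u	/* default value stated in the cover letter */

/* Nonzero if memory of type `type_bit` would be dumped under `mask`. */
static int filter_dumps(unsigned int mask, unsigned int type_bit)
{
	return (mask & type_bit) != 0;
}

/* Print one line per memory type showing whether it would be dumped. */
static void filter_describe(unsigned int mask, FILE *out)
{
	static const char *const names[] = {
		"anonymous private", "anonymous shared",
		"file-backed private", "file-backed shared",
	};
	unsigned int bit;

	for (bit = 0; bit < 4; bit++)
		fprintf(out, "bit %u (%s): %s\n", bit, names[bit],
			filter_dumps(mask, 1u << bit) ? "dumped" : "skipped");
}
```

With the default mask, both anonymous types are dumped and both file-backed
types are skipped. A mask of 0x1 keeps anonymous private memory and drops
anonymous shared memory, which is the original use case.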

In this version, the coredump_filter bits and mm.dumpable are merged
into mm.flags, which is accessed with atomic bitops.

This patch series can be applied against 2.6.22-rc2-mm1.
The supported core file formats are ELF and ELF-FDPIC. ELF has been
tested, but ELF-FDPIC has not been built and tested because I don't
have the test environment.

Any comments are welcome.

Best regards.

ChangeLog
=========
v5:
- map three-value dumpable to two flags and store into mm.flags
- allow control of anonymous private, file-backed private and
file-backed shared memory
- provide /proc/<pid>/coredump_filter interface instead of
/proc/<pid>/coredump_omit_anonymous_shared
- use bitops to atomically access the core dump setting instead
of using a spinlock
- pass the core dump setting to maydump() through its argument
again

v4:
http://lkml.org/lkml/2007/3/1/501
- in maydump(), retrieve the core dump setting from mm_struct
directly, instead of its additional argument
- writing to /proc/<pid>/coredump_omit_anonymous_shared returns
EBUSY while core dumping.

v3:
http://lkml.org/lkml/2007/2/16/149
- remove `/proc/<pid>/core_flags' proc entry
- add `/proc/<pid>/coredump_anonymous_shared' as a named flag
- remove kernel.core_flags_enable sysctl parameter

v2:
http://lkml.org/lkml/2007/1/26/108
http://lkml.org/lkml/2007/1/26/189
- rename `coremask' to `core_flags'
- change `core_flags' member in mm_struct to a bit field
next to `dumpable'
- introduce a global spin lock to protect adjacent two bit fields
(core_flags and dumpable) from race condition
- fix a bug that the generated core file can be corrupted when
core dumping and updating core_flags occur concurrently
- add kernel.core_flags_enable sysctl parameter to enable/disable
flags in /proc/<pid>/core_flags
- support ELF-FDPIC binary format, but not tested

v1:
http://lkml.org/lkml/2006/12/13/17

--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory



2007-05-25 13:05:17

by Hidehiro Kawai

Subject: [PATCH 1/7] bound suid_dumpable sysctl

This patch limits the value of the suid_dumpable sysctl to the range 0 to 2.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
kernel/sysctl.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc2-mm1/kernel/sysctl.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/kernel/sysctl.c
+++ linux-2.6.22-rc2-mm1/kernel/sysctl.c
@@ -690,6 +690,7 @@ static ctl_table kern_table[] = {
/* Constants for minimum and maximum testing in vm_table.
We use these as one-element integer vectors. */
static int zero;
+static int two = 2;
static int one_hundred = 100;


@@ -1125,7 +1126,10 @@ static ctl_table fs_table[] = {
.data = &suid_dumpable,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = &proc_dointvec,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ .extra2 = &two,
},
{
.ctl_name = CTL_UNNUMBERED,


2007-05-25 13:07:09

by Hidehiro Kawai

Subject: [PATCH 2/7] reimplementation of dumpable using two flags

This patch changes mm_struct.dumpable to a pair of bit flags.

set_dumpable() converts the three-value dumpable to two flags and
stores them in the lower two bits of mm_struct.flags instead of
mm_struct.dumpable. get_dumpable() performs the reverse conversion.
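
As a rough sketch of the mapping (a userspace illustration, not the kernel
code), the conversion in both directions can be written as below. The
helper names dumpable_to_flags() and flags_to_dumpable() are hypothetical.
The sketch shows only the final encodings and deliberately ignores the
atomic bitops and write ordering that the real set_dumpable() depends on.

```c
/* Bit numbers match the MMF_* definitions added by this patch. */
#define MMF_DUMPABLE		0	/* core dump is permitted */
#define MMF_DUMP_SECURELY	1	/* core file is readable only by root */

/* Final flag encoding for a three-value dumpable (0, 1 or 2). */
static unsigned long dumpable_to_flags(int value)
{
	switch (value) {
	case 1:
		return 1UL << MMF_DUMPABLE;		/* binary 01 */
	case 2:
		return (1UL << MMF_DUMP_SECURELY) |
		       (1UL << MMF_DUMPABLE);		/* binary 11 */
	default:
		return 0;				/* binary 00 */
	}
}

/*
 * Same decoding rule as get_dumpable(): only the lower two bits are
 * inspected, and both 10 and 11 are reported as 2.
 */
static int flags_to_dumpable(unsigned long flags)
{
	unsigned long low = flags & 0x3;

	return low >= 2 ? 2 : (int)low;
}
```

Treating the interim value 10 as 2 is what lets a reader that races with a
0 <-> 2 transition still see either the old or the new dumpable value.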

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/exec.c | 61 +++++++++++++++++++++++++++++----
fs/proc/base.c | 2 -
include/asm-ia64/processor.h | 4 +-
include/linux/sched.h | 9 ++++
kernel/ptrace.c | 2 -
kernel/sys.c | 24 ++++++------
security/commoncap.c | 2 -
security/dummy.c | 2 -
8 files changed, 81 insertions(+), 25 deletions(-)

Index: linux-2.6.22-rc2-mm1/fs/exec.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/fs/exec.c
+++ linux-2.6.22-rc2-mm1/fs/exec.c
@@ -864,9 +864,9 @@ int flush_old_exec(struct linux_binprm *
current->sas_ss_sp = current->sas_ss_size = 0;

if (current->euid == current->uid && current->egid == current->gid)
- current->mm->dumpable = 1;
+ set_dumpable(current->mm, 1);
else
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

name = bprm->filename;

@@ -894,7 +894,7 @@ int flush_old_exec(struct linux_binprm *
file_permission(bprm->file, MAY_READ) ||
(bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP)) {
suid_keys(current);
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
}

/* An exec changes our domain. We are no longer part of the thread
@@ -1484,6 +1484,55 @@ fail:
return core_waiters;
}

+/*
+ * set_dumpable converts traditional three-value dumpable to two flags and
+ * stores them into mm->flags. It modifies lower two bits of mm->flags, but
+ * these bits are not changed atomically. So get_dumpable can observe the
+ * intermediate state. To avoid unexpected behavior, make get_dumpable
+ * return either the old dumpable value or the new one by paying attention to
+ * the order in which the bits are modified.
+ *
+ * dumpable | mm->flags (binary)
+ * old new | initial interim final
+ * ---------+-----------------------
+ * 0 1 | 00 01 01
+ * 0 2 | 00 10(*) 11
+ * 1 0 | 01 00 00
+ * 1 2 | 01 11 11
+ * 2 0 | 11 10(*) 00
+ * 2 1 | 11 11 01
+ *
+ * (*) get_dumpable regards interim value of 10 as 11.
+ */
+void set_dumpable(struct mm_struct *mm, int value)
+{
+ switch (value) {
+ case 0:
+ clear_bit(MMF_DUMPABLE, &mm->flags);
+ smp_wmb();
+ clear_bit(MMF_DUMP_SECURELY, &mm->flags);
+ break;
+ case 1:
+ set_bit(MMF_DUMPABLE, &mm->flags);
+ smp_wmb();
+ clear_bit(MMF_DUMP_SECURELY, &mm->flags);
+ break;
+ case 2:
+ set_bit(MMF_DUMP_SECURELY, &mm->flags);
+ smp_wmb();
+ set_bit(MMF_DUMPABLE, &mm->flags);
+ break;
+ }
+}
+
+int get_dumpable(struct mm_struct *mm)
+{
+ int ret;
+
+ ret = mm->flags & 0x3;
+ return (ret >= 2) ? 2 : ret;
+}
+
int do_coredump(long signr, int exit_code, struct pt_regs * regs)
{
char corename[CORENAME_MAX_SIZE + 1];
@@ -1502,7 +1551,7 @@ int do_coredump(long signr, int exit_cod
if (!binfmt || !binfmt->core_dump)
goto fail;
down_write(&mm->mmap_sem);
- if (!mm->dumpable) {
+ if (!get_dumpable(mm)) {
up_write(&mm->mmap_sem);
goto fail;
}
@@ -1512,11 +1561,11 @@ int do_coredump(long signr, int exit_cod
* process nor do we know its entire history. We only know it
* was tainted so we dump it as root in mode 2.
*/
- if (mm->dumpable == 2) { /* Setuid core dump mode */
+ if (get_dumpable(mm) == 2) { /* Setuid core dump mode */
flag = O_EXCL; /* Stop rewrite attacks */
current->fsuid = 0; /* Dump root private */
}
- mm->dumpable = 0;
+ set_dumpable(mm, 0);

retval = coredump_wait(exit_code);
if (retval < 0)
Index: linux-2.6.22-rc2-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.22-rc2-mm1.orig/include/linux/sched.h
+++ linux-2.6.22-rc2-mm1/include/linux/sched.h
@@ -324,6 +324,13 @@ typedef unsigned long mm_counter_t;
(mm)->hiwater_vm = (mm)->total_vm; \
} while (0)

+extern void set_dumpable(struct mm_struct *mm, int value);
+extern int get_dumpable(struct mm_struct *mm);
+
+/* mm flags */
+#define MMF_DUMPABLE 0 /* core dump is permitted */
+#define MMF_DUMP_SECURELY 1 /* core file is readable only by root */
+
struct mm_struct {
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
@@ -381,7 +388,7 @@ struct mm_struct {
unsigned int token_priority;
unsigned int last_interval;

- unsigned char dumpable:2;
+ unsigned long flags; /* Must use atomic bitops to access the bits */

/* coredumping support */
int core_waiters;
Index: linux-2.6.22-rc2-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/fs/proc/base.c
+++ linux-2.6.22-rc2-mm1/fs/proc/base.c
@@ -1037,7 +1037,7 @@ static int task_dumpable(struct task_str
task_lock(task);
mm = task->mm;
if (mm)
- dumpable = mm->dumpable;
+ dumpable = get_dumpable(mm);
task_unlock(task);
if(dumpable == 1)
return 1;
Index: linux-2.6.22-rc2-mm1/include/asm-ia64/processor.h
===================================================================
--- linux-2.6.22-rc2-mm1.orig/include/asm-ia64/processor.h
+++ linux-2.6.22-rc2-mm1/include/asm-ia64/processor.h
@@ -295,9 +295,9 @@ struct thread_struct {
regs->ar_bspstore = current->thread.rbs_bot; \
regs->ar_fpsr = FPSR_DEFAULT; \
regs->loadrs = 0; \
- regs->r8 = current->mm->dumpable; /* set "don't zap registers" flag */ \
+ regs->r8 = get_dumpable(current->mm); /* set "don't zap registers" flag */ \
regs->r12 = new_sp - 16; /* allocate 16 byte scratch area */ \
- if (unlikely(!current->mm->dumpable)) { \
+ if (unlikely(!get_dumpable(current->mm))) { \
/* \
* Zap scratch regs to avoid leaking bits between processes with different \
* uid/privileges. \
Index: linux-2.6.22-rc2-mm1/kernel/ptrace.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/kernel/ptrace.c
+++ linux-2.6.22-rc2-mm1/kernel/ptrace.c
@@ -142,7 +142,7 @@ static int may_attach(struct task_struct
return -EPERM;
smp_rmb();
if (task->mm)
- dumpable = task->mm->dumpable;
+ dumpable = get_dumpable(task->mm);
if (!dumpable && !capable(CAP_SYS_PTRACE))
return -EPERM;

Index: linux-2.6.22-rc2-mm1/kernel/sys.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/kernel/sys.c
+++ linux-2.6.22-rc2-mm1/kernel/sys.c
@@ -1025,7 +1025,7 @@ asmlinkage long sys_setregid(gid_t rgid,
return -EPERM;
}
if (new_egid != old_egid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
if (rgid != (gid_t) -1 ||
@@ -1055,13 +1055,13 @@ asmlinkage long sys_setgid(gid_t gid)

if (capable(CAP_SETGID)) {
if (old_egid != gid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->gid = current->egid = current->sgid = current->fsgid = gid;
} else if ((gid == current->gid) || (gid == current->sgid)) {
if (old_egid != gid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->egid = current->fsgid = gid;
@@ -1092,7 +1092,7 @@ static int set_user(uid_t new_ruid, int
switch_uid(new_user);

if (dumpclear) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->uid = new_ruid;
@@ -1148,7 +1148,7 @@ asmlinkage long sys_setreuid(uid_t ruid,
return -EAGAIN;

if (new_euid != old_euid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->fsuid = current->euid = new_euid;
@@ -1198,7 +1198,7 @@ asmlinkage long sys_setuid(uid_t uid)
return -EPERM;

if (old_euid != uid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->fsuid = current->euid = uid;
@@ -1243,7 +1243,7 @@ asmlinkage long sys_setresuid(uid_t ruid
}
if (euid != (uid_t) -1) {
if (euid != current->euid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->euid = euid;
@@ -1293,7 +1293,7 @@ asmlinkage long sys_setresgid(gid_t rgid
}
if (egid != (gid_t) -1) {
if (egid != current->egid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->egid = egid;
@@ -1339,7 +1339,7 @@ asmlinkage long sys_setfsuid(uid_t uid)
uid == current->suid || uid == current->fsuid ||
capable(CAP_SETUID)) {
if (uid != old_fsuid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->fsuid = uid;
@@ -1368,7 +1368,7 @@ asmlinkage long sys_setfsgid(gid_t gid)
gid == current->sgid || gid == current->fsgid ||
capable(CAP_SETGID)) {
if (gid != old_fsgid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
smp_wmb();
}
current->fsgid = gid;
@@ -2165,14 +2165,14 @@ asmlinkage long sys_prctl(int option, un
error = put_user(current->pdeath_signal, (int __user *)arg2);
break;
case PR_GET_DUMPABLE:
- error = current->mm->dumpable;
+ error = get_dumpable(current->mm);
break;
case PR_SET_DUMPABLE:
if (arg2 < 0 || arg2 > 1) {
error = -EINVAL;
break;
}
- current->mm->dumpable = arg2;
+ set_dumpable(current->mm, arg2);
break;

case PR_SET_UNALIGN:
Index: linux-2.6.22-rc2-mm1/security/commoncap.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/security/commoncap.c
+++ linux-2.6.22-rc2-mm1/security/commoncap.c
@@ -241,7 +241,7 @@ void cap_bprm_apply_creds (struct linux_

if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
!cap_issubset (new_permitted, current->cap_permitted)) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
if (!capable(CAP_SETUID)) {
Index: linux-2.6.22-rc2-mm1/security/dummy.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/security/dummy.c
+++ linux-2.6.22-rc2-mm1/security/dummy.c
@@ -130,7 +130,7 @@ static void dummy_bprm_free_security (st
static void dummy_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
{
if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

if ((unsafe & ~LSM_UNSAFE_PTRACE_CAP) && !capable(CAP_SETUID)) {
bprm->e_uid = current->uid;


2007-05-25 13:08:16

by Hidehiro Kawai

Subject: [PATCH 3/7] add an interface for core dump filter

This patch adds an interface to set and clear the flags that determine
whether each memory segment is dumped when a core file is
generated.

The /proc/<pid>/coredump_filter file is provided to access the flags.
You can read or change the flag status of a particular process by
reading from or writing to the file.

The flag status is inherited by the child process when it is created.
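
To illustrate how the filter value relates to the flags word (a non-atomic
userspace sketch, not the patch itself), the packing can be expressed as
below. The MMF_* values mirror the definitions this patch adds to sched.h;
the helper names filter_from_flags() and flags_with_filter() are
hypothetical.

```c
/* Layout constants mirroring the sched.h additions in this patch. */
#define MMF_DUMPABLE_BITS	2
#define MMF_DUMP_FILTER_SHIFT	MMF_DUMPABLE_BITS
#define MMF_DUMP_FILTER_BITS	4
#define MMF_DUMP_FILTER_MASK \
	(((1UL << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)

/* Value a read of coredump_filter would report for a given flags word. */
static unsigned long filter_from_flags(unsigned long flags)
{
	return (flags & MMF_DUMP_FILTER_MASK) >> MMF_DUMP_FILTER_SHIFT;
}

/*
 * Replace the filter bits with `val` while leaving the two dumpable
 * bits untouched. Bits of `val` beyond the four filter bits are ignored.
 * The kernel code uses set_bit()/clear_bit() per bit instead.
 */
static unsigned long flags_with_filter(unsigned long flags, unsigned long val)
{
	unsigned long filter =
		(val << MMF_DUMP_FILTER_SHIFT) & MMF_DUMP_FILTER_MASK;

	return (flags & ~MMF_DUMP_FILTER_MASK) | filter;
}
```

User-visible bit 0 therefore lands on bit 2 of mm->flags, directly above
the two dumpable bits, and the default flags value 0xc reads back as 0x3.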

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/proc/base.c | 89 ++++++++++++++++++++++++++++++++++++++++
include/linux/sched.h | 14 ++++++
kernel/fork.c | 2
3 files changed, 105 insertions(+)

Index: linux-2.6.22-rc2-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/fs/proc/base.c
+++ linux-2.6.22-rc2-mm1/fs/proc/base.c
@@ -73,6 +73,7 @@
#include <linux/poll.h>
#include <linux/nsproxy.h>
#include <linux/oom.h>
+#include <linux/elf.h>
#include "internal.h"

/* NOTE:
@@ -1808,6 +1809,91 @@ static const struct inode_operations pro

#endif

+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+static ssize_t proc_coredump_filter_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct mm_struct *mm;
+ char buffer[PROC_NUMBUF];
+ size_t len;
+ int ret;
+
+ if (!task)
+ return -ESRCH;
+
+ ret = 0;
+ mm = get_task_mm(task);
+ if (mm) {
+ len = snprintf(buffer, sizeof(buffer), "%08lx\n",
+ ((mm->flags & MMF_DUMP_FILTER_MASK) >>
+ MMF_DUMP_FILTER_SHIFT));
+ mmput(mm);
+ ret = simple_read_from_buffer(buf, count, ppos, buffer, len);
+ }
+
+ put_task_struct(task);
+
+ return ret;
+}
+
+static ssize_t proc_coredump_filter_write(struct file *file,
+ const char __user *buf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct task_struct *task;
+ struct mm_struct *mm;
+ char buffer[PROC_NUMBUF], *end;
+ unsigned int val;
+ int ret;
+ int i;
+ unsigned long mask;
+
+ ret = -EFAULT;
+ memset(buffer, 0, sizeof(buffer));
+ if (count > sizeof(buffer) - 1)
+ count = sizeof(buffer) - 1;
+ if (copy_from_user(buffer, buf, count))
+ goto out_no_task;
+
+ ret = -EINVAL;
+ val = (unsigned int)simple_strtoul(buffer, &end, 0);
+ if (*end == '\n')
+ end++;
+ if (end - buffer == 0)
+ goto out_no_task;
+
+ ret = -ESRCH;
+ task = get_proc_task(file->f_dentry->d_inode);
+ if (!task)
+ goto out_no_task;
+
+ ret = end - buffer;
+ mm = get_task_mm(task);
+ if (!mm)
+ goto out_no_mm;
+
+ for (i = 0, mask = 1; i < MMF_DUMP_FILTER_BITS; i++, mask <<= 1) {
+ if (val & mask)
+ set_bit(i + MMF_DUMP_FILTER_SHIFT, &mm->flags);
+ else
+ clear_bit(i + MMF_DUMP_FILTER_SHIFT, &mm->flags);
+ }
+
+ mmput(mm);
+ out_no_mm:
+ put_task_struct(task);
+ out_no_task:
+ return ret;
+}
+
+static const struct file_operations proc_coredump_filter_operations = {
+ .read = proc_coredump_filter_read,
+ .write = proc_coredump_filter_write,
+};
+#endif
+
/*
* /proc/self:
*/
@@ -2036,6 +2122,9 @@ static const struct pid_entry tgid_base_
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
#endif
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+ REG("coredump_filter", S_IRUGO|S_IWUSR, coredump_filter),
+#endif
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, pid_io_accounting),
#endif
Index: linux-2.6.22-rc2-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.22-rc2-mm1.orig/include/linux/sched.h
+++ linux-2.6.22-rc2-mm1/include/linux/sched.h
@@ -328,8 +328,22 @@ extern void set_dumpable(struct mm_struc
extern int get_dumpable(struct mm_struct *mm);

/* mm flags */
+/* dumpable bits */
#define MMF_DUMPABLE 0 /* core dump is permitted */
#define MMF_DUMP_SECURELY 1 /* core file is readable only by root */
+#define MMF_DUMPABLE_BITS 2
+
+/* coredump filter bits */
+#define MMF_DUMP_ANON_PRIVATE 2
+#define MMF_DUMP_ANON_SHARED 3
+#define MMF_DUMP_MAPPED_PRIVATE 4
+#define MMF_DUMP_MAPPED_SHARED 5
+#define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
+#define MMF_DUMP_FILTER_BITS 4
+#define MMF_DUMP_FILTER_MASK \
+ (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
+#define MMF_DUMP_FILTER_DEFAULT \
+ ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED))

struct mm_struct {
struct vm_area_struct * mmap; /* list of VMAs */
Index: linux-2.6.22-rc2-mm1/kernel/fork.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/kernel/fork.c
+++ linux-2.6.22-rc2-mm1/kernel/fork.c
@@ -335,6 +335,8 @@ static struct mm_struct * mm_init(struct
atomic_set(&mm->mm_count, 1);
init_rwsem(&mm->mmap_sem);
INIT_LIST_HEAD(&mm->mmlist);
+ mm->flags = (current->mm) ? current->mm->flags
+ : MMF_DUMP_FILTER_DEFAULT;
mm->core_waiters = 0;
mm->nr_ptes = 0;
set_mm_counter(mm, file_rss, 0);


2007-05-25 13:09:19

by Hidehiro Kawai

Subject: [PATCH 4/7] ELF: enable core dump filtering

This patch enables core dump filtering for ELF-formatted core files.
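
The per-VMA decision made by the patched maydump() can be summarized with
the userspace sketch below. struct vma_info and maydump_sketch() are
hypothetical stand-ins for the kernel's vm_area_struct and maydump(); the
flag bit numbers are the ones defined in patch 3/7.

```c
#include <stdbool.h>

/* Flag bit numbers from patch 3/7. */
#define MMF_DUMP_ANON_PRIVATE	2
#define MMF_DUMP_ANON_SHARED	3
#define MMF_DUMP_MAPPED_PRIVATE	4
#define MMF_DUMP_MAPPED_SHARED	5

/* Simplified view of the VMA properties that maydump() inspects. */
struct vma_info {
	bool always_dump;	/* VM_ALWAYSDUMP is set */
	bool io_or_reserved;	/* VM_IO or VM_RESERVED is set */
	bool shared;		/* VM_SHARED is set */
	bool unlinked_file;	/* backing inode has i_nlink == 0 */
	bool has_anon_vma;	/* vma->anon_vma is non-NULL */
};

static bool flag_set(unsigned long flags, int bit)
{
	return (flags >> bit) & 1UL;
}

/* Same order of checks as the patched maydump(). */
static bool maydump_sketch(const struct vma_info *v, unsigned long mm_flags)
{
	if (v->always_dump)
		return true;
	if (v->io_or_reserved)
		return false;
	if (v->shared)
		return flag_set(mm_flags, v->unlinked_file ?
				MMF_DUMP_ANON_SHARED : MMF_DUMP_MAPPED_SHARED);
	if (!v->has_anon_vma)
		return flag_set(mm_flags, MMF_DUMP_MAPPED_PRIVATE);
	return flag_set(mm_flags, MMF_DUMP_ANON_PRIVATE);
}
```

The first two checks ignore the filter entirely, so only the last three
outcomes can be changed through coredump_filter.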

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/binfmt_elf.c | 30 +++++++++++++++++++++---------
1 files changed, 21 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc2-mm1/fs/binfmt_elf.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/fs/binfmt_elf.c
+++ linux-2.6.22-rc2-mm1/fs/binfmt_elf.c
@@ -1189,7 +1189,7 @@ static int dump_seek(struct file *file,
*
* I think we should skip something. But I am not sure how. H.J.
*/
-static int maydump(struct vm_area_struct *vma)
+static int maydump(struct vm_area_struct *vma, unsigned long mm_flags)
{
/* The vma can be set up to tell us the answer directly. */
if (vma->vm_flags & VM_ALWAYSDUMP)
@@ -1199,15 +1199,19 @@ static int maydump(struct vm_area_struct
if (vma->vm_flags & (VM_IO | VM_RESERVED))
return 0;

- /* Dump shared memory only if mapped from an anonymous file. */
- if (vma->vm_flags & VM_SHARED)
- return vma->vm_file->f_path.dentry->d_inode->i_nlink == 0;
+ /* By default, dump shared memory if mapped from an anonymous file. */
+ if (vma->vm_flags & VM_SHARED) {
+ if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0)
+ return test_bit(MMF_DUMP_ANON_SHARED, &mm_flags);
+ else
+ return test_bit(MMF_DUMP_MAPPED_SHARED, &mm_flags);
+ }

- /* If it hasn't been written to, don't write it out */
+ /* By default, if it hasn't been written to, don't write it out. */
if (!vma->anon_vma)
- return 0;
+ return test_bit(MMF_DUMP_MAPPED_PRIVATE, &mm_flags);

- return 1;
+ return test_bit(MMF_DUMP_ANON_PRIVATE, &mm_flags);
}

/* An ELF note in memory */
@@ -1499,6 +1503,7 @@ static int elf_core_dump(long signr, str
#endif
int thread_status_size = 0;
elf_addr_t *auxv;
+ unsigned long mm_flags;

/*
* We no longer stop all VM operations.
@@ -1638,6 +1643,13 @@ static int elf_core_dump(long signr, str

dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

+ /*
+ * We must use the same mm->flags while dumping core to avoid
+ * inconsistency between the program headers and bodies, otherwise an
+ * unusable core file can be generated.
+ */
+ mm_flags = current->mm->flags;
+
/* Write program headers for segments dump */
for (vma = first_vma(current, gate_vma); vma != NULL;
vma = next_vma(vma, gate_vma)) {
@@ -1650,7 +1662,7 @@ static int elf_core_dump(long signr, str
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = maydump(vma) ? sz : 0;
+ phdr.p_filesz = maydump(vma, mm_flags) ? sz : 0;
phdr.p_memsz = sz;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -1693,7 +1705,7 @@ static int elf_core_dump(long signr, str
vma = next_vma(vma, gate_vma)) {
unsigned long addr;

- if (!maydump(vma))
+ if (!maydump(vma, mm_flags))
continue;

for (addr = vma->vm_start;


2007-05-25 13:11:15

by Hidehiro Kawai

Subject: [PATCH 5/7] ELF-FDPIC: remove an unused argument

This patch removes an unused argument from elf_fdpic_dump_segments().

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/binfmt_elf_fdpic.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6.22-rc2-mm1/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/fs/binfmt_elf_fdpic.c
+++ linux-2.6.22-rc2-mm1/fs/binfmt_elf_fdpic.c
@@ -1456,8 +1456,8 @@ static int elf_dump_thread_status(long s
* dump the segments for an MMU process
*/
#ifdef CONFIG_MMU
-static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
- size_t *size, unsigned long *limit)
+static int elf_fdpic_dump_segments(struct file *file, size_t *size,
+ unsigned long *limit)
{
struct vm_area_struct *vma;

@@ -1511,8 +1511,8 @@ end_coredump:
* dump the segments for a NOMMU process
*/
#ifndef CONFIG_MMU
-static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
- size_t *size, unsigned long *limit)
+static int elf_fdpic_dump_segments(struct file *file, size_t *size,
+ unsigned long *limit)
{
struct vm_list_struct *vml;



2007-05-25 13:12:14

by Hidehiro Kawai

Subject: [PATCH 6/7] ELF-FDPIC: enable core dump filtering

This patch enables core dump filtering for ELF-FDPIC-formatted
core files.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/binfmt_elf_fdpic.c | 52 ++++++++++++++++++++++++++--------------
1 files changed, 35 insertions(+), 17 deletions(-)

Index: linux-2.6.22-rc2-mm1/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6.22-rc2-mm1.orig/fs/binfmt_elf_fdpic.c
+++ linux-2.6.22-rc2-mm1/fs/binfmt_elf_fdpic.c
@@ -1181,8 +1181,10 @@ static int dump_seek(struct file *file,
*
* I think we should skip something. But I am not sure how. H.J.
*/
-static int maydump(struct vm_area_struct *vma)
+static int maydump(struct vm_area_struct *vma, unsigned long mm_flags)
{
+ int dump_ok;
+
/* Do not dump I/O mapped devices or special mappings */
if (vma->vm_flags & (VM_IO | VM_RESERVED)) {
kdcore("%08lx: %08lx: no (IO)", vma->vm_start, vma->vm_flags);
@@ -1197,27 +1199,35 @@ static int maydump(struct vm_area_struct
return 0;
}

- /* Dump shared memory only if mapped from an anonymous file. */
+ /* By default, dump shared memory if mapped from an anonymous file. */
if (vma->vm_flags & VM_SHARED) {
if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0) {
- kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
- return 1;
+ dump_ok = test_bit(MMF_DUMP_ANON_SHARED, &mm_flags);
+ kdcore("%08lx: %08lx: %s (share)", vma->vm_start,
+ vma->vm_flags, dump_ok ? "yes" : "no");
+ return dump_ok;
}

- kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
- return 0;
+ dump_ok = test_bit(MMF_DUMP_MAPPED_SHARED, &mm_flags);
+ kdcore("%08lx: %08lx: %s (share)", vma->vm_start,
+ vma->vm_flags, dump_ok ? "yes" : "no");
+ return dump_ok;
}

#ifdef CONFIG_MMU
- /* If it hasn't been written to, don't write it out */
+ /* By default, if it hasn't been written to, don't write it out */
if (!vma->anon_vma) {
- kdcore("%08lx: %08lx: no (!anon)", vma->vm_start, vma->vm_flags);
- return 0;
+ dump_ok = test_bit(MMF_DUMP_MAPPED_PRIVATE, &mm_flags);
+ kdcore("%08lx: %08lx: %s (!anon)", vma->vm_start,
+ vma->vm_flags, dump_ok ? "yes" : "no");
+ return dump_ok;
}
#endif

- kdcore("%08lx: %08lx: yes", vma->vm_start, vma->vm_flags);
- return 1;
+ dump_ok = test_bit(MMF_DUMP_ANON_PRIVATE, &mm_flags);
+ kdcore("%08lx: %08lx: %s", vma->vm_start, vma->vm_flags,
+ dump_ok ? "yes" : "no");
+ return dump_ok;
}

/* An ELF note in memory */
@@ -1457,14 +1467,14 @@ static int elf_dump_thread_status(long s
*/
#ifdef CONFIG_MMU
static int elf_fdpic_dump_segments(struct file *file, size_t *size,
- unsigned long *limit)
+ unsigned long *limit, unsigned long mm_flags)
{
struct vm_area_struct *vma;

for (vma = current->mm->mmap; vma; vma = vma->vm_next) {
unsigned long addr;

- if (!maydump(vma))
+ if (!maydump(vma, mm_flags))
continue;

for (addr = vma->vm_start;
@@ -1512,14 +1522,14 @@ end_coredump:
*/
#ifndef CONFIG_MMU
static int elf_fdpic_dump_segments(struct file *file, size_t *size,
- unsigned long *limit)
+ unsigned long *limit, unsigned long mm_flags)
{
struct vm_list_struct *vml;

for (vml = current->mm->context.vmlist; vml; vml = vml->next) {
struct vm_area_struct *vma = vml->vma;

- if (!maydump(vma))
+ if (!maydump(vma, mm_flags))
continue;

if ((*size += PAGE_SIZE) > *limit)
@@ -1570,6 +1580,7 @@ static int elf_fdpic_core_dump(long sign
struct vm_list_struct *vml;
#endif
elf_addr_t *auxv;
+ unsigned long mm_flags;

/*
* We no longer stop all VM operations.
@@ -1707,6 +1718,13 @@ static int elf_fdpic_core_dump(long sign
/* Page-align dumped data */
dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

+ /*
+ * We must use the same mm->flags while dumping core to avoid
+ * inconsistency between the program headers and bodies, otherwise an
+ * unusable core file can be generated.
+ */
+ mm_flags = current->mm->flags;
+
/* write program headers for segments dump */
for (
#ifdef CONFIG_MMU
@@ -1728,7 +1746,7 @@ static int elf_fdpic_core_dump(long sign
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = maydump(vma) ? sz : 0;
+ phdr.p_filesz = maydump(vma, mm_flags) ? sz : 0;
phdr.p_memsz = sz;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -1762,7 +1780,7 @@ static int elf_fdpic_core_dump(long sign

DUMP_SEEK(dataoff);

- if (elf_fdpic_dump_segments(file, current->mm, &size, &limit) < 0)
+ if (elf_fdpic_dump_segments(file, &size, &limit, mm_flags) < 0)
goto end_coredump;

#ifdef ELF_CORE_WRITE_EXTRA_DATA


2007-05-25 13:13:21

by Hidehiro Kawai

Subject: [PATCH 7/7] documentation for /proc/pid/coredump_filter

This patch adds the documentation for /proc/<pid>/coredump_filter.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
Documentation/filesystems/proc.txt | 38 +++++++++++++++++++++++++++
1 files changed, 38 insertions(+)

Index: linux-2.6.22-rc2-mm1/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.22-rc2-mm1.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.22-rc2-mm1/Documentation/filesystems/proc.txt
@@ -42,6 +42,7 @@ Table of Contents
2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
2.13 /proc/<pid>/oom_score - Display current oom-killer score
2.14 /proc/<pid>/io - Display the IO accounting fields
+ 2.15 /proc/<pid>/coredump_filter - Core dump filtering settings

------------------------------------------------------------------------------
Preface
@@ -2135,4 +2136,41 @@ those 64-bit counters, process A could s
More information about this can be found within the taskstats documentation in
Documentation/accounting.

+2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
+---------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory. Conversely,
+sometimes we wnat to save file-backed memory segments into a core file, not
+only the individual files.
+
+/proc/<pid>/coredump_filter allows you to customize which memory segments
+will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
+of memory types. If a bit of the bitmask is set, memory segments of the
+corresponding memory type are dumped, otherwise they are not dumped.
+
+The following 4 memory types are supported:
+ - (bit 0) anonymous private memory
+ - (bit 1) anonymous shared memory
+ - (bit 2) file-backed private memory
+ - (bit 3) file-backed shared memory
+
+ Note that MMIO pages such as frame buffer are never dumped and vDSO pages
+ are always dumped regardless of the bitmask status.
+
+Default value of coredump_filter is 0x3; this means all anonymous memory
+segments are dumped.
+
+If you don't want to dump all shared memory segments attached to pid 1234,
+write 1 to the process's proc file.
+
+ $ echo 0x1 > /proc/1234/coredump_filter
+
+When a new process is created, the process inherits the bitmask status from its
+parent. It is useful to set up coredump_filter before the program runs.
+For example:
+
+ $ echo 0x7 > /proc/self/coredump_filter
+ $ ./some_program
+
------------------------------------------------------------------------------


2007-05-28 01:13:17

by Randy Dunlap

Subject: Re: [PATCH 7/7] documentation for /proc/pid/coredump_filter

On Fri, 25 May 2007 22:12:55 +0900 Kawai, Hidehiro wrote:

> This patch adds the documentation for /proc/<pid>/coredump_filter.
>
> Signed-off-by: Hidehiro Kawai <[email protected]>
> ---
> Documentation/filesystems/proc.txt | 38 +++++++++++++++++++++++++++
> 1 files changed, 38 insertions(+)
>
> Index: linux-2.6.22-rc2-mm1/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-2.6.22-rc2-mm1.orig/Documentation/filesystems/proc.txt
> +++ linux-2.6.22-rc2-mm1/Documentation/filesystems/proc.txt
> @@ -42,6 +42,7 @@ Table of Contents
> 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
> 2.13 /proc/<pid>/oom_score - Display current oom-killer score
> 2.14 /proc/<pid>/io - Display the IO accounting fields
> + 2.15 /proc/<pid>/coredump_filter - Core dump filtering settings

Looks good. Just one typo below.

------------------------------------------------------------------------------
> Preface
> @@ -2135,4 +2136,41 @@ those 64-bit counters, process A could s
> More information about this can be found within the taskstats documentation in
> Documentation/accounting.
>
> +2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
> +---------------------------------------------------------------
> +When a process is dumped, all anonymous memory is written to a core file as
> +long as the size of the core file isn't limited. But sometimes we don't want
> +to dump some memory segments, for example, huge shared memory. Conversely,
> +sometimes we wnat to save file-backed memory segments into a core file, not

want

> +only the individual files.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-05-28 11:43:51

by Hidehiro Kawai

Subject: Re: [PATCH 7/7] documentation for /proc/pid/coredump_filter

Hi Randy,

Randy Dunlap wrote:
> Looks good. Just one typo below.
>
> ------------------------------------------------------------------------------
>
>> Preface
>>@@ -2135,4 +2136,41 @@ those 64-bit counters, process A could s
>> More information about this can be found within the taskstats documentation in
>> Documentation/accounting.
>>
>>+2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
>>+---------------------------------------------------------------
>>+When a process is dumped, all anonymous memory is written to a core file as
>>+long as the size of the core file isn't limited. But sometimes we don't want
>>+to dump some memory segments, for example, huge shared memory. Conversely,
>>+sometimes we wnat to save file-backed memory segments into a core file, not
>
> want

Thank you for your review. I attached the fixed patch.

Best regards,
--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


Signed-off-by: Hidehiro Kawai <[email protected]>
---
Documentation/filesystems/proc.txt | 38 +++++++++++++++++++++++++++
1 files changed, 38 insertions(+)

Index: linux-2.6.22-rc2-mm1/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.22-rc2-mm1.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.22-rc2-mm1/Documentation/filesystems/proc.txt
@@ -42,6 +42,7 @@ Table of Contents
2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
2.13 /proc/<pid>/oom_score - Display current oom-killer score
2.14 /proc/<pid>/io - Display the IO accounting fields
+ 2.15 /proc/<pid>/coredump_filter - Core dump filtering settings

------------------------------------------------------------------------------
Preface
@@ -2135,4 +2136,41 @@ those 64-bit counters, process A could s
More information about this can be found within the taskstats documentation in
Documentation/accounting.

+2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
+---------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory. Conversely,
+sometimes we want to save file-backed memory segments into a core file, not
+only the individual files.
+
+/proc/<pid>/coredump_filter allows you to customize which memory segments
+will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
+of memory types. If a bit of the bitmask is set, memory segments of the
+corresponding memory type are dumped, otherwise they are not dumped.
+
+The following 4 memory types are supported:
+ - (bit 0) anonymous private memory
+ - (bit 1) anonymous shared memory
+ - (bit 2) file-backed private memory
+ - (bit 3) file-backed shared memory
+
+ Note that MMIO pages such as frame buffer are never dumped and vDSO pages
+ are always dumped regardless of the bitmask status.
+
+The default value of coredump_filter is 0x3; this means all anonymous memory
+segments are dumped.
+
+If you don't want to dump any shared memory segments attached to pid 1234,
+write 0x1 to the process's coredump_filter file, leaving only bit 0 set.
+
+ $ echo 0x1 > /proc/1234/coredump_filter
+
+When a new process is created, the process inherits the bitmask status from its
+parent. It is useful to set up coredump_filter before the program runs.
+For example:
+
+ $ echo 0x7 > /proc/self/coredump_filter
+ $ ./some_program
+
------------------------------------------------------------------------------
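
For readers who want to check what a given coredump_filter value means, the
following is a minimal sketch that decodes a value into the four memory types
listed in the documentation above. The function name decode_filter is
hypothetical and not part of the patch; the sketch only interprets a number
and does not read or write any /proc file.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch series: decode a
# coredump_filter value using the bit assignments documented above.
decode_filter() {
    mask=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    i=0
    for name in "anonymous private" "anonymous shared" \
                "file-backed private" "file-backed shared"; do
        if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
            echo "bit $i: $name: dumped"
        else
            echo "bit $i: $name: not dumped"
        fi
        i=$(( i + 1 ))
    done
}

# Decode the documented default value.
decode_filter 0x3
```

With the default 0x3, the two anonymous types are reported as dumped and the
two file-backed types as not dumped, which matches the statement that all
anonymous memory segments are dumped by default.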