2013-03-08 18:00:50

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH -mm 0/3] coredump: signal_pending() checks and cleanups

Hello.

To remind, this replaces the previous series,

coredump-factor-out-the-setting-of-pf_dumpcore.patch
freezer-do-not-send-a-fake-signal-to-a-pf_dumpcore-thread.patch
coredump-make-wait_for_dump_helpers-freezable.patch

As Mandeep pointed out, 2/3 is not enough to make the coredump really
freezable.

By discussion with Mandeep, we simply accept the fact that the freezer
can truncate a core-dump, hopefully not a problem in practice.

2 and 3 become the "off-topic" cleanups but imho make sense anyway and
can help if we decide to make the coredumping freezable.

Mandeep, I didn't dare to keep your ack on 3/3, the patch was updated
a bit (s/freezable/interruptible + comments), I hope you can ack v2 too.
And of course I hope you will review 1 and 2 as well.

Oleg.


2013-03-08 18:01:09

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/3] coredump: introduce dump_interrupted()

By discussion with Mandeep Singh Baines <[email protected]>.

Change dump_write(), dump_seek() and do_coredump() to check
signal_pending() and abort if it is true.

We add the new trivial helper, dump_interrupted(), to document that
this probably needs more work and to simplify the potential freezer
changes. Perhaps it will have more callers.

Ideally it should do try_to_freeze() but then we need the unpleasant
changes in dump_write() and wait_for_dump_helpers(). So far we simply
accept the fact that the freezer can truncate a core-dump but at least
you can reliably suspend.

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/coredump.c | 20 +++++++++++++++++---
1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 5503d94..66f65f0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -418,6 +418,17 @@ static void coredump_finish(struct mm_struct *mm, bool core_dumped)
mm->core_state = NULL;
}

+static bool dump_interrupted(void)
+{
+ /*
+ * SIGKILL or freezing() interrupt the coredumping. Perhaps we
+ * can do try_to_freeze() and check __fatal_signal_pending(),
+ * but then we need to teach dump_write() to restart and clear
+ * TIF_SIGPENDING.
+ */
+ return signal_pending(current);
+}
+
static void wait_for_dump_helpers(struct file *file)
{
struct pipe_inode_info *pipe;
@@ -636,7 +647,7 @@ void do_coredump(siginfo_t *siginfo)
if (displaced)
put_files_struct(displaced);

- core_dumped = binfmt->core_dump(&cprm);
+ core_dumped = !dump_interrupted() && binfmt->core_dump(&cprm);

if (ispipe && core_pipe_limit)
wait_for_dump_helpers(cprm.file);
@@ -664,7 +675,9 @@ fail:
*/
int dump_write(struct file *file, const void *addr, int nr)
{
- return access_ok(VERIFY_READ, addr, nr) && file->f_op->write(file, addr, nr, &file->f_pos) == nr;
+ return !dump_interrupted() &&
+ access_ok(VERIFY_READ, addr, nr) &&
+ file->f_op->write(file, addr, nr, &file->f_pos) == nr;
}
EXPORT_SYMBOL(dump_write);

@@ -673,7 +686,8 @@ int dump_seek(struct file *file, loff_t off)
int ret = 1;

if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
- if (file->f_op->llseek(file, off, SEEK_CUR) < 0)
+ if (dump_interrupted() ||
+ file->f_op->llseek(file, off, SEEK_CUR) < 0)
return 0;
} else {
char *buf = (char *)get_zeroed_page(GFP_KERNEL);
--
1.5.5.1

2013-03-08 18:01:15

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 2/3] coredump: factor out the setting of PF_DUMPCORE

Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE,
move this into zap_threads() called by do_coredump().

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/ia32/ia32_aout.c | 1 -
fs/binfmt_aout.c | 1 -
fs/binfmt_elf.c | 3 +--
fs/binfmt_elf_fdpic.c | 2 --
fs/coredump.c | 1 +
5 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index a703af1..24b787c 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -162,7 +162,6 @@ static int aout_core_dump(long signr, struct pt_regs *regs, struct file *file,
fs = get_fs();
set_fs(KERNEL_DS);
has_dumped = 1;
- current->flags |= PF_DUMPCORE;
strncpy(dump.u_comm, current->comm, sizeof(current->comm));
dump.u_ar0 = offsetof(struct user32, regs);
dump.signal = signr;
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 6043567..f70aea2 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -62,7 +62,6 @@ static int aout_core_dump(struct coredump_params *cprm)
fs = get_fs();
set_fs(KERNEL_DS);
has_dumped = 1;
- current->flags |= PF_DUMPCORE;
strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm));
dump.u_ar0 = offsetof(struct user, regs);
dump.signal = cprm->siginfo->si_signo;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 0c42cdb..1f52be1 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2082,8 +2082,7 @@ static int elf_core_dump(struct coredump_params *cprm)
goto cleanup;

has_dumped = 1;
- current->flags |= PF_DUMPCORE;
-
+
fs = get_fs();
set_fs(KERNEL_DS);

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index dc84732..8f2c03d 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1684,8 +1684,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
fill_elf_fdpic_header(elf, e_phnum);

has_dumped = 1;
- current->flags |= PF_DUMPCORE;
-
/*
* Set up the notes in similar form to SVR4 core dumps made
* with info from their /proc.
diff --git a/fs/coredump.c b/fs/coredump.c
index 66f65f0..477f393 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -299,6 +299,7 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
if (unlikely(nr < 0))
return nr;

+ tsk->flags = PF_DUMPCORE;
if (atomic_read(&mm->mm_users) == nr + 1)
goto done;
/*
--
1.5.5.1

2013-03-08 18:01:19

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 3/3] coredump: change wait_for_dump_helpers() to use wait_event_interruptible()

wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
wait_event-like loop. This is not needed and in fact this is not
strictly correct, we can/should do this only once after we change
pipe->writers. We could even check if it becomes zero.

Change this code to use use wait_event_interruptible(), this can
also help to make this wait freezable.

With this patch we check pipe->readers without pipe_lock(), this
is fine. Once we see pipe->readers == 1 we know that the handler
decremented the counter, this is all we need.

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/coredump.c | 15 +++++++++------
1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 477f393..667413c 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -439,17 +439,20 @@ static void wait_for_dump_helpers(struct file *file)
pipe_lock(pipe);
pipe->readers++;
pipe->writers--;
+ wake_up_interruptible_sync(&pipe->wait);
+ kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
+ pipe_unlock(pipe);

- while ((pipe->readers > 1) && (!signal_pending(current))) {
- wake_up_interruptible_sync(&pipe->wait);
- kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
- pipe_wait(pipe);
- }
+ /*
+ * We actually want wait_event_freezable() but then we need
+ * to clear TIF_SIGPENDING and improve dump_interrupted().
+ */
+ wait_event_interruptible(pipe->wait, pipe->readers == 1);

+ pipe_lock(pipe);
pipe->readers--;
pipe->writers++;
pipe_unlock(pipe);
-
}

/*
--
1.5.5.1

2013-03-08 20:21:09

by Mandeep Singh Baines

[permalink] [raw]
Subject: Re: [PATCH 2/3] coredump: factor out the setting of PF_DUMPCORE

On Fri, Mar 8, 2013 at 9:59 AM, Oleg Nesterov <[email protected]> wrote:
> Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE,
> move this into zap_threads() called by do_coredump().
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Mandeep Singh Baines <[email protected]>

> ---
> arch/x86/ia32/ia32_aout.c | 1 -
> fs/binfmt_aout.c | 1 -
> fs/binfmt_elf.c | 3 +--
> fs/binfmt_elf_fdpic.c | 2 --
> fs/coredump.c | 1 +
> 5 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
> index a703af1..24b787c 100644
> --- a/arch/x86/ia32/ia32_aout.c
> +++ b/arch/x86/ia32/ia32_aout.c
> @@ -162,7 +162,6 @@ static int aout_core_dump(long signr, struct pt_regs *regs, struct file *file,
> fs = get_fs();
> set_fs(KERNEL_DS);
> has_dumped = 1;
> - current->flags |= PF_DUMPCORE;
> strncpy(dump.u_comm, current->comm, sizeof(current->comm));
> dump.u_ar0 = offsetof(struct user32, regs);
> dump.signal = signr;
> diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
> index 6043567..f70aea2 100644
> --- a/fs/binfmt_aout.c
> +++ b/fs/binfmt_aout.c
> @@ -62,7 +62,6 @@ static int aout_core_dump(struct coredump_params *cprm)
> fs = get_fs();
> set_fs(KERNEL_DS);
> has_dumped = 1;
> - current->flags |= PF_DUMPCORE;
> strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm));
> dump.u_ar0 = offsetof(struct user, regs);
> dump.signal = cprm->siginfo->si_signo;
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 0c42cdb..1f52be1 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -2082,8 +2082,7 @@ static int elf_core_dump(struct coredump_params *cprm)
> goto cleanup;
>
> has_dumped = 1;
> - current->flags |= PF_DUMPCORE;
> -
> +
> fs = get_fs();
> set_fs(KERNEL_DS);
>
> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
> index dc84732..8f2c03d 100644
> --- a/fs/binfmt_elf_fdpic.c
> +++ b/fs/binfmt_elf_fdpic.c
> @@ -1684,8 +1684,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
> fill_elf_fdpic_header(elf, e_phnum);
>
> has_dumped = 1;
> - current->flags |= PF_DUMPCORE;
> -
> /*
> * Set up the notes in similar form to SVR4 core dumps made
> * with info from their /proc.
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 66f65f0..477f393 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -299,6 +299,7 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
> if (unlikely(nr < 0))
> return nr;
>
> + tsk->flags = PF_DUMPCORE;
> if (atomic_read(&mm->mm_users) == nr + 1)
> goto done;
> /*
> --
> 1.5.5.1
>

2013-03-08 20:22:32

by Mandeep Singh Baines

[permalink] [raw]
Subject: Re: [PATCH 3/3] coredump: change wait_for_dump_helpers() to use wait_event_interruptible()

On Fri, Mar 8, 2013 at 9:59 AM, Oleg Nesterov <[email protected]> wrote:
> wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
> wait_event-like loop. This is not needed and in fact this is not
> strictly correct, we can/should do this only once after we change
> pipe->writers. We could even check if it becomes zero.
>
> Change this code to use use wait_event_interruptible(), this can
> also help to make this wait freezable.
>
> With this patch we check pipe->readers without pipe_lock(), this
> is fine. Once we see pipe->readers == 1 we know that the handler
> decremented the counter, this is all we need.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Mandeep Singh Baines <[email protected]>

> ---
> fs/coredump.c | 15 +++++++++------
> 1 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 477f393..667413c 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -439,17 +439,20 @@ static void wait_for_dump_helpers(struct file *file)
> pipe_lock(pipe);
> pipe->readers++;
> pipe->writers--;
> + wake_up_interruptible_sync(&pipe->wait);
> + kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> + pipe_unlock(pipe);
>
> - while ((pipe->readers > 1) && (!signal_pending(current))) {
> - wake_up_interruptible_sync(&pipe->wait);
> - kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> - pipe_wait(pipe);
> - }
> + /*
> + * We actually want wait_event_freezable() but then we need
> + * to clear TIF_SIGPENDING and improve dump_interrupted().
> + */
> + wait_event_interruptible(pipe->wait, pipe->readers == 1);
>
> + pipe_lock(pipe);
> pipe->readers--;
> pipe->writers++;
> pipe_unlock(pipe);
> -
> }
>
> /*
> --
> 1.5.5.1
>

2013-03-08 20:54:33

by Mandeep Singh Baines

[permalink] [raw]
Subject: Re: [PATCH 1/3] coredump: introduce dump_interrupted()

On Fri, Mar 8, 2013 at 9:59 AM, Oleg Nesterov <[email protected]> wrote:
> By discussion with Mandeep Singh Baines <[email protected]>.
>
> Change dump_write(), dump_seek() and do_coredump() to check
> signal_pending() and abort if it is true.
>
> We add the new trivial helper, dump_interrupted(), to document that
> this probably needs more work and to simplify the potential freezer
> changes. Perhaps it will have more callers.
>
> Ideally it should do try_to_freeze() but then we need the unpleasant
> changes in dump_write() and wait_for_dump_helpers(). So far we simply
> accept the fact that the freezer can truncate a core-dump but at least
> you can reliably suspend.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Mandeep Singh Baines <[email protected]>

dump_write aborts anyway in the pipe case. pipe_wait is interruptible
and should return -ERESTARTSYS if there is a signal pending.

But I guess there is no signal pending check in the disk write path.
So this allows you to bail out early and unblock suspend instead of
trying to write out all the vmas to a slow disk.

You may want to consider just checking dump_interrupted() at the very
top of the functions instead.

Regards,
Mandeep

> ---
> fs/coredump.c | 20 +++++++++++++++++---
> 1 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 5503d94..66f65f0 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -418,6 +418,17 @@ static void coredump_finish(struct mm_struct *mm, bool core_dumped)
> mm->core_state = NULL;
> }
>
> +static bool dump_interrupted(void)
> +{
> + /*
> + * SIGKILL or freezing() interrupt the coredumping. Perhaps we
> + * can do try_to_freeze() and check __fatal_signal_pending(),
> + * but then we need to teach dump_write() to restart and clear
> + * TIF_SIGPENDING.
> + */
> + return signal_pending(current);
> +}
> +
> static void wait_for_dump_helpers(struct file *file)
> {
> struct pipe_inode_info *pipe;
> @@ -636,7 +647,7 @@ void do_coredump(siginfo_t *siginfo)
> if (displaced)
> put_files_struct(displaced);
>
> - core_dumped = binfmt->core_dump(&cprm);
> + core_dumped = !dump_interrupted() && binfmt->core_dump(&cprm);
>
> if (ispipe && core_pipe_limit)
> wait_for_dump_helpers(cprm.file);
> @@ -664,7 +675,9 @@ fail:
> */
> int dump_write(struct file *file, const void *addr, int nr)
> {
> - return access_ok(VERIFY_READ, addr, nr) && file->f_op->write(file, addr, nr, &file->f_pos) == nr;
> + return !dump_interrupted() &&
> + access_ok(VERIFY_READ, addr, nr) &&
> + file->f_op->write(file, addr, nr, &file->f_pos) == nr;
> }
> EXPORT_SYMBOL(dump_write);
>
> @@ -673,7 +686,8 @@ int dump_seek(struct file *file, loff_t off)
> int ret = 1;
>
> if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
> - if (file->f_op->llseek(file, off, SEEK_CUR) < 0)
> + if (dump_interrupted() ||
> + file->f_op->llseek(file, off, SEEK_CUR) < 0)
> return 0;
> } else {
> char *buf = (char *)get_zeroed_page(GFP_KERNEL);
> --
> 1.5.5.1
>

2013-03-08 21:20:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/3] coredump: introduce dump_interrupted()

On Fri, 8 Mar 2013 18:59:15 +0100 Oleg Nesterov <[email protected]> wrote:

> Change dump_write(), dump_seek() and do_coredump() to check
> signal_pending() and abort if it is true.

hm, why.

I think we're missing some context here - this is to support freezing,
yes? There's some undescribed interaction between the freezer and the
core-dumper which is being fixed?

IOW, can we please have the high-level overview of what this patchset
is trying to achieve?



An example of why this is needed: the dump_interrupted() check which
was added to dump_seek() is just weird. An lseek is instantaneous, so
why do we need to bother checking for signals there if the caller will
be checking one microsecond later anyway??

And if the file doesn't support lseek (do such files exist? should we
be returning 0 instead of -ENOMEM?), we just sit there in a loop
extending the file with write(). This can take *ages*, but this part
of dump_seek() *didn't* get the signal check!

So it makes no sense at all. If we had that what-Oleg-is-trying-to-do
text then perhaps others could understand all of this?

> We add the new trivial helper, dump_interrupted(), to document that
> this probably needs more work and to simplify the potential freezer
> changes. Perhaps it will have more callers.
>
> Ideally it should do try_to_freeze() but then we need the unpleasant
> changes in dump_write() and wait_for_dump_helpers(). So far we simply
> accept the fact that the freezer can truncate a core-dump but at least
> you can reliably suspend.

OK, so there is some connection between this and suspending. Details,
please...

2013-03-09 18:50:15

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/3] coredump: introduce dump_interrupted()

On 03/08, Mandeep Singh Baines wrote:
>
> On Fri, Mar 8, 2013 at 9:59 AM, Oleg Nesterov <[email protected]> wrote:
> > By discussion with Mandeep Singh Baines <[email protected]>.
> >
> > Change dump_write(), dump_seek() and do_coredump() to check
> > signal_pending() and abort if it is true.
> >
> > We add the new trivial helper, dump_interrupted(), to document that
> > this probably needs more work and to simplify the potential freezer
> > changes. Perhaps it will have more callers.
> >
> > Ideally it should do try_to_freeze() but then we need the unpleasant
> > changes in dump_write() and wait_for_dump_helpers(). So far we simply
> > accept the fact that the freezer can truncate a core-dump but at least
> > you can reliably suspend.
> >
> > Signed-off-by: Oleg Nesterov <[email protected]>
>
> Acked-by: Mandeep Singh Baines <[email protected]>

Thanks!

> dump_write aborts anyway in the pipe case. pipe_wait is interruptible
> and should return -ERESTARTSYS if there is a signal pending.

Yes,

> But I guess there is no signal pending check in the disk write path.
> So this allows you to bail out early and unblock suspend instead of
> trying to write out all the vmas to a slow disk.

Exactly.

And, please do not forget, we want to bail out even without suspend.
Someone told me they had to reboot the machine because they could not
interrupt the huge coredump on the slow media.

> You may want to consider just checking dump_interrupted() at the very
> top of the functions instead.

What do you mean?

dump_write/seek check it at the start. Probably dump_seek() should do
this unconditionally (with this patch the non-llseek case relies on
dump_write), I am fine either way.

Oleg.

2013-03-09 19:18:37

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/3] coredump: introduce dump_interrupted()

On 03/08, Andrew Morton wrote:
>
> On Fri, 8 Mar 2013 18:59:15 +0100 Oleg Nesterov <[email protected]> wrote:
>
> > Change dump_write(), dump_seek() and do_coredump() to check
> > signal_pending() and abort if it is true.
>
> hm, why.

Firstly. we need these changes to ensure that the coredump won't delay
suspend, and to ensure it reacts to SIGKILL "quickly enough". A core
dump can take a lot of time.

> I think we're missing some context here - this is to support freezing,
> yes?

No. This is to document that

- currently we do not support freezing

- why we do not support, and what should we do to support
(the comments in dump_interrupted/wait_for_dump_helpers)

If do_coredump() "races" with suspend/etc we simply abort, hopefully
this is fine in practice. And even if we decide to change this later,
I hope this series can be counted as a preparation.

> An example of why this is needed: the dump_interrupted() check which
> was added to dump_seek() is just weird. An lseek is instantaneous,
^^^^^^^^^^^^^
Oh, I simply do not know, this can depend on the filesystem?

> And if the file doesn't support lseek (do such files exist? should we
> be returning 0 instead of -ENOMEM?),

(can't comment, I do not know)

> we just sit there in a loop
> extending the file with write(). This can take *ages*, but this part
> of dump_seek() *didn't* get the signal check!

The loop does dump_write() which checks dump_interrupted() at the start.

> > Ideally it should do try_to_freeze() but then we need the unpleasant
> > changes in dump_write() and wait_for_dump_helpers(). So far we simply
> > accept the fact that the freezer can truncate a core-dump but at least
> > you can reliably suspend.
>
> OK, so there is some connection between this and suspending. Details,
> please...

It is not trivial to change dump_write() to restart if f_op->write()
fails because of freezing(). We need to handle the short writes, we need
to clear TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless
we change it to check PF_DUMPCORE), and somehow we need to avoid the
races with freeze_task + __thaw_task.

Everything looks possible but imho doesn't worth a trouble, a coredump
truncated by freezer is tolerable. I hope. And again, even if we decide
to "fix" this problem we can do this on top of these changes.

Oleg.