2009-06-01 16:18:33

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 05/31, Alan Cox wrote:
>
> On Sun, 31 May 2009 01:33:39 -0400
> Paul Smith <[email protected]> wrote:
>
> > coredump: Retry writes where appropriate
> >
> > Core dump write operations (especially to a pipe) can be incomplete due
> > to signal reception or possibly recoverable partial writes.
>
> NAK this
>
> > Previously any incomplete write in the ELF core dumper caused the core
> > dump to stop, giving short cores in these cases. Modify the core dumper
> > to retry the write where appropriate.
>
> The existing behaviour is an absolute godsend when you've something like
> a core dump stuck on an NFS mount or something trying to core dump to
> very slow media.

I agree, we should make the coredumping interruptible.

But I don't know which signal should intterrupt. At least SIGKILL should,
I think. As for other unhandled sig_fatal() signals, I am nor sure.
I can make a patch, but first I need to know what this patch should do.
Again, please look at:

killable/interruptible coredumps
http://marc.info/?l=linux-kernel&m=121665710711931


And we should not change dump_write(), we should create the new helper
which can be used by all fs/binfmt_*.c.

And of course, the coredumping thread should not play with ->blocked
or ->sighand->action. This is not even needed.

Oleg.


2009-06-01 16:43:01

by Alan

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> I think. As for other unhandled sig_fatal() signals, I am nor sure.
> I can make a patch, but first I need to know what this patch should do.
> Again, please look at:

If you are on the command line then SIGINT/SIGQUIT would be the obvious
ones for this ?

2009-06-01 17:17:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/01, Alan Cox wrote:
>
> > I think. As for other unhandled sig_fatal() signals, I am nor sure.
> > I can make a patch, but first I need to know what this patch should do.
> > Again, please look at:
>
> If you are on the command line then SIGINT/SIGQUIT would be the obvious
> ones for this ?

Not sure I understand. Do you mean we should treat the tty signals
specially ?

Personally, I don't think we should. If we decide that only SIGKILL
interrupts the coredumping, then I think ^C should not interrupt.

Oleg.

2009-06-01 17:37:54

by Paul Smith

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Mon, 2009-06-01 at 18:12 +0200, Oleg Nesterov wrote:
> I agree, we should make the coredumping interruptible.
>
> But I don't know which signal should intterrupt. At least SIGKILL should,
> I think. As for other unhandled sig_fatal() signals, I am nor sure.
> I can make a patch, but first I need to know what this patch should do.
> Again, please look at:
>
> killable/interruptible coredumps
> http://marc.info/?l=linux-kernel&m=121665710711931

Ideally interrupting a core dump is something that should only ever be
done because you wanted to actually interrupt the core, and would never
happen as a side effect of some other behavior that may have been
intended for the process before it dumped core.

In my setup, which is more like an embedded system where I can reboot
the device easily, I'd rather have the core dump hang if I happen to try
to write it to an unavailable resource than to lose cores if someone
sends an errant signal to the process (assuming these are the only two
choices). I realize that opinions about this differ based on the
purpose of the system (desktops and many types of servers probably would
rather have their pages back and don't care so much about cores).

My preference would be that no signal would ever cancel a core and there
would be some completely out-of-band method for this (something like
setting a flag via /proc/<pid> or similar) to be used if it was
necessary. However, I can't justify that complexity. So, SIGKILL seems
like a reasonable compromise.

On the other hand, IMO all other signals, including SIGINT and SIGQUIT,
should be ignored during core dumping. Allowing SIGKILL gives a method
for getting rid of core dumps in the relatively rare situation where
people want/need to do so, and I don't see any real benefit to adding
more signals to the list of things you can't do if you want robust
cores. Isn't one enough?

My $0.02.

2009-06-01 17:46:33

by Alan

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> > If you are on the command line then SIGINT/SIGQUIT would be the obvious
> > ones for this ?
>
> Not sure I understand. Do you mean we should treat the tty signals
> specially ?

Yes

> Personally, I don't think we should. If we decide that only SIGKILL
> interrupts the coredumping, then I think ^C should not interrupt.

Wait until you have a remote session over ssh that core dumps a 2GB core.
Then you'll understand why being able to ^C or ^\ it is useful.

Alan

2009-06-01 17:50:04

by Alan

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> On the other hand, IMO all other signals, including SIGINT and SIGQUIT,
> should be ignored during core dumping. Allowing SIGKILL gives a method
> for getting rid of core dumps in the relatively rare situation where
> people want/need to do so, and I don't see any real benefit to adding
> more signals to the list of things you can't do if you want robust
> cores. Isn't one enough?

I also want usability. SIGINT/SIGQUIT are never sent except by user
requests to terminate a process so they can safely be allowed. If the
alternatives are the status quo or SIGKILL only then I'd favour the
status quo particularly having experienced the alternatives on some old
Unix systems.

Alan

2009-06-01 18:28:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/01, Alan Cox wrote:
>
> > > If you are on the command line then SIGINT/SIGQUIT would be the obvious
> > > ones for this ?
> >
> > Not sure I understand. Do you mean we should treat the tty signals
> > specially ?
>
> Yes
>
> > Personally, I don't think we should. If we decide that only SIGKILL
> > interrupts the coredumping, then I think ^C should not interrupt.
>
> Wait until you have a remote session over ssh that core dumps a 2GB core.
> Then you'll understand why being able to ^C or ^\ it is useful.

Sure. But you have the same problem with

$ perl -e '$SIG{INT} = $SIG{QUIT} = IGNORE; sleep'
^C^C^C^\^\^\

over ssh.

And what if the coredumping task already has the pending SIGINT/SIGQUIT
or blocks/ignores them?

But don't get me wrong, I do agree this is useful, and this can be
implemented. But in this case, imho kill(SIGINT) should work as well,
not just ^C.


In short, I agree in advance with any authoritative decision. But
if we add more power to ^C (compared to kill), this should be a
separate patch imho.

Oleg.

2009-06-01 18:40:43

by Paul Smith

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Mon, 2009-06-01 at 18:49 +0100, Alan Cox wrote:
> > On the other hand, IMO all other signals, including SIGINT and SIGQUIT,
> > should be ignored during core dumping. Allowing SIGKILL gives a method
> > for getting rid of core dumps in the relatively rare situation where
> > people want/need to do so, and I don't see any real benefit to adding
> > more signals to the list of things you can't do if you want robust
> > cores. Isn't one enough?
>
> I also want usability. SIGINT/SIGQUIT are never sent except by user
> requests to terminate a process so they can safely be allowed. If the
> alternatives are the status quo or SIGKILL only then I'd favour the
> status quo particularly having experienced the alternatives on some old
> Unix systems.

SIGINT/SIGQUIT are sent all the time in situations where the user might
not want the core dump to be canceled. This is what I meant by "wanted
to actually interrupt the core"; it implies the user knows that a core
is being dumped and explicitly decides they do not want to have that
happen in this situation and takes some affirmative action to stop it.

If a program seems to be unresponsive the user could ^C, without
realizing that it was really dumping core. Now when they are asked to
produce the core so the problem can be debugged, they can't do it. Or,
a worker process might appear unresponsive due to a core being dumped
and the parent would decide to shoot it with SIGINT based on various
timeouts etc. Again we have no core available.

If the user has problems with coredumps there are all sorts of ways to
manage that. You can disable core dumps altogether via ulimit. You can
set core_pattern to dump to a fully-qualified pathname on faster media
instead of whatever working directory you're using.

Or, with this change, you can kill -9 the PID that's dumping core.

These things seem to me to provide a lot of usability features.

On the other hand there's no way to ensure full, reliable core dumps
with today's behavior.

2009-06-01 19:02:31

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Mon, Jun 01, 2009 at 08:02:32PM +0100, Alan Cox wrote:
> > If a program seems to be unresponsive the user could ^C, without
> > realizing that it was really dumping core. Now when they are asked to
> > produce the core so the problem can be debugged, they can't do it. Or,
>
> and get their prompt back, which is probably why they are banging ^C. If
> they didn't want their prompt back at that point they'd still be
> wondering why nothing was occuring at the point it said (core dumped)

Maybe we need a background core dump?

-Andi

--
[email protected] -- Speaking for myself only.

2009-06-01 19:02:51

by Alan

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> If a program seems to be unresponsive the user could ^C, without
> realizing that it was really dumping core. Now when they are asked to
> produce the core so the problem can be debugged, they can't do it. Or,

and get their prompt back, which is probably why they are banging ^C. If
they didn't want their prompt back at that point they'd still be
wondering why nothing was occuring at the point it said (core dumped)

Alan

2009-06-01 19:07:17

by Alan

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Mon, 1 Jun 2009 21:09:14 +0200
Andi Kleen <[email protected]> wrote:

> On Mon, Jun 01, 2009 at 08:02:32PM +0100, Alan Cox wrote:
> > > If a program seems to be unresponsive the user could ^C, without
> > > realizing that it was really dumping core. Now when they are asked to
> > > produce the core so the problem can be debugged, they can't do it. Or,
> >
> > and get their prompt back, which is probably why they are banging ^C. If
> > they didn't want their prompt back at that point they'd still be
> > wondering why nothing was occuring at the point it said (core dumped)
>
> Maybe we need a background core dump?

You can pretty much implement that via the pipe handler if you care. Just
buffer aggressively.

For the general case however programs assume that when wait() returns
indicating the core dump occurred that they can immediately access the
dump (eg bug-buddy in Gnome)

Alan

2009-06-01 19:07:53

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Mon, Jun 01, 2009 at 08:06:30PM +0100, Alan Cox wrote:
> On Mon, 1 Jun 2009 21:09:14 +0200
> Andi Kleen <[email protected]> wrote:
>
> > On Mon, Jun 01, 2009 at 08:02:32PM +0100, Alan Cox wrote:
> > > > If a program seems to be unresponsive the user could ^C, without
> > > > realizing that it was really dumping core. Now when they are asked to
> > > > produce the core so the problem can be debugged, they can't do it. Or,
> > >
> > > and get their prompt back, which is probably why they are banging ^C. If
> > > they didn't want their prompt back at that point they'd still be
> > > wondering why nothing was occuring at the point it said (core dumped)
> >
> > Maybe we need a background core dump?
>
> You can pretty much implement that via the pipe handler if you care. Just
> buffer aggressively.
>
> For the general case however programs assume that when wait() returns
> indicating the core dump occurred that they can immediately access the
> dump (eg bug-buddy in Gnome)

Then set a advisory lock on the dump... no seriously:

With full signal handling during dump they would probably get a lot of time
a partial dump at best, because it's common to set signals in a row.
I agree with Paul on that that that is unfortunate at best.

Perhaps that's something that just needs a sysctl.

-Andi

--
[email protected] -- Speaking for myself only.

2009-06-01 19:53:48

by Paul Smith

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Mon, 2009-06-01 at 20:02 +0100, Alan Cox wrote:
> > If a program seems to be unresponsive the user could ^C, without
> > realizing that it was really dumping core. Now when they are asked to
> > produce the core so the problem can be debugged, they can't do it. Or,
>
> and get their prompt back, which is probably why they are banging ^C. If
> they didn't want their prompt back at that point they'd still be
> wondering why nothing was occuring at the point it said (core dumped)

True. My concern is that non-interactive, non-user controlled processes
seem to be getting thrown out with the bathwater here in the search for
the ultimate ease-of-use for interactive users. SIGINT is not just a
user signal.

If it's interactive, can't the user ^Z (SIGSTOP) the process being
dumped, then kill -9 %1? Does SIGSTOP stop a process that's dumping
core? If this works it's not as simple as ^C, but I find myself doing
that all the time for processes which are catching SIGINT, as Oleg
points out.

Saying that SIGSTOP stops a core dump, SIGCONT continues it, SIGKILL
cancels it, and everything else is ignored would be just fine with me.

Yes, you need a shell with job control but... at some point we have to
just say it is what it is! Core dumps are not just annoying time/disk
space wasters, they have real value; a good core dump can save tens of
thousands of dollars or more in support and development costs. We need
(a way for) them to be reliable, even if it costs some interactive
ease-of-use.

Anyway, that's my opinion :-)

2009-06-01 20:26:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/01, Paul Smith wrote:
>
> If it's interactive, can't the user ^Z (SIGSTOP) the process being
> dumped,

No, this is very hard to implement.

Oleg.

2009-06-01 20:40:21

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

IMHO it would certainly be wrong to have behavior differ based on what
particular method from userland was used to generate a signal. I don't
think Alan was suggesting anything like that, just using ^C as shorthand
for what the user experience is that matters. (Alan also mentioned ^\,
i.e. SIGQUIT. I don't think it would make sense to have that interrupt
dumping--it's what you hit to request a core dump.)

I'm not really sure exactly how this has morphed over time, or how the
internal wrinkles were before that produced the observed behavior. I
suspect it was always more by happenstance than careful attention.

Any sig_fatal() non-core signal that was sent before the core dumping
actually started already prevents the dump now. The complete_signal()
code along with the signal_group_exit() check in zap_threads() should
ensure that. This is as it should be: there was no guarantee of the
order in which the signals were processed, so the effect is that the
core dump signal was swallowed by the arrival of the just-fatal signal.

Aside from that, I see the following categories for newly-arriving signals.

1. More core-dump signals. e.g., it was already crashing and you hit ^\
or maybe just hit ^\ twice with a finger delay.
2. Non-fatal signals (i.e. ones with handlers, stop signals).
3. Plain sig_fatal() non-core signals (e.g. SIGINT when not handled)
4. SIGKILL (an actual one from userland or oomkill, not group-exit)

#1 IMHO should not do anything at all.
You are asking for a core dump, it's already doing it.

#2 should not do anything at all.
It's not really possible to suspend during the core dump, so unhandled,
unblocked stop signals can't do anything either.

#4 IMHO should always stop everything immediately.
That's what SIGKILL is for. When userland generates a SIGKILL
explicitly, that says the top priority is to be gone and cease
consuming any resources ASAP.

#3 is the open question. I don't feel strongly either way.

Whatever the decision on #3, we have a problem to fix for #1 and #2 at
least anyway. These unblocked signals will cause TIF_SIGPENDING to be
set when dumping, either via recalc_sigpending() from dequeue_signal()
after the core signal is taken (more signals already pending), or via
signal_wake_up() from complete_signal() for newly-generated signals.
(PF_EXITING is not yet set to prevent it.) This spuriously prevents
interruptible waits in the fs code or the pipe code or whatnot.

That looks simple to avoid just by clobbering ->real_blocked when we
start the core dump. The dump-writing itself uses ->blocked, so that
actually won't pollute the data. The less magical way that is obvious
would be to add a SIGNAL_GROUP_DUMPING flag that we set at the beginning
of the dumping, and make recalc_sigpending_tsk/complete_signal obey that.


Thanks,
Roland

2009-06-01 21:35:40

by Alan

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> If it's interactive, can't the user ^Z (SIGSTOP)

^Z is SIGTSTP but that would be one way to implement it although I think
it might need a bit more work than just setting masks.

> Yes, you need a shell with job control but... at some point we have to
> just say it is what it is! Core dumps are not just annoying time/disk
> space wasters, they have real value; a good core dump can save tens of
> thousands of dollars or more in support and development costs. We need
> (a way for) them to be reliable, even if it costs some interactive
> ease-of-use.
>
> Anyway, that's my opinion :-)

And there we get into policy and preference rather than technical debate
- no right answers 8(

2009-06-01 22:38:23

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/01, Roland McGrath wrote:
>
> IMHO it would certainly be wrong to have behavior differ based on what
> particular method from userland was used to generate a signal.

Agreed.

> Aside from that, I see the following categories for newly-arriving signals.
>
> 1. More core-dump signals. e.g., it was already crashing and you hit ^\
> or maybe just hit ^\ twice with a finger delay.
> 2. Non-fatal signals (i.e. ones with handlers, stop signals).
> 3. Plain sig_fatal() non-core signals (e.g. SIGINT when not handled)
> 4. SIGKILL (an actual one from userland or oomkill, not group-exit)
>
> #1 IMHO should not do anything at all.
> You are asking for a core dump, it's already doing it.
>
> #2 should not do anything at all.
> It's not really possible to suspend during the core dump, so unhandled,
> unblocked stop signals can't do anything either.
>
> #4 IMHO should always stop everything immediately.
> That's what SIGKILL is for. When userland generates a SIGKILL
> explicitly, that says the top priority is to be gone and cease
> consuming any resources ASAP.

Agreed.

> #3 is the open question. I don't feel strongly either way.
>
> Whatever the decision on #3, we have a problem to fix for #1 and #2 at
> least anyway. These unblocked signals will cause TIF_SIGPENDING to be
> set when dumping, either via recalc_sigpending() from dequeue_signal()
> after the core signal is taken (more signals already pending), or via
> signal_wake_up() from complete_signal() for newly-generated signals.
> (PF_EXITING is not yet set to prevent it.) This spuriously prevents
> interruptible waits in the fs code or the pipe code or whatnot.
>
> That looks simple to avoid just by clobbering ->real_blocked when we
> start the core dump.

I don't think ->real_blocked is a good choice, we have to add more checks
to the signal sending path. Note that currenly it is only checked under
sig_fatal() && !SIGNAL_GROUP_EXIT.

Perhaps it is easier to change dump_write() to clear TIF_SIGPENDING
unless fatal_signal_pending(),

int coredump_file_write(struct file *file, const void *addr, int nr)
{
while (nr > 0) {
int res = file->f_op->write(file, addr, nr, &file->f_pos);

if (res > 0) {
nr -= res;
continue;
}

if (!signal_pending(current))
break;
if (__fatal_signal_pending(current))
break;
clear_thread_flag(TIF_SIGPENDING);
}

return !nr;
}

> The less magical way that is obvious
> would be to add a SIGNAL_GROUP_DUMPING flag that we set at the beginning
> of the dumping, and make recalc_sigpending_tsk/complete_signal obey that.

We do not need to change recalc_sigpending_tsk. For example, if we decide
that only SIGKILL interrupts the coredumping, than I think something like
the patch below should work. But I think we should change dump_write() to
handle the short write anyway?

Of course, this all assumes f_op->write() does not do recalc_sigpending().

Oleg.

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -644,6 +644,8 @@ static int prepare_signal(int sig, struc
/*
* The process is in the middle of dying, nothing to do.
*/
+ if ((signal->flags & SIGNAL_GROUP_DUMPING) && sig != SIGKILL)
+ return 0;
} else if (sig_kernel_stop(sig)) {
/*
* This is a stop signal. Remove SIGCONT from all queues.
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1537,6 +1537,8 @@ static inline int zap_threads(struct tas
if (!signal_group_exit(tsk->signal)) {
mm->core_state = core_state;
tsk->signal->group_exit_code = exit_code;
+ tsk->signal->flags |= SIGNAL_GROUP_DUMPING;
+ clear_thread_flag(TIF_SIGPENDING);
nr = zap_process(tsk);
}
spin_unlock_irq(&tsk->sighand->siglock);
@@ -1760,12 +1762,6 @@ void do_coredump(long signr, int exit_co
old_cred = override_creds(cred);

/*
- * Clear any false indication of pending signals that might
- * be seen by the filesystem code called to write the core file.
- */
- clear_thread_flag(TIF_SIGPENDING);
-
- /*
* lock_kernel() because format_corename() is controlled by sysctl, which
* uses lock_kernel()
*/

2009-06-01 23:03:51

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> I don't think ->real_blocked is a good choice, we have to add more checks
> to the signal sending path. Note that currenly it is only checked under
> sig_fatal() && !SIGNAL_GROUP_EXIT.

No, you're right. It's not a good idea.

> Perhaps it is easier to change dump_write() to clear TIF_SIGPENDING
> unless fatal_signal_pending(),

That is almost a separate subject, really. Having i/o calls' waits wrongly
interrupted and then clearing TIF_SIGPENDING just seems goofy to me. It
might not be just the ->write call, it might affect filp_open or dump_seek
or whatever. It makes no sense to me to take the approach of fiddling
after doing the wrong thing. Just don't do the wrong thing to begin with.

That means clear TIF_SIGPENDING and don't let it be set again by a signal
that is not meant to abort the core dump. Why do anything but that?

It's almost certain that recalc_sigpending_tsk won't be called once dumping
has started. But there is the possibility of recalc_sigpending_and_wake
via cancel_freezing, at least. Seems safer to make recalc_sigpending_tsk
robust in this case.


Thanks,
Roland

2009-06-02 00:14:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/01, Roland McGrath wrote:
>
> That is almost a separate subject, really. Having i/o calls' waits wrongly
> interrupted and then clearing TIF_SIGPENDING just seems goofy to me.

Yes, agreed. The patch I sent make the coredumping task invisible to all
signals except SIGKILL.

> But there is the possibility of recalc_sigpending_and_wake
> via cancel_freezing, at least. Seems safer to make recalc_sigpending_tsk
> robust in this case.

Oh, I forgot about freezer...

Well, not good to complicate recalc_sigpending_tsk() for this unlikely case.
And this can't help, freezer does signal_wake_up() unconditionally.

So in fact this is another argument to check signal_pending() and clear it
in dump_write/seek.

But since the coredumping task is not freezable anyway, perhaps we should
change fake_signal_wake_up() to ignore SIGNAL_GROUP_DUMPING task.

Or we should make the coredumping freezable. This means dump_write/seek
and exit_mm() should do try_to_freeze().


In any case, the coredumping is special. If ->write() returns -ERESTART/EINTR
it assumes the return to ths user-space, this is not true for the coredump.
This means that handling the spurious signals in coredump_file_write() is
not so bad if we can't avoid this.

Oleg.

2009-06-02 08:22:41

by Alan

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> Perhaps it is easier to change dump_write() to clear TIF_SIGPENDING
> unless fatal_signal_pending(),

If you are receiving a continuous stream of SIGIO's say then how do you
guarantee the code below will make progress ?

>
> int coredump_file_write(struct file *file, const void *addr, int nr)
> {
> while (nr > 0) {
> int res = file->f_op->write(file, addr, nr, &file->f_pos);
>
> if (res > 0) {
> nr -= res;
> continue;
> }
>
> if (!signal_pending(current))
> break;
> if (__fatal_signal_pending(current))
> break;
> clear_thread_flag(TIF_SIGPENDING);
> }
>
> return !nr;
> }
>

> Of course, this all assumes f_op->write() does not do recalc_sigpending().

Which is itself a dangerous assumption that shouldn't be propogated.

2009-06-02 15:35:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/02, Alan Cox wrote:
>
> > Perhaps it is easier to change dump_write() to clear TIF_SIGPENDING
> > unless fatal_signal_pending(),
>
> If you are receiving a continuous stream of SIGIO's say then how do you
> guarantee the code below will make progress ?

Yes, the second SIGIO has no effect.

> > int coredump_file_write(struct file *file, const void *addr, int nr)
> > {
> > while (nr > 0) {
> > int res = file->f_op->write(file, addr, nr, &file->f_pos);
> >
> > if (res > 0) {
> > nr -= res;
> > continue;
> > }
> >
> > if (!signal_pending(current))
> > break;
> > if (__fatal_signal_pending(current))
> > break;
> > clear_thread_flag(TIF_SIGPENDING);
> > }
> >
> > return !nr;
> > }
> >
>
> > Of course, this all assumes f_op->write() does not do recalc_sigpending().
>
> Which is itself a dangerous assumption that shouldn't be propogated.

I don't think this assumption is dangerous. Why should ->write() call
recalc_sigpending() ? It should not be called outside of signal code.

But yes, if we add SIGNAL_GROUP_DUMPING we can change recalc_sigpending_tsk().
Just I don't like the idea to slow down / complicate this helper for the
very special case.


Hmm. Does ->core_dump() report the state of ->sighand->action? I can't
find any usage. Perhaps we can just ignore all signals except SIGKILL ?

Oleg.

2009-06-03 07:12:17

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> Oh, I forgot about freezer...

We all would like to.

> Well, not good to complicate recalc_sigpending_tsk() for this unlikely case.
> And this can't help, freezer does signal_wake_up() unconditionally.
>
> So in fact this is another argument to check signal_pending() and clear it
> in dump_write/seek.

I can't agree with this at all. IMHO it's far better to have a consistent
definition of when TIF_SIGPENDING ought to be triggered, and have
recalc_sigpending_tsk() use logic that matches the logic controlling when
to set TIF_SIGPENDING asynchronously (i.e. signal_wake_up calls).

> But since the coredumping task is not freezable anyway, perhaps we should
> change fake_signal_wake_up() to ignore SIGNAL_GROUP_DUMPING task.

That could be a long delay and a lot of i/o before suspending.

> Or we should make the coredumping freezable. This means dump_write/seek
> and exit_mm() should do try_to_freeze().

Yes, I think this is the thing to do for that issue.
(It's kind of a separate problem.)

> In any case, the coredumping is special. If ->write() returns -ERESTART/EINTR
> it assumes the return to ths user-space, this is not true for the coredump.
> This means that handling the spurious signals in coredump_file_write() is
> not so bad if we can't avoid this.

I am not so confident. It seems far too easy to wind up with some other
way that TIF_SIGPENDING gets continually set and this loops, for example.
(This could be some day in the future when fs, driver or pipe-io code
changes somehow completely obscure.) It's far better to have confidence
just in the signals code itself: the only things that set TIF_SIGPENDING
interlock with the logic of the only things that are expected to clear it.


Thanks,
Roland

2009-06-03 07:18:22

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> Hmm. Does ->core_dump() report the state of ->sighand->action? I can't
> find any usage. Perhaps we can just ignore all signals except SIGKILL ?

I don't think anything uses that information. But it might one day,
or it might matter in some other arcane way (/proc/pid/status during
the dump is the only way I can think of to notice, but who knows).

I think it's better just not to get into any kludges that clobber
anything that records actual user-visible state that we normally
read out to userland in any fashion (even if none is known to do so
at dumping time or later). It's much more natural just to use pure
internal kernel state such as signal->flags for anything like this.


Thanks,
Roland

2009-06-03 14:06:16

by Paul Smith

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Mon, 2009-06-01 at 13:38 -0700, Roland McGrath wrote:
> 1. More core-dump signals. e.g., it was already crashing and you hit ^\
> or maybe just hit ^\ twice with a finger delay.
> 2. Non-fatal signals (i.e. ones with handlers, stop signals).
> 3. Plain sig_fatal() non-core signals (e.g. SIGINT when not handled)
> 4. SIGKILL (an actual one from userland or oomkill, not group-exit)
>
> #1 IMHO should not do anything at all.
> You are asking for a core dump, it's already doing it.
>
> #2 should not do anything at all.
> It's not really possible to suspend during the core dump, so unhandled,
> unblocked stop signals can't do anything either.
>
> #4 IMHO should always stop everything immediately.
> That's what SIGKILL is for. When userland generates a SIGKILL
> explicitly, that says the top priority is to be gone and cease
> consuming any resources ASAP.
>
> #3 is the open question. I don't feel strongly either way.

Thanks Roland. This is a great summary and lends clarity to the
discussion.

Actually I'm quite happy with the above for #'s 1, 2, and 4.

I've already stated my preference that #3 should behave like #2, but
certainly people can disagree on this and I understand that some would
like it to behave as #4. Best case is this can be configured or, at
least if it's documented clearly userspace applications can code
defensively by masking those signals (this has minor annoyances but...)

Unfortunately the discussion you and Oleg are having shows me how little
I know about this area of the kernel and what a bad idea it would be for
me to try to get this right on my own :-). However, I'm happy to test
patches, comment on solutions, etc.

2009-06-04 03:21:33

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/03, Roland McGrath wrote:
>
> > But since the coredumping task is not freezable anyway, perhaps we should
> > change fake_signal_wake_up() to ignore SIGNAL_GROUP_DUMPING task.
>
> That could be a long delay and a lot of i/o before suspending.
>
> > Or we should make the coredumping freezable. This means dump_write/seek
> > and exit_mm() should do try_to_freeze().
>
> Yes, I think this is the thing to do for that issue.

Fortunately, this doesn't look hard. Whatever we do, we should modify
dump_write/seek to check fatal_signal_pending() anyway. Because we can't
know if f_ops->write() pays attention to signals. This means we can just
add try_to_freeze().

As for exit_mm(), we can use freezer_do_not_count() + freezer_count()
around the "for (;;)" loop.

> > In any case, the coredumping is special. If ->write() returns -ERESTART/EINTR
> > it assumes the return to ths user-space, this is not true for the coredump.
> > This means that handling the spurious signals in coredump_file_write() is
> > not so bad if we can't avoid this.
>
> I am not so confident. It seems far too easy to wind up with some other
> way that TIF_SIGPENDING gets continually set and this loops, for example.
> (This could be some day in the future when fs, driver or pipe-io code
> changes somehow completely obscure.) It's far better to have confidence
> just in the signals code itself: the only things that set TIF_SIGPENDING
> interlock with the logic of the only things that are expected to clear it.

Looks like, if we introduce a difference between "really killed" tasks and
exiting/execing/coredumping tasks (as discussed in another thread), we get
this all for free.

Oleg.

2009-06-04 17:17:06

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

> Fortunately, this doesn't look hard. Whatever we do, we should modify
> dump_write/seek to check fatal_signal_pending() anyway. Because we can't
> know if f_ops->write() pays attention to signals.

Yes, that sounds fine.

> This means we can just add try_to_freeze().

Right.

> As for exit_mm(), we can use freezer_do_not_count() + freezer_count()
> around the "for (;;)" loop.

Ah yes, sure.


Thanks,
Roland

2009-06-23 17:34:36

by Paul Smith

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Thu, 2009-06-04 at 10:14 -0700, Roland McGrath wrote:
> > Fortunately, this doesn't look hard. Whatever we do, we should modify
> > dump_write/seek to check fatal_signal_pending() anyway. Because we can't
> > know if f_ops->write() pays attention to signals.
>
> Yes, that sounds fine.
>
> > This means we can just add try_to_freeze().
>
> Right.
>
> > As for exit_mm(), we can use freezer_do_not_count() + freezer_count()
> > around the "for (;;)" loop.
>
> Ah yes, sure.

Hi Oleg;

Did you have any more time to look into this? I'm currently using my
patch and it's OK for my purposes but I'll be happy to test any proposal
you come up with, if you like, so I can drop my patch in the future.

Let me know, thanks!

2009-06-23 22:53:41

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/23, Paul Smith wrote:
>
> Did you have any more time to look into this? I'm currently using my
> patch and it's OK for my purposes but I'll be happy to test any proposal
> you come up with, if you like, so I can drop my patch in the future.

Thanks for reminding...

I'll try to make the patch this or next week.

Oleg.

2009-07-07 19:41:35

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On 06/23, Oleg Nesterov wrote:
>
> On 06/23, Paul Smith wrote:
> >
> > Did you have any more time to look into this? I'm currently using my
> > patch and it's OK for my purposes but I'll be happy to test any proposal
> > you come up with, if you like, so I can drop my patch in the future.
>
> Thanks for reminding...
>
> I'll try to make the patch this or next week.

Well, I need to think more.

SIGNAL_GROUP_DUMPING is not that simple. If we want to use it
in recalc_sigpending(), we can only set it after coredump_wait() succeeds.
And we need some changes in send_signal()->complete_signal() anyway, to make
sure SIGKILL does wakeup even if SIGNAL_GROUP_EXIT.

Sorry for delay, will try to make patches soon ;)

Oleg.