2008-07-10 08:15:07

by Elias Oltmanns

Subject: Scheduling issue with signal handling

Hi there,

in another thread [1] we've discussed something which, after some
testing, I now believe to be a problem related to scheduling /
preemption.

Here is the problem: If there is heavy disk I/O in the background,
signals sent interactively to an I/O-related process by pressing Ctrl+Z
or Ctrl+C are sometimes processed quite sluggishly. I can pretty
reliably reproduce this using a test case provided by the original
reporter [1] (on a uniprocessor, if that's of any consequence). The
problem has been verified to exist at least since 2.6.19, but it has become
much easier to observe (and has thus attracted attention) since 2.6.25, because
pressing Ctrl+Z or Ctrl+C is now acknowledged by printing ^Z or ^C to
the terminal, respectively. As a result, the user gets slightly
irritated and possibly impatient when pressing Ctrl+Z repeatedly or just
holding down these keys generates a string like
^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z
but the shell prompt doesn't appear for up to a few seconds.

By sprinkling some printk()s all over the place, I've managed to
establish the following sequence of events taking place in the event of
delayed signal handling as described above:
The first Ctrl+Z event enqueues a SIGTSTP signal which eventually
results in a call to kick_process(). For some reason though, the signal
isn't handled straight away but remains on the queue for some time.
Consequently, subsequent Ctrl+Z events result in echoing another ^Z to
the terminal but everything related to sending a signal is skipped (and
rightly so) because the kernel detects that a SIGTSTP is still pending.
Eventually, get_signal_to_deliver() dequeues the SIGTSTP signal and the
shell prompt appears.

My question is this: Even under high disk I/O pressure, the threads
dealing with I/O to the terminal evidently still get their turn as
indicated by the sequence of ^Z appearing on screen. Why is it, then,
that the threads which are meant to process the SIGTSTP or SIGINT signals
aren't scheduled for some seconds, and is there a way to change this?

Please let me know if there is anything I can try to investigate this
any further or if you need further information.

Thanks in advance,

Elias

[1] http://lkml.org/lkml/2008/6/28/50


2008-07-10 08:55:27

by Ingo Molnar

Subject: Re: Scheduling issue with signal handling


* Elias Oltmanns <[email protected]> wrote:

> By sprinkling some printk()s all over the place, I've managed to
> establish the following sequence of events taking place in the event
> of delayed signal handling as described above: The first Ctrl+Z event
> enqueues a SIGTSTP signal which eventually results in a call to
> kick_process(). For some reason though, the signal isn't handled
> straight away but remains on the queue for some time. Consequently,
> subsequent Ctrl+Z events result in echoing another ^Z to the terminal
> but everything related to sending a signal is skipped (and rightly so)
> because the kernel detects that a SIGTSTP is still pending.
> Eventually, get_signal_to_deliver() dequeues the SIGTSTP signal and
> the shell prompt appears.
>
> My question is this: Even under high disk I/O pressure, the threads
> dealing with I/O to the terminal evidently still get their turn as
> indicated by the sequence of ^Z appearing on screen. Why is it, then,
> that the threads which are meant to process the SIGTSTP or SIGINT
> signals aren't scheduled for some seconds, and is there a way to change
> this?
>
> Please let me know if there is anything I can try to investigate this
> any further or if you need further information.
>
> Thanks in advance,
>
> Elias
>
> [1] http://lkml.org/lkml/2008/6/28/50

hm, kick_process() is a no-op on !SMP.

Basically, when a new signal is queued and a task is already running, it
will run in due course and process the signal the moment it's scheduled
again. (unless the signal is blocked)

If a task is not already running, then the signal code will wake up the
task and it will then process the signal the moment it's executed.
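
To see which of these two cases applies while the lag is happening, you
can look at the state of the task in question, e.g. (with find standing
in for whatever task is actually lagging):

  # R = running/runnable, S = interruptible sleep, D = uninterruptible
  # sleep; a task sleeping in D state is not woken by a non-fatal signal,
  # the signal stays pending until the task runs again.
  ps -o pid,stat,wchan:25,comm -p "$(pidof find)"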

The maximum latency of a runnable task hitting the CPU is controlled via
/proc/sys/kernel/sched_latency [available if CONFIG_SCHED_DEBUG=y in the
.config] - 20 milliseconds on uniprocessors.
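
For example (the exact filename varies; on kernels of this era it is
typically sched_latency_ns and the value is in nanoseconds, so 20000000
means 20 msecs):

  cat /proc/sys/kernel/sched_latency_ns 2>/dev/null || cat /proc/sys/kernel/sched_latency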

Several seconds of lag is almost out of the question and would indicate a
serious scheduler bug, or - which is far more likely - either an
application signal-processing hiccup or a kernel signal-processing
hiccup.

If the lag happens with the task, you can observe its worst-case
scheduling delay by looking at /proc/<PID>/sched, if you also have
CONFIG_SCHEDSTAT=y in your .config.

For example, a random shell's delays on a testbox:

phoenix:~> grep se.wait_max /proc/$$/sched
se.wait_max : 3.338588

That's 3.3 msecs _worst case_, on a system that has otherwise quite
insane load:

10:53:57 up 2:48, 2 users, load average: 77.56, 94.33, 102.75

So several seconds of delay, if it came from the scheduler, would be
really anomalous.

As a final measure, instead of printk's, you could try the scheduler
tracer in linux-next (CONFIG_CONTEXT_SWITCH_TRACER=y), to have an exact
idea about what is going on and when. (see /debug/tracing/README)
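
Roughly along these lines, assuming debugfs is mounted at /debug as
above:

  # select the context-switch tracer, reproduce the delayed ^Z, then dump
  echo sched_switch > /debug/tracing/current_tracer
  echo 1 > /debug/tracing/tracing_enabled
  # ... trigger the Ctrl+Z lag here ...
  echo 0 > /debug/tracing/tracing_enabled
  cat /debug/tracing/trace > /tmp/sched_switch.txt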

[ You might also want to try CONFIG_FTRACE=y and CONFIG_DYNAMIC_FTRACE=y
for extremely fine-grained kernel tracing - available in linux-next
too. ]

Ingo

2008-07-10 14:33:26

by Elias Oltmanns

Subject: Re: Scheduling issue with signal handling

Ingo Molnar <[email protected]> wrote:
> * Elias Oltmanns <[email protected]> wrote:
>
>> By sprinkling some printk()s all over the place, I've managed to
>> establish the following sequence of events taking place in the event
>> of delayed signal handling as described above: The first Ctrl+Z event
>> enqueues a SIGTSTP signal which eventually results in a call to
>> kick_process(). For some reason though, the signal isn't handled
>> straight away but remains on the queue for some time. Consequently,
>> subsequent Ctrl+Z events result in echoing another ^Z to the terminal
>> but everything related to sending a signal is skipped (and rightly so)
>> because the kernel detects that a SIGTSTP is still pending.
>> Eventually, get_signal_to_deliver() dequeues the SIGTSTP signal and
>> the shell prompt appears.
>>
>> My question is this: Even under high disk I/O pressure, the threads
>> dealing with I/O to the terminal evidently still get their turn as
>> indicated by the sequence of ^Z appearing on screen. Why is it, then,
>> that the threads which are meant to process the SIGTSTP or SIGINT
>> signals aren't scheduled for some seconds, and is there a way to change
>> this?
>>
>> Please let me know if there is anything I can try to investigate this
>> any further or if you need further information.
>>
>> Thanks in advance,
>>
>> Elias
>>
>> [1] http://lkml.org/lkml/2008/6/28/50
>
> hm, kick_process() is a no-op on !SMP.
>
> Basically, when a new signal is queued and a task is already running, it
> will run in due course and process the signal the moment it's scheduled
> again. (unless the signal is blocked)
>
> If a task is not already running, then the signal code will wake up the
> task and it will then process the signal the moment it's executed.
>
> The maximum latency of a runnable task hitting the CPU is controlled via
> /proc/sys/kernel/sched_latency [available if CONFIG_SCHED_DEBUG=y in the
> .config] - 20 milliseconds on uniprocessors.
>
> Several seconds of lag is almost out of the question and would indicate a
> serious scheduler bug, or - which is far more likely - either an
> application signal-processing hiccup or a kernel signal-processing
> hiccup.

I feel quite certain that the kernel is the part to blame, one way or
another.

>
> If the lag happens with the task, you can observe its worst-case
> scheduling delay by looking at /proc/<PID>/sched, if you also have
> CONFIG_SCHEDSTAT=y in your .config.

Nothing suspicious regarding the applications involved here. The only
threads with se.wait_max values between, say, 50 msecs and 190 msecs are
things like ksoftirqd/0, udevd and the like.
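
For illustration, one way to survey the worst-case waits across the whole
system at once is something along these lines:

  # list the largest se.wait_max values (msecs) over all processes;
  # add /proc/[0-9]*/task/[0-9]*/sched to cover individual threads as well
  grep se.wait_max /proc/[0-9]*/sched | sort -n -k3 | tail -20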

[...]
> That's 3.3 msecs _worst case_, on a system that has otherwise quite
> insane load:
>
> 10:53:57 up 2:48, 2 users, load average: 77.56, 94.33, 102.75

Very impressive. I don't think I have ever observed even a tenth of that
load on my system.

>
> So several seconds of delay, if it came from the scheduler, would be
> really anomalous.

With the patch below applied to next-20080708, I can quite easily
produce the following:

denkblock:/mnt# fg
find / > /dev/null
^Z[ 421.968364] complete_signal(): sig = 20.
^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z[ 423.224898] get_signal_to_deliver(): signr = 20.

[4]+ Stopped find / > /dev/null
denkblock:/mnt#

That's about 1.26 secs, not quite the worst case yet, but still
absurdly long by my reckoning, given that the system does still respond
to keyboard events.

>
> As a final measure, instead of printk's, you could try the scheduler
> tracer in linux-next (CONFIG_CONTEXT_SWITCH_TRACER=y), to have an exact
> idea about what is going on and when. (see /debug/tracing/README)
>
> [ You might also want to try CONFIG_FTRACE=y and CONFIG_DYNAMIC_FTRACE=y
> for extremely fine-grained kernel tracing - available in linux-next
> too. ]

Well, I've generated a trace using sched_switch. Right now, I can't make
head or tail of it and it's a ridiculous amount of data. If you'd like
to have a look at it, I can put it online somewhere. Alternatively, you
could give me a hint as to how I can read it or what I can do to
generate / filter / process the relevant data.

Thanks,

Elias

--------
diff --git a/kernel/signal.c b/kernel/signal.c
index 6c0958e..8ae57f8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -721,11 +721,18 @@ static inline int wants_signal(int sig, struct task_struct *p)
 	return task_curr(p) || !signal_pending(p);
 }
 
+static inline void print_sig(const char *msg, int sig)
+{
+	if (sig == SIGTSTP || sig == SIGINT)
+		printk(msg, sig);
+}
+
 static void complete_signal(int sig, struct task_struct *p, int group)
 {
 	struct signal_struct *signal = p->signal;
 	struct task_struct *t;
 
+	print_sig("complete_signal(): sig = %d.\n", sig);
 	/*
 	 * Now find a thread we can wake up to take the signal off the queue.
 	 *
@@ -1778,6 +1785,7 @@ relock:
 			break; /* will return 0 */
 
 		if (signr != SIGKILL) {
+			print_sig("get_signal_to_deliver(): signr = %d.\n", signr);
 			signr = ptrace_signal(signr, info, regs, cookie);
 			if (!signr)
 				continue;