2006-12-18 20:34:35

by Nicholas Mc Guire

[permalink] [raw]
Subject: problem with signal delivery SIGCHLD



Hi !

I have a phenomena that I don't quite understand. gdbserver forks and
after setting ptrace (PTRACE_TRACEME, 0, 0, 0); it then execv
(program, allargs); when this child process hits ptrace_stoped (breakpoint
it does the following in kernel space:

pid 1242 = child process
pid 1241 = gdbserver
pid 0 = kernel
pid -1 = interrupt
pid
1 559 5 1242 ptrace_stop
3 6 2 1242 | do_notify_parent_cldstop
4 3 2 1242 | | __group_send_sig_info
5 1 1 1242 | | | handle_stop_signal
7 0 0 1242 | | | sig_ignored
8 1 0 1242 | | __wake_up_sync
8 1 1 1242 | | | __wake_up_common
10 547 541 1242 | schedule
10 2 2 1242 | | profile_hit
13 1 1 1242 | | sched_clock
15 1 0 1242 | | deactivate_task
15 1 1 1242 | | | dequeue_task
19 2 2 0 | | __switch_to
----------- !!!! start --------------
24 574 574 0 default_idle
----------- $$$$ end ----------------
----------- //// start --------------
780 41 12 0 do_IRQ
780 29 2 -1 / __do_IRQ
...
807 2 2 -1 / / / enable_8259A_irq
----------- //// end ----------------
----------- {{{{ start --------------
810 11 0 0 do_softirq
...
820 0 0 -1 { { { preempt_schedule
----------- {{{{ end ----------------
----------- %%%% start --------------
822 358 1 0 preempt_schedule_irq
...
827 1 1 1241 % % __switch_to
----------- %%%% end ----------------
829 1 1 1241 ( ( ( del_timer
----------- (((( end ----------------
----------- ]]]] start --------------
837 8 2 1241 sys_waitpid

So basically child signals -> delayed to next tick -> parent wakes up.

now what I don't understand is why does it not schedule the parent ? in
fact the traces show that on an idle system the parent will not be
scheduled until the next tick.

I don't want to post the full traces as they are very lengthy - but if
someone could point me in what direction to search to find out why this
delay between the child process signaling and the parent breaking out of
waitpid it would be helpfull - the background is that we implemented GDB
tracepoints but they only work properly on highly loaded systems (the
parent gets scheduled fast then and the tracepoint is processed "fast"
but on an idle system it is simply not usabel as you get almost
deterministic 10ms between the child process signaling and the parent
waking up...

Is this delay on sigchild expected behavior ?

any ideas ?

thx !
hofrat

ps.: if anybody is intersted in the detailed logs see
http://dslab.lzu.edu.cn/~hofrat/sigchld_problem.html


2006-12-19 18:00:54

by Chuck Ebbert

[permalink] [raw]
Subject: Re: problem with signal delivery SIGCHLD

In-Reply-To: <[email protected]>

On Mon, 18 Dec 2006 20:05:35 +0100, Nicholas Mc Guire wrote:

> I have a phenomena that I don't quite understand. gdbserver forks and
> after setting ptrace (PTRACE_TRACEME, 0, 0, 0); it then execv
> (program, allargs); when this child process hits ptrace_stoped (breakpoint
> it does the following in kernel space:
>
> pid
> 1 559 5 1242 ptrace_stop
> 3 6 2 1242 | do_notify_parent_cldstop
> 4 3 2 1242 | | __group_send_sig_info
> 5 1 1 1242 | | | handle_stop_signal
> 7 0 0 1242 | | | sig_ignored
> 8 1 0 1242 | | __wake_up_sync
> 8 1 1 1242 | | | __wake_up_common
> 10 547 541 1242 | schedule
> 10 2 2 1242 | | profile_hit
> 13 1 1 1242 | | sched_clock
> 15 1 0 1242 | | deactivate_task
> 15 1 1 1242 | | | dequeue_task
> 19 2 2 0 | | __switch_to

> 24 574 574 0 default_idle

> So basically child signals -> delayed to next tick -> parent wakes up.

What do the first three columns represent?

Do you see __wake_up_common() calling default_wake_function()?

Can you trace through schedule() and see exactly how it decides to
run the idle task instead of the newly-woken parent? Exactly what
path does it take? (And which kernel version is this?)

--
MBTI: IXTP

2006-12-20 07:10:20

by Mike Galbraith

[permalink] [raw]
Subject: Re: problem with signal delivery SIGCHLD

On Mon, 2006-12-18 at 20:05 +0100, Nicholas Mc Guire wrote:
>
> Hi !
>
> I have a phenomena that I don't quite understand. gdbserver forks and
> after setting ptrace (PTRACE_TRACEME, 0, 0, 0); it then execv
> (program, allargs); when this child process hits ptrace_stoped (breakpoint
> it does the following in kernel space:
>
> pid 1242 = child process
> pid 1241 = gdbserver
> pid 0 = kernel
> pid -1 = interrupt
> pid
> 1 559 5 1242 ptrace_stop
> 3 6 2 1242 | do_notify_parent_cldstop
> 4 3 2 1242 | | __group_send_sig_info
> 5 1 1 1242 | | | handle_stop_signal
> 7 0 0 1242 | | | sig_ignored
> 8 1 0 1242 | | __wake_up_sync
> 8 1 1 1242 | | | __wake_up_common
> 10 547 541 1242 | schedule
> 10 2 2 1242 | | profile_hit
> 13 1 1 1242 | | sched_clock
> 15 1 0 1242 | | deactivate_task
> 15 1 1 1242 | | | dequeue_task
> 19 2 2 0 | | __switch_to
> ----------- !!!! start --------------
> 24 574 574 0 default_idle
> ----------- $$$$ end ----------------
> ----------- //// start --------------
> 780 41 12 0 do_IRQ
> 780 29 2 -1 / __do_IRQ
> ...
> 807 2 2 -1 / / / enable_8259A_irq
> ----------- //// end ----------------
> ----------- {{{{ start --------------
> 810 11 0 0 do_softirq
> ...
> 820 0 0 -1 { { { preempt_schedule
> ----------- {{{{ end ----------------
> ----------- %%%% start --------------
> 822 358 1 0 preempt_schedule_irq
> ...
> 827 1 1 1241 % % __switch_to
> ----------- %%%% end ----------------
> 829 1 1 1241 ( ( ( del_timer
> ----------- (((( end ----------------
> ----------- ]]]] start --------------
> 837 8 2 1241 sys_waitpid
>
> So basically child signals -> delayed to next tick -> parent wakes up.

Hm. What does the trace of gdbserver look like prior to the clild doing
do_notify_parent_cldstop()? Sleeping someplace other than wait4?

-Mike