2005-10-11 18:13:50

by Kilau, Scott

Subject: [BUG?] 2.6.x (2.6.13) - new signals not being delivered to a terminating (PF_EXITING) process.

Hi everyone,

This email will deal with the serial/tty layer of the kernel,
but I don't think it's completely isolated to this layer...

I have found a problem with signals not getting delivered to a
process once it enters the PF_EXITING state.

Probably the best way to show it is an example with the built-in
COM port.

1) Don't have anything connected to COM1.
2) stty -ixon -ixoff -ixany crtscts < /dev/ttyS0
3) date > /dev/ttyS0
4) Press ctrl-c.

The process does get this ctrl-c, and starts closing down.

The serial driver gets its "tty close" for ttyS0, as it should,
and goes into a "drain" waiting for the data that is pending
in the UART to be written.

(Which can never be written, because the port is stuck in a
hardware flow control state)

5) Press ctrl-c again... And again, and again, and again. Nothing.

The process is stuck, and a ps -ef shows that it is in a
zombie state ([date])

Under stock 2.4 kernels, the 2nd and subsequent ctrl-c's would wake
up the serial driver's "wait" with a signal, which in turn would
allow the serial driver to bail out of the forever "drain",
and complete the close.

Now, eventually, the "date" will bail, but only because the serial
driver has a "timeout" set for the wait in its drain routine.
It still never receives the 2nd+ ctrl-c's.

Is this change intentional?
If so, why?

Thanks!
Scott


2005-10-11 18:30:20

by linux-os (Dick Johnson)

Subject: Re: [BUG?] 2.6.x (2.6.13) - new signals not being delivered to a terminating (PF_EXITING) process.


On Tue, 11 Oct 2005, Kilau, Scott wrote:

> Hi everyone,
>
> This email will deal with the serial/tty layer of the kernel,
> but I don't think it's completely isolated to this layer...
>
> I have found a problem with signals not getting delivered to a
> process once it enters the PF_EXITING state.
>
> Probably the best way to show it is an example with the built-in
> COM port.
>
> 1) Don't have anything connected to COM1.
> 2) stty -ixon -ixoff -ixany crtscts < /dev/ttyS0
> 3) date > /dev/ttyS0
> 4) Press ctrl-c.
>
> The process does get this ctrl-c, and starts closing down.
>
> The serial driver gets its "tty close" for ttyS0, as it should,
> and goes into a "drain" waiting for the data that is pending
> in the UART to be written.
>
> (Which can never be written, because the port is stuck in a
> hardware flow control state)
>
> 5) Press ctrl-c again... And again, and again, and again. Nothing.
>
> The process is stuck, and a ps -ef shows that it is in a
> zombie state ([date])
>
> Under stock 2.4 kernels, the 2nd and subsequent ctrl-c's would wake
> up the serial driver's "wait" with a signal, which in turn would
> allow the serial driver to bail out of the forever "drain",
> and complete the close.
>
> Now, eventually, the "date" will bail, but only because the serial
> driver has a "timeout" set for the wait in its drain routine.
> It still never receives the 2nd+ ctrl-c's.
>
> Is this change intentional?
> If so, why?
>
> Thanks!
> Scott

Once a process is in the 'Z' state it should not receive any
signals. Its signal handlers are already gone. It's just a
snippet of sys_exit code that remains. If the process truly
is in the 'Z' state, its input/output/error file descriptors
should have already been closed, so the time-out from the
shutdown should have already happened.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13 on an i686 machine (5589.44 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2005-10-11 19:32:01

by Kilau, Scott

Subject: RE: [BUG?] 2.6.x (2.6.13) - new signals not being delivered to a terminating (PF_EXITING) process.



> Once a process is in the 'Z' state it should not receive any
> signals. Its signal handlers are already gone. It's just a
> snippet of sys_exit code that remains. If the process truly
> is in the 'Z' state, its input/output/error file descriptors
> should have already been closed, so the time-out from the
> shutdown should have already happened.

Hi Dick,

You are right, it's not a zombie.

The process is held up in "drain" in the tty close of the
process's stdout (/dev/ttyS0).
(I assumed brackets in ps -ef meant zombies, but that's wrong.)
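For reference: ps prints a command name in square brackets when it cannot read the process's argument list, while a zombie is reported with state Z (and "<defunct>" in ps -ef output). A minimal, Linux-only sketch for checking the state directly reads field 3 of /proc/<pid>/stat. The helper name proc_state() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the one-letter state ('R', 'S', 'D', 'Z', ...) from field 3 of
 * /proc/<pid>/stat, or '?' if it cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 is the command name in parentheses and may itself contain
     * spaces or ')', so locate the LAST ')' and step past it. */
    char *end = strrchr(buf, ')');
    if (end == NULL || end[1] != ' ' || end[2] == '\0')
        return '?';
    return end[2];
}
```

For example, proc_state(getpid()) on the calling process reports 'R', and a stuck "date" would show 'D' or 'S' rather than 'Z'.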

I added some code to make a short timeout in the "tty close" part of
the driver, then check the values of:

current->signal->shared_pending.list.next
current->signal->shared_pending.list.prev

They *do* change, when I send the process (date) a signal.

The kernel just isn't waking up the driver's "wait" to let it
know that there are signals pending.

Also, why did this work under 2.4?

This is why I was wondering if this was intentional, or was just an
oversight...

Thanks!
Scott

2005-10-11 19:49:17

by linux-os (Dick Johnson)

Subject: RE: [BUG?] 2.6.x (2.6.13) - new signals not being delivered to a terminating (PF_EXITING) process.


On Tue, 11 Oct 2005, Kilau, Scott wrote:

>
>
>> Once a process is in the 'Z' state it should not receive any
>> signals. Its signal handlers are already gone. It's just a
>> snippet of sys_exit code that remains. If the process truly
>> is in the 'Z' state, its input/output/error file-descriptors
>> should have already been closed so the time-out from the
>> shutdown should have already happened.
>
> Hi Dick,
>
> You are right, it's not a zombie.
>
> The process is held up in "drain" in the tty close of the
> process's stdout (/dev/ttyS0).
> (I assumed brackets in ps -ef meant zombies, but that's wrong.)
>
> I added some code to make a short timeout in the "tty close" part of
> the driver, then check the values of:
>
> current->signal->shared_pending.list.next
> current->signal->shared_pending.list.prev
>
> They *do* change, when I send the process (date) a signal.
>
> The kernel just isn't waking up the driver's "wait" to let it
> know that there are signals pending.
>


Okay. I have to take a "work break", but you can check the
driver and see if the code is interruptible (it must be)
and if there is something like if(signal_pending(current))
get_to_hell_out_of_this_loop.... in the time-out loop.
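The shape of the loop described here (an interruptible wait that checks for a pending signal on every pass and also gives up after a time-out) can be modeled in user space. In this sketch, a flag set by a signal handler stands in for signal_pending(current), and nanosleep() stands in for msleep_interruptible(). The helpers tx_never_empty() and tx_always_empty() are hypothetical stand-ins for the driver's tx_empty() hook. None of these names come from the driver itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

enum drain_result { DRAIN_DONE, DRAIN_INTERRUPTED, DRAIN_TIMED_OUT };

/* Set by on_signal(); plays the role of signal_pending(current). */
static volatile sig_atomic_t got_signal;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Hypothetical stand-ins for the driver's tx_empty() hook. */
static int tx_never_empty(void) { return 0; }   /* stuck under flow control */
static int tx_always_empty(void) { return 1; }  /* nothing left to send */

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll tx_empty() every tick_ms until it reports empty, a signal is
 * flagged, or timeout_ms has elapsed. */
static enum drain_result drain_wait(int (*tx_empty)(void),
                                    long tick_ms, long timeout_ms)
{
    struct timespec start;
    struct timespec tick = { .tv_sec = tick_ms / 1000,
                             .tv_nsec = (tick_ms % 1000) * 1000000L };

    assert(tx_empty != NULL && tick_ms > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (!tx_empty()) {
        nanosleep(&tick, NULL);          /* cut short if a signal arrives */
        if (got_signal)
            return DRAIN_INTERRUPTED;    /* the "get out of this loop" exit */
        if (elapsed_ms(&start) > timeout_ms)
            return DRAIN_TIMED_OUT;      /* the time-out exit */
    }
    return DRAIN_DONE;
}
```

With on_signal installed for SIGINT, a ctrl-c ends a drain on tx_never_empty() early. The behavior Scott reports is the equivalent of got_signal never becoming true for the exiting task, which leaves only the time-out exit.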

You can check against 2.4 code to see what's changed (a lot).
Sometimes with massive re-writes, simple things are forgotten.


> Also, why did this work under 2.4?
>
> This is why I was wondering if this was intentional, or was just an
> oversight...
>

The tty drivers were modified.

> Thanks!
> Scott
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.48 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

2005-10-11 20:30:56

by Kilau, Scott

Subject: RE: [BUG?] 2.6.x (2.6.13) - new signals not being delivered to a terminating (PF_EXITING) process.


> Okay. I have to take a "work break", but you can check the
> driver and see if the code is interruptible (it must be)
> and if there is something like if(signal_pending(current))
> get_to_hell_out_of_this_loop.... in the time-out loop.

Hi Dick,
I don't believe this is a serial driver problem.

In /usr/src/linux-2.6.13/drivers/serial/serial_core.c ->
uart_wait_until_sent()

You can see that it does this:

    while (!port->ops->tx_empty(port)) {
        msleep_interruptible(jiffies_to_msecs(char_time));
        if (signal_pending(current))
            break;
        if (time_after(jiffies, expire))
            break;
    }
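The time_after(jiffies, expire) test is what eventually ends this loop. Its purpose is to compare tick counts correctly even when the counter wraps around. The sketch below is a user-space version in the spirit of the kernel's time_after() macro. The name time_after_ul() is hypothetical, and this version is not the kernel's definition.

```c
#include <assert.h>
#include <limits.h>

/* Is tick count a later than b? Unsigned subtraction wraps modulo 2^N,
 * and reading the difference as a signed long gives the right order as
 * long as a and b are less than half the counter range apart.
 * (The unsigned-to-signed conversion is implementation-defined in ISO C;
 * GCC and Clang wrap it.) */
static int time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}
```

For example, time_after_ul(5, ULONG_MAX - 5) is true: the counter wrapped past zero between the two readings, and 5 is still correctly treated as the later time.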

Since we know that tx will never be empty because of flow control,
the only way this guy is going to bail is if it times out,
or if signal_pending() comes back with something.

signal_pending() never does, no matter how many signals I send it.
(Even sending it multiple kill -9's)

Eventually it times out (after 30 seconds, "setserial closing_wait"),
and then the process goes away.

However, I see the signals climb, when I print out the values of
current->signal->shared_pending.list.next and
current->signal->shared_pending.list.prev

It's like those values and signal_pending() aren't in "sync"
anymore once the process has gone into the PF_EXITING state.
(It works fine when the process is not in that state.)

Thanks!
Scott

2005-10-12 00:09:42

by Alan

Subject: RE: [BUG?] 2.6.x (2.6.13) - new signals not being delivered to a terminating (PF_EXITING) process.

On Tue, 2005-10-11 at 14:35 -0500, Kilau, Scott wrote:
> Also, why did this work under 2.4?
>
> This is why I was wondering if this was intentional, or was just an
> oversight...

It seems that the signal-reception logic for exiting processes has changed.
Serial depends on the old behaviour, and it's difficult to see how it
should be fixed and what would count as "correct behaviour" here.

Alan