2022-09-22 09:18:53

by Cambda Zhu

Subject: Syscall kill() can send signal to thread ID

I found that the kill() syscall can send a signal to a thread ID that is
not the TGID. But the Linux manual page kill(2) says:

"The kill() system call can be used to send any signal to any
process group or process."

And the Linux manual page tkill(2) says:

"tgkill() sends the signal sig to the thread with the thread ID
tid in the thread group tgid. (By contrast, kill(2) can be used
to send a signal only to a process (i.e., thread group) as a
whole, and the signal will be delivered to an arbitrary thread
within that process.)"

Does 'process' here mean the TGID? I ask because kill(tid, 0) returns
ESRCH on FreeBSD, while Linux sends the signal to the thread group that
the thread belongs to.
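
A minimal sketch for reproducing this observation on Linux. The helper
name probe_kill_with_tid() and the semaphore names are made up for
illustration. It starts a second thread, reads that thread's kernel TID
with gettid, and calls kill(tid, 0) on it:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names throughout; none of them come from the thread. */
static sem_t tid_ready, may_exit;
static pid_t worker_tid;

static void *worker(void *arg)
{
    (void)arg;
    worker_tid = (pid_t)syscall(SYS_gettid); /* kernel TID, not pthread_t */
    sem_post(&tid_ready);
    while (sem_wait(&may_exit) == -1 && errno == EINTR)
        ; /* keep the thread alive until the probe is done */
    return NULL;
}

/*
 * Starts a second thread, calls kill(tid, 0) on that thread's TID and
 * returns 0 if kill() succeeded, the errno value if it failed, or -1 if
 * the thread could not be created. The TID is stored in *tid_out.
 */
int probe_kill_with_tid(pid_t *tid_out)
{
    pthread_t th;
    int result;

    sem_init(&tid_ready, 0, 0);
    sem_init(&may_exit, 0, 0);
    if (pthread_create(&th, NULL, worker, NULL) != 0)
        return -1;
    while (sem_wait(&tid_ready) == -1 && errno == EINTR)
        ;
    *tid_out = worker_tid;

    /* Signal 0 performs the existence and permission checks only. */
    result = (kill(worker_tid, 0) == 0) ? 0 : errno;

    sem_post(&may_exit);
    pthread_join(th, NULL);
    return result;
}
```

On Linux the helper is expected to return 0 for a TID that differs from
getpid(). Based on the FreeBSD behavior described above, a system that
rejects thread IDs would return ESRCH instead, although SYS_gettid itself
is Linux-specific.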

If this is as expected, should we add a note to the Linux manual
page? kill() is a syscall, and PIDs that are not equal to a TGID are
not listed under /proc, so this may be a little confusing.

Regards,
Cambda


2022-09-22 15:20:17

by Eric W. Biederman

Subject: Re: Syscall kill() can send signal to thread ID

Cambda Zhu writes:

> I found syscall kill() can send signal to a thread id, which is
> not the TGID. But the Linux manual page kill(2) said:
>
> "The kill() system call can be used to send any signal to any
> process group or process."
>
> And the Linux manual page tkill(2) said:
>
> "tgkill() sends the signal sig to the thread with the thread ID
> tid in the thread group tgid. (By contrast, kill(2) can be used
> to send a signal only to a process (i.e., thread group) as a
> whole, and the signal will be delivered to an arbitrary thread
> within that process.)"
>
> I don't know whether the meaning of this 'process' should be
> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
> while Linux sends signal to the thread group that the thread belongs
> to.
>
> If this is as expected, should we add a notice to the Linux manual
> page? Because it's a syscall and the pids not equal to tgid are not
> listed under /proc. This may be a little confusing, I guess.

This is as expected.

The bit about /proc is interesting. On Linux, try
"cd /proc; cd tid" and see what happens.

The thread ID passed to kill(2) is used to select the process, and the
delivery happens just the same as if the TGID had been used.

It is one of those odd behaviors that we could potentially remove. It
would require hunting through all of the userspace applications to see
if something happens to depend upon that behavior. Unless it becomes
expensive to maintain I don't expect we will ever do that.

For the same reason we probably don't want to document it, as we don't
want to encourage anyone to use that strange corner case. As it is, if
we break it by accident and no one notices for a couple of years, we can
remove the behavior, as that will have proved that no one uses it ;)

Eric

2022-09-22 16:34:18

by Eric W. Biederman

Subject: Re: Syscall kill() can send signal to thread ID

Cambda Zhu writes:

> I found syscall kill() can send signal to a thread id, which is
> not the TGID. But the Linux manual page kill(2) said:
>
> "The kill() system call can be used to send any signal to any
> process group or process."
>
> And the Linux manual page tkill(2) said:
>
> "tgkill() sends the signal sig to the thread with the thread ID
> tid in the thread group tgid. (By contrast, kill(2) can be used
> to send a signal only to a process (i.e., thread group) as a
> whole, and the signal will be delivered to an arbitrary thread
> within that process.)"
>
> I don't know whether the meaning of this 'process' should be
> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
> while Linux sends signal to the thread group that the thread belongs
> to.
>
> If this is as expected, should we add a notice to the Linux manual
> page? Because it's a syscall and the pids not equal to tgid are not
> listed under /proc. This may be a little confusing, I guess.

How did you come across this? Were you just experimenting?

I am wondering if you were tracking a bug, a portability problem,
or something else. If the current behavior is causing problems in
some way, instead of just being a detail that no one really cares about
either way, it would be worth considering whether we want to maintain
the current behavior.

Eric

2022-09-23 04:21:43

by Cambda Zhu

Subject: Re: Syscall kill() can send signal to thread ID



> On Sep 22, 2022, at 23:33, Eric W. Biederman wrote:
>
> Cambda Zhu writes:
>
>> I found syscall kill() can send signal to a thread id, which is
>> not the TGID. But the Linux manual page kill(2) said:
>>
>> "The kill() system call can be used to send any signal to any
>> process group or process."
>>
>> And the Linux manual page tkill(2) said:
>>
>> "tgkill() sends the signal sig to the thread with the thread ID
>> tid in the thread group tgid. (By contrast, kill(2) can be used
>> to send a signal only to a process (i.e., thread group) as a
>> whole, and the signal will be delivered to an arbitrary thread
>> within that process.)"
>>
>> I don't know whether the meaning of this 'process' should be
>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>> while Linux sends signal to the thread group that the thread belongs
>> to.
>>
>> If this is as expected, should we add a notice to the Linux manual
>> page? Because it's a syscall and the pids not equal to tgid are not
>> listed under /proc. This may be a little confusing, I guess.
>
> How did you come across this? Were you just experimenting?
>
> I am wondering if you were tracking a bug, or a portability problem
> or something else. If the current behavior is causing problems in
> some way instead of just being a detail that no one really cares about
> either way it would be worth considering if we want to maintain the
> current behavior.
>
> Eric

I have found that I can cd into /proc/tid, and that proc_pid_readdir()
uses next_tgid() to filter out TIDs. The 'ps' command also reads the
/proc directory to list processes. That's why I was confused by kill().
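
A small sketch of that /proc behavior from userspace, assuming Linux
with /proc mounted. All helper names here are invented for the example.
It compares what readdir() on /proc reports with what a direct lookup of
/proc/<tid> reports, for the TID of a second thread:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static sem_t tid_ready, may_exit;
static pid_t worker_tid;

static void wait_on(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR)
        ; /* retry if interrupted */
}

static void *worker(void *arg)
{
    (void)arg;
    worker_tid = (pid_t)syscall(SYS_gettid);
    sem_post(&tid_ready);
    wait_on(&may_exit);
    return NULL;
}

/* Returns 1 if readdir() on /proc yields an entry named after id, 0 if
 * it does not, and -1 if /proc cannot be opened. */
int listed_in_proc(pid_t id)
{
    DIR *d = opendir("/proc");
    struct dirent *e;
    int found = 0;

    if (!d)
        return -1;
    while ((e = readdir(d)) != NULL) {
        char *end;
        long v = strtol(e->d_name, &end, 10);

        if (end != e->d_name && *end == '\0' && v == (long)id) {
            found = 1;
            break;
        }
    }
    closedir(d);
    return found;
}

/* Returns 1 if /proc/<id> can be looked up by name, 0 otherwise. */
int reachable_in_proc(pid_t id)
{
    char path[64];

    snprintf(path, sizeof path, "/proc/%ld", (long)id);
    return access(path, F_OK) == 0;
}

/* Fills in both answers for the TID of a freshly started second thread. */
int check_tid_in_proc(int *listed, int *reachable)
{
    pthread_t th;

    sem_init(&tid_ready, 0, 0);
    sem_init(&may_exit, 0, 0);
    if (pthread_create(&th, NULL, worker, NULL) != 0)
        return -1;
    wait_on(&tid_ready);

    *listed = listed_in_proc(worker_tid);
    *reachable = reachable_in_proc(worker_tid);

    sem_post(&may_exit);
    pthread_join(th, NULL);
    return 0;
}
```

With the behavior described above, the TID should be reachable by name
but absent from the directory listing, which is why tools that enumerate
/proc do not show it.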

And yes, I'm tracking a bug. A service monitor, like systemd or
some watchdog, uses kill() to check if a pid is valid or not:
1. Store service pid into cache.
2. Check if pid in cache is valid by kill(pid, 0).
3. Check if pid in cache is the service to watch.

So if kill(pid, 0) returns success but the 'ps' command shows no such
process, the service monitor could be confused. The monitor could
check whether the pid is a TID, but that would mean relying on the odd
behavior intentionally. And might this workaround be unsafe on other OSes?
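
For illustration only, the "check if pid is tid" workaround could look
roughly like the sketch below on Linux. Both function names are
hypothetical. It relies on the Tgid field of /proc/<pid>/status, so it
is not portable, and it does nothing about pid reuse between the check
and any later use of the pid:

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if pid names a thread group leader
 * (what 'ps' lists as a process), 0 if it names another thread of some
 * process, and -1 if /proc/<pid>/status cannot be read or parsed.
 */
int pid_is_thread_group_leader(pid_t pid)
{
    char path[64], line[256];
    long tgid = -1;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        /* The line looks like "Tgid:\t1234". */
        if (sscanf(line, "Tgid: %ld", &tgid) == 1)
            break;
    }
    fclose(f);
    if (tgid < 0)
        return -1;
    return tgid == (long)pid;
}

/*
 * Hypothetical monitor check: the cached pid must pass kill(pid, 0)
 * (EPERM still means "exists") and must be a thread group leader.
 */
int monitored_pid_is_process(pid_t pid)
{
    if (kill(pid, 0) != 0 && errno != EPERM)
        return 0;
    return pid_is_thread_group_leader(pid) == 1;
}
```

A monitor would call monitored_pid_is_process(cached_pid) in step 2
instead of the bare kill(pid, 0), and still perform step 3 afterwards.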

I agree with you that this behavior shouldn't be removed, in case
some userspace applications rely on it now.

Regards,
Cambda


2022-09-23 05:55:32

by Florian Weimer

Subject: Re: Syscall kill() can send signal to thread ID

* Eric W. Biederman:

> Cambda Zhu writes:
>
>> I found syscall kill() can send signal to a thread id, which is
>> not the TGID. But the Linux manual page kill(2) said:
>>
>> "The kill() system call can be used to send any signal to any
>> process group or process."
>>
>> And the Linux manual page tkill(2) said:
>>
>> "tgkill() sends the signal sig to the thread with the thread ID
>> tid in the thread group tgid. (By contrast, kill(2) can be used
>> to send a signal only to a process (i.e., thread group) as a
>> whole, and the signal will be delivered to an arbitrary thread
>> within that process.)"
>>
>> I don't know whether the meaning of this 'process' should be
>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>> while Linux sends signal to the thread group that the thread belongs
>> to.
>>
>> If this is as expected, should we add a notice to the Linux manual
>> page? Because it's a syscall and the pids not equal to tgid are not
>> listed under /proc. This may be a little confusing, I guess.
>
> This is as expected.
>
> The bit about is /proc is interesting. On linux try
> "cd /proc; cd tid" and see what happens.
>
> Using the thread id in kill(2) is used to select the process, and the
> delivery happens just the same as if the TGID had been used.
>
> It is one of those odd behaviors that we could potentially remove. It
> would require hunting through all of the userspace applications to see
> if something happens to depend upon that behavior. Unless it becomes
> expensive to maintain I don't expect we will ever do that.

It would just replace one odd behavior with another, because kill() on
the TID of the main thread would still send the signal to the entire
process (because that TID is equal to the PID), but for the other threads
it would just send it to the thread. So it would still be inconsistent.

Thanks,
Florian

2022-09-23 06:52:35

by Cambda Zhu

Subject: Re: Syscall kill() can send signal to thread ID


> On Sep 23, 2022, at 13:31, Florian Weimer wrote:
>
> * Eric W. Biederman:
>
>> Cambda Zhu writes:
>>
>>> I found syscall kill() can send signal to a thread id, which is
>>> not the TGID. But the Linux manual page kill(2) said:
>>>
>>> "The kill() system call can be used to send any signal to any
>>> process group or process."
>>>
>>> And the Linux manual page tkill(2) said:
>>>
>>> "tgkill() sends the signal sig to the thread with the thread ID
>>> tid in the thread group tgid. (By contrast, kill(2) can be used
>>> to send a signal only to a process (i.e., thread group) as a
>>> whole, and the signal will be delivered to an arbitrary thread
>>> within that process.)"
>>>
>>> I don't know whether the meaning of this 'process' should be
>>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>>> while Linux sends signal to the thread group that the thread belongs
>>> to.
>>>
>>> If this is as expected, should we add a notice to the Linux manual
>>> page? Because it's a syscall and the pids not equal to tgid are not
>>> listed under /proc. This may be a little confusing, I guess.
>>
>> This is as expected.
>>
>> The bit about is /proc is interesting. On linux try
>> "cd /proc; cd tid" and see what happens.
>>
>> Using the thread id in kill(2) is used to select the process, and the
>> delivery happens just the same as if the TGID had been used.
>>
>> It is one of those odd behaviors that we could potentially remove. It
>> would require hunting through all of the userspace applications to see
>> if something happens to depend upon that behavior. Unless it becomes
>> expensive to maintain I don't expect we will ever do that.
>
> It would just replace one odd behavior by another because kill for the
> TID of the main thread will still send the signal to the entire process
> (because the TID is equal to the PID), but for the other threads, it
> would just send it to the thread. So it would still be inconsistent.
>
> Thanks,
> Florian

Sorry, I don't quite understand what you mean. If kill() returned -ESRCH for a
TID that is not equal to the TGID, kill() could only send a signal to a thread
group via the main thread ID, which is what BSD does and what the manual says.
That does not seem odd?

Regards,
Cambda

2022-09-23 08:15:08

by Florian Weimer

Subject: Re: Syscall kill() can send signal to thread ID

> I don't quite understand what you mean, sorry. But if kill() returns
> -ESRCH for tid which is not equal to tgid, kill() can only send signal
> to thread group via main thread id, that is what BSD did and manual
> said. It seems not odd?

It's still odd because there's one TID per process that's valid for
kill by accident. That's all.

Thanks,
Florian

2022-09-23 08:50:21

by Cambda Zhu

Subject: Re: Syscall kill() can send signal to thread ID


> On Sep 23, 2022, at 15:53, Florian Weimer wrote:
>
>> I don't quite understand what you mean, sorry. But if kill() returns
>> -ESRCH for tid which is not equal to tgid, kill() can only send signal
>> to thread group via main thread id, that is what BSD did and manual
>> said. It seems not odd?
>
> It's still odd because there's one TID per process that's valid for
> kill by accident. That's all.
>
> Thanks,
> Florian

As far as I know, there is no rule forbidding the 'process ID' (TGID on Linux)
from being equal to the main thread ID, is that right? If one wants to send a
signal to a specific thread, tgkill() can do that. As far as I understand, the
difference between kill() and tgkill() is whether the signal is queued on
shared_pending, regardless of whether the ID is a process ID or a thread ID.
On Linux, the main thread ID simply equals the process ID. So kill(main_tid, sig)
means sending a signal to the process whose PID equals the first argument.
I don't think that is odd.

Thanks,
Cambda

2022-09-23 12:04:50

by David Laight

Subject: RE: Syscall kill() can send signal to thread ID

...
> And yes, I'm tracking a bug. A service monitor, like systemd or
> some watchdog, uses kill() to check if a pid is valid or not:
> 1. Store service pid into cache.
> 2. Check if pid in cache is valid by kill(pid, 0).
> 3. Check if pid in cache is the service to watch.
>
> So if kill(pid, 0) returns success but no process info shows on 'ps'
> command, the service monitor could be confused. The monitor could
> check if pid is tid, but this means the odd behavior would be used
> intentionally. And this workaround may be unsafe on other OS?

That looks pretty broken to me.
On Linux a pid can be reused immediately after a process exits.
So there is really no guarantee that the pid is the one you want.
IIRC there are some recent changes that mean opening /proc/<pid>
will stop the pid being reused - allowing checks before sending a signal.
(NetBSD won't reuse a pid for a reasonable number of forks
and then uses a semi-random pid allocator.
I don't know whether any other BSD picked up that change.)

Also using signals in multi-threaded programs is pretty much
non-portable.

David


2022-09-23 21:33:10

by Eric W. Biederman

Subject: Re: Syscall kill() can send signal to thread ID

Cambda Zhu writes:

>> On Sep 23, 2022, at 15:53, Florian Weimer wrote:
>>
>>> I don't quite understand what you mean, sorry. But if kill() returns
>>> -ESRCH for tid which is not equal to tgid, kill() can only send signal
>>> to thread group via main thread id, that is what BSD did and manual
>>> said. It seems not odd?
>>
>> It's still odd because there's one TID per process that's valid for
>> kill by accident. That's all.

> As far as I know, there is no rule forbidding 'process ID'(TGID on Linux)
> equals to main thread ID, is it right?

There is an unfortunate guarantee that glibc depends upon that after
exec TGID == TID for the initial thread in a process. I say unfortunate
because maintaining that guarantee when another thread in the process
calls exec is a bit painful.

> If one wants to send signal to a specific thread, tgkill() can do
> that. As far as I understand, the difference between kill() and
> tgkill() is whether the signal is set on shared_pending, whatever the
> ID is a process ID or a thread ID. For Linux, the main thread ID just
> equals to the process ID.

Correct. kill and tgkill use different signal queues. The queue used by
kill is shared by the whole destination process, and the one used by
tgkill is always thread local.
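
A rough sketch of the thread-directed case, assuming Linux and calling
tgkill through syscall(2). The function and variable names are invented
for the example. A handler records the TID it runs on, and the caller
checks that it matches the thread that was targeted:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static sem_t tid_ready, handled, may_exit;
static pid_t worker_tid;
static volatile sig_atomic_t handler_tid; /* TID the handler ran on */

static void on_sigusr1(int sig)
{
    int saved = errno;

    (void)sig;
    handler_tid = (sig_atomic_t)syscall(SYS_gettid);
    sem_post(&handled); /* sem_post() is async-signal-safe */
    errno = saved;
}

static void wait_on(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR)
        ; /* retry if a signal handler interrupted the wait */
}

static void *worker(void *arg)
{
    (void)arg;
    worker_tid = (pid_t)syscall(SYS_gettid);
    sem_post(&tid_ready);
    wait_on(&may_exit);
    return NULL;
}

/*
 * Returns 1 if a SIGUSR1 sent with tgkill() was handled on the targeted
 * thread, 0 if it was handled elsewhere, -1 on failure.
 */
int tgkill_reaches_target_thread(void)
{
    struct sigaction sa, old;
    pthread_t th;
    int ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    sem_init(&tid_ready, 0, 0);
    sem_init(&handled, 0, 0);
    sem_init(&may_exit, 0, 0);
    handler_tid = 0;
    if (pthread_create(&th, NULL, worker, NULL) != 0)
        return -1;
    wait_on(&tid_ready);

    /* tgkill(tgid, tid, sig) is thread-directed, unlike kill(). */
    if (syscall(SYS_tgkill, getpid(), worker_tid, SIGUSR1) != 0) {
        ok = -1;
    } else {
        wait_on(&handled);
        ok = (handler_tid == worker_tid);
    }

    sem_post(&may_exit);
    pthread_join(th, NULL);
    sigaction(SIGUSR1, &old, NULL); /* restore the old disposition */
    return ok;
}
```

With kill(getpid(), SIGUSR1) in place of the tgkill call, any thread
that does not block SIGUSR1 could end up running the handler, so the
same comparison would not be guaranteed to hold.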

> So the meaning of kill(main_tid, sig) is sending signal to a process,
> of which the PID equals to the first argument. It's not odd, I think.

Yes, the oddity is the TGID and TID share the same value, nothing else.

Eric

2022-09-23 21:33:35

by Eric W. Biederman

Subject: Re: Syscall kill() can send signal to thread ID

Cambda Zhu writes:

>> On Sep 22, 2022, at 23:33, Eric W. Biederman wrote:
>>
>> Cambda Zhu writes:
>>
>>> I found syscall kill() can send signal to a thread id, which is
>>> not the TGID. But the Linux manual page kill(2) said:
>>>
>>> "The kill() system call can be used to send any signal to any
>>> process group or process."
>>>
>>> And the Linux manual page tkill(2) said:
>>>
>>> "tgkill() sends the signal sig to the thread with the thread ID
>>> tid in the thread group tgid. (By contrast, kill(2) can be used
>>> to send a signal only to a process (i.e., thread group) as a
>>> whole, and the signal will be delivered to an arbitrary thread
>>> within that process.)"
>>>
>>> I don't know whether the meaning of this 'process' should be
>>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>>> while Linux sends signal to the thread group that the thread belongs
>>> to.
>>>
>>> If this is as expected, should we add a notice to the Linux manual
>>> page? Because it's a syscall and the pids not equal to tgid are not
>>> listed under /proc. This may be a little confusing, I guess.
>>
>> How did you come across this? Were you just experimenting?
>>
>> I am wondering if you were tracking a bug, or a portability problem
>> or something else. If the current behavior is causing problems in
>> some way instead of just being a detail that no one really cares about
>> either way it would be worth considering if we want to maintain the
>> current behavior.
>>
>> Eric
>
> I have found I can cd into /proc/tid, and the proc_pid_readdir()
> uses next_tgid() to filter tid. Also the 'ps' command reads the
> /proc dir to show processes. That's why I was confused with kill().
>
> And yes, I'm tracking a bug. A service monitor, like systemd or
> some watchdog, uses kill() to check if a pid is valid or not:
> 1. Store service pid into cache.
> 2. Check if pid in cache is valid by kill(pid, 0).
> 3. Check if pid in cache is the service to watch.
>
> So if kill(pid, 0) returns success but no process info shows on 'ps'
> command, the service monitor could be confused. The monitor could
> check if pid is tid, but this means the odd behavior would be used
> intentionally. And this workaround may be unsafe on other OS?
>
> I'm agreed with you that this behavior shouldn't be removed, in case
> some userspace applications use it now.

As has already been mentioned, using pids and APIs like kill is
fundamentally racy. We try to keep from reusing pids too quickly.
Unfortunately, what we have is that on average there will be some time
between pid reuse, not any kind of worst-case guarantee.

We have slowly been introducing techniques into Linux that allow combatting
that. A process's directory in proc that you have open will
never point to another process, even after the pid is reused. Similarly,
we have pidfd, which will associate with a specific process and will not
associate with any other process even if the process's pid is reused.

That is, we have userspace pid value reuse, but we don't reuse struct pid
in the kernel.
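
As a hedged sketch of the pidfd approach: the two helper names below are
made up, and the code assumes a kernel and headers recent enough to
provide pidfd_open(2) (Linux 5.3 or later) and pidfd_send_signal(2),
called here through syscall(2):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns a pidfd for pid, or -1 with errno set. */
int open_pidfd(pid_t pid)
{
    return (int)syscall(SYS_pidfd_open, pid, 0);
}

/*
 * Returns 1 if the process behind pidfd still exists, 0 if it is gone,
 * -1 on any other error. Signal 0 only performs the checks, and the
 * descriptor keeps referring to the same process even if its pid value
 * is later handed out again.
 */
int pidfd_process_exists(int pidfd)
{
    if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1; /* exists, but we may not signal it */
    if (errno == ESRCH)
        return 0;
    return -1;
}
```

A monitor could open the pidfd once, when it first learns the service's
pid, and later call pidfd_process_exists() on the descriptor instead of
calling kill(pid, 0) on the cached number.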

Unfortunately I don't think there is anything that allows these races to
be addressed in a portable manner.

Eric

2022-09-24 04:06:57

by Cambda Zhu

Subject: Re: Syscall kill() can send signal to thread ID


> On Sep 24, 2022, at 05:21, Eric W. Biederman wrote:
>
> Cambda Zhu writes:
>
>>> On Sep 22, 2022, at 23:33, Eric W. Biederman wrote:
>>>
>>> Cambda Zhu writes:
>>>
>>>> I found syscall kill() can send signal to a thread id, which is
>>>> not the TGID. But the Linux manual page kill(2) said:
>>>>
>>>> "The kill() system call can be used to send any signal to any
>>>> process group or process."
>>>>
>>>> And the Linux manual page tkill(2) said:
>>>>
>>>> "tgkill() sends the signal sig to the thread with the thread ID
>>>> tid in the thread group tgid. (By contrast, kill(2) can be used
>>>> to send a signal only to a process (i.e., thread group) as a
>>>> whole, and the signal will be delivered to an arbitrary thread
>>>> within that process.)"
>>>>
>>>> I don't know whether the meaning of this 'process' should be
>>>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>>>> while Linux sends signal to the thread group that the thread belongs
>>>> to.
>>>>
>>>> If this is as expected, should we add a notice to the Linux manual
>>>> page? Because it's a syscall and the pids not equal to tgid are not
>>>> listed under /proc. This may be a little confusing, I guess.
>>>
>>> How did you come across this? Were you just experimenting?
>>>
>>> I am wondering if you were tracking a bug, or a portability problem
>>> or something else. If the current behavior is causing problems in
>>> some way instead of just being a detail that no one really cares about
>>> either way it would be worth considering if we want to maintain the
>>> current behavior.
>>>
>>> Eric
>>
>> I have found I can cd into /proc/tid, and the proc_pid_readdir()
>> uses next_tgid() to filter tid. Also the 'ps' command reads the
>> /proc dir to show processes. That's why I was confused with kill().
>>
>> And yes, I'm tracking a bug. A service monitor, like systemd or
>> some watchdog, uses kill() to check if a pid is valid or not:
>> 1. Store service pid into cache.
>> 2. Check if pid in cache is valid by kill(pid, 0).
>> 3. Check if pid in cache is the service to watch.
>>
>> So if kill(pid, 0) returns success but no process info shows on 'ps'
>> command, the service monitor could be confused. The monitor could
>> check if pid is tid, but this means the odd behavior would be used
>> intentionally. And this workaround may be unsafe on other OS?
>>
>> I'm agreed with you that this behavior shouldn't be removed, in case
>> some userspace applications use it now.
>
> As has already been mentioned using pids and api's like kill is
> fundamentally racy. We try and to keep from reusing pids too quickly.
> Unfortunately what we have is that on average there will be some time
> between pid reuse not an kind of worst case guarantee.
>
> We have slowly been introducing techniques into linux allow combatting
> that. A directory processes directory in proc that you have open will
> never point to another process even after the pid is reused. Similarly
> we have pidfd that will associate with a specific process and will not
> associate with any other process even if the processes pid is reused.
>
> That is we have userspace pid value reuse, but we don't reuse struct pid
> in the kernel.
>
> Unfortunately I don't think there is anything that allows these races to
> be addressed in a portable manner.
>
> Eric

I got it. Thank you!

Regards,
Cambda