2009-07-27 16:51:11

by Stephane Eranian

[permalink] [raw]
Subject: perf_counters issue with self-sampling threads

Hi,

I believe there is a problem with the current perf_counters (PCL)
code for self-sampling threads. The problem is related to sample
notifications via signal.

PCL (just like perfmon) is using SIGIO, an asynchronous signal,
to notify user applications of the availability of data in the event
buffer.

POSIX does not mandate that asynchronous signals be delivered
to the thread in which they originated. Any thread in the process
may process the signal, assuming it does not have the signal
blocked.

This is a serious problem with any self-sampling program such as
those built on top of PAPI. When sampling, you do want the signal
to be delivered to the thread in which the counter overflowed. This is
not just for convenience but it is required if the signal handler needs
to operate on the thread's machine state. Although, there is always
a possibility of forwarding the signal via tkill() to the right thread, I
do not think this is the right solution as it incurs additional latency
and therefore skid.

Looking at the kernel source code related to that, it seems that
kill_fasync() ends up calling group_send_sig_info(). This function
adds the signal to the process SHARED sigpending queue. Then,
it picks a thread to "wakeup". It first tries the thread in which the
signal originated with the following selection test (wants_signal):
- signal is not blocked
- thread is not exiting
- no signal private pending for this thread

If that does not work, it iterates over the other threads of the process.

This explains why in trivial tests, the SIGIO is always delivered
to the right thread. However it the monitored thread is using any
other signals, e.g., SIGALRM, then the SIGIO signal can go to the
wrong thread. The problem also arises if the first SIGIO is not already
processed by the time a 2nd is pended.

For self-sampling, we want (and in fact require) asynchronous notifications.
But we want the extra guarantee that the signal is ALWAYS delivered to
the thread in which the event occurred.

It seems like we could either create a different version of kill_fasync() or
pass an extra parameter to force this function to use specific_send_sig_info().
This would be only when self-monitoring. When a tool is monitoring another
thread, it is probably okay to have the signal delivered to any threads. Most
likely, the tool is setup such that threads not processing notifications have
the signal blocked.


2009-07-27 16:54:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> Hi,
>
> I believe there is a problem with the current perf_counters (PCL)
> code for self-sampling threads. The problem is related to sample
> notifications via signal.
>
> PCL (just like perfmon) is using SIGIO, an asynchronous signal,
> to notify user applications of the availability of data in the event
> buffer.
>
> POSIX does not mandate that asynchronous signals be delivered
> to the thread in which they originated. Any thread in the process
> may process the signal, assuming it does not have the signal
> blocked.

Bugger, you're right. /me kicks POSIX again for creating these crazy ass
semantics.

I'll look at fixing this.

Thanks!

2009-07-27 21:25:20

by Andi Kleen

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads


I wonder how SIGPROF gets around the same problem, or is it
buggy too?

-Andi

--
[email protected] -- Speaking for myself only.

2009-07-28 08:51:09

by Stephane Eranian

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

[Reposting the message because of stupid MIME-encoding error on my part]
Andi,

Looks like SIGPROF is calling _group_send_sig_info(), so I
think it is subject to the same problem.

I have built a perfmon example to test the problem, it is
relatively easy to trigger by simply self-monitoring a thread
which is using setitimer() and thus SIGALRM. You just have
to increase the timer frequency. At some point, the SIGPROF
signal will be delivered to the wrong thread.

One thing I did not mention in my message is that one would think
that forcing a specific signal via F_SETSIG could be a workaround
if that signal is synchronous, e.g., SIGFPE. F_SETSIG looks like a
Linux extension but it does not solve the problem. Linux determines
the mode of delivery not on the signal number but on the code path,
it seems. If you use F_ASYNC+F_SETOWN, then this is systematically
considered asynchronous regardless of the signal used.If you come from
traps.c, then the signal is delivered to the correct thread. All of this does
not look unreasonable to me.

I believe sampling, be it timer or PMU-based, may be
a special case here. We want asynchronous + specific
thread (only) when self- sampling.

An alternative may be to choose the pending queue based on the signal type
(synchronous, asynchronous). Then, we could use F_SETSIG to override
SIGIO with a synchronous signal instead. But I am not a POSIX signal expert.


On Jul 27, 2009 11:25 PM, "Andi Kleen" <[email protected]> wrote:


I wonder how SIGPROF gets around the same problem, or is it
buggy too?

-Andi

--
[email protected] -- Speaking for myself only.

2009-07-28 08:56:37

by Andi Kleen

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On Tue, Jul 28, 2009 at 10:51:04AM +0200, stephane eranian wrote:
> [Reposting the message because of stupid MIME-encoding error on my part]
> Andi,
>
> Looks like SIGPROF is calling _group_send_sig_info(), so I
> think it is subject to the same problem.

So sounds like a whole class of signals can't be POSIX compliant.

Likely others will run into the same problem.

Perhaps should define a new sigaction flag for this to make
it all explicit.

-Andi


--
[email protected] -- Speaking for myself only.

2009-07-28 09:13:19

by Stephane Eranian

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

Andi,

On Tue, Jul 28, 2009 at 10:56 AM, Andi Kleen<[email protected]> wrote:
> On Tue, Jul 28, 2009 at 10:51:04AM +0200, stephane eranian wrote:
>> [Reposting the message because of stupid MIME-encoding error on my part]
>> Andi,
>>
>> Looks like SIGPROF is calling _group_send_sig_info(), so I
>> think it is subject to the same problem.
>
> So sounds like a whole class of signals can't be POSIX compliant.
>

Well, the problem is that I don't where to find the POSIX spec that defines
signal types and how they should be handled in multi-threaded programs.
That would be a good starting point.

> Likely others will run into the same problem.
>
> Perhaps should define a new sigaction flag for this to make
> it all explicit.
>
That's probably a clean way of doing this.

2009-07-29 12:16:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> I believe there is a problem with the current perf_counters (PCL)
> code for self-sampling threads. The problem is related to sample
> notifications via signal.
>
> PCL (just like perfmon) is using SIGIO, an asynchronous signal,
> to notify user applications of the availability of data in the event
> buffer.
>
> POSIX does not mandate that asynchronous signals be delivered
> to the thread in which they originated. Any thread in the process
> may process the signal, assuming it does not have the signal
> blocked.

This signal stuff makes my head spin a little, however:

fcntl(2) for F_SETOWN says:

If a non-zero value is given to F_SETSIG in a multi‐ threaded
process running with a threading library that supports thread groups
(e.g., NPTL), then a positive value given to F_SETOWN has a
different meaning: instead of being a process ID identifying a whole
pro‐ cess, it is a thread ID identifying a specific thread within a
process. Consequently, it may be necessary to pass F_SETOWN the
result of gettid(2) instead of get‐ pid(2) to get sensible results
when F_SETSIG is used. (In current Linux threading
implementations, a main thread’s thread ID is the same as its process
ID. This means that a single-threaded program can equally use
gettid(2) or getpid(2) in this scenario.) Note, how‐ ever, that
the statements in this paragraph do not apply to the SIGURG signal
generated for out-of-band data on a socket: this signal is always
sent to either a process or a process group, depending on the value
given to F_SETOWN. Note also that Linux imposes a limit on the
number of real-time signals that may be queued to a process (see
getrlimit(2) and signal(7)) and if this limit is reached, then the
kernel reverts to delivering SIGIO, and this signal is delivered
to the entire process rather than to a specific thread.


Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
a PID it should deliver SIGIO to the thread instead of the whole process
-- which, to me, seems a sane semantic.

However,

kill_fasync(SIGIO)
__kill_fasync()
send_sigio()
/* if pid_type is a PIDTYPE_PID and pid a TID this should
only iterate the one thread, I think */
do_each_pid_task() {
send_sigio_to_task();
} while_each_pid_task();

where:

send_sigio_to_task()
group_send_sig_info()
__group_send_sig_info()
send_signal(.group = 1) /* uh-ow trouble */
__send_signal()
if (group)
pending = &t->signal->shared_pending

which will result in the signal being send to the whole process anyway.


Now I was considering teaching send_sigio_to_task() to use
specific_send_sig_info() when fown->pid != fown->group_leader->pid or
something, but I'm not sure that won't break anything.

Alternatively, I've missed a detail and I either read the manpage wrong,
or the code, or both of them.


2009-07-29 12:37:13

by Stephane Eranian

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

Peter,

On Wed, Jul 29, 2009 at 2:19 PM, Peter Zijlstra<[email protected]> wrote:
> On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
>> I believe there is a problem with the current perf_counters (PCL)
>> code for self-sampling threads. The problem is related to sample
>> notifications via signal.
>>
>> PCL (just like perfmon) is using SIGIO, an asynchronous signal,
>> to notify user applications of the availability of data in the event
>> buffer.
>>
>> POSIX does not mandate that asynchronous signals be delivered
>> to the thread in which they originated. Any thread in the process
>> may process the signal, assuming it does not have the signal
>> blocked.
>
> This signal stuff makes my head spin a little, however:
>
> fcntl(2) for F_SETOWN says:
>
> If a non-zero value is given to F_SETSIG  in  a  multi‐ threaded
> process running with a threading library that supports thread groups
> (e.g., NPTL),  then  a  positive value  given  to  F_SETOWN  has  a
> different  meaning: instead of being a process ID identifying a whole
> pro‐ cess,  it  is a thread ID identifying a specific thread within a
> process.  Consequently, it may be necessary to pass  F_SETOWN  the
> result of gettid(2) instead of get‐ pid(2) to get sensible results
> when F_SETSIG  is  used.  (In  current  Linux  threading
> implementations, a main thread’s thread ID is the same as its process
> ID.  This means  that  a  single-threaded program can equally use
> gettid(2) or getpid(2) in this scenario.)   Note,  how‐ ever,  that
> the  statements  in  this paragraph do not apply to the SIGURG signal
> generated  for  out-of-band data  on a socket: this signal is always
> sent to either a process or a process group, depending  on  the  value
> given  to  F_SETOWN.   Note  also  that Linux imposes a limit on the
> number of real-time signals  that  may  be queued  to  a  process (see
> getrlimit(2) and signal(7)) and if this limit is reached, then the
> kernel  reverts to  delivering  SIGIO,  and this signal is delivered
> to the entire process rather than to a specific thread.
>
>
> Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
> a PID it should deliver SIGIO to the thread instead of the whole process
> -- which, to me, seems a sane semantic.
>
Yes, I remember that manpage. I got the same impression and in fact that is
what I document in some of my test programs. So you read this right.

> However,
>
>  kill_fasync(SIGIO)
>    __kill_fasync()
>      send_sigio()
>        /* if pid_type is a PIDTYPE_PID and pid a TID this should
>           only iterate the one thread, I think */
>        do_each_pid_task() {
>          send_sigio_to_task();
>        } while_each_pid_task();
>
> where:
>
>  send_sigio_to_task()
>    group_send_sig_info()
>      __group_send_sig_info()
>        send_signal(.group = 1) /* uh-ow trouble */
>          __send_signal()
>            if (group)
>               pending = &t->signal->shared_pending
>
> which will result in the signal being send to the whole process anyway.
>
Exactly! That is the code path and this is why this does not work as
expected. Nowhere along that path is there special casing for that
F_SETOWN of tid vs. pid. kill_fasync() implies group.


>
> Now I was considering teaching send_sigio_to_task() to use
> specific_send_sig_info() when fown->pid != fown->group_leader->pid or
> something, but I'm not sure that won't break anything.
>
Yes, that's the problem with touching this. I don't know if this will break
things. That's why I was suggested creating a parallel code path which
does what we want without modifying the existing path. Unless you know
some signal expert at redhat or elsewhere.

> Alternatively, I've missed a detail and I either read the manpage wrong,
> or the code, or both of them.
>
The code does not correspond to the manpage. Not clear which one
is correct though. This F_SETOWN trick looks very Linux specific.

2009-07-29 12:43:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On Wed, 2009-07-29 at 14:37 +0200, stephane eranian wrote:
>
> >
> > Now I was considering teaching send_sigio_to_task() to use
> > specific_send_sig_info() when fown->pid != fown->group_leader->pid or
> > something, but I'm not sure that won't break anything.
> >
> Yes, that's the problem with touching this. I don't know if this will break
> things. That's why I was suggested creating a parallel code path which
> does what we want without modifying the existing path. Unless you know
> some signal expert at redhat or elsewhere.

His name is Oleg, and he's on CC ;-)

> > Alternatively, I've missed a detail and I either read the manpage wrong,
> > or the code, or both of them.
> >
> The code does not correspond to the manpage. Not clear which one
> is correct though. This F_SETOWN trick looks very Linux specific.

Linus specific sounds good enough to me. Michael might have something so
say on this though...

2009-07-29 22:21:16

by Oleg Nesterov

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

(add Roland)

On 07/29, Peter Zijlstra wrote:
>
> On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> >
> > POSIX does not mandate that asynchronous signals be delivered
> > to the thread in which they originated. Any thread in the process
> > may process the signal, assuming it does not have the signal
> > blocked.

Yes. I now nothing about POSIX, but this is what Linux does at least.
I don't think we can/should change this behaviour.

> fcntl(2) for F_SETOWN says:
>
> If a non-zero value is given to F_SETSIG in a multi‐ threaded
> process running with a threading library that supports thread groups
> (e.g., NPTL), then a positive value given to F_SETOWN has a
> different meaning: instead of being a process ID identifying a whole
> pro‐ cess, it is a thread ID identifying a specific thread within a
> process.

Heh. Definitely this is not what Linux does ;)

> Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
> a PID it should deliver SIGIO to the thread instead of the whole process
> -- which, to me, seems a sane semantic.

I am not sure I understand the man above... But to me it looks like we
should always send a private signal when fown->signum != 0 ?

The change should be simple, but as you pointed out we can break things.

Oleg.

2009-07-30 11:28:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On Thu, 2009-07-30 at 00:17 +0200, Oleg Nesterov wrote:
> (add Roland)

but you seem to have forgotten to actually edit the CC line, fixed
that ;-)

> On 07/29, Peter Zijlstra wrote:
> >
> > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> > >
> > > POSIX does not mandate that asynchronous signals be delivered
> > > to the thread in which they originated. Any thread in the process
> > > may process the signal, assuming it does not have the signal
> > > blocked.
>
> Yes. I now nothing about POSIX, but this is what Linux does at least.
> I don't think we can/should change this behaviour.

Well, we have plenty exceptions to that rule already, we have itimer
extentions, tkill sys_rt_tgsigqueueinfo and plenty more..

> > fcntl(2) for F_SETOWN says:
> >
> > If a non-zero value is given to F_SETSIG in a multi‐ threaded
> > process running with a threading library that supports thread groups
> > (e.g., NPTL), then a positive value given to F_SETOWN has a
> > different meaning: instead of being a process ID identifying a whole
> > pro‐ cess, it is a thread ID identifying a specific thread within a
> > process.
>
> Heh. Definitely this is not what Linux does ;)

Right, so the question is, did we ever? Why does the man page say this.

Looking at the .12 source (git start) we did:

440 if (!send_sig_info(fown->signum, &si, p))
441 break;
442 /* fall-through: fall back on the old plain SIGIO signal */
443 case 0:
444 send_group_sig_info(SIGIO, SEND_SIG_PRIV, p);

Which was 'corrected' in:

commit fc9c9ab22d5650977c417ef2032d02f455011b23
Author: Bharath Ramesh <[email protected]>
Date: Sat Apr 16 15:25:41 2005 -0700

[PATCH] AYSNC IO using singals other than SIGIO

A question on sigwaitinfo based IO mechanism in multithreaded applications.

I am trying to use RT signals to notify me of IO events using RT signals
instead of SIGIO in a multithreaded applications. I noticed that there was
some discussion on lkml during november 1999 with the subject of the
discussion as "Signal driven IO". In the thread I noticed that RT signals
were being delivered to the worker thread. I am running 2.6.10 kernel and
I am trying to use the very same mechanism and I find that only SIGIO being
propogated to the worker threads and RT signals only being propogated to
the main thread and not the worker threads where I actually want them to be
propogated too. On further inspection I found that the following patch
which I have attached solves the problem.

I am not sure if this is a bug or feature in the kernel.


Roland McGrath <[email protected]> said:

This relates only to fcntl F_SETSIG, which is a Linux extension. So there is
no POSIX issue. When changing various things like the normal SIGIO signalling
to do group signals, I was concerned strictly with the POSIX semantics and
generally avoided touching things in the domain of Linux inventions. That's
why I didn't change this when I changed the call right next to it. There is
no reason I can see that F_SETSIG-requested signals shouldn't use a group
signal like normal SIGIO does. I'm happy to ACK this patch, there is nothing
wrong with its change to the semantics in my book. But neither POSIX nor I
care a whit what F_SETSIG does.

Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

> > Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
> > a PID it should deliver SIGIO to the thread instead of the whole process
> > -- which, to me, seems a sane semantic.
>
> I am not sure I understand the man above... But to me it looks like we
> should always send a private signal when fown->signum != 0 ?
>
> The change should be simple, but as you pointed out we can break things.

Right, so the change I had in mind is like the below (except I don't
know if we can compare struct pid things by pointer value or if we
should look at the content).

In any case, we should either do something like the below (yay!), or
amend the manpage (Michael?) and introduce something like F_SETOWN2
which does have the below semantics :-(.

---
Index: linux-2.6/fs/fcntl.c
===================================================================
--- linux-2.6.orig/fs/fcntl.c
+++ linux-2.6/fs/fcntl.c
@@ -431,6 +431,16 @@ static void send_sigio_to_task(struct ta
int fd,
int reason)
{
+ int (*send_sig)(int, struct siginfo *, struct task_struct *);
+
+ send_sig = group_send_sig_info;
+ /*
+ * If the fown points to a specific TID instead of to a PID
+ * we'll send the signal to the thread only.
+ */
+ if (fown->pid_type == PIDTYPE_PID && fown->pid != task_tgid(p))
+ send_sig = send_sig_info;
+
/*
* F_SETSIG can change ->signum lockless in parallel, make
* sure we read it once and use the same value throughout.
@@ -461,11 +472,11 @@ static void send_sigio_to_task(struct ta
else
si.si_band = band_table[reason - POLL_IN];
si.si_fd = fd;
- if (!group_send_sig_info(signum, &si, p))
+ if (!send_sig(signum, &si, p))
break;
/* fall-through: fall back on the old plain SIGIO signal */
case 0:
- group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
+ send_sig(SIGIO, SEND_SIG_PRIV, p);
}
}


2009-07-30 19:25:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On 07/30, Peter Zijlstra wrote:
>
> On Thu, 2009-07-30 at 00:17 +0200, Oleg Nesterov wrote:
> > (add Roland)
>
> but you seem to have forgotten to actually edit the CC line, fixed
> that ;-)

Yes, thanks ;)

> > On 07/29, Peter Zijlstra wrote:
> > >
> > > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> > > >
> > > > POSIX does not mandate that asynchronous signals be delivered
> > > > to the thread in which they originated. Any thread in the process
> > > > may process the signal, assuming it does not have the signal
> > > > blocked.
> >
> > Yes. I now nothing about POSIX, but this is what Linux does at least.
> > I don't think we can/should change this behaviour.
>
> Well, we have plenty exceptions to that rule already, we have itimer
> extentions, tkill sys_rt_tgsigqueueinfo and plenty more..

Yes, yes, I meant the behaviour of kill(2), group_send_sig_info(), etc.

> > > fcntl(2) for F_SETOWN says:
> > >
> > > If a non-zero value is given to F_SETSIG in a multi‐ threaded
> > > process running with a threading library that supports thread groups
> > > (e.g., NPTL), then a positive value given to F_SETOWN has a
> > > different meaning: instead of being a process ID identifying a whole
> > > pro‐ cess, it is a thread ID identifying a specific thread within a
> > > process.
> >
> > Heh. Definitely this is not what Linux does ;)
>
> Right, so the question is, did we ever? Why does the man page say this.
>
> Looking at the .12 source (git start) we did:
>
> 440 if (!send_sig_info(fown->signum, &si, p))
> 441 break;
> 442 /* fall-through: fall back on the old plain SIGIO signal */
> 443 case 0:
> 444 send_group_sig_info(SIGIO, SEND_SIG_PRIV, p);

Yes, the send_sig_info() above seems to match the manpage.

Another thing I can't understand, group_send_sig_info() calls
check_kill_permission(). But check_kill_permission() uses current, which
can be a "random" task if kill_fasync() is called from interrupt. Even
if not interrupt, I don't understand why (say) pipe_read() can't send a
signal here. sigio_perm() has already checked permissions, and it correctly
uses fown->cred.

> Which was 'corrected' in:
>
> commit fc9c9ab22d5650977c417ef2032d02f455011b23
> Author: Bharath Ramesh <[email protected]>
> Date: Sat Apr 16 15:25:41 2005 -0700
>
> [PATCH] AYSNC IO using singals other than SIGIO
>
> A question on sigwaitinfo based IO mechanism in multithreaded applications.
>
> I am trying to use RT signals to notify me of IO events using RT signals
> instead of SIGIO in a multithreaded applications. I noticed that there was
> some discussion on lkml during november 1999 with the subject of the
> discussion as "Signal driven IO". In the thread I noticed that RT signals
> were being delivered to the worker thread. I am running 2.6.10 kernel and
> I am trying to use the very same mechanism and I find that only SIGIO being
> propogated to the worker threads and RT signals only being propogated to
> the main thread and not the worker threads where I actually want them to be
> propogated too. On further inspection I found that the following patch
> which I have attached solves the problem.

So, some people want shared signals here.

> > I am not sure I understand the man above... But to me it looks like we
> > should always send a private signal when fown->signum != 0 ?
> >
> > The change should be simple, but as you pointed out we can break things.
>
> Right, so the change I had in mind is like the below (except I don't
> know if we can compare struct pid things by pointer value or if we
> should look at the content).

(yes, we can compare the pointers)

> In any case, we should either do something like the below (yay!), or
> amend the manpage (Michael?) and introduce something like F_SETOWN2
> which does have the below semantics :-(.
>
> ---
> Index: linux-2.6/fs/fcntl.c
> ===================================================================
> --- linux-2.6.orig/fs/fcntl.c
> +++ linux-2.6/fs/fcntl.c
> @@ -431,6 +431,16 @@ static void send_sigio_to_task(struct ta
> int fd,
> int reason)
> {
> + int (*send_sig)(int, struct siginfo *, struct task_struct *);
> +
> + send_sig = group_send_sig_info;
> + /*
> + * If the fown points to a specific TID instead of to a PID
> + * we'll send the signal to the thread only.
> + */
> + if (fown->pid_type == PIDTYPE_PID && fown->pid != task_tgid(p))
> + send_sig = send_sig_info;

Yes, this allows to send a private signal to sub-thread.

But this is a bit strange, because the user can't specify it wants
a thread-specific signal to the main thread, its tid == pid.

I don't know what should we do. Perhaps we can just add
"bool is_group_signal" to fown_struct as another Linux extension.

Oleg.

2009-07-30 20:00:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On Thu, 2009-07-30 at 21:20 +0200, Oleg Nesterov wrote:

> Yes, this allows to send a private signal to sub-thread.
>
> But this is a bit strange, because the user can't specify it wants
> a thread-specific signal to the main thread, its tid == pid.

Ah, indeed. I'll make a patch for F_SETOWN_TID then, unless someone
comes up with a better name for the creature ;-)

2009-07-30 20:32:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On 07/30, Peter Zijlstra wrote:
>
> I'll make a patch for F_SETOWN_TID then, unless someone
> comes up with a better name for the creature ;-)

I think you are right. It is not safe to change the current
behaviour.

Oleg.

2009-07-30 21:09:37

by Stephane Eranian

[permalink] [raw]
Subject: Re: perf_counters issue with self-sampling threads

On Thu, Jul 30, 2009 at 10:28 PM, Oleg Nesterov<[email protected]> wrote:
> On 07/30, Peter Zijlstra wrote:
>>
>> I'll make a patch for F_SETOWN_TID then, unless someone
>> comes up with a better name for the creature ;-)
>
> I think you are right. It is not safe to change the current
> behaviour.
>
I agree. As I said earlier, it is better to just add a new code
path.

2009-07-31 08:32:13

by Peter Zijlstra

[permalink] [raw]
Subject: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID

In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put
a TID into the regular fcntl(F_SETOWN) call. It will still be send to
the whole process of which that thread is part.

Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
which functions similarly to F_SETOWN, except positive arguments are
interpreted as TIDs and negative arguments are interpreted as PIDs.

This extension is fully bug compatible with the old F_GETOWN
implementation in that F_GETOWN_TID will be troubled by the negative
return value for PIDs similarly to F_GETOWN's trouble with process
groups.

[ compile tested only so far ]

Signed-off-by: Peter Zijlstra <[email protected]>
---
arch/parisc/include/asm/fcntl.h | 2 +
fs/fcntl.c | 64 +++++++++++++++++++++++++++++++++-----
include/asm-generic/fcntl.h | 4 ++
include/linux/fs.h | 11 +++++-
net/socket.c | 2 +-
5 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h
index 1e1c824..5d5235a 100644
--- a/arch/parisc/include/asm/fcntl.h
+++ b/arch/parisc/include/asm/fcntl.h
@@ -28,6 +28,8 @@
#define F_SETOWN 12 /* for sockets. */
#define F_SETSIG 13 /* for sockets. */
#define F_GETSIG 14 /* for sockets. */
+#define F_GETOWN_TID 15
+#define F_SETOWN_TID 16

/* for posix fcntl() and lockf() */
#define F_RDLCK 01
diff --git a/fs/fcntl.c b/fs/fcntl.c
index ae41308..8d0b7f9 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -197,13 +197,15 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
}

static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
- int force)
+ int flags)
{
write_lock_irq(&filp->f_owner.lock);
- if (force || !filp->f_owner.pid) {
+ if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) {
put_pid(filp->f_owner.pid);
filp->f_owner.pid = get_pid(pid);
filp->f_owner.pid_type = type;
+ filp->f_owner.task_only =
+ (type == PIDTYPE_PID && (flags & FF_SETOWN_TID));

if (pid) {
const struct cred *cred = current_cred();
@@ -215,7 +217,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
}

int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
- int force)
+ int flags)
{
int err;

@@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
if (err)
return err;

- f_modown(filp, pid, type, force);
+ f_modown(filp, pid, type, flags);
return 0;
}
EXPORT_SYMBOL(__f_setown);

-int f_setown(struct file *filp, unsigned long arg, int force)
+int f_setown(struct file *filp, unsigned long arg, int flags)
{
enum pid_type type;
struct pid *pid;
@@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned long arg, int force)
}
rcu_read_lock();
pid = find_vpid(who);
- result = __f_setown(filp, pid, type, force);
+ result = __f_setown(filp, pid, type, flags);
rcu_read_unlock();
return result;
}
@@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp)
return pid;
}

+static int f_setown_tid(struct file *filp, unsigned long arg)
+{
+ int flags = FF_SETOWN_FORCE;
+ struct pid *pid;
+ int who = arg;
+ int ret = 0;
+
+ if (who < 0)
+ who = -who;
+ else
+ flags |= FF_SETOWN_TID;
+
+ rcu_read_lock();
+ pid = find_vpid(who);
+ ret = __f_setown(filp, pid, PIDTYPE_PID, flags);
+ rcu_read_unlock();
+
+ return ret;
+}
+
+static pid_t f_getown_tid(struct file *filp)
+{
+ pid_t tid;
+
+ read_lock(&filp->f_owner.lock);
+ tid = pid_vnr(filp->f_owner.pid);
+ if (filp->f_owner.pid_type == PIDTYPE_PGID)
+ tid = 0;
+ if (!filp->f_owner.task_only)
+ tid = -tid;
+ read_unlock(&filp->f_owner.lock);
+ return tid;
+}
+
static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
struct file *filp)
{
@@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
force_successful_syscall_return();
break;
case F_SETOWN:
- err = f_setown(filp, arg, 1);
+ err = f_setown(filp, arg, FF_SETOWN_FORCE);
+ break;
+ case F_GETOWN_TID:
+ err = f_getown_tid(filp);
+ force_successful_syscall_return();
+ break;
+ case F_SETOWN_TID:
+ err = f_setown_tid(filp, arg);
break;
case F_GETSIG:
err = filp->f_owner.signum;
@@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p,
int fd,
int reason)
{
+ int (*send_sig)(int, struct siginfo *, struct task_struct *);
/*
* F_SETSIG can change ->signum lockless in parallel, make
* sure we read it once and use the same value throughout.
@@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p,
if (!sigio_perm(p, fown, signum))
return;

+ send_sig = fown->task_only ? send_sig_info : group_send_sig_info;
+
switch (signum) {
siginfo_t si;
default:
@@ -461,11 +507,11 @@ static void send_sigio_to_task(struct task_struct *p,
else
si.si_band = band_table[reason - POLL_IN];
si.si_fd = fd;
- if (!group_send_sig_info(signum, &si, p))
+ if (!send_sig(signum, &si, p))
break;
/* fall-through: fall back on the old plain SIGIO signal */
case 0:
- group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
+ send_sig(SIGIO, SEND_SIG_PRIV, p);
}
}

diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index 4d3e483..d7906b8 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -73,6 +73,10 @@
#define F_SETSIG 10 /* for sockets. */
#define F_GETSIG 11 /* for sockets. */
#endif
+#ifndef F_SETOWN_TID
+#define F_SETOWN_TID 12
+#define F_GETOWN_TID 13
+#endif

/* for F_[GET|SET]FL */
#define FD_CLOEXEC 1 /* actually anything with low bit set goes */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0872372..42697e7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -872,6 +872,7 @@ struct fown_struct {
rwlock_t lock; /* protects pid, uid, euid fields */
struct pid *pid; /* pid or -pgrp where SIGIO should be sent */
enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */
+ bool task_only; /* task or group signal */
uid_t uid, euid; /* uid/euid of process setting the owner */
int signum; /* posix.1b rt signal to be delivered on IO */
};
@@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_struct **, int, int);
/* only for net: no internal synchronization */
extern void __kill_fasync(struct fasync_struct *, int, int);

-extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);
-extern int f_setown(struct file *filp, unsigned long arg, int force);
+/*
+ * setown flags
+ */
+#define FF_SETOWN_FORCE 1
+#define FF_SETOWN_TID 2
+
+extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags);
+extern int f_setown(struct file *filp, unsigned long arg, int flags);
extern void f_delown(struct file *filp);
extern pid_t f_getown(struct file *filp);
extern int send_sigurg(struct fown_struct *fown);
diff --git a/net/socket.c b/net/socket.c
index 791d71a..ac57f8e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
err = -EFAULT;
if (get_user(pid, (int __user *)argp))
break;
- err = f_setown(sock->file, pid, 1);
+ err = f_setown(sock->file, pid, FF_SETOWN_FORCE);
break;
case FIOGETOWN:
case SIOCGPGRP:

2009-07-31 14:01:09

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID

Hi,

I have tried this patch in 2.6.30 and perfmon given I already had a
stress test program
for this problem. I can report that the patch seems to work for me.

This is a self-sampling multi-threaded program in which each thread also
uses setitimer(ITIMER_REAL) and thus is getting SIGALRM.

First run with stock F_SETOWN:

$ self_smpl_multi 8 20000000 29 0
program_time = 8, threshold = 20000000, signum = 29 fcntl(F_SETOWN)

launch main: fd: 3, tid: 5836, self: 0x7f75f43206e0
launch side: fd: 4, tid: 5837, self: 0x40b59950
1: fd = 3, count = 30, iter = 651, rate = 46/Kiter
1: fd = 4, count = 29, iter = 623, rate = 46/Kiter
(bad thread) ser = 287, fd = 4, tid = 5836, self = 0x7f75f43206e0
2: fd = 4, count = 119, iter = 2540, rate = 46/Kiter
2: fd = 3, count = 118, iter = 2510, rate = 47/Kiter
(bad thread) ser = 330, fd = 4, tid = 5836, self = 0x7f75f43206e0
(bad thread) ser = 395, fd = 4, tid = 5836, self = 0x7f75f43206e0


The "(bad thread)" message shows up when the signal goes to the wrong
thread.

Second run with F_SETOWN_TID:

$ ./self_smpl_multi 8 20000000 29 1
program_time = 8, threshold = 20000000, signum = 29 fcntl(F_SETOWN_TID)

launch main: fd: 3, tid: 5838, self: 0x7fa1800af6e0
launch side: fd: 4, tid: 5839, self: 0x4253a950
1: fd = 4, count = 52, iter = 1115, rate = 46/Kiter
1: fd = 3, count = 52, iter = 1115, rate = 46/Kiter
2: fd = 4, count = 119, iter = 2541, rate = 46/Kiter
2: fd = 3, count = 117, iter = 2490, rate = 46/Kiter
3: fd = 3, count = 114, iter = 2427, rate = 46/Kiter
3: fd = 4, count = 119, iter = 2541, rate = 46/Kiter
4: fd = 4, count = 119, iter = 2541, rate = 46/Kiter
4: fd = 3, count = 114, iter = 2428, rate = 46/Kiter
5: fd = 4, count = 119, iter = 2541, rate = 46/Kiter

No more error messages.

Thanks.


On Fri, Jul 31, 2009 at 10:35 AM, Peter Zijlstra<[email protected]> wrote:
> In order to direct the SIGIO signal to a particular thread of a
> multi-threaded application we cannot, like suggested by the manpage, put
> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
> the whole process of which that thread is part.
>
> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
> which functions similarly to F_SETOWN, except positive arguments are
> interpreted as TIDs and negative arguments are interpreted as PIDs.
>
> This extension is fully bug compatible with the old F_GETOWN
> implementation in that F_GETOWN_TID will be troubled by the negative
> return value for PIDs similarly to F_GETOWN's trouble with process
> groups.
>
> [ compile tested only so far ]
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
>  arch/parisc/include/asm/fcntl.h |    2 +
>  fs/fcntl.c                      |   64 +++++++++++++++++++++++++++++++++-----
>  include/asm-generic/fcntl.h     |    4 ++
>  include/linux/fs.h              |   11 +++++-
>  net/socket.c                    |    2 +-
>  5 files changed, 71 insertions(+), 12 deletions(-)
>
> diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h
> index 1e1c824..5d5235a 100644
> --- a/arch/parisc/include/asm/fcntl.h
> +++ b/arch/parisc/include/asm/fcntl.h
> @@ -28,6 +28,8 @@
>  #define F_SETOWN       12      /*  for sockets. */
>  #define F_SETSIG       13      /*  for sockets. */
>  #define F_GETSIG       14      /*  for sockets. */
> +#define F_GETOWN_TID   15
> +#define F_SETOWN_TID   16
>
>  /* for posix fcntl() and lockf() */
>  #define F_RDLCK                01
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index ae41308..8d0b7f9 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -197,13 +197,15 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
>  }
>
>  static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
> -                     int force)
> +                     int flags)
>  {
>        write_lock_irq(&filp->f_owner.lock);
> -       if (force || !filp->f_owner.pid) {
> +       if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) {
>                put_pid(filp->f_owner.pid);
>                filp->f_owner.pid = get_pid(pid);
>                filp->f_owner.pid_type = type;
> +               filp->f_owner.task_only =
> +                       (type == PIDTYPE_PID && (flags & FF_SETOWN_TID));
>
>                if (pid) {
>                        const struct cred *cred = current_cred();
> @@ -215,7 +217,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
>  }
>
>  int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
> -               int force)
> +               int flags)
>  {
>        int err;
>
> @@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
>        if (err)
>                return err;
>
> -       f_modown(filp, pid, type, force);
> +       f_modown(filp, pid, type, flags);
>        return 0;
>  }
>  EXPORT_SYMBOL(__f_setown);
>
> -int f_setown(struct file *filp, unsigned long arg, int force)
> +int f_setown(struct file *filp, unsigned long arg, int flags)
>  {
>        enum pid_type type;
>        struct pid *pid;
> @@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned long arg, int force)
>        }
>        rcu_read_lock();
>        pid = find_vpid(who);
> -       result = __f_setown(filp, pid, type, force);
> +       result = __f_setown(filp, pid, type, flags);
>        rcu_read_unlock();
>        return result;
>  }
> @@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp)
>        return pid;
>  }
>
> +static int f_setown_tid(struct file *filp, unsigned long arg)
> +{
> +       int flags = FF_SETOWN_FORCE;
> +       struct pid *pid;
> +       int who = arg;
> +       int ret = 0;
> +
> +       if (who < 0)
> +               who = -who;
> +       else
> +               flags |= FF_SETOWN_TID;
> +
> +       rcu_read_lock();
> +       pid = find_vpid(who);
> +       ret = __f_setown(filp, pid, PIDTYPE_PID, flags);
> +       rcu_read_unlock();
> +
> +       return ret;
> +}
> +
> +static pid_t f_getown_tid(struct file *filp)
> +{
> +       pid_t tid;
> +
> +       read_lock(&filp->f_owner.lock);
> +       tid = pid_vnr(filp->f_owner.pid);
> +       if (filp->f_owner.pid_type == PIDTYPE_PGID)
> +               tid = 0;
> +       if (!filp->f_owner.task_only)
> +               tid = -tid;
> +       read_unlock(&filp->f_owner.lock);
> +       return tid;
> +}
> +
>  static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>                struct file *filp)
>  {
> @@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>                force_successful_syscall_return();
>                break;
>        case F_SETOWN:
> -               err = f_setown(filp, arg, 1);
> +               err = f_setown(filp, arg, FF_SETOWN_FORCE);
> +               break;
> +       case F_GETOWN_TID:
> +               err = f_getown_tid(filp);
> +               force_successful_syscall_return();
> +               break;
> +       case F_SETOWN_TID:
> +               err = f_setown_tid(filp, arg);
>                break;
>        case F_GETSIG:
>                err = filp->f_owner.signum;
> @@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p,
>                               int fd,
>                               int reason)
>  {
> +       int (*send_sig)(int, struct siginfo *, struct task_struct *);
>        /*
>         * F_SETSIG can change ->signum lockless in parallel, make
>         * sure we read it once and use the same value throughout.
> @@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p,
>        if (!sigio_perm(p, fown, signum))
>                return;
>
> +       send_sig = fown->task_only ? send_sig_info : group_send_sig_info;
> +
>        switch (signum) {
>                siginfo_t si;
>                default:
> @@ -461,11 +507,11 @@ static void send_sigio_to_task(struct task_struct *p,
>                        else
>                                si.si_band = band_table[reason - POLL_IN];
>                        si.si_fd    = fd;
> -                       if (!group_send_sig_info(signum, &si, p))
> +                       if (!send_sig(signum, &si, p))
>                                break;
>                /* fall-through: fall back on the old plain SIGIO signal */
>                case 0:
> -                       group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
> +                       send_sig(SIGIO, SEND_SIG_PRIV, p);
>        }
>  }
>
> diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
> index 4d3e483..d7906b8 100644
> --- a/include/asm-generic/fcntl.h
> +++ b/include/asm-generic/fcntl.h
> @@ -73,6 +73,10 @@
>  #define F_SETSIG       10      /* for sockets. */
>  #define F_GETSIG       11      /* for sockets. */
>  #endif
> +#ifndef F_SETOWN_TID
> +#define F_SETOWN_TID   12
> +#define F_GETOWN_TID   13
> +#endif
>
>  /* for F_[GET|SET]FL */
>  #define FD_CLOEXEC     1       /* actually anything with low bit set goes */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0872372..42697e7 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -872,6 +872,7 @@ struct fown_struct {
>        rwlock_t lock;          /* protects pid, uid, euid fields */
>        struct pid *pid;        /* pid or -pgrp where SIGIO should be sent */
>        enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */
> +       bool task_only;         /* task or group signal */
>        uid_t uid, euid;        /* uid/euid of process setting the owner */
>        int signum;             /* posix.1b rt signal to be delivered on IO */
>  };
> @@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_struct **, int, int);
>  /* only for net: no internal synchronization */
>  extern void __kill_fasync(struct fasync_struct *, int, int);
>
> -extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);
> -extern int f_setown(struct file *filp, unsigned long arg, int force);
> +/*
> + * setown flags
> + */
> +#define FF_SETOWN_FORCE                1
> +#define FF_SETOWN_TID          2
> +
> +extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags);
> +extern int f_setown(struct file *filp, unsigned long arg, int flags);
>  extern void f_delown(struct file *filp);
>  extern pid_t f_getown(struct file *filp);
>  extern int send_sigurg(struct fown_struct *fown);
> diff --git a/net/socket.c b/net/socket.c
> index 791d71a..ac57f8e 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
>                        err = -EFAULT;
>                        if (get_user(pid, (int __user *)argp))
>                                break;
> -                       err = f_setown(sock->file, pid, 1);
> +                       err = f_setown(sock->file, pid, FF_SETOWN_FORCE);
>                        break;
>                case FIOGETOWN:
>                case SIOCGPGRP:
>
>
>

2009-07-31 21:04:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID

On 07/31, Peter Zijlstra wrote:
>
> In order to direct the SIGIO signal to a particular thread of a
> multi-threaded application we cannot, like suggested by the manpage, put
> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
> the whole process of which that thread is part.
>
> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
> which functions similarly to F_SETOWN, except positive arguments are
> interpreted as TIDs and negative arguments are interpreted as PIDs.

I think this is correct. But,

> @@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p,
> int fd,
> int reason)
> {
> + int (*send_sig)(int, struct siginfo *, struct task_struct *);
> /*
> * F_SETSIG can change ->signum lockless in parallel, make
> * sure we read it once and use the same value throughout.
> @@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p,
> if (!sigio_perm(p, fown, signum))
> return;
>
> + send_sig = fown->task_only ? send_sig_info : group_send_sig_info;

this is ugly.

I do not blame your patch, I blame signal.c which has a lot of helpers
to send a signal but it is never possible to find the right one.

I think we need a new trivial helper,

int xxx(int sig, struct siginfo *info, struct task_struct *p, bool group)
{
unsigned long flags;
int ret = -ESRCH;

if (lock_task_sighand(p, &flags)) {
ret = send_signal(sig, info, p, group);
unlock_task_sighand(p, &flags);
}

return ret;
}

send_sigio_to_task() can just do: send_signal(..., !fown->task_only).

group_send_sig_info(), send_sig_info() should use this helper too.


Also. without the new helper, F_SETOWN does check_kill_permission()
while F_SETOWN_TID does not. This doesn't really matter, but this
looks a bit odd.

Note that send_sigio_to_task() does not need check_kill_permission().
We use either SEND_SIG_PRIV or SI_FROMKERNEL() signals, so we never
actually check permissions. Even if we did, it would be just wrong
to deny the signal here using current_cred().


Peter, may I suggest to to re-diff your patch on top of the trivial
patch I am going to send a bit later today?

Oleg.

2009-07-31 21:13:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID

On Fri, 31 Jul 2009 10:35:20 +0200
Peter Zijlstra <[email protected]> wrote:

> In order to direct the SIGIO signal to a particular thread of a
> multi-threaded application we cannot, like suggested by the manpage, put
> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
> the whole process of which that thread is part.
>
> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
> which functions similarly to F_SETOWN, except positive arguments are
> interpreted as TIDs and negative arguments are interpreted as PIDs.
>
> This extension is fully bug compatible with the old F_GETOWN
> implementation in that F_GETOWN_TID will be troubled by the negative
> return value for PIDs similarly to F_GETOWN's trouble with process
> groups.

I'd be interested in seeing a bit more explanation about the "people do
want to properly direct SIGIO" thing - use cases, how the current code
causes them problems, etc. As it stands, it's a bit of a mystery-patch.

> [ compile tested only so far ]

I will continue to lurk :)

> arch/parisc/include/asm/fcntl.h | 2 +
> fs/fcntl.c | 64 +++++++++++++++++++++++++++++++++-----
> include/asm-generic/fcntl.h | 4 ++
> include/linux/fs.h | 11 +++++-
> net/socket.c | 2 +-

OK.



Alpha has private definitions of F_SETSIG and F_GETSIG which are
identical to the generic ones. That's somewhat logical, given that
alpha's F_SETOWN/F_GETOWN _differ_ from the asm-generic ones.

Alpha appears to have made the decision to spell out _all_ the F_*
flags, given that some of them are different. That has some merit, I
guess.

But your patch broke that.

2009-08-01 01:38:21

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID)

Preparation for Peter's changes, but imho makes sense anyway.

Oleg.

2009-08-01 01:39:12

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/2] signals: introduce do_send_sig_info() helper

Introduce do_send_sig_info() and convert group_send_sig_info(),
send_sig_info(), do_send_specific() to use this helper.

Hopefully it will have more users soon, it allows to specify
specific/group behaviour via "bool group" argument.

Shaves 80 bytes from .text.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/signal.h | 2 +
kernel/signal.c | 56 +++++++++++++++++++++++--------------------------
2 files changed, 29 insertions(+), 29 deletions(-)

--- WAIT/include/linux/signal.h~1_HELPER 2009-06-11 14:16:46.000000000 +0200
+++ WAIT/include/linux/signal.h 2009-08-01 02:26:41.000000000 +0200
@@ -233,6 +233,8 @@ static inline int valid_signal(unsigned
}

extern int next_signal(struct sigpending *pending, sigset_t *mask);
+extern int do_send_sig_info(int sig, struct siginfo *info,
+ struct task_struct *p, bool group);
extern int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p);
extern int __group_send_sig_info(int, struct siginfo *, struct task_struct *);
extern long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig,
--- WAIT/kernel/signal.c~1_HELPER 2009-07-01 20:22:44.000000000 +0200
+++ WAIT/kernel/signal.c 2009-08-01 02:21:27.000000000 +0200
@@ -971,6 +971,20 @@ specific_send_sig_info(int sig, struct s
return send_signal(sig, info, t, 0);
}

+int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p,
+ bool group)
+{
+ unsigned long flags;
+ int ret = -ESRCH;
+
+ if (lock_task_sighand(p, &flags)) {
+ ret = send_signal(sig, info, p, group);
+ unlock_task_sighand(p, &flags);
+ }
+
+ return ret;
+}
+
/*
* Force a signal that the process can't ignore: if necessary
* we unblock the signal and change any SIG_IGN to SIG_DFL.
@@ -1068,18 +1082,10 @@ struct sighand_struct *lock_task_sighand
*/
int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
- unsigned long flags;
- int ret;
-
- ret = check_kill_permission(sig, info, p);
+ int ret = check_kill_permission(sig, info, p);

- if (!ret && sig) {
- ret = -ESRCH;
- if (lock_task_sighand(p, &flags)) {
- ret = __group_send_sig_info(sig, info, p);
- unlock_task_sighand(p, &flags);
- }
- }
+ if (!ret && sig)
+ ret = do_send_sig_info(sig, info, p, true);

return ret;
}
@@ -1224,15 +1230,9 @@ static int kill_something_info(int sig,
* These are for backward compatibility with the rest of the kernel source.
*/

-/*
- * The caller must ensure the task can't exit.
- */
int
send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
- int ret;
- unsigned long flags;
-
/*
* Make sure legacy kernel users don't send in bad values
* (normal paths check this in check_kill_permission).
@@ -1240,10 +1240,7 @@ send_sig_info(int sig, struct siginfo *i
if (!valid_signal(sig))
return -EINVAL;

- spin_lock_irqsave(&p->sighand->siglock, flags);
- ret = specific_send_sig_info(sig, info, p);
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
- return ret;
+ return do_send_sig_info(sig, info, p, false);
}

#define __si_special(priv) \
@@ -2281,7 +2278,6 @@ static int
do_send_specific(pid_t tgid, pid_t pid, int sig, struct siginfo *info)
{
struct task_struct *p;
- unsigned long flags;
int error = -ESRCH;

rcu_read_lock();
@@ -2291,14 +2287,16 @@ do_send_specific(pid_t tgid, pid_t pid,
/*
* The null signal is a permissions and process existence
* probe. No signal is actually delivered.
- *
- * If lock_task_sighand() fails we pretend the task dies
- * after receiving the signal. The window is tiny, and the
- * signal is private anyway.
*/
- if (!error && sig && lock_task_sighand(p, &flags)) {
- error = specific_send_sig_info(sig, info, p);
- unlock_task_sighand(p, &flags);
+ if (!error && sig) {
+ error = do_send_sig_info(sig, info, p, false);
+ /*
+ * If lock_task_sighand() failed we pretend the task
+ * dies after receiving the signal. The window is tiny,
+ * and the signal is private anyway.
+ */
+ if (unlikely(error == -ESRCH))
+ error = 0;
}
}
rcu_read_unlock();

2009-08-01 01:40:00

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission()

group_send_sig_info()->check_kill_permission() assumes that current
is the sender and uses current_cred().

This is not true in send_sigio_to_task() case. From the security pov
the sender is not current, but the task which did fcntl(F_SETOWN),
that is why we have sigio_perm() which uses the right creds to check.

Fortunately, send_sigio() always sends either SEND_SIG_PRIV or
SI_FROMKERNEL() signal, so check_kill_permission() does nothing. But
still it would be tidier to avoid this bogus security check and save
a couple of cycles.

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/fcntl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- WAIT/fs/fcntl.c~2_SIGIO 2009-06-17 14:11:26.000000000 +0200
+++ WAIT/fs/fcntl.c 2009-08-01 02:40:41.000000000 +0200
@@ -462,11 +462,11 @@ static void send_sigio_to_task(struct ta
else
si.si_band = band_table[reason - POLL_IN];
si.si_fd = fd;
- if (!group_send_sig_info(signum, &si, p))
+ if (!do_send_sig_info(signum, &si, p, true))
break;
/* fall-through: fall back on the old plain SIGIO signal */
case 0:
- group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
+ do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true);
}
}