2008-07-16 14:50:59

by Mark McLoughlin

[permalink] [raw]
Subject: [PATCH] posix-timers: Do not modify an already queued timer signal

When a timer fires, posix_timer_event() zeroes out its
pre-allocated siginfo structure, initialises it and then
queues up the signal with send_sigqueue().

However, we may have previously queued up this signal, in
which case we only want to increment si_overrun and
re-initialising the siginfo structure is incorrect.

Also, since we are modifying an already queued signal
without the protection of the sighand spinlock, we may also
race with e.g. collect_signal() causing it to fail to find
a signal on the pending list because it happens to look at
the siginfo struct after it was zeroed and before it was
re-initialised.

The race was observed with a modified kvm-userspace when
running a guest under heavy network load. When it occurs,
KVM never sees another SIGALRM signal because although
the signal is queued up the appropriate bit is never set
in the pending mask. Manually sending the process a SIGALRM
kicks it out of this state.

The fix is simple - only modify the pre-allocated sigqueue
once we're sure that it hasn't already been queued.

Signed-off-by: Mark McLoughlin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>

---
include/linux/sched.h | 2 +-
kernel/posix-timers.c | 20 +++++++++++---------
kernel/signal.c | 5 +++--
3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2134917..718f7ec 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1791,7 +1791,7 @@ extern void zap_other_threads(struct task_struct *p);
extern int kill_proc(pid_t, int, int);
extern struct sigqueue *sigqueue_alloc(void);
extern void sigqueue_free(struct sigqueue *);
-extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group);
+extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index dbd8398..b42c964 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -298,19 +298,21 @@ void do_schedule_next_timer(struct siginfo *info)

int posix_timer_event(struct k_itimer *timr,int si_private)
{
- memset(&timr->sigq->info, 0, sizeof(siginfo_t));
- timr->sigq->info.si_sys_private = si_private;
+ siginfo_t info;
+
+ memset(&info, 0, sizeof(siginfo_t));
+ info.si_sys_private = si_private;
/* Send signal to the process that owns this timer.*/

- timr->sigq->info.si_signo = timr->it_sigev_signo;
- timr->sigq->info.si_errno = 0;
- timr->sigq->info.si_code = SI_TIMER;
- timr->sigq->info.si_tid = timr->it_id;
- timr->sigq->info.si_value = timr->it_sigev_value;
+ info.si_signo = timr->it_sigev_signo;
+ info.si_errno = 0;
+ info.si_code = SI_TIMER;
+ info.si_tid = timr->it_id;
+ info.si_value = timr->it_sigev_value;

if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
struct task_struct *leader;
- int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
+ int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);

if (likely(ret >= 0))
return ret;
@@ -321,7 +323,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
timr->it_process = leader;
}

- return send_sigqueue(timr->sigq, timr->it_process, 1);
+ return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
}
EXPORT_SYMBOL_GPL(posix_timer_event);

diff --git a/kernel/signal.c b/kernel/signal.c
index 6c0958e..50e0b13 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
__sigqueue_free(q);
}

-int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
+int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
{
- int sig = q->info.si_signo;
+ int sig = info->si_signo;
struct sigpending *pending;
unsigned long flags;
int ret;
@@ -1322,6 +1322,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)

signalfd_notify(t, sig);
pending = group ? &t->signal->shared_pending : &t->pending;
+ copy_siginfo(&q->info, info);
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
complete_signal(sig, t, group);
--
1.5.5.1


2008-07-16 16:18:20

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On 07/16, Mark McLoughlin wrote:
>
> When a timer fires, posix_timer_event() zeroes out its
> pre-allocated siginfo structure, initialises it and then
> queues up the signal with send_sigqueue().
>
> However, we may have previously queued up this signal, in
> which case we only want to increment si_overrun and
> re-initialising the siginfo structure is incorrect.

Quoting Roland McGrath:
>
> I'm not clear on how the already-queued case could ever happen. Do we
> really need that check at all? It shouldn't be possible for the timer to
> be firing when it's already queued, because it won't have been reloaded.
> It only reloads via do_schedule_next_timer after it's dequeued, or because
> a 1 return value said it never was queued.

> Also, since we are modifying an already queued signal
> without the protection of the sighand spinlock, we may also
> race with e.g. collect_signal() causing it to fail to find
> a signal on the pending list because it happens to look at
> the siginfo struct after it was zeroed and before it was
> re-initialised.
>
> The race was observed with a modified kvm-userspace when
> running a guest under heavy network load. When it occurs,
> KVM never sees another SIGALRM signal because although
> the signal is queued up the appropriate bit is never set
> in the pending mask.

Hmm. Yes, if collect_signal() races with posix_timer_event()->memset(),
we dequeue SIGALRM but leave the siginfo on list. I can't see how
this is possible though...

Could you verify that we don't have another bug? Say, add
WARN_ON(!list_empty()) to posix_timer_event().


If we need this fix, perhaps it is better to modify posix_timer_event()
to check !list_empty()?

Add Thomas.

Oleg.

> Manually sending the process a SIGALRM
> kicks it out of this state.
>
> The fix is simple - only modify the pre-allocated sigqueue
> once we're sure that it hasn't already been queued.
>
> Signed-off-by: Mark McLoughlin <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Roland McGrath <[email protected]>
>
> ---
> include/linux/sched.h | 2 +-
> kernel/posix-timers.c | 20 +++++++++++---------
> kernel/signal.c | 5 +++--
> 3 files changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2134917..718f7ec 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1791,7 +1791,7 @@ extern void zap_other_threads(struct task_struct *p);
> extern int kill_proc(pid_t, int, int);
> extern struct sigqueue *sigqueue_alloc(void);
> extern void sigqueue_free(struct sigqueue *);
> -extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group);
> +extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
> extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
> extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);
>
> diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
> index dbd8398..b42c964 100644
> --- a/kernel/posix-timers.c
> +++ b/kernel/posix-timers.c
> @@ -298,19 +298,21 @@ void do_schedule_next_timer(struct siginfo *info)
>
> int posix_timer_event(struct k_itimer *timr,int si_private)
> {
> - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
> - timr->sigq->info.si_sys_private = si_private;
> + siginfo_t info;
> +
> + memset(&info, 0, sizeof(siginfo_t));
> + info.si_sys_private = si_private;
> /* Send signal to the process that owns this timer.*/
>
> - timr->sigq->info.si_signo = timr->it_sigev_signo;
> - timr->sigq->info.si_errno = 0;
> - timr->sigq->info.si_code = SI_TIMER;
> - timr->sigq->info.si_tid = timr->it_id;
> - timr->sigq->info.si_value = timr->it_sigev_value;
> + info.si_signo = timr->it_sigev_signo;
> + info.si_errno = 0;
> + info.si_code = SI_TIMER;
> + info.si_tid = timr->it_id;
> + info.si_value = timr->it_sigev_value;
>
> if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
> struct task_struct *leader;
> - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
> + int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);
>
> if (likely(ret >= 0))
> return ret;
> @@ -321,7 +323,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
> timr->it_process = leader;
> }
>
> - return send_sigqueue(timr->sigq, timr->it_process, 1);
> + return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
> }
> EXPORT_SYMBOL_GPL(posix_timer_event);
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 6c0958e..50e0b13 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
> __sigqueue_free(q);
> }
>
> -int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
> +int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
> {
> - int sig = q->info.si_signo;
> + int sig = info->si_signo;
> struct sigpending *pending;
> unsigned long flags;
> int ret;
> @@ -1322,6 +1322,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
>
> signalfd_notify(t, sig);
> pending = group ? &t->signal->shared_pending : &t->pending;
> + copy_siginfo(&q->info, info);
> list_add_tail(&q->list, &pending->list);
> sigaddset(&pending->signal, sig);
> complete_signal(sig, t, group);
> --
> 1.5.5.1
>

2008-07-17 11:09:07

by Mark McLoughlin

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

Hi Oleg,

On Wed, 2008-07-16 at 20:21 +0400, Oleg Nesterov wrote:
> On 07/16, Mark McLoughlin wrote:
> >
> > When a timer fires, posix_timer_event() zeroes out its
> > pre-allocated siginfo structure, initialises it and then
> > queues up the signal with send_sigqueue().
> >
> > However, we may have previously queued up this signal, in
> > which case we only want to increment si_overrun and
> > re-initialising the siginfo structure is incorrect.
>
> Quoting Roland McGrath:
> >
> > I'm not clear on how the already-queued case could ever happen. Do we
> > really need that check at all? It shouldn't be possible for the timer to
> > be firing when it's already queued, because it won't have been reloaded.
> > It only reloads via do_schedule_next_timer after it's dequeued, or because
> > a 1 return value said it never was queued.

The app can reload the timer itself before the signal has been dequeued
via signalfd ...

> > Also, since we are modifying an already queued signal
> > without the protection of the sighand spinlock, we may also
> > race with e.g. collect_signal() causing it to fail to find
> > a signal on the pending list because it happens to look at
> > the siginfo struct after it was zeroed and before it was
> > re-initialised.
> >
> > The race was observed with a modified kvm-userspace when
> > running a guest under heavy network load. When it occurs,
> > KVM never sees another SIGALRM signal because although
> > the signal is queued up the appropriate bit is never set
> > in the pending mask.
>
> Hmm. Yes, if collect_signal() races with posix_timer_event()->memset(),
> we dequeue SIGALRM but leave the siginfo on list. I can't see how
> this is possible though...
>
> Could you verify that we don't have another bug? Say, add
> WARN_ON(!list_empty()) to posix_timer_event().

Yes, this case definitely occurs.

Now that I know what's going on, I've finally managed to extract a
standalone test case. Try this:

http://markmc.fedorapeople.org/test-posix-timer-race.c

On my quad-core machine it triggers the race fairly readily.

> If we need this fix, perhaps it is better to modify posix_timer_event()
> to check !list_empty()?

Yeah, I had considered that, but it's a tad more invasive. See below.

I mainly don't like this patch because we may lock the sighand from one
thread's task_struct and then unlock it via the group leader's sighand.
That probably is safe, but is pretty nasty.

Cheers,
Mark.

Subject: [PATCH] posix-timers: Do not modify an already queued timer signal

When a timer fires, posix_timer_event() zeroes out its
pre-allocated siginfo structure, initialises it and then
queues up the signal with send_sigqueue().

However, we may have previously queued up this signal, in
which case we only want to increment si_overrun and
re-initialising the siginfo structure is incorrect.

Also, since we are modifying an already queued signal
without the protection of the sighand spinlock, we may also
race with e.g. collect_signal() causing it to fail to find
a signal on the pending list because it happens to look at
the siginfo struct after it was zeroed and before it was
re-initialised.

The race was observed with a modified kvm-userspace when
running a guest under heavy network load. When it occurs,
KVM never sees another SIGALRM signal because although
the signal is queued up the appropriate bit is never set
in the pending mask. Manually sending the process a SIGALRM
kicks it out of this state.

The fix is simple - only modify the pre-allocated sigqueue
once we're sure that it hasn't already been queued.

Signed-off-by: Mark McLoughlin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/posix-timers.c | 23 ++++++++++++++++++++---
kernel/signal.c | 27 ++++-----------------------
2 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index dbd8398..65ce122 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -298,6 +298,19 @@ void do_schedule_next_timer(struct siginfo *info)

int posix_timer_event(struct k_itimer *timr,int si_private)
{
+ unsigned long flags;
+ int ret = -1;
+
+ if (!likely(lock_task_sighand(timr->it_process, &flags)))
+ goto ret;
+
+ ret = 0;
+ if (unlikely(!list_empty(&timr->sigq->list))) {
+ /* Already queued; just increment the overrun count */
+ timr->sigq->info.si_overrun++;
+ goto out;
+ }
+
memset(&timr->sigq->info, 0, sizeof(siginfo_t));
timr->sigq->info.si_sys_private = si_private;
/* Send signal to the process that owns this timer.*/
@@ -310,10 +323,10 @@ int posix_timer_event(struct k_itimer *timr,int si_private)

if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
struct task_struct *leader;
- int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
+ ret = send_sigqueue(timr->sigq, timr->it_process, 0);

if (likely(ret >= 0))
- return ret;
+ goto out;

timr->it_sigev_notify = SIGEV_SIGNAL;
leader = timr->it_process->group_leader;
@@ -321,7 +334,11 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
timr->it_process = leader;
}

- return send_sigqueue(timr->sigq, timr->it_process, 1);
+ ret = send_sigqueue(timr->sigq, timr->it_process, 1);
+out:
+ unlock_task_sighand(timr->it_process, &flags);
+ret:
+ return ret;
}
EXPORT_SYMBOL_GPL(posix_timer_event);

diff --git a/kernel/signal.c b/kernel/signal.c
index 6c0958e..62eb972 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1296,39 +1296,20 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
{
int sig = q->info.si_signo;
struct sigpending *pending;
- unsigned long flags;
- int ret;

BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
+ BUG_ON(!list_empty(&q->list));

- ret = -1;
- if (!likely(lock_task_sighand(t, &flags)))
- goto ret;
-
- ret = 1; /* the signal is ignored */
if (!prepare_signal(sig, t))
- goto out;
-
- ret = 0;
- if (unlikely(!list_empty(&q->list))) {
- /*
- * If an SI_TIMER entry is already queue just increment
- * the overrun count.
- */
- BUG_ON(q->info.si_code != SI_TIMER);
- q->info.si_overrun++;
- goto out;
- }
+ return 1; /* the signal is ignored */

signalfd_notify(t, sig);
pending = group ? &t->signal->shared_pending : &t->pending;
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
complete_signal(sig, t, group);
-out:
- unlock_task_sighand(t, &flags);
-ret:
- return ret;
+
+ return 0;
}

/*
--
1.5.4.1


2008-07-17 13:52:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On 07/17, Mark McLoughlin wrote:
>
> On Wed, 2008-07-16 at 20:21 +0400, Oleg Nesterov wrote:
> > On 07/16, Mark McLoughlin wrote:
> > >
> > > When a timer fires, posix_timer_event() zeroes out its
> > > pre-allocated siginfo structure, initialises it and then
> > > queues up the signal with send_sigqueue().
> > >
> > > However, we may have previously queued up this signal, in
> > > which case we only want to increment si_overrun and
> > > re-initialising the siginfo structure is incorrect.
> >
> > Quoting Roland McGrath:
> > >
> > > I'm not clear on how the already-queued case could ever happen. Do we
> > > really need that check at all? It shouldn't be possible for the timer to
> > > be firing when it's already queued, because it won't have been reloaded.
> > > It only reloads via do_schedule_next_timer after it's dequeued, or because
> > > a 1 return value said it never was queued.
>
> The app can reload the timer itself before the signal has been dequeued
> via signalfd ...

Indeed! Thanks Mark.

Thomas, Roland, could you take a look?


> > If we need this fix, perhaps it is better to modify posix_timer_event()
> > to check !list_empty()?
>
> Yeah, I had considered that, but it's a tad more invasive. See below.
>
> I mainly don't like this patch

Agreed, this one looks worse.


I forgot (if ever knew ;) this code completely, but can't we make a simpler
fix? posix_timer_event() can check list_empty() lockless,

posix_timer_event()
{
if (!list_emtpy(sigq->list))
return 0;

... fill and send ->sigq...
}

When the signal will be dequeued, schedule_next_timer() should afaics
set ->it_overrun_last which is copied to ->si_overrun then. Or we can
increment timr->it_overrun before return if I misread hrtimer_forward().

What do you think?


Another possible fix... we can change sys_timer_settime() to do
sigqueue_free() and re-alloc ->sigq when it is pending. Not that
I like this very much though.

Oleg.

2008-07-18 10:39:49

by Mark McLoughlin

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On Thu, 2008-07-17 at 17:55 +0400, Oleg Nesterov wrote:

> I forgot (if ever knew ;) this code completely, but can't we make a simpler
> fix? posix_timer_event() can check list_empty() lockless,
>
> posix_timer_event()
> {
> if (!list_emtpy(sigq->list))
> return 0;
>
> ... fill and send ->sigq...
> }

Well, one issue with this is that we need to set the si_private supplied
to posix_timer_event() on the queued siginfo. See updated version of the
original patch below.

So, for that reason, we can't currently do it lockless.

Now, I've spent a while looking at the it_requeue_pending code and I
can't fully satisfy myself that we need it to be a modification counter
that we match up via si_sys_private. Do you know why this is needed? It
seems to me that it could be seriously simplified.

The explanation in the pre-2.6 code I could find was:

* An interesting problem can occure if, while a signal, and thus a call
* back is pending, the timer is rearmed, i.e. stopped and restarted.
* We then need to sort out the call back and do the right thing. What
* we do is to put a counter in the info block and match it with the
* timers copy on the call back.

> When the signal will be dequeued, schedule_next_timer() should afaics
> set ->it_overrun_last which is copied to ->si_overrun then. Or we can
> increment timr->it_overrun before return if I misread hrtimer_forward().

Yep; that sounds reasonable to me - i.e. do all the overrun accounting
in a callback just before the signal is to be delivered.

However, since the race condition I identified basically breaks the
timer permanently for the app (i.e. it will never fire again unless the
signal is delivered to the process for another reason) I do think the
fix below makes sense in advance of simplifying all this long standing
code and correcting overrun accounting.

Cheers,
Mark.

Subject: [PATCH] posix-timers: Do not modify an already queued timer signal

When a timer fires, posix_timer_event() zeroes out its
pre-allocated siginfo structure, initialises it and then
queues up the signal with send_sigqueue().

However, we may have previously queued up this signal, in
which case we only want to increment si_overrun and
re-initialising the siginfo structure is incorrect.

Also, since we are modifying an already queued signal
without the protection of the sighand spinlock, we may also
race with e.g. collect_signal() causing it to fail to find
a signal on the pending list because it happens to look at
the siginfo struct after it was zeroed and before it was
re-initialised.

The race was observed with a modified kvm-userspace when
running a guest under heavy network load. When it occurs,
KVM never sees another SIGALRM signal because although
the signal is queued up the appropriate bit is never set
in the pending mask. Manually sending the process a SIGALRM
kicks it out of this state.

The fix is simple - only modify the pre-allocated sigqueue
once we're sure that it hasn't already been queued.

Signed-off-by: Mark McLoughlin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/sched.h | 2 +-
kernel/posix-timers.c | 23 ++++++++++++-----------
kernel/signal.c | 6 ++++--
3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c5d3f84..f260585 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1786,7 +1786,7 @@ extern void zap_other_threads(struct task_struct *p);
extern int kill_proc(pid_t, int, int);
extern struct sigqueue *sigqueue_alloc(void);
extern void sigqueue_free(struct sigqueue *);
-extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group);
+extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index dbd8398..d3a8561 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -296,21 +296,22 @@ void do_schedule_next_timer(struct siginfo *info)
unlock_timer(timr, flags);
}

-int posix_timer_event(struct k_itimer *timr,int si_private)
+int posix_timer_event(struct k_itimer *timr, int si_private)
{
- memset(&timr->sigq->info, 0, sizeof(siginfo_t));
- timr->sigq->info.si_sys_private = si_private;
- /* Send signal to the process that owns this timer.*/
+ siginfo_t info;

- timr->sigq->info.si_signo = timr->it_sigev_signo;
- timr->sigq->info.si_errno = 0;
- timr->sigq->info.si_code = SI_TIMER;
- timr->sigq->info.si_tid = timr->it_id;
- timr->sigq->info.si_value = timr->it_sigev_value;
+ memset(&info, 0, sizeof(siginfo_t));
+
+ info.si_sys_private = si_private;
+ info.si_signo = timr->it_sigev_signo;
+ info.si_errno = 0;
+ info.si_code = SI_TIMER;
+ info.si_tid = timr->it_id;
+ info.si_value = timr->it_sigev_value;

if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
struct task_struct *leader;
- int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
+ int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);

if (likely(ret >= 0))
return ret;
@@ -321,7 +322,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
timr->it_process = leader;
}

- return send_sigqueue(timr->sigq, timr->it_process, 1);
+ return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
}
EXPORT_SYMBOL_GPL(posix_timer_event);

diff --git a/kernel/signal.c b/kernel/signal.c
index 6c0958e..95a34ff 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
__sigqueue_free(q);
}

-int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
+int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
{
- int sig = q->info.si_signo;
+ int sig = info->si_signo;
struct sigpending *pending;
unsigned long flags;
int ret;
@@ -1317,12 +1317,13 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
*/
BUG_ON(q->info.si_code != SI_TIMER);
q->info.si_overrun++;
+ q->info.si_sys_private = info->si_sys_private;
goto out;
}

signalfd_notify(t, sig);
pending = group ? &t->signal->shared_pending : &t->pending;
+ copy_siginfo(&q->info, info);
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
complete_signal(sig, t, group);
--
1.5.5.1


2008-07-19 16:34:18

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On 07/18, Mark McLoughlin wrote:
>
> On Thu, 2008-07-17 at 17:55 +0400, Oleg Nesterov wrote:
>
> > I forgot (if ever knew ;) this code completely, but can't we make a simpler
> > fix? posix_timer_event() can check list_empty() lockless,
> >
> > posix_timer_event()
> > {
> > if (!list_emtpy(sigq->list))
> > return 0;
> >
> > ... fill and send ->sigq...
> > }
>
> Well, one issue with this is that we need to set the si_private supplied
> to posix_timer_event() on the queued siginfo. See updated version of the
> original patch below.
>
> So, for that reason, we can't currently do it lockless.
>
> Now, I've spent a while looking at the it_requeue_pending code and I
> can't fully satisfy myself that we need it to be a modification counter
> that we match up via si_sys_private. Do you know why this is needed? It
> seems to me that it could be seriously simplified.

No, I don't understand what does si_sys_private mean. In fact I don't even
understand what should we do with info.si_overrun in this corner case.

We have the active timer, the app does sys_timer_settime() which changes the
timer. This looks like creating the new timer which "inherits" ->it_id and
->it_sigev_value. But the queued siginfo is connected to the "old" timer...
OK, I just don't understand this all.

> Subject: [PATCH] posix-timers: Do not modify an already queued timer signal
>
> When a timer fires, posix_timer_event() zeroes out its
> pre-allocated siginfo structure, initialises it and then
> queues up the signal with send_sigqueue().
>
> However, we may have previously queued up this signal, in
> which case we only want to increment si_overrun and
> re-initialising the siginfo structure is incorrect.
>
> Also, since we are modifying an already queued signal
> without the protection of the sighand spinlock, we may also
> race with e.g. collect_signal() causing it to fail to find
> a signal on the pending list because it happens to look at
> the siginfo struct after it was zeroed and before it was
> re-initialised.
>
> The race was observed with a modified kvm-userspace when
> running a guest under heavy network load. When it occurs,
> KVM never sees another SIGALRM signal because although
> the signal is queued up the appropriate bit is never set
> in the pending mask. Manually sending the process a SIGALRM
> kicks it out of this state.

Please update the changelog to explain how it is possible to hit the
already queued siginfo.

> -int posix_timer_event(struct k_itimer *timr,int si_private)
> +int posix_timer_event(struct k_itimer *timr, int si_private)
> {
> - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
> - timr->sigq->info.si_sys_private = si_private;
> - /* Send signal to the process that owns this timer.*/
> + siginfo_t info;
>
> - timr->sigq->info.si_signo = timr->it_sigev_signo;
> - timr->sigq->info.si_errno = 0;
> - timr->sigq->info.si_code = SI_TIMER;
> - timr->sigq->info.si_tid = timr->it_id;
> - timr->sigq->info.si_value = timr->it_sigev_value;
> + memset(&info, 0, sizeof(siginfo_t));
> +
> + info.si_sys_private = si_private;
> + info.si_signo = timr->it_sigev_signo;
> + info.si_errno = 0;
> + info.si_code = SI_TIMER;
> + info.si_tid = timr->it_id;
> + info.si_value = timr->it_sigev_value;
>
> if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
> struct task_struct *leader;
> - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
> + int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);

I think this is a bit overkill. Note that (unless I missed something)
posix_timer_event() populates timr->sigq->info with the same numbers
every time, so afaics we can do

--- kernel/posix-timers.c
+++ kernel/posix-timers.c
@@ -298,19 +298,14 @@ void do_schedule_next_timer(struct sigin

int posix_timer_event(struct k_itimer *timr,int si_private)
{
- memset(&timr->sigq->info, 0, sizeof(siginfo_t));
- timr->sigq->info.si_sys_private = si_private;
- /* Send signal to the process that owns this timer.*/
-
timr->sigq->info.si_signo = timr->it_sigev_signo;
- timr->sigq->info.si_errno = 0;
timr->sigq->info.si_code = SI_TIMER;
timr->sigq->info.si_tid = timr->it_id;
timr->sigq->info.si_value = timr->it_sigev_value;

if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
struct task_struct *leader;
- int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
+ int ret = send_sigqueue(timr->sigq, si_private, timr->it_process, 0);

if (likely(ret >= 0))
return ret;
@@ -321,7 +316,7 @@ int posix_timer_event(struct k_itimer *t
timr->it_process = leader;
}

- return send_sigqueue(timr->sigq, timr->it_process, 1);
+ return send_sigqueue(timr->sigq, si_private, timr->it_process, 1);
}
EXPORT_SYMBOL_GPL(posix_timer_event);

@@ -435,6 +430,7 @@ static struct k_itimer * alloc_posix_tim
kmem_cache_free(posix_timers_cache, tmr);
tmr = NULL;
}
+ memset(&timr->sigq->info, 0, sizeof(siginfo_t));
return tmr;
}

--- kernel/signal.c
+++ kernel/signal.c
@@ -1283,7 +1283,7 @@ void sigqueue_free(struct sigqueue *q)
__sigqueue_free(q);
}

-int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
+int send_sigqueue(struct sigqueue *q, int si_private, struct task_struct *t, int group)
{
int sig = q->info.si_signo;
struct sigpending *pending;
@@ -1300,6 +1300,8 @@ int send_sigqueue(struct sigqueue *q, st
if (!prepare_signal(sig, t))
goto out;

+ q->info.si_sys_private = info->si_sys_private;
+
ret = 0;
if (unlikely(!list_empty(&q->list))) {
/*

But can't we do a simpler change?

--- kernel/posix-timers.c
+++ kernel/posix-timers.c
@@ -298,7 +298,6 @@ void do_schedule_next_timer(struct sigin

int posix_timer_event(struct k_itimer *timr,int si_private)
{
- memset(&timr->sigq->info, 0, sizeof(siginfo_t));
timr->sigq->info.si_sys_private = si_private;
/* Send signal to the process that owns this timer.*/

@@ -435,6 +434,7 @@ static struct k_itimer * alloc_posix_tim
kmem_cache_free(posix_timers_cache, tmr);
tmr = NULL;
}
+ memset(&timr->sigq->info, 0, sizeof(siginfo_t));
return tmr;
}

Yes, if sigq->info is queued, it can be dequeued right after
".si_sys_private = si_private" and before we send the signal. As I said,
I don't know what si_sys_private means for the user-level, is this bad?

Note that the we can't race with do_schedule_next_timer(), the timer is
locked.

Thoughts?

Oleg.

2008-07-20 06:52:55

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

I think the solution we want is to make sure that a timer whose
expiration signal is still pending never gets a second notification.

POSIX says, "The effect of disarming or resetting a timer with pending
expiration notifications is unspecified." This gives us lots of
latitude. I think the best behavior is the one that comports with the
general specification, "Only a single signal shall be queued to the
process for a given timer at any point in time."

The it_requeue_pending/si_sys_private logic is indeed intended to
address this case. It predates my involvement of the code, and I
agree it has always looked oddly hairy. Its plan is to make sure that
dequeue_signal's call to do_schedule_next_timer does not re-arm the
timer when an intervening timer_settime call has already done so. As
Mark found, this is buggy because it's not really safe for the timer
to go off (call posix_timer_event again) before the dequeue.

In the signals code, si_sys_private is used purely as a flag that
dequeue_signal needs to bother with dropping the siglock to call
do_schedule_next_timer. Its use as a serialization counter between
timer_settime and do_schedule_next_timer is entirely local to the
posix-timers code.

What userland should observe is that when the new timer expiration
time arrives while the previously-fired timer signal is still pending,
there is no new signal, and the timer accrues an overrun.

An obvious idea is simply not to re-arm in timer_settime when the
timer already has a pending notification. When the signal gets
dequeued, do_schedule_next_timer will arm it then. However, the way
we dispatch to different clocks' timer_set functions makes it awkward
to validate and store a new setting and fetch the old value without
actually arming the timer. Also, the logic for one-shot timers has
more wrinkles.

So I think the thing to do is arm the timer in timer_settime even when
it's already pending (as we do now), but have posix_timer_event detect
a firing when already queued and just bump the overrun count.

We can do away with it_requeue_pending, and make si_sys_private purely
a simple 0/1 flag for whether to call do_schedule_next_timer. To
optimize the extra checks, we use different bookkeeping fields in
struct k_itimer.

Have a flag that means "might be queued". do_schedule_next_timer
replaces:

if (timr && timr->it_requeue_pending == info->si_sys_private) {
...
}

with:

if (timr && timr->fired) {
timr->fired = 0;
...
}

That avoids races with timer_settime, which does (after timer_set
call, timer still locked):

if (timr->fired) {
spin_lock(&current->sighand->siglock);
if (list_empty(&timr->sigq->list))
timr->fired = 0;
spin_unlock(&current->sighand->siglock);
}

In posix_timer_event (timer is locked), check:

if (timr->fired) {
spin_lock(&current->sighand->siglock);
if (list_empty(&timr->sigq->list))
timr->fired = 0;
spin_unlock(&current->sighand->siglock);
if (timr->fired) {
timr->it_overrun++;
return 0;
}
}
timr->fired = 1;

The siglock'd double-check is in case of a one-shot firing that didn't
ever call do_schedule_next_timer to reset the flag.

This bookkeeping plan may need more thought. I think the idea is
sound: "firing", meaning notification and reload, only occurs when
both the expiration time has passed and the previous expiration
notification signal has been dequeued. When the expiration time
passes but the other criterion is unmet, there's an overrun and no new
notification or reload.


Thanks,
Roland

2008-07-20 11:05:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On 07/19, Roland McGrath wrote:
>
> I think the solution we want is to make sure that a timer whose
> expiration signal is still pending never gets a second notification.
>
> POSIX says, "The effect of disarming or resetting a timer with pending
> expiration notifications is unspecified." This gives us lots of
> latitude. I think the best behavior is the one that comports with the
> general specification, "Only a single signal shall be queued to the
> process for a given timer at any point in time."
>
> The it_requeue_pending/si_sys_private logic is indeed intended to
> address this case. It predates my involvement of the code, and I
> agree it has always looked oddly hairy. Its plan is to make sure that
> dequeue_signal's call to do_schedule_next_timer does not re-arm the
> timer when an intervening timer_settime call has already done so. As
> Mark found, this is buggy because it's not really safe for the timer
> to go off (call posix_timer_event again) before the dequeue.
>
> In the signals code, si_sys_private is used purely as a flag that
> dequeue_signal needs to bother with dropping the siglock to call
> do_schedule_next_timer. Its use as a serialization counter between
> timer_settime and do_schedule_next_timer is entirely local to the
> posix-timers code.

Yes, thanks, I see. But does it have any meaning for the user-space?

Let me repeat. Can't we make a simple fix for now for this nasty and
ancient bug, before we do the more clever changes,

--- kernel/posix-timers.c
+++ kernel/posix-timers.c
@@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin

int posix_timer_event(struct k_itimer *timr,int si_private)
{
- memset(&timr->sigq->info, 0, sizeof(siginfo_t));
timr->sigq->info.si_sys_private = si_private;
/* Send signal to the process that owns this timer.*/

timr->sigq->info.si_signo = timr->it_sigev_signo;
- timr->sigq->info.si_errno = 0;
timr->sigq->info.si_code = SI_TIMER;
timr->sigq->info.si_tid = timr->it_id;
timr->sigq->info.si_value = timr->it_sigev_value;
@@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
kmem_cache_free(posix_timers_cache, tmr);
tmr = NULL;
}
+ memset(&timr->sigq->info, 0, sizeof(siginfo_t));
return tmr;
}


?

Suppose ->sigq is queued. In that case "si_sys_private = si_private"
can race with dequeue_signal/do_schedule_next_timer, but everything
looks OK because the timer is locked.

If dequeue_signal() sees the old value of si_sys_private, it won't
call do_schedule_next_timer(). If it sees the new value, do_schedule_next_timer()
will wait for ->it_lock.

Yes, in both cases we can queue ->sigq again instead of incrementing
info.si_overrun. But this is not worse than we have now, and afaics
this is also possible with timr->fired, please see below.

> What userland should observe is that when the new timer expiration
> time arrives while the previously-fired timer signal is still pending,
> there is no new signal, and the timer accrues an overrun.
>
> An obvious idea is simply not to re-arm in timer_settime when the
> timer already has a pending notification. When the signal gets
> dequeued, do_schedule_next_timer will arm it then. However, the way
> we dispatch to different clocks' timer_set functions makes it awkward
> to validate and store a new setting and fetch the old value without
> actually arming the timer. Also, the logic for one-shot timers has
> more wrinkles.
>
> So I think the thing to do is arm the timer in timer_settime even when
> it's already pending (as we do now), but have posix_timer_event detect
> a firing when already queued and just bump the overrun count.
>
> We can do away with it_requeue_pending, and make si_sys_private purely
> a simple 0/1 flag for whether to call do_schedule_next_timer. To
> optimize the extra checks, we use different bookkeeping fields in
> struct k_itimer.
>
> Have a flag that means "might be queued". do_schedule_next_timer
> replaces:
>
> if (timr && timr->it_requeue_pending == info->si_sys_private) {
> ...
> }
>
> with:
>
> if (timr && timr->fired) {
> timr->fired = 0;
> ...
> }
>
> That avoids races with timer_settime, which does (after timer_set
> call, timer still locked):
>
> if (timr->fired) {
> spin_lock(&current->sighand->siglock);
> if (list_empty(&timr->sigq->list))
> timr->fired = 0;
> spin_unlock(&current->sighand->siglock);
> }
>
> In posix_timer_event (timer is locked), check:
>
> if (timr->fired) {
> spin_lock(&current->sighand->siglock);
> if (list_empty(&timr->sigq->list))
> timr->fired = 0;
> spin_unlock(&current->sighand->siglock);

(we can't use current->sighand->siglock, this is irq context, we should
use timr->it_process->group_leader...)

Suppose that the signal was already dequeued but sigq->info wasn't copied
to the user-space. The thread which does dequeue_signal() unlocked siglock
and waits for ->it_lock.

posix_timer_event() sees list_empty(), proceeds and calls send_sigqueue().
Now we requeue this ->sigq instead of incrementing si_overrun.

The thread which does dequeue_signal() continues and re-schedules the
timer while ->sigq is queued. Then it copies sigq->info to the user space.

The same problem as with the simple patch above. Not very bad though,
afaics. The user space just recieves 2 signals instead of one with
the right value of si_overrun.



While we are here, off-topics question. posix_timer_event() does

if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
struct task_struct *leader;
int ret = send_sigqueue(timr->sigq, timr->it_process, 0);

if (likely(ret >= 0))
return ret;

timr->it_sigev_notify = SIGEV_SIGNAL;
leader = timr->it_process->group_leader;
put_task_struct(timr->it_process);
timr->it_process = leader;
}

return send_sigqueue(timr->sigq, timr->it_process, 1);

Is it really useful to convert the SIGEV_THREAD_ID timer to the group-wide
timer when the thread dies?

This uglifies de_thread(), but more importantly this doesn't work reliably.
Suppose that timr->it_process dies with the pending ->sigq. In that case
the timer stops anyway.

Oleg.

2008-07-20 12:23:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On 07/20, Oleg Nesterov wrote:
>
> Suppose that the signal was already dequeued but sigq->info wasn't copied
> to the user-space.

To avoid a possible confusion: this is of course fine by itself, the content
of sigq->info was already copied to the local siginfo.

Oleg.

2008-07-21 00:48:13

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

> Yes, thanks, I see. But does it have any meaning for the user-space?
[si_sys_private]

No, it's not part of the user ABI. It's not even copied out (see
copy_siginfo_to_user). (A spare siginfo_t field was just used as a handy
bit of storage in the signal queue element. It's a kludge.)

> Let me repeat. Can't we make a simple fix for now for this nasty and
> ancient bug, before we do the more clever changes,

Probably. I don't find it easy to be sure there aren't more bad
problems caused by trying to re-send the same sigqueue entry.

You do need to clear si_overrun there to be correct in the usual case
(not already queued). The fields set explicitly in posix_timer_event,
plus si_overrun, are the only meaningful fields for SI_TIMER (again see
copy_siginfo_to_user). The memset is already superfluous.

It would be a perfectly fine and worthwhile optimization/cleanup on its
own just to move all the initialization of sigq->info (everything but
si_sys_private) to alloc_posix_timer. That would also happen to mask
the symptom of this double-queuing problem that's been observed.

Even if it's a fine stopgap, I'm not comfortable calling this a real "fix".
It seems likely this is the good choice for the stable branch.

> Suppose that the signal was already dequeued but sigq->info wasn't copied
> to the user-space. The thread which does dequeue_signal() unlocked siglock
> and waits for ->it_lock.

This is fine. sigq->info is copied into a local (kernel stack) copy in
collect_signal(), i.e. inside dequeue_signal() long before the unlock.
In short, once __dequeue_signal() returns, the signal goes from "queued"
to "delivered". The delivery is on its way to happening, and from the
instant that thread released the siglock it's no different from if it
had completed handler setup, returned to user, and had a long scheduling
delay before actually executing the first user instruction. At this
point, any other event is not "racing", it's just "after".

> posix_timer_event() sees list_empty(), proceeds and calls send_sigqueue().
> Now we requeue this ->sigq instead of incrementing si_overrun.

It's not "requeue". It's "queue" after it was no longer queued, with
semantics no different from the first time you queue it.

> The thread which does dequeue_signal() continues and re-schedules the
> timer while ->sigq is queued. Then it copies sigq->info to the user space.

The thread that dequeued the first timer signal had ceased all reference
to sigq by the time it unlocked siglock. When its do_schedule_next_timer
call gets it_lock, it can do bookkeeping in struct k_itimer to figure out
what posix_timer_event or timer_settime has done lately. (Like I said,
the ->fired flag alone might not really cover all the angles. I was
expecting you to figure out the exactly details for me. ;-)

> While we are here, off-topics question.

You know I hate those. We aren't here. We're in chairs reading email.
We'll be in the same places if you send a separate message for a separate
topic.

> Is it really useful to convert the SIGEV_THREAD_ID timer to the group-wide
> timer when the thread dies?

Not really. I think the main intent there is just to be sure we never
hold on to an old task_struct pointer (which might have been the bug in
the past).


Thanks,
Roland

2008-07-21 15:20:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On 07/20, Roland McGrath wrote:
>
> > Yes, thanks, I see. But does it have any meaning for the user-space?
> [si_sys_private]
>
> No, it's not part of the user ABI. It's not even copied out (see
> copy_siginfo_to_user).

Heh, I didn't know, thanks.

> > Let me repeat. Can't we make a simple fix for now for this nasty and
> > ancient bug, before we do the more clever changes,
>
> You do need to clear si_overrun there to be correct in the usual case
> (not already queued).

Indeed, I missed that. Can't we do this in send_sigqueue() ?

> It would be a perfectly fine and worthwhile optimization/cleanup on its
> own just to move all the initialization of sigq->info (everything but
> si_sys_private) to alloc_posix_timer.

Yes, we can do this in sys_timer_create(). But this is not very trivial to
do without uglifying the code futher, note this "if (timer_event_spec)".
And we can't do this after "->it_process = process", the timer is already
visible to sys_timer_settime().

> Even if it's a fine stopgap, I'm not comfortable calling this a real "fix".
> ...
> I don't find it easy to be sure there aren't more bad
> problems caused by trying to re-send the same sigqueue entry.

Yes, yes, I agree. I propose this change as a first step only.

> It seems likely this is the good choice for the stable branch.

So, what do you and Mark think about the patch below?

> > The thread which does dequeue_signal() continues and re-schedules the
> > timer while ->sigq is queued. Then it copies sigq->info to the user space.
>
> The thread that dequeued the first timer signal had ceased all reference
> to sigq by the time it unlocked siglock. When its do_schedule_next_timer
> call gets it_lock, it can do bookkeeping in struct k_itimer to figure out
> what posix_timer_event or timer_settime has done lately.

Yes, this should work.

Oleg.

--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin

int posix_timer_event(struct k_itimer *timr,int si_private)
{
- memset(&timr->sigq->info, 0, sizeof(siginfo_t));
timr->sigq->info.si_sys_private = si_private;
/* Send signal to the process that owns this timer.*/

timr->sigq->info.si_signo = timr->it_sigev_signo;
- timr->sigq->info.si_errno = 0;
timr->sigq->info.si_code = SI_TIMER;
timr->sigq->info.si_tid = timr->it_id;
timr->sigq->info.si_value = timr->it_sigev_value;
@@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
kmem_cache_free(posix_timers_cache, tmr);
tmr = NULL;
}
+ memset(&tmr->sigq->info, 0, sizeof(siginfo_t));
return tmr;
}

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1310,6 +1310,7 @@ int send_sigqueue(struct sigqueue *q, st
q->info.si_overrun++;
goto out;
}
+ q->info.si_overrun = 0;

signalfd_notify(t, sig);
pending = group ? &t->signal->shared_pending : &t->pending;

2008-07-21 15:36:39

by Oleg Nesterov

[permalink] [raw]
Subject: do_schedule_next_timer && si_overrun (Was: [PATCH] posix-timers: Do not modify an already queued timer signal)

On 07/21, Oleg Nesterov wrote:
>
> On 07/20, Roland McGrath wrote:
> >
> > You do need to clear si_overrun there to be correct in the usual case
> > (not already queued).
>
> Indeed, I missed that. Can't we do this in send_sigqueue() ?

The more I look at the code, the more I'm getting confused...

Suppose that posix_timer_event()->send_sigqueue() increments info.si_overrun.
But (in general) it is not reported to the user-space, do_schedule_next_timer()
does:

info->si_overrun = timr->it_overrun_last;

shouldn't it do

info->si_overrun += timr->it_overrun_last;

instead?

Oleg.

2008-07-21 15:56:07

by Oliver Pinter

[permalink] [raw]
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

[add stable]

On 7/21/08, Oleg Nesterov <[email protected]> wrote:
> On 07/20, Roland McGrath wrote:
>>
>> > Yes, thanks, I see. But does it have any meaning for the user-space?
>> [si_sys_private]
>>
>> No, it's not part of the user ABI. It's not even copied out (see
>> copy_siginfo_to_user).
>
> Heh, I didn't know, thanks.
>
>> > Let me repeat. Can't we make a simple fix for now for this nasty and
>> > ancient bug, before we do the more clever changes,
>>
>> You do need to clear si_overrun there to be correct in the usual case
>> (not already queued).
>
> Indeed, I missed that. Can't we do this in send_sigqueue() ?
>
>> It would be a perfectly fine and worthwhile optimization/cleanup on its
>> own just to move all the initialization of sigq->info (everything but
>> si_sys_private) to alloc_posix_timer.
>
> Yes, we can do this in sys_timer_create(). But this is not very trivial to
> do without uglifying the code futher, note this "if (timer_event_spec)".
> And we can't do this after "->it_process = process", the timer is already
> visible to sys_timer_settime().
>
>> Even if it's a fine stopgap, I'm not comfortable calling this a real
>> "fix".
>> ...
>> I don't find it easy to be sure there aren't more bad
>> problems caused by trying to re-send the same sigqueue entry.
>
> Yes, yes, I agree. I propose this change as a first step only.
>
>> It seems likely this is the good choice for the stable branch.
>
> So, what do you and Mark think about the patch below?
>
>> > The thread which does dequeue_signal() continues and re-schedules the
>> > timer while ->sigq is queued. Then it copies sigq->info to the user
>> > space.
>>
>> The thread that dequeued the first timer signal had ceased all reference
>> to sigq by the time it unlocked siglock. When its do_schedule_next_timer
>> call gets it_lock, it can do bookkeeping in struct k_itimer to figure out
>> what posix_timer_event or timer_settime has done lately.
>
> Yes, this should work.
>
> Oleg.
>
> --- a/kernel/posix-timers.c
> +++ b/kernel/posix-timers.c
> @@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin
>
> int posix_timer_event(struct k_itimer *timr,int si_private)
> {
> - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
> timr->sigq->info.si_sys_private = si_private;
> /* Send signal to the process that owns this timer.*/
>
> timr->sigq->info.si_signo = timr->it_sigev_signo;
> - timr->sigq->info.si_errno = 0;
> timr->sigq->info.si_code = SI_TIMER;
> timr->sigq->info.si_tid = timr->it_id;
> timr->sigq->info.si_value = timr->it_sigev_value;
> @@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
> kmem_cache_free(posix_timers_cache, tmr);
> tmr = NULL;
> }
> + memset(&tmr->sigq->info, 0, sizeof(siginfo_t));
> return tmr;
> }
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1310,6 +1310,7 @@ int send_sigqueue(struct sigqueue *q, st
> q->info.si_overrun++;
> goto out;
> }
> + q->info.si_overrun = 0;
>
> signalfd_notify(t, sig);
> pending = group ? &t->signal->shared_pending : &t->pending;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
Thanks,
Oliver