Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760069AbYGPQSU (ORCPT ); Wed, 16 Jul 2008 12:18:20 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756910AbYGPQSK (ORCPT ); Wed, 16 Jul 2008 12:18:10 -0400 Received: from x346.tv-sign.ru ([89.108.83.215]:34068 "EHLO mail.screens.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752116AbYGPQSJ (ORCPT ); Wed, 16 Jul 2008 12:18:09 -0400 Date: Wed, 16 Jul 2008 20:21:31 +0400 From: Oleg Nesterov To: Mark McLoughlin Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Roland McGrath , Thomas Gleixner Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal Message-ID: <20080716162131.GA1785@tv-sign.ru> References: <1216219846-663-1-git-send-email-markmc@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1216219846-663-1-git-send-email-markmc@redhat.com> User-Agent: Mutt/1.5.11 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5616 Lines: 151 On 07/16, Mark McLoughlin wrote: > > When a timer fires, posix_timer_event() zeroes out its > pre-allocated siginfo structure, initialises it and then > queues up the signal with send_sigqueue(). > > However, we may have previously queued up this signal, in > which case we only want to increment si_overrun and > re-initialising the siginfo structure is incorrect. Quoting Roland McGrath: > > I'm not clear on how the already-queued case could ever happen. Do we > really need that check at all? It shouldn't be possible for the timer to > be firing when it's already queued, because it won't have been reloaded. > It only reloads via do_schedule_next_timer after it's dequeued, or because > a 1 return value said it never was queued. > Also, since we are modifying an already queued signal > without the protection of the sighand spinlock, we may also > race with e.g. collect_signal() causing it to fail to find > a signal on the pending list because it happens to look at > the siginfo struct after it was zeroed and before it was > re-initialised. > > The race was observed with a modified kvm-userspace when > running a guest under heavy network load. When it occurs, > KVM never sees another SIGALRM signal because although > the signal is queued up the appropriate bit is never set > in the pending mask. Hmm. Yes, if collect_signal() races with posix_timer_event()->memset(), we dequeue SIGALRM but leave the siginfo on list. I can't see how this is possible though... Could you verify that we don't have another bug? Say, add WARN_ON(!list_empty()) to posix_timer_event(). If we need this fix, perhaps it is better to modify posix_timer_event() to check !list_empty()? Add Thomas. Oleg. > Manually sending the process a SIGALRM > kicks it out of this state. > > The fix is simple - only modify the pre-allocated sigqueue > once we're sure that it hasn't already been queued. > > Signed-off-by: Mark McLoughlin > Cc: Oleg Nesterov > Cc: Roland McGrath > > --- > include/linux/sched.h | 2 +- > kernel/posix-timers.c | 20 +++++++++++--------- > kernel/signal.c | 5 +++-- > 3 files changed, 15 insertions(+), 12 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 2134917..718f7ec 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1791,7 +1791,7 @@ extern void zap_other_threads(struct task_struct *p); > extern int kill_proc(pid_t, int, int); > extern struct sigqueue *sigqueue_alloc(void); > extern void sigqueue_free(struct sigqueue *); > -extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group); > +extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group); > extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *); > extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long); > > diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c > index dbd8398..b42c964 100644 > --- a/kernel/posix-timers.c > +++ b/kernel/posix-timers.c > @@ -298,19 +298,21 @@ void do_schedule_next_timer(struct siginfo *info) > > int posix_timer_event(struct k_itimer *timr,int si_private) > { > - memset(&timr->sigq->info, 0, sizeof(siginfo_t)); > - timr->sigq->info.si_sys_private = si_private; > + siginfo_t info; > + > + memset(&info, 0, sizeof(siginfo_t)); > + info.si_sys_private = si_private; > /* Send signal to the process that owns this timer.*/ > > - timr->sigq->info.si_signo = timr->it_sigev_signo; > - timr->sigq->info.si_errno = 0; > - timr->sigq->info.si_code = SI_TIMER; > - timr->sigq->info.si_tid = timr->it_id; > - timr->sigq->info.si_value = timr->it_sigev_value; > + info.si_signo = timr->it_sigev_signo; > + info.si_errno = 0; > + info.si_code = SI_TIMER; > + info.si_tid = timr->it_id; > + info.si_value = timr->it_sigev_value; > > if (timr->it_sigev_notify & SIGEV_THREAD_ID) { > struct task_struct *leader; > - int ret = send_sigqueue(timr->sigq, timr->it_process, 0); > + int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0); > > if (likely(ret >= 0)) > return ret; > @@ -321,7 +323,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private) > timr->it_process = leader; > } > > - return send_sigqueue(timr->sigq, timr->it_process, 1); > + return send_sigqueue(timr->sigq, &info, timr->it_process, 1); > } > EXPORT_SYMBOL_GPL(posix_timer_event); > > diff --git a/kernel/signal.c b/kernel/signal.c > index 6c0958e..50e0b13 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q) > __sigqueue_free(q); > } > > -int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) > +int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group) > { > - int sig = q->info.si_signo; > + int sig = info->si_signo; > struct sigpending *pending; > unsigned long flags; > int ret; > @@ -1322,6 +1322,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) > > signalfd_notify(t, sig); > pending = group ? &t->signal->shared_pending : &t->pending; > + copy_siginfo(&q->info, info); > list_add_tail(&q->list, &pending->list); > sigaddset(&pending->signal, sig); > complete_signal(sig, t, group); > -- > 1.5.5.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/