Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754858AbZCTIQ0 (ORCPT ); Fri, 20 Mar 2009 04:16:26 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754141AbZCTIQK (ORCPT ); Fri, 20 Mar 2009 04:16:10 -0400 Received: from casper.infradead.org ([85.118.1.10]:42176 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753864AbZCTIQI (ORCPT ); Fri, 20 Mar 2009 04:16:08 -0400 Subject: Re: check *uaddr==val after queueing - without faulting From: Peter Zijlstra To: Darren Hart Cc: "lkml," , Thomas Gleixner , Ingo Molnar , John Stultz , Jakub Jelinek , Ulrich Drepper , Eric Dumazet , Oleg Nesterov In-Reply-To: <49C2C0D6.5080700@us.ibm.com> References: <49C2BCF4.50908@us.ibm.com> <49C2C0D6.5080700@us.ibm.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Date: Fri, 20 Mar 2009 09:15:34 +0100 Message-Id: <1237536935.24626.26.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.24.3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2483 Lines: 46 On Thu, 2009-03-19 at 15:01 -0700, Darren Hart wrote: > Adding a few key folks to the Cc, apologies for the short initial Cc list. > > Darren Hart wrote: > > The current futex_wait() code (I'm looking at tip/core/futexes) > > conflicts with a warning in the comments about checking *uaddr==val > > before the futex_q is queued on the hb list. While userspace is able to > > alter *uaddr at will and should expect to hang in the kernel forever > > should it do so haphazardly, there are legitimate scenarios where the > > futex value might change between the call to futex_wait() and when the > > futex_q gets on the hb list. > > > > For example, glibc protects access to the value of cond.__data.__futex > > via the cond.__data.__lock. However, before it can issue the syscall it > > has to drop the cond.__data.__lock, leaving a small race window where > > userspace might issue a signal or broadcast, which will modify the value > > of cond.__data.__futex. As I understand it, this will result in the > > waiter having changed the value of the futex prior to entering the > > kernel, but not enqueuing itself on the hb list until after the waiter > > issues the broadcast that was intended to wake it up. > > > > I was working up a patch to move the test to after the call to > > queue_me(), but in order to do the test we also have to perform the > > get_user() after the queue_me(), which might sleep if we still hold the > > hb->lock. If we let queue_me() drop the hb->lock before we call > > get_user() then we may see a legitimate change in *uaddr that occured > > after the queue_me() and before the get_user(). > > > > I'm at a loss for how to resolve the race without causing the false > > positive inside the kernel. It might be resolvable in glibc by looking > > at the return code from futex_requeue and checking if the number > > woken_or_requeued agrees with the number it expected to be sleeping; > > this likely leaves other gaps for other waking calls, like FUTEX_WAKE. > > > > Any thoughts? Am I missing something that guards against this race? get_user_pages_fast() the futex page, that will pin it, then under the lock you can kmap_atomic() the page, and read it. Probably massive overkill though :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/