Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754402Ab1FFQL0 (ORCPT ); Mon, 6 Jun 2011 12:11:26 -0400 Received: from merlin.infradead.org ([205.233.59.134]:39907 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752785Ab1FFQLY convert rfc822-to-8bit (ORCPT ); Mon, 6 Jun 2011 12:11:24 -0400 Subject: Re: Change in functionality of futex() system call. From: Peter Zijlstra To: Eric Dumazet Cc: David Oliver , linux-kernel@vger.kernel.org, Shawn Bohrer , Zachary Vonler , KOSAKI Motohiro , Hugh Dickins , Thomas Gleixner , Darren Hart , Ingo Molnar In-Reply-To: <1307373819.3098.40.camel@edumazet-laptop> References: <1307373819.3098.40.camel@edumazet-laptop> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT Date: Mon, 06 Jun 2011 18:11:12 +0200 Message-ID: <1307376672.2322.167.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5442 Lines: 162 On Mon, 2011-06-06 at 17:23 +0200, Eric Dumazet wrote: > Le lundi 06 juin 2011 à 09:28 -0500, David Oliver a écrit : > > Hello, > > > > The functionality of the futex() system call appears to have changed > > between versions 2.6.18 and 2.6.32.28. > > > > Specifically, performing a FUTEX_WAIT on a read-only mapped location > > results in an EFAULT. Although other operations, such as FUTEX_WAKE, > > are only meaningful for writable locations, FUTEX_WAIT is useful for > > processes with read-only access to a memory-mapped file. > > > > The code below illustrates the changed behavior (each of the EXPECT > > operations succeed on the older kernel, the ASSERTs pass in each > > case), assuming the file /tmp/futex_test exists and contains int(42). > > > > With the older kernel, the syscall() suspends until another process > > changes the file and issues a FUTEX_WAKE, whereas the new behavior is > > for an EFAULT error, independent of the file contents. > > > > Let me know if you need further clarification. > > > > Cheers! > > > > David Oliver. > > > > > > #include > > #include > > #include > > typedef uint32_t u32; // for futex.h > > #include > > #include > > #include > > #include > > #include "gtest/gtest.h" // test framework to illustrate issue. > > > > > > TEST(Futex, futex_in_read_only_file_is_ok) { > > int fd = open("/tmp/futex_test", O_RDONLY); > > ASSERT_GE(fd, 0); > > int* futex = static_cast(mmap(0, sizeof(int), PROT_READ, > > MAP_SHARED, fd, 0)); > > ASSERT_NE((int *)(0), futex); > > > > int rc = syscall(SYS_futex, futex, FUTEX_WAIT, 42, 0, 0, 0); > > > > EXPECT_NE(-1, rc); // fails. > > if (rc == -1) { > > EXPECT_NE(errno, EFAULT); // fails. > > } > > } > > > > Right you are, this came from commit 7485d0d3758e8e6491a5 (futexes: > Remove rw parameter from get_futex_key()) in 2.6.33 > > commit 7485d0d3758e8e6491a5c9468114e74dc050785d > Author: KOSAKI Motohiro > Date: Tue Jan 5 16:32:43 2010 +0900 > > futexes: Remove rw parameter from get_futex_key() > > Currently, futexes have two problem: > > A) The current futex code doesn't handle private file mappings properly. > > get_futex_key() uses PageAnon() to distinguish file and > anon, which can cause the following bad scenario: > > 1) thread-A call futex(private-mapping, FUTEX_WAIT), it > sleeps on file mapping object. > 2) thread-B writes a variable and it makes it cow. > 3) thread-B calls futex(private-mapping, FUTEX_WAKE), it > wakes up blocked thread on the anonymous page. (but it's nothing) > > B) Current futex code doesn't handle zero page properly. > > Read mode get_user_pages() can return zero page, but current > futex code doesn't handle it at all. Then, zero page makes > infinite loop internally. > > The solution is to use write mode get_user_page() always for > page lookup. It prevents the lookup of both file page of private > mappings and zero page. > > Performance concerns: > > Probaly very little, because glibc always initialize variables > for futex before to call futex(). It means glibc users never see > the overhead of this patch. > > Compatibility concerns: > > This patch has few compatibility issues. After this patch, > FUTEX_WAIT require writable access to futex variables (read-only > mappings makes EFAULT). But practically it's not a problem, > glibc always initalizes variables for futexes explicitly - nobody > uses read-only mappings. Urgh,. maybe something like the below but with more conditionals that enable the extra logic only for FUTEX_WAIT.. The idea is to try a RO gup() when the RW gup() fails so as not to slow down the common path of writable anonymous maps and bail when we used the RO path on anonymous memory. --- diff --git a/kernel/futex.c b/kernel/futex.c index fe28dc2..11f2ad1 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -234,7 +234,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key) unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current->mm; struct page *page, *page_head; - int err; + int err, ro = 0; /* * The futex address must be "naturally" aligned. @@ -262,6 +262,10 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key) again: err = get_user_pages_fast(address, 1, 1, &page); + if (err == -EFAULT) { + err = get_user_pages_fast(address, 1, 0, &page); + ro = 1; + } if (err < 0) return err; @@ -316,6 +320,11 @@ again: * the object not the particular process. */ if (PageAnon(page_head)) { + if (ro) { + err = -EFAULT; + goto out; + } + key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */ key->private.mm = mm; key->private.address = address; @@ -327,9 +336,10 @@ again: get_futex_key_refs(key); +out: unlock_page(page_head); put_page(page_head); - return 0; + return err; } static inline void put_futex_key(union futex_key *key) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/