Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753336Ab1F1X4N (ORCPT ); Tue, 28 Jun 2011 19:56:13 -0400 Received: from mga01.intel.com ([192.55.52.88]:5258 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753128Ab1F1X4F (ORCPT ); Tue, 28 Jun 2011 19:56:05 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.65,440,1304319600"; d="scan'208";a="21686900" Message-ID: <4E0A6A0F.9020004@linux.intel.com> Date: Tue, 28 Jun 2011 16:55:59 -0700 From: Darren Hart User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Lightning/1.0b2 Thunderbird/3.1.10 MIME-Version: 1.0 To: Shawn Bohrer CC: Peter Zijlstra , KOSAKI Motohiro , eric.dumazet@gmail.com, david@rgmadvisors.com, linux-kernel@vger.kernel.org, zvonler@rgmadvisors.com, hughd@google.com, tglx@linutronix.de, mingo@elte.hu Subject: Re: [PATCH v3] futex: Fix regression with read only mappings References: <4E08F893.10103@linux.intel.com> <1309213375-31471-1-git-send-email-sbohrer@rgmadvisors.com> <1309258475.6701.200.camel@twins> <4E09EA96.5080005@linux.intel.com> <20110628173841.GA2089@BohrerMBP.rgmadvisors.com> In-Reply-To: <20110628173841.GA2089@BohrerMBP.rgmadvisors.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6026 Lines: 173 On 06/28/2011 10:38 AM, Shawn Bohrer wrote: > On Tue, Jun 28, 2011 at 07:52:06AM -0700, Darren Hart wrote: >> >> >> On 06/28/2011 03:54 AM, Peter Zijlstra wrote: >>> On Mon, 2011-06-27 at 17:22 -0500, Shawn Bohrer wrote: >>>> commit 7485d0d3758e8e6491a5c9468114e74dc050785d (futexes: Remove rw >>>> parameter from get_futex_key()) in 2.6.33 introduced a user-mode >>>> regression in that it additionally prevented futex operations on a >>>> region within a read only memory mapped file. For example this breaks >>>> workloads that have one or more reader processes doing a FUTEX_WAIT on a >>>> futex within a read only shared mapping, and a writer processes that has >>>> a writable mapping issuing the FUTEX_WAKE. >>>> >>>> This fixes the regression for futex operations that should be valid on >>>> RO mappings by trying a RO get_user_pages_fast() when the RW >>>> get_user_pages_fast() fails so as not to slow down the common path of >>>> writable anonymous maps and bailing when we used the RO path on >>>> anonymous memory. >>>> >>>> While fixing the regression this patch opens up two possible bad >>>> scenarios as identified by KOSAKI Motohiro: >>>> >>>> 1) This patch also allows FUTEX_WAIT on RO private mappings which have >>>> the following corner case. >>>> >>>> Thread-A: call futex(FUTEX_WAIT, memory-region-A). >>>> get_futex_key() return inode based key. >>>> sleep on the key >>>> Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A) >>>> Thread-B: write memory-region-A. >>>> COW happen. This process's memory-region-A become related >>>> to new COWed private (ie PageAnon=1) page. >>>> Thread-B: call futex(FUETX_WAKE, memory-region-A). >>>> get_futex_key() return mm based key. >>>> IOW, we fail to wake up Thread-A. >>>> >>>> 2) Current futex code doesn't handle zero page properly. >>>> >>>> Read mode get_user_pages() can return zero page, but current futex >>>> code doesn't handle it at all. Then, zero page makes infinite loop >>>> internally. >>>> >>>> This Patch is based on Peter Zijlstra's initial patch with modifications to >>>> only allow RO mappings for futex operations that need VERIFY_READ access. >>>> >>>> Reported-by: David Oliver >>>> Signed-off-by: Shawn Bohrer >>> >>> Acked-by: Peter Zijlstra >> >> I think we need to address #2 above first. > > I believe the following contrived case triggers the zero page problem: > > #include > #include > #include > #include > #include > #include > #include > #include > > > int main(int argc, char *argv[]) > { > int fd, *futex, rc, val = 42; > struct timespec ts = {.tv_sec = 2, .tv_nsec = 0 }; > > fd = open("/tmp/futex_test", O_RDWR|O_CREAT, 0644); > write(fd, &val, 4); > futex = (int *)mmap(0, sizeof(int), PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); > rc = syscall(SYS_futex, futex, FUTEX_WAIT, val, &ts, 0, 0); > printf("rc=%d errno=%d\n", rc, errno); > } > > Running the program above with my patch applied will spin in the > kernel at 100% CPU usage. > >> 1) Is the loop killable? > > Yes, SIGINT causes the program to return. OK, so while it is killable... I'd really rather not introduce a busy loop in the kernel. Especially not after we removed it with: http://lkml.org/lkml/2010/1/13/136 How about adding something like this. I _think_ the only way to get a ZERO_PAGE here now is with the contrived testcase below. $ git diff diff --git a/kernel/futex.c b/kernel/futex.c index 1737a66..a5417c2 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -316,6 +316,13 @@ again: if (!page_head->mapping) { unlock_page(page_head); put_page(page_head); + /* + * ZERO_PAGE pages don't have a mapping. Avoid a busy loop + * trying to find one. RW mapping would have COW'd (and thus + * have a mapping) so this page is RO and won't ever change. + */ + if ((page_head == ZERO_PAGE(address))) + return -EFAULT; goto again; } This returns EFAULT for the following testcase which spun in get_futex_key with your V3 patch. #include #include #include #include #include #include #include #include int main(int argc, char *argv[]) { int fd, *futex, rc; struct timespec ts = {.tv_sec = 2, .tv_nsec = 0 }; futex = (int *)mmap(0, sizeof(int), PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); printf("futex @ %p\n", futex); rc = syscall(SYS_futex, futex, FUTEX_WAIT, 0, &ts, 0, 0); printf("rc=%d errno=%d\n", rc, errno); } Thanks, Darren > >> 2) If not, and we try to catch zero_page scenario, we need to ensure RO >> sparse file mapping work. Peter notes that these are probably fine, >> filemap.c doesn't seem to use the zero page - but we need to test. >> >> It's agreed, I believe, between myself, Peter, and Shawn that we will >> document Private RW mappings as unsupported. >> >> -- >> Darren Hart >> Intel Open Source Technology Center >> Yocto Project - Linux Kernel > > > --------------------------------------------------------------- > This email, along with any attachments, is confidential. If you > believe you received this message in error, please contact the > sender immediately and delete all copies of the message. > Thank you. > -- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/