Date: Tue, 7 Nov 2017 14:15:43 +0300
From: "Kirill A. Shutemov"
To: Nicholas Piggin
Cc: Florian Weimer, "Aneesh Kumar K.V", "Kirill A. Shutemov",
	linuxppc-dev@lists.ozlabs.org, linux-mm, Andrew Morton,
	Andy Lutomirski, Dave Hansen, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, linux-arch@vger.kernel.org, Ingo Molnar,
	Linux Kernel Mailing List
Subject: Re: POWER: Unexpected fault when writing to brk-allocated memory
Message-ID: <20171107111543.ep57evfxxbwwlhdh@node.shutemov.name>
References: <20171105231850.5e313e46@roar.ozlabs.ibm.com>
	<871slcszfl.fsf@linux.vnet.ibm.com>
	<20171106174707.19f6c495@roar.ozlabs.ibm.com>
	<24b93038-76f7-33df-d02e-facb0ce61cd2@redhat.com>
	<20171106192524.12ea3187@roar.ozlabs.ibm.com>
	<546d4155-5b7c-6dba-b642-29c103e336bc@redhat.com>
	<20171107160705.059e0c2b@roar.ozlabs.ibm.com>
In-Reply-To: <20171107160705.059e0c2b@roar.ozlabs.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 07, 2017 at 04:07:05PM +1100, Nicholas Piggin wrote:
> C'ing everyone who was on the x86 56-bit user virtual address patch.
>
> I think we need more time to discuss this behaviour, in light of the
> regression Florian uncovered. I would propose we turn off the 56-bit
> user virtual address support for x86 for 4.14, and powerpc would
> follow and turn off its 512T support until we can get a better handle
> on the problems. (Actually Florian initially hit a couple of bugs in
> powerpc implementation, but pulling that string uncovers a whole lot
> of difficulties.)
>
> The bi-modal behavior switched based on a combination of mmap address
> hint and MAP_FIXED just sucks. It's segregating our VA space with
> some non-standard heuristics, and it doesn't seem to work very well.
>
> What are we trying to do?
> Allow SAP HANA etc use huge address spaces
> by coding to these specific mmap heuristics we're going to add,
> rather than solving it properly in a way that requires adding a new
> syscall or personality or prctl or sysctl. Okay, but the cost is that
> despite best efforts, it still changes ABI behaviour for existing
> applications and these heuristics will become baked into the ABI that
> we will have to support. Not a good tradeoff IMO.
>
> First of all, using addr and MAP_FIXED to develop our heuristic can
> never really give unchanged ABI. It's an in-band signal. brk() is a
> good example that steadily keeps incrementing address, so depending
> on malloc usage and address space randomization, you will get a brk()
> that ends exactly at 128T, then the next one will be >
> DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.

No, it won't. You will hit the stack first.

> Second, the kernel can never completely solve the problem this way.
> How do we know a malloc library will not ask for > 128TB addresses
> and pass them to an unknowing application?

The idea is that an application can provide a hint (mallopt()?) to the
malloc implementation that it's ready to handle the full address space.
In this case, malloc can use mmap((void *) -1,...) for its allocations
and get the full address space this way.

> And lastly, there are a fair few bugs and places where description
> in changelogs and mailing lists does not match code. You don't want
> to know the mess in powerpc, but even x86 has two I can see:
> MAP_FIXED succeeds even when crossing 128TB addresses (where changelog
> indicated it should not),

Hm. I don't see where the changelog indicated that MAP_FIXED across
128TB shouldn't work. My intention was that it should, although I
didn't state it in the changelog.

The idea was that we shouldn't allow a mapping to slip above 47 bits
by accident.

A correctly functioning program would never request addr+len above
47-bit with MAP_FIXED unless it's ready to handle such addresses.
Otherwise the request would simply fail on a machine that doesn't
support large VA.

In contrast, addr+len above 47-bit without MAP_FIXED will not fail on
a machine that doesn't support large VA: the kernel will find another
place under 47-bit. And I can imagine a reasonable application that
does something like this.

So we cannot assume that the application is ready to handle large
addresses if we see addr+len above 47-bit without MAP_FIXED.

> arch_get_unmapped_area_topdown() with an address hint is checking
> against TASK_SIZE rather than the limited 128TB address, so it looks
> like it won't follow the heuristics.

You are right. This is broken. If the user requests a mapping above the
vdso but below DEFAULT_MAP_WINDOW, it will succeed.

I'll send a patch to fix this. But it doesn't look like a show-stopper
to me.

While re-checking things for this reply I found an actual bug, see:

http://lkml.kernel.org/r/20171107103804.47341-1-kirill.shutemov@linux.intel.com

> So unless everyone else thinks I'm crazy and disagrees, I'd ask for
> a bit more time to make sure we get this interface right. I would
> hope for something like prctl PR_SET_MM which can be used to set
> our user virtual address bits on a fine grained basis. Maybe a
> sysctl, maybe a personality. Something out-of-band. I don't want to
> get too far into that discussion yet. First we need to agree whether
> or not the code in the tree today is a problem.

Well, we've already discussed all the options you are proposing.
Linus wanted a minimalistic interface, so we took this path for now.
We can always add more ways to get access to the full address space
later.

-- 
 Kirill A. Shutemov
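
For illustration only, here is a minimal userspace sketch of the opt-in
described above: a hint above the 47-bit window without MAP_FIXED. The
names DEFAULT_WINDOW_END, ends_above_default_window() and map_anywhere()
are made up for this example and are not kernel or libc interfaces; the
128TB (47-bit) boundary is the x86 value discussed in this thread. Where
the mapping actually lands depends on the kernel and the hardware, so the
sketch makes no claim about the returned address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: the default window ends at 47 bits (128TB), as on x86. */
#define DEFAULT_WINDOW_END (UINT64_C(1) << 47)

/* Hypothetical helper: does [addr, addr + len) end above the window? */
static bool ends_above_default_window(uint64_t addr, uint64_t len)
{
	if (len > UINT64_MAX - addr)	/* addr + len would overflow */
		return true;
	return addr + len > DEFAULT_WINDOW_END;
}

/*
 * Hypothetical helper: ask for an anonymous mapping with a hint above
 * the default window and without MAP_FIXED. The hint is only advice:
 * on a machine without large VA support the kernel is expected to
 * place the mapping below 47-bit instead of failing.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_anywhere(size_t len)
{
	assert(len > 0);
	return mmap((void *)-1, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

A caller that is not prepared for large addresses would simply never pass
such a hint; one that is can compare the returned pointer against
DEFAULT_WINDOW_END to see which side of the boundary it got.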