Date: Tue, 7 Nov 2017 16:07:05 +1100
From: Nicholas Piggin
To: Florian Weimer
Cc: "Aneesh Kumar K.V", "Kirill A. Shutemov", linuxppc-dev@lists.ozlabs.org,
    linux-mm, Andrew Morton, Andy Lutomirski, Dave Hansen, Linus Torvalds,
    Peter Zijlstra, Thomas Gleixner, linux-arch@vger.kernel.org, Ingo Molnar,
    Linux Kernel Mailing List
Subject: Re: POWER: Unexpected fault when writing to brk-allocated memory
Message-ID: <20171107160705.059e0c2b@roar.ozlabs.ibm.com>
In-Reply-To: <546d4155-5b7c-6dba-b642-29c103e336bc@redhat.com>
References: <20171105231850.5e313e46@roar.ozlabs.ibm.com>
    <871slcszfl.fsf@linux.vnet.ibm.com>
    <20171106174707.19f6c495@roar.ozlabs.ibm.com>
    <24b93038-76f7-33df-d02e-facb0ce61cd2@redhat.com>
    <20171106192524.12ea3187@roar.ozlabs.ibm.com>
    <546d4155-5b7c-6dba-b642-29c103e336bc@redhat.com>
Organization: IBM
X-Mailing-List: linux-kernel@vger.kernel.org

C'ing everyone who was on the x86 56-bit user virtual address patch.

I think we need more time to discuss this behaviour, in light of the
regression Florian uncovered. I would propose we turn off the 56-bit
user virtual address support for x86 for 4.14, and powerpc would follow
and turn off its 512T support until we can get a better handle on the
problems. (Actually Florian initially hit a couple of bugs in the
powerpc implementation, but pulling that string uncovers a whole lot of
difficulties.)

The bi-modal behaviour switched based on a combination of mmap address
hint and MAP_FIXED just sucks. It's segregating our VA space with some
non-standard heuristics, and it doesn't seem to work very well.

What are we trying to do? Allow SAP HANA etc. to use huge address spaces
by coding to these specific mmap heuristics we're going to add, rather
than solving it properly in a way that requires adding a new syscall or
personality or prctl or sysctl. Okay, but the cost is that despite best
efforts, it still changes ABI behaviour for existing applications, and
these heuristics will become baked into the ABI that we will have to
support. Not a good tradeoff IMO.

First of all, using addr and MAP_FIXED to develop our heuristic can
never really give unchanged ABI. It's an in-band signal. brk() is a
good example: it steadily keeps incrementing the address, so depending
on malloc usage and address space randomization, you will get a brk()
that ends exactly at 128T, then the next one will be > DEFAULT_MAP_WINDOW,
and it will switch you to the 56-bit address space.

Second, the kernel can never completely solve the problem this way.
How do we know a malloc library will not ask for > 128TB addresses
and pass them to an unknowing application?

And lastly, there are a fair few bugs and places where the description
in changelogs and mailing lists does not match the code.
You don't want to know the mess in powerpc, but even x86 has two I can
see: MAP_FIXED succeeds even when crossing 128TB addresses (where the
changelog indicated it should not), and arch_get_unmapped_area_topdown()
with an address hint checks against TASK_SIZE rather than the limited
128TB address, so it looks like it won't follow the heuristics.

So unless everyone else thinks I'm crazy and disagrees, I'd ask for
a bit more time to make sure we get this interface right. I would hope
for something like prctl PR_SET_MM which can be used to set our user
virtual address bits on a fine-grained basis. Maybe a sysctl, maybe a
personality. Something out-of-band. I don't want to get too far into
that discussion yet. First we need to agree whether or not the code in
the tree today is a problem.

Thanks,
Nick

On Mon, 6 Nov 2017 09:32:25 +0100
Florian Weimer wrote:

> On 11/06/2017 09:30 AM, Aneesh Kumar K.V wrote:
> > On 11/06/2017 01:55 PM, Nicholas Piggin wrote:
> >> On Mon, 6 Nov 2017 09:11:37 +0100
> >> Florian Weimer wrote:
> >>
> >>> On 11/06/2017 07:47 AM, Nicholas Piggin wrote:
> >>>> "You get < 128TB unless explicitly requested."
> >>>>
> >>>> Simple, reasonable, obvious rule. Avoids breaking apps that store
> >>>> some bits in the top of pointers (provided that memory allocator
> >>>> userspace libraries also do the right thing).
> >>>
> >>> So brk would simply fail instead of crossing the 128 TiB threshold?
> >>
> >> Yes, that was the intention and that's what x86 seems to do.
> >>
> >>>
> >>> glibc malloc should cope with that and switch to malloc, but this code
> >>> path is obviously less well-tested than the regular way.
> >>
> >> Switch to mmap() I guess you meant?
>
> Yes, sorry.
>
> >> powerpc has a couple of bugs in corner cases, so those should be fixed
> >> according to intended policy for stable kernels I think.
> >>
> >> But I question the policy. Just seems like an ugly and ineffective wart.
> >> Exactly for such cases as this -- behaviour would change from run to run
> >> depending on your address space randomization for example! In case your
> >> brk happens to land nicely on 128TB then the next one would succeed.
> >
> > Why? It should not change from run to run. We limit the free
> > area search range based on hint address. So we should get consistent
> > results across runs, even if we changed the context.addr_limit.
>
> The size of the gap to the 128 TiB limit varies between runs because of
> ASLR. So some runs would use brk alone, others would use brk + malloc.
> That's not really desirable IMHO.
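
To make the run-to-run variation concrete, here is a toy userspace model
in C. It is only a sketch and not kernel code: DEFAULT_MAP_WINDOW_MODEL,
request_opts_in() and steps_before_opt_in() are hypothetical names
invented for illustration, and the check (a request "opts in" to high
addresses once its range reaches past 128 TiB) is an assumed
simplification of the in-band heuristic discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 128 TiB: the default window discussed in the thread (model value only). */
#define DEFAULT_MAP_WINDOW_MODEL (UINT64_C(1) << 47)

/*
 * Hypothetical stand-in for the in-band heuristic: a request is treated
 * as explicitly asking for high addresses when [addr, addr + len)
 * reaches past the window. The real kernel checks differ per arch.
 */
static inline bool request_opts_in(uint64_t addr, uint64_t len)
{
    assert(len <= UINT64_MAX - addr); /* no wrap-around in this model */
    return addr + len > DEFAULT_MAP_WINDOW_MODEL;
}

/*
 * Grow a heap from `start` in increments of `step`, as a brk()-based
 * allocator would. Returns how many increments stay inside the window
 * before the first request that opts in, capped at `max_steps`.
 */
static inline unsigned steps_before_opt_in(uint64_t start, uint64_t step,
                                           unsigned max_steps)
{
    uint64_t brk = start;
    unsigned n = 0;

    while (n < max_steps && !request_opts_in(brk, step)) {
        brk += step;
        n++;
    }
    return n;
}
```

With a 4096-byte step, a heap starting three pages below the window gets
three increments before the model flips, while one starting two pages
below gets only two. The application did nothing different; only the
(randomized) starting address changed, which is the objection to using
the address itself as the signal.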