From: Andy Lutomirski
Date: Fri, 4 Nov 2016 07:21:20 -0700
Subject: Re: [PATCH] procfs: Add mem_end to /proc/<pid>/stat
To: Christopher Covington
Cc: linux-mm@vger.kernel.org, Jonathan Corbet, Andrew Morton, Michal Hocko, Vlastimil Babka, Hugh Dickins, Konstantin Khlebnikov, Naoya Horiguchi, "Kirill A. Shutemov", Robert Foss, John Stultz, Robert Ho, Ross Zwisler, Jerome Marchand, Johannes Weiner, Kees Cook, Alexey Dobriyan, Jann Horn, Joe Perches, Andy Shevchenko, "Richard W.M. Jones", linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Nov 4, 2016 at 6:14 AM, Christopher Covington wrote:
> Applications such as Just-In-Time (JIT) compilers, Checkpoint/Restore In
> Userspace (CRIU), and User Mode Linux (UML) need to know the highest
> virtual address, TASK_SIZE, to implement pointer tagging or make a first
> educated guess at where to find a large, unused region of memory.
> Unfortunately the currently available mechanisms for determining TASK_SIZE
> are either convoluted and potentially error-prone, such as making repeated
> munmap() calls and checking the return code,

Oh boy -- if you do this you are just asking to segfault.
> or make use of hard-coded
> assumptions that limit an application's portability across kernels with
> different Kconfig options and multiple architectures.
>
> Therefore, expose TASK_SIZE to userspace. While PAGE_SIZE is exposed to
> userspace via an auxiliary vector, that approach is not used for TASK_SIZE
> in case run-time alterations to the usable virtual address range are one
> day implemented, such as through an extension to prctl(PR_SET_MM) or a flag
> to clone. There is no prctl(PR_GET_MM). Instead such information is
> expected to come from /proc/<pid>/stat[m]. For the same extendability
> reason, use a per-pid proc entry rather than a system-wide entry like
> /proc/sys/vm/mmap_min_addr.

First, this should be in status, not stat, but that's moot because TASK_SIZE is nonsensical as a task property on x86. And, as was nicely covered yesterday at LPC, we already have too much of a mess in /proc, where per-mm properties are mixed up with per-task properties. Can we make a point of not adding any new mm-related things to files that are about the task?

But also, NAK for x86 if you look at TASK_SIZE: TASK_SIZE is a mess and needs to go away completely -- only TASK_SIZE_MAX makes any sense. If you want to ask "what is the largest address that could possibly be mapped in this mm?", the answer is 2^47-1-PAGE_SIZE [1] on present CPUs. If you want a prctl to return that, then adding one *might* make sense. OTOH it's a bit unclear what happens if your task is migrated to a hypothetical future CPU with more address bits.

If you're a 32-bit process on x86, you have zero high bits free, because the address limit is above 2^31-1. If you're an x32 process, then (a) I'm surprised and (b) there might be room for "what is the highest address that an mmap call done without trickery would return?". That could be added as well, with a suitably scary name, in prctl.
But this is still rather odd: x32 pointers are exactly 32 bits unless you write weird asm code to use 64-bit pointers, and you wouldn't do that, because it defeats the whole point of x32, which is to treat all pointers as exactly 32 bits. So an x32 application should just hard-code 32 as the number of bits.

[1] That PAGE_SIZE offset has an interesting backstory involving some overly clever Intel hardware designers and a root hole that, as far as I know, affected every single x86_64 operating system.

--Andy