From: Michael Ellerman
To: Yury Norov, Rafael Aquini
Cc: Joel Savitz, linux-kernel@vger.kernel.org, Alexey Dobriyan, Andrew Morton, Vlastimil Babka, "Aneesh Kumar K.V", Ram Pai, Andrea Arcangeli, Huang Ying, Sandeep Patil, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status
In-Reply-To: <20190508063716.GA3096@yury-thinkpad>
References: <1557158023-23021-1-git-send-email-jsavitz@redhat.com> <20190507125430.GA31025@x230.aquini.net> <20190508063716.GA3096@yury-thinkpad>
Date: Fri, 10 May 2019 13:32:22 +1000
Message-ID: <87k1ezugqh.fsf@concordia.ellerman.id.au>
X-Mailing-List: linux-kernel@vger.kernel.org

Yury Norov writes:
> On Tue, May 07, 2019 at 08:54:31AM -0400, Rafael Aquini wrote:
>> On Mon, May 06, 2019 at 11:53:43AM -0400, Joel Savitz wrote:
>> > There is currently no easy and architecture-independent way to find the
>> > lowest unusable virtual address available to a process without
>> > brute-force calculation. This patch allows a user to easily retrieve
>> > this value via /proc//status.
>> >
>> > Using this patch, any program that previously needed to waste cpu cycles
>> > recalculating a non-sensitive process-dependent value already known to
>> > the kernel can now be optimized to use this mechanism.
>> >
>> > Signed-off-by: Joel Savitz
>> > ---
>> >  Documentation/filesystems/proc.txt | 2 ++
>> >  fs/proc/task_mmu.c                 | 2 ++
>> >  2 files changed, 4 insertions(+)
>> >
>> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> > index 66cad5c86171..1c6a912e3975 100644
>> > --- a/Documentation/filesystems/proc.txt
>> > +++ b/Documentation/filesystems/proc.txt
>> > @@ -187,6 +187,7 @@ read the file /proc/PID/status:
>> >    VmLib: 1412 kB
>> >    VmPTE: 20 kb
>> >    VmSwap: 0 kB
>> > +  VmTaskSize: 137438953468 kB
>> >    HugetlbPages: 0 kB
>> >    CoreDumping: 0
>> >    THP_enabled: 1
>> > @@ -263,6 +264,7 @@ Table 1-2: Contents of the status files (as of 4.19)
>> >   VmPTE       size of page table entries
>> >   VmSwap      amount of swap used by anonymous private data
>> >               (shmem swap usage is not included)
>> > + VmTaskSize  lowest unusable address in process virtual memory
>>
>> Can we change this help text to "size of process' virtual address space memory" ?
>
> Agree. Or go in other direction and make it VmEnd

Yeah I think VmEnd would be clearer to folks who aren't familiar with
the kernel's usage of the TASK_SIZE terminology.

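For illustration only, here is a minimal sketch of how userspace might consume such a field, whatever it ends up being called. The helper name parse_status_kb and the SAMPLE text are assumptions made up for this example; the only value taken from the patch is the 137438953468 kB figure in the proc.txt hunk. The field comes from the patch under discussion, so the helper returns None rather than failing when the field is absent.

```python
import re


def parse_status_kb(status_text, field):
    """Return the value of a 'Name: <n> kB' line in bytes, or None if absent.

    Hypothetical helper: it only parses text in the style of
    /proc/PID/status and does not assume the field exists.
    """
    pattern = re.compile(
        r"^%s:\s+(\d+)\s+kB\s*$" % re.escape(field), re.MULTILINE
    )
    match = pattern.search(status_text)
    if match is None:
        return None
    # The status file reports kB; shift back up to bytes.
    return int(match.group(1)) * 1024


# Sample text modelled on the proc.txt hunk quoted above (assumed layout).
SAMPLE = (
    "VmSwap:\t       0 kB\n"
    "VmTaskSize:\t137438953468 kB\n"
    "HugetlbPages:\t       0 kB\n"
)

task_size = parse_status_kb(SAMPLE, "VmTaskSize")
print(hex(task_size))  # 0x7ffffffff000
```

On a kernel carrying the patch, the same helper could be fed the contents of /proc/self/status; on any other kernel it would simply return None.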
>> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> > index 95ca1fe7283c..0af7081f7b19 100644
>> > --- a/fs/proc/task_mmu.c
>> > +++ b/fs/proc/task_mmu.c
>> > @@ -74,6 +74,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>> >      seq_put_decimal_ull_width(m,
>> >              " kB\nVmPTE:\t", mm_pgtables_bytes(mm) >> 10, 8);
>> >      SEQ_PUT_DEC(" kB\nVmSwap:\t", swap);
>> > +    seq_put_decimal_ull_width(m,
>> > +            " kB\nVmTaskSize:\t", mm->task_size >> 10, 8);
>> >      seq_puts(m, " kB\n");
>> >      hugetlb_report_usage(m, mm);
>> >  }
>
> I'm OK with technical part, but I still have questions not answered
> (or wrongly answered) in v1 and v2. Below is the very detailed
> description of the concerns I have.
>
> 1. What is the exact reason for it? Original version tells about some
> test that takes so much time that you were able to drink a cup of
> coffee before it was done. The test as you said implements linear
> search to find the last page and so is of O(n). If it's only for some
> random test, I think the kernel can survive without it. Do you have a
> real example of useful programs that suffer without this information?
>
>
> 2. I have nothing against taking breaks and see nothing weird if
> ineffective algorithms take time. On my system (x86, Ubuntu) the last
> mapped region according to /proc//maps is:
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
> So to find the required address, we have to inspect 2559 pages. With a
> binary search it would take 12 iterations at max. If my calculation is
> wrong or your environment is completely different - please elaborate.

I agree it should not be hard to calculate, but at the same time it's
trivial for the kernel to export the information, so I don't see why the
kernel shouldn't.

> 3. As far as I can see, Linux currently does not support dynamic
> TASK_SIZE. It means that for any platform+ABI combination VmTaskSize
> will be always the same. So VmTaskSize would be effectively useless waste
> of lines. In fact, TASK SIZE is compiler time information and should
> be exposed to user in headers. In discussion to v2 Rafael Aquini answered
> for this concern that TASK_SIZE is a runtime resolved macro. It's
> true, but my main point is: GCC knows what type of binary it compiles
> and is able to select proper value. We are already doing similar things
> where appropriate. Refer for example to my arm64/ilp32 series:
>
> arch/arm64/include/uapi/asm/bitsperlong.h:
> -#define __BITS_PER_LONG 64
> +#if defined(__LP64__)
> +/* Assuming __LP64__ will be defined for native ELF64's and not for ILP32. */
> +# define __BITS_PER_LONG 64
> +#elif defined(__ILP32__)
> +# define __BITS_PER_LONG 32
> +#else
> +# error "Neither LP64 nor ILP32: unsupported ABI in asm/bitsperlong.h"
> +#endif
>
> __LP64__ and __ILP32__ are symbols provided by GCC to distinguish
> between ABIs. So userspace is able to take proper __BITS_PER_LONG value
> at compile time, not at runtime. I think, you can do the same thing for
> TASK_SIZE.

No, you can't do it at compile time for TASK_SIZE.

On powerpc a 64-bit program might run on a kernel built with 4K pages
where TASK_SIZE is 64TB, and that same program can run on a kernel built
with 64K pages where TASK_SIZE is 4PB.

And it's not just determined by PAGE_SIZE: that same program might also
run on an older kernel where TASK_SIZE with 64K pages was 512TB.

cheers
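
To make the two points above concrete, here is a rough sketch of the binary search Yury describes, run against the three powerpc configurations mentioned in this mail. Everything in it is an assumption for illustration: find_boundary and make_kernel are made-up names, and the is_usable callback is a simulated stand-in for whatever real probe a program would use (no actual mmap or /proc access happens here). It only shows that the same search returns a different answer per kernel, in a few dozen probes.

```python
TB = 1 << 40
PB = 1 << 50


def find_boundary(is_usable, page_size, max_pages):
    """Binary-search the lowest page-aligned address that is not usable.

    Assumes is_usable(addr) is monotone: True for every address below the
    boundary and False from the boundary upward.
    Returns (boundary_address, number_of_probes).
    """
    lo, hi = 0, max_pages  # invariant: boundary page index lies in [lo, hi]
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if is_usable(mid * page_size):
            lo = mid + 1  # boundary is above this page
        else:
            hi = mid      # boundary is at or below this page
    return lo * page_size, probes


def make_kernel(task_size):
    """Simulated kernel (hypothetical): addresses below task_size are usable."""
    return lambda addr: addr < task_size


# The three powerpc cases from the mail: (TASK_SIZE, PAGE_SIZE).
CONFIGS = [
    (64 * TB, 4 * 1024),    # 4K pages
    (4 * PB, 64 * 1024),    # 64K pages
    (512 * TB, 64 * 1024),  # older kernel, 64K pages
]

for task_size, page_size in CONFIGS:
    boundary, probes = find_boundary(
        make_kernel(task_size), page_size, 2**64 // page_size
    )
    print(hex(boundary), probes)
```

The search is cheap, which is Yury's point, but the result has to come from the running kernel, so it cannot be baked into a header.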