Date: Tue, 12 Oct 2021 16:41:07 -0400
From: Johannes Weiner
To: Suren Baghdasaryan
Cc: Michal Hocko, Kees Cook, Pavel Machek, Rasmus Villemoes,
 David Hildenbrand, John Hubbard, Andrew Morton, Colin Cross,
 Sumit Semwal, Dave Hansen, Matthew Wilcox, "Kirill A . Shutemov",
 Vlastimil Babka, Jonathan Corbet, Al Viro, Randy Dunlap, Kalesh Singh,
 Peter Xu, rppt@kernel.org, Peter Zijlstra, Catalin Marinas,
 vincenzo.frascino@arm.com, Chinwen Chang (張錦文), Axel Rasmussen,
 Andrea Arcangeli, Jann Horn, apopple@nvidia.com, Yu Zhao, Will Deacon,
 fenghua.yu@intel.com, thunder.leizhen@huawei.com, Hugh Dickins,
 feng.tang@intel.com, Jason Gunthorpe, Roman Gushchin, Thomas Gleixner,
 krisman@collabora.com, Chris Hyser, Peter Collingbourne,
 "Eric W. Biederman", Jens Axboe, legion@kernel.org, Rolf Eike Beer,
 Cyrill Gorcunov, Muchun Song, Viresh Kumar, Thomas Cedeno,
 sashal@kernel.org, cxfcosmos@gmail.com, LKML,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm,
 kernel-team, Tim Murray
Subject: Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting
References: <202110071111.DF87B4EE3@keescook>
 <202110081344.FE6A7A82@keescook>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 12, 2021 at 11:52:42AM -0700, Suren Baghdasaryan wrote:
> On Tue, Oct 12, 2021 at 11:26 AM Johannes Weiner wrote:
> >
> > On Mon, Oct 11, 2021 at 10:36:24PM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Oct 11, 2021 at 8:00 PM Johannes Weiner wrote:
> > > >
> > > > On Mon, Oct 11, 2021 at 06:20:25PM -0700, Suren Baghdasaryan wrote:
> > > > > On Mon, Oct 11, 2021 at 6:18 PM Suren Baghdasaryan wrote:
> > > > > >
> > > > > > On Mon, Oct 11, 2021 at 1:36 AM Michal Hocko wrote:
> > > > > > >
> > > > > > > On Fri 08-10-21 13:58:01, Kees Cook wrote:
> > > > > > > > - Strings for "anon" specifically have no required format (this is good):
> > > > > > > >   it's informational like task_struct::comm and can be (roughly)
> > > > > > > >   anything. There's no naming convention for memfds, AF_UNIX, etc. Why
> > > > > > > >   is one needed here? That seems like a completely unreasonable
> > > > > > > >   requirement.
> > > > > > >
> > > > > > > I might be misreading the justification for the feature. Patch 2 is
> > > > > > > talking about tools that need to understand memory usage to take
> > > > > > > further actions. Also Suren was suggesting "numbering convention" as an
> > > > > > > argument against.
> > > > > > >
> > > > > > > So can we get a clear example of how this is actually being used? If
> > > > > > > this is just to be used by humans for debugging, then I can see an
> > > > > > > argument for a human-readable form. If this is, however, meant to be
> > > > > > > used by tools to take some actions, then the argument for strings is
> > > > > > > much weaker.
> > > > > >
> > > > > > The simplest use case is when we notice that a process consumes more
> > > > > > memory than usual and we do "cat /proc/$(pidof my_process)/maps" to
> > > > > > check which area is contributing to this growth. The names we assign
> > > > > > to anonymous areas are descriptive enough for a developer to get an
> > > > > > idea where the increased consumption is coming from and how to proceed
> > > > > > with their investigation.
> > > > > > There are of course cases when tools are involved, but the end user is
> > > > > > always a human and the final report should contain easily
> > > > > > understandable data.
> > > > > >
> > > > > > IIUC, the main argument here is whether userspace can provide
> > > > > > tools to perform the translations between ids and names, with the
> > > > > > kernel accepting and reporting ids instead of strings. Technically
> > > > > > it's possible, but to be practical that conversion should be fast
> > > > > > because we will potentially need to do a name->id conversion for
> > > > > > each mmap. On the consumer side the performance is not as critical,
> > > > > > but having to parse the file, do the id->name conversion and replace
> > > > > > all [anon:id] with [anon:name], instead of simply dumping
> > > > > > /proc/$pid/maps, would be an issue when we do that in bulk, for
> > > > > > example when collecting system-wide data for a bugreport.
> > > >
> > > > Is that something you need to do client-side? Or could the bug tool
> > > > upload the userspace-maintained name:ids database alongside the
> > > > /proc/pid/maps dump for external processing?
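To make the two costs Suren describes concrete, here is a minimal sketch of the ID-based alternative under discussion, written in Python purely for illustration. Everything in it is hypothetical: `VmaNameRegistry`, `intern`, `name_of` and `resolve_maps` are invented names, not part of any proposed patch, and the `[anon:<id>]` tag format is the one assumed in this thread, not something the kernel emits. Names are interned to small integers on the allocation path, and a maps dump carrying `[anon:<id>]` tags is rewritten to `[anon:<name>]` afterwards, either on the device or, as Johannes suggests, on a developer machine from a copy of the table uploaded with the bugreport.

```python
import re


class VmaNameRegistry:
    """Hypothetical userspace name<->id table for the ID-based alternative.

    The kernel would only ever see the small integer; this table (or a copy
    of it attached to a bugreport) maps it back to a human-readable name.
    """

    def __init__(self):
        self._id_by_name = {}  # name -> id, O(1) average lookup at mmap() time
        self._names = []       # id -> name, O(1) lookup when post-processing

    def intern(self, name):
        """Return the id for name, allocating the next free id on first use."""
        vma_id = self._id_by_name.get(name)
        if vma_id is None:
            vma_id = len(self._names)
            self._id_by_name[name] = vma_id
            self._names.append(name)
        return vma_id

    def name_of(self, vma_id):
        """Return the name for vma_id, or None if it was never assigned."""
        if 0 <= vma_id < len(self._names):
            return self._names[vma_id]
        return None


# Matches the hypothetical "[anon:<id>]" tag in a maps line.
_ANON_ID = re.compile(r"\[anon:(\d+)\]")


def resolve_maps(maps_text, registry):
    """Rewrite every [anon:<id>] in a /proc/<pid>/maps dump to [anon:<name>].

    Unknown ids are left untouched so nothing is silently dropped.
    """

    def substitute(match):
        name = registry.name_of(int(match.group(1)))
        return f"[anon:{name}]" if name is not None else match.group(0)

    return _ANON_ID.sub(substitute, maps_text)


if __name__ == "__main__":
    registry = VmaNameRegistry()
    registry.intern("jit-cache")      # -> 0 (example name, hypothetical)
    registry.intern("image-decoder")  # -> 1 (example name, hypothetical)
    line = "7f3a2c000000-7f3a2c021000 rw-p 00000000 00:00 0  [anon:1]"
    print(resolve_maps(line, registry))
    # prints: 7f3a2c000000-7f3a2c021000 rw-p 00000000 00:00 0  [anon:image-decoder]
```

A dict keeps the name->id side cheap on the allocation path, which is the part Suren identifies as performance-critical; the id->name rewrite is a single regex pass over the dump and can run wherever the bugreport is analyzed.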
> > >
> > > You can generate a bugreport and analyze it locally or submit it as an
> > > attachment to a bug for further analysis.
> > > Sure, we can attach the id->name conversion table to the bugreport, but
> > > either way some tool would have to post-process it to resolve the
> > > ids. If we are not analyzing the results immediately, then that step
> > > can be postponed, and I think that's what you mean? If so, then yes,
> > > that is correct.
> >
> > Right, somebody needs to do it at some point, but I suppose it's less
> > of a problem if a developer machine does it than a mobile device.
>
> True, and that's why I mentioned that it's not as critical as the
> efficiency at mmap() time. In any case, if we could avoid translations
> altogether, that would be ideal.
>
> >
> > One advantage of an ID over a string (besides not having to maintain
> > a deduplicating arbitrary string storage in the kernel) is that we
> > may be able to auto-assign unique IDs to VMAs in the kernel, in a way
> > that we could not with strings. You'd still have to do IPC calls to
> > write new name mappings into your db, but you wouldn't have to do the
> > prctl() to assign stuff in the kernel at all.
>
> You still have to retrieve that tag from the kernel to record it in
> your db, so this would still require some syscall, no?

Don't you have to do this with the string-setting interface as well?
How do you know the vma address to pass into the prctl()? Is this
somehow coordinated with the mmap()?

> > (We'd have to think of a solution for how IDs work with vma merging and
> > splitting, but I think to a certain degree that's policy and we should
> > be able to find something workable: a MAP_ID flag, using anon_vma as
> > identity, assigning IDs at mmap time and doing merges only for
> > protection changes, etc.)
>
> Overall, I think keeping the kernel out of this and letting it treat
> this tag as a cookie which only userspace cares about is simpler.
> Unless you see other uses where the kernel's involvement is needed.

It depends on what you consider "keeping the kernel out of it". A small
extension to assign unique IDs to mappings automatically in an
intuitive way (with a compat option to disable it) is a much smaller
ABI commitment than a prctl()-controlled string storage.

When I said policy on how to assign the ID, I didn't mean that it
should be a free-for-all. Rather, we should pick one reasonable way to
do it, comparable to picking the parameters for how long the stored
strings can be, which characters to allow, etc.