Date: Tue, 23 Jul 2019 08:05:25 +0200
From: Michal Hocko
To: "Joel Fernandes (Google)"
Cc: linux-kernel@vger.kernel.org, vdavydov.dev@gmail.com, Brendan Gregg,
 kernel-team@android.com, Alexey Dobriyan, Al Viro, Andrew Morton,
 carmenjackson@google.com, Christian Hansen, Colin Ian King,
 dancol@google.com, David Howells, fmayer@google.com, joaodias@google.com,
 joelaf@google.com, Jonathan Corbet, Kees Cook, Kirill Tkhai,
 Konstantin Khlebnikov, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Mike Rapoport,
 minchan@google.com, minchan@kernel.org, namhyung@google.com,
 sspatil@google.com, surenb@google.com, Thomas Gleixner,
 timmurray@google.com, tkjos@google.com, Vlastimil Babka, wvw@google.com,
 linux-api@vger.kernel.org
Subject: Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
Message-ID:
 <20190723060525.GA4552@dhcp22.suse.cz>
References: <20190722213205.140845-1-joel@joelfernandes.org>
In-Reply-To: <20190722213205.140845-1-joel@joelfernandes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

[Cc linux-api - please always CC this list when introducing a user-visible
API]

On Mon 22-07-19 17:32:04, Joel Fernandes (Google) wrote:
> The page_idle tracking feature currently requires looking up the pagemap
> for a process followed by interacting with /sys/kernel/mm/page_idle.
> This is quite cumbersome and can be error-prone too. If between
> accessing the per-PID pagemap and the global page_idle bitmap, if
> something changes with the page then the information is not accurate.
> More over looking up PFN from pagemap in Android devices is not
> supported by unprivileged process and requires SYS_ADMIN and gives 0 for
> the PFN.
>
> This patch adds support to directly interact with page_idle tracking at
> the PID level by introducing a /proc/<pid>/page_idle file. This
> eliminates the need for userspace to calculate the mapping of the page.
> It follows the exact same semantics as the global
> /sys/kernel/mm/page_idle, however it is easier to use for some usecases
> where looking up PFN is not needed and also does not require SYS_ADMIN.
> It ended up simplifying userspace code, solving the security issue
> mentioned and works quite well. SELinux does not need to be turned off
> since no pagemap look up is needed.
>
> In Android, we are using this for the heap profiler (heapprofd) which
> profiles and pin points code paths which allocates and leaves memory
> idle for long periods of time.
>
> Documentation material:
> The idle page tracking API for virtual address indexing using virtual page
> frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
> that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
> except that it uses virtual instead of physical frame numbers.
>
> This idle page tracking API can be simpler to use than physical address
> indexing, since the pagemap for a process does not need to be looked up
> to mark or read a page's idle bit. It is also more accurate than
> physical address indexing since in physical address indexing, address
> space changes can occur between reading the pagemap and reading the
> bitmap. In virtual address indexing, the process's mmap_sem is held for
> the duration of the access.

I didn't get to read the actual code but the overall idea makes sense to
me. I can see this being useful for userspace memory management (along
with remote MADV_PAGEOUT, MADV_COLD).

Normally I would object that the cumbersome nature of the existing
interface can be hidden in userspace, but I do agree that rowhammer has
made this one close to unusable for anything but a privileged process.

I do not think you can make any argument about accuracy, because the
information will never be accurate. Sure, the race window is smaller in
principle, but you can hardly say by how much, or whether at all.

Thanks.
-- 
Michal Hocko
SUSE Labs