From: Jann Horn
Date: Tue, 13 Aug 2019 17:29:09 +0200
Subject: Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
To: Michal Hocko, Daniel Gruss, "Joel Fernandes (Google)"
Cc: kernel list, Alexey Dobriyan, Andrew Morton, Borislav Petkov,
    Brendan Gregg, Catalin Marinas, Christian Hansen, Daniel Colascione,
    fmayer@google.com, "H. Peter Anvin", Ingo Molnar, Joel Fernandes,
    Jonathan Corbet, Kees Cook, kernel-team, Linux API,
    linux-doc@vger.kernel.org, linux-fsdevel, Linux-MM, Mike Rapoport,
    Minchan Kim, namhyung@google.com, "Paul E. McKenney", Robin Murphy,
    Roman Gushchin, Stephen Rothwell, Suren Baghdasaryan, Thomas Gleixner,
    Todd Kjos, Vladimir Davydov, Vlastimil Babka, Will Deacon
References: <20190807171559.182301-1-joel@joelfernandes.org> <20190813100856.GF17933@dhcp22.suse.cz>
In-Reply-To: <20190813100856.GF17933@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 13, 2019 at 12:09 PM Michal Hocko wrote:
> On Mon 12-08-19 20:14:38, Jann Horn wrote:
> > On Wed, Aug 7, 2019 at 7:16 PM Joel Fernandes (Google)
> > wrote:
> > > The page_idle tracking feature currently requires looking up the pagemap
> > > for a process followed by interacting with /sys/kernel/mm/page_idle.
> > > Looking up PFN from pagemap in Android devices is not supported by
> > > unprivileged process and requires SYS_ADMIN and gives 0 for the PFN.
> > >
> > > This patch adds support to directly interact with page_idle tracking at
> > > the PID level by introducing a /proc//page_idle file. It follows
> > > the exact same semantics as the global /sys/kernel/mm/page_idle, but now
> > > looking up PFN through pagemap is not needed since the interface uses
> > > virtual frame numbers, and at the same time also does not require
> > > SYS_ADMIN.
> > >
> > > In Android, we are using this for the heap profiler (heapprofd) which
> > > profiles and pinpoints code paths which allocate and leave memory
> > > idle for long periods of time. This method solves the security issue
> > > with userspace learning the PFN, and while at it is also shown to yield
> > > better results than the pagemap lookup, the theory being that the window
> > > where the address space can change is reduced by eliminating the
> > > intermediate pagemap look up stage. In virtual address indexing, the
> > > process's mmap_sem is held for the duration of the access.
> >
> > What happens when you use this interface on shared pages, like memory
> > inherited from the zygote, library file mappings and so on? If two
> > profilers ran concurrently for two different processes that both map
> > the same libraries, would they end up messing up each other's data?
>
> Yup PageIdle state is shared. That is the page_idle semantic even now
> IIRC.
>
> > Can this be used to observe which library pages other processes are
> > accessing, even if you don't have access to those processes, as long
> > as you can map the same libraries? I realize that there are already a
> > bunch of ways to do that with side channels and such; but if you're
> > adding an interface that allows this by design, it seems to me like
> > something that should be gated behind some sort of privilege check.
>
> Hmm, you need to be privileged to get the pfn now and without that you
> cannot get to any page so the new interface is weakening the rules.
> Maybe we should limit setting the idle state to processes with the write
> status. Or do you think that even observing idle status is useful for
> practical side channel attacks? If yes, is that a problem of the
> profiler which does potentially dangerous things?

I suppose read-only access isn't a real problem as long as the
profiler isn't writing the idle state in a very tight loop... but I
don't see a use case where you'd actually want that? As far as I can
tell, if you can't write the idle state, being able to read it is
pretty much useless.

If the profiler only wants to profile process-private memory, then
that should be implementable in a safe way in principle, I think, but
since Joel said that they want to profile CoW memory as well, I think
that's inherently somewhat dangerous.
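
For readers who want to see the two-step lookup discussed in this thread
spelled out, here is a minimal sketch of the index arithmetic only. It
assumes 4 KiB pages and the layouts described in the kernel's pagemap and
idle page tracking documentation as I understand them (one 64-bit pagemap
entry per virtual page with the PFN in bits 0-54 and a "present" flag in
bit 63; an idle bitmap made of 8-byte words, one bit per frame). The
function names are hypothetical and not taken from the patch, and applying
the same bitmap arithmetic to a virtual frame number is only an assumption
based on the patch description's "exact same semantics".

```python
import struct

PAGE_SIZE = 4096             # assumption: 4 KiB pages
PAGEMAP_ENTRY_BYTES = 8      # one 64-bit pagemap entry per virtual page
PFN_MASK = (1 << 55) - 1     # bits 0-54 carry the PFN when the page is present
PRESENT_BIT = 1 << 63        # bit 63: page is present in RAM


def pagemap_offset(vaddr, page_size=PAGE_SIZE):
    """Byte offset of the pagemap entry that describes vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def pfn_from_entry(raw):
    """Decode an 8-byte pagemap entry; None if the page is not present.

    Without the required privilege the PFN field reads as 0, which is
    the limitation the patch description mentions.
    """
    (entry,) = struct.unpack("=Q", raw)   # native byte order, 64-bit
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def bitmap_location(frame_number):
    """(byte offset of the 8-byte word, bit mask) for a frame number.

    With the global bitmap the frame number is a PFN; with the proposed
    per-process file it would be a virtual frame number (assumption).
    """
    return (frame_number // 64) * 8, 1 << (frame_number % 64)
```

With the global interface, a profiler would chain pagemap_offset(),
pfn_from_entry() and bitmap_location(); with virtual indexing only the last
step remains, called with vaddr // PAGE_SIZE, which is where the smaller
race window claimed in the patch description would come from.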