Date: Wed, 14 Aug 2019 09:56:01 +0200
From: Michal Hocko
To: Jann Horn
Cc: Daniel Gruss, "Joel Fernandes (Google)", kernel list, Alexey Dobriyan,
    Andrew Morton, Borislav Petkov, Brendan Gregg, Catalin Marinas,
    Christian Hansen, Daniel Colascione, fmayer@google.com,
    "H. Peter Anvin", Ingo Molnar, Joel Fernandes, Jonathan Corbet,
    Kees Cook, kernel-team, Linux API, linux-doc@vger.kernel.org,
    linux-fsdevel, Linux-MM, Mike Rapoport, Minchan Kim,
    namhyung@google.com, "Paul E. McKenney", Robin Murphy,
    Roman Gushchin, Stephen Rothwell, Suren Baghdasaryan,
    Thomas Gleixner, Todd Kjos, Vladimir Davydov, Vlastimil Babka,
    Will Deacon
Subject: Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
Message-ID: <20190814075601.GO17933@dhcp22.suse.cz>
References: <20190807171559.182301-1-joel@joelfernandes.org>
    <20190813100856.GF17933@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 13-08-19 17:29:09, Jann Horn wrote:
> On Tue, Aug 13, 2019 at 12:09 PM Michal Hocko wrote:
> > On Mon 12-08-19 20:14:38, Jann Horn wrote:
> > > On Wed, Aug 7, 2019 at 7:16 PM Joel Fernandes (Google)
> > > wrote:
> > > > The page_idle tracking feature currently requires looking up the pagemap
> > > > for a process followed by interacting with /sys/kernel/mm/page_idle.
> > > > Looking up PFN from pagemap in Android devices is not supported by
> > > > unprivileged process and requires SYS_ADMIN and gives 0 for the PFN.
> > > >
> > > > This patch adds support to directly interact with page_idle tracking at
> > > > the PID level by introducing a /proc//page_idle file. It follows
> > > > the exact same semantics as the global /sys/kernel/mm/page_idle, but now
> > > > looking up PFN through pagemap is not needed since the interface uses
> > > > virtual frame numbers, and at the same time also does not require
> > > > SYS_ADMIN.
> > > >
> > > > In Android, we are using this for the heap profiler (heapprofd) which
> > > > profiles and pinpoints code paths which allocate and leave memory
> > > > idle for long periods of time.
> > > > This method solves the security issue
> > > > with userspace learning the PFN, and while at it is also shown to yield
> > > > better results than the pagemap lookup, the theory being that the window
> > > > where the address space can change is reduced by eliminating the
> > > > intermediate pagemap lookup stage. In virtual address indexing, the
> > > > process's mmap_sem is held for the duration of the access.
> > >
> > > What happens when you use this interface on shared pages, like memory
> > > inherited from the zygote, library file mappings and so on? If two
> > > profilers ran concurrently for two different processes that both map
> > > the same libraries, would they end up messing up each other's data?
> >
> > Yup PageIdle state is shared. That is the page_idle semantic even now
> > IIRC.
> >
> > > Can this be used to observe which library pages other processes are
> > > accessing, even if you don't have access to those processes, as long
> > > as you can map the same libraries? I realize that there are already a
> > > bunch of ways to do that with side channels and such; but if you're
> > > adding an interface that allows this by design, it seems to me like
> > > something that should be gated behind some sort of privilege check.
> >
> > Hmm, you need to be privileged to get the pfn now and without that you
> > cannot get to any page so the new interface is weakening the rules.
> > Maybe we should limit setting the idle state to processes with the write
> > status. Or do you think that even observing idle status is useful for
> > practical side channel attacks? If yes, is that a problem of the
> > profiler which does potentially dangerous things?
>
> I suppose read-only access isn't a real problem as long as the
> profiler isn't writing the idle state in a very tight loop... but I
> don't see a use case where you'd actually want that? As far as I can
> tell, if you can't write the idle state, being able to read it is
> pretty much useless.
>
> If the profiler only wants to profile process-private memory, then
> that should be implementable in a safe way in principle, I think, but
> since Joel said that they want to profile CoW memory as well, I think
> that's inherently somewhat dangerous.

I cannot really say how useful that would be, but I can see that
implementing ownership checks for shared pages would be really
non-trivial. Reducing the interface to exclusive pages would make it
easier, as you noted, but less helpful.

Besides that, the attack vector shouldn't really be much different from
page cache access, right? So essentially the can_do_mincore() model.

I guess we want to document that page idle tracking should be used with
care, because it potentially opens a side channel if used on sensitive
data.
-- 
Michal Hocko
SUSE Labs