Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758025AbbDVTgh (ORCPT ); Wed, 22 Apr 2015 15:36:37 -0400 Received: from mail-la0-f49.google.com ([209.85.215.49]:34379 "EHLO mail-la0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757983AbbDVTge (ORCPT ); Wed, 22 Apr 2015 15:36:34 -0400 MIME-Version: 1.0 In-Reply-To: <20150422145701.GC25105@dhcp22.suse.cz> References: <20150420205638.GA3015@kroah.com> <55356CC1.1040301@nod.at> <20150420214651.GA4215@kroah.com> <20150421103519.5b0de5ea@lxorguk.ukuu.org.uk> <20150421122031.GA32624@dhcp22.suse.cz> <20150421142748.GB32624@dhcp22.suse.cz> <20150422145701.GC25105@dhcp22.suse.cz> From: Andy Lutomirski Date: Wed, 22 Apr 2015 12:36:12 -0700 Message-ID: Subject: Re: [GIT PULL] kdbus for 4.1-rc1 To: Michal Hocko Cc: Arnd Bergmann , "linux-kernel@vger.kernel.org" , Jiri Kosina , Andrew Morton , Al Viro , Daniel Mack , Borislav Petkov , One Thousand Gnomes , Linus Torvalds , Richard Weinberger , Tom Gundersen , Steven Rostedt , Greg Kroah-Hartman , "Eric W. Biederman" , David Herrmann , Djalal Harouni Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4675 Lines: 97 On Apr 22, 2015 7:57 AM, "Michal Hocko" wrote: > > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote: > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko wrote: > > > On Tue 21-04-15 16:01:01, David Herrmann wrote: > > >> Hi > > >> > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko wrote: > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: > > >> >> Hi > > >> >> > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > > >> >> wrote: > > >> >> >> On top of that, I think that someone into resource management needs to > > >> >> >> seriously consider whether having a broadcast send do get_user_pages > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > > >> >> >> is a good idea. > > >> >> > > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or > > >> >> > pass bits of a GEM object thereby keeping entire graphics textures > > >> >> > referenced 8) > > >> >> > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is > > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal > > >> >> issue that does not belong here. > > >> > > > >> > It kind of does AFAIU. > > >> > > >> No, it is not. The issue with GUP is that you elevate the page > > >> ref-count and thus prevent lru isolation, sealing, whatsoever. > > > > > > The point was that such a memory might be not present yet and need a > > > page fault with all the side effects - memory reclaim, memcg charge... > > > > > >> I cannot see how it is related to kdbus. However, ... > > >> > > >> > If for nothing else then the memcg reasons mentioned in > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't > > >> > been charged yet (read faulted in) and then kdbus forced to fault it in > > >> > a different user's context then you basically allow to hide memory > > >> > allocations from the memcg. That is a clear show stopper. > > >> > > > >> > Or have I misunderstood the way how shmem buffers are used here? > > >> > > >> ..as you mentioned memcg, lets figure that out here. shmem buffers are > > >> used as receive-buffers by kdbus peers. They are read-only to > > >> user-space. All allocations are done by the kernel on message passing. > > > > > > OK, so the shmem buffer is allocated on the kernels behalf and under > > > its control and no userspace can hand over one to kdbus. Do I get > > > it right? If yes then the memcg escape I was describing above is > > > not possible of course. This wasn't clear to me from the previous > > > discussion. Thanks for the clarification! > > > > I'm still missing something here, I think. At the time of pool > > creation, the kernel calls shmem_file_setup in the context of the > > untrusted user. Then, when a privileged daemon broadcasts, the kernel > > calls vfs_iter_write or similar, thus allocating the page, right? I > > don't see why the page would be allocated early or why vfs_iter_write > > and the associated shmem code would care what memcg created the shmem > > file -- all of that code seems to use current's memcg on brief > > inspection. > > Yes it is the current task on the first charge or the original memcg on > the swap in. But my understanding from the above, and I haven't read the > code yet, is that the untrusted userspace is only reader from the buffer > and isn't allowed to modify the buffer. > > > Bear in mind that the bad guy gets to use madvise, etc to mess around > > with the page cache state. > > How can an untrusted user play with shmem when it is read-only? > shmem_file_setup shouls create an unlinked file so no process can access > it via tmpfs AFAIU and potentially fault the memory before the producent > will fill it up (thus fault in in the trusted context). I have no idea > how the receiver gets to the buffer though. > The receiver gets to mmap the buffer. I'm not sure what protection they get. The thing I'm worried about is that the receiver might deliberately avoid faulting in a bunch of pages and instead wait for the producer to touch them, causing pages that logically belong to the receiver to be charged to the producer instead. --Andy > -- > Michal Hocko > SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/