Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965105AbbD0UL1 (ORCPT ); Mon, 27 Apr 2015 16:11:27 -0400 Received: from mail-lb0-f178.google.com ([209.85.217.178]:36472 "EHLO mail-lb0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964944AbbD0ULZ (ORCPT ); Mon, 27 Apr 2015 16:11:25 -0400 MIME-Version: 1.0 In-Reply-To: <20150427124656.GC15288@dhcp22.suse.cz> References: <20150420214651.GA4215@kroah.com> <20150421103519.5b0de5ea@lxorguk.ukuu.org.uk> <20150421122031.GA32624@dhcp22.suse.cz> <20150421142748.GB32624@dhcp22.suse.cz> <20150422145701.GC25105@dhcp22.suse.cz> <20150427124656.GC15288@dhcp22.suse.cz> From: Andy Lutomirski Date: Mon, 27 Apr 2015 13:11:03 -0700 Message-ID: Subject: Re: [GIT PULL] kdbus for 4.1-rc1 To: Michal Hocko Cc: Arnd Bergmann , "linux-kernel@vger.kernel.org" , Jiri Kosina , Andrew Morton , Al Viro , Daniel Mack , Borislav Petkov , One Thousand Gnomes , Linus Torvalds , Richard Weinberger , Tom Gundersen , Steven Rostedt , Greg Kroah-Hartman , "Eric W. Biederman" , David Herrmann , Djalal Harouni Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7053 Lines: 133 [resent without HTML] On Apr 27, 2015 5:46 AM, "Michal Hocko" wrote: > > On Wed 22-04-15 12:36:12, Andy Lutomirski wrote: > > On Apr 22, 2015 7:57 AM, "Michal Hocko" wrote: > > > > > > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote: > > > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko wrote: > > > > > On Tue 21-04-15 16:01:01, David Herrmann wrote: > > > > >> Hi > > > > >> > > > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko wrote: > > > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: > > > > >> >> Hi > > > > >> >> > > > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > > > > >> >> wrote: > > > > >> >> >> On top of that, I think that someone into resource management needs to > > > > >> >> >> seriously consider whether having a broadcast send do get_user_pages > > > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > > > > >> >> >> is a good idea. > > > > >> >> > > > > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or > > > > >> >> > pass bits of a GEM object thereby keeping entire graphics textures > > > > >> >> > referenced 8) > > > > >> >> > > > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is > > > > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > > > > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal > > > > >> >> issue that does not belong here. > > > > >> > > > > > >> > It kind of does AFAIU. > > > > >> > > > > >> No, it is not. The issue with GUP is that you elevate the page > > > > >> ref-count and thus prevent lru isolation, sealing, whatsoever. > > > > > > > > > > The point was that such a memory might be not present yet and need a > > > > > page fault with all the side effects - memory reclaim, memcg charge... > > > > > > > > > >> I cannot see how it is related to kdbus. However, ... > > > > >> > > > > >> > If for nothing else then the memcg reasons mentioned in > > > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > > > > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't > > > > >> > been charged yet (read faulted in) and then kdbus forced to fault it in > > > > >> > a different user's context then you basically allow to hide memory > > > > >> > allocations from the memcg. That is a clear show stopper. > > > > >> > > > > > >> > Or have I misunderstood the way how shmem buffers are used here? > > > > >> > > > > >> ..as you mentioned memcg, lets figure that out here. shmem buffers are > > > > >> used as receive-buffers by kdbus peers. They are read-only to > > > > >> user-space. All allocations are done by the kernel on message passing. > > > > > > > > > > OK, so the shmem buffer is allocated on the kernels behalf and under > > > > > its control and no userspace can hand over one to kdbus. Do I get > > > > > it right? If yes then the memcg escape I was describing above is > > > > > not possible of course. This wasn't clear to me from the previous > > > > > discussion. Thanks for the clarification! > > > > > > > > I'm still missing something here, I think. At the time of pool > > > > creation, the kernel calls shmem_file_setup in the context of the > > > > untrusted user. Then, when a privileged daemon broadcasts, the kernel > > > > calls vfs_iter_write or similar, thus allocating the page, right? I > > > > don't see why the page would be allocated early or why vfs_iter_write > > > > and the associated shmem code would care what memcg created the shmem > > > > file -- all of that code seems to use current's memcg on brief > > > > inspection. > > > > > > Yes it is the current task on the first charge or the original memcg on > > > the swap in. But my understanding from the above, and I haven't read the > > > code yet, is that the untrusted userspace is only reader from the buffer > > > and isn't allowed to modify the buffer. > > > > > > > Bear in mind that the bad guy gets to use madvise, etc to mess around > > > > with the page cache state. > > > > > > How can an untrusted user play with shmem when it is read-only? > > > shmem_file_setup shouls create an unlinked file so no process can access > > > it via tmpfs AFAIU and potentially fault the memory before the producent > > > will fill it up (thus fault in in the trusted context). I have no idea > > > how the receiver gets to the buffer though. > > > > > > > The receiver gets to mmap the buffer. I'm not sure what protection they get. > > OK, so I've checked the code. kdbus_pool_new sets up a shmem file > (unlinked) so not visible externally. The consumer will get it via mmap > on the endpoint file by kdbus_pool_mmap and it refuses VM_WRITE and > clears VM_MAYWRITE. The receiver even doesn't have access to the shmem > file directly. > > It is ugly that kdbus_pool_mmap replaces the original vm_file and make > it point to the shmem file. I am not sure whether this is safe all the > time and it would deserve a big fat comment. On the other hand, it seems > some drivers are doing this already (e.g. dma_buf_mmap). What happens to map_files in proc? It seems unlikely that CRIU would ever work on dma_buf, but this could be a problem for CRIU with kdbus. > > > The thing I'm worried about is that the receiver might deliberately > > avoid faulting in a bunch of pages and instead wait for the producer > > to touch them, causing pages that logically belong to the receiver to > > be charged to the producer instead. > > Hmm, now that I am looking into the code it seems you are right. E.g. > kdbus_cmd_send runs in the context of the sender AFAIU. This gets down > to kdbus_pool_slice_copy_iovec which does vfs_iter_write and this > is where we get to charge the memory. AFAIU the terminology all the > receivers will share the same shmem file when mmaping the endpoint. > > This, however, doesn't seem to be exploitable to hide memory charges > because the receiver cannot make the buffer writable. A nasty process > with a small memcg limit could still pre-fault the memory before any > writer gets sends a message and slow the whole endpoint traffic. But > that wouldn't be a completely new thing because processes might hammer > on memory even without memcg... It is just that this would be kind of > easier with memcg. > If that is the concern then the buffer should be pre-charged at the time > when it is created. The attach I had in mind was that the nasty process with a small memcg creates one or many of these and doesn't pre-fault it. Then a sender (systemd?) sends messages and they get charged, possibly once for each copy sent, to the root memcg. So kdbus should probably pre-charge the creator of the pool. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/