Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932785AbbD0MrD (ORCPT ); Mon, 27 Apr 2015 08:47:03 -0400 Received: from cantor2.suse.de ([195.135.220.15]:55271 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932560AbbD0MrA (ORCPT ); Mon, 27 Apr 2015 08:47:00 -0400 Date: Mon, 27 Apr 2015 14:46:56 +0200 From: Michal Hocko To: Andy Lutomirski Cc: Arnd Bergmann , "linux-kernel@vger.kernel.org" , Jiri Kosina , Andrew Morton , Al Viro , Daniel Mack , Borislav Petkov , One Thousand Gnomes , Linus Torvalds , Richard Weinberger , Tom Gundersen , Steven Rostedt , Greg Kroah-Hartman , "Eric W. Biederman" , David Herrmann , Djalal Harouni Subject: Re: [GIT PULL] kdbus for 4.1-rc1 Message-ID: <20150427124656.GC15288@dhcp22.suse.cz> References: <20150420214651.GA4215@kroah.com> <20150421103519.5b0de5ea@lxorguk.ukuu.org.uk> <20150421122031.GA32624@dhcp22.suse.cz> <20150421142748.GB32624@dhcp22.suse.cz> <20150422145701.GC25105@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6322 Lines: 123 On Wed 22-04-15 12:36:12, Andy Lutomirski wrote: > On Apr 22, 2015 7:57 AM, "Michal Hocko" wrote: > > > > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote: > > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko wrote: > > > > On Tue 21-04-15 16:01:01, David Herrmann wrote: > > > >> Hi > > > >> > > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko wrote: > > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: > > > >> >> Hi > > > >> >> > > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > > > >> >> wrote: > > > >> >> >> On top of that, I think that someone into resource management needs to > > > >> >> >> seriously consider whether having a broadcast send do get_user_pages > > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > > > >> >> >> is a good idea. > > > >> >> > > > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or > > > >> >> > pass bits of a GEM object thereby keeping entire graphics textures > > > >> >> > referenced 8) > > > >> >> > > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is > > > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > > > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal > > > >> >> issue that does not belong here. > > > >> > > > > >> > It kind of does AFAIU. > > > >> > > > >> No, it is not. The issue with GUP is that you elevate the page > > > >> ref-count and thus prevent lru isolation, sealing, whatsoever. > > > > > > > > The point was that such a memory might be not present yet and need a > > > > page fault with all the side effects - memory reclaim, memcg charge... > > > > > > > >> I cannot see how it is related to kdbus. However, ... > > > >> > > > >> > If for nothing else then the memcg reasons mentioned in > > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > > > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't > > > >> > been charged yet (read faulted in) and then kdbus forced to fault it in > > > >> > a different user's context then you basically allow to hide memory > > > >> > allocations from the memcg. That is a clear show stopper. > > > >> > > > > >> > Or have I misunderstood the way how shmem buffers are used here? > > > >> > > > >> ..as you mentioned memcg, lets figure that out here. shmem buffers are > > > >> used as receive-buffers by kdbus peers. They are read-only to > > > >> user-space. All allocations are done by the kernel on message passing. > > > > > > > > OK, so the shmem buffer is allocated on the kernels behalf and under > > > > its control and no userspace can hand over one to kdbus. Do I get > > > > it right? If yes then the memcg escape I was describing above is > > > > not possible of course. This wasn't clear to me from the previous > > > > discussion. Thanks for the clarification! > > > > > > I'm still missing something here, I think. At the time of pool > > > creation, the kernel calls shmem_file_setup in the context of the > > > untrusted user. Then, when a privileged daemon broadcasts, the kernel > > > calls vfs_iter_write or similar, thus allocating the page, right? I > > > don't see why the page would be allocated early or why vfs_iter_write > > > and the associated shmem code would care what memcg created the shmem > > > file -- all of that code seems to use current's memcg on brief > > > inspection. > > > > Yes it is the current task on the first charge or the original memcg on > > the swap in. But my understanding from the above, and I haven't read the > > code yet, is that the untrusted userspace is only reader from the buffer > > and isn't allowed to modify the buffer. > > > > > Bear in mind that the bad guy gets to use madvise, etc to mess around > > > with the page cache state. > > > > How can an untrusted user play with shmem when it is read-only? > > shmem_file_setup shouls create an unlinked file so no process can access > > it via tmpfs AFAIU and potentially fault the memory before the producent > > will fill it up (thus fault in in the trusted context). I have no idea > > how the receiver gets to the buffer though. > > > > The receiver gets to mmap the buffer. I'm not sure what protection they get. OK, so I've checked the code. kdbus_pool_new sets up a shmem file (unlinked) so not visible externally. The consumer will get it via mmap on the endpoint file by kdbus_pool_mmap and it refuses VM_WRITE and clears VM_MAYWRITE. The receiver even doesn't have access to the shmem file directly. It is ugly that kdbus_pool_mmap replaces the original vm_file and make it point to the shmem file. I am not sure whether this is safe all the time and it would deserve a big fat comment. On the other hand, it seems some drivers are doing this already (e.g. dma_buf_mmap). > The thing I'm worried about is that the receiver might deliberately > avoid faulting in a bunch of pages and instead wait for the producer > to touch them, causing pages that logically belong to the receiver to > be charged to the producer instead. Hmm, now that I am looking into the code it seems you are right. E.g. kdbus_cmd_send runs in the context of the sender AFAIU. This gets down to kdbus_pool_slice_copy_iovec which does vfs_iter_write and this is where we get to charge the memory. AFAIU the terminology all the receivers will share the same shmem file when mmaping the endpoint. This, however, doesn't seem to be exploitable to hide memory charges because the receiver cannot make the buffer writable. A nasty process with a small memcg limit could still pre-fault the memory before any writer gets sends a message and slow the whole endpoint traffic. But that wouldn't be a completely new thing because processes might hammer on memory even without memcg... It is just that this would be kind of easier with memcg. If that is the concern then the buffer should be pre-charged at the time when it is created. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/