From: Andy Lutomirski
Date: Sun, 9 Aug 2015 19:10:20 -0700
Subject: Re: kdbus: to merge or not to merge?
To: Daniel Mack
Cc: Greg Kroah-Hartman, Linus Torvalds, Tom Gundersen,
    "Kalle A. Sandstrom", Borislav Petkov, One Thousand Gnomes,
    Havoc Pennington, Djalal Harouni, "linux-kernel@vger.kernel.org",
    "Eric W. Biederman", cee1, David Herrmann, "linux-mm@kvack.org"

On Sun, Aug 9, 2015 at 3:11 PM, Daniel Mack wrote:
>
> Internally, the connection pool is simply a shmem backed file. From the
> context of the HELLO ioctl, we are calling into shmem_file_setup(), so
> the file is eventually owned by the task which created the bus task
> connecting to the bus. One reason why we do the shmem file allocation in
> the kernel and on behalf of a the userspace task is that we clear the
> VM_MAYWRITE bit to prevent the task from writing to the pool through its
> mapped buffer. We also do not set VM_NORESERVE, so the entire buffer is
> pre-accounted for the task that created the connection.

I don't have access to the system I've been using for testing right
now, but I wonder how the kdbus pool stacks up against the rest of the
memory allocations of the average desktop process.
>
> The pool implementation uses an r/b tree to organize the buffer into
> slices. Those slices can be kept by userspace as long as the parsing
> implementation needs to have access to them. When finished, the slices
> are freed. A simple ring buffer cannot cope with the gaps that emerge by
> that.
>
> When a connection buffer is written to, it is done from the context of
> another task which calls into the kdbus code through one of the ioctls.
> The memcg implementation should hence charge the task that acts as
> writer, which is maybe not ideal but can be changed easily with some
> addition to the internal APIs. We omitted it for the current version,
> which is non-intrusive with regards to other kernel subsystems.
>

This has at least the following weakness.  I can very easily get
systemd to write to my shmem-backed pool: simply subscribe to one of
its broadcasts.  If I cause such a write to be very slow
(intentionally or otherwise), then PID 1 blocks.

If you change the memcg code to charge me instead of PID 1 (as it
should IMO), then the problem gets worse.

> The kdbus implementation is actually comparable to two tasks X and Y
> which both have their own buffer file open and mmap()ed, and they both
> pass their FD to the other side. If X now writes to Y's file, and that
> is causing a page fault, X is accounted for it, correct?

If PID 1 accepted a memfd from me (even a properly sealed one) and
wrote to it, I would wonder whether it were actually a good idea.

Does this scheme have any actual measurable advantage over the
traditional model of a small non-paged buffer in the kernel (i.e. the
way sockets work) with explicit userspace memfd use as appropriate?

--Andy
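
For readers following the slice discussion quoted above, here is a minimal
userspace sketch of why out-of-order frees leave gaps that a plain ring
buffer cannot reuse. It is not the kdbus code: the kernel side is described
as using an r/b tree, whereas this sketch uses a sorted Python list with
first-fit allocation, and the names SlicePool, alloc and release are
hypothetical.

```python
# Illustrative only: a first-fit slice allocator over a fixed-size pool.
# NOT the kdbus implementation; all names here are hypothetical.

class SlicePool:
    def __init__(self, size):
        self.size = size
        # Sorted list of (offset, length) free ranges; starts as one range.
        self.free = [(0, size)]
        # Allocated slices: offset -> length.
        self.used = {}

    def alloc(self, length):
        """Return the offset of a new slice, or None if no gap is big enough."""
        for i, (off, flen) in enumerate(self.free):
            if flen >= length:
                if flen == length:
                    del self.free[i]
                else:
                    # Carve the slice off the front of this free range.
                    self.free[i] = (off + length, flen - length)
                self.used[off] = length
                return off
        return None

    def release(self, off):
        """Free a slice and merge it with any adjacent free ranges."""
        length = self.used.pop(off)
        self.free.append((off, length))
        self.free.sort()
        merged = []
        for o, l in self.free:
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((o, l))
        self.free = merged


pool = SlicePool(100)
a = pool.alloc(40)   # first message
b = pool.alloc(40)   # second message
c = pool.alloc(20)   # third message, pool is now full
pool.release(b)      # receiver finishes with the middle message first
d = pool.alloc(30)   # reuses the gap left by b
```

A FIFO ring buffer reclaims space only from its tail, so in the same
sequence the space held by b would stay unusable until a had also been
released. Tracking free ranges explicitly is what lets d land in the gap.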