Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755352AbbDPRbb (ORCPT ); Thu, 16 Apr 2015 13:31:31 -0400 Received: from mail-qk0-f182.google.com ([209.85.220.182]:33820 "EHLO mail-qk0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754898AbbDPRbX (ORCPT ); Thu, 16 Apr 2015 13:31:23 -0400 MIME-Version: 1.0 In-Reply-To: References: <20150413190350.GA9485@kroah.com> <20150413204547.GB1760@kroah.com> <20150414175019.GA2874@kroah.com> <20150414192357.GA6107@kroah.com> <20150414193533.GF889@ZenIV.linux.org.uk> <20150414194348.GA7540@kroah.com> <552EA700.7000200@gmail.com> <20150415232218.7df214ba@lxorguk.ukuu.org.uk> Date: Thu, 16 Apr 2015 19:31:22 +0200 Message-ID: Subject: Re: [GIT PULL] kdbus for 4.1-rc1 From: David Herrmann To: Al Viro Cc: Greg Kroah-Hartman , Jiri Kosina , Borislav Petkov , Andy Lutomirski , Linus Torvalds , Andrew Morton , Arnd Bergmann , "Eric W. Biederman" , One Thousand Gnomes , Tom Gundersen , "linux-kernel@vger.kernel.org" , Daniel Mack , Djalal Harouni Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4534 Lines: 104 Hi On Wed, Apr 15, 2015 at 2:36 PM, Al Viro wrote: > On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote: > >> I've asked for it, but finding people to review code is hard, as you >> know. It's only 13k lines long, smaller than a serial port driver (my >> unit of code review), so it's not all that big. >> >> It's smaller than the USB3 host controller driver as well, and very few >> people ever reviewed that beast :) >> >> > For something that's potentially such a core mechanism as a completely >> > new, massively-adopted IPC, this does send a warning singal. >> >> If you know of a way to force others to review code, please let me know. > > Have it in a less nasty state, perhaps? Random question: > > al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock > ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock); > ipc/kdbus/node.c:340: down_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:344: up_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:444: down_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:452: up_write(&kdbus_node_idr_lock); As Greg said, this is a leftover from times we actually needed a lookup here. Nice catch, I have a local patch to convert the whole IDR into an IDA and drop the lock entirely (like kernfs does right now, for kernfs_node->ino). > Do you see anything wrong with that? Or with things like that: > mutex_lock(&pos->lock); > v_pre = atomic_read(&pos->active); > if (v_pre >= 0) > atomic_add_return(KDBUS_NODE_BIAS, &pos->active); > else if (v_pre == KDBUS_NODE_NEW) > atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT); > mutex_unlock(&pos->lock); > What are the locking rules for ->active/->waitq/->lock? Are those the > outermost thing in the hierarchy? Or is that dependent on the node location? > It sure as hell is outside of (at least) ->mmap_sem (by way of > kdbus_conn_connect() establishing that ->active/->waitq is outside of > ->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything > taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone > brings in a lot... I'm working on patches to add more comments similar to how we did in node.c. For now, please see my explanations below: node->lock is the _innermost_ lock. node->active implements revoke support for nodes. It follows what kernfs->active does and isn't a lock in particular. We kinda treat it as rwsem, where down_write() is the outer-most lock in kdbus and _only_ called without any other lock held (kdbus_node_deactivate()). Read-side, we never ever block on the "lock", but only use try-lock. If it fails, the node is dead/revoked. Therefore, the read-side of 'active' nests almost arbitrarily. We hold 'active'-references almost everywhere, to make sure a node is not destroyed while we use it. However, we never sleep for an indefinite time while holding it. Given that the write-side is the outer-most lock in kdbus, it doesn't dead-lock against the try-lock readers. > Document your goddamn locking, would you? It *IS* new code, and you, as you > say, had very few people working on it, so you don't have the excuses for > the mess existing in older parts of the tree. Locking order (outer-most to inner-most): 1) domain->lock 2) names->rwlock 3) endpoint->lock 4) bus->conn_rwlock 5) policy->entries_rwlock 6) connection->lock 7) metadata->lock mmap_sem nests below metadata->lock. With the rcu-protected exe_file patches by Davidlohr Bueso, we can even drop that dependency. They have kinda stalled, though. Then we have a bunch of data structure protection, which can be called from any context: * bus->notify_lock * pool->lock * match->mdb_rwlock * node->lock Lastly, there're 2 locks which nest around everything and must not be taken with any lock held: * handle->rwlock (taken in ioctl-entry) * bus->notify_flush_lock (taken in work-queue) General object stacking is: domain -> bus -> endpoint -> policy -> connection -> {metadata,pool,match,node} The conn_rwlock protection of the conn-list locks on kdbus_bus is the only lock that doesn't follow this ordering. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/