Date: Tue, 19 Dec 2017 22:40:54 +0000
From: Al Viro
To: "Eric W. Biederman"
Cc: Giuseppe Scrivano, Andrew Morton, LKML, alexander.deucher@amd.com, broonie@kernel.org, chris@chris-wilson.co.uk, David Miller, deepa.kernel@gmail.com, Greg KH, luc.vanoostenryck@gmail.com, lucien xin, Ingo Molnar, Neil Horman, syzkaller-bugs@googlegroups.com, Vladislav Yasevich
Subject: Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free
Message-ID: <20171219224054.GV21978@ZenIV.linux.org.uk>
In-Reply-To: <8737465qxn.fsf@xmission.com>

On Tue, Dec 19, 2017 at 03:49:24PM -0600, Eric W. Biederman wrote:
> > what would you be delaying?  kmem_cache_alloc() for struct mount and
> > assignments to its fields?  That's noise; if anything, I would expect the
> > main cost with plenty of containers to be in sget() scanning the list of
> > mqueue superblocks.  And we can get rid of that, while we are at it - to
> > hell with mount_ns(), with that approach we can just use mount_nodev()
> > instead.
> > The logic in mq_internal_mount() will deal with multiple instances - if
> > somebody has already triggered creation of the internal mount, all
> > subsequent calls in that ipcns will end up avoiding kern_mount_data()
> > entirely.  And if you have two callers racing - sure, you will get two
> > superblocks.  Not for long, though - the first one to get to setting
> > ->mq_mnt (serialized on mq_lock) wins, the second loses and promptly
> > destroys its vfsmount and superblock.  I seriously suspect that the
> > variant below would cut the cost down a whole lot more - as it is, we
> > spend a total of O(N^2) in the loop inside of sget_userns() when we
> > create N ipcns and mount in each of those; this patch should cut that
> > to O(N)...
>
> If that is where the cost is, is there any point in delaying creating
> the internal mount at all?

We won't know without the profiles...

Incidentally, is there any point in using mount_ns() for procfs?  A similar
scheme (with ->proc_mnt instead of ->mq_mnt, of course) would live with
mount_nodev() just fine, and it's definitely less costly - we don't bother
with the loop in sget_userns() at all that way.