Date: Thu, 23 Jul 2015 11:51:35 +1000
From: Dave Chinner
To: "J. Bruce Fields"
Cc: Austin S Hemmelgarn, "Eric W. Biederman", Casey Schaufler,
 Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel,
 LSM List, SELinux-NSA, Serge Hallyn, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 0/7] Initial support for user namespace owned mounts
Message-ID: <20150723015135.GH7943@dastard>
References: <55A71CE3.4050708@schaufler-ca.com>
 <87fv4owvxv.fsf@x220.int.ebiederm.org> <20150717000914.GO7943@dastard>
 <87380nobs4.fsf@x220.int.ebiederm.org> <20150717024735.GW3902@dastard>
 <20150721173721.GE11050@fieldses.org> <20150722075640.GE7943@dastard>
 <20150722140923.GD22718@fieldses.org> <55AFCA6A.60304@gmail.com>
 <20150722174100.GJ22718@fieldses.org>
In-Reply-To: <20150722174100.GJ22718@fieldses.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 22, 2015 at 01:41:00PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote:
> > On 2015-07-22 10:09, J. Bruce Fields wrote:
> > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote:
> > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > >>>So, for example, a screwed up on-disk directory structure shouldn't
> > >>>result in creating a cycle in the dcache and then deadlocking.
> > >>
> > >>Therein lies the problem: how do you detect such structural defects
> > >>without doing a full structure validation?
> > >
> > >You can prevent cycles in a graph if you can prevent adding an edge
> > >which would be part of a cycle.
> > >
> > Except if the user can write to the filesystem's backing storage (be
> > it a device or a file), and has sufficient knowledge of the on-disk
> > structures, they can create all the cycles they want in the
> > metadata. So unless the kernel builds the graph internally by
> > parsing the metadata _and_ has some way to detect that the on-disk
> > metadata has hit a cycle (which may not just involve 2 items),
>
> Understood. Again, see the d_ancestor call in d_splice_alias, this is
> exactly what it checks for.

But that only addresses one type of loop in one specific metadata
structure. There are plenty of other ways you could construct metadata
loops that are essentially undetected and result in either deadlock
or livelock within the filesystem code itself.

e.g. just make btree sibling pointers loop over a range of entries
that have the same index key (e.g. free space extents of the same
size). If allocation then falls into this loop, the kernel will just
spin searching the same blocks for something it will never find.

Such resource consumption attacks are trivial to construct but
extremely difficult to detect because they exploit normal behaviour
of the structure and algorithms by mangling trusted pointers.

Of course, this sort of attack will eventually deadlock the
filesystem because it will back up on locks held by the livelocked
search. Once the filesystem is deadlocked, it can then cause sync()
calls to get stuck on the filesystem.
And because sync() is a global operation, a deadlocked filesystem in
one container could cause sync to hang in a completely unrelated
container....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
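
To make the sibling-pointer example above concrete, here is a minimal
sketch in Python rather than kernel C. Everything in it is
hypothetical: the Block class, the next_sibling field and the
max_blocks budget are illustrative names, not the structures of XFS or
any other real filesystem. It models a chain of free space blocks that
all carry the same extent length, and a search that follows sibling
pointers looking for a larger extent. With no bound, a chain whose last
pointer has been rewritten to point back into the chain would keep the
search spinning forever; with an assumed trusted bound on chain length,
the search at least notices that it has walked too far.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical free space btree leaf: one extent size plus a sibling link."""
    blkno: int
    extent_len: int
    next_sibling: Optional["Block"] = None


class CorruptChain(Exception):
    """Raised when a walk visits more blocks than the chain can hold."""


def find_extent(start, wanted_len, max_blocks):
    """Walk sibling pointers looking for an extent of at least wanted_len.

    max_blocks is an assumed, trusted upper bound on the chain length.
    Without it, a looped chain keeps this search spinning forever.
    """
    steps = 0
    cur = start
    while cur is not None:
        if cur.extent_len >= wanted_len:
            return cur
        steps += 1
        if steps > max_blocks:
            raise CorruptChain(f"walked {steps} blocks, limit is {max_blocks}")
        cur = cur.next_sibling
    return None  # reached the end of a well-formed chain without a match


def build_chain(n, extent_len):
    """Build n linked blocks that all carry the same extent length."""
    blocks = [Block(blkno=i, extent_len=extent_len) for i in range(n)]
    for left, right in zip(blocks, blocks[1:]):
        left.next_sibling = right
    return blocks
```

For example, take a four-block chain from build_chain(4, 8) and
redirect the last block's next_sibling to the second block. A call to
find_extent(blocks[0], 16, max_blocks=4) then raises CorruptChain
instead of spinning. The catch is that the bound has to come from
somewhere trustworthy, and a check like this covers only this one
structure and this one walk.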