Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753134AbbGWXtB (ORCPT ); Thu, 23 Jul 2015 19:49:01 -0400
Received: from ipmail04.adl6.internode.on.net ([150.101.137.141]:52672
        "EHLO ipmail04.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752218AbbGWXs6 (ORCPT );
        Thu, 23 Jul 2015 19:48:58 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: A2AICQDYfLFVPBqxLXlcgxWBPYJVg3yiMgaaaAQCAoFPTQEBAQEBAQcBAQEBQAE/hCMBAQEDATocIwULCAMYCSUPBSUDBxoTiCYHyhYBAQEHAiAZhgWFLoQ9SQeELAWUYYw5mSGBCoMrLDGBBQEeB4EgAQEB
Date: Fri, 24 Jul 2015 09:48:54 +1000
From: Dave Chinner
To: "J. Bruce Fields"
Cc: Austin S Hemmelgarn , "Eric W. Biederman" , Casey Schaufler ,
        Andy Lutomirski , Seth Forshee , Alexander Viro ,
        Linux FS Devel , LSM List , SELinux-NSA , Serge Hallyn ,
        "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 0/7] Initial support for user namespace owned mounts
Message-ID: <20150723234854.GI7943@dastard>
References: <20150717000914.GO7943@dastard>
        <87380nobs4.fsf@x220.int.ebiederm.org>
        <20150717024735.GW3902@dastard>
        <20150721173721.GE11050@fieldses.org>
        <20150722075640.GE7943@dastard>
        <20150722140923.GD22718@fieldses.org>
        <55AFCA6A.60304@gmail.com>
        <20150722174100.GJ22718@fieldses.org>
        <20150723015135.GH7943@dastard>
        <20150723131928.GA11582@fieldses.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150723131928.GA11582@fieldses.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3004
Lines: 65

On Thu, Jul 23, 2015 at 09:19:28AM -0400, J. Bruce Fields wrote:
> On Thu, Jul 23, 2015 at 11:51:35AM +1000, Dave Chinner wrote:
> > On Wed, Jul 22, 2015 at 01:41:00PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote:
> > > > On 2015-07-22 10:09, J. Bruce Fields wrote:
> > > > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote:
> > > > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> > > > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > > > >>>So, for example, a screwed up on-disk directory structure shouldn't
> > > > >>>result in creating a cycle in the dcache and then deadlocking.
> > > > >>
> > > > >>Therein lies the problem: how do you detect such structural defects
> > > > >>without doing a full structure validation?
> > > > >
> > > > >You can prevent cycles in a graph if you can prevent adding an edge
> > > > >which would be part of a cycle.
> > > > >
> > > > Except if the user can write to the filesystem's backing storage (be
> > > > it a device or a file), and has sufficient knowledge of the on-disk
> > > > structures, they can create all the cycles they want in the
> > > > metadata. So unless the kernel builds the graph internally by
> > > > parsing the metadata _and_ has some way to detect that the on-disk
> > > > metadata has hit a cycle (which may not just involve 2 items),
> > >
> > > Understood. Again, see the d_ancestor call in d_splice_alias, this is
> > > exactly what it checks for.
> >
> > But that only addresses one type of loop in one specific metadata
> > structure.
>
> Yep, agreed!
>
> > There's plenty of other ways you could construct metadata
> > loops that are essentially undetected and result in either deadlock
> > or livelock within the filesystem code itself. e.g. just make btree
> > sibling pointers loop over a range of entries that have the same
> > index key (e.g. free space extents of the same size). If allocation
> > then falls into this loop, the kernel will just spin searching the
> > same blocks for something it will never find. Such resource
> > consumption attacks are trivial to construct but extremely difficult
> > to detect because they exploit normal behaviour of the structure and
> > algorithms by mangling trusted pointers.
>
> Interesting example, thanks! I doubt this particular example would be
> *that* hard to detect?

Yes, it can be detected, but it's not as easy as it sounds because
of abstractions between tree walking and record parsing.

> But understood that there may be lots of others.

Yeah, that's just one of many, many ways I can think of modifying
on disk structures to screw up the kernel.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
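
Illustrative note (not part of the original message): the btree sibling-pointer loop described above can be modelled with a toy sketch. Nothing here is taken from any real filesystem. The block layout `(key, payload, right_sibling)`, the function name `find_record`, and the `CorruptionError` type are all hypothetical, and the loop check uses a plain visited set, which is only one possible approach. It also sidesteps the difficulty raised in the message, because in this toy the walker and the record parser are the same function.

```python
class CorruptionError(Exception):
    """Raised when a sibling walk revisits a block it has already seen."""


def find_record(blocks, start, key, wanted):
    """Follow right-sibling pointers through blocks that share `key`.

    `blocks` maps a block number to (key, payload, right_sibling), where
    right_sibling is None at the end of the chain. This layout is a
    hypothetical stand-in for on-disk btree leaf blocks.

    Returns the block number whose payload equals `wanted`, or None if the
    run of same-key blocks ends without a match.
    """
    seen = set()
    blkno = start
    while blkno is not None:
        if blkno in seen:
            # Without this check, a mangled sibling pointer would make the
            # walk spin forever over the same blocks.
            raise CorruptionError(f"sibling loop detected at block {blkno}")
        seen.add(blkno)

        blk_key, payload, right = blocks[blkno]
        if blk_key != key:
            return None  # walked past the range of entries with this key
        if payload == wanted:
            return blkno
        blkno = right
    return None


# A healthy chain: 1 -> 2 -> 3 -> end.
healthy = {1: (8, "a", 2), 2: (8, "b", 3), 3: (8, "c", None)}

# The same chain with block 3's sibling pointer mangled to point back at 2.
looped = {1: (8, "a", 2), 2: (8, "b", 3), 3: (8, "c", 2)}
```

On the `looped` chain, a search for a payload that exists before the loop closes still succeeds, while a search for something that is not there raises `CorruptionError` instead of spinning. The visited set costs memory proportional to the length of the walk; a step limit or a two-pointer cycle check would be alternatives with different trade-offs.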