Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751743AbbEFQWg (ORCPT ); Wed, 6 May 2015 12:22:36 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:47083 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750730AbbEFQWd (ORCPT ); Wed, 6 May 2015 12:22:33 -0400 Date: Wed, 6 May 2015 17:22:30 +0100 From: Al Viro To: NeilBrown Cc: Linus Torvalds , Christoph Hellwig , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 51/79] namei: remove restrictions on nesting depth Message-ID: <20150506162230.GY889@ZenIV.linux.org.uk> References: <20150505052205.GS889@ZenIV.linux.org.uk> <1430803373-4948-51-git-send-email-viro@ZenIV.linux.org.uk> <20150506125528.0be6d30e@notabene.brown> <20150506072406.GX889@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150506072406.GX889@ZenIV.linux.org.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3665 Lines: 89 On Wed, May 06, 2015 at 08:24:07AM +0100, Al Viro wrote: > On Wed, May 06, 2015 at 12:55:28PM +1000, NeilBrown wrote: > > > > - if (unlikely(current->total_link_count >= 40)) { > > > + if (unlikely(current->total_link_count >= MAXSYMLINKS)) { > > > > There is still a literal '40' in follow_automount. > > > > current->total_link_count++; > > if (current->total_link_count >= 40) > > return -ELOOP; > > > > > > should that become MAXSYMLINKS too? > > Yes, assuming we want to keep it at all... To elaborate a bit - the reason why we count those among the symlink crossings is historical; we used to implement automounting via ->follow_link(). Note that it's not really consistent - once a process has triggered automount, subsequent lookups crossing that point *won't* have it counted as link traversal. Except for ones that come while automount attempt is in progress, etc. Limit on link traversals makes obvious sense wrt avoiding loops and DoS - reaching a symlink in effect is adding a pile of path components to the path yet to be traversed. Automount points do nothing of that sort - the pending path remains as-is. So I'm not sure it makes sense to have those contribute to the same counter. OTOH, there _is_ an unpleasant problem around follow_automount() - it's a place where we call a method with heavy stack footprint still fairly deep in stack; in mainline it can get as bad as ~2.8K deep, with this tree the worst call chain entirely inside fs/namei.c is SyS_renameat2 -> user_path_parent -> filename_lookup -> path_lookupat -> path_init -> link_path_walk -> walk_component -> lookup_fast -> follow_managed -> ->d_automount and it costs 1.1K; slightly more shallow chains lead from linkat(2) and do_filp_open()/do_file_open_root(). Again, mainline is more than two times worse on those. ->d_manage() gets hit on the same depth, ->d_revalidate() at a bit less than that. FWIW, the sums on a fairly sane amd64 config are [->d_manage] 1104 [->d_automount] 1104 [->d_revalidate] 1024 [->lookup] 992 [->put_link] 896 [->permission] 832 [->follow_link] 768 [->d_hash] 768 [->d_weak_revalidate] 624 [->create] 544 [->tmpfile] 480 [->atomic_open] 480 [->rename] 336 [->rename2] 336 [->link] 240 [->unlink] 224 [->rmdir] 176 [->mknod] 160 [->symlink] 144 [->mkdir] 144 And ->d_automount() is easily the heaviest of that bunch in terms of stack footprint. So much that I really wonder if we ought to use something like schedule_work() and (interruptibly) wait for completion; the interesting question here is how much of the initiator's state is needed by worker. The real namespace isn't - actual mounting is in finish_automount(). Credentials probably are; what about netns? Suppose a process steps on NFS4 referral point and triggers a new NFS mount; which netns should be used - that of the triggering process or that of the NFS mount where we'd run into the referral? Ditto for AFS and CIFS... BTW, if you want to torture the whole thing wrt stack footprint, try to make /lib/firmware a symlink that would lead you through a referral point (on the mainline - do so at the maximal nesting depth). The thing is, request_firmware() will chase that one *on* *caller's* *stack*. Seeing that we do have an async variant anyway (request_firmware_nowait()), could we have request_firmware() itself go through schedule_work() (and wait for completion, of course)? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/