Date: Sun, 17 May 2015 19:56:26 -0700
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
From: Linus Torvalds
To: NeilBrown
Cc: Al Viro, Andreas Dilger, Dave Chinner, Linux Kernel Mailing List,
    linux-fsdevel, Christoph Hellwig
In-Reply-To: <20150518091601.5c95322c@notabene.brown>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, May 17, 2015 at 4:16 PM, NeilBrown wrote:
>
> Just to be crystal clear about what I want:
>  I want the filesystem to be in control

Yeah, no. Not going to happen.

You seem to think that the dcache is "just" a cache. It's not. It's a
cache, but that is absolutely not all that it is. It's very much a
cache with strong semantics.

And no, we're not handing those semantics over to the filesystem. The
dcache is not just a cache, it's the *primary* data structure that we
use for pathname validation, local security checking, and for doing
things like "getcwd()" and handling ".." etc.

So there's no way the filesystem is "in control".
You as a filesystem are not really even doing the actual pathname
lookup. The *only* thing you're doing is filling in the dcache. The
actual real pathname lookup is done by the VFS layer using the dcache
data.

That's how it very fundamentally works. It's *so* much more than a
cache - it really *is* the primary path lookup. The filesystem is the
slave in this relationship.

> The filesystem then uses generic helpers (or not) to find the answers and adds
> more current information to the cache.

You can do that already. There *are* those generic helpers to add data
to the cache. That's what "d_instantiate()" and friends _are_ for.

But no, you do *not* control name lookup. You get notified when
there's not enough data in the cache, and then you can fill it up any
which way you want. You can populate the dcache with other entries
than the one we asked for, and you can ask the dcache to revalidate
and throw dentries out.

But no, you do *not* get access to things like do_last() or to the
decision to follow symlinks or namespace rules, or mountpoints or
things like that.

> So for Al's example of revalidating multiple components at once, once the VFS
> gets to a point in the path where d_revalidate says "I need more time",
> the VFS just passes the rest of the path to the filesystem.

That's bullshit, for a very simple and basic reason: "the rest of the
path" is not necessarily at all for your filesystem!

Really. There might be mount-points, there might be symlinks, there
might be tons of stuff like that. You're not getting control, for the
very simple reason that IT IS NOT YOUR DATA. And it really never ever
will be.

Now, this is why I said we can do a "hint" style thing. Part of that
"hint" issue is very very much that it has no semantic meaning. You
can't screw it up, because if it turns out that the path component
we're looking up is a symlink and we actually end up in some other
filesystem, if you end up looking up the hint part, it just would
never actually get used.
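
[Editorial illustration, not part of the original message.] The division
of labour described above can be sketched as a user-space toy in C. This
is a minimal sketch under loud assumptions: every `toy_*` name is
hypothetical, none of this is kernel code or a kernel API, the "hint"
argument models a proposal that the message only floats, every name is
assumed to exist, and symlinks, mount-points, permissions and
revalidation are left out. What it does show is the control flow: the
VFS-side walker owns the loop and consults the cache first; the
filesystem is only called on a miss, and all it can do is add entries.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_DENTRIES 16
#define TOY_NAME_LEN     32
#define TOY_ROOT         (-1)   /* pseudo-index of the root directory */
#define TOY_FAIL         (-2)   /* cache full or lookup failed */

/* Toy stand-in for a dentry: a (parent, name) pair. Hypothetical. */
struct toy_dentry {
    int  parent;
    char name[TOY_NAME_LEN];
};

static struct toy_dentry toy_dcache[TOY_MAX_DENTRIES];
static int toy_dcache_count;
static int toy_fs_calls;        /* how often the "filesystem" was asked */

/* Add an entry to the toy cache; loosely plays the role the message
 * assigns to d_instantiate() and friends. Returns the new index. */
static int toy_dcache_add(int parent, const char *name)
{
    assert(name != NULL);
    if (toy_dcache_count >= TOY_MAX_DENTRIES)
        return TOY_FAIL;
    struct toy_dentry *d = &toy_dcache[toy_dcache_count];
    d->parent = parent;
    strncpy(d->name, name, TOY_NAME_LEN - 1);
    d->name[TOY_NAME_LEN - 1] = '\0';
    return toy_dcache_count++;
}

/* Cache-only lookup: index of (parent, name), or TOY_FAIL on a miss. */
static int toy_dcache_find(int parent, const char *name)
{
    for (int i = 0; i < toy_dcache_count; i++)
        if (toy_dcache[i].parent == parent &&
            strcmp(toy_dcache[i].name, name) == 0)
            return i;
    return TOY_FAIL;
}

/* Hypothetical filesystem callback. It is asked for exactly one name
 * and may also pre-populate the cache with the hinted next name. It
 * never sees "the rest of the path" and never decides what is walked. */
static int toy_fs_lookup(int parent, const char *name, const char *hint)
{
    toy_fs_calls++;
    int d = toy_dcache_add(parent, name);
    if (d >= 0 && hint != NULL)
        toy_dcache_add(d, hint);    /* prefetch; harmless if never used */
    return d;
}

/* VFS-side walk: one component at a time, cache first, filesystem only
 * on a miss. Returns the final dentry index, TOY_ROOT for "/", or
 * TOY_FAIL. Paths longer than the local buffer are truncated. */
static int toy_vfs_walk(const char *path)
{
    char buf[128];
    strncpy(buf, path, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    int cur = TOY_ROOT;
    char *comp = strtok(buf, "/");
    while (comp != NULL) {
        char *next = strtok(NULL, "/");     /* only ever used as a hint */
        int d = toy_dcache_find(cur, comp);
        if (d < 0)
            d = toy_fs_lookup(cur, comp, next);
        if (d < 0)
            return TOY_FAIL;
        cur = d;
        comp = next;
    }
    return cur;
}
```

In this toy, walking "/a/b/c" on an empty cache calls the filesystem for
"a" (which also prefetches "b") and again for "c"; "b" is then resolved
from the cache without the filesystem being involved, and a second walk
of the same path makes no filesystem calls at all. A prefetched entry
that the walker never asks for simply sits in the cache unused.
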
So it's kind of like a prefetch for names. It's semantically much
weaker than saying "look up this name".

The hint would be "this is likely the next part of the name that the
VFS layer will look up". And the key part of that statement is (a)
"likely" (it might not happen, and even if it does happen, it might
not be for your filesystem) and (b) "the VFS layer will look up"
because it won't be the low-level filesystem doing it.

So it would be the low-level filesystem pre-populating the dcache - if
the low-level filesystem decides the hint is worth using for that -
and the VFS layer then uses the data in the dcache without further
bothering the filesystem.

Exactly because the dcache is *so* much more than "just a cache".

                 Linus