Date: Thu, 14 May 2015 19:18:16 -0700
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
From: Linus Torvalds
To: Al Viro
Cc: Jeremy Allison, Linux Kernel Mailing List, linux-fsdevel, Christoph Hellwig, Neil Brown

On Thu, May 14, 2015 at 6:26 PM, Al Viro wrote:
>
> Hold on.  Should
>	stat("blah", &buf) => ENOENT, OK, let's create it
>	mkdir("blah", 0) => EEXIST, bugger, looks like a race
>	stat("blah", &buf) => ENOENT, Whiskey, Tango, Foxtrot
> be possible?

No. What I described would not in any way change any of the above. I'm not understanding what your point is.

The only difference - EVER - would be if you pass in the ICASE flag. Nothing I suggested would change semantics without it (the _hash_ changes, but that doesn't change semantics, it's a purely internal random number).

Now, *with* O_ICASE/AT_ICASE, semantics change. Obviously. At that point the dentry lookup would match case-insensitively.

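As an illustration of the point that only the hash changes while the match semantics stay case-sensitive unless ICASE is passed, here is a minimal sketch of a toy userspace model. It is not kernel code: the names ToyDcache, name_hash and the icase parameter are hypothetical, and Python's str.casefold() merely stands in for whatever folding rule a real implementation would choose.

```python
# Toy userspace model of a dentry hash table. Not kernel code.
# ToyDcache, name_hash and the icase parameter are hypothetical names.

def name_hash(name: str) -> int:
    # Hash the case-folded name so that "Blah" and "blah" land on the
    # same chain. The hash only selects a bucket; it never decides a match.
    return sum(name.casefold().encode()) % 64


class ToyDcache:
    def __init__(self):
        # bucket number -> list of cached names, in insertion order
        self.buckets = {}

    def add(self, name: str) -> None:
        self.buckets.setdefault(name_hash(name), []).append(name)

    def lookup(self, name: str, icase: bool = False):
        # Walk the chain; return the first entry that matches.
        for cached in self.buckets.get(name_hash(name), []):
            if cached == name:
                return cached  # exact match, always accepted
            if icase and cached.casefold() == name.casefold():
                return cached  # case-insensitive match, ICASE only
        return None


dcache = ToyDcache()
dcache.add("Blah")
dcache.add("blah")

# Without the flag, lookups stay strictly case-sensitive.
print(dcache.lookup("blah"))              # blah
print(dcache.lookup("BLAH"))              # None

# With the flag, the first match on the chain wins.
print(dcache.lookup("blah", icase=True))  # Blah
```

In this model the two names are separate entries that share one chain, and an ICASE lookup simply returns whichever one comes first on that chain.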
For example, let's say that you have a directory where you already have both "Blah" and "blah", because you created them in a sane environment. They'll be two different dentries (assuming they are cached), but they'll have the same dentry hash.

Now, you open "blah" with O_ICASE, and the end result is that you would randomly open one or the other (it would be the one you find first on the hash chain).

Tough. Mixing icase and case-sensitive is by definition going to cause those kinds of issues.

The nasty issue (and the case that samba apparently wants it for) is that ICASE wouldn't be able to trust negative dentries (us having a negative dentry in one case doesn't mean that it's negative in ICASE). And that might be the killer part. Negative dentries are really useful.

Now, the VFS layer support part is I think fairly simple. I might be wrong, but I really think the hashing etc wouldn't be too painful. After all, we already do support ->d_hash() and ->d_compare(), this is "more of the same", just supported at a vfs level directly (and _allowing_ aliases in case).

The real pain is that the low-level filesystem has to support it too. That's simple for some filesystems, but it can be hard for things that hash filenames. Because there - unlike at the VFS layer - the hashes have meaning and you can't just change them to suit an ICASE lookup (because they exist on-disk).

So supporting that is likely trivial on filesystems like FAT or SYSV, which just iterate over the directory anyway at lookup() time. On ext* with hashed directories, it's nasty (and an ICASE lookup would probably have to just walk the whole directory, old-style). But I think all the code to do the nonhashed lookup is still there, since it is a filesystem feature bit. And it would only need to do that linear search thing when the ICASE flag is set in the lookup flags.

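To make the on-disk hashing problem concrete, here is a rough sketch of a toy directory whose index is keyed by a hash of the exact name bytes. Everything in it is an assumption made for illustration: ToyHashedDir is a hypothetical class, zlib.crc32 stands in for whatever hash a real filesystem stores on disk, and str.casefold() stands in for the case-folding rule. It is not how ext* actually lays out its directories.

```python
import zlib


class ToyHashedDir:
    """Toy directory with an index keyed by a hash of the exact name bytes."""

    def __init__(self, names):
        self.entries = list(names)  # entries in directory order
        self.index = {}             # hash of exact name -> list of names
        for n in self.entries:
            self.index.setdefault(zlib.crc32(n.encode()), []).append(n)

    def lookup(self, name, icase=False):
        """Return (matching name or None, number of entries examined)."""
        if not icase:
            # Case-sensitive: the stored hash leads straight to the bucket.
            bucket = self.index.get(zlib.crc32(name.encode()), [])
            for i, n in enumerate(bucket, 1):
                if n == name:
                    return n, i
            return None, len(bucket)
        # ICASE: the stored hash was computed over the exact bytes, so it
        # cannot be used. Fall back to walking every entry in order.
        folded = name.casefold()
        for i, n in enumerate(self.entries, 1):
            if n.casefold() == folded:
                return n, i
        return None, len(self.entries)


d = ToyHashedDir([f"file{i}" for i in range(1000)] + ["Report.TXT"])
print(d.lookup("Report.TXT"))              # exact lookup via the index
print(d.lookup("report.txt"))              # case-sensitive miss
print(d.lookup("report.txt", icase=True))  # linear walk over the directory
```

The second element of each result counts how many entries were examined, which shows why the ICASE path costs a full directory walk in the worst case.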
Of course, if it ends up just walking the directory linearly anyway, it doesn't fix the one samba performance problem that Jeremy pointed out, so that makes this of dubious value. If we can't do this better than samba can already do it on its own, it's kind of pointless.

Again - the filesystems (and the vfs layer) would remain case sensitive. But I think it might be fairly straightforward to allow per-operation ICASE handling for things that want it.

Keyword "think". Maybe there's something I didn't think of.

                        Linus