From: Gao Xiang Subject: Re: [PATCH RFC v2 00/13] NLS/UTF-8 Case-Insensitive lookups for ext4 and VFS proposal Date: Tue, 6 Feb 2018 11:21:44 +0800 Message-ID: References: <20180125025349.31494-1-krisman@collabora.co.uk> <20180125031650.GU13338@ZenIV.linux.org.uk> <87po5ij1nu.fsf@collabora.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Cc: , , , , , , , hutj To: Gabriel Krisman Bertazi , Al Viro Return-path: In-Reply-To: <87po5ij1nu.fsf@collabora.co.uk> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org Hi Gabriel, On 2018/2/6 10:24, Gabriel Krisman Bertazi wrote: > Al Viro writes: > >> On Thu, Jan 25, 2018 at 12:53:36AM -0200, Gabriel Krisman Bertazi wrote: >> >>> The second proposal is related to the VFS layer: >>> >>> (2) Enable Insensitive lookup support on a per-mountpoint basis, >>> via a MS_CASEFOLD flag, with the ultimate goal of supporting a >>> case-insensitive bind mount of a subtree, side-by-side with a >>> sensitive version of the filesystem. >> >> First reaction: No. With the side of HELL NO. >> >> Your ultimate goal sounds utterly insane - dcache tree must be shared >> for all mounts. Moreover, "would these two names refer to the same >> object" can not be mount-dependent. Not going to happen. >> >> Please, post the description of what you are planning to do. >> Detailed. I'm not saying that it's 100% impossible to do correctly, >> but I'm _very_ sceptical about the feasibility. >> >> I'm certainly not going to ACK any VFS changes until you convince >> me that this thing can be done with sane semantics. > > Hi Viro, > > Sorry for the delay. I wanted to get a better understanding of the > code before getting back to you. > > Our customer is interested in exposing a subtree of an existing > filesystem (native Linux filesystems, xfs, ext4 and others) in an > case-insensitive lookup manner, without paying the cost of a userspace > getdents implementation, and, preferably, without requiring the user to > modify data on the disk. Could I express my opinion? I have working on case-insensitive sdcardfs for months. I think your problem is how we optimise a case-insensitive lookup on the file system with a case-sensitive dcache (I mean d_add and no d_compare and d_hash). In that case, we could not trust the negative dentry when _creating_ a case-insensitive file, for example: there exists "anDroid" on-disk, but ext4's in-memory dcache only has the negative "Android", if we lookup "Android" we will get the _negative_ dentry, but we _cannot_ create it since "anDroid" exists on disk. In the create case, an on-disk _iterate_ (or readdir) is necessary. I could give another example, if we uses case-insentive ext4 and create "Android" and "anDroid", how to deal with the case in the case-insensitive way? I mean in that case we should make both "Android" and "anDroid" can access, right? I think we need to build a special case-sensitive dcache rather than a case-insensitive dcache following the native case-insentive fs(use d_add_ci, d_compare and d_hash, eg. fat, ntfs...) Finally, I agree "let the user shot herself in the foot by having two files with the exact CI name", but I think it could not the VFS _busniess_ itself since each customer solution "case-sensitive ext4 -> case-insensitive lookup" has their _perfered_ way (for example, "android" and "Android" exist, A perfers android and B perfers Android. Finally, I think for optmization, ext4 or other fs could add some dir inode _tag_ and supports native case-insensitive for these dirs could be better.... Sorry to bother, Thanks, > > After learning, last year, about the similar Android issue from Ted at a > conference, I started to believe a bind mount is the appropriate > solution for both our use cases. > > Like Ted mentioned, we let the user shot herself in the foot by having > two files with the exact CI name, and which one she gets will be > undefined. My current solution gets the exact match if it is present, > and my second approach could be adapted to do that with a performance > cost only if both files are not cached. That seem to be acceptable by > Ted and shouldn't be a problem for me too. > > Let me start by discussing what I have now, which is a functional > solution, to the best of my knowledge, but which still requires rework > to better benefit from the dentry cache. Then I will discuss the > solutions I am working on. > > (1) What I have now: > > As I mentioned in the first email, MS_CASEFOLD triggers a MNT_CASEFOLD > flag in the vfsmount structure. This flag, which gets recalculated > every time a mountpoint is traversed, is used to decide if we do a > casefolded ->lookup() for each component of the path. This obviously > allows paths with multiple nested mountpoints to work. > > The ->lookup() function calls d_add_ci(), making sure that only the > exact-match dentry is inserted in the cache. This means that we don't > have multiple dentries, differing only by case, associated to the same > file. This is an important semantic, which I am not modifying. This > also means that a following CI lookup with an inexact-case match will > trigger another ->lookup(), which is bad, but is manageable (for now). > I address this problem later in this email. > > Regarding negative dentries. If we have a negative dentry in the cache, > the case-sensitive algorithm will abort the lookup and return the > dentry, nothing new here. If, on the other hand, That component lookup > is under a MNT_CASEFOLD search, the code does a bigger effort, ignores > the negative dentry and still do ->lookup(). > > This means that: > > - Case-sensitive (CS) search performance is not affected, except for > the following sequence: Do a CS(foo) which returns a negative_dentry, > then CI(FOO), and finally CS(foo). This will require the second > CS(foo) to re-do the search and recreate the negative dentry. If > only CS searches are done, the performance should be unaffected. > > - Case-insensitive (CI) benefits from the dentry cache as long as it is > doing an exact-case name match. Otherwise, it always falls to > ->lookup(). The exact-name dentry, if already in the cache, will be > returned in both cases. > > - Negative dentries are ignored if doing CI and it always re-do the > search. > > The prototype code is in the following branch. I can send > patches if you want. > > https://gitlab.collabora.com/krisman/linux.git -b vfs-ms_casefold > > (2) What I want to do: > > I want to allow lookups of the inexact name to find the cached dentry, > preventing it from going to the disk only to find out later it already > has that dentry cached in some other cache bucket. This means that, > whenever there is a MNT_CASEFOLD enabled, every mountpoint of that > superblock needs to use d_hash(casefold(name)), such that every file is > in a known bucket. Obviously this is a worse hash function, but the > cost is only payed by filesystems using the feature. If a bind mount is > done, we must rehash all existing entries of that filesystem at > "bind,remount,ignorecase" time. > > Like the previous version, we only store exact name dentries, even > though the bucket is different now. When doing a LOOKUP_CASEFOLD, we > use a different ->d_compare_CI() to do the matching, such that we can > find the right dentry or not, in both scenarios (CI and CS). > > We still can't trust negative dentries when doing a CI lookup. Consider > the case where we do CS(FOO) and foo doesn't exist. Now we have a > negative_dentry(FOO), but CI(FOO) would still need to find an existing > 'foo' file. The first solution is to make fast_CI_lookup() walk the > entire bucket and ignore previous matching negative dentries if a > positive one is found. If no positive dentry is found, then fallback > and do ->lookup(). A superior solution is to mark negative dentries as > soft negatives and hard negatives, depending on what kind of lookup > created the dentry, whether CS or CI, respectively. Then, we can allow > a CI lookup to return the negative dentry only if it is a HARD negative, > and try the ->lookup() otherwise. For this to work, any dentry creation > on that parent requires that we invalidate hard negative children > dentries, and make them soft, at insertion/rename time. > > With these changes I believe the dentry cache could be shared by code > doing CI and CS lookups of the same dentry. My main concern is whether > we could invalidate a negative dentry without harming parallel lookups > using that dentry, but I believe I can work around that. > > I believe this covers everything I could identify, but please let me > know if I missed something, obvious or not. >