2023-11-20 15:06:26

by Christian Brauner

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Sun, Nov 19, 2023 at 06:11:39PM -0500, Gabriel Krisman Bertazi wrote:
> Christian Brauner <[email protected]> writes:
>
> > On Wed, 16 Aug 2023 01:07:54 -0400, Gabriel Krisman Bertazi wrote:
> >> This is v6 of the negative dentry on case-insensitive directories.
> >> Thanks Eric for the review of the last iteration. This version
> >> drops the patch to expose the helper to check casefolding directories,
> >> since it is not necessary in ecryptfs and it might be going away. It
> >> also addresses some documentation details, fix a build bot error and
> >> simplifies the commit messages. See the changelog in each patch for
> >> more details.
> >>
> >> [...]
> >
> > Ok, let's put it into -next so it sees some testing.
> > So it's too late for v6.7. Seems we forgot about this series.
> > Sorry about that.
>
> Christian,
>
> We are approaching -rc2 and, until last Friday, it didn't shown up in
> linux-next. So, to avoid turning a 6 month delay into 9 months, I pushed
> your signed tag to linux-next myself.
>
> That obviously uncovered a merge conflict: in v6.6, ceph added fscrypt,
> and the caller had to be updated. I fixed it and pushed again to
> linux-next to get more testing.
>
> Now, I don't want to send it to Linus myself. This is 100% VFS/FS code,
> I'm not the maintainer and it will definitely raise eyebrows. Can you
> please requeue and make sure it goes through this time? I'm happy to

My current understanding is that core dcache stuff is usually handled by
Al. And he's got a dcache branches sitting in his tree.

So this isn't me ignoring you in any way. My hands are tied and so I
can't sort this out for you easily.

> drop my branch from linux-next once yours shows up.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git/log/?h=negative-dentries
>
> This branch has the latest version with the ceph conflict folded in. I
> did it this way because I'd consider it was never picked up and there is
> no point in making the history complex by adding a fix on top of your
> signed tag, since it already fails to build ceph.
>
> I can send it as a v7; but I prefer you just pull from the branch
> above. Or you can ack and I'll send to Linus.


2023-11-20 17:00:16

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

Christian Brauner <[email protected]> writes:

> On Sun, Nov 19, 2023 at 06:11:39PM -0500, Gabriel Krisman Bertazi wrote:
>> Christian Brauner <[email protected]> writes:
>>
>> > On Wed, 16 Aug 2023 01:07:54 -0400, Gabriel Krisman Bertazi wrote:
>> >> This is v6 of the negative dentry on case-insensitive directories.
>> >> Thanks Eric for the review of the last iteration. This version
>> >> drops the patch to expose the helper to check casefolding directories,
>> >> since it is not necessary in ecryptfs and it might be going away. It
>> >> also addresses some documentation details, fix a build bot error and
>> >> simplifies the commit messages. See the changelog in each patch for
>> >> more details.
>> >>
>> >> [...]
>> >
>> > Ok, let's put it into -next so it sees some testing.
>> > So it's too late for v6.7. Seems we forgot about this series.
>> > Sorry about that.
>>
>> Christian,
>>
>> We are approaching -rc2 and, until last Friday, it didn't shown up in
>> linux-next. So, to avoid turning a 6 month delay into 9 months, I pushed
>> your signed tag to linux-next myself.
>>
>> That obviously uncovered a merge conflict: in v6.6, ceph added fscrypt,
>> and the caller had to be updated. I fixed it and pushed again to
>> linux-next to get more testing.
>>
>> Now, I don't want to send it to Linus myself. This is 100% VFS/FS code,
>> I'm not the maintainer and it will definitely raise eyebrows. Can you
>> please requeue and make sure it goes through this time? I'm happy to
>
> My current understanding is that core dcache stuff is usually handled by
> Al. And he's got a dcache branches sitting in his tree.
>
> So this isn't me ignoring you in any way. My hands are tied and so I
> can't sort this out for you easily.

Please don't take it personally, but you surely see how frustrating this
is.

While I appreciate your very prompt answer, this is very different from:

https://lore.kernel.org/linux-fsdevel/20230821-derart-serienweise-3506611e576d@brauner/
https://lore.kernel.org/linux-fsdevel/20230822-denkmal-operette-f16d8bd815fc@brauner/
https://lore.kernel.org/linux-fsdevel/20231025-selektiert-leibarzt-5d0070d85d93@brauner/

Perhaps it all has a vfs-specific meaning. But the following suggests it
was accepted and queued long ago:

> Thanks! We're a bit too late for v6.6 with this given that this hasn't
> even been in -next. So this will be up for v6.7.
[...]
> Ok, let's put it into -next so it sees some testing.
[...]
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied.
[...]
> Patches in the vfs.casefold branch should appear in linux-next soon.
[...]

Obviously, there are big issues with the process here. But I fail to see
how I could have communicated clearer or where I didn't follow the
process in this entire thread.

The branches you mentioned are 10 days old. This patchset was
"accepted" two months ago.

As one of the VFS maintainer, can you send an acked-by - or at least a
r-b in cases like this, if you agree with the patches? Then it makes
more sense for me to send to Linus directly.

Viro,

You are CC'ed since early 2022. Can you comment? Ted and Eric
reviewed, Christian too. there's been only small changes since the
first RFC.

I'll ask Linus to pull from the unicode tree in the next merge window
if I don't hear from you.

--
Gabriel Krisman Bertazi

2023-11-20 18:08:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Mon, 20 Nov 2023 at 07:06, Christian Brauner <[email protected]> wrote:
>
> My current understanding is that core dcache stuff is usually handled by
> Al. And he's got a dcache branches sitting in his tree.
>
> So this isn't me ignoring you in any way. My hands are tied and so I
> can't sort this out for you easily.

Well, we all know - very much including Al - that Al isn't always the
most responsive person, and tends to have his own ratholes that he
dives deep into.

The good news is that I do know the dcache code pretty well, and while
I really would like Al to deal with any locking issues (because
"pretty well" is not "as well as Al Viro"), for just about any other
issue I'll happily take pulls from you.

I dislike case folding with a passion - it's about the worst design
decision a filesystem can ever do - but the other side of that is that
if you have to have case folding, the last thing you want to do is to
have each filesystem deal with that sh*t-for-brains decision itself.

So moving more support for case folding into the VFS so that the
horrid thing at least gets better support is something I'm perfectly
fine with despite my dislike of it.

Of course, "do it in shared generic code" doesn't tend to really fix
the braindamage, but at least it's now shared braindamage and not
spread out all over. I'm looking at things like
generic_ci_d_compare(), and it hurts to see the mindless "let's do
lookups and compares one utf8 character at a time". What a disgrace.
Somebody either *really* didn't care, or was a Unicode person who
didn't understand the point of UTF-8.

Oh well. I guess people went "this is going to suck anyway, so let's
make sure it *really* sucks".

The patches look fine to me. Al - do you even care about them?

Linus

2023-11-21 02:03:38

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Mon, Nov 20, 2023 at 10:07:51AM -0800, Linus Torvalds wrote:
> Of course, "do it in shared generic code" doesn't tend to really fix
> the braindamage, but at least it's now shared braindamage and not
> spread out all over. I'm looking at things like
> generic_ci_d_compare(), and it hurts to see the mindless "let's do
> lookups and compares one utf8 character at a time". What a disgrace.
> Somebody either *really* didn't care, or was a Unicode person who
> didn't understand the point of UTF-8.

This isn't because of case-folding brain damage, but rather Unicode
brain damage. We compare one character at a time because it's
possible for some character like ? to either be encoded as 0x0089 (aka
"Latin Small Letter E with Acute") OR as 0x0065 0x0301 ("Latin Small
Letter E" plus "Combining Acute Accent").

Typically, we pretend that UTF-8 means that we can just encode ?, or
0x0089 as 0xC3 0xA9 and then call it a day and just use strcmp(3) on
the sucker. But Unicode is a lot more insane than that. Technically,
0x65 0xCC 0x81 is the same character as 0xC3 0xA9.

> Oh well. I guess people went "this is going to suck anyway, so let's
> make sure it *really* sucks".

It's more like, "this is going to suck, but if it's going to suck
anyway, let's implement the full Unicode spec in all its gory^H^H^H^H
glory, whether or not it's sane".


- Ted

2023-11-21 02:27:56

by Al Viro

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Mon, Nov 20, 2023 at 10:07:51AM -0800, Linus Torvalds wrote:

> Well, we all know - very much including Al - that Al isn't always the
> most responsive person, and tends to have his own ratholes that he
> dives deep into.

FWIW, the right now I'm getting out of one of those - rename() deadlocks
fun. Will post that, assuming it survives local beating, then post
the dcache stuff and other branches (all rebased by now).

Other ratholes - d_name/d_parent audit is about halfway through -
we do have fun shite in quite a few ->d_revalidate() instances (UAF,
etc.); RCU pathwalk methods need to be re-audited; the fixes caught
in the last cycle are either in mainline by now, or rebased.

But that stuff is relatively easy to suspend for a few days.

> Of course, "do it in shared generic code" doesn't tend to really fix
> the braindamage, but at least it's now shared braindamage and not
> spread out all over. I'm looking at things like
> generic_ci_d_compare(), and it hurts to see the mindless "let's do
> lookups and compares one utf8 character at a time". What a disgrace.
> Somebody either *really* didn't care, or was a Unicode person who
> didn't understand the point of UTF-8.
>
> Oh well. I guess people went "this is going to suck anyway, so let's
> make sure it *really* sucks".
>

> The patches look fine to me. Al - do you even care about them?

I will review that series; my impression from the previous iterations
had been fairly unpleasant, TBH, but I hadn't rechecked since April
or so.

Re dcache conflicts - see #untested-pile-dcache; most the dcache-affecting
stuff should be there. One intersting exception is the general helper
for safely picking parent/parent's inode/entry name for ->d_revalidate()
use; we have the parent-related part boilerplated, but the name side
tends to be missing. That's still being tweaked, looking for sane
calling conventions.

2023-11-21 02:29:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Mon, 20 Nov 2023 at 18:03, Theodore Ts'o <[email protected]> wrote:
>
> On Mon, Nov 20, 2023 at 10:07:51AM -0800, Linus Torvalds wrote:
> > I'm looking at things like
> > generic_ci_d_compare(), and it hurts to see the mindless "let's do
> > lookups and compares one utf8 character at a time". What a disgrace.
> > Somebody either *really* didn't care, or was a Unicode person who
> > didn't understand the point of UTF-8.
>
> This isn't because of case-folding brain damage, but rather Unicode
> brain damage.

No, it really is just stupidity and horribleness.

The thing is, when you check two strings for equality, the FIRST THING
you should do is to just compare them for exactly that: equality.

And no, the way you do that is not by checking each unicode character
one by one.

You do it by just doing a regular memcmp. In fact, you can do even
better than that: while at it, check whether
(a) all bytes are equal in everything but bit#5
(b) none of the bytes have the high bit set
and you have now narrowed down things in a big way. You can do these
things trivially one whole word at a time, and you'll handle 99% of
all input without EVER doing any Unicode garbage AT ALL.

Yes, yes, if you actually have complex characters, you end up having
to deal with that mess. But no, that is *not* an excuse for saying
"all characters are complex".

So no. There is absolutely zero excuse for doing stupid things, except
for "nobody has ever cared, because case folding is so stupid to begin
with that people just expect it to perform horribly badly".

End result:

- generic_ci_d_compare() should *not* consider the memcmp() to be a
"fall back to this for non-casefolded". You should start with that,
and if the bytes are equal then the strings are equal. End of story.

- if the bytes are not equal, then the strings *might* still compare
equal if it's a casefolded directory.

- but EVEN THEN you shouldn't fall back to actually doing UTF-8
decoding unless you saw the high bit being set at some point.

- and if they different in anything but bit #5 and you didn't see the
high bit, you know they are different.

It's a bit complicated, yes. But no, doing things one unicode
character at a time is just bad bad bad.

Linus

2023-11-21 03:03:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Mon, 20 Nov 2023 at 18:29, Linus Torvalds
<[email protected]> wrote:
>
> It's a bit complicated, yes. But no, doing things one unicode
> character at a time is just bad bad bad.

Put another way: the _point_ of UTF-8 is that ASCII is still ASCII.
It's literally why UTF-8 doesn't suck.

So you can still compare ASCII strings as-is.

No, that doesn't help people who are really using other locales, and
are actively using complicated characters.

But it very much does mean that you can compare "Bad" and "bad" and
never ever look at any unicode translation ever.

In a perfect world, you'd use all the complicated DCACHE_WORD_ACCESS
stuff that can do all of this one word at a time.

But even if you end up doing the rules just one byte at a time, it
means that you can deal with the common cases without "unicode
cursors" or function calls to extract unicode characters, or anything
like that. You can still treat things as bytes.

So the top of generic_ci_d_compare() should probably be something
trivial like this:

const char *ct = name.name;
unsigned int tcount = name.len;

/* Handle the exact equality quickly */
if (len == tcount && !dentry_string_cmp(str, ct, tcount))
return 0;

because byte-wise equality is equality even if high bits are set.

After that, it should probably do something like

/* Not byte-identical, but maybe igncase identical in ASCII */
do {
unsigned char a, b;

/* Dentry name byte */
a = *str;

/* If that's NUL, the qstr needs to be done too! */
if (!a)
return !!tcount;

/* Alternatively, if the qstr is done, it needed to be NUL */
if (!tcount)
return 1;
b = *ct;

if ((a | b) & 0x80)
break;

if (a != b) {
/* Quick "not same" igncase ASCII */
if ((a ^ b) & ~32)
return 1;
a &= ~32;
if (a < 'A' || a > 'Z')
return 1;
}

/* Ok, same ASCII, bytefolded, go to next */
str++;
ct++;
tcount--;
len--;
}

and only after THAT should it do the utf name comparison (and only on
the remaining parts, since the above will have checked for common
ASCII beginnings).

And the above was obviously never tested, and written in the MUA, and
may be completely wrong in all the details, but you get the idea. Deal
with the usual cases first. Do the full unicode only when you
absolutely have to.

Linus

2023-11-21 05:12:55

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Mon, Nov 20, 2023 at 07:03:13PM -0800, Linus Torvalds wrote:
> On Mon, 20 Nov 2023 at 18:29, Linus Torvalds
> <[email protected]> wrote:
> >
> > It's a bit complicated, yes. But no, doing things one unicode
> > character at a time is just bad bad bad.
>
> Put another way: the _point_ of UTF-8 is that ASCII is still ASCII.
> It's literally why UTF-8 doesn't suck.
>
> So you can still compare ASCII strings as-is.
>
> No, that doesn't help people who are really using other locales, and
> are actively using complicated characters.
>
> But it very much does mean that you can compare "Bad" and "bad" and
> never ever look at any unicode translation ever.

Yeah, agreed, that would be a nice optimization. However, in the
unfortunate case where (a) it's non-ASCII, and (b) the input string is
non-normalized and/or differs in case, we end up scanning some portion
of the two strings twice; once doing the strcmp, and once doing the
Unicode slow path.

That being said, given that even in the case where we're dealing with
non-ASCII strings, in the fairly common case where the program is
doing a readdir() followed by a open() or stat(), the filename will be
byte-identical and so a strcmp() will suffice.

So I agree that it's a nice optimization. It'd be interesting how
much such an optimization would actually show up in various
benchmarks. It'd have to be something that was really metadata-heavy,
or else the filenamea lookups would get drowned out.

- Ted

2023-11-22 03:31:19

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

Linus Torvalds <[email protected]> writes:

> I dislike case folding with a passion - it's about the worst design
> decision a filesystem can ever do - but the other side of that is that
> if you have to have case folding, the last thing you want to do is to
> have each filesystem deal with that sh*t-for-brains decision itself.

Thanks for pitching in.

We all agree it is a horrible feature that we support for very specific
use cases. I'm the one who added the code, yes, but I was never
under the illusion it's a good feature. It solely exists to let Linux
handle the bad decisions made elsewhere.

> So moving more support for case folding into the VFS so that the
> horrid thing at least gets better support is something I'm perfectly
> fine with despite my dislike of it.

Yes. The entire implementation was meant to stay as far away as possible
from the fast lookup path (didn't want to displease Viro). The negative
dentry is the only exception that required more changes to vfs to
address the issue he found of dangling negative dentries when turning a
directory case-insensitive.

But, fyi, there is work in progress to add support to more filesystems.
This is why I really want to get all of this done first. There is a use
case to enable it in shmem because of containerized environments
running wine; I was recently cc'ed on a bcachefs implementation; and
there are people working on adding it to btrfs (to support wine in
specific products).

> Of course, "do it in shared generic code" doesn't tend to really fix
> the braindamage, but at least it's now shared braindamage and not
> spread out all over. I'm looking at things like
> generic_ci_d_compare(), and it hurts to see the mindless "let's do
> lookups and compares one utf8 character at a time". What a disgrace.
> Somebody either *really* didn't care, or was a Unicode person who
> didn't understand the point of UTF-8.

Yes. I saw the rest of the thread and you are obviously correct here.
It needs to be fixed. I will follow up with patches.

> The patches look fine to me. Al - do you even care about them?

I saw that Al Viro answered. Thank you, Al. So I'll wait for either his
review or the merge window.

Thanks,

--
Gabriel Krisman Bertazi

2023-11-22 21:04:32

by Al Viro

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Tue, Nov 21, 2023 at 12:12:15AM -0500, Theodore Ts'o wrote:
> Yeah, agreed, that would be a nice optimization. However, in the
> unfortunate case where (a) it's non-ASCII, and (b) the input string is
> non-normalized and/or differs in case, we end up scanning some portion
> of the two strings twice; once doing the strcmp, and once doing the
> Unicode slow path.

The first pass is cheap and the second one will be running entirely
from the cache, so I doubt that scanning them twice will be a loss...

2023-11-22 21:19:17

by Al Viro

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Tue, Nov 21, 2023 at 02:27:34AM +0000, Al Viro wrote:

> I will review that series; my impression from the previous iterations
> had been fairly unpleasant, TBH, but I hadn't rechecked since April
> or so.

The serious gap, AFAICS, is the interplay with open-by-fhandle.
It's not unfixable, but we need to figure out what to do when
lookup runs into a disconnected directory alias. d_splice_alias()
will move it in place, all right, but any state ->lookup() has
hung off the dentry that had been passed to it will be lost.

And I seriously suspect that we want to combine that state
propagation with d_splice_alias() (or its variant to be used in
such cases), rather than fixing the things up afterwards.

In particular, propagating ->d_op is really not trivial at that
point; it is safe to do to ->lookup() argument prior to d_splice_alias()
(even though that's too subtle and brittle, IMO), but after
d_splice_alias() has succeeded, the damn thing is live and can
be hit by hash lookups, revalidate, etc.

The only things that can't happen to it are ->d_delete(), ->d_prune(),
->d_iput() and ->d_init(). Everything else is fair game.

And then there's an interesting question about the interplay with
reparenting. It's OK to return an error rather than reparent,
but we need some way to tell if we need to do so.

2023-11-23 00:19:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Wed, 22 Nov 2023 at 13:19, Al Viro <[email protected]> wrote:
>
> The serious gap, AFAICS, is the interplay with open-by-fhandle.

So I'm obviously not a fan of igncase filesystems, but I don't think
this series actually changes any of that.

> It's not unfixable, but we need to figure out what to do when
> lookup runs into a disconnected directory alias. d_splice_alias()
> will move it in place, all right, but any state ->lookup() has
> hung off the dentry that had been passed to it will be lost.

I guess this migth be about the new DCACHE_CASEFOLDED_NAME bit.

At least for now, that is only used by generic_ci_d_revalidate() for
negative dentries, so it shouldn't matter for that d_splice_alias()
that only does positive dentries. No?

Or is there something else you worry about?

Side note: Gabriel, as things are now, instead of that

if (!d_is_casefolded_name(dentry))
return 0;

in generic_ci_d_revalidate(), I would suggest that any time a
directory is turned into a case-folded one, you'd just walk all the
dentries for that directory and invalidate negative ones at that
point. Or was there some reason I missed that made it a good idea to
do it at run-time after-the-fact?

Linus

2023-11-23 01:12:38

by Al Viro

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Wed, Nov 22, 2023 at 09:19:01PM +0000, Al Viro wrote:
> On Tue, Nov 21, 2023 at 02:27:34AM +0000, Al Viro wrote:
>
> > I will review that series; my impression from the previous iterations
> > had been fairly unpleasant, TBH, but I hadn't rechecked since April
> > or so.
>
> The serious gap, AFAICS, is the interplay with open-by-fhandle.
> It's not unfixable, but we need to figure out what to do when
> lookup runs into a disconnected directory alias. d_splice_alias()
> will move it in place, all right, but any state ->lookup() has
> hung off the dentry that had been passed to it will be lost.
>
> And I seriously suspect that we want to combine that state
> propagation with d_splice_alias() (or its variant to be used in
> such cases), rather than fixing the things up afterwards.
>
> In particular, propagating ->d_op is really not trivial at that
> point; it is safe to do to ->lookup() argument prior to d_splice_alias()
> (even though that's too subtle and brittle, IMO), but after
> d_splice_alias() has succeeded, the damn thing is live and can
> be hit by hash lookups, revalidate, etc.
>
> The only things that can't happen to it are ->d_delete(), ->d_prune(),
> ->d_iput() and ->d_init(). Everything else is fair game.
>
> And then there's an interesting question about the interplay with
> reparenting. It's OK to return an error rather than reparent,
> but we need some way to tell if we need to do so.

Hmm... int (*d_transfer)(struct dentry *alias, struct dentry *new)?
Called if d_splice_alias() picks that sucker, under rename_lock,
before the call of __d_move(). Can check IS_ROOT(alias) (due to
rename_lock), so can tell attaching from reparenting, returning
an error - failed d_splice_alias().

Perhaps would be even better inside __d_move(), once all ->d_lock
are taken... Turn the current bool exchange in there into honest
enum (exchange/move/splice) and call ->d_transfer() on splice.
In case of failure it's still not too late to back out - __d_move()
would return an int, ignored in d_move() and d_exchange() and
treated as "fail in unlikely case it's non-zero" in d_splice_alias()
and __d_unalias()...

Comments? Note that e.g.
res = d_splice_alias(inode, dentry);
if (!IS_ERR(fid)) {
if (!res)
v9fs_fid_add(dentry, &fid);
else if (!IS_ERR(res))
v9fs_fid_add(res, &fid);
else
p9_fid_put(fid);
}

in 9p ->lookup() would turn into

v9fs_fid_add(dentry, &fid);
return d_splice_alias(inode, dentry);

with ->d_transfer(alias, new) being simply

struct hlist_node *p = new->d_fsdata;
hlist_del_init(p);
__add_fid(alias, hlist_entry(p, struct p9_fid, dlist));
return 0;

assuming the call from __d_move()...

2023-11-23 01:22:44

by Al Viro

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Thu, Nov 23, 2023 at 01:12:08AM +0000, Al Viro wrote:
> On Wed, Nov 22, 2023 at 09:19:01PM +0000, Al Viro wrote:
> > On Tue, Nov 21, 2023 at 02:27:34AM +0000, Al Viro wrote:
> >
> > > I will review that series; my impression from the previous iterations
> > > had been fairly unpleasant, TBH, but I hadn't rechecked since April
> > > or so.
> >
> > The serious gap, AFAICS, is the interplay with open-by-fhandle.
> > It's not unfixable, but we need to figure out what to do when
> > lookup runs into a disconnected directory alias. d_splice_alias()
> > will move it in place, all right, but any state ->lookup() has
> > hung off the dentry that had been passed to it will be lost.
> >
> > And I seriously suspect that we want to combine that state
> > propagation with d_splice_alias() (or its variant to be used in
> > such cases), rather than fixing the things up afterwards.
> >
> > In particular, propagating ->d_op is really not trivial at that
> > point; it is safe to do to ->lookup() argument prior to d_splice_alias()
> > (even though that's too subtle and brittle, IMO), but after
> > d_splice_alias() has succeeded, the damn thing is live and can
> > be hit by hash lookups, revalidate, etc.
> >
> > The only things that can't happen to it are ->d_delete(), ->d_prune(),
> > ->d_iput() and ->d_init(). Everything else is fair game.
> >
> > And then there's an interesting question about the interplay with
> > reparenting. It's OK to return an error rather than reparent,
> > but we need some way to tell if we need to do so.
>
> Hmm... int (*d_transfer)(struct dentry *alias, struct dentry *new)?
> Called if d_splice_alias() picks that sucker, under rename_lock,
> before the call of __d_move(). Can check IS_ROOT(alias) (due to
> rename_lock), so can tell attaching from reparenting, returning
> an error - failed d_splice_alias().
>
> Perhaps would be even better inside __d_move(), once all ->d_lock
> are taken... Turn the current bool exchange in there into honest
> enum (exchange/move/splice) and call ->d_transfer() on splice.
> In case of failure it's still not too late to back out - __d_move()
> would return an int, ignored in d_move() and d_exchange() and
> treated as "fail in unlikely case it's non-zero" in d_splice_alias()
> and __d_unalias()...
>
> Comments? Note that e.g.
> res = d_splice_alias(inode, dentry);
> if (!IS_ERR(fid)) {
> if (!res)
> v9fs_fid_add(dentry, &fid);
> else if (!IS_ERR(res))
> v9fs_fid_add(res, &fid);
> else
> p9_fid_put(fid);
> }
>
> in 9p ->lookup() would turn into
>
> v9fs_fid_add(dentry, &fid);
> return d_splice_alias(inode, dentry);
>
> with ->d_transfer(alias, new) being simply
>
> struct hlist_node *p = new->d_fsdata;
> hlist_del_init(p);
> __add_fid(alias, hlist_entry(p, struct p9_fid, dlist));
> return 0;
>
> assuming the call from __d_move()...

Incidentally, 9p and this one would not be the only places that could use it -
affs - alias->d_fsdata = new->d_fsdata
afs - ditto
ocfs2 - smells like another possible benefitiary (attaching locks, etc.
would be saner if done before d_splice_alias(), with ->d_transfer()
moving the lock to the alias)...

2023-11-23 05:10:03

by Al Viro

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Wed, Nov 22, 2023 at 04:18:56PM -0800, Linus Torvalds wrote:
> On Wed, 22 Nov 2023 at 13:19, Al Viro <[email protected]> wrote:
> >
> > The serious gap, AFAICS, is the interplay with open-by-fhandle.
>
> So I'm obviously not a fan of igncase filesystems, but I don't think
> this series actually changes any of that.
>
> > It's not unfixable, but we need to figure out what to do when
> > lookup runs into a disconnected directory alias. d_splice_alias()
> > will move it in place, all right, but any state ->lookup() has
> > hung off the dentry that had been passed to it will be lost.
>
> I guess this migth be about the new DCACHE_CASEFOLDED_NAME bit.
>
> At least for now, that is only used by generic_ci_d_revalidate() for
> negative dentries, so it shouldn't matter for that d_splice_alias()
> that only does positive dentries. No?
>
> Or is there something else you worry about?

Dentries created by d_obtain_alias() will never go anywhere near
generic_set_encrypted_ci_d_ops(). They do *not* get ->d_op set
that way. When ext4_lookup() does a lookup in c-i directory it
does have ->d_op set on dentry it got from the caller. Which is
promptly discarded when d_splice_alias() finds a preexisting
alias for it.

Positive dentries eventually become negative; not invalidating them
when that happens is a large part of the point of this series.
->d_revalidate() is taught to check if they are marked with that
bit, but... they won't have that ->d_revalidate() associated with
them, will they? ->d_hash() and ->d_compare() come from the
parent, but ->d_revalidate() comes from dentry itself.

In other words, generic_ci_d_revalidate() won't see the lack of
that bit on dentry, etc. - it won't be called for that dentry
in the first place.

2023-11-23 15:57:40

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

Linus Torvalds <[email protected]> writes:

> Side note: Gabriel, as things are now, instead of that
>
> if (!d_is_casefolded_name(dentry))
> return 0;
>
> in generic_ci_d_revalidate(), I would suggest that any time a
> directory is turned into a case-folded one, you'd just walk all the
> dentries for that directory and invalidate negative ones at that
> point. Or was there some reason I missed that made it a good idea to
> do it at run-time after-the-fact?
>

The problem I found with that approach, which I originally tried, was
preventing concurrent lookups from racing with the invalidation and
creating more 'case-sensitive' negative dentries. Did I miss a way to
synchronize with concurrent lookups of the children of the dentry? We
can trivially ensure the dentry doesn't have positive children by
holding the parent lock, but that doesn't protect from concurrent
lookups creating negative dentries, as far as I understand.

--
Gabriel Krisman Bertazi