2023-11-24 06:02:44

by Al Viro

Subject: [RFC][PATCHSET v3] simplifying fast_dput(), dentry_kill() et.al.

The series below is the fallout of trying to document the dentry
refcounting and life cycle - basically, getting rid of the bits that
had been too subtle and ugly to write up.

Compared to v2: two commits moved to misc pile, lockless variant of
retain_dentry() added.

Results so far:
* -132LoC (-166LoC not counting the additions in D/f/porting ;-)
* considerably simpler locking for __dentry_kill()
* fast_dput() is pretty much "dput() sans killing dentry"; returns true if
we are done, false if dentry needs killing (in which case dentry will
be left locked and refcount is known to be 0). See the sketch below this list.
* retain_dentry() not messing with refcounting - called with refcount 0
and ->d_lock held, returns whether we want the dentry retained in cache.
* rules for shrink lists are much simpler now - to_shrink_list() puts
a locked dentry with zero refcount into a shrink list, no need to guarantee
that filesystem containing that dentry won't get shut down before we get
to eventual shrink_dentry_list() - it would do the right thing.
* ->d_iput() and ->d_release() no longer have weird corner cases when they
could get called with parent already killed. That happened to be avoided
in the cases where in-kernel instances would bother to work with the parent,
but that used to be very brittle and convoluted. Now it's "parent is kept
pinned until __dentry_kill() of child is done".
* a bunch of other subtle shit is gone (e.g. the logic in shrink_lock_dentry()
had rather subtle corner cases, with convoluted reasons why they wouldn't break
things - except that in some cases they would, etc.)
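
To see how the pieces fit together, here is roughly what dput() ends up looking
like by the end of the series (a condensed sketch based on the patch 20 diff
further down the thread, not a verbatim quote; patch 21 later adds a second
argument to retain_dentry(), omitted here):

void dput(struct dentry *dentry)
{
        if (!dentry)
                return;
        might_sleep();
        rcu_read_lock();
        if (likely(fast_dput(dentry))) {
                /* refcount dropped; nothing needs killing */
                rcu_read_unlock();
                return;
        }
        /* refcount is 0, ->d_lock is held, dentry already deemed not worth retaining */
        while (lock_for_kill(dentry)) {
                rcu_read_unlock();
                dentry = __dentry_kill(dentry); /* locked parent with refcount 0, or NULL */
                if (!dentry)
                        return;
                if (retain_dentry(dentry)) {
                        /* parent stays in cache */
                        spin_unlock(&dentry->d_lock);
                        return;
                }
                rcu_read_lock();
        }
        rcu_read_unlock();
        spin_unlock(&dentry->d_lock);
}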

This stuff lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.dcache2
individual patches are in followups. Help with reviewing and testing would
be very welcome - it seems to survive the local beating, but it definitely
needs more.

Shortlog:
Al Viro (21):
switch nfsd_client_rmdir() to use of simple_recursive_removal()
coda_flag_children(): cope with dentries turning negative
dentry: switch the lists of children to hlist
centralize killing dentry from shrink list
shrink_dentry_list(): no need to check that dentry refcount is marked dead
fast_dput(): having ->d_delete() is not reason to delay refcount decrement
fast_dput(): handle underflows gracefully
fast_dput(): new rules for refcount
__dput_to_list(): do decrement of refcount in the callers
Make retain_dentry() neutral with respect to refcounting
__dentry_kill(): get consistent rules for victim's refcount
dentry_kill(): don't bother with retain_dentry() on slow path
Call retain_dentry() with refcount 0
fold the call of retain_dentry() into fast_dput()
don't try to cut corners in shrink_lock_dentry()
fold dentry_kill() into dput()
to_shrink_list(): call only if refcount is 0
switch select_collect{,2}() to use of to_shrink_list()
d_prune_aliases(): use a shrink list
__dentry_kill(): new locking scheme
retain_dentry(): introduce a trimmed-down lockless variant

Diffstat:
Documentation/filesystems/porting.rst | 34 +++
arch/powerpc/platforms/cell/spufs/inode.c | 5 +-
fs/afs/dynroot.c | 5 +-
fs/autofs/expire.c | 7 +-
fs/ceph/dir.c | 2 +-
fs/ceph/mds_client.c | 2 +-
fs/coda/cache.c | 9 +-
fs/dcache.c | 493 +++++++++++-------------------
fs/libfs.c | 45 ++-
fs/nfsd/nfsctl.c | 70 +----
fs/notify/fsnotify.c | 2 +-
fs/tracefs/inode.c | 34 +--
include/linux/dcache.h | 20 +-
13 files changed, 298 insertions(+), 430 deletions(-)

Patch descriptions follow:

Part 1 - preparations

1/21) nfsd_client_rmdir() and its guts open-code simple_recursive_removal();
converting to calling that cleans things up in there *and* reduces
the number of places where we touch the list of children, which simplifies
the work later in the series.

2/21) more fun caught while looking at the places that go through the
lists of children: coda_flag_children() assumes that ->d_lock on parent
is enough to prevent children going negative. Ain't so...

3/21) switch the lists of children to hlist. We never bother with
accessing the list tail and using hlist saves us a pointer per
dentry. Besides, it ends up more readable that way. Fields used to hold
the lists got renamed - d_children/d_sib instead of d_subdirs/d_child.
Yes, any out-of-tree code that works with the lists of children gets
loudly broken; not hard to fix.
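
For out-of-tree code that needs adjusting, iterating over a parent's children now
looks roughly like this (a sketch; flag_children() and do_something() are made-up
names, the field names are the ones listed above):

static void flag_children(struct dentry *parent)
{
        struct dentry *child;

        spin_lock(&parent->d_lock);
        hlist_for_each_entry(child, &parent->d_children, d_sib) {
                /* the old equivalent would have been
                 * list_for_each_entry(child, &parent->d_subdirs, d_child) */
                do_something(child);
        }
        spin_unlock(&parent->d_lock);
}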

4/21) centralize killing dentry from shrink list
There are identical pieces of code in shrink_dentry_list() and
shrink_dcache_for_umount(); they would require identical massage through
the series below, unifying them into an inlined helper reduces the amount
of noise.

5/21) shrink_dentry_list(): no need to check that dentry refcount is
marked dead. We won't see DCACHE_MAY_FREE on anything that is *not*
dead and checking d_flags is just as cheap as checking refcount.

Part 2 - massage of dput() and friends

6/21) fast_dput(): having ->d_delete() is not reason to delay refcount
decrement.
->d_delete() is a way for a filesystem to tell us that a dentry is not
worth keeping cached. It is not guaranteed to be called every time a dentry's
refcount drops to zero; it is not guaranteed to be called before the
dentry gets evicted. In other words, it is not suitable for any kind
of keeping track of dentry state.
None of the in-tree filesystems attempt to use it that way,
fortunately.
So the contortions done by fast_dput() (as well as dentry_kill())
are not warranted. fast_dput() certainly should treat having ->d_delete()
instance as "can't assume we'll be keeping it", but that's not different
from the way we treat e.g. DCACHE_DONTCACHE (which is rather similar
to making ->d_delete() return true when called).
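
For reference, a typical ->d_delete() instance is nothing but such a hint; a sketch
(the example_* names are made up, but this is essentially what always_delete_dentry()
in fs/libfs.c does):

static int example_d_delete(const struct dentry *dentry)
{
        /* "don't keep this cached once the last reference goes away" -
         * purely advisory, close in effect to DCACHE_DONTCACHE */
        return 1;
}

static const struct dentry_operations example_dops = {
        .d_delete       = example_d_delete,
};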

7/21) fast_dput(): handle underflows gracefully.
If refcount is less than 1, we should just warn, unlock
dentry and return true, so that the caller doesn't try to do anything
else.
Taking care of that leaves the rest of "lockref_put_return() has
failed" case equivalent to "decrement refcount and rejoin the normal
slow path after the point where we grab ->d_lock".

8/21) fast_dput(): new rules for refcount.
By now there is only one place in entire fast_dput() where we
return false; that happens after refcount had been decremented and found
(under ->d_lock) to be zero. In that case, just prior to returning
false to caller, fast_dput() forcibly changes the refcount from 0 to 1.
Lift that resetting refcount to 1 into the callers; later in
the series it will be massaged out of existence.

9/21) __dput_to_list(): do decrement of refcount in the callers
... and rename it to to_shrink_list(), seeing that it no longer
drops any references.

10/21) make retain_dentry() neutral with respect to refcounting.
It used to decrement refcount if and only if it returned true.
Lift those decrements into the callers.

11/21) __dentry_kill(): get consistent rules for victim's refcount
Currently we call it with refcount equal to 1 when called from
dentry_kill(); all other callers have it equal to 0.
Make it always be called with zero refcount; on this step we just
decrement it before the calls in dentry_kill(). That is safe, since
all places that care about the value of refcount either do that under
->d_lock or hold a reference to dentry in question. Either is sufficient
to prevent observing a dentry immediately prior to __dentry_kill()
getting called from dentry_kill().

12/21) dentry_kill(): don't bother with retain_dentry() on the slow path
We have already checked it and dentry used to look not worthy
of keeping. The only hard obstacle to evicting dentry is non-zero
refcount; everything else is advisory - e.g. memory pressure could evict
any dentry found with refcount zero. On the slow path in dentry_kill()
we had dropped and regained ->d_lock; we must recheck the refcount,
but everything else is not worth bothering with.
Note that filesystem can not count upon ->d_delete() being
called for dentry - not even once. Again, memory pressure (as well as
d_prune_aliases(), or attempted rmdir() of ancestor, or...) will not
call ->d_delete() at all.
So from the correctness point of view we are fine doing the
check only once. And it makes things simpler down the road.
The doctor said "To the morgue", so to the morgue it is!

13/21) Call retain_dentry() with refcount 0.
Instead of bumping it from 0 to 1, calling retain_dentry(),
then decrementing it back to 0 (with ->d_lock held all the way through),
just leave refcount at 0 through all of that.
It will have a visible effect for ->d_delete() - now it can
be called with refcount 0 instead of 1 and it can no longer play silly
buggers with dropping/regaining ->d_lock. Not that any in-tree instances
tried to (it's pretty hard to get right).
Any out-of-tree ones will have to adjust (assuming they need
any changes).

14/21) fold the call of retain_dentry() into fast_dput()
Calls of retain_dentry() happen immediately after getting false
from fast_dput() and getting true from retain_dentry() is treated the
same way as non-zero refcount would be treated by fast_dput() - unlock
dentry and bugger off.

15/21) don't try to cut corners in shrink_lock_dentry().
That is to say, do *not* treat the ->d_inode or ->d_parent
changes as "it's hard, return false; somebody must have grabbed it,
so even if it has zero refcount, we don't need to bother killing it -
final dput() from whoever grabbed it would've done everything".
First of all, that is not guaranteed. It might have been dropped
by shrink_kill() handling of victim's parent, which would've found it
already on a shrink list (ours) and decided that they don't need to put
it on their shrink list.
What's more, dentry_kill() is doing pretty much the same thing,
cutting its own set of corners (it assumes that dentry can't go from
positive to negative, so its inode can change but only once and only in
one direction).
Doing that right allows us to get rid of that not-quite-duplication
and removes the only reason for re-incrementing refcount before the call
of dentry_kill().
Replacement is called lock_for_kill(); called under rcu_read_lock
and with ->d_lock held. If it returns false, dentry has non-zero refcount
and the same locks are held. If it returns true, dentry has zero refcount
and all locks required by __dentry_kill() are taken.
Part of __lock_parent() had been lifted into lock_parent() to
allow its reuse. Now it's called with rcu_read_lock already held and
dentry already unlocked.
Note that this is not the final change - locking requirements
for __dentry_kill() are going to change later in the series and the set
of locks taken by lock_for_kill() will be adjusted. Both lock_parent()
and __lock_parent() will be gone once that happens.
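
In caller terms the contract looks like this (a sketch mirroring how
shrink_dentry_list() ends up using it in the diff for this patch further down
the thread):

        spin_lock(&dentry->d_lock);
        rcu_read_lock();
        if (lock_for_kill(dentry)) {
                /* refcount is 0; ->d_lock, parent's ->d_lock and ->i_lock (if any) held */
                rcu_read_unlock();
                __dentry_kill(dentry);
        } else {
                /* somebody grabbed it; only our ->d_lock is still held */
                rcu_read_unlock();
                spin_unlock(&dentry->d_lock);
        }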

16/21) fold dentry_kill() into dput().
Not worth keeping separate.

17/21) to_shrink_list(): call only if refcount is 0
The only thing it does if refcount is not zero is d_lru_del();
no point, IMO, seeing that plain dput() does nothing of that sort...
Note that 2 of 3 current callers are guaranteed that refcount is 0.

18/21) switch select_collect{,2}() to use of to_shrink_list()
Same note about d_lru_del() as in (17/21).

19/21) d_prune_aliases(): use a shrink list
Instead of dropping aliases one by one, restarting, etc., just
collect them into a shrink list and kill them off in one pass.
We don't really need the restarts - one alias can't pin another
(directory has only one alias, and couldn't be its own ancestor anyway),
so collecting everything that is not busy and taking it out would
take care of everything evictable that had been there as we entered
the function. And new aliases added while we'd been dropping old ones
could just as easily have appeared right as we return to caller...
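
The diff for this one is not quoted in this thread; the end result is roughly the
following (a sketch of the reworked d_prune_aliases(), assuming the
to_shrink_list()/shrink_dentry_list() semantics established earlier in the series):

void d_prune_aliases(struct inode *inode)
{
        LIST_HEAD(dispose);
        struct dentry *dentry;

        spin_lock(&inode->i_lock);
        hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
                spin_lock(&dentry->d_lock);
                if (!dentry->d_lockref.count)
                        to_shrink_list(dentry, &dispose);
                spin_unlock(&dentry->d_lock);
        }
        spin_unlock(&inode->i_lock);
        shrink_dentry_list(&dispose);
}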

20/21) __dentry_kill(): new locking scheme
Currently we enter __dentry_kill() with parent (along with the
victim dentry and victim's inode) held locked. Then we
mark dentry refcount as dead
call ->d_prune()
remove dentry from hash
remove it from the parent's list of children
unlock the parent, don't need it from that point on
detach dentry from inode,
unlock dentry and drop the inode (via ->d_iput())
call ->d_release()
regain the lock on dentry
check if it's on a shrink list (in which case freeing its empty
husk has to be left to shrink_dentry_list()) or not (in which
case we can free it ourselves). In the former case, mark it
as an empty husk, so that shrink_dentry_list() would know it
can free the sucker.
drop the lock on dentry
... and usually the caller proceeds to drop a reference on the parent,
possibly retaking the lock on it.
That is painful for a bunch of reasons, starting with the need
to take locks out of order, but not limited to that - the parent of
positive dentry can change if we drop its ->d_lock, so getting these
locks has to be done with care.
Moreover, as soon as dentry is out of the parent's list of
children, shrink_dcache_for_umount() won't see it anymore, making it
appear as if the parent is inexplicably busy. We do work around that
by having shrink_dentry_list() decrement the parent's refcount first and
put it on a shrink list to be evicted once we are done with __dentry_kill()
of the child, but that may in some cases lead to ->d_iput() on the child called
after the parent got killed. That doesn't happen in cases where in-tree
->d_iput() instances might want to look at the parent, but that's brittle
as hell.
Solution: do the removal from the parent's list of children at the
very end of __dentry_kill(). As a result, the callers do not need to
lock the parent and by the time we really need the parent locked, dentry
is negative and is guaranteed not to be moved around.
It does mean that ->d_prune() will be called with parent not
locked. It also means that we might see dentries in process of being torn
down while going through the parent's list of children; those dentries
will be unhashed, negative and with refcount marked dead. In practice,
that's enough for in-tree code that looks through the list of children
to do the right thing as-is. Out-of-tree code might need to be adjusted.
Calling conventions: __dentry_kill(dentry) is called with
dentry->d_lock held, along with ->i_lock of its inode (if any).
It either returns the parent (locked, with refcount decremented to 0)
or NULL (if there'd been no parent or if refcount decrement for parent
hadn't reached 0).
lock_for_kill() is adjusted for new requirements - it doesn't
touch the parent's ->d_lock at all.
Callers adjusted. Note that for dput() we don't need to
bother with fast_dput() for the parent - we just need to check
retain_dentry() for it, since its ->d_lock is still held since the
moment when __dentry_kill() had taken it to remove the victim from the
list of children.
The kludge with early decrement of parent's refcount in
shrink_dentry_list() is no longer needed - shrink_dcache_for_umount()
sees the half-killed dentries in the list of children for as long as
they are pinning the parent. They are easily recognized and accounted
for by select_collect(), so we know we are not done yet.
As a result, we always have the expected ordering
for ->d_iput()/->d_release() vs. __dentry_kill() of the parent, no
exceptions. Moreover, the current rules for shrink lists (one must make
sure that shrink_dcache_for_umount() won't happen while any dentries
from the superblock in question are on any shrink lists) are gone -
shrink_dcache_for_umount() will do the right thing in all cases, taking
such dentries out. Their empty husks (memory occupied by struct dentry
itself + its external name, if any) will remain on the shrink lists,
but they are no obstacles to filesystem shutdown. And such husks will
get freed as soon as shrink_dentry_list() of the list they are on gets
to them.
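
For contrast with the old sequence listed above, the order of operations in the
reworked __dentry_kill() comes out roughly as follows (heavily condensed from the
diff in patch 20 below; cond_resched() and the DCACHE_MAY_FREE/shrink-list
details are omitted):

static struct dentry *__dentry_kill(struct dentry *dentry)
{
        struct dentry *parent = NULL;

        lockref_mark_dead(&dentry->d_lockref);          /* refcount marked dead */
        if (dentry->d_flags & DCACHE_OP_PRUNE)
                dentry->d_op->d_prune(dentry);          /* parent is *not* locked anymore */
        __d_drop(dentry);                               /* out of the hash */
        if (dentry->d_inode)
                dentry_unlink_inode(dentry);            /* drops ->d_lock, ->i_lock; calls ->d_iput() */
        else
                spin_unlock(&dentry->d_lock);
        if (dentry->d_op && dentry->d_op->d_release)
                dentry->d_op->d_release(dentry);

        /* dentry is negative now, so ->d_parent is stable */
        if (!IS_ROOT(dentry)) {
                parent = dentry->d_parent;
                spin_lock(&parent->d_lock);
        }
        spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
        dentry_unlist(dentry);                          /* only now leave the parent's list of children */
        spin_unlock(&dentry->d_lock);
        dentry_free(dentry);                            /* or leave the husk to shrink_dentry_list() */

        if (parent && --parent->d_lockref.count) {
                spin_unlock(&parent->d_lock);
                return NULL;
        }
        return parent;  /* locked, with refcount at 0 - caller decides its fate */
}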

21/21) retain_dentry(): introduce a trimmed-down lockless variant

fast_dput() contains a small piece of code, preceded by scary
comments about 5 times longer than it. What is actually done there is
a trimmed-down subset of retain_dentry() - in some situations we can
tell that retain_dentry() would have returned true without ever needing
->d_lock and that's what that code checks. If these checks come true
fast_dput() can declare that we are done, without bothering with ->d_lock;
otherwise it has to take the lock and do full variant of retain_dentry()
checks.
The trimmed-down variant of the checks is hard to follow and
is asking for trouble - if we ever decide to change the rules in
retain_dentry(), we'll have to remember to update that code. It turns
out that an equivalent variant of these checks, more obviously parallel
to retain_dentry(), is not just possible, but easy to unify with
retain_dentry() itself, passing it a new boolean argument ('locked')
to distinguish between the full semantics and the trimmed-down one.
Note that in the lockless case true is returned only when the locked
variant would have returned true without ever needing the lock; false
means "punt to the locking path and recheck there".
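
The patch itself is in a followup; purely as an illustration of the shape, the
unified helper ends up looking roughly like this (a sketch, not the committed
code - in particular, the real thing is more careful about how it reads and
masks ->d_flags):

static inline bool retain_dentry(struct dentry *dentry, bool locked)
{
        unsigned int d_flags = READ_ONCE(dentry->d_flags);

        /* unreachable? no point retaining it */
        if (unlikely(d_unhashed(dentry)))
                return false;
        if (unlikely(d_flags & DCACHE_DISCONNECTED))
                return false;

        if (unlikely(d_flags & DCACHE_OP_DELETE)) {
                /* calling ->d_delete() needs ->d_lock - lockless callers punt */
                if (!locked || dentry->d_op->d_delete(dentry))
                        return false;
        }
        if (unlikely(d_flags & DCACHE_DONTCACHE))
                return false;

        /* retain; LRU maintenance also needs ->d_lock */
        if (unlikely(!(d_flags & DCACHE_LRU_LIST))) {
                if (!locked)
                        return false;
                d_lru_add(dentry);
        } else if (unlikely(!(d_flags & DCACHE_REFERENCED))) {
                if (!locked)
                        return false;
                dentry->d_flags |= DCACHE_REFERENCED;
        }
        return true;
}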


2023-11-24 06:04:57

by Al Viro

Subject: [PATCH v3 01/21] switch nfsd_client_rmdir() to use of simple_recursive_removal()

nfsd_client_rmdir() open-codes a subset of simple_recursive_removal().
Conversion to calling simple_recursive_removal() allows us to clean things
up quite a bit.

While we are at it, nfsdfs_create_files() doesn't need to mess with "pick
the reference to struct nfsdfs_client from the already created parent" -
the caller already knows it (that's where the parent got it from,
after all), so we might as well just pass it as an explicit argument.
So __get_nfsdfs_client() is only needed in get_nfsdfs_client() and
can be folded in there.

Incidentally, the locking in get_nfsdfs_client() is too heavy - we don't
need ->i_rwsem for that, ->i_lock serves just fine.

Reviewed-by: Jeff Layton <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Acked-by: Chuck Lever <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/nfsd/nfsctl.c | 70 ++++++++++--------------------------------------
1 file changed, 14 insertions(+), 56 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 3e15b72f421d..50df5e9d0069 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1236,63 +1236,34 @@ static inline void _nfsd_symlink(struct dentry *parent, const char *name,

#endif

-static void clear_ncl(struct inode *inode)
+static void clear_ncl(struct dentry *dentry)
{
+ struct inode *inode = d_inode(dentry);
struct nfsdfs_client *ncl = inode->i_private;

+ spin_lock(&inode->i_lock);
inode->i_private = NULL;
+ spin_unlock(&inode->i_lock);
kref_put(&ncl->cl_ref, ncl->cl_release);
}

-static struct nfsdfs_client *__get_nfsdfs_client(struct inode *inode)
-{
- struct nfsdfs_client *nc = inode->i_private;
-
- if (nc)
- kref_get(&nc->cl_ref);
- return nc;
-}
-
struct nfsdfs_client *get_nfsdfs_client(struct inode *inode)
{
struct nfsdfs_client *nc;

- inode_lock_shared(inode);
- nc = __get_nfsdfs_client(inode);
- inode_unlock_shared(inode);
+ spin_lock(&inode->i_lock);
+ nc = inode->i_private;
+ if (nc)
+ kref_get(&nc->cl_ref);
+ spin_unlock(&inode->i_lock);
return nc;
}
-/* from __rpc_unlink */
-static void nfsdfs_remove_file(struct inode *dir, struct dentry *dentry)
-{
- int ret;
-
- clear_ncl(d_inode(dentry));
- dget(dentry);
- ret = simple_unlink(dir, dentry);
- d_drop(dentry);
- fsnotify_unlink(dir, dentry);
- dput(dentry);
- WARN_ON_ONCE(ret);
-}
-
-static void nfsdfs_remove_files(struct dentry *root)
-{
- struct dentry *dentry, *tmp;
-
- list_for_each_entry_safe(dentry, tmp, &root->d_subdirs, d_child) {
- if (!simple_positive(dentry)) {
- WARN_ON_ONCE(1); /* I think this can't happen? */
- continue;
- }
- nfsdfs_remove_file(d_inode(root), dentry);
- }
-}

/* XXX: cut'n'paste from simple_fill_super; figure out if we could share
* code instead. */
static int nfsdfs_create_files(struct dentry *root,
const struct tree_descr *files,
+ struct nfsdfs_client *ncl,
struct dentry **fdentries)
{
struct inode *dir = d_inode(root);
@@ -1311,8 +1282,9 @@ static int nfsdfs_create_files(struct dentry *root,
dput(dentry);
goto out;
}
+ kref_get(&ncl->cl_ref);
inode->i_fop = files->ops;
- inode->i_private = __get_nfsdfs_client(dir);
+ inode->i_private = ncl;
d_add(dentry, inode);
fsnotify_create(dir, dentry);
if (fdentries)
@@ -1321,7 +1293,6 @@ static int nfsdfs_create_files(struct dentry *root,
inode_unlock(dir);
return 0;
out:
- nfsdfs_remove_files(root);
inode_unlock(dir);
return -ENOMEM;
}
@@ -1341,7 +1312,7 @@ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn,
dentry = nfsd_mkdir(nn->nfsd_client_dir, ncl, name);
if (IS_ERR(dentry)) /* XXX: tossing errors? */
return NULL;
- ret = nfsdfs_create_files(dentry, files, fdentries);
+ ret = nfsdfs_create_files(dentry, files, ncl, fdentries);
if (ret) {
nfsd_client_rmdir(dentry);
return NULL;
@@ -1352,20 +1323,7 @@ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn,
/* Taken from __rpc_rmdir: */
void nfsd_client_rmdir(struct dentry *dentry)
{
- struct inode *dir = d_inode(dentry->d_parent);
- struct inode *inode = d_inode(dentry);
- int ret;
-
- inode_lock(dir);
- nfsdfs_remove_files(dentry);
- clear_ncl(inode);
- dget(dentry);
- ret = simple_rmdir(dir, dentry);
- WARN_ON_ONCE(ret);
- d_drop(dentry);
- fsnotify_rmdir(dir, dentry);
- dput(dentry);
- inode_unlock(dir);
+ simple_recursive_removal(dentry, clear_ncl);
}

static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
--
2.39.2

2023-11-24 06:04:57

by Al Viro

Subject: [PATCH v3 04/21] centralize killing dentry from shrink list

new helper unifying identical bits of shrink_dentry_list() and
shrink_dcache_for_umount()

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 59f76c9a15d1..bb862a304e1b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1174,10 +1174,18 @@ static bool shrink_lock_dentry(struct dentry *dentry)
return false;
}

+static inline void shrink_kill(struct dentry *victim, struct list_head *list)
+{
+ struct dentry *parent = victim->d_parent;
+ if (parent != victim)
+ __dput_to_list(parent, list);
+ __dentry_kill(victim);
+}
+
void shrink_dentry_list(struct list_head *list)
{
while (!list_empty(list)) {
- struct dentry *dentry, *parent;
+ struct dentry *dentry;

dentry = list_entry(list->prev, struct dentry, d_lru);
spin_lock(&dentry->d_lock);
@@ -1195,10 +1203,7 @@ void shrink_dentry_list(struct list_head *list)
}
rcu_read_unlock();
d_shrink_del(dentry);
- parent = dentry->d_parent;
- if (parent != dentry)
- __dput_to_list(parent, list);
- __dentry_kill(dentry);
+ shrink_kill(dentry, list);
}
}

@@ -1629,17 +1634,13 @@ void shrink_dcache_parent(struct dentry *parent)
data.victim = NULL;
d_walk(parent, &data, select_collect2);
if (data.victim) {
- struct dentry *parent;
spin_lock(&data.victim->d_lock);
if (!shrink_lock_dentry(data.victim)) {
spin_unlock(&data.victim->d_lock);
rcu_read_unlock();
} else {
rcu_read_unlock();
- parent = data.victim->d_parent;
- if (parent != data.victim)
- __dput_to_list(parent, &data.dispose);
- __dentry_kill(data.victim);
+ shrink_kill(data.victim, &data.dispose);
}
}
if (!list_empty(&data.dispose))
--
2.39.2

2023-11-24 06:05:04

by Al Viro

Subject: [PATCH v3 02/21] coda_flag_children(): cope with dentries turning negative

->d_lock on parent does not stabilize ->d_inode of child.
We don't do much with that inode in there, but we need
at least to avoid struct inode getting freed under us...

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/coda/cache.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/coda/cache.c b/fs/coda/cache.c
index 3b8c4513118f..bfbc03c6b632 100644
--- a/fs/coda/cache.c
+++ b/fs/coda/cache.c
@@ -92,13 +92,16 @@ static void coda_flag_children(struct dentry *parent, int flag)
{
struct dentry *de;

+ rcu_read_lock();
spin_lock(&parent->d_lock);
list_for_each_entry(de, &parent->d_subdirs, d_child) {
+ struct inode *inode = d_inode_rcu(de);
/* don't know what to do with negative dentries */
- if (d_inode(de) )
- coda_flag_inode(d_inode(de), flag);
+ if (inode)
+ coda_flag_inode(inode, flag);
}
spin_unlock(&parent->d_lock);
+ rcu_read_unlock();
return;
}

--
2.39.2

2023-11-24 06:05:10

by Al Viro

Subject: [PATCH v3 08/21] fast_dput(): new rules for refcount

By now there is only one place in entire fast_dput() where we return
false; that happens after refcount had been decremented and found (under
->d_lock) to be zero. In that case, just prior to returning false to
caller, fast_dput() forcibly changes the refcount from 0 to 1.

Lift that resetting refcount to 1 into the callers; later in the series
it will be massaged out of existence.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9edabc7e2e64..a00e9ba22480 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -847,13 +847,6 @@ static inline bool fast_dput(struct dentry *dentry)
spin_unlock(&dentry->d_lock);
return true;
}
-
- /*
- * Re-get the reference we optimistically dropped. We hold the
- * lock, and we just tested that it was zero, so we can just
- * set it to 1.
- */
- dentry->d_lockref.count = 1;
return false;
}

@@ -896,6 +889,7 @@ void dput(struct dentry *dentry)
}

/* Slow case: now with the dentry lock held */
+ dentry->d_lockref.count = 1;
rcu_read_unlock();

if (likely(retain_dentry(dentry))) {
@@ -930,6 +924,7 @@ void dput_to_list(struct dentry *dentry, struct list_head *list)
return;
}
rcu_read_unlock();
+ dentry->d_lockref.count = 1;
if (!retain_dentry(dentry))
__dput_to_list(dentry, list);
spin_unlock(&dentry->d_lock);
--
2.39.2

2023-11-24 06:05:17

by Al Viro

Subject: [PATCH v3 09/21] __dput_to_list(): do decrement of refcount in the callers

... and rename it to to_shrink_list(), seeing that it no longer
drops any references

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a00e9ba22480..0718b3895c12 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -902,16 +902,13 @@ void dput(struct dentry *dentry)
}
EXPORT_SYMBOL(dput);

-static void __dput_to_list(struct dentry *dentry, struct list_head *list)
+static void to_shrink_list(struct dentry *dentry, struct list_head *list)
__must_hold(&dentry->d_lock)
{
- if (dentry->d_flags & DCACHE_SHRINK_LIST) {
- /* let the owner of the list it's on deal with it */
- --dentry->d_lockref.count;
- } else {
+ if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
if (dentry->d_flags & DCACHE_LRU_LIST)
d_lru_del(dentry);
- if (!--dentry->d_lockref.count)
+ if (!dentry->d_lockref.count)
d_shrink_add(dentry, list);
}
}
@@ -925,8 +922,10 @@ void dput_to_list(struct dentry *dentry, struct list_head *list)
}
rcu_read_unlock();
dentry->d_lockref.count = 1;
- if (!retain_dentry(dentry))
- __dput_to_list(dentry, list);
+ if (!retain_dentry(dentry)) {
+ --dentry->d_lockref.count;
+ to_shrink_list(dentry, list);
+ }
spin_unlock(&dentry->d_lock);
}

@@ -1165,8 +1164,10 @@ static bool shrink_lock_dentry(struct dentry *dentry)
static inline void shrink_kill(struct dentry *victim, struct list_head *list)
{
struct dentry *parent = victim->d_parent;
- if (parent != victim)
- __dput_to_list(parent, list);
+ if (parent != victim) {
+ --parent->d_lockref.count;
+ to_shrink_list(parent, list);
+ }
__dentry_kill(victim);
}

--
2.39.2

2023-11-24 06:05:24

by Al Viro

Subject: [PATCH v3 07/21] fast_dput(): handle underflows gracefully

If refcount is less than 1, we should just warn, unlock dentry and
return true, so that the caller doesn't try to do anything else.

Taking care of that leaves the rest of "lockref_put_return() has
failed" case equivalent to "decrement refcount and rejoin the
normal slow path after the point where we grab ->d_lock".

NOTE: lockref_put_return() is strictly a fastpath thing - unlike
the rest of lockref primitives, it does not contain a fallback.
Caller (and it looks like fast_dput() is the only legitimate one
in the entire kernel) has to do that itself. Reasons for
lockref_put_return() failures:
* ->d_lock held by somebody
* refcount <= 0
* ... or an architecture not supporting lockref use of
cmpxchg - sparc, anything non-SMP, config with spinlock debugging...

We could add a fallback, but it would be a clumsy API - we'd have
to distinguish between:
(1) refcount > 1 - decremented, lock not held on return
(2) refcount < 1 - left alone, probably no sense to hold the lock
(3) refcount is 1, no cmpxchg - decremented, lock held on return
(4) refcount is 1, cmpxchg supported - decremented, lock *NOT* held
on return.
We want to return with no lock held in case (4); that's the whole point of that
thing. We very much do not want to have the fallback in case (3) return without
a lock, since the caller might have to retake it in that case.
So it wouldn't be more convenient than doing the fallback in the caller and
it would be very easy to screw up, especially since the test coverage would
suck - no way to test (3) and (4) on the same kernel build.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 00c19041adf3..9edabc7e2e64 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -779,12 +779,12 @@ static inline bool fast_dput(struct dentry *dentry)
*/
if (unlikely(ret < 0)) {
spin_lock(&dentry->d_lock);
- if (dentry->d_lockref.count > 1) {
- dentry->d_lockref.count--;
+ if (WARN_ON_ONCE(dentry->d_lockref.count <= 0)) {
spin_unlock(&dentry->d_lock);
return true;
}
- return false;
+ dentry->d_lockref.count--;
+ goto locked;
}

/*
@@ -842,6 +842,7 @@ static inline bool fast_dput(struct dentry *dentry)
* else could have killed it and marked it dead. Either way, we
* don't need to do anything else.
*/
+locked:
if (dentry->d_lockref.count) {
spin_unlock(&dentry->d_lock);
return true;
--
2.39.2

2023-11-24 06:05:24

by Al Viro

Subject: [PATCH v3 10/21] make retain_dentry() neutral with respect to refcounting

retain_dentry() used to decrement refcount if and only if it returned
true. Lift those decrements into the callers.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 0718b3895c12..2e74f3f2ce2e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -680,7 +680,6 @@ static inline bool retain_dentry(struct dentry *dentry)
return false;

/* retain; LRU fodder */
- dentry->d_lockref.count--;
if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
d_lru_add(dentry);
else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED)))
@@ -744,6 +743,8 @@ static struct dentry *dentry_kill(struct dentry *dentry)
} else if (likely(!retain_dentry(dentry))) {
__dentry_kill(dentry);
return parent;
+ } else {
+ dentry->d_lockref.count--;
}
/* we are keeping it, after all */
if (inode)
@@ -893,6 +894,7 @@ void dput(struct dentry *dentry)
rcu_read_unlock();

if (likely(retain_dentry(dentry))) {
+ dentry->d_lockref.count--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -925,6 +927,8 @@ void dput_to_list(struct dentry *dentry, struct list_head *list)
if (!retain_dentry(dentry)) {
--dentry->d_lockref.count;
to_shrink_list(dentry, list);
+ } else {
+ --dentry->d_lockref.count;
}
spin_unlock(&dentry->d_lock);
}
--
2.39.2

2023-11-24 06:05:45

by Al Viro

Subject: [PATCH v3 12/21] dentry_kill(): don't bother with retain_dentry() on slow path

We have already checked it and dentry used to look not worthy
of keeping. The only hard obstacle to evicting dentry is
non-zero refcount; everything else is advisory - e.g. memory
pressure could evict any dentry found with refcount zero.
On the slow path in dentry_kill() we had dropped and regained
->d_lock; we must recheck the refcount, but everything else
is not worth bothering with.

Note that filesystem can not count upon ->d_delete() being
called for dentry - not even once. Again, memory pressure
(as well as d_prune_aliases(), or attempted rmdir() of ancestor,
or...) will not call ->d_delete() at all.

So from the correctness point of view we are fine doing the
check only once. And it makes things simpler down the road.

Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b527db8e5901..80992e49561c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -739,14 +739,10 @@ static struct dentry *dentry_kill(struct dentry *dentry)
spin_lock(&dentry->d_lock);
parent = lock_parent(dentry);
got_locks:
- if (unlikely(dentry->d_lockref.count != 1)) {
- dentry->d_lockref.count--;
- } else if (likely(!retain_dentry(dentry))) {
- dentry->d_lockref.count--;
+ dentry->d_lockref.count--;
+ if (likely(dentry->d_lockref.count == 0)) {
__dentry_kill(dentry);
return parent;
- } else {
- dentry->d_lockref.count--;
}
/* we are keeping it, after all */
if (inode)
--
2.39.2

2023-11-24 06:06:00

by Al Viro

Subject: [PATCH v3 11/21] __dentry_kill(): get consistent rules for victim's refcount

Currently we call it with refcount equal to 1 when called from
dentry_kill(); all other callers have it equal to 0.

Make it always be called with zero refcount; on this step we
just decrement it before the calls in dentry_kill(). That is
safe, since all places that care about the value of refcount
either do that under ->d_lock or hold a reference to dentry
in question. Either is sufficient to prevent observing a
dentry immediately prior to __dentry_kill() getting called
from dentry_kill().

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 2e74f3f2ce2e..b527db8e5901 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -729,6 +729,7 @@ static struct dentry *dentry_kill(struct dentry *dentry)
goto slow_positive;
}
}
+ dentry->d_lockref.count--;
__dentry_kill(dentry);
return parent;

@@ -741,6 +742,7 @@ static struct dentry *dentry_kill(struct dentry *dentry)
if (unlikely(dentry->d_lockref.count != 1)) {
dentry->d_lockref.count--;
} else if (likely(!retain_dentry(dentry))) {
+ dentry->d_lockref.count--;
__dentry_kill(dentry);
return parent;
} else {
--
2.39.2

2023-11-24 06:06:01

by Al Viro

Subject: [PATCH v3 13/21] Call retain_dentry() with refcount 0

Instead of bumping it from 0 to 1, calling retain_dentry(), then
decrementing it back to 0 (with ->d_lock held all the way through),
just leave refcount at 0 through all of that.

It will have a visible effect for ->d_delete() - now it can be
called with refcount 0 instead of 1 and it can no longer play
silly buggers with dropping/regaining ->d_lock. Not that any
in-tree instances tried to (it's pretty hard to get right).

Any out-of-tree ones will have to adjust (assuming they need any
changes).

Note that we do not need to extend rcu-critical area here - we have
verified that refcount is non-negative after having grabbed ->d_lock,
so nobody will be able to free dentry until they get into __dentry_kill(),
which won't happen until they manage to grab ->d_lock.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/porting.rst | 8 ++++++++
fs/dcache.c | 10 ++--------
2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 331405f4b29f..3714efcb6f65 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1070,3 +1070,11 @@ The list of children anchored in parent dentry got turned into hlist now.
Field names got changed (->d_children/->d_sib instead of ->d_subdirs/->d_child
for anchor/entries resp.), so any affected places will be immediately caught
by compiler.
+
+---
+
+**mandatory**
+
+ ->d_delete() instances are now called for dentries with ->d_lock held
+and refcount equal to 0. They are not permitted to drop/regain ->d_lock.
+None of in-tree instances did anything of that sort. Make sure yours do not...
diff --git a/fs/dcache.c b/fs/dcache.c
index 80992e49561c..8ce0fe70f303 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -888,15 +888,14 @@ void dput(struct dentry *dentry)
}

/* Slow case: now with the dentry lock held */
- dentry->d_lockref.count = 1;
rcu_read_unlock();

if (likely(retain_dentry(dentry))) {
- dentry->d_lockref.count--;
spin_unlock(&dentry->d_lock);
return;
}

+ dentry->d_lockref.count = 1;
dentry = dentry_kill(dentry);
}
}
@@ -921,13 +920,8 @@ void dput_to_list(struct dentry *dentry, struct list_head *list)
return;
}
rcu_read_unlock();
- dentry->d_lockref.count = 1;
- if (!retain_dentry(dentry)) {
- --dentry->d_lockref.count;
+ if (!retain_dentry(dentry))
to_shrink_list(dentry, list);
- } else {
- --dentry->d_lockref.count;
- }
spin_unlock(&dentry->d_lock);
}

--
2.39.2

2023-11-24 06:06:26

by Al Viro

Subject: [PATCH v3 14/21] fold the call of retain_dentry() into fast_dput()

Calls of retain_dentry() happen immediately after getting false
from fast_dput() and getting true from retain_dentry() is
treated the same way as non-zero refcount would be treated by
fast_dput() - unlock dentry and bugger off.

Doing that in fast_dput() itself is simpler.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 8ce0fe70f303..b69ff3a0b30f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -757,6 +757,8 @@ static struct dentry *dentry_kill(struct dentry *dentry)
* Try to do a lockless dput(), and return whether that was successful.
*
* If unsuccessful, we return false, having already taken the dentry lock.
+ * In that case refcount is guaranteed to be zero and we have already
+ * decided that it's not worth keeping around.
*
* The caller needs to hold the RCU read lock, so that the dentry is
* guaranteed to stay around even if the refcount goes down to zero!
@@ -842,7 +844,7 @@ static inline bool fast_dput(struct dentry *dentry)
* don't need to do anything else.
*/
locked:
- if (dentry->d_lockref.count) {
+ if (dentry->d_lockref.count || retain_dentry(dentry)) {
spin_unlock(&dentry->d_lock);
return true;
}
@@ -889,12 +891,6 @@ void dput(struct dentry *dentry)

/* Slow case: now with the dentry lock held */
rcu_read_unlock();
-
- if (likely(retain_dentry(dentry))) {
- spin_unlock(&dentry->d_lock);
- return;
- }
-
dentry->d_lockref.count = 1;
dentry = dentry_kill(dentry);
}
@@ -920,8 +916,7 @@ void dput_to_list(struct dentry *dentry, struct list_head *list)
return;
}
rcu_read_unlock();
- if (!retain_dentry(dentry))
- to_shrink_list(dentry, list);
+ to_shrink_list(dentry, list);
spin_unlock(&dentry->d_lock);
}

--
2.39.2

2023-11-24 06:06:39

by Al Viro

Subject: [PATCH v3 18/21] switch select_collect{,2}() to use of to_shrink_list()

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 704676bf06fd..f68fe7c863e0 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1495,13 +1495,9 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)

if (dentry->d_flags & DCACHE_SHRINK_LIST) {
data->found++;
- } else {
- if (dentry->d_flags & DCACHE_LRU_LIST)
- d_lru_del(dentry);
- if (!dentry->d_lockref.count) {
- d_shrink_add(dentry, &data->dispose);
- data->found++;
- }
+ } else if (!dentry->d_lockref.count) {
+ to_shrink_list(dentry, &data->dispose);
+ data->found++;
}
/*
* We can return to the caller if we have found some (this
@@ -1522,17 +1518,13 @@ static enum d_walk_ret select_collect2(void *_data, struct dentry *dentry)
if (data->start == dentry)
goto out;

- if (dentry->d_flags & DCACHE_SHRINK_LIST) {
- if (!dentry->d_lockref.count) {
+ if (!dentry->d_lockref.count) {
+ if (dentry->d_flags & DCACHE_SHRINK_LIST) {
rcu_read_lock();
data->victim = dentry;
return D_WALK_QUIT;
}
- } else {
- if (dentry->d_flags & DCACHE_LRU_LIST)
- d_lru_del(dentry);
- if (!dentry->d_lockref.count)
- d_shrink_add(dentry, &data->dispose);
+ to_shrink_list(dentry, &data->dispose);
}
/*
* We can return to the caller if we have found some (this
--
2.39.2

2023-11-24 06:06:50

by Al Viro

Subject: [PATCH v3 15/21] don't try to cut corners in shrink_lock_dentry()

That is to say, do *not* treat the ->d_inode or ->d_parent changes
as "it's hard, return false; somebody must have grabbed it, so
even if it has zero refcount, we don't need to bother killing it -
final dput() from whoever grabbed it would've done everything".

First of all, that is not guaranteed. It might have been dropped
by shrink_kill() handling of victim's parent, which would've found
it already on a shrink list (ours) and decided that they don't need
to put it on their shrink list.

What's more, dentry_kill() is doing pretty much the same thing,
cutting its own set of corners (it assumes that dentry can't
go from positive to negative, so its inode can change but only once
and only in one direction).

Doing that right allows us to get rid of that not-quite-duplication
and removes the only reason for re-incrementing refcount before
the call of dentry_kill().

Replacement is called lock_for_kill(); called under rcu_read_lock
and with ->d_lock held. If it returns false, dentry has non-zero
refcount and the same locks are held. If it returns true,
dentry has zero refcount and all locks required by __dentry_kill()
are taken.

Part of __lock_parent() had been lifted into lock_parent() to
allow its reuse. Now it's called with rcu_read_lock already
held and dentry already unlocked.

Note that this is not the final change - locking requirements for
__dentry_kill() are going to change later in the series and the
set of locks taken by lock_for_kill() will be adjusted. Both
lock_parent() and __lock_parent() will be gone once that happens.

Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 159 ++++++++++++++++++++++------------------------------
1 file changed, 66 insertions(+), 93 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b69ff3a0b30f..a7f99d46c41b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -625,8 +625,6 @@ static void __dentry_kill(struct dentry *dentry)
static struct dentry *__lock_parent(struct dentry *dentry)
{
struct dentry *parent;
- rcu_read_lock();
- spin_unlock(&dentry->d_lock);
again:
parent = READ_ONCE(dentry->d_parent);
spin_lock(&parent->d_lock);
@@ -642,7 +640,6 @@ static struct dentry *__lock_parent(struct dentry *dentry)
spin_unlock(&parent->d_lock);
goto again;
}
- rcu_read_unlock();
if (parent != dentry)
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
else
@@ -657,7 +654,64 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
return NULL;
if (likely(spin_trylock(&parent->d_lock)))
return parent;
- return __lock_parent(dentry);
+ rcu_read_lock();
+ spin_unlock(&dentry->d_lock);
+ parent = __lock_parent(dentry);
+ rcu_read_unlock();
+ return parent;
+}
+
+/*
+ * Lock a dentry for feeding it to __dentry_kill().
+ * Called under rcu_read_lock() and dentry->d_lock; the former
+ * guarantees that nothing we access will be freed under us.
+ * Note that dentry is *not* protected from concurrent dentry_kill(),
+ * d_delete(), etc.
+ *
+ * Return false if dentry is busy. Otherwise, return true and have
+ * that dentry's inode and parent both locked.
+ */
+
+static bool lock_for_kill(struct dentry *dentry)
+{
+ struct inode *inode = dentry->d_inode;
+ struct dentry *parent = dentry->d_parent;
+
+ if (unlikely(dentry->d_lockref.count))
+ return false;
+
+ if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+ goto slow;
+ if (dentry == parent)
+ return true;
+ if (likely(spin_trylock(&parent->d_lock)))
+ return true;
+
+ if (inode)
+ spin_unlock(&inode->i_lock);
+slow:
+ spin_unlock(&dentry->d_lock);
+
+ for (;;) {
+ if (inode)
+ spin_lock(&inode->i_lock);
+ parent = __lock_parent(dentry);
+ if (likely(inode == dentry->d_inode))
+ break;
+ if (inode)
+ spin_unlock(&inode->i_lock);
+ inode = dentry->d_inode;
+ spin_unlock(&dentry->d_lock);
+ if (parent)
+ spin_unlock(&parent->d_lock);
+ }
+ if (likely(!dentry->d_lockref.count))
+ return true;
+ if (inode)
+ spin_unlock(&inode->i_lock);
+ if (parent)
+ spin_unlock(&parent->d_lock);
+ return false;
}

static inline bool retain_dentry(struct dentry *dentry)
@@ -710,45 +764,16 @@ EXPORT_SYMBOL(d_mark_dontcache);
static struct dentry *dentry_kill(struct dentry *dentry)
__releases(dentry->d_lock)
{
- struct inode *inode = dentry->d_inode;
- struct dentry *parent = NULL;

- if (inode && unlikely(!spin_trylock(&inode->i_lock)))
- goto slow_positive;
-
- if (!IS_ROOT(dentry)) {
- parent = dentry->d_parent;
- if (unlikely(!spin_trylock(&parent->d_lock))) {
- parent = __lock_parent(dentry);
- if (likely(inode || !dentry->d_inode))
- goto got_locks;
- /* negative that became positive */
- if (parent)
- spin_unlock(&parent->d_lock);
- inode = dentry->d_inode;
- goto slow_positive;
- }
- }
dentry->d_lockref.count--;
- __dentry_kill(dentry);
- return parent;
-
-slow_positive:
- spin_unlock(&dentry->d_lock);
- spin_lock(&inode->i_lock);
- spin_lock(&dentry->d_lock);
- parent = lock_parent(dentry);
-got_locks:
- dentry->d_lockref.count--;
- if (likely(dentry->d_lockref.count == 0)) {
+ rcu_read_lock();
+ if (likely(lock_for_kill(dentry))) {
+ struct dentry *parent = dentry->d_parent;
+ rcu_read_unlock();
__dentry_kill(dentry);
- return parent;
+ return parent != dentry ? parent : NULL;
}
- /* we are keeping it, after all */
- if (inode)
- spin_unlock(&inode->i_lock);
- if (parent)
- spin_unlock(&parent->d_lock);
+ rcu_read_unlock();
spin_unlock(&dentry->d_lock);
return NULL;
}
@@ -1100,58 +1125,6 @@ void d_prune_aliases(struct inode *inode)
}
EXPORT_SYMBOL(d_prune_aliases);

-/*
- * Lock a dentry from shrink list.
- * Called under rcu_read_lock() and dentry->d_lock; the former
- * guarantees that nothing we access will be freed under us.
- * Note that dentry is *not* protected from concurrent dentry_kill(),
- * d_delete(), etc.
- *
- * Return false if dentry has been disrupted or grabbed, leaving
- * the caller to kick it off-list. Otherwise, return true and have
- * that dentry's inode and parent both locked.
- */
-static bool shrink_lock_dentry(struct dentry *dentry)
-{
- struct inode *inode;
- struct dentry *parent;
-
- if (dentry->d_lockref.count)
- return false;
-
- inode = dentry->d_inode;
- if (inode && unlikely(!spin_trylock(&inode->i_lock))) {
- spin_unlock(&dentry->d_lock);
- spin_lock(&inode->i_lock);
- spin_lock(&dentry->d_lock);
- if (unlikely(dentry->d_lockref.count))
- goto out;
- /* changed inode means that somebody had grabbed it */
- if (unlikely(inode != dentry->d_inode))
- goto out;
- }
-
- parent = dentry->d_parent;
- if (IS_ROOT(dentry) || likely(spin_trylock(&parent->d_lock)))
- return true;
-
- spin_unlock(&dentry->d_lock);
- spin_lock(&parent->d_lock);
- if (unlikely(parent != dentry->d_parent)) {
- spin_unlock(&parent->d_lock);
- spin_lock(&dentry->d_lock);
- goto out;
- }
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
- if (likely(!dentry->d_lockref.count))
- return true;
- spin_unlock(&parent->d_lock);
-out:
- if (inode)
- spin_unlock(&inode->i_lock);
- return false;
-}
-
static inline void shrink_kill(struct dentry *victim, struct list_head *list)
{
struct dentry *parent = victim->d_parent;
@@ -1170,7 +1143,7 @@ void shrink_dentry_list(struct list_head *list)
dentry = list_entry(list->prev, struct dentry, d_lru);
spin_lock(&dentry->d_lock);
rcu_read_lock();
- if (!shrink_lock_dentry(dentry)) {
+ if (!lock_for_kill(dentry)) {
bool can_free;
rcu_read_unlock();
d_shrink_del(dentry);
@@ -1614,7 +1587,7 @@ void shrink_dcache_parent(struct dentry *parent)
d_walk(parent, &data, select_collect2);
if (data.victim) {
spin_lock(&data.victim->d_lock);
- if (!shrink_lock_dentry(data.victim)) {
+ if (!lock_for_kill(data.victim)) {
spin_unlock(&data.victim->d_lock);
rcu_read_unlock();
} else {
--
2.39.2

2023-11-24 06:06:48

by Al Viro

Subject: [PATCH v3 20/21] __dentry_kill(): new locking scheme

Currently we enter __dentry_kill() with parent (along with the victim
dentry and victim's inode) held locked. Then we
mark dentry refcount as dead
call ->d_prune()
remove dentry from hash
remove it from the parent's list of children
unlock the parent, don't need it from that point on
detach dentry from inode, unlock dentry and drop the inode
(via ->d_iput())
call ->d_release()
regain the lock on dentry
check if it's on a shrink list (in which case freeing its empty husk
has to be left to shrink_dentry_list()) or not (in which case we can free it
ourselves). In the former case, mark it as an empty husk, so that
shrink_dentry_list() would know it can free the sucker.
drop the lock on dentry
... and usually the caller proceeds to drop a reference on the parent,
possibly retaking the lock on it.

That is painful for a bunch of reasons, starting with the need to take locks
out of order, but not limited to that - the parent of positive dentry can
change if we drop its ->d_lock, so getting these locks has to be done with
care. Moreover, as soon as dentry is out of the parent's list of children,
shrink_dcache_for_umount() won't see it anymore, making it appear as if
the parent is inexplicably busy. We do work around that by having
shrink_dentry_list() decrement the parent's refcount first and put it on
a shrink list to be evicted once we are done with __dentry_kill() of the child,
but that may in some cases lead to ->d_iput() on the child called after the
parent got killed. That doesn't happen in cases where in-tree ->d_iput()
instances might want to look at the parent, but that's brittle as hell.

Solution: do the removal from the parent's list of children at the very
end of __dentry_kill(). As a result, the callers do not need to
lock the parent and by the time we really need the parent locked,
dentry is negative and is guaranteed not to be moved around.

It does mean that ->d_prune() will be called with parent not locked.
It also means that we might see dentries in process of being torn
down while going through the parent's list of children; those dentries
will be unhashed, negative and with refcount marked dead. In practice,
that's enough for in-tree code that looks through the list of children
to do the right thing as-is. Out-of-tree code might need to be adjusted.

Calling conventions: __dentry_kill(dentry) is called with dentry->d_lock
held, along with ->i_lock of its inode (if any). It either returns
the parent (locked, with refcount decremented to 0) or NULL (if there'd
been no parent or if refcount decrement for parent hadn't reached 0).

lock_for_kill() is adjusted for new requirements - it doesn't touch
the parent's ->d_lock at all.

Callers adjusted. Note that for dput() we don't need to bother with
fast_dput() for the parent - we just need to check retain_dentry()
for it, since its ->d_lock is still held since the moment when
__dentry_kill() had taken it to remove the victim from the list of
children.

The kludge with early decrement of parent's refcount in
shrink_dentry_list() is no longer needed - shrink_dcache_for_umount()
sees the half-killed dentries in the list of children for as long
as they are pinning the parent. They are easily recognized and
accounted for by select_collect(), so we know we are not done yet.

As a result, we always have the expected ordering for ->d_iput()/->d_release()
vs. __dentry_kill() of the parent, no exceptions. Moreover, the current
rules for shrink lists (one must make sure that shrink_dcache_for_umount()
won't happen while any dentries from the superblock in question are on
any shrink lists) are gone - shrink_dcache_for_umount() will do the
right thing in all cases, taking such dentries out. Their empty
husks (memory occupied by struct dentry itself + its external name,
if any) will remain on the shrink lists, but they are no obstacles
to filesystem shutdown. And such husks will get freed as soon as
shrink_dentry_list() of the list they are on gets to them.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/porting.rst | 17 ++++
fs/dcache.c | 129 ++++++++++----------------
2 files changed, 65 insertions(+), 81 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 3714efcb6f65..c15e4dffa8af 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1078,3 +1078,20 @@ by compiler.
->d_delete() instances are now called for dentries with ->d_lock held
and refcount equal to 0. They are not permitted to drop/regain ->d_lock.
None of in-tree instances did anything of that sort. Make sure yours do not...
+
+--
+
+**mandatory**
+
+ ->d_prune() instances are now called without ->d_lock held on the parent.
+->d_lock on dentry itself is still held; if you need per-parent exclusions (none
+of the in-tree instances did), use your own spinlock.
+
+ ->d_iput() and ->d_release() are called with victim dentry still in the
+list of parent's children. It is still unhashed, marked killed, etc., just not
+removed from parent's ->d_children yet.
+
+ Anyone iterating through the list of children needs to be aware of the
+half-killed dentries that might be seen there; taking ->d_lock on those will
+see them negative, unhashed and with negative refcount, which means that most
+of the in-kernel users would've done the right thing anyway without any adjustment.
diff --git a/fs/dcache.c b/fs/dcache.c
index a3cc612a80d5..c795154ffa3a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -575,12 +575,10 @@ static inline void dentry_unlist(struct dentry *dentry)
}
}

-static void __dentry_kill(struct dentry *dentry)
+static struct dentry *__dentry_kill(struct dentry *dentry)
{
struct dentry *parent = NULL;
bool can_free = true;
- if (!IS_ROOT(dentry))
- parent = dentry->d_parent;

/*
* The dentry is now unrecoverably dead to the world.
@@ -600,9 +598,6 @@ static void __dentry_kill(struct dentry *dentry)
}
/* if it was on the hash then remove it */
__d_drop(dentry);
- dentry_unlist(dentry);
- if (parent)
- spin_unlock(&parent->d_lock);
if (dentry->d_inode)
dentry_unlink_inode(dentry);
else
@@ -611,7 +606,14 @@ static void __dentry_kill(struct dentry *dentry)
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);

- spin_lock(&dentry->d_lock);
+ cond_resched();
+ /* now that it's negative, ->d_parent is stable */
+ if (!IS_ROOT(dentry)) {
+ parent = dentry->d_parent;
+ spin_lock(&parent->d_lock);
+ }
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ dentry_unlist(dentry);
if (dentry->d_flags & DCACHE_SHRINK_LIST) {
dentry->d_flags |= DCACHE_MAY_FREE;
can_free = false;
@@ -619,31 +621,10 @@ static void __dentry_kill(struct dentry *dentry)
spin_unlock(&dentry->d_lock);
if (likely(can_free))
dentry_free(dentry);
- cond_resched();
-}
-
-static struct dentry *__lock_parent(struct dentry *dentry)
-{
- struct dentry *parent;
-again:
- parent = READ_ONCE(dentry->d_parent);
- spin_lock(&parent->d_lock);
- /*
- * We can't blindly lock dentry until we are sure
- * that we won't violate the locking order.
- * Any changes of dentry->d_parent must have
- * been done with parent->d_lock held, so
- * spin_lock() above is enough of a barrier
- * for checking if it's still our child.
- */
- if (unlikely(parent != dentry->d_parent)) {
+ if (parent && --parent->d_lockref.count) {
spin_unlock(&parent->d_lock);
- goto again;
+ return NULL;
}
- if (parent != dentry)
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
- else
- parent = NULL;
return parent;
}

@@ -655,48 +636,32 @@ static struct dentry *__lock_parent(struct dentry *dentry)
* d_delete(), etc.
*
* Return false if dentry is busy. Otherwise, return true and have
- * that dentry's inode and parent both locked.
+ * that dentry's inode locked.
*/

static bool lock_for_kill(struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
- struct dentry *parent = dentry->d_parent;

if (unlikely(dentry->d_lockref.count))
return false;

- if (inode && unlikely(!spin_trylock(&inode->i_lock)))
- goto slow;
- if (dentry == parent)
- return true;
- if (likely(spin_trylock(&parent->d_lock)))
+ if (!inode || likely(spin_trylock(&inode->i_lock)))
return true;

- if (inode)
- spin_unlock(&inode->i_lock);
-slow:
- spin_unlock(&dentry->d_lock);
-
- for (;;) {
- if (inode)
- spin_lock(&inode->i_lock);
- parent = __lock_parent(dentry);
+ do {
+ spin_unlock(&dentry->d_lock);
+ spin_lock(&inode->i_lock);
+ spin_lock(&dentry->d_lock);
if (likely(inode == dentry->d_inode))
break;
- if (inode)
- spin_unlock(&inode->i_lock);
+ spin_unlock(&inode->i_lock);
inode = dentry->d_inode;
- spin_unlock(&dentry->d_lock);
- if (parent)
- spin_unlock(&parent->d_lock);
- }
+ } while (inode);
if (likely(!dentry->d_lockref.count))
return true;
if (inode)
spin_unlock(&inode->i_lock);
- if (parent)
- spin_unlock(&parent->d_lock);
return false;
}

@@ -869,29 +834,27 @@ static inline bool fast_dput(struct dentry *dentry)
*/
void dput(struct dentry *dentry)
{
- while (dentry) {
- might_sleep();
-
- rcu_read_lock();
- if (likely(fast_dput(dentry))) {
- rcu_read_unlock();
+ if (!dentry)
+ return;
+ might_sleep();
+ rcu_read_lock();
+ if (likely(fast_dput(dentry))) {
+ rcu_read_unlock();
+ return;
+ }
+ while (lock_for_kill(dentry)) {
+ rcu_read_unlock();
+ dentry = __dentry_kill(dentry);
+ if (!dentry)
return;
- }
-
- /* Slow case: now with the dentry lock held */
- if (likely(lock_for_kill(dentry))) {
- struct dentry *parent = dentry->d_parent;
- rcu_read_unlock();
- __dentry_kill(dentry);
- if (dentry == parent)
- return;
- dentry = parent;
- } else {
- rcu_read_unlock();
+ if (retain_dentry(dentry)) {
spin_unlock(&dentry->d_lock);
return;
}
+ rcu_read_lock();
}
+ rcu_read_unlock();
+ spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL(dput);

@@ -1091,12 +1054,16 @@ void d_prune_aliases(struct inode *inode)
}
EXPORT_SYMBOL(d_prune_aliases);

-static inline void shrink_kill(struct dentry *victim, struct list_head *list)
+static inline void shrink_kill(struct dentry *victim)
{
- struct dentry *parent = victim->d_parent;
- if (parent != victim && !--parent->d_lockref.count)
- to_shrink_list(parent, list);
- __dentry_kill(victim);
+ do {
+ rcu_read_unlock();
+ victim = __dentry_kill(victim);
+ rcu_read_lock();
+ } while (victim && lock_for_kill(victim));
+ rcu_read_unlock();
+ if (victim)
+ spin_unlock(&victim->d_lock);
}

void shrink_dentry_list(struct list_head *list)
@@ -1117,9 +1084,8 @@ void shrink_dentry_list(struct list_head *list)
dentry_free(dentry);
continue;
}
- rcu_read_unlock();
d_shrink_del(dentry);
- shrink_kill(dentry, list);
+ shrink_kill(dentry);
}
}

@@ -1478,6 +1444,8 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
} else if (!dentry->d_lockref.count) {
to_shrink_list(dentry, &data->dispose);
data->found++;
+ } else if (dentry->d_lockref.count < 0) {
+ data->found++;
}
/*
* We can return to the caller if we have found some (this
@@ -1547,8 +1515,7 @@ void shrink_dcache_parent(struct dentry *parent)
spin_unlock(&data.victim->d_lock);
rcu_read_unlock();
} else {
- rcu_read_unlock();
- shrink_kill(data.victim, &data.dispose);
+ shrink_kill(data.victim);
}
}
if (!list_empty(&data.dispose))
--
2.39.2

2023-11-24 06:06:58

by Al Viro

[permalink] [raw]
Subject: [PATCH v3 17/21] to_shrink_list(): call only if refcount is 0

The only thing it does if the refcount is not zero is d_lru_del(); no
point in that, IMO, seeing that plain dput() does nothing of the sort...

Note that 2 of the 3 current callers are guaranteed that the refcount is 0.

Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5284b02747cd..704676bf06fd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -915,8 +915,7 @@ __must_hold(&dentry->d_lock)
if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
if (dentry->d_flags & DCACHE_LRU_LIST)
d_lru_del(dentry);
- if (!dentry->d_lockref.count)
- d_shrink_add(dentry, list);
+ d_shrink_add(dentry, list);
}
}

@@ -1115,10 +1114,8 @@ EXPORT_SYMBOL(d_prune_aliases);
static inline void shrink_kill(struct dentry *victim, struct list_head *list)
{
struct dentry *parent = victim->d_parent;
- if (parent != victim) {
- --parent->d_lockref.count;
+ if (parent != victim && !--parent->d_lockref.count)
to_shrink_list(parent, list);
- }
__dentry_kill(victim);
}

--
2.39.2

2023-11-24 06:07:19

by Al Viro

[permalink] [raw]
Subject: [PATCH v3 03/21] dentry: switch the lists of children to hlist

Saves a pointer per struct dentry and actually makes the things less
clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use
of hlist_for_... iterators.

A couple of new helpers - d_first_child() and d_next_sibling(),
to make the expressions less awful.
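
The typical patterns with the new field names and helpers end up looking
roughly like this (sketch only; parent->d_lock held as before):

	struct dentry *child;

	hlist_for_each_entry(child, &parent->d_children, d_sib) {
		/* per-child work */
	}

	/* or, stepping by hand: */
	for (child = d_first_child(parent); child; child = d_next_sibling(child)) {
		/* ... */
	}

instead of the old list_for_each_entry(child, &parent->d_subdirs, d_child).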

X-fuck-kABI: gladly
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/porting.rst | 9 +++
arch/powerpc/platforms/cell/spufs/inode.c | 5 +-
fs/afs/dynroot.c | 5 +-
fs/autofs/expire.c | 7 +--
fs/ceph/dir.c | 2 +-
fs/ceph/mds_client.c | 2 +-
fs/coda/cache.c | 2 +-
fs/dcache.c | 76 +++++++++++------------
fs/libfs.c | 45 +++++++-------
fs/notify/fsnotify.c | 2 +-
fs/tracefs/inode.c | 34 +++++-----
include/linux/dcache.h | 20 ++++--
12 files changed, 108 insertions(+), 101 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 878e72b2f8b7..331405f4b29f 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1061,3 +1061,12 @@ export_operations ->encode_fh() no longer has a default implementation to
encode FILEID_INO32_GEN* file handles.
Filesystems that used the default implementation may use the generic helper
generic_encode_ino32_fh() explicitly.
+
+---
+
+**mandatory**
+
+The list of children anchored in parent dentry got turned into hlist now.
+Field names got changed (->d_children/->d_sib instead of ->d_subdirs/->d_child
+for anchor/entries resp.), so any affected places will be immediately caught
+by compiler.
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 10c1320adfd0..030de2b8c145 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -145,10 +145,11 @@ spufs_evict_inode(struct inode *inode)

static void spufs_prune_dir(struct dentry *dir)
{
- struct dentry *dentry, *tmp;
+ struct dentry *dentry;
+ struct hlist_node *n;

inode_lock(d_inode(dir));
- list_for_each_entry_safe(dentry, tmp, &dir->d_subdirs, d_child) {
+ hlist_for_each_entry_safe(dentry, n, &dir->d_children, d_sib) {
spin_lock(&dentry->d_lock);
if (simple_positive(dentry)) {
dget_dlock(dentry);
diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c
index 4d04ef2d3ae7..fe45462834cc 100644
--- a/fs/afs/dynroot.c
+++ b/fs/afs/dynroot.c
@@ -370,7 +370,7 @@ int afs_dynroot_populate(struct super_block *sb)
void afs_dynroot_depopulate(struct super_block *sb)
{
struct afs_net *net = afs_sb2net(sb);
- struct dentry *root = sb->s_root, *subdir, *tmp;
+ struct dentry *root = sb->s_root, *subdir;

/* Prevent more subdirs from being created */
mutex_lock(&net->proc_cells_lock);
@@ -379,10 +379,11 @@ void afs_dynroot_depopulate(struct super_block *sb)
mutex_unlock(&net->proc_cells_lock);

if (root) {
+ struct hlist_node *n;
inode_lock(root->d_inode);

/* Remove all the pins for dirs created for manually added cells */
- list_for_each_entry_safe(subdir, tmp, &root->d_subdirs, d_child) {
+ hlist_for_each_entry_safe(subdir, n, &root->d_children, d_sib) {
if (subdir->d_fsdata) {
subdir->d_fsdata = NULL;
dput(subdir);
diff --git a/fs/autofs/expire.c b/fs/autofs/expire.c
index 038b3d2d9f57..39d8c84c16f4 100644
--- a/fs/autofs/expire.c
+++ b/fs/autofs/expire.c
@@ -73,12 +73,9 @@ static int autofs_mount_busy(struct vfsmount *mnt,
/* p->d_lock held */
static struct dentry *positive_after(struct dentry *p, struct dentry *child)
{
- if (child)
- child = list_next_entry(child, d_child);
- else
- child = list_first_entry(&p->d_subdirs, struct dentry, d_child);
+ child = child ? d_next_sibling(child) : d_first_child(p);

- list_for_each_entry_from(child, &p->d_subdirs, d_child) {
+ hlist_for_each_entry_from(child, d_sib) {
spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
if (simple_positive(child)) {
dget_dlock(child);
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 91709934c8b1..678596684596 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -174,7 +174,7 @@ __dcache_find_get_entry(struct dentry *parent, u64 idx,
/*
* When possible, we try to satisfy a readdir by peeking at the
* dcache. We make this work by carefully ordering dentries on
- * d_child when we initially get results back from the MDS, and
+ * d_children when we initially get results back from the MDS, and
* falling back to a "normal" sync readdir if any dentries in the dir
* are dropped.
*
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index d95eb525519a..02ebfabfc8ee 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2128,7 +2128,7 @@ static bool drop_negative_children(struct dentry *dentry)
goto out;

spin_lock(&dentry->d_lock);
- list_for_each_entry(child, &dentry->d_subdirs, d_child) {
+ hlist_for_each_entry(child, &dentry->d_children, d_sib) {
if (d_really_is_positive(child)) {
all_negative = false;
break;
diff --git a/fs/coda/cache.c b/fs/coda/cache.c
index bfbc03c6b632..f5b71a35f9db 100644
--- a/fs/coda/cache.c
+++ b/fs/coda/cache.c
@@ -94,7 +94,7 @@ static void coda_flag_children(struct dentry *parent, int flag)

rcu_read_lock();
spin_lock(&parent->d_lock);
- list_for_each_entry(de, &parent->d_subdirs, d_child) {
+ hlist_for_each_entry(de, &parent->d_children, d_sib) {
struct inode *inode = d_inode_rcu(de);
/* don't know what to do with negative dentries */
if (inode)
diff --git a/fs/dcache.c b/fs/dcache.c
index c82ae731df9a..59f76c9a15d1 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -51,8 +51,8 @@
* - d_lru
* - d_count
* - d_unhashed()
- * - d_parent and d_subdirs
- * - childrens' d_child and d_parent
+ * - d_parent and d_children
+ * - childrens' d_sib and d_parent
* - d_u.d_alias, d_inode
*
* Ordering:
@@ -537,7 +537,7 @@ void d_drop(struct dentry *dentry)
}
EXPORT_SYMBOL(d_drop);

-static inline void dentry_unlist(struct dentry *dentry, struct dentry *parent)
+static inline void dentry_unlist(struct dentry *dentry)
{
struct dentry *next;
/*
@@ -545,12 +545,12 @@ static inline void dentry_unlist(struct dentry *dentry, struct dentry *parent)
* attached to the dentry tree
*/
dentry->d_flags |= DCACHE_DENTRY_KILLED;
- if (unlikely(list_empty(&dentry->d_child)))
+ if (unlikely(hlist_unhashed(&dentry->d_sib)))
return;
- __list_del_entry(&dentry->d_child);
+ __hlist_del(&dentry->d_sib);
/*
* Cursors can move around the list of children. While we'd been
- * a normal list member, it didn't matter - ->d_child.next would've
+ * a normal list member, it didn't matter - ->d_sib.next would've
* been updated. However, from now on it won't be and for the
* things like d_walk() it might end up with a nasty surprise.
* Normally d_walk() doesn't care about cursors moving around -
@@ -558,20 +558,20 @@ static inline void dentry_unlist(struct dentry *dentry, struct dentry *parent)
* of its own, we get through it without ever unlocking the parent.
* There is one exception, though - if we ascend from a child that
* gets killed as soon as we unlock it, the next sibling is found
- * using the value left in its ->d_child.next. And if _that_
+ * using the value left in its ->d_sib.next. And if _that_
* pointed to a cursor, and cursor got moved (e.g. by lseek())
* before d_walk() regains parent->d_lock, we'll end up skipping
* everything the cursor had been moved past.
*
- * Solution: make sure that the pointer left behind in ->d_child.next
+ * Solution: make sure that the pointer left behind in ->d_sib.next
* points to something that won't be moving around. I.e. skip the
* cursors.
*/
- while (dentry->d_child.next != &parent->d_subdirs) {
- next = list_entry(dentry->d_child.next, struct dentry, d_child);
+ while (dentry->d_sib.next) {
+ next = hlist_entry(dentry->d_sib.next, struct dentry, d_sib);
if (likely(!(next->d_flags & DCACHE_DENTRY_CURSOR)))
break;
- dentry->d_child.next = next->d_child.next;
+ dentry->d_sib.next = next->d_sib.next;
}
}

@@ -600,7 +600,7 @@ static void __dentry_kill(struct dentry *dentry)
}
/* if it was on the hash then remove it */
__d_drop(dentry);
- dentry_unlist(dentry, parent);
+ dentry_unlist(dentry);
if (parent)
spin_unlock(&parent->d_lock);
if (dentry->d_inode)
@@ -1348,8 +1348,7 @@ enum d_walk_ret {
static void d_walk(struct dentry *parent, void *data,
enum d_walk_ret (*enter)(void *, struct dentry *))
{
- struct dentry *this_parent;
- struct list_head *next;
+ struct dentry *this_parent, *dentry;
unsigned seq = 0;
enum d_walk_ret ret;
bool retry = true;
@@ -1371,13 +1370,9 @@ static void d_walk(struct dentry *parent, void *data,
break;
}
repeat:
- next = this_parent->d_subdirs.next;
+ dentry = d_first_child(this_parent);
resume:
- while (next != &this_parent->d_subdirs) {
- struct list_head *tmp = next;
- struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
- next = tmp->next;
-
+ hlist_for_each_entry_from(dentry, d_sib) {
if (unlikely(dentry->d_flags & DCACHE_DENTRY_CURSOR))
continue;

@@ -1398,7 +1393,7 @@ static void d_walk(struct dentry *parent, void *data,
continue;
}

- if (!list_empty(&dentry->d_subdirs)) {
+ if (!hlist_empty(&dentry->d_children)) {
spin_unlock(&this_parent->d_lock);
spin_release(&dentry->d_lock.dep_map, _RET_IP_);
this_parent = dentry;
@@ -1413,24 +1408,23 @@ static void d_walk(struct dentry *parent, void *data,
rcu_read_lock();
ascend:
if (this_parent != parent) {
- struct dentry *child = this_parent;
- this_parent = child->d_parent;
+ dentry = this_parent;
+ this_parent = dentry->d_parent;

- spin_unlock(&child->d_lock);
+ spin_unlock(&dentry->d_lock);
spin_lock(&this_parent->d_lock);

/* might go back up the wrong parent if we have had a rename. */
if (need_seqretry(&rename_lock, seq))
goto rename_retry;
/* go into the first sibling still alive */
- do {
- next = child->d_child.next;
- if (next == &this_parent->d_subdirs)
- goto ascend;
- child = list_entry(next, struct dentry, d_child);
- } while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED));
- rcu_read_unlock();
- goto resume;
+ hlist_for_each_entry_continue(dentry, d_sib) {
+ if (likely(!(dentry->d_flags & DCACHE_DENTRY_KILLED))) {
+ rcu_read_unlock();
+ goto resume;
+ }
+ }
+ goto ascend;
}
if (need_seqretry(&rename_lock, seq))
goto rename_retry;
@@ -1530,7 +1524,7 @@ int d_set_mounted(struct dentry *dentry)
* Search the dentry child list of the specified parent,
* and move any unused dentries to the end of the unused
* list for prune_dcache(). We descend to the next level
- * whenever the d_subdirs list is non-empty and continue
+ * whenever the d_children list is non-empty and continue
* searching.
*
* It returns zero iff there are no unused children,
@@ -1657,7 +1651,7 @@ EXPORT_SYMBOL(shrink_dcache_parent);
static enum d_walk_ret umount_check(void *_data, struct dentry *dentry)
{
/* it has busy descendents; complain about those instead */
- if (!list_empty(&dentry->d_subdirs))
+ if (!hlist_empty(&dentry->d_children))
return D_WALK_CONTINUE;

/* root with refcount 1 is fine */
@@ -1814,9 +1808,9 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
dentry->d_fsdata = NULL;
INIT_HLIST_BL_NODE(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
- INIT_LIST_HEAD(&dentry->d_subdirs);
+ INIT_HLIST_HEAD(&dentry->d_children);
INIT_HLIST_NODE(&dentry->d_u.d_alias);
- INIT_LIST_HEAD(&dentry->d_child);
+ INIT_HLIST_NODE(&dentry->d_sib);
d_set_d_op(dentry, dentry->d_sb->s_d_op);

if (dentry->d_op && dentry->d_op->d_init) {
@@ -1855,7 +1849,7 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
*/
__dget_dlock(parent);
dentry->d_parent = parent;
- list_add(&dentry->d_child, &parent->d_subdirs);
+ hlist_add_head(&dentry->d_sib, &parent->d_children);
spin_unlock(&parent->d_lock);

return dentry;
@@ -2993,11 +2987,15 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
} else {
target->d_parent = old_parent;
swap_names(dentry, target);
- list_move(&target->d_child, &target->d_parent->d_subdirs);
+ if (!hlist_unhashed(&target->d_sib))
+ __hlist_del(&target->d_sib);
+ hlist_add_head(&target->d_sib, &target->d_parent->d_children);
__d_rehash(target);
fsnotify_update_flags(target);
}
- list_move(&dentry->d_child, &dentry->d_parent->d_subdirs);
+ if (!hlist_unhashed(&dentry->d_sib))
+ __hlist_del(&dentry->d_sib);
+ hlist_add_head(&dentry->d_sib, &dentry->d_parent->d_children);
__d_rehash(dentry);
fsnotify_update_flags(dentry);
fscrypt_handle_d_move(dentry);
diff --git a/fs/libfs.c b/fs/libfs.c
index e9440d55073c..46c9177769c1 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -104,15 +104,16 @@ EXPORT_SYMBOL(dcache_dir_close);
* If no such element exists, NULL is returned.
*/
static struct dentry *scan_positives(struct dentry *cursor,
- struct list_head *p,
+ struct hlist_node **p,
loff_t count,
struct dentry *last)
{
struct dentry *dentry = cursor->d_parent, *found = NULL;

spin_lock(&dentry->d_lock);
- while ((p = p->next) != &dentry->d_subdirs) {
- struct dentry *d = list_entry(p, struct dentry, d_child);
+ while (*p) {
+ struct dentry *d = hlist_entry(*p, struct dentry, d_sib);
+ p = &d->d_sib.next;
// we must at least skip cursors, to avoid livelocks
if (d->d_flags & DCACHE_DENTRY_CURSOR)
continue;
@@ -126,8 +127,10 @@ static struct dentry *scan_positives(struct dentry *cursor,
count = 1;
}
if (need_resched()) {
- list_move(&cursor->d_child, p);
- p = &cursor->d_child;
+ if (!hlist_unhashed(&cursor->d_sib))
+ __hlist_del(&cursor->d_sib);
+ hlist_add_behind(&cursor->d_sib, &d->d_sib);
+ p = &cursor->d_sib.next;
spin_unlock(&dentry->d_lock);
cond_resched();
spin_lock(&dentry->d_lock);
@@ -159,13 +162,12 @@ loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
inode_lock_shared(dentry->d_inode);

if (offset > 2)
- to = scan_positives(cursor, &dentry->d_subdirs,
+ to = scan_positives(cursor, &dentry->d_children.first,
offset - 2, NULL);
spin_lock(&dentry->d_lock);
+ hlist_del_init(&cursor->d_sib);
if (to)
- list_move(&cursor->d_child, &to->d_child);
- else
- list_del_init(&cursor->d_child);
+ hlist_add_behind(&cursor->d_sib, &to->d_sib);
spin_unlock(&dentry->d_lock);
dput(to);

@@ -187,19 +189,16 @@ int dcache_readdir(struct file *file, struct dir_context *ctx)
{
struct dentry *dentry = file->f_path.dentry;
struct dentry *cursor = file->private_data;
- struct list_head *anchor = &dentry->d_subdirs;
struct dentry *next = NULL;
- struct list_head *p;
+ struct hlist_node **p;

if (!dir_emit_dots(file, ctx))
return 0;

if (ctx->pos == 2)
- p = anchor;
- else if (!list_empty(&cursor->d_child))
- p = &cursor->d_child;
+ p = &dentry->d_children.first;
else
- return 0;
+ p = &cursor->d_sib.next;

while ((next = scan_positives(cursor, p, 1, next)) != NULL) {
if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
@@ -207,13 +206,12 @@ int dcache_readdir(struct file *file, struct dir_context *ctx)
fs_umode_to_dtype(d_inode(next)->i_mode)))
break;
ctx->pos++;
- p = &next->d_child;
+ p = &next->d_sib.next;
}
spin_lock(&dentry->d_lock);
+ hlist_del_init(&cursor->d_sib);
if (next)
- list_move_tail(&cursor->d_child, &next->d_child);
- else
- list_del_init(&cursor->d_child);
+ hlist_add_before(&cursor->d_sib, &next->d_sib);
spin_unlock(&dentry->d_lock);
dput(next);

@@ -492,12 +490,11 @@ const struct file_operations simple_offset_dir_operations = {

static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev)
{
- struct dentry *child = NULL;
- struct list_head *p = prev ? &prev->d_child : &parent->d_subdirs;
+ struct dentry *child = NULL, *d;

spin_lock(&parent->d_lock);
- while ((p = p->next) != &parent->d_subdirs) {
- struct dentry *d = container_of(p, struct dentry, d_child);
+ d = prev ? d_next_sibling(prev) : d_first_child(parent);
+ hlist_for_each_entry_from(d, d_sib) {
if (simple_positive(d)) {
spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
if (simple_positive(d))
@@ -658,7 +655,7 @@ int simple_empty(struct dentry *dentry)
int ret = 0;

spin_lock(&dentry->d_lock);
- list_for_each_entry(child, &dentry->d_subdirs, d_child) {
+ hlist_for_each_entry(child, &dentry->d_children, d_sib) {
spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
if (simple_positive(child)) {
spin_unlock(&child->d_lock);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 7974e91ffe13..8bfd690e9f10 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -124,7 +124,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
* d_flags to indicate parental interest (their parent is the
* original inode) */
spin_lock(&alias->d_lock);
- list_for_each_entry(child, &alias->d_subdirs, d_child) {
+ hlist_for_each_entry(child, &alias->d_children, d_sib) {
if (!child->d_inode)
continue;

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 5b54948514fe..61ca5fcf10f9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -199,26 +199,21 @@ static void change_gid(struct dentry *dentry, kgid_t gid)
*/
static void set_gid(struct dentry *parent, kgid_t gid)
{
- struct dentry *this_parent;
- struct list_head *next;
+ struct dentry *this_parent, *dentry;

this_parent = parent;
spin_lock(&this_parent->d_lock);

change_gid(this_parent, gid);
repeat:
- next = this_parent->d_subdirs.next;
+ dentry = d_first_child(this_parent);
resume:
- while (next != &this_parent->d_subdirs) {
- struct list_head *tmp = next;
- struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
- next = tmp->next;
-
+ hlist_for_each_entry_from(dentry, d_sib) {
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);

change_gid(dentry, gid);

- if (!list_empty(&dentry->d_subdirs)) {
+ if (!hlist_empty(&dentry->d_children)) {
spin_unlock(&this_parent->d_lock);
spin_release(&dentry->d_lock.dep_map, _RET_IP_);
this_parent = dentry;
@@ -233,21 +228,20 @@ static void set_gid(struct dentry *parent, kgid_t gid)
rcu_read_lock();
ascend:
if (this_parent != parent) {
- struct dentry *child = this_parent;
- this_parent = child->d_parent;
+ dentry = this_parent;
+ this_parent = dentry->d_parent;

- spin_unlock(&child->d_lock);
+ spin_unlock(&dentry->d_lock);
spin_lock(&this_parent->d_lock);

/* go into the first sibling still alive */
- do {
- next = child->d_child.next;
- if (next == &this_parent->d_subdirs)
- goto ascend;
- child = list_entry(next, struct dentry, d_child);
- } while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED));
- rcu_read_unlock();
- goto resume;
+ hlist_for_each_entry_continue(dentry, d_sib) {
+ if (likely(!(dentry->d_flags & DCACHE_DENTRY_KILLED))) {
+ rcu_read_unlock();
+ goto resume;
+ }
+ }
+ goto ascend;
}
rcu_read_unlock();
spin_unlock(&this_parent->d_lock);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3da2f0545d5d..0e397a0c519c 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -68,12 +68,12 @@ extern const struct qstr dotdot_name;
* large memory footprint increase).
*/
#ifdef CONFIG_64BIT
-# define DNAME_INLINE_LEN 32 /* 192 bytes */
+# define DNAME_INLINE_LEN 40 /* 192 bytes */
#else
# ifdef CONFIG_SMP
-# define DNAME_INLINE_LEN 36 /* 128 bytes */
-# else
# define DNAME_INLINE_LEN 40 /* 128 bytes */
+# else
+# define DNAME_INLINE_LEN 44 /* 128 bytes */
# endif
#endif

@@ -101,8 +101,8 @@ struct dentry {
struct list_head d_lru; /* LRU list */
wait_queue_head_t *d_wait; /* in-lookup ones only */
};
- struct list_head d_child; /* child of parent list */
- struct list_head d_subdirs; /* our children */
+ struct hlist_node d_sib; /* child of parent list */
+ struct hlist_head d_children; /* our children */
/*
* d_alias and d_rcu can share memory
*/
@@ -600,4 +600,14 @@ struct name_snapshot {
void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
void release_dentry_name_snapshot(struct name_snapshot *);

+static inline struct dentry *d_first_child(const struct dentry *dentry)
+{
+ return hlist_entry_safe(dentry->d_children.first, struct dentry, d_sib);
+}
+
+static inline struct dentry *d_next_sibling(const struct dentry *dentry)
+{
+ return hlist_entry_safe(dentry->d_sib.next, struct dentry, d_sib);
+}
+
#endif /* __LINUX_DCACHE_H */
--
2.39.2

2023-11-24 06:07:26

by Al Viro

[permalink] [raw]
Subject: [PATCH v3 21/21] retain_dentry(): introduce a trimmed-down lockless variant

fast_dput() contains a small piece of code, preceded by scary
comments about 5 times longer than it. What is actually done there is
a trimmed-down subset of retain_dentry() - in some situations we can
tell that retain_dentry() would have returned true without ever needing
->d_lock, and that's what that code checks. If those checks come out
true, fast_dput() can declare that we are done, without bothering with
->d_lock; otherwise it has to take the lock and do the full variant of
the retain_dentry() checks.

The trimmed-down variant of the checks is hard to follow and
it's asking for trouble - if we ever decide to change the rules in
retain_dentry(), we'll have to remember to update that code. It turns
out that an equivalent set of checks, more obviously parallel to
retain_dentry(), is not just possible but easy to unify with
retain_dentry() itself, by passing it a new boolean argument ('locked')
to distinguish between the full semantics and the trimmed-down one.

Note that in the lockless case true is returned only when the locked
variant would have returned true without ever needing the lock; false
means "punt to the locking path and recheck there".

Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 95 ++++++++++++++++++++++++++---------------------------
1 file changed, 47 insertions(+), 48 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c795154ffa3a..b212a65ed190 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -665,30 +665,57 @@ static bool lock_for_kill(struct dentry *dentry)
return false;
}

-static inline bool retain_dentry(struct dentry *dentry)
+/*
+ * Decide if dentry is worth retaining. Usually this is called with dentry
+ * locked; if not locked, we are more limited and might not be able to tell
+ * without a lock. False in this case means "punt to locked path and recheck".
+ *
+ * In case we aren't locked, these predicates are not "stable". However, it is
+ * sufficient that at some point after we dropped the reference the dentry was
+ * hashed and the flags had the proper value. Other dentry users may have
+ * re-gotten a reference to the dentry and change that, but our work is done -
+ * we can leave the dentry around with a zero refcount.
+ */
+static inline bool retain_dentry(struct dentry *dentry, bool locked)
{
- WARN_ON(d_in_lookup(dentry));
+ unsigned int d_flags;

- /* Unreachable? Get rid of it */
+ smp_rmb();
+ d_flags = READ_ONCE(dentry->d_flags);
+
+ // Unreachable? Nobody would be able to look it up, no point retaining
if (unlikely(d_unhashed(dentry)))
return false;

- if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
+ // Same if it's disconnected
+ if (unlikely(d_flags & DCACHE_DISCONNECTED))
return false;

- if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
- if (dentry->d_op->d_delete(dentry))
+ // ->d_delete() might tell us not to bother, but that requires
+ // ->d_lock; can't decide without it
+ if (unlikely(d_flags & DCACHE_OP_DELETE)) {
+ if (!locked || dentry->d_op->d_delete(dentry))
return false;
}

- if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
+ // Explicitly told not to bother
+ if (unlikely(d_flags & DCACHE_DONTCACHE))
return false;

- /* retain; LRU fodder */
- if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
+ // At this point it looks like we ought to keep it. We also might
+ // need to do something - put it on LRU if it wasn't there already
+ // and mark it referenced if it was on LRU, but not marked yet.
+ // Unfortunately, both actions require ->d_lock, so in lockless
+ // case we'd have to punt rather than doing those.
+ if (unlikely(!(d_flags & DCACHE_LRU_LIST))) {
+ if (!locked)
+ return false;
d_lru_add(dentry);
- else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED)))
+ } else if (unlikely(!(d_flags & DCACHE_REFERENCED))) {
+ if (!locked)
+ return false;
dentry->d_flags |= DCACHE_REFERENCED;
+ }
return true;
}

@@ -720,7 +747,6 @@ EXPORT_SYMBOL(d_mark_dontcache);
static inline bool fast_dput(struct dentry *dentry)
{
int ret;
- unsigned int d_flags;

/*
* try to decrement the lockref optimistically.
@@ -749,45 +775,18 @@ static inline bool fast_dput(struct dentry *dentry)
return true;

/*
- * Careful, careful. The reference count went down
- * to zero, but we don't hold the dentry lock, so
- * somebody else could get it again, and do another
- * dput(), and we need to not race with that.
- *
- * However, there is a very special and common case
- * where we don't care, because there is nothing to
- * do: the dentry is still hashed, it does not have
- * a 'delete' op, and it's referenced and already on
- * the LRU list.
- *
- * NOTE! Since we aren't locked, these values are
- * not "stable". However, it is sufficient that at
- * some point after we dropped the reference the
- * dentry was hashed and the flags had the proper
- * value. Other dentry users may have re-gotten
- * a reference to the dentry and change that, but
- * our work is done - we can leave the dentry
- * around with a zero refcount.
- *
- * Nevertheless, there are two cases that we should kill
- * the dentry anyway.
- * 1. free disconnected dentries as soon as their refcount
- * reached zero.
- * 2. free dentries if they should not be cached.
+ * Can we decide that decrement of refcount is all we needed without
+ * taking the lock? There's a very common case when it's all we need -
+ * dentry looks like it ought to be retained and there's nothing else
+ * to do.
*/
- smp_rmb();
- d_flags = READ_ONCE(dentry->d_flags);
- d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_OP_DELETE |
- DCACHE_DISCONNECTED | DCACHE_DONTCACHE;
-
- /* Nothing to do? Dropping the reference was all we needed? */
- if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && !d_unhashed(dentry))
+ if (retain_dentry(dentry, false))
return true;

/*
- * Not the fast normal case? Get the lock. We've already decremented
- * the refcount, but we'll need to re-check the situation after
- * getting the lock.
+ * Either not worth retaining or we can't tell without the lock.
+ * Get the lock, then. We've already decremented the refcount to 0,
+ * but we'll need to re-check the situation after getting the lock.
*/
spin_lock(&dentry->d_lock);

@@ -798,7 +797,7 @@ static inline bool fast_dput(struct dentry *dentry)
* don't need to do anything else.
*/
locked:
- if (dentry->d_lockref.count || retain_dentry(dentry)) {
+ if (dentry->d_lockref.count || retain_dentry(dentry, true)) {
spin_unlock(&dentry->d_lock);
return true;
}
@@ -847,7 +846,7 @@ void dput(struct dentry *dentry)
dentry = __dentry_kill(dentry);
if (!dentry)
return;
- if (retain_dentry(dentry)) {
+ if (retain_dentry(dentry, true)) {
spin_unlock(&dentry->d_lock);
return;
}
--
2.39.2

2023-11-24 06:07:44

by Al Viro

[permalink] [raw]
Subject: [PATCH v3 16/21] fold dentry_kill() into dput()

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 37 ++++++++++++-------------------------
1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a7f99d46c41b..5284b02747cd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -756,28 +756,6 @@ void d_mark_dontcache(struct inode *inode)
}
EXPORT_SYMBOL(d_mark_dontcache);

-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *dentry_kill(struct dentry *dentry)
- __releases(dentry->d_lock)
-{
-
- dentry->d_lockref.count--;
- rcu_read_lock();
- if (likely(lock_for_kill(dentry))) {
- struct dentry *parent = dentry->d_parent;
- rcu_read_unlock();
- __dentry_kill(dentry);
- return parent != dentry ? parent : NULL;
- }
- rcu_read_unlock();
- spin_unlock(&dentry->d_lock);
- return NULL;
-}
-
/*
* Try to do a lockless dput(), and return whether that was successful.
*
@@ -915,9 +893,18 @@ void dput(struct dentry *dentry)
}

/* Slow case: now with the dentry lock held */
- rcu_read_unlock();
- dentry->d_lockref.count = 1;
- dentry = dentry_kill(dentry);
+ if (likely(lock_for_kill(dentry))) {
+ struct dentry *parent = dentry->d_parent;
+ rcu_read_unlock();
+ __dentry_kill(dentry);
+ if (dentry == parent)
+ return;
+ dentry = parent;
+ } else {
+ rcu_read_unlock();
+ spin_unlock(&dentry->d_lock);
+ return;
+ }
}
}
EXPORT_SYMBOL(dput);
--
2.39.2

2023-11-24 06:07:58

by Al Viro

[permalink] [raw]
Subject: [PATCH v3 19/21] d_prune_aliases(): use a shrink list

Instead of dropping aliases one by one, restarting, etc., just
collect them into a shrink list and kill them off in one pass.

We don't really need the restarts - one alias can't pin another
(a directory has only one alias, and couldn't be its own ancestor
anyway), so collecting everything that is not busy and taking it
out takes care of everything evictable that had been there when
we entered the function. And new aliases added while we'd been
dropping old ones could just as easily have appeared right as we
returned to the caller...

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/dcache.c | 30 +++++-------------------------
1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f68fe7c863e0..a3cc612a80d5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -647,20 +647,6 @@ static struct dentry *__lock_parent(struct dentry *dentry)
return parent;
}

-static inline struct dentry *lock_parent(struct dentry *dentry)
-{
- struct dentry *parent = dentry->d_parent;
- if (IS_ROOT(dentry))
- return NULL;
- if (likely(spin_trylock(&parent->d_lock)))
- return parent;
- rcu_read_lock();
- spin_unlock(&dentry->d_lock);
- parent = __lock_parent(dentry);
- rcu_read_unlock();
- return parent;
-}
-
/*
* Lock a dentry for feeding it to __dentry_kill().
* Called under rcu_read_lock() and dentry->d_lock; the former
@@ -1090,24 +1076,18 @@ struct dentry *d_find_alias_rcu(struct inode *inode)
*/
void d_prune_aliases(struct inode *inode)
{
+ LIST_HEAD(dispose);
struct dentry *dentry;
-restart:
+
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
spin_lock(&dentry->d_lock);
- if (!dentry->d_lockref.count) {
- struct dentry *parent = lock_parent(dentry);
- if (likely(!dentry->d_lockref.count)) {
- __dentry_kill(dentry);
- dput(parent);
- goto restart;
- }
- if (parent)
- spin_unlock(&parent->d_lock);
- }
+ if (!dentry->d_lockref.count)
+ to_shrink_list(dentry, &dispose);
spin_unlock(&dentry->d_lock);
}
spin_unlock(&inode->i_lock);
+ shrink_dentry_list(&dispose);
}
EXPORT_SYMBOL(d_prune_aliases);

--
2.39.2

2023-11-24 07:45:04

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v3 03/21] dentry: switch the lists of children to hlist

On Fri, Nov 24, 2023 at 8:05 AM Al Viro <[email protected]> wrote:
>
> Saves a pointer per struct dentry and actually makes the things less
> clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use
> of hlist_for_... iterators.
>
> A couple of new helpers - d_first_child() and d_next_sibling(),
> to make the expressions less awful.
>
> X-fuck-kABI: gladly

???

> [...]

2023-11-24 07:56:29

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v3 03/21] dentry: switch the lists of children to hlist

On Fri, Nov 24, 2023 at 09:44:27AM +0200, Amir Goldstein wrote:
> On Fri, Nov 24, 2023 at 8:05 AM Al Viro <[email protected]> wrote:
> >
> > Saves a pointer per struct dentry and actually makes the things less
> > clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use
> > of hlist_for_... iterators.
> >
> > A couple of new helpers - d_first_child() and d_next_sibling(),
> > to make the expressions less awful.
> >
> > X-fuck-kABI: gladly
>
> ???

It breaks kABI. Hard. For a good reason. And if you need an elaboration
of the reasons why kABI is, er, not universally liked - let's take it
to private email. Some rants are really too unprintable for maillists...

2023-11-24 08:14:22

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v3 03/21] dentry: switch the lists of children to hlist

On Fri, Nov 24, 2023 at 9:55 AM Al Viro <[email protected]> wrote:
>
> On Fri, Nov 24, 2023 at 09:44:27AM +0200, Amir Goldstein wrote:
> > On Fri, Nov 24, 2023 at 8:05 AM Al Viro <[email protected]> wrote:
> > >
> > > Saves a pointer per struct dentry and actually makes the things less
> > > clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use
> > > of hlist_for_... iterators.
> > >
> > > A couple of new helpers - d_first_child() and d_next_sibling(),
> > > to make the expressions less awful.
> > >
> > > X-fuck-kABI: gladly
> >
> > ???
>
> It breaks kABI. Hard. For a good reason. And if you need an elaboration
> of the reasons why kABI is, er, not universally liked - let's take it
> to private email. Some rants are really too unprintable for maillists...

I don't mind rants; I'm even quite fond of yours :)

But "X-fuck-kABI: gladly" serves no real value.
You'd already documented the kAPI change in porting.rst.
If you want to add a note about kABI, I suggest that you add it there.

Thanks,
Amir.

2023-11-24 21:23:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v3 02/21] coda_flag_children(): cope with dentries turning negative

On Thu, 23 Nov 2023 at 22:04, Al Viro <[email protected]> wrote:
>
> ->d_lock on parent does not stabilize ->d_inode of child.
> We don't do much with that inode in there, but we need
> at least to avoid struct inode getting freed under us...

Gaah. We've gone back and forth on this. Being non-preemptible is
already equivalent to rcu read locking.

From Documentation/RCU/rcu_dereference.rst:

With the new consolidated
RCU flavors, an RCU read-side critical section is entered
using rcu_read_lock(), anything that disables bottom halves,
anything that disables interrupts, or anything that disables
preemption.

so I actually think the coda code is already mostly fine, because that
parent spin_lock may not stabilize d_child per se, but it *does* imply
a RCU read lock.

So I think you should drop the rcu_read_lock/rcu_read_unlock from that patch.

But that

struct inode *inode = d_inode_rcu(de);

conversion is required to get a stable inode pointer.

So half of this patch is unnecessary.

Adding Paul to the cc just to verify that the docs are up-to-date and
that we're still good here.

Because we've gone back-and-forth on the "spinlocks are an implied RCU
read-side critical section" a couple of times.

Linus
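
For illustration, a rough sketch of coda_flag_children() with that suggestion
applied (pre-hlist field names; coda_flag_inode() assumed as in fs/coda;
simplified, not the actual patch) - the parent's ->d_lock serves as the implied
RCU read-side critical section, and d_inode_rcu() is what keeps the child's
inode from being freed under us:

	static void coda_flag_children(struct dentry *parent, int flag)
	{
		struct dentry *de;

		spin_lock(&parent->d_lock);	/* implies an RCU read-side critical section */
		list_for_each_entry(de, &parent->d_subdirs, d_child) {
			/* negative dentries have no inode to flag */
			struct inode *inode = d_inode_rcu(de);

			if (inode)
				coda_flag_inode(inode, flag);
		}
		spin_unlock(&parent->d_lock);
	}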

2023-11-24 21:29:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] simplifying fast_dput(), dentry_kill() et.al.

On Thu, 23 Nov 2023 at 22:02, Al Viro <[email protected]> wrote:
>
> The series below is the fallout of trying to document the dentry
> refcounting and life cycle - basically, getting rid of the bits that
> had been too subtle and ugly to write them up.

Apart from my RCU note, this looks like "Al knows what he's doing" to me.

Although I'm inclined to agree with Amir on the "no need to call out
kabi" on patch#3. It's also not like we've ever cared: as long as you
convert all users, kabi is simply not relevant.

Linus

2023-11-24 22:59:17

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 02/21] coda_flag_children(): cope with dentries turning negative

On Fri, Nov 24, 2023 at 01:22:19PM -0800, Linus Torvalds wrote:
> On Thu, 23 Nov 2023 at 22:04, Al Viro <[email protected]> wrote:
> >
> > ->d_lock on parent does not stabilize ->d_inode of child.
> > We don't do much with that inode in there, but we need
> > at least to avoid struct inode getting freed under us...
>
> Gaah. We've gone back and forth on this. Being non-preemptible is
> already equivalent to rcu read locking.
>
> From Documentation/RCU/rcu_dereference.rst:
>
> With the new consolidated
> RCU flavors, an RCU read-side critical section is entered
> using rcu_read_lock(), anything that disables bottom halves,
> anything that disables interrupts, or anything that disables
> preemption.
>
> so I actually think the coda code is already mostly fine, because that
> parent spin_lock may not stabilize d_child per se, but it *does* imply
> a RCU read lock.
>
> So I think you should drop the rcu_read_lock/rcu_read_unlock from that patch.
>
> But that
>
> struct inode *inode = d_inode_rcu(de);
>
> conversion is required to get a stable inode pointer.
>
> So half of this patch is unnecessary.
>
> Adding Paul to the cc just to verify that the docs are up-to-date and
> that we're still good here.
>
> Because we've gone back-and-forth on the "spinlocks are an implied RCU
> read-side critical section" a couple of times.

Yes, spinlocks are implied RCU read-side critical sections. Even in -rt,
where non-raw spinlocks are preemptible, courtesy of this:

static __always_inline void __rt_spin_lock(spinlock_t *lock)
{
	rtlock_might_resched();
	rtlock_lock(&lock->lock);
	rcu_read_lock();
	migrate_disable();
}

So given -rt's preemptible spinlocks still being RCU readers, I need to
explicitly call this out in the documentation.

How about as shown below for a start?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/Documentation/RCU/rcu_dereference.rst b/Documentation/RCU/rcu_dereference.rst
index 659d5913784d..2524dcdadde2 100644
--- a/Documentation/RCU/rcu_dereference.rst
+++ b/Documentation/RCU/rcu_dereference.rst
@@ -408,7 +408,10 @@ member of the rcu_dereference() to use in various situations:
 	RCU flavors, an RCU read-side critical section is entered
 	using rcu_read_lock(), anything that disables bottom halves,
 	anything that disables interrupts, or anything that disables
-	preemption.
+	preemption. Please note that spinlock critical sections
+	are also implied RCU read-side critical sections, even when
+	they are preemptible, as they are in kernels built with
+	CONFIG_PREEMPT_RT=y.

 2.	If the access might be within an RCU read-side critical section
 	on the one hand, or protected by (say) my_lock on the other,