2022-06-16 01:20:29

by NeilBrown

Subject: Re: [PATCH RFC 00/12] Allow concurrent directory updates.

On Wed, 15 Jun 2022, Daire Byrne wrote:
...
> With the patch, the aggregate increases to 15 creates/s for 10 clients
> which again matches the results of a single patched client. Not quite
> a x10 increase but a healthy improvement nonetheless.

Great!

>
> However, it is at this point that I started to experience some
> stability issues with the re-export server that are not present with
> the vanilla unpatched v5.19-rc2 kernel. In particular the knfsd
> threads start to lock up with stack traces like this:
>
> [ 1234.460696] INFO: task nfsd:5514 blocked for more than 123 seconds.
> [ 1234.461481] Tainted: G W E 5.19.0-1.dneg.x86_64 #1
> [ 1234.462289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1234.463227] task:nfsd state:D stack: 0 pid: 5514
> ppid: 2 flags:0x00004000
> [ 1234.464212] Call Trace:
> [ 1234.464677] <TASK>
> [ 1234.465104] __schedule+0x2a9/0x8a0
> [ 1234.465663] schedule+0x55/0xc0
> [ 1234.466183] ? nfs_lookup_revalidate_dentry+0x3a0/0x3a0 [nfs]
> [ 1234.466995] __nfs_lookup_revalidate+0xdf/0x120 [nfs]

I can see the cause of this - I forgot a wakeup. This patch should fix
it, though I hope to find a better solution.

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 54c2c7adcd56..072130d000c4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2483,17 +2483,16 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
 	if (!(dentry->d_flags & DCACHE_PAR_UPDATE)) {
 		/* Must have exclusive lock on parent */
 		did_set_par_update = true;
+		lock_acquire_exclusive(&dentry->d_update_map, 0,
+				       0, NULL, _THIS_IP_);
 		dentry->d_flags |= DCACHE_PAR_UPDATE;
 	}

 	spin_unlock(&dentry->d_lock);
 	error = nfs_safe_remove(dentry);
 	nfs_dentry_remove_handle_error(dir, dentry, error);
-	if (did_set_par_update) {
-		spin_lock(&dentry->d_lock);
-		dentry->d_flags &= ~DCACHE_PAR_UPDATE;
-		spin_unlock(&dentry->d_lock);
-	}
+	if (did_set_par_update)
+		d_unlock_update(dentry);
 out:
 	trace_nfs_unlink_exit(dir, dentry, error);
 	return error;
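
The class of bug here is a missed wakeup: the removed lines cleared
DCACHE_PAR_UPDATE under d_lock but never woke any task already sleeping
on that flag, which would explain the nfsd threads stuck in
__nfs_lookup_revalidate. Below is a minimal userspace sketch of the
pattern using pthreads. It is an illustration only, not the kernel
code: struct entry, entry_lock_update(), entry_unlock_update() and
run_demo() are hypothetical names, and the condition variable merely
stands in for whatever wait mechanism d_unlock_update() actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry; par_update plays DCACHE_PAR_UPDATE. */
struct entry {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	bool            par_update;
};

/* Sleep until nobody else holds the update flag, then take it. */
static void entry_lock_update(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	while (e->par_update)
		pthread_cond_wait(&e->wait, &e->lock);
	e->par_update = true;
	pthread_mutex_unlock(&e->lock);
}

/* Clear the flag AND wake sleepers. Dropping the broadcast reproduces
 * the hang: a waiter that is already asleep never rechecks the flag. */
static void entry_unlock_update(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	e->par_update = false;
	pthread_cond_broadcast(&e->wait);
	pthread_mutex_unlock(&e->lock);
}

static void *waiter(void *arg)
{
	struct entry *e = arg;

	entry_lock_update(e);	/* blocks while the main thread holds the flag */
	entry_unlock_update(e);
	return NULL;
}

/* Returns 0 if the waiter got through and the flag ended up clear. */
int run_demo(void)
{
	struct entry e = { .par_update = false };
	pthread_t t;
	int rc;

	pthread_mutex_init(&e.lock, NULL);
	pthread_cond_init(&e.wait, NULL);

	entry_lock_update(&e);
	rc = pthread_create(&t, NULL, waiter, &e);
	assert(rc == 0);
	entry_unlock_update(&e);	/* without the broadcast, join may hang */
	pthread_join(t, NULL);

	pthread_cond_destroy(&e.wait);
	pthread_mutex_destroy(&e.lock);
	return e.par_update ? 1 : 0;
}
```

Clearing the flag by hand, as the old nfs_unlink() code did, corresponds
to entry_unlock_update() without the broadcast; routing the release
through one helper keeps the clear and the wakeup together.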

>
> So all in all, the performance improvements in the knfsd re-export
> case is looking great and we have real world use cases that this helps
> with (batch processing workloads with latencies >10ms). If we can
> figure out the hanging knfsd threads, then I can test it more heavily.

Hopefully the above patch will allow the heavier testing to continue.
In any case, thanks a lot for the testing so far,

NeilBrown