2021-11-17 00:47:13

by NeilBrown

Subject: [PATCH 00/14] SUNRPC: clean up server thread management.

I have a dream of making nfsd threads start and stop dynamically.
This would free the admin from having to choose a number of threads to
start.
I'm not there yet, and I may never get there, but the current state of
the thread management code makes it harder to experiment than it needs
to be. There is a lot of technical debt that needs to be repaid first.

This series addresses much of this debt. There are three users of
service threads: nfsd, lockd, and nfs-callback.
nfs-callback, the newest, is quite clean. This series brings nfsd and
lockd up to a similar standard, and takes advantage of this increased
uniformity to simplify some shared interfaces.

It doesn't introduce any functional improvements, and (as far as I
know) fixes only one minor bug (can you spot it? If not, look at
c20106944eb6 and see whether you can find a second place that it could
have fixed).

Thanks for your review,
NeilBrown


---

NeilBrown (14):
SUNRPC: stop using ->sv_nrthreads as a refcount
nfsd: make nfsd_stats.th_cnt atomic_t
NFSD: narrow nfsd_mutex protection in nfsd thread
SUNRPC: use sv_lock to protect updates to sv_nrthreads.
NFSD: Make it possible to use svc_set_num_threads_sync
SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()
NFSD: simplify locking for network notifier.
lockd: introduce nlmsvc_serv
lockd: simplify management of network status notifiers
lockd: move lockd_start_svc() call into lockd_create_svc()
lockd: move svc_exit_thread() into the thread
lockd: introduce lockd_put()
lockd: rename lockd_create_svc() to lockd_get()
lockd: use svc_set_num_threads() for thread start and stop


fs/lockd/svc.c | 190 ++++++++++++-------------------------
fs/nfs/callback.c | 8 +-
fs/nfsd/netns.h | 6 --
fs/nfsd/nfsctl.c | 2 -
fs/nfsd/nfssvc.c | 99 +++++++++----------
fs/nfsd/stats.c | 2 +-
fs/nfsd/stats.h | 4 +-
include/linux/sunrpc/svc.h | 11 +--
net/sunrpc/svc.c | 61 ++----------
9 files changed, 128 insertions(+), 255 deletions(-)

--
Signature



2021-11-17 00:47:18

by NeilBrown

Subject: [PATCH 01/14] SUNRPC: stop using ->sv_nrthreads as a refcount

The use of sv_nrthreads as a general refcount results in clumsy code, as
can be seen from the various comments needed to explain the situation.

This patch introduces a 'struct kref' and uses that for reference
counting, leaving sv_nrthreads to be a pure count of threads.

svc_get() increments the refcount, and now returns the serv as well.
svc_put() decrements and calls svc_destroy() (which is now
unconditional) when count reaches zero.

nfsd doesn't use svc_put() as it needs to perform an action before
svc_destroy(). So it defines nfsd_put() which uses kref_put() directly.

nfsd allows the svc_serv to exist with ->sv_nrthreads being zero. This
happens when a transport is created before the first thread is started.
To support this, a 'keep_active' flag is introduced which holds a ref on
the svc_serv. This is set when any listening socket is successfully
added (unless there are running threads), and cleared when the number of
threads is set. So when the last thread exits, the svc_serv will be
destroyed.

We no longer clear ->rq_server when nfsd() exits. This was done
to prevent svc_exit_thread() from calling svc_destroy().
Instead we take an extra reference to the svc_serv to prevent
svc_destroy() from being called.
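
For reference, the kref API this builds on: kref_get() takes a reference,
and kref_put() drops one, calling the release function and returning 1 only
when the count reaches zero. A minimal illustrative sketch (not part of the
patch; 'release' here is just a stand-in callback):

	#include <linux/kref.h>

	static void release(struct kref *ref)
	{
		/* free the object that contains 'ref' */
	}

	static void demo(void)
	{
		struct kref ref;

		kref_init(&ref);	 /* count = 1 */
		kref_get(&ref);		 /* count = 2 */
		kref_put(&ref, release); /* count = 1, release not called, returns 0 */
		kref_put(&ref, release); /* count = 0, release() runs, returns 1 */
	}

svc_put() simply passes svc_destroy() as the release function, while
nfsd_put() passes a no-op so that it can shut the net down itself before
calling svc_destroy().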

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 16 ++---------
fs/nfs/callback.c | 22 +++-----------
fs/nfsd/netns.h | 7 +++++
fs/nfsd/nfsctl.c | 21 ++++++--------
fs/nfsd/nfsd.h | 2 +
fs/nfsd/nfssvc.c | 67 ++++++++++++++++++++++++++------------------
include/linux/sunrpc/svc.h | 19 +++++++-----
net/sunrpc/svc.c | 33 +++++++++-------------
8 files changed, 88 insertions(+), 99 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index b220e1b91726..a9669b106dbd 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -430,14 +430,8 @@ static struct svc_serv *lockd_create_svc(void)
/*
* Check whether we're already up and running.
*/
- if (nlmsvc_rqst) {
- /*
- * Note: increase service usage, because later in case of error
- * svc_destroy() will be called.
- */
- svc_get(nlmsvc_rqst->rq_server);
- return nlmsvc_rqst->rq_server;
- }
+ if (nlmsvc_rqst)
+ return svc_get(nlmsvc_rqst->rq_server);

/*
* Sanity check: if there's no pid,
@@ -492,12 +486,8 @@ int lockd_up(struct net *net, const struct cred *cred)
goto err_put;
}
nlmsvc_users++;
- /*
- * Note: svc_serv structures have an initial use count of 1,
- * so we exit through here on both success and failure.
- */
err_put:
- svc_destroy(serv);
+ svc_put(serv);
err_create:
mutex_unlock(&nlmsvc_mutex);
return error;
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 86d856de1389..d9d78ffd1d65 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -169,7 +169,7 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt,
if (nrservs < NFS4_MIN_NR_CALLBACK_THREADS)
nrservs = NFS4_MIN_NR_CALLBACK_THREADS;

- if (serv->sv_nrthreads-1 == nrservs)
+ if (serv->sv_nrthreads == nrservs)
return 0;

ret = serv->sv_ops->svo_setup(serv, NULL, nrservs);
@@ -266,14 +266,8 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
/*
* Check whether we're already up and running.
*/
- if (cb_info->serv) {
- /*
- * Note: increase service usage, because later in case of error
- * svc_destroy() will be called.
- */
- svc_get(cb_info->serv);
- return cb_info->serv;
- }
+ if (cb_info->serv)
+ return svc_get(cb_info->serv);

switch (minorversion) {
case 0:
@@ -335,16 +329,10 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
goto err_start;

cb_info->users++;
- /*
- * svc_create creates the svc_serv with sv_nrthreads == 1, and then
- * svc_prepare_thread increments that. So we need to call svc_destroy
- * on both success and failure so that the refcount is 1 when the
- * thread exits.
- */
err_net:
if (!cb_info->users)
cb_info->serv = NULL;
- svc_destroy(serv);
+ svc_put(serv);
err_create:
mutex_unlock(&nfs_callback_mutex);
return ret;
@@ -370,7 +358,7 @@ void nfs_callback_down(int minorversion, struct net *net)
if (cb_info->users == 0) {
svc_get(serv);
serv->sv_ops->svo_setup(serv, NULL, 0);
- svc_destroy(serv);
+ svc_put(serv);
dprintk("nfs_callback_down: service destroyed\n");
cb_info->serv = NULL;
}
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 935c1028c217..08bcd8f23b01 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -123,6 +123,13 @@ struct nfsd_net {
u32 clverifier_counter;

struct svc_serv *nfsd_serv;
+ /* When a listening socket is added to nfsd, keep_active is set
+ * and this justifies a reference on nfsd_serv. This stops
+ * nfsd_serv from being freed. When the number of threads is
+ * set, keep_active is cleared and the reference is dropped. So
+ * when the last thread exits, the service will be destroyed.
+ */
+ int keep_active;

wait_queue_head_t ntf_wq;
atomic_t ntf_refcnt;
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index af8531c3854a..22dbd638f6c8 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -742,13 +742,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
return err;

err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
- if (err < 0) {
- nfsd_destroy(net);
- return err;
- }

- /* Decrease the count, but don't shut down the service */
- nn->nfsd_serv->sv_nrthreads--;
+ if (!err && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
+ svc_get(nn->nfsd_serv);
+
+ nfsd_put(net);
return err;
}

@@ -783,8 +781,10 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr
if (err < 0 && err != -EAFNOSUPPORT)
goto out_close;

- /* Decrease the count, but don't shut down the service */
- nn->nfsd_serv->sv_nrthreads--;
+ if (!nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
+ svc_get(nn->nfsd_serv);
+
+ nfsd_put(net);
return 0;
out_close:
xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port);
@@ -793,10 +793,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr
svc_xprt_put(xprt);
}
out_err:
- if (!list_empty(&nn->nfsd_serv->sv_permsocks))
- nn->nfsd_serv->sv_nrthreads--;
- else
- nfsd_destroy(net);
+ nfsd_put(net);
return err;
}

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 498e5a489826..3e5008b475ff 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -97,7 +97,7 @@ int nfsd_pool_stats_open(struct inode *, struct file *);
int nfsd_pool_stats_release(struct inode *, struct file *);
void nfsd_shutdown_threads(struct net *net);

-void nfsd_destroy(struct net *net);
+void nfsd_put(struct net *net);

bool i_am_nfsd(void);

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 80431921e5d7..1c9505ee4f89 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -60,13 +60,13 @@ static __be32 nfsd_init_request(struct svc_rqst *,
* extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt
*
* If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a
- * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0. That number
- * of nfsd threads must exist and each must listed in ->sp_all_threads in each
- * entry of ->sv_pools[].
+ * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless
+ * nn->keep_active is set). That number of nfsd threads must
+ * exist and each must be listed in ->sp_all_threads in some entry of
+ * ->sv_pools[].
*
- * Transitions of the thread count between zero and non-zero are of particular
- * interest since the svc_serv needs to be created and initialized at that
- * point, or freed.
+ * Each active thread holds a counted reference on nn->nfsd_serv, as does
+ * the nn->keep_active flag and various transient calls to svc_get().
*
* Finally, the nfsd_mutex also protects some of the global variables that are
* accessed when nfsd starts and that are settable via the write_* routines in
@@ -623,7 +623,7 @@ void nfsd_shutdown_threads(struct net *net)
svc_get(serv);
/* Kill outstanding nfsd threads */
serv->sv_ops->svo_setup(serv, NULL, 0);
- nfsd_destroy(net);
+ nfsd_put(net);
mutex_unlock(&nfsd_mutex);
/* Wait for shutdown of nfsd_serv to complete */
wait_for_completion(&nn->nfsd_shutdown_complete);
@@ -656,7 +656,10 @@ int nfsd_create_serv(struct net *net)
nn->nfsd_serv->sv_maxconn = nn->max_connections;
error = svc_bind(nn->nfsd_serv, net);
if (error < 0) {
- svc_destroy(nn->nfsd_serv);
+ /* NOT nfsd_put() as notifiers (see below) haven't
+ * been set up yet.
+ */
+ svc_put(nn->nfsd_serv);
nfsd_complete_shutdown(net);
return error;
}
@@ -671,6 +674,7 @@ int nfsd_create_serv(struct net *net)
}
atomic_inc(&nn->ntf_refcnt);
nfsd_reset_boot_verifier(nn);
+ svc_get(nn->nfsd_serv);
return 0;
}

@@ -697,16 +701,24 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net)
return 0;
}

-void nfsd_destroy(struct net *net)
+/* This is the callback for kref_put() below.
+ * There is no code here as the first thing to be done is
+ * call svc_shutdown_net(), but we cannot get the 'net' from
+ * the kref. So do all the work when kref_put returns true.
+ */
+static void nfsd_noop(struct kref *ref)
+{
+}
+
+void nfsd_put(struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- int destroy = (nn->nfsd_serv->sv_nrthreads == 1);

- if (destroy)
+ if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) {
svc_shutdown_net(nn->nfsd_serv, net);
- svc_destroy(nn->nfsd_serv);
- if (destroy)
+ svc_destroy(&nn->nfsd_serv->sv_refcnt);
nfsd_complete_shutdown(net);
+ }
}

int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
@@ -758,7 +770,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
if (err)
break;
}
- nfsd_destroy(net);
+ nfsd_put(net);
return err;
}

@@ -795,21 +807,20 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred)

error = nfsd_startup_net(net, cred);
if (error)
- goto out_destroy;
+ goto out_put;
error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv,
NULL, nrservs);
if (error)
goto out_shutdown;
- /* We are holding a reference to nn->nfsd_serv which
- * we don't want to count in the return value,
- * so subtract 1
- */
- error = nn->nfsd_serv->sv_nrthreads - 1;
+ error = nn->nfsd_serv->sv_nrthreads;
out_shutdown:
if (error < 0 && !nfsd_up_before)
nfsd_shutdown_net(net);
-out_destroy:
- nfsd_destroy(net); /* Release server */
+out_put:
+ /* Threads now hold service active */
+ if (xchg(&nn->keep_active, 0))
+ nfsd_put(net);
+ nfsd_put(net);
out:
mutex_unlock(&nfsd_mutex);
return error;
@@ -977,12 +988,16 @@ nfsd(void *vrqstp)
nfsdstats.th_cnt --;

out:
- rqstp->rq_server = NULL;
+ /* Take an extra ref so that the svc_put in svc_exit_thread()
+ * doesn't call svc_destroy()
+ */
+ svc_get(nn->nfsd_serv);

/* Release the thread */
svc_exit_thread(rqstp);

- nfsd_destroy(net);
+ /* Now if needed we call svc_destroy in appropriate context */
+ nfsd_put(net);

/* Release module */
mutex_unlock(&nfsd_mutex);
@@ -1096,7 +1111,6 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file)
mutex_unlock(&nfsd_mutex);
return -ENODEV;
}
- /* bump up the psudo refcount while traversing */
svc_get(nn->nfsd_serv);
ret = svc_pool_stats_open(nn->nfsd_serv, file);
mutex_unlock(&nfsd_mutex);
@@ -1109,8 +1123,7 @@ int nfsd_pool_stats_release(struct inode *inode, struct file *file)
struct net *net = inode->i_sb->s_fs_info;

mutex_lock(&nfsd_mutex);
- /* this function really, really should have been called svc_put() */
- nfsd_destroy(net);
+ nfsd_put(net);
mutex_unlock(&nfsd_mutex);
return ret;
}
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 0ae28ae6caf2..94c14c61a5f9 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -85,6 +85,7 @@ struct svc_serv {
struct svc_program * sv_program; /* RPC program */
struct svc_stat * sv_stats; /* RPC statistics */
spinlock_t sv_lock;
+ struct kref sv_refcnt;
unsigned int sv_nrthreads; /* # of server threads */
unsigned int sv_maxconn; /* max connections allowed or
* '0' causing max to be based
@@ -114,15 +115,16 @@ struct svc_serv {
#endif /* CONFIG_SUNRPC_BACKCHANNEL */
};

-/*
- * We use sv_nrthreads as a reference count. svc_destroy() drops
- * this refcount, so we need to bump it up around operations that
- * change the number of threads. Horrible, but there it is.
- * Should be called with the "service mutex" held.
- */
-static inline void svc_get(struct svc_serv *serv)
+static inline struct svc_serv *svc_get(struct svc_serv *serv)
+{
+ kref_get(&serv->sv_refcnt);
+ return serv;
+}
+
+void svc_destroy(struct kref *);
+static inline int svc_put(struct svc_serv *serv)
{
- serv->sv_nrthreads++;
+ return kref_put(&serv->sv_refcnt, svc_destroy);
}

/*
@@ -514,7 +516,6 @@ struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int,
int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
int svc_set_num_threads_sync(struct svc_serv *, struct svc_pool *, int);
int svc_pool_stats_open(struct svc_serv *serv, struct file *file);
-void svc_destroy(struct svc_serv *);
void svc_shutdown_net(struct svc_serv *, struct net *);
int svc_process(struct svc_rqst *);
int bc_svc_process(struct svc_serv *, struct rpc_rqst *,
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 4292278a9552..bd610709e88c 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -435,8 +435,8 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
return NULL;
serv->sv_name = prog->pg_name;
serv->sv_program = prog;
- serv->sv_nrthreads = 1;
serv->sv_stats = prog->pg_stats;
+ kref_init(&serv->sv_refcnt);
if (bufsize > RPCSVC_MAXPAYLOAD)
bufsize = RPCSVC_MAXPAYLOAD;
serv->sv_max_payload = bufsize? bufsize : 4096;
@@ -526,19 +526,12 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net);
* protect the sv_nrthreads, sv_permsocks and sv_tempsocks.
*/
void
-svc_destroy(struct svc_serv *serv)
+svc_destroy(struct kref *ref)
{
- dprintk("svc: svc_destroy(%s, %d)\n",
- serv->sv_program->pg_name,
- serv->sv_nrthreads);
-
- if (serv->sv_nrthreads) {
- if (--(serv->sv_nrthreads) != 0) {
- svc_sock_update_bufs(serv);
- return;
- }
- } else
- printk("svc_destroy: no threads for serv=%p!\n", serv);
+ struct svc_serv *serv = container_of(ref, struct svc_serv, sv_refcnt);
+
+ dprintk("svc: svc_destroy(%s)\n",
+ serv->sv_program->pg_name);

del_timer_sync(&serv->sv_temptimer);

@@ -647,6 +640,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
if (!rqstp)
return ERR_PTR(-ENOMEM);

+ svc_get(serv);
serv->sv_nrthreads++;
spin_lock_bh(&pool->sp_lock);
pool->sp_nrthreads++;
@@ -786,8 +780,7 @@ int
svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
if (pool == NULL) {
- /* The -1 assumes caller has done a svc_get() */
- nrservs -= (serv->sv_nrthreads-1);
+ nrservs -= serv->sv_nrthreads;
} else {
spin_lock_bh(&pool->sp_lock);
nrservs -= pool->sp_nrthreads;
@@ -824,8 +817,7 @@ int
svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
if (pool == NULL) {
- /* The -1 assumes caller has done a svc_get() */
- nrservs -= (serv->sv_nrthreads-1);
+ nrservs -= serv->sv_nrthreads;
} else {
spin_lock_bh(&pool->sp_lock);
nrservs -= pool->sp_nrthreads;
@@ -890,11 +882,12 @@ svc_exit_thread(struct svc_rqst *rqstp)
list_del_rcu(&rqstp->rq_all);
spin_unlock_bh(&pool->sp_lock);

+ serv->sv_nrthreads -= 1;
+ svc_sock_update_bufs(serv);
+
svc_rqst_free(rqstp);

- /* Release the server */
- if (serv)
- svc_destroy(serv);
+ svc_put(serv);
}
EXPORT_SYMBOL_GPL(svc_exit_thread);




2021-11-17 00:47:22

by NeilBrown

Subject: [PATCH 02/14] nfsd: make nfsd_stats.th_cnt atomic_t

This allows us to move the updates for th_cnt out of the mutex.
This is a step towards reducing mutex coverage in nfsd().

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/nfssvc.c | 6 +++---
fs/nfsd/stats.c | 2 +-
fs/nfsd/stats.h | 4 +---
3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 1c9505ee4f89..3c6e4faac3c3 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -57,7 +57,7 @@ static __be32 nfsd_init_request(struct svc_rqst *,
/*
* nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
* of the svc_serv struct. In particular, ->sv_nrthreads but also to some
- * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt
+ * extent ->sv_temp_socks and ->sv_permsocks.
*
* If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a
* properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless
@@ -956,8 +956,8 @@ nfsd(void *vrqstp)
allow_signal(SIGINT);
allow_signal(SIGQUIT);

- nfsdstats.th_cnt++;
mutex_unlock(&nfsd_mutex);
+ atomic_inc(&nfsdstats.th_cnt);

set_freezable();

@@ -984,8 +984,8 @@ nfsd(void *vrqstp)
/* Clear signals before calling svc_exit_thread() */
flush_signals(current);

+ atomic_dec(&nfsdstats.th_cnt);
mutex_lock(&nfsd_mutex);
- nfsdstats.th_cnt --;

out:
/* Take an extra ref so that the svc_put in svc_exit_thread()
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c
index 1d3b881e7382..a8c5a02a84f0 100644
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -45,7 +45,7 @@ static int nfsd_proc_show(struct seq_file *seq, void *v)
percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE]));

/* thread usage: */
- seq_printf(seq, "th %u 0", nfsdstats.th_cnt);
+ seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt));

/* deprecated thread usage histogram stats */
for (i = 0; i < 10; i++)
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h
index 51ecda852e23..9b43dc3d9991 100644
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -29,11 +29,9 @@ enum {
struct nfsd_stats {
struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM];

- /* Protected by nfsd_mutex */
- unsigned int th_cnt; /* number of available threads */
+ atomic_t th_cnt; /* number of available threads */
};

-
extern struct nfsd_stats nfsdstats;

extern struct svc_stat nfsd_svcstats;



2021-11-17 00:47:26

by NeilBrown

Subject: [PATCH 03/14] NFSD: narrow nfsd_mutex protection in nfsd thread

There is nothing happening at the start of nfsd() that requires
protection by the mutex, so don't take it until shutting down the thread
- which does still require protection.

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/nfssvc.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 3c6e4faac3c3..9e517ddd8d4e 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -934,9 +934,6 @@ nfsd(void *vrqstp)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
int err;

- /* Lock module and set up kernel thread */
- mutex_lock(&nfsd_mutex);
-
/* At this point, the thread shares current->fs
* with the init process. We need to create files with the
* umask as defined by the client instead of init's umask. */
@@ -956,7 +953,6 @@ nfsd(void *vrqstp)
allow_signal(SIGINT);
allow_signal(SIGQUIT);

- mutex_unlock(&nfsd_mutex);
atomic_inc(&nfsdstats.th_cnt);

set_freezable();
@@ -985,9 +981,9 @@ nfsd(void *vrqstp)
flush_signals(current);

atomic_dec(&nfsdstats.th_cnt);
- mutex_lock(&nfsd_mutex);

out:
+ mutex_lock(&nfsd_mutex);
/* Take an extra ref so that the svc_put in svc_exit_thread()
* doesn't call svc_destroy()
*/



2021-11-17 00:47:33

by NeilBrown

Subject: [PATCH 04/14] SUNRPC: use sv_lock to protect updates to sv_nrthreads.

Using sv_lock means we don't need to hold the service mutex over these
updates.

In particular, svc_exit_thread() no longer requires synchronisation, so
threads can exit asynchronously.

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/nfssvc.c | 9 ++++-----
net/sunrpc/svc.c | 9 +++++++--
2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 9e517ddd8d4e..27a735a4df29 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -55,9 +55,8 @@ static __be32 nfsd_init_request(struct svc_rqst *,
struct svc_process_info *);

/*
- * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
- * of the svc_serv struct. In particular, ->sv_nrthreads but also to some
- * extent ->sv_temp_socks and ->sv_permsocks.
+ * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and some members
+ * of the svc_serv struct such as ->sv_temp_socks and ->sv_permsocks.
*
* If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a
* properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless
@@ -983,7 +982,6 @@ nfsd(void *vrqstp)
atomic_dec(&nfsdstats.th_cnt);

out:
- mutex_lock(&nfsd_mutex);
/* Take an extra ref so that the svc_put in svc_exit_thread()
* doesn't call svc_destroy()
*/
@@ -993,10 +991,11 @@ nfsd(void *vrqstp)
svc_exit_thread(rqstp);

/* Now if needed we call svc_destroy in appropriate context */
+ mutex_lock(&nfsd_mutex);
nfsd_put(net);
+ mutex_unlock(&nfsd_mutex);

/* Release module */
- mutex_unlock(&nfsd_mutex);
module_put_and_exit(0);
return 0;
}
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index bd610709e88c..a099d0145d89 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -523,7 +523,7 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net);

/*
* Destroy an RPC service. Should be called with appropriate locking to
- * protect the sv_nrthreads, sv_permsocks and sv_tempsocks.
+ * protect sv_permsocks and sv_tempsocks.
*/
void
svc_destroy(struct kref *ref)
@@ -641,7 +641,10 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
return ERR_PTR(-ENOMEM);

svc_get(serv);
- serv->sv_nrthreads++;
+ spin_lock_bh(&serv->sv_lock);
+ serv->sv_nrthreads += 1;
+ spin_unlock_bh(&serv->sv_lock);
+
spin_lock_bh(&pool->sp_lock);
pool->sp_nrthreads++;
list_add_rcu(&rqstp->rq_all, &pool->sp_all_threads);
@@ -882,7 +885,9 @@ svc_exit_thread(struct svc_rqst *rqstp)
list_del_rcu(&rqstp->rq_all);
spin_unlock_bh(&pool->sp_lock);

+ spin_lock_bh(&serv->sv_lock);
serv->sv_nrthreads -= 1;
+ spin_unlock_bh(&serv->sv_lock);
svc_sock_update_bufs(serv);

svc_rqst_free(rqstp);



2021-11-17 00:47:39

by NeilBrown

Subject: [PATCH 05/14] NFSD: Make it possible to use svc_set_num_threads_sync

nfsd cannot currently use svc_set_num_threads_sync. It instead
uses svc_set_num_threads which does *not* wait for threads to all
exit, and has a separate mechanism (nfsd_shutdown_complete) to wait
for completion.

The reason that nfsd is unlike other services is that nfsd threads can
exit separately from svc_set_num_threads being called - they die on
receipt of SIGKILL. Also, when the last thread exits, the service must
be shut down (sockets closed).

For this, the nfsd_mutex needs to be taken, and as that mutex needs to
be held while svc_set_num_threads is called, the one cannot wait for
the other.

This patch changes the nfsd thread so that it can drop the ref on the
service without blocking on nfsd_mutex, so that svc_set_num_threads_sync
can be used:
- if it can drop a non-last reference, it does that. This does not
trigger shutdown and does not require a mutex. This will likely
happen for all but the last thread signalled, and for all threads
being shut down by nfsd_shutdown_threads()
- if it can get the mutex without blocking (trylock), it does that
and then drops the reference. This will likely happen for the
last thread killed by SIGKILL
- Otherwise there might be an unrelated task holding the mutex,
possibly in another network namespace, or nfsd_shutdown_threads()
might be just about to get a reference on the service, after which
we can drop ours safely.
We cannot conveniently get wakeup notifications on these events,
and we are unlikely to need to, so we sleep briefly and check again.

With this we can discard nfsd_shutdown_complete and
nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync.
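
The "drop a non-last reference" step is implemented with a new helper,
svc_put_not_last(), built on refcount_dec_not_one(), which decrements the
count only while it is greater than one and reports whether it did so.
A small sketch of that behaviour (illustrative only, not part of the
patch):

	#include <linux/refcount.h>

	static void demo(void)
	{
		refcount_t r = REFCOUNT_INIT(2);

		refcount_dec_not_one(&r); /* 2 -> 1, returns true */
		refcount_dec_not_one(&r); /* stays 1, returns false: this
					   * would have been the final ref */
	}

A thread for which this fails knows it would be dropping the last
reference, so it must either get nfsd_mutex (trylock) or retry after a
short sleep, as described above.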

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/netns.h | 3 ---
fs/nfsd/nfssvc.c | 41 ++++++++++++++++++++---------------------
include/linux/sunrpc/svc.h | 5 +++++
3 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 08bcd8f23b01..1fd59eb0730b 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -134,9 +134,6 @@ struct nfsd_net {
wait_queue_head_t ntf_wq;
atomic_t ntf_refcnt;

- /* Allow umount to wait for nfsd state cleanup */
- struct completion nfsd_shutdown_complete;
-
/*
* clientid and stateid data for construction of net unique COPY
* stateids.
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 27a735a4df29..bf2813ac4443 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -593,20 +593,10 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = {
.svo_shutdown = nfsd_last_thread,
.svo_function = nfsd,
.svo_enqueue_xprt = svc_xprt_do_enqueue,
- .svo_setup = svc_set_num_threads,
+ .svo_setup = svc_set_num_threads_sync,
.svo_module = THIS_MODULE,
};

-static void nfsd_complete_shutdown(struct net *net)
-{
- struct nfsd_net *nn = net_generic(net, nfsd_net_id);
-
- WARN_ON(!mutex_is_locked(&nfsd_mutex));
-
- nn->nfsd_serv = NULL;
- complete(&nn->nfsd_shutdown_complete);
-}
-
void nfsd_shutdown_threads(struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
@@ -624,8 +614,6 @@ void nfsd_shutdown_threads(struct net *net)
serv->sv_ops->svo_setup(serv, NULL, 0);
nfsd_put(net);
mutex_unlock(&nfsd_mutex);
- /* Wait for shutdown of nfsd_serv to complete */
- wait_for_completion(&nn->nfsd_shutdown_complete);
}

bool i_am_nfsd(void)
@@ -650,7 +638,6 @@ int nfsd_create_serv(struct net *net)
&nfsd_thread_sv_ops);
if (nn->nfsd_serv == NULL)
return -ENOMEM;
- init_completion(&nn->nfsd_shutdown_complete);

nn->nfsd_serv->sv_maxconn = nn->max_connections;
error = svc_bind(nn->nfsd_serv, net);
@@ -659,7 +646,7 @@ int nfsd_create_serv(struct net *net)
* been set up yet.
*/
svc_put(nn->nfsd_serv);
- nfsd_complete_shutdown(net);
+ nn->nfsd_serv = NULL;
return error;
}

@@ -716,7 +703,7 @@ void nfsd_put(struct net *net)
if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) {
svc_shutdown_net(nn->nfsd_serv, net);
svc_destroy(&nn->nfsd_serv->sv_refcnt);
- nfsd_complete_shutdown(net);
+ nn->nfsd_serv = NULL;
}
}

@@ -744,7 +731,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
if (tot > NFSD_MAXSERVS) {
/* total too large: scale down requested numbers */
for (i = 0; i < n && tot > 0; i++) {
- int new = nthreads[i] * NFSD_MAXSERVS / tot;
+ int new = nthreads[i] * NFSD_MAXSERVS / tot;
tot -= (nthreads[i] - new);
nthreads[i] = new;
}
@@ -990,10 +977,22 @@ nfsd(void *vrqstp)
/* Release the thread */
svc_exit_thread(rqstp);

- /* Now if needed we call svc_destroy in appropriate context */
- mutex_lock(&nfsd_mutex);
- nfsd_put(net);
- mutex_unlock(&nfsd_mutex);
+ /* We need to drop a ref, but may not drop the last reference
+ * without holding nfsd_mutex, and we cannot wait for nfsd_mutex as that
+ * could deadlock with nfsd_shutdown_threads() waiting for us.
+ * So three options are:
+ * - drop a non-final reference,
+ * - get the mutex without waiting
+ * - sleep briefly and try the above again
+ */
+ while (!svc_put_not_last(nn->nfsd_serv)) {
+ if (mutex_trylock(&nfsd_mutex)) {
+ nfsd_put(net);
+ mutex_unlock(&nfsd_mutex);
+ break;
+ }
+ msleep(20);
+ }

/* Release module */
module_put_and_exit(0);
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 94c14c61a5f9..dc7bd89f1284 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -127,6 +127,11 @@ static inline int svc_put(struct svc_serv *serv)
return kref_put(&serv->sv_refcnt, svc_destroy);
}

+static inline int svc_put_not_last(struct svc_serv *serv)
+{
+ return refcount_dec_not_one(&serv->sv_refcnt.refcount);
+}
+
/*
* Maximum payload size supported by a kernel RPC server.
* This is use to determine the max number of pages nfsd is



2021-11-17 00:47:42

by NeilBrown

Subject: [PATCH 06/14] SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()

The ->svo_setup callback serves no purpose. It is always called from
within the same module that chooses which callback is needed. So
discard it and call the relevant function directly.

Now that svc_set_num_threads() is no longer used, remove it and rename
svc_set_num_threads_sync() to remove the "_sync" suffix.

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/callback.c | 8 +++----
fs/nfsd/nfssvc.c | 11 ++++------
include/linux/sunrpc/svc.h | 4 ----
net/sunrpc/svc.c | 49 ++------------------------------------------
4 files changed, 10 insertions(+), 62 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index d9d78ffd1d65..6cdc9d18a7dd 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -172,9 +172,9 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt,
if (serv->sv_nrthreads == nrservs)
return 0;

- ret = serv->sv_ops->svo_setup(serv, NULL, nrservs);
+ ret = svc_set_num_threads(serv, NULL, nrservs);
if (ret) {
- serv->sv_ops->svo_setup(serv, NULL, 0);
+ svc_set_num_threads(serv, NULL, 0);
return ret;
}
dprintk("nfs_callback_up: service started\n");
@@ -235,14 +235,12 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv,
static const struct svc_serv_ops nfs40_cb_sv_ops = {
.svo_function = nfs4_callback_svc,
.svo_enqueue_xprt = svc_xprt_do_enqueue,
- .svo_setup = svc_set_num_threads_sync,
.svo_module = THIS_MODULE,
};
#if defined(CONFIG_NFS_V4_1)
static const struct svc_serv_ops nfs41_cb_sv_ops = {
.svo_function = nfs41_callback_svc,
.svo_enqueue_xprt = svc_xprt_do_enqueue,
- .svo_setup = svc_set_num_threads_sync,
.svo_module = THIS_MODULE,
};

@@ -357,7 +355,7 @@ void nfs_callback_down(int minorversion, struct net *net)
cb_info->users--;
if (cb_info->users == 0) {
svc_get(serv);
- serv->sv_ops->svo_setup(serv, NULL, 0);
+ svc_set_num_threads(serv, NULL, 0);
svc_put(serv);
dprintk("nfs_callback_down: service destroyed\n");
cb_info->serv = NULL;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index bf2813ac4443..37408b644607 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -593,7 +593,6 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = {
.svo_shutdown = nfsd_last_thread,
.svo_function = nfsd,
.svo_enqueue_xprt = svc_xprt_do_enqueue,
- .svo_setup = svc_set_num_threads_sync,
.svo_module = THIS_MODULE,
};

@@ -611,7 +610,7 @@ void nfsd_shutdown_threads(struct net *net)

svc_get(serv);
/* Kill outstanding nfsd threads */
- serv->sv_ops->svo_setup(serv, NULL, 0);
+ svc_set_num_threads(serv, NULL, 0);
nfsd_put(net);
mutex_unlock(&nfsd_mutex);
}
@@ -751,8 +750,9 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
/* apply the new numbers */
svc_get(nn->nfsd_serv);
for (i = 0; i < n; i++) {
- err = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv,
- &nn->nfsd_serv->sv_pools[i], nthreads[i]);
+ err = svc_set_num_threads(nn->nfsd_serv,
+ &nn->nfsd_serv->sv_pools[i],
+ nthreads[i]);
if (err)
break;
}
@@ -794,8 +794,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred)
error = nfsd_startup_net(net, cred);
if (error)
goto out_put;
- error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv,
- NULL, nrservs);
+ error = svc_set_num_threads(nn->nfsd_serv, NULL, nrservs);
if (error)
goto out_shutdown;
error = nn->nfsd_serv->sv_nrthreads;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index dc7bd89f1284..e544444b0259 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -64,9 +64,6 @@ struct svc_serv_ops {
/* queue up a transport for servicing */
void (*svo_enqueue_xprt)(struct svc_xprt *);

- /* set up thread (or whatever) execution context */
- int (*svo_setup)(struct svc_serv *, struct svc_pool *, int);
-
/* optional module to count when adding threads (pooled svcs only) */
struct module *svo_module;
};
@@ -519,7 +516,6 @@ void svc_pool_map_put(void);
struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int,
const struct svc_serv_ops *);
int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
-int svc_set_num_threads_sync(struct svc_serv *, struct svc_pool *, int);
int svc_pool_stats_open(struct svc_serv *serv, struct file *file);
void svc_shutdown_net(struct svc_serv *, struct net *);
int svc_process(struct svc_rqst *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index a099d0145d89..18fc18778a80 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -745,58 +745,13 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
return 0;
}

-
-/* destroy old threads */
-static int
-svc_signal_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
-{
- struct task_struct *task;
- unsigned int state = serv->sv_nrthreads-1;
-
- /* destroy old threads */
- do {
- task = choose_victim(serv, pool, &state);
- if (task == NULL)
- break;
- send_sig(SIGINT, task, 1);
- nrservs++;
- } while (nrservs < 0);
-
- return 0;
-}
-
/*
* Create or destroy enough new threads to make the number
* of threads the given number. If `pool' is non-NULL, applies
* only to threads in that pool, otherwise round-robins between
* all pools. Caller must ensure that mutual exclusion between this and
* server startup or shutdown.
- *
- * Destroying threads relies on the service threads filling in
- * rqstp->rq_task, which only the nfs ones do. Assumes the serv
- * has been created using svc_create_pooled().
- *
- * Based on code that used to be in nfsd_svc() but tweaked
- * to be pool-aware.
*/
-int
-svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
-{
- if (pool == NULL) {
- nrservs -= serv->sv_nrthreads;
- } else {
- spin_lock_bh(&pool->sp_lock);
- nrservs -= pool->sp_nrthreads;
- spin_unlock_bh(&pool->sp_lock);
- }
-
- if (nrservs > 0)
- return svc_start_kthreads(serv, pool, nrservs);
- if (nrservs < 0)
- return svc_signal_kthreads(serv, pool, nrservs);
- return 0;
-}
-EXPORT_SYMBOL_GPL(svc_set_num_threads);

/* destroy old threads */
static int
@@ -817,7 +772,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
}

int
-svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
+svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
if (pool == NULL) {
nrservs -= serv->sv_nrthreads;
@@ -833,7 +788,7 @@ svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrser
return svc_stop_kthreads(serv, pool, nrservs);
return 0;
}
-EXPORT_SYMBOL_GPL(svc_set_num_threads_sync);
+EXPORT_SYMBOL_GPL(svc_set_num_threads);

/**
* svc_rqst_replace_page - Replace one page in rq_pages[]



2021-11-17 00:47:51

by NeilBrown

Subject: [PATCH 07/14] NFSD: simplify locking for network notifier.

nfsd currently maintains an open-coded read/write semaphore (refcount
and wait queue) for each network namespace to ensure the nfs service
isn't shut down while the notifier is running.

This is excessive. As there is unlikely to be contention between
notifiers and they run without sleeping, a single spinlock is sufficient
to avoid problems.
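
The resulting pattern is roughly the following (a sketch only; 'addr' is a
placeholder for the address being removed - the real code is in the diff
below):

	/* notifier callback: runs without sleeping, so a spinlock suffices */
	spin_lock(&nfsd_notifier_lock);
	if (nn->nfsd_serv)
		svc_age_temp_xprts_now(nn->nfsd_serv, addr);
	spin_unlock(&nfsd_notifier_lock);

	/* shutdown path: clear the pointer under the same lock before the
	 * serv is destroyed, so a racing notifier can never see a stale
	 * pointer.
	 */
	spin_lock(&nfsd_notifier_lock);
	nn->nfsd_serv = NULL;
	spin_unlock(&nfsd_notifier_lock);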

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/netns.h | 3 ---
fs/nfsd/nfsctl.c | 2 --
fs/nfsd/nfssvc.c | 38 ++++++++++++++++++++------------------
3 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 1fd59eb0730b..021acdc0d03b 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -131,9 +131,6 @@ struct nfsd_net {
*/
int keep_active;

- wait_queue_head_t ntf_wq;
- atomic_t ntf_refcnt;
-
/*
* clientid and stateid data for construction of net unique COPY
* stateids.
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 22dbd638f6c8..28c26429988e 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1482,8 +1482,6 @@ static __net_init int nfsd_init_net(struct net *net)
nn->clientid_counter = nn->clientid_base + 1;
nn->s2s_cp_cl_id = nn->clientid_counter++;

- atomic_set(&nn->ntf_refcnt, 0);
- init_waitqueue_head(&nn->ntf_wq);
seqlock_init(&nn->boot_lock);

return 0;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 37408b644607..e00dc1172786 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -434,6 +434,7 @@ static void nfsd_shutdown_net(struct net *net)
nfsd_shutdown_generic();
}

+DEFINE_SPINLOCK(nfsd_notifier_lock);
static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
void *ptr)
{
@@ -443,18 +444,17 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct sockaddr_in sin;

- if ((event != NETDEV_DOWN) ||
- !atomic_inc_not_zero(&nn->ntf_refcnt))
+ if (event != NETDEV_DOWN || !nn->nfsd_serv)
goto out;

+ spin_lock(&nfsd_notifier_lock);
if (nn->nfsd_serv) {
dprintk("nfsd_inetaddr_event: removed %pI4\n", &ifa->ifa_local);
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = ifa->ifa_local;
svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin);
}
- atomic_dec(&nn->ntf_refcnt);
- wake_up(&nn->ntf_wq);
+ spin_unlock(&nfsd_notifier_lock);

out:
return NOTIFY_DONE;
@@ -474,10 +474,10 @@ static int nfsd_inet6addr_event(struct notifier_block *this,
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct sockaddr_in6 sin6;

- if ((event != NETDEV_DOWN) ||
- !atomic_inc_not_zero(&nn->ntf_refcnt))
+ if (event != NETDEV_DOWN || !nn->nfsd_serv)
goto out;

+ spin_lock(&nfsd_notifier_lock);
if (nn->nfsd_serv) {
dprintk("nfsd_inet6addr_event: removed %pI6\n", &ifa->addr);
sin6.sin6_family = AF_INET6;
@@ -486,8 +486,8 @@ static int nfsd_inet6addr_event(struct notifier_block *this,
sin6.sin6_scope_id = ifa->idev->dev->ifindex;
svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin6);
}
- atomic_dec(&nn->ntf_refcnt);
- wake_up(&nn->ntf_wq);
+ spin_unlock(&nfsd_notifier_lock);
+
out:
return NOTIFY_DONE;
}
@@ -504,7 +504,6 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);

- atomic_dec(&nn->ntf_refcnt);
/* check if the notifier still has clients */
if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
unregister_inetaddr_notifier(&nfsd_inetaddr_notifier);
@@ -512,7 +511,6 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
unregister_inet6addr_notifier(&nfsd_inet6addr_notifier);
#endif
}
- wait_event(nn->ntf_wq, atomic_read(&nn->ntf_refcnt) == 0);

/*
* write_ports can create the server without actually starting
@@ -624,6 +622,7 @@ int nfsd_create_serv(struct net *net)
{
int error;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv;

WARN_ON(!mutex_is_locked(&nfsd_mutex));
if (nn->nfsd_serv) {
@@ -633,21 +632,23 @@ int nfsd_create_serv(struct net *net)
if (nfsd_max_blksize == 0)
nfsd_max_blksize = nfsd_get_default_max_blksize();
nfsd_reset_versions(nn);
- nn->nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize,
- &nfsd_thread_sv_ops);
- if (nn->nfsd_serv == NULL)
+ serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize,
+ &nfsd_thread_sv_ops);
+ if (serv == NULL)
return -ENOMEM;

- nn->nfsd_serv->sv_maxconn = nn->max_connections;
- error = svc_bind(nn->nfsd_serv, net);
+ serv->sv_maxconn = nn->max_connections;
+ error = svc_bind(serv, net);
if (error < 0) {
/* NOT nfsd_put() as notifiers (see below) haven't
* been set up yet.
*/
- svc_put(nn->nfsd_serv);
- nn->nfsd_serv = NULL;
+ svc_put(serv);
return error;
}
+ spin_lock(&nfsd_notifier_lock);
+ nn->nfsd_serv = serv;
+ spin_unlock(&nfsd_notifier_lock);

set_max_drc();
/* check if the notifier is already set */
@@ -657,7 +658,6 @@ int nfsd_create_serv(struct net *net)
register_inet6addr_notifier(&nfsd_inet6addr_notifier);
#endif
}
- atomic_inc(&nn->ntf_refcnt);
nfsd_reset_boot_verifier(nn);
svc_get(nn->nfsd_serv);
return 0;
@@ -702,7 +702,9 @@ void nfsd_put(struct net *net)
if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) {
svc_shutdown_net(nn->nfsd_serv, net);
svc_destroy(&nn->nfsd_serv->sv_refcnt);
+ spin_lock(&nfsd_notifier_lock);
nn->nfsd_serv = NULL;
+ spin_unlock(&nfsd_notifier_lock);
}
}




2021-11-17 00:48:00

by NeilBrown

[permalink] [raw]
Subject: [PATCH 08/14] lockd: introduce nlmsvc_serv

lockd has two globals - nlmsvc_task and nlmsvc_rqst - but mostly it
wants the 'struct svc_serv', and anything else it needs can be reached
from the serv.

This patch is a first step to removing nlmsvc_task and nlmsvc_rqst. It
introduces nlmsvc_serv to store the 'struct svc_serv*'. This is set as
soon as the serv is created, and cleared only when it is destroyed.

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index a9669b106dbd..83874878f41d 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -54,6 +54,7 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops);

static DEFINE_MUTEX(nlmsvc_mutex);
static unsigned int nlmsvc_users;
+static struct svc_serv *nlmsvc_serv;
static struct task_struct *nlmsvc_task;
static struct svc_rqst *nlmsvc_rqst;
unsigned long nlmsvc_timeout;
@@ -306,13 +307,12 @@ static int lockd_inetaddr_event(struct notifier_block *this,
!atomic_inc_not_zero(&nlm_ntf_refcnt))
goto out;

- if (nlmsvc_rqst) {
+ if (nlmsvc_serv) {
dprintk("lockd_inetaddr_event: removed %pI4\n",
&ifa->ifa_local);
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = ifa->ifa_local;
- svc_age_temp_xprts_now(nlmsvc_rqst->rq_server,
- (struct sockaddr *)&sin);
+ svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin);
}
atomic_dec(&nlm_ntf_refcnt);
wake_up(&nlm_ntf_wq);
@@ -336,14 +336,13 @@ static int lockd_inet6addr_event(struct notifier_block *this,
!atomic_inc_not_zero(&nlm_ntf_refcnt))
goto out;

- if (nlmsvc_rqst) {
+ if (nlmsvc_serv) {
dprintk("lockd_inet6addr_event: removed %pI6\n", &ifa->addr);
sin6.sin6_family = AF_INET6;
sin6.sin6_addr = ifa->addr;
if (ipv6_addr_type(&sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
sin6.sin6_scope_id = ifa->idev->dev->ifindex;
- svc_age_temp_xprts_now(nlmsvc_rqst->rq_server,
- (struct sockaddr *)&sin6);
+ svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin6);
}
atomic_dec(&nlm_ntf_refcnt);
wake_up(&nlm_ntf_wq);
@@ -423,15 +422,17 @@ static const struct svc_serv_ops lockd_sv_ops = {
.svo_enqueue_xprt = svc_xprt_do_enqueue,
};

-static struct svc_serv *lockd_create_svc(void)
+static int lockd_create_svc(void)
{
struct svc_serv *serv;

/*
* Check whether we're already up and running.
*/
- if (nlmsvc_rqst)
- return svc_get(nlmsvc_rqst->rq_server);
+ if (nlmsvc_serv) {
+ svc_get(nlmsvc_serv);
+ return 0;
+ }

/*
* Sanity check: if there's no pid,
@@ -448,14 +449,15 @@ static struct svc_serv *lockd_create_svc(void)
serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, &lockd_sv_ops);
if (!serv) {
printk(KERN_WARNING "lockd_up: create service failed\n");
- return ERR_PTR(-ENOMEM);
+ return -ENOMEM;
}
+ nlmsvc_serv = serv;
register_inetaddr_notifier(&lockd_inetaddr_notifier);
#if IS_ENABLED(CONFIG_IPV6)
register_inet6addr_notifier(&lockd_inet6addr_notifier);
#endif
dprintk("lockd_up: service created\n");
- return serv;
+ return 0;
}

/*
@@ -468,11 +470,10 @@ int lockd_up(struct net *net, const struct cred *cred)

mutex_lock(&nlmsvc_mutex);

- serv = lockd_create_svc();
- if (IS_ERR(serv)) {
- error = PTR_ERR(serv);
+ error = lockd_create_svc();
+ if (error)
goto err_create;
- }
+ serv = nlmsvc_serv;

error = lockd_up_net(serv, net, cred);
if (error < 0) {
@@ -487,6 +488,8 @@ int lockd_up(struct net *net, const struct cred *cred)
}
nlmsvc_users++;
err_put:
+ if (nlmsvc_users == 0)
+ nlmsvc_serv = NULL;
svc_put(serv);
err_create:
mutex_unlock(&nlmsvc_mutex);
@@ -501,7 +504,7 @@ void
lockd_down(struct net *net)
{
mutex_lock(&nlmsvc_mutex);
- lockd_down_net(nlmsvc_rqst->rq_server, net);
+ lockd_down_net(nlmsvc_serv, net);
if (nlmsvc_users) {
if (--nlmsvc_users)
goto out;
@@ -519,6 +522,7 @@ lockd_down(struct net *net)
dprintk("lockd_down: service stopped\n");
lockd_svc_exit_thread();
dprintk("lockd_down: service destroyed\n");
+ nlmsvc_serv = NULL;
nlmsvc_task = NULL;
nlmsvc_rqst = NULL;
out:



2021-11-17 00:48:07

by NeilBrown

[permalink] [raw]
Subject: [PATCH 09/14] lockd: simplify management of network status notifiers

Now that the network status notifiers use nlmsvc_serv rather than
nlmsvc_rqst the management can be simplified.

Notifier unregistration synchronises with any pending notifications, so
provided we unregister before nlmsvc_serv is freed, no further
interlocking is required.

So we move the unregister call to just before the thread is killed
(which destroys the service) and just before the service is destroyed in
the failure-path of lockd_up().

Then nlm_ntf_refcnt and nlm_ntf_wq can be removed.
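
In outline, the teardown ordering this relies on is (a sketch only; the
real changes are in the diff below):

	/* 1. Unregister first: unregistration waits for any callback that
	 *    is already running (likewise for the IPv6 notifier), so
	 *    nothing can touch nlmsvc_serv after this returns.
	 */
	unregister_inetaddr_notifier(&lockd_inetaddr_notifier);

	/* 2. Only then stop the thread (which will destroy the serv) and
	 *    clear the global.
	 */
	kthread_stop(nlmsvc_task);
	nlmsvc_serv = NULL;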

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 35 +++++++++--------------------------
1 file changed, 9 insertions(+), 26 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 83874878f41d..20cebb191350 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -59,9 +59,6 @@ static struct task_struct *nlmsvc_task;
static struct svc_rqst *nlmsvc_rqst;
unsigned long nlmsvc_timeout;

-static atomic_t nlm_ntf_refcnt = ATOMIC_INIT(0);
-static DECLARE_WAIT_QUEUE_HEAD(nlm_ntf_wq);
-
unsigned int lockd_net_id;

/*
@@ -303,8 +300,7 @@ static int lockd_inetaddr_event(struct notifier_block *this,
struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
struct sockaddr_in sin;

- if ((event != NETDEV_DOWN) ||
- !atomic_inc_not_zero(&nlm_ntf_refcnt))
+ if (event != NETDEV_DOWN)
goto out;

if (nlmsvc_serv) {
@@ -314,8 +310,6 @@ static int lockd_inetaddr_event(struct notifier_block *this,
sin.sin_addr.s_addr = ifa->ifa_local;
svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin);
}
- atomic_dec(&nlm_ntf_refcnt);
- wake_up(&nlm_ntf_wq);

out:
return NOTIFY_DONE;
@@ -332,8 +326,7 @@ static int lockd_inet6addr_event(struct notifier_block *this,
struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
struct sockaddr_in6 sin6;

- if ((event != NETDEV_DOWN) ||
- !atomic_inc_not_zero(&nlm_ntf_refcnt))
+ if (event != NETDEV_DOWN)
goto out;

if (nlmsvc_serv) {
@@ -344,8 +337,6 @@ static int lockd_inet6addr_event(struct notifier_block *this,
sin6.sin6_scope_id = ifa->idev->dev->ifindex;
svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin6);
}
- atomic_dec(&nlm_ntf_refcnt);
- wake_up(&nlm_ntf_wq);

out:
return NOTIFY_DONE;
@@ -362,14 +353,6 @@ static void lockd_unregister_notifiers(void)
#if IS_ENABLED(CONFIG_IPV6)
unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
#endif
- wait_event(nlm_ntf_wq, atomic_read(&nlm_ntf_refcnt) == 0);
-}
-
-static void lockd_svc_exit_thread(void)
-{
- atomic_dec(&nlm_ntf_refcnt);
- lockd_unregister_notifiers();
- svc_exit_thread(nlmsvc_rqst);
}

static int lockd_start_svc(struct svc_serv *serv)
@@ -388,11 +371,9 @@ static int lockd_start_svc(struct svc_serv *serv)
printk(KERN_WARNING
"lockd_up: svc_rqst allocation failed, error=%d\n",
error);
- lockd_unregister_notifiers();
goto out_rqst;
}

- atomic_inc(&nlm_ntf_refcnt);
svc_sock_update_bufs(serv);
serv->sv_maxconn = nlm_max_connections;

@@ -410,7 +391,7 @@ static int lockd_start_svc(struct svc_serv *serv)
return 0;

out_task:
- lockd_svc_exit_thread();
+ svc_exit_thread(nlmsvc_rqst);
nlmsvc_task = NULL;
out_rqst:
nlmsvc_rqst = NULL;
@@ -477,7 +458,6 @@ int lockd_up(struct net *net, const struct cred *cred)

error = lockd_up_net(serv, net, cred);
if (error < 0) {
- lockd_unregister_notifiers();
goto err_put;
}

@@ -488,8 +468,10 @@ int lockd_up(struct net *net, const struct cred *cred)
}
nlmsvc_users++;
err_put:
- if (nlmsvc_users == 0)
+ if (nlmsvc_users == 0) {
+ lockd_unregister_notifiers();
nlmsvc_serv = NULL;
+ }
svc_put(serv);
err_create:
mutex_unlock(&nlmsvc_mutex);
@@ -518,13 +500,14 @@ lockd_down(struct net *net)
printk(KERN_ERR "lockd_down: no lockd running.\n");
BUG();
}
+ lockd_unregister_notifiers();
kthread_stop(nlmsvc_task);
dprintk("lockd_down: service stopped\n");
- lockd_svc_exit_thread();
+ svc_exit_thread(nlmsvc_rqst);
+ nlmsvc_rqst = NULL;
dprintk("lockd_down: service destroyed\n");
nlmsvc_serv = NULL;
nlmsvc_task = NULL;
- nlmsvc_rqst = NULL;
out:
mutex_unlock(&nlmsvc_mutex);
}



2021-11-17 00:48:10

by NeilBrown

[permalink] [raw]
Subject: [PATCH 10/14] lockd: move lockd_start_svc() call into lockd_create_svc()

lockd_start_svc() only needs to be called once, just after the svc is
created. If the start fails, the svc is discarded too.

It thus makes sense to call lockd_start_svc() from lockd_create_svc().
This allows us to remove the test against nlmsvc_rqst at the start of
lockd_start_svc() - it must always be NULL.

lockd_up() only held an extra reference on the svc until a thread was
created - then it dropped it. The thread - and thus the extra reference
- will remain until kthread_stop() is called.
Now that the thread is created in lockd_create_svc(), the extra
reference can be dropped there. So the 'serv' variable is no longer
needed in lockd_up().
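
In outline, lockd_create_svc() now ends up doing roughly this (a sketch;
the real code is in the diff below):

	serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, &lockd_sv_ops);
	if (!serv)
		return -ENOMEM;

	error = lockd_start_svc(serv);	/* the new thread takes its own ref */
	svc_put(serv);			/* drop the svc_create() reference */
	if (error < 0)
		return error;

	nlmsvc_serv = serv;		/* stays valid until kthread_stop() */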

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 20cebb191350..91e7c839841e 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -359,9 +359,6 @@ static int lockd_start_svc(struct svc_serv *serv)
{
int error;

- if (nlmsvc_rqst)
- return 0;
-
/*
* Create the kernel thread and wait for it to start.
*/
@@ -406,6 +403,7 @@ static const struct svc_serv_ops lockd_sv_ops = {
static int lockd_create_svc(void)
{
struct svc_serv *serv;
+ int error;

/*
* Check whether we're already up and running.
@@ -432,6 +430,13 @@ static int lockd_create_svc(void)
printk(KERN_WARNING "lockd_up: create service failed\n");
return -ENOMEM;
}
+
+ error = lockd_start_svc(serv);
+ /* The thread now holds the only reference */
+ svc_put(serv);
+ if (error < 0)
+ return error;
+
nlmsvc_serv = serv;
register_inetaddr_notifier(&lockd_inetaddr_notifier);
#if IS_ENABLED(CONFIG_IPV6)
@@ -446,7 +451,6 @@ static int lockd_create_svc(void)
*/
int lockd_up(struct net *net, const struct cred *cred)
{
- struct svc_serv *serv;
int error;

mutex_lock(&nlmsvc_mutex);
@@ -454,25 +458,19 @@ int lockd_up(struct net *net, const struct cred *cred)
error = lockd_create_svc();
if (error)
goto err_create;
- serv = nlmsvc_serv;

- error = lockd_up_net(serv, net, cred);
+ error = lockd_up_net(nlmsvc_serv, net, cred);
if (error < 0) {
goto err_put;
}

- error = lockd_start_svc(serv);
- if (error < 0) {
- lockd_down_net(serv, net);
- goto err_put;
- }
nlmsvc_users++;
err_put:
if (nlmsvc_users == 0) {
lockd_unregister_notifiers();
+ kthread_stop(nlmsvc_task);
nlmsvc_serv = NULL;
}
- svc_put(serv);
err_create:
mutex_unlock(&nlmsvc_mutex);
return error;



2021-11-17 00:48:17

by NeilBrown

[permalink] [raw]
Subject: [PATCH 11/14] lockd: move svc_exit_thread() into the thread

The normal place to call svc_exit_thread() is from the thread itself
just before it exits.
Do this for lockd.

This means that nlmsvc_rqst is not used outside of lockd_start_svc(),
so it can be made local to that function, and renamed to 'rqst'.

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 91e7c839841e..9aa499a76159 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -56,7 +56,6 @@ static DEFINE_MUTEX(nlmsvc_mutex);
static unsigned int nlmsvc_users;
static struct svc_serv *nlmsvc_serv;
static struct task_struct *nlmsvc_task;
-static struct svc_rqst *nlmsvc_rqst;
unsigned long nlmsvc_timeout;

unsigned int lockd_net_id;
@@ -182,6 +181,11 @@ lockd(void *vrqstp)
nlm_shutdown_hosts();
cancel_delayed_work_sync(&ln->grace_period_end);
locks_end_grace(&ln->lockd_manager);
+
+ dprintk("lockd_down: service stopped\n");
+
+ svc_exit_thread(rqstp);
+
return 0;
}

@@ -358,13 +362,14 @@ static void lockd_unregister_notifiers(void)
static int lockd_start_svc(struct svc_serv *serv)
{
int error;
+ struct svc_rqst *rqst;

/*
* Create the kernel thread and wait for it to start.
*/
- nlmsvc_rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
- if (IS_ERR(nlmsvc_rqst)) {
- error = PTR_ERR(nlmsvc_rqst);
+ rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
+ if (IS_ERR(rqst)) {
+ error = PTR_ERR(rqst);
printk(KERN_WARNING
"lockd_up: svc_rqst allocation failed, error=%d\n",
error);
@@ -374,24 +379,23 @@ static int lockd_start_svc(struct svc_serv *serv)
svc_sock_update_bufs(serv);
serv->sv_maxconn = nlm_max_connections;

- nlmsvc_task = kthread_create(lockd, nlmsvc_rqst, "%s", serv->sv_name);
+ nlmsvc_task = kthread_create(lockd, rqst, "%s", serv->sv_name);
if (IS_ERR(nlmsvc_task)) {
error = PTR_ERR(nlmsvc_task);
printk(KERN_WARNING
"lockd_up: kthread_run failed, error=%d\n", error);
goto out_task;
}
- nlmsvc_rqst->rq_task = nlmsvc_task;
+ rqst->rq_task = nlmsvc_task;
wake_up_process(nlmsvc_task);

dprintk("lockd_up: service started\n");
return 0;

out_task:
- svc_exit_thread(nlmsvc_rqst);
+ svc_exit_thread(rqst);
nlmsvc_task = NULL;
out_rqst:
- nlmsvc_rqst = NULL;
return error;
}

@@ -500,9 +504,6 @@ lockd_down(struct net *net)
}
lockd_unregister_notifiers();
kthread_stop(nlmsvc_task);
- dprintk("lockd_down: service stopped\n");
- svc_exit_thread(nlmsvc_rqst);
- nlmsvc_rqst = NULL;
dprintk("lockd_down: service destroyed\n");
nlmsvc_serv = NULL;
nlmsvc_task = NULL;



2021-11-17 00:48:24

by NeilBrown

[permalink] [raw]
Subject: [PATCH 12/14] lockd: introduce lockd_put()

There is some cleanup that is duplicated in lockd_down() and the failure
path of lockd_up().
Factor these out into a new lockd_put() and call it from both places.

lockd_put() does *not* take the mutex - that must be held by the caller.
It decrements nlmsvc_users and if that reaches zero, it cleans up.

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 64 ++++++++++++++++++++++++--------------------------------
1 file changed, 27 insertions(+), 37 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 9aa499a76159..7f12c280fd30 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -351,14 +351,6 @@ static struct notifier_block lockd_inet6addr_notifier = {
};
#endif

-static void lockd_unregister_notifiers(void)
-{
- unregister_inetaddr_notifier(&lockd_inetaddr_notifier);
-#if IS_ENABLED(CONFIG_IPV6)
- unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
-#endif
-}
-
static int lockd_start_svc(struct svc_serv *serv)
{
int error;
@@ -450,6 +442,27 @@ static int lockd_create_svc(void)
return 0;
}

+static void lockd_put(void)
+{
+ if (WARN(nlmsvc_users <= 0, "lockd_down: no users!\n"))
+ return;
+ if (--nlmsvc_users)
+ return;
+
+ unregister_inetaddr_notifier(&lockd_inetaddr_notifier);
+#if IS_ENABLED(CONFIG_IPV6)
+ unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
+#endif
+
+ if (nlmsvc_task) {
+ kthread_stop(nlmsvc_task);
+ dprintk("lockd_down: service stopped\n");
+ nlmsvc_task = NULL;
+ }
+ nlmsvc_serv = NULL;
+ dprintk("lockd_down: service destroyed\n");
+}
+
/*
* Bring up the lockd process if it's not already up.
*/
@@ -461,21 +474,16 @@ int lockd_up(struct net *net, const struct cred *cred)

error = lockd_create_svc();
if (error)
- goto err_create;
+ goto err;
+ nlmsvc_users++;

error = lockd_up_net(nlmsvc_serv, net, cred);
if (error < 0) {
- goto err_put;
+ lockd_put();
+ goto err;
}

- nlmsvc_users++;
-err_put:
- if (nlmsvc_users == 0) {
- lockd_unregister_notifiers();
- kthread_stop(nlmsvc_task);
- nlmsvc_serv = NULL;
- }
-err_create:
+err:
mutex_unlock(&nlmsvc_mutex);
return error;
}
@@ -489,25 +497,7 @@ lockd_down(struct net *net)
{
mutex_lock(&nlmsvc_mutex);
lockd_down_net(nlmsvc_serv, net);
- if (nlmsvc_users) {
- if (--nlmsvc_users)
- goto out;
- } else {
- printk(KERN_ERR "lockd_down: no users! task=%p\n",
- nlmsvc_task);
- BUG();
- }
-
- if (!nlmsvc_task) {
- printk(KERN_ERR "lockd_down: no lockd running.\n");
- BUG();
- }
- lockd_unregister_notifiers();
- kthread_stop(nlmsvc_task);
- dprintk("lockd_down: service destroyed\n");
- nlmsvc_serv = NULL;
- nlmsvc_task = NULL;
-out:
+ lockd_put();
mutex_unlock(&nlmsvc_mutex);
}
EXPORT_SYMBOL_GPL(lockd_down);



2021-11-17 00:48:32

by NeilBrown

[permalink] [raw]
Subject: [PATCH 13/14] lockd: rename lockd_create_svc() to lockd_get()

lockd_create_svc() already does an svc_get() if the service already
exists, so it is more like a "get" than a "create".

So:
- Move the increment of nlmsvc_users into the function as well.
- Rename it to lockd_get().

It is now the inverse of lockd_put().

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 7f12c280fd30..1a7c11118b32 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -396,16 +396,14 @@ static const struct svc_serv_ops lockd_sv_ops = {
.svo_enqueue_xprt = svc_xprt_do_enqueue,
};

-static int lockd_create_svc(void)
+static int lockd_get(void)
{
struct svc_serv *serv;
int error;

- /*
- * Check whether we're already up and running.
- */
if (nlmsvc_serv) {
svc_get(nlmsvc_serv);
+ nlmsvc_users++;
return 0;
}

@@ -439,6 +437,7 @@ static int lockd_create_svc(void)
register_inet6addr_notifier(&lockd_inet6addr_notifier);
#endif
dprintk("lockd_up: service created\n");
+ nlmsvc_users++;
return 0;
}

@@ -472,10 +471,9 @@ int lockd_up(struct net *net, const struct cred *cred)

mutex_lock(&nlmsvc_mutex);

- error = lockd_create_svc();
+ error = lockd_get();
if (error)
goto err;
- nlmsvc_users++;

error = lockd_up_net(nlmsvc_serv, net, cred);
if (error < 0) {



2021-11-17 00:48:40

by NeilBrown

[permalink] [raw]
Subject: [PATCH 14/14] lockd: use svc_set_num_threads() for thread start and stop

svc_set_num_threads() does everything that lockd_start_svc() does, except
set sv_maxconn. It also (when passed 0) finds the threads and
stops them with kthread_stop().

So move the setting of sv_maxconn, and use svc_set_num_threads().

We now don't need nlmsvc_task.

Also set svo_module - just for consistency.

svc_prepare_thread is now only used where it is defined, so it can be
made static.

Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/svc.c | 56 ++++++--------------------------------------
include/linux/sunrpc/svc.h | 2 --
net/sunrpc/svc.c | 3 +-
3 files changed, 8 insertions(+), 53 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 1a7c11118b32..93f5a4f262f9 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -55,7 +55,6 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops);
static DEFINE_MUTEX(nlmsvc_mutex);
static unsigned int nlmsvc_users;
static struct svc_serv *nlmsvc_serv;
-static struct task_struct *nlmsvc_task;
unsigned long nlmsvc_timeout;

unsigned int lockd_net_id;
@@ -292,8 +291,8 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net)
__func__, net->ns.inum);
}
} else {
- pr_err("%s: no users! task=%p, net=%x\n",
- __func__, nlmsvc_task, net->ns.inum);
+ pr_err("%s: no users! net=%x\n",
+ __func__, net->ns.inum);
BUG();
}
}
@@ -351,49 +350,11 @@ static struct notifier_block lockd_inet6addr_notifier = {
};
#endif

-static int lockd_start_svc(struct svc_serv *serv)
-{
- int error;
- struct svc_rqst *rqst;
-
- /*
- * Create the kernel thread and wait for it to start.
- */
- rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
- if (IS_ERR(rqst)) {
- error = PTR_ERR(rqst);
- printk(KERN_WARNING
- "lockd_up: svc_rqst allocation failed, error=%d\n",
- error);
- goto out_rqst;
- }
-
- svc_sock_update_bufs(serv);
- serv->sv_maxconn = nlm_max_connections;
-
- nlmsvc_task = kthread_create(lockd, rqst, "%s", serv->sv_name);
- if (IS_ERR(nlmsvc_task)) {
- error = PTR_ERR(nlmsvc_task);
- printk(KERN_WARNING
- "lockd_up: kthread_run failed, error=%d\n", error);
- goto out_task;
- }
- rqst->rq_task = nlmsvc_task;
- wake_up_process(nlmsvc_task);
-
- dprintk("lockd_up: service started\n");
- return 0;
-
-out_task:
- svc_exit_thread(rqst);
- nlmsvc_task = NULL;
-out_rqst:
- return error;
-}
-
static const struct svc_serv_ops lockd_sv_ops = {
.svo_shutdown = svc_rpcb_cleanup,
+ .svo_function = lockd,
.svo_enqueue_xprt = svc_xprt_do_enqueue,
+ .svo_module = THIS_MODULE,
};

static int lockd_get(void)
@@ -425,7 +386,8 @@ static int lockd_get(void)
return -ENOMEM;
}

- error = lockd_start_svc(serv);
+ serv->sv_maxconn = nlm_max_connections;
+ error = svc_set_num_threads(serv, NULL, 1);
/* The thread now holds the only reference */
svc_put(serv);
if (error < 0)
@@ -453,11 +415,7 @@ static void lockd_put(void)
unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
#endif

- if (nlmsvc_task) {
- kthread_stop(nlmsvc_task);
- dprintk("lockd_down: service stopped\n");
- nlmsvc_task = NULL;
- }
+ svc_set_num_threads(nlmsvc_serv, NULL, 0);
nlmsvc_serv = NULL;
dprintk("lockd_down: service destroyed\n");
}
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index e544444b0259..e2b128531b1d 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -505,8 +505,6 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
const struct svc_serv_ops *);
struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
struct svc_pool *pool, int node);
-struct svc_rqst *svc_prepare_thread(struct svc_serv *serv,
- struct svc_pool *pool, int node);
void svc_rqst_replace_page(struct svc_rqst *rqstp,
struct page *page);
void svc_rqst_free(struct svc_rqst *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 18fc18778a80..d7e5d833a036 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -631,7 +631,7 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node)
}
EXPORT_SYMBOL_GPL(svc_rqst_alloc);

-struct svc_rqst *
+static struct svc_rqst *
svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
{
struct svc_rqst *rqstp;
@@ -651,7 +651,6 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
spin_unlock_bh(&pool->sp_lock);
return rqstp;
}
-EXPORT_SYMBOL_GPL(svc_prepare_thread);

/*
* Choose a pool in which to create a new thread, for svc_set_num_threads



2021-11-17 14:12:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> I have a dream of making nfsd threads start and stop dynamically.

It's a good dream!

I haven't had a chance to look at these at all yet, I just kicked off
tests to run overnight, and woke up to the below.

This happened on the client, probably the first time it attempted to do
an nfsv4 mount, so something went wrong with setup of the callback
server.

--b.

[ 285.585061] divide error: 0000 [#1] PREEMPT SMP KASAN PTI
[ 285.585754] CPU: 0 PID: 5864 Comm: mount.nfs Not tainted 5.16.0-rc1-00014-g659e13af1f87 #1017
[ 285.586828] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014
[ 285.587828] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
[ 285.588501] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
[ 285.590820] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
[ 285.591418] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
[ 285.592267] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
[ 285.593145] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
[ 285.594057] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
[ 285.594940] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
[ 285.595826] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
[ 285.596851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 285.597578] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
[ 285.598510] Call Trace:
[ 285.598824] <TASK>
[ 285.599097] svc_xprt_do_enqueue+0x164/0x900 [sunrpc]
[ 285.599767] svc_xprt_received+0x181/0x3a0 [sunrpc]
[ 285.600411] _svc_create_xprt+0x2bd/0x740 [sunrpc]
[ 285.601049] ? svc_add_new_perm_xprt+0x140/0x140 [sunrpc]
[ 285.601787] ? lock_release+0x3b8/0x6d0
[ 285.602318] ? nfs_callback_up+0x7ad/0xdb0 [nfsv4]
[ 285.603617] svc_create_xprt+0x36/0x90 [sunrpc]
[ 285.604306] nfs_callback_up+0x81f/0xdb0 [nfsv4]
[ 285.604972] nfs4_init_client+0x1db/0x450 [nfsv4]
[ 285.605605] ? nfs41_init_client+0x70/0x70 [nfsv4]
[ 285.606304] nfs4_set_client+0x25f/0x410 [nfsv4]
[ 285.606912] ? nfs4_add_trunk.isra.0+0x280/0x280 [nfsv4]
[ 285.607606] nfs4_create_server+0x5f0/0xda0 [nfsv4]
[ 285.608250] ? lock_is_held_type+0xd7/0x130
[ 285.608786] ? nfs4_server_common_setup+0x670/0x670 [nfsv4]
[ 285.609505] ? __module_get+0x47/0x60
[ 285.610077] nfs4_try_get_tree+0xd3/0x250 [nfsv4]
[ 285.610690] vfs_get_tree+0x8a/0x2d0
[ 285.611152] path_mount+0x3f9/0x19e0
[ 285.611608] ? debug_check_no_obj_freed+0x1f3/0x3c0
[ 285.612227] ? lock_is_held_type+0xd7/0x130
[ 285.612757] ? finish_automount+0x8c0/0x8c0
[ 285.613281] ? user_path_at_empty+0x45/0x50
[ 285.613832] ? rcu_read_lock_sched_held+0x3f/0x70
[ 285.614456] ? kmem_cache_free+0xd9/0x1b0
[ 285.614965] __x64_sys_mount+0x1d6/0x240
[ 285.615455] ? path_mount+0x19e0/0x19e0
[ 285.615941] ? syscall_enter_from_user_mode+0x1d/0x50
[ 285.616572] do_syscall_64+0x43/0x90
[ 285.617043] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 285.617693] RIP: 0033:0x7f489fd4182e
[ 285.618206] Code: 48 8b 0d 4d 16 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1a 16 0c 00 f7 d8 64 89 01 48
[ 285.620595] RSP: 002b:00007ffdc3bdd5b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 285.621532] RAX: ffffffffffffffda RBX: 00007ffdc3bdd750 RCX: 00007f489fd4182e
[ 285.622492] RDX: 000055da46c0a510 RSI: 000055da46c0a550 RDI: 000055da46c0c2f0
[ 285.623372] RBP: 00007ffdc3bdd750 R08: 000055da46c0d050 R09: 0037332e3232312e
[ 285.624271] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 285.625158] R13: 00007ffdc3bdd660 R14: 000055da46c0ce00 R15: 000055da46c0ce90
[ 285.626097] </TASK>
[ 285.626381] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
[ 285.627622] ---[ end trace 0ea273cc87891325 ]---
[ 285.628222] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
[ 285.628945] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
[ 285.631830] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
[ 285.632557] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
[ 285.634319] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
[ 285.635430] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
[ 285.636408] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
[ 285.637369] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
[ 285.638346] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
[ 285.639434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 285.640233] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
[ 285.641194] note: mount.nfs[5864] exited with preempt_count 1
[ 562.003788] kworker/dying (773) used greatest stack depth: 23128 bytes left
[ 1356.888419] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
[ 2396.888656] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
[ 3071.387007] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
[ 3074.395010] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
[ 3082.395298] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
[ 5736.389488] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
[root@test3 ~]# uname -a
Linux test3.fieldses.org 5.16.0-rc1-00014-g659e13af1f87 #1017 SMP PREEMPT Tue Nov 16 20:51:49 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

2021-11-17 17:13:04

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.



> On Nov 16, 2021, at 7:46 PM, NeilBrown <[email protected]> wrote:
>
> I have a dream of making nfsd threads start and stop dynamically.
> This would free the admin of having to choose a number of threads to
> start.
> I'm not there yet, and I may never get there, but the current state of
> the thread management code makes it harder to experiment than it needs
> to be. There is a lot of technical debt that needs to be repaid first.
>
> This series addresses much of this debt. There are three users of
> service threads: nfsd, lockd, and nfs-callback.
> nfs-callback, the newest, is quite clean. This patch brings nfsd and
> lockd up to a similar standard, and takes advantage of this increased
> uniformity to simplify some shared interfaces.

NFSD is venerable and ancient code. I'm a fan of careful and
tasteful clean up efforts!

I've started to familiarize myself with these changes. A couple
of thoughts for v2:

1. 1/14 is doing some heavy lifting and might be split into a
lockd/nfs_cb patch and an nfsd-only patch. Or maybe there's
another cleavage that makes more sense.

2. When introducing "static inline" functions I like to see full
kerneldoc comments for those.

I'd also like to have some idea how we should be exercising this
series to ensure it is as bug-free as is reasonable. Do you have
any thoughts about that?


> It doesn't introduce any functionality improvements,

I might quibble with that assessment: replacing heavyweight
synchronization with spinlocks and atomics will have some
functional impact. That's probably where we need to be most
careful.


> and (as far as I
> know) only fixes one minor bug (can you spot it? If not, look at
> c20106944eb6 and if you can see a second place that it could have
> fixed).
>
> Thanks for your review,
> NeilBrown
>
>
> ---
>
> NeilBrown (14):
> SUNRPC: stop using ->sv_nrthreads as a refcount
> nfsd: make nfsd_stats.th_cnt atomic_t
> NFSD: narrow nfsd_mutex protection in nfsd thread
> SUNRPC: use sv_lock to protect updates to sv_nrthreads.
> NFSD: Make it possible to use svc_set_num_threads_sync
> SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()
> NFSD: simplify locking for network notifier.
> lockd: introduce nlmsvc_serv
> lockd: simplify management of network status notifiers
> lockd: move lockd_start_svc() call into lockd_create_svc()
> lockd: move svc_exit_thread() into the thread
> lockd: introduce lockd_put()
> lockd: rename lockd_create_svc() to lockd_get()
> lockd: use svc_set_num_threads() for thread start and stop
>
>
> fs/lockd/svc.c | 190 ++++++++++++-------------------------
> fs/nfs/callback.c | 8 +-
> fs/nfsd/netns.h | 6 --
> fs/nfsd/nfsctl.c | 2 -
> fs/nfsd/nfssvc.c | 99 +++++++++----------
> fs/nfsd/stats.c | 2 +-
> fs/nfsd/stats.h | 4 +-
> include/linux/sunrpc/svc.h | 11 +--
> net/sunrpc/svc.c | 61 ++----------
> 9 files changed, 128 insertions(+), 255 deletions(-)
>
> --
> Signature
>

--
Chuck Lever




2021-11-19 03:19:55

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Thu, 18 Nov 2021, Chuck Lever III wrote:
>
> > On Nov 16, 2021, at 7:46 PM, NeilBrown <[email protected]> wrote:
> >
> > I have a dream of making nfsd threads start and stop dynamically.
> > This would free the admin of having to choose a number of threads to
> > start.
> > I'm not there yet, and I may never get there, but the current state of
> > the thread management code makes it harder to experiment than it needs
> > to be. There is a lot of technical debt that needs to be repaid first.
> >
> > This series addresses much of this debt. There are three users of
> > service threads: nfsd, lockd, and nfs-callback.
> > nfs-callback, the newest, is quite clean. This patch brings nfsd and
> > lockd up to a similar standard, and takes advantage of this increased
> > uniformity to simplify some shared interfaces.
>
> NFSD is venerable and ancient code. I'm a fan of careful and
> tasteful clean up efforts!
>
> I've started to familiarize myself with these changes. A couple
> of thoughts for v2:
>
> 1. 1/14 is doing some heavy lifting and might be split into a
> lockd/nfs_cb patch and an nfsd-only patch. Or maybe there's
> another cleavage that makes more sense.

I've split out the renaming of svc_destroy to svc_put and related
changes.

>
> 2. When introducing "static inline" functions I like to see full
> kerneldoc comments for those.

Will do.

>
> I'd also like to have some idea how we should be exercising this
> series to ensure it is as bug-free as is reasonable. Do you have
> any thoughts about that?

As Bruce discovered, it is the starting and stopping of services that
are most affected. Races between the two might be interesting.
Failures during start could be interesting. Signalling nfsd threads vs
"rpc.nfsd 0" might have races.

>
>
> > It doesn't introduce any functionality improvements,
>
> I might quibble with that assessment: replacing heavyweight
> synchronization with spinlocks and atomics will have some
> functional impact. That's probably where we need to be most
> careful.

Agreed.

Thanks,
NeilBrown

2021-11-19 03:24:21

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > I have a dream of making nfsd threads start and stop dynamically.
>
> It's a good dream!
>
> I haven't had a chance to look at these at all yet, I just kicked off
> tests to run overnight, and woke up to the below.
>
> This happened on the client, probably the first time it attempted to do
> an nfsv4 mount, so something went wrong with setup of the callback
> server.
>
> --b.
>
> [ 285.585061] divide error: 0000 [#1] PREEMPT SMP KASAN PTI

It appears that serv->sv_nrpools is zero. I don't think I changed
anything relating to that.

I did think it odd that nfs-callback uses svc_create_pooled, but doesn't
ensure it requests as many threads as pools... probably not related,
though.

I'll dig in a bit more later.

Thanks,
NeilBrown

2021-11-21 23:50:40

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > I have a dream of making nfsd threads start and stop dynamically.
>
> It's a good dream!
>
> I haven't had a chance to look at these at all yet, I just kicked off
> tests to run overnight, and woke up to the below.
>
> This happened on the client, probably the first time it attempted to do
> an nfsv4 mount, so something went wrong with setup of the callback
> server.

I cannot reproduce this and cannot see any way it could possibly happen.

Could you please confirm the patches were applied on a vanilla 5.16-rc1
kernel, and that you don't have the "pool_mode" module parameter set?

As I said, serv->sv_nrpools is zero, so either it hasn't been set, or it
was set but the 'serv' has since been cleared and freed (or freed,
reallocated and cleared), or it was explicitly set to zero.

svc_pool_map_get() doesn't explicitly protect against npools==0 (maybe
it should), only npools < 0. But even without that I cannot see it ever
setting ->npools to zero.
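
For illustration only, the kind of guard I mean would look roughly like
this in svc_pool_map_get() (an untested sketch, and the comment is mine):

	if (npools <= 0) {		/* currently only npools < 0 */
		/* no usable pool estimate: fall back to one global pool */
		npools = 1;
		m->mode = SVC_POOL_GLOBAL;
	}
	m->npools = npools;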

I have changed refcounting, so maybe something could get freed early,
but all the changes I made happen *before* the point in the code where
it is crashing.

So I'm thoroughly perplexed.

Thanks,
NeilBrown


>
> --b.
>
> [ 285.585061] divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 285.585754] CPU: 0 PID: 5864 Comm: mount.nfs Not tainted 5.16.0-rc1-00014-g659e13af1f87 #1017
> [ 285.586828] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014
> [ 285.587828] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
> [ 285.588501] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
> [ 285.590820] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
> [ 285.591418] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
> [ 285.592267] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
> [ 285.593145] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
> [ 285.594057] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
> [ 285.594940] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
> [ 285.595826] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
> [ 285.596851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 285.597578] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
> [ 285.598510] Call Trace:
> [ 285.598824] <TASK>
> [ 285.599097] svc_xprt_do_enqueue+0x164/0x900 [sunrpc]
> [ 285.599767] svc_xprt_received+0x181/0x3a0 [sunrpc]
> [ 285.600411] _svc_create_xprt+0x2bd/0x740 [sunrpc]
> [ 285.601049] ? svc_add_new_perm_xprt+0x140/0x140 [sunrpc]
> [ 285.601787] ? lock_release+0x3b8/0x6d0
> [ 285.602318] ? nfs_callback_up+0x7ad/0xdb0 [nfsv4]
> [ 285.603617] svc_create_xprt+0x36/0x90 [sunrpc]
> [ 285.604306] nfs_callback_up+0x81f/0xdb0 [nfsv4]
> [ 285.604972] nfs4_init_client+0x1db/0x450 [nfsv4]
> [ 285.605605] ? nfs41_init_client+0x70/0x70 [nfsv4]
> [ 285.606304] nfs4_set_client+0x25f/0x410 [nfsv4]
> [ 285.606912] ? nfs4_add_trunk.isra.0+0x280/0x280 [nfsv4]
> [ 285.607606] nfs4_create_server+0x5f0/0xda0 [nfsv4]
> [ 285.608250] ? lock_is_held_type+0xd7/0x130
> [ 285.608786] ? nfs4_server_common_setup+0x670/0x670 [nfsv4]
> [ 285.609505] ? __module_get+0x47/0x60
> [ 285.610077] nfs4_try_get_tree+0xd3/0x250 [nfsv4]
> [ 285.610690] vfs_get_tree+0x8a/0x2d0
> [ 285.611152] path_mount+0x3f9/0x19e0
> [ 285.611608] ? debug_check_no_obj_freed+0x1f3/0x3c0
> [ 285.612227] ? lock_is_held_type+0xd7/0x130
> [ 285.612757] ? finish_automount+0x8c0/0x8c0
> [ 285.613281] ? user_path_at_empty+0x45/0x50
> [ 285.613832] ? rcu_read_lock_sched_held+0x3f/0x70
> [ 285.614456] ? kmem_cache_free+0xd9/0x1b0
> [ 285.614965] __x64_sys_mount+0x1d6/0x240
> [ 285.615455] ? path_mount+0x19e0/0x19e0
> [ 285.615941] ? syscall_enter_from_user_mode+0x1d/0x50
> [ 285.616572] do_syscall_64+0x43/0x90
> [ 285.617043] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 285.617693] RIP: 0033:0x7f489fd4182e
> [ 285.618206] Code: 48 8b 0d 4d 16 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1a 16 0c 00 f7 d8 64 89 01 48
> [ 285.620595] RSP: 002b:00007ffdc3bdd5b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> [ 285.621532] RAX: ffffffffffffffda RBX: 00007ffdc3bdd750 RCX: 00007f489fd4182e
> [ 285.622492] RDX: 000055da46c0a510 RSI: 000055da46c0a550 RDI: 000055da46c0c2f0
> [ 285.623372] RBP: 00007ffdc3bdd750 R08: 000055da46c0d050 R09: 0037332e3232312e
> [ 285.624271] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 285.625158] R13: 00007ffdc3bdd660 R14: 000055da46c0ce00 R15: 000055da46c0ce90
> [ 285.626097] </TASK>
> [ 285.626381] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
> [ 285.627622] ---[ end trace 0ea273cc87891325 ]---
> [ 285.628222] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
> [ 285.628945] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
> [ 285.631830] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
> [ 285.632557] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
> [ 285.634319] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
> [ 285.635430] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
> [ 285.636408] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
> [ 285.637369] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
> [ 285.638346] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
> [ 285.639434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 285.640233] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
> [ 285.641194] note: mount.nfs[5864] exited with preempt_count 1
> [ 562.003788] kworker/dying (773) used greatest stack depth: 23128 bytes left
> [ 1356.888419] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
> [ 2396.888656] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
> [ 3071.387007] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> [ 3074.395010] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> [ 3082.395298] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> [ 5736.389488] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> [root@test3 ~]# uname -a
> Linux test3.fieldses.org 5.16.0-rc1-00014-g659e13af1f87 #1017 SMP PREEMPT Tue Nov 16 20:51:49 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
>
>

2021-11-22 00:56:45

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Mon, Nov 22, 2021 at 10:50:34AM +1100, NeilBrown wrote:
> On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> > On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > > I have a dream of making nfsd threads start and stop dynamically.
> >
> > It's a good dream!
> >
> > I haven't had a chance to look at these at all yet, I just kicked off
> > tests to run overnight, and woke up to the below.
> >
> > This happened on the client, probably the first time it attempted to do
> > an nfsv4 mount, so something went wrong with setup of the callback
> > server.
>
> I cannot reproduce this and cannot see any way it could possible happen.

Huh. Well, it's possible I mixed up the results somehow. I'll see if I
can reproduce tonight or tomorrow.

> Could you please confirm the patches were applied on a vanilla 5.1.6-rc1
> kernel, and that you don't have the "pool_mode" module parameter set.

/sys/module/sunrpc/parameters/pool_mode is "global", the default.

> As I said, serv->sv_nrpools is zero, so either it hasn't been set, or it
> was set, but the 'serv' has been cleared and free (or freed and
> reallocated and clearer), or that it was set to zero.
>
> svc_pool_map_get() doesn't explicitly protect against npools==0 (maybe
> it should), only npools < 0. But even without that I cannot see it ever
> setting ->npools to zero.
>
> I have changed refcounting, so maybe something could get freed early,
> but all the changes I made happen *before* the point in the code where
> it is crashing.

OK, I'll take another look and let you know....

--b.

>
> So I'm thoroughly perplexed.
>
> Thanks,
> NeilBrown
>
>
> >
> > --b.
> >
> > [ 285.585061] divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> > [ 285.585754] CPU: 0 PID: 5864 Comm: mount.nfs Not tainted 5.16.0-rc1-00014-g659e13af1f87 #1017
> > [ 285.586828] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014
> > [ 285.587828] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
> > [ 285.588501] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
> > [ 285.590820] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
> > [ 285.591418] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
> > [ 285.592267] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
> > [ 285.593145] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
> > [ 285.594057] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
> > [ 285.594940] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
> > [ 285.595826] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
> > [ 285.596851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 285.597578] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
> > [ 285.598510] Call Trace:
> > [ 285.598824] <TASK>
> > [ 285.599097] svc_xprt_do_enqueue+0x164/0x900 [sunrpc]
> > [ 285.599767] svc_xprt_received+0x181/0x3a0 [sunrpc]
> > [ 285.600411] _svc_create_xprt+0x2bd/0x740 [sunrpc]
> > [ 285.601049] ? svc_add_new_perm_xprt+0x140/0x140 [sunrpc]
> > [ 285.601787] ? lock_release+0x3b8/0x6d0
> > [ 285.602318] ? nfs_callback_up+0x7ad/0xdb0 [nfsv4]
> > [ 285.603617] svc_create_xprt+0x36/0x90 [sunrpc]
> > [ 285.604306] nfs_callback_up+0x81f/0xdb0 [nfsv4]
> > [ 285.604972] nfs4_init_client+0x1db/0x450 [nfsv4]
> > [ 285.605605] ? nfs41_init_client+0x70/0x70 [nfsv4]
> > [ 285.606304] nfs4_set_client+0x25f/0x410 [nfsv4]
> > [ 285.606912] ? nfs4_add_trunk.isra.0+0x280/0x280 [nfsv4]
> > [ 285.607606] nfs4_create_server+0x5f0/0xda0 [nfsv4]
> > [ 285.608250] ? lock_is_held_type+0xd7/0x130
> > [ 285.608786] ? nfs4_server_common_setup+0x670/0x670 [nfsv4]
> > [ 285.609505] ? __module_get+0x47/0x60
> > [ 285.610077] nfs4_try_get_tree+0xd3/0x250 [nfsv4]
> > [ 285.610690] vfs_get_tree+0x8a/0x2d0
> > [ 285.611152] path_mount+0x3f9/0x19e0
> > [ 285.611608] ? debug_check_no_obj_freed+0x1f3/0x3c0
> > [ 285.612227] ? lock_is_held_type+0xd7/0x130
> > [ 285.612757] ? finish_automount+0x8c0/0x8c0
> > [ 285.613281] ? user_path_at_empty+0x45/0x50
> > [ 285.613832] ? rcu_read_lock_sched_held+0x3f/0x70
> > [ 285.614456] ? kmem_cache_free+0xd9/0x1b0
> > [ 285.614965] __x64_sys_mount+0x1d6/0x240
> > [ 285.615455] ? path_mount+0x19e0/0x19e0
> > [ 285.615941] ? syscall_enter_from_user_mode+0x1d/0x50
> > [ 285.616572] do_syscall_64+0x43/0x90
> > [ 285.617043] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 285.617693] RIP: 0033:0x7f489fd4182e
> > [ 285.618206] Code: 48 8b 0d 4d 16 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1a 16 0c 00 f7 d8 64 89 01 48
> > [ 285.620595] RSP: 002b:00007ffdc3bdd5b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> > [ 285.621532] RAX: ffffffffffffffda RBX: 00007ffdc3bdd750 RCX: 00007f489fd4182e
> > [ 285.622492] RDX: 000055da46c0a510 RSI: 000055da46c0a550 RDI: 000055da46c0c2f0
> > [ 285.623372] RBP: 00007ffdc3bdd750 R08: 000055da46c0d050 R09: 0037332e3232312e
> > [ 285.624271] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > [ 285.625158] R13: 00007ffdc3bdd660 R14: 000055da46c0ce00 R15: 000055da46c0ce90
> > [ 285.626097] </TASK>
> > [ 285.626381] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
> > [ 285.627622] ---[ end trace 0ea273cc87891325 ]---
> > [ 285.628222] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
> > [ 285.628945] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
> > [ 285.631830] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
> > [ 285.632557] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
> > [ 285.634319] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
> > [ 285.635430] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
> > [ 285.636408] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
> > [ 285.637369] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
> > [ 285.638346] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
> > [ 285.639434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 285.640233] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
> > [ 285.641194] note: mount.nfs[5864] exited with preempt_count 1
> > [ 562.003788] kworker/dying (773) used greatest stack depth: 23128 bytes left
> > [ 1356.888419] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
> > [ 2396.888656] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
> > [ 3071.387007] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > [ 3074.395010] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > [ 3082.395298] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > [ 5736.389488] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > [root@test3 ~]# uname -a
> > Linux test3.fieldses.org 5.16.0-rc1-00014-g659e13af1f87 #1017 SMP PREEMPT Tue Nov 16 20:51:49 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
> >
> >

2021-11-22 00:59:03

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Sun, Nov 21, 2021 at 07:56:39PM -0500, J. Bruce Fields wrote:
> On Mon, Nov 22, 2021 at 10:50:34AM +1100, NeilBrown wrote:
> > On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> > > On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > > > I have a dream of making nfsd threads start and stop dynamically.
> > >
> > > It's a good dream!
> > >
> > > I haven't had a chance to look at these at all yet, I just kicked off
> > > tests to run overnight, and woke up to the below.
> > >
> > > This happened on the client, probably the first time it attempted to do
> > > an nfsv4 mount, so something went wrong with setup of the callback
> > > server.
> >
> > I cannot reproduce this and cannot see any way it could possible happen.
>
> Huh. Well, it's possible I mixed up the results somehow. I'll see if I
> can reproduce tonight or tomorrow.
>
> > Could you please confirm the patches were applied on a vanilla 5.1.6-rc1
> > kernel, and that you don't have the "pool_mode" module parameter set.
>
> /sys/module/sunrpc/parameters/pool_mode is "global", the default.

Oh, and yes, this is what I was testing, should just be 5.16-rc1 plus
your 14 patches:

http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=shortlog;h=659e13af1f8702776704676937932f332265d85e

--b.

>
> > As I said, serv->sv_nrpools is zero, so either it hasn't been set, or it
> > was set, but the 'serv' has been cleared and free (or freed and
> > reallocated and clearer), or that it was set to zero.
> >
> > svc_pool_map_get() doesn't explicitly protect against npools==0 (maybe
> > it should), only npools < 0. But even without that I cannot see it ever
> > setting ->npools to zero.
> >
> > I have changed refcounting, so maybe something could get freed early,
> > but all the changes I made happen *before* the point in the code where
> > it is crashing.
>
> OK, I'll take another look and let you know....
>
> --b.
>
> >
> > So I'm thoroughly perplexed.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > >
> > > --b.
> > >
> > > [ 285.585061] divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> > > [ 285.585754] CPU: 0 PID: 5864 Comm: mount.nfs Not tainted 5.16.0-rc1-00014-g659e13af1f87 #1017
> > > [ 285.586828] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014
> > > [ 285.587828] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
> > > [ 285.588501] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
> > > [ 285.590820] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
> > > [ 285.591418] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
> > > [ 285.592267] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
> > > [ 285.593145] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
> > > [ 285.594057] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
> > > [ 285.594940] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
> > > [ 285.595826] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
> > > [ 285.596851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 285.597578] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
> > > [ 285.598510] Call Trace:
> > > [ 285.598824] <TASK>
> > > [ 285.599097] svc_xprt_do_enqueue+0x164/0x900 [sunrpc]
> > > [ 285.599767] svc_xprt_received+0x181/0x3a0 [sunrpc]
> > > [ 285.600411] _svc_create_xprt+0x2bd/0x740 [sunrpc]
> > > [ 285.601049] ? svc_add_new_perm_xprt+0x140/0x140 [sunrpc]
> > > [ 285.601787] ? lock_release+0x3b8/0x6d0
> > > [ 285.602318] ? nfs_callback_up+0x7ad/0xdb0 [nfsv4]
> > > [ 285.603617] svc_create_xprt+0x36/0x90 [sunrpc]
> > > [ 285.604306] nfs_callback_up+0x81f/0xdb0 [nfsv4]
> > > [ 285.604972] nfs4_init_client+0x1db/0x450 [nfsv4]
> > > [ 285.605605] ? nfs41_init_client+0x70/0x70 [nfsv4]
> > > [ 285.606304] nfs4_set_client+0x25f/0x410 [nfsv4]
> > > [ 285.606912] ? nfs4_add_trunk.isra.0+0x280/0x280 [nfsv4]
> > > [ 285.607606] nfs4_create_server+0x5f0/0xda0 [nfsv4]
> > > [ 285.608250] ? lock_is_held_type+0xd7/0x130
> > > [ 285.608786] ? nfs4_server_common_setup+0x670/0x670 [nfsv4]
> > > [ 285.609505] ? __module_get+0x47/0x60
> > > [ 285.610077] nfs4_try_get_tree+0xd3/0x250 [nfsv4]
> > > [ 285.610690] vfs_get_tree+0x8a/0x2d0
> > > [ 285.611152] path_mount+0x3f9/0x19e0
> > > [ 285.611608] ? debug_check_no_obj_freed+0x1f3/0x3c0
> > > [ 285.612227] ? lock_is_held_type+0xd7/0x130
> > > [ 285.612757] ? finish_automount+0x8c0/0x8c0
> > > [ 285.613281] ? user_path_at_empty+0x45/0x50
> > > [ 285.613832] ? rcu_read_lock_sched_held+0x3f/0x70
> > > [ 285.614456] ? kmem_cache_free+0xd9/0x1b0
> > > [ 285.614965] __x64_sys_mount+0x1d6/0x240
> > > [ 285.615455] ? path_mount+0x19e0/0x19e0
> > > [ 285.615941] ? syscall_enter_from_user_mode+0x1d/0x50
> > > [ 285.616572] do_syscall_64+0x43/0x90
> > > [ 285.617043] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [ 285.617693] RIP: 0033:0x7f489fd4182e
> > > [ 285.618206] Code: 48 8b 0d 4d 16 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1a 16 0c 00 f7 d8 64 89 01 48
> > > [ 285.620595] RSP: 002b:00007ffdc3bdd5b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> > > [ 285.621532] RAX: ffffffffffffffda RBX: 00007ffdc3bdd750 RCX: 00007f489fd4182e
> > > [ 285.622492] RDX: 000055da46c0a510 RSI: 000055da46c0a550 RDI: 000055da46c0c2f0
> > > [ 285.623372] RBP: 00007ffdc3bdd750 R08: 000055da46c0d050 R09: 0037332e3232312e
> > > [ 285.624271] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > [ 285.625158] R13: 00007ffdc3bdd660 R14: 000055da46c0ce00 R15: 000055da46c0ce90
> > > [ 285.626097] </TASK>
> > > [ 285.626381] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
> > > [ 285.627622] ---[ end trace 0ea273cc87891325 ]---
> > > [ 285.628222] RIP: 0010:svc_pool_for_cpu+0xc7/0x1b0 [sunrpc]
> > > [ 285.628945] Code: 8b ab f0 00 00 00 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 09 80 fa 03 0f 8e 8d 00 00 00 31 d2 <f7> b3 e8 00 00 00 48 83 c4 08 5b 48 8d 04 52 48 c1 e0 06 48 01 e8
> > > [ 285.631830] RSP: 0018:ffff88801526f8f8 EFLAGS: 00010246
> > > [ 285.632557] RAX: 0000000000000000 RBX: ffff88800db3bc00 RCX: 1ffff11001b6779d
> > > [ 285.634319] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88800db3bce8
> > > [ 285.635430] RBP: 0000000000000010 R08: 0000000000000001 R09: ffff888014b7403f
> > > [ 285.636408] R10: ffffed100296e807 R11: 0000000000000001 R12: ffff888014b74038
> > > [ 285.637369] R13: ffff888014b74010 R14: ffff888014b74000 R15: ffff88800db3bc00
> > > [ 285.638346] FS: 00007f489f68a440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
> > > [ 285.639434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 285.640233] CR2: 00007f2dffa0b198 CR3: 000000000c486003 CR4: 0000000000170ef0
> > > [ 285.641194] note: mount.nfs[5864] exited with preempt_count 1
> > > [ 562.003788] kworker/dying (773) used greatest stack depth: 23128 bytes left
> > > [ 1356.888419] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
> > > [ 2396.888656] clocksource: timekeeping watchdog on CPU1: acpi_pm retried 2 times before success
> > > [ 3071.387007] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > > [ 3074.395010] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > > [ 3082.395298] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > > [ 5736.389488] clocksource: timekeeping watchdog on CPU0: acpi_pm retried 2 times before success
> > > [root@test3 ~]# uname -a
> > > Linux test3.fieldses.org 5.16.0-rc1-00014-g659e13af1f87 #1017 SMP PREEMPT Tue Nov 16 20:51:49 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
> > >
> > >

2021-11-22 01:13:15

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Mon, 22 Nov 2021, J. Bruce Fields wrote:
> On Sun, Nov 21, 2021 at 07:56:39PM -0500, J. Bruce Fields wrote:
> > On Mon, Nov 22, 2021 at 10:50:34AM +1100, NeilBrown wrote:
> > > On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> > > > On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > > > > I have a dream of making nfsd threads start and stop dynamically.
> > > >
> > > > It's a good dream!
> > > >
> > > > I haven't had a chance to look at these at all yet, I just kicked off
> > > > tests to run overnight, and woke up to the below.
> > > >
> > > > This happened on the client, probably the first time it attempted to do
> > > > an nfsv4 mount, so something went wrong with setup of the callback
> > > > server.
> > >
> > > I cannot reproduce this and cannot see any way it could possible happen.
> >
> > Huh. Well, it's possible I mixed up the results somehow. I'll see if I
> > can reproduce tonight or tomorrow.
> >
> > > Could you please confirm the patches were applied on a vanilla 5.1.6-rc1
> > > kernel, and that you don't have the "pool_mode" module parameter set.
> >
> > /sys/module/sunrpc/parameters/pool_mode is "global", the default.
>
> Oh, and yes, this is what I was testing, should just be 5.16-rc1 plus
> your 14 patches:
>
> http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=shortlog;h=659e13af1f8702776704676937932f332265d85e

Thanks!

I did find a possible problem, in the very first patch.
In fs/nfsd/nfsctl.c, in __write_ports_addfd(), the line
if (!err && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))

should be "err >= 0" rather than "!err". That could result in a
use-after free, which can make anything explode.
If not too much trouble, could you just tweek that line and see what
happens?

Thanks,
NeilBrown

2021-11-22 02:37:55

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Mon, Nov 22, 2021 at 12:13:08PM +1100, NeilBrown wrote:
> On Mon, 22 Nov 2021, J. Bruce Fields wrote:
> > On Sun, Nov 21, 2021 at 07:56:39PM -0500, J. Bruce Fields wrote:
> > > On Mon, Nov 22, 2021 at 10:50:34AM +1100, NeilBrown wrote:
> > > > On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> > > > > On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > > > > > I have a dream of making nfsd threads start and stop dynamically.
> > > > >
> > > > > It's a good dream!
> > > > >
> > > > > I haven't had a chance to look at these at all yet, I just kicked off
> > > > > tests to run overnight, and woke up to the below.
> > > > >
> > > > > This happened on the client, probably the first time it attempted to do
> > > > > an nfsv4 mount, so something went wrong with setup of the callback
> > > > > server.
> > > >
> > > > I cannot reproduce this and cannot see any way it could possible happen.
> > >
> > > Huh. Well, it's possible I mixed up the results somehow. I'll see if I
> > > can reproduce tonight or tomorrow.
> > >
> > > > Could you please confirm the patches were applied on a vanilla 5.1.6-rc1
> > > > kernel, and that you don't have the "pool_mode" module parameter set.
> > >
> > > /sys/module/sunrpc/parameters/pool_mode is "global", the default.
> >
> > Oh, and yes, this is what I was testing, should just be 5.16-rc1 plus
> > your 14 patches:
> >
> > http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=shortlog;h=659e13af1f8702776704676937932f332265d85e

OK, tried again and it did indeed reproduce in the same spot.

> I did find a possible problem. Very first patch.
> in fs/nfsd/nfsctl.c, in _write_ports_addfd()
> if (!err && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
>
> should be "err >= 0" rather than "!err". That could result in a
> use-after free, which can make anything explode.
> If not too much trouble, could you just tweek that line and see what
> happens?

Like the following? Same divide-by-zero, I'm afraid.

--b.

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 28c26429988e..442581d5a1ca 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -743,7 +743,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred

err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);

- if (!err && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
+ if (err >= 0 && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
svc_get(nn->nfsd_serv);

nfsd_put(net);

2021-11-22 03:38:46

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Sun, Nov 21, 2021 at 09:37:53PM -0500, J. Bruce Fields wrote:
> On Mon, Nov 22, 2021 at 12:13:08PM +1100, NeilBrown wrote:
> > On Mon, 22 Nov 2021, J. Bruce Fields wrote:
> > > On Sun, Nov 21, 2021 at 07:56:39PM -0500, J. Bruce Fields wrote:
> > > > On Mon, Nov 22, 2021 at 10:50:34AM +1100, NeilBrown wrote:
> > > > > On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> > > > > > On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > > > > > > I have a dream of making nfsd threads start and stop dynamically.
> > > > > >
> > > > > > It's a good dream!
> > > > > >
> > > > > > I haven't had a chance to look at these at all yet, I just kicked off
> > > > > > tests to run overnight, and woke up to the below.
> > > > > >
> > > > > > This happened on the client, probably the first time it attempted to do
> > > > > > an nfsv4 mount, so something went wrong with setup of the callback
> > > > > > server.
> > > > >
> > > > > I cannot reproduce this and cannot see any way it could possible happen.
> > > >
> > > > Huh. Well, it's possible I mixed up the results somehow. I'll see if I
> > > > can reproduce tonight or tomorrow.
> > > >
> > > > > Could you please confirm the patches were applied on a vanilla 5.1.6-rc1
> > > > > kernel, and that you don't have the "pool_mode" module parameter set.
> > > >
> > > > /sys/module/sunrpc/parameters/pool_mode is "global", the default.
> > >
> > > Oh, and yes, this is what I was testing, should just be 5.16-rc1 plus
> > > your 14 patches:
> > >
> > > http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=shortlog;h=659e13af1f8702776704676937932f332265d85e
>
> OK, tried again and it did indeed reproduce in the same spot.
>
> > I did find a possible problem. Very first patch.
> > in fs/nfsd/nfsctl.c, in _write_ports_addfd()
> > if (!err && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
> >
> > should be "err >= 0" rather than "!err". That could result in a
> > use-after free, which can make anything explode.
> > If not too much trouble, could you just tweek that line and see what
> > happens?
>
> Like the following? Same divide-by-zero, I'm afraid.

Hm, playing with the reproducer: it takes more than one mount. My simplest
reproducer is:

mount -overs=3 server:/path /mnt/
umount /mnt/
mount -overs=4.0 server:/path /mnt/

... and the client crashes here.

--b.

2021-11-22 15:18:26

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Sun, Nov 21, 2021 at 10:38:40PM -0500, J. Bruce Fields wrote:
> On Sun, Nov 21, 2021 at 09:37:53PM -0500, J. Bruce Fields wrote:
> > On Mon, Nov 22, 2021 at 12:13:08PM +1100, NeilBrown wrote:
> > > On Mon, 22 Nov 2021, J. Bruce Fields wrote:
> > > > On Sun, Nov 21, 2021 at 07:56:39PM -0500, J. Bruce Fields wrote:
> > > > > On Mon, Nov 22, 2021 at 10:50:34AM +1100, NeilBrown wrote:
> > > > > > On Thu, 18 Nov 2021, J. Bruce Fields wrote:
> > > > > > > On Wed, Nov 17, 2021 at 11:46:49AM +1100, NeilBrown wrote:
> > > > > > > > I have a dream of making nfsd threads start and stop dynamically.
> > > > > > >
> > > > > > > It's a good dream!
> > > > > > >
> > > > > > > I haven't had a chance to look at these at all yet, I just kicked off
> > > > > > > tests to run overnight, and woke up to the below.
> > > > > > >
> > > > > > > This happened on the client, probably the first time it attempted to do
> > > > > > > an nfsv4 mount, so something went wrong with setup of the callback
> > > > > > > server.
> > > > > >
> > > > > > I cannot reproduce this and cannot see any way it could possible happen.
> > > > >
> > > > > Huh. Well, it's possible I mixed up the results somehow. I'll see if I
> > > > > can reproduce tonight or tomorrow.
> > > > >
> > > > > > Could you please confirm the patches were applied on a vanilla 5.1.6-rc1
> > > > > > kernel, and that you don't have the "pool_mode" module parameter set.
> > > > >
> > > > > /sys/module/sunrpc/parameters/pool_mode is "global", the default.
> > > >
> > > > Oh, and yes, this is what I was testing, should just be 5.16-rc1 plus
> > > > your 14 patches:
> > > >
> > > > http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=shortlog;h=659e13af1f8702776704676937932f332265d85e
> >
> > OK, tried again and it did indeed reproduce in the same spot.
> >
> > > I did find a possible problem. Very first patch.
> > > in fs/nfsd/nfsctl.c, in _write_ports_addfd()
> > > if (!err && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
> > >
> > > should be "err >= 0" rather than "!err". That could result in a
> > > use-after free, which can make anything explode.
> > > If not too much trouble, could you just tweek that line and see what
> > > happens?
> >
> > Like the following? Same divide-by-zero, I'm afraid.
>
> Hm, playing with reproducer; it takes more than one mount. My simplest
> reproducer is:
>
> mount -overs=3 server:/path /mnt/
> umount /mnt/
> mount -overs=4.0 server:/path /mnt/
>
> ... and the client crashes here.

Also: the problem starts with the last patch ("lockd: use
svc_set_num_threads() for thread start and stop").

--b.

2021-11-22 23:25:46

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 00/14] SUNRPC: clean up server thread management.

On Tue, 23 Nov 2021, J. Bruce Fields wrote:
> >
> > Hm, playing with reproducer; it takes more than one mount. My simplest
> > reproducer is:
> >
> > mount -overs=3 server:/path /mnt/
> > umount /mnt/
> > mount -overs=4.0 server:/path /mnt/
> >
> > ... and the client crashes here.
>
> Also: the problem starts with the last patch ("lockd: use
> svc_set_num_threads() for thread start and stop").

That helps. It points to lockd, and so to lockd stopping.
Lockd wasn't stopping for me because I had nfsd running.
I disabled nfsd and tried again - and got the crash.

The problem is
#define svc_serv_is_pooled(serv) ((serv)->sv_ops->svo_function)

Why does the presence of a svo_function mark the serv as being
'pooled'???

That last patch gave lockd an svo_function, so when its threads were
stopped, the pool map was released. But as lockd doesn't create a pooled
service, the pool map was never claimed. So pool_map->npools was set
to zero even though it should have still been active.
I'll probably change svc_serv_is_pooled() to test serv->sv_nrpools, and
allow that to be zero for non-pooled services.
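
As a very rough sketch of that direction (untested; just the shape of the
idea, not necessarily the final fix):

	/*
	 * Treat a service as "pooled" based on its recorded pool count,
	 * rather than on whether it has an svo_function, so a non-pooled
	 * service never releases a pool map it never claimed.
	 */
	#define svc_serv_is_pooled(serv)	((serv)->sv_nrpools != 0)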

Thanks for your testing help,
NeilBrown