2008-04-08 19:40:11

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] NFS: don't let nfs_callback_svc exit on unexpected svc_recv errors (try #2)

When svc_recv returns an unexpected error, nfs_callback_svc will print a
warning and exit. This problematic for several reasons. In particular,
it will cause the reference counts for the thread to be wrong, and no
new thread will be started until all nfs4 mounts are unmounted.

Rather than exiting on error from svc_recv, have the thread do a 1s
sleep and then retry the loop. This is unlikely to cause any harm, and
if the error turns out to be something temporary then it may be able to
recover.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback.c | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index a9f1538..5606ae3 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -59,7 +59,7 @@ module_param_call(callback_tcpport, param_set_port, param_get_int,
static int
nfs_callback_svc(void *vrqstp)
{
- int err;
+ int err, preverr = 0;
struct svc_rqst *rqstp = vrqstp;

set_freezable();
@@ -74,14 +74,20 @@ nfs_callback_svc(void *vrqstp)
* Listen for a request on the socket
*/
err = svc_recv(rqstp, MAX_SCHEDULE_TIMEOUT);
- if (err == -EAGAIN || err == -EINTR)
+ if (err == -EAGAIN || err == -EINTR) {
+ preverr = err;
continue;
+ }
if (err < 0) {
- printk(KERN_WARNING
- "%s: terminating on error %d\n",
- __FUNCTION__, -err);
- break;
+ if (err != preverr) {
+ printk(KERN_WARNING "%s: unexpected error "
+ "from svc_recv (%d)\n", __func__, err);
+ preverr = err;
+ }
+ schedule_timeout_uninterruptible(HZ);
+ continue;
}
+ preverr = err;
svc_process(rqstp);
}
unlock_kernel();
--
1.5.3.6



2008-04-08 20:16:11

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NLM: don't let lockd exit on unexpected svc_recv errors (try #2)


On Tue, 2008-04-08 at 15:40 -0400, Jeff Layton wrote:
> When svc_recv returns an unexpected error, lockd will print a warning
> and exit. This problematic for several reasons. In particular, it will
> cause the reference counts for the thread to be wrong, and can lead to a
> potential BUG() call.
>
> Rather than exiting on error from svc_recv, have the thread do a 1s
> sleep and then retry the loop. This is unlikely to cause any harm, and
> if the error turns out to be something temporary then it may be able to
> recover.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/lockd/svc.c | 18 ++++++++++++------
> 1 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 66b5c98..cf977bb 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -112,7 +112,7 @@ static inline void clear_grace_period(void)
> static int
> lockd(void *vrqstp)
> {
> - int err = 0;
> + int err = 0, preverr = 0;
> struct svc_rqst *rqstp = vrqstp;
> unsigned long grace_period_expire;
>
> @@ -172,14 +172,20 @@ lockd(void *vrqstp)
> * recvfrom routine.
> */
> err = svc_recv(rqstp, timeout);
> - if (err == -EAGAIN || err == -EINTR)
> + if (err == -EAGAIN || err == -EINTR) {
> + preverr = err;
> continue;
> + }
> if (err < 0) {
> - printk(KERN_WARNING
> - "lockd: terminating on error %d\n",
> - -err);
> - break;
> + if (err != preverr) {
> + printk(KERN_WARNING "%s: unexpected error "
> + "from svc_recv (%d)\n", __func__, err);
> + preverr = err;
> + }
> + schedule_timeout_interruptible(HZ);

Why not use an uninterruptible sleep?

Trond


2008-04-08 20:23:51

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] NLM: don't let lockd exit on unexpected svc_recv errors (try #2)

On Tue, 08 Apr 2008 16:15:58 -0400
Trond Myklebust <[email protected]> wrote:

>
> On Tue, 2008-04-08 at 15:40 -0400, Jeff Layton wrote:
> > When svc_recv returns an unexpected error, lockd will print a warning
> > and exit. This problematic for several reasons. In particular, it will
> > cause the reference counts for the thread to be wrong, and can lead to a
> > potential BUG() call.
> >
> > Rather than exiting on error from svc_recv, have the thread do a 1s
> > sleep and then retry the loop. This is unlikely to cause any harm, and
> > if the error turns out to be something temporary then it may be able to
> > recover.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/lockd/svc.c | 18 ++++++++++++------
> > 1 files changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index 66b5c98..cf977bb 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -112,7 +112,7 @@ static inline void clear_grace_period(void)
> > static int
> > lockd(void *vrqstp)
> > {
> > - int err = 0;
> > + int err = 0, preverr = 0;
> > struct svc_rqst *rqstp = vrqstp;
> > unsigned long grace_period_expire;
> >
> > @@ -172,14 +172,20 @@ lockd(void *vrqstp)
> > * recvfrom routine.
> > */
> > err = svc_recv(rqstp, timeout);
> > - if (err == -EAGAIN || err == -EINTR)
> > + if (err == -EAGAIN || err == -EINTR) {
> > + preverr = err;
> > continue;
> > + }
> > if (err < 0) {
> > - printk(KERN_WARNING
> > - "lockd: terminating on error %d\n",
> > - -err);
> > - break;
> > + if (err != preverr) {
> > + printk(KERN_WARNING "%s: unexpected error "
> > + "from svc_recv (%d)\n", __func__, err);
> > + preverr = err;
> > + }
> > + schedule_timeout_interruptible(HZ);
>
> Why not use an uninterruptible sleep?
>
> Trond
>

lockd can receive a SIGKILL to tell it to drop all locks, so I figured
it was reasonable to allow the sleep to be interrupted if that happens.
It probably doesn't matter much either way though...

--
Jeff Layton <[email protected]>

2008-04-09 17:53:38

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] NLM: don't let lockd exit on unexpected svc_recv errors (try #2)

On Tue, Apr 08, 2008 at 03:40:08PM -0400, Jeff Layton wrote:
> When svc_recv returns an unexpected error, lockd will print a warning
> and exit. This problematic for several reasons. In particular, it will
> cause the reference counts for the thread to be wrong, and can lead to a
> potential BUG() call.
>
> Rather than exiting on error from svc_recv, have the thread do a 1s
> sleep and then retry the loop. This is unlikely to cause any harm, and
> if the error turns out to be something temporary then it may be able to
> recover.

Both applied; thanks!--b.

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/lockd/svc.c | 18 ++++++++++++------
> 1 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 66b5c98..cf977bb 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -112,7 +112,7 @@ static inline void clear_grace_period(void)
> static int
> lockd(void *vrqstp)
> {
> - int err = 0;
> + int err = 0, preverr = 0;
> struct svc_rqst *rqstp = vrqstp;
> unsigned long grace_period_expire;
>
> @@ -172,14 +172,20 @@ lockd(void *vrqstp)
> * recvfrom routine.
> */
> err = svc_recv(rqstp, timeout);
> - if (err == -EAGAIN || err == -EINTR)
> + if (err == -EAGAIN || err == -EINTR) {
> + preverr = err;
> continue;
> + }
> if (err < 0) {
> - printk(KERN_WARNING
> - "lockd: terminating on error %d\n",
> - -err);
> - break;
> + if (err != preverr) {
> + printk(KERN_WARNING "%s: unexpected error "
> + "from svc_recv (%d)\n", __func__, err);
> + preverr = err;
> + }
> + schedule_timeout_interruptible(HZ);
> + continue;
> }
> + preverr = err;
>
> dprintk("lockd: request from %s\n",
> svc_print_addr(rqstp, buf, sizeof(buf)));
> --
> 1.5.3.6
>

2008-04-08 19:40:08

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] NLM: don't let lockd exit on unexpected svc_recv errors (try #2)

When svc_recv returns an unexpected error, lockd will print a warning
and exit. This problematic for several reasons. In particular, it will
cause the reference counts for the thread to be wrong, and can lead to a
potential BUG() call.

Rather than exiting on error from svc_recv, have the thread do a 1s
sleep and then retry the loop. This is unlikely to cause any harm, and
if the error turns out to be something temporary then it may be able to
recover.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svc.c | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 66b5c98..cf977bb 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -112,7 +112,7 @@ static inline void clear_grace_period(void)
static int
lockd(void *vrqstp)
{
- int err = 0;
+ int err = 0, preverr = 0;
struct svc_rqst *rqstp = vrqstp;
unsigned long grace_period_expire;

@@ -172,14 +172,20 @@ lockd(void *vrqstp)
* recvfrom routine.
*/
err = svc_recv(rqstp, timeout);
- if (err == -EAGAIN || err == -EINTR)
+ if (err == -EAGAIN || err == -EINTR) {
+ preverr = err;
continue;
+ }
if (err < 0) {
- printk(KERN_WARNING
- "lockd: terminating on error %d\n",
- -err);
- break;
+ if (err != preverr) {
+ printk(KERN_WARNING "%s: unexpected error "
+ "from svc_recv (%d)\n", __func__, err);
+ preverr = err;
+ }
+ schedule_timeout_interruptible(HZ);
+ continue;
}
+ preverr = err;

dprintk("lockd: request from %s\n",
svc_print_addr(rqstp, buf, sizeof(buf)));
--
1.5.3.6