2008-02-19 17:51:35

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] NFS: flush signals before taking down callback thread

Now that the reference counting on the callback thread is working as
expected, it uncovers another problem. Peter Staubach noticed while
testing that patch on an older kernel that he would occasionally see
this printk in rpc_register fire:

"RPC: failed to contact portmap (errno -512).

The NFSv4 callback thread is signaled by nfs_callback_down(), but never
flushes that signal. All of the shutdown processing is done with that
signal pending. This makes it fail the call to unregister the port with
the portmapper.

In actuality, this rpc_register call isn't necessary at all since the
port isn't actually registered with the portmapper anymore. Regardless,
there doesn't seem to be any reason to leave the signal pending while
the thread is being shut down and flushing it should generally silence
that printk.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index ecc06c6..0d784f6 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -93,6 +93,7 @@ static void nfs_callback_svc(struct svc_rqst *rqstp)
svc_process(rqstp);
}

+ flush_signals(current);
svc_exit_thread(rqstp);
nfs_callback_info.pid = 0;
complete(&nfs_callback_info.stopped);
--
1.5.3.8


2008-02-19 22:22:47

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] NFS: flush signals before taking down callback thread

On Tue, Feb 19, 2008 at 12:51:35PM -0500, Jeff Layton wrote:
> Now that the reference counting on the callback thread is working as
> expected, it uncovers another problem. Peter Staubach noticed while
> testing that patch on an older kernel that he would occasionally see
> this printk in rpc_register fire:
>
> "RPC: failed to contact portmap (errno -512).
>
> The NFSv4 callback thread is signaled by nfs_callback_down(), but never
> flushes that signal. All of the shutdown processing is done with that
> signal pending. This makes it fail the call to unregister the port with
> the portmapper.
>
> In actuality, this rpc_register call isn't necessary at all since the
> port isn't actually registered with the portmapper anymore. Regardless,
> there doesn't seem to be any reason to leave the signal pending while
> the thread is being shut down and flushing it should generally silence
> that printk.

Wouldn't it be better to not allow for signals to this thread at all?
The code really begs for a similar kthread conversion as the lockd one.

2008-02-19 22:43:30

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFS: flush signals before taking down callback thread


On Tue, 2008-02-19 at 17:22 -0500, Christoph Hellwig wrote:
> Wouldn't it be better to not allow for signals to this thread at all?
> The code really begs for a similar kthread conversion as the lockd one.

I'd tend to prefer that solution too.

Cheers
Trond

2008-02-19 23:00:40

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] NFS: flush signals before taking down callback thread

On Tue, 19 Feb 2008 17:22:26 -0500
Christoph Hellwig <[email protected]> wrote:

> On Tue, Feb 19, 2008 at 12:51:35PM -0500, Jeff Layton wrote:
> > Now that the reference counting on the callback thread is working as
> > expected, it uncovers another problem. Peter Staubach noticed while
> > testing that patch on an older kernel that he would occasionally see
> > this printk in rpc_register fire:
> >
> > "RPC: failed to contact portmap (errno -512).
> >
> > The NFSv4 callback thread is signaled by nfs_callback_down(), but never
> > flushes that signal. All of the shutdown processing is done with that
> > signal pending. This makes it fail the call to unregister the port with
> > the portmapper.
> >
> > In actuality, this rpc_register call isn't necessary at all since the
> > port isn't actually registered with the portmapper anymore. Regardless,
> > there doesn't seem to be any reason to leave the signal pending while
> > the thread is being shut down and flushing it should generally silence
> > that printk.
>
> Wouldn't it be better to not allow for signals to this thread at all?
> The code really begs for a similar kthread conversion as the lockd one.

Yep, and I have such a patch already. I just need to test it out a bit
more before I send it out. I mainly sent this out to get the existing
code into shape before we do the conversion. That way if we have to
revert that conversion later, we won't be reverting to something broken.

But if Trond would rather not worry about it we can probably skip this
patch and I'll just send out the kthread conversion once I have some
more time to test it out.

--
Jeff Layton <[email protected]>

2008-02-20 14:01:24

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] NFS: flush signals before taking down callback thread

On Tue, 19 Feb 2008 17:43:30 -0500
Trond Myklebust <[email protected]> wrote:

>
> On Tue, 2008-02-19 at 17:22 -0500, Christoph Hellwig wrote:
> > Wouldn't it be better to not allow for signals to this thread at
> > all? The code really begs for a similar kthread conversion as the
> > lockd one.
>
> I'd tend to prefer that solution too.
>

I did a bit of smoke testing on that patch this morning and just sent
it out to the list. The only problem is that that patch depends on a
couple of other patches currently in Bruce's tree and that are slated
for 2.6.26. It might be reasonable in the interim to go ahead and take
the patch to flush_signals() in the existing code.

Thanks,
--
Jeff Layton <[email protected]>

2008-02-20 15:10:19

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFS: flush signals before taking down callback thread


On Wed, 2008-02-20 at 09:01 -0500, Jeff Layton wrote:
> I did a bit of smoke testing on that patch this morning and just sent
> it out to the list. The only problem is that that patch depends on a
> couple of other patches currently in Bruce's tree and that are slated
> for 2.6.26. It might be reasonable in the interim to go ahead and take
> the patch to flush_signals() in the existing code.

OK. I'll push that patch to Linus some time in the next few days.

Trond