2010-09-01 12:24:39

by Greg

[permalink] [raw]
Subject: Relocate NFS root FS for maintenance

Hi,

I don't know if this is the right place to ask about such a problem; if
not, please accept my apologies.

I have a pseudo active/active NFSv4 configuration: 2 NFSv4 servers
(1:1.1.2-6lenny1), exporting 3 filesystems. Each filesystem is backed by
a SAN volume, accessed via iSCSI and multipath.

filer-01 :
/var/local
10.0.0.0/16(rw,fsid=0,crossmnt,insecure,async,no_subtree_check)
/var/local/large
10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=3)

filer-02 :
/var/local
10.0.0.0/16(rw,fsid=0,crossmnt,insecure,async,no_subtree_check)
/var/local/small
10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=1)
/var/local/medium
10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=2)

filer-01 shares a volume on the first SAN; filer-02 shares 2 volumes on
the second SAN.

I have to do a firmware upgrade on the SANs, so I have to umount
/var/local/* on the NFS servers. One way could be to relocate the NFS root FS
from /var/local to /var/maintenance, with empty directories large, medium
and small.
But how can I do that? I tried changing /etc/exports and then running
exportfs -rvf, but got "Stale NFS file handle" errors on the clients...

If not, is there a better way to do this without shutting down the NFS servers?

--
Greg



2010-09-20 12:49:28

by Menyhart Zoltan

[permalink] [raw]
Subject: Locking question around "...PagePrivate()"

Hi,

nfs_inode_add_request() runs under the protection of the Page Lock
while doing SetPagePrivate() and set_page_private().

The consumers of PagePrivate() also run under the protection of the Page Lock.

On the other hand, nfs_inode_remove_request() invoked by
nfs_writeback_release_full() and nfs_commit_release() apparently does
not take the Page Lock.

nfs_inode_remove_request() includes the sequence of

set_page_private(req->wb_page, 0);
ClearPagePrivate(req->wb_page);

When the base kernel calls e.g.

mapping->a_ops->releasepage(page, gfp_mask)

then nfs_release_page() may see an incoherent "private" state.

Should not all the "private" operations be protected by the Page Lock?
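
To make the window being described concrete, here is a rough schematic of
the interleaving (illustration only, not the actual fs/nfs code paths; the
reader side merely stands in for nfs_release_page()):

/* Writer, cf. nfs_inode_remove_request(), page lock not held: */
set_page_private(req->wb_page, 0);          /* (1) */
ClearPagePrivate(req->wb_page);             /* (2) */

/* Reader, e.g. via mapping->a_ops->releasepage(), page lock held: */
if (PagePrivate(page)) {
        /* If this runs between (1) and (2), PagePrivate() still tests
         * true but page_private() already reads back as 0. */
        struct nfs_page *req = (struct nfs_page *)page_private(page);
        ...
}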

Thanks,

Zoltan Menyhart

2010-09-08 13:33:44

by Menyhart Zoltan

[permalink] [raw]
Subject: Re :statfs() gives ESTALE error

Hi,

An NFS client executes a statfs("file", &buff) call.
"file" exists / existed, the client has read / written it,
but it has already closed it.

user_path(pathname, &path) looks up "file" successfully in the
directory cache and restarts the aging timer of the directory entry,
even if "file" has already been removed from the server, because the
lookupcache=positive option I use keeps the entries valid for a while.

nfs_statfs() returns ESTALE if "file" has already been removed from the
server.

If the user application repeats the statfs("file", &buff) call, we
are stuck: "file" remains young forever in the directory-cache.
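
For illustration, the client-visible symptom can be sketched from user space
with a loop like the one below (a minimal example only; /mnt/nfs/file is a
hypothetical path on a mount using lookupcache=positive):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/vfs.h>                    /* statfs() */

int main(void)
{
        struct statfs buf;

        for (;;) {
                if (statfs("/mnt/nfs/file", &buf) == 0) {
                        printf("f_blocks = %llu\n",
                               (unsigned long long)buf.f_blocks);
                        return 0;
                }
                if (errno != ESTALE) {
                        perror("statfs");
                        return 1;
                }
                /*
                 * The cached, still-"valid" dentry is reused on every
                 * retry, so the loop can spin on ESTALE forever.
                 */
                fprintf(stderr, "statfs: %s, retrying\n", strerror(errno));
        }
}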

Signed-off-by: Zoltan Menyhart <[email protected]>


Attachments:
nfs_statfs.patch (595.00 B)

2010-09-02 16:06:59

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Relocate NFS root FS for maintenance

On Thu, Sep 02, 2010 at 09:32:18AM +0200, Greg wrote:
> Tom,
>
> >
> >What you are missing here is that the client uses two things to
> >access content on
> >the servers - path and file handles. When you do the reshare, you
> >would be pointing any
> >new requests to the empty directories. But, any application which
> >already had a
> >file handle would have a reference to the old mount (via the fsid
> >part of the file handle).
>
> OK thanks for the precisions. It's why I'm getting those errors :
> Sep 1 15:12:53 varan-14 kernel: [3424547.256518] NFS: server
> filer-large-vip.local error: fileid changed
> Sep 1 15:12:53 varan-14 kernel: [3424547.256518] fsid 0:13:
> expected fileid 0x2, got 0x41
>
> >The options I see are to:
> >
> >1) Shutdown NFS/remove write access to the export/etc - this is
> >along the lines of what
> >you have done. And the result is that the server will inform the
> >client of an error.
> >
> >2) Disconnect the servers from the network. (Or partition the
> >network). In this scenario,
> >the client will be getting timeouts and will probably use a retry schema.
> >
> >3) Shutdown the NFS clients - harsh, but they will not be
> >accessing the servers and you
> >can easily do the upgrades.
> >
> >These all result in downtime for both your servers and your clients.
>
> So in short there is no way to do a maintenance on attached storage,
> could it be hard drives, RAID, iSCSI or anything.
> Is such a "feature" in the roadmap ?

I think I just don't understand your setup, because I'm confused about
what you're asking for. (If the thing you're upgrading is literally the
only path between the nfs servers and their storage, then what can we
do?)

Would you mind explaining again from the start exactly what your setup
is and what you're hoping to accomplish by moving these mountpoints?
Apologies for being dense!

--b.

2010-09-02 07:32:21

by Greg

[permalink] [raw]
Subject: Re: Relocate NFS root FS for maintenance

Tom,

>
> What you are missing here is that the client uses two things to access
> content on
> the servers - path and file handles. When you do the reshare, you
> would be pointing any
> new requests to the empty directories. But, any application which
> already had a
> file handle would have a reference to the old mount (via the fsid part
> of the file handle).

OK, thanks for the clarification. That's why I'm getting these errors:
Sep 1 15:12:53 varan-14 kernel: [3424547.256518] NFS: server
filer-large-vip.local error: fileid changed
Sep 1 15:12:53 varan-14 kernel: [3424547.256518] fsid 0:13: expected
fileid 0x2, got 0x41

> The options I see are to:
>
> 1) Shutdown NFS/remove write access to the export/etc - this is along
> the lines of what
> you have done. And the result is that the server will inform the
> client of an error.
>
> 2) Disconnect the servers from the network. (Or partition the
> network). In this scenario,
> the client will be getting timeouts and will probably use a retry schema.
>
> 3) Shutdown the NFS clients - harsh, but they will not be accessing
> the servers and you
> can easily do the upgrades.
>
> These all result in downtime for both your servers and your clients.

So in short, there is no way to do maintenance on attached storage, be
it hard drives, RAID, iSCSI or anything else.
Is such a "feature" on the roadmap?

--
Greg


2010-09-07 06:59:29

by Greg

[permalink] [raw]
Subject: Re: Relocate NFS root FS for maintenance

On 02/09/2010 18:06, J. Bruce Fields wrote:
> I think I just don't understand your setup, because I'm confused about
> what you're asking for. (If the thing you're upgrading is literally the
> only path between the nfs servers and their storage, then what can we
> do?)

The setup:
clients (apache) ---> server1 nfs4 --> SAN storage via iscsi

I want to do a firmware upgrade on the SAN without stopping the NFS
server, because Apache will block on I/O and hit its limits within 2
seconds... and then never recover, so I have to reboot all the client servers.

A trick could be to relocate the export to an empty real partition on
the NFS server, so clients could access the share but not the data.
Apache would not complain, just answer with 404 Not Found errors :)

--
Greg


2010-09-02 06:52:27

by Menyhart Zoltan

[permalink] [raw]
Subject: statfs() gives ESTALE error

Hi,

An NFS client executes a statfs("file", &buff) call.
"file" exists / existed, the client has read / written it,
but it has already closed it.

user_path(pathname, &path) looks up "file" successfully in the
directory cache and restarts the aging timer of the directory entry,
even if "file" has already been removed from the server, because the
lookupcache=positive option I use keeps the entries valid for a while.

nfs_statfs() returns ESTALE if "file" has already been removed from the
server.

If the user application repeats the statfs("file", &buff) call, we
are stuck: "file" remains young forever in the directory-cache.

Should not this directory-entry be cleaned up?

--- linux-2.6.32.x86_64-orig/fs/nfs/super.c 2010-08-02 14:34:57.000000000 +0200
+++ linux-2.6.32.x86_64/fs/nfs/super.c 2010-08-31 16:57:30.000000000 +0200
@@ -429,6 +429,22 @@
int error;

error = server->nfs_client->rpc_ops->statfs(server, fh, &res);
+ if (unlikely(error == -ESTALE)){
+
+ struct inode *pd_inode;
+
+ BUG_ON(dentry->d_parent == NULL);
+ pd_inode = dentry->d_parent->d_inode;
+ BUG_ON(pd_inode == NULL);
+ /*
+ * This forces the revalidation code in nfs_lookup_revalidate() to do a
+ * full lookup on all child dentries of 'dir' whenever a change occurs
+ * on the server that might have invalidated our dcache.
+ */
+ spin_lock(&pd_inode->i_lock);
+ nfs_force_lookup_revalidate(pd_inode);
+ spin_unlock(&pd_inode->i_lock);
+ }
if (error < 0)
goto out_err;
buf->f_type = NFS_SUPER_MAGIC;

It's for 2.6.32, but 2.6.36 does not differ in this respect.

Thanks,

Zoltan Menyhart

2010-09-08 20:25:30

by Trond Myklebust

[permalink] [raw]
Subject: Re: Re :statfs() gives ESTALE error

On Wed, 2010-09-08 at 15:33 +0200, Menyhart Zoltan wrote:
> Hi,
>
> An NFS client executes a statfs("file", &buff) call.
> "file" exists / existed, the client has read / written it,
> but it has already closed it.
>
> user_path(pathname, &path) looks up "file" successfully in the
> directory-cache and restarts the aging timer of the directory-entry.
> Even if "file" has already been removed from the server, because the
> lookupcache=positive option I use, keeps the entries valid for a while.
>
> nfs_statfs() returns ESTALE if "file" has already been removed from the
> server.
>
> If the user application repeats the statfs("file", &buff) call, we
> are stuck: "file" remains young forever in the directory-cache.
>
> Signed-off-by: Zoltan Menyhart <[email protected]>
---
Please include the above as part of the patch itself if you send it as
an attachment. Otherwise, I need to manually sew the two back together
before I can apply them through git.

> diff -Nru linux-2.6.36-rc3-orig/fs/nfs/super.c
> linux-2.6.36-rc3-new/fs/nfs/super.c
> --- linux-2.6.36-rc3-orig/fs/nfs/super.c 2010-08-29
> 17:36:04.000000000 +0200
> +++ linux-2.6.36-rc3-new/fs/nfs/super.c 2010-09-08 11:33:03.078684569
> +0200
> @@ -431,7 +431,15 @@
> goto out_err;
>
> error = server->nfs_client->rpc_ops->statfs(server, fh, &res);
> + if (unlikely(error == -ESTALE)) {
>
> + struct dentry *pd_dentry;
> +
> + if ((pd_dentry = dget_parent(dentry)) != NULL) {

Assignments inside an 'if ()' are frowned upon by the Linux style gods.
I'll fix this up.
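
For reference, the fixed-up version would presumably just move the
assignment out of the condition, along these lines (a sketch, not
necessarily the exact code that was applied):

        pd_dentry = dget_parent(dentry);
        if (pd_dentry != NULL) {
                nfs_zap_caches(pd_dentry->d_inode);
                dput(pd_dentry);
        }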

> + nfs_zap_caches(pd_dentry->d_inode);
> + dput(pd_dentry);
> + }
> + }
> nfs_free_fattr(res.fattr);
> if (error < 0)
> goto out_err;


2010-09-01 21:52:38

by Tom Haynes

[permalink] [raw]
Subject: Re: Relocate NFS root FS for maintenance

On 9/1/2010 7:17 AM, Greg wrote:
> Hi,
>
> I don't know if it is the good place to ask for such a problem, if not
> please apologize.
>
> I have a pseudo active/active NFSv4 configuration: 2 NFSv4 servers
> (1:1.1.2-6lenny1), exporting 3 filesystems. Each filesystem are
> connected to a SAN volume, via iscsi and multipath.
>
> filer-01 :
> /var/local
> 10.0.0.0/16(rw,fsid=0,crossmnt,insecure,async,no_subtree_check)
> /var/local/large
> 10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=3)
>
> filer-02 :
> /var/local
> 10.0.0.0/16(rw,fsid=0,crossmnt,insecure,async,no_subtree_check)
> /var/local/small
> 10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=1)
> /var/local/medium
> 10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=2)
>
> filer-01 share a volume on the first SAN, filer-02 share 2 volumes on
> the second SAN.
>
> I have to do a firmware upgrade on the SANs, so I have to umount
> /var/local/* on the nfs servers. On way could be to relocate NFS root
> FS from /var/local to /var/maintenance with empty directories large,
> medium and small
> But how to do that ? I try to change /etc/exports then exportfs -rvf,
> but got "Stale NFS file handle" on clients...
>
> If not, is there a better way to do that, without to shutdown NFS
> servers ?
>

Greg,

What you are missing here is that the client uses two things to access
content on
the servers - path and file handles. When you do the reshare, you would
be pointing any
new requests to the empty directories. But, any application which
already had a
file handle would have a reference to the old mount (via the fsid part
of the file handle).

The client is detecting that the old mount is no longer being exported
by the server, so
it replies with ESTALE.

The options I see are to:

1) Shutdown NFS/remove write access to the export/etc - this is along
the lines of what
you have done. And the result is that the server will inform the client
of an error.

2) Disconnect the servers from the network. (Or partition the network).
In this scenario,
the client will be getting timeouts and will probably use a retry schema.

3) Shutdown the NFS clients - harsh, but they will not be accessing the
servers and you
can easily do the upgrades.

These all result in downtime for both your servers and your clients.

A long-term solution will be possible when replication is supported by
both the server and the clients. It is a protocol feature which I do not
think is implemented yet.

Tom

2010-09-01 17:35:10

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Relocate NFS root FS for maintenance

On Wed, Sep 01, 2010 at 02:17:45PM +0200, Greg wrote:
> Hi,
>
> I don't know if it is the good place to ask for such a problem, if
> not please apologize.
>
> I have a pseudo active/active NFSv4 configuration: 2 NFSv4 servers
> (1:1.1.2-6lenny1), exporting 3 filesystems. Each filesystem are
> connected to a SAN volume, via iscsi and multipath.
>
> filer-01 :
> /var/local
> 10.0.0.0/16(rw,fsid=0,crossmnt,insecure,async,no_subtree_check)
> /var/local/large
> 10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=3)
>
> filer-02 :
> /var/local
> 10.0.0.0/16(rw,fsid=0,crossmnt,insecure,async,no_subtree_check)
> /var/local/small
> 10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=1)
> /var/local/medium
> 10.0.0.0/16(rw,insecure,async,no_subtree_check,no_root_squash,fsid=2)
>
> filer-01 share a volume on the first SAN, filer-02 share 2 volumes
> on the second SAN.
>
> I have to do a firmware upgrade on the SANs, so I have to umount
> /var/local/* on the nfs servers. On way could be to relocate NFS
> root FS from /var/local to /var/maintenance with empty directories
> large, medium and small
> But how to do that ? I try to change /etc/exports then exportfs
> -rvf, but got "Stale NFS file handle" on clients...
>
> If not, is there a better way to do that, without to shutdown NFS servers ?

I'm a little confused as to what you actually want to happen. Clearly
clients can't continue as usual, because you're taking all the storage
away. So presumably you expect the clients to just hang till the
upgrade is done? If so, why not just shut down the servers?

--b.

2010-09-07 18:32:10

by Trond Myklebust

[permalink] [raw]
Subject: Re: statfs() gives ESTALE error

On Thu, 2010-09-02 at 08:56 +0200, Menyhart Zoltan wrote:
> Hi,
>
> An NFS client executes a statfs("file", &buff) call.
> "file" exists / existed, the client has read / written it,
> but it has already closed it.
>
> user_path(pathname, &path) looks up "file" successfully in the
> directory-cache and restarts the aging timer of the directory-entry.
> Even if "file" has already been removed from the server, because the
> lookupcache=positive opcion I use, keeps the entries valid for a while.
>
> nfs_statfs() returns ESTALE if "file" has already been removed from the
> server.
>
> If the user application repeats the statfs("file", &buff) call, we
> are stuck: "file" remains young forever in the directory-cache.
>
> Should not this directory-entry be cleaned up?
>
> --- linux-2.6.32.x86_64-orig/fs/nfs/super.c 2010-08-02 14:34:57.000000000 +0200
> +++ linux-2.6.32.x86_64/fs/nfs/super.c 2010-08-31 16:57:30.000000000 +0200
> @@ -429,6 +429,22 @@
> int error;
>
> error = server->nfs_client->rpc_ops->statfs(server, fh, &res);
> + if (unlikely(error == -ESTALE)){
> +
> + struct inode *pd_inode;
> +
> + BUG_ON(dentry->d_parent == NULL);
> + pd_inode = dentry->d_parent->d_inode;

Should be using dget_parent() to ensure that the parent directory
doesn't disappear beneath us while we're working on it (with a matching
dput()).

> + BUG_ON(pd_inode == NULL);

Why the 2 BUG_ON()? If the conditions are true, then the kernel will
Oops anyway.

> + /*
> + * This forces the revalidation code in nfs_lookup_revalidate() to do a
> + * full lookup on all child dentries of 'dir' whenever a change occurs
> + * on the server that might have invalidated our dcache.
> + */
> + spin_lock(&pd_inode->i_lock);
> + nfs_force_lookup_revalidate(pd_inode);
> + spin_unlock(&pd_inode->i_lock);

The above should probably be a nfs_zap_caches(). We need to not only
revalidate the lookup cache, but also to clear the readdir cache.
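
In other words, the three locked lines quoted above would presumably
collapse into a single call (sketch):

        nfs_zap_caches(pd_inode);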

> + }
> if (error < 0)
> goto out_err;
> buf->f_type = NFS_SUPER_MAGIC;
>
> It's for the 2.6.32, yet the 2.6.36 dos not change from this aspect.

Can you fix the above, then resend (for 2.6.36) with a Signed-off-by:
line?

Cheers
Trond


2010-09-20 13:55:36

by Trond Myklebust

[permalink] [raw]
Subject: Re: Locking question around "...PagePrivate()"

On Mon, 2010-09-20 at 14:49 +0200, Menyhart Zoltan wrote:
> Hi,
>
> nfs_inode_add_request() runs under the protection of the Page Lock
> while setting up SetPagePrivate(), set_page_private().
>
> The consumers of PagePrivate() run also under the protection of the Page Lock.
>
> On the other hand, nfs_inode_remove_request() invoked by
> nfs_writeback_release_full() and nfs_commit_release() apparently does
> not take the Page Lock.
>
> nfs_inode_remove_request() includes the sequence of
>
> set_page_private(req->wb_page, 0);
> ClearPagePrivate(req->wb_page);
>
> When the base kernel calls e.g.
>
> mapping->a_ops->releasepage(page, gfp_mask)
>
> then nfs_release_page() may see an incoherent "private" state.
>
> Should not all the "private" operations be protected by the Page Lock?

No.

Cheers
Trond


2010-10-21 20:38:02

by J. Bruce Fields

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

On Tue, Oct 05, 2010 at 10:22:30AM +0200, Menyhart Zoltan wrote:
> Due to some race conditions, the reference count can become 0
> while "xprt" is still on a "pool":

Apologies, your email got buried in my inbox....

>
> WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
> [] kref_get+0x23/0x2d
> [] svc_xprt_get+0x12/0x14 [sunrpc]
> [] svc_recv+0x2db/0x78a [sunrpc]

Which kernel exactly did you see this on? Is it reproducible?

> I think we should increase the reference counter before adding "xprt"
> onto any list.

I don't see the xprt added to any list after the svc_xprt_get() you've
added below.

Help, I'm confused!

--b.

>
> Obviously, we keep increasing the reference count before passing "xprt"
> to another thread via "rqstp->rq_xprt".
>
> Thanks,
>
> Zoltan Menyhart

> "svc_xprt_get()" should be invoked before adding "xprt" onto "pool" by
> "svc_xprt_enqueue()".
> Otherwise "xprt->xpt_ref" can drop to 0 due to race conditions.
>
> Signed-off-by: Zoltan Menyhart <[email protected]>
>
> --- old/net/sunrpc/svc_xprt.c 2010-10-05 09:47:51.000000000 +0200
> +++ new/net/sunrpc/svc_xprt.c 2010-10-05 09:47:20.000000000 +0200
> @@ -369,6 +369,7 @@
> }
>
> process:
> + svc_xprt_get(xprt);
> if (!list_empty(&pool->sp_threads)) {
> rqstp = list_entry(pool->sp_threads.next,
> struct svc_rqst,
> @@ -381,7 +382,6 @@
> "svc_xprt_enqueue: server %p, rq_xprt=%p!\n",
> rqstp, rqstp->rq_xprt);
> rqstp->rq_xprt = xprt;
> - svc_xprt_get(xprt);
> rqstp->rq_reserved = serv->sv_max_mesg;
> atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> pool->sp_stats.threads_woken++;
> @@ -655,7 +655,6 @@
> xprt = svc_xprt_dequeue(pool);
> if (xprt) {
> rqstp->rq_xprt = xprt;
> - svc_xprt_get(xprt);
> rqstp->rq_reserved = serv->sv_max_mesg;
> atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> } else {


2010-10-25 11:57:24

by Menyhart Zoltan

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

> Maybe your fix is right, but I'm not sure: It looks to me like if
> svc_xprt_enqueue() gets to "process:" in a situation where the caller
> holds the only reference, then that's already a bug. Do you know who
> the caller of svc_xprt_enqueue() was when this happened?

Unfortunately, I do not.
We saw lots of warnings like this (before my patch):

WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
Hardware name: Stoutland Platform
Modules linked in: rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib mlx4_core ib_mthca ib_mad ib_core ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 scsi_dh_emc dm_round_robin dm_multipath iTCO_wdt i2c_i801 i2c_core igb ioatdma iTCO_vendor_support dca raid0 lpfc usb_storage scsi_transport_fc scsi_tgt [last unloaded: mlx4_core]
Pid: 3571, comm: nfsd Tainted: G D W 2.6.32.9-21.fc12.Bull.6.x86_64 #1
Call Trace:
[<ffffffff81050af0>] warn_slowpath_common+0x7c/0x94
[<ffffffff81050b1c>] warn_slowpath_null+0x14/0x16
[<ffffffff81222cca>] kref_get+0x23/0x2d
[<ffffffffa01d0a90>] svc_xprt_get+0x12/0x14 [sunrpc]
[<ffffffffa01d1903>] svc_recv+0x2db/0x78a [sunrpc]
[<ffffffff8104b2ef>] ? default_wake_function+0x0/0x14
[<ffffffffa02a1898>] nfsd+0xac/0x13f [nfsd]
[<ffffffffa02a17ec>] ? nfsd+0x0/0x13f [nfsd]
[<ffffffff8106ee0e>] kthread+0x7f/0x87
[<ffffffff8100cd6a>] child_rip+0xa/0x20
[<ffffffff8106ed8f>] ? kthread+0x0/0x87
[<ffffffff8100cd60>] ? child_rip+0x0/0x20

By the time you see messages like this, the guilty task is already gone...

> Doh. Wait, when you say "has not been corrected on 2.6.36-rc3", do you
> mean you've actually *seen* the problem occur on 2.6.36-rc3?

We do not really run kernels newer than 2.6.32 in any great number,
just some test configurations up to 2.6.36-rc6...
I have not seen this problem on recent kernels.

I'm not sure that I can really follow your explanation.

I have two reasons why I think my patch is good.

1. There are several code sequences where, just after calling "svc_xprt_enqueue()",
we drop "kref", i.e. we do not hand our reference over to the consumer. Therefore
"kref" should be increased by "svc_xprt_enqueue()" itself. See:

svc_revisit()
{
...
svc_xprt_enqueue(xprt);
svc_xprt_put(xprt);
}

svc_xprt_release()
{
...
svc_reserve(rqstp, 0):
...
svc_xprt_enqueue(xprt);
svc_xprt_put(xprt);
}

svc_check_conn_limits()
{
...
svc_xprt_enqueue(xprt);
svc_xprt_put(xprt);
}

svc_age_temp_xprts()
{
...
svc_xprt_enqueue(xprt);
svc_xprt_put(xprt);
}

2. Increasing "kref" by "svc_recv()" is too late: by the time you can increase
"kref", the structure may have already been destroyed.

As "svc_xprt_enqueue()" has not been modified since, I deduce that my patch is
correct for the 2.6.36-rc kernels, too.
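
As an aside, the ordering rule being argued for here - take the reference
on behalf of the consumer before the object is published on a shared queue,
rather than having the consumer take it after dequeueing - can be sketched
in plain user-space C. Everything below is an illustration only (invented
names, not the sunrpc code):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct obj {
        atomic_int refcount;
};

static struct obj *queue_slot;          /* stands in for the pool's list */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static void obj_put(struct obj *o)
{
        if (atomic_fetch_sub(&o->refcount, 1) == 1) {
                printf("last reference dropped, freeing\n");
                free(o);
        }
}

/* Take the consumer's reference *before* the object becomes visible on
 * the shared queue; this is the ordering the patch argues for. */
static void enqueue(struct obj *o)
{
        pthread_mutex_lock(&queue_lock);
        atomic_fetch_add(&o->refcount, 1);
        queue_slot = o;
        pthread_mutex_unlock(&queue_lock);
}

static void *consumer(void *arg)
{
        struct obj *o;

        (void)arg;
        pthread_mutex_lock(&queue_lock);
        o = queue_slot;
        queue_slot = NULL;
        pthread_mutex_unlock(&queue_lock);
        if (o)
                obj_put(o);     /* drops the reference enqueue() took */
        return NULL;
}

int main(void)
{
        struct obj *o = calloc(1, sizeof(*o));
        pthread_t t;

        atomic_init(&o->refcount, 1);   /* the producer's own reference */
        enqueue(o);                     /* consumer's reference taken here */
        pthread_create(&t, NULL, consumer, NULL);
        obj_put(o);                     /* safe: the queued reference keeps
                                         * the object alive for the consumer */
        pthread_join(t, NULL);
        return 0;
}

If enqueue() instead left the atomic_fetch_add() to the consumer (the
pattern svc_recv() used), the producer's obj_put() could free the object
while it was still reachable through queue_slot.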

Thanks.

Zoltan Menyhart

2010-10-26 00:29:03

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Mon, Oct 25, 2010 at 08:11:45PM -0400, J. Bruce Fields wrote:
> On Tue, Oct 26, 2010 at 10:54:47AM +1100, Neil Brown wrote:
> > Note that once we always set XPT_BUSY and it stays set. So we call
> > svc_delete_xprt instread of svc_close_xprt.
> >
> > Maybe we don't actually need to list_del_init - both the pool and the xprt
> > will soon be freed and if there is linkage between them, who cares??
> > In that case we wouldn't need to for_each_pool after all ???

So, untested:

commit 5e0bffc6376d8d3316771e12af698631a4ca2634
Author: J. Bruce Fields <[email protected]>
Date: Mon Oct 25 18:11:21 2010 -0400

svcrpc: simplify svc_close_all

There's no need to be fooling with XPT_BUSY now that all the threads
are gone.

The list_del_init() here could execute at the same time as the
svc_xprt_enqueue()'s list_add_tail(), with undefined results. We don't
really care at this point, but it might result in a spurious
list-corruption warning or something.

And svc_close() isn't adding any value; just call svc_delete_xprt()
directly.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index dd61cd0..8c018df 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -941,16 +941,16 @@ void svc_close_all(struct list_head *xprt_list)
struct svc_xprt *xprt;
struct svc_xprt *tmp;

+ /*
+ * The server is shutting down, and no more threads are running.
+ * svc_xprt_enqueue() might still be running, but at worst it
+ * will re-add the xprt to sp_sockets, which will soon get
+ * freed. So we don't bother with any more locking, and don't
+ * leave the close to the (nonexistent) server threads:
+ */
list_for_each_entry_safe(xprt, tmp, xprt_list, xpt_list) {
set_bit(XPT_CLOSE, &xprt->xpt_flags);
- if (test_bit(XPT_BUSY, &xprt->xpt_flags)) {
- /* Waiting to be processed, but no threads left,
- * So just remove it from the waiting list
- */
- list_del_init(&xprt->xpt_ready);
- clear_bit(XPT_BUSY, &xprt->xpt_flags);
- }
- svc_close_xprt(xprt);
+ svc_delete_xprt(xprt);
}
}


2010-10-26 00:11:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Tue, Oct 26, 2010 at 10:54:47AM +1100, Neil Brown wrote:
> On Mon, 25 Oct 2010 19:03:35 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Tue, Oct 26, 2010 at 09:58:36AM +1100, Neil Brown wrote:
> > > On Mon, 25 Oct 2010 16:21:56 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > On Mon, Oct 25, 2010 at 12:43:57PM +1100, Neil Brown wrote:
> > > > > On Sun, 24 Oct 2010 21:21:30 -0400
> > > > > "J. Bruce Fields" <[email protected]> wrote:
> > > > >
> > > > > > Once an xprt has been deleted, there's no reason to allow it to be
> > > > > > enqueued--at worst, that might cause the xprt to be re-added to some
> > > > > > global list, resulting in later corruption.
> > > > > >
> > > > > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > > >
> > > > > Yep, this makes svc_close_xprt() behave the same way as svc_recv() which
> > > > > calls svc_delete_xprt but does not clear XPT_BUSY. The other branches in
> > > > > svc_recv call svc_xprt_received, but the XPT_CLOSE branch doesn't
> > > > >
> > > > > Reviewed-by: NeilBrown <[email protected]>
> > > >
> > > > Also, of course:
> > > >
> > > > > > svc_xprt_get(xprt);
> > > > > > svc_delete_xprt(xprt);
> > > > > > - clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > > > > > svc_xprt_put(xprt);
> > > >
> > > > The get/put is pointless: the only reason I can see for doing that of
> > > > course was to be able to safely clear the bit afterwards.
> > > >
> > >
> > > Agreed.
> > >
> > > I like patches that get rid of code!!
> >
> > Unfortunately, I'm stuck on just one more point: is svc_close_all()
> > really safe? It assumes it doesn't need any locking to speak of any
> > more because the server threads are gone--but the xprt's themselves
> > could still be producing events, right? (So data could be arriving that
> > results in calls to svc_xprt_enqueue, for example?)
> >
> > If that's right, I'm not sure what to do there....
> >
> > --b.
>
> Yes, svc_close_all is racy w.r.t. svc_xprt_enqueue.
> I guess we've never lost that race?
>
> The race happens if the test_and_set(XPT_BUSY) in svc_xprt_enqueue happens
> before the test_bit(XPT_BUSY) in svc_close_all, but the list_add_tail at the
> end of svc_xprt_enqueue happens before (or during!) the list_del_init in
> svc_close_all.
>
> We cannot really lock against this race as svc_xprt_enqueue holds the pool
> lock, and svc_close_all doesn't know which pool to lock (as xprt->pool isn't
> set until after XPT_BUSY is set).
>
> Maybe we just need to lock all pools in that case??
>
> So svc_close_all becomes something like:
>
>
> void svc_close_all(struct list_head *xprt_list)
> {
> struct svc_xprt *xprt;
> struct svc_xprt *tmp;
> struct svc_pool *pool;
>
> list_for_each_entry_safe(xprt, tmp, xprt_list, xpt_list) {
> set_bit(XPT_CLOSE, &xprt->xpt_flags);
> if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) {
> /* Waiting to be processed, but no threads left,
> * So just remove it from the waiting list. First
> * we need to ensure svc_xprt_enqueue isn't still
> * queuing the xprt to some pool.
> */
> for_each_pool(pool, xprt->xpt_server) {
> spin_lock(&pool->sp_lock);
> spin_unlock(&pool->sp_lock);
> }
> list_del_init(&xprt->xpt_ready);
> }
> svc_delete_xprt(xprt);
> }
> }
>
>
> Note that once we always set XPT_BUSY and it stays set. So we call
> svc_delete_xprt instread of svc_close_xprt.
>
> Maybe we don't actually need to list_del_init - both the pool and the xprt
> will soon be freed and if there is linkage between them, who cares??
> In that case we wouldn't need to for_each_pool after all ???

Yeah, that's what I'm thinking now: if svc_xprt_enqueue is the only
thing that could still be running, then at worst it adds the xprt back
to sp_sockets or something, which we'll delete soon.

And yes I think we can just remove svc_close_all()'s attempts to handle
the situation--if its list_del_init(&xprt->xpt_ready) executes
simultaneously with svc_xprt_enqueue()'s list_add_tail()--well, maybe we
don't care what the results are, but if nothing else we could end up
with a spurious list-corruption warning.

So, delete more code, I think....

--b.

2010-10-25 23:08:30

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 4/4] svcrpc: svc_tcp_sendto XTP_DEAD check is redundant

On Mon, 25 Oct 2010 13:46:32 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Mon, Oct 25, 2010 at 11:03:08AM -0400, J. Bruce Fields wrote:
> > On Mon, Oct 25, 2010 at 01:10:24PM +1100, Neil Brown wrote:
> > > On Sun, 24 Oct 2010 21:21:33 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > The only caller (svc_send) has already checked XPT_DEAD.
> > > >
> > > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > > ---
> > > > net/sunrpc/svcsock.c | 3 ---
> > > > 1 files changed, 0 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > > index 1454739..07919e1 100644
> > > > --- a/net/sunrpc/svcsock.c
> > > > +++ b/net/sunrpc/svcsock.c
> > > > @@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > > > reclen = htonl(0x80000000|((xbufp->len ) - 4));
> > > > memcpy(xbufp->head[0].iov_base, &reclen, 4);
> > > >
> > > > - if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
> > > > - return -ENOTCONN;
> > > > -
> > > > sent = svc_sendto(rqstp, &rqstp->rq_res);
> > > > if (sent != xbufp->len) {
> > > > printk(KERN_NOTICE
> > >
> > >
> > > So after removing all these references to XPT_DEAD, do we need XPT_DEAD at
> > > all???
> > >
> > > I think it is only used in two other places.
> > >
> > > 1/ In svc_revisit we don't queue the deferred request to an XPT_DEAD
> > > transport.
> > > We could avoid that but changing the 'owner' of a deferred request from the
> > > service to the xprt, and call cache_clean_deferred in svc_delete_xprt
> >
> > That use does seem a bit of a hack to me, so I'd be happy to get rid of
> > it.
>
> Eh, but then don't we end up doing the same check when deferring, to
> prevent deferring a request on a dead xprt?

Good point.
However....

If we change svc_defer to set handle.owner to rqstp->rq_xprt rather than
rqstp->rq_server (as already suggested), then we don't need the svc_xprt_get().
i.e. the deferred request doesn't need to hold a reference to the xprt,
because when the xprt is finally removed it is easy (cache_clean_deferred) to
remove all those deferred requests.
So we put the call to cache_clean_deferred in svc_xprt_free. By this stage
we are guaranteed not to get more deferrals as the xprt is dead.
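
A rough sketch of that idea (illustrative only, not a tested patch; the
surrounding context is approximate):

In svc_defer():

-	dr->handle.owner = rqstp->rq_server;
+	dr->handle.owner = rqstp->rq_xprt;

and in svc_xprt_free():

+	/* No new deferrals can arrive once the xprt is dead, so any
+	 * still queued against it can be dropped here. */
+	cache_clean_deferred(xprt);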

>
> Maybe we should leave well enough alone here.

That is certainly an option, especially if it turns out that we cannot remove
XPT_DEAD completely.

Thanks,
NeilBrown

>
> --b.


2010-10-25 20:22:00

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Mon, Oct 25, 2010 at 12:43:57PM +1100, Neil Brown wrote:
> On Sun, 24 Oct 2010 21:21:30 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > Once an xprt has been deleted, there's no reason to allow it to be
> > enqueued--at worst, that might cause the xprt to be re-added to some
> > global list, resulting in later corruption.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
>
> Yep, this makes svc_close_xprt() behave the same way as svc_recv() which
> calls svc_delete_xprt but does not clear XPT_BUSY. The other branches in
> svc_recv call svc_xprt_received, but the XPT_CLOSE branch doesn't
>
> Reviewed-by: NeilBrown <[email protected]>

Also, of course:

> > svc_xprt_get(xprt);
> > svc_delete_xprt(xprt);
> > - clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > svc_xprt_put(xprt);

The get/put is pointless: the only reason I can see for doing that of
course was to be able to safely clear the bit afterwards.

--b.

2010-10-05 08:22:53

by Menyhart Zoltan

[permalink] [raw]
Subject: "xprt" reference count drops to 0

Hi,

Due to some race conditions, the reference count can become 0
while "xprt" is still on a "pool":

WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
[] kref_get+0x23/0x2d
[] svc_xprt_get+0x12/0x14 [sunrpc]
[] svc_recv+0x2db/0x78a [sunrpc]

I think we should increase the reference counter before adding "xprt"
onto any list.

Obviously, we keep increasing the reference count before passing "xprt"
to another thread via "rqstp->rq_xprt".

Thanks,

Zoltan Menyhart


Attachments:
diff (1.00 kB)

2010-10-25 23:54:54

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Mon, 25 Oct 2010 19:03:35 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Tue, Oct 26, 2010 at 09:58:36AM +1100, Neil Brown wrote:
> > On Mon, 25 Oct 2010 16:21:56 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > On Mon, Oct 25, 2010 at 12:43:57PM +1100, Neil Brown wrote:
> > > > On Sun, 24 Oct 2010 21:21:30 -0400
> > > > "J. Bruce Fields" <[email protected]> wrote:
> > > >
> > > > > Once an xprt has been deleted, there's no reason to allow it to be
> > > > > enqueued--at worst, that might cause the xprt to be re-added to some
> > > > > global list, resulting in later corruption.
> > > > >
> > > > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > >
> > > > Yep, this makes svc_close_xprt() behave the same way as svc_recv() which
> > > > calls svc_delete_xprt but does not clear XPT_BUSY. The other branches in
> > > > svc_recv call svc_xprt_received, but the XPT_CLOSE branch doesn't
> > > >
> > > > Reviewed-by: NeilBrown <[email protected]>
> > >
> > > Also, of course:
> > >
> > > > > svc_xprt_get(xprt);
> > > > > svc_delete_xprt(xprt);
> > > > > - clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > > > > svc_xprt_put(xprt);
> > >
> > > The get/put is pointless: the only reason I can see for doing that of
> > > course was to be able to safely clear the bit afterwards.
> > >
> >
> > Agreed.
> >
> > I like patches that get rid of code!!
>
> Unfortunately, I'm stuck on just one more point: is svc_close_all()
> really safe? It assumes it doesn't need any locking to speak of any
> more because the server threads are gone--but the xprt's themselves
> could still be producing events, right? (So data could be arriving that
> results in calls to svc_xprt_enqueue, for example?)
>
> If that's right, I'm not sure what to do there....
>
> --b.

Yes, svc_close_all is racy w.r.t. svc_xprt_enqueue.
I guess we've never lost that race?

The race happens if the test_and_set(XPT_BUSY) in svc_xprt_enqueue happens
before the test_bit(XPT_BUSY) in svc_close_all, but the list_add_tail at the
end of svc_xprt_enqueue happens before (or during!) the list_del_init in
svc_close_all.

We cannot really lock against this race as svc_xprt_enqueue holds the pool
lock, and svc_close_all doesn't know which pool to lock (as xprt->pool isn't
set until after XPT_BUSY is set).

Maybe we just need to lock all pools in that case??

So svc_close_all becomes something like:


void svc_close_all(struct list_head *xprt_list)
{
struct svc_xprt *xprt;
struct svc_xprt *tmp;
struct svc_pool *pool;

list_for_each_entry_safe(xprt, tmp, xprt_list, xpt_list) {
set_bit(XPT_CLOSE, &xprt->xpt_flags);
if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) {
/* Waiting to be processed, but no threads left,
* So just remove it from the waiting list. First
* we need to ensure svc_xprt_enqueue isn't still
* queuing the xprt to some pool.
*/
for_each_pool(pool, xprt->xpt_server) {
spin_lock(&pool->sp_lock);
spin_unlock(&pool->sp_lock);
}
list_del_init(&xprt->xpt_ready);
}
svc_delete_xprt(xprt);
}
}


Note that we now always set XPT_BUSY and it stays set, so we call
svc_delete_xprt instead of svc_close_xprt.

Maybe we don't actually need to list_del_init - both the pool and the xprt
will soon be freed and if there is linkage between them, who cares??
In that case we wouldn't need to for_each_pool after all ???


NeilBrown

2010-10-26 01:28:29

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Mon, 25 Oct 2010 20:30:08 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Mon, Oct 25, 2010 at 08:28:58PM -0400, J. Bruce Fields wrote:
> > On Mon, Oct 25, 2010 at 08:11:45PM -0400, J. Bruce Fields wrote:
> > > On Tue, Oct 26, 2010 at 10:54:47AM +1100, Neil Brown wrote:
> > > > Note that once we always set XPT_BUSY and it stays set. So we call
> > > > svc_delete_xprt instread of svc_close_xprt.
> > > >
> > > > Maybe we don't actually need to list_del_init - both the pool and the xprt
> > > > will soon be freed and if there is linkage between them, who cares??
> > > > In that case we wouldn't need to for_each_pool after all ???
> >
> > So, untested:
>
> Ditto. But, hey, it deletes more code, it must be good.
>
> --b.
>
> commit 99401f6f6c3d0782ef79bb37749ace2accd4c79a
> Author: J. Bruce Fields <[email protected]>
> Date: Mon Oct 25 20:24:48 2010 -0400
>
> svcrpc: simplify svc_close_xprt, fix minor race
>
> We're assuming that in the XPT_BUSY case, somebody else will take care
> of the close. But that's not necessarily the case (e.g. if
> svc_xprt_enqueue() exits due to a lack of write space). So the close
> could in theory be delayed indefinitely. We should be calling
> svc_xprt_enqueue() after every set of XPT_CLOSE.
>
> Also with svc_close_all() no longer relying on svc_close, it no longer
> needs to be able to immediately do the svc_delete_xprt() itself.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 8c018df..dfee123 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -928,11 +928,7 @@ void svc_delete_xprt(struct svc_xprt *xprt)
> void svc_close_xprt(struct svc_xprt *xprt)
> {
> set_bit(XPT_CLOSE, &xprt->xpt_flags);
> - if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
> - /* someone else will have to effect the close */
> - return;
> -
> - svc_delete_xprt(xprt);
> + svc_xprt_enqueue(xprt);
> }
> EXPORT_SYMBOL_GPL(svc_close_xprt);
>

I'm not quite so sure about this one.

svc_close_xprt gets called when user-space writes some magic string to some
magic file. This could be done when there are no active nfsd threads.
It is a bit far-fetched, but you could tell nfsd to open a socket, then tell
it to close it, before telling it to start any threads.

In that case the socket would stay open until the first thread was started.

So ... not quite convinced (much as the code reduction is nice!).

NeilBrown

2010-10-25 17:46:44

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] svcrpc: svc_tcp_sendto XTP_DEAD check is redundant

On Mon, Oct 25, 2010 at 11:03:08AM -0400, J. Bruce Fields wrote:
> On Mon, Oct 25, 2010 at 01:10:24PM +1100, Neil Brown wrote:
> > On Sun, 24 Oct 2010 21:21:33 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > The only caller (svc_send) has already checked XPT_DEAD.
> > >
> > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > ---
> > > net/sunrpc/svcsock.c | 3 ---
> > > 1 files changed, 0 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > index 1454739..07919e1 100644
> > > --- a/net/sunrpc/svcsock.c
> > > +++ b/net/sunrpc/svcsock.c
> > > @@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > > reclen = htonl(0x80000000|((xbufp->len ) - 4));
> > > memcpy(xbufp->head[0].iov_base, &reclen, 4);
> > >
> > > - if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
> > > - return -ENOTCONN;
> > > -
> > > sent = svc_sendto(rqstp, &rqstp->rq_res);
> > > if (sent != xbufp->len) {
> > > printk(KERN_NOTICE
> >
> >
> > So after removing all these references to XPT_DEAD, do we need XPT_DEAD at
> > all???
> >
> > I think it is only used in two other places.
> >
> > 1/ In svc_revisit we don't queue the deferred request to an XPT_DEAD
> > transport.
> > We could avoid that but changing the 'owner' of a deferred request from the
> > service to the xprt, and call cache_clean_deferred in svc_delete_xprt
>
> That use does seem a bit of a hack to me, so I'd be happy to get rid of
> it.

Eh, but then don't we end up doing the same check when deferring, to
prevent deferring a request on a dead xprt?

Maybe we should leave well enough alone here.

--b.

2010-10-25 01:09:29

by J. Bruce Fields

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

On Fri, Oct 22, 2010 at 11:32:50PM -0400, J. Bruce Fields wrote:
> On Fri, Oct 22, 2010 at 07:21:33PM -0400, J. Bruce Fields wrote:
> > On Fri, Oct 22, 2010 at 07:01:12PM -0400, J. Bruce Fields wrote:
> > > On Fri, Oct 22, 2010 at 05:20:07PM -0400, J. Bruce Fields wrote:
> > > > On Fri, Oct 22, 2010 at 05:00:50PM +0200, Menyhart Zoltan wrote:
> > > > > J. Bruce Fields wrote:
> > > > > >On Tue, Oct 05, 2010 at 10:22:30AM +0200, Menyhart Zoltan wrote:
> > > > > >>Due to some race conditions, the reference count can become 0
> > > > > >>while "xprt" is still on a "pool":
> > > > > >
> > > > > >Apologies, your email got buried in my inbox....
> > > > > >
> > > > > >>
> > > > > >>WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
> > > > > >> [] kref_get+0x23/0x2d
> > > > > >> [] svc_xprt_get+0x12/0x14 [sunrpc]
> > > > > >> [] svc_recv+0x2db/0x78a [sunrpc]
> > > > > >
> > > > > >Which kernel exactly did you see this on? Is it reproduceable?
> > > > >
> > > > > I saw it on a 2.6.32.
> > > > > It has not been corrected for the 2.6.36-rc3 yet.
>
> Doh. Wait, when you say "has not been corrected on 2.6.36-rc3", do you
> mean you've actually *seen* the problem occur on 2.6.36-rc3?
>
> If not, it's more likely you're seeing the problem that was fixed by
> 1b644b6e6f6160ae35ce4b52c2ca89ed3e356e18, in 2.6.34-rc1.

But, as long as we're here, I think there's a minor fix and some cleanup
we could do; patches follow.

--b.

2010-10-25 01:21:42

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 2/4] svcrpc: assume svc_delete_xprt() called only once

As long as DEAD exports are left BUSY, and svc_delete_xprt is called
only with BUSY held, then svc_delete_xprt() will never be called on an
xprt that is already DEAD.

Signed-off-by: J. Bruce Fields <[email protected]>
---
net/sunrpc/svc_xprt.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index db90e4d..409881a 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -903,7 +903,7 @@ void svc_delete_xprt(struct svc_xprt *xprt)

/* Only do this once */
if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
- return;
+ BUG();

dprintk("svc: svc_delete_xprt(%p)\n", xprt);
xprt->xpt_ops->xpo_detach(xprt);
--
1.7.1


2010-10-26 01:33:57

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] svcrpc: svc_tcp_sendto XTP_DEAD check is redundant

On Tue, Oct 26, 2010 at 10:08:21AM +1100, Neil Brown wrote:
> On Mon, 25 Oct 2010 13:46:32 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Mon, Oct 25, 2010 at 11:03:08AM -0400, J. Bruce Fields wrote:
> > > On Mon, Oct 25, 2010 at 01:10:24PM +1100, Neil Brown wrote:
> > > > On Sun, 24 Oct 2010 21:21:33 -0400
> > > > "J. Bruce Fields" <[email protected]> wrote:
> > > >
> > > > > The only caller (svc_send) has already checked XPT_DEAD.
> > > > >
> > > > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > > > ---
> > > > > net/sunrpc/svcsock.c | 3 ---
> > > > > 1 files changed, 0 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > > > index 1454739..07919e1 100644
> > > > > --- a/net/sunrpc/svcsock.c
> > > > > +++ b/net/sunrpc/svcsock.c
> > > > > @@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > > > > reclen = htonl(0x80000000|((xbufp->len ) - 4));
> > > > > memcpy(xbufp->head[0].iov_base, &reclen, 4);
> > > > >
> > > > > - if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
> > > > > - return -ENOTCONN;
> > > > > -
> > > > > sent = svc_sendto(rqstp, &rqstp->rq_res);
> > > > > if (sent != xbufp->len) {
> > > > > printk(KERN_NOTICE
> > > >
> > > >
> > > > So after removing all these references to XPT_DEAD, do we need XPT_DEAD at
> > > > all???
> > > >
> > > > I think it is only used in two other places.
> > > >
> > > > 1/ In svc_revisit we don't queue the deferred request to an XPT_DEAD
> > > > transport.
> > > > We could avoid that but changing the 'owner' of a deferred request from the
> > > > service to the xprt, and call cache_clean_deferred in svc_delete_xprt
> > >
> > > That use does seem a bit of a hack to me, so I'd be happy to get rid of
> > > it.
> >
> > Eh, but then don't we end up doing the same check when deferring, to
> > prevent deferring a request on a dead xprt?
>
> Good point.
> However....
>
> If we change svc_defer to set handle.owner to rqstp->rq_xprt rather than
> rqstp->rq_server (As already suggested), then we don't need the svc_xprt_get.
> i.e. the deferred request doesn't need to hold a reference to the xprt,
> because when the xprt is finally removed it is easy (cache_clean_deferred) to
> remove all those deferred requests.
> So we put the call to cache_clean_deferred in svc_xprt_free. By this stage
> we are guaranteed not to get more deferrals as the xprt is dead.

Oh, OK, got it. I'll keep that in mind....

And I've also got some more cleanup, possibly bordering on barn-painting
(though I like the second one).

--b.

commit a08af80085c3ee539ca25729d7c7e9af6d060d71
Author: J. Bruce Fields <[email protected]>
Date: Mon Oct 25 12:50:15 2010 -0400

svcrpc: don't set then immediately clear XPT_DEFERRED

There's no harm to doing this, since the only caller will immediately
call svc_enqueue() afterwards, ensuring we don't miss the remaining
deferred requests just because XPT_DEFERRED was briefly cleared.

But why not just do this the simple way?

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index c82fe73..a74cb67 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -1059,14 +1059,13 @@ static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt)
if (!test_bit(XPT_DEFERRED, &xprt->xpt_flags))
return NULL;
spin_lock(&xprt->xpt_lock);
- clear_bit(XPT_DEFERRED, &xprt->xpt_flags);
if (!list_empty(&xprt->xpt_deferred)) {
dr = list_entry(xprt->xpt_deferred.next,
struct svc_deferred_req,
handle.recent);
list_del_init(&dr->handle.recent);
- set_bit(XPT_DEFERRED, &xprt->xpt_flags);
- }
+ } else
+ clear_bit(XPT_DEFERRED, &xprt->xpt_flags);
spin_unlock(&xprt->xpt_lock);
return dr;
}
commit 79a54abe64963acc82a2bf2a61b4efff5e782805
Author: J. Bruce Fields <[email protected]>
Date: Mon Oct 25 14:12:40 2010 -0400

nfsd4: centralize more calls to svc_xprt_received

Follow up on b48fa6b99100dc7772af3cd276035fcec9719ceb by moving all the
svc_xprt_received() calls for the main xprt to one place. The clearing
of XPT_BUSY here is critical to the correctness of the server, so I'd
prefer it to be obvious where we do it.

The only substantive result is moving svc_xprt_received() after
svc_receive_deferred(). Other than a (likely insignificant) delay
waking up the next thread, that should be harmless.

Also reshuffle the exit code a little to skip a few other steps that we
don't care about in the svc_delete_xprt() case.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index a74cb67..dd61cd0 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -716,7 +716,10 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
dprintk("svc_recv: found XPT_CLOSE\n");
svc_delete_xprt(xprt);
- } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
+ /* Leave XPT_BUSY set on the dead xprt: */
+ goto out;
+ }
+ if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
struct svc_xprt *newxpt;
newxpt = xprt->xpt_ops->xpo_accept(xprt);
if (newxpt) {
@@ -741,28 +744,23 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
spin_unlock_bh(&serv->sv_lock);
svc_xprt_received(newxpt);
}
- svc_xprt_received(xprt);
} else {
dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
rqstp, pool->sp_id, xprt,
atomic_read(&xprt->xpt_ref.refcount));
rqstp->rq_deferred = svc_deferred_dequeue(xprt);
- if (rqstp->rq_deferred) {
- svc_xprt_received(xprt);
+ if (rqstp->rq_deferred)
len = svc_deferred_recv(rqstp);
- } else {
+ else
len = xprt->xpt_ops->xpo_recvfrom(rqstp);
- svc_xprt_received(xprt);
- }
dprintk("svc: got len=%d\n", len);
}
+ svc_xprt_received(xprt);

/* No data, incomplete (TCP) read, or accept() */
- if (len == 0 || len == -EAGAIN) {
- rqstp->rq_res.len = 0;
- svc_xprt_release(rqstp);
- return -EAGAIN;
- }
+ if (len == 0 || len == -EAGAIN)
+ goto out;
+
clear_bit(XPT_OLD, &xprt->xpt_flags);

rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
@@ -771,6 +769,10 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
if (serv->sv_stats)
serv->sv_stats->netcnt++;
return len;
+out:
+ rqstp->rq_res.len = 0;
+ svc_xprt_release(rqstp);
+ return -EAGAIN;
}
EXPORT_SYMBOL_GPL(svc_recv);


2010-10-25 23:23:48

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 4/4] svcrpc: svc_tcp_sendto XTP_DEAD check is redundant

On Mon, 25 Oct 2010 11:03:08 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Mon, Oct 25, 2010 at 01:10:24PM +1100, Neil Brown wrote:
> > On Sun, 24 Oct 2010 21:21:33 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > The only caller (svc_send) has already checked XPT_DEAD.
> > >
> > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > ---
> > > net/sunrpc/svcsock.c | 3 ---
> > > 1 files changed, 0 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > index 1454739..07919e1 100644
> > > --- a/net/sunrpc/svcsock.c
> > > +++ b/net/sunrpc/svcsock.c
> > > @@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > > reclen = htonl(0x80000000|((xbufp->len ) - 4));
> > > memcpy(xbufp->head[0].iov_base, &reclen, 4);
> > >
> > > - if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
> > > - return -ENOTCONN;
> > > -
> > > sent = svc_sendto(rqstp, &rqstp->rq_res);
> > > if (sent != xbufp->len) {
> > > printk(KERN_NOTICE
> >
> >
> > So after removing all these references to XPT_DEAD, do we need XPT_DEAD at
> > all???
> >
> > I think it is only used in two other places.
> >
> > 1/ In svc_revisit we don't queue the deferred request to an XPT_DEAD
> > transport.
> > We could avoid that but changing the 'owner' of a deferred request from the
> > service to the xprt, and call cache_clean_deferred in svc_delete_xprt
>
> That use does seem a bit of a hack to me, so I'd be happy to get rid of
> it.
>
> > 2/ in svc_send(). I wonder if we need this at all. There doesn't seem to be
> > any locking to ensure that XPT_DEAD doesn't get set immediately after the
> > test, and the underlying sendto (whether tcp or udp or whatever) should fail
> > if the socket is closed, and if it doesn't it shouldn't really matter??
>
> Does it make a difference in the case of a half-close? If the client
> follows a request immediately by a FIN, and if that results in our
> setting DEAD (I think it does, assuming svc_tcp_state_change() is called
> in that case), then the current code may have the effect of preventing
> us from sending the reply.
>
> I don't know if that's good or bad.
>
> > So can we get rid of XPT_DEAL altogether?
>
> OK, I also had another use in mind: for the purposes of 4.1 (which needs
> to know when a connection goes down, e.g. to know that it's no longer
> available for callbacks), I added a list of callbacks to the xprt,
> called on svc_delete_xprt().
>
> I just noticed that I think my current code allows the 4.1 code to
> register an xprt after svc_delete_xprt() is called. I could fix that
> race by checking for DEAD after trying to register.
>
> (That callback code already seems messier than it should be, so maybe
> someone else could suggest a better scheme. I'm stuck.
>
> In any case, it wouldn't be so bad if that were the one remaining use of
> DEAD.)

Could you use XPT_CLOSE for that??

NeilBrown


>
> --b.


2010-10-25 01:21:42

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

Once an xprt has been deleted, there's no reason to allow it to be
enqueued--at worst, that might cause the xprt to be re-added to some
global list, resulting in later corruption.

Signed-off-by: J. Bruce Fields <[email protected]>
---
net/sunrpc/svc_xprt.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index bef1e88..db90e4d 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -938,7 +938,6 @@ void svc_close_xprt(struct svc_xprt *xprt)

svc_xprt_get(xprt);
svc_delete_xprt(xprt);
- clear_bit(XPT_BUSY, &xprt->xpt_flags);
svc_xprt_put(xprt);
}
EXPORT_SYMBOL_GPL(svc_close_xprt);
--
1.7.1


2010-10-26 00:30:11

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Mon, Oct 25, 2010 at 08:28:58PM -0400, J. Bruce Fields wrote:
> On Mon, Oct 25, 2010 at 08:11:45PM -0400, J. Bruce Fields wrote:
> > On Tue, Oct 26, 2010 at 10:54:47AM +1100, Neil Brown wrote:
> > > Note that once we always set XPT_BUSY and it stays set. So we call
> > > svc_delete_xprt instread of svc_close_xprt.
> > >
> > > Maybe we don't actually need to list_del_init - both the pool and the xprt
> > > will soon be freed and if there is linkage between them, who cares??
> > > In that case we wouldn't need to for_each_pool after all ???
>
> So, untested:

Ditto. But, hey, it deletes more code, it must be good.

--b.

commit 99401f6f6c3d0782ef79bb37749ace2accd4c79a
Author: J. Bruce Fields <[email protected]>
Date: Mon Oct 25 20:24:48 2010 -0400

svcrpc: simplify svc_close_xprt, fix minor race

We're assuming that in the XPT_BUSY case, somebody else will take care
of the close. But that's not necessarily the case (e.g. if
svc_xprt_enqueue() exits due to a lack of write space). So the close
could in theory be delayed indefinitely. We should be calling
svc_xprt_enqueue() after every set of XPT_CLOSE.

Also with svc_close_all() no longer relying on svc_close, it no longer
needs to be able to immediately do the svc_delete_xprt() itself.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 8c018df..dfee123 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -928,11 +928,7 @@ void svc_delete_xprt(struct svc_xprt *xprt)
void svc_close_xprt(struct svc_xprt *xprt)
{
set_bit(XPT_CLOSE, &xprt->xpt_flags);
- if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
- /* someone else will have to effect the close */
- return;
-
- svc_delete_xprt(xprt);
+ svc_xprt_enqueue(xprt);
}
EXPORT_SYMBOL_GPL(svc_close_xprt);


2010-10-26 12:59:56

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Tue, Oct 26, 2010 at 12:28:20PM +1100, Neil Brown wrote:
> On Mon, 25 Oct 2010 20:30:08 -0400
> "J. Bruce Fields" <[email protected]> wrote:
> >
> > svcrpc: simplify svc_close_xprt, fix minor race
> >
> > We're assuming that in the XPT_BUSY case, somebody else will take care
> > of the close. But that's not necessarily the case (e.g. if
> > svc_xprt_enqueue() exits due to a lack of write space). So the close
> > could in theory be delayed indefinitely. We should be calling
> > svc_xprt_enqueue() after every set of XPT_CLOSE.
> >
> > Also with svc_close_all() no longer relying on svc_close, it no longer
> > needs to be able to immediately do the svc_delete_xprt() itself.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index 8c018df..dfee123 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -928,11 +928,7 @@ void svc_delete_xprt(struct svc_xprt *xprt)
> > void svc_close_xprt(struct svc_xprt *xprt)
> > {
> > set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > - if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
> > - /* someone else will have to effect the close */
> > - return;
> > -
> > - svc_delete_xprt(xprt);
> > + svc_xprt_enqueue(xprt);
> > }
> > EXPORT_SYMBOL_GPL(svc_close_xprt);
> >
>
> I'm not quite so sure about this one.
>
> svc_close_xprt gets called when user-space writes some magic string to some
> magic file. This could be done when there are no active nfsd threads.
> It is a bit far fetched, but you could tell nfsd to open a socket, then tell
> it to close it, before telling it to start any threads.
>
> In that case the socket would stay open until the first thread was started.

Thanks, that's an odd case, but I think you're right. OK:

--b.

commit 6a04c0fa098b60e93d4077c12f7fa7a2f189df95
Author: J. Bruce Fields <[email protected]>
Date: Mon Oct 25 20:24:48 2010 -0400

svcrpc: fix minor svc_close_xprt race

We're assuming that in the XPT_BUSY case, somebody else will take care
of the close. But that's not necessarily the case. So the close could
in theory be delayed indefinitely.

We should instead be calling svc_xprt_enqueue() after every set of
XPT_CLOSE.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 8c018df..99996ad 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -928,10 +928,17 @@ void svc_delete_xprt(struct svc_xprt *xprt)
void svc_close_xprt(struct svc_xprt *xprt)
{
set_bit(XPT_CLOSE, &xprt->xpt_flags);
- if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
+ if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) {
/* someone else will have to effect the close */
+ svc_xprt_enqueue(xprt);
return;
-
+ }
+ /*
+ * We expect svc_close_xprt() to work even when no threads are
+ * running (e.g., while configuring the server before starting
+ * any threads), so if the transport isn't busy, we delete
+ * it ourself:
+ */
svc_delete_xprt(xprt);
}
EXPORT_SYMBOL_GPL(svc_close_xprt);

2010-10-25 02:10:32

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 4/4] svcrpc: svc_tcp_sendto XPT_DEAD check is redundant

On Sun, 24 Oct 2010 21:21:33 -0400
"J. Bruce Fields" <[email protected]> wrote:

> The only caller (svc_send) has already checked XPT_DEAD.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> net/sunrpc/svcsock.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 1454739..07919e1 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> reclen = htonl(0x80000000|((xbufp->len ) - 4));
> memcpy(xbufp->head[0].iov_base, &reclen, 4);
>
> - if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
> - return -ENOTCONN;
> -
> sent = svc_sendto(rqstp, &rqstp->rq_res);
> if (sent != xbufp->len) {
> printk(KERN_NOTICE


So after removing all these references to XPT_DEAD, do we need XPT_DEAD at
all???

I think it is only used in two other places.

1/ In svc_revisit we don't queue the deferred request to an XPT_DEAD
transport.
We could avoid that by changing the 'owner' of a deferred request from the
service to the xprt, and calling cache_clean_deferred in svc_delete_xprt.

2/ in svc_send(). I wonder if we need this at all. There doesn't seem to be
any locking to ensure that XPT_DEAD doesn't get set immediately after the
test, and the underlying sendto (whether tcp or udp or whatever) should fail
if the socket is closed, and if it doesn't it shouldn't really matter??

So can we get rid of XPT_DEAD altogether?

NeilBrown


2010-10-25 14:36:58

by J. Bruce Fields

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

On Mon, Oct 25, 2010 at 01:56:31PM +0200, Menyhart Zoltan wrote:
> We do not really use kernels more recent than 2.6.32 in any great number,
> just some test configurations up to 2.6.36-rc6...
> I have not seen this problem on recent kernels.
>
> I'm not sure that I can really follow your explanation.

Look at the callers of svc_xprt_put(). You'll see that all of them have
an obvious paired svc_xprt_get():

- Any svc_xprt_put() that clears a rqstp->rq_xprt is paired with
the svc_xprt_get() that originally set rqstp->rq_xprt
- The svc_xprt_put() in svc_check_conn_limits() is paired with
the immediately preceding svc_xprt_get() that took the xprt
off the sv_tempsocks list.
- Etc.

*Except* for the svc_xprt_put() at the end of svc_delete_xprt(). That
one you can think of as paired with the initial reference created by the
kref_init() in svc_xprt_init().

So, each xprt has one special reference which keeps the xprt from being
removed before svc_delete_xprt() is called.

There are only two callers of svc_delete_xprt(): svc_close_xprt(), and
svc_recv().

When we exit svc_xprt_enqueue(), we exit with XPT_BUSY set on the xprt
in question.

That flag will be cleared only by the svc_recv() call that picks up this
xprt (or by svc_close_all() if nfsd shuts down before then).

That prevents svc_close_xprt() from deleting the xprt, since it returns
without doing anything if XPT_BUSY is already set.

So the only code that could normally delete this xprt is the svc_recv()
that processes the xprt. It takes the xprt off the pool->sp_sockets
list before it does that.

So, returning from svc_xprt_enqueue() with an xprt on ->sp_sockets that
has XPT_BUSY set is safe.

Think of XPT_BUSY as a kind of lock, ownership of which can pass from
the svc_xprt_enqueue() that queues up an xprt to the svc_recv() that
processes it, and note that an xprt can only be deleted under this
XPT_BUSY lock.

--b.
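
(Illustrative aside, not part of the original thread: the reference scheme
described above can be modelled in a few lines of plain C11. The names
fake_xprt, fake_xprt_get() and so on are invented for the sketch; only the
idea of a single embedded reference, dropped exactly once by the delete
path while every other get is explicitly paired with a put, comes from the
explanation.)

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_xprt {
        atomic_int refcount;            /* plays the role of xpt_ref */
};

static struct fake_xprt *fake_xprt_create(void)
{
        struct fake_xprt *x = malloc(sizeof(*x));
        atomic_init(&x->refcount, 1);   /* the embedded reference */
        return x;
}

static void fake_xprt_get(struct fake_xprt *x)
{
        atomic_fetch_add(&x->refcount, 1);
}

static void fake_xprt_put(struct fake_xprt *x)
{
        if (atomic_fetch_sub(&x->refcount, 1) == 1) {
                printf("last reference gone, freeing\n");
                free(x);
        }
}

static void fake_xprt_delete(struct fake_xprt *x)
{
        /* Only the delete path drops the embedded reference, so the
         * object cannot disappear before "delete" has run. */
        fake_xprt_put(x);
}

int main(void)
{
        struct fake_xprt *x = fake_xprt_create();

        fake_xprt_get(x);       /* e.g. a worker picking the object up */
        fake_xprt_put(x);       /* ...and dropping it when done */

        fake_xprt_delete(x);    /* finally: drop the embedded reference */
        return 0;
}

The point of the model is only the pairing discipline: as long as the
embedded reference outlives every published pointer, no get can run on a
freed object.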

2010-10-26 16:05:22

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Tue, Oct 26, 2010 at 08:59:47AM -0400, J. Bruce Fields wrote:
> On Tue, Oct 26, 2010 at 12:28:20PM +1100, Neil Brown wrote:
> > On Mon, 25 Oct 2010 20:30:08 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > svcrpc: simplify svc_close_xprt, fix minor race
> > >
> > > We're assuming that in the XPT_BUSY case, somebody else will take care
> > > of the close. But that's not necessarily the case (e.g. if
> > > svc_xprt_enqueue() exits due to a lack of write space). So the close
> > > could in theory be delayed indefinitely. We should be calling
> > > svc_xprt_enqueue() after every set of XPT_CLOSE.
> > >
> > > Also with svc_close_all() no longer relying on svc_close, it no longer
> > > needs to be able to immediately do the svc_delete_xprt() itself.
> > >
> > > Signed-off-by: J. Bruce Fields <[email protected]>
> > >
> > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > index 8c018df..dfee123 100644
> > > --- a/net/sunrpc/svc_xprt.c
> > > +++ b/net/sunrpc/svc_xprt.c
> > > @@ -928,11 +928,7 @@ void svc_delete_xprt(struct svc_xprt *xprt)
> > > void svc_close_xprt(struct svc_xprt *xprt)
> > > {
> > > set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > > - if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
> > > - /* someone else will have to effect the close */
> > > - return;
> > > -
> > > - svc_delete_xprt(xprt);
> > > + svc_xprt_enqueue(xprt);
> > > }
> > > EXPORT_SYMBOL_GPL(svc_close_xprt);
> > >
> >
> > I'm not quite so sure about this one.
> >
> > svc_close_xprt gets called when user-space writes some magic string to some
> > magic file. This could be done when there are no active nfsd threads.
> > It is a bit far fetched, but you could tell nfsd to open a socket, then tell
> > it to close it, before telling it to start any threads.
> >
> > In that case the socket would stay open until the first thread was started.
>
> Thanks, that's an odd case, but I think you're right. OK:

Actually.... I think the real race here is the fault of the write-space
check: once we've taken XPT_BUSY, we really do need to guarantee that
we'll arrange to have svc_xprt_enqueue() called again after clearing
XPT_BUSY, to avoid missing any events that arrived while we were
processing ours. So putting the wspace check after taking XPT_BUSY in
svc_xprt_enqueue looks to me like a bug that could cause processing to
stall unnecessarily in some unlikely cases.

--b.

commit cf501cb479873d6e77abba131ea40c11aa924d72
Author: J. Bruce Fields <[email protected]>
Date: Tue Oct 26 11:32:03 2010 -0400

svcrpc: fix wspace-checking race

We call svc_xprt_enqueue() after something happens which we think may
require handling from a server thread. To avoid such events being lost,
svc_xprt_enqueue() must guarantee that there will be a svc_recv() call
from a server thread following any such event. It does that by either
waking up a server thread itself, or checking that XPT_BUSY is set (in
which case somebody else is doing it).

But the check of XPT_BUSY could occur just as someone finishes
processing some other event, and just before they clear XPT_BUSY.

Therefore it's important not to clear XPT_BUSY without subsequently
doing another svc_xprt_enqueue() to check whether the xprt should be
requeued.

The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing
an event to be missed in situations like:

data arrives
call svc_tcp_data_ready():
call svc_xprt_enqueue():
set BUSY
find no write space
svc_reserve():
free up write space
call svc_xprt_enqueue():
test BUSY
clear BUSY

So, instead, check wspace in the same places that the state flags are
checked: before taking BUSY, and in svc_recv().

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index d2261fe..9c690fa 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -92,12 +92,17 @@ static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user
static inline int register_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
{
spin_lock(&xpt->xpt_lock);
- list_add(&u->list, &xpt->xpt_users);
- spin_unlock(&xpt->xpt_lock);
if (test_bit(XPT_CLOSE, &xpt->xpt_flags)) {
- unregister_xpt_user(xpt, u);
+ /*
+ * The connection is about to be deleted soon (or,
+ * worse, may already be deleted--in which case we've
+ * already notified the xpt_users).
+ */
+ spin_unlock(&xpt->xpt_lock);
return -ENOTCONN;
}
+ list_add(&u->list, &xpt->xpt_users);
+ spin_unlock(&xpt->xpt_lock);
return 0;
}

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 99996ad..04238a9 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -302,6 +302,15 @@ static void svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)
list_del(&rqstp->rq_list);
}

+static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
+{
+ if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
+ return true;
+ if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED)))
+ return xprt->xpt_ops->xpo_has_wspace(xprt);
+ return false;
+}
+
/*
* Queue up a transport with data pending. If there are idle nfsd
* processes, wake 'em up.
@@ -314,8 +323,7 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
struct svc_rqst *rqstp;
int cpu;

- if (!(xprt->xpt_flags &
- ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
+ if (!svc_xprt_has_something_to_do(xprt))
return;

cpu = get_cpu();
@@ -345,25 +353,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
BUG_ON(xprt->xpt_pool != NULL);
xprt->xpt_pool = pool;

- /* Handle pending connection */
- if (test_bit(XPT_CONN, &xprt->xpt_flags))
- goto process;
-
- /* Handle close in-progress */
- if (test_bit(XPT_CLOSE, &xprt->xpt_flags))
- goto process;
-
- /* Check if we have space to reply to a request */
- if (!xprt->xpt_ops->xpo_has_wspace(xprt)) {
- /* Don't enqueue while not enough space for reply */
- dprintk("svc: no write space, transport %p not enqueued\n",
- xprt);
- xprt->xpt_pool = NULL;
- clear_bit(XPT_BUSY, &xprt->xpt_flags);
- goto out_unlock;
- }
-
- process:
if (!list_empty(&pool->sp_threads)) {
rqstp = list_entry(pool->sp_threads.next,
struct svc_rqst,
@@ -744,7 +733,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
spin_unlock_bh(&serv->sv_lock);
svc_xprt_received(newxpt);
}
- } else {
+ } else if (xprt->xpt_ops->xpo_has_wspace(xprt)) {
dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
rqstp, pool->sp_id, xprt,
atomic_read(&xprt->xpt_ref.refcount));
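
(Illustrative aside, not from the thread: a single-threaded userspace model
of the rule the commit message states, namely that whoever clears the busy
flag must re-run the enqueue check afterwards, or an event that arrived
while the flag was held can be lost. The flag and function names are
invented; compile with any C11 compiler.)

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool busy = false;        /* plays the role of XPT_BUSY */
static atomic_bool pending = false;     /* plays the role of XPT_DATA etc. */

static void handle_event(void)
{
        printf("handled one event\n");
}

/* Called whenever something happens that may need a thread's attention,
 * in the same spirit as svc_xprt_enqueue(). */
static void fake_enqueue(void)
{
        if (!atomic_load(&pending))
                return;                 /* nothing to do */
        if (atomic_exchange(&busy, true))
                return;                 /* current holder will recheck */

        while (atomic_exchange(&pending, false))
                handle_event();

        atomic_store(&busy, false);
        /* The rule: after clearing "busy", check again, because an event
         * may have arrived between the last check and the clear. */
        fake_enqueue();
}

int main(void)
{
        atomic_store(&pending, true);
        fake_enqueue();
        return 0;
}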

2010-10-22 23:21:34

by J. Bruce Fields

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

On Fri, Oct 22, 2010 at 07:01:12PM -0400, J. Bruce Fields wrote:
> On Fri, Oct 22, 2010 at 05:20:07PM -0400, J. Bruce Fields wrote:
> > On Fri, Oct 22, 2010 at 05:00:50PM +0200, Menyhart Zoltan wrote:
> > > J. Bruce Fields wrote:
> > > >On Tue, Oct 05, 2010 at 10:22:30AM +0200, Menyhart Zoltan wrote:
> > > >>Due to some race conditions, the reference count can become 0
> > > >>while "xprt" is still on a "pool":
> > > >
> > > >Apologies, your email got buried in my inbox....
> > > >
> > > >>
> > > >>WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
> > > >> [] kref_get+0x23/0x2d
> > > >> [] svc_xprt_get+0x12/0x14 [sunrpc]
> > > >> [] svc_recv+0x2db/0x78a [sunrpc]
> > > >
> > > >Which kernel exactly did you see this on? Is it reproducible?
> > >
> > > I saw it on a 2.6.32.
> > > It has not been corrected for the 2.6.36-rc3 yet.
> > > The patch is for the 2.6.36-rc3.
> > >
> > > It is a narrow window: you need a high workload and a bit of luck to
> > > delay the current CPU just after "svc_xprt_enqueue()" returns.
> > >
> > > >>I think we should increase the reference counter before adding "xprt"
> > > >>onto any list.
> > > >
> > > >I don't see the xprt added to any list after the svc_xprt_get() you've
> > > >added below.
> > >
> > > "svc_xprt_enqueue()" has got two ways to pass an "xprt":
> > > - via "rqstp->rq_xprt" if a worker is available,
> > > - on the "pool->sp_sockets" list otherwise
> > >
> > > if (!list_empty(&pool->sp_threads)) {
> > > rqstp = list_entry(pool->sp_threads.next, struct svc_rqst, rq_list);
> > > svc_thread_dequeue(pool, rqstp);
> > > rqstp->rq_xprt = xprt;
> > > svc_xprt_get(xprt);
> > > rqstp->rq_reserved = serv->sv_max_mesg;
> > > atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> > > pool->sp_stats.threads_woken++;
> > > wake_up(&rqstp->rq_wait);
> > > } else {
> > > list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
> > > pool->sp_stats.sockets_queued++;
> > > }
> > >
> > > In the 1st case, there is a "svc_xprt_get(xprt)", in the 2nd one, there is not.
> > > Once "svc_xprt_enqueue()" returns, at some places, "svc_xprt_put(xprt)" is
> > > invoked. If we have passed the "else" branch, the "kref" can drop down to 0.
> >
> > Maybe your fix is right, but I'm not sure: It looks to me like if
> > svc_xprt_enqueue() gets to "process:" in a situation where the caller
> > holds the only reference, then that's already a bug. Do you know who
> > the caller of svc_xprt_enqueue() was when this happened?
>
> Hm. Maybe something like this could happen: two threads call
> svc_check_conn_limits at about the same time, and both pick the same
> victim xprt.
>
>
> thread 1 thread 2
> ^^^^^^^^ ^^^^^^^^
> set CLOSE set CLOSE
> call svc_xprt_enqueue
> set BUSY
>
> thread 3
> ^^^^^^^^ call svc_xprt_enqueue
> call svc_recv
> dequeue our xprt
> check DEAD, see it unset
> call svc_delete_xprt
> remove xprt from any
> global lists
> put xprt
> clear BUSY

I'm wrong here; svc_recv leaves the xprt BUSY in the case where it calls
svc_delete_xprt().

svc_close_xprt() does clear BUSY after calling svc_delete_xprt(), for
some bizarre reason, but that's probably harmless, except maybe at
server shutdown time. I assume you weren't shutting down the server
when you saw this.

Ugh. cc'ing Neil, maybe he has some idea.

--b.

> test_and_set_bit BUSY
> test CLOSE, go to process:
> make xprt globally visible
> again
> ARGH!
>
>
> The put in svc_delete_xprt() is meant to happen only when the xprt is
> taken off any rqstp's or global lists. We shouldn't be able to requeue
> the xprt after that's done.
>
> So, by the time both svc_check_conn_limits calls return, the reference
> count has probably gone to zero, and the xprt is freed while there
> are still references to it somewhere.
>
> It seems wrong to be clearing BUSY after deleting an xprt; what good
> could come of letting someone try to process an xprt that's already
> DEAD?
>
> But I need to go back over that. Maybe I've missed something.
>
> --b.

2010-10-25 01:44:07

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Sun, 24 Oct 2010 21:21:30 -0400
"J. Bruce Fields" <[email protected]> wrote:

> Once an xprt has been deleted, there's no reason to allow it to be
> enqueued--at worst, that might cause the xprt to be re-added to some
> global list, resulting in later corruption.
>
> Signed-off-by: J. Bruce Fields <[email protected]>

Yep, this makes svc_close_xprt() behave the same way as svc_recv() which
calls svc_delete_xprt but does not clear XPT_BUSY. The other branches in
svc_recv call svc_xprt_received, but the XPT_CLOSE branch doesn't

Reviewed-by: NeilBrown <[email protected]>

NeilBrown

> ---
> net/sunrpc/svc_xprt.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index bef1e88..db90e4d 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -938,7 +938,6 @@ void svc_close_xprt(struct svc_xprt *xprt)
>
> svc_xprt_get(xprt);
> svc_delete_xprt(xprt);
> - clear_bit(XPT_BUSY, &xprt->xpt_flags);
> svc_xprt_put(xprt);
> }
> EXPORT_SYMBOL_GPL(svc_close_xprt);


2010-10-25 15:03:13

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] svcrpc: svc_tcp_sendto XPT_DEAD check is redundant

On Mon, Oct 25, 2010 at 01:10:24PM +1100, Neil Brown wrote:
> On Sun, 24 Oct 2010 21:21:33 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > The only caller (svc_send) has already checked XPT_DEAD.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > net/sunrpc/svcsock.c | 3 ---
> > 1 files changed, 0 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 1454739..07919e1 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > reclen = htonl(0x80000000|((xbufp->len ) - 4));
> > memcpy(xbufp->head[0].iov_base, &reclen, 4);
> >
> > - if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
> > - return -ENOTCONN;
> > -
> > sent = svc_sendto(rqstp, &rqstp->rq_res);
> > if (sent != xbufp->len) {
> > printk(KERN_NOTICE
>
>
> So after removing all these references to XPT_DEAD, do we need XPT_DEAD at
> all???
>
> I think it is only used in two other places.
>
> 1/ In svc_revisit we don't queue the deferred request to an XPT_DEAD
> transport.
> We could avoid that by changing the 'owner' of a deferred request from the
> service to the xprt, and calling cache_clean_deferred in svc_delete_xprt.

That use does seem a bit of a hack to me, so I'd be happy to get rid of
it.

> 2/ in svc_send(). I wonder if we need this at all. There doesn't seem to be
> any locking to ensure that XPT_DEAD doesn't get set immediately after the
> test, and the underlying sendto (whether tcp or udp or whatever) should fail
> if the socket is closed, and if it doesn't it shouldn't really matter??

Does it make a difference in the case of a half-close? If the client
follows a request immediately by a FIN, and if that results in our
setting DEAD (I think it does, assuming svc_tcp_state_change() is called
in that case), then the current code may have the effect of preventing
us from sending the reply.

I don't know if that's good or bad.

> So can we get rid of XPT_DEAD altogether?

OK, I also had another use in mind: for the purposes of 4.1 (which needs
to know when a connection goes down, e.g. to know that it's no longer
available for callbacks), I added a list of callbacks to the xprt,
called on svc_delete_xprt().

I just noticed that I think my current code allows the 4.1 code to
register an xprt after svc_delete_xprt() is called. I could fix that
race by checking for DEAD after trying to register.

(That callback code already seems messier than it should be, so maybe
someone else could suggest a better scheme. I'm stuck.

In any case, it wouldn't be so bad if that were the one remaining use of
DEAD.)

--b.
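
(Illustrative aside, not from the thread: one way to close the
register-after-delete window discussed here is to test a "dead" flag under
the same lock that protects the callback list, roughly in the spirit of the
register_xpt_user change in the later wspace patch. A minimal userspace
model follows; all names are invented, and it links with -pthread.)

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct user_cb {
        struct user_cb *next;
        void (*notify)(void);
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct user_cb *users;
static bool dead;

/* Returns 0 on success, -1 if the transport has already been deleted. */
static int register_user(struct user_cb *u)
{
        pthread_mutex_lock(&lock);
        if (dead) {
                /* Too late: the delete path has already notified (or will
                 * never see) the list, so refuse the registration. */
                pthread_mutex_unlock(&lock);
                return -1;
        }
        u->next = users;
        users = u;
        pthread_mutex_unlock(&lock);
        return 0;
}

static void delete_transport(void)
{
        pthread_mutex_lock(&lock);
        dead = true;
        for (struct user_cb *u = users; u; u = u->next)
                u->notify();            /* notify each registered user once */
        users = NULL;
        pthread_mutex_unlock(&lock);
}

static void went_down(void)
{
        puts("connection went down");
}

int main(void)
{
        struct user_cb u = { .next = NULL, .notify = went_down };

        if (register_user(&u) == 0)
                delete_transport();
        if (register_user(&u) == -1)    /* after delete, registration fails */
                puts("registration refused");
        return 0;
}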

2010-10-22 15:01:40

by Menyhart Zoltan

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

J. Bruce Fields wrote:
> On Tue, Oct 05, 2010 at 10:22:30AM +0200, Menyhart Zoltan wrote:
>> Due to some race conditions, the reference count can become 0
>> while "xprt" is still on a "pool":
>
> Apologies, your email got buried in my inbox....
>
>>
>> WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
>> [] kref_get+0x23/0x2d
>> [] svc_xprt_get+0x12/0x14 [sunrpc]
>> [] svc_recv+0x2db/0x78a [sunrpc]
>
> Which kernel exactly did you see this on? Is it reproducible?

I saw it on a 2.6.32.
It has not been corrected for the 2.6.36-rc3 yet.
The patch is for the 2.6.36-rc3.

It is a narrow window: you need a high workload and a bit of luck to
delay the current CPU just after "svc_xprt_enqueue()" returns.

>> I think we should increase the reference counter before adding "xprt"
>> onto any list.
>
> I don't see the xprt added to any list after the svc_xprt_get() you've
> added below.

"svc_xprt_enqueue()" has got two ways to pass an "xprt":
- via "rqstp->rq_xprt" if a worker is available,
- on the "pool->sp_sockets" list otherwise

if (!list_empty(&pool->sp_threads)) {
rqstp = list_entry(pool->sp_threads.next, struct svc_rqst, rq_list);
svc_thread_dequeue(pool, rqstp);
rqstp->rq_xprt = xprt;
svc_xprt_get(xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
pool->sp_stats.threads_woken++;
wake_up(&rqstp->rq_wait);
} else {
list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
pool->sp_stats.sockets_queued++;
}

In the 1st case, there is a "svc_xprt_get(xprt)", in the 2nd one, there is not.
Once "svc_xprt_enqueue()" returns, at some places, "svc_xprt_put(xprt)" is
invoked. If we have passed the "else" branch, the "kref" can drop down to 0.

"svc_recv()" includes:

xprt = svc_xprt_dequeue(pool);
if (xprt) {
rqstp->rq_xprt = xprt;
svc_xprt_get(xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
} else {

i.e. it calls "svc_xprt_get(xprt)" if it has picked up "xprt" from the "pool" list.
Yet it is too late, if the caller of "svc_xprt_enqueue()" has already had the
time to call "svc_xprt_put(xprt)".

I think "svc_xprt_get(xprt)" should be called in "svc_xprt_enqueue()" before
"list_add_tail(&xprt->xpt_ready, &pool->sp_sockets)" to keep the reference
counter valid.
It is much simpler to move the call "svc_xprt_get(xprt)" just before

if (!list_empty(&pool->sp_threads))

Thanks,

Zoltan
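
(Illustrative aside, not from the thread: the ordering Zoltan argues for,
taking the reference before the object becomes visible on a shared list,
can be modelled in userspace as below. The queue's reference is handed over
to whoever dequeues the item, so a later put by the producer cannot free it
while it is still reachable. All names are invented; link with -pthread.)

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct item {
        atomic_int ref;
        struct item *next;
};

static struct item *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static void item_get(struct item *it)
{
        atomic_fetch_add(&it->ref, 1);
}

static void item_put(struct item *it)
{
        if (atomic_fetch_sub(&it->ref, 1) == 1) {
                puts("last reference dropped, freeing");
                free(it);
        }
}

static void enqueue(struct item *it)
{
        item_get(it);                   /* reference for the queue... */
        pthread_mutex_lock(&queue_lock);
        it->next = queue_head;          /* ...taken before publication */
        queue_head = it;
        pthread_mutex_unlock(&queue_lock);
}

static struct item *dequeue(void)
{
        pthread_mutex_lock(&queue_lock);
        struct item *it = queue_head;
        if (it)
                queue_head = it->next;
        pthread_mutex_unlock(&queue_lock);
        return it;                      /* queue's reference moves to caller */
}

int main(void)
{
        struct item *it = calloc(1, sizeof(*it));
        atomic_init(&it->ref, 1);       /* producer's own reference */

        enqueue(it);
        item_put(it);                   /* producer drops its reference: item survives */

        struct item *got = dequeue();
        if (got)
                item_put(got);          /* consumer drops the queue's reference */
        return 0;
}

The design point is only the ordering: taking the extra reference before
list_add-style publication means no combination of producer put and
consumer get can ever see the count hit zero early.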




2010-10-25 23:04:00

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Tue, Oct 26, 2010 at 09:58:36AM +1100, Neil Brown wrote:
> On Mon, 25 Oct 2010 16:21:56 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Mon, Oct 25, 2010 at 12:43:57PM +1100, Neil Brown wrote:
> > > On Sun, 24 Oct 2010 21:21:30 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > Once an xprt has been deleted, there's no reason to allow it to be
> > > > enqueued--at worst, that might cause the xprt to be re-added to some
> > > > global list, resulting in later corruption.
> > > >
> > > > Signed-off-by: J. Bruce Fields <[email protected]>
> > >
> > > Yep, this makes svc_close_xprt() behave the same way as svc_recv() which
> > > calls svc_delete_xprt but does not clear XPT_BUSY. The other branches in
> > > svc_recv call svc_xprt_received, but the XPT_CLOSE branch doesn't
> > >
> > > Reviewed-by: NeilBrown <[email protected]>
> >
> > Also, of course:
> >
> > > > svc_xprt_get(xprt);
> > > > svc_delete_xprt(xprt);
> > > > - clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > > > svc_xprt_put(xprt);
> >
> > The get/put is pointless: the only reason I can see for doing that of
> > course was to be able to safely clear the bit afterwards.
> >
>
> Agreed.
>
> I like patches that get rid of code!!

Unfortunately, I'm stuck on just one more point: is svc_close_all()
really safe? It assumes it doesn't need any locking to speak of any
more because the server threads are gone--but the xprts themselves
could still be producing events, right? (So data could be arriving that
results in calls to svc_xprt_enqueue, for example?)

If that's right, I'm not sure what to do there....

--b.

2010-10-22 23:01:13

by J. Bruce Fields

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

On Fri, Oct 22, 2010 at 05:20:07PM -0400, J. Bruce Fields wrote:
> On Fri, Oct 22, 2010 at 05:00:50PM +0200, Menyhart Zoltan wrote:
> > J. Bruce Fields wrote:
> > >On Tue, Oct 05, 2010 at 10:22:30AM +0200, Menyhart Zoltan wrote:
> > >>Due to some race conditions, the reference count can become 0
> > >>while "xprt" is still on a "pool":
> > >
> > >Apologies, your email got buried in my inbox....
> > >
> > >>
> > >>WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
> > >> [] kref_get+0x23/0x2d
> > >> [] svc_xprt_get+0x12/0x14 [sunrpc]
> > >> [] svc_recv+0x2db/0x78a [sunrpc]
> > >
> > >Which kernel exactly did you see this on? Is it reproducible?
> >
> > I saw it on a 2.6.32.
> > It has not been corrected for the 2.6.36-rc3 yet.
> > The patch is for the 2.6.36-rc3.
> >
> > It is a narrow window: you need a high workload and a bit of luck to
> > delay the current CPU just after "svc_xprt_enqueue()" returns.
> >
> > >>I think we should increase the reference counter before adding "xprt"
> > >>onto any list.
> > >
> > >I don't see the xprt added to any list after the svc_xprt_get() you've
> > >added below.
> >
> > "svc_xprt_enqueue()" has got two ways to pass an "xprt":
> > - via "rqstp->rq_xprt" if a worker is available,
> > - on the "pool->sp_sockets" list otherwise
> >
> > if (!list_empty(&pool->sp_threads)) {
> > rqstp = list_entry(pool->sp_threads.next, struct svc_rqst, rq_list);
> > svc_thread_dequeue(pool, rqstp);
> > rqstp->rq_xprt = xprt;
> > svc_xprt_get(xprt);
> > rqstp->rq_reserved = serv->sv_max_mesg;
> > atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> > pool->sp_stats.threads_woken++;
> > wake_up(&rqstp->rq_wait);
> > } else {
> > list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
> > pool->sp_stats.sockets_queued++;
> > }
> >
> > In the 1st case, there is a "svc_xprt_get(xprt)", in the 2nd one, there is not.
> > Once "svc_xprt_enqueue()" returns, at some places, "svc_xprt_put(xprt)" is
> > invoked. If we have passed the "else" branch, the "kref" can drop down to 0.
>
> Maybe your fix is right, but I'm not sure: It looks to me like if
> svc_xprt_enqueue() gets to "process:" in a situation where the caller
> holds the only reference, then that's already a bug. Do you know who
> the caller of svc_xprt_enqueue() was when this happened?

Hm. Maybe something like this could happen: two threads call
svc_check_conn_limits at about the same time, and both pick the same
victim xprt.


thread 1 thread 2
^^^^^^^^ ^^^^^^^^
set CLOSE set CLOSE
call svc_xprt_enqueue
set BUSY

thread 3
^^^^^^^^ call svc_xprt_enqueue
call svc_recv
dequeue our xprt
check DEAD, see it unset
call svc_delete_xprt
remove xprt from any
global lists
put xprt
clear BUSY
test_and_set_bit BUSY
test CLOSE, go to process:
make xprt globally visible
again
ARGH!


The put in svc_delete_xprt() is meant to happen only when the xprt is
taken off any rqstp's or global lists. We shouldn't be able to requeue
the xprt after that's done.

So, by the time both svc_check_conn_limits calls return, the reference
count has probably gone to zero, and the xprt is freed while there
are still references to it somewhere.

It seems wrong to be clearing BUSY after deleting an xprt; what good
could come of letting someone try to process an xprt that's already
DEAD?

But I need to go back over that. Maybe I've missed something.

--b.

2010-10-25 01:21:43

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 3/4] svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue

If any xprt marked DEAD is also left BUSY for the rest of its life, then
the XPT_DEAD check here is superfluous--we'll get the same result from
the XPT_BUSY check just after.

Signed-off-by: J. Bruce Fields <[email protected]>
---
net/sunrpc/svc_xprt.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 409881a..3a771f0 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -330,12 +330,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
"svc_xprt_enqueue: "
"threads and transports both waiting??\n");

- if (test_bit(XPT_DEAD, &xprt->xpt_flags)) {
- /* Don't enqueue dead transports */
- dprintk("svc: transport %p is dead, not enqueued\n", xprt);
- goto out_unlock;
- }
-
pool->sp_stats.packets++;

/* Mark transport as busy. It will remain in this state until
--
1.7.1


2010-10-22 21:20:09

by J. Bruce Fields

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

On Fri, Oct 22, 2010 at 05:00:50PM +0200, Menyhart Zoltan wrote:
> J. Bruce Fields wrote:
> >On Tue, Oct 05, 2010 at 10:22:30AM +0200, Menyhart Zoltan wrote:
> >>Due to some race conditions, the reference count can become 0
> >>while "xprt" is still on a "pool":
> >
> >Apologies, your email got buried in my inbox....
> >
> >>
> >>WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
> >> [] kref_get+0x23/0x2d
> >> [] svc_xprt_get+0x12/0x14 [sunrpc]
> >> [] svc_recv+0x2db/0x78a [sunrpc]
> >
> >Which kernel exactly did you see this on? Is it reproducible?
>
> I saw it on a 2.6.32.
> It has not been corrected for the 2.6.36-rc3 yet.
> The patch is for the 2.6.36-rc3.
>
> It is a narrow window: you need a high workload and a bit of luck to
> delay the current CPU just after "svc_xprt_enqueue()" returns.
>
> >>I think we should increase the reference counter before adding "xprt"
> >>onto any list.
> >
> >I don't see the xprt added to any list after the svc_xprt_get() you've
> >added below.
>
> "svc_xprt_enqueue()" has got two ways to pass an "xprt":
> - via "rqstp->rq_xprt" if a worker is available,
> - on the "pool->sp_sockets" list otherwise
>
> if (!list_empty(&pool->sp_threads)) {
> rqstp = list_entry(pool->sp_threads.next, struct svc_rqst, rq_list);
> svc_thread_dequeue(pool, rqstp);
> rqstp->rq_xprt = xprt;
> svc_xprt_get(xprt);
> rqstp->rq_reserved = serv->sv_max_mesg;
> atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> pool->sp_stats.threads_woken++;
> wake_up(&rqstp->rq_wait);
> } else {
> list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
> pool->sp_stats.sockets_queued++;
> }
>
> In the 1st case, there is a "svc_xprt_get(xprt)", in the 2nd one, there is not.
> Once "svc_xprt_enqueue()" returns, at some places, "svc_xprt_put(xprt)" is
> invoked. If we have passed the "else" branch, the "kref" can drop down to 0.

Maybe your fix is right, but I'm not sure: It looks to me like if
svc_xprt_enqueue() gets to "process:" in a situation where the caller
holds the only reference, then that's already a bug. Do you know who
the caller of svc_xprt_enqueue() was when this happened?

--b.

>
> "svc_recv()" includes:
>
> xprt = svc_xprt_dequeue(pool);
> if (xprt) {
> rqstp->rq_xprt = xprt;
> svc_xprt_get(xprt);
> rqstp->rq_reserved = serv->sv_max_mesg;
> atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> } else {
>
> i.e. it calls "svc_xprt_get(xprt)" if it has picked up "xprt" from the "pool" list.
> Yet it is too late, if the caller of "svc_xprt_enqueue()" has already had the
> time to call "svc_xprt_put(xprt)".
>
> I think "svc_xprt_get(xprt)" should be called in "svc_xprt_enqueue()" before
> "list_add_tail(&xprt->xpt_ready, &pool->sp_sockets)" to keep the reference
> counter valid.
> It is much simpler to move the call "svc_xprt_get(xprt)" just before
>
> if (!list_empty(&pool->sp_threads))
>
> Thanks,
>
> Zoltan
>
>
>

2010-10-25 01:21:44

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 4/4] svcrpc: svc_tcp_sendto XPT_DEAD check is redundant

The only caller (svc_send) has already checked XPT_DEAD.

Signed-off-by: J. Bruce Fields <[email protected]>
---
net/sunrpc/svcsock.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 1454739..07919e1 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
reclen = htonl(0x80000000|((xbufp->len ) - 4));
memcpy(xbufp->head[0].iov_base, &reclen, 4);

- if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
- return -ENOTCONN;
-
sent = svc_sendto(rqstp, &rqstp->rq_res);
if (sent != xbufp->len) {
printk(KERN_NOTICE
--
1.7.1


2010-10-23 03:32:52

by J. Bruce Fields

[permalink] [raw]
Subject: Re: "xprt" reference count drops to 0

On Fri, Oct 22, 2010 at 07:21:33PM -0400, J. Bruce Fields wrote:
> On Fri, Oct 22, 2010 at 07:01:12PM -0400, J. Bruce Fields wrote:
> > On Fri, Oct 22, 2010 at 05:20:07PM -0400, J. Bruce Fields wrote:
> > > On Fri, Oct 22, 2010 at 05:00:50PM +0200, Menyhart Zoltan wrote:
> > > > J. Bruce Fields wrote:
> > > > >On Tue, Oct 05, 2010 at 10:22:30AM +0200, Menyhart Zoltan wrote:
> > > > >>Due to some race conditions, the reference count can become 0
> > > > >>while "xprt" is still on a "pool":
> > > > >
> > > > >Apologies, your email got buried in my inbox....
> > > > >
> > > > >>
> > > > >>WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
> > > > >> [] kref_get+0x23/0x2d
> > > > >> [] svc_xprt_get+0x12/0x14 [sunrpc]
> > > > >> [] svc_recv+0x2db/0x78a [sunrpc]
> > > > >
> > > > >Which kernel exactly did you see this on? Is it reproducible?
> > > >
> > > > I saw it on a 2.6.32.
> > > > It has not been corrected for the 2.6.36-rc3 yet.

Doh. Wait, when you say "has not been corrected on 2.6.36-rc3", do you
mean you've actually *seen* the problem occur on 2.6.36-rc3?

If not, it's more likely you're seeing the problem that was fixed by
1b644b6e6f6160ae35ce4b52c2ca89ed3e356e18, in 2.6.34-rc1.

--b.

2010-10-25 22:58:45

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Mon, 25 Oct 2010 16:21:56 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Mon, Oct 25, 2010 at 12:43:57PM +1100, Neil Brown wrote:
> > On Sun, 24 Oct 2010 21:21:30 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > Once an xprt has been deleted, there's no reason to allow it to be
> > > enqueued--at worst, that might cause the xprt to be re-added to some
> > > global list, resulting in later corruption.
> > >
> > > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > Yep, this makes svc_close_xprt() behave the same way as svc_recv() which
> > calls svc_delete_xprt but does not clear XPT_BUSY. The other branches in
> > svc_recv call svc_xprt_received, but the XPT_CLOSE branch doesn't
> >
> > Reviewed-by: NeilBrown <[email protected]>
>
> Also, of course:
>
> > > svc_xprt_get(xprt);
> > > svc_delete_xprt(xprt);
> > > - clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > > svc_xprt_put(xprt);
>
> The get/put is pointless: the only reason I can see for doing that of
> course was to be able to safely clear the bit afterwards.
>

Agreed.

I like patches that get rid of code!!

NeilBrown

2010-10-26 01:25:23

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] svcrpc: svc_tcp_sendto XPT_DEAD check is redundant

On Tue, Oct 26, 2010 at 10:23:38AM +1100, Neil Brown wrote:
> On Mon, 25 Oct 2010 11:03:08 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Mon, Oct 25, 2010 at 01:10:24PM +1100, Neil Brown wrote:
> > > On Sun, 24 Oct 2010 21:21:33 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > The only caller (svc_send) has already checked XPT_DEAD.
> > > >
> > > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > > ---
> > > > net/sunrpc/svcsock.c | 3 ---
> > > > 1 files changed, 0 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > > index 1454739..07919e1 100644
> > > > --- a/net/sunrpc/svcsock.c
> > > > +++ b/net/sunrpc/svcsock.c
> > > > @@ -1135,9 +1135,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > > > reclen = htonl(0x80000000|((xbufp->len ) - 4));
> > > > memcpy(xbufp->head[0].iov_base, &reclen, 4);
> > > >
> > > > - if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
> > > > - return -ENOTCONN;
> > > > -
> > > > sent = svc_sendto(rqstp, &rqstp->rq_res);
> > > > if (sent != xbufp->len) {
> > > > printk(KERN_NOTICE
> > >
> > >
> > > So after removing all these references to XPT_DEAD, do we need XPT_DEAD at
> > > all???
> > >
> > > I think it is only used in two other places.
> > >
> > > 1/ In svc_revisit we don't queue the deferred request to an XPT_DEAD
> > > transport.
> > > We could avoid that by changing the 'owner' of a deferred request from the
> > > service to the xprt, and calling cache_clean_deferred in svc_delete_xprt.
> >
> > That use does seem a bit of a hack to me, so I'd be happy to get rid of
> > it.
> >
> > > 2/ in svc_send(). I wonder if we need this at all. There doesn't seem to be
> > > any locking to ensure that XPT_DEAD doesn't get set immediately after the
> > > test, and the underlying sendto (whether tcp or udp or whatever) should fail
> > > if the socket is closed, and if it doesn't it shouldn't really matter??
> >
> > Does it make a difference in the case of a half-close? If the client
> > follows a request immediately by a FIN, and if that results in our
> > setting DEAD (I think it does, assuming svc_tcp_state_change() is called
> > in that case), then the current code may have the effect of preventing
> > us from sending the reply.
> >
> > I don't know if that's good or bad.
> >
> > > So can we get rid of XPT_DEAD altogether?
> >
> > OK, I also had another use in mind: for the purposes of 4.1 (which needs
> > to know when a connection goes down, e.g. to know that it's no longer
> > available for callbacks), I added a list of callbacks to the xprt,
> > called on svc_delete_xprt().
> >
> > I just noticed that I think my current code allows the 4.1 code to
> > register an xprt after svc_delete_xprt() is called. I could fix that
> > race by checking for DEAD after trying to register.
> >
> > (That callback code already seems messier than it should be, so maybe
> > someone else could suggest a better scheme. I'm stuck.
> >
> > In any case, it wouldn't be so bad if that were the one remaining use of
> > DEAD.)
>
> Could you use XPT_CLOSE for that??

Good idea, yes, I think so.

--b.

2010-11-12 19:00:10

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] svcrpc: never clear XPT_BUSY on dead xprt

On Tue, Oct 26, 2010 at 12:05:17PM -0400, J. Bruce Fields wrote:
> On Tue, Oct 26, 2010 at 08:59:47AM -0400, J. Bruce Fields wrote:
> > On Tue, Oct 26, 2010 at 12:28:20PM +1100, Neil Brown wrote:
> > > On Mon, 25 Oct 2010 20:30:08 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > > >
> > > > svcrpc: simplify svc_close_xprt, fix minor race
> > > >
> > > > We're assuming that in the XPT_BUSY case, somebody else will take care
> > > > of the close. But that's not necessarily the case (e.g. if
> > > > svc_xprt_enqueue() exits due to a lack of write space). So the close
> > > > could in theory be delayed indefinitely. We should be calling
> > > > svc_xprt_enqueue() after every set of XPT_CLOSE.
> > > >
> > > > Also with svc_close_all() no longer relying on svc_close, it no longer
> > > > needs to be able to immediately do the svc_delete_xprt() itself.
> > > >
> > > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > >
> > > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > > index 8c018df..dfee123 100644
> > > > --- a/net/sunrpc/svc_xprt.c
> > > > +++ b/net/sunrpc/svc_xprt.c
> > > > @@ -928,11 +928,7 @@ void svc_delete_xprt(struct svc_xprt *xprt)
> > > > void svc_close_xprt(struct svc_xprt *xprt)
> > > > {
> > > > set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > > > - if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
> > > > - /* someone else will have to effect the close */
> > > > - return;
> > > > -
> > > > - svc_delete_xprt(xprt);
> > > > + svc_xprt_enqueue(xprt);
> > > > }
> > > > EXPORT_SYMBOL_GPL(svc_close_xprt);
> > > >
> > >
> > > I'm not quite so sure about this one.
> > >
> > > svc_close_xprt gets called when user-space writes some magic string to some
> > > magic file. This could be done when there are no active nfsd threads.
> > > It is a bit far fetched, but you could tell nfsd to open a socket, then tell
> > > it to close it, before telling it to start any threads.
> > >
> > > In that case the socket would stay open until the first thread was started.
> >
> > Thanks, that's an odd case, but I think you're right. OK:
>
> Actually.... I think the real race here is the fault of the write-space
> check: once we've taken XPT_BUSY, we really do need to guarantee that
> we'll arrange to have svc_xprt_enqueue() called again after clearing
> XPT_BUSY, to avoid missing any events that arrived while we were
> processing ours. So putting the wspace check after taking XPT_BUSY in
> svc_xprt_enqueue looks to me like a bug that could cause processing to
> stall unnecessarily in some unlikely cases.

I'm going to apply this for 2.6.38 if you can't think of a reason not
to.

(My one last-minute worry was whether there was some reason xpo_wspace()
needed to be called with XPT_BUSY held; but I can't see any.)

--b.

>
> --b.
>
> commit cf501cb479873d6e77abba131ea40c11aa924d72
> Author: J. Bruce Fields <[email protected]>
> Date: Tue Oct 26 11:32:03 2010 -0400
>
> svcrpc: fix wspace-checking race
>
> We call svc_xprt_enqueue() after something happens which we think may
> require handling from a server thread. To avoid such events being lost,
> svc_xprt_enqueue() must guarantee that there will be a svc_recv() call
> from a server thread following any such event. It does that by either
> waking up a server thread itself, or checking that XPT_BUSY is set (in
> which case somebody else is doing it).
>
> But the check of XPT_BUSY could occur just as someone finishes
> processing some other event, and just before they clear XPT_BUSY.
>
> Therefore it's important not to clear XPT_BUSY without subsequently
> doing another svc_xprt_enqueue() to check whether the xprt should be
> requeued.
>
> The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing
> an event to be missed in situations like:
>
> data arrives
> call svc_tcp_data_ready():
> call svc_xprt_enqueue():
> set BUSY
> find no write space
> svc_reserve():
> free up write space
> call svc_xprt_enqueue():
> test BUSY
> clear BUSY
>
> So, instead, check wspace in the same places that the state flags are
> checked: before taking BUSY, and in svc_recv().
>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index d2261fe..9c690fa 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -92,12 +92,17 @@ static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user
> static inline int register_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
> {
> spin_lock(&xpt->xpt_lock);
> - list_add(&u->list, &xpt->xpt_users);
> - spin_unlock(&xpt->xpt_lock);
> if (test_bit(XPT_CLOSE, &xpt->xpt_flags)) {
> - unregister_xpt_user(xpt, u);
> + /*
> + * The connection is about to be deleted soon (or,
> + * worse, may already be deleted--in which case we've
> + * already notified the xpt_users).
> + */
> + spin_unlock(&xpt->xpt_lock);
> return -ENOTCONN;
> }
> + list_add(&u->list, &xpt->xpt_users);
> + spin_unlock(&xpt->xpt_lock);
> return 0;
> }
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 99996ad..04238a9 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -302,6 +302,15 @@ static void svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)
> list_del(&rqstp->rq_list);
> }
>
> +static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> +{
> + if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> + return true;
> + if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED)))
> + return xprt->xpt_ops->xpo_has_wspace(xprt);
> + return false;
> +}
> +
> /*
> * Queue up a transport with data pending. If there are idle nfsd
> * processes, wake 'em up.
> @@ -314,8 +323,7 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
> struct svc_rqst *rqstp;
> int cpu;
>
> - if (!(xprt->xpt_flags &
> - ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
> + if (!svc_xprt_has_something_to_do(xprt))
> return;
>
> cpu = get_cpu();
> @@ -345,25 +353,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
> BUG_ON(xprt->xpt_pool != NULL);
> xprt->xpt_pool = pool;
>
> - /* Handle pending connection */
> - if (test_bit(XPT_CONN, &xprt->xpt_flags))
> - goto process;
> -
> - /* Handle close in-progress */
> - if (test_bit(XPT_CLOSE, &xprt->xpt_flags))
> - goto process;
> -
> - /* Check if we have space to reply to a request */
> - if (!xprt->xpt_ops->xpo_has_wspace(xprt)) {
> - /* Don't enqueue while not enough space for reply */
> - dprintk("svc: no write space, transport %p not enqueued\n",
> - xprt);
> - xprt->xpt_pool = NULL;
> - clear_bit(XPT_BUSY, &xprt->xpt_flags);
> - goto out_unlock;
> - }
> -
> - process:
> if (!list_empty(&pool->sp_threads)) {
> rqstp = list_entry(pool->sp_threads.next,
> struct svc_rqst,
> @@ -744,7 +733,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> spin_unlock_bh(&serv->sv_lock);
> svc_xprt_received(newxpt);
> }
> - } else {
> + } else if (xprt->xpt_ops->xpo_has_wspace(xprt)) {
> dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> rqstp, pool->sp_id, xprt,
> atomic_read(&xprt->xpt_ref.refcount));