2004-08-27 03:31:29

by NeilBrown

Subject: Re: rpc.mountd stops functioning

On Thursday August 5, [email protected] wrote:
> On Thu, Aug 05, 2004 at 12:19:26PM +0100, Johan van den Dorpe wrote:
> > For a long time we've been experiencing problems with rpc.mountd
> > stopping functioning.
>
> Maybe you need the following?
>
> (Neil or someone, could you please apply this? Last I checked it wasn't
> in nfs-utils cvs.)

I don't think this patch is needed (though it wouldn't actually break anything).
The loop in svc_run.c starts:


        readfds = svc_fdset;
        cache_set_fds(&readfds);

so readfds is completely initialised at the top of the loop, so
clearing something at the bottom (which is essentially what this patch
does) should be a no-op.
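
The point about re-initialisation can be checked in isolation. The following is a minimal sketch, not the nfs-utils source: rebuild_set() and cleared_bit_returns() are hypothetical names, and descriptors 3 and 5 are arbitrary values chosen for illustration. It shows only that a bit cleared late in one iteration is set again once the working set is rebuilt from the master set; it says nothing about code that reads the set later in the same iteration.

```c
#include <sys/select.h>

/* Hypothetical helper mirroring the two quoted lines: copy the master
 * set, then add one extra descriptor (standing in for cache_set_fds()). */
static void rebuild_set(fd_set *work, const fd_set *master, int extra_fd)
{
    *work = *master;          /* like: readfds = svc_fdset; */
    FD_SET(extra_fd, work);   /* like: cache_set_fds(&readfds); */
}

/* Returns 1 if a bit cleared at the end of one iteration is set again
 * after the next rebuild, 0 otherwise. */
static int cleared_bit_returns(void)
{
    fd_set master, work;
    int was_cleared;

    FD_ZERO(&master);
    FD_SET(3, &master);               /* arbitrary "RPC" descriptor */

    rebuild_set(&work, &master, 5);   /* top of iteration 1 */
    FD_CLR(5, &work);                 /* cleared later in iteration 1 */
    was_cleared = !FD_ISSET(5, &work);

    rebuild_set(&work, &master, 5);   /* top of iteration 2 */
    return was_cleared && FD_ISSET(5, &work) && FD_ISSET(3, &work);
}
```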

Am I wrong?

NeilBrown

>
> --Bruce Fields
>
> From Garrick Staples <[email protected]>:
>
> After mountd handles a cache upcall, we should clear the relevant bits in the
> fd_set.
>
> ---
>
> nfs-utils-1.0.6-bfields/utils/mountd/cache.c | 1 +
> 1 files changed, 1 insertion(+)
>
> diff -puN utils/mountd/cache.c~cache_select_bugfix utils/mountd/cache.c
> --- nfs-utils-1.0.6/utils/mountd/cache.c~cache_select_bugfix 2004-07-14 12:52:57.000000000 -0400
> +++ nfs-utils-1.0.6-bfields/utils/mountd/cache.c 2004-07-14 12:52:57.000000000 -0400
> @@ -315,6 +315,7 @@ int cache_process_req(fd_set *readfds)
>                      FD_ISSET(fileno(cachelist[i].f), readfds)) {
>                          cnt++;
>                          cachelist[i].cache_handle(cachelist[i].f);
> +                        FD_CLR(fileno(cachelist[i].f), readfds);
>                  }
>          }
>          return cnt;
> _
>
>
> _______________________________________________
> NFS maillist - [email protected]
> https://lists.sourceforge.net/lists/listinfo/nfs




2004-08-27 23:17:01

by J. Bruce Fields

Subject: Re: rpc.mountd stops functioning

On Fri, Aug 27, 2004 at 01:31:20PM +1000, Neil Brown wrote:
> I don't think this patch is needed (though it wouldn't actually break anything).
> The loop in svc_run.c starts:
>
>
>         readfds = svc_fdset;
>         cache_set_fds(&readfds);
>
> so readfds is completely initialised at the top of the loop, so
> clearing something at the bottom (which is essentially what this patch
> does) should be a no-op.
>
> Am I wrong?

Note that cache_process_req() (which the patch below modifies) is not
all the way at the bottom of the loop; there's still an important call to
svc_getreqset() after it.

If I understand the problem correctly, it goes like this: select()
returns with a set that includes two file descriptors, the first for a
cache channel and the second for an rpc socket. If we don't clear the
first file descriptor from that set before passing it to
svc_getreqset(), the rpc code will try to read an rpc request from a
cache channel file, and will get very confused.
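
The sequence described above can be reproduced outside mountd. What follows is a minimal sketch under stated assumptions, not the nfs-utils code: two pipes stand in for the cache channel and the rpc socket, and process_cache_fds() and simulate_pass() are hypothetical names. The clear_bits flag switches the FD_CLR() step on and off so that the set reaching the point where svc_getreqset() would be called can be compared in both cases.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Handle every ready "cache" descriptor. With clear_bits set, also remove
 * it from readfds so later consumers of the set never see it.
 * Returns the number of descriptors handled. */
static int process_cache_fds(fd_set *readfds, const int *fds, int nfds,
                             int clear_bits)
{
    int cnt = 0;

    for (int i = 0; i < nfds; i++) {
        if (fds[i] >= 0 && FD_ISSET(fds[i], readfds)) {
            char buf[16];
            ssize_t n;

            cnt++;
            /* Stand-in for cache_handle(): drain the pending data. */
            n = read(fds[i], buf, sizeof buf);
            (void)n;
            if (clear_bits)
                FD_CLR(fds[i], readfds);
        }
    }
    return cnt;
}

/* One pass of a select() loop with one "cache channel" and one "rpc
 * socket", both simulated with pipes. Returns 1 if the cache descriptor
 * is still set where the set would be handed to the rpc dispatcher,
 * 0 if it is not, and -1 if the setup or select() failed. */
static int simulate_pass(int clear_bits)
{
    int cache_pipe[2], rpc_pipe[2];
    int leaked = -1;
    fd_set readfds;

    if (pipe(cache_pipe) < 0)
        return -1;
    if (pipe(rpc_pipe) < 0) {
        close(cache_pipe[0]);
        close(cache_pipe[1]);
        return -1;
    }

    /* Make both read ends ready so select() reports two descriptors. */
    if (write(cache_pipe[1], "c", 1) == 1 && write(rpc_pipe[1], "r", 1) == 1) {
        int maxfd = cache_pipe[0] > rpc_pipe[0] ? cache_pipe[0] : rpc_pipe[0];
        int selret;

        FD_ZERO(&readfds);
        FD_SET(rpc_pipe[0], &readfds);     /* like: readfds = svc_fdset */
        FD_SET(cache_pipe[0], &readfds);   /* like: cache_set_fds(&readfds) */

        selret = select(maxfd + 1, &readfds, NULL, NULL, NULL);
        if (selret == 2) {
            selret -= process_cache_fds(&readfds, &cache_pipe[0], 1,
                                        clear_bits);
            /* The "rpc" descriptor must still be pending at this point. */
            assert(selret == 1 && FD_ISSET(rpc_pipe[0], &readfds));
            /* The real loop would call svc_getreqset(&readfds) here. */
            leaked = FD_ISSET(cache_pipe[0], &readfds) ? 1 : 0;
        }
    }

    close(cache_pipe[0]);
    close(cache_pipe[1]);
    close(rpc_pipe[0]);
    close(rpc_pipe[1]);
    return leaked;
}
```

Under these assumptions, simulate_pass(1) returns 0 and simulate_pass(0) returns 1: without the FD_CLR() step the cache descriptor is still in the set that the rpc dispatcher would go on to examine.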

The symptom is that mountd may die occasionally on a server that's
using the new interface (hence mountd is handling upcalls), and that's
getting a lot of mount requests.

I've never seen the bug myself, because I'm almost always using NFSv4,
and hence only use mountd to handle upcalls.

--b.


> > From Garrick Staples <[email protected]>:
> >
> > After mountd handles a cache upcall, we should clear the relevant bits in the
> > fd_set.
> >
> > ---
> >
> > nfs-utils-1.0.6-bfields/utils/mountd/cache.c | 1 +
> > 1 files changed, 1 insertion(+)
> >
> > diff -puN utils/mountd/cache.c~cache_select_bugfix utils/mountd/cache.c
> > --- nfs-utils-1.0.6/utils/mountd/cache.c~cache_select_bugfix 2004-07-14 12:52:57.000000000 -0400
> > +++ nfs-utils-1.0.6-bfields/utils/mountd/cache.c 2004-07-14 12:52:57.000000000 -0400
> > @@ -315,6 +315,7 @@ int cache_process_req(fd_set *readfds)
> >                      FD_ISSET(fileno(cachelist[i].f), readfds)) {
> >                          cnt++;
> >                          cachelist[i].cache_handle(cachelist[i].f);
> > +                        FD_CLR(fileno(cachelist[i].f), readfds);
> >                  }
> >          }
> >          return cnt;
> > _
> >
> >



2004-08-28 17:20:59

by Garrick Staples

Subject: Re: rpc.mountd stops functioning

On Fri, Aug 27, 2004 at 07:16:54PM -0400, J. Bruce Fields alleged:
> On Fri, Aug 27, 2004 at 01:31:20PM +1000, Neil Brown wrote:
> > I don't think this patch is needed (though it wouldn't actually break anything).
> > The loop in svc_run.c starts:
> >
> >
> >         readfds = svc_fdset;
> >         cache_set_fds(&readfds);
> >
> > so readfds is completely initialised at the top of the loop, so
> > clearing something at the bottom (which is essentially what this patch
> > does) should be a no-op.
> >
> > Am I wrong?
>
> Note that cache_process_req() (which the patch below modifies) is not
> all the way at the bottom of the loop; there's still an important call to
> svc_getreqset() after it.
>
> If I understand the problem correctly, it goes like this: select()
> returns with a set that includes two file descriptors, the first for a
> cache channel and the second for an rpc socket. If we don't clear the
> first file descriptor from that set before passing it to
> svc_getreqset(), the rpc code will try to read an rpc request from a
> cache channel file, and will get very confused.
>
> The symptom is that mountd may die occasionally on a server that's
> using the new interface (hence mountd is handling upcalls), and that's
> getting a lot of mount requests.

Yes, this is exactly what happens. The RPC code reading from the cache channel
file segfaults mountd every time.

On my two main fileservers, mountd would die every few hours. Sometimes,
conditions being optimal to trigger the bug, it would chain-crash. It hasn't
segfaulted at all since that one-line patch.

I'm surprised more people haven't reported this segfault. I guess not many
people are using the new interface in heavy production.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California

