2004-04-30 21:25:26

by Garrick Staples

Subject: mountd segfault on itanium2

Hi all,
I'm having a terrible time with mountd segfaulting on two Itanium boxes. I
can't find a specific trigger, but I can generally trigger it within a few
minutes by just calling mount/umount a few hundred times.
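
A client-side loop along these lines is enough to trigger it (sketch only;
"server" and /mnt stand in for the real server name and mount point):

  while :; do mount server:/export/usc-01 /mnt && umount /mnt; done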

I'm using glibc 2.3.2 and nfs-utils 1.0.6 from RHEL.

In the tests below, I have a single directory exported to 10.125.0.0/16.
Since I know name resolution has recently been a source of problems, I've
made sure all clients are in /etc/hosts. I'm using NIS, but files comes
before dns and nis in nsswitch.conf. I've also tested with and without nscd
running.
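
For reference, the hosts line in /etc/nsswitch.conf looks like this
(standard syntax, just restating the ordering described above):

  hosts: files dns nis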

Thanks in advance for any help.


gdb isn't showing much in a backtrace, but I can supply a core if anyone wants
it.

# gdb --core=/var/lib/nfs/core.15841
...
This GDB was configured as "ia64-redhat-linux-gnu".
Core was generated by `./mountd -F -d all'.
Program terminated with signal 11, Segmentation fault.
#0 0x20000008002c19d0 in ?? ()
(gdb) bt
#0 0x20000008002c19d0 in ?? ()
#1 0x20000008002c1950 in ?? ()
Previous frame identical to this frame (corrupt stack?)


I have a few different straces that show the segfault happening in different
places in the code:

open("/proc/fs/nfsd/filehandle", O_RDWR) = 9
fstat(9, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2000000800440000
write(9, "10.125.0.0/16 /export/usc-01 64 "..., 33) = 33
read(9, "\\x010000000008001102000000\n", 16384) = 27
close(9) = 0
munmap(0x2000000800440000, 65536) = 0
brk(0) = 0x2000000800038000
sendmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(641),
sin_addr=inet_addr("10.125.1.176")},
msg_iov(1)=[{"#\246?\315\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 56}],
msg_controllen=32, msg_control=0x2000000800341dd8, msg_flags=0}, 0) = 56
select(1024, [3 4 5 6 7], NULL, NULL, NULL) = 2 (in [5 6])
read(5, "", 0) = 0
--- SIGSEGV (Segmentation fault) @ 20000008002c19d0 (63742f3132353111) ---


open("/var/lib/nfs/rmtab", O_RDWR) = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=6445, ...}) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x200000000037c000
lseek(10, 0, SEEK_CUR) = 0
read(10, "10.125.1.10:10.125.0.0/16:0x0000"..., 16384) = 6445
lseek(10, 6445, SEEK_SET) = 6445
lseek(10, -4678, SEEK_CUR) = 1767
write(10, "10.125.0.0/16:/export/usc-01:0x0"..., 40) = 40
fdatasync(10) = 0
close(10) = 0
munmap(0x200000000037c000, 65536) = 0
close(8) = 0
gettimeofday({1083357502, 998698}, NULL) = 0
write(5, "10.125.0.0/16 0 \\x00080011020000"..., 62) = 62
--- SIGSEGV (Segmentation fault) @ 20000000002899d0 (7064752f35343639) ---

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-04-30 23:44:39

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 02:24:14PM -0700, Garrick Staples alleged:
> Hi all,
> I'm having a terrible time with mountd segfaulting on two Itanium boxes. I
> can't find a specific trigger, but I can generally trigger it within a few
> minutes by just calling mount/umount a few hundred times.
>
> I'm using glibc 2.3.2 and nfs-utils 1.0.6 from RHE.
>
> In the tests below, I have a single directory exported to 10.125.0.0/16. Since
> I know name resolution was a recent problem, I've made sure all clients are in
> /etc/hosts. I'm using NIS, but files is before dns and nis in nsswitch.conf.
> I've also tested with and without nscd running.

> select(1024, [3 4 5 6 7], NULL, NULL, NULL) = 2 (in [5 6])
> read(5, "", 0) = 0
> --- SIGSEGV (Segmentation fault) @ 20000008002c19d0 (63742f3132353111) ---

> write(5, "10.125.0.0/16 0 \\x00080011020000"..., 62) = 62
> --- SIGSEGV (Segmentation fault) @ 20000000002899d0 (7064752f35343639) ---


I just spotted a pattern. After collecting several strace samples, it always
segfaults after read() or write() to fd 5. And fd 5 is always:

open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

I have no idea what the file is for, but grep'ing my straces shows that mountd
doesn't normally use it. It can handle hundreds of mount/umount requests
without ever touching fd 5. Then at some point it reads once:

read(5, "10.125.0.0/16 0 \\x00080011020000"..., 128) = 35

If it doesn't segfault on the read(), it might segfault on a write() very soon
after:

write(5, "10.125.0.0/16 0 \\x00080011020000"..., 62) = 62
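
If I had to guess at the mechanics (a sketch of my guess, not the actual
nfs-utils code): it behaves like a request/reply channel, where the kernel
makes a request line readable and userspace answers by writing a reply line
back to the same fd. Something like:

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      char buf[8192];
      ssize_t n;
      int fd = open("/proc/net/rpc/nfsd.fh/channel", O_RDWR);

      if (fd < 0)
          return 1;
      for (;;) {
          /* one read() returns one whole request line from the kernel */
          n = read(fd, buf, sizeof(buf) - 1);
          if (n <= 0)
              continue;
          buf[n] = '\0';
          fprintf(stderr, "upcall: %s", buf);
          /* a real daemon would parse the request and write a matching
           * reply line back to the same fd */
      }
  }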


Thanks in advance to anyone that knows what's going on.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-01 00:15:42

by J. Bruce Fields

Subject: Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples wrote:
> I just spotted a pattern. After collecting several strace samples, it always
> segfaults after read() or write() to fd 5. And fd 5 is always:
>
> open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

Any interesting messages from the kernel (in /var/log/messages)?

--Bruce Fields



2004-05-01 00:28:21

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 08:15:35PM -0400, J. Bruce Fields alleged:
> On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples wrote:
> > I just spotted a pattern. After collecting several strace samples, it always
> > segfaults after read() or write() to fd 5. And fd 5 is always:
> >
> > open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5
>
> Any interesting messages from the kernel (in /var/log/messages)?

Nope, nothing in dmesg either.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-01 03:08:43

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples alleged:
> I just spotted a pattern. After collecting several strace samples, it always
> segfaults after read() or write() to fd 5. And fd 5 is always:
>
> open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

I have an ugly work-around that seems to be working. It seems that 2.6 has a
new nfs interface for userspace. If I force mountd to use the older 2.4
interface, it doesn't segfault anymore. So something in the new code paths is
broken.

In support/nfs/cacheio.c:

int
check_new_cache(void)
{
	struct stat stb;

	return 0; /* DISABLE NEW 2.6 INTERFACE */

	return (stat("/proc/fs/nfs/filehandle", &stb) == 0) ||
	       (stat("/proc/fs/nfsd/filehandle", &stb) == 0);
}

Am I losing any functionality by doing this? I can't actually find any
problems.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-03 13:36:44

by Jose R. Santos

Subject: Re: mountd segfault on itanium2

On 04/30/04 22:07:30, Garrick Staples wrote:
> I have an ugly work-around that seems to be working. It seems that 2.6 has a
> new nfs interface for userspace. If I force mountd to use the older 2.4
> interface, it doesn't segfault anymore. So something in the new code paths is
> broken.
>
> In support/nfs/cacheio.c:
>
> int
> check_new_cache(void)
> {
> 	struct stat stb;
>
> 	return 0; /* DISABLE NEW 2.6 INTERFACE */
>
> 	return (stat("/proc/fs/nfs/filehandle", &stb) == 0) ||
> 	       (stat("/proc/fs/nfsd/filehandle", &stb) == 0);
> }

If you want to use the old syscall interface, all you need to do is make
sure nfsd is not mounted in /proc/fs/nfsd.
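
On a typical 2.6 setup that should just be a matter of (assuming the init
scripts mounted it there in the first place):

  umount /proc/fs/nfsd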

> Am I losing any functionality by doing this? I can't actually find any
> problems.

I don't think you lose any functionality, but I have seen issues with the
syscall interface on large-memory systems. The syscall interface mounts and
unmounts nfsdfs for every syscall. That causes the inodes and dentries for
the filesystem to be flushed, but in order to do that the kernel needs to
walk all of the inode and dentry caches under lock. If you have a system
with 16GB of RAM, mounting and unmounting can take a really long time.

-JRS



2004-05-03 18:28:04

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 08:07:30PM -0700, Garrick Staples alleged:
> On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples alleged:
> > I just spotted a pattern. After collecting several strace samples, it always
> > segfaults after read() or write() to fd 5. And fd 5 is always:
> >
> > open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

I was able to insert some printf()s after the write() to fd 5, which shows
the segfault happens somewhere after that write.

The segfault is inside svc_getreqset(), called from utils/mountd/svc_run.c:

	switch (selret) {
	case -1:
		if (errno == EINTR || errno == ECONNREFUSED
		 || errno == ENETUNREACH || errno == EHOSTUNREACH)
			continue;
		xlog(L_ERROR, "my_svc_run() - select: %m");
		return;

	default:
		selret -= cache_process_req(&readfds);
		if (selret)
			svc_getreqset(&readfds);
	}

Either glibc has a bug or the kernel is doing something fugly to the readfds?
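
A way to confirm which fds actually survive cache_process_req() and reach
svc_getreqset() (debug hack sketch; xlog() and L_WARNING as used elsewhere
in mountd):

  /* just before svc_getreqset(&readfds) in my_svc_run() */
  {
      int fd;

      for (fd = 0; fd < FD_SETSIZE; fd++)
          if (FD_ISSET(fd, &readfds))
              xlog(L_WARNING, "fd %d still set", fd);
  }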

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-03 20:48:57

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 08:07:30PM -0700, Garrick Staples alleged:
> On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples alleged:
> > I just spotted a pattern. After collecting several strace samples, it always
> > segfaults after read() or write() to fd 5. And fd 5 is always:
> >
> > open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

More info:

Program received signal SIGSEGV, Segmentation fault.
0x20000000002899d0 in svc_getreq_common_internal () from /lib/tls/libc.so.6.1
(gdb) bt
#0 0x20000000002899d0 in svc_getreq_common_internal ()
from /lib/tls/libc.so.6.1
#1 0x2000000000289720 in svc_getreqset_internal () from /lib/tls/libc.so.6.1
#2 0x4000000000009d60 in my_svc_run () at svc_run.c:62
#3 0x4000000000006160 in main (argc=4, argv=0x60000fffffffb8c8)
at mountd.c:436



--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-03 18:21:42

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Mon, May 03, 2004 at 08:35:46AM -0500, Jose R. Santos alleged:
> On 04/30/04 22:07:30, Garrick Staples wrote:
> > I have an ugly work-around that seems to be working. It seems that 2.6 has a
> > new nfs interface for userspace. If I force mountd to use the older 2.4
> > interface, it doesn't segfault anymore. So something in the new code paths is
> > broken.
> >
> > In support/nfs/cacheio.c:
> >
> > int
> > check_new_cache(void)
> > {
> > 	struct stat stb;
> >
> > 	return 0; /* DISABLE NEW 2.6 INTERFACE */
> >
> > 	return (stat("/proc/fs/nfs/filehandle", &stb) == 0) ||
> > 	       (stat("/proc/fs/nfsd/filehandle", &stb) == 0);
> > }
>
> If you want to use the old syscall interface, all you need to do is make
> sure nfsd is not mounted in /proc/fs/nfsd.

Thank you. That's much easier.


> > Am I losing any functionality by doing this? I can't actually find any
> > problems.
>
> I don't think you lose any functionality, but I have seen issues with the
> syscall interface on large-memory systems. The syscall interface mounts and
> unmounts nfsdfs for every syscall. That causes the inodes and dentries for
> the filesystem to be flushed, but in order to do that the kernel needs to
> walk all of the inode and dentry caches under lock. If you have a system
> with 16GB of RAM, mounting and unmounting can take a really long time.
>
> -JRS

Both machines have 8GB. I definitely see the slower mounting, but it's not
prohibitive. And slow mounting is better than a segfault.


--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-04 00:19:08

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 08:07:30PM -0700, Garrick Staples alleged:
> On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples alleged:
> > I just spotted a pattern. After collecting several strace samples, it always
> > segfaults after read() or write() to fd 5. And fd 5 is always:
> >
> > open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5
>
> I have an ugly work-around that seems to be working. It seems that 2.6 has a
> new nfs interface for userspace. If I force mountd to use the older 2.4
> interface, it doesn't segfault anymore. So something in the new code paths is
> broken.

I'm slowly starting to wrap my brain around how these RPC calls work. I've
found something that I can't make sense of. In my_svc_run(), it packs fds 3,
4, 5, 6, and 7 into select(). 3, 4, and 5 are 3 files in /proc/net/rpc. fd 6
and 7 are udp and tcp sockets. During my umount/mount tests, fd 6 is the
only set bit after the select(), and is then passed to svc_getreqset().

But just before the segfault, select() sets fd 5, which is
/proc/net/rpc/nfsd.fh/channel. The thing that I don't understand is that fd 5
is being passed to svc_getreqset(). Shouldn't svc_getreqset() be only for fds
of sockets that have pending rpc calls? Should fd 5 be cleared from the fdset
before calling svc_getreqset()?
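
If I'm reading the glibc sunrpc code right, that would explain the crash:
svc_getreqset() assumes every set fd is a registered RPC transport. Roughly
(a paraphrase of the idea, NOT glibc's actual source):

  #include <rpc/rpc.h>
  #include <sys/select.h>

  /* sketch of svc_getreqset()/svc_getreq_common() dispatch */
  void getreqset_sketch(fd_set *readfds, SVCXPRT **xports, int maxfd)
  {
      int fd;

      for (fd = 0; fd < maxfd; fd++) {
          if (!FD_ISSET(fd, readfds))
              continue;
          /* fd 5 was never registered as an RPC transport, so this
           * lookup finds NULL or stale memory... */
          SVCXPRT *xprt = xports[fd];
          /* ...and dereferencing it would be the SIGSEGV */
          (*xprt->xp_ops->xp_recv)(xprt, NULL);
      }
  }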

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-04 01:02:02

by J. Bruce Fields

Subject: Re: mountd segfault on itanium2

On Mon, May 03, 2004 at 05:17:18PM -0700, Garrick Staples wrote:
> I'm slowly starting to wrap my brain around how these RPC calls work. I've
> found something that I can't make sense of. In my_svc_run(), it packs fds 3,
> 4, 5, 6, and 7 into select(). 3, 4, and 5 are 3 files in /proc/net/rpc. fd 6
> and 7 are udp and tcp sockets. During my umount/mount tests, fd 6 is the
> only set bit after the select(), and is then passed to svc_getreqset().
>
> But just before the segfault, select() sets fd 5, which is
> /proc/net/rpc/nfsd.fh/channel. The thing that I don't understand is that fd 5
> is being passed to svc_getreqset(). Shouldn't svc_getreqset() be only for fds
> of sockets that have pending rpc calls? Should fd 5 be cleared from the fdset
> before calling svc_getreqset()?

Cool, good detective work, I think that must be it. Other people
weren't seeing it because you have to be using the new interface (have
nfsd mounted), and have to get an rpc call and a kernel upcall at the
same time. Does clearing the bit end the segfaults?

--Bruce Fields

From Garrick Staples <[email protected]>:

After mountd handles a cache upcall, we should clear the relevant bits in the
fd_set.


utils/mountd/cache.c | 1 +
1 files changed, 1 insertion(+)

diff -puN utils/mountd/svc_run.c~cache_select_bugfix utils/mountd/svc_run.c
diff -puN utils/mountd/cache.c~cache_select_bugfix utils/mountd/cache.c
--- nfs-utils-1.0.6/utils/mountd/cache.c~cache_select_bugfix 2004-05-03 20:57:15.000000000 -0400
+++ nfs-utils-1.0.6-bfields/utils/mountd/cache.c 2004-05-03 20:57:15.000000000 -0400
@@ -315,6 +315,7 @@ int cache_process_req(fd_set *readfds)
 		    FD_ISSET(fileno(cachelist[i].f), readfds)) {
 			cnt++;
 			cachelist[i].cache_handle(cachelist[i].f);
+			FD_CLR(fileno(cachelist[i].f), readfds);
 		}
 	}
 	return cnt;

_



2004-05-04 01:41:48

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Mon, May 03, 2004 at 05:17:18PM -0700, Garrick Staples alleged:
> On Fri, Apr 30, 2004 at 08:07:30PM -0700, Garrick Staples alleged:
> > On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples alleged:
> > > I just spotted a pattern. After collecting several strace samples, it always
> > > segfaults after read() or write() to fd 5. And fd 5 is always:
> > >
> > > open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5
> >
> > I have an ugly work-around that seems to be working. It seems that 2.6 has a
> > new nfs interface for userspace. If I force mountd to use the older 2.4
> > interface, it doesn't segfault anymore. So something in the new code paths is
> > broken.
>
> I'm slowly starting to wrap my brain around how these RPC calls work. I've
> found something that I can't make sense of. In my_svc_run(), it packs fds 3,
> 4, 5, 6, and 7 into select(). 3, 4, and 5 are 3 files in /proc/net/rpc. fd 6
> and 7 are udp and tcp sockets. During my umount/mount tests, fd 6 is the
> only set bit after the select(), and is then passed to svc_getreqset().
>
> But just before the segfault, select() sets fd 5, which is
> /proc/net/rpc/nfsd.fh/channel. The thing that I don't understand is that fd 5
> is being passed to svc_getreqset(). Shouldn't svc_getreqset() be only for fds
> of sockets that have pending rpc calls? Should fd 5 be cleared from the fdset
> before calling svc_getreqset()?

Going with this theory, I added an FD_CLR to clear those bits, and it seems
to have fixed the problem. I've no idea what the ramifications of this fix
are, but everything seems to be working. Does anyone know if this is really
bad?


diff -ruN utils/mountd/cache.c_orig utils/mountd/cache.c
--- utils/mountd/cache.c_orig	2004-05-03 18:07:26.257126950 -0700
+++ utils/mountd/cache.c	2004-05-03 18:07:28.639939421 -0700
@@ -317,6 +317,7 @@
 		    FD_ISSET(fileno(cachelist[i].f), readfds)) {
 			cnt++;
 			cachelist[i].cache_handle(cachelist[i].f);
+			FD_CLR(fileno(cachelist[i].f), readfds);
 		}
 	}
 	return cnt;



--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


2004-05-04 01:54:42

by Garrick Staples

Subject: Re: mountd segfault on itanium2

On Mon, May 03, 2004 at 09:01:58PM -0400, J. Bruce Fields alleged:
> On Mon, May 03, 2004 at 05:17:18PM -0700, Garrick Staples wrote:
> > I'm slowly starting to wrap my brain around how these RPC calls work. I've
> > found something that I can't make sense of. In my_svc_run(), it packs fds 3,
> > 4, 5, 6, and 7 into select(). 3, 4, and 5 are 3 files in /proc/net/rpc. fd 6
> > and 7 are udp and tcp sockets. During my umount/mount tests, fd 6 is the
> > only set bit after the select(), and is then passed to svc_getreqset().
> >
> > But just before the segfault, select() sets fd 5, which is
> > /proc/net/rpc/nfsd.fh/channel. The thing that I don't understand is that fd 5
> > is being passed to svc_getreqset(). Shouldn't svc_getreqset() be only for fds
> > of sockets that have pending rpc calls? Should fd 5 be cleared from the fdset
> > before calling svc_getreqset()?
>
> Cool, good detective work, I think that must be it. Other people
> weren't seeing it because you have to be using the new interface (have
> nfsd mounted), and have to get an rpc call and a kernel upcall at the
> same time. Does clearing the bit end the segfaults?

I just got this, for some reason mail from sourceforge seems to be lagging by
several hours today.

Yes, that seems to fix the problem. Isn't this a significant DoS vector?
Anyone on the net could generate lots of mount/umount requests against
mountd on any 2.6 machine and segfault it.

And is there a deeper problem here: should glibc's rpc code be segfaulting
on this at all?

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California

