2006-10-31 00:13:58

by Steinar H. Gunderson

Subject: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

[Please Cc me on any replies, I'm not subscribed to the list]

Hi,

We have a machine here running nfs-utils 1.0.10 serving Kerberos over NFSv4,
and every now and then (typically about once a day, but that varies) mounts
just stop working. rpc.svcgssd still runs, packets fly back and forth, but
mount simply refuses. Restarting rpc.svcgssd fixes the problems
immediately -- hosts mounting NFSv4 non-kerberized work just fine.

I've put up a transcript at
http://home.samfundet.no/~sesse/svcgssd-dying.log.bz2 -- hope you get
something useful out of it. (I guess we'll have to change the keytab now, but
hey :-) ) Only the last few calls in there are buggy; the rest are all good.

/* Steinar */
--
Homepage: http://www.sesse.net/

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-10-31 04:53:08

by Kevin Coffman

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On 10/30/06, Steinar H. Gunderson <[email protected]> wrote:
> [Please Cc me on any replies, I'm not subscribed to the list]
>
> Hi,
>
> We have a machine here running nfs-utils 1.0.10 serving Kerberos over NFSv4,
> and every now and then (typically about once a day, but that varies) mounts
> just stop working. rpc.svcgssd still runs, packets fly back and forth, but
> mount simply refuses. Restarting rpc.svcgssd fixes the problems
> immediately -- hosts mounting NFSv4 non-kerberized work just fine.
>
> I've put up a transcript at
> http://home.samfundet.no/~sesse/svcgssd-dying.log.bz2 -- hope you get
> something useful out of it. (I guess we'll have to change the keytab now, but
> hey :-) ) Only the last few calls in there are buggy; the rest are all good.
>
> /* Steinar */

Hello Steinar,
I don't see anything in the svcgssd output indicating there was any
kind of error.

When you say "packets fly back and forth, but mount simply refuses",
is there any kind of delay (e.g. a 30-second timeout) or does the mount
fail pretty quickly? Do you see the "typical" error message(s) from
mount?

What kernel is this with?

K.C.


2006-10-31 10:04:54

by Steinar H. Gunderson

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Mon, Oct 30, 2006 at 11:53:02PM -0500, Kevin Coffman wrote:
> Hello Steinar,
> I don't see anything in the svcgssd output indicating there was any
> kind of error.

Me neither, that's why it's so odd. :-)

> When you say "packets fly back and forth, but mount simply refuses",
> is there any kind of delay (e.g. a 30-second timeout) or does the mount
> fail pretty quickly? Do you see the "typical" error message(s) from
> mount?

The mount fails immediately, IIRC with simply "Permission denied". Also,
existing mounts stop working (usually, they hang).

> What kernel is this with?

Mostly 2.6.17 and 2.6.18 on the clients; 2.6.17.11 on the server.

/* Steinar */
--
Homepage: http://www.sesse.net/


2006-10-31 13:43:53

by Kevin Coffman

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On 10/31/06, Steinar H. Gunderson <[email protected]> wrote:
> On Mon, Oct 30, 2006 at 11:53:02PM -0500, Kevin Coffman wrote:
> > Hello Steinar,
> > I don't see anything in the svcgssd output indicating there was any
> > kind of error.
>
> Me neither, that's why it's so odd. :-)
>
> > When you say "packets fly back and forth, but mount simply refuses",
> > is there any kind of delay (e.g. a 30-second timeout) or does the mount
> > fail pretty quickly? Do you see the "typical" error message(s) from
> > mount?
>
> The mount fails immediately, IIRC with simply "Permission denied". Also,
> existing mounts stop working (usually, they hang).
>
> > What kernel is this with?
>
> Mostly 2.6.17 and 2.6.18 on the clients; 2.6.17.11 on the server.

When you say existing mounts stop working, do you mean that data is no
longer available on those existing mounts, or that new connections
requiring new contexts via svcgssd stop working?

Could you send a packet trace from the server after mounts stop
working? (tcpdump -s 0 -w /tmp/x) Also, if it is enabled, could you
get the output from sysRq-T?


2006-10-31 14:34:01

by Steinar H. Gunderson

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Tue, Oct 31, 2006 at 08:43:47AM -0500, Kevin Coffman wrote:
> When you say existing mounts stop working, do you mean that data is no
> longer available on those existing mounts, or that new connections
> requiring new contexts via svcgssd stop working?

Data is no longer available on those existing mounts. In fact, if I run ls
-ld on the mount point, it is suddenly owned by the user and group "?".

> Could you send a packet trace from the server after mounts stop
> working? (tcpdump -s 0 -w /tmp/x) Also, if it is enabled, could you
> get the output from sysRq-T?

I'll be sure to grab the data next time the problems occur.

By the way, I also have an strace log available; but it's rather large-ish:

http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2

Note that it's not the same invocation as the previous log I sent you.

/* Steinar */
--
Homepage: http://www.sesse.net/


2006-11-08 00:44:48

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Tue, Nov 07, 2006 at 01:47:20AM +0100, Steinar H. Gunderson wrote:
> On Mon, Nov 06, 2006 at 06:49:22PM -0500, J. Bruce Fields wrote:
> >> http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2
> > 18:48:39 ERROR 404: Not Found.
>
> Hm, that's odd. Oh well, I found a log here now, at least -- try it now.

Nope. Are you sure there isn't a typo in the URL? I'm just
cut-n-pasting it.

--b.


2006-11-08 00:52:02

by Steinar H. Gunderson

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Tue, Nov 07, 2006 at 07:44:40PM -0500, J. Bruce Fields wrote:
>>>> http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2
>>> 18:48:39 ERROR 404: Not Found.
>> Hm, that's odd. Oh well, I found a log here now, at least -- try it now.
> Nope. Are you sure there isn't a typo in the URL? I'm just
> cut-n-pasting it.

Gah! It's not my day today. Try once more, please.

/* Steinar */
--
Homepage: http://www.sesse.net/


2006-11-10 20:11:42

by Kevin Coffman

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On 11/7/06, Steinar H. Gunderson <[email protected]> wrote:
> On Tue, Nov 07, 2006 at 07:44:40PM -0500, J. Bruce Fields wrote:
> >>>> http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2
> >>> 18:48:39 ERROR 404: Not Found.
> >> Hm, that's odd. Oh well, I found a log here now, at least -- try it now.
> > Nope. Are you sure there isn't a typo in the URL? I'm just
> > cut-n-pasting it.
>
> Gah! It's not my day today. Try once more, please.

I didn't find any clues in the strace output either. I'm not sure
where to look next.

K.C.


2006-11-10 20:25:27

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Fri, Nov 10, 2006 at 03:11:31PM -0500, Kevin Coffman wrote:
> I didn't find any clues in the strace output either. I'm not sure
> where to look next.

Well, if we're seeing a NO_CONTEXT error on the wire, and if (as is
apparent from the strace), svcgssd isn't itself sending a NO_CONTEXT
error down, then you must be correct that it's coming from
gss_write_init_verf().

So gss_svc_searchbyctx() is returning NULL.

I wonder if the kmalloc() in dup_to_netobj() is failing? If it was
caused by a memory leak of some kind, that would explain why it takes a
while for the problem to show up, but not why restarting rpc.svcgssd
would help.

But the only other possibilities are that rsc_lookup() or cache_check()
are failing, and I don't see how that can happen when the previous
context downcall succeeded (as I believe the strace showed it did),
unless there's a bug in the cache code.

--b.


2006-11-10 20:39:47

by Kevin Coffman

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On 11/10/06, J. Bruce Fields <[email protected]> wrote:
> On Fri, Nov 10, 2006 at 03:11:31PM -0500, Kevin Coffman wrote:
> > I didn't find any clues in the strace output either. I'm not sure
> > where to look next.
>
> Well, if we're seeing a NO_CONTEXT error on the wire, and if (as is
> apparent from the strace), svcgssd isn't itself sending a NO_CONTEXT
> error down, then you must be correct that it's coming from
> gss_write_init_verf().
>
> So gss_svc_searchbyctx() is returning NULL.
>
> I wonder if the kmalloc() in dup_to_netobj() is failing? If it was
> caused by a memory leak of some kind, that would explain why it takes a
> while for the problem to show up, but not why restarting rpc.svcgssd
> would help.

That is the part that is most confusing to me. Dumb question time: Is
kernel memory allocated by the kernel while running in the svcgssd
task freed up when that task is restarted?

K.C.


2006-11-10 20:49:22

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Fri, Nov 10, 2006 at 03:39:44PM -0500, Kevin Coffman wrote:
> On 11/10/06, J. Bruce Fields <[email protected]> wrote:
> >Well, if we're seeing a NO_CONTEXT error on the wire, and if (as is
> >apparent from the strace), svcgssd isn't itself sending a NO_CONTEXT
> >error down, then you must be correct that it's coming from
> >gss_write_init_verf().
> >
> >So gss_svc_searchbyctx() is returning NULL.
> >
> >I wonder if the kmalloc() in dup_to_netobj() is failing? If it was
> >caused by a memory leak of some kind, that would explain why it takes a
> >while for the problem to show up, but not why restarting rpc.svcgssd
> >would help.
>
> That is the part that is most confusing to me. Dumb question time: Is
> kernel memory allocated by the kernel while running in the svcgssd
> task freed up when that task is restarted?

No, kmalloc() allocations belong to the kernel, not to any particular
process.

So it's a little hard to tell how the server auth_gss kernel code can
even be affected by svcgssd restarting. Just one obvious thing: svcgssd
holds /proc/net/rpc/auth.rpcsec.init/channel open, so when it's
restarted, there will be a close and then an open of that file.

I think the code in question is net/sunrpc/cache.c:cache_open() and
net/sunrpc/cache.c:cache_release(). Maybe there's a clue there.
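
One way to watch that close and reopen from userspace is to note which
processes hold a channel file open before and after the restart. A minimal
sketch, assuming a Linux /proc layout; the function name channel_holders and
its optional proc_root argument are made up for illustration and are not part
of nfs-utils:

```shell
# List "pid fd target" for every file descriptor that points at an RPC
# cache channel file. proc_root defaults to /proc; it is a parameter only
# so the function can be tried against a scratch directory.
channel_holders() {
    proc_root=${1:-/proc}
    for link in "$proc_root"/[0-9]*/fd/*; do
        # Skip the unexpanded glob and anything that is not a symlink.
        [ -L "$link" ] || continue
        target=$(readlink "$link") || continue
        case $target in
            */rpc/*/channel)
                # Strip the proc_root prefix, then keep the first component.
                pid=${link#"$proc_root"/}
                pid=${pid%%/*}
                printf '%s %s %s\n' "$pid" "${link##*/}" "$target"
                ;;
        esac
    done
}
```

Running channel_holders as root before and after restarting rpc.svcgssd
should show the init channel moving to a new pid, which is the close followed
by an open described above.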

--b.


2006-11-10 21:37:33

by Kevin Coffman

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On 11/10/06, J. Bruce Fields <[email protected]> wrote:
> On Fri, Nov 10, 2006 at 03:39:44PM -0500, Kevin Coffman wrote:
> > On 11/10/06, J. Bruce Fields <[email protected]> wrote:
> > >Well, if we're seeing a NO_CONTEXT error on the wire, and if (as is
> > >apparent from the strace), svcgssd isn't itself sending a NO_CONTEXT
> > >error down, then you must be correct that it's coming from
> > >gss_write_init_verf().
> > >
> > >So gss_svc_searchbyctx() is returning NULL.
> > >
> > >I wonder if the kmalloc() in dup_to_netobj() is failing? If it was
> > >caused by a memory leak of some kind, that would explain why it takes a
> > >while for the problem to show up, but not why restarting rpc.svcgssd
> > >would help.
> >
> > That is the part that is most confusing to me. Dumb question time: Is
> > kernel memory allocated by the kernel while running in the svcgssd
> > task freed up when that task is restarted?
>
> No, kmalloc() allocations belong to the kernel, not to any particular
> process.
>
> So it's a little hard to tell how the server auth_gss kernel code can
> even be affected by svcgssd restarting. Just one obvious thing: svcgssd
> holds /proc/net/rpc/auth.rpcsec.init/channel open, so when it's
> restarted, there will be a close and then an open of that file.

svcgssd actually fopens and fcloses the context channel and opens and
closes the init channel for each downcall. So I don't see how that
would affect it either?


2006-11-10 21:41:47

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Fri, Nov 10, 2006 at 04:37:32PM -0500, Kevin Coffman wrote:
> svcgssd actually fopens and fcloses the context channel and opens and
> closes the init channel for each downcall. So I don't see how that
> would affect it either?

That's true for the context channel, but for the init channel, it
actually does have a file descriptor open all the time. (It may open
and close a second one when it makes a downcall, I don't remember.)

--b.


2006-11-10 21:43:40

by Kevin Coffman

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On 11/10/06, J. Bruce Fields <[email protected]> wrote:
> On Fri, Nov 10, 2006 at 04:37:32PM -0500, Kevin Coffman wrote:
> > svcgssd actually fopens and fcloses the context channel and opens and
> > closes the init channel for each downcall. So I don't see how that
> > would affect it either?
>
> That's true for the context channel, but for the init channel, it
> actually does have a file descriptor open all the time. (It may open
> and close a second one when it makes a downcall, I don't remember.)

Oh yeah... it's gotta read something!


2006-11-06 23:08:56

by Kevin Coffman

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On 11/3/06, Steinar H. Gunderson <[email protected]> wrote:
> On Tue, Oct 31, 2006 at 03:32:48PM +0100, Steinar H. Gunderson wrote:
> > Data is no longer available on those existing mounts. In fact, if I run ls
> > -ld on the mount point, it is suddenly owned by the user and group "?".
>
> I figured out later that this isn't always true -- for instance, in one case
> root could ls a directory, but others couldn't (and new mounts wouldn't
> work).
>
> >> Could you send a packet trace from the server after mounts stop
> >> working? (tcpdump -s 0 -w /tmp/x) Also, if it is enabled, could you
> >> get the output from sysRq-T?
> > I'll be sure to grab the data next time the problems occur.
>
> Here we go. The dump is a complete tcpdump from the server (129.241.93.19) at
> the same time as a client (129.241.93.50) attempts to do a "mount -t nfs4 -o
> sec=krb5i cassarossa:/itk /mnt". The call traces are the result of a "cat
> /proc/kmsg" while running "echo t > /proc/sysrq-trigger".

Hello Steinar,
The userland messages from svcgssd all indicate it is happy and is
sending down a context to the kernel. The packet trace shows that the
kernel is replying to the NULL call with GSS_S_NO_CONTEXT. This can
be returned by the kernel if it is unable to find the context that was
(supposedly) sent down from svcgssd.

It is a mystery to me why this is happening. Especially mysterious is
that restarting svcgssd clears it up. Can you check to see if there
are memory or other resource problems with svcgssd when it stops
working?
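
A rough way to take those readings, as a sketch only: it assumes a Linux
/proc layout, and the helper names proc_mem_summary and open_fd_count are
hypothetical. It looks at the svcgssd process itself and says nothing about
kernel memory.

```shell
# Print the virtual and resident size lines from a /proc/<pid>/status file.
proc_mem_summary() {
    status_file=$1
    grep -E '^(VmSize|VmRSS):' "$status_file"
}

# Count the entries in a /proc/<pid>/fd directory (one per open descriptor).
open_fd_count() {
    fd_dir=$1
    count=0
    for entry in "$fd_dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        count=$((count + 1))
    done
    printf '%s\n' "$count"
}
```

For example, with pid set to the process ID of rpc.svcgssd, run
proc_mem_summary /proc/$pid/status and open_fd_count /proc/$pid/fd once while
mounts work and again once they fail, then compare the two readings.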

K.C.


2006-11-06 23:15:43

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Mon, Nov 06, 2006 at 06:08:49PM -0500, Kevin Coffman wrote:
> Hello Steinar,
> The userland messages from svcgssd all indicate it is happy and is
> sending down a context to the kernel. The packet trace shows that the
> kernel is replying to the NULL call with GSS_S_NO_CONTEXT. This can
> be returned by the kernel if it is unable to find the context that was
> (supposedly) sent down from svcgssd.
>
> It is a mystery to me why this is happening. Especially mysterious is
> that restarting svcgssd clears it up. Can you check to see if there
> are memory or other resource problems with svcgssd when it stops
> working?

I wonder if it'd also be interesting to see the kernel/svcgssd
interactions when it's failing. If there's strace options that'll give
you all the I/O data.....

From the strace man page, it looks like

strace -oTMP -e read=a,b,c -e write=a,b,c -p pid-of-svcgssd

should do it, where a, b, c... are the file descriptors of any of the
files named "channel" under /proc/net/rpc. You can find those file
descriptors with

ls -l /proc/pid-of-svcgssd/fd/ |grep channel

Fun.
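
The two steps above can be glued together so the descriptor list does not
have to be copied by hand. This is a sketch under assumptions: the function
names channel_fds and strace_channel_args are invented here, and the match on
"*/rpc/*/channel" assumes the fd symlinks resolve to a path of that shape.

```shell
# Print a comma-separated list of the descriptors in fd_dir (normally
# /proc/<pid>/fd) whose symlink target is an RPC cache "channel" file.
channel_fds() {
    fd_dir=$1
    fds=
    for link in "$fd_dir"/*; do
        # Skip the unexpanded glob and anything that is not a symlink.
        [ -L "$link" ] || continue
        target=$(readlink "$link") || continue
        case $target in
            */rpc/*/channel)
                # Append the descriptor number, comma-separated.
                fds="${fds:+$fds,}${link##*/}"
                ;;
        esac
    done
    printf '%s\n' "$fds"
}

# Print the -e read=... -e write=... arguments for strace, or fail if the
# process holds no channel file open.
strace_channel_args() {
    fds=$(channel_fds "$1")
    [ -n "$fds" ] || return 1
    printf '%s\n' "-e read=$fds -e write=$fds"
}
```

With pid set to the process ID of rpc.svcgssd, the invocation would then be
something like: strace -oTMP $(strace_channel_args /proc/$pid/fd) -p $pid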

--b.


2006-11-06 23:18:24

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Tue, Nov 07, 2006 at 12:16:15AM +0100, Steinar H. Gunderson wrote:
> On Mon, Nov 06, 2006 at 06:15:39PM -0500, J. Bruce Fields wrote:
> > I wonder if it'd also be interesting to see the kernel/svcgssd
> > interactions when it's failing. If there's strace options that'll give
> > you all the I/O data.....
>
> The strace dump I sent should be comprehensive; I don't see what data you
> would be missing.

Whoops, I wasn't paying attention. OK, thanks, I'll take a look.

--b.


2006-11-06 23:18:14

by Steinar H. Gunderson

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Mon, Nov 06, 2006 at 06:15:39PM -0500, J. Bruce Fields wrote:
> I wonder if it'd also be interesting to see the kernel/svcgssd
> interactions when it's failing. If there's strace options that'll give
> you all the I/O data.....

The strace dump I sent should be comprehensive; I don't see what data you
would be missing.

/* Steinar */
--
Homepage: http://www.sesse.net/


2006-11-06 23:22:26

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Mon, Nov 06, 2006 at 06:18:21PM -0500, J. Bruce Fields wrote:
> On Tue, Nov 07, 2006 at 12:16:15AM +0100, Steinar H. Gunderson wrote:
> > On Mon, Nov 06, 2006 at 06:15:39PM -0500, J. Bruce Fields wrote:
> > > I wonder if it'd also be interesting to see the kernel/svcgssd
> > > interactions when it's failing. If there's strace options that'll give
> > > you all the I/O data.....
> >
> > The strace dump I sent should be comprehensive; I don't see what data you
> > would be missing.
>
> Whoops, I wasn't paying attention. OK, thanks, I'll take a look.

I just looked back through my mail and can find sysrq-t and tcpdump
output, but no strace output. Did you send it and I lost it?

--b.


2006-11-06 23:45:28

by Steinar H. Gunderson

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Mon, Nov 06, 2006 at 06:08:49PM -0500, Kevin Coffman wrote:
> It is a mystery to me why this is happening. Especially mysterious is
> that restarting svcgssd clears it up. Can you check to see if there
> are memory or other resource problems with svcgssd when it stops
> working?

The machine has 2GB of RAM (plus plenty of free swap), and in general
everything else chugs happily along. I'm not sure what kind of other resource
problems there would be :-)

/* Steinar */
--
Homepage: http://www.sesse.net/


2006-11-06 23:47:12

by Steinar H. Gunderson

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Mon, Nov 06, 2006 at 06:22:24PM -0500, J. Bruce Fields wrote:
> I just looked back through my mail and can find sysrq-t and tcpdump
> output, but no strace output. Did you send it and I lost it?

http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2

/* Steinar */
--
Homepage: http://www.sesse.net/


2006-11-06 23:49:24

by J. Bruce Fields

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Tue, Nov 07, 2006 at 12:46:07AM +0100, Steinar H. Gunderson wrote:
> On Mon, Nov 06, 2006 at 06:22:24PM -0500, J. Bruce Fields wrote:
> > I just looked back through my mail and can find sysrq-t and tcpdump
> > output, but no strace output. Did you send it and I lost it?
>
> http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2

bfields@puzzle:~$ wget http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2
--18:48:38-- http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2
=> `svcgssd-strace.log.bz2'
Resolving home.samfundet.no... 129.241.93.17, 2001:700:300:1800::1917
Connecting to home.samfundet.no|129.241.93.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
18:48:39 ERROR 404: Not Found.

--b.


2006-11-07 00:48:30

by Steinar H. Gunderson

Subject: Re: [BUG] All Kerberos mounts stop working, restarting rpc.svcgssd helps

On Mon, Nov 06, 2006 at 06:49:22PM -0500, J. Bruce Fields wrote:
>> http://home.samfundet.no/~sesse/svcgssd-strace.log.bz2
> 18:48:39 ERROR 404: Not Found.

Hm, that's odd. Oh well, I found a log here now, at least -- try it now.

/* Steinar */
--
Homepage: http://www.sesse.net/
