2005-11-08 00:04:55

by Vince Busam

Subject: oops in rpc_pipe_release

I'm using NFSv3 with Kerberos authentication, and 25-hour tickets that refresh when
unlocking the screensaver. Over the weekend, the machine hangs with one of the following
stack traces. Any ideas what could cause this?

Thanks,
Vince

Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
f890c6ca
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in: des binfmt_misc cpufreq_userspace cpufreq_ondemand cpufreq_powersave
autofs4 video sony_acpi pcc_acpi dev_acpi i2c_acpi_ec i2c_core button battery container ac
capability commoncap nfs lockd af_packet tg3 piix snd_intel8x0 snd_usb_audio
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_usb_lib snd_rawmidi snd_seq_device
snd_hwdep snd_timer snd soundcore snd_page_alloc pwc videodev v4l2_common uhci_hcd
pci_hotplug floppy pcspkr rtc md dm_mod nvidia agpgart psmouse tsdev evdev mousedev usbhid
parport_pc lp parport ide_generic ide_disk ide_cd cdrom rpcsec_gss_krb5 auth_rpcgss sunrpc
ehci_hcd ext3 jbd mbcache ahci sd_mod ata_piix libata usb_storage usbcore scsi_mod
ide_core unix thermal processor fan
CPU: 1
EIP: 0060:[<f890c6ca>] Tainted: P VLI
EFLAGS: 00010203 (2.6.12-gg3)
EIP is at __gss_unhash_msg+0x1a/0x60 [auth_rpcgss]
eax: 00000000 ebx: ef789200 ecx: ef789220 edx: 00000000
esi: efe1dbe4 edi: ffffffe0 ebp: f0368500 esp: efbb5f00
ds: 007b es: 007b ss: 0068
Process rpc.gssd (pid: 7760, threadinfo=efbb4000 task=efb56a60)
Stack: c02c3726 ef789200 ef789200 f890c734 ef789200 ef789208 ef789200 f890ccca
ef789200 f0368684 f0368500 f0368684 f8acbfae ef789208 f0368500 ef3a6a80
f0368500 f8acc33b f0368500 ffffffe0 00000008 ef3a6a80 f0369380 c01667fa
Call Trace:
[<c02c3726>] _spin_lock+0x16/0x90
[<f890c734>] gss_unhash_msg+0x24/0x40 [auth_rpcgss]
[<f890ccca>] gss_pipe_destroy_msg+0x3a/0xa0 [auth_rpcgss]
[<f8acbfae>] __rpc_purge_upcall+0x3e/0xb0 [sunrpc]
[<f8acc33b>] rpc_pipe_release+0xcb/0xf0 [sunrpc]
[<c01667fa>] __fput+0x18a/0x1d0
[<c0164b72>] filp_close+0x52/0xa0
[<c0164c2a>] sys_close+0x6a/0xa0
[<c010343b>] sysenter_past_esp+0x54/0x75
Code: e8 ac 2d 81 c7 eb bb 8d 76 00 8d bc 27 00 00 00 00 53 83 ec 08 8b 5c 24 10 8b 53 20
8d 4b 20 39 ca 75 05 83 c4 08 5b c3 8b 41 04 <89> 42 04 89 10 89 49 04 8b 43 1c 89 4b 20
89 44 24 04 8d 43 2c
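
One possible reading of the trace above, offered as a sketch rather than a diagnosis: the bytes around the faulting instruction in the Code: line (8b 41 04 <89> 42 04) look like an inlined list unlink storing prev into next->prev, and with edx = 0 that store would land on address 0x4, which matches the reported fault address on a 32-bit kernel. The userspace model below assumes that reading. struct list_node and the helper names are hypothetical stand-ins for the kernel's struct list_head helpers, not the kernel code itself. It shows why a list_empty()-style guard does not protect a node whose links have been zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Point a node at itself, which is what "empty" means for these lists. */
static void list_node_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Same kind of test list_empty() makes: does next point back at us? */
static int list_node_empty(const struct list_node *n)
{
    return n->next == n;
}

/*
 * Address that an unlink would write to first (next->prev = prev),
 * computed without dereferencing anything.
 */
static uintptr_t first_unlink_store(const struct list_node *n)
{
    return (uintptr_t)n->next + offsetof(struct list_node, prev);
}

/*
 * Returns 1 if the emptiness guard would let an unlink proceed even
 * though the node's links are NULL, i.e. the guard is fooled.
 */
static int guard_is_fooled(const struct list_node *n)
{
    return !list_node_empty(n) && n->next == NULL;
}

/* Small self-check showing the two cases side by side. */
void list_node_demo(void)
{
    struct list_node ok;
    struct list_node zeroed = { NULL, NULL };

    list_node_init(&ok);
    assert(list_node_empty(&ok));      /* initialised node: unlink skipped */
    assert(guard_is_fooled(&zeroed));  /* zeroed node: unlink attempted */
}
```

With 4-byte pointers, offsetof(struct list_node, prev) is 4, so for a node with NULL links first_unlink_store() gives the same value as the "virtual address 00000004" in the oops.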


------------[ cut here ]------------
kernel BUG at <bad filename>:53227!
invalid operand: 0000 [#1]
PREEMPT SMP
Modules linked in: des binfmt_misc cpufreq_userspace cpufreq_ondemand cpufreq_powersave
autofs4 video button battery container ac nfs lockd af_packet tg3 generic snd_intel8x0
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
hw_random uhci_hcd pci_hotplug floppy pcspkr rtc evdev md_mod dm_mod nvidia agpgart
psmouse mousedev parport_pc lp parport ide_generic ide_disk ide_cd ide_core
rpcsec_gss_krb5 auth_rpcgss sunrpc ehci_hcd usbcore ext3 jbd mbcache ahci sr_mod cdrom
sd_mod sg ata_piix libata scsi_mod unix thermal processor fan
CPU: 0
EIP: 0060:[<f88d0637>] Tainted: P VLI
EFLAGS: 00010203 (2.6.13.4-gg4)
EIP is at gss_release_msg+0x47/0x50 [auth_rpcgss]
eax: f72a6420 ebx: f72a6400 ecx: f72a6420 edx: 00000000
esi: f6c5ed04 edi: ffffffe0 ebp: f6c5eb80 esp: f71cdf28
ds: 007b es: 007b ss: 0068
Process rpc.gssd (pid: 5183, threadinfo=f71cc000 task=f7af0020)
Stack: f6c5eb80 f6c5eb80 f8a93dae f72a6400 f6c5eb80 d5ce9180 f6c5eb80 f8a9413b
f6c5eb80 ffffffe0 00000008 d5ce9180 f74b7700 c0166caa f6c5eb80 d5ce9180
00000000 00000000 dff47290 d5ce9180 c2b3f180 00000000 d5ce9180 c0164fb6
Call Trace:
[<f8a93dae>] __rpc_purge_upcall+0x3e/0xb0 [sunrpc]
[<f8a9413b>] rpc_pipe_release+0xcb/0xf0 [sunrpc]
[<c0166caa>] __fput+0x18a/0x1d0
[<c0164fb6>] filp_close+0x46/0x90
[<c016506a>] sys_close+0x6a/0xa0
[<c010316b>] sysenter_past_esp+0x54/0x75
Code: 68 85 d2 75 12 89 5c 24 0c 58 5b e9 84 df 87 c7 8d 74 26 00 58 5b c3 f0 ff 0a 0f 94
c0 84 c0 74 e4 89 14 24 e8 fb 0a 00 00 eb da <0f> 0b eb cf 90 8d 74 26 00 56 53 83 ec 08
8b 4c 24 14 8b 74 24
-------------------------------------------------------------------




-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


2005-11-08 00:10:10

by J. Bruce Fields

Subject: Re: oops in rpc_pipe_release

On Mon, Nov 07, 2005 at 04:04:43PM -0800, Vince Busam wrote:
> I'm using NFS3 with kerberos authentication, and 25 hour tickets that
> refresh when unlocking the screensaver. Over the weekend, it'll hang
> with one of the following stack traces. Any ideas what could cause
> this?

Could you retry with Trond's latest NFS_ALL?

http://linux-nfs.org/Linux-2.6.x/2.6.14/linux-2.6.14-NFS_ALL.dif

--b.



2005-11-08 14:23:40

by Steve Dickson

Subject: Re: oops in rpc_pipe_release



Vince Busam wrote:
> I'm using NFS3 with kerberos authentication, and 25 hour tickets that
> refresh when unlocking the screensaver. Over the weekend, it'll hang
> with one of the following stack traces. Any ideas what could cause
> this?
I believe this is caused by gss_pipe_release()
(i.e. rpci->ops->release_pipe(inode)) being called
with a freed clnt->cl_auth pointer. I proposed the
following patch a while back that I thought fixed the
problem, but Trond said the patch "prevents anyone from
reopening the pipe after the first close(), so if gssd
needs to be restarted, then all pipes will forever block."
So the patch got reverted.


--- linux-2.6.13/net/sunrpc/rpc_pipe.c.orig 2005-08-28 19:41:01.000000000 -0400
+++ linux-2.6.13/net/sunrpc/rpc_pipe.c 2005-09-16 11:18:53.598157000 -0400
@@ -177,6 +177,8 @@ rpc_pipe_release(struct inode *inode, st
 		__rpc_purge_upcall(inode, -EPIPE);
 	if (rpci->ops->release_pipe)
 		rpci->ops->release_pipe(inode);
+	if (!rpci->nreaders && !rpci->nwriters)
+		rpci->ops = NULL;
 out:
 	up(&inode->i_sem);
 	return 0;

I think the main problem here is that there is no way of telling
whether an rpc_inode is valid (or active), so there is no way of
knowing whether a release call is needed.

steved.
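
To make the quoted objection concrete, here is a minimal userspace model of the open/release pair with the two added lines from the patch above. It assumes that the open path refuses a pipe whose ops pointer is NULL and returns -ENXIO, and that release returns early when ops is already NULL. Both are assumptions about the 2.6.13 source, not something shown in this thread. struct pipe_model, struct pipe_ops_model and the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rpc_pipe_ops and struct rpc_inode. */
struct pipe_ops_model {
    void (*release_pipe)(void *priv);
};

struct pipe_model {
    const struct pipe_ops_model *ops;
    int nreaders;
    int nwriters;
};

/* Assumed open behavior: a pipe without ops cannot be opened. */
static int pipe_model_open(struct pipe_model *p, int reading, int writing)
{
    if (p->ops == NULL)
        return -ENXIO;
    if (reading)
        p->nreaders++;
    if (writing)
        p->nwriters++;
    return 0;
}

/* Release path including the two lines the patch adds. */
static int pipe_model_release(struct pipe_model *p, int reading, int writing)
{
    if (p->ops == NULL)         /* assumed early exit, like "goto out" */
        return 0;
    if (reading)
        p->nreaders--;
    if (writing)
        p->nwriters--;
    if (p->ops->release_pipe)
        p->ops->release_pipe(p);
    if (!p->nreaders && !p->nwriters)
        p->ops = NULL;          /* the patch: last closer clears ops */
    return 0;
}

/* One daemon opens and closes the pipe, then a restarted one tries again. */
int pipe_model_restart_result(void)
{
    static const struct pipe_ops_model ops = { NULL };
    struct pipe_model p = { &ops, 0, 0 };

    if (pipe_model_open(&p, 1, 1) != 0)
        return 1;
    pipe_model_release(&p, 1, 1);
    return pipe_model_open(&p, 1, 1);   /* second open after last close */
}
```

In this model the second open fails with -ENXIO once the last reader and writer have gone away, which corresponds to the case where gssd is restarted and can no longer reopen its pipes.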



2005-11-08 18:37:48

by Vince Busam

Subject: Re: oops in rpc_pipe_release

I tried that with and without the linux-2.6.13-CITI_NFS4_ALL-1.dif patch, and either way
it ends up causing another NULL pointer dereference in __rpc_purge_upcall after an hour or
two.

Vince

Steve Dickson wrote:
>
>
> Vince Busam wrote:
>
>> I'm using NFS3 with kerberos authentication, and 25 hour tickets that
>> refresh when unlocking the screensaver. Over the weekend, it'll hang
>> with one of the following stack traces. Any ideas what could cause
>> this?
>
> I believe this is caused by the fact gss_pipe_release()
> (i.e. rpci->ops->release_pipe(inode)) is being called
> with a freed clnt->cl_auth pointer. I proposed the
> following patch a while back that I thought fixed the
> problem, but Trond said the patch "prevents anyone from
> reopening the pipe after the first close(), so if gssd
> needs to be restarted, then all pipes will forever block."
> So the patch got reverted....
>
>
> --- linux-2.6.13/net/sunrpc/rpc_pipe.c.orig 2005-08-28 19:41:01.000000000 -0400
> +++ linux-2.6.13/net/sunrpc/rpc_pipe.c 2005-09-16 11:18:53.598157000 -0400
> @@ -177,6 +177,8 @@ rpc_pipe_release(struct inode *inode, st
>  		__rpc_purge_upcall(inode, -EPIPE);
>  	if (rpci->ops->release_pipe)
>  		rpci->ops->release_pipe(inode);
> +	if (!rpci->nreaders && !rpci->nwriters)
> +		rpci->ops = NULL;
>  out:
>  	up(&inode->i_sem);
>  	return 0;
>
> I think the main problem here is there is no way of telling
> if a rpc_inode is or is not valid (or active) so there
> is no way of knowing whether or not a release call is needed...
>
> steved.



2005-11-08 18:58:58

by Steve Dickson

Subject: Re: oops in rpc_pipe_release



Vince Busam wrote:
> I tried that with and without the linux-2.6.13-CITI_NFS4_ALL-1.dif
> patch, and either way it ends up causing another NULL pointer
> dereference in __rpc_purge_upcall after an hour or two.
See if the attached patch helps... It makes gss_pipe_release()
handle the fact that the pointers passed in could be NULL.
This seems to fix the problem I was seeing...

Trond, is this something that's a bit more palatable? :)

steved.


Attachments:
linux-2.6.14-rpc-gss-oops.patch (553.00 B)

2005-11-08 19:58:25

by Trond Myklebust

Subject: Re: oops in rpc_pipe_release

On Tue, 2005-11-08 at 13:58 -0500, Steve Dickson wrote:
>
> Vince Busam wrote:
> > I tried that with and without the linux-2.6.13-CITI_NFS4_ALL-1.dif
> > patch, and either way it ends up causing another NULL pointer
> > dereference in __rpc_purge_upcall after an hour or two.
> See if the attached patch helps... It makes gss_pipe_release()
> handles the fact that given pointer that are passed in
> could be NULL. This seem to fix the problem I was seeing...
>
> Trond, Is this something that's a bit more palatable? :)

I'd rather like to find out how this is happening, and fix the root
cause. Your patch seems like a bit too much of a band-aid.

My point is that we should never want to find ourselves in the situation
that the directory is being cleared without the auth code having first
cleaned up and deleted its pipes.

Cheers,
Trond




2005-11-08 20:51:50

by Steve Dickson

Subject: Re: oops in rpc_pipe_release

Trond Myklebust wrote:
>
> I'd rather like to find out how this is happening, and fix the root
> cause. Your patch seems like a bit too much of a band-aid.
Yeah, I know it's a hack... I just wanted to make sure it addresses
the root cause of Vince's problem. Unfortunately I'm a bit
under the gun w.r.t. deadlines, and an oops is an oops,
so I might have to go with it...

>
> My point is that we should never want to find ourselves in the situation
> that the directory is being cleared without the auth code having first
> cleaned up and deleted its pipes.
Well, here is how you should be able to reproduce it,
as spelled out in bz 171112
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171112):

root# mount -o krb5 server:/export /mnt
user$ cd /mnt
root# /bin/service nets stop
root# /bin/service rpcidmapd stop
root# kill -9 $(pgrep -u <user>)


steved.


