2005-11-21 19:51:25

by Vince Busam

Subject: [PATCH] Fix typo on __rpc_purge_upcall

I posted this last week. Here's an official style patch.

Vince
------------------------------------------------------------------

Fix an obvious typo that would cause a NULL pointer dereference.

Signed-off-by: Vince Busam <[email protected]>
---

--- linux-2.6.13.4/net/sunrpc/rpc_pipe.c.orig 2005-11-16 16:48:00.000000000 -0800
+++ linux-2.6.13.4/net/sunrpc/rpc_pipe.c 2005-11-16 16:52:23.000000000 -0800
@@ -51,7 +51,7 @@ __rpc_purge_upcall(struct inode *inode,
rpci->ops->destroy_msg(msg);
}
while (!list_empty(&rpci->in_upcall)) {
- msg = list_entry(rpci->pipe.next, struct rpc_pipe_msg, list);
+ msg = list_entry(rpci->in_upcall.next, struct rpc_pipe_msg, list);
list_del_init(&msg->list);
msg->errno = err;
rpci->ops->destroy_msg(msg);
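
To illustrate why the one-word change matters, here is a minimal userspace sketch. It is not the kernel code: `mock_msg`, `mock_rpci`, `pick_buggy`, `pick_fixed` and `demo_wrong_list` are hypothetical names, and the list helpers are simplified stand-ins for the ones in the kernel's list.h. The loop tests one list (`in_upcall`) but takes its entry from another (`pipe`). If `pipe` is empty at that point, `pipe.next` points back at the list head, so `list_entry()` hands back a pointer that is not a queued message at all.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's list primitives (simplified). */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Mock types (assumptions): only the fields needed for the illustration. */
struct mock_msg { struct list_head list; int err; };
struct mock_rpci { struct list_head pipe; struct list_head in_upcall; };

/* Buggy pick: the caller tested in_upcall, but the entry is read from pipe. */
static struct mock_msg *pick_buggy(struct mock_rpci *r)
{
	return list_entry(r->pipe.next, struct mock_msg, list);
}

/* Fixed pick: the entry is read from the same list that was tested. */
static struct mock_msg *pick_fixed(struct mock_rpci *r)
{
	return list_entry(r->in_upcall.next, struct mock_msg, list);
}

/* Returns 1 if the buggy pick misses the queued message and the fixed pick finds it. */
static int demo_wrong_list(void)
{
	struct mock_rpci r;
	struct mock_msg m = { .err = 0 };

	init_list_head(&r.pipe);
	init_list_head(&r.in_upcall);
	list_add_tail(&m.list, &r.in_upcall);

	/* pipe is empty here, so pipe.next is the list head itself. */
	assert(list_empty(&r.pipe) && !list_empty(&r.in_upcall));
	return pick_buggy(&r) != &m && pick_fixed(&r) == &m;
}
```

Calling `demo_wrong_list()` from a small test program returns 1: only the fixed pick returns the message that was actually queued on `in_upcall`.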


-------------------------------------------------------
This SF.Net email is sponsored by the JBoss Inc. Get Certified Today
Register for a JBoss Training Course. Free Certification Exam
for All Training Attendees Through End of 2005. For more info visit:
http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-11-21 19:55:50

by Trond Myklebust

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

On Mon, 2005-11-21 at 11:51 -0800, Vince Busam wrote:
> I posted this last week. Here's an official style patch.

I've already put a fix into the latest NFS_ALL. See

http://client.linux-nfs.org/Linux-2.6.x/2.6.15-rc2/linux-2.6.15-06-rpc_pipe_fix_cleanup.dif

Thanks!
Trond





2005-11-21 21:51:39

by Vince Busam

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

Trond Myklebust wrote:
> On Mon, 2005-11-21 at 11:51 -0800, Vince Busam wrote:
>
> http://client.linux-nfs.org/Linux-2.6.x/2.6.15-rc2/linux-2.6.15-06-rpc_pipe_fix_cleanup.dif
>

That looks good to me. After testing this fix for a week, I haven't gotten an Oops, but
the system still locks up. The only relevant log message is about an upcall timing out.

Nov 20 00:19:00 dig kernel: RPC: AUTH_GSS upcall timed out.
Nov 20 00:19:00 dig kernel: Please check user daemon is running!

Vince



2005-11-21 22:34:20

by Trond Myklebust

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

On Mon, 2005-11-21 at 13:51 -0800, Vince Busam wrote:
> Trond Myklebust wrote:
> > On Mon, 2005-11-21 at 11:51 -0800, Vince Busam wrote:
> >
> > http://client.linux-nfs.org/Linux-2.6.x/2.6.15-rc2/linux-2.6.15-06-rpc_pipe_fix_cleanup.dif
> >
>
> That looks good to me. After testing this fix for a week, I haven't gotten an Oops, but
> the system still locks up. The only relevant log message is about an upcall timing out.
>
> Nov 20 00:19:00 dig kernel: RPC: AUTH_GSS upcall timed out.
> Nov 20 00:19:00 dig kernel: Please check user daemon is running!

What kernel is this? There was a patch from Steve that caused this type
of behaviour in some 2.6.14 CITI_ALL patches. That patch has since been
removed.

Cheers,
Trond




2005-11-21 23:00:01

by Vince Busam

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

Trond Myklebust wrote:
> On Mon, 2005-11-21 at 13:51 -0800, Vince Busam wrote:
>
>>Trond Myklebust wrote:
>>
>>>On Mon, 2005-11-21 at 11:51 -0800, Vince Busam wrote:
>>>
>>>http://client.linux-nfs.org/Linux-2.6.x/2.6.15-rc2/linux-2.6.15-06-rpc_pipe_fix_cleanup.dif
>>>
>>
>>That looks good to me. After testing this fix for a week, I haven't gotten an Oops, but
>>the system still locks up. The only relevant log message is about an upcall timing out.
>>
>>Nov 20 00:19:00 dig kernel: RPC: AUTH_GSS upcall timed out.
>>Nov 20 00:19:00 dig kernel: Please check user daemon is running!
>
>
> What kernel is this? There was a patch from Steve that caused this type
> of behaviour in some 2.6.14 CITI_ALL patches. That patch has since been
> removed.

This is 2.6.13.4, with the __rpc_purge_upcall patch, linux-2.6.13-CITI_NFS4_ALL-1.dif, and
an ugly patch that I don't remember why I'm using.

--- linux-2.6.8/net/sunrpc/auth_gss/auth_gss.c 2004-08-13 22:36:57.000000000 -0700
+++ linux-2.6.8-new/net/sunrpc/auth_gss/auth_gss.c 2004-08-24 14:44:40.887239458 -0700
@@ -515,6 +515,8 @@

clnt = rpci->private;
auth = clnt->cl_auth;
+ if (auth == NULL)
+ return;
gss_auth = container_of(auth, struct gss_auth, rpc_auth);
spin_lock(&gss_auth->lock);
while (!list_empty(&gss_auth->upcalls)) {



2005-11-21 23:07:30

by Trond Myklebust

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

On Mon, 2005-11-21 at 14:59 -0800, Vince Busam wrote:
> Trond Myklebust wrote:
> > On Mon, 2005-11-21 at 13:51 -0800, Vince Busam wrote:
> >
> >>Trond Myklebust wrote:
> >>
> >>>On Mon, 2005-11-21 at 11:51 -0800, Vince Busam wrote:
> >>>
> >>>http://client.linux-nfs.org/Linux-2.6.x/2.6.15-rc2/linux-2.6.15-06-rpc_pipe_fix_cleanup.dif
> >>>
> >>
> >>That looks good to me. After testing this fix for a week, I haven't gotten an Oops, but
> >>the system still locks up. The only relevant log message is about an upcall timing out.
> >>
> >>Nov 20 00:19:00 dig kernel: RPC: AUTH_GSS upcall timed out.
> >>Nov 20 00:19:00 dig kernel: Please check user daemon is running!
> >
> >
> > What kernel is this? There was a patch from Steve that caused this type
> > of behaviour in some 2.6.14 CITI_ALL patches. That patch has since been
> > removed.
>
> This is 2.6.13.4, with the __rpc_purge_upcall patch, linux-2.6.13-CITI_NFS4_ALL-1.dif, and
> an ugly patch that I don't remember why I'm using.
>
> --- linux-2.6.8/net/sunrpc/auth_gss/auth_gss.c 2004-08-13 22:36:57.000000000 -0700
> +++ linux-2.6.8-new/net/sunrpc/auth_gss/auth_gss.c 2004-08-24 14:44:40.887239458 -0700
> @@ -515,6 +515,8 @@
>
> clnt = rpci->private;
> auth = clnt->cl_auth;
> + if (auth == NULL)
> + return;
> gss_auth = container_of(auth, struct gss_auth, rpc_auth);
> spin_lock(&gss_auth->lock);
> while (!list_empty(&gss_auth->upcalls)) {

Could you revert that patch, and just add the one from

http://client.linux-nfs.org/Linux-2.6.x/2.6.14/linux-2.6.14-88-rpcsec_gss_fix.dif

That should bring you up to the rpc_pipefs from 2.6.14.

Cheers
Trond




2005-11-28 18:17:06

by Vince Busam

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

Trond Myklebust wrote:
>
> Could you revert that patch, and just add the one from
>
> http://client.linux-nfs.org/Linux-2.6.x/2.6.14/linux-2.6.14-88-rpcsec_gss_fix.dif
>

I got an Oops I haven't seen before. (2.6.13.4 + linux-2.6.13-CITI_NFS4_ALL-1.dif +
linux-2.6.14-88-rpcsec_gss_fix.dif + linux-2.6.15-06-rpc_pipe_fix_cleanup.dif)

Nov 26 00:05:36 dig kernel: Unable to handle kernel NULL pointer dereference at
virtual address 00000000
Nov 26 00:05:36 dig kernel: printing eip:
Nov 26 00:05:36 dig kernel: f8ad94ad
Nov 26 00:05:36 dig kernel: *pde = 00000000
Nov 26 00:05:36 dig kernel: Oops: 0002 [#1]
Nov 26 00:05:36 dig kernel: PREEMPT SMP
Nov 26 00:05:36 dig kernel: Modules linked in: des binfmt_misc cpufreq_userspace
cpufreq_ondemand cpufreq_powersave autofs4 video button battery container ac nfs lockd
af_packet tg3 snd_intel8x0 snd_ac97_codec ata_piix libata snd_usb_audio
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_usb_lib snd_rawmidi
snd_seq_device snd_hwdep snd soundcore pwc videodev v4l2_common uhci_hcd pci_hotplug
intel_agp floppy pcspkr rtc sd_mod tsdev usbhid usb_storage scsi_mod evdev md_mod dm_mod
nvidia agpgart psmouse mousedev parport_pc lp parport ide_cd cdrom rpcsec_gss_krb5
auth_rpcgss sunrpc ehci_hcd usbcore ext3 jbd mbcache ide_disk ide_generic via82cxxx trm290
triflex slc90e66 sis5513 siimage serverworks sc1200 rz1000 piix pdc202xx_old opti621
ns87415 hpt366 hpt34x generic cy82c693 cs5530 cs5520 cmd64x atiixp amd74xx alim15x3
aec62xx pdc202xx_new ide_core unix thermal processor fan
Nov 26 00:05:36 dig kernel: CPU: 0
Nov 26 00:05:36 dig kernel: EIP: 0060:[<f8ad94ad>] Tainted: P VLI
Nov 26 00:05:36 dig kernel: EFLAGS: 00010287 (2.6.13.4-gg5vb5)
Nov 26 00:05:36 dig kernel: EIP is at rpc_pipe_read+0xad/0x130 [sunrpc]
Nov 26 00:05:36 dig kernel: eax: 00000000 ebx: f5470b08 ecx: f5e1a88c edx: 00000000
Nov 26 00:05:36 dig kernel: esi: f5e1a700 edi: f55e3c80 ebp: 00000000 esp: f5b97f4c
Nov 26 00:05:36 dig kernel: ds: 007b es: 007b ss: 0068
Nov 26 00:05:36 dig kernel: Process rpc.gssd (pid: 7243, threadinfo=f5b96000 task=c22ba540)
Nov 26 00:05:36 dig kernel: Stack: e9a3f00c c0305200 e9a3f008 e9a3f008 00000004
f55e3c80 bff5dab4 00000000
Nov 26 00:05:36 dig kernel: c0165a03 f55e3c80 bff5dab4 00000004 f5b97fa4 f55e3c80 fffffff7
00000004
Nov 26 00:05:36 dig kernel: f5b96000 c0165df1 f55e3c80 bff5dab4 00000004 f5b97fa4 00000000
00000000
Nov 26 00:05:36 dig kernel: Call Trace:
Nov 26 00:05:36 dig kernel: [<c0165a03>] vfs_read+0xf3/0x1b0
Nov 26 00:05:36 dig kernel: [<c0165df1>] sys_read+0x51/0x80
Nov 26 00:05:36 dig kernel: [<c010316b>] sysenter_past_esp+0x54/0x75
Nov 26 00:05:36 dig kernel: Code: 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 8b 96 84 01 00
00 8d 86 84 01 00 00 39 c2 74 d0 89 d3 8b 52 04 8b 03 8d 8e 8c 01 00
00 <89> 02 89 50 04 8b 86 8c 01 00 00 89 58 04 89 03 89 4b 04 8b 86



2005-11-28 18:53:22

by Trond Myklebust

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

On Mon, 2005-11-28 at 10:16 -0800, Vince Busam wrote:
> Trond Myklebust wrote:
> >
> > Could you revert that patch, and just add the one from
> >
> > http://client.linux-nfs.org/Linux-2.6.x/2.6.14/linux-2.6.14-88-rpcsec_gss_fix.dif
> >
>
> I got an Oops I haven't seen before. (2.6.13.4 + linux-2.6.13-CITI_NFS4_ALL-1.dif +
> linux-2.6.14-88-rpcsec_gss_fix.dif + linux-2.6.15-06-rpc_pipe_fix_cleanup.dif)
>
> Nov 26 00:05:36 dig kernel: Unable to handle kernel NULL pointer dereference at
> virtual address 00000000
> Nov 26 00:05:36 dig kernel: printing eip:
> Nov 26 00:05:36 dig kernel: f8ad94ad
> Nov 26 00:05:36 dig kernel: *pde = 00000000
> Nov 26 00:05:36 dig kernel: Oops: 0002 [#1]
> Nov 26 00:05:36 dig kernel: PREEMPT SMP
> Nov 26 00:05:36 dig kernel: Modules linked in: des binfmt_misc cpufreq_userspace
> cpufreq_ondemand cpufreq_powersave autofs4 video button battery container ac nfs lockd
> af_packet tg3 snd_intel8x0 snd_ac97_codec ata_piix libata snd_usb_audio
> snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_usb_lib snd_rawmidi
> snd_seq_device snd_hwdep snd soundcore pwc videodev v4l2_common uhci_hcd pci_hotplug
> intel_agp floppy pcspkr rtc sd_mod tsdev usbhid usb_storage scsi_mod evdev md_mod dm_mod
> nvidia agpgart psmouse mousedev parport_pc lp parport ide_cd cdrom rpcsec_gss_krb5
> auth_rpcgss sunrpc ehci_hcd usbcore ext3 jbd mbcache ide_disk ide_generic via82cxxx trm290
> triflex slc90e66 sis5513 siimage serverworks sc1200 rz1000 piix pdc202xx_old opti621
> ns87415 hpt366 hpt34x generic cy82c693 cs5530 cs5520 cmd64x atiixp amd74xx alim15x3
> aec62xx pdc202xx_new ide_core unix thermal processor fan
> Nov 26 00:05:36 dig kernel: CPU: 0
> Nov 26 00:05:36 dig kernel: EIP: 0060:[<f8ad94ad>] Tainted: P VLI
> Nov 26 00:05:36 dig kernel: EFLAGS: 00010287 (2.6.13.4-gg5vb5)
> Nov 26 00:05:36 dig kernel: EIP is at rpc_pipe_read+0xad/0x130 [sunrpc]
> Nov 26 00:05:36 dig kernel: eax: 00000000 ebx: f5470b08 ecx: f5e1a88c edx: 00000000
> Nov 26 00:05:36 dig kernel: esi: f5e1a700 edi: f55e3c80 ebp: 00000000 esp: f5b97f4c
> Nov 26 00:05:36 dig kernel: ds: 007b es: 007b ss: 0068
> Nov 26 00:05:36 dig kernel: Process rpc.gssd (pid: 7243, threadinfo=f5b96000 task=c22ba540)
> Nov 26 00:05:36 dig kernel: Stack: e9a3f00c c0305200 e9a3f008 e9a3f008 00000004
> f55e3c80 bff5dab4 00000000
> Nov 26 00:05:36 dig kernel: c0165a03 f55e3c80 bff5dab4 00000004 f5b97fa4 f55e3c80 fffffff7
> 00000004
> Nov 26 00:05:36 dig kernel: f5b96000 c0165df1 f55e3c80 bff5dab4 00000004 f5b97fa4 00000000
> 00000000
> Nov 26 00:05:36 dig kernel: Call Trace:
> Nov 26 00:05:36 dig kernel: [<c0165a03>] vfs_read+0xf3/0x1b0
> Nov 26 00:05:36 dig kernel: [<c0165df1>] sys_read+0x51/0x80
> Nov 26 00:05:36 dig kernel: [<c010316b>] sysenter_past_esp+0x54/0x75
> Nov 26 00:05:36 dig kernel: Code: 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 8b 96 84 01 00
> 00 8d 86 84 01 00 00 39 c2 74 d0 89 d3 8b 52 04 8b 03 8d 8e 8c 01 00
> 00 <89> 02 89 50 04 8b 86 8c 01 00 00 89 58 04 89 03 89 4b 04 8b 86

Argh... Yep. Looks like the "fix" to ensure that we purge
rpci->in_upcall was wrong. Does the following patch fix it?

Cheers,
Trond
-------

SUNRPC: Remove redundant list rpci->in_upcall.

The elements on rpci->in_upcall are tracked by the filp->private_data,
which will ensure that they get released when the file is closed.

Note that early purging of the elements on that list was responsible for a
potential Oops...

Signed-off-by: Trond Myklebust <[email protected]>
---

include/linux/sunrpc/rpc_pipe_fs.h | 1 -
net/sunrpc/rpc_pipe.c | 5 +----
2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/rpc_pipe_fs.h b/include/linux/sunrpc/rpc_pipe_fs.h
index 6392934..ee353f2 100644
--- a/include/linux/sunrpc/rpc_pipe_fs.h
+++ b/include/linux/sunrpc/rpc_pipe_fs.h
@@ -22,7 +22,6 @@ struct rpc_inode {
struct inode vfs_inode;
void *private;
struct list_head pipe;
- struct list_head in_upcall;
int pipelen;
int nreaders;
int nwriters;
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index e3b242d..eb240b6 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -38,7 +38,7 @@ static kmem_cache_t *rpc_inode_cachep __

#define RPC_UPCALL_TIMEOUT (30*HZ)

-static void
+static inline void
__rpc_purge_list(struct rpc_inode *rpci, struct list_head *head, int err)
{
struct rpc_pipe_msg *msg;
@@ -59,7 +59,6 @@ __rpc_purge_upcall(struct inode *inode,
struct rpc_inode *rpci = RPC_I(inode);

__rpc_purge_list(rpci, &rpci->pipe, err);
- __rpc_purge_list(rpci, &rpci->in_upcall, err);
rpci->pipelen = 0;
wake_up(&rpci->waitq);
}
@@ -210,7 +209,6 @@ rpc_pipe_read(struct file *filp, char __
msg = list_entry(rpci->pipe.next,
struct rpc_pipe_msg,
list);
- list_move(&msg->list, &rpci->in_upcall);
rpci->pipelen -= msg->len;
filp->private_data = msg;
msg->copied = 0;
@@ -814,7 +812,6 @@ init_once(void * foo, kmem_cache_t * cac
rpci->private = NULL;
rpci->nreaders = 0;
rpci->nwriters = 0;
- INIT_LIST_HEAD(&rpci->in_upcall);
INIT_LIST_HEAD(&rpci->pipe);
rpci->pipelen = 0;
init_waitqueue_head(&rpci->waitq);
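
The hazard described in the commit message above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the kernel implementation: `mock_msg`, `mock_file`, `purge_list`, `release_file` and `destroy_calls` are hypothetical names, and a counter stands in for the real `destroy_msg` callback. A message that has been handed to a reader is reachable both from the list and from the file's `private_data`. If the list is purged early, the message is destroyed once by the purge and then touched and destroyed again when the file is released.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's list primitives (simplified). */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	init_list_head(e);
}

/* Mock types (assumptions): destroy_count records how often the message was "destroyed". */
struct mock_msg { struct list_head list; int destroy_count; };
struct mock_file { struct mock_msg *private_data; };

static void mock_destroy(struct mock_msg *m) { m->destroy_count++; }

/* Destroy every message on the list, as an early purge would. */
static void purge_list(struct list_head *head)
{
	while (!list_empty(head)) {
		/* list is the first member, so the cast recovers the message. */
		struct mock_msg *m = (struct mock_msg *)head->next;

		list_del_init(&m->list);
		mock_destroy(m);
	}
}

/* Closing the file releases whatever message it still references. */
static void release_file(struct mock_file *f)
{
	struct mock_msg *m = f->private_data;

	if (m != NULL) {
		list_del_init(&m->list);
		mock_destroy(m);
		f->private_data = NULL;
	}
}

/* Returns how many times the message is destroyed, with or without an early purge. */
static int destroy_calls(int purge_early)
{
	struct list_head in_upcall;
	struct mock_msg m = { .destroy_count = 0 };
	struct mock_file f = { .private_data = &m };

	init_list_head(&in_upcall);
	list_add_tail(&m.list, &in_upcall);
	if (purge_early)
		purge_list(&in_upcall);
	release_file(&f);
	return m.destroy_count;
}
```

In this model `destroy_calls(1)` returns 2 and `destroy_calls(0)` returns 1. In real code the second destroy would operate on memory that has already been released, which is one way such a purge could end in an Oops.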





2005-12-12 18:57:26

by Vince Busam

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

I applied this patch from 2.6.15-rc5, and got the following oops. I really wish I could
reproduce this faster, but it still only happens over the weekend when my credentials have
expired. Letting them expire during the week doesn't reproduce it.

--- e3b242daf53c64506f9ba77937a94bb544bcefe6
+++ c76ea221798caf96666ef99ac3ce5c1694c832b7
@@ -59,7 +59,6 @@ __rpc_purge_upcall(struct inode *inode,
struct rpc_inode *rpci = RPC_I(inode);

__rpc_purge_list(rpci, &rpci->pipe, err);
- __rpc_purge_list(rpci, &rpci->in_upcall, err);
rpci->pipelen = 0;
wake_up(&rpci->waitq);
}
@@ -119,6 +118,7 @@ rpc_close_pipes(struct inode *inode)
down(&inode->i_sem);
if (rpci->ops != NULL) {
rpci->nreaders = 0;
+ __rpc_purge_list(rpci, &rpci->in_upcall, -EPIPE);
__rpc_purge_upcall(inode, -EPIPE);
rpci->nwriters = 0;
if (rpci->ops->release_pipe)


Dec 11 13:53:28 block kernel: RPC: AUTH_GSS upcall timed out.
Dec 11 13:53:28 block kernel: Please check user daemon is running!
Dec 11 13:53:43 block kernel: RPC: AUTH_GSS upcall timed out.
Dec 11 13:53:43 block kernel: Please check user daemon is running!
Dec 11 13:53:43 block kernel: Unable to handle kernel NULL pointer dereference at virtual
address 00000004
Dec 11 13:53:43 block kernel: printing eip:
Dec 11 13:53:43 block kernel: f8ad1d55
Dec 11 13:53:43 block kernel: *pde = 00000000
Dec 11 13:53:43 block kernel: Oops: 0002 [#1]
Dec 11 13:53:43 block kernel: PREEMPT SMP
Dec 11 13:53:43 block kernel: Modules linked in: ext2 loop des binfmt_misc
cpufreq_userspace cpufreq_ondemand cpufreq_powersave autofs4 video button battery
container ac capability commoncap nfs lockd af_packet tg3 generic piix snd_intel8x0
snd_usb_audio snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_usb_lib
snd_rawmidi snd_seq_device snd_hwdep snd soundcore snd_page_alloc pwc videodev v4l2_common
uhci_hcd pci_hotplug floppy pcspkr rtc tsdev usbhid evdev md_mod dm_mod nvidia agpgart
psmouse mousedev parport_pc lp parport ide_generic ide_disk ide_cd cdrom rpcsec_gss_krb5
auth_rpcgss sunrpc ehci_hcd ext3 jbd mbcache ahci sd_mod ata_piix libata usb_storage
usbcore scsi_mod ide_core unix thermal processor fan
Dec 11 13:53:43 block kernel: CPU: 1
Dec 11 13:53:43 block kernel: EIP: 0060:[<f8ad1d55>] Tainted: P VLI
Dec 11 13:53:43 block kernel: EFLAGS: 00010287 (2.6.13.4-gg5vb8)
Dec 11 13:53:43 block kernel: EIP is at __rpc_purge_list+0x35/0x60 [sunrpc]
Dec 11 13:53:43 block kernel: eax: 00000000 ebx: ebcdc684 ecx: ea628908 edx: 00000000
Dec 11 13:53:43 block kernel: esi: f890ece0 edi: ffffffe0 ebp: ebcdc500 esp: ebac7f1c
Dec 11 13:53:43 block kernel: ds: 007b es: 007b ss: 0068
Dec 11 13:53:43 block kernel: Process rpc.gssd (pid: 7196, threadinfo=ebac6000 task=dfe61540)
Dec 11 13:53:43 block kernel: Stack: ea628900 ebcdc500 ffffffe0 ebcdc500 f8ad1dad ebcdc500
ebcdc684 ffffffe0
Dec 11 13:53:43 block kernel: ebcdc500 ea20ea80 f8ad213b ebcdc500 ffffffe0 00000008
ea20ea80 ebcdaf00
Dec 11 13:53:43 block kernel: c01675fa ebcdc500 ea20ea80 00000000 00000000 ebba9d40
ea20ea80 dfb06080
Dec 11 13:53:43 block kernel: Call Trace:
Dec 11 13:53:43 block kernel: [<f8ad1dad>] __rpc_purge_upcall+0x2d/0x80 [sunrpc]
Dec 11 13:53:43 block kernel: [<f8ad213b>] rpc_pipe_release+0xcb/0xf0 [sunrpc]
Dec 11 13:53:43 block kernel: [<c01675fa>] __fput+0x18a/0x1d0
Dec 11 13:53:43 block kernel: [<c0165906>] filp_close+0x46/0x90
Dec 11 13:53:43 block kernel: [<c01659ba>] sys_close+0x6a/0xa0
Dec 11 13:53:43 block kernel: [<c010316b>] sysenter_past_esp+0x54/0x75
Dec 11 13:53:43 block kernel: Code: 8b 44 24 14 8b 7c 24 1c 8b 0b 8b 80 b4 01 00 00 39 d9
8b 70 0c 74 2c eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 8b 51 04 8b 01 <89> 50 04 89
02 89 49 04 89 09 89 79 14 89 0c 24 ff d6 8b 0b 39



2005-12-12 19:19:30

by Trond Myklebust

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

On Mon, 2005-12-12 at 10:57 -0800, Vince Busam wrote:
> I applied this patch from 2.6.15-rc5, and got the following oops. I really wish I could
> reproduce this faster, but it still only happens over the weekend when my credentials have
> expired. Letting them expire during the week doesn't reproduce it.

Could you send us the contents of rpc_close_pipes() and
rpc_pipe_release()?

I cannot see how rpc_pipe_release can be calling __rpc_purge_upcall with
a null entry for rpci->ops: the inode->i_sem should be protecting it
from changing.

Cheers,
Trond




2005-12-12 20:33:20

by Vince Busam

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

Trond Myklebust wrote:
> On Mon, 2005-12-12 at 10:57 -0800, Vince Busam wrote:
>
>>I applied this patch from 2.6.15-rc5, and got the following oops. I really wish I could
>>reproduce this faster, but it still only happens over the weekend when my credentials have
>>expired. Letting them expire during the week doesn't reproduce it.
>
>
> Could you send us the contents of rpc_close_pipes() and
> rpc_pipe_release()?
>
> I cannot see how rpc_pipe_release can be calling __rpc_purge_upcall with
> a null entry for rpci->ops: the inode->i_sem should be protecting it
> from changing.

static void
rpc_close_pipes(struct inode *inode)
{
struct rpc_inode *rpci = RPC_I(inode);

cancel_delayed_work(&rpci->queue_timeout);
flush_scheduled_work();
down(&inode->i_sem);
if (rpci->ops != NULL) {
rpci->nreaders = 0;
__rpc_purge_list(rpci, &rpci->in_upcall, -EPIPE);
__rpc_purge_upcall(inode, -EPIPE);
rpci->nwriters = 0;
if (rpci->ops->release_pipe)
rpci->ops->release_pipe(inode);
rpci->ops = NULL;
}
rpc_inode_setowner(inode, NULL);
up(&inode->i_sem);
}

static int
rpc_pipe_release(struct inode *inode, struct file *filp)
{
struct rpc_inode *rpci = RPC_I(filp->f_dentry->d_inode);
struct rpc_pipe_msg *msg;

down(&inode->i_sem);
if (rpci->ops == NULL)
goto out;
msg = (struct rpc_pipe_msg *)filp->private_data;
if (msg != NULL) {
msg->errno = -EPIPE;
list_del_init(&msg->list);
rpci->ops->destroy_msg(msg);
}
if (filp->f_mode & FMODE_WRITE)
rpci->nwriters --;
if (filp->f_mode & FMODE_READ)
rpci->nreaders --;
if (!rpci->nreaders)
__rpc_purge_upcall(inode, -EPIPE);
if (rpci->ops->release_pipe)
rpci->ops->release_pipe(inode);
out:
up(&inode->i_sem);
return 0;
}



2005-12-12 23:52:12

by Trond Myklebust

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

SUNRPC: Fix a potential race in rpc_pipefs.

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/rpc_pipe.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index c76ea22..511647e 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -70,8 +70,11 @@ rpc_timeout_upcall_queue(void *data)
struct inode *inode = &rpci->vfs_inode;

down(&inode->i_sem);
+ if (rpci->ops == NULL)
+ goto out;
if (rpci->nreaders == 0 && !list_empty(&rpci->pipe))
__rpc_purge_upcall(inode, -ETIMEDOUT);
+out:
up(&inode->i_sem);
}

@@ -113,8 +116,6 @@ rpc_close_pipes(struct inode *inode)
{
struct rpc_inode *rpci = RPC_I(inode);

- cancel_delayed_work(&rpci->queue_timeout);
- flush_scheduled_work();
down(&inode->i_sem);
if (rpci->ops != NULL) {
rpci->nreaders = 0;
@@ -127,6 +128,8 @@ rpc_close_pipes(struct inode *inode)
}
rpc_inode_setowner(inode, NULL);
up(&inode->i_sem);
+ cancel_delayed_work(&rpci->queue_timeout);
+ flush_scheduled_work();
}

static struct inode *
@@ -166,7 +169,7 @@ rpc_pipe_open(struct inode *inode, struc
static int
rpc_pipe_release(struct inode *inode, struct file *filp)
{
- struct rpc_inode *rpci = RPC_I(filp->f_dentry->d_inode);
+ struct rpc_inode *rpci = RPC_I(inode);
struct rpc_pipe_msg *msg;

down(&inode->i_sem);


Attachments:
linux-2.6.15-37-fix_rpc_pipefs_race.dif (1.35 kB)

2005-12-05 21:04:09

by Vince Busam

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

Trond Myklebust wrote:
>
> Argh... Yep. Looks like the "fix" to ensure that we purge
> rpci->in_upcall was wrong. Does the following patch fix it?

I got another oops in __rpc_purge_upcall, which looks like this after applying the
patches. It looks like rpci must have been NULL, but I'll defer to the experts here.

static void
__rpc_purge_upcall(struct inode *inode, int err)
{
struct rpc_inode *rpci = RPC_I(inode);

__rpc_purge_list(rpci, &rpci->pipe, err);
rpci->pipelen = 0;
wake_up(&rpci->waitq);
}

Dec 4 13:09:59 block kernel: RPC: AUTH_GSS upcall timed out.
Dec 4 13:09:59 block kernel: Please check user daemon is running!
Dec 4 13:10:12 block kernel: Unable to handle kernel NULL pointer dereference at virtual
address 00000004
Dec 4 13:10:12 block kernel: printing eip:
Dec 4 13:10:12 block kernel: f8a98d55
Dec 4 13:10:12 block kernel: *pde = 00000000
Dec 4 13:10:12 block kernel: Oops: 0002 [#1]
Dec 4 13:10:12 block kernel: PREEMPT SMP
Dec 4 13:10:12 block kernel: Modules linked in: des tsdev usbhid vmnet vmmon binfmt_misc
cpufreq_userspace cpufreq_ondemand cpufreq_powersave autofs4 video button battery
container ac capability commoncap nfs lockd af_packet tg3 generic piix snd_intel8x0
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
uhci_hcd pci_hotplug floppy pcspkr rtc md_mod evdev dm_mod nvidia agpgart psmouse mousedev
parport_pc lp parport ide_generic ide_disk ide_cd cdrom ide_core rpcsec_gss_krb5
auth_rpcgss sunrpc ehci_hcd usbcore ext3 jbd mbcache ahci sd_mod ata_piix libata scsi_mod
unix thermal processor fan
Dec 4 13:10:12 block kernel: CPU: 1
Dec 4 13:10:12 block kernel: EIP: 0060:[<f8a98d55>] Tainted: P VLI
Dec 4 13:10:12 block kernel: EFLAGS: 00010202 (2.6.13.4-gg5vb7)
Dec 4 13:10:12 block kernel: EIP is at __rpc_purge_upcall+0x35/0x80 [sunrpc]
Dec 4 13:10:12 block kernel: eax: 00000000 ebx: c2bcec84 ecx: d16e1688 edx: 00000000
Dec 4 13:10:12 block kernel: esi: c2bceb00 edi: f88b5ce0 ebp: ffffffe0 esp: eea1bf30
Dec 4 13:10:12 block kernel: ds: 007b es: 007b ss: 0068
Dec 4 13:10:12 block kernel: Process rpc.gssd (pid: 5833, threadinfo=eea1a000 task=ef353020)
Dec 4 13:10:12 block kernel: Stack: d16e1680 c2bceb00 cf453380 c2bceb00 c2bceb00 f8a990cb
c2bceb00 ffffffe0
Dec 4 13:10:12 block kernel: 00000008 cf453380 eea94800 c01675fa c2bceb00 cf453380
00000000 00000000
Dec 4 13:10:12 block kernel: d16a28c0 cf453380 ef02b300 00000000 cf453380 c0165906
cf453380 ef02b300
Dec 4 13:10:12 block kernel: Call Trace:
Dec 4 13:10:12 block kernel: [<f8a990cb>] rpc_pipe_release+0xcb/0xf0 [sunrpc]
Dec 4 13:10:12 block kernel: [<c01675fa>] __fput+0x18a/0x1d0
Dec 4 13:10:12 block kernel: [<c0165906>] filp_close+0x46/0x90
Dec 4 13:10:12 block kernel: [<c01659ba>] sys_close+0x6a/0xa0
Dec 4 13:10:12 block kernel: [<c010316b>] sysenter_past_esp+0x54/0x75
Dec 4 13:10:12 block kernel: Code: 18 8b 6c 24 1c 8b 86 ac 01 00 00 8d 9e 84 01 00 00 8b
78 0c 8b 86 84 01 00 00 39 d8 74 25 89 c1 8d b6 00 00 00 00 8b 51 04 8b 01 <89> 50 04 89
02 89 49 04 89 09 89 69 14 89 0c 24 ff d7 8b 0b 39
Dec 5 10:59:31 block kernel: x55/0xb0

Vince



2006-01-05 22:30:49

by Vince Busam

Subject: Re: [PATCH] Fix typo on __rpc_purge_upcall

Trond Myklebust wrote:
> On Mon, 2005-12-12 at 12:33 -0800, Vince Busam wrote:
>
>>Trond Myklebust wrote:
>>
>>>Could you send us the contents of rpc_close_pipes() and
>>>rpc_pipe_release()?
>>>
>
>
> Hmm.... Looks correct. The only potential races I can see should be
> fixed by the following patch. Can you apply and then try again?
>

I still got an oops after applying that patch (it still takes a long time to
occur; this happened over the break with expired credentials).

Dec 24 01:07:43 block kernel: RPC: AUTH_GSS upcall timed out.
Dec 24 01:07:43 block kernel: Please check user daemon is running!
Dec 24 01:07:45 block kernel: Unable to handle kernel NULL pointer dereference at virtual
address 00000004
Dec 24 01:07:45 block kernel: printing eip:
Dec 24 01:07:45 block kernel: f8ad1d4b
Dec 24 01:07:45 block kernel: *pde = 00000000
Dec 24 01:07:45 block kernel: Oops: 0002 [#1]
Dec 24 01:07:45 block kernel: PREEMPT SMP
Dec 24 01:07:45 block kernel: Modules linked in: des binfmt_misc cpufreq_userspace
cpufreq_ondemand cpufreq_powersave autofs4 video button battery container ac
capability commoncap nfs lockd af_packet tg3 generic piix snd_intel8x0 snd_ac97_codec
snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_usb_lib
snd_rawmidi snd_seq_device snd_hwdep snd soundcore pwc videodev v4l2_common uhci_hcd
pci_hotplug floppy pcspkr rtc tsdev evdev usbhid md_mod dm_mod nvidia agpgart psmouse
mousedev parport_pc lp parport ide_generic ide_disk ide_cd cdrom rpcsec_gss_krb5
auth_rpcgss sunrpc ehci_hcd ext3 jbd mbcache ahci sd_mod
ata_piix libata usb_storage usbcore scsi_mod ide_core unix thermal processor fan
Dec 24 01:07:45 block kernel: CPU: 1
Dec 24 01:07:45 block kernel: EIP: 0060:[<f8ad1d4b>] Tainted: P VLI
Dec 24 01:07:45 block kernel: EFLAGS: 00010286 (2.6.13.4-gg5vb9)
Dec 24 01:07:45 block kernel: EIP is at __rpc_purge_list+0x2b/0xc0 [sunrpc]
Dec 24 01:07:45 block kernel: eax: 00000000 ebx: c877de88 ecx: c877dea0 edx: 00000000
Dec 24 01:07:45 block kernel: esi: ec69e684 edi: f890ece0 ebp: ffffffe0 esp: ebd4ff14
Dec 24 01:07:45 block kernel: ds: 007b es: 007b ss: 0068
Dec 24 01:07:45 block kernel: Process rpc.gssd (pid: 7410, threadinfo=ebd4e000 task=ec48c540)
Dec 24 01:07:45 block kernel: Stack: c877de80 00000002 d646d440 ec69e500 ffffffe0 ec68fa00
ec69e500 f8ad1e15
Dec 24 01:07:45 block kernel: ec69e500 ec69e684 ffffffe0 ec69e500 e584bc80 f8ad21be
ec69e500 ffffffe0
Dec 24 01:07:45 block kernel: 00000008 e584bc80 c01675fa ec69e500 e584bc80 00000000
00000000 ebe667a0
Dec 24 01:07:45 block kernel: Call Trace:
Dec 24 01:07:45 block kernel: [<f8ad1e15>] __rpc_purge_upcall+0x35/0xb0 [sunrpc]
Dec 24 01:07:45 block kernel: [<f8ad21be>] rpc_pipe_release+0xae/0xd0 [sunrpc]
Dec 24 01:07:45 block kernel: [<c01675fa>] __fput+0x18a/0x1d0
Dec 24 01:07:45 block kernel: [<c0165906>] filp_close+0x46/0x90
Dec 24 01:07:45 block kernel: [<c01659ba>] sys_close+0x6a/0xa0
Dec 24 01:07:45 block kernel: [<c010316b>] sysenter_past_esp+0x54/0x75
Dec 24 01:07:45 block kernel: Code: 55 57 56 53 83 ec 0c 8b 5c 24 20 8b 74 24 24 8b 6c 24
28 85 db 74 78 85 f6 74 54 8b 83 b4 01 00 00 8b 78 0c eb 17 8b 53 04 8b 03 <89> 50 04 89
02 89 5b 04 89 1b 89 6b 14 89 1c 24 ff d7 8b 1e 39

After disassembling the code, it appears this is happening in list_del_init(&msg->list)
in __rpc_purge_list(), in the first line of the inlined function __list_del().

Vince
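
As a rough cross-check of the disassembly reading above, the faulting address can be derived from the layout of `struct list_head`. The sketch below is a userspace illustration, not kernel code, and the helper names `first_store_addr` and `second_store_addr` are hypothetical. It assumes `__list_del()` performs the store `next->prev = prev` first and `prev->next = next` second, as in kernels of that era. With a NULL `next`, the first store lands at the offset of the `prev` member, which is one pointer size; on 32-bit x86 that is 4, which agrees with "virtual address 00000004" and the `eax: 00000000` in the register dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/*
 * Address written by "next->prev = prev" for a given value of next.
 * Nothing is dereferenced; only the arithmetic is modelled.
 */
static uintptr_t first_store_addr(uintptr_t next_value)
{
	return next_value + offsetof(struct list_head, prev);
}

/* Address written by "prev->next = next" for a given value of prev. */
static uintptr_t second_store_addr(uintptr_t prev_value)
{
	return prev_value + offsetof(struct list_head, next);
}
```

On a 32-bit build `first_store_addr(0)` is 4 and `second_store_addr(0)` is 0. The second case would be consistent with the earlier Nov 26 oops at virtual address 00000000, although that one was reported in rpc_pipe_read rather than in __rpc_purge_list.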

