2003-04-29 09:18:31

by Brasseur Valéry

Subject: linux 2.4.20 oops

I have got this oops in my kernel (2.4.20 + NFS-ALL).
Note: the oops is not logged in syslog (I don't know why!),
but it seems NFS-related. Any ideas?

Warning (compare_maps): mismatch on symbol ixp_mac_cache_timer_active ,
rxp.2.4.20-xfs.mp says f8b3161c, /usr/local/resonate/etc/rxp.2.4.20-xfs.mp
says f8b31404. Ignoring /usr/local/resonate/etc/rxp.2.4.20-xfs.mp entry
Reading Oops report from the terminal
Call Trace: [<c02b8dd4>] [<c02ba007>] [<c02b9f90>] [<c012268f>] [<c011e7cf>]
[<c010a30b>]
[<c0106d7c>] [<c0106dcb>] [<c011a019>] [<c011a29b>] [<c011a1a0>]
Code: 8b 40 2c 47 89 7c 24 10 b9 08 00 00 00 83 f8 09 0f 4c c8 b8
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; c02b8dd4 <rpc_restart_call+1fec/28ec>
Trace; c02ba007 <xprt_destroy+3b3/80c>
Trace; c02b9f90 <xprt_destroy+33c/80c>
Trace; c012268f <del_timer_sync+75f/9c0>
Trace; c011e7cf <do_softirq+6f/cc>
Trace; c010a30b <enable_irq+17f/190>
Trace; c0106d7c <enable_hlt+34/150>
Trace; c0106dcb <enable_hlt+83/150>
Trace; c011a019 <__out_of_line_bug+579/5e8>
Trace; c011a29b <acquire_console_sem+d3/100>
Trace; c011a1a0 <printk+118/140>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 8b 40 2c mov 0x2c(%eax),%eax
Code; 00000003 Before first symbol
3: 47 inc %edi
Code; 00000004 Before first symbol
4: 89 7c 24 10 mov %edi,0x10(%esp,1)
Code; 00000008 Before first symbol
8: b9 08 00 00 00 mov $0x8,%ecx
Code; 0000000d Before first symbol
d: 83 f8 09 cmp $0x9,%eax
Code; 00000010 Before first symbol
10: 0f 4c c8 cmovl %eax,%ecx
Code; 00000013 Before first symbol
13: b8 00 00 00 00 mov $0x0,%eax
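
Each "Trace;" line above has the form `address <symbol+offset/length>`, so the start of the symbol that ksymoops matched is simply address minus offset. The following is a minimal sketch of that arithmetic in user-space C. The struct and function names (`trace_entry`, `parse_trace_line`, `symbol_start`) are made up for illustration and are not part of ksymoops. Note that this only reproduces what the map file said; if the map did not match the running kernel, the symbol names themselves may be wrong.

```c
#include <stdio.h>

/* One decoded "Trace;" line as printed by ksymoops (illustrative type). */
struct trace_entry {
    unsigned long addr;    /* address taken from the Call Trace */
    char symbol[64];       /* nearest preceding symbol in the map */
    unsigned long offset;  /* addr minus the symbol's start address */
    unsigned long length;  /* length ksymoops reports for the symbol */
};

/* Parse a line such as "Trace; c02ba007 <xprt_destroy+3b3/80c>".
 * Returns 1 if all four fields were read, 0 otherwise. */
static int parse_trace_line(const char *line, struct trace_entry *e)
{
    return sscanf(line, "Trace; %lx <%63[^+]+%lx/%lx>",
                  &e->addr, e->symbol, &e->offset, &e->length) == 4;
}

/* Start address of the symbol the entry was resolved against. */
static unsigned long symbol_start(const struct trace_entry *e)
{
    return e->addr - e->offset;
}
```

Applied to the two `xprt_destroy` entries in the trace, both lines resolve to the same start address, which is a quick consistency check on a hand-copied oops.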

thanks in advance
valery


-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-04-29 09:45:22

by Danny Smith

Subject: Re: linux 2.4.20 oops

Brasseur Valéry wrote:

>I have got this oops in my kernel (2.4.20 + NFS-ALL).
>Note: the oops is not logged in syslog (I don't know why!),
>but it seems NFS-related. Any ideas?
>
We had almost identical oopses with the same setup on our dual render boxes,
which would often panic almost immediately afterwards.

We used this patch from Ulrich Weigand ([email protected]), and
haven't seen a problem since. See the archives for full details - basically seems
to be an SMP race in rpc_delete_timer(). If you're not on an SMP system,
it's probably NOT the right fix.

HTH,

Danny

Index: net/sunrpc/sched.c
===================================================================
RCS file: /home/cvs/linux-2.3/net/sunrpc/sched.c,v
retrieving revision 1.13
diff -u -p -r1.13 sched.c
--- net/sunrpc/sched.c 3 May 2001 16:18:18 -0000 1.13
+++ net/sunrpc/sched.c 8 Mar 2003 22:46:11 -0000
@@ -168,10 +168,8 @@ void rpc_add_timer(struct rpc_task *task
 static inline void
 rpc_delete_timer(struct rpc_task *task)
 {
-	if (timer_pending(&task->tk_timer)) {
+	if (del_timer_sync(&task->tk_timer))
 		dprintk("RPC: %4d deleting timer\n", task->tk_pid);
-		del_timer_sync(&task->tk_timer);
-	}
 }
 
 /*
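
The race this patch closes can be illustrated with a toy model. It assumes, as I understand the 2.4 timer code, that a timer stops being "pending" once its handler has started on another CPU, so a `timer_pending()` check can skip `del_timer_sync()` while the handler is still running. Everything below is a user-space sketch with invented names (`toy_timer`, `toy_del_timer_sync`, and so on), not kernel code.

```c
#include <stdbool.h>

/* Toy view of a timer from one CPU while another CPU may be running
 * its handler. Field names are illustrative only. */
struct toy_timer {
    bool pending;          /* still queued, handler not started */
    bool handler_running;  /* handler executing on another CPU */
};

/* Stand-in for del_timer_sync(): dequeue, then wait for a running
 * handler. Returns true if the timer was still pending. */
static bool toy_del_timer_sync(struct toy_timer *t)
{
    bool was_pending = t->pending;
    t->pending = false;
    t->handler_running = false;  /* models "wait until handler returns" */
    return was_pending;
}

/* Pre-patch logic: synchronise only if the timer looks pending. */
static void delete_timer_old(struct toy_timer *t)
{
    if (t->pending)
        toy_del_timer_sync(t);
}

/* Patched logic: always synchronise. */
static void delete_timer_new(struct toy_timer *t)
{
    toy_del_timer_sync(t);
}

/* True when the caller may safely free the task afterwards. */
static bool safe_to_free(const struct toy_timer *t)
{
    return !t->pending && !t->handler_running;
}
```

With a timer in the state "not pending, handler running", `delete_timer_old()` returns immediately and leaves the handler running, while `delete_timer_new()` waits for it. On a uniprocessor the handler cannot run concurrently with the caller in this way, which fits the remark that the patch is an SMP fix.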




--
Danny Smith
Senior Systems Administrator, Cinesite (Europe) Ltd
020 7973 4000 - x4055 / [email protected]





2003-04-29 10:03:03

by Brasseur Valéry

Subject: RE: linux 2.4.20 oops

Thanks for the information! We are on SMP boxes, so I will try it ASAP.




2003-04-29 16:31:47

by Philippe Troin

Subject: Re: linux 2.4.20 oops

Danny Smith <[email protected]> writes:

> Brasseur Valéry wrote:
>
> >I have got this oops in my kernel (2.4.20 + NFS-ALL).
> >Note: the oops is not logged in syslog (I don't know why!),
> >but it seems NFS-related. Any ideas?
>
> We had almost identical oopses with the same setup on our dual
> render boxes, which would often panic almost immediately afterwards.
>
> We used this patch from Ulrich Weigand
> ([email protected]), and haven't seen a problem
> since. See the archives for full details - basically seems to be an
> SMP race in rpc_delete_timer(). If you're not on an SMP system, it's
> probably NOT the right fix.

Has this been pushed to 2.4.21?

Phil.



2003-04-29 16:45:01

by Danny Smith

Subject: Re: linux 2.4.20 oops


Philippe Troin wrote:

>Danny Smith <[email protected]> writes:
>
>
>>We used this patch from Ulrich Weigand
>>([email protected]), and haven't seen a problem
>>since. See the archives for full details - basically seems to be an
>>SMP race in rpc_delete_timer(). If you're not on an SMP system, it's
>>probably NOT the right fix.
>>
>>
>
>Has this been pushed to 2.4.21?
>
>
Not AFAIK.
From the comments in the original post I saw, it looks like Trond did
produce a similar patch, but I couldn't see anything when I was looking
for a resolution to our problems. Perhaps it's worth waiting for more
confirmation that this
a) is harmless in all cases, and
b) fixes the problems being seen
before trying to get this pushed up. I freely confess that I don't know
enough of this code at this level to feel confident beyond "it works
well for us".

Danny

--
Danny Smith
Senior Systems Administrator, Cinesite (Europe) Ltd
020 7973 4000 - x4055 / [email protected]





2003-04-29 16:54:01

by Trond Myklebust

Subject: Re: linux 2.4.20 oops

>>>>> " " == Philippe Troin <[email protected]> writes:

> Has this been pushed to 2.4.21?

Yes.

Cheers,
Trond



2003-04-29 16:56:43

by Lever, Charles

Subject: RE: linux 2.4.20 oops

the fix appears in 2.4.21-pre7, but was removed from
2.4.21-rc1. trond?




2003-04-29 17:04:56

by Trond Myklebust

Subject: RE: linux 2.4.20 oops

>>>>> " " == Charles Lever <Lever> writes:

> the fix appears in 2.4.21-pre7, but was removed from
> 2.4.21-rc1. trond?

Huh? From the latest BK pull linux/net/sunrpc/sched.c

/*
 * Delete any timer for the current task. Because we use del_timer_sync(),
 * this function should never be called while holding rpc_queue_lock.
 */
static inline void
rpc_delete_timer(struct rpc_task *task)
{
	dprintk("RPC: %4d deleting timer\n", task->tk_pid);
	del_timer_sync(&task->tk_timer);
}

So AFAICS it is still there. I certainly have no quarrel with that
patch. I hit upon the same problem + fix in a different bug-report at
~ the same time.
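
The locking rule in the comment above can be sketched as a small model. It assumes (my reading, not something stated in this thread) that the RPC timer handler may itself need rpc_queue_lock; if so, a caller that holds the lock and then waits for the handler can never make progress. The names below (`toy_state`, `sync_delete_terminates`) are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Toy state for the deadlock condition; not kernel API. */
struct toy_state {
    bool queue_lock_held;    /* caller holds the stand-in for rpc_queue_lock */
    bool handler_running;    /* timer handler active on another CPU */
    bool handler_needs_lock; /* assumption: handler takes the same lock */
};

/* The handler can finish unless it needs a lock the waiter holds. */
static bool handler_can_finish(const struct toy_state *s)
{
    return !(s->handler_needs_lock && s->queue_lock_held);
}

/* True if a del_timer_sync()-style wait would eventually return. */
static bool sync_delete_terminates(const struct toy_state *s)
{
    if (!s->handler_running)
        return true;            /* nothing to wait for */
    return handler_can_finish(s);
}
```

In this model the wait only fails to terminate when all three conditions hold at once, which is why the rule is phrased as "never call this while holding the lock" rather than as a restriction on the handler.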

Cheers,
Trond



2003-04-29 17:22:23

by Trond Myklebust

Subject: RE: linux 2.4.20 oops

>>>>> " " == Charles Lever <Lever> writes:

> i built 2.4.21-rc1 from 2.4.20.tar.bz2 and the rc1 upgrade
> patch, and it's not in my version of rc1.

Hmmm... strange. AFAICS there are no NFS or RPC changes whatsoever in
the rc1 patch on ftp.kernel.org.
As I said, all the changes (including Ulrich Weigand's patch) appear
to still be in the bitkeeper repository (see for instance the kernel
source browser on http://linux.bkbits.net:8080/linux-2.4).

Marcelo, is this perhaps a problem with an incorrect generation of the
'official' 2.4.21-rc1 patch?

Cheers,
Trond



2003-04-29 17:25:05

by Lever, Charles

Subject: RE: linux 2.4.20 oops

i will verify that i have created 2.4.21-rc1 correctly.




2003-04-29 17:31:40

by Lever, Charles

Subject: RE: linux 2.4.20 oops

ok, it looks like i was stupid and didn't apply the rc1 patch
to my rc1 tree. my copy of the rc1 pre-patch has the right
stuff in it.

sorry to be a bug.


