2004-07-27 14:14:38

by Olaf Kirch

Subject: maybe a svcsock.c/sv_lock deadlock?

Hi,

I have a somewhat unclear bug report against the 2.6 kernel, where
the sysrq output seems to indicate there's a deadlock on sv_lock
somewhere. There is a process stuck in tcp_data_ready, but
unfortunately the sysrq output is not complete, so I don't know what
the other CPUs were doing at the time.

Looking at the code, I did notice however that some new code was
added recently (svc_defer, svc_revisit) that uses spin_lock instead of
spin_lock_bh when grabbing the sv_lock.

So it seems there's potential for deadlock if TCP data arrives while
one of these new functions holds sv_lock.
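To make the suspected pattern concrete, here is a minimal userspace model in C. It is not kernel code and is not taken from svcsock.c. It assumes a single CPU, and it assumes that the TCP data-ready callback path needs sv_lock from softirq context. The names model_cpu, run_softirq and process_context are hypothetical. With plain spin_lock() the softirq can interrupt the lock holder on the same CPU and spin forever. With spin_lock_bh() the softirq is held back until the lock is released.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-CPU model, NOT kernel code: it only tracks whether
 * sv_lock is held and whether bottom halves (softirqs) are masked.
 */
struct model_cpu {
    bool lock_held;       /* sv_lock is held by process context */
    bool bh_disabled;     /* softirqs masked, as spin_lock_bh() does */
    bool softirq_pending; /* TCP data has arrived, softirq not yet run */
};

/*
 * Stand-in (assumption) for the TCP data-ready softirq path that needs
 * sv_lock. Returns false when the lock is already held on this CPU: a
 * real spinlock would spin forever here, because the holder cannot run
 * again until the softirq returns.
 */
static bool run_softirq(struct model_cpu *cpu)
{
    assert(!cpu->bh_disabled); /* softirqs never run while BHs are masked */
    cpu->softirq_pending = false;
    if (cpu->lock_held)
        return false;          /* self-deadlock on the same CPU */
    cpu->lock_held = true;     /* take sv_lock ... */
    cpu->lock_held = false;    /* ... and release it */
    return true;
}

/*
 * Process context (think svc_defer/svc_revisit) takes sv_lock, and TCP
 * data arrives in the middle of the critical section.
 * use_bh selects spin_lock_bh() instead of plain spin_lock().
 * Returns true if everything completes, false on self-deadlock.
 */
static bool process_context(bool use_bh)
{
    struct model_cpu cpu = { false, false, false };

    cpu.bh_disabled = use_bh;   /* spin_lock_bh() masks softirqs first */
    cpu.lock_held = true;       /* acquire sv_lock */

    cpu.softirq_pending = true; /* interrupt: TCP data arrives */
    if (!cpu.bh_disabled && !run_softirq(&cpu))
        return false;           /* softirq preempted us and would spin */

    cpu.lock_held = false;      /* spin_unlock() / spin_unlock_bh() */
    cpu.bh_disabled = false;    /* pending softirqs may run now */
    if (cpu.softirq_pending && !run_softirq(&cpu))
        return false;
    return true;
}
```

In this model, process_context(false) (plain spin_lock) reports the self-deadlock, while process_context(true) (spin_lock_bh) completes, because the softirq only runs after the lock has been dropped.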

Comments?

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+


_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-07-27 17:43:59

by Bernd Schubert

Subject: Re: maybe a svcsock.c/sv_lock deadlock?

Hello Olaf,

> I have a somewhat unclear bug report against the 2.6 kernel, where
> the sysrq output seems to indicate there's a deadlock on sv_lock
> somewhere. There is a process stuck in tcp_data_ready, but
> unfortunately the sysrq output is not complete, so I don't know what
> the other CPUs were doing at the time.
>
> Looking at the code, I did notice however that some new code was
> added recently (svc_defer, svc_revisit) that uses spin_lock instead of
> spin_lock_bh when grabbing the sv_lock.
>
> So it seems there's potential for deadlock if TCP data arrives while
> one of these new functions holds sv_lock.
>
> Comments?

When we tried to use 2.6.7, it happened twice that the machine was still running
but the clients simply could not reach the NFS server.
We didn't have much time to look into it and simply rebooted the server;
however, before doing that we issued sysrq+t.
I've attached the full trace; maybe it's related to your trace?

Cheers,
Bernd


PS: Unfortunately, I can't read those traces, oopses, etc. Is there any
documentation on how to read them?


Attachments:
(No filename) (1.05 kB)
trace.log.gz (5.36 kB)