2004-10-06 08:23:05

by Benjamin Herrenschmidt

Subject: Re: Netconsole & sungem: hang when link down

On Wed, 2004-10-06 at 16:39, Colin Leroy wrote:
> Hi,
>
> I noticed that, if you have netconsole set up and using a sungem card,
> and if the network cable is not plugged in, that the whole kernel hangs
> shortly after the "device not up yet, forcing it" netconsole
> message. I suspect this is due to the autoneg in sungem, but didn't have
> time to look further.
>
> Would you have any hints on the cause of this problem?

Not sure, I suppose the driver is doing printk's with spinlocks held
from the autoneg stuff and there is a spinlock deadlock happening ...

Ben.
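
For illustration, the failure mode suspected above can be modelled in userspace. This is only a sketch, not sungem or netconsole code: `drv_lock`, `model_xmit()`, `model_printk()` and the two `model_link_timer_*()` functions are hypothetical names, and a pthread mutex stands in for a kernel spinlock. It assumes the console output path re-enters the driver's transmit routine, which needs the same lock the caller already holds. `pthread_mutex_trylock()` is used so the re-entry is reported instead of hanging the way a real spinlock would.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver's private lock (not sungem's real lock). */
static pthread_mutex_t drv_lock = PTHREAD_MUTEX_INITIALIZER;
static int would_deadlock;

/* Stand-in for the driver's transmit path, which needs the driver lock. */
static void model_xmit(const char *msg)
{
    /* A kernel spin_lock() would spin forever here; trylock lets us see it. */
    if (pthread_mutex_trylock(&drv_lock) != 0) {
        would_deadlock = 1;
        return;
    }
    printf("tx: %s\n", msg);
    pthread_mutex_unlock(&drv_lock);
}

/* Stand-in for printk() when a network console routes output to the driver. */
static void model_printk(const char *msg)
{
    model_xmit(msg);
}

/* Logs while the lock is held: the log path re-enters the driver. */
int model_link_timer_locked(void)
{
    int rc;

    would_deadlock = 0;
    rc = pthread_mutex_lock(&drv_lock);
    assert(rc == 0);
    model_printk("link down, restarting autoneg");
    pthread_mutex_unlock(&drv_lock);
    return would_deadlock;
}

/* Drops the lock before logging: the log path can take the lock itself. */
int model_link_timer_unlocked(void)
{
    int rc;

    would_deadlock = 0;
    rc = pthread_mutex_lock(&drv_lock);
    assert(rc == 0);
    /* ... update link state under the lock ... */
    pthread_mutex_unlock(&drv_lock);
    model_printk("link down, restarting autoneg");
    return would_deadlock;
}
```

In this model, `model_link_timer_locked()` reports the re-entry and `model_link_timer_unlocked()` does not. Whether the real driver actually logs with its lock held is exactly what would need to be checked in the source.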



2004-10-06 08:43:16

by Colin Leroy

Subject: Re: Netconsole & sungem: hang when link down

On 06 Oct 2004 at 18h10, Benjamin Herrenschmidt wrote:

Hi,

> On Wed, 2004-10-06 at 16:39, Colin Leroy wrote:
> > Hi,
> >
> > I noticed that, if you have netconsole set up and using a sungem
> > card, and if the network cable is not plugged in, that the whole
> > kernel hangs shortly after the "device not up yet, forcing it"
> > netconsole message. I suspect this is due to the autoneg in sungem,
> > but didn't have time to look further.
> >
> > Would you have any hints on the cause of this problem?
>
> Not sure, I suppose the driver is doing printk's with spinlocks held
> from the autoneg stuff and there is a spinlock deadlock happening ...

Thanks. I'll look into this. If I'm not mistaken, I've got no way of
catching it easily, do I ? CONFIG_DEBUG_SPINLOCK's help seems to say
that I need NMI watchdog in order to catch deadlocks, which is only
available on x86(_64).

--
Colin

2004-10-06 09:15:25

by Benjamin Herrenschmidt

Subject: Re: Netconsole & sungem: hang when link down

On Wed, 2004-10-06 at 18:42, Colin Leroy wrote:

> > Not sure, I suppose the driver is doing printk's with spinlocks held
> > from the autoneg stuff and there is a spinlock deadlock happening ...
>
> Thanks. I'll look into this. If I'm not mistaken, I've got no way of
> catching it easily, do I ? CONFIG_DEBUG_SPINLOCK's help seems to say
> that I need NMI watchdog in order to catch deadlocks, which is only
> available on x86(_64).

Hrm... we have some sort of spinlock debugging, at least on ppc64...

BTW, did you have SMP or PREEMPT ? If none of these, then you should
not see any spin deadlock...

The solution is to look at the code though and find what's wrong :)

Ben.
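
The SMP/PREEMPT remark above can be sketched as follows. This is a simplified userspace model, not the kernel's spinlock implementation: `MODEL_SMP`, `model_spinlock_t` and the `model_*` functions are hypothetical, and the single switch is an assumption standing in for the kernel configuration. The idea shown is that when the lock operations compile down to nothing, taking the same lock twice cannot spin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical switch: 0 models a uniprocessor, non-preemptible build. */
#ifndef MODEL_SMP
#define MODEL_SMP 0
#endif

typedef struct {
    atomic_flag held;
} model_spinlock_t;

#if MODEL_SMP
/* Real spinning: a second acquisition by the same context never returns. */
static void model_spin_lock(model_spinlock_t *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;
}

static void model_spin_unlock(model_spinlock_t *l)
{
    atomic_flag_clear(&l->held);
}
#else
/* Nothing else can run concurrently in this model, so the lock is a no-op. */
static void model_spin_lock(model_spinlock_t *l)   { (void)l; }
static void model_spin_unlock(model_spinlock_t *l) { (void)l; }
#endif

/* Takes the same lock twice and returns how many acquisitions completed. */
int model_nested_lock(model_spinlock_t *l)
{
    int depth = 0;

    assert(l != NULL);
    model_spin_lock(l);
    depth++;
    model_spin_lock(l);   /* would spin forever if MODEL_SMP were 1 */
    depth++;
    model_spin_unlock(l);
    model_spin_unlock(l);
    return depth;
}
```

With `MODEL_SMP` left at 0 the nested acquisition returns normally, which is why a hang on such a configuration points away from a plain spinlock deadlock. Building with `-DMODEL_SMP=1` makes the same call spin indefinitely.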


2004-10-06 09:36:41

by Colin Leroy

Subject: Re: Netconsole & sungem: hang when link down

On 06 Oct 2004 at 19h10, Benjamin Herrenschmidt wrote:

Hi,

> Hrm... we have some sort of spinlock debugging, at least on ppc64...
> BTW, did you have SMP or PREEMPT ? If none of these, then you should
> not see any spin deadlock...

No, in fact. You're right...
Indeed, if there were a deadlock, it would also happen when the cable is
plugged in, wouldn't it? (sungem prints "Link is up at xxx..." or
something similar when correctly initialized.)

> The solution is to look at the code though and find what's wrong :)

I'll try.
If I'm not mistaken, the driver method that ends up being called when
netpoll calls dev_change_flags(ndev, ndev->flags | IFF_UP) is gem_open()?

Could some kind of infinite loop happen within gem_link_timer, maybe ?

--
Colin