2008-03-23 07:23:29

by Khaled Al-Hamwi

Subject: spinlock BUG in qdisc_restart

Hi list,

I am doing experimental work on the bcm5700 network driver. I am using
3Com NICs. My machine has two NICs. The machine is simply forwarding
the incoming packets from one NIC to the other one.
I want to process all the incoming packets at the interrupt level
rather than in softirqs. I changed the call sequence accordingly in
the bcm5700 driver.

I got the following bug and the system hung (I got this from
/var/log/messages):

BUG: spinlock trylock failure on UP on CPU#0, swapper/0
lock: d11f014c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
[<c01bec44>] _raw_spin_trylock+0x37/0x3b
[<c02d6343>] _spin_trylock+0x5/0xe
[<c0291117>] qdisc_restart+0x3c/0x1b5
[<c0284038>] dev_queue_xmit+0xf2/0x207
[<c029fb25>] ip_output+0x1cc/0x224
[<c029e1a7>] ip_forward+0x383/0x3e2
[<c029cf2f>] ip_rcv+0x38e/0x3ea
[<c02845af>] netif_receive_skb+0x1fc/0x23e
[<e033b560>] MM_IndicateRxPackets+0x2be/0x379 [bcm5700]
[<e0342c26>] LM_ServiceInterrupts+0xac/0xc5 [bcm5700]
[<e0337768>] bcm5700_interrupt+0x13c/0x2bd [bcm5700]
[<c0135ce2>] handle_IRQ_event+0x23/0x4c
[<c0135d85>] __do_IRQ+0x7a/0xcd
[<c0103eea>] do_IRQ+0x5c/0x77
=======================
[<c0102d6a>] common_interrupt+0x1a/0x20
[<c02d007b>] unix_sock_destructor+0x4a/0xb3
[<c02d6402>] _spin_unlock_irqrestore+0xa/0xc


I tried different things to solve this issue, like using tasklets for
transmitting packets instead of transmitting them directly. Every time
I get a different bug or kernel panic :(
I also tried a suggestion from the mailing list to use
spin_trylock_irqsave and spin_unlock_irqrestore, but that did not
solve the problem either.

The problem is that function qdisc_restart is being preempted and
called again on the same CPU. I have only one CPU core and I am using
kernel 2.6.15.
It seems to me that the interrupts initiated by the NIC driver are the
reason for this. But from the driver code, I can see that the
interrupts are disabled before calling MM_IndicateRxPackets. This
function is responsible for processing the packets and calling
netif_receive_skb for further processing (look at the trace). I also
got this error:
Dead loop on netdevice eth0, fix it urgently!
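
The re-entry described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: it
is not kernel code, every model_* name is hypothetical, and the
"interrupt" is an ordinary function call made while the lock is held.
It shows why a non-recursive trylock has to fail when the transmit
path is entered a second time on a single CPU before the first entry
has released the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_lock {
    bool held;                        /* true while some context owns the lock */
};

struct model_dev {
    struct model_lock xmit_lock;      /* stands in for the device transmit lock */
    int sent;                         /* packets handed to the "hardware" */
    int reentry_failures;             /* nested attempts that found the lock held */
    void (*irq)(struct model_dev *);  /* simulated interrupt, may be NULL */
};

/* Non-recursive trylock: fails if the lock is already held by anyone. */
static bool model_trylock(struct model_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void model_unlock(struct model_lock *l)
{
    assert(l->held);                  /* unlocking a free lock is a bug */
    l->held = false;
}

/* Stand-in for the transmit path: take the lock, send one packet, release. */
static bool model_xmit(struct model_dev *dev)
{
    if (!model_trylock(&dev->xmit_lock)) {
        /* With one CPU, the holder can only be the context we interrupted. */
        dev->reentry_failures++;
        return false;
    }

    /* Simulate an interrupt arriving while the lock is held (fires once). */
    void (*irq)(struct model_dev *) = dev->irq;
    dev->irq = NULL;
    if (irq)
        irq(dev);

    dev->sent++;
    model_unlock(&dev->xmit_lock);
    return true;
}

/* Simulated RX interrupt that forwards a packet directly from the handler. */
static void model_rx_irq_forward(struct model_dev *dev)
{
    model_xmit(dev);                  /* nested entry into the transmit path */
}
```

Setting dev.irq to model_rx_irq_forward and calling model_xmit(&dev)
once leaves dev.sent at 1 and dev.reentry_failures at 1: the outer
call succeeds, while the nested call made from the simulated handler
finds the lock already taken. In this model the nested attempt can
only succeed if nothing that transmits is able to run between the
trylock and the unlock.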

Is there any other source for preempting qdisc_restart other than
hardware interrupts from the NIC? Is it possible that having two NICs
in this setup is causing this problem? Any ideas or suggestions are
appreciated.

Please, CC me when replying to this message.

Thanks,
Khaled


2008-03-23 10:10:37

by David Miller

Subject: Re: spinlock BUG in qdisc_restart

From: "Khaled Al-Hamwi" <[email protected]>
Date: Sun, 23 Mar 2008 10:23:20 +0300

> [<c02845af>] netif_receive_skb+0x1fc/0x23e
> [<e033b560>] MM_IndicateRxPackets+0x2be/0x379 [bcm5700]
> [<e0342c26>] LM_ServiceInterrupts+0xac/0xc5 [bcm5700]
> [<e0337768>] bcm5700_interrupt+0x13c/0x2bd [bcm5700]

This backtrace is not from a driver that is in the upstream kernel
sources.

Please use the supported and upstream "tg3" driver for this hardware.

Thank you.

2008-03-24 11:34:39

by Khaled Al-Hamwi

Subject: Re: spinlock BUG in qdisc_restart

On Sun, Mar 23, 2008 at 1:10 PM, David Miller <[email protected]> wrote:
> From: "Khaled Al-Hamwi" <[email protected]>
> Date: Sun, 23 Mar 2008 10:23:20 +0300
>
> > [<c02845af>] netif_receive_skb+0x1fc/0x23e
> > [<e033b560>] MM_IndicateRxPackets+0x2be/0x379 [bcm5700]
> > [<e0342c26>] LM_ServiceInterrupts+0xac/0xc5 [bcm5700]
> > [<e0337768>] bcm5700_interrupt+0x13c/0x2bd [bcm5700]
>
> This backtrace is not from a driver that is in the upstream kernel
> sources.
>
> Please use the supported and upstream "tg3" driver for this hardware.
>
> Thank you.
>

Thank you for your reply.

I need to use the bcm5700 driver. Otherwise, I would have to
reimplement the changes that have already been made.

By the way, I found the reason for this bug and fixed it. It was
caused by the scheduling of NET_TX_SOFTIRQ. The conflict happens when
the tx softirq runs.
I had already disabled the rx softirq, as I am processing the packets
at the interrupt level.


Thanks,
Khaled