2004-11-09 17:13:10

by Jeff V. Merkey

Subject: 2.6.9 RCU breakage in dev_queue_xmit


Running dual gigabit interfaces at 196 MB/s (megabytes/second) on
receive, with a 12 CLK interpacket gap time and a 1500-byte payload
at 65000 packets per second per gigabit interface, and retransmitting
received packets at 130 MB/s out of a third gigabit interface
with skb, the RCU locking in dev_queue_xmit breaks and enters the following state:

Nov 8 15:38:08 ds kernel: Badness in local_bh_enable at kernel/softirq.c:141
Nov 8 15:38:08 ds kernel: [<40107d1e>] dump_stack+0x1e/0x30
Nov 8 15:38:08 ds kernel: [<401218b0>] local_bh_enable+0x70/0x80
Nov 8 15:38:08 ds kernel: [<402c5bbb>] dev_queue_xmit+0x11b/0x250
Nov 8 15:38:08 ds kernel: [<f8981cb7>] xmit_skb+0x17/0x20 [dsfs]
Nov 8 15:38:08 ds kernel: [<f8981f8e>] xmit_packet+0x2e/0x80 [dsfs]
Nov 8 15:38:08 ds kernel: [<f89820eb>] regen_data+0x10b/0x290 [dsfs]
Nov 8 15:38:08 ds kernel: [<401052c5>] kernel_thread_helper+0x5/0x10
Nov 8 15:38:08 ds kernel: Badness in local_bh_enable at kernel/softirq.c:141
Nov 8 15:38:08 ds kernel: [<40107d1e>] dump_stack+0x1e/0x30
Nov 8 15:38:08 ds kernel: [<401218b0>] local_bh_enable+0x70/0x80
Nov 8 15:38:08 ds kernel: [<402c5bbb>] dev_queue_xmit+0x11b/0x250
Nov 8 15:38:08 ds kernel: [<f8981cb7>] xmit_skb+0x17/0x20 [dsfs]
Nov 8 15:38:08 ds kernel: [<f8981f8e>] xmit_packet+0x2e/0x80 [dsfs]
Nov 8 15:38:08 ds kernel: [<f89820eb>] regen_data+0x10b/0x290 [dsfs]
Nov 8 15:38:08 ds kernel: [<401052c5>] kernel_thread_helper+0x5/0x10


And before any of you guys whine about "give me a test case", I just
did. The device driver is e1000 in the 2.6.9 kernel
tree. The system is a single-processor Xeon-based system at 3 GHz with 4
GB of RAM and a 3GB/1GB kernel/user
split address space.

Jeff
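
As a rough sanity check on the rates quoted in the report, the sketch below
works out the payload throughput implied by 65000 packets per second at
1500 bytes each, and how close that is to gigabit line rate. The function
names are hypothetical, 1 MB is taken as 10^6 bytes, and the 38 bytes of
per-frame overhead is the usual Ethernet figure, assumed here rather than
stated in the report.

```c
#include <assert.h>
#include <stdio.h>

/* Usual Ethernet per-frame overhead in bytes (an assumption, not from the
 * report): 14 header + 4 FCS + 8 preamble/SFD + 12 interpacket gap. */
#define ETH_OVERHEAD_BYTES 38u
#define GIGABIT_BYTES_PER_SEC 125000000.0

/* Payload throughput in MB/s (1 MB = 10^6 bytes) summed over `nics` links. */
static double payload_mb_per_sec(unsigned pps, unsigned payload, unsigned nics)
{
    assert(nics >= 1);
    return (double)pps * payload * nics / 1e6;
}

/* Fraction of one gigabit link consumed, counting per-frame overhead. */
static double line_utilization(unsigned pps, unsigned payload)
{
    return (double)pps * (payload + ETH_OVERHEAD_BYTES) / GIGABIT_BYTES_PER_SEC;
}

/* Print the figures for the reported setup: two receive links. */
static void print_report_rates(void)
{
    printf("receive payload: %.1f MB/s\n", payload_mb_per_sec(65000, 1500, 2));
    printf("per-link utilization: %.1f%%\n",
           100.0 * line_utilization(65000, 1500));
}
```

Calling print_report_rates() from main() gives about 195 MB/s of payload
across the two receive interfaces, in line with the 196 MB/s quoted, with
each link at roughly 80% of line rate under the assumed overhead.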


2004-11-09 18:14:10

by Patrick McHardy

Subject: Re: 2.6.9 RCU breakage in dev_queue_xmit

Jeff V. Merkey wrote:

>
> Running dual gigabit interfaces at 196 MB/S (megabytes/second) on
> receive, 12 CLK interacket gap time, 1500 bytes payload
> at 65000 packets per second per gigabit interface, and retransmitting
> received packets at 130 MB/S out of a third gigabit interface
> with skb, RCU locks in dev_queue_xmit breaks and enters the following
> state:
>
> Nov 8 15:38:08 ds kernel: Badness in local_bh_enable at
> kernel/softirq.c:141
> Nov 8 15:38:08 ds kernel: [<40107d1e>] dump_stack+0x1e/0x30
> Nov 8 15:38:08 ds kernel: [<401218b0>] local_bh_enable+0x70/0x80
> Nov 8 15:38:08 ds kernel: [<402c5bbb>] dev_queue_xmit+0x11b/0x250
> Nov 8 15:38:08 ds kernel: [<f8981cb7>] xmit_skb+0x17/0x20 [dsfs]
> Nov 8 15:38:08 ds kernel: [<f8981f8e>] xmit_packet+0x2e/0x80 [dsfs]
> Nov 8 15:38:08 ds kernel: [<f89820eb>] regen_data+0x10b/0x290 [dsfs]

There is no such function in the 2.6.9 kernel.

Regards
Patrick

2004-11-09 18:18:16

by Jeff V. Merkey

Subject: Re: 2.6.9 RCU breakage in dev_queue_xmit

Patrick McHardy wrote:

> Jeff V. Merkey wrote:
>
>>
>> Running dual gigabit interfaces at 196 MB/S (megabytes/second) on
>> receive, 12 CLK interacket gap time, 1500 bytes payload
>> at 65000 packets per second per gigabit interface, and retransmitting
>> received packets at 130 MB/S out of a third gigabit interface
>> with skb, RCU locks in dev_queue_xmit breaks and enters the following
>> state:
>>
>> Nov 8 15:38:08 ds kernel: Badness in local_bh_enable at
>> kernel/softirq.c:141
>> Nov 8 15:38:08 ds kernel: [<40107d1e>] dump_stack+0x1e/0x30
>> Nov 8 15:38:08 ds kernel: [<401218b0>] local_bh_enable+0x70/0x80
>> Nov 8 15:38:08 ds kernel: [<402c5bbb>] dev_queue_xmit+0x11b/0x250
>> Nov 8 15:38:08 ds kernel: [<f8981cb7>] xmit_skb+0x17/0x20 [dsfs]
>> Nov 8 15:38:08 ds kernel: [<f8981f8e>] xmit_packet+0x2e/0x80 [dsfs]
>> Nov 8 15:38:08 ds kernel: [<f89820eb>] regen_data+0x10b/0x290 [dsfs]
>
>
> There is no such function in the 2.6.9 kernel.


Check /include/linux for local_bh_enable and /net/core/dev.c .

Jeff

>
> Regards
> Patrick
>

2004-11-09 18:35:22

by Stephen Hemminger

Subject: Re: 2.6.9 RCU breakage in dev_queue_xmit

On Tue, 09 Nov 2004 11:40:26 -0700
"Jeff V. Merkey" <[email protected]> wrote:

> Patrick McHardy wrote:
>
> > Jeff V. Merkey wrote:
> >
> >>
> >> Running dual gigabit interfaces at 196 MB/S (megabytes/second) on
> >> receive, 12 CLK interacket gap time, 1500 bytes payload
> >> at 65000 packets per second per gigabit interface, and retransmitting
> >> received packets at 130 MB/S out of a third gigabit interface
> >> with skb, RCU locks in dev_queue_xmit breaks and enters the following
> >> state:
> >>
> >> Nov 8 15:38:08 ds kernel: Badness in local_bh_enable at
> >> kernel/softirq.c:141
> >> Nov 8 15:38:08 ds kernel: [<40107d1e>] dump_stack+0x1e/0x30
> >> Nov 8 15:38:08 ds kernel: [<401218b0>] local_bh_enable+0x70/0x80
> >> Nov 8 15:38:08 ds kernel: [<402c5bbb>] dev_queue_xmit+0x11b/0x250
> >> Nov 8 15:38:08 ds kernel: [<f8981cb7>] xmit_skb+0x17/0x20 [dsfs]
> >> Nov 8 15:38:08 ds kernel: [<f8981f8e>] xmit_packet+0x2e/0x80 [dsfs]
> >> Nov 8 15:38:08 ds kernel: [<f89820eb>] regen_data+0x10b/0x290 [dsfs]
> >
> >
> > There is no such function in the 2.6.9 kernel.
>
>
> Check /include/linux for local_bh_enable and /net/core/dev.c .
>
> Jeff
>
> >
> > Regards
> > Patrick

Patrick is asking about the function regen_data which doesn't exist in the
standard kernel.

2004-11-09 23:17:36

by Jeff V. Merkey

Subject: Re: 2.6.9 RCU breakage in dev_queue_xmit

Stephen Hemminger wrote:

>On Tue, 09 Nov 2004 11:40:26 -0700
>"Jeff V. Merkey" <[email protected]> wrote:
>
>
>
>>Patrick McHardy wrote:
>>
>>
>>
>>>Jeff V. Merkey wrote:
>>>
>>>
>>>
>>>>Running dual gigabit interfaces at 196 MB/S (megabytes/second) on
>>>>receive, 12 CLK interacket gap time, 1500 bytes payload
>>>>at 65000 packets per second per gigabit interface, and retransmitting
>>>>received packets at 130 MB/S out of a third gigabit interface
>>>>with skb, RCU locks in dev_queue_xmit breaks and enters the following
>>>>state:
>>>>
>>>>Nov 8 15:38:08 ds kernel: Badness in local_bh_enable at
>>>>kernel/softirq.c:141
>>>>Nov 8 15:38:08 ds kernel: [<40107d1e>] dump_stack+0x1e/0x30
>>>>Nov 8 15:38:08 ds kernel: [<401218b0>] local_bh_enable+0x70/0x80
>>>>Nov 8 15:38:08 ds kernel: [<402c5bbb>] dev_queue_xmit+0x11b/0x250
>>>>Nov 8 15:38:08 ds kernel: [<f8981cb7>] xmit_skb+0x17/0x20 [dsfs]
>>>>Nov 8 15:38:08 ds kernel: [<f8981f8e>] xmit_packet+0x2e/0x80 [dsfs]
>>>>Nov 8 15:38:08 ds kernel: [<f89820eb>] regen_data+0x10b/0x290 [dsfs]
>>>>
>>>>
>>>There is no such function in the 2.6.9 kernel.
>>>
>>>
>>Check /include/linux for local_bh_enable and /net/core/dev.c .
>>
>>Jeff
>>
>>
>>
>>>Regards
>>>Patrick
>>>
>>>
>
>Patrick is asking about the function regen_data which doesn't exist in the
>standard kernel.
>-
>
>
>

Code for the regen_data function is attached. This code provides an example of
the calls into Linux that cause
the RCU locks to fail in dev_queue_xmit. The remainder of this module is
proprietary and
not open source.

Jeff






Attachments:
regen.c (7.27 kB)

2004-11-10 01:51:08

by Herbert Xu

Subject: Re: 2.6.9 RCU breakage in dev_queue_xmit

Jeff V. Merkey <[email protected]> wrote:
>
> Running dual gigabit interfaces at 196 MB/S (megabytes/second) on
> receive, 12 CLK interacket gap time, 1500 bytes payload
> at 65000 packets per second per gigabit interface, and retransmitting
> received packets at 130 MB/S out of a third gigabit interface
> with skb, RCU locks in dev_queue_xmit breaks and enters the following state:

This patch might help.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/11/01 17:41:19-08:00 [email protected]
# [NET]: Fix unbalanced local_bh_enable() in dev_queue_xmit()
#
# Signed-off-by: Ingo Molnar <[email protected]>
# Signed-off-by: David S. Miller <[email protected]>
#
# net/core/dev.c
# 2004/11/01 17:40:59-08:00 [email protected] +5 -6
# [NET]: Fix unbalanced local_bh_enable() in dev_queue_xmit()
#
# Signed-off-by: Ingo Molnar <[email protected]>
# Signed-off-by: David S. Miller <[email protected]>
#
diff -Nru a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c 2004-11-10 12:45:48 +11:00
+++ b/net/core/dev.c 2004-11-10 12:45:48 +11:00
@@ -1261,6 +1261,11 @@
 	struct Qdisc *q;
 	int rc = -ENOMEM;
 
+	/* Disable soft irqs for various locks below. Also
+	 * stops preemption for RCU.
+	 */
+	local_bh_disable();
+
 	if (skb_shinfo(skb)->frag_list &&
 	    !(dev->features & NETIF_F_FRAGLIST) &&
 	    __skb_linearize(skb, GFP_ATOMIC))
@@ -1284,12 +1289,6 @@
 		      skb->protocol != htons(ETH_P_IP))))
 		if (skb_checksum_help(skb, 0))
 			goto out_kfree_skb;
-
-
-	/* Disable soft irqs for various locks below. Also
-	 * stops preemption for RCU.
-	 */
-	local_bh_disable();
 
 	/* Updates of qdisc are serialized by queue_lock.
 	 * The struct Qdisc which is pointed to by qdisc is now a
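
To make the imbalance named in the patch title concrete, here is a small
user-space model of the control flow before and after the change. It is
only a sketch: bh_count, badness, model_bh_disable(), model_bh_enable() and
the two xmit_* functions are hypothetical stand-ins, not kernel code, and
it assumes (as the patch title and hunks suggest) that the early failure
paths jump to a common exit that calls local_bh_enable(). It does not try
to reproduce the actual check at kernel/softirq.c:141.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the softirq-disable nesting count (hypothetical). */
static int bh_count;
/* Number of times "enable" ran with no matching "disable" (hypothetical). */
static int badness;

static void model_bh_disable(void)
{
    assert(bh_count >= 0);
    bh_count++;
}

static void model_bh_enable(void)
{
    if (bh_count == 0) {
        badness++;          /* unbalanced enable: nothing to undo */
        return;
    }
    bh_count--;
}

/* Pre-patch ordering: the early checks run before the disable, but a
 * failure still jumps to the common exit, which enables unconditionally. */
static int xmit_before_patch(bool early_failure)
{
    int rc = -1;

    if (early_failure)
        goto out;           /* disable has not happened yet */
    model_bh_disable();
    rc = 0;
out:
    model_bh_enable();
    return rc;
}

/* Post-patch ordering: disable first, so every exit path is balanced. */
static int xmit_after_patch(bool early_failure)
{
    int rc = -1;

    model_bh_disable();
    if (early_failure)
        goto out;
    rc = 0;
out:
    model_bh_enable();
    return rc;
}
```

In this model, xmit_before_patch(true) increments badness while
xmit_after_patch(true) leaves it untouched, which is the whole effect of
moving the disable call above the early checks.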

2004-11-10 16:59:43

by Jeff V. Merkey

Subject: Re: 2.6.9 RCU breakage in dev_queue_xmit

Herbert Xu wrote:

>Jeff V. Merkey <[email protected]> wrote:
>
>
>>Running dual gigabit interfaces at 196 MB/S (megabytes/second) on
>>receive, 12 CLK interacket gap time, 1500 bytes payload
>>at 65000 packets per second per gigabit interface, and retransmitting
>>received packets at 130 MB/S out of a third gigabit interface
>>with skb, RCU locks in dev_queue_xmit breaks and enters the following state:
>>
>>
>
>This patch might help.
>
>
Fixed. Who on earth missed that? Dropping into an unlock case
by default without holding the lock?

Jeff

2004-11-10 23:34:41

by Jeff V. Merkey

Subject: Re: 2.6.9 RCU breakage in dev_queue_xmit

Herbert Xu wrote:

>Jeff V. Merkey <[email protected]> wrote:
>
>
>>Running dual gigabit interfaces at 196 MB/S (megabytes/second) on
>>receive, 12 CLK interacket gap time, 1500 bytes payload
>>at 65000 packets per second per gigabit interface, and retransmitting
>>received packets at 130 MB/S out of a third gigabit interface
>>with skb, RCU locks in dev_queue_xmit breaks and enters the following state:
>>
>>
>
>This patch might help.
>
>

Herbert,

Even with this patch I still see RCU breakage at these data rates and
the problem persists; it just takes
longer to manifest (about 23 hours). I am recoding
dev_queue_xmit since the use of RCU primitives
is severely busted. I looked over the code, and the fact that it breaks on
a uniprocessor is really a joke.
No offense guys, but this is pretty bad. How about something simple,
like a spinlock or
multiple send queues per proc?

Jeff
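
For readers wondering what "a spinlock or multiple send queues per proc"
might look like, here is a minimal user-space sketch of one possible
reading: one small ring per processor, each guarded by its own spinlock.
Everything in it (struct txq, NQUEUES, QLEN, the txq_* functions) is
hypothetical and is not taken from the kernel or from the proprietary
module; it only illustrates the idea of independent per-CPU queues.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NQUEUES 4   /* hypothetical: one queue per processor */
#define QLEN    8   /* hypothetical ring capacity */

static_assert((QLEN & (QLEN - 1)) == 0, "QLEN must be a power of two");

struct txq {
    atomic_flag lock;           /* minimal test-and-set spinlock */
    const void *ring[QLEN];
    size_t head, tail;          /* tail - head = packets queued */
};

static struct txq queues[NQUEUES];

static void txq_init_all(void)
{
    for (size_t i = 0; i < NQUEUES; i++) {
        atomic_flag_clear(&queues[i].lock);
        queues[i].head = queues[i].tail = 0;
    }
}

static void txq_lock(struct txq *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;                       /* spin until the holder releases */
}

static void txq_unlock(struct txq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Queue a packet on the caller's own queue; returns 0, or -1 if full. */
static int txq_enqueue(unsigned cpu, const void *pkt)
{
    struct txq *q = &queues[cpu % NQUEUES];
    int rc = -1;

    txq_lock(q);
    if (q->tail - q->head < QLEN) {
        q->ring[q->tail % QLEN] = pkt;
        q->tail++;
        rc = 0;
    }
    txq_unlock(q);
    return rc;
}

/* Take the oldest packet from a queue; returns NULL if it is empty. */
static const void *txq_dequeue(unsigned cpu)
{
    struct txq *q = &queues[cpu % NQUEUES];
    const void *pkt = NULL;

    txq_lock(q);
    if (q->head != q->tail) {
        pkt = q->ring[q->head % QLEN];
        q->head++;
    }
    txq_unlock(q);
    return pkt;
}
```

Because each processor normally touches only its own queue, the locks are
rarely contended; the trade-off is that packet ordering is preserved only
within a queue, not across queues.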