2008-08-21 03:56:50

by Yinghai Lu

Subject: skbuff bug?

skb_over_panic: text:ffffffff8056cec7 len:27760 put:27760
head:ffff880821c9b800 data:ffff880821c9b812 tail:0x6c82 end:0x680
dev:eth10
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:128!
invalid opcode: 0000 [1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
CPU 0
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.27-rc3-tip-00674-g64e34ec-dirty #126
RIP: 0010:[<ffffffff8089c163>] [<ffffffff8089c163>] skb_put+0x82/0x8e
RSP: 0018:ffffffff80fa2d30 EFLAGS: 00010286
RAX: 0000000000000089 RBX: ffff880822e6c780 RCX: 0000000000000641
RDX: 0000000000000641 RSI: 0000000000000046 RDI: ffffffff80fec804
RBP: ffffffff80fa2d50 R08: ffffffff80fa2b80 R09: 0000000000000000
R10: ffffffff8106cd10 R11: 0000000000000010 R12: ffffc2002d0f1028
R13: 0000000000006c70 R14: ffff880821c5fb00 R15: ffff8838237f4010
FS: 0000000000000000(0000) GS:ffffffff80ecae00(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fff017b2fe8 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff80ed0000, task ffffffff80db23b0)
Stack: 0000000000006c82 0000000000000680 ffff880822e6c000 0000000000000001
ffffffff80fa2e00 ffffffff8056cec7 0000000000000040 ffff880822e70780
00000040231ee340 ffffffff80fa2e74 0000003e00000053 ffff880822e6c000
Call Trace:
<IRQ> [<ffffffff8056cec7>] e1000_clean_rx_irq+0x1e9/0x2c4
[<ffffffff8056b47f>] e1000_clean+0x316/0x4cc
[<ffffffff8028e9ec>] ? handle_edge_irq+0xfb/0x11f
[<ffffffff808a05a4>] net_rx_action+0x7f/0x159
[<ffffffff8025fa48>] __do_softirq+0x72/0xe2
[<ffffffff80569d8f>] ? e1000_intr_msi+0x100/0x10a
[<ffffffff8022835c>] call_softirq+0x1c/0x28
[<ffffffff802295b0>] do_softirq+0x39/0x77
[<ffffffff8025f75c>] irq_exit+0x44/0x87
[<ffffffff802298fa>] do_IRQ+0x146/0x168
[<ffffffff802275c1>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8022d64c>] ? default_idle+0x30/0x45
[<ffffffff8022d7e2>] ? c1e_idle+0xd5/0xdc
[<ffffffff8096a87a>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff80226271>] ? cpu_idle+0xd9/0x121
[<ffffffff8092ba72>] ? rest_init+0x66/0x68
[<ffffffff80ee1df3>] ? start_kernel+0x3ab/0x3b6
[<ffffffff80ee1296>] ? x86_64_start_reservations+0xa5/0xa9
[<ffffffff80ee13af>] ? x86_64_start_kernel+0xf2/0xf9
Code: 8b 57 68 48 89 44 24 10 8b 87 a8 00 00 00 48 89 44 24 08 8b bf
a4 00 00 00 31 c0 48 89 3c 24 48 c7 c7 6c ca d2 80 e8 4e 96 0c 00 <0f>
0b eb fe 89 d0 c9 49 8d 04 00 c3 55 48 89 e5 41 56 41 55 41
RIP [<ffffffff8089c163>] skb_put+0x82/0x8e
RSP <ffffffff80fa2d30>
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G D
2.6.27-rc3-tip-00674-g64e34ec-dirty #126
Call Trace:
<IRQ> [<ffffffff809656fa>] panic+0xa5/0x15c
[<ffffffff80227da6>] ? apic_timer_interrupt+0x66/0x70
[<ffffffff809686b6>] ? oops_end+0x42/0xa4
[<ffffffff809686fb>] oops_end+0x87/0xa4
[<ffffffff80228c60>] die+0x62/0x6b
[<ffffffff80968d0c>] do_trap+0x110/0x11f
[<ffffffff802292cd>] do_invalid_op+0x98/0xa1
[<ffffffff8089c163>] ? skb_put+0x82/0x8e
[<ffffffff802c40ff>] ? add_partial+0x1f/0x6c
[<ffffffff8096581d>] ? printk+0x6c/0x6f
[<ffffffff809680b9>] error_exit+0x0/0x51
[<ffffffff8089c163>] ? skb_put+0x82/0x8e
[<ffffffff8056cec7>] e1000_clean_rx_irq+0x1e9/0x2c4
[<ffffffff8056b47f>] e1000_clean+0x316/0x4cc
[<ffffffff8028e9ec>] ? handle_edge_irq+0xfb/0x11f
[<ffffffff808a05a4>] net_rx_action+0x7f/0x159
[<ffffffff8025fa48>] __do_softirq+0x72/0xe2
[<ffffffff80569d8f>] ? e1000_intr_msi+0x100/0x10a
[<ffffffff8022835c>] call_softirq+0x1c/0x28
[<ffffffff802295b0>] do_softirq+0x39/0x77
[<ffffffff8025f75c>] irq_exit+0x44/0x87
[<ffffffff802298fa>] do_IRQ+0x146/0x168
[<ffffffff802275c1>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8022d64c>] ? default_idle+0x30/0x45
[<ffffffff8022d7e2>] ? c1e_idle+0xd5/0xdc
[<ffffffff8096a87a>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff80226271>] ? cpu_idle+0xd9/0x121
[<ffffffff8092ba72>] ? rest_init+0x66/0x68
[<ffffffff80ee1df3>] ? start_kernel+0x3ab/0x3b6
[<ffffffff80ee1296>] ? x86_64_start_reservations+0xa5/0xa9
[<ffffffff80ee13af>] ? x86_64_start_kernel+0xf2/0xf9
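
The numbers on the skb_over_panic line can be cross-checked by hand. The sketch below is only a simplified model of the bounds check that skb_put() trips over, not the kernel's code; the helper name tail_after_put and the assumption that the buffer was empty before the call (the report shows len equal to put) are mine.

```python
# Numbers copied from the skb_over_panic line in the report above.
HEAD = 0xffff880821c9b800
DATA = 0xffff880821c9b812
PUT = 27760            # bytes the driver asked skb_put() to append
REPORTED_TAIL = 0x6c82  # tail as printed, an offset from head
END = 0x680             # end of the linear buffer, an offset from head


def tail_after_put(head, data, put, old_len=0):
    """Offset of the tail from head once `put` bytes are appended.

    Assumption: the buffer held `old_len` bytes before the call. The report
    shows len == put, so it was most likely empty.
    """
    return (data - head) + old_len + put


new_tail = tail_after_put(HEAD, DATA, PUT)
overrun = new_tail - END

print(f"tail after put: {new_tail:#x} (reported {REPORTED_TAIL:#x})")
print(f"buffer end:     {END:#x} ({END} bytes)")
print(f"overrun:        {overrun} bytes")
```

The computed tail matches the reported 0x6c82, and the put length 27760 is 0x6c70, the value sitting in R13. The buffer ends at 0x680 (1664 bytes), so the requested length is far larger than the buffer, which hints that the length handed to skb_put() was bogus rather than the buffer being marginally too small.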


2008-08-21 23:47:06

by Yinghai Lu

Subject: Re: skbuff bug?

it seems caused by

commit f8d59f7826aa73c5e7682fbed6db38020635d466
Author: Bruce Allan <[email protected]>
Date: Fri Aug 8 18:36:11 2008 -0700

e1000e: test for unusable MSI support

Some systems do not like 82571/2 use of 16-bit MSI messages and some
other systems claim to support MSI, but neither really works. Setup a
test MSI handler to detect whether or not MSI is working properly, and
if not, fallback to legacy INTx interrupts.

Signed-off-by: Bruce Allan <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>


because I sometimes get this instead:

calling ip_auto_config+0x0/0xec1
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff80566e31>] e1000_clean+0xb6/0x4cc
PGD 0
Oops: 0000 [1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
CPU 3
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.27-rc4-tip-00862-gcc150c8-dirty #140
RIP: 0010:[<ffffffff80566e31>] [<ffffffff80566e31>] e1000_clean+0xb6/0x4cc
RSP: 0018:ffff880826527e10 EFLAGS: 00010246
RAX: ffff880823964000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff880823964000 RSI: 0000000000000000 RDI: ffffc200147b1000
RBP: ffff880826527ea0 R08: 0000000000000032 R09: 0000000000000000
R10: ffff880826527f78 R11: ffff88042655fe98 R12: 0000000000000000
R13: ffff880824110780 R14: ffff880423da0640 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8804264ba300(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88042655e000, task ffff880426561680)
Stack: ffffffff802750f5 ffff880826527e48 ffffffff8027613e ffffffff8026a1fc
0000001470188100 00000040280809a0 ffff880824110908 0000000000000000
0000000001000286 ffff880824110000 ffff880823964000 ffff880824110780
Call Trace:
<IRQ> [<ffffffff802750f5>] ? clockevents_program_event+0x81/0x8a
[<ffffffff8027613e>] ? tick_program_event+0x3f/0x5f
[<ffffffff8026a1fc>] ? __queue_work+0x7c/0x85
[<ffffffff8089cb3c>] net_rx_action+0x7f/0x159
[<ffffffff8025ec38>] __do_softirq+0x72/0xe2
[<ffffffff80565ac0>] ? e1000_intr_msi+0x100/0x10a
[<ffffffff8022835c>] call_softirq+0x1c/0x28
[<ffffffff802295b0>] do_softirq+0x39/0x77
[<ffffffff8025e94c>] irq_exit+0x44/0x87
[<ffffffff80229907>] do_IRQ+0x146/0x168
[<ffffffff802275c1>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8022d69c>] ? default_idle+0x30/0x45
[<ffffffff8022d832>] ? c1e_idle+0xd5/0xdc
[<ffffffff80966fea>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff80226271>] ? cpu_idle+0xd9/0x121
[<ffffffff8095e2a7>] ? start_secondary+0x16f/0x174
Code: c6 45 b3 00 e9 b0 00 00 00 49 8b 16 41 89 dc 49 6b fc 28 48 89
55 c0 49 03 7e 20 3b 5d b4 0f 94 45 b3 75 39 48 8b 4f 08 8b
<6>0000:87:00.0: eth4: MSI interrupt test failed!
0000:87:00.0: eth4: MSI interrupt test failed, using legacy interrupt.
75 ac <8b> 91 a8 00 00 00 48 8b 81 b0 00 00 00 0f b7 44 10 08 ba 01 00
RIP [<ffffffff80566e31>] e1000_clean+0xb6/0x4cc
RSP <ffff880826527e10>
CR2: 00000000000000a8
do_IRQ: 3.131 No irq handler for vector
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G D
2.6.27-rc4-tip-00862-gcc150c8-dirty #140
Call Trace:
<IRQ> [<ffffffff80961e69>] panic+0xa5/0x15c
[<ffffffff8025a1fc>] ? printk_ratelimit+0x15/0x17
[<ffffffff802275c1>] ? ret_from_intr+0x0/0xa
[<ffffffff80964e26>] ? oops_end+0x42/0xa4
[<ffffffff80964e6b>] oops_end+0x87/0xa4
[<ffffffff80966e79>] do_page_fault+0x7cb/0x8bf
[<ffffffff804c0a26>] ? __bitmap_weight+0x3e/0x89
[<ffffffff804c0a26>] ? __bitmap_weight+0x3e/0x89
[<ffffffff8024c4e0>] ? __enqueue_entity+0x8a/0x8c
[<ffffffff8024dcf7>] ? enqueue_entity+0x195/0x19d
[<ffffffff8024dd45>] ? enqueue_task_fair+0x46/0x4b
[<ffffffff8024a2d3>] ? enqueue_task+0x50/0x5b
[<ffffffff80964829>] error_exit+0x0/0x51
[<ffffffff80566e31>] ? e1000_clean+0xb6/0x4cc
[<ffffffff802750f5>] ? clockevents_program_event+0x81/0x8a
[<ffffffff8027613e>] ? tick_program_event+0x3f/0x5f
[<ffffffff8026a1fc>] ? __queue_work+0x7c/0x85
[<ffffffff8089cb3c>] net_rx_action+0x7f/0x159
[<ffffffff8025ec38>] __do_softirq+0x72/0xe2
[<ffffffff80565ac0>] ? e1000_intr_msi+0x100/0x10a
[<ffffffff8022835c>] call_softirq+0x1c/0x28
[<ffffffff802295b0>] do_softirq+0x39/0x77
[<ffffffff8025e94c>] irq_exit+0x44/0x87
[<ffffffff80229907>] do_IRQ+0x146/0x168
[<ffffffff802275c1>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8022d69c>] ? default_idle+0x30/0x45
[<ffffffff8022d832>] ? c1e_idle+0xd5/0xdc
[<ffffffff80966fea>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff80226271>] ? cpu_idle+0xd9/0x121
[<ffffffff8095e2a7>] ? start_secondary+0x16f/0x174


The system has several Intel PCIe quad-port cards installed.

YH
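
The faulting address in the NULL pointer oops can be tied back to the marked instruction in the Code: line. The sketch below is a deliberately tiny decoder that handles only this one encoding (opcode 8b with a 32-bit displacement, no SIB byte, no REX prefix); the function name decode_mov_load is made up for illustration, and the bytes and register value are copied from the dump above.

```python
import struct

# Low three bits of a register number, without a REX prefix.
REGS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_mov_load(insn: bytes):
    """Decode `8b /r` with mod=10 (disp32): mov disp32(base), dst.

    Returns (dst, base, disp). Only this single form is supported.
    """
    opcode, modrm = insn[0], insn[1]
    if opcode != 0x8B:
        raise ValueError("not a mov r32, r/m32 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the disp32, no-SIB form is handled")
    (disp,) = struct.unpack_from("<i", insn, 2)
    return REGS[reg], REGS[rm], disp


# Bytes starting at the <8b> marker in the Code: line.
dst, base, disp = decode_mov_load(bytes.fromhex("8b91a8000000"))

RCX = 0x0   # from the register dump
CR2 = 0xA8  # faulting address reported by the oops
fault_addr = RCX + disp

print(f"mov {disp:#x}(%r{base}),%e{dst}")
print(f"effective address: {fault_addr:#x}, CR2: {CR2:#x}")
```

With RCX equal to zero the load targets address 0xa8, which is exactly the CR2 value, so the oops is a read of a field at offset 0xa8 through a NULL pointer held in RCX. Which structure that pointer was supposed to reference is not something the dump alone settles.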

2008-08-22 01:23:27

by Yinghai Lu

Subject: Re: skbuff bug?

another one

skb_over_panic: text:ffffffff80568ad9 len:10742 put:10742
head:ffff8808236ac800 data:ffff8808236ac812 tail:0x2a08 end:0x680
dev:eth2
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:128!
invalid opcode: 0000 [1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
CPU 3
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.27-rc4-tip-00862-gcc150c8-dirty #144
RIP: 0010:[<ffffffff808985bb>] [<ffffffff808985bb>] skb_put+0x82/0x8e
RSP: 0018:ffff880426567d30 EFLAGS: 00010286
RAX: 0000000000000088 RBX: ffff880823c80780 RCX: 0000000000006280
RDX: ffff8800a70ed000 RSI: 0000000000000046 RDI: ffffffff80fe7804
RBP: ffff880426567d50 R08: ffff880426567b80 R09: 0000000000000000
R10: ffffffff81067d10 R11: 0000000000000010 R12: ffffc200147ad9d8
R13: 00000000000029f6 R14: ffff88082367e500 R15: ffff880423a3c3f0
FS: 00007f4d6c6c36f0(0000) GS:ffff8804264be300(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4d6c9694e8 CR3: 000000041fc45000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880826526000, task ffff880426559680)
Stack: 0000000000002a08 0000000000000680 ffff880823c80000 0000000000000501
ffff880426567e00 ffffffff80568ad9 00000000ffff9c73 0000040000000001
000000402301c288 ffff880426567e74 ffff880400000013 ffff880823c80000
Call Trace:
<IRQ> [<ffffffff80568ad9>] e1000_clean_rx_irq+0x1e9/0x2c4
[<ffffffff80567091>] e1000_clean+0x316/0x4cc
[<ffffffff802750f5>] ? clockevents_program_event+0x81/0x8a
[<ffffffff8027613e>] ? tick_program_event+0x3f/0x5f
[<ffffffff8089cb3c>] net_rx_action+0x7f/0x159
[<ffffffff8025ec38>] __do_softirq+0x72/0xe2
[<ffffffff80565ac0>] ? e1000_intr_msi+0x100/0x10a
[<ffffffff8022835c>] call_softirq+0x1c/0x28
[<ffffffff802295b0>] do_softirq+0x39/0x77
[<ffffffff8025e94c>] irq_exit+0x44/0x87
[<ffffffff80229907>] do_IRQ+0x146/0x168
[<ffffffff802275c1>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8024c75c>] ? pick_next_task_fair+0x9d/0xac
[<ffffffff8022d69c>] ? default_idle+0x30/0x45
[<ffffffff8022d832>] ? c1e_idle+0xd5/0xdc
[<ffffffff80966fea>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff80226271>] ? cpu_idle+0xd9/0x121
[<ffffffff8095e2a7>] ? start_secondary+0x16f/0x174
Code: 8b 57 68 48 89 44 24 10 8b 87 a8 00 00 00 48 89 44 24 08 8b bf
a4 00 00 00 31 c0 48 89 3c 24 48 c7 c7 4c 82 d2 80 e8 65 99 0c 00 <0f>
0b eb fe 89 d0 c9 49 8d 04 00 c3 55 48 89 e5 41 56 41 55 41
RIP [<ffffffff808985bb>] skb_put+0x82/0x8e
RSP <ffff880426567d30>
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G D
2.6.27-rc4-tip-00862-gcc150c8-dirty #144
Call Trace:
<IRQ> [<ffffffff80961e69>] panic+0xa5/0x15c
[<ffffffff80227da6>] ? apic_timer_interrupt+0x66/0x70
[<ffffffff80964e26>] ? oops_end+0x42/0xa4
[<ffffffff80964e6b>] oops_end+0x87/0xa4
[<ffffffff80228c60>] die+0x62/0x6b
[<ffffffff8096547c>] do_trap+0x110/0x11f
[<ffffffff802292cd>] do_invalid_op+0x98/0xa1
[<ffffffff808985bb>] ? skb_put+0x82/0x8e
[<ffffffff808ce840>] ? tcp_v4_rcv+0x3ad/0x594
[<ffffffff80961f8c>] ? printk+0x6c/0x70
[<ffffffff80964829>] error_exit+0x0/0x51
[<ffffffff808985bb>] ? skb_put+0x82/0x8e
[<ffffffff80568ad9>] e1000_clean_rx_irq+0x1e9/0x2c4
[<ffffffff80567091>] e1000_clean+0x316/0x4cc
[<ffffffff802750f5>] ? clockevents_program_event+0x81/0x8a
[<ffffffff8027613e>] ? tick_program_event+0x3f/0x5f
[<ffffffff8089cb3c>] net_rx_action+0x7f/0x159
[<ffffffff8025ec38>] __do_softirq+0x72/0xe2
[<ffffffff80565ac0>] ? e1000_intr_msi+0x100/0x10a
[<ffffffff8022835c>] call_softirq+0x1c/0x28
[<ffffffff802295b0>] do_softirq+0x39/0x77
[<ffffffff8025e94c>] irq_exit+0x44/0x87
[<ffffffff80229907>] do_IRQ+0x146/0x168
[<ffffffff802275c1>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8024c75c>] ? pick_next_task_fair+0x9d/0xac
[<ffffffff8022d69c>] ? default_idle+0x30/0x45
[<ffffffff8022d832>] ? c1e_idle+0xd5/0xdc
[<ffffffff80966fea>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff80226271>] ? cpu_idle+0xd9/0x121
[<ffffffff8095e2a7>] ? start_secondary+0x16f/0x174

2008-08-26 09:19:41

by Evgeniy Polyakov

Subject: Re: skbuff bug?

Hi.

On Thu, Aug 21, 2008 at 04:46:49PM -0700, Yinghai Lu ([email protected]) wrote:
> it seems caused by
>
> commit f8d59f7826aa73c5e7682fbed6db38020635d466
> Author: Bruce Allan <[email protected]>
> Date: Fri Aug 8 18:36:11 2008 -0700
>
> e1000e: test for unusable MSI support

The Intel folks seem to be on vacation or otherwise without access to
mail. Can you confirm that the bug still exists in the latest tree and
that reverting this commit fixes the problem?

--
Evgeniy Polyakov

2008-08-26 10:17:25

by Jeff Kirsher

Subject: Re: skbuff bug?

On Tue, Aug 26, 2008 at 2:17 AM, Evgeniy Polyakov <[email protected]> wrote:
> Hi.
>
> On Thu, Aug 21, 2008 at 04:46:49PM -0700, Yinghai Lu ([email protected]) wrote:
>> it seems caused by
>>
>> commit f8d59f7826aa73c5e7682fbed6db38020635d466
>> Author: Bruce Allan <[email protected]>
>> Date: Fri Aug 8 18:36:11 2008 -0700
>>
>> e1000e: test for unusable MSI support
>
> Intel folks seems to be on vacations or whatever else without access to
> the mail. Can you confirm that bug still exists in the latest tree and
> reverting this commit fixes the problem?
>
> --
> Evgeniy Polyakov
> --

We are looking into the issue, I am sorry there has been no response until now.

I am still curious if you still see the issue with all the recent
changes. Please provide the device id you are using and system info
so that we can try and reproduce the issue, if the issue persists with
the latest git tree.

--
Cheers,
Jeff

2008-08-26 16:38:20

by Yinghai Lu

Subject: Re: skbuff bug?

On Tue, Aug 26, 2008 at 3:17 AM, Jeff Kirsher
<[email protected]> wrote:
> On Tue, Aug 26, 2008 at 2:17 AM, Evgeniy Polyakov <[email protected]> wrote:
>> Hi.
>>
>> On Thu, Aug 21, 2008 at 04:46:49PM -0700, Yinghai Lu ([email protected]) wrote:
>>> it seems caused by
>>>
>>> commit f8d59f7826aa73c5e7682fbed6db38020635d466
>>> Author: Bruce Allan <[email protected]>
>>> Date: Fri Aug 8 18:36:11 2008 -0700
>>>
>>> e1000e: test for unusable MSI support
>>
>> Intel folks seems to be on vacations or whatever else without access to
>> the mail. Can you confirm that bug still exists in the latest tree and
>> reverting this commit fixes the problem?
>>
>> --
>> Evgeniy Polyakov
>> --
>
> We are looking into the issue, I am sorry there has been no response until now.
>
> I am still curious if you still see the issue with all the recent
> changes. Please provide the device id you are using and system info
> so that we can try and reproduce the issue, if the issue persists with
> the latest git tree.

I cannot reproduce it any more...

YH