2011-06-27 21:49:29

by Justin Piszcz

Subject: 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!

Hi,

Looks like I am not the first one:
https://bugzilla.redhat.com/show_bug.cgi?id=429868

Any thoughts on this one?
http://home.comcast.net/~jpiszcz/20110627/IMG_2703.JPG

Rough transcription:

skb_over_panic: text:ffffffff813d711c len:44117 put:44117 head:ffff880415d40000 data:ffff880415d40040 tail:0xac95 end:0x640 dev:eth2
-- [ cut here ] ---
kernel BUG at net/core/skbuff.c:127!
invalid opcode: 0000 [#1] SMP
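
The numbers in the skb_over_panic line above can be cross-checked with a short script. This is only a sketch: it assumes that tail and end are printed as offsets from head (which I believe is how 64-bit kernels of this era store them), and the helper name parse_skb_over_panic is made up for illustration.

```python
import re

# Line as transcribed from the panic screenshot.
LINE = ("skb_over_panic: text:ffffffff813d711c len:44117 put:44117 "
        "head:ffff880415d40000 data:ffff880415d40040 tail:0xac95 "
        "end:0x640 dev:eth2")

def parse_skb_over_panic(line):
    """Return the key:value fields of a skb_over_panic line as a dict."""
    fields = dict(re.findall(r"(\w+):(\S+)", line))
    return {
        "len": int(fields["len"]),        # decimal byte count
        "put": int(fields["put"]),        # decimal byte count
        "head": int(fields["head"], 16),  # hex pointer
        "data": int(fields["data"], 16),  # hex pointer
        "tail": int(fields["tail"], 16),  # hex, assumed offset from head
        "end": int(fields["end"], 16),    # hex, assumed offset from head
        "dev": fields["dev"],
    }

info = parse_skb_over_panic(LINE)
headroom = info["data"] - info["head"]   # bytes between head and data
expected_tail = headroom + info["len"]   # where tail lands after the put
overrun = info["tail"] - info["end"]     # bytes past the end of the buffer

print(headroom, expected_tail, overrun)  # prints: 64 44181 42581
```

Under that assumption the fields are self-consistent: 64 bytes of headroom plus a 44117-byte put gives tail 0xac95 (44181), while the buffer ends at 0x640 (1600), so the put ran about 42 KB past the end of a 1600-byte buffer.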

Was playing a video over samba (eth0) and then the kernel panic'd on eth2?

I use all INTEL controllers:

00:19.0 Ethernet controller: Intel Corporation 82578DC Gigabit Network Connection (rev 05)
01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter (rev 01)
03:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
03:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
03:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)

Justin.


2011-06-27 22:15:04

by Ronciak, John

Subject: RE: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!

Sorry to hear of your problem. Please do not try to relate something that happened on RHEL4 to something that is happening in the 2.6.39.2 kernel.

Is the problem reproducible? If so, does it happen every time? Which interface is associated with which HW? The info included below doesn't say. What type of traffic was happening on eth2?

Cheers,
John


> -----Original Message-----
> From: Justin Piszcz [mailto:[email protected]]
> Sent: Monday, June 27, 2011 2:49 PM
> To: [email protected]
> Cc: [email protected]
> Subject: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at
> net/core/skbuff.c:127!
>
> Hi,
>
> Looks like I am not the first one:
> https://bugzilla.redhat.com/show_bug.cgi?id=429868
>
> Any thoughts on this one?
> http://home.comcast.net/~jpiszcz/20110627/IMG_2703.JPG
>
> Rough transcription:
>
> skb_over_panic: text:ffffffff813d711c len:44117 put:44117
> head:ffff880415d40000 data:ffff880415d40040 tail:0xac95 end:0x640
> dev:eth2
> -- [ cut here ] ---
> kernel BUG at net/core/skbuff.c:127!
> invalid opcode: 0000 [#1] SMP
>
> Was playing a video over samba (eth0) and then the kernel panic'd on
> eth2?
>
> I use all INTEL controllers:
>
> 00:19.0 Ethernet controller: Intel Corporation 82578DC Gigabit Network Connection (rev 05)
> 01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter (rev 01)
> 03:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 03:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 03:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 03:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
>
> Justin.

2011-06-27 22:21:26

by Justin Piszcz

Subject: RE: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!



On Mon, 27 Jun 2011, Ronciak, John wrote:

> Sorry to hear of your problem. Please do not try to relate something that happened on RHEL4 to something that is happening in the 2.6.39.2 kernel.
>
> Is the problem reproducible? If so, does it happen every time? Which interface is associated with which HW? The info included below doesn't say. What type of traffic was happening on eth2?
>
> Cheers,
> John
>

Hello John,

Not much actually, just regular cable modem network traffic (was not even
utilizing it heavily), vnstat below:

[vnstat hourly graph for eth2 at 18:11; x-axis: hours 19 through 18; per-hour values are in the table below]

 h  rx (KiB)  tx (KiB)     h  rx (KiB)  tx (KiB)     h  rx (KiB)  tx (KiB)
19     89730      8838    03     11249       620    11    126496      9383
20     10972       596    04     43551      1729    12     86326      6513
21     11489       611    05     12159       699    13     47805      3863
22     12073       630    06     12080       712    14     52322      4577
23     12153       654    07     11894       710    15    135715      8618
00     11536       594    08      7392       396    16     75680      6181
01     11159       560    09         0         0    17     33817      3213
02     11564       575    10       759        51    18      4315       190

The most recent crash was at 16:33.

The system crashed earlier too, but I had X running so I couldn't get a
screenshot. There is no serial port on this machine, and netconsole does not
work with this type of failure.

It is unfortunately not reproducible. Are there any debugging options that
you would recommend I enable that will expose this bug? I'm up for
anything at this point.

All of this may have started when I added a 4-port Intel NIC:
http://www.intel.com/Assets/PDF/prodbrief/323205.pdf

NIC = Intel Ethernet I340 Server Adapter

But this is just a guess.

Justin.



2011-06-27 22:29:03

by Ronciak, John

Subject: RE: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!

> Hello John,
>
> Not much actually, just regular cable modem network traffic (was not
> even utilizing it heavily), vnstat below:
>
> [vnstat hourly graph and table for eth2 at 18:11, as in the previous message]
>
> The most recent crash was at 16:33.
>
> The system crashed earlier too, but I had X running so I couldn't get a
> screenshot. There is no serial port on this machine, and netconsole does
> not work with this type of failure.
>
> It is unfortunately not reproducible. Are there any debugging options
> that you would recommend I enable that will expose this bug? I'm up
> for anything at this point.
>
> All of this may have started when I added a 4-port Intel NIC:
> http://www.intel.com/Assets/PDF/prodbrief/323205.pdf
>
> NIC = Intel Ethernet I340 Server Adapter
>
> But this is just a guess.
>
> Justin.

You still didn't tell us which eth interface is on which HW. We need to know that. Do an 'ethtool -i eth2' and the same for each of the other interfaces in the system. The igb driver used by the NIC described above is widely deployed, and people are not reporting this error.

Cheers,
John

2011-06-27 22:35:10

by Justin Piszcz

Subject: RE: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!



On Mon, 27 Jun 2011, Ronciak, John wrote:

>> Hello John,
>> Justin.
> You still didn't tell us what eth interface is on which HW. We need to know that. Do an 'ethtool -i eth2' and the same for each of the other interfaces in the system. The igb driver that is used on the NIC described above is in high use without people reporting this error.
>
> Cheers,
> John

Sorry,

eth0 = e1000e (on-board (on an Intel DP55KG))
eth{1,2,3,4} = igb (the 4-port NIC I mentioned)
eth5 = ixgbe (the 10GbE AT2 server board (copper))

--

[ 2.301687] e1000e 0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) hidden:mac
[ 2.301878] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
[ 2.302162] e1000e 0000:00:19.0: eth0: MAC: 9, PHY: 9, PBA No: FFFFFF-0FF

[ 2.350085] igb 0000:03:00.0: eth1: (PCIe:2.5Gb/s:Width x4) hidden:mac
[ 2.350351] igb 0000:03:00.0: eth1: PBA No: E84075-002
[ 2.400381] igb 0000:03:00.1: eth2: (PCIe:2.5Gb/s:Width x4) hidden:mac
[ 2.400486] igb 0000:03:00.1: eth2: PBA No: E84075-002
[ 2.448103] igb 0000:03:00.2: eth3: (PCIe:2.5Gb/s:Width x4) hidden:mac
[ 2.448391] igb 0000:03:00.2: eth3: PBA No: E84075-002
[ 2.496095] igb 0000:03:00.3: eth4: (PCIe:2.5Gb/s:Width x4) hidden:mac
[ 2.496377] igb 0000:03:00.3: eth4: PBA No: E84075-002

[ 38.916123] ixgbe 0000:01:00.0: eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX

--

ethtool output as requested:

ethtool -i eth0
driver: e1000e
version: 1.3.10-k2
firmware-version: 0.12-5
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

ethtool -i eth1
driver: igb
version: 3.0.6-k2
firmware-version: 3.19-0
bus-info: 0000:03:00.3
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

ethtool -i eth2
driver: igb
version: 3.0.6-k2
firmware-version: 3.19-0
bus-info: 0000:03:00.2
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

ethtool -i eth3
driver: igb
version: 3.0.6-k2
firmware-version: 3.19-0
bus-info: 0000:03:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

ethtool -i eth4
driver: igb
version: 3.0.6-k2
firmware-version: 3.19-0
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

ethtool -i eth5
driver: ixgbe
version: 3.2.9-k2
firmware-version: 2.9-0
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

Justin.
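
The interface-to-hardware mapping requested above can also be collected in one pass from sysfs rather than by running ethtool once per interface. The sketch below is an illustration only: the function name nic_inventory is made up, and it assumes the usual Linux layout where /sys/class/net/<iface>/device is a symlink to the bus device and device/driver is a symlink to the bound driver.

```python
import os

def nic_inventory(sysfs_net="/sys/class/net"):
    """Map each interface name to (driver, bus address) using sysfs symlinks."""
    result = {}
    for ifname in sorted(os.listdir(sysfs_net)):
        device = os.path.join(sysfs_net, ifname, "device")
        if not os.path.exists(device):
            continue  # virtual interfaces such as lo have no backing device
        # The symlink target's last component is the bus address,
        # for example 0000:03:00.1 for a PCI function.
        bus_info = os.path.basename(os.path.realpath(device))
        driver_link = os.path.join(device, "driver")
        driver = None
        if os.path.exists(driver_link):
            driver = os.path.basename(os.path.realpath(driver_link))
        result[ifname] = (driver, bus_info)
    return result
```

Calling nic_inventory() with no arguments reads the live system and returns a dict of interface name to (driver, bus address), which makes it easy to compare the result against the dmesg lines and lspci listing.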


2011-06-28 00:09:01

by Justin Piszcz

Subject: RE: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!


Hi,

Here's another crash (see the dmesg; it's right when powering the disks up):
http://home.comcast.net/~jpiszcz/20110627/IMG_2704.JPG

In this case, I was running mkfs.xfs -f on some disks attached to a SATA
dock through a Sil 3132 card. With that card installed, it crashed every
time with the NIC error. I disconnected the card and re-ran mkfs.xfs with
the on-board SATA controller, and the problem no longer occurred. Strange.

In any case, will let you know if there are any further crashes after
removing that PCI-e card.

Justin.

2011-06-28 15:51:17

by Duyck, Alexander H

Subject: Re: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!

On 06/27/2011 05:08 PM, Justin Piszcz wrote:
> Hi,
>
> Here's another crash (see the dmesg; it's right when powering the disks up):
> http://home.comcast.net/~jpiszcz/20110627/IMG_2704.JPG
>
> In this case, I was running mkfs.xfs -f on some disks attached to a SATA
> dock through a Sil 3132 card. With that card installed, it crashed every
> time with the NIC error. I disconnected the card and re-ran mkfs.xfs with
> the on-board SATA controller, and the problem no longer occurred. Strange.
>
> In any case, will let you know if there are any further crashes after
> removing that PCI-e card.
>
> Justin.

Justin,

One other thing you might try is downloading and installing our latest
igb driver from e1000.sf.net. It looks like you are currently using the
in-kernel driver and it is possible that there may be differences
between the two that could resolve the issue you are experiencing.

If you are able to reproduce the issue with the Sourceforge driver then
that will provide valuable information. Once we have reproduced the
issue with the Sourceforge driver, we would be able to provide you a
debug driver so that we can narrow down this issue further.

Thanks,

Alex

2011-06-30 23:03:55

by Justin Piszcz

Subject: RE: [E1000-devel] 2.6.39.2: skb_over_panic: kernel BUG at net/core/skbuff.c:127!



Hi,

Per:
http://www.mail-archive.com/[email protected]/msg04232.html

I am now using the drivers from e1000.sf.net for e1000e, igb, and ixgbe:

e1000e
version: 1.3.17-NAPI
srcversion: BA556C5C800B0D67E5F8B84

igb
version: 3.0.22
srcversion: 45B8078075068728A5A5573

ixgbe
version: 3.3.9-NAPI
srcversion: 0734B0E06E21B50A92ADDFF

No crashes when I run mkfs.xfs (w/the eSATA card back in).

Will monitor throughout to see if it recurs.
When will the current -stable versions go into mainline?

Also, is there a kernel option to 'pause' or take a screenshot of a kernel
console crash/dump/stack trace (besides kdump) and not reboot the machine
when it crashes?

I have not set any option to reboot on panic, but the machine sometimes
reboots anyway.
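
For reference, whether the kernel reboots after a panic is controlled by the kernel.panic sysctl (also settable with panic= on the kernel command line), and kernel.panic_on_oops decides whether an oops is escalated to a panic. The sketch below only reports the current values; the function names are made up for illustration, and it does not cover other causes of a spontaneous reboot, such as a hardware watchdog.

```python
import os

def panic_settings(proc_sys_kernel="/proc/sys/kernel"):
    """Read the kernel.panic and kernel.panic_on_oops sysctls, if present."""
    settings = {}
    for name in ("panic", "panic_on_oops"):
        path = os.path.join(proc_sys_kernel, name)
        try:
            with open(path) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            settings[name] = None  # not available on this system
    return settings

def describe(settings):
    """Turn the raw sysctl values into a short human-readable summary."""
    lines = []
    panic = settings.get("panic")
    if panic is None:
        lines.append("kernel.panic: unavailable")
    elif panic == 0:
        lines.append("kernel.panic=0: no automatic reboot after a panic")
    elif panic > 0:
        lines.append(f"kernel.panic={panic}: reboot {panic} s after a panic")
    else:
        # Negative values mean "reboot immediately" per current kernel docs.
        lines.append(f"kernel.panic={panic}: reboot immediately after a panic")
    oops = settings.get("panic_on_oops")
    if oops is None:
        lines.append("kernel.panic_on_oops: unavailable")
    elif oops == 0:
        lines.append("kernel.panic_on_oops=0: try to continue after an oops")
    else:
        lines.append(f"kernel.panic_on_oops={oops}: panic on any oops")
    return lines
```

Running print("\n".join(describe(panic_settings()))) on the affected machine shows the live settings. With kernel.panic=0 the panic text should stay on the console until the machine is reset by hand.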

Thanks!

Justin.