2008-11-28 21:16:19

by Marcin Ślusarz

[permalink] [raw]
Subject: networking oops after resume from s2ram (2.6.28-rc6)

Hi

Sometimes after resume from s2ram networking doesn't work, so I restart it by
/etc/init.d/net.eth1 restart. Recently it started to lock up my box completely,
but today it oopsed only (and killed my keyboard, so I had to save dmesg with
mouse :D).

It looks like it tries to use netconsole without working network interface...

[ 1621.013789] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1621.013793] IP: [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
[ 1621.013802] PGD 16880067 PUD 16894067 PMD 0
[ 1621.013806] Oops: 0000 [#1] PREEMPT
[ 1621.013808] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
[ 1621.013812] Dumping ftrace buffer:
[ 1621.013814] (ftrace buffer empty)
[ 1621.013816] CPU 0
[ 1621.013819] Pid: 6725, comm: ip Not tainted 2.6.28-rc6-stack+mode #51
[ 1621.013821] RIP: 0010:[<ffffffff803f5db9>] [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
[ 1621.013825] RSP: 0018:ffff8800168c38b8 EFLAGS: 00010006
[ 1621.013827] RAX: 000000000000007f RBX: ffff88003e9c7090 RCX: 0000000000000001
[ 1621.013829] RDX: 0000000000000001 RSI: ffff88003e8b2840 RDI: ffff88003e95d700
[ 1621.013832] RBP: ffff8800168c3918 R08: 0000000000000002 R09: 0000000000000000
[ 1621.013834] R10: 0000000000000006 R11: 0000000000000000 R12: ffff88003e95d700
[ 1621.013836] R13: ffff88003e9c7090 R14: 0000000000000000 R15: 0000000000000001
[ 1621.013838] FS: 00007f47e467f6f0(0000) GS:ffffffff80738180(0000) knlGS:0000000000000000
[ 1621.013841] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1621.013843] CR2: 0000000000000000 CR3: 000000001689a000 CR4: 00000000000006e0
[ 1621.013845] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1621.013847] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1621.013850] Process ip (pid: 6725, threadinfo ffff8800168c2000, task ffff88001688da30)
[ 1621.013851] Stack:
[ 1621.013853] ffff880026921800 0000000080737c20 ffff88003e8b2000 ffff88003e8b2840
[ 1621.013856] ffff88003e87cf00 ffff88003e8b2000 0000000000000000 ffff88003e9ec120
[ 1621.013860] 0000000000000082 ffff88003e8b2000 0000000000000000 0000000000000001
[ 1621.013864] Call Trace:
[ 1621.013866] [<ffffffff804f5a28>] netpoll_send_skb+0xcd/0x196
[ 1621.013871] [<ffffffff804f5fe7>] netpoll_send_udp+0x202/0x20e
[ 1621.013874] [<ffffffff80400081>] write_msg+0x80/0xbf
[ 1621.013879] [<ffffffff8023220d>] __call_console_drivers+0x58/0x69
[ 1621.013884] [<ffffffff8023227f>] _call_console_drivers+0x61/0x66
[ 1621.013887] [<ffffffff802323b5>] release_console_sem+0x131/0x1d4
[ 1621.013890] [<ffffffff80232a08>] vprintk+0x389/0x3b8
[ 1621.013894] [<ffffffff80232a9e>] printk+0x67/0x69
[ 1621.013897] [<ffffffff804e7785>] ? dev_set_rx_mode+0x19/0x2e
[ 1621.013902] [<ffffffff80252e95>] ? __lock_acquire+0x6dd/0x73b
[ 1621.013905] [<ffffffff80251379>] ? mark_held_locks+0x52/0x72
[ 1621.013908] [<ffffffff802361f5>] ? local_bh_enable_ip+0xba/0xd6
[ 1621.013912] [<ffffffff803f2a07>] skge_up+0x7c/0x88a
[ 1621.013916] [<ffffffff804e7795>] ? dev_set_rx_mode+0x29/0x2e
[ 1621.013920] [<ffffffff802361f5>] ? local_bh_enable_ip+0xba/0xd6
[ 1621.013923] [<ffffffff804eb050>] dev_open+0x72/0xa6
[ 1621.013926] [<ffffffff804e9211>] dev_change_flags+0xa8/0x167
[ 1621.013929] [<ffffffff80527683>] devinet_ioctl+0x26a/0x5df
[ 1621.013933] [<ffffffff8052845a>] inet_ioctl+0x92/0xaa
[ 1621.013936] [<ffffffff804de260>] sock_ioctl+0x18c/0x1b8
[ 1621.013939] [<ffffffff8029ed28>] vfs_ioctl+0x2a/0x77
[ 1621.013943] [<ffffffff8029f110>] do_vfs_ioctl+0x39b/0x3e5
[ 1621.013946] [<ffffffff802479ad>] ? up_read+0x26/0x2b
[ 1621.013950] [<ffffffff8020b36c>] ? sysret_check+0x27/0x62
[ 1621.013953] [<ffffffff8029f19c>] sys_ioctl+0x42/0x65
[ 1621.013956] [<ffffffff8020b33b>] system_call_fastpath+0x16/0x1b
[ 1621.013960] Code: 48 c1 f8 03 0f b7 52 04 69 c0 cd cc cc cc 8d 44 30 ff ff c2 39 d0 0f 8c 00 03 00 00 48 8b 75 b8 4c 8b ae 98 00 00 00 4d 8b 75 08 <41> 83 3e 00 79 04 0f 0b eb fe 4d 89 65 10 41 8b 44 24 68 45 31
[ 1621.013988] RIP [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
[ 1621.013991] RSP <ffff8800168c38b8>
[ 1621.013993] CR2: 0000000000000000
[ 1621.013995] ---[ end trace 50a82573cfe89df3 ]---
[ 1621.013998] note: ip[6725] exited with preempt_count 3

Full demsg and config can be found at:
http://www.kadu.net/~joi/kernel/2008.11.28/skge-net.oops/

Marcin


2008-11-30 03:37:20

by Andrew Morton

[permalink] [raw]
Subject: Re: networking oops after resume from s2ram (2.6.28-rc6)

On Fri, 28 Nov 2008 22:15:40 +0100 Marcin Slusarz <[email protected]> wrote:

> Hi
>
> Sometimes after resume from s2ram networking doesn't work, so I restart it by
> /etc/init.d/net.eth1 restart. Recently it started to lock up my box completely,
> but today it oopsed only (and killed my keyboard, so I had to save dmesg with
> mouse :D).
>
> It looks like it tries to use netconsole without working network interface...
>
> [ 1621.013789] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [ 1621.013793] IP: [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
> [ 1621.013802] PGD 16880067 PUD 16894067 PMD 0
> [ 1621.013806] Oops: 0000 [#1] PREEMPT
> [ 1621.013808] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> [ 1621.013812] Dumping ftrace buffer:
> [ 1621.013814] (ftrace buffer empty)
> [ 1621.013816] CPU 0
> [ 1621.013819] Pid: 6725, comm: ip Not tainted 2.6.28-rc6-stack+mode #51
> [ 1621.013821] RIP: 0010:[<ffffffff803f5db9>] [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba

skge driver went splat, I guess.

It would be fun to try using this:


From: Arjan van de Ven <[email protected]>

We're struggling all the time to figure out where the code came from that
oopsed.. The script below (a adaption from a script used by
kerneloops.org) can help developers quite a bit, at least for non-module
cases.

It works and looks like this:

[/home/arjan/linux]$ dmesg | perl scripts/markup_oops.pl vmlinux
{
struct agp_memory *memory;

memory = agp_allocate_memory(agp_bridge, pg_count, type);
c055c10f: 89 c2 mov %eax,%edx
if (memory == NULL)
c055c111: 74 19 je c055c12c <agp_allocate_memory_wrap+0x30>
/* This function must only be called when current_controller != NULL */
static void agp_insert_into_pool(struct agp_memory * temp)
{
struct agp_memory *prev;

prev = agp_fe.current_controller->pool;
c055c113: a1 ec dc 8f c0 mov 0xc08fdcec,%eax
*c055c118: 8b 40 10 mov 0x10(%eax),%eax <----- faulting instruction

if (prev != NULL) {
c055c11b: 85 c0 test %eax,%eax
c055c11d: 74 05 je c055c124 <agp_allocate_memory_wrap+0x28>
prev->prev = temp;
c055c11f: 89 50 04 mov %edx,0x4(%eax)
temp->next = prev;
c055c122: 89 02 mov %eax,(%edx)
}
agp_fe.current_controller->pool = temp;
c055c124: a1 ec dc 8f c0 mov 0xc08fdcec,%eax
c055c129: 89 50 10 mov %edx,0x10(%eax)
if (memory == NULL)
return NULL;

agp_insert_into_pool(memory);

so in this case, we faulted while dereferencing agp_fe.current_controller pointer, and we get to see
exactly which function and line it affects... Personally I find this very useful, and I can
see value for having this script in the kernel for more-than-just-me to use.

Caveats:
* It only works for oopses not-in-modules
* It only works nicely for kernels compiled with CONFIG_DEBUG_INFO
* It's not very fast.
* It only works on x86

Signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

scripts/markup_oops.pl | 162 +++++++++++++++++++++++++++++++++++++++
1 file changed, 162 insertions(+)

diff -puN /dev/null scripts/markup_oops.pl
--- /dev/null
+++ a/scripts/markup_oops.pl
@@ -0,0 +1,162 @@
+#!/usr/bin/perl -w
+
+# Copyright 2008, Intel Corporation
+#
+# This file is part of the Linux kernel
+#
+# This program file is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; version 2 of the License.
+#
+# Authors:
+# Arjan van de Ven <[email protected]>
+
+
+my $vmlinux_name = $ARGV[0];
+
+#
+# Step 1: Parse the oops to find the EIP value
+#
+
+my $target = "0";
+while (<STDIN>) {
+ if ($_ =~ /EIP: 0060:\[\<([a-z0-9]+)\>\]/) {
+ $target = $1;
+ }
+}
+
+if ($target =~ /^f8/) {
+ print "This script does not work on modules ... \n";
+ exit;
+}
+
+if ($target eq "0") {
+ print "No oops found!\n";
+ print "Usage: \n";
+ print " dmesg | perl scripts/markup_oops.pl vmlinux\n";
+ exit;
+}
+
+my $counter = 0;
+my $state = 0;
+my $center = 0;
+my @lines;
+
+sub InRange {
+ my ($address, $target) = @_;
+ my $ad = "0x".$address;
+ my $ta = "0x".$target;
+ my $delta = hex($ad) - hex($ta);
+
+ if (($delta > -4096) && ($delta < 4096)) {
+ return 1;
+ }
+ return 0;
+}
+
+
+
+# first, parse the input into the lines array, but to keep size down,
+# we only do this for 4Kb around the sweet spot
+
+my $filename;
+
+open(FILE, "objdump -dS $vmlinux_name |") || die "Cannot start objdump";
+
+while (<FILE>) {
+ my $line = $_;
+ chomp($line);
+ if ($state == 0) {
+ if ($line =~ /^([a-f0-9]+)\:/) {
+ if (InRange($1, $target)) {
+ $state = 1;
+ }
+ }
+ } else {
+ if ($line =~ /^([a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]+)\:/) {
+ my $val = $1;
+ if (!InRange($val, $target)) {
+ last;
+ }
+ if ($val eq $target) {
+ $center = $counter;
+ }
+ }
+ $lines[$counter] = $line;
+
+ $counter = $counter + 1;
+ }
+}
+
+close(FILE);
+
+if ($counter == 0) {
+ print "No matching code found \n";
+ exit;
+}
+
+if ($center == 0) {
+ print "No matching code found \n";
+ exit;
+}
+
+my $start;
+my $finish;
+my $codelines = 0;
+my $binarylines = 0;
+# now we go up and down in the array to find how much we want to print
+
+$start = $center;
+
+while ($start > 1) {
+ $start = $start - 1;
+ my $line = $lines[$start];
+ if ($line =~ /^([a-f0-9]+)\:/) {
+ $binarylines = $binarylines + 1;
+ } else {
+ $codelines = $codelines + 1;
+ }
+ if ($codelines > 10) {
+ last;
+ }
+ if ($binarylines > 20) {
+ last;
+ }
+}
+
+
+$finish = $center;
+$codelines = 0;
+$binarylines = 0;
+while ($finish < $counter) {
+ $finish = $finish + 1;
+ my $line = $lines[$finish];
+ if ($line =~ /^([a-f0-9]+)\:/) {
+ $binarylines = $binarylines + 1;
+ } else {
+ $codelines = $codelines + 1;
+ }
+ if ($codelines > 10) {
+ last;
+ }
+ if ($binarylines > 20) {
+ last;
+ }
+}
+
+
+my $i;
+
+my $fulltext = "";
+$i = $start;
+while ($i < $finish) {
+ if ($i == $center) {
+ $fulltext = $fulltext . "*$lines[$i] <----- faulting instruction\n";
+ } else {
+ $fulltext = $fulltext . " $lines[$i]\n";
+ }
+ $i = $i +1;
+}
+
+print $fulltext;
+
_

2008-11-30 12:59:47

by Marcin Ślusarz

[permalink] [raw]
Subject: Re: networking oops after resume from s2ram (2.6.28-rc6)

On Sat, Nov 29, 2008 at 07:36:56PM -0800, Andrew Morton wrote:
> On Fri, 28 Nov 2008 22:15:40 +0100 Marcin Slusarz <[email protected]> wrote:
>
> > Hi
> >
> > Sometimes after resume from s2ram networking doesn't work, so I restart it by
> > /etc/init.d/net.eth1 restart. Recently it started to lock up my box completely,
> > but today it oopsed only (and killed my keyboard, so I had to save dmesg with
> > mouse :D).
> >
> > It looks like it tries to use netconsole without working network interface...
> >
> > [ 1621.013789] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> > [ 1621.013793] IP: [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
> > [ 1621.013802] PGD 16880067 PUD 16894067 PMD 0
> > [ 1621.013806] Oops: 0000 [#1] PREEMPT
> > [ 1621.013808] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> > [ 1621.013812] Dumping ftrace buffer:
> > [ 1621.013814] (ftrace buffer empty)
> > [ 1621.013816] CPU 0
> > [ 1621.013819] Pid: 6725, comm: ip Not tainted 2.6.28-rc6-stack+mode #51
> > [ 1621.013821] RIP: 0010:[<ffffffff803f5db9>] [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
>
> skge driver went splat, I guess.
>
> It would be fun to try using this:

I had to "add support" for x86_64 oopses. Patch at the end of mail.

>
> From: Arjan van de Ven <[email protected]>
>
> We're struggling all the time to figure out where the code came from that
> oopsed.. The script below (a adaption from a script used by
> kerneloops.org) can help developers quite a bit, at least for non-module
> cases.
>
> It works and looks like this:
>
> [/home/arjan/linux]$ dmesg | perl scripts/markup_oops.pl vmlinux
> (...)
> diff -puN /dev/null scripts/markup_oops.pl
> --- /dev/null
> +++ a/scripts/markup_oops.pl
> (...)

ffffffff803f5db9
u64 map;

if (skb_padto(skb, ETH_ZLEN))
return NETDEV_TX_OK;

if (unlikely(skge_avail(&skge->tx_ring) < skb_shinfo(skb)->nr_frags + 1))
ffffffff803f5d75: 41 8b 94 24 a8 00 00 mov 0xa8(%r12),%edx
ffffffff803f5d7c: 00
ffffffff803f5d7d: 48 2b 41 08 sub 0x8(%rcx),%rax
ffffffff803f5d81: 49 03 94 24 b0 00 00 add 0xb0(%r12),%rdx
ffffffff803f5d88: 00
ffffffff803f5d89: b9 01 00 00 00 mov $0x1,%ecx
ffffffff803f5d8e: 48 c1 f8 03 sar $0x3,%rax
ffffffff803f5d92: 0f b7 52 04 movzwl 0x4(%rdx),%edx
ffffffff803f5d96: 69 c0 cd cc cc cc imul $0xcccccccd,%eax,%eax
ffffffff803f5d9c: 8d 44 30 ff lea -0x1(%rax,%rsi,1),%eax
ffffffff803f5da0: ff c2 inc %edx
ffffffff803f5da2: 39 d0 cmp %edx,%eax
ffffffff803f5da4: 0f 8c 00 03 00 00 jl ffffffff803f60aa <skge_xmit_frame+0x3a9>
return NETDEV_TX_BUSY;

e = skge->tx_ring.to_use;
ffffffff803f5daa: 48 8b 75 b8 mov -0x48(%rbp),%rsi
ffffffff803f5dae: 4c 8b ae 98 00 00 00 mov 0x98(%rsi),%r13
td = e->desc;
ffffffff803f5db5: 4d 8b 75 08 mov 0x8(%r13),%r14
BUG_ON(td->control & BMU_OWN);
*ffffffff803f5db9: 41 83 3e 00 cmpl $0x0,(%r14) <----- faulting instruction
ffffffff803f5dbd: 79 04 jns ffffffff803f5dc3 <skge_xmit_frame+0xc2>
ffffffff803f5dbf: 0f 0b ud2a
ffffffff803f5dc1: eb fe jmp ffffffff803f5dc1 <skge_xmit_frame+0xc0>
e->skb = skb;
ffffffff803f5dc3: 4d 89 65 10 mov %r12,0x10(%r13)
return skb->data_len;
}

static inline unsigned int skb_headlen(const struct sk_buff *skb)
{
return skb->len - skb->data_len;
ffffffff803f5dc7: 41 8b 44 24 68 mov 0x68(%r12),%eax
}

static inline dma_addr_t


---
From: Marcin Slusarz <[email protected]>
Subject: [PATCH] markup_oops.pl: "add support" for x86_64

Find instruction pointer in x86_64 oopses.

-w removed because it spammed with:
"Hexadecimal number > 0xffffffff non-portable at scripts/markup_oops.pl line 52, <FILE> line 383."

Signed-off-by: Marcin Slusarz <[email protected]>
Cc: Arjan van de Ven <[email protected]>
---
scripts/markup_oops.pl | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/scripts/markup_oops.pl b/scripts/markup_oops.pl
index 700a7a6..7038679 100644
--- a/scripts/markup_oops.pl
+++ b/scripts/markup_oops.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl

# Copyright 2008, Intel Corporation
#
@@ -23,6 +23,9 @@ while (<STDIN>) {
if ($_ =~ /EIP: 0060:\[\<([a-z0-9]+)\>\]/) {
$target = $1;
}
+ if ($_ =~ /RIP: 0010:\[\<([a-z0-9]+)\>\]/) {
+ $target = $1;
+ }
}

if ($target =~ /^f8/) {
--