2008-02-14 23:36:03

by Reinette Chatre

Subject: sending ARP triggers BUG

Hi,

I recently started seeing the BUG below. ieee80211_subif_start_xmit
calls pskb_expand_head, but this function BUGs because the skb is
shared. So far I have only seen this with arp messages ... I don't know
the significance of this fact.

------------[ cut here ]------------
kernel BUG at .../net/core/skbuff.c:643!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: iwl3945 rfcomm l2cap bluetooth ipv6 acpi_cpufreq
cpufreq_powersave cpufe
Pid: 0, comm: swapper Not tainted (2.6.24 #3)
EIP: 0060:[<c024ef05>] EFLAGS: 00010202 CPU: 1
EIP is at pskb_expand_head+0x23/0x140
EAX: dba19d50 EBX: daa72bb8 ECX: 0000000c EDX: dba19cd0
ESI: 0000000c EDI: dba19cd2 EBP: da473d04 ESP: da473ce8
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=da472000 task=da46a000 task.ti=da472000)
Stack: 00000025 d74ee000 da473d04 000000d3 00000006 0000000c dba19cd2 da473d84
       dca9360e 00000020 c0143622 da46a57c d797f000 daa72bb8 dba19ce0 dba19cf4
       c037f108 da46a000 00000002 da46a050 0000001a 00000000 d9bb82a0 08000002
Call Trace:
[<c01050bc>] show_trace_log_lvl+0x1a/0x2f
[<c010516c>] show_stack_log_lvl+0x9b/0xa3
[<c0105218>] show_registers+0xa4/0x1d9
[<c010546e>] die+0x121/0x202
[<c02b7458>] do_trap+0x8a/0xa3
[<c010580e>] do_invalid_op+0x88/0x92
[<c02b722a>] error_code+0x72/0x78
[<dca9360e>] ieee80211_subif_start_xmit+0x330/0x56e [mac80211]
[<c02540e7>] dev_hard_start_xmit+0x24e/0x2b3
[<c02627d9>] __qdisc_run+0x74/0x16b
[<c025643e>] dev_queue_xmit+0x19f/0x2e5
[<c028f1f1>] arp_xmit+0x4b/0x51
[<c028fc08>] arp_send+0x45/0x4c
[<c02904c6>] arp_solicit+0x196/0x1aa
[<c025babb>] neigh_timer_handler+0x267/0x2a8
[<c012d1b0>] run_timer_softirq+0x142/0x1a4
[<c0129bd1>] __do_softirq+0x78/0xed
[<c0129c7f>] do_softirq+0x39/0x55
[<c0129e32>] irq_exit+0x45/0x83
[<c0115edf>] smp_apic_timer_interrupt+0x77/0x84
[<c0104b6f>] apic_timer_interrupt+0x33/0x38
[<c010259c>] cpu_idle+0x9e/0xd3
[<c0114c82>] start_secondary+0x165/0x16c
[<00000000>] 0x0
=======================
Code: f1 f4 f1 ff 5b 5e 5d c3 55 89 e5 57 56 53 89 c3 83 ec 10 83 bb a0 00 00 00 01 89 55
EIP: [<c024ef05>] pskb_expand_head+0x23/0x140 SS:ESP 0068:da473ce8
Kernel panic - not syncing: Fatal exception in interrupt


2008-02-20 18:28:27

by Johannes Berg

Subject: Re: sending ARP triggers BUG


> I recently started seeing the BUG below. ieee80211_subif_start_xmit
> calls pskb_expand_head, but this function BUGs because the skb is
> shared. So far I have only seen this with arp messages ... I don't know
> the significance of this fact.

Interestingly, I'm starting to see skb problems as well, in AP mode only
however, namely I get lots of

[ 4340.665679] SKB BUG: Invalid truesize (240) len=73, sizeof(sk_buff)=176

Anybody have an idea how to debug that? It looks like 'len' is one too
large, but I've also seen messages where it was two too large or one too
small.

johannes



2008-02-21 06:35:46

by David Miller

Subject: Re: sending ARP triggers BUG

From: Johannes Berg <[email protected]>
Date: Wed, 20 Feb 2008 01:56:08 +0100

> [ 4340.665679] SKB BUG: Invalid truesize (240) len=73, sizeof(sk_buff)=176
>
> Anybody have an idea how to debug that? It looks like 'len' is one too
> large, but I've also seen messages where it was two too large or one too
> small.

The BUG occurs when you use paged SKBs, it's different from
the other problem the person you are replying to is seeing.

The easiest thing to do to look for potentially problematic areas
is to find code that modifies skb->data_len but doesn't make
similar adjustments to skb->truesize.
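
To make the accounting rule above concrete, here is a minimal userspace sketch in C. It is not kernel code: struct model_skb and all four helper functions are hypothetical names invented for illustration. It also assumes, based only on the fields printed in the "Invalid truesize" message, that the check requires truesize to be at least sizeof(struct sk_buff) plus len. The last two helpers contrast an update that keeps truesize in step with data_len against one that forgets it, which is the pattern suggested as worth searching for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; field names mirror those mentioned in the thread. */
struct model_skb {
    unsigned int len;       /* total payload bytes (linear + paged) */
    unsigned int data_len;  /* payload bytes held in paged fragments */
    unsigned int truesize;  /* memory charged: struct overhead + buffers */
};

/* Assumed invariant: truesize must cover the struct plus all payload. */
static bool truesize_ok(const struct model_skb *skb, size_t skb_struct_size)
{
    return skb->truesize >= skb_struct_size + skb->len;
}

/* Number of bytes by which truesize falls short (0 if the invariant holds). */
static unsigned int truesize_shortfall(const struct model_skb *skb,
                                       size_t skb_struct_size)
{
    size_t need = skb_struct_size + skb->len;

    return skb->truesize >= need ? 0 : (unsigned int)(need - skb->truesize);
}

/* Consistent update: paged bytes are added to len, data_len and truesize. */
static void add_paged_data(struct model_skb *skb, unsigned int bytes)
{
    assert(skb->data_len <= skb->len);
    skb->len += bytes;
    skb->data_len += bytes;
    skb->truesize += bytes;
}

/* Inconsistent update: len and data_len grow but truesize is forgotten. */
static void add_paged_data_buggy(struct model_skb *skb, unsigned int bytes)
{
    assert(skb->data_len <= skb->len);
    skb->len += bytes;
    skb->data_len += bytes;
}
```

Under the assumed invariant, the two reports in this thread (truesize 240 with len 73 and a 176-byte struct; truesize 272 with len 71 and a 208-byte struct) fall short by 9 and 7 bytes respectively. A model buffer that satisfies the invariant stops satisfying it after add_paged_data_buggy() and keeps satisfying it after add_paged_data().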

2008-02-15 15:11:03

by Johannes Berg

Subject: Re: sending ARP triggers BUG

Hi Reinette,

> I recently started seeing the BUG below. ieee80211_subif_start_xmit
> calls pskb_expand_head, but this function BUGs because the skb is
> shared.

I've been expecting this for a while now since I never found where on
mac80211's input path we make sure the skb isn't cloned/shared... I say
we just put an skb_unshare() into there since we need to modify the skb
once we get it.

johannes
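
The idea of getting a private buffer before modifying it can be illustrated with a small userspace model in C. This is only a sketch and not the mac80211 fix: struct model_buf, buf_alloc(), buf_get(), buf_put() and buf_make_private() are hypothetical names, and the model uses a plain reference count where the kernel distinguishes shared from cloned skbs. In this model the caller's reference is consumed whether or not the copy succeeds, which is a design choice of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model of a reference-counted packet buffer. */
struct model_buf {
    int users;            /* number of holders of this buffer */
    size_t len;           /* payload length in bytes */
    unsigned char *data;  /* payload storage */
};

/* Allocate a buffer holding a copy of src; returns NULL on failure. */
static struct model_buf *buf_alloc(const void *src, size_t len)
{
    struct model_buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->data = malloc(len ? len : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    if (len)
        memcpy(b->data, src, len);
    b->users = 1;
    b->len = len;
    return b;
}

/* Take an additional reference. */
static struct model_buf *buf_get(struct model_buf *b)
{
    b->users++;
    return b;
}

/* Drop a reference and free the buffer when the last holder is gone. */
static void buf_put(struct model_buf *b)
{
    assert(b->users > 0);
    if (--b->users == 0) {
        free(b->data);
        free(b);
    }
}

/*
 * Return a buffer that only the caller holds, so it is safe to modify.
 * If other holders exist, copy the payload and drop the caller's
 * reference to the original. Returns NULL if the copy cannot be allocated.
 */
static struct model_buf *buf_make_private(struct model_buf *b)
{
    struct model_buf *copy;

    if (b->users == 1)
        return b;             /* already private, nothing to do */
    copy = buf_alloc(b->data, b->len);
    buf_put(b);               /* the caller's reference is consumed either way */
    return copy;
}
```

A transmit path in this model would call buf_make_private() first and only then grow or rewrite the header, so other holders of the original buffer never observe the modification.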



2008-03-07 07:04:22

by Zhu Yi

Subject: Re: sending ARP triggers BUG


On Wed, 2008-02-20 at 22:36 -0800, David Miller wrote:
> From: Johannes Berg <[email protected]>
> Date: Wed, 20 Feb 2008 01:56:08 +0100

> > Interestingly, I'm starting to see skb problems as well, in AP mode
> > only however, namely I get lots of
> >
> > [ 4340.665679] SKB BUG: Invalid truesize (240) len=73, sizeof(sk_buff)=176

OK. I started to see this also after I began to play with the AP mode.

SKB BUG: Invalid truesize (272) len=71, sizeof(sk_buff)=208

I get this for every ping packet from the AP to the client and only
occasionally if I ping AP from client. According to the call trace, it's
from the AP receive path.

Call Trace:
[<ffffffff804101d0>] ? sock_rfree+0x22/0x51
[<ffffffff8041394d>] ? skb_release_all+0x86/0xbe
[<ffffffff80413151>] ? __kfree_skb+0x9/0x6f
[<ffffffff8041596f>] ? skb_free_datagram+0xc/0x31
[<ffffffff8047a68d>] ? packet_recvmsg+0x174/0x187
[<ffffffff8040d807>] ? sock_recvmsg+0xf0/0x10f
[<ffffffff80370499>] ? n_tty_receive_buf+0xdc8/0xe20
[<ffffffff802468d2>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff8029aa5b>] ? core_sys_select+0x232/0x263
[<ffffffff80273316>] ? __do_fault+0x38f/0x3da
[<ffffffff8040e87c>] ? sys_recvfrom+0xbc/0x120
[<ffffffff8020bfd9>] ? tracesys+0xdc/0xe1


> > Anybody have an idea how to debug that? It looks like 'len' is one too
> > large, but I've also seen messages where it was two too large or one too
> > small.
>
> The BUG occurs when you use paged SKBs, it's different from
> the other problem the person you are replying to is seeing.
>
> The easiest thing to do to look for potentially problematic areas
> is to find code that modifies skb->data_len but doesn't make
> similar adjustments to skb->truesize.

From my search results, most such code is in tcp/ip and skbuff.c.
None is in wireless, mac80211 or the drivers. So it looks like this is not
wireless-specific? Has anyone made any progress on this?

Thanks,
-yi


2008-03-07 08:24:35

by Johannes Berg

Subject: Re: sending ARP triggers BUG


On Fri, 2008-03-07 at 15:04 +0800, Zhu Yi wrote:
> On Wed, 2008-02-20 at 22:36 -0800, David Miller wrote:
> > From: Johannes Berg <[email protected]>
> > Date: Wed, 20 Feb 2008 01:56:08 +0100
>
> > > Interestingly, I'm starting to see skb problems as well, in AP mode
> > > only however, namely I get lots of
> > >
> > > [ 4340.665679] SKB BUG: Invalid truesize (240) len=73, sizeof(sk_buff)=176
>
> OK. I started to see this also after I began to play with the AP mode.
>
> SKB BUG: Invalid truesize (272) len=71, sizeof(sk_buff)=208
>
> I get this for every ping packet from the AP to the client and only
> occasionally if I ping AP from client. According to the call trace, it's
> from the AP receive path.

You get this whenever you have monitor interfaces operating at the same
time. At least cooked monitor. Haven't had a chance to investigate, but
some change in the rest of the kernel seems to have caused this. Might
well be that we're doing something wrong with the SKBs in our monitor
functions, though.

johannes

