LinuxLists.cc - Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

2010-12-18 11:41:38

Subject: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

Hello all,

About a year ago I installed my favorite Debian testing on my new x86_64 platform, on which I like to run my own kernel
config. Everything worked fine until 2.6.35 was released.

I first hit this issue while testing an Ubuntu upgrade to the latest release, 10.10, against which I opened this bug
report <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/618636>: in short, the system panics on the first
connection to my ISP.

This still occurs for me with the latest Ubuntu stock Linux image 2.6.35 as well as with my personal config built from vanilla sources.

Since this kind of problem is usually identified and solved quickly, I ran some more tests with 2.6.36 and even 2.6.37-rc?
releases, in SMP or UP config, with or without netfilter, with or without the out-of-tree nvidia graphics driver, but the
issue is still there for me.

Obviously, I also searched with Google, but without much success. So I tried to figure out when this problem appeared
in the development history and found that it was already in 2.6.35-rc1. Going further back in time, everything worked
fine up to 2.6.34-git5, but the problem appears in 2.6.34-git[67]. For my pppoe issue, there are a few changes that I
tried to revert, without success.

The latest related item I could find is this regression report 2.6.35 -> 2.6.36
<http://www.spinics.net/lists/netdev/msg145824.html>. With the debugging patch proposed by Jarek Poplawski the ppp
interface is effectively dropped, but the system no longer panics, so I can get the following kernel log:
Dec 17 17:26:53 sidh2 kernel: [ 25.941424] eth1: Link up
Dec 17 17:26:53 sidh2 kernel: [ 25.941736] eth1: Link changed: 10Mbps, half duplex
Dec 17 17:26:53 sidh2 kernel: [ 26.266657] r8169 0000:08:00.0: eth0: link up
Dec 17 17:26:53 sidh2 kernel: [ 26.266662] r8169 0000:08:00.0: eth0: link up
Dec 17 17:26:53 sidh2 kernel: [ 26.829432] ip_tables: (C) 2000-2006 Netfilter Core Team
Dec 17 17:26:58 sidh2 kernel: [ 33.017352] ppdev: user-space parallel port driver
Dec 17 17:27:04 sidh2 kernel: [ 36.357885] eth0: no IPv6 routers present
Dec 17 17:27:04 sidh2 kernel: [ 36.517833] eth1: no IPv6 routers present
Dec 17 17:28:12 sidh2 kernel: [ 106.511762] BUG: unable to handle kernel NULL pointer dereference at (null)
Dec 17 17:28:12 sidh2 kernel: [ 106.511952] IP: [<ffffffff810e405b>] put_page+0x1b/0x180
Dec 17 17:28:12 sidh2 kernel: [ 106.512078] PGD 13c6f0067 PUD 134283067 PMD 0
Dec 17 17:28:12 sidh2 kernel: [ 106.512293] Oops: 0000 [#1] SMP
Dec 17 17:28:12 sidh2 kernel: [ 106.512456] last sysfs file: /sys/devices/virtual/sound/timer/uevent
Dec 17 17:28:12 sidh2 kernel: [ 106.512533] CPU 4
Dec 17 17:28:12 sidh2 kernel: [ 106.512584] Modules linked in: ppdev xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle ip_tables
x_tables lp nvidia(P) parport_pc parport agpgart i7core_edac edac_core tpm_tis tpm tpm_bios [last unloaded: scsi_wait_scan]
Dec 17 17:28:12 sidh2 kernel: [ 106.513674]
Dec 17 17:28:12 sidh2 kernel: [ 106.513742] Pid: 1821, comm: pppd Tainted: P 2.6.35-amd64-t3 #5 EX58-UD3R/EX58-UD3R
Dec 17 17:28:12 sidh2 kernel: [ 106.513838] RIP: 0010:[<ffffffff810e405b>] [<ffffffff810e405b>] put_page+0x1b/0x180
Dec 17 17:28:12 sidh2 kernel: [ 106.513982] RSP: 0018:ffff88013cecfe08 EFLAGS: 00010292
Dec 17 17:28:12 sidh2 kernel: [ 106.514056] RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000000001733
Dec 17 17:28:12 sidh2 kernel: [ 106.514134] RDX: ffff88013c906640 RSI: ffff88013c906060 RDI: 0000000000000000
Dec 17 17:28:12 sidh2 kernel: [ 106.514211] RBP: ffff88013c86e900 R08: ffffffff81342b90 R09: 00000000ffe22e08
Dec 17 17:28:12 sidh2 kernel: [ 106.514288] R10: ffff88013cece000 R11: 0000000000000000 R12: 00000000080d5f42
Dec 17 17:28:12 sidh2 kernel: [ 106.514366] R13: ffff88013c86f400 R14: 000000000000000a R15: 00000000000005de
Dec 17 17:28:12 sidh2 kernel: [ 106.514444] FS: 0000000000000000(0000) GS:ffff880001f00000(0063) knlGS:00000000f75f5b20
Dec 17 17:28:12 sidh2 kernel: [ 106.514539] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Dec 17 17:28:12 sidh2 kernel: [ 106.514614] CR2: 0000000000000000 CR3: 000000013c758000 CR4: 00000000000006e0
Dec 17 17:28:12 sidh2 kernel: [ 106.514692] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 17 17:28:12 sidh2 kernel: [ 106.514769] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 17 17:28:12 sidh2 kernel: [ 106.514847] Process pppd (pid: 1821, threadinfo ffff88013cece000, task ffff88013c7ea120)
Dec 17 17:28:12 sidh2 kernel: [ 106.514941] Stack:
Dec 17 17:28:12 sidh2 kernel: [ 106.515008] 0000000000000000 0000000000000001 ffff88013c86e900 00000000080d5f42
Dec 17 17:28:12 sidh2 kernel: [ 106.515227] <0> ffff88013c86f400 ffffffff814288b7 ffff88013c523c00 ffff88013c86e900
Dec 17 17:28:12 sidh2 kernel: [ 106.515556] <0> ffff88013c523c00 ffffffff814283c1 ffff88013c86e900 ffffffff81342d1c
Dec 17 17:28:12 sidh2 kernel: [ 106.515948] Call Trace:
Dec 17 17:28:12 sidh2 kernel: [ 106.516022] [<ffffffff814288b7>] ? skb_release_data+0x77/0xd0
Dec 17 17:28:12 sidh2 kernel: [ 106.516099] [<ffffffff814283c1>] ? __kfree_skb+0x11/0x90
Dec 17 17:28:12 sidh2 kernel: [ 106.516176] [<ffffffff81342d1c>] ? ppp_read+0x18c/0x210
Dec 17 17:28:12 sidh2 kernel: [ 106.516252] [<ffffffff8104df70>] ? default_wake_function+0x0/0x20
Dec 17 17:28:12 sidh2 kernel: [ 106.516329] [<ffffffff8111f905>] ? vfs_read+0xb5/0x1a0
Dec 17 17:28:12 sidh2 kernel: [ 106.516404] [<ffffffff8111fa3e>] ? sys_read+0x4e/0x90
Dec 17 17:28:12 sidh2 kernel: [ 106.516479] [<ffffffff810996ff>] ? compat_sys_gettimeofday+0x9f/0xb0
Dec 17 17:28:12 sidh2 kernel: [ 106.516558] [<ffffffff8103eb8f>] ? sysenter_dispatch+0x7/0x2e
Dec 17 17:28:12 sidh2 kernel: [ 106.516632] Code: cc bb f8 ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 5c 24
08 48 89 6c 24 10 48 89 fb 4c 89 64 24 18 4c 89 6c 24 20 <48> 8b 07 66 a9 00 c0 0f 85 13 01 00 00 f0 ff 4f 08 0f 94 c0 84
Dec 17 17:28:12 sidh2 kernel: [ 106.519823] RIP [<ffffffff810e405b>] put_page+0x1b/0x180
Dec 17 17:28:12 sidh2 kernel: [ 106.519945] RSP <ffff88013cecfe08>
Dec 17 17:28:12 sidh2 kernel: [ 106.520015] CR2: 0000000000000000
Dec 17 17:28:12 sidh2 kernel: [ 106.520093] ---[ end trace bac76655d5f488ec ]---

Sorry, but I don't have enough knowledge to understand this problem, so any help will be appreciated.

Tia,
J.

2010-12-20 11:03:44

by Joel Soete

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

On 12/18/2010 11:33 AM, Joel Soete wrote:
> Hello all,
>
[snip]
>
> As in general this kind of problem is quickly identified and solved, I
> do some more test with 2.6.36 and even 2.6.37-rc? releases, in smp or up
> config, with or without netfilter, with or without foreign nvidia
> graphical driver but this issue is still there for me?
>
[snip]

Hello,

Just a small update: I read that there is a pppoe fix in the latest testing kernel, 2.6.37-rc6, but unfortunately it
doesn't help me :<(.

Tx,
J.

PS: I also tested 32-bit and 64-bit kernels, without any success.

2010-12-22 08:26:01

by Andrew Morton

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

(cc netdev)

The bug is still present in 2.6.37-rc6.

On Sat, 18 Dec 2010 11:33:14 +0000 Joel Soete <[email protected]> wrote:

> Hello all,
>
> [snip]

2010-12-22 11:00:31

by Jarek Poplawski

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

On 2010-12-22 09:22, Andrew Morton wrote:
> (cc netdev)
>
> The bug is still present in 2.6.37-rc6.
>
> On Sat, 18 Dec 2010 11:33:14 +0000 Joel Soete <[email protected]> wrote:
>
>> Hello all,

Hi,
Could you reproduce this bug with a vanilla kernel (without nvidia
patch)? If so, please include dmesg and .config to the next report.

Cheers,
Jarek P.

...
>> Dec 17 17:28:12 sidh2 kernel: [ 106.511762] BUG: unable to handle kernel NULL pointer dereference at (null)
>> Dec 17 17:28:12 sidh2 kernel: [ 106.511952] IP: [<ffffffff810e405b>] put_page+0x1b/0x180
>> Dec 17 17:28:12 sidh2 kernel: [ 106.512078] PGD 13c6f0067 PUD 134283067 PMD 0
>> Dec 17 17:28:12 sidh2 kernel: [ 106.512293] Oops: 0000 [#1] SMP
>> Dec 17 17:28:12 sidh2 kernel: [ 106.512456] last sysfs file: /sys/devices/virtual/sound/timer/uevent
>> Dec 17 17:28:12 sidh2 kernel: [ 106.512533] CPU 4
>> Dec 17 17:28:12 sidh2 kernel: [ 106.512584] Modules linked in: ppdev xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle ip_tables
>> x_tables lp nvidia(P) parport_pc parport agpgart i7core_edac edac_core tpm_tis tpm tpm_bios [last unloaded: scsi_wait_scan]
>> Dec 17 17:28:12 sidh2 kernel: [ 106.513674]
>> Dec 17 17:28:12 sidh2 kernel: [ 106.513742] Pid: 1821, comm: pppd Tainted: P 2.6.35-amd64-t3 #5 EX58-UD3R/EX58-UD3R
...

2010-12-22 16:25:08

by Eric Dumazet

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

On Wednesday 22 December 2010 at 17:00 +0100, Joel Soete wrote:
> Hello Jarek,
>
> Nice to read you :<)
>
> On 12/22/2010 12:00 PM, Jarek Poplawski wrote:
> > On 2010-12-22 09:22, Andrew Morton wrote:
> >> (cc netdev)
> >>
> >> The bug is still present in 2.6.37-rc6.
> >>
> >> On Sat, 18 Dec 2010 11:33:14 +0000 Joel Soete<[email protected]> wrote:
> >>
> >>> Hello all,
> >
> > Hi,
> > Could you reproduce this bug with a vanilla kernel (without nvidia
> > patch)? If so, please include dmesg and .config to the next report.
> >
> Yes (it was already a vanilla kernel but 2.6.35 with my config, even thought same issue occurs some other distro stock
> kernel 2.6.35), but here are some more dmesg with vanilla 2.6.37-rc6 and rc7 (I just added your debugging patch
> I found here, just because if I don't do it kernel is panicing immediately without letting any chance to capture dmesg (and
> unfortunately I don't have any more chance to grab panic messages from serial console: no more rs232 on latest office laptop :<)
>
> So you will find here attached personal config files of 2 kernels and respective dmesg.
>
> If ever you need more details, don't hesitate to ask me.
>
> Thanks a lot,
> J.

Something overwrites nr_frags in skb_shinfo(skb).

Since skb_shinfo follows the head portion of an skb, something is overflowing the skb
head.

Please try adding some room, as in the following patch:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e6ba898..adf2834 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -187,6 +187,7 @@ enum {
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
+	char filler[64];
 	unsigned short	nr_frags;
 	unsigned short	gso_size;
 	/* Warning: this field is not always filled in (UFO)! */
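
The diagnosis above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: `toy_shinfo` and `nr_frags_after_overrun` are hypothetical names, the sizes are arbitrary, and the only assumption carried over from the thread is that the metadata sits directly after the packet data in the same allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct skb_shared_info. */
struct toy_shinfo {
	unsigned short nr_frags;
	unsigned short gso_size;
};

/*
 * Build one allocation laid out as:
 *   [ head_len bytes of packet data ][ filler_len bytes ][ toy_shinfo ]
 * then let a "driver" write head_len + overrun bytes of 0xff into it and
 * return the nr_frags value that the metadata holds afterwards.
 */
static unsigned short nr_frags_after_overrun(size_t head_len,
					     size_t filler_len,
					     size_t overrun)
{
	size_t total = head_len + filler_len + sizeof(struct toy_shinfo);
	unsigned char *buf = calloc(1, total);	/* metadata starts zeroed */
	struct toy_shinfo info;

	if (!buf)
		abort();
	/* Keep the demonstration inside its own allocation. */
	assert(overrun <= filler_len + sizeof(struct toy_shinfo));

	memset(buf, 0xff, head_len + overrun);	/* the overflowing write */
	memcpy(&info, buf + head_len + filler_len, sizeof(info));
	free(buf);
	return info.nr_frags;
}
```

In this model, `nr_frags_after_overrun(1536, 0, 2)` yields 0xffff while `nr_frags_after_overrun(1536, 64, 2)` yields 0: the filler absorbs the stray bytes. That matches the intent of the test patch, which hides the symptom so the overflow can be confirmed, without fixing whatever writes too far.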

2010-12-22 16:00:37

by Joel Soete

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

Hello Jarek,

Nice to read you :<)

On 12/22/2010 12:00 PM, Jarek Poplawski wrote:
> On 2010-12-22 09:22, Andrew Morton wrote:
>> (cc netdev)
>>
>> The bug is still present in 2.6.37-rc6.
>>
>> On Sat, 18 Dec 2010 11:33:14 +0000 Joel Soete<[email protected]> wrote:
>>
>>> Hello all,
>
> Hi,
> Could you reproduce this bug with a vanilla kernel (without nvidia
> patch)? If so, please include dmesg and .config to the next report.
>
Yes (it was already a vanilla kernel, 2.6.35 with my config, even though the same issue occurs with some other distro stock
2.6.35 kernels), but here are some more dmesg outputs with vanilla 2.6.37-rc6 and rc7. I just added your debugging patch
that I found here, because without it the kernel panics immediately, leaving no chance to capture dmesg (and
unfortunately I can no longer grab panic messages from a serial console: no more RS-232 on the latest office laptop :<).

So you will find attached my config files for the 2 kernels and the respective dmesg outputs.

If you ever need more details, don't hesitate to ask.

Thanks a lot,
J.

Attachments:

config-2.6.37-rc6-amd64-t1.gz (27.51 kB)
config-2.6.37-rc7-amd64-t0.gz (27.51 kB)
Dmesg-2.6.2.6.37-rc6-amd64-t1.txt (71.94 kB)
Dmesg-2.6.2.6.37-rc7-amd64-t0.txt (74.53 kB)

2010-12-23 11:03:06

by Joel Soete

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

Hello Eric,

On 12/22/2010 04:25 PM, Eric Dumazet wrote:
[snip]
>
> Something overwrites nr_frags in skb_shinfo(skb)
>
> As skb_shinfo follows head portion of an skb, something overflows skb
> head
>
> Please try adding some room like in following patch ?
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index e6ba898..adf2834 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -187,6 +187,7 @@ enum {
> * the end of the header data, ie. at skb->end.
> */
> struct skb_shared_info {
> + char filler[64];
> unsigned short nr_frags;
> unsigned short gso_size;
> /* Warning: this field is not always filled in (UFO)! */
>
Sorry for the delay, but I have good news: I am sending this answer from:
$ uname -a
Linux sidh2 2.6.37-rc7-amd64-t1 #1 SMP Thu Dec 23 10:30:27 GMT 2010 x86_64 GNU/Linux

with your tip ;<) (without it the kernel would already have died)

That said, how can I find what is overflowing the skb head? (All I can say is that this issue started with 2.6.34-git6.)

Thanks a lot,
J.

2010-12-23 12:12:36

by Eric Dumazet

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

On Thursday 23 December 2010 at 11:02 +0000, Joel Soete wrote:
> Hello Eric,
>
>
> On 12/22/2010 04:25 PM, Eric Dumazet wrote:
> [snip]
> >
> > Something overwrites nr_frags in skb_shinfo(skb)
> >
> > As skb_shinfo follows head portion of an skb, something overflows skb
> > head
> >
> > Please try adding some room like in following patch ?
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index e6ba898..adf2834 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -187,6 +187,7 @@ enum {
> > * the end of the header data, ie. at skb->end.
> > */
> > struct skb_shared_info {
> > + char filler[64];
> > unsigned short nr_frags;
> > unsigned short gso_size;
> > /* Warning: this field is not always filled in (UFO)! */
> >
> Sorry for delay but I have good news, I am sending this answer from:
> $ uname -a
> Linux sidh2 2.6.37-rc7-amd64-t1 #1 SMP Thu Dec 23 10:30:27 GMT 2010 x86_64 GNU/Linux
>
> with your tips ;<) (without kernel had already died)
>
> That said how can find stuff overflowing skb head? (all I say, is that this issue started with 2.6.34-git6???)
>
> Thanks a lot,

You're welcome. At least we know where to search. Thanks!

I am taking holidays right now for about 5 days, I guess someone else
might find the bug before me ;)

2010-12-23 20:25:34

by Jarek Poplawski

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

On Thu, Dec 23, 2010 at 01:12:28PM +0100, Eric Dumazet wrote:
> On Thursday 23 December 2010 at 11:02 +0000, Joel Soete wrote:
...
> > Sorry for delay but I have good news, I am sending this answer from:
> > $ uname -a
> > Linux sidh2 2.6.37-rc7-amd64-t1 #1 SMP Thu Dec 23 10:30:27 GMT 2010 x86_64 GNU/Linux
> >
> > with your tips ;<) (without kernel had already died)
> >
> > That said how can find stuff overflowing skb head? (all I say, is that this issue started with 2.6.34-git6???)

Hi Joel,
2.6.34-git6 or 7 is almost the whole netdev batch for 2.6.35, so there is
still a lot of guessing. One such guess could be e.g. this one:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=18e8c134f4e984e6639e62846345192816f06d5c

I've added some debugging to Eric's patch. After collecting several
warnings (there might be a lot), revert this patch and apply Eric's again.
Btw, could you send your pppoe config (without any personal data,
of course), and mention whether there are other changes, like mtu etc.?

> I am taking holidays right now for about 5 days, I guess someone else
> might find the bug before me ;)

Good job, Eric, we can try. Have a nice rest!

Thanks,
Jarek P.
--- (a debugging patch, apply to clean 2.6.37-rc)

drivers/net/pppoe.c | 8 ++++++++
include/linux/skbuff.h | 6 ++++++
net/core/dev.c | 8 ++++++++
net/core/skbuff.c | 9 +++++++++
4 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index d72fb05..0d41a04 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -385,6 +385,7 @@ static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
* can't change.
*/

+ DEBUG_SKB_POISON(skb);
if (sk->sk_state & PPPOX_BOUND) {
ppp_input(&po->chan, skb);
} else if (sk->sk_state & PPPOX_RELAY) {
@@ -430,6 +431,7 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
if (!skb)
goto out;

+ DEBUG_SKB_POISON(skb);
if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr)))
goto drop;

@@ -452,6 +454,7 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
if (!po)
goto drop;

+ DEBUG_SKB_POISON(skb);
return sk_receive_skb(sk_pppox(po), skb, 0);

drop:
@@ -485,6 +488,7 @@ static int pppoe_disc_rcv(struct sk_buff *skb, struct net_device *dev,
if (ph->code != PADT_CODE)
goto abort;

+ DEBUG_SKB_POISON(skb);
pn = pppoe_pernet(dev_net(dev));
po = get_item(pn, ph->sid, eth_hdr(skb)->h_source, dev->ifindex);
if (po) {
@@ -888,6 +892,7 @@ static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock,

ph->length = htons(total_len);

+ DEBUG_SKB_POISON(skb);
dev_queue_xmit(skb);

end:
@@ -921,6 +926,7 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
if (!dev)
goto abort;

+ DEBUG_SKB_POISON(skb);
/* Copy the data if there is no space for the header or if it's
* read-only.
*/
@@ -943,6 +949,7 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
dev_hard_header(skb, dev, ETH_P_PPP_SES,
po->pppoe_pa.remote, NULL, data_len);

+ DEBUG_SKB_POISON(skb);
dev_queue_xmit(skb);
return 1;

@@ -987,6 +994,7 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
m->msg_namelen = 0;

if (skb) {
+ DEBUG_SKB_POISON(skb);
total_len = min_t(size_t, total_len, skb->len);
error = skb_copy_datagram_iovec(skb, 0, m->msg_iov, total_len);
if (error == 0)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e6ba898..706f182 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -187,6 +187,12 @@ enum {
* the end of the header data, ie. at skb->end.
*/
struct skb_shared_info {
+#define SKB_POISON 0xe2e4e7e5
+#define SET_SKB_POISON(skb) skb_shinfo(skb)->poison = SKB_POISON
+#define DEBUG_SKB_POISON(skb) WARN_ON(skb_shinfo(skb)->poison != SKB_POISON)
+
+ unsigned int poison;
+ char filler[60];
unsigned short nr_frags;
unsigned short gso_size;
/* Warning: this field is not always filled in (UFO)! */
diff --git a/net/core/dev.c b/net/core/dev.c
index 0dd54a6..01ca7de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1994,6 +1994,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
const struct net_device_ops *ops = dev->netdev_ops;
int rc = NETDEV_TX_OK;

+ DEBUG_SKB_POISON(skb);
if (likely(!skb->next)) {
if (!list_empty(&ptype_all))
dev_queue_xmit_nit(skb, dev);
@@ -2026,6 +2027,8 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
__skb_linearize(skb))
goto out_kfree_skb;

+ DEBUG_SKB_POISON(skb);
+
/* If packet is not checksummed and device does not
* support checksumming for this protocol, complete
* checksumming here.
@@ -2039,6 +2042,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
}
}

+ DEBUG_SKB_POISON(skb);
rc = ops->ndo_start_xmit(skb, dev);
trace_net_dev_xmit(skb, rc);
if (rc == NETDEV_TX_OK)
@@ -2243,6 +2247,7 @@ int dev_queue_xmit(struct sk_buff *skb)
struct Qdisc *q;
int rc = -ENOMEM;

+ DEBUG_SKB_POISON(skb);
/* Disable soft irqs for various locks below. Also
* stops preemption for RCU.
*/
@@ -2604,6 +2609,7 @@ int netif_rx(struct sk_buff *skb)
{
int ret;

+ DEBUG_SKB_POISON(skb);
/* if netpoll wants it, pretend we never saw it */
if (netpoll_rx(skb))
return NET_RX_DROP;
@@ -2898,6 +2904,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
int ret = NET_RX_DROP;
__be16 type;

+ DEBUG_SKB_POISON(skb);
if (!netdev_tstamp_prequeue)
net_timestamp_check(skb);

@@ -3043,6 +3050,7 @@ out:
*/
int netif_receive_skb(struct sk_buff *skb)
{
+ DEBUG_SKB_POISON(skb);
if (netdev_tstamp_prequeue)
net_timestamp_check(skb);

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 104f844..b112c7d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -210,6 +210,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
shinfo = skb_shinfo(skb);
memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
atomic_set(&shinfo->dataref, 1);
+ SET_SKB_POISON(skb);

if (fclone) {
struct sk_buff *child = skb + 1;
@@ -412,6 +413,7 @@ static void skb_release_all(struct sk_buff *skb)

void __kfree_skb(struct sk_buff *skb)
{
+ DEBUG_SKB_POISON(skb);
skb_release_all(skb);
kfree_skbmem(skb);
}
@@ -428,6 +430,7 @@ void kfree_skb(struct sk_buff *skb)
{
if (unlikely(!skb))
return;
+ DEBUG_SKB_POISON(skb);
if (likely(atomic_read(&skb->users) == 1))
smp_rmb();
else if (likely(!atomic_dec_and_test(&skb->users)))
@@ -449,6 +452,7 @@ void consume_skb(struct sk_buff *skb)
{
if (unlikely(!skb))
return;
+ DEBUG_SKB_POISON(skb);
if (likely(atomic_read(&skb->users) == 1))
smp_rmb();
else if (likely(!atomic_dec_and_test(&skb->users)))
@@ -487,11 +491,13 @@ bool skb_recycle_check(struct sk_buff *skb, int skb_size)
if (skb_shared(skb) || skb_cloned(skb))
return false;

+ DEBUG_SKB_POISON(skb);
skb_release_head_state(skb);

shinfo = skb_shinfo(skb);
memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
atomic_set(&shinfo->dataref, 1);
+ SET_SKB_POISON(skb);

memset(skb, 0, offsetof(struct sk_buff, tail));
skb->data = skb->head + NET_SKB_PAD;
@@ -571,6 +577,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)

atomic_inc(&(skb_shinfo(skb)->dataref));
skb->cloned = 1;
+ DEBUG_SKB_POISON(skb);

return n;
#undef C
@@ -772,6 +779,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
bool fastpath;

BUG_ON(nhead < 0);
+ DEBUG_SKB_POISON(skb);

if (skb_shared(skb))
BUG();
@@ -836,6 +844,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
skb->hdr_len = 0;
skb->nohdr = 0;
atomic_set(&skb_shinfo(skb)->dataref, 1);
+ SET_SKB_POISON(skb);
return 0;

nodata:
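
The idea behind the debugging patch above is a canary: a known value is stored right where an overflow would land, and every checkpoint verifies it is still intact. A rough userspace sketch of that technique follows; `toy_overrun_detected` and `TOY_POISON` are hypothetical names, and the buffer layout is an assumption made for illustration rather than the real skb layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_POISON 0xe2e4e7e5u	/* same value as SKB_POISON in the patch */

/*
 * Allocate head_len data bytes followed by a 4-byte canary, let a
 * hypothetical "driver" write 'written' bytes of payload, and report
 * whether the canary was damaged (the analogue of the WARN_ON firing).
 */
static bool toy_overrun_detected(size_t head_len, size_t written)
{
	const unsigned int poison = TOY_POISON;
	unsigned int seen;
	unsigned char *mem = malloc(head_len + sizeof(poison));
	bool damaged;

	if (!mem)
		abort();
	/* Keep the demonstration inside its own allocation. */
	assert(written <= head_len + sizeof(poison));

	memcpy(mem + head_len, &poison, sizeof(poison));	/* like SET_SKB_POISON */
	memset(mem, 0, written);				/* the write under test */
	memcpy(&seen, mem + head_len, sizeof(seen));		/* like DEBUG_SKB_POISON */

	damaged = (seen != poison);
	free(mem);
	return damaged;
}
```

A write of exactly `head_len` bytes leaves the canary alone, while a single extra byte is enough to trip the check. Placing the check at many points along the receive and transmit paths, as the patch does, narrows down where the buffer was still intact and where it was first seen corrupted.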

2010-12-24 15:14:05

by Jarek Poplawski

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

On Fri, Dec 24, 2010 at 11:22:25AM +0000, Joel Soete wrote:
> Hello Jarek,
Hi Joel,

> Ok I get a clean 2.6.37-rc7 vanilla src and apply your debugging
> patch and grab the attached syslog-2.6.37-rc7-t2.gz with obviously a
> lot of "warning" (but as well as with Eric's patch, kernel survived
> to a lynx connection to ftp.eu.kernel.org to download of a snapshot
> patch ;<) )

Yes, even more than I expected... I hope the list will forgive us ;-)

> I copy my all /etc/ppp dir into a PPP.ANONYM dir in which I replaced letters of my account and passwd by '.'
> To give more details, I am using a debian 'testing' distro with following ppp pkg:
> # dpkg -l ppp\*
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name       Version      Description
> +++-==========-============-=============================================
> ii  ppp        2.4.5-4      Point-to-Point Protocol (PPP) - daemon
> ii  pppconfig  2.3.18+nmu2  A text menu based utility for configuring ppp
> ii  pppoe      3.8-3        PPP over Ethernet driver
> ii  pppoeconf  1.19         configures PPPoE/ADSL connections
> ii  pppstatus  0.4.2-10     console-based PPP status monitor
>
> and I used pppoeconf to configure my ADSL connection to my ISP using
> all defaults and just the recommended mtu of 1492 (which seems
> effectively to works fine to me) and I don't remember to have change
> anything else excepted my account and passwd.

Should be enough reading to me for the next day or two.

> Thanks for help and have Happy Christmas,
> J.

Thanks and Happy Holidays!
Jarek P.

2010-12-25 12:10:54

by Jarek Poplawski

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

On Fri, Dec 24, 2010 at 04:13:25PM +0100, Jarek Poplawski wrote:
> On Fri, Dec 24, 2010 at 11:22:25AM +0000, Joel Soete wrote:
> > Hello Jarek,
> Hi Joel,
>
> > Ok I get a clean 2.6.37-rc7 vanilla src and apply your debugging
> > patch and grab the attached syslog-2.6.37-rc7-t2.gz with obviously a
> > lot of "warning" (but as well as with Eric's patch, kernel survived
> > to a lynx connection to ftp.eu.kernel.org to download of a snapshot
> > patch ;<) )
>
> Yes, even more than I expected... I hope the list will forgive us ;-)

Alas the list rejected your message (try to limit it to ~200kb next
time).

Anyway, it looks like the sundance driver is the main culprit. The
patch below removes one obvious bug, but there could be something more.
Please apply this one and my previous debugging patch to a clean
2.6.37-rc7. (If there are still warnings, the first ~20kb should do.)

Thanks,
Jarek P.
---

diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
index 3ed2a67..b409d7e 100644
--- a/drivers/net/sundance.c
+++ b/drivers/net/sundance.c
@@ -1016,7 +1016,7 @@ static void init_ring(struct net_device *dev)

 	/* Fill in the Rx buffers. Handle allocation failure gracefully. */
 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz + 2);
 		np->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1407,7 +1407,7 @@ static void refill_rx (struct net_device *dev)
 		struct sk_buff *skb;
 		entry = np->dirty_rx % RX_RING_SIZE;
 		if (np->rx_skbuff[entry] == NULL) {
-			skb = dev_alloc_skb(np->rx_buf_sz);
+			skb = dev_alloc_skb(np->rx_buf_sz + 2);
 			np->rx_skbuff[entry] = skb;
 			if (skb == NULL)
 				break; /* Better luck next round. */

2010-12-25 13:51:44

by Joel Soete

Subject: Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?

Hello Jarek,

On 12/25/2010 12:10 PM, Jarek Poplawski wrote:
> On Fri, Dec 24, 2010 at 04:13:25PM +0100, Jarek Poplawski wrote:
>> On Fri, Dec 24, 2010 at 11:22:25AM +0000, Joel Soete wrote:
>>> Hello Jarek,
>> Hi Joel,
>>
[snip]
>
> Alas, the list rejected your message (try to limit it to ~200 kB next
> time).
>
Ah, OK, I will take care next time ;<)

> Anyway, it looks like the sundance driver is the main culprit. The
> patch below removes one obvious bug, but there could be something more.
> Please apply this one and my previous debugging patch to a clean
> 2.6.37-rc7. (If there are still warnings, the first ~20 kB should do.)
>
> Thanks,
> Jarek P.
> ---
>
> diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
> index 3ed2a67..b409d7e 100644
> --- a/drivers/net/sundance.c
> +++ b/drivers/net/sundance.c
> @@ -1016,7 +1016,7 @@ static void init_ring(struct net_device *dev)
>
>  	/* Fill in the Rx buffers. Handle allocation failure gracefully. */
>  	for (i = 0; i < RX_RING_SIZE; i++) {
> -		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz);
> +		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz + 2);
>  		np->rx_skbuff[i] = skb;
>  		if (skb == NULL)
>  			break;
> @@ -1407,7 +1407,7 @@ static void refill_rx (struct net_device *dev)
>  		struct sk_buff *skb;
>  		entry = np->dirty_rx % RX_RING_SIZE;
>  		if (np->rx_skbuff[entry] == NULL) {
> -			skb = dev_alloc_skb(np->rx_buf_sz);
> +			skb = dev_alloc_skb(np->rx_buf_sz + 2);
>  			np->rx_skbuff[entry] = skb;
>  			if (skb == NULL)
>  				break; /* Better luck next round. */
>
I don't have any more warnings :<)

Awesome job.

Thanks a lot for your help, and I wish you a Happy New Year,
J.

2010-12-25 15:12:25

by Jarek Poplawski

Subject: [PATCH net-2.6] sundance: Fix oopses with corrupted skb_shared_info

[Was: Help: major pppoe regression since 2.6.35 (panic on first ppp
conection)?]

On Sat, Dec 25, 2010 at 01:51:05PM +0000, Joel Soete wrote:
> Hello Jarek,
Hello Joel,
...
> I don't have any more warnings :<)
>
> Awesome job.

Awesome help.

Thanks and Happy New Year to you as well!
Jarek P.
-------------->

[PATCH net-2.6] sundance: Fix oopses with corrupted skb_shared_info

Joel Soete reported oopses at the beginning of pppoe connections since
v2.6.35. After debugging, the bug was found in the sundance skb allocation
and DMA mapping code, where skb_reserve() bytes aren't taken into
account. This is an old bug, only uncovered by some change in 2.6.35.

Initial debugging patch by: Eric Dumazet <[email protected]>

Reported-by: Joel Soete <[email protected]>
Tested-by: Joel Soete <[email protected]>
Signed-off-by: Jarek Poplawski <[email protected]>
Cc: Eric Dumazet <[email protected]>
---

diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
index 3ed2a67..b409d7e 100644
--- a/drivers/net/sundance.c
+++ b/drivers/net/sundance.c
@@ -1016,7 +1016,7 @@ static void init_ring(struct net_device *dev)

 	/* Fill in the Rx buffers. Handle allocation failure gracefully. */
 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz + 2);
 		np->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1407,7 +1407,7 @@ static void refill_rx (struct net_device *dev)
 		struct sk_buff *skb;
 		entry = np->dirty_rx % RX_RING_SIZE;
 		if (np->rx_skbuff[entry] == NULL) {
-			skb = dev_alloc_skb(np->rx_buf_sz);
+			skb = dev_alloc_skb(np->rx_buf_sz + 2);
 			np->rx_skbuff[entry] = skb;
 			if (skb == NULL)
 				break; /* Better luck next round. */

2010-12-25 17:32:00

by Jarek Poplawski

Subject: [PATCH net-2.6] epic100: hamachi: yellowfin: Fix skb allocation size

Joel Soete reported oopses during pppoe over sundance NIC, caused by
a bug in skb allocation and dma mapping code, where skb_reserve()
bytes weren't taken into account. As a followup to the patch:
"sundance: Fix oopses with corrupted" very similar code is fixed here
for three other drivers.

Signed-off-by: Jarek Poplawski <[email protected]>
Cc: Joel Soete <[email protected]>
Cc: Eric Dumazet <[email protected]>
---

drivers/net/epic100.c | 4 ++--
drivers/net/hamachi.c | 4 ++--
drivers/net/yellowfin.c | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/epic100.c b/drivers/net/epic100.c
index aa56963..c353bf3 100644
--- a/drivers/net/epic100.c
+++ b/drivers/net/epic100.c
@@ -935,7 +935,7 @@ static void epic_init_ring(struct net_device *dev)

 	/* Fill in the Rx buffers. Handle allocation failure gracefully. */
 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(ep->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(ep->rx_buf_sz + 2);
 		ep->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1233,7 +1233,7 @@ static int epic_rx(struct net_device *dev, int budget)
 		entry = ep->dirty_rx % RX_RING_SIZE;
 		if (ep->rx_skbuff[entry] == NULL) {
 			struct sk_buff *skb;
-			skb = ep->rx_skbuff[entry] = dev_alloc_skb(ep->rx_buf_sz);
+			skb = ep->rx_skbuff[entry] = dev_alloc_skb(ep->rx_buf_sz + 2);
 			if (skb == NULL)
 				break;
 			skb_reserve(skb, 2); /* Align IP on 16 byte boundaries */
diff --git a/drivers/net/hamachi.c b/drivers/net/hamachi.c
index 9a64858..80d25ed 100644
--- a/drivers/net/hamachi.c
+++ b/drivers/net/hamachi.c
@@ -1202,7 +1202,7 @@ static void hamachi_init_ring(struct net_device *dev)
 	}
 	/* Fill in the Rx buffers. Handle allocation failure gracefully. */
 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz + 2);
 		hmp->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1669,7 +1669,7 @@ static int hamachi_rx(struct net_device *dev)
 		entry = hmp->dirty_rx % RX_RING_SIZE;
 		desc = &(hmp->rx_ring[entry]);
 		if (hmp->rx_skbuff[entry] == NULL) {
-			struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz);
+			struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz + 2);

 			hmp->rx_skbuff[entry] = skb;
 			if (skb == NULL)
diff --git a/drivers/net/yellowfin.c b/drivers/net/yellowfin.c
index cd1b3dc..ec47e22 100644
--- a/drivers/net/yellowfin.c
+++ b/drivers/net/yellowfin.c
@@ -744,7 +744,7 @@ static int yellowfin_init_ring(struct net_device *dev)
 	}

 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz + 2);
 		yp->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1157,7 +1157,7 @@ static int yellowfin_rx(struct net_device *dev)
 	for (; yp->cur_rx - yp->dirty_rx > 0; yp->dirty_rx++) {
 		entry = yp->dirty_rx % RX_RING_SIZE;
 		if (yp->rx_skbuff[entry] == NULL) {
-			struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz);
+			struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz + 2);
 			if (skb == NULL)
 				break; /* Better luck next round. */
 			yp->rx_skbuff[entry] = skb;

2010-12-25 17:40:09

by Jarek Poplawski

Subject: [PATCH net-2.6 v2] epic100: hamachi: yellowfin: Fix skb allocation size

Joel Soete reported oopses during pppoe over sundance NIC, caused by
a bug in skb allocation and dma mapping code, where skb_reserve()
bytes weren't taken into account. As a followup to the patch:
"sundance: Fix oopses with corrupted skb_shared_info" very similar
code is fixed here for three other drivers.

Signed-off-by: Jarek Poplawski <[email protected]>
Cc: Joel Soete <[email protected]>
Cc: Eric Dumazet <[email protected]>
---
v2: a tiny changelog fix only

drivers/net/epic100.c | 4 ++--
drivers/net/hamachi.c | 4 ++--
drivers/net/yellowfin.c | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/epic100.c b/drivers/net/epic100.c
index aa56963..c353bf3 100644
--- a/drivers/net/epic100.c
+++ b/drivers/net/epic100.c
@@ -935,7 +935,7 @@ static void epic_init_ring(struct net_device *dev)

 	/* Fill in the Rx buffers. Handle allocation failure gracefully. */
 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(ep->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(ep->rx_buf_sz + 2);
 		ep->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1233,7 +1233,7 @@ static int epic_rx(struct net_device *dev, int budget)
 		entry = ep->dirty_rx % RX_RING_SIZE;
 		if (ep->rx_skbuff[entry] == NULL) {
 			struct sk_buff *skb;
-			skb = ep->rx_skbuff[entry] = dev_alloc_skb(ep->rx_buf_sz);
+			skb = ep->rx_skbuff[entry] = dev_alloc_skb(ep->rx_buf_sz + 2);
 			if (skb == NULL)
 				break;
 			skb_reserve(skb, 2); /* Align IP on 16 byte boundaries */
diff --git a/drivers/net/hamachi.c b/drivers/net/hamachi.c
index 9a64858..80d25ed 100644
--- a/drivers/net/hamachi.c
+++ b/drivers/net/hamachi.c
@@ -1202,7 +1202,7 @@ static void hamachi_init_ring(struct net_device *dev)
 	}
 	/* Fill in the Rx buffers. Handle allocation failure gracefully. */
 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz + 2);
 		hmp->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1669,7 +1669,7 @@ static int hamachi_rx(struct net_device *dev)
 		entry = hmp->dirty_rx % RX_RING_SIZE;
 		desc = &(hmp->rx_ring[entry]);
 		if (hmp->rx_skbuff[entry] == NULL) {
-			struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz);
+			struct sk_buff *skb = dev_alloc_skb(hmp->rx_buf_sz + 2);

 			hmp->rx_skbuff[entry] = skb;
 			if (skb == NULL)
diff --git a/drivers/net/yellowfin.c b/drivers/net/yellowfin.c
index cd1b3dc..ec47e22 100644
--- a/drivers/net/yellowfin.c
+++ b/drivers/net/yellowfin.c
@@ -744,7 +744,7 @@ static int yellowfin_init_ring(struct net_device *dev)
 	}

 	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz);
+		struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz + 2);
 		yp->rx_skbuff[i] = skb;
 		if (skb == NULL)
 			break;
@@ -1157,7 +1157,7 @@ static int yellowfin_rx(struct net_device *dev)
 	for (; yp->cur_rx - yp->dirty_rx > 0; yp->dirty_rx++) {
 		entry = yp->dirty_rx % RX_RING_SIZE;
 		if (yp->rx_skbuff[entry] == NULL) {
-			struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz);
+			struct sk_buff *skb = dev_alloc_skb(yp->rx_buf_sz + 2);
 			if (skb == NULL)
 				break; /* Better luck next round. */
 			yp->rx_skbuff[entry] = skb;

2010-12-26 03:41:41

by David Miller

Subject: Re: [PATCH net-2.6] sundance: Fix oopses with corrupted skb_shared_info

From: Jarek Poplawski <[email protected]>
Date: Sat, 25 Dec 2010 16:12:17 +0100

> [PATCH net-2.6] sundance: Fix oopses with corrupted skb_shared_info
>
> Joel Soete reported oopses at the beginning of pppoe connections since
> v2.6.35. After debugging the bug was found in sundance skb allocation
> and dma mapping code, where skb_reserve() bytes aren't taken into
> account. This is an old bug, only uncovered by some change in 2.6.35.
>
> Initial debugging patch by: Eric Dumazet <[email protected]>
>
> Reported-by: Joel Soete <[email protected]>
> Tested-by: Joel Soete <[email protected]>
> Signed-off-by: Jarek Poplawski <[email protected]>
> Cc: Eric Dumazet <[email protected]>

Applied, great work Jarek.

I was auditing ppp_generic.c hoping I'd find something, but
if I had that backtrace I wouldn't have bothered :-)

2010-12-26 03:41:56

by David Miller

Subject: Re: [PATCH net-2.6 v2] epic100: hamachi: yellowfin: Fix skb allocation size

From: Jarek Poplawski <[email protected]>
Date: Sat, 25 Dec 2010 18:39:59 +0100

> Joel Soete reported oopses during pppoe over sundance NIC, caused by
> a bug in skb allocation and dma mapping code, where skb_reserve()
> bytes weren't taken into account. As a followup to the patch:
> "sundance: Fix oopses with corrupted skb_shared_info" very similar
> code is fixed here for three other drivers.
>
> Signed-off-by: Jarek Poplawski <[email protected]>

Also applied, thanks.

2010-12-26 11:01:22

by Jarek Poplawski

Subject: Re: [PATCH net-2.6] sundance: Fix oopses with corrupted skb_shared_info

On Sat, Dec 25, 2010 at 07:42:10PM -0800, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Sat, 25 Dec 2010 16:12:17 +0100
>
> > [PATCH net-2.6] sundance: Fix oopses with corrupted skb_shared_info
> >
> > Joel Soete reported oopses at the beginning of pppoe connections since
> > v2.6.35. After debugging the bug was found in sundance skb allocation
> > and dma mapping code, where skb_reserve() bytes aren't taken into
> > account. This is an old bug, only uncovered by some change in 2.6.35.
> >
> > Initial debugging patch by: Eric Dumazet <[email protected]>
> >
> > Reported-by: Joel Soete <[email protected]>
> > Tested-by: Joel Soete <[email protected]>
> > Signed-off-by: Jarek Poplawski <[email protected]>
> > Cc: Eric Dumazet <[email protected]>
>
> Applied, great work Jarek.
>
> I was auditing ppp_generic.c hoping I'd find something, but
> if I had that backtrace I wouldn't have bothered :-)

Yes, with the backtrace it was a piece of cake :-) If I had known you
were interested... Anyway, I'm appending a little bit for the record.

Thanks,
Jarek P.

Attachments:

(No filename) (1.08 kB)
joel.syslog-2.6.37-rc7-t2.1000 (93.20 kB)