2013-06-06 22:17:03

by Steinar H. Gunderson

Subject: NULL pointer dereference when loading the gre module (3.10.0-rc4)

Hi,

In 3.10.0-rc4, I get this on boot:

[ 16.871043] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
[ 16.879453] IP: [<ffffffffa0e52002>] 0xffffffffa0e52001
[ 16.884995] PGD 0
[ 16.887313] Oops: 0000 [#1] SMP
[ 16.890904] Modules linked in: ip_gre(+) gre ip_tunnel psmouse ide_generic ide_gd_mod ide_cd_mod cdrom acpi_cpufreq mperf coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support i2c_i801 microcode lpc_ich pcspkr i2c_core mfd_core ehci_pci evbug evdev ext4 crc16 jbd2 mbcache dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 md_mod sg sd_mod usbhid ide_pci_generic ide_core crc32c_intel e1000e ata_piix ptp pps_core uhci_hcd ehci_hcd mpt2sas raid_class unix
[ 16.939181] CPU: 0 PID: 3261 Comm: modprobe Not tainted 3.10.0-rc4 #1
[ 16.945873] Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a 12/30/2011
[ 16.953252] task: ffff880621662d60 ti: ffff8806227de000 task.ti: ffff8806227de000
[ 16.961184] RIP: 0010:[<ffffffffa0e52002>] [<ffffffffa0e52002>] 0xffffffffa0e52001
[ 16.969346] RSP: 0018:ffff8806227dfca8 EFLAGS: 00010246
[ 16.974903] RAX: ffffffffa0e5d000 RBX: ffff880623ebe280 RCX: 0000000000000000
[ 16.982285] RDX: ffffffffa0e5aa40 RSI: 0000000000000003 RDI: ffffffffa0e5d018
[ 16.989674] RBP: ffff8806227dfca8 R08: 000000000000072f R09: ffffffff812bae96
[ 16.997051] R10: ffffea00188d1200 R11: 0000000000000000 R12: ffff88061f874900
[ 17.004440] R13: ffffffffa0e5a9c0 R14: ffff8806227dfef8 R15: 0000000000000002
[ 17.011818] FS: 00007f7da1d97700(0000) GS:ffff880627200000(0000) knlGS:0000000000000000
[ 17.020357] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 17.026349] CR2: 0000000000000003 CR3: 0000000621b84000 CR4: 00000000000007f0
[ 17.033734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 17.041110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 17.048494] Stack:
[ 17.050757] ffff8806227dfcf8 ffffffff812baf26 2222222222222222 2222222222222222
[ 17.058857] 2222222222222222 ffffffffa0e5a9c0 0000000000000000 0000000000000000
[ 17.066933] ffff8806227dfef8 ffffffffa0e5ab60 ffff8806227dfd28 ffffffff812bafb6
[ 17.075008] Call Trace:
[ 17.077703] [<ffffffff812baf26>] ops_init.constprop.7+0xc6/0xf5
[ 17.083956] [<ffffffff812bafb6>] register_pernet_operations.isra.4+0x61/0x91
[ 17.091340] [<ffffffff8138486f>] ? mutex_lock+0xf/0x20
[ 17.096822] [<ffffffff812bb006>] register_pernet_device+0x20/0x51
[ 17.103254] [<ffffffffa0e5d034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]
[ 17.110298] [<ffffffffa0e5d055>] ipgre_init+0x21/0xc9 [ip_gre]
[ 17.116470] [<ffffffffa0e5d034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]
[ 17.123515] [<ffffffff81000263>] do_one_initcall+0x7b/0x10c
[ 17.129422] [<ffffffff8107e5db>] load_module+0x1b1f/0x1e19
[ 17.135241] [<ffffffff8107a4f8>] ? sys_getegid16+0x44/0x44
[ 17.141058] [<ffffffff81386cf2>] ? page_fault+0x22/0x30
[ 17.146618] [<ffffffff8107e969>] SyS_init_module+0x94/0xa1
[ 17.152440] [<ffffffff8138cf12>] system_call_fastpath+0x16/0x1b
[ 17.158695] Code: <6e> 65 77 6c 69 6e 6b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 17.168440] RIP [<ffffffffa0e52002>] 0xffffffffa0e52001
[ 17.174058] RSP <ffff8806227dfca8>
[ 17.177798] CR2: 0000000000000003
[ 17.181730] ---[ end trace 531fea804a54bcad ]---

I assume this is from loading ip_gre, given that it's somewhere in the call
stack; amazingly enough, GRE tunnels seem to actually still work, though,
although I cannot load other modules such as ip_tables (modprobe hangs).

/* Steinar */
--
Homepage: http://www.sesse.net/


2013-06-07 03:06:55

by Steven Rostedt

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, Jun 07, 2013 at 12:16:56AM +0200, Steinar H. Gunderson wrote:
> Hi,
>
> In 3.10.0-rc4, I get this on boot:
>
> [ 16.871043] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
> [ 16.879453] IP: [<ffffffffa0e52002>] 0xffffffffa0e52001

Strange, kallsyms should have registered the address already, even if it
crashed on early module load. Not sure why it's not reporting it. Well,
it seems to have reported some of the symbols of ip_gre below. Maybe
this pointer is just totally screwed up.

> [ 16.884995] PGD 0
> [ 16.887313] Oops: 0000 [#1] SMP
> [ 16.890904] Modules linked in: ip_gre(+) gre ip_tunnel psmouse ide_generic ide_gd_mod ide_cd_mod cdrom acpi_cpufreq mperf coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support i2c_i801 microcode lpc_ich pcspkr i2c_core mfd_core ehci_pci evbug evdev ext4 crc16 jbd2 mbcache dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 md_mod sg sd_mod usbhid ide_pci_generic ide_core crc32c_intel e1000e ata_piix ptp pps_core uhci_hcd ehci_hcd mpt2sas raid_class unix

The ip_gre(+) shows that this is indeed happening while the ip_gre
module is being loaded.

> [ 16.939181] CPU: 0 PID: 3261 Comm: modprobe Not tainted 3.10.0-rc4 #1
> [ 16.945873] Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a 12/30/2011
> [ 16.953252] task: ffff880621662d60 ti: ffff8806227de000 task.ti: ffff8806227de000
> [ 16.961184] RIP: 0010:[<ffffffffa0e52002>] [<ffffffffa0e52002>] 0xffffffffa0e52001
> [ 16.969346] RSP: 0018:ffff8806227dfca8 EFLAGS: 00010246
> [ 16.974903] RAX: ffffffffa0e5d000 RBX: ffff880623ebe280 RCX: 0000000000000000
> [ 16.982285] RDX: ffffffffa0e5aa40 RSI: 0000000000000003 RDI: ffffffffa0e5d018
> [ 16.989674] RBP: ffff8806227dfca8 R08: 000000000000072f R09: ffffffff812bae96
> [ 16.997051] R10: ffffea00188d1200 R11: 0000000000000000 R12: ffff88061f874900
> [ 17.004440] R13: ffffffffa0e5a9c0 R14: ffff8806227dfef8 R15: 0000000000000002
> [ 17.011818] FS: 00007f7da1d97700(0000) GS:ffff880627200000(0000) knlGS:0000000000000000
> [ 17.020357] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 17.026349] CR2: 0000000000000003 CR3: 0000000621b84000 CR4: 00000000000007f0
> [ 17.033734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 17.041110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 17.048494] Stack:
> [ 17.050757] ffff8806227dfcf8 ffffffff812baf26 2222222222222222 2222222222222222
> [ 17.058857] 2222222222222222 ffffffffa0e5a9c0 0000000000000000 0000000000000000
> [ 17.066933] ffff8806227dfef8 ffffffffa0e5ab60 ffff8806227dfd28 ffffffff812bafb6
> [ 17.075008] Call Trace:
> [ 17.077703] [<ffffffff812baf26>] ops_init.constprop.7+0xc6/0xf5

This looks like something really bad happened in net_namespace.c with
ops->init(net). If ops is corrupted here, it would explain why calling
ops->init might do something nasty and we get a bad instruction pointer.
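
One cheap way to test the "bad instruction pointer" idea is to look at the bytes on the `Code:` line of the oops. The sketch below is an illustration only (the helper names are made up for this example): it decodes those bytes as ASCII, and they spell a string, which suggests the CPU was running over data rather than code. The first byte, 0x6e, also happens to decode as `outsb` on x86, an instruction that reads from the address in RSI; RSI is 3 in the register dump, which would fit the fault address in CR2.

```python
# Bytes copied from the "Code:" line of the first oops; <6e> marks the byte at RIP.
CODE_LINE = "<6e> 65 77 6c 69 6e 6b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"


def decode_code_bytes(line: str) -> bytes:
    """Turn an oops 'Code:' dump into raw bytes, dropping the <..> RIP marker."""
    return bytes(int(tok.strip("<>"), 16) for tok in line.split())


def printable_prefix(raw: bytes) -> str:
    """Return the leading run of printable ASCII characters, if any."""
    chars = []
    for b in raw:
        if not 0x20 <= b < 0x7F:
            break
        chars.append(chr(b))
    return "".join(chars)


raw = decode_code_bytes(CODE_LINE)
print(printable_prefix(raw))  # newlink
```

A NUL-terminated string at the faulting address is what one would expect if the memory that used to hold the function has since been reused for something else.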

> [ 17.083956] [<ffffffff812bafb6>] register_pernet_operations.isra.4+0x61/0x91
> [ 17.091340] [<ffffffff8138486f>] ? mutex_lock+0xf/0x20
> [ 17.096822] [<ffffffff812bb006>] register_pernet_device+0x20/0x51
> [ 17.103254] [<ffffffffa0e5d034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]
> [ 17.110298] [<ffffffffa0e5d055>] ipgre_init+0x21/0xc9 [ip_gre]
> [ 17.116470] [<ffffffffa0e5d034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]

Note the faulting address is 0xffffffffa0e52001, which is around the
above address, be interesting to know what was at that location.

> [ 17.123515] [<ffffffff81000263>] do_one_initcall+0x7b/0x10c
> [ 17.129422] [<ffffffff8107e5db>] load_module+0x1b1f/0x1e19
> [ 17.135241] [<ffffffff8107a4f8>] ? sys_getegid16+0x44/0x44
> [ 17.141058] [<ffffffff81386cf2>] ? page_fault+0x22/0x30
> [ 17.146618] [<ffffffff8107e969>] SyS_init_module+0x94/0xa1
> [ 17.152440] [<ffffffff8138cf12>] system_call_fastpath+0x16/0x1b
> [ 17.158695] Code: <6e> 65 77 6c 69 6e 6b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 17.168440] RIP [<ffffffffa0e52002>] 0xffffffffa0e52001
> [ 17.174058] RSP <ffff8806227dfca8>
> [ 17.177798] CR2: 0000000000000003
> [ 17.181730] ---[ end trace 531fea804a54bcad ]---
>
> I assume this is from loading ip_gre, given that it's somewhere in the call
> stack; amazingly enough, GRE tunnels seem to actually still work, though,
> although I cannot load other modules such as ip_tables (modprobe hangs).

Well, probably a lock was held when this crashed, and never got to be
released. Which would explain the modprobe hangs. There's a few net
mutexes held in that location too.
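
As a user-space analogy only (Python threads, not kernel code; the lock name is just a label for this sketch), the following shows why a task that dies while holding a mutex makes every later taker wait forever. Here the second taker uses a timeout so the example terminates.

```python
import threading

# Stand-in for a kernel mutex; the name is illustrative only.
net_mutex = threading.Lock()


def crashing_loader():
    """Take the lock and stop without releasing it, like a task killed by an oops."""
    net_mutex.acquire()
    return  # the release that should follow is never reached


def later_loader(timeout: float = 0.2) -> bool:
    """Try to take the same lock, giving up after `timeout` seconds."""
    got = net_mutex.acquire(timeout=timeout)
    if got:
        net_mutex.release()
    return got


t = threading.Thread(target=crashing_loader)
t.start()
t.join()

print(later_loader())  # False: the lock is still held by the dead loader
```

A real modprobe has no timeout on the mutex, so it simply stays blocked.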

-- Steve

2013-06-07 03:59:52

by Eric Dumazet

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Thu, 2013-06-06 at 23:06 -0400, Steven Rostedt wrote:
> On Fri, Jun 07, 2013 at 12:16:56AM +0200, Steinar H. Gunderson wrote:
> > Hi,
> >
> > In 3.10.0-rc4, I get this on boot:
> >
> > [ 16.871043] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
> > [ 16.879453] IP: [<ffffffffa0e52002>] 0xffffffffa0e52001
>
> Strange, kallsyms should have registered the address already, even if it
> crashed on early module load. Not sure why it's not reporting it. Well,
> it seems to have reported some of the symbols of ip_gre below. Maybe
> this pointer is just totally screwed up.

I could not reproduce this here.

Steinar, please make sure you recompiled your modules, because this
looks like you loaded old modules.


2013-06-07 08:27:40

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Thu, Jun 06, 2013 at 11:06:48PM -0400, Steven Rostedt wrote:
> Note the faulting address is 0xffffffffa0e52001, which is around the
> above address, be interesting to know what was at that location.

Is there any way I can figure this out? The machine in question is still
running. kallsyms doesn't show anything near it, though.

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 08:32:06

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Thu, Jun 06, 2013 at 08:59:48PM -0700, Eric Dumazet wrote:
> Steinar, please make sure you recompiled your modules, because this
> looks like you loaded old modules.

I compiled the kernel using make-kpkg, so I don't see how that would happen.
Also, the timestamps indicate everything is fine:

-rw-r--r-- 1 root root 2574976 Jun 6 00:39 /boot/vmlinuz-3.10.0-rc4
-rw-r--r-- 1 root root 22856 Jun 6 00:39 /lib/modules/3.10.0-rc4/kernel/net/ipv4/ip_gre.ko

Or from the source tree:

-rw-r--r-- 1 root root 2574976 Jun 6 00:36 arch/x86/boot/bzImage
-rw-r--r-- 1 root root 22800 Jun 6 00:39 ./net/ipv4/ip_gre.ko

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 08:43:14

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Thu, Jun 06, 2013 at 11:06:48PM -0400, Steven Rostedt wrote:
> Note the faulting address is 0xffffffffa0e52001, which is around the
> above address, be interesting to know what was at that location.

Aha, the plot thickens:

root 6095 0.0 0.0 6632 596 ? D Jun06 0:00 /sbin/modprobe -q -- net-pf-17

pannekake:/usr/src/linux-3.10-rc4> sudo cat /proc/6095/stack
[<ffffffff812bb04f>] register_pernet_subsys+0x18/0x39
[<ffffffffa0ffd089>] packet_init+0x32/0x44 [af_packet]
[<ffffffff81000263>] do_one_initcall+0x7b/0x10c
[<ffffffff8107e5db>] load_module+0x1b1f/0x1e19
[<ffffffff8107e969>] SyS_init_module+0x94/0xa1
[<ffffffff8138cf12>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

I have a tcpdump running almost all the time (from boot), for a variety of
reasons. And I think I have the BPF JIT on; possibly related.

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 08:54:54

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Thu, Jun 06, 2013 at 11:06:48PM -0400, Steven Rostedt wrote:
> Note the faulting address is 0xffffffffa0e52001, which is around the
> above address, be interesting to know what was at that location.

Doh, I looked at the wrong place in kallsyms:

ffffffffa0e52000 u ip_tunnel_init_net [ip_gre]
ffffffffa0e55000 t gre_err [gre]
ffffffffa0e5503d t gre_gso_send_check [gre]
ffffffffa0e55053 t gre_rcv [gre]

So it's really ip_tunnel_init_net+1.
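
The lookup above can be reproduced mechanically: find the last symbol whose address is at or below the faulting one and print the remainder as an offset. A minimal sketch, using only the four kallsyms lines quoted in this message (the function names are invented for the example):

```python
import bisect

KALLSYMS = """\
ffffffffa0e52000 u ip_tunnel_init_net [ip_gre]
ffffffffa0e55000 t gre_err [gre]
ffffffffa0e5503d t gre_gso_send_check [gre]
ffffffffa0e55053 t gre_rcv [gre]
"""


def parse_kallsyms(text):
    """Parse 'address type name [module]' lines into a sorted list of tuples."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        module = parts[3].strip("[]") if len(parts) > 3 else ""
        syms.append((int(parts[0], 16), parts[1], parts[2], module))
    return sorted(syms)


def resolve(syms, addr):
    """Return 'name+0xoff [module]' for the closest symbol at or below addr."""
    idx = bisect.bisect_right([s[0] for s in syms], addr) - 1
    if idx < 0:
        return None
    base, _type, name, module = syms[idx]
    return f"{name}+{addr - base:#x} [{module}]"


syms = parse_kallsyms(KALLSYMS)
print(resolve(syms, 0xffffffffa0e52001))  # ip_tunnel_init_net+0x1 [ip_gre]
```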

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 13:40:45

by Eric Dumazet

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, 2013-06-07 at 10:54 +0200, Steinar H. Gunderson wrote:
> On Thu, Jun 06, 2013 at 11:06:48PM -0400, Steven Rostedt wrote:
> > Note the faulting address is 0xffffffffa0e52001, which is around the
> > above address, be interesting to know what was at that location.
>
> Doh, I looked at the wrong place in kallsyms:
>
> ffffffffa0e52000 u ip_tunnel_init_net [ip_gre]
> ffffffffa0e55000 t gre_err [gre]
> ffffffffa0e5503d t gre_gso_send_check [gre]
> ffffffffa0e55053 t gre_rcv [gre]
>
> So it's really ip_tunnel_init_net+1.
>
> /* Steinar */

" u " for ip_tunnel_init_net ?

Looks like someone forgot taking refcounts on a module ...

CC Pravin B Shelar, as this probably comes from commit
c54419321455631079c7d6e60bc732dd0c5914c5
("GRE: Refactor GRE tunneling code.")



2013-06-07 15:15:05

by Steven Rostedt

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, 2013-06-07 at 06:40 -0700, Eric Dumazet wrote:
> On Fri, 2013-06-07 at 10:54 +0200, Steinar H. Gunderson wrote:
> > On Thu, Jun 06, 2013 at 11:06:48PM -0400, Steven Rostedt wrote:
> > > Note the faulting address is 0xffffffffa0e52001, which is around the
> > > above address, be interesting to know what was at that location.
> >
> > Doh, I looked at the wrong place in kallsyms:
> >
> > ffffffffa0e52000 u ip_tunnel_init_net [ip_gre]
> > ffffffffa0e55000 t gre_err [gre]
> > ffffffffa0e5503d t gre_gso_send_check [gre]
> > ffffffffa0e55053 t gre_rcv [gre]
> >
> > So it's really ip_tunnel_init_net+1.
> >
> > /* Steinar */
>
> " u " for ip_tunnel_init_net ?
>
> Looks like someone forgot taking refcounts on a module ...
>
> CC Pravin B Shelar, as this probably comes from commit
> c54419321455631079c7d6e60bc732dd0c5914c5
> ("GRE: Refactor GRE tunneling code.")

int __net_init ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
struct rtnl_link_ops *ops, char *devname)
{

[...]

}
EXPORT_SYMBOL_GPL(ip_tunnel_init_net);

Really, you exported a symbol that can go away if CONFIG_NET_NS is not
set?
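
To make the failure mode concrete, here is a deliberately crude toy model in Python (not kernel code; `ToyModule` and `load_ip_tunnel` are hypothetical names for this sketch). It assumes, as described in this thread, that without CONFIG_NET_NS a `__net_init` function lives in init text, and that init text is dropped once the module has finished loading.

```python
class ToyModule:
    """Very rough model of a loaded module with core text and init text."""

    def __init__(self, name):
        self.name = name
        self.core_text = {}  # kept for the lifetime of the module
        self.init_text = {}  # dropped once the module's init has finished

    def define(self, symbol, func, init_only=False):
        (self.init_text if init_only else self.core_text)[symbol] = func

    def finish_init(self):
        """Model freeing the init section after loading completes."""
        self.init_text = None

    def call(self, symbol, *args):
        if symbol in self.core_text:
            return self.core_text[symbol](*args)
        if self.init_text is not None and symbol in self.init_text:
            return self.init_text[symbol](*args)
        raise RuntimeError(f"{symbol}: code no longer present in {self.name}")


def load_ip_tunnel(net_ns_enabled):
    """Build the toy module; the function is init-only when NET_NS is off."""
    mod = ToyModule("ip_tunnel")
    mod.define("ip_tunnel_init_net", lambda: 0, init_only=not net_ns_enabled)
    mod.finish_init()
    return mod


for enabled in (True, False):
    mod = load_ip_tunnel(enabled)
    try:
        print(enabled, mod.call("ip_tunnel_init_net"))
    except RuntimeError as err:
        print(enabled, err)
```

The toy raises a clean error; a real kernel has no such check and simply jumps to whatever now occupies that memory.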

----
net: Remove __net_init/exit from exported functions

If CONFIG_NET_NS is not set then __net_init is the same as __init and
__net_exit is the same as __exit. These functions will be removed from
memory after the module loads or is removed. Functions that are exported
for use by other functions should never be labeled for removal.

Signed-off-by: Steven Rostedt <[email protected]>

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e4147ec..850b5b5 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -853,8 +853,8 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
}
EXPORT_SYMBOL_GPL(ip_tunnel_dellink);

-int __net_init ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
- struct rtnl_link_ops *ops, char *devname)
+int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
+ struct rtnl_link_ops *ops, char *devname)
{
struct ip_tunnel_net *itn = net_generic(net, ip_tnl_net_id);
struct ip_tunnel_parm parms;
@@ -899,7 +899,7 @@ static void ip_tunnel_destroy(struct ip_tunnel_net *itn, struct list_head *head)
unregister_netdevice_queue(itn->fb_tunnel_dev, head);
}

-void __net_exit ip_tunnel_delete_net(struct ip_tunnel_net *itn)
+void ip_tunnel_delete_net(struct ip_tunnel_net *itn)
{
LIST_HEAD(list);


2013-06-07 15:46:48

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, Jun 07, 2013 at 11:15:00AM -0400, Steven Rostedt wrote:
> net: Remove __net_init/exit from exported functions
>
> If CONFIG_NET_NS is not set then __net_init is the same as __init and
> __net_exit is the same as __exit. These functions will be removed from
> memory after the module loads or is removed. Functions that are exported
> for use by other functions should never be labeled for removal.

That didn't help much, I'm afraid:

[ 18.005451] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
[ 18.013853] IP: [<ffffffffa0e76002>] 0xffffffffa0e76001
[ 18.019380] PGD 0
[ 18.021695] Oops: 0000 [#1] SMP
[ 18.025285] Modules linked in: ip_gre(+) gre ip_tunnel psmouse ide_generic ide_gd_mod ide_cd_mod cdrom acpi_cpufreq mperf coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support lpc_ich microcode mfd_core i2c_i801 pcspkr i2c_core ehci_pci evbug evdev ext4 crc16 jbd2 mbcache dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 md_mod sg sd_mod usbhid ide_pci_generic ide_core crc32c_intel e1000e ata_piix ptp pps_core uhci_hcd ehci_hcd mpt2sas raid_class unix
[ 18.073543] CPU: 0 PID: 3263 Comm: modprobe Not tainted 3.10.0-rc4 #2
[ 18.080237] Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a 12/30/2011
[ 18.087634] task: ffff88061ecfad60 ti: ffff8806212f0000 task.ti: ffff8806212f0000
[ 18.095571] RIP: 0010:[<ffffffffa0e76002>] [<ffffffffa0e76002>] 0xffffffffa0e76001
[ 18.103745] RSP: 0018:ffff8806212f1ca8 EFLAGS: 00010246
[ 18.109301] RAX: ffffffffa0e81000 RBX: ffff880623ebe280 RCX: 0000000000000000
[ 18.116682] RDX: ffffffffa0e7ea40 RSI: 0000000000000003 RDI: ffffffffa0e81018
[ 18.124063] RBP: ffff8806212f1ca8 R08: 0000000000000cf8 R09: ffffffff812bae96
[ 18.131441] R10: ffffea0018852c00 R11: 0000000000000000 R12: ffff880621678290
[ 18.138829] R13: ffffffffa0e7e9c0 R14: ffff8806212f1ef8 R15: 0000000000000002
[ 18.146210] FS: 00007f2e37fd1700(0000) GS:ffff880627200000(0000) knlGS:0000000000000000
[ 18.154747] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 18.160742] CR2: 0000000000000003 CR3: 0000000622a5e000 CR4: 00000000000007f0
[ 18.168131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.175510] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 18.182890] Stack:
[ 18.185143] ffff8806212f1cf8 ffffffff812baf26 2222222222222222 2222222222222222
[ 18.193235] 2222222222222222 ffffffffa0e7e9c0 0000000000000000 0000000000000000
[ 18.201313] ffff8806212f1ef8 ffffffffa0e7eb60 ffff8806212f1d28 ffffffff812bafb6
[ 18.209389] Call Trace:
[ 18.212084] [<ffffffff812baf26>] ops_init.constprop.7+0xc6/0xf5
[ 18.218339] [<ffffffff812bafb6>] register_pernet_operations.isra.4+0x61/0x91
[ 18.225720] [<ffffffff8138486f>] ? mutex_lock+0xf/0x20
[ 18.231189] [<ffffffff812bb006>] register_pernet_device+0x20/0x51
[ 18.237621] [<ffffffffa0e81034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]
[ 18.244661] [<ffffffffa0e81055>] ipgre_init+0x21/0xc9 [ip_gre]
[ 18.250831] [<ffffffffa0e81034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]
[ 18.257866] [<ffffffff81000263>] do_one_initcall+0x7b/0x10c
[ 18.263780] [<ffffffff8107e5db>] load_module+0x1b1f/0x1e19
[ 18.269594] [<ffffffff8107a4f8>] ? sys_getegid16+0x44/0x44
[ 18.275416] [<ffffffff81386cf2>] ? page_fault+0x22/0x30
[ 18.280972] [<ffffffff8107e969>] SyS_init_module+0x94/0xa1
[ 18.286795] [<ffffffff8138cf12>] system_call_fastpath+0x16/0x1b
[ 18.293051] Code: <6e> 65 77 6c 69 6e 6b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 18.302807] RIP [<ffffffffa0e76002>] 0xffffffffa0e76001
[ 18.308429] RSP <ffff8806212f1ca8>
[ 18.312163] CR2: 0000000000000003
[ 18.316021] ---[ end trace 839c6b43b00f02f5 ]---

and still:

ffffffffa0e76000 u ip_tunnel_init_net [ip_gre]

I've checked that ip_tunnel.ko and ip_gre.ko were indeed rebuilt (new timestamps),
and that my patching (I had to resolve manually due to fuzz) really removed __net_init.

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 16:12:28

by Steven Rostedt

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, 2013-06-07 at 17:46 +0200, Steinar H. Gunderson wrote:
> On Fri, Jun 07, 2013 at 11:15:00AM -0400, Steven Rostedt wrote:
> > net: Remove __net_init/exit from exported functions
> >
> > If CONFIG_NET_NS is not set then __net_init is the same as __init and
> > __net_exit is the same as __exit. These functions will be removed from
> > memory after the module loads or is removed. Functions that are exported
> > for use by other functions should never be labeled for removal.
>
> That didn't help much, I'm afraid:

Ouch :-/

>
> [ 18.005451] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
> [ 18.013853] IP: [<ffffffffa0e76002>] 0xffffffffa0e76001
> [ 18.019380] PGD 0
> [ 18.021695] Oops: 0000 [#1] SMP
> [ 18.025285] Modules linked in: ip_gre(+) gre ip_tunnel psmouse ide_generic ide_gd_mod ide_cd_mod cdrom acpi_cpufreq mperf coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support lpc_ich microcode mfd_core i2c_i801 pcspkr i2c_core ehci_pci evbug evdev ext4 crc16 jbd2 mbcache dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 md_mod sg sd_mod usbhid ide_pci_generic ide_core crc32c_intel e1000e ata_piix ptp pps_core uhci_hcd ehci_hcd mpt2sas raid_class unix
> [ 18.073543] CPU: 0 PID: 3263 Comm: modprobe Not tainted 3.10.0-rc4 #2
> [ 18.080237] Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a 12/30/2011
> [ 18.087634] task: ffff88061ecfad60 ti: ffff8806212f0000 task.ti: ffff8806212f0000
> [ 18.095571] RIP: 0010:[<ffffffffa0e76002>] [<ffffffffa0e76002>] 0xffffffffa0e76001
> [ 18.103745] RSP: 0018:ffff8806212f1ca8 EFLAGS: 00010246
> [ 18.109301] RAX: ffffffffa0e81000 RBX: ffff880623ebe280 RCX: 0000000000000000
> [ 18.116682] RDX: ffffffffa0e7ea40 RSI: 0000000000000003 RDI: ffffffffa0e81018
> [ 18.124063] RBP: ffff8806212f1ca8 R08: 0000000000000cf8 R09: ffffffff812bae96
> [ 18.131441] R10: ffffea0018852c00 R11: 0000000000000000 R12: ffff880621678290
> [ 18.138829] R13: ffffffffa0e7e9c0 R14: ffff8806212f1ef8 R15: 0000000000000002
> [ 18.146210] FS: 00007f2e37fd1700(0000) GS:ffff880627200000(0000) knlGS:0000000000000000
> [ 18.154747] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 18.160742] CR2: 0000000000000003 CR3: 0000000622a5e000 CR4: 00000000000007f0
> [ 18.168131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 18.175510] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 18.182890] Stack:
> [ 18.185143] ffff8806212f1cf8 ffffffff812baf26 2222222222222222 2222222222222222
> [ 18.193235] 2222222222222222 ffffffffa0e7e9c0 0000000000000000 0000000000000000
> [ 18.201313] ffff8806212f1ef8 ffffffffa0e7eb60 ffff8806212f1d28 ffffffff812bafb6
> [ 18.209389] Call Trace:
> [ 18.212084] [<ffffffff812baf26>] ops_init.constprop.7+0xc6/0xf5
> [ 18.218339] [<ffffffff812bafb6>] register_pernet_operations.isra.4+0x61/0x91
> [ 18.225720] [<ffffffff8138486f>] ? mutex_lock+0xf/0x20
> [ 18.231189] [<ffffffff812bb006>] register_pernet_device+0x20/0x51
> [ 18.237621] [<ffffffffa0e81034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]
> [ 18.244661] [<ffffffffa0e81055>] ipgre_init+0x21/0xc9 [ip_gre]
> [ 18.250831] [<ffffffffa0e81034>] ? ipgre_tap_init_net+0x1a/0x1a [ip_gre]
> [ 18.257866] [<ffffffff81000263>] do_one_initcall+0x7b/0x10c
> [ 18.263780] [<ffffffff8107e5db>] load_module+0x1b1f/0x1e19
> [ 18.269594] [<ffffffff8107a4f8>] ? sys_getegid16+0x44/0x44
> [ 18.275416] [<ffffffff81386cf2>] ? page_fault+0x22/0x30
> [ 18.280972] [<ffffffff8107e969>] SyS_init_module+0x94/0xa1
> [ 18.286795] [<ffffffff8138cf12>] system_call_fastpath+0x16/0x1b
> [ 18.293051] Code: <6e> 65 77 6c 69 6e 6b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 18.302807] RIP [<ffffffffa0e76002>] 0xffffffffa0e76001
> [ 18.308429] RSP <ffff8806212f1ca8>
> [ 18.312163] CR2: 0000000000000003
> [ 18.316021] ---[ end trace 839c6b43b00f02f5 ]---
>
> and still:
>
> ffffffffa0e76000 u ip_tunnel_init_net [ip_gre]

What do you get if you do an objdump -Dr ip_gre.ko?

Then look for ipgre_init and subtract 0xb053 (45139) from its address,
since that is ffffffffa0e81055 - ffffffffa0e76002, and see if that
object file has anything at that location.
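
The subtraction can be checked in a couple of lines. The addresses below are copied from the two oops traces in this thread; the variable names are only labels for this sketch.

```python
# (return address inside ipgre_init, faulting RIP) for each reported boot
traces = {
    "first oops":  (0xffffffffa0e5d055, 0xffffffffa0e52002),
    "second oops": (0xffffffffa0e81055, 0xffffffffa0e76002),
}

for label, (ipgre_init_addr, fault_rip) in traces.items():
    delta = ipgre_init_addr - fault_rip
    print(f"{label}: {delta:#x} ({delta})")
# first oops: 0xb053 (45139)
# second oops: 0xb053 (45139)
```

The distance is the same in both boots, even though the modules were loaded at different addresses.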


>
> I've checked that ip_tunnel.ko and ip_gre.ko were indeed rebuilt (new timestamps),
> and that my patching (I had to resolve manually due to fuzz) really removed __net_init.
>
> /* Steinar */

You could also try reverting c54419321455631079c7d6e60bc732dd0c5914c5 to
see if that fixes things, just to confirm whether it is the culprit.

Thanks,

-- Steve

2013-06-07 17:52:42

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, Jun 07, 2013 at 12:12:23PM -0400, Steven Rostedt wrote:
>> ffffffffa0e76000 u ip_tunnel_init_net [ip_gre]
> What do you get if you do an objdump -Dr ip_gre.ko?
>
> Then look for ipgre_init and subtract 0xb053 (45139) from its address,
> since that is ffffffffa0e81055 - ffffffffa0e76002, and see if that
> object file has anything at that location.

pannekake:~> objdump -Dr /lib/modules/3.10.0-rc4/kernel/net/ipv4/ip_gre.ko | grep ipgre_init
0000000000000000 <ipgre_init_net>:
0: 8b 35 00 00 00 00 mov 0x0(%rip),%esi # 6 <ipgre_init_net+0x6>
13: e8 00 00 00 00 callq 18 <ipgre_init_net+0x18>

I.e., the symbol doesn't show up in the disassembly (for whatever reason).

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 18:26:12

by Steven Rostedt

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, 2013-06-07 at 19:52 +0200, Steinar H. Gunderson wrote:
> On Fri, Jun 07, 2013 at 12:12:23PM -0400, Steven Rostedt wrote:
> >> ffffffffa0e76000 u ip_tunnel_init_net [ip_gre]
> > What do you get if you do an objdump -Dr ip_gre.ko?
> >
> > Then look for ipgre_init and subtract 0xb053 (45139) from its address,
> > since that is ffffffffa0e81055 - ffffffffa0e76002, and see if that
> > object file has anything at that location.
>
> pannekake:~> objdump -Dr /lib/modules/3.10.0-rc4/kernel/net/ipv4/ip_gre.ko | grep ipgre_init
> 0000000000000000 <ipgre_init_net>:
> 0: 8b 35 00 00 00 00 mov 0x0(%rip),%esi # 6 <ipgre_init_net+0x6>
> 13: e8 00 00 00 00 callq 18 <ipgre_init_net+0x18>
>
> I.e., the symbol doesn't show up in the disassembly (for whatever reason).

Ah, that's because of this: module_init(ipgre_init); Where it makes it
into:

00000000 <init_module>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 53 push %ebx
4: 83 ec 08 sub $0x8,%esp
7: c7 04 24 00 00 00 00 movl $0x0,(%esp)
a: R_386_32 .rodata.str1.4

We can use ipgre_tap_init_net, and the offset of 0xb032 (45106) as that
was 0xffffffffa0e5d034 - 0xffffffffa0e52002. Do you have CONFIG_NET_NS
set?


You can also cat /proc/modules. It gives you where the modules are
located.

-- Steve

2013-06-07 18:34:46

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, Jun 07, 2013 at 02:26:08PM -0400, Steven Rostedt wrote:
> On Fri, 2013-06-07 at 19:52 +0200, Steinar H. Gunderson wrote:
> Ah, that's because of this: module_init(ipgre_init); Where it makes it
> into:
>
> 00000000 <init_module>:
> 0: 55 push %ebp
> 1: 89 e5 mov %esp,%ebp
> 3: 53 push %ebx
> 4: 83 ec 08 sub $0x8,%esp
> 7: c7 04 24 00 00 00 00 movl $0x0,(%esp)
> a: R_386_32 .rodata.str1.4
>
> We can use ipgre_tap_init_net, and the offset of 0xb032 (45106) as that
> was 0xffffffffa0e5d034 - 0xffffffffa0e52002. Do you have CONFIG_NET_NS
> set?

ipgre_tap_init_net is 000000000000001a, but there's no way I can subtract
0xb053 from that? Sorry, I'm confused. :-)

> You can also cat /proc/modules. It gives you where the modules are
> located.

I've booted back to 3.9.x already; I couldn't live with a crashing kernel like
that. Unfortunately it's not that easy for me to reboot this machine all the
time either. :-/

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 18:44:23

by Steven Rostedt

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, 2013-06-07 at 20:34 +0200, Steinar H. Gunderson wrote:
> On Fri, Jun 07, 2013 at 02:26:08PM -0400, Steven Rostedt wrote:
> > On Fri, 2013-06-07 at 19:52 +0200, Steinar H. Gunderson wrote:
> > Ah, that's because of this: module_init(ipgre_init); Where it makes it
> > into:
> >
> > 00000000 <init_module>:
> > 0: 55 push %ebp
> > 1: 89 e5 mov %esp,%ebp
> > 3: 53 push %ebx
> > 4: 83 ec 08 sub $0x8,%esp
> > 7: c7 04 24 00 00 00 00 movl $0x0,(%esp)
> > a: R_386_32 .rodata.str1.4
> >
> > We can use ipgre_tap_init_net, and the offset of 0xb032 (45106) as that
> > was 0xffffffffa0e5d034 - 0xffffffffa0e52002. Do you have CONFIG_NET_NS
> > set?
>
> ipgre_tap_init_net is 000000000000001a, but there's no way I can subtract
> 0xb053 from that? Sorry, I'm confused. :-)

OK, then it's most likely in another module (probably ip_tunnel.ko).
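
The reasoning can be sketched numerically. This assumes that the objdump offset (0x1a) is relative to the start of the section holding the symbol, and that the section is loaded contiguously; both are assumptions of this example, and the function name is invented.

```python
def section_base(trace_addr, offset_in_symbol, symbol_offset_in_section):
    """Runtime address at which the section containing the symbol starts."""
    return trace_addr - offset_in_symbol - symbol_offset_in_section


# First oops: ipgre_tap_init_net+0x1a/0x1a was printed at 0xffffffffa0e5d034,
# and objdump placed ipgre_tap_init_net at offset 0x1a.
base = section_base(0xffffffffa0e5d034, 0x1a, 0x1a)
fault_rip = 0xffffffffa0e52002

rel = fault_rip - base
print(hex(base), hex(rel))  # 0xffffffffa0e5d000 -0xaffe
```

The computed base equals the RAX value in the register dump, and the faulting address lies below it, so under these assumptions it cannot be inside that section of ip_gre.ko.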

Do you know if you have CONFIG_NET_NS set in your .config?

>
> > You can also cat /proc/modules. It gives you where the modules are
> > located.
>
> I've booted back to 3.9.x already; I couldn't live with a crashing kernel like
> that. Unfortunately it's not that easy for me to reboot this machine all the
> time either. :-/

OK, if you get time, this may be a candidate to do a git bisect with.

Thanks,

-- Steve

2013-06-07 18:46:31

by Steinar H. Gunderson

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, Jun 07, 2013 at 02:44:19PM -0400, Steven Rostedt wrote:
> Do you know if you have CONFIG_NET_NS set in your .config?

Sorry, I forgot to answer this: No, it is not set.

/* Steinar */
--
Homepage: http://www.sesse.net/

2013-06-07 20:26:10

by Eric Dumazet

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

On Fri, 2013-06-07 at 20:46 +0200, Steinar H. Gunderson wrote:
> On Fri, Jun 07, 2013 at 02:44:19PM -0400, Steven Rostedt wrote:
> > Do you know if you have CONFIG_NET_NS set in your .config?
>
> Sorry, I forgot to answer this: No, it is not set.

OK, please try the following patch; Steven forgot to update the include
file as well.

I tried this patch and it solved the problem for me.

Thanks


[PATCH] ip_tunnel: remove __net_init/exit from exported functions

If CONFIG_NET_NS is not set then __net_init is the same as __init and
__net_exit is the same as __exit. These functions will be removed from
memory after the module loads or is removed. Functions that are exported
for use by other functions should never be labeled for removal.

Bug introduced by commit c54419321455631079c
("GRE: Refactor GRE tunneling code.")

Reported-by: Steinar H. Gunderson <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
include/net/ip_tunnels.h | 6 +++---
net/ipv4/ip_tunnel.c | 4 ++--
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 40b4dfc..1be442f 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -95,10 +95,10 @@ struct ip_tunnel_net {
int ip_tunnel_init(struct net_device *dev);
void ip_tunnel_uninit(struct net_device *dev);
void ip_tunnel_dellink(struct net_device *dev, struct list_head *head);
-int __net_init ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
- struct rtnl_link_ops *ops, char *devname);
+int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
+ struct rtnl_link_ops *ops, char *devname);

-void __net_exit ip_tunnel_delete_net(struct ip_tunnel_net *itn);
+void ip_tunnel_delete_net(struct ip_tunnel_net *itn);

void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
const struct iphdr *tnl_params, const u8 protocol);
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 7c79cf8..e189db4 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -853,7 +853,7 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
}
EXPORT_SYMBOL_GPL(ip_tunnel_dellink);

-int __net_init ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
+int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
struct rtnl_link_ops *ops, char *devname)
{
struct ip_tunnel_net *itn = net_generic(net, ip_tnl_net_id);
@@ -899,7 +899,7 @@ static void ip_tunnel_destroy(struct ip_tunnel_net *itn, struct list_head *head)
unregister_netdevice_queue(itn->fb_tunnel_dev, head);
}

-void __net_exit ip_tunnel_delete_net(struct ip_tunnel_net *itn)
+void ip_tunnel_delete_net(struct ip_tunnel_net *itn)
{
LIST_HEAD(list);


2013-06-13 10:01:31

by David Miller

Subject: Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

From: Eric Dumazet <[email protected]>
Date: Thu, 06 Jun 2013 20:59:48 -0700

> On Thu, 2013-06-06 at 23:06 -0400, Steven Rostedt wrote:
>> On Fri, Jun 07, 2013 at 12:16:56AM +0200, Steinar H. Gunderson wrote:
>> > Hi,
>> >
>> > In 3.10.0-rc4, I get this on boot:
>> >
>> > [ 16.871043] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
>> > [ 16.879453] IP: [<ffffffffa0e52002>] 0xffffffffa0e52001
>>
>> Strange, kallsyms should have registered the address already, even if it
>> crashed on early module load. Not sure why it's not reporting it. Well,
>> it seems to have reported some of the symbols of ip_gre below. Maybe
>> this pointer is just totally screwed up.
>
> I could not reproduce this here.
>
> Steinar, please make sure you recompiled your modules, because this
> looks like you loaded old modules.

I applied Eric's fixed version of the patch, thanks.