2011-02-18 05:03:27

by George Spelvin

[permalink] [raw]
Subject: 2.6.38-rc5 tcp_connect oops: EIP = 0x0

But wonder of wonders, kernel mode switching worked so I got to see the
oops on the text-mode console. It happened just as I tried to ssh out.

It's a Core 2 duo laptop (Dell E1405), 2 GB RAM, running a 32-bit kernel.
It's worth noting that I was using wired internet (b44 driver) and
not wireless.

I've been having a lot of weird lockups with 2.6.38-rcX, quite a change
from the very stable 2.6.36, but this is the first time I booted -rc5.
Also, the symptoms are very different; before the lockup did not seen
correlated with any particular activity, but the "lockup" was more like
something getting wedged in the kernel that more and more tasks would
get stuck on until everything stopped responding.

I should mention that this is transcribed by hand from the screen.
Oh, and also, it is far from the first time I ran ssh this boot.
(I re-tested it ater rebooting, just to be sure. Not a consistent
crash.)

Anyway, jumping to address 0 looks "interesting", so it seems worth reporting.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
Modules linked in: rfcomm btusb sco l2cap crc16 bluetooth b43 mac80211 cfg80211 [last unloaded: sha256_generic]

Pid: 12178, comm: ssh Not tainted 2.6.38-rc5 #227 Dell Inc. MXC061 /0MG532
EIP: 0060:[<00000000>] EFLAGS: 0021246 CPU: 0
EIP is at 0x0
EAX: f5947e00 EBX: f5982f80 ECX: 00000024 EDX: c148fe00
ESI: f5947e00 EDI: ebf29e54 EBP: ebf29ee0 ESP: ebf29dd0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process ssh (pid: 12178, ti=ebf28000 task=ebefc6c0 task.ti = ebef28000)
Stack:
c12c83ea 07e2427d 02000000 7b018f79 c117b827 0b996609 7b018f79 5f5f6644
f5982f80 00000000 ebf29e58 ebf29ee0 c12cc0e0 00001600 00000000 00000001
03641600 00000000 00000000 00000000 00000000 036423c0 3e6423c0 00000000
Call Trace:
[<c12c83ea>] ? tcp_connect+0xdd/0x3fd
[<c117b827>] ? secure_tcp_sequence_number+0x4f/0x65
[<c12cc0e0>] ? tcp_v4_connect+0x3c1/0x417
[<c12d6726>] ? inet_stream_connect+0x88/0x1fc
[<c1110705>] ? _copy_from_user+0x2b/0x10e
[<c129307d>] ? sys_connect+0x70/0x98
[<c108df0f>] ? get_empty_filp+0x9f/0x121
[<c108dfa0>] ? alloc_file+0xf/0x85
[<c1293287>] ? sock_alloc_file+0x97/0xeb
[<c108b758>] ? fd_install+0x1b/0x38
[<c12932f6>] ? sock_map_fd+0x1b/0x20
[<c1293bdf>] ? sys_socketcall+0x9d/0x291
[<c1002750>] ? sysenter_do_call+0x12/0x26
Code: Bad EIP value
EIP: [<00000000>] 0x0 SS:ESP 0068:ebf29dd0
CR2: 0000000000000000


2011-02-18 06:32:24

by Eric Dumazet

[permalink] [raw]
Subject: Re: 2.6.38-rc5 tcp_connect oops: EIP = 0x0

Le vendredi 18 février 2011 à 00:03 -0500, George Spelvin a écrit :
> But wonder of wonders, kernel mode switching worked so I got to see the
> oops on the text-mode console. It happened just as I tried to ssh out.
>

CC netdev (removed linux-netdev)

I'll take a look this morning, thanks for the report.

> It's a Core 2 duo laptop (Dell E1405), 2 GB RAM, running a 32-bit kernel.
> It's worth noting that I was using wired internet (b44 driver) and
> not wireless.
>
> I've been having a lot of weird lockups with 2.6.38-rcX, quite a change
> from the very stable 2.6.36, but this is the first time I booted -rc5.
> Also, the symptoms are very different; before the lockup did not seen
> correlated with any particular activity, but the "lockup" was more like
> something getting wedged in the kernel that more and more tasks would
> get stuck on until everything stopped responding.
>
> I should mention that this is transcribed by hand from the screen.
> Oh, and also, it is far from the first time I ran ssh this boot.
> (I re-tested it ater rebooting, just to be sure. Not a consistent
> crash.)
>
> Anyway, jumping to address 0 looks "interesting", so it seems worth reporting.
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [< (null)>] (null)
> *pde = 00000000
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
> Modules linked in: rfcomm btusb sco l2cap crc16 bluetooth b43 mac80211 cfg80211 [last unloaded: sha256_generic]
>
> Pid: 12178, comm: ssh Not tainted 2.6.38-rc5 #227 Dell Inc. MXC061 /0MG532
> EIP: 0060:[<00000000>] EFLAGS: 0021246 CPU: 0
> EIP is at 0x0
> EAX: f5947e00 EBX: f5982f80 ECX: 00000024 EDX: c148fe00
> ESI: f5947e00 EDI: ebf29e54 EBP: ebf29ee0 ESP: ebf29dd0
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process ssh (pid: 12178, ti=ebf28000 task=ebefc6c0 task.ti = ebef28000)
> Stack:
> c12c83ea 07e2427d 02000000 7b018f79 c117b827 0b996609 7b018f79 5f5f6644
> f5982f80 00000000 ebf29e58 ebf29ee0 c12cc0e0 00001600 00000000 00000001
> 03641600 00000000 00000000 00000000 00000000 036423c0 3e6423c0 00000000
> Call Trace:
> [<c12c83ea>] ? tcp_connect+0xdd/0x3fd
> [<c117b827>] ? secure_tcp_sequence_number+0x4f/0x65
> [<c12cc0e0>] ? tcp_v4_connect+0x3c1/0x417
> [<c12d6726>] ? inet_stream_connect+0x88/0x1fc
> [<c1110705>] ? _copy_from_user+0x2b/0x10e
> [<c129307d>] ? sys_connect+0x70/0x98
> [<c108df0f>] ? get_empty_filp+0x9f/0x121
> [<c108dfa0>] ? alloc_file+0xf/0x85
> [<c1293287>] ? sock_alloc_file+0x97/0xeb
> [<c108b758>] ? fd_install+0x1b/0x38
> [<c12932f6>] ? sock_map_fd+0x1b/0x20
> [<c1293bdf>] ? sys_socketcall+0x9d/0x291
> [<c1002750>] ? sysenter_do_call+0x12/0x26
> Code: Bad EIP value
> EIP: [<00000000>] 0x0 SS:ESP 0068:ebf29dd0
> CR2: 0000000000000000

2011-02-18 09:57:12

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] net: Add default_advmss() methods to blackhole dst_ops

Le vendredi 18 février 2011 à 07:32 +0100, Eric Dumazet a écrit :
> Le vendredi 18 février 2011 à 00:03 -0500, George Spelvin a écrit :
> > But wonder of wonders, kernel mode switching worked so I got to see the
> > oops on the text-mode console. It happened just as I tried to ssh out.
> >
>
> CC netdev (removed linux-netdev)
>
> I'll take a look this morning, thanks for the report.
>
> > It's a Core 2 duo laptop (Dell E1405), 2 GB RAM, running a 32-bit kernel.
> > It's worth noting that I was using wired internet (b44 driver) and
> > not wireless.
> >
> > I've been having a lot of weird lockups with 2.6.38-rcX, quite a change
> > from the very stable 2.6.36, but this is the first time I booted -rc5.
> > Also, the symptoms are very different; before the lockup did not seen
> > correlated with any particular activity, but the "lockup" was more like
> > something getting wedged in the kernel that more and more tasks would
> > get stuck on until everything stopped responding.
> >
> > I should mention that this is transcribed by hand from the screen.
> > Oh, and also, it is far from the first time I ran ssh this boot.
> > (I re-tested it ater rebooting, just to be sure. Not a consistent
> > crash.)
> >
> > Anyway, jumping to address 0 looks "interesting", so it seems worth reporting.
> >
> > BUG: unable to handle kernel NULL pointer dereference at (null)
> > IP: [< (null)>] (null)
> > *pde = 00000000
> > Oops: 0000 [#1] SMP
> > last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
> > Modules linked in: rfcomm btusb sco l2cap crc16 bluetooth b43 mac80211 cfg80211 [last unloaded: sha256_generic]
> >
> > Pid: 12178, comm: ssh Not tainted 2.6.38-rc5 #227 Dell Inc. MXC061 /0MG532
> > EIP: 0060:[<00000000>] EFLAGS: 0021246 CPU: 0
> > EIP is at 0x0
> > EAX: f5947e00 EBX: f5982f80 ECX: 00000024 EDX: c148fe00
> > ESI: f5947e00 EDI: ebf29e54 EBP: ebf29ee0 ESP: ebf29dd0
> > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > Process ssh (pid: 12178, ti=ebf28000 task=ebefc6c0 task.ti = ebef28000)
> > Stack:
> > c12c83ea 07e2427d 02000000 7b018f79 c117b827 0b996609 7b018f79 5f5f6644
> > f5982f80 00000000 ebf29e58 ebf29ee0 c12cc0e0 00001600 00000000 00000001
> > 03641600 00000000 00000000 00000000 00000000 036423c0 3e6423c0 00000000
> > Call Trace:
> > [<c12c83ea>] ? tcp_connect+0xdd/0x3fd
> > [<c117b827>] ? secure_tcp_sequence_number+0x4f/0x65
> > [<c12cc0e0>] ? tcp_v4_connect+0x3c1/0x417
> > [<c12d6726>] ? inet_stream_connect+0x88/0x1fc
> > [<c1110705>] ? _copy_from_user+0x2b/0x10e
> > [<c129307d>] ? sys_connect+0x70/0x98
> > [<c108df0f>] ? get_empty_filp+0x9f/0x121
> > [<c108dfa0>] ? alloc_file+0xf/0x85
> > [<c1293287>] ? sock_alloc_file+0x97/0xeb
> > [<c108b758>] ? fd_install+0x1b/0x38
> > [<c12932f6>] ? sock_map_fd+0x1b/0x20
> > [<c1293bdf>] ? sys_socketcall+0x9d/0x291
> > [<c1002750>] ? sysenter_do_call+0x12/0x26
> > Code: Bad EIP value
> > EIP: [<00000000>] 0x0 SS:ESP 0068:ebf29dd0
> > CR2: 0000000000000000


I suspect following patch is needed

CC Roland Dreier <[email protected]> because he fixed the
default_mtu problem in commit ec831ea72ee5d7d47
(net: Add default_mtu() methods to blackhole dst_ops)

Thanks !

[PATCH] net: Add default_advmss() methods to blackhole dst_ops

Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()

Reported-by: George Spelvin <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
CC: Roland Dreier <[email protected]>
---
net/ipv4/route.c | 7 +++++++
net/ipv6/route.c | 6 ++++++
2 files changed, 13 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 788a3e7..5edb605 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2712,6 +2712,12 @@ static unsigned int ipv4_blackhole_default_mtu(const struct dst_entry *dst)
return 0;
}

+static unsigned int ipv4_blackhole_default_advmss(const struct dst_entry *dst)
+{
+ return 256;
+}
+
+
static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu)
{
}
@@ -2722,6 +2728,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
.destroy = ipv4_dst_destroy,
.check = ipv4_blackhole_dst_check,
.default_mtu = ipv4_blackhole_default_mtu,
+ .default_advmss = ipv4_blackhole_default_advmss,
.update_pmtu = ipv4_rt_blackhole_update_pmtu,
};

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1c29f95..2eeeabb 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -118,6 +118,11 @@ static unsigned int ip6_blackhole_default_mtu(const struct dst_entry *dst)
return 0;
}

+static unsigned int ip6_blackhole_default_advmss(const struct dst_entry *dst)
+{
+ return 256;
+}
+
static void ip6_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu)
{
}
@@ -128,6 +133,7 @@ static struct dst_ops ip6_dst_blackhole_ops = {
.destroy = ip6_dst_destroy,
.check = ip6_dst_check,
.default_mtu = ip6_blackhole_default_mtu,
+ .default_advmss = ip6_blackhole_default_advmss,
.update_pmtu = ip6_rt_blackhole_update_pmtu,
};


2011-02-18 13:17:05

by Changli Gao

[permalink] [raw]
Subject: Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops

On Fri, Feb 18, 2011 at 5:57 PM, Eric Dumazet <[email protected]> wrote:
>
>
> I suspect following patch is needed
>
> CC Roland Dreier <[email protected]> because he fixed the
> default_mtu problem in commit ec831ea72ee5d7d47
> (net: Add default_mtu() methods to blackhole dst_ops)
>
> Thanks !
>
> [PATCH] net: Add default_advmss() methods to blackhole dst_ops
>
> Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
> accessor.) introduced a possible crash in tcp_connect_init(), when
> dst->default_advmss() is called from dst_metric_advmss()
>
> Reported-by: George Spelvin <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> CC: Roland Dreier <[email protected]>
> ---
> ?net/ipv4/route.c | ? ?7 +++++++
> ?net/ipv6/route.c | ? ?6 ++++++
> ?2 files changed, 13 insertions(+)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 788a3e7..5edb605 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2712,6 +2712,12 @@ static unsigned int ipv4_blackhole_default_mtu(const struct dst_entry *dst)
> ? ? ? ?return 0;
> ?}
>
> +static unsigned int ipv4_blackhole_default_advmss(const struct dst_entry *dst)
> +{
> + ? ? ? return 256;
> +}
> +
> +

I am wondering why magic number 256 is used here. Is there a special
reason? Thanks.

--
Regards,
Changli Gao([email protected])

2011-02-18 13:24:13

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops

Le vendredi 18 février 2011 à 21:16 +0800, Changli Gao a écrit :

> I am wondering why magic number 256 is used here. Is there a special
> reason? Thanks.
>

It really doesnt matter. SYN message will be dropped anyway.

256 happens to be the default value
of /proc/sys/net/ipv4/route/min_adv_mss


2011-02-18 13:30:08

by Changli Gao

[permalink] [raw]
Subject: Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops

On Fri, Feb 18, 2011 at 9:24 PM, Eric Dumazet <[email protected]> wrote:
> Le vendredi 18 f?vrier 2011 ? 21:16 +0800, Changli Gao a ?crit :
>
>> I am wondering why magic number 256 is used here. Is there a special
>> reason? Thanks.
>>
>
> It really doesnt matter. SYN message will be dropped anyway.
>
> 256 happens to be the default value
> of /proc/sys/net/ipv4/route/min_adv_mss
>

Thanks for your explaining. IMHO, ip_rt_min_advmss is better than a
magic number, here.

--
Regards,
Changli Gao([email protected])

2011-02-18 13:34:48

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops

Le vendredi 18 février 2011 à 21:29 +0800, Changli Gao a écrit :
> On Fri, Feb 18, 2011 at 9:24 PM, Eric Dumazet <[email protected]> wrote:
> > Le vendredi 18 février 2011 à 21:16 +0800, Changli Gao a écrit :
> >
> >> I am wondering why magic number 256 is used here. Is there a special
> >> reason? Thanks.
> >>
> >
> > It really doesnt matter. SYN message will be dropped anyway.
> >
> > 256 happens to be the default value
> > of /proc/sys/net/ipv4/route/min_adv_mss
> >
>
> Thanks for your explaining. IMHO, ip_rt_min_advmss is better than a
> magic number, here.
>

I had this exact idea but found we need struct net pointer to get this
value, not provided in parameters, so I falled back to the 256 value.


2011-02-18 13:43:44

by Daniel Baluta

[permalink] [raw]
Subject: Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops

On Fri, Feb 18, 2011 at 3:33 PM, Eric Dumazet <[email protected]> wrote:
> Le vendredi 18 f?vrier 2011 ? 21:29 +0800, Changli Gao a ?crit :
>> On Fri, Feb 18, 2011 at 9:24 PM, Eric Dumazet <[email protected]> wrote:
>> > Le vendredi 18 f?vrier 2011 ? 21:16 +0800, Changli Gao a ?crit :
>> >
>> >> I am wondering why magic number 256 is used here. Is there a special
>> >> reason? Thanks.
>> >>
>> >
>> > It really doesnt matter. SYN message will be dropped anyway.
>> >
>> > 256 happens to be the default value
>> > of /proc/sys/net/ipv4/route/min_adv_mss

Perhaps a comment would make sense then.

thanks,
Daniel.

2011-02-18 13:44:48

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH v2] net: provide default_advmss() methods to blackhole dst_ops

Le vendredi 18 février 2011 à 14:33 +0100, Eric Dumazet a écrit :

> I had this exact idea but found we need struct net pointer to get this
> value, not provided in parameters, so I falled back to the 256 value.
>
>

Hmm, reading again this stuff, maybe we can just use
ipv4_default_advmss() instead of a custom one.

dst->dev should be available

[PATCH] net: provide default_advmss() methods to blackhole dst_ops

Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()

Reported-by: George Spelvin <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
CC: Roland Dreier <[email protected]>
CC: Changli Gao <[email protected]>
CC: Daniel Baluta <[email protected]>
---
net/ipv4/route.c | 1 +
net/ipv6/route.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 788a3e7..6ed6603 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2722,6 +2722,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
.destroy = ipv4_dst_destroy,
.check = ipv4_blackhole_dst_check,
.default_mtu = ipv4_blackhole_default_mtu,
+ .default_advmss = ipv4_default_advmss,
.update_pmtu = ipv4_rt_blackhole_update_pmtu,
};

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1c29f95..a998db6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -128,6 +128,7 @@ static struct dst_ops ip6_dst_blackhole_ops = {
.destroy = ip6_dst_destroy,
.check = ip6_dst_check,
.default_mtu = ip6_blackhole_default_mtu,
+ .default_advmss = ip6_default_advmss,
.update_pmtu = ip6_rt_blackhole_update_pmtu,
};


2011-02-18 19:39:02

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2] net: provide default_advmss() methods to blackhole dst_ops

From: Eric Dumazet <[email protected]>
Date: Fri, 18 Feb 2011 14:44:32 +0100

> Le vendredi 18 f?vrier 2011 ? 14:33 +0100, Eric Dumazet a ?crit :
>
>> I had this exact idea but found we need struct net pointer to get this
>> value, not provided in parameters, so I falled back to the 256 value.
>>
>>
>
> Hmm, reading again this stuff, maybe we can just use
> ipv4_default_advmss() instead of a custom one.
>
> dst->dev should be available
>
> [PATCH] net: provide default_advmss() methods to blackhole dst_ops
>
> Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
> accessor.) introduced a possible crash in tcp_connect_init(), when
> dst->default_advmss() is called from dst_metric_advmss()
>
> Reported-by: George Spelvin <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>

Yes, this is a lot better, applied.

Thanks!

2011-02-18 19:58:29

by George Spelvin

[permalink] [raw]
Subject: Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops

This makes sense to me; I was having trouble at the time with a loose
ethernet cable (broken RJ45 retention clip), so the cable could have
come unplugged at the time I tried to make the connection.

Thank you very much!