2010-06-22 23:16:56

by Justin P. Mattock

[permalink] [raw]
Subject: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

I remember ipsec was able to work cleanly on my machines probably
about 4/6 months ago
now I get this:


[ 302.071077] BUG: unable to handle kernel NULL pointer dereference
at 00000000000000a0
[ 302.071084] IP: [<ffffffff81387e0b>] xfrm_bundle_ok+0x14f/0x2e9
[ 302.071094] PGD 13e695067 PUD 139c7e067 PMD 0
[ 302.071100] Oops: 0000 [#1] SMP
[ 302.071104] last sysfs file:
/sys/devices/pci0000:00/0000:00:15.0/0000:04:00.0/net/eth1/statistics/tx_bytes
[ 302.071109] CPU 0
[ 302.071111] Modules linked in: xfrm4_mode_transport sco xcbc bnep
rmd160 sha512_generic xt_tcpudp ipt_LOG iptable_nat nf_nat xt_state
nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
iptable_filter ip_tables x_tables firewire_ohci firewire_core evdev
lib80211_crypt_tkip uvcvideo videodev ohci1394 v4l1_compat button
thermal wl(P) nvidia(P) ohci_hcd forcedeth i2c_nforce2 aes_x86_64 lzo
lzo_compress ipcomp xfrm_ipcomp crypto_null sha256_generic cbc
des_generic cast5 blowfish serpent camellia twofish twofish_common ctr
ah4 esp4 authenc adm1021 raw1394 ieee1394 uhci_hcd ehci_hcd hci_uart
rfcomm btusb hidp l2cap bluetooth coretemp acpi_cpufreq processor
mperf appletouch applesmc
[ 302.071185]
[ 302.071189] Pid: 2603, comm: vncviewer Tainted: P
2.6.35-rc2-00001-g8dd40f7 #3 Mac-F2218FC8/iMac9,1
[ 302.071193] RIP: 0010:[<ffffffff81387e0b>] [<ffffffff81387e0b>]
xfrm_bundle_ok+0x14f/0x2e9
[ 302.071199] RSP: 0018:ffff880139f4db58 EFLAGS: 00010246
[ 302.071202] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 302.071206] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880139f48700
[ 302.071209] RBP: ffff880139f4dbc8 R08: 0000000000000000 R09: ffff8801389cc574
[ 302.071212] R10: dead000000200200 R11: ffff880139f4dc98 R12: ffff88012739a500
[ 302.071216] R13: ffff88012739a780 R14: 0000000000000000 R15: ffff88012ed266c0
[ 302.071220] FS: 00007f201be85740(0000) GS:ffff880001a00000(0000)
knlGS:0000000000000000
[ 302.071224] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 302.071227] CR2: 00000000000000a0 CR3: 000000013b2a6000 CR4: 00000000000406f0
[ 302.071230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 302.071234] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 302.071238] Process vncviewer (pid: 2603, threadinfo
ffff880139f4c000, task ffff880131b1dc40)
[ 302.071240] Stack:
[ 302.071242] ffff8801389366c0 ffffffff8168ff08 000000000000001c
000000000000000c
[ 302.071248] <0> 0000000000000000 00000000004623c0 0000000000000000
0000000081606c40
[ 302.071253] <0> ffff8801389cc480 ffff88012739a500 ffff88012ef80780
0000000000000000
[ 302.071260] Call Trace:
[ 302.071265] [<ffffffff81387fba>] stale_bundle+0x15/0x1f
[ 302.071270] [<ffffffff81387fdc>] xfrm_dst_check+0x18/0x2e
[ 302.071275] [<ffffffff8131d02f>] __sk_dst_check+0x27/0x53
[ 302.071281] [<ffffffff8135172a>] ip_queue_xmit+0x3c/0x2ed
[ 302.071286] [<ffffffff8136405c>] ? tcp_connect+0x1d4/0x379
[ 302.071290] [<ffffffff8131eef3>] ? __skb_clone+0x29/0x100
[ 302.071295] [<ffffffff81363dc0>] tcp_transmit_skb+0x6e1/0x71f
[ 302.071300] [<ffffffff81364175>] tcp_connect+0x2ed/0x379
[ 302.071305] [<ffffffff81243739>] ? secure_tcp_sequence_number+0x55/0x6e
[ 302.071310] [<ffffffff813692ee>] tcp_v4_connect+0x3c4/0x419
[ 302.071316] [<ffffffff811952d2>] ? avc_has_perm+0x57/0x69
[ 302.071321] [<ffffffff81375030>] inet_stream_connect+0xa7/0x260
[ 302.071326] [<ffffffff8131aa26>] sys_connect+0x75/0x9b
[ 302.071332] [<ffffffff810e403c>] ? fd_install+0x52/0x5b
[ 302.071338] [<ffffffff81092983>] ? audit_syscall_entry+0x1b6/0x1e2
[ 302.071342] [<ffffffff8131a552>] ? sys_socket+0x3b/0x57
[ 302.071348] [<ffffffff81025f42>] system_call_fastpath+0x16/0x1b
[ 302.071350] Code: 7d 58 41 80 bf c0 00 00 00 02 0f 85 98 01 00 00
41 8b 87 a8 00 00 00 41 39 85 b8 01 00 00 0f 85 84 01 00 00 49 8b 85
90 01 00 00 <8b> 80 a0 00 00 00 41 39 85 bc 01 00 00 0f 85 6a 01 00 00
83 7d
[ 302.071400] RIP [<ffffffff81387e0b>] xfrm_bundle_ok+0x14f/0x2e9
[ 302.071405] RSP <ffff880139f4db58>
[ 302.071408] CR2: 00000000000000a0
[ 302.071414] ---[ end trace b4323dbb88295950 ]---


starting a bisect, but might take some time....

--
Justin P. Mattock


2010-06-23 14:30:15

by John W. Linville

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

On Tue, Jun 22, 2010 at 04:16:53PM -0700, Justin Mattock wrote:
> I remember ipsec was able to work cleanly on my machines probably
> about 4/6 months ago
> now I get this:

<snip>

Perhaps netdev would be a more appropriate list than linux-wireless
for this?

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2010-06-23 14:40:56

by Justin P. Mattock

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

On 06/23/2010 07:16 AM, John W. Linville wrote:
> On Tue, Jun 22, 2010 at 04:16:53PM -0700, Justin Mattock wrote:
>> I remember ipsec was able to work cleanly on my machines probably
>> about 4/6 months ago
>> now I get this:
>
> <snip>
>
> Perhaps netdev would be a more appropriate list than linux-wireless
> for this?
>
> John

alright added the cc's..
almost done bisecting..hopefully this points to
the right area..(if not I'll re-bisect until I get this).

as for the troubled machines I've a macbookpro2,2(no proprietary
stuff) and an imac9,1 with broadcom-sta(blah) both
seem to hit this right when the ipsec transaction starts
with ssh, vncviewer..(last good kernel I have is
2.6.33-rc5-01007-ge9449d8).

Justin P. Mattock

2010-06-23 17:08:16

by Justin P. Mattock

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

o.k. the bisect is pointing to the below results..
(I tried git revert xxx but this commit is too big
so I'll(hopefully)manually revert it on the latest HEAD to
see if this is the actual problem im experiencing)



80c802f3073e84c956846e921e8a0b02dfa3755f is the first bad commit
commit 80c802f3073e84c956846e921e8a0b02dfa3755f
Author: Timo Teräs <[email protected]>
Date: Wed Apr 7 00:30:05 2010 +0000

xfrm: cache bundles instead of policies for outgoing flows

__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.

This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).

Signed-off-by: Timo Teras <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

:040000 040000 d8e60f5fa4c1329f450d9c7cdf98b34e6a177f22
9f576e68e5bf4ce357d7f0305aee5f410250dfe2 M include
:040000 040000 f2876df688ee36907af7b4123eea96592faaed3e
a3f6f6f94f0309106856cd99b38ec90b024eb016 M net



Justin P. Mattock

2010-06-23 17:29:37

by Eric Dumazet

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

Le mercredi 23 juin 2010 à 10:00 -0700, Justin P. Mattock a écrit :
> o.k. the bisect is pointing to the below results..
> (I tried git revert xxx but this commit is too big
> so I'll(hopefully)manually revert it on the latest HEAD to
> see if this is the actual problem im experiencing)
>
>
>
> 80c802f3073e84c956846e921e8a0b02dfa3755f is the first bad commit
> commit 80c802f3073e84c956846e921e8a0b02dfa3755f
> Author: Timo Teräs <[email protected]>
> Date: Wed Apr 7 00:30:05 2010 +0000
>
> xfrm: cache bundles instead of policies for outgoing flows
>
> __xfrm_lookup() is called for each packet transmitted out of
> system. The xfrm_find_bundle() does a linear search which can
> kill system performance depending on how many bundles are
> required per policy.
>
> This modifies __xfrm_lookup() to store bundles directly in
> the flow cache. If we did not get a hit, we just create a new
> bundle instead of doing slow search. This means that we can now
> get multiple xfrm_dst's for same flow (on per-cpu basis).
>
> Signed-off-by: Timo Teras <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
>
> :040000 040000 d8e60f5fa4c1329f450d9c7cdf98b34e6a177f22
> 9f576e68e5bf4ce357d7f0305aee5f410250dfe2 M include
> :040000 040000 f2876df688ee36907af7b4123eea96592faaed3e
> a3f6f6f94f0309106856cd99b38ec90b024eb016 M net
>
>

Thanks a lot for bisecting Jutin, this is really appreciated.

crash is in xfrm_bundle_ok()

if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
return 0;

xdst->pols[0] contains a NULL pointer

2010-06-23 18:10:23

by Timo Teras

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

On 06/23/2010 08:29 PM, Eric Dumazet wrote:
> Le mercredi 23 juin 2010 à 10:00 -0700, Justin P. Mattock a écrit :
>> o.k. the bisect is pointing to the below results..
>> (I tried git revert xxx but this commit is too big
>> so I'll(hopefully)manually revert it on the latest HEAD to
>> see if this is the actual problem im experiencing)
>>
>> 80c802f3073e84c956846e921e8a0b02dfa3755f is the first bad commit
>> commit 80c802f3073e84c956846e921e8a0b02dfa3755f
>> Author: Timo Teräs <[email protected]>
>> Date: Wed Apr 7 00:30:05 2010 +0000
>>
>> xfrm: cache bundles instead of policies for outgoing flows
>>
>> __xfrm_lookup() is called for each packet transmitted out of
>> system. The xfrm_find_bundle() does a linear search which can
>> kill system performance depending on how many bundles are
>> required per policy.
>>
>> This modifies __xfrm_lookup() to store bundles directly in
>> the flow cache. If we did not get a hit, we just create a new
>> bundle instead of doing slow search. This means that we can now
>> get multiple xfrm_dst's for same flow (on per-cpu basis).
>>
>> Signed-off-by: Timo Teras <[email protected]>
>> Signed-off-by: David S. Miller <[email protected]>
>>
>> :040000 040000 d8e60f5fa4c1329f450d9c7cdf98b34e6a177f22
>> 9f576e68e5bf4ce357d7f0305aee5f410250dfe2 M include
>> :040000 040000 f2876df688ee36907af7b4123eea96592faaed3e
>> a3f6f6f94f0309106856cd99b38ec90b024eb016 M net
>
> Thanks a lot for bisecting Jutin, this is really appreciated.
>
> crash is in xfrm_bundle_ok()
>
> if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
> return 0;
>
> xdst->pols[0] contains a NULL pointer

That does not really make sense, if we get this far; there's a valid
xfrm_state with the bundle. This means that there existed a policy with
it too.

I'll take a deeper look at this tomorrow. Would it be possible to see
your xfrm policies?

- Timo

2010-06-23 18:21:14

by Justin P. Mattock

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

On 06/23/10 11:10, Timo Teräs wrote:
> On 06/23/2010 08:29 PM, Eric Dumazet wrote:
>
>> Le mercredi 23 juin 2010 à 10:00 -0700, Justin P. Mattock a écrit :
>>
>>> o.k. the bisect is pointing to the below results..
>>> (I tried git revert xxx but this commit is too big
>>> so I'll(hopefully)manually revert it on the latest HEAD to
>>> see if this is the actual problem im experiencing)
>>>
>>> 80c802f3073e84c956846e921e8a0b02dfa3755f is the first bad commit
>>> commit 80c802f3073e84c956846e921e8a0b02dfa3755f
>>> Author: Timo Teräs<[email protected]>
>>> Date: Wed Apr 7 00:30:05 2010 +0000
>>>
>>> xfrm: cache bundles instead of policies for outgoing flows
>>>
>>> __xfrm_lookup() is called for each packet transmitted out of
>>> system. The xfrm_find_bundle() does a linear search which can
>>> kill system performance depending on how many bundles are
>>> required per policy.
>>>
>>> This modifies __xfrm_lookup() to store bundles directly in
>>> the flow cache. If we did not get a hit, we just create a new
>>> bundle instead of doing slow search. This means that we can now
>>> get multiple xfrm_dst's for same flow (on per-cpu basis).
>>>
>>> Signed-off-by: Timo Teras<[email protected]>
>>> Signed-off-by: David S. Miller<[email protected]>
>>>
>>> :040000 040000 d8e60f5fa4c1329f450d9c7cdf98b34e6a177f22
>>> 9f576e68e5bf4ce357d7f0305aee5f410250dfe2 M include
>>> :040000 040000 f2876df688ee36907af7b4123eea96592faaed3e
>>> a3f6f6f94f0309106856cd99b38ec90b024eb016 M net
>>>
>> Thanks a lot for bisecting Jutin, this is really appreciated.
>>
>> crash is in xfrm_bundle_ok()
>>
>> if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
>> return 0;
>>
>> xdst->pols[0] contains a NULL pointer
>>
> That does not really make sense, if we get this far; there's a valid
> xfrm_state with the bundle. This means that there existed a policy with
> it too.
>
> I'll take a deeper look at this tomorrow. Would it be possible to see
> your xfrm policies?
>
> - Timo
>
>

@Eric sure no problem doing the bisect...

as for the xfrm policy here is the link that I used to setup ipsec:
http://www.linuxfromscratch.org/hints/downloads/files/ipsec.txt
(just change the keys if doing real world work..(but for me just testing).

below is a temporary fix for me to get this working, tcpdump reports
everything is doing what it should be
11:16:32.496166 IP xxxxx > xxxxx: AH(spi=0x00000200,seq=0x1090):
ESP(spi=0x00000201,seq=0x1090), length 56
11:16:32.496212 IP xxxxx > xxxxx: AH(spi=0x00000200,seq=0x1091):
ESP(spi=0x00000201,seq=0x1091), length 56
11:16:32.496259 IP xxxxx > xxxxx: AH(spi=0x00000200,seq=0x1092):
ESP(spi=0x00000201,seq=0x1092), length 56

(tested a few mins ago, but not the right fix..)

From a280d9048c8f8f5756f9f0dcc608b4499645593e Mon Sep 17 00:00:00 2001
From: Justin P. Mattock <[email protected]>
Date: Wed, 23 Jun 2010 11:10:28 -0700
Subject: [PATCH] fix ipsec
Signed-off-by: Justin P. Mattock <[email protected]>

---
net/xfrm/xfrm_policy.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4bf27d9..701d69a 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2300,8 +2300,6 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct
xfrm_dst *first,
return 0;
if (xdst->xfrm_genid != dst->xfrm->genid)
return 0;
- if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
- return 0;

if (strict && fl &&
!(dst->xfrm->outer_mode->flags & XFRM_MODE_FLAG_TUNNEL) &&
--
1.7.1.rc1.21.gf3bd6


I've got the machine ready for any patches.. let me know o.k...

Justin P. Mattock

2010-06-23 20:42:15

by Timo Teras

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

On 06/23/2010 09:20 PM, Justin P. Mattock wrote:
> On 06/23/10 11:10, Timo Teräs wrote:
>> On 06/23/2010 08:29 PM, Eric Dumazet wrote:
>>
>>> Le mercredi 23 juin 2010 à 10:00 -0700, Justin P. Mattock a écrit :
>>>
>>>> o.k. the bisect is pointing to the below results..
>>>> (I tried git revert xxx but this commit is too big
>>>> so I'll(hopefully)manually revert it on the latest HEAD to
>>>> see if this is the actual problem im experiencing)
>>>>
>>>> 80c802f3073e84c956846e921e8a0b02dfa3755f is the first bad commit
>>>> commit 80c802f3073e84c956846e921e8a0b02dfa3755f
>>>> Author: Timo Teräs<[email protected]>
>>>> Date: Wed Apr 7 00:30:05 2010 +0000
>>>>
>>>> xfrm: cache bundles instead of policies for outgoing flows
>>>>
>>>> __xfrm_lookup() is called for each packet transmitted out of
>>>> system. The xfrm_find_bundle() does a linear search which can
>>>> kill system performance depending on how many bundles are
>>>> required per policy.
>>>>
>>>> This modifies __xfrm_lookup() to store bundles directly in
>>>> the flow cache. If we did not get a hit, we just create a new
>>>> bundle instead of doing slow search. This means that we can now
>>>> get multiple xfrm_dst's for same flow (on per-cpu basis).
>>>>
>>>> Signed-off-by: Timo Teras<[email protected]>
>>>> Signed-off-by: David S. Miller<[email protected]>
>>>>
>>>> :040000 040000 d8e60f5fa4c1329f450d9c7cdf98b34e6a177f22
>>>> 9f576e68e5bf4ce357d7f0305aee5f410250dfe2 M include
>>>> :040000 040000 f2876df688ee36907af7b4123eea96592faaed3e
>>>> a3f6f6f94f0309106856cd99b38ec90b024eb016 M net
>>>>
>>> Thanks a lot for bisecting Jutin, this is really appreciated.
>>>
>>> crash is in xfrm_bundle_ok()
>>>
>>> if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
>>> return 0;
>>>
>>> xdst->pols[0] contains a NULL pointer
>>>
>> That does not really make sense, if we get this far; there's a valid
>> xfrm_state with the bundle. This means that there existed a policy with
>> it too.
>>
>> I'll take a deeper look at this tomorrow. Would it be possible to see
>> your xfrm policies?
>>
>> - Timo
>>
>>
>
> @Eric sure no problem doing the bisect...
>
> as for the xfrm policy here is the link that I used to setup ipsec:
> http://www.linuxfromscratch.org/hints/downloads/files/ipsec.txt
> (just change the keys if doing real world work..(but for me just testing).
>
> below is a temporary fix for me to get this working, tcpdump reports
> everything is doing what it should be
> 11:16:32.496166 IP xxxxx > xxxxx: AH(spi=0x00000200,seq=0x1090):
> ESP(spi=0x00000201,seq=0x1090), length 56
> 11:16:32.496212 IP xxxxx > xxxxx: AH(spi=0x00000200,seq=0x1091):
> ESP(spi=0x00000201,seq=0x1091), length 56
> 11:16:32.496259 IP xxxxx > xxxxx: AH(spi=0x00000200,seq=0x1092):
> ESP(spi=0x00000201,seq=0x1092), length 56
>
> (tested a few mins ago, but not the right fix..)

Yes, that would break some other obscure scenarios.

Looks like it's ah inside esp. So you get chain of bundles. And only the
first bundle gets a policy. Should have thought of that. Does the below
fix it for you?

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4bf27d9..af1c173 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2300,7 +2300,8 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct
xfrm_dst *first,
return 0;
if (xdst->xfrm_genid != dst->xfrm->genid)
return 0;
- if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
+ if (xdst->num_pols > 0 &&
+ xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
return 0;

if (strict && fl &&

2010-06-23 21:44:47

by Justin P. Mattock

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0

On 06/23/2010 01:34 PM, Timo Teräs wrote:
> On 06/23/2010 09:20 PM, Justin P. Mattock wrote:
>> On 06/23/10 11:10, Timo Teräs wrote:
>>> On 06/23/2010 08:29 PM, Eric Dumazet wrote:
>>>
>>>> Le mercredi 23 juin 2010 à 10:00 -0700, Justin P. Mattock a écrit :
>>>>
>>>>> o.k. the bisect is pointing to the below results..
>>>>> (I tried git revert xxx but this commit is too big
>>>>> so I'll(hopefully)manually revert it on the latest HEAD to
>>>>> see if this is the actual problem im experiencing)
>>>>>
>>>>> 80c802f3073e84c956846e921e8a0b02dfa3755f is the first bad commit
>>>>> commit 80c802f3073e84c956846e921e8a0b02dfa3755f
>>>>> Author: Timo Teräs<[email protected]>
>>>>> Date: Wed Apr 7 00:30:05 2010 +0000
>>>>>
>>>>> xfrm: cache bundles instead of policies for outgoing flows
>>>>>
>>>>> __xfrm_lookup() is called for each packet transmitted out of
>>>>> system. The xfrm_find_bundle() does a linear search which can
>>>>> kill system performance depending on how many bundles are
>>>>> required per policy.
>>>>>
>>>>> This modifies __xfrm_lookup() to store bundles directly in
>>>>> the flow cache. If we did not get a hit, we just create a new
>>>>> bundle instead of doing slow search. This means that we can now
>>>>> get multiple xfrm_dst's for same flow (on per-cpu basis).
>>>>>
>>>>> Signed-off-by: Timo Teras<[email protected]>
>>>>> Signed-off-by: David S. Miller<[email protected]>
>>>>>
>>>>> :040000 040000 d8e60f5fa4c1329f450d9c7cdf98b34e6a177f22
>>>>> 9f576e68e5bf4ce357d7f0305aee5f410250dfe2 M include
>>>>> :040000 040000 f2876df688ee36907af7b4123eea96592faaed3e
>>>>> a3f6f6f94f0309106856cd99b38ec90b024eb016 M net
>>>>>
>>>> Thanks a lot for bisecting Jutin, this is really appreciated.
>>>>
>>>> crash is in xfrm_bundle_ok()
>>>>
>>>> if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
>>>> return 0;
>>>>
>>>> xdst->pols[0] contains a NULL pointer
>>>>
>>> That does not really make sense, if we get this far; there's a valid
>>> xfrm_state with the bundle. This means that there existed a policy with
>>> it too.
>>>
>>> I'll take a deeper look at this tomorrow. Would it be possible to see
>>> your xfrm policies?
>>>
>>> - Timo
>>>
>>>
>>
>> @Eric sure no problem doing the bisect...
>>
>> as for the xfrm policy here is the link that I used to setup ipsec:
>> http://www.linuxfromscratch.org/hints/downloads/files/ipsec.txt
>> (just change the keys if doing real world work..(but for me just testing).
>>
>> below is a temporary fix for me to get this working, tcpdump reports
>> everything is doing what it should be
>> 11:16:32.496166 IP xxxxx> xxxxx: AH(spi=0x00000200,seq=0x1090):
>> ESP(spi=0x00000201,seq=0x1090), length 56
>> 11:16:32.496212 IP xxxxx> xxxxx: AH(spi=0x00000200,seq=0x1091):
>> ESP(spi=0x00000201,seq=0x1091), length 56
>> 11:16:32.496259 IP xxxxx> xxxxx: AH(spi=0x00000200,seq=0x1092):
>> ESP(spi=0x00000201,seq=0x1092), length 56
>>
>> (tested a few mins ago, but not the right fix..)
>
> Yes, that would break some other obscure scenarios.
>
> Looks like it's ah inside esp. So you get chain of bundles. And only the
> first bundle gets a policy. Should have thought of that. Does the below
> fix it for you?
>
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 4bf27d9..af1c173 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2300,7 +2300,8 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct
> xfrm_dst *first,
> return 0;
> if (xdst->xfrm_genid != dst->xfrm->genid)
> return 0;
> - if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
> + if (xdst->num_pols> 0&&
> + xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
> return 0;
>
> if (strict&& fl&&
>
>


yeah this works.. I can see AH and ESP showing up with tcpdump..
looks good..

Reported-Bisected-By: Justin P. Mattock <[email protected]>

cheers,

Justin P. Mattock

2010-06-24 05:45:53

by Timo Teras

[permalink] [raw]
Subject: [PATCH] xfrm: check bundle policy existance before dereferencing it

Fix the bundle validation code to not assume having a valid policy.
When we have multiple transformations for a xfrm policy, the bundle
instance will be a chain of bundles with only the first one having
the policy reference. When policy_genid is bumped it will expire the
first bundle in the chain which is equivalent of expiring the whole
chain.

Reported-bisected-and-tested-by: Justin P. Mattock <[email protected]>
Signed-off-by: Timo Teräs <[email protected]>
---
net/xfrm/xfrm_policy.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4bf27d9..af1c173 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2300,7 +2300,8 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct xfrm_dst *first,
return 0;
if (xdst->xfrm_genid != dst->xfrm->genid)
return 0;
- if (xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
+ if (xdst->num_pols > 0 &&
+ xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
return 0;

if (strict && fl &&
--
1.7.0.4

2010-06-24 21:35:21

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] xfrm: check bundle policy existance before dereferencing it

From: Timo Ter?s <[email protected]>
Date: Thu, 24 Jun 2010 08:45:19 +0300

> Fix the bundle validation code to not assume having a valid policy.
> When we have multiple transformations for a xfrm policy, the bundle
> instance will be a chain of bundles with only the first one having
> the policy reference. When policy_genid is bumped it will expire the
> first bundle in the chain which is equivalent of expiring the whole
> chain.
>
> Reported-bisected-and-tested-by: Justin P. Mattock <[email protected]>
> Signed-off-by: Timo Ter?s <[email protected]>

Applied.