2010-04-08 06:59:00

by Stephen Rothwell

Subject: linux-next: powerpc boot failure

Hi,

Today's linux-next (20100408) failed a powerpc boot test like this:

[While bringing up the network interfaces ...]

Unable to handle kernel paging request for data at address 0x200000025
Faulting instruction address: 0xc00000000053d32c
cpu 0x5: Vector: 300 (Data Access) at [c0000000bb277680]
pc: c00000000053d32c: .__xfrm_lookup+0x32c/0x4c0
lr: c0000000004e6e10: .ip_route_output_flow+0xb0/0x300
sp: c0000000bb277900
msr: 8000000000009032
dar: 200000025
dsisr: 40000000
current = 0xc0000000bce55640
paca = 0xc000000007691a00
pid = 4106, comm = ntpdate
[c0000000bb277a20] c0000000004e6e10 .ip_route_output_flow+0xb0/0x300
[c0000000bb277ad0] c0000000005158c8 .ip4_datagram_connect+0x1a8/0x2f0
[c0000000bb277bd0] c000000000523dc0 .inet_dgram_connect+0x80/0x110
[c0000000bb277c60] c0000000004a6904 .SyS_connect+0xa4/0xf0
[c0000000bb277d90] c0000000004d5f48 .compat_sys_socketcall+0x128/0x2f0
[c0000000bb277e30] c00000000000852c syscall_exit+0x0/0x40

The most obvious suspect is commit
80c802f3073e84c956846e921e8a0b02dfa3755f ("xfrm: cache bundles instead of
policies for outgoing flows") and the couple of commits around that
(these are new to linux-next today).

The above pc is in this piece of code (I think - I don't have the actual
kernel) from __xfrm_lookup (in net/xfrm/xfrm_policy.c):

	if ((flags & XFRM_LOOKUP_ICMP) &&
	    !(pols[0]->flags & XFRM_POLICY_ICMP)) {
		err = -ENOENT;
		goto error;
	}

	for (i = 0; i < num_pols; i++)
		pols[i]->curlft.use_time = get_seconds(); <-------- (line 1845)

And the 0x200000025 is probably &(pols[i]) (which actually seems unlikely
since pols is an array on the stack).
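
As a rough illustration of how a faulting address relates to the pointer being dereferenced: a store through pols[i] touches the value held in pols[i] plus the member offset, so a small, odd address would suggest a bogus pointer value in the array rather than the address of the array itself. The sketch below assumes a made-up layout; struct mock_policy, struct mock_lifetime and both helper functions are hypothetical and do not reflect the real struct xfrm_policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfrm_policy; the real layout differs. */
struct mock_lifetime {
	uint64_t add_time;
	uint64_t use_time;
};

struct mock_policy {
	uint32_t flags;
	struct mock_lifetime curlft;
};

/*
 * Address that a store to pol->curlft.use_time would touch, computed
 * without dereferencing pol (so a garbage pointer value is safe here).
 */
static uintptr_t use_time_store_addr(uintptr_t pol_value)
{
	return pol_value + offsetof(struct mock_policy, curlft.use_time);
}

/* Recover the pointer value from a reported faulting address (the DAR). */
static uintptr_t pol_value_from_fault(uintptr_t fault_addr)
{
	size_t off = offsetof(struct mock_policy, curlft.use_time);

	assert(fault_addr >= off);
	return fault_addr - off;
}
```

With the real structure layout, subtracting the real offset of curlft.use_time from the dar value would give the pointer that was actually loaded from pols[i].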
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-04-08 07:12:00

by Timo Teras

Subject: Re: linux-next: powerpc boot failure

Stephen Rothwell wrote:
> Today's linux-next (20100408) failed a powerpc boot test like this:
>
> [While bringing up the network interfaces ...]
>
> Unable to handle kernel paging request for data at address 0x200000025
> Faulting instruction address: 0xc00000000053d32c
> cpu 0x5: Vector: 300 (Data Access) at [c0000000bb277680]
> pc: c00000000053d32c: .__xfrm_lookup+0x32c/0x4c0
> lr: c0000000004e6e10: .ip_route_output_flow+0xb0/0x300
> sp: c0000000bb277900
> msr: 8000000000009032
> dar: 200000025
> dsisr: 40000000
> current = 0xc0000000bce55640
> paca = 0xc000000007691a00
> pid = 4106, comm = ntpdate
> [c0000000bb277a20] c0000000004e6e10 .ip_route_output_flow+0xb0/0x300
> [c0000000bb277ad0] c0000000005158c8 .ip4_datagram_connect+0x1a8/0x2f0
> [c0000000bb277bd0] c000000000523dc0 .inet_dgram_connect+0x80/0x110
> [c0000000bb277c60] c0000000004a6904 .SyS_connect+0xa4/0xf0
> [c0000000bb277d90] c0000000004d5f48 .compat_sys_socketcall+0x128/0x2f0
> [c0000000bb277e30] c00000000000852c syscall_exit+0x0/0x40
>
> The most obvious suspect is commit
> 80c802f3073e84c956846e921e8a0b02dfa3755f ("xfrm: cache bundles instead of
> policies for outgoing flows") and the couple of commits around that
> (these are new to linux-next today).
>
> The above pc is in this piece of code (I think - I don't have the actual
> kernel) from __xfrm_lookup (in net/xfrm/xfrm_policy.c):
>
> if ((flags & XFRM_LOOKUP_ICMP) &&
> !(pols[0]->flags & XFRM_POLICY_ICMP)) {
> err = -ENOENT;
> goto error;
> }
>
> for (i = 0; i < num_pols; i++)
> pols[i]->curlft.use_time = get_seconds(); <-------- (line 1845)
>
> And the 0x200000025 is probably &(pols[i]) (which actually seems unlikely
> since pols is an array on the stack).

What kind of xfrm policies does the system have?

2010-04-08 07:23:25

by Stephen Rothwell

Subject: Re: linux-next: powerpc boot failure

Hi,

On Thu, 08 Apr 2010 10:11:47 +0300 Timo Teräs <[email protected]> wrote:
>
> > The above pc is in this piece of code (I think - I don't have the actual
> > kernel) from __xfrm_lookup (in net/xfrm/xfrm_policy.c):
> >
> > if ((flags & XFRM_LOOKUP_ICMP) &&
> > !(pols[0]->flags & XFRM_POLICY_ICMP)) {
> > err = -ENOENT;
> > goto error;
> > }
> >
> > for (i = 0; i < num_pols; i++)
> > pols[i]->curlft.use_time = get_seconds(); <-------- (line 1845)
> >
> > And the 0x200000025 is probably &(pols[i]) (which actually seems unlikely
> > since pols is an array on the stack).
>
> What kind of xfrm policies does the system have?

I don't even know what an xfrm policy is :-). This is a pretty normal Ubuntu
Gutsy install and wouldn't have anything special in its network setup.

The above code fragment may not be quite the right place, sorry. But it
is the right function.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-04-08 07:29:57

by Timo Teras

Subject: Re: linux-next: powerpc boot failure

Stephen Rothwell wrote:
> On Thu, 08 Apr 2010 10:11:47 +0300 Timo Teräs <[email protected]> wrote:
>>> The above pc is in this piece of code (I think - I don't have the actual
>>> kernel) from __xfrm_lookup (in net/xfrm/xfrm_policy.c):
>>>
>>> if ((flags & XFRM_LOOKUP_ICMP) &&
>>> !(pols[0]->flags & XFRM_POLICY_ICMP)) {
>>> err = -ENOENT;
>>> goto error;
>>> }
>>>
>>> for (i = 0; i < num_pols; i++)
>>> pols[i]->curlft.use_time = get_seconds(); <-------- (line 1845)
>>>
>>> And the 0x200000025 is probably &(pols[i]) (which actually seems unlikely
>>> since pols is an array on the stack).
>> What kind of xfrm policies does the system have?
>
> I don't even know what an xfrm policy is :-). This is a pretty normal Ubuntu
> Gutsy install and wouldn't have anything special in its network setup.
>
> The above code fragment may not be quite the right place, sorry. But it
> is the right function.

You probably don't have any xfrm policies then. And that code should not
really get executed.

Some of the changes touch globally visible structs, and inline functions.
Was this a clean rebuild? And did you update all kernel modules, also in
the initramfs?

2010-04-08 07:46:00

by Stephen Rothwell

Subject: Re: linux-next: powerpc boot failure

Hi,

On Thu, 08 Apr 2010 10:29:49 +0300 Timo Teräs <[email protected]> wrote:
>
> You probably don't have any xfrm policies then. And that code should not
> really get executed.
>
> Some of the changes touch globally visible structs, and inline functions.
> Was this a clean rebuild? And did you update all kernel modules, also in
> the initramfs?

Yes, the build is started from scratch and the kernel and modules are
updated (this is our automated build and test system).

I have attached the config in case that is of use.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
dotconfig.txt (49.33 kB)

2010-04-08 07:48:12

by Timo Teras

Subject: Re: linux-next: powerpc boot failure

Stephen Rothwell wrote:
> Hi,
>
> On Thu, 08 Apr 2010 10:29:49 +0300 Timo Teräs <[email protected]> wrote:
>> You probably don't have any xfrm policies then. And that code should not
>> really get executed.
>>
>> Some of the changes touch globally visible structs, and inline functions.
>> Was this a clean rebuild? And did you update all kernel modules, also in
>> the initramfs?
>
> Yes, the build is started from scratch and the kernel and modules are
> updated (this is our automated build and test system).
>
> I have attached the config in case that is of use.

Thanks, I'll check some of the xfrm related configs.

Can you run this on the running system:
ip xfrm policy

That shows whether there are any policies due to e.g. ipsec or tcp-md5.

2010-04-08 08:40:23

by Timo Teras

Subject: Re: linux-next: powerpc boot failure

Stephen Rothwell wrote:
> On Thu, 08 Apr 2010 10:29:49 +0300 Timo Teräs <[email protected]> wrote:
>> You probably don't have any xfrm policies then. And that code should not
>> really get executed.
>>
>> Some of the changes touch globally visible structs, and inline functions.
>> Was this a clean rebuild? And did you update all kernel modules, also in
>> the initramfs?
>
> Yes, the build is started from scratch and the kernel and modules are
> updated (this is our automated build and test system).
>
> I have attached the config in case that is of use.

It looks like my new code calls xfrm_pols_put assuming it always does the
right thing. But it seems to misbehave when CONFIG_XFRM_SUB_POLICY is not set,
which is your case: that variant ignores npols and always puts pols[0].

Can you try if this helps?

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 625dd61..cccb049 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -735,19 +735,12 @@ static inline void xfrm_pol_put(struct xfrm_policy *policy
 	xfrm_policy_destroy(policy);
 }

-#ifdef CONFIG_XFRM_SUB_POLICY
 static inline void xfrm_pols_put(struct xfrm_policy **pols, int npols)
 {
 	int i;
 	for (i = npols - 1; i >= 0; --i)
 		xfrm_pol_put(pols[i]);
 }
-#else
-static inline void xfrm_pols_put(struct xfrm_policy **pols, int npols)
-{
-	xfrm_pol_put(pols[0]);
-}
-#endif

 extern void __xfrm_state_destroy(struct xfrm_state *);
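
For anyone following along, here is a minimal user-space sketch of the difference between the two variants in the patch above. It is only a model: struct mock_policy, mock_pol_put() and the return values that count touched entries are hypothetical, and the idea that a caller passes npols == 0 with an unfilled array is an assumption about the new code, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfrm_policy: only a reference count. */
struct mock_policy {
	int refcnt;
};

static void mock_pol_put(struct mock_policy *pol)
{
	assert(pol != NULL);
	pol->refcnt--;
}

/* Loop variant (kept by the patch): touches exactly npols entries. */
static int pols_put_loop(struct mock_policy **pols, int npols)
{
	int i, touched = 0;

	for (i = npols - 1; i >= 0; --i) {
		mock_pol_put(pols[i]);
		touched++;
	}
	return touched;
}

/* Removed variant: ignores npols and always touches pols[0]. */
static int pols_put_first_only(struct mock_policy **pols, int npols)
{
	(void)npols;
	mock_pol_put(pols[0]);
	return 1;
}
```

With npols == 0 the loop variant never reads the array, while the removed variant still dereferences pols[0]. In the kernel that slot could hold whatever happened to be on the stack, which would be consistent with the bad pointer values in the oopses.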

2010-04-08 18:55:37

by Luck, Tony

Subject: Re: linux-next: powerpc boot failure

I'm seeing an oops in the same routine on ia64, built from next-20100408.

My setup is a SLES11 installation. next-20100407 booted with no problems.
I'm also clueless about xfrm_policy.

Here's an abbreviated copy of the first (of several) oops. The code
dereferences a bad pointer:

Unable to handle kernel paging request at virtual address 480cb78f00000024
mount.nfs[7289]: Oops 8821862825984 [1]
Modules linked in: nfs lockd auth_rpcgss sunrpc binfmt_misc loop
dm_mod sr_mod usb_storage sg button container usbhid uhci_hcd ehci_hcd
usbcore fan processor thermal thermal_sys

Pid: 7289, CPU 16, comm: mount.nfs
psr : 0000101008526030 ifs : 8000000000000e22 ip :
[<a000000100888f10>] Not tainted
(2.6.34-rc3-generic-smp-next-20100408)
ip is at __xfrm_lookup+0x650/0x760

Call Trace:
[<a000000100015950>] show_stack+0x50/0xa0
[<a0000001000161c0>] show_regs+0x820/0x860
[<a00000010003ac00>] die+0x1a0/0x300
[<a000000100068b40>] ia64_do_page_fault+0x8c0/0x9e0
[<a00000010000bb00>] ia64_native_leave_kernel+0x0/0x270
[<a000000100888f10>] __xfrm_lookup+0x650/0x760
[<a0000001007ec410>] ip_route_output_flow+0xf0/0x480
[<a000000100846c30>] ip4_datagram_connect+0x330/0x5e0
[<a00000010085f420>] inet_dgram_connect+0x140/0x180
[<a0000001007854f0>] sys_connect+0xf0/0x1a0
[<a00000010000b980>] ia64_ret_from_syscall+0x0/0x20
[<a000000000010720>] __kernel_syscall_via_break+0x0/0x20

I tried the patch you just posted. Compiling with it gave this warning:

net/xfrm/xfrm_policy.c: In function '__xfrm_lookup':
net/xfrm/xfrm_policy.c:1735: warning: 'num_xfrms' may be used
uninitialized in this function

but the patched kernel booted ok.

-Tony

2010-04-08 20:13:09

by David Miller

Subject: Re: linux-next: powerpc boot failure

From: Tony Luck <[email protected]>
Date: Thu, 8 Apr 2010 11:55:34 -0700

> I tried the patch you just posted. Compiling with it gave this warning:
>
> net/xfrm/xfrm_policy.c: In function '__xfrm_lookup':
> net/xfrm/xfrm_policy.c:1735: warning: 'num_xfrms' may be used
> uninitialized in this function

This is just because gcc is stupid, you can ignore this.

It can't see that when a real 'err' error is returned we never end up
referencing the num_xfrms value.
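
The pattern looks roughly like the sketch below. This is not the actual __xfrm_lookup code; mock_resolve() and mock_lookup() are hypothetical names, and whether a given gcc version warns on this exact shape is not guaranteed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: writes *num_out only when it succeeds. */
static int mock_resolve(int want_fail, int *num_out)
{
	if (want_fail)
		return -ENOENT;
	*num_out = 3;
	return 0;
}

static int mock_lookup(int want_fail)
{
	int num;	/* deliberately not initialised, as in the warning */
	int err;

	err = mock_resolve(want_fail, &num);
	if (err < 0)
		return err;	/* error path never reads num */

	return num;	/* only reached after mock_resolve() set it */
}
```

The variable is only read on the path where the helper has already written it, but the compiler has to prove that across the call, and when it cannot it reports "may be used uninitialized".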

> but the patched kernel booted ok.

Thanks for testing, I pushed Timo's fix to net-next-2.6 earlier today
so it'll hopefully show up in the next linux-next.


2010-04-09 00:09:08

by Stephen Rothwell

Subject: Re: linux-next: powerpc boot failure

Hi Timo,

On Thu, 08 Apr 2010 11:40:13 +0300 Timo Teräs <[email protected]> wrote:
>
> Can you try if this helps?

That patch allows my machine to boot.

Thanks.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

