2010-02-14 17:45:14

by Alexey Dobriyan

Subject: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff81145bf4>] crypto_remove_spawns+0xd4/0x340
PGD bdc48067 PUD bc954067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
CPU 0
Pid: 16500, comm: rmmod Not tainted 2.6.33-rc7-next-20100212+ #9 P5E/P5E
RIP: 0010:[<ffffffff81145bf4>] [<ffffffff81145bf4>] crypto_remove_spawns+0xd4/0x340
RSP: 0018:ffff8800bc9dfde8 EFLAGS: 00010282
RAX: ffff8800bc901498 RBX: 0000000000000000 RCX: ffff8800ba859610
RDX: ffff8800bc900380 RSI: ffff8800bc9dfe18 RDI: ffff8800bc9015c0
RBP: ffff8800bc9dfe68 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800bc901488
R13: ffff8800bc9dfe18 R14: ffffffffa05817e0 R15: 0000000000000000
FS: 00007fdd2ec1c6f0(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000000bca34000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 16500, threadinfo ffff8800bc9de000, task ffff8800bd53ad90)
Stack:
ffff8800bc9dfe08 ffff8800bc9dfe28 ffff8800bc9dfe98 0000042181636020
<0> ffff8800bc9dfe08 ffff8800bc9dfe08 ffff8800bc9015c0 ffff8800bc900380
<0> ffff8800ba859808 ffff8800ba859610 ffff8800bc9dfe98 ffffffffa05817e0
Call Trace:
[<ffffffff81145eb1>] crypto_remove_alg+0x51/0x60
[<ffffffff81145ef3>] crypto_unregister_alg+0x33/0x90
[<ffffffffa058175c>] aes_fini+0x10/0x12 [aes_x86_64]
[<ffffffff8107266c>] sys_delete_module+0x19c/0x250
[<ffffffff8100256b>] system_call_fastpath+0x16/0x1b
Code: 02 00 eb c3 0f 1f 00 48 8b 47 08 48 8d 75 c0 4c 89 28 49 89 45 08 48 8b 55 c0 e8 a8 fa 02 00 48 8d 45 a0 48 8b 18 48 39 d8 74 44 <4c> 8b 63 18 4d 39 f4 0f 84 4e 02 00 00 48 8b 13 48 8b 43 08 4c
RIP [<ffffffff81145bf4>] crypto_remove_spawns+0xd4/0x340
RSP <ffff8800bc9dfde8>
CR2: 0000000000000018


crypto_remove_spawns:

spawn = list_first_entry(spawns, struct crypto_spawn, list);
inst = spawn->inst;

spawn is NULL here.


2010-02-15 05:27:35

by Herbert Xu

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Sun, Feb 14, 2010 at 07:45:07PM +0200, Alexey Dobriyan wrote:
> crypto_remove_spawns:
>
> spawn = list_first_entry(spawns, struct crypto_spawn, list);
> inst = spawn->inst;
>
> spawn is NULL here.

Is this reproducible every time you unload aes_x86_64 after boot?
Please attach your config file?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2010-02-15 07:47:26

by Alexey Dobriyan

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Mon, Feb 15, 2010 at 7:27 AM, Herbert Xu <[email protected]> wrote:
> Is this reproducible every time you unload aes_x86_64 after boot?

No, what I do is:

1. set up ipcomp in tunnel mode _in a fresh netns_ and immediately exit
2. modprobe/rmmod all modules (there aren't many)

After ~1 hour of this workload it hits, sometimes with aes_x86_64,
sometimes with aes_generic.

> Please attach your config file?

Full config later. For now: IPv4 only, XFRM stuff as modules, crypto as
modules, almost all debugging options on.

2010-02-15 08:11:59

by Herbert Xu

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Mon, Feb 15, 2010 at 09:47:25AM +0200, Alexey Dobriyan wrote:
> On Mon, Feb 15, 2010 at 7:27 AM, Herbert Xu <[email protected]> wrote:
> > Is this reproducible every time you unload aes_x86_64 after boot?
>
> No, what I do is:
>
> 1. set up ipcomp in tunnel mode _in a fresh netns_ and immediately exit
> 2. modprobe/rmmod all modules (there aren't many)
>
> After ~1 hour of this workload it hits, sometimes with aes_x86_64,
> sometimes with aes_generic.

Was this with that IPCOMP bug fixed?

Cheers,

2010-02-15 08:14:09

by Alexey Dobriyan

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Mon, Feb 15, 2010 at 10:11 AM, Herbert Xu
<[email protected]> wrote:
> Was this with that IPCOMP bug fixed?

Yes, the ipcomp bug triggers almost immediately.
Anyway, this is just a description of what I do.

2010-02-16 12:02:08

by Herbert Xu

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Mon, Feb 15, 2010 at 10:14:08AM +0200, Alexey Dobriyan wrote:
>
> Yes, the ipcomp bug triggers almost immediately.
> Anyway, this is just a description of what I do.

Can you see if this patch makes it go away?

This can happen when you're unloading aes just as an algorithm
that uses aes (such as cbc(aes)) is being created.

diff --git a/crypto/algapi.c b/crypto/algapi.c
index f149b1c..88c5f6c 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -165,6 +165,8 @@ static void crypto_remove_spawns(struct crypto_alg *alg,

spawn->alg = NULL;
spawns = &inst->alg.cra_users;
+ if (!spawns->next)
+ break;
}
} while ((spawns = crypto_more_spawns(alg, &stack, &top,
&secondary_spawns)));

Thanks,

2010-02-16 19:31:46

by Alexey Dobriyan

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Tue, Feb 16, 2010 at 08:02:03PM +0800, Herbert Xu wrote:
> Can you see if this patch makes it go away?
>
> This can happen when you're unloading aes just as an algorithm
> that uses aes (such as cbc(aes)) is being created.

Which codepath exactly?
I'd expect try_module_get() to fail somewhere along that path.

BTW, CBC or AES aren't used, just loaded.

Here is the setkey script:

#!/usr/sbin/setkey -f
flush;
spdflush;

add A B ipcomp 44 -m tunnel -C deflate;
add B A ipcomp 45 -m tunnel -C deflate;

spdadd A B any -P in ipsec
ipcomp/tunnel/192.168.1.2-192.168.1.3/use;

spdadd B A any -P out ipsec
ipcomp/tunnel/192.168.1.3-192.168.1.2/use;


2010-02-17 00:37:46

by Herbert Xu

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Tue, Feb 16, 2010 at 09:31:39PM +0200, Alexey Dobriyan wrote:
>
> Which codepath exactly?

When a spawn is created the instance associated with it will have
a zero-initialised cra_users entry.

> BTW, CBC or AES aren't used, just loaded.

Can you boot without aes/cbc loaded and see if it gets automatically
loaded? PF_KEY users tend to load all possible algorithms when
they start up.

Cheers,

2010-02-17 19:31:10

by Alexey Dobriyan

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Wed, Feb 17, 2010 at 08:37:35AM +0800, Herbert Xu wrote:
> On Tue, Feb 16, 2010 at 09:31:39PM +0200, Alexey Dobriyan wrote:
> >
> > Which codepath exactly?
>
> When a spawn is created the instance associated with it will have
> a zero-initialised cra_users entry.
>
> > BTW, CBC or AES aren't used, just loaded.
>
> Can you boot without aes/cbc loaded and see if it gets automatically
> loaded? PF_KEY users tend to load all possible algorithms when
> they start up.

AES and CBC are indeed loaded even if not used.

xfrm4_tunnel 1701 0
tunnel4 2319 1 xfrm4_tunnel
ipcomp 2063 0
xfrm_ipcomp 5154 1 ipcomp
xfrm4_mode_tunnel 1861 0
deflate 2033 0
zlib_deflate 21231 1 deflate
zlib_inflate 18334 1 deflate
aes_generic 26652 0
des_generic 16263 0
cbc 2945 0
sha1_generic 2223 0
md5 4161 0
hmac 3041 0
cryptomgr 105876 0
aead 6474 1 cryptomgr
pcompress 1767 1 cryptomgr
crypto_null 2910 0
crypto_blkcipher 12080 3 cbc,cryptomgr,crypto_null
crypto_hash 15857 5 sha1_generic,md5,hmac,cryptomgr,crypto_null
crypto_algapi 16753 11 deflate,aes_generic,des_generic,cbc,hmac,cryptomgr,aead,pcompress,crypto_null,crypto_blkcipher,crypto_hash
af_key 30024 0


The race has become even harder to reproduce, so I can't confirm your patch quickly. :-(

2010-02-18 00:26:24

by Herbert Xu

Subject: Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

On Wed, Feb 17, 2010 at 09:31:09PM +0200, Alexey Dobriyan wrote:
>
> The race has become even harder to reproduce, so I can't confirm your patch quickly. :-(

See if it's setkey or racoon that's loading the algorithms. Once
you know which it is, just repeatedly restart that while running
rmmod on aes.

Cheers,