LinuxLists.cc - NULL data pointer dereference in kcryptd

2009-07-31 20:54:54

Subject: NULL data pointer dereference in kcryptd

The following oops happened while doing an mke2fs -j on a luks mapping.
I think (I'm not entirely sure) that it was triggered by running the "sync" command
at the same time as the mke2fs. Both sync and mke2fs are stuck in D state after the oops.

root 10916 0.4 0.0 1888 460 ? D< 22:39 0:03 /bin/sync
root 11038 0.4 0.0 1888 460 ? D< 22:39 0:02 /bin/sync
(Not sure why two. Maybe I poked it twice. I don't remember.)

root 10375 7.3 0.0 0 0 ? D 22:25 1:54 [mke2fs]
(Why are there brackets around mke2fs anyway?)

root 353 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/0]
root 354 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/1]
root 355 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/2]
root 356 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/3]
root 10321 0.0 0.0 0 0 ? S< 22:24 0:00 [kcryptd_io]

Kernel is 2.6.30 on PowerPC 64bit SMP.

[15577.988487] Unable to handle kernel paging request for data at address 0x00000000
[15577.988494] Faulting instruction address: 0xc0000000000b8034
[15577.988499] Oops: Kernel access of bad area, sig: 11 [#1]
[15577.988501] PREEMPT SMP NR_CPUS=4 NUMA PowerMac
[15577.988506] Modules linked in: ppdev lp af_packet sbp2 ieee1394 loop b43 mac80211 cfg80211 ssb 8250_pci 8250 serial_core parport_pc parport uninorth_agp agpgart evdev
[15577.988530] NIP: c0000000000b8034 LR: c000000000139bdc CTR: c000000000533290
[15577.988534] REGS: c0000001f022f8e0 TRAP: 0300 Not tainted (2.6.30)
[15577.988536] MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28004044 XER: 000fffff
[15577.988546] DAR: 0000000000000000, DSISR: 0000000040000000
[15577.988549] TASK = c00000020fe74380[10322] 'kcryptd' THREAD: c0000001f022c000 CPU: 3
[15577.988552] GPR00: 00000000ffffffaf c0000001f022fb60 c000000000a4e928 0000000000011200
[15577.988559] GPR04: 000002037053b000 c0000002150b9e40 0000000000000000 c0000001e8ec8ef0
[15577.988565] GPR08: 0000000000000000 0000000000000000 c0000001bfacabf8 0000000000000000
[15577.988571] GPR12: 0000000028000044 c000000000a7f880 c000000000831ae0 c0000000009270a8
[15577.988577] GPR16: c000000000926d68 0000000000000000 f0000000046e6b58 0000000070e007b1
[15577.988583] GPR20: 00000000fffffdef 0000000000000200 c0000001bfacabe0 f0000000046e6b58
[15577.988590] GPR24: 0000000000000001 c0000001f022fbd0 c0000001f022fbe8 c000000203bd9ed1
[15577.988596] GPR28: c000000203bd9ea1 0000000000011210 c0000000009ac658 0000000000000000
[15577.988608] NIP [c0000000000b8034] .mempool_alloc+0x74/0x1a0
[15577.988614] LR [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
[15577.988616] Call Trace:
[15577.988619] [c0000001f022fb60] [c0000001f022fbf0] 0xc0000001f022fbf0 (unreliable)
[15577.988625] [c0000001f022fc40] [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
[15577.988632] [c0000001f022fcf0] [c0000000005334a0] .kcryptd_crypt+0x210/0x520
[15577.988637] [c0000001f022fde0] [c000000000068018] .worker_thread+0x248/0x3e0
[15577.988642] [c0000001f022ff00] [c00000000006e1e4] .kthread+0x84/0xe0
[15577.988648] [c0000001f022ff90] [c000000000021830] .kernel_thread+0x54/0x70
[15577.988651] Instruction dump:
[15577.988654] 789d0020 7c7c1b78 780083e4 57a906f6 3b7c0030 6000ffaf 2e090000 7fa30038
[15577.988663] 3b410088 3b210070 e93c0020 e89c0018 <e8090000> f8410028 7c0903a6 e9690010
[15577.988674] ---[ end trace 13c61a64a39c7194 ]---

--
Greetings, Michael.

2009-08-01 00:27:15

by Herbert Xu

Subject: Re: NULL data pointer dereference in kcryptd

On Fri, Jul 31, 2009 at 10:54:45PM +0200, Michael Buesch wrote:
> The following oops happened while doing an mke2fs -j on a luks mapping.
> I think (I'm not entirely sure) that it was triggered by running the "sync" command
> at the same time as the mke2fs. Both sync and mke2fs are stuck in D state after the oops.
>
> root 10916 0.4 0.0 1888 460 ? D< 22:39 0:03 /bin/sync
> root 11038 0.4 0.0 1888 460 ? D< 22:39 0:02 /bin/sync
> (Not sure why two. Maybe I poked it twice. I don't remember.)
>
> root 10375 7.3 0.0 0 0 ? D 22:25 1:54 [mke2fs]
> (Why are there brackets around mke2fs anyway?)
>
> root 353 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/0]
> root 354 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/1]
> root 355 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/2]
> root 356 0.0 0.0 0 0 ? S< 18:25 0:00 [crypto/3]
> root 10321 0.0 0.0 0 0 ? S< 22:24 0:00 [kcryptd_io]
>
> Kernel is 2.6.30 on PowerPC 64bit SMP.
>
> [15577.988487] Unable to handle kernel paging request for data at address 0x00000000
> [15577.988494] Faulting instruction address: 0xc0000000000b8034
> [15577.988499] Oops: Kernel access of bad area, sig: 11 [#1]
> [15577.988501] PREEMPT SMP NR_CPUS=4 NUMA PowerMac
> [15577.988506] Modules linked in: ppdev lp af_packet sbp2 ieee1394 loop b43 mac80211 cfg80211 ssb 8250_pci 8250 serial_core parport_pc parport uninorth_agp agpgart evdev
> [15577.988530] NIP: c0000000000b8034 LR: c000000000139bdc CTR: c000000000533290
> [15577.988534] REGS: c0000001f022f8e0 TRAP: 0300 Not tainted (2.6.30)
> [15577.988536] MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28004044 XER: 000fffff
> [15577.988546] DAR: 0000000000000000, DSISR: 0000000040000000
> [15577.988549] TASK = c00000020fe74380[10322] 'kcryptd' THREAD: c0000001f022c000 CPU: 3
> [15577.988552] GPR00: 00000000ffffffaf c0000001f022fb60 c000000000a4e928 0000000000011200
> [15577.988559] GPR04: 000002037053b000 c0000002150b9e40 0000000000000000 c0000001e8ec8ef0
> [15577.988565] GPR08: 0000000000000000 0000000000000000 c0000001bfacabf8 0000000000000000
> [15577.988571] GPR12: 0000000028000044 c000000000a7f880 c000000000831ae0 c0000000009270a8
> [15577.988577] GPR16: c000000000926d68 0000000000000000 f0000000046e6b58 0000000070e007b1
> [15577.988583] GPR20: 00000000fffffdef 0000000000000200 c0000001bfacabe0 f0000000046e6b58
> [15577.988590] GPR24: 0000000000000001 c0000001f022fbd0 c0000001f022fbe8 c000000203bd9ed1
> [15577.988596] GPR28: c000000203bd9ea1 0000000000011210 c0000000009ac658 0000000000000000
> [15577.988608] NIP [c0000000000b8034] .mempool_alloc+0x74/0x1a0
> [15577.988614] LR [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
> [15577.988616] Call Trace:
> [15577.988619] [c0000001f022fb60] [c0000001f022fbf0] 0xc0000001f022fbf0 (unreliable)
> [15577.988625] [c0000001f022fc40] [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
> [15577.988632] [c0000001f022fcf0] [c0000000005334a0] .kcryptd_crypt+0x210/0x520
> [15577.988637] [c0000001f022fde0] [c000000000068018] .worker_thread+0x248/0x3e0
> [15577.988642] [c0000001f022ff00] [c00000000006e1e4] .kthread+0x84/0xe0
> [15577.988648] [c0000001f022ff90] [c000000000021830] .kernel_thread+0x54/0x70
> [15577.988651] Instruction dump:
> [15577.988654] 789d0020 7c7c1b78 780083e4 57a906f6 3b7c0030 6000ffaf 2e090000 7fa30038
> [15577.988663] 3b410088 3b210070 e93c0020 e89c0018 <e8090000> f8410028 7c0903a6 e9690010
> [15577.988674] ---[ end trace 13c61a64a39c7194 ]---

kcryptd actually belongs to drivers/md/dm-crypt.c. So please post
to the DM list.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-08-01 09:04:11

by Milan Broz

Subject: Re: Re: NULL data pointer dereference in kcryptd

Herbert Xu wrote:
> On Fri, Jul 31, 2009 at 10:54:45PM +0200, Michael Buesch wrote:
>> [15577.988608] NIP [c0000000000b8034] .mempool_alloc+0x74/0x1a0
>> [15577.988614] LR [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
>> [15577.988616] Call Trace:
>> [15577.988619] [c0000001f022fb60] [c0000001f022fbf0] 0xc0000001f022fbf0 (unreliable)
>> [15577.988625] [c0000001f022fc40] [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
>> [15577.988632] [c0000001f022fcf0] [c0000000005334a0] .kcryptd_crypt+0x210/0x520
>> [15577.988637] [c0000001f022fde0] [c000000000068018] .worker_thread+0x248/0x3e0
>> [15577.988642] [c0000001f022ff00] [c00000000006e1e4] .kthread+0x84/0xe0
>> [15577.988648] [c0000001f022ff90] [c000000000021830] .kernel_thread+0x54/0x70

It looks like the mempool is NULL in bio_alloc_bioset. That mempool/bioset is destroyed only
when the crypt mapping is destroyed, after the workqueue is flushed, so this should not happen...
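
As a rough cross-check of this reading, the words around the marked instruction
in the dump can be decoded by hand. The sketch below assumes the Power ISA
DS-form layout for primary opcode 58 (ld/ldu/lwa); decode_ds and format_ds are
made-up helper names for illustration, not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a Power ISA DS-form instruction (assumed layout, opcode 58). */
struct ds_form {
    unsigned opcode; /* primary opcode, top 6 bits of the word */
    unsigned rt;     /* target register */
    unsigned ra;     /* base register */
    int      offset; /* signed byte displacement (low 2 bits cleared) */
    unsigned xo;     /* extended opcode: 0 = ld, 1 = ldu, 2 = lwa */
};

/* Split a 32-bit instruction word into its DS-form fields. */
static struct ds_form decode_ds(uint32_t insn)
{
    struct ds_form f;
    int off = (int)(insn & 0xfffcu);

    if (off & 0x8000)       /* sign-extend the 16-bit displacement */
        off -= 0x10000;

    f.opcode = insn >> 26;
    f.rt     = (insn >> 21) & 0x1f;
    f.ra     = (insn >> 16) & 0x1f;
    f.offset = off;
    f.xo     = insn & 0x3;
    return f;
}

/* Render e.g. "ld r9,32(r28)" into buf; returns -1 if the word is not
 * one of the three opcode-58 loads handled here. */
static int format_ds(uint32_t insn, char *buf, size_t len)
{
    static const char *const names[] = { "ld", "ldu", "lwa" };
    struct ds_form f = decode_ds(insn);

    if (f.opcode != 58 || f.xo > 2)
        return -1;
    return snprintf(buf, len, "%s r%u,%d(r%u)",
                    names[f.xo], f.rt, f.offset, f.ra);
}
```

Fed the words e93c0020, e89c0018 and <e8090000> from the dump, this gives
ld r9,32(r28), ld r4,24(r28) and ld r0,0(r9). GPR09 is 0 in the register dump,
so the faulting load reads from address 0, which matches DAR. If that decoding
is right, the NULL value was loaded from offset 32 of whatever r28 pointed to.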

Which commands exactly did you run to trigger this? Only sync & mkfs on an existing
LUKS device, or was there also some cryptsetup luksClose (or something else removing
the mapping) before?

> kcryptd actually belongs to drivers/md/dm-crypt.c. So please post
> to the DM list.

Yes, this is a dm-crypt or block layer problem. But I read the linux-crypto list too :-)

Milan
--
[email protected]

2009-08-01 10:07:16

by Michael Büsch

Subject: Re: [dm-devel] Re: NULL data pointer dereference in kcryptd

On Saturday 01 August 2009 11:04:11 Milan Broz wrote:
> Herbert Xu wrote:
> > On Fri, Jul 31, 2009 at 10:54:45PM +0200, Michael Buesch wrote:
> >> [15577.988608] NIP [c0000000000b8034] .mempool_alloc+0x74/0x1a0
> >> [15577.988614] LR [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
> >> [15577.988616] Call Trace:
> >> [15577.988619] [c0000001f022fb60] [c0000001f022fbf0] 0xc0000001f022fbf0 (unreliable)
> >> [15577.988625] [c0000001f022fc40] [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
> >> [15577.988632] [c0000001f022fcf0] [c0000000005334a0] .kcryptd_crypt+0x210/0x520
> >> [15577.988637] [c0000001f022fde0] [c000000000068018] .worker_thread+0x248/0x3e0
> >> [15577.988642] [c0000001f022ff00] [c00000000006e1e4] .kthread+0x84/0xe0
> >> [15577.988648] [c0000001f022ff90] [c000000000021830] .kernel_thread+0x54/0x70
>
> It looks like the mempool is NULL in bio_alloc_bioset. That mempool/bioset is destroyed only
> when the crypt mapping is destroyed, after the workqueue is flushed, so this should not happen...
>
> Which commands exactly did you run to trigger this? Only sync & mkfs on an existing
> LUKS device, or was there also some cryptsetup luksClose (or something else removing
> the mapping) before?

I started mke2fs on a large luks mapping (1TB). So that takes quite a while.
So I had some time to do some other stuff simultaneously. It happens that I
did a "sync" while the mke2fs command was still running. The sync immediately froze.
Then after mke2fs finished writing the inode tables it also froze.
Then I looked into dmesg and saw the oops.
I don't think I did a cryptsetup action while mke2fs was running. I just mounted a few
USB sticks and copied a few files around and did that sync operation...

>
> > kcryptd actually belongs to drivers/md/dm-crypt.c. So please post
> > to the DM list.
>
> Yes, this is a dm-crypt or block layer problem. But I read the linux-crypto list too :-)
>
> Milan
> --
> [email protected]
>
>
>

--
Greetings, Michael.

2009-08-01 11:10:06

by Michael Büsch

Subject: Re: [dm-devel] Re: NULL data pointer dereference in kcryptd

On Saturday 01 August 2009 12:07:12 Michael Buesch wrote:
> On Saturday 01 August 2009 11:04:11 Milan Broz wrote:
> > Herbert Xu wrote:
> > > On Fri, Jul 31, 2009 at 10:54:45PM +0200, Michael Buesch wrote:
> > >> [15577.988608] NIP [c0000000000b8034] .mempool_alloc+0x74/0x1a0
> > >> [15577.988614] LR [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
> > >> [15577.988616] Call Trace:
> > >> [15577.988619] [c0000001f022fb60] [c0000001f022fbf0] 0xc0000001f022fbf0 (unreliable)
> > >> [15577.988625] [c0000001f022fc40] [c000000000139bdc] .bio_alloc_bioset+0x4c/0x130
> > >> [15577.988632] [c0000001f022fcf0] [c0000000005334a0] .kcryptd_crypt+0x210/0x520
> > >> [15577.988637] [c0000001f022fde0] [c000000000068018] .worker_thread+0x248/0x3e0
> > >> [15577.988642] [c0000001f022ff00] [c00000000006e1e4] .kthread+0x84/0xe0
> > >> [15577.988648] [c0000001f022ff90] [c000000000021830] .kernel_thread+0x54/0x70
> >
> > It looks like the mempool is NULL in bio_alloc_bioset. That mempool/bioset is destroyed only
> > when the crypt mapping is destroyed, after the workqueue is flushed, so this should not happen...
> >
> > Which commands exactly did you run to trigger this? Only sync & mkfs on an existing
> > LUKS device, or was there also some cryptsetup luksClose (or something else removing
> > the mapping) before?
>
> I started mke2fs on a large luks mapping (1TB). So that takes quite a while.
> So I had some time to do some other stuff simultaneously. It happens that I
> did a "sync" while the mke2fs command was still running. The sync immediately froze.
> Then after mke2fs finished writing the inode tables it also froze.
> Then I looked into dmesg and saw the oops.
> I don't think I did a cryptsetup action while mke2fs was running. I just mounted a few
> USB sticks and copied a few files around and did that sync operation...

Of course, I do not know _when_ exactly it oopsed. It may have oopsed even before I started mke2fs,
and I only noticed later (because of the stuck processes).
And before doing the mke2fs I did a few luksOpen and luksClose, of course.

--
Greetings, Michael.