2009-03-05 10:42:26

by Alexander Holler

Subject: Oops using 2.6.28.n after a lazy umount of a crypted loop-device

Hello,

For some reason I can't remember, I did a lazy umount followed by the
deregistration of the loop device. The commands in question are:

---------
umount -l /mnt/crypted
cryptsetup luksClose crypted
losetup -d /dev/loop1
---------

With kernels 2.6.28.2 and 2.6.28.7 this resulted twice
in an Oops like the following (both with the same call trace):

-----------------
BUG: unable to handle kernel paging request at 622f7377
IP: [<c013f57b>] mempool_free+0xb/0x80
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/class/net/sit1/address
Modules linked in: sit tunnel4 tun nfsd lockd nfs_acl auth_rpcgss
exportfs sunrpc loop dm_crypt dm_mod lrw gf128mul aes_i586 aes_generic
longhaul 8250_pci 8250 serial_core ipv6 fan aic7xxx vt8231 cyblafb
uhci_hcd parport_pc i2c_viapro pcspkr scsi_transport_spi via_agp thermal
processor usbcore i2c_core parport button agpgart sg evdev
Pid: 15933, comm: loop1 Not tainted (2.6.28.7 #1) EPIA
EIP: 0060:[<c013f57b>] EFLAGS: 00010282 CPU: 0
EIP is at mempool_free+0xb/0x80
EAX: d1ebe280 EBX: d1e47360 ECX: d1d50438 EDX: 622f7373
ESI: 622f7373 EDI: d1ebe280 EBP: 00000000 ESP: d1ed8f28
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process loop1 (pid: 15933, ti=d1ed8000 task=d0edc3e0 task.ti=d1ed8000)
Stack:
d1e47360 d1fd0920 d1e47360 c01796a5 00000000 d1ebe5f0 c0179522 d813b513
d8d30020 d1d75600 d1ffbf78 d0b7da00 d1ebeadc d1e47de0 c017922e d814e455
0017a000 00000000 c017922e d81616c2 d5409e00 00000000 00000000 d8161700
Call Trace:
[<c01796a5>] bio_free+0x25/0x30
[<c0179522>] bio_put+0x22/0x30
[<d813b513>] clone_endio+0x83/0xa0 [dm_mod]
[<c017922e>] bio_endio+0x1e/0x20
[<d814e455>] crypt_dec_pending+0x25/0x50 [dm_crypt]
[<c017922e>] bio_endio+0x1e/0x20
[<d81616c2>] loop_thread+0x362/0x3a0 [loop]
[<d8161700>] do_lo_send_aops+0x0/0x160 [loop]
[<c012a5c0>] autoremove_wake_function+0x0/0x30
[<d8161360>] loop_thread+0x0/0x3a0 [loop]
[<c012a4e8>] kthread+0x38/0x60
[<c012a4b0>] kthread+0x0/0x60
[<c0103fa7>] kernel_thread_helper+0x7/0x10
Code: ff 89 e8 e9 4e ff ff ff 31 f6 89 f0 83 c4 14 5b 5e 5f 5d c3 8d b6
00 00 00 00 8d bf 00 00 00 00 57 56 53 89 c7 89 d6 85 c0 74 71 <8b> 42
04 3b 02 7d 62 9c 5b fa 89 e0 25 00 f0 ff ff ff 40 14 8b
EIP: [<c013f57b>] mempool_free+0xb/0x80 SS:ESP 0068:d1ed8f28
---[ end trace c4cedfb39b6cc26d ]---
-----------------

I've dug around a bit in drivers/block/loop.c and I assume that
loop_clr_fd() fails to stop or clear something before it destroys
data that the kernel thread (loop_thread) still needs. However, I don't
know much about everything going on there, so I don't think I will find
the problem myself without spending a lot of time. I know that using
a normal umount is the obvious workaround, but I don't think a lazy
umount should result in an Oops afterwards, regardless of how reasonable
it is to follow a lazy umount with the deletion of the device.

If I can help with more information or anything similar, feel free to ask.

Kind regards,

Alexander Holler


2009-03-05 11:58:20

by Milan Broz

Subject: [PATCH] Re: Oops using 2.6.28.n after a lazy umount of a crypted loop-device

Alexander Holler wrote:

> Hello,
>
> For some reason I can't remember, I did a lazy umount followed by the
> deregistration of the loop device. The commands in question are:
>
> ---------
> umount -l /mnt/crypted
> cryptsetup luksClose crypted
> losetup -d /dev/loop1
> ---------
>
> With kernels 2.6.28.2 and 2.6.28.7 this resulted twice
> in an Oops like the following (both with the same call trace):
>
>
Can you please try the attached patch to see if it helps here?
(The patch is not perfect, but it should help, or at least confirm that
this is the same problem I am fixing :-)

Milan
--
[email protected]

dm crypt: Wait for possible unfinished endio() call in destructor

When a user sets up dm-crypt over a loop device and the loop thread
processing bios calls bio_endio after the dm-crypt mapping has been
destroyed (including the mempool for dm io requests), the endio can cause
this oops (mempool_free from an already destroyed mempool):

[ 70.381058] EIP is at mempool_free+0x12/0x6b
...
[ 70.381058] Process loop0 (pid: 4268, ti=cf3b2000 task=cf1cc1f0 task.ti=cf3b2000)
...
[ 70.381058] Call Trace:
[ 70.381058] [<d0d76601>] ? crypt_dec_pending+0x5e/0x62 [dm_crypt]
[ 70.381058] [<d0d767b8>] ? crypt_endio+0xa2/0xaa [dm_crypt]
[ 70.381058] [<d0d76716>] ? crypt_endio+0x0/0xaa [dm_crypt]
[ 70.381058] [<c01a2f24>] ? bio_endio+0x2b/0x2e
[ 70.381058] [<d0806530>] ? dec_pending+0x224/0x23b [dm_mod]
[ 70.381058] [<d08066e4>] ? clone_endio+0x79/0xa4 [dm_mod]
[ 70.381058] [<d080666b>] ? clone_endio+0x0/0xa4 [dm_mod]
[ 70.381058] [<c01a2f24>] ? bio_endio+0x2b/0x2e
[ 70.381058] [<c02bad86>] ? loop_thread+0x380/0x3b7
[ 70.381058] [<c02ba8a1>] ? do_lo_send_aops+0x0/0x165
[ 70.381058] [<c013754f>] ? autoremove_wake_function+0x0/0x33
[ 70.381058] [<c02baa06>] ? loop_thread+0x0/0x3b7

Fix it by adding a reference counter to the crypt config and waiting
until all endio operations have finished.

Signed-off-by: Milan Broz <[email protected]>
---
drivers/md/dm-crypt.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 35bda49..fa37c87 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -95,6 +95,8 @@ struct crypt_config {
 	struct workqueue_struct *io_queue;
 	struct workqueue_struct *crypt_queue;
 
+	atomic_t pending;
+
 	/*
 	 * crypto related data
 	 */
@@ -566,6 +568,7 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 	}
 
 	mempool_free(io, cc->io_pool);
+	atomic_dec(&cc->pending);
 }
 
 /*
@@ -1113,6 +1116,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad_crypt_queue;
 	}
 
+	atomic_set(&cc->pending, 0);
+
 	ti->private = cc;
 	return 0;
 
@@ -1149,6 +1154,9 @@ static void crypt_dtr(struct dm_target *ti)
 	destroy_workqueue(cc->io_queue);
 	destroy_workqueue(cc->crypt_queue);
 
+	while (atomic_read(&cc->pending))
+		msleep(1);
+
 	if (cc->req)
 		mempool_free(cc->req, cc->req_pool);
 
@@ -1171,8 +1179,11 @@ static void crypt_dtr(struct dm_target *ti)
 static int crypt_map(struct dm_target *ti, struct bio *bio,
 		     union map_info *map_context)
 {
+	struct crypt_config *cc = ti->private;
 	struct dm_crypt_io *io;
 
+	atomic_inc(&cc->pending);
+
 	io = crypt_io_alloc(ti, bio, bio->bi_sector - ti->begin);
 
 	if (bio_data_dir(io->base_bio) == READ)
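
For readers who want to try the idea outside the kernel, the following is a
minimal user-space sketch of the pattern the patch describes: count each I/O
when it is mapped, drop the count when it completes, and make the destructor
wait for the count to reach zero before freeing the pool. It is not the
dm-crypt code. The names fake_config, fake_map, fake_endio, fake_dtr,
late_completion, sleep_ms and demo are hypothetical, and a POSIX thread plus
nanosleep() are assumed as stand-ins for the loop thread and msleep(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct crypt_config. */
struct fake_config {
	atomic_int pending;	/* I/Os mapped but not yet completed */
	int *pool;		/* stands in for cc->io_pool */
};

/* User-space analogue of msleep(). */
static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/* Like crypt_map() in the patch: count the I/O when it is handed out. */
static void fake_map(struct fake_config *cc)
{
	atomic_fetch_add(&cc->pending, 1);
}

/* Like crypt_dec_pending(): use the pool first, then drop the count. */
static void fake_endio(struct fake_config *cc)
{
	assert(cc->pool != NULL);	/* the pool must still exist here */
	cc->pool[0]++;
	atomic_fetch_sub(&cc->pending, 1);
}

/*
 * Like crypt_dtr(): wait for late completions, then free the pool.
 * Returns how many completions had touched the pool before it was freed.
 */
static int fake_dtr(struct fake_config *cc)
{
	int completions;

	while (atomic_load(&cc->pending) != 0)
		sleep_ms(1);

	completions = cc->pool[0];
	free(cc->pool);
	cc->pool = NULL;
	return completions;
}

/* Plays the role of the loop thread, which completes the I/O late. */
static void *late_completion(void *arg)
{
	sleep_ms(20);
	fake_endio(arg);
	return NULL;
}

/* Returns 0 if the pool was freed only after the late completion ran. */
int demo(void)
{
	struct fake_config cc;
	pthread_t loop_thread;
	int completions;

	atomic_init(&cc.pending, 0);
	cc.pool = calloc(1, sizeof(*cc.pool));
	if (cc.pool == NULL)
		return -1;

	fake_map(&cc);
	if (pthread_create(&loop_thread, NULL, late_completion, &cc) != 0) {
		free(cc.pool);
		return -1;
	}

	completions = fake_dtr(&cc);	/* blocks until fake_endio() has run */
	pthread_join(loop_thread, NULL);
	return completions == 1 ? 0 : 1;
}
```

Calling demo() from a small main() should return 0. Removing the wait loop
from fake_dtr() would let the pool be freed while late_completion() is still
sleeping, which is the user-space equivalent of the mempool_free() oops above.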

2009-03-06 06:16:52

by Alexander Holler

Subject: Re: [PATCH] Re: Oops using 2.6.28.n after a lazy umount of a crypted loop-device

Hello,

Thanks for the fast response and the patch.

Milan Broz wrote:
> Can you please try the attached patch to see if it helps here?
> (The patch is not perfect, but it should help, or at least confirm that
> this is the same problem I am fixing :-)

The patch works (I had to add an #include <linux/delay.h> /* msleep */).

I tested it using 2.6.28.7 and the script below. With your patch the
script ran overnight, looping about 400 times without any error.
A cross-check without the patch needed only 10 iterations to produce an oops.
So I assume you have fixed the problem I had. ;)

There still seems to be another problem left: three times I got the
kernel message

device-mapper: ioctl: unable to remove open device temporary-cryptsetup-21571

during cryptsetup luksOpen. I had never noticed that message before, but I
found an older one in my logs too.

Anyway, this does not result in an oops, so I'm happy. Thanks a lot.

Kind regards,

Alexander Holler


----------- oopstest.exp -----------------------
#!/usr/bin/expect
# Repeatedly set up, use, and lazily tear down a LUKS volume on a loop device.
system modprobe dm-crypt
system modprobe loop
for { set i 0 } { $i < 1000 } {} {
    incr i
    send_user "Test $i\n"
    system losetup /dev/loop1 /Daten/Daten.crypt
    spawn cryptsetup luksOpen /dev/loop1 crypted
    expect passphrase:
    sleep 2
    send oopstest\r
    sleep 60
    send_user \n
    system fsck.ext3 /dev/mapper/crypted
    sleep 2
    system mount -t ext3 -o rw,user,exec,noatime /dev/mapper/crypted /Daten/crypted
    system dd if=/dev/urandom of=/Daten/crypted/random bs=1024 count=1024
    # The sequence from the original report: lazy umount, then remove the devices.
    system umount -l /Daten/crypted
    system cryptsetup luksClose crypted
    system losetup -d /dev/loop1
}
----------- oopstest.exp -----------------------

2009-03-06 08:25:16

by Milan Broz

Subject: Re: [PATCH] Re: Oops using 2.6.28.n after a lazy umount of a crypted loop-device

Alexander Holler wrote:
> Milan Broz wrote:
>> Can you please try the attached patch to see if it helps here?
>> (The patch is not perfect, but it should help, or at least confirm that
>> this is the same problem I am fixing :-)
>
> The patch works (I had to add an #include <linux/delay.h> /* msleep */).

ok, thanks.


> I tested it using 2.6.28.7 and the script below. With your patch the
> script ran overnight, looping about 400 times without any error.
> A cross-check without the patch needed only 10 iterations to produce an oops.
> So I assume you have fixed the problem I had. ;)
>
> There still seems to be another problem left: three times I got the
> kernel message
>
> device-mapper: ioctl: unable to remove open device temporary-cryptsetup-21571

This is a bug in cryptsetup related to udev (udev should not touch the
temporary cryptsetup device). It is fixed upstream, but there is no
new release yet (Fedora has it patched; I'm not sure about other distros).

See http://code.google.com/p/cryptsetup/source/detail?r=32 and the other patches.

Milan
--
[email protected]