2008-03-07 20:33:16

by Christian Kujau

[permalink] [raw]
Subject: INFO: task mount:11202 blocked for more than 120 seconds

Hi,

after upgrading from 2.6.24.1 to 2.6.25-rc3, I came across[0]. This
warning seems to be gone now. With 2.6.25-rc4 (and the fix from [1])
the box was running fine for 20 hours or so (doing its usual jobs plus
a "make randconfig && make" loop).

After this, I noticed that /bin/sync would not exit anymore and
remains stuck in D state. Looking around I noticed that the rsync
backup jobs (rsync'ing to an xfs partition) from earlier this
morning did not exit either and hung in D state. With sync hung, the
following messages started to appear:

[75377.756985] INFO: task sync:2697 blocked for more than 120 seconds.
[75377.757579] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[75377.758211] sync D c013835c 0 2697 16457
[75377.758216] f59506c0 00000082 f4c34000 c013835c fffeffff f6c1bcb0 f5dd0000 f4c34000
[75377.758223] c04405d7 f53f7e98 f6c1bcb4 f6c1bcd0 00000000 f6c1bcb0 00000000 f7ca1090
[75377.758230] f4c34000 c044070a f6c1bcd0 f6c1bcd0 f5dd0000 00000001 f6c1bcb0 c044074b
[75377.758237] Call Trace:
[75377.758253] [<c013835c>] trace_hardirqs_on+0x9c/0x110
[75377.758269] [<c04405d7>] rwsem_down_failed_common+0x67/0x150
[75377.758279] [<c044070a>] rwsem_down_read_failed+0x1a/0x24
[75377.758286] [<c044074b>] call_rwsem_down_read_failed+0x7/0xc
[75377.758291] [<c012fd7c>] down_read_nested+0x4c/0x60
[75377.758295] [<c027a64b>] xfs_ilock+0x5b/0xb0
[75377.758301] [<c027a64b>] xfs_ilock+0x5b/0xb0
[75377.758306] [<c029693d>] xfs_sync_inodes+0x3dd/0x6b0
[75377.758314] [<c0440b14>] _spin_unlock+0x14/0x20
[75377.758325] [<c0296d9b>] xfs_syncsub+0x18b/0x300
[75377.758330] [<c0440b14>] _spin_unlock+0x14/0x20
[75377.758335] [<c02a7c2b>] xfs_fs_sync_super+0x2b/0xd0
[75377.758342] [<c016a124>] sync_filesystems+0xa4/0x100
[75377.758351] [<c043fdd8>] down_read+0x38/0x50
[75377.758356] [<c016a13f>] sync_filesystems+0xbf/0x100
[75377.758361] [<c01872b3>] do_sync+0x33/0x70
[75377.758366] [<c0102ed7>] restore_nocheck+0x12/0x15
[75377.758371] [<c01872fa>] sys_sync+0xa/0x10
[75377.758375] [<c0102dee>] sysenter_past_esp+0x5f/0xa5
[75377.758402] =======================
[75377.758405] 3 locks held by sync/2697:
[75377.758407] #0: (mutex){--..}, at: [<c016a091>] sync_filesystems+0x11/0x100
[75377.758414] #1: (&type->s_umount_key#22){----}, at: [<c016a124>] sync_filesystems+0xa4/0x100
[75377.758422] #2: (&(&ip->i_iolock)->mr_lock){----}, at: [<c027a64b>] xfs_ilock+0x5b/0xb0


The box is still up & running, although the load is increasing slightly.
I've gathered some details here: http://nerdbynature.de/bits/2.6.25-rc4/

I've searched the archives for this error, but the only thing was
* http://lkml.org/lkml/2008/2/12/44
[BUG] 2.6.25-rc1-git1 softlockup while bootup on powerpc
...however, I don't get "CPU stuck" messages
* http://lkml.org/lkml/2008/1/29/370
Re: system hang on latest git
...but calltrace looks a lot different.

Since both mailings are not so current, I'd like to got back to -rc3 and
try to reproduce this one.

Do you have any idea what's going on here?

Thanks,
Christian.

[0] http://lkml.org/lkml/2008/3/2/171
[1] http://lkml.org/lkml/2008/3/4/634
--
BOFH excuse #158:

Defunct processes


2008-03-07 22:40:59

by David Chinner

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Fri, Mar 07, 2008 at 09:32:57PM +0100, Christian Kujau wrote:
> Hi,
>
> after upgrading from 2.6.24.1 to 2.6.25-rc3, I came across[0]. This
> warning seems to be gone now. With 2.6.25-rc4 (and the fix from [1])
> the box was running fine for 20 hours or so (doing its usual jobs plus
> a "make randconfig && make" loop).
>
> After this, I noticed that /bin/sync would not exit anymore and
> remains stuck in D state. Looking around I noticed that the rsync
> backup jobs (rsync'ing to an xfs partition) from earlier this
> morning did not exit either and hung in D state. With sync hung, the
> following messages started to appear:
>
> [75377.756985] INFO: task sync:2697 blocked for more than 120 seconds.
> [75377.757579] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [75377.758211] sync D c013835c 0 2697 16457
> [75377.758216] f59506c0 00000082 f4c34000 c013835c fffeffff f6c1bcb0
> f5dd0000 f4c34000 [75377.758223] c04405d7 f53f7e98 f6c1bcb4 f6c1bcd0
> 00000000 f6c1bcb0 00000000 f7ca1090 [75377.758230] f4c34000 c044070a
> f6c1bcd0 f6c1bcd0 f5dd0000 00000001 f6c1bcb0 c044074b [75377.758237] Call
> Trace:
> [75377.758253] [<c013835c>] trace_hardirqs_on+0x9c/0x110
> [75377.758269] [<c04405d7>] rwsem_down_failed_common+0x67/0x150
> [75377.758279] [<c044070a>] rwsem_down_read_failed+0x1a/0x24
> [75377.758286] [<c044074b>] call_rwsem_down_read_failed+0x7/0xc
> [75377.758291] [<c012fd7c>] down_read_nested+0x4c/0x60
> [75377.758295] [<c027a64b>] xfs_ilock+0x5b/0xb0
> [75377.758301] [<c027a64b>] xfs_ilock+0x5b/0xb0
> [75377.758306] [<c029693d>] xfs_sync_inodes+0x3dd/0x6b0
> [75377.758314] [<c0440b14>] _spin_unlock+0x14/0x20
> [75377.758325] [<c0296d9b>] xfs_syncsub+0x18b/0x300
> [75377.758330] [<c0440b14>] _spin_unlock+0x14/0x20
> [75377.758335] [<c02a7c2b>] xfs_fs_sync_super+0x2b/0xd0
> [75377.758342] [<c016a124>] sync_filesystems+0xa4/0x100
> [75377.758351] [<c043fdd8>] down_read+0x38/0x50
> [75377.758356] [<c016a13f>] sync_filesystems+0xbf/0x100
> [75377.758361] [<c01872b3>] do_sync+0x33/0x70
> [75377.758366] [<c0102ed7>] restore_nocheck+0x12/0x15
> [75377.758371] [<c01872fa>] sys_sync+0xa/0x10
> [75377.758375] [<c0102dee>] sysenter_past_esp+0x5f/0xa5
> [75377.758402] =======================
> [75377.758405] 3 locks held by sync/2697:
> [75377.758407] #0: (mutex){--..}, at: [<c016a091>]
> sync_filesystems+0x11/0x100
> [75377.758414] #1: (&type->s_umount_key#22){----}, at: [<c016a124>]
> sync_filesystems+0xa4/0x100
> [75377.758422] #2: (&(&ip->i_iolock)->mr_lock){----}, at: [<c027a64b>]
> xfs_ilock+0x5b/0xb0

Well, if that is hung there, something else must be holding on to
the iolock it's waiting on. What are the other D state processes in the
machine?

Also, the iolock can be held across I/O so it's possible you've lost an I/O.
Any I/O errors in the syslog?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-03-07 23:46:53

by Christian Kujau

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Sat, 8 Mar 2008, David Chinner wrote:
> Well, if that is hung there, something else must be holding on to
> the iolock it's waiting on. What are the other D state processes in the
> machine?

I have 7 processes in D state so far:

$ ps auxww [....]
root 9844 0.0 0.0 0 0 ? D Mar06 0:22 [pdflush]
root 2697 0.0 0.0 4712 460 ? D Mar07 0:00 sync
root 8342 0.0 0.0 1780 440 ? D Mar07 0:01 /bin/rm -rf /data/md1/stuff
root 12494 0.0 0.0 11124 1228 ? D Mar07 0:14 /usr/bin/rsync
root 15008 0.0 0.0 4712 460 ? D Mar07 0:00 sync
root 11202 0.0 0.0 5012 764 ? D Mar07 0:00 mount -o remount,ro /data/md1
root 15936 0.0 0.0 4712 460 ? D Mar07 0:00 sync

At one point I did a sysrq-D and put the results in:
http://nerdbynature.de/bits/2.6.25-rc4/hung_task/kern.log.gz
(grep for "SysRq : Show Locks Held" and "SysRq : Show Blocked State")

> Also, the iolock can be held across I/O so it's possible you've lost an I/O.
> Any I/O errors in the syslog?

No, no I/O errors at all. See the kern.log above, I could even do dd(1)
from the md1 (dm-crypt on raid1), no errors either.

thanks,
Christian.
--
BOFH excuse #233:

TCP/IP UDP alarm threshold is set too low.

2008-03-08 01:54:43

by Christian Kujau

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Sat, 8 Mar 2008, Christian Kujau wrote:
> I have 7 processes in D state so far:

FWIW, it's 100% reproducible with 2.6.25-rc3 too...sigh :-\
So, the last working kernel for me is 2.6.24.1 - that's a lot of bisecting
and I fear that compile errors will invalidate the bisecting results again
or make it impossible at all....I'll try anyway....tomorrow...

C.
--
BOFH excuse #285:

Telecommunications is upgrading.

2008-03-09 06:15:26

by Christian Kujau

[permalink] [raw]
Subject: 2.6.25-rc hangs (was: INFO: task mount:11202 blocked for more than 120 seconds)

On Sat, 8 Mar 2008, Christian Kujau wrote:
> FWIW, it's 100% reproducible with 2.6.25-rc3 too...sigh :-\
> So, the last working kernel for me is 2.6.24.1 - that's a lot of bisecting
> and I fear that compile errors will invalidate the bisecting results again or
> make it impossible at all....I'll try anyway....tomorrow...

Bisecting failed as expected :-(
I tried to follow the git-bisect manpage (and have successfully used
bisect in the past a few times), but I think ~5700 revisions between
2.6.24 and 2.6.25 are just too much fuzz. The bisect logs so far, with
my comments inbetween:

git-bisect start
# 2.6.25-rc4, known to be bad:
# bad: [84c6f6046c5a2189160a8f0dca8b90427bf690ea] x86_64: make ptrace always sign-extend orig_ax to 64 bits
git-bisect bad 84c6f6046c5a2189160a8f0dca8b90427bf690ea
# 2.6.24, good:
# good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
# 2.6.24, hard lockup during boot, bad:
# bad: [bd45ac0c5daae35e7c71138172e63df5cf644cf6] Merge branch 'linux-2.6'
git-bisect bad bd45ac0c5daae35e7c71138172e63df5cf644cf6

I marked the last one bad, because I could not boot any more. As it's a
headless box, I could not get more details. It did not even respond so
sysrq-b. After marking this bad, I compiled and booted again - same
result, hard lockup. So I tried again:

git bisect reset
git-bisect start
# 2.6.25-rc4, known to be bad:
# bad: [84c6f6046c5a2189160a8f0dca8b90427bf690ea] x86_64: make ptrace always sign-extend orig_ax to 64 bits
git-bisect bad 84c6f6046c5a2189160a8f0dca8b90427bf690ea
# 2.6.24, good:
# good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
# 2.6.24, hard lockup during boot, marking good anyway:
# good: [bd45ac0c5daae35e7c71138172e63df5cf644cf6] Merge branch 'linux-2.6'
git-bisect good bd45ac0c5daae35e7c71138172e63df5cf644cf6
# lockup
# bad: [f0f1b3364ae7f48084bdf2837fb979ff59622523] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git-bisect bad f0f1b3364ae7f48084bdf2837fb979ff59622523

Although I could not boot with bd45ac0c5daae35e7c71138172e63df5cf644cf6, I
marked it "good", as the lockup is totally unrelated to my problem.
However, the box locks up as soon as I'm using the device-mapper. This
time it does respond to sysrq-b.

But still: I'm unable to diagnose the system hang [0] and I fear that
2.6.25 is released and for the first time since ages I'd have to skip a release...

Help!

Christian.

[0] http://lkml.org/lkml/2008/3/7/308
--
BOFH excuse #288:

Hard drive sleeping. Let it wake up on it's own...

2008-03-09 16:44:54

by Eric Sandeen

[permalink] [raw]
Subject: Re: 2.6.25-rc hangs

Christian Kujau wrote:
> On Sat, 8 Mar 2008, Christian Kujau wrote:
>> FWIW, it's 100% reproducible with 2.6.25-rc3 too...sigh :-\
>> So, the last working kernel for me is 2.6.24.1 - that's a lot of bisecting
>> and I fear that compile errors will invalidate the bisecting results again or
>> make it impossible at all....I'll try anyway....tomorrow...
>
> Bisecting failed as expected :-(
> I tried to follow the git-bisect manpage (and have successfully used
> bisect in the past a few times), but I think ~5700 revisions between
> 2.6.24 and 2.6.25 are just too much fuzz. The bisect logs so far, with
> my comments inbetween:

Christian, what is the test you are using for the bisect?

Thanks,
-Eric

2008-03-09 18:06:14

by Christian Kujau

[permalink] [raw]
Subject: Re: 2.6.25-rc hangs

On Sun, 9 Mar 2008, Eric Sandeen wrote:
>> I tried to follow the git-bisect manpage (and have successfully used
>> bisect in the past a few times), but I think ~5700 revisions between
>> 2.6.24 and 2.6.25 are just too much fuzz. The bisect logs so far, with
>> my comments inbetween:
>
> Christian, what is the test you are using for the bisect?

Sorry, I don't understand: which "test" do you mean?

I did the bisect as per the bisect log and then just rebooted. Which gave
me no usable results yet. I'm trying to boot 2.6.25-rc1 in a few moments
and see if the hang is there as well. If it is, I'll start bisecting again
and hope that "halfway between -rc1 and .24" will be a bootable kernel
this time...

The "error" I'm trying to chase is a system "hang", but no instant lockup.
I can reproduce this by increasing disk I/O. I did this primarily with
rsync from different filesystems to my backup XFS partition. After a few
minutes, the INFO: messages[0] appeared.

Thanks,
Christian.

[0] http://lkml.org/lkml/2008/3/7/308
--
BOFH excuse #2:

solar flares

2008-03-09 21:35:07

by David Chinner

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

[adding dm-devel to cc list]

On Sat, Mar 08, 2008 at 12:46:40AM +0100, Christian Kujau wrote:
> On Sat, 8 Mar 2008, David Chinner wrote:
> >Well, if that is hung there, something else must be holding on to
> >the iolock it's waiting on. What are the other D state processes in the
> >machine?
>
> I have 7 processes in D state so far:
>
> $ ps auxww [....]
> root 9844 0.0 0.0 0 0 ? D Mar06 0:22 [pdflush]
> root 2697 0.0 0.0 4712 460 ? D Mar07 0:00 sync
> root 8342 0.0 0.0 1780 440 ? D Mar07 0:01 /bin/rm -rf
> /data/md1/stuff
> root 12494 0.0 0.0 11124 1228 ? D Mar07 0:14 /usr/bin/rsync
> root 15008 0.0 0.0 4712 460 ? D Mar07 0:00 sync
> root 11202 0.0 0.0 5012 764 ? D Mar07 0:00 mount -o
> remount,ro /data/md1
> root 15936 0.0 0.0 4712 460 ? D Mar07 0:00 sync
>
> At one point I did a sysrq-D and put the results in:
> http://nerdbynature.de/bits/2.6.25-rc4/hung_task/kern.log.gz
> (grep for "SysRq : Show Locks Held" and "SysRq : Show Blocked State")

Ok, this looks like a block layer issue. Everything is waiting in ioschedule()
and so in places we are blocked holding locked inodes. Hence sync, pdflush,
etc are hung waiting for the inode locks to be released. e.g:

INFO: task rm:8342 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm D ee08de8c 0 8342 8199
f5eebd80 00000086 c0380a16 ee08de8c ee08de8c 00000000 ee08de94 c200ebd0
c043ef2b c0146237 c043f1d2 c0146210 ee08de8c 00000000 00000000 db122110
c01464ca 00000002 c1865b00 0000000c 00000000 ee3acd60 c012c700 c200ebec
Call Trace:
[<c0380a16>] dm_table_unplug_all+0x26/0x40
[<c043ef2b>] io_schedule+0xb/0x20
[<c0146237>] sync_page+0x27/0x40
[<c043f1d2>] __wait_on_bit+0x42/0x70
[<c0146210>] sync_page+0x0/0x40
[<c01464ca>] wait_on_page_bit+0x5a/0x60
[<c012c700>] wake_bit_function+0x0/0x60
[<c014ec53>] truncate_inode_pages_range+0x223/0x360
[<c014eda7>] truncate_inode_pages+0x17/0x20
[<c017b44f>] generic_delete_inode+0xef/0x100
[<c017a8ac>] iput+0x5c/0x70
[<c0171e47>] do_unlinkat+0xf7/0x160
[<c0102e29>] sysenter_past_esp+0x9a/0xa5
[<c013835c>] trace_hardirqs_on+0x9c/0x110
[<c0102dee>] sysenter_past_esp+0x5f/0xa5
=======================
no locks held by rm/8342.

And rsync is stuck in congestion_wait()

SysRq : Show Blocked State
task PC stack pid father
rsync D 00000292 0 12494 1
f5dc7b40 00000086 00000000 00000292 f697bcdc 00735f5c 00000000 00000600
c043efd9 c013820d f532ed60 c05c0f04 f5cc1b58 00735f5c c0122900 f532ed60
c05c0c00 c053eb04 0000000a c043ef0b c01515f8 00000000 f532ed60 c012c6c0
Call Trace:
[<c043efd9>] schedule_timeout+0x49/0xc0
[<c013820d>] mark_held_locks+0x3d/0x70
[<c0122900>] process_timeout+0x0/0x10
[<c043ef0b>] io_schedule_timeout+0xb/0x20
[<c01515f8>] congestion_wait+0x58/0x80
[<c012c6c0>] autoremove_wake_function+0x0/0x40
[<c014cc32>] balance_dirty_pages_ratelimited_nr+0xb2/0x240
[<c0147368>] generic_file_buffered_write+0x1a8/0x650
[<c028540e>] xfs_log_move_tail+0x3e/0x180
[<c0440be9>] _spin_lock+0x29/0x40
[<c02a73cc>] xfs_write+0x7ac/0x8a0
[<c0174c01>] core_sys_select+0x21/0x350
[<c02a32bc>] xfs_file_aio_write+0x5c/0x70
[<c0167d25>] do_sync_write+0xd5/0x120
[<c0102ed7>] restore_nocheck+0x12/0x15
[<c012c6c0>] autoremove_wake_function+0x0/0x40
[<c019d105>] dnotify_parent+0x35/0x90
[<c0167c50>] do_sync_write+0x0/0x120
[<c016859f>] vfs_write+0x9f/0x140
[<c0168b51>] sys_write+0x41/0x70
[<c0102dee>] sysenter_past_esp+0x5f/0xa5

And this rsync procss will definitely be holding the iolock on an XFS
inode here (which is why other processes are hanging on an inode iolock).

> >Also, the iolock can be held across I/O so it's possible you've lost an
> >I/O. Any I/O errors in the syslog?
>
> No, no I/O errors at all. See the kern.log above, I could even do dd(1)
> from the md1 (dm-crypt on raid1), no errors either.

Oh, dm-crypt. Well, I'd definitely start looking there. XFS has a
history of exposing dm-crypt bugs, and these hangs appear to be I/O
congestion/scheduling related and not XFS. Also, we haven't changed
anything related to plug/unplug of block devices in XFS recently, so
that also points to some other change as well...

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-03-10 01:46:55

by Christian Kujau

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Mon, 10 Mar 2008, David Chinner wrote:
> Oh, dm-crypt. Well, I'd definitely start looking there. XFS has a
> history of exposing dm-crypt bugs, and these hangs appear to be I/O
> congestion/scheduling related and not XFS.

Yeah, I noticed that too, thanks for verifying this: during the 2nd bisect
run, the box locked up hard when I accessed the device-mapper. I'm using a
wrapper script to set up my luks/dm-crypt devices and still have to find
out which command exactly triggers the lockup. So, maybe the hard lockup
and the hangs are not so unrelated after all...oh well.

> Also, we haven't changed
> anything related to plug/unplug of block devices in XFS recently, so
> that also points to some other change as well...

Thanks for your assistance, David, I really appreciate it. I'll try to
find out more about this dm-crypt thingy....

Christian.
--
BOFH excuse #396:

Mail server hit by UniSpammer.

2008-03-12 18:04:03

by Samuel Tardieu

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

>>>>> "David" == David Chinner <[email protected]> writes:

>> No, no I/O errors at all. See the kern.log above, I could even do
>> dd(1) from the md1 (dm-crypt on raid1), no errors either.

David> Oh, dm-crypt. Well, I'd definitely start looking there. XFS has
David> a history of exposing dm-crypt bugs, and these hangs appear to
David> be I/O congestion/scheduling related and not XFS. Also, we
David> haven't changed anything related to plug/unplug of block
David> devices in XFS recently, so that also points to some other
David> change as well...

For what it is worth, I notice the same error last week (with a kernel
of the day) on my laptop but was too busy at work to investigate. And
I use ext3... on dm-crypt.

Sam
--
Samuel Tardieu -- [email protected] -- http://www.rfc1149.net/

2008-03-12 19:55:00

by Christian Lamparter

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Wednesday 12 March 2008 19:03:51 Samuel Tardieu wrote:
>
> David> Oh, dm-crypt. Well, I'd definitely start looking there. XFS has
> David> a history of exposing dm-crypt bugs, and these hangs appear to
> David> be I/O congestion/scheduling related and not XFS. Also, we
> David> haven't changed anything related to plug/unplug of block
> David> devices in XFS recently, so that also points to some other
> David> change as well...
>
me too...

http://lkml.org/lkml/2008/3/11/377
(BTW: I'm still bisecting it... can somebody please confirm that 2.6.24-git17
http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.24-git17.bz2
was the _last_ "good" one?)

BTW:
by pressing "SYSRQ-S (Emerg. Sync)" over and over again can unlock
some but not all "D" tasks...

Regards,
Chr.

2008-03-12 23:07:25

by Christian Kujau

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Wed, 12 Mar 2008, Chr wrote:
> http://lkml.org/lkml/2008/3/11/377
> (BTW: I'm still bisecting it... can somebody please confirm that 2.6.24-git17
> http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.24-git17.bz2
> was the _last_ "good" one?)

Hm, -git17 locked up hard when trying to access the device mapper
(just like -rc1), not even sysrq was usable any more. Sadly,
nothing was logged to the disks/network. Will try earlier snapshots
then. And yes, this resyncing of the md devices after an unclean shutdown
is a real pita right now...

Thanks for reporting!
Christian.
--
BOFH excuse #197:

I'm sorry a pentium won't do, you need an SGI to connect with us.

2008-03-13 13:45:58

by Christian Kujau

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Thu, 13 Mar 2008, Christian Kujau wrote:
> Hm, -git17 locked up hard when trying to access the device mapper

and so does -git8 and -git9...sigh :(

C.
--
BOFH excuse #412:

Radial Telemetry Infiltration

2008-03-13 21:33:59

by Christian Lamparter

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Thursday 13 March 2008 14:45:39 Christian Kujau wrote:
> and so does -git8 and -git9...sigh :(

hmm, I can boot any of these -rcX-gitY kernels (but not without some stuck
tasks after about a hour or so...)

but back: I tried lockdep and ran into this: (debug blocker)

pktcdvd: writer pktcdvd0 mapped to sr0

=============================================
[ INFO: possible recursive locking detected ]
2.6.25-rc5-git3 #2
---------------------------------------------
scsi_id/8262 is trying to acquire lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

but task is already holding lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

other info that might help us debug this:
2 locks held by scsi_id/8262:
#0: (&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0
#1: (&ctl_mutex#2){--..}, at: [<ffffffff88231af2>] pkt_open+0x26/0x516
[pktcdvd]

stack backtrace:
Pid: 8262, comm: scsi_id Not tainted 2.6.25-rc5-git3 #2

Call Trace:
[<ffffffff8024ffe4>] __lock_acquire+0x8c9/0xc72
[<ffffffff804bb23e>] ? __mutex_unlock_slowpath+0xf9/0x105
[<ffffffff8025079f>] lock_acquire+0x5e/0x77
[<ffffffff802a8e51>] ? do_open+0x7b/0x2c0
[<ffffffff804bb336>] mutex_lock_nested+0xe1/0x270
[<ffffffff802a8e51>] ? do_open+0x7b/0x2c0
[<ffffffff802a8e51>] do_open+0x7b/0x2c0
[<ffffffff802a9116>] __blkdev_get+0x80/0x92
[<ffffffff802a9133>] blkdev_get+0xb/0xd
[<ffffffff88231b6b>] :pktcdvd:pkt_open+0x9f/0x516
[<ffffffff802a8e84>] do_open+0xae/0x2c0
[<ffffffff802a92a4>] ? blkdev_open+0x0/0x6a
[<ffffffff804bcbab>] ? _spin_unlock+0x26/0x2a
[<ffffffff802a92a4>] ? blkdev_open+0x0/0x6a
[<ffffffff802a92db>] blkdev_open+0x37/0x6a
[<ffffffff80282571>] __dentry_open+0xe6/0x1c0
[<ffffffff802826e4>] nameidata_to_filp+0x2e/0x40
[<ffffffff8028272f>] do_filp_open+0x39/0x4b
[<ffffffff804bcbab>] ? _spin_unlock+0x26/0x2a
[<ffffffff8028246f>] ? get_unused_fd_flags+0x10d/0x11c
[<ffffffff80282792>] do_sys_open+0x51/0xd9
[<ffffffff80282843>] sys_open+0x1b/0x1d
[<ffffffff8020b22b>] system_call_after_swapgs+0x7b/0x80

soooo.....?

scsi_id/8262 is trying to acquire lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

but task is already holding lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

sounds a bit strange? Is this a SMP related problem after all?
(*disabled pktcddvd for now... to see if still locks)

Regards,
Chr...

2008-03-13 21:54:42

by Christian Kujau

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Thu, 13 Mar 2008, Chr wrote:
> On Thursday 13 March 2008 14:45:39 Christian Kujau wrote:
>> and so does -git8 and -git9...sigh :(
>
> hmm, I can boot any of these -rcX-gitY kernels (but not without some stuck
> tasks after about a hour or so...)

Ah, when you said -git17 would be the last "good" kernel, I thought
everything earlier would not have the stuck tasks.

> but back: I tried lockdep and ran into this: (debug blocker)

I have lockdep enabled as well, but no warnings so far...

> sounds a bit strange? Is this a SMP related problem after all?

I don't think so, as I don't have an SMP system (and CONFIG_SMP is
disabled as well).

C.
--
BOFH excuse #441:

Hash table has woodworm

2008-03-14 00:16:18

by Christian Lamparter

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Thursday 13 March 2008 22:54:25 Christian Kujau wrote:
> On Thu, 13 Mar 2008, Chr wrote:
> > On Thursday 13 March 2008 14:45:39 Christian Kujau wrote:
> >> and so does -git8 and -git9...sigh :(
> >
> > hmm, I can boot any of these -rcX-gitY kernels (but not without some
> > stuck tasks after about a hour or so...)
>
> Ah, when you said -git17 would be the last "good" kernel, I thought
> everything earlier would not have the stuck tasks.
well, no... even 2.6.24.3 has _hangs_, however not forever (approx. 5-10 min)
and everything is back to normal...

> > sounds a bit strange? Is this a SMP related problem after all?
>
> I don't think so, as I don't have an SMP system (and CONFIG_SMP is
> disabled as well).
Then maybe scheduler? because these values looks suspecious:
(that's why I added Ingo to CC, because he probably knows what's NOT going on
;-) )

(full dmesg there:
http://www.pastebin.ca/941845

config here:
http://www.pastebin.ca/941856
)

the funny stuff is highlighted with a >:
cpu#1, 2211.332 MHz
.nr_running : 8
.load : 188884
.nr_switches : 9615826
.nr_load_updates : 1879402
> .nr_uninterruptible : -2053
.jiffies : 4296946718
.next_balance : 4296.946740
.curr->pid : 0
.clock : 9800558.756490
.idle_clock : 6521432.794621
.prev_clock_raw : 4476359.432375
.clock_warps : 0
.clock_overflows : 8676899
.clock_underflows : 688045
.clock_deep_idle_events : 0
.clock_max_delta : 4.000250
.cpu_load[0] : 0
.cpu_load[1] : 24
.cpu_load[2] : 112
.cpu_load[3] : 155
.cpu_load[4] : 177

cfs_rq
.exec_clock : 1353227.466775
.MIN_vruntime : 1808656.942015
.min_vruntime : 1808696.942015
.max_vruntime : 1808657.235424
.spread : 0.293409
> .spread0 : -267982.528057
.nr_running : 7
.load : 11362
.bkl_count : 60751
.nr_spread_over : 6234

runnable tasks:
task PID tree-key switches prio exec-runtime
sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
kcryptd 2266 1808656.942015 12778 115 1808656.942015
3689.257692 3125323.031084
hald-addon-inpu 5504 1808656.942015 622 120 1808656.942015
10.739211 3044831.003941
kcryptd 7312 1808656.960820 578891 115 1808656.960820
71166.387493 2625492.750280
Xorg 9564 1808656.942016 513338 120 1808656.942016
245777.204573 2587613.331559
pdflush 10059 1808656.942015 14575 120 1808656.942015
4400.355110 2985523.630368
pdflush 10696 1808656.942015 9433 120 1808656.942015
3607.302084 3103638.660388
xine 16773 1808657.235424 13455 120 1808657.235424
2135.923058 574348.774173
> watchdog 16959 -206866.028086 0 98 -206866.028086
0.000000 0.000000

(it's at the end of the dmesg, the kernel, X.org and probably everything else
was stuck...)

Thanks,
Christian.

2008-03-14 09:28:39

by Milan Broz

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

David Chinner wrote:
>>> Also, the iolock can be held across I/O so it's possible you've lost an
>>> I/O. Any I/O errors in the syslog?
>> No, no I/O errors at all. See the kern.log above, I could even do dd(1)
>> from the md1 (dm-crypt on raid1), no errors either.
>
> Oh, dm-crypt. Well, I'd definitely start looking there. XFS has a
> history of exposing dm-crypt bugs, and these hangs appear to be I/O
> congestion/scheduling related and not XFS. Also, we haven't changed
> anything related to plug/unplug of block devices in XFS recently, so
> that also points to some other change as well...

Yes, there is bug in dm-crypt...
Please try if the patch here helps: http://lkml.org/lkml/2008/3/14/71

Thanks,
Milan
--
[email protected]

2008-03-14 23:58:22

by Christian Kujau

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Fri, 14 Mar 2008, Milan Broz wrote:
> Yes, there is bug in dm-crypt...
> Please try if the patch here helps: http://lkml.org/lkml/2008/3/14/71

Hm, it seems to help the hangs, yes. Applied to today's -git a few hours
ago, the hangs are gone. However, when doing lots of disk I/O, the machine
locks up after a few (10-20) minutes. Sadly, netconsole got nothing :(

After the first lockup I tried again and shortly after bootup I got:

[ 866.681441] [ INFO: possible circular locking dependency detected ]
[ 866.681876] 2.6.25-rc5 #1
[ 866.682203] -------------------------------------------------------
[ 866.682637] kswapd0/132 is trying to acquire lock:
[ 866.683028] (&(&ip->i_iolock)->mr_lock){----}, at: [<c027a686>] xfs_ilock+0x96/0xb0
[ 866.683916]
[ 866.683917] but task is already holding lock:
[ 866.684582] (iprune_mutex){--..}, at: [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.685461]
[ 866.685462] which lock already depends on the new lock.
[ 866.685463]
[ 866.686440]
[ 866.686441] the existing dependency chain (in reverse order) is:
[ 866.687151]
[ 866.687152] -> #1 (iprune_mutex){--..}:
[ 866.687339] [<c0136914>] add_lock_to_list+0x44/0xc0
[ 866.687339] [<c01393a6>] __lock_acquire+0xc26/0x10b0
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c013890f>] __lock_acquire+0x18f/0x10b0
[ 866.687339] [<c013988e>] lock_acquire+0x5e/0x80
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c043fc79>] mutex_lock_nested+0x89/0x240
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c0150051>] shrink_slab+0x21/0x160
[ 866.687340] [<c0150131>] shrink_slab+0x101/0x160
[ 866.687340] [<c01502e2>] try_to_free_pages+0x152/0x230
[ 866.687340] [<c014f060>] isolate_pages_global+0x0/0x60
[ 866.687340] [<c014b95b>] __alloc_pages+0x14b/0x370
[ 866.687340] [<c04413a0>] _read_unlock_irq+0x20/0x30
[ 866.687340] [<c0146601>] __grab_cache_page+0x81/0xc0
[ 866.687340] [<c01896f6>] block_write_begin+0x76/0xe0
[ 866.687340] [<c029ec76>] xfs_vm_write_begin+0x46/0x50
[ 866.687340] [<c029f4c0>] xfs_get_blocks+0x0/0x30
[ 866.687340] [<c0147297>] generic_file_buffered_write+0x117/0x650
[ 866.687340] [<c027a65d>] xfs_ilock+0x6d/0xb0
[ 866.687340] [<c02a73cc>] xfs_write+0x7ac/0x8a0
[ 866.687340] [<c0174ac1>] core_sys_select+0x21/0x350
[ 866.687340] [<c02a32bc>] xfs_file_aio_write+0x5c/0x70
[ 866.687340] [<c0167bf5>] do_sync_write+0xd5/0x120
[ 866.687340] [<c012c630>] autoremove_wake_function+0x0/0x40
[ 866.687340] [<c019cfd5>] dnotify_parent+0x35/0x90
[ 866.687340] [<c0167b20>] do_sync_write+0x0/0x120
[ 866.687340] [<c016846f>] vfs_write+0x9f/0x140
[ 866.687340] [<c0168a21>] sys_write+0x41/0x70
[ 866.687340] [<c0102dee>] sysenter_past_esp+0x5f/0xa5
[ 866.687340] [<ffffffff>] 0xffffffff
[ 866.687340]
[ 866.687340] -> #0 (&(&ip->i_iolock)->mr_lock){----}:
[ 866.687340] [<c0136ba0>] print_circular_bug_entry+0x40/0x50

The box was running fine then for ~20 minutes, then it locked up again.

Full dmesg and .config: http://nerdbynature.de/bits/2.6.25-rc5/

Right now I'm back to 2.6.24.3...

Thanks,
Christian.
--
BOFH excuse #350:

paradigm shift...without a clutch

2008-03-15 00:08:19

by Christian Lamparter

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Friday 14 March 2008 10:27:07 Milan Broz wrote:
>
> > Oh, dm-crypt. Well, I'd definitely start looking there. XFS has a
> > history of exposing dm-crypt bugs, and these hangs appear to be I/O
> > congestion/scheduling related and not XFS. Also, we haven't changed
> > anything related to plug/unplug of block devices in XFS recently, so
> > that also points to some other change as well...
>
> Yes, there is bug in dm-crypt...
> Please try if the patch here helps: http://lkml.org/lkml/2008/3/14/71
>
hmm, :-/

stuck again... (tried the patch on top of 2.6.25-rc5 and 2.6.25-rc5-git4)

SYS-RQ (several times) T+W+Q
right here:
http://www.pastebin.ca/943145

Regards,
Christian

2008-03-15 13:32:24

by Christian Lamparter

[permalink] [raw]
Subject: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

>On Sat, Mar 15, 2008 at 01:21:04AM +0100, Ulrich Lukas wrote:
>> With that patch applied, writing seems to work again.
>?
> Current version of the patch is below. ?It's still not quite finished.
>
:-/, both patches freezes here...

I'll try to revert:

commit 3a7f6c990ad04e6f576a159876c602d14d6f7fef
dm crypt: use async crypto

dm-crypt: Use crypto ablkcipher interface
Move encrypt/decrypt core to async crypto call.

maybe it isn't the async stuff after all..

>> Btw, is this a regression because of the changed 2.6.25(-rc) code or is
>> it just a coincidence that it worked before?
>
>Introduced during code changes to support async crypto interface.

Regards,
Chr
?

2008-03-15 21:35:06

by Christian Lamparter

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

On Saturday 15 March 2008 14:32:10 Chr wrote:
> reverted:
>
> commit 3a7f6c990ad04e6f576a159876c602d14d6f7fef
> dm crypt: use async crypto
>
> dm-crypt: Use crypto ablkcipher interface
> Move encrypt/decrypt core to async crypto call.
>

well.... it's much better now, without the async interface.
Christian Kajau, can you confirm it too?

Regards,
Christian
?


2008-03-16 13:09:58

by Christian Kujau

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

On Sat, 15 Mar 2008, Chr wrote:
> On Saturday 15 March 2008 14:32:10 Chr wrote:
>> reverted:
>>
>> commit 3a7f6c990ad04e6f576a159876c602d14d6f7fef
>> dm crypt: use async crypto
>>
>> dm-crypt: Use crypto ablkcipher interface
>> Move encrypt/decrypt core to async crypto call.
>>
>
> well.... it's much better now, without the async interface.
> Christian Kajau, can you confirm it too?

I reverted the commit above from today's -git and booted....I could not
notice any hangs more. But when I tried to reproduce these hangs by
generating disk I/O (mostly reads) ~10 minutes later the box panicked, but
still not netconsole messages :-(

thanks,
C.
--
BOFH excuse #233:

TCP/IP UDP alarm threshold is set too low.

2008-03-16 20:34:16

by David Chinner

[permalink] [raw]
Subject: Re: INFO: task mount:11202 blocked for more than 120 seconds

On Sat, Mar 15, 2008 at 12:58:02AM +0100, Christian Kujau wrote:
> On Fri, 14 Mar 2008, Milan Broz wrote:
> >Yes, there is bug in dm-crypt...
> >Please try if the patch here helps: http://lkml.org/lkml/2008/3/14/71
>
> Hm, it seems to help the hangs, yes. Applied to today's -git a few hours
> ago, the hangs are gone. However, when doing lots of disk I/O, the machine
> locks up after a few (10-20) minutes. Sadly, netconsole got nothing :(
>
> After the first lockup I tried again and shortly after bootup I got:

False positive. Memory reclaim can inverts the order of iprune_mutex and
the normal inode locking orders. i.e. Both of these are possible:

do_something()
enter memory reclaim
iprune_mutex
inode lock

or
do_something()
inode lock
do_something_else
enter memory reclaim
iprune_mutex
inode lock on different inode

So depending on what is seen first (in this case the first), the second
will trip lockdep. Neither are a deadlock, because the inode lock
held before memory reclaim is a referenced inode and will *never* be
on the free list for memory reclaim to deadlock on....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-03-17 17:36:43

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

Latest version for everyone to try:

From: Milan Broz <[email protected]>

Fix regression in dm-crypt introduced in commit
3a7f6c990ad04e6f576a159876c602d14d6f7fef
(dm crypt: use async crypto).

If write requests need to be split into pieces, the code must not
process them in parallel because the crypto context cannot be shared.
So there can be parallel crypto operations on one part of the write,
but only one write bio can be processed at a time.

This is not optimal and the workqueue code need to be optimized for
parallel processing, but for now it solves problem without affecting
the performance of synchronous crypto operation (most of current
dm-crypt users).

Signed-off-by: Milan Broz <[email protected]>
Signed-off-by: Alasdair G Kergon <[email protected]>

---
drivers/md/dm-crypt.c | 58 +++++++++++++++++++++++++-------------------------
1 files changed, 30 insertions(+), 28 deletions(-)

Index: linux-2.6.25-rc4/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.25-rc4.orig/drivers/md/dm-crypt.c 2008-03-17 11:42:16.000000000 +0000
+++ linux-2.6.25-rc4/drivers/md/dm-crypt.c 2008-03-17 11:42:28.000000000 +0000
@@ -1,7 +1,7 @@
/*
* Copyright (C) 2003 Christophe Saout <[email protected]>
* Copyright (C) 2004 Clemens Fruhwirth <[email protected]>
- * Copyright (C) 2006-2007 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
*
* This file is released under the GPL.
*/
@@ -93,6 +93,8 @@ struct crypt_config {

struct workqueue_struct *io_queue;
struct workqueue_struct *crypt_queue;
+ wait_queue_head_t writeq;
+
/*
* crypto related data
*/
@@ -331,14 +333,7 @@ static void crypt_convert_init(struct cr
ctx->idx_out = bio_out ? bio_out->bi_idx : 0;
ctx->sector = sector + cc->iv_offset;
init_completion(&ctx->restart);
- /*
- * Crypto operation can be asynchronous,
- * ctx->pending is increased after request submission.
- * We need to ensure that we don't call the crypt finish
- * operation before pending got incremented
- * (dependent on crypt submission return code).
- */
- atomic_set(&ctx->pending, 2);
+ atomic_set(&ctx->pending, 1);
}

static int crypt_convert_block(struct crypt_config *cc,
@@ -411,43 +406,42 @@ static void crypt_alloc_req(struct crypt
static int crypt_convert(struct crypt_config *cc,
struct convert_context *ctx)
{
- int r = 0;
+ int r;

while(ctx->idx_in < ctx->bio_in->bi_vcnt &&
ctx->idx_out < ctx->bio_out->bi_vcnt) {

crypt_alloc_req(cc, ctx);

+ atomic_inc(&ctx->pending);
+
r = crypt_convert_block(cc, ctx, cc->req);

switch (r) {
+ /* async */
case -EBUSY:
wait_for_completion(&ctx->restart);
INIT_COMPLETION(ctx->restart);
/* fall through*/
case -EINPROGRESS:
- atomic_inc(&ctx->pending);
cc->req = NULL;
- r = 0;
- /* fall through*/
+ ctx->sector++;
+ continue;
+
+ /* sync */
case 0:
+ atomic_dec(&ctx->pending);
ctx->sector++;
continue;
- }

- break;
+ /* error */
+ default:
+ atomic_dec(&ctx->pending);
+ return r;
+ }
}

- /*
- * If there are pending crypto operation run async
- * code. Otherwise process return code synchronously.
- * The step of 2 ensures that async finish doesn't
- * call crypto finish too early.
- */
- if (atomic_sub_return(2, &ctx->pending))
- return -EINPROGRESS;
-
- return r;
+ return 0;
}

static void dm_crypt_bio_destructor(struct bio *bio)
@@ -624,8 +618,10 @@ static void kcryptd_io_read(struct dm_cr
static void kcryptd_io_write(struct dm_crypt_io *io)
{
struct bio *clone = io->ctx.bio_out;
+ struct crypt_config *cc = io->target->private;

generic_make_request(clone);
+ wake_up(&cc->writeq);
}

static void kcryptd_io(struct work_struct *work)
@@ -698,7 +694,8 @@ static void kcryptd_crypt_write_convert_

r = crypt_convert(cc, &io->ctx);

- if (r != -EINPROGRESS) {
+ if (atomic_dec_and_test(&io->ctx.pending)) {
+ /* processed, no running async crypto */
kcryptd_crypt_write_io_submit(io, r, 0);
if (unlikely(r < 0))
return;
@@ -706,8 +703,12 @@ static void kcryptd_crypt_write_convert_
atomic_inc(&io->pending);

/* out of memory -> run queues */
- if (unlikely(remaining))
+ if (unlikely(remaining)) {
+ /* wait for async crypto and reinitialize pending counter */
+ wait_event(cc->writeq, !atomic_read(&io->ctx.pending));
+ atomic_set(&io->ctx.pending, 1);
congestion_wait(WRITE, HZ/100);
+ }
}
}

@@ -746,7 +747,7 @@ static void kcryptd_crypt_read_convert(s

r = crypt_convert(cc, &io->ctx);

- if (r != -EINPROGRESS)
+ if (atomic_dec_and_test(&io->ctx.pending))
kcryptd_crypt_read_done(io, r);

crypt_dec_pending(io);
@@ -1047,6 +1048,7 @@ static int crypt_ctr(struct dm_target *t
goto bad_crypt_queue;
}

+ init_waitqueue_head(&cc->writeq);
ti->private = cc;
return 0;

2008-03-17 18:36:32

by Christian Lamparter

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

On Monday 17 March 2008 18:36:09 Alasdair G Kergon wrote:
> Latest version for everyone to try:
>
> From: Milan Broz <[email protected]>
>
> Fix regression in dm-crypt introduced in commit
> 3a7f6c990ad04e6f576a159876c602d14d6f7fef
> (dm crypt: use async crypto).
>
> If write requests need to be split into pieces, the code must not
> process them in parallel because the crypto context cannot be shared.
> So there can be parallel crypto operations on one part of the write,
> but only one write bio can be processed at a time.
>
> This is not optimal and the workqueue code need to be optimized for
> parallel processing, but for now it solves problem without affecting
> the performance of synchronous crypto operation (most of current
> dm-crypt users).
>
> Signed-off-by: Milan Broz <[email protected]>
> Signed-off-by: Alasdair G Kergon <[email protected]>
Tested-by: Christian Lamparter <[email protected]>

Well, the dm-crypt regressions seems to be gone. :)
Thanks for your work & time!

Regards,
Christian

2008-03-18 00:57:20

by Herbert Xu

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

On Mon, Mar 17, 2008 at 05:36:09PM +0000, Alasdair G Kergon wrote:
> Latest version for everyone to try:
>
> From: Milan Broz <[email protected]>
>
> Fix regression in dm-crypt introduced in commit
> 3a7f6c990ad04e6f576a159876c602d14d6f7fef
> (dm crypt: use async crypto).
>
> If write requests need to be split into pieces, the code must not
> process them in parallel because the crypto context cannot be shared.
> So there can be parallel crypto operations on one part of the write,
> but only one write bio can be processed at a time.

Could you explain this part please? Crypto tfm objects are meant
to be reentrant, synchronous or not.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2008-03-18 04:08:24

by Milan Broz

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

Herbert Xu wrote:
> On Mon, Mar 17, 2008 at 05:36:09PM +0000, Alasdair G Kergon wrote:
>> Latest version for everyone to try:
>>
>> From: Milan Broz <[email protected]>
>>
>> Fix regression in dm-crypt introduced in commit
>> 3a7f6c990ad04e6f576a159876c602d14d6f7fef
>> (dm crypt: use async crypto).
>>
>> If write requests need to be split into pieces, the code must not
>> process them in parallel because the crypto context cannot be shared.
>> So there can be parallel crypto operations on one part of the write,
>> but only one write bio can be processed at a time.
>
> Could you explain this part please? Crypto tfm objects are meant
> to be reentrant, synchronous or not.

Ah, sorry - I mean dm-crypt convert context (with crypto context included).

Context is reentrant in the sense of crypto operations. But we need also
sometimes split bio in writes (not only because of low memory,
but also new memory layout of cloned bio can be different and we
must not violate hardware restrictions - spec. XFS generates such
highly optimized bio requests - it's why it discovers so many dm-crypt problems ;-)

see problematic dm-crypt bio write path

while (remaining) {
clone = crypt_alloc_buffer(io, remaining);
...
io->ctx.bio_out = clone;
io->ctx.idx_out = 0;
remaining -= clone->bi_size;
...
r = crypt_convert(cc, &io->ctx);
-> fire sync or (multiple) async crypto operation

if (atomic_dec_and_test(&io->ctx.pending))
-> sync mode, submit clone direclty
...
if (unlikely(remaining))
congestion_wait(WRITE, HZ/100);
}

and in async crypto completion callback (because async callback cannot
call in its context generic_make_request directly) is called:

struct convert_context *ctx = async_req->data;
...
if (!atomic_dec_and_test(&ctx->pending))
return;
...
INIT_WORK(&io->work, kcryptd_io);
queue_work(cc->io_queue, &io->work);
...
(and from io thread later)
struct bio *clone = io->ctx.bio_out;
generic_make_request(clone);

problems:
1) we cannot use io->work struct in parallel
2) io->ctx.pending is shared here between multiple sub-bio clones
...

(there was no problems in sync crypto mode. and dm-crypt io struct is
already allocated from mempool in crypt_map allocation, so changing this
to per cloned sub-bio allocation can cause new problems in low-memory situations,
not good idea change it in this development phase...)

Milan
--
[email protected]

2008-03-20 00:03:11

by Christian Kujau

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

Chr wrote:
> Well, the dm-crypt regressions seems to be gone. :)
> Thanks for your work& time!

Unfortunately I'm unable to test atm as I currently don't have access to
the machine where the hangs occured. So, if Chr says "it's fixed" I
believe it is and I don't wanna be a show stopper...IOW: feel free to
close the bug.

Thanks to all involved,
Christian.



2008-03-22 02:52:24

by Christian Kujau

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

On Mon, 17 Mar 2008, Alasdair G Kergon wrote:
> From: Milan Broz <[email protected]>
>
> Fix regression in dm-crypt introduced in commit
> 3a7f6c990ad04e6f576a159876c602d14d6f7fef
> (dm crypt: use async crypto).

I finally got around to test this - and yes, 2.6.25-rc6 with this patch
applied make the hangs go away. There are other problems now[0], but they
don't seem to be related, AFAICT.

Thank you!
Christian.

[0] http://lkml.org/lkml/2008/3/21/408
--
BOFH excuse #197:

I'm sorry a pentium won't do, you need an SGI to connect with us.

2008-03-26 16:58:14

by Christian Lamparter

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

On Monday 17 March 2008 18:36:09 Alasdair G Kergon wrote:
> Latest version for everyone to try:
>
> From: Milan Broz <[email protected]>
>
> Fix regression in dm-crypt introduced in commit
> 3a7f6c990ad04e6f576a159876c602d14d6f7fef
> (dm crypt: use async crypto).
>
> If write requests need to be split into pieces, the code must not
> process them in parallel because the crypto context cannot be shared.
> So there can be parallel crypto operations on one part of the write,
> but only one write bio can be processed at a time.
>
> This is not optimal and the workqueue code need to be optimized for
> parallel processing, but for now it solves problem without affecting
> the performance of synchronous crypto operation (most of current
> dm-crypt users).
>
> Signed-off-by: Milan Broz <[email protected]>
> Signed-off-by: Alasdair G Kergon <[email protected]>
>
> [...]

Is there already a new, optimized version? Or is it postponed and someone will
_hopefully_ merge this patch into -rc7-gitX before we have a buggy mainline
2.6.25?

Thanks,
Chr


2008-03-27 08:22:08

by Christian Kujau

[permalink] [raw]
Subject: Re: [dm-crypt] INFO: task mount:11202 blocked for more than 120 seconds

On Sat, 22 Mar 2008, Christian Kujau wrote:
> On Mon, 17 Mar 2008, Alasdair G Kergon wrote:
>> From: Milan Broz <[email protected]>
>>
>> Fix regression in dm-crypt introduced in commit
>> 3a7f6c990ad04e6f576a159876c602d14d6f7fef
>> (dm crypt: use async crypto).

I noticed that this patch[0] hasn't made it into mainline yet,
will this be included or is this fixed by something else?

Thanks,
Christian.

[0] http://lkml.org/lkml/2008/3/17/214
--
BOFH excuse #416:

We're out of slots on the server