2012-08-24 13:13:12

by Bart Van Assche

[permalink] [raw]
Subject: Kernel hangs in truncate_inode_pages()

Hello,

If I log in with the open-iscsi initiator to an iSCSI target an power
down the target the initiator hangs in truncate_inode_pages(). This
happens systematically with kernel 3.6-rc3 and it's something I've
never seen with kernel 3.4 or before. Has anyone else already run into
this issue ?

>From the output of echo w >/proc/sysrq-trigger:

SysRq : Show Blocked State
task PC stack pid father
udisks-part-id D ffffffff810f7530 0 30803 10057 0x00000006
ffff8801063c39a8 0000000000000046 ffff880126955d60 0000000000000002
ffff8801063c3fd8 ffff880126955d60 ffff8801063c3fd8 ffff8801063c3fd8
ffffffff81813420 ffff880126955d60 ffff8801063c39a8 ffff88012fa13990
Call Trace:
[<ffffffff810f7530>] ? __lock_page+0x70/0x70
[<ffffffff8140c4e9>] schedule+0x29/0x70
[<ffffffff8140c5bf>] io_schedule+0x8f/0xd0
[<ffffffff810f753e>] sleep_on_page+0xe/0x20
[<ffffffff8140a0fa>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff810fa45a>] ? find_get_pages+0x2a/0x1c0
[<ffffffff81104f7f>] ? release_pages+0x1af/0x200
[<ffffffff810f7527>] __lock_page+0x67/0x70
[<ffffffff810615a0>] ? autoremove_wake_function+0x40/0x40
[<ffffffff81106976>] truncate_inode_pages_range+0x456/0x480
[<ffffffff8109590d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8109be5a>] ? on_each_cpu_cond+0xaa/0xf0
[<ffffffff81106a25>] truncate_inode_pages+0x15/0x20
[<ffffffff8118389f>] kill_bdev+0x2f/0x40
[<ffffffff81184096>] __blkdev_put+0x76/0x1d0
[<ffffffff8118424d>] blkdev_put+0x5d/0x180
[<ffffffff81184395>] blkdev_close+0x25/0x30
[<ffffffff8114c43c>] __fput+0xcc/0x240
[<ffffffff8114c5be>] ____fput+0xe/0x10
[<ffffffff8105d419>] task_work_run+0x69/0xa0
[<ffffffff8104165f>] do_exit+0x78f/0x930
[<ffffffff810514f0>] ? get_signal_to_deliver+0xf0/0x6f0
[<ffffffff8140d1e7>] ? _raw_spin_lock_irq+0x17/0x60
[<ffffffff8140d930>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff81041b5e>] do_group_exit+0x4e/0xc0
[<ffffffff81051628>] get_signal_to_deliver+0x228/0x6f0
[<ffffffff8100223c>] do_signal+0x3c/0x620
[<ffffffff81415dcc>] ? sysret_signal+0x5/0x3d
[<ffffffff810028ad>] do_notify_resume+0x6d/0xb0
[<ffffffff8120846e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81415fd8>] int_signal+0x12/0x17

Bart.


2012-08-24 17:25:24

by Mike Christie

[permalink] [raw]
Subject: Re: Kernel hangs in truncate_inode_pages()

On 08/24/2012 08:13 AM, Bart Van Assche wrote:
> Hello,
>
> If I log in with the open-iscsi initiator to an iSCSI target an power
> down the target the initiator hangs in truncate_inode_pages(). This
> happens systematically with kernel 3.6-rc3 and it's something I've
> never seen with kernel 3.4 or before. Has anyone else already run into
> this issue ?
>

I have not seen this issue before and have not seen it reported.

Are you doing IO the disk at the time of power down, or are you doing a
iscsiadm -m node ... --logout command?


> From the output of echo w >/proc/sysrq-trigger:
>
> SysRq : Show Blocked State
> task PC stack pid father
> udisks-part-id D ffffffff810f7530 0 30803 10057 0x00000006
> ffff8801063c39a8 0000000000000046 ffff880126955d60 0000000000000002
> ffff8801063c3fd8 ffff880126955d60 ffff8801063c3fd8 ffff8801063c3fd8
> ffffffff81813420 ffff880126955d60 ffff8801063c39a8 ffff88012fa13990
> Call Trace:
> [<ffffffff810f7530>] ? __lock_page+0x70/0x70
> [<ffffffff8140c4e9>] schedule+0x29/0x70
> [<ffffffff8140c5bf>] io_schedule+0x8f/0xd0
> [<ffffffff810f753e>] sleep_on_page+0xe/0x20
> [<ffffffff8140a0fa>] __wait_on_bit_lock+0x5a/0xc0
> [<ffffffff810fa45a>] ? find_get_pages+0x2a/0x1c0
> [<ffffffff81104f7f>] ? release_pages+0x1af/0x200
> [<ffffffff810f7527>] __lock_page+0x67/0x70
> [<ffffffff810615a0>] ? autoremove_wake_function+0x40/0x40
> [<ffffffff81106976>] truncate_inode_pages_range+0x456/0x480
> [<ffffffff8109590d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff8109be5a>] ? on_each_cpu_cond+0xaa/0xf0
> [<ffffffff81106a25>] truncate_inode_pages+0x15/0x20
> [<ffffffff8118389f>] kill_bdev+0x2f/0x40
> [<ffffffff81184096>] __blkdev_put+0x76/0x1d0
> [<ffffffff8118424d>] blkdev_put+0x5d/0x180
> [<ffffffff81184395>] blkdev_close+0x25/0x30
> [<ffffffff8114c43c>] __fput+0xcc/0x240
> [<ffffffff8114c5be>] ____fput+0xe/0x10
> [<ffffffff8105d419>] task_work_run+0x69/0xa0
> [<ffffffff8104165f>] do_exit+0x78f/0x930
> [<ffffffff810514f0>] ? get_signal_to_deliver+0xf0/0x6f0
> [<ffffffff8140d1e7>] ? _raw_spin_lock_irq+0x17/0x60
> [<ffffffff8140d930>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff81041b5e>] do_group_exit+0x4e/0xc0
> [<ffffffff81051628>] get_signal_to_deliver+0x228/0x6f0
> [<ffffffff8100223c>] do_signal+0x3c/0x620
> [<ffffffff81415dcc>] ? sysret_signal+0x5/0x3d
> [<ffffffff810028ad>] do_notify_resume+0x6d/0xb0
> [<ffffffff8120846e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff81415fd8>] int_signal+0x12/0x17
>
> Bart.
>

2012-08-24 17:48:07

by Bart Van Assche

[permalink] [raw]
Subject: Re: Kernel hangs in truncate_inode_pages()

On 08/24/12 17:24, Mike Christie wrote:
> On 08/24/2012 08:13 AM, Bart Van Assche wrote:
>> If I log in with the open-iscsi initiator to an iSCSI target an power
>> down the target the initiator hangs in truncate_inode_pages(). This
>> happens systematically with kernel 3.6-rc3 and it's something I've
>> never seen with kernel 3.4 or before. Has anyone else already run into
>> this issue ?
>
> I have not seen this issue before and have not seen it reported.
>
> Are you doing IO the disk at the time of power down, or are you doing a
> iscsiadm -m node ... --logout command?

There was no I/O load test running at the time of the target power down.
The call stack I posted is what I see before logout. And this is what I
get after iscsi logout:

BUG: soft lockup - CPU#3 stuck for 23s! [iscsid:8589]
Modules linked in: ... [last unloaded: crc32c]
irq event stamp: 918076162
hardirqs last enabled at (918076161): [<ffffffff8140d98f>] _raw_spin_unlock_irqrestore+0x3f/0x70
hardirqs last disabled at (918076162): [<ffffffff814167d7>] apic_timer_interrupt+0x67/0x80
softirqs last enabled at (917955754): [<ffffffff81044162>] __do_softirq+0x142/0x2a0
softirqs last disabled at (917955749): [<ffffffff81416f5c>] call_softirq+0x1c/0x30
CPU 3
Pid: 8589, comm: iscsid Not tainted 3.6.0-rc3+ #1 Gigabyte Technology Co., Ltd. Z68X-UD3H-B3/Z68X-UD3H-B3
RIP: 0010:[<ffffffff8140d991>] [<ffffffff8140d991>] _raw_spin_unlock_irqrestore+0x41/0x70
RSP: 0018:ffff8800ae7e5928 EFLAGS: 00000286
RAX: ffff880106002eb0 RBX: ffffffff810956a2 RCX: 0000000000000006
RDX: 0000000000000040 RSI: ffff880106003498 RDI: ffff880106002eb0
RBP: ffff8800ae7e5938 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000006
R13: 0000000000000007 R14: ffff880106002eb0 R15: ffff880106003498
FS: 00007fa8cfc22700(0000) GS:ffff88012fac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fa8cf26617c CR3: 00000000bc87c000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iscsid (pid: 8589, threadinfo ffff8800ae7e4000, task ffff880106002eb0)
Stack:
ffff88011af0c000 ffff88001ea0fff0 ffff8800ae7e5988 ffffffffa00aaafc
0000000000000286 ffff88001ea10000 ffff8800ae7e5968 ffff88008ae44a28
ffff88008ae44800 ffff88008ae44818 0000000000000000 0000000000000001
Call Trace:
[<ffffffffa00aaafc>] scsi_remove_target+0x1bc/0x1f0 [scsi_mod]
[<ffffffffa06af6c6>] __iscsi_unbind_session+0xc6/0x1a0 [scsi_transport_iscsi]
[<ffffffffa06af8a5>] iscsi_remove_session+0x105/0x1e0 [scsi_transport_iscsi]
[<ffffffffa06af996>] iscsi_destroy_session+0x16/0x60 [scsi_transport_iscsi]
[<ffffffffa07bc549>] iscsi_session_teardown+0xa9/0xd0 [libiscsi]
[<ffffffffa07c91a0>] iscsi_sw_tcp_session_destroy+0x50/0x70 [iscsi_tcp]
[<ffffffffa06aeef8>] iscsi_if_rx+0xab8/0xf60 [scsi_transport_iscsi]
[<ffffffff81350cad>] netlink_unicast+0x1ad/0x230
[<ffffffff8135102b>] netlink_sendmsg+0x2fb/0x350
[<ffffffff8131c458>] ? sock_update_classid+0xc8/0x140
[<ffffffff813162d2>] sock_sendmsg+0xa2/0xe0
[<ffffffff8111c9ac>] ? might_fault+0x9c/0xb0
[<ffffffff8111c963>] ? might_fault+0x53/0xb0
[<ffffffff81325356>] ? verify_iovec+0x56/0xd0
[<ffffffff81317ac2>] __sys_sendmsg+0x382/0x390
[<ffffffff8111c9ac>] ? might_fault+0x9c/0xb0
[<ffffffff8111c963>] ? might_fault+0x53/0xb0
[<ffffffff81069c63>] ? lg_local_unlock+0x23/0x50
[<ffffffff811501f6>] ? cp_new_stat+0x116/0x130
[<ffffffff813198f9>] sys_sendmsg+0x49/0x90
[<ffffffff81415d52>] system_call_fastpath+0x16/0x1b

Bart.