2018-02-07 15:55:22

by Andi Kleen

Subject: Re: INFO: task hung in sync_blockdev

> #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
> __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
> 1 lock held by blkid/19199:
> #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 1 lock held by syz-executor5/19330:
> #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> 1 lock held by syz-executor5/19331:
> #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439

It seems multiple processes deadlocked on the bd_mutex.
Unfortunately there's no backtrace for the lock acquisitions,
so it's hard to see the exact sequence.

Since lockdep is already active, it's likely not just a lock-ordering
violation but something else.

-Andi


2018-02-08 09:30:49

by Jan Kara

Subject: Re: INFO: task hung in sync_blockdev

On Wed 07-02-18 07:52:29, Andi Kleen wrote:
> > #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
> > __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
> > 1 lock held by blkid/19199:
> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> > #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
> > n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> > 1 lock held by syz-executor5/19330:
> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> > 1 lock held by syz-executor5/19331:
> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>
> It seems multiple processes deadlocked on the bd_mutex.
> Unfortunately there's no backtrace for the lock acquisitions,
> so it's hard to see the exact sequence.

Well, everything in the report points to a situation where some IO was
submitted to the block device and never completed (more precisely, it took
longer than those 120s to complete that IO). It would need more digging into
the syzkaller program to find out what kind of device that was and possibly
why the IO took so long to complete...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-02-08 13:29:49

by Dmitry Vyukov

Subject: Re: INFO: task hung in sync_blockdev

On Thu, Feb 8, 2018 at 10:28 AM, Jan Kara <[email protected]> wrote:
> On Wed 07-02-18 07:52:29, Andi Kleen wrote:
>> > #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
>> > __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
>> > 1 lock held by blkid/19199:
>> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> > #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
>> > n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
>> > 1 lock held by syz-executor5/19330:
>> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> > 1 lock held by syz-executor5/19331:
>> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>>
>> It seems multiple processes deadlocked on the bd_mutex.
>> Unfortunately there's no backtrace for the lock acquisitions,
>> so it's hard to see the exact sequence.
>
> Well, all in the report points to a situation where some IO was submitted
> to the block device and never completed (more exactly it took longer than
> those 120s to complete that IO). It would need more digging into the
> syzkaller program to find out what kind of device that was and possibly why
> the IO took so long to complete...


Would a traceback of all task stacks help in this case?
What I've seen in several "task hung" reports is that the CPU
traceback does not show anything useful. So perhaps it should be
changed to a task traceback? Or would that not help either?

2018-02-08 14:09:27

by Jan Kara

Subject: Re: INFO: task hung in sync_blockdev

On Thu 08-02-18 14:28:08, Dmitry Vyukov wrote:
> On Thu, Feb 8, 2018 at 10:28 AM, Jan Kara <[email protected]> wrote:
> > On Wed 07-02-18 07:52:29, Andi Kleen wrote:
> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
> >> > __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
> >> > 1 lock held by blkid/19199:
> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> > #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
> >> > n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> >> > 1 lock held by syz-executor5/19330:
> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> > 1 lock held by syz-executor5/19331:
> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >>
> >> It seems multiple processes deadlocked on the bd_mutex.
> >> Unfortunately there's no backtrace for the lock acquisitions,
> >> so it's hard to see the exact sequence.
> >
> > Well, all in the report points to a situation where some IO was submitted
> > to the block device and never completed (more exactly it took longer than
> > those 120s to complete that IO). It would need more digging into the
> > syzkaller program to find out what kind of device that was and possibly why
> > the IO took so long to complete...
>
>
> Would a traceback of all task stacks help in this case?
> What I've seen in several "task hung" reports is that the CPU
> traceback is not showing anything useful. So perhaps it should be
> changed to task traceback? Or it would not help either?

A task stack traceback for all tasks (usually only the tasks in D state -
i.e. sysrq-w - are actually enough) would definitely help with debugging
deadlocks on sleeping locks. For this particular case I'm not sure whether it
would help, since it is quite possible the IO is just sitting in some
queue, never getting processed, due to some racing syzkaller process tearing
down the device at the wrong moment or something like that... Such a case is
very difficult to debug without a full kernel crashdump of the hung kernel
(or a reproducer, for that matter), and even with that it is usually rather
time consuming. But for the deadlocks which do occur more frequently it
would probably be worth the time, so it would be nice if such an option were
eventually available.
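
For reference, the sysrq-w dump can be triggered on demand from inside the
guest, assuming sysrq is enabled (these are the standard procfs paths):

echo 1 > /proc/sys/kernel/sysrq      # allow all sysrq functions
echo w > /proc/sysrq-trigger         # dump stacks of blocked (D state) tasks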

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-02-08 14:19:50

by Dmitry Vyukov

Subject: Re: INFO: task hung in sync_blockdev

On Thu, Feb 8, 2018 at 3:08 PM, Jan Kara <[email protected]> wrote:
> On Thu 08-02-18 14:28:08, Dmitry Vyukov wrote:
>> On Thu, Feb 8, 2018 at 10:28 AM, Jan Kara <[email protected]> wrote:
>> > On Wed 07-02-18 07:52:29, Andi Kleen wrote:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
>> >> > __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
>> >> > 1 lock held by blkid/19199:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> >> > #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
>> >> > n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
>> >> > 1 lock held by syz-executor5/19330:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> >> > 1 lock held by syz-executor5/19331:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> >>
>> >> It seems multiple processes deadlocked on the bd_mutex.
>> >> Unfortunately there's no backtrace for the lock acquisitions,
>> >> so it's hard to see the exact sequence.
>> >
>> > Well, all in the report points to a situation where some IO was submitted
>> > to the block device and never completed (more exactly it took longer than
>> > those 120s to complete that IO). It would need more digging into the
>> > syzkaller program to find out what kind of device that was and possibly why
>> > the IO took so long to complete...
>>
>>
>> Would a traceback of all task stacks help in this case?
>> What I've seen in several "task hung" reports is that the CPU
>> traceback is not showing anything useful. So perhaps it should be
>> changed to task traceback? Or it would not help either?
>
> Task stack traceback for all tasks (usually only tasks in D state - i.e.
> sysrq-w - are enough actually) would definitely help for debugging
> deadlocks on sleeping locks. For this particular case I'm not sure if it
> would help or not since it is quite possible the IO is just sitting in some
> queue never getting processed

That's what I was afraid of.

> due to some racing syzkaller process tearing
> down the device in the wrong moment or something like that... Such case is
> very difficult to debug without full kernel crashdump of the hung kernel
> (or a reproducer for that matter) and even with that it is usually rather
> time consuming. But for the deadlocks which do occur more frequently it
> would be probably worth the time so it would be nice if such option was
> eventually available.

By "full kernel crashdump" you mean kdump thing, or something else?

2018-02-08 14:52:07

by Andi Kleen

Subject: Re: INFO: task hung in sync_blockdev

> > It seems multiple processes deadlocked on the bd_mutex.
> > Unfortunately there's no backtrace for the lock acquisitions,
> > so it's hard to see the exact sequence.
>
> Well, all in the report points to a situation where some IO was submitted
> to the block device and never completed (more exactly it took longer than
> those 120s to complete that IO). It would need more digging into the

Are you sure? I didn't think outstanding IO would take bd_mutex.

-Andi

2018-02-08 16:21:03

by Jan Kara

Subject: Re: INFO: task hung in sync_blockdev

On Thu 08-02-18 15:18:11, Dmitry Vyukov wrote:
> On Thu, Feb 8, 2018 at 3:08 PM, Jan Kara <[email protected]> wrote:
> > On Thu 08-02-18 14:28:08, Dmitry Vyukov wrote:
> >> On Thu, Feb 8, 2018 at 10:28 AM, Jan Kara <[email protected]> wrote:
> >> > On Wed 07-02-18 07:52:29, Andi Kleen wrote:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
> >> >> > __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
> >> >> > 1 lock held by blkid/19199:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> >> > #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
> >> >> > n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> >> >> > 1 lock held by syz-executor5/19330:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> >> > 1 lock held by syz-executor5/19331:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> >>
> >> >> It seems multiple processes deadlocked on the bd_mutex.
> >> >> Unfortunately there's no backtrace for the lock acquisitions,
> >> >> so it's hard to see the exact sequence.
> >> >
> >> > Well, all in the report points to a situation where some IO was submitted
> >> > to the block device and never completed (more exactly it took longer than
> >> > those 120s to complete that IO). It would need more digging into the
> >> > syzkaller program to find out what kind of device that was and possibly why
> >> > the IO took so long to complete...
> >>
> >>
> >> Would a traceback of all task stacks help in this case?
> >> What I've seen in several "task hung" reports is that the CPU
> >> traceback is not showing anything useful. So perhaps it should be
> >> changed to task traceback? Or it would not help either?
> >
> > Task stack traceback for all tasks (usually only tasks in D state - i.e.
> > sysrq-w - are enough actually) would definitely help for debugging
> > deadlocks on sleeping locks. For this particular case I'm not sure if it
> > would help or not since it is quite possible the IO is just sitting in some
> > queue never getting processed
>
> That's what I was afraid of.
>
> > due to some racing syzkaller process tearing
> > down the device in the wrong moment or something like that... Such case is
> > very difficult to debug without full kernel crashdump of the hung kernel
> > (or a reproducer for that matter) and even with that it is usually rather
> > time consuming. But for the deadlocks which do occur more frequently it
> > would be probably worth the time so it would be nice if such option was
> > eventually available.
>
> By "full kernel crashdump" you mean kdump thing, or something else?

Yes, the kdump thing (for a KVM guest you can also grab the memory dump from
the host in a simpler way, and it should be usable with the crash utility
AFAIK).
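
With libvirt, for instance, something along these lines should work (the
domain name and output path below are just examples, and you need a vmlinux
with debug info matching the guest kernel):

virsh dump --memory-only syzkaller-vm /var/tmp/vmcore
crash vmlinux /var/tmp/vmcore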

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-02-08 16:22:21

by Jan Kara

Subject: Re: INFO: task hung in sync_blockdev

On Thu 08-02-18 06:49:18, Andi Kleen wrote:
> > > It seems multiple processes deadlocked on the bd_mutex.
> > > Unfortunately there's no backtrace for the lock acquisitions,
> > > so it's hard to see the exact sequence.
> >
> > Well, all in the report points to a situation where some IO was submitted
> > to the block device and never completed (more exactly it took longer than
> > those 120s to complete that IO). It would need more digging into the
>
> Are you sure? I didn't think outstanding IO would take bd_mutex.

The stack trace is:

schedule+0xf5/0x430 kernel/sched/core.c:3480
io_schedule+0x1c/0x70 kernel/sched/core.c:5096
wait_on_page_bit_common+0x4b3/0x770 mm/filemap.c:1099
wait_on_page_bit mm/filemap.c:1132 [inline]
wait_on_page_writeback include/linux/pagemap.h:546 [inline]
__filemap_fdatawait_range+0x282/0x430 mm/filemap.c:533
filemap_fdatawait_range mm/filemap.c:558 [inline]
filemap_fdatawait include/linux/fs.h:2590 [inline]
filemap_write_and_wait+0x7a/0xd0 mm/filemap.c:624
__sync_blockdev fs/block_dev.c:448 [inline]
sync_blockdev.part.29+0x50/0x70 fs/block_dev.c:457
sync_blockdev fs/block_dev.c:444 [inline]
__blkdev_put+0x18b/0x7f0 fs/block_dev.c:1763
blkdev_put+0x85/0x4f0 fs/block_dev.c:1835
blkdev_close+0x8b/0xb0 fs/block_dev.c:1842
__fput+0x327/0x7e0 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:243


So we are waiting for PageWriteback on some page. And bd_mutex is grabbed
by this process in __blkdev_put() before calling sync_blockdev().

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-02-08 16:25:07

by Andrey Ryabinin

Subject: Re: INFO: task hung in sync_blockdev



On 02/08/2018 07:18 PM, Jan Kara wrote:

>> By "full kernel crashdump" you mean kdump thing, or something else?
>
> Yes, the kdump thing (for KVM guest you can grab the memory dump also from
> the host in a simplier way and it should be usable with the crash utility
> AFAIK).
>

In QEMU monitor 'dump-guest-memory' command:

(qemu) help dump-guest-memory
dump-guest-memory [-p] [-d] [-z|-l|-s] filename [begin length] -- dump guest memory into file 'filename'.
-p: do paging to get guest's memory mapping.
-d: return immediately (do not wait for completion).
-z: dump in kdump-compressed format, with zlib compression.
-l: dump in kdump-compressed format, with lzo compression.
-s: dump in kdump-compressed format, with snappy compression.
begin: the starting physical address.
length: the memory size, in bytes
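
So on a hung guest it would be something like (the output path is just an
example; -l or -s can be used instead of -z depending on which compression
the QEMU build supports):

(qemu) dump-guest-memory -z /var/tmp/vmcore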

2018-02-08 17:20:03

by Dmitry Vyukov

Subject: Re: INFO: task hung in sync_blockdev

On Thu, Feb 8, 2018 at 5:23 PM, Andrey Ryabinin <[email protected]> wrote:
>
>
> On 02/08/2018 07:18 PM, Jan Kara wrote:
>
>>> By "full kernel crashdump" you mean kdump thing, or something else?
>>
>> Yes, the kdump thing (for KVM guest you can grab the memory dump also from
>> the host in a simplier way and it should be usable with the crash utility
>> AFAIK).
>>
>
> In QEMU monitor 'dump-guest-memory' command:
>
> (qemu) help dump-guest-memory
> dump-guest-memory [-p] [-d] [-z|-l|-s] filename [begin length] -- dump guest memory into file 'filename'.
> -p: do paging to get guest's memory mapping.
> -d: return immediately (do not wait for completion).
> -z: dump in kdump-compressed format, with zlib compression.
> -l: dump in kdump-compressed format, with lzo compression.
> -s: dump in kdump-compressed format, with snappy compression.
> begin: the starting physical address.
> length: the memory size, in bytes


Nice!
Do you know straight away if it's scriptable/automatable? Or do I just
send some magic sequence of bytes representing ^A+C,
dump-guest-memory, \n to the stdin pipe?

Unfortunately, syzbot uses GCE VMs for testing, and there does not
seem to be such a feature on GCE...

2018-02-09 15:33:03

by Andrey Ryabinin

Subject: Re: INFO: task hung in sync_blockdev



On 02/08/2018 08:17 PM, Dmitry Vyukov wrote:
> On Thu, Feb 8, 2018 at 5:23 PM, Andrey Ryabinin <[email protected]> wrote:
>>
>>
>> On 02/08/2018 07:18 PM, Jan Kara wrote:
>>
>>>> By "full kernel crashdump" you mean kdump thing, or something else?
>>>
>>> Yes, the kdump thing (for KVM guest you can grab the memory dump also from
>>> the host in a simplier way and it should be usable with the crash utility
>>> AFAIK).
>>>
>>
>> In QEMU monitor 'dump-guest-memory' command:
>>
>> (qemu) help dump-guest-memory
>> dump-guest-memory [-p] [-d] [-z|-l|-s] filename [begin length] -- dump guest memory into file 'filename'.
>> -p: do paging to get guest's memory mapping.
>> -d: return immediately (do not wait for completion).
>> -z: dump in kdump-compressed format, with zlib compression.
>> -l: dump in kdump-compressed format, with lzo compression.
>> -s: dump in kdump-compressed format, with snappy compression.
>> begin: the starting physical address.
>> length: the memory size, in bytes
>
>
> Nice!
> Do you know straight away if it's scriptable/automatable? Or do I just
> send some magic sequence of bytes representing ^A+C,
> dump-guest-memory, \n to stdin pipe?
>

I wouldn't do it via stdin. You can set up the monitor on any chardev you like and send the
command there once you know the guest has panicked. Look at the -mon and -chardev QEMU options.
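
Untested, but something along these lines should do it (socket path and ids
are just examples):

qemu-system-x86_64 ... \
  -chardev socket,id=mon0,path=/tmp/qemu-mon.sock,server,nowait \
  -mon chardev=mon0,mode=readline

and then, once the hang is detected, push the command in from a script, e.g.:

echo "dump-guest-memory -z /tmp/vmcore" | socat - UNIX-CONNECT:/tmp/qemu-mon.sock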

> Unfortunately, syzbot uses GCE VMs for testing, and there does not
> seem to be such feature on GCE...
>

Well, you still have kdump.