2016-05-31 14:09:25

by Eryu Guan

Subject: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Hi,

I noticed that generic/130 hangs starting from the 4.7-rc1 kernel on non-4k
block size ext4 (x86_64 host), and I bisected it to commit 06bd3c36a733
("ext4: fix data exposure after a crash").

It's the "Small Vector Sync" sub-test in generic/130 that hangs the kernel,
and I can reproduce it on different hosts, both bare metal and KVM
guests.

Thanks,
Eryu

P.S-1: a slightly simplified reproducer

#!/bin/bash
dev=/dev/sda5
mnt=/mnt/ext4

mkfs -t ext4 -b 1024 $dev
mount $dev $mnt

echo "abcdefghijklmnopqrstuvwxyz" > $mnt/testfile
xfs_io -f -s \
	-c "pread -v 0 1" -c "pread -v 1 1" -c "pread -v 2 1" -c "pread -v 3 1" \
	-c "pread -v 4 1" -c "pread -v 5 1" -c "pread -v 6 1" -c "pread -v 7 1" \
	-c "pread -v 8 1" -c "pread -v 9 1" -c "pread -v 10 1" -c "pread -v 11 1" \
	-c "pread -v 12 1" -c "pread -v 13 13" \
	-c "pwrite -S 0x61 4090 1" -c "pwrite -S 0x62 4091 1" \
	-c "pwrite -S 0x63 4092 1" -c "pwrite -S 0x64 4093 1" \
	-c "pwrite -S 0x65 4094 1" -c "pwrite -S 0x66 4095 1" \
	-c "pwrite -S 0x67 4096 1" -c "pwrite -S 0x68 4097 1" \
	-c "pwrite -S 0x69 4098 1" -c "pwrite -S 0x6A 4099 1" \
	-c "pwrite -S 0x6B 4100 1" -c "pwrite -S 0x6C 4101 1" \
	-c "pwrite -S 0x6D 4102 1" -c "pwrite -S 0x6E 4103 1" \
	-c "pwrite -S 0x6F 4104 1" -c "pwrite -S 0x70 4105 1" \
	-c "pread -v 4090 4" -c "pread -v 4094 4" \
	-c "pread -v 4098 4" -c "pread -v 4102 4" \
	-c "pwrite -S 0x61 10000000000 1" -c "pwrite -S 0x62 10000000001 1" \
	-c "pwrite -S 0x63 10000000002 1" -c "pwrite -S 0x64 10000000003 1" \
	-c "pwrite -S 0x65 10000000004 1" -c "pwrite -S 0x66 10000000005 1" \
	-c "pwrite -S 0x67 10000000006 1" -c "pwrite -S 0x68 10000000007 1" \
	-c "pwrite -S 0x69 10000000008 1" -c "pwrite -S 0x6A 10000000009 1" \
	-c "pwrite -S 0x6B 10000000010 1" -c "pwrite -S 0x6C 10000000011 1" \
	-c "pwrite -S 0x6D 10000000012 1" -c "pwrite -S 0x6E 10000000013 1" \
	-c "pwrite -S 0x6F 10000000014 1" -c "pwrite -S 0x70 10000000015 1" \
	-c "pread -v 10000000000 4" -c "pread -v 10000000004 4" \
	-c "pread -v 10000000008 4" -c "pread -v 10000000012 4" \
	$mnt/testfile


P.S-2: sysrq-w output
[43360.261177] sysrq: SysRq : Show Blocked State
[43360.265588] task PC stack pid father
[43360.271579] jbd2/sda5-8 D ffff880225d3b9e8 0 21723 2 0x00000080
[43360.278718] ffff880225d3b9e8 0000000000000000 ffff88022695bd80 0000000000002000
[43360.286229] ffff880225d3c000 0000000000000000 7fffffffffffffff ffff88022ffaa790
[43360.293741] ffffffff816c2f50 ffff880225d3ba00 ffffffff816c26e5 ffff88022fc17ec0
[43360.301268] Call Trace:
[43360.303737] [<ffffffff816c2f50>] ? bit_wait+0x50/0x50
[43360.308900] [<ffffffff816c26e5>] schedule+0x35/0x80
[43360.313884] [<ffffffff816c5691>] schedule_timeout+0x231/0x2d0
[43360.319733] [<ffffffff81318ad0>] ? queue_unplugged+0xa0/0xb0
[43360.325505] [<ffffffff810fc44c>] ? ktime_get+0x3c/0xb0
[43360.330739] [<ffffffff816c2f50>] ? bit_wait+0x50/0x50
[43360.335895] [<ffffffff816c1fb6>] io_schedule_timeout+0xa6/0x110
[43360.341917] [<ffffffff816c2f6b>] bit_wait_io+0x1b/0x60
[43360.347161] [<ffffffff816c2b10>] __wait_on_bit+0x60/0x90
[43360.352580] [<ffffffff8119025e>] wait_on_page_bit+0xce/0xf0
[43360.358256] [<ffffffff810cd1c0>] ? autoremove_wake_function+0x40/0x40
[43360.364798] [<ffffffff8119037f>] __filemap_fdatawait_range+0xff/0x180
[43360.371341] [<ffffffff8131b127>] ? submit_bio+0x77/0x150
[43360.376758] [<ffffffff81312b9b>] ? bio_alloc_bioset+0x1ab/0x2d0
[43360.382782] [<ffffffffa06bffa9>] ? jbd2_journal_write_metadata_buffer+0x279/0x430 [jbd2]
[43360.390973] [<ffffffff81190414>] filemap_fdatawait_range+0x14/0x30
[43360.397264] [<ffffffff81190453>] filemap_fdatawait+0x23/0x30
[43360.403032] [<ffffffffa06b7787>] jbd2_journal_commit_transaction+0x677/0x1860 [jbd2]
[43360.410881] [<ffffffff81036bb9>] ? sched_clock+0x9/0x10
[43360.416195] [<ffffffff8102c6d9>] ? __switch_to+0x219/0x5c0
[43360.421795] [<ffffffffa06bcd5a>] kjournald2+0xca/0x260 [jbd2]
[43360.427649] [<ffffffff810cd180>] ? prepare_to_wait_event+0xf0/0xf0
[43360.433936] [<ffffffffa06bcc90>] ? commit_timeout+0x10/0x10 [jbd2]
[43360.440215] [<ffffffff810a92b8>] kthread+0xd8/0xf0
[43360.445105] [<ffffffff816c663f>] ret_from_fork+0x1f/0x40
[43360.450519] [<ffffffff810a91e0>] ? kthread_park+0x60/0x60
[43360.456025] xfs_io D ffff880082503960 0 21895 21474 0x00000080
[43360.463145] ffff880082503960 0000000000000246 ffff880220805200 ffff880225d5f088
[43360.470681] ffff880082504000 0000000000000012 ffff880225d5f088 ffff880225d5f024
[43360.478186] ffff8800825039a8 ffff880082503978 ffffffff816c26e5 ffff880225d5f000
[43360.485707] Call Trace:
[43360.488176] [<ffffffff816c26e5>] schedule+0x35/0x80
[43360.493162] [<ffffffffa06bc899>] jbd2_log_wait_commit+0xa9/0x130 [jbd2]
[43360.499877] [<ffffffff810cd180>] ? prepare_to_wait_event+0xf0/0xf0
[43360.506163] [<ffffffffa06b560c>] jbd2_journal_stop+0x38c/0x3e0 [jbd2]
[43360.512731] [<ffffffffa07337fc>] __ext4_journal_stop+0x3c/0xa0 [ext4]
[43360.519278] [<ffffffffa0703bce>] ext4_writepages+0x8ce/0xd70 [ext4]
[43360.525660] [<ffffffff8119e8ae>] do_writepages+0x1e/0x30
[43360.531068] [<ffffffff81192996>] __filemap_fdatawrite_range+0xc6/0x100
[43360.537699] [<ffffffff81192b01>] filemap_write_and_wait_range+0x41/0x90
[43360.544420] [<ffffffffa06fa971>] ext4_sync_file+0xb1/0x320 [ext4]
[43360.550619] [<ffffffff8124ca7d>] vfs_fsync_range+0x3d/0xb0
[43360.556223] [<ffffffffa06f9fad>] ext4_file_write_iter+0x22d/0x330 [ext4]
[43360.563031] [<ffffffff811937b7>] ? generic_file_read_iter+0x627/0x7b0
[43360.569569] [<ffffffff812180b3>] __vfs_write+0xe3/0x160
[43360.574888] [<ffffffff81219302>] vfs_write+0xb2/0x1b0
[43360.580046] [<ffffffff8121a8f7>] SyS_pwrite64+0x87/0xb0
[43360.585366] [<ffffffff81003b12>] do_syscall_64+0x62/0x110
[43360.590869] [<ffffffff816c64e1>] entry_SYSCALL64_slow_path+0x25/0x25


2016-05-31 15:40:22

by Theodore Ts'o

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
>
> I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
> block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
> ("ext4: fix data exposure after a crash").
>
> It's the sub-test "Small Vector Sync" in generic/130 hangs the kernel,
> and I can reproduce it on different hosts, both bare metal and kvm
> guest.

Hmm, it's not reproducing for me, either with your simplified reproducer
or with generic/130. Is there perhaps something specific in your kernel
config that is needed for reproduction?

- Ted

FSTESTVER: e2fsprogs v1.43-25-ge2406b9 (Wed, 25 May 2016 00:30:42 -0400)
FSTESTVER: fio fio-2.6-8-ge6989e1 (Thu, 4 Feb 2016 12:09:48 -0700)
FSTESTVER: quota 67fd9cc (Mon, 4 Apr 2016 00:32:39 -0400)
FSTESTVER: xfsprogs v4.3.0 (Mon, 23 Nov 2015 15:24:24 +1100)
FSTESTVER: xfstests-bld ccae8d1 (Tue, 26 Apr 2016 00:42:18 -0400)
FSTESTVER: xfstests linux-v3.8-1036-gd22e675 (Wed, 25 May 2016 00:58:35 -0400)
FSTESTVER: kernel 4.7.0-rc1-ext4 #293 SMP Tue May 31 11:31:24 EDT 2016 x86_64
FSTESTCFG: "1k"
FSTESTSET: "generic/130"
FSTESTEXC: ""
FSTESTOPT: "aex"
MNTOPTS: ""
CPUS: "2"
MEM: "2007.43"
total used free shared buffers cached
Mem: 2007 93 1913 9 3 29
-/+ buffers/cache: 60 1946
Swap: 0 0 0
BEGIN TEST 1k: Ext4 1k block Tue May 31 11:33:55 EDT 2016
DEVICE: /dev/vdd
MK2FS OPTIONS: -q -b 1024
MOUNT OPTIONS: -o block_validity
FSTYP -- ext4
PLATFORM -- Linux/x86_64 kvm-xfstests 4.7.0-rc1-ext4
MKFS_OPTIONS -- -q -b 1024 /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc

generic/130 [11:33:55]run fstests generic/130 at 2016-05-31 11:33:55
[11:33:59] 4s
Ran: generic/130
Passed all 1 tests

total used free shared buffers cached
Mem: 2007 71 1936 9 0 16
-/+ buffers/cache: 53 1953
Swap: 0 0 0
END TEST: Ext4 1k block Tue May 31 11:33:59 EDT 2016
reboot: Power down

2016-06-01 06:38:33

by Eryu Guan

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Tue, May 31, 2016 at 11:40:17AM -0400, Theodore Ts'o wrote:
> On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
> >
> > I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
> > block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
> > ("ext4: fix data exposure after a crash").
> >
> > It's the sub-test "Small Vector Sync" in generic/130 hangs the kernel,
> > and I can reproduce it on different hosts, both bare metal and kvm
> > guest.
>
> Hmm, it's not reproducing for me, either using your simplified repro
> or generic/130. Is there something specific with your kernel config,
> which is needed for the reproduction, perhaps?

That's weird; it's easily reproduced for me on different hosts/guests.
The kernel config I'm using is based on the config from the RHEL 7.2
kernel, leaving all new config options at their default choices, i.e.

cp /boot/<config-rhel7.2> ./.config && yes "" | make oldconfig && make

I attached my kernel config file.

My test VM has 8G of memory and 4 vcpus, with RHEL 7.2 installed running
an upstream kernel; the host is RHEL 6.7. xfsprogs version 3.2.2 (shipped
with RHEL 7.2) and version 4.5.0 (compiled from upstream) made no difference.

I think I can try configs from other vendors such as SuSE and Ubuntu. If
you can share your config file, I'll test it as well.

Thanks,
Eryu

>
> - Ted
>
> FSTESTVER: e2fsprogs v1.43-25-ge2406b9 (Wed, 25 May 2016 00:30:42 -0400)
> FSTESTVER: fio fio-2.6-8-ge6989e1 (Thu, 4 Feb 2016 12:09:48 -0700)
> FSTESTVER: quota 67fd9cc (Mon, 4 Apr 2016 00:32:39 -0400)
> FSTESTVER: xfsprogs v4.3.0 (Mon, 23 Nov 2015 15:24:24 +1100)
> FSTESTVER: xfstests-bld ccae8d1 (Tue, 26 Apr 2016 00:42:18 -0400)
> FSTESTVER: xfstests linux-v3.8-1036-gd22e675 (Wed, 25 May 2016 00:58:35 -0400)
> FSTESTVER: kernel 4.7.0-rc1-ext4 #293 SMP Tue May 31 11:31:24 EDT 2016 x86_64
> FSTESTCFG: "1k"
> FSTESTSET: "generic/130"
> FSTESTEXC: ""
> FSTESTOPT: "aex"
> MNTOPTS: ""
> CPUS: "2"
> MEM: "2007.43"
> total used free shared buffers cached
> Mem: 2007 93 1913 9 3 29
> -/+ buffers/cache: 60 1946
> Swap: 0 0 0
> BEGIN TEST 1k: Ext4 1k block Tue May 31 11:33:55 EDT 2016
> DEVICE: /dev/vdd
> MK2FS OPTIONS: -q -b 1024
> MOUNT OPTIONS: -o block_validity
> FSTYP -- ext4
> PLATFORM -- Linux/x86_64 kvm-xfstests 4.7.0-rc1-ext4
> MKFS_OPTIONS -- -q -b 1024 /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
>
> generic/130 [11:33:55]run fstests generic/130 at 2016-05-31 11:33:55
> [11:33:59] 4s
> Ran: generic/130
> Passed all 1 tests
>
> total used free shared buffers cached
> Mem: 2007 71 1936 9 0 16
> -/+ buffers/cache: 53 1953
> Swap: 0 0 0
> END TEST: Ext4 1k block Tue May 31 11:33:59 EDT 2016
> reboot: Power down


Attachments:
(No filename) (3.14 kB)
kernel-config.bz2 (31.88 kB)

2016-06-01 13:53:28

by Theodore Ts'o

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Wed, Jun 01, 2016 at 02:38:22PM +0800, Eryu Guan wrote:
> I think I can try configs from other venders such as SuSE, Ubuntu. If
> you can share your config file I'll test it as well.

Here are the configs I'm using. The 32-bit config is used with KVM.
The 64-bit config is used with KVM and Google Compute Engine
(http://thunk.org/gce-xfstests).

Cheers,

- Ted


Attachments:
(No filename) (372.00 B)
ext4-32.config.bz2 (19.89 kB)
ext4-64.config.bz2 (13.31 kB)

2016-06-01 16:58:12

by Eryu Guan

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Wed, Jun 01, 2016 at 02:38:22PM +0800, Eryu Guan wrote:
> On Tue, May 31, 2016 at 11:40:17AM -0400, Theodore Ts'o wrote:
> > On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
> > >
> > > I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
> > > block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
> > > ("ext4: fix data exposure after a crash").
> > >
> > > It's the sub-test "Small Vector Sync" in generic/130 hangs the kernel,
> > > and I can reproduce it on different hosts, both bare metal and kvm
> > > guest.
> >
> > Hmm, it's not reproducing for me, either using your simplified repro
> > or generic/130. Is there something specific with your kernel config,
> > which is needed for the reproduction, perhaps?
>
> That's weird, it's easily reproduced for me on different hosts/guests.
> The kernel config I'm using is based on the config from RHEL7.2 kernel,
> leaving all new config options to their default choices. i.e
>
> cp /boot/<config-rhel7.2> ./.config && yes "" | make oldconfig && make
>
> I attached my kernel config file.
>
> And my test vm has 8G memory & 4 vcpus, with RHEL7.2 installed running
> upstream kernel, host is RHEL6.7. xfsprogs version 3.2.2 (shipped with
> RHEL7.2) and version 4.5.0 (compiled from upstream) made no difference.
>
> I think I can try configs from other venders such as SuSE, Ubuntu. If
> you can share your config file I'll test it as well.

I've tried the kernel config from Ubuntu 16.04, and I can reproduce the
hang with it as well. If I add the "-o data=journal" or "-o data=writeback"
mount option, I don't see the hang. So it seems it only happens in
data=ordered mode, which matches the code change in commit 06bd3c36a733,
I think.

I had some trouble booting the kernel compiled with your config file;
I'll dig into it more tomorrow.

Thanks,
Eryu

2016-06-02 08:58:43

by Jan Kara

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Thu 02-06-16 00:58:00, Eryu Guan wrote:
> On Wed, Jun 01, 2016 at 02:38:22PM +0800, Eryu Guan wrote:
> > On Tue, May 31, 2016 at 11:40:17AM -0400, Theodore Ts'o wrote:
> > > On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
> > > >
> > > > I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
> > > > block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
> > > > ("ext4: fix data exposure after a crash").
> > > >
> > > > It's the sub-test "Small Vector Sync" in generic/130 hangs the kernel,
> > > > and I can reproduce it on different hosts, both bare metal and kvm
> > > > guest.
> > >
> > > Hmm, it's not reproducing for me, either using your simplified repro
> > > or generic/130. Is there something specific with your kernel config,
> > > which is needed for the reproduction, perhaps?
> >
> > That's weird, it's easily reproduced for me on different hosts/guests.
> > The kernel config I'm using is based on the config from RHEL7.2 kernel,
> > leaving all new config options to their default choices. i.e
> >
> > cp /boot/<config-rhel7.2> ./.config && yes "" | make oldconfig && make
> >
> > I attached my kernel config file.
> >
> > And my test vm has 8G memory & 4 vcpus, with RHEL7.2 installed running
> > upstream kernel, host is RHEL6.7. xfsprogs version 3.2.2 (shipped with
> > RHEL7.2) and version 4.5.0 (compiled from upstream) made no difference.
> >
> > I think I can try configs from other venders such as SuSE, Ubuntu. If
> > you can share your config file I'll test it as well.
>
> I've tried kernel config from Ubuntu 16.04, and I can reproduce the hang
> as well. If I add "-o data=journal" or "-o data=writeback" mount option,
> I don't see the hang. So seems it only happens in data=ordered mode,
> which matches the code change in commit 06bd3c36a733, I think.

Yeah, so this is what I kind of expected. From the backtraces you have
provided it is clear that:

1) There is a process (xfs_io) doing an O_SYNC write. It is blocked
waiting for a transaction commit, having entered the fsync path.

2) The jbd2 thread is blocked waiting for PG_Writeback to be cleared;
this happens only in data=ordered mode.

But what is not clear to me is: why doesn't PG_Writeback get cleared for
the page? It should get cleared once the IO that was submitted completes...
Also, how can my change trigger the problem? We have waited for
PG_Writeback in data=ordered mode even before. What my patch did is that we
now avoid the filemap_fdatawrite() call before the filemap_fdatawait()
call. So I suspect this is a race that has always been there, and the new,
faster code path is just tickling it in your setup.
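
Viewed as a wait-for graph, the two backtraces sketch a circular wait. Below is a toy model of that graph, not kernel code; the first two edges come straight from the backtraces, while the third edge (that clearing PG_Writeback depends on the stuck writer itself making progress) is an assumption, consistent with the deadlock fix posted later in the thread.

```python
# Toy wait-for graph built from the two backtraces above. Edges 1 and 2
# are read directly from the traces; edge 3 is the assumed missing link
# that closes the cycle.
waits_for = {
    "xfs_io in ext4_writepages": "jbd2 transaction commit",
    "jbd2 transaction commit": "PG_Writeback on the data page",
    "PG_Writeback on the data page": "xfs_io in ext4_writepages",
}

def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle if a node
    is reached twice, else None (no deadlock along this path)."""
    seen = []
    node = start
    while node not in seen:
        seen.append(node)
        node = graph.get(node)
        if node is None:
            return None
    return seen[seen.index(node):]

# All three nodes come back, i.e. a closed cycle: a deadlock.
print(find_cycle(waits_for, "xfs_io in ext4_writepages"))
```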

I'll try to reproduce this problem in my setup (but my kvm instance fails
to boot with 4.7-rc1 so I'm debugging that currently) and if I succeed,
I'll debug this more. If I'm unable to reproduce this, I'll need you to
debug why the IO for that page does not complete. Probably attaching to the
hung kvm guest with gdb and looking through it is the simplest in that
case. Thanks for your report!

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-02 12:17:53

by Jan Kara

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Thu 02-06-16 10:58:40, Jan Kara wrote:
> On Thu 02-06-16 00:58:00, Eryu Guan wrote:
> > On Wed, Jun 01, 2016 at 02:38:22PM +0800, Eryu Guan wrote:
> > > On Tue, May 31, 2016 at 11:40:17AM -0400, Theodore Ts'o wrote:
> > > > On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
> > > > >
> > > > > I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
> > > > > block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
> > > > > ("ext4: fix data exposure after a crash").
> > > > >
> > > > > It's the sub-test "Small Vector Sync" in generic/130 hangs the kernel,
> > > > > and I can reproduce it on different hosts, both bare metal and kvm
> > > > > guest.
> > > >
> > > > Hmm, it's not reproducing for me, either using your simplified repro
> > > > or generic/130. Is there something specific with your kernel config,
> > > > which is needed for the reproduction, perhaps?
> > >
> > > That's weird, it's easily reproduced for me on different hosts/guests.
> > > The kernel config I'm using is based on the config from RHEL7.2 kernel,
> > > leaving all new config options to their default choices. i.e
> > >
> > > cp /boot/<config-rhel7.2> ./.config && yes "" | make oldconfig && make
> > >
> > > I attached my kernel config file.
> > >
> > > And my test vm has 8G memory & 4 vcpus, with RHEL7.2 installed running
> > > upstream kernel, host is RHEL6.7. xfsprogs version 3.2.2 (shipped with
> > > RHEL7.2) and version 4.5.0 (compiled from upstream) made no difference.
> > >
> > > I think I can try configs from other venders such as SuSE, Ubuntu. If
> > > you can share your config file I'll test it as well.
> >
> > I've tried kernel config from Ubuntu 16.04, and I can reproduce the hang
> > as well. If I add "-o data=journal" or "-o data=writeback" mount option,
> > I don't see the hang. So seems it only happens in data=ordered mode,
> > which matches the code change in commit 06bd3c36a733, I think.
>
> Yeah, so this is what I kind of expected. From the backtraces you have
> provided it is clear that:
>
> 1) There is process (xfs_io) doing O_SYNC write. That is blocked waiting
> for transaction commit when it entered fsync path.
>
> 2) jbd2 thread is blocked waiting for PG_Writeback to be cleared - this
> happens only in data=ordered mode.
>
> But what is not clear to me is: Why PG_Writeback doesn't get cleared for
> the page? It should get cleared once the IO that was submitted completes...
> Also how my change can trigger the problem - we have waited for
> PG_Writeback in data=ordered mode even before. What my patch did is that we
> are now avoiding filemap_fdatawrite() call before the filemap_fdatawait()
> call. So I suspect this is a race that has always been there and the new
> faster code path is just tickling it in your setup.
>
> I'll try to reproduce this problem in my setup (but my kvm instance fails
> to boot with 4.7-rc1 so I'm debugging that currently) and if I succeed,
> I'll debug this more. If I'm unable to reproduce this, I'll need you to
> debug why the IO for that page does not complete. Probably attaching to the
> hung kvm guest with gdb and looking through it is the simplest in that
> case. Thanks for your report!

So I kept trying, but I could not reproduce the hang either. Can you find
out which page the jbd2 thread is waiting for, and dump page->index,
page->flags, and also bh->b_state and bh->b_blocknr of all 4 buffer heads
attached to it via page->private? Maybe that will shed some light...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-02 12:30:12

by Nikola Pajkovsky

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Jan Kara <[email protected]> writes:

> On Thu 02-06-16 10:58:40, Jan Kara wrote:
>> On Thu 02-06-16 00:58:00, Eryu Guan wrote:
>> > On Wed, Jun 01, 2016 at 02:38:22PM +0800, Eryu Guan wrote:
>> > > On Tue, May 31, 2016 at 11:40:17AM -0400, Theodore Ts'o wrote:
>> > > > On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
>> > > > >
>> > > > > I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
>> > > > > block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
>> > > > > ("ext4: fix data exposure after a crash").
>> > > > >
>> > > > > It's the sub-test "Small Vector Sync" in generic/130 hangs the kernel,
>> > > > > and I can reproduce it on different hosts, both bare metal and kvm
>> > > > > guest.
>> > > >
>> > > > Hmm, it's not reproducing for me, either using your simplified repro
>> > > > or generic/130. Is there something specific with your kernel config,
>> > > > which is needed for the reproduction, perhaps?
>> > >
>> > > That's weird, it's easily reproduced for me on different hosts/guests.
>> > > The kernel config I'm using is based on the config from RHEL7.2 kernel,
>> > > leaving all new config options to their default choices. i.e
>> > >
>> > > cp /boot/<config-rhel7.2> ./.config && yes "" | make oldconfig && make
>> > >
>> > > I attached my kernel config file.
>> > >
>> > > And my test vm has 8G memory & 4 vcpus, with RHEL7.2 installed running
>> > > upstream kernel, host is RHEL6.7. xfsprogs version 3.2.2 (shipped with
>> > > RHEL7.2) and version 4.5.0 (compiled from upstream) made no difference.
>> > >
>> > > I think I can try configs from other venders such as SuSE, Ubuntu. If
>> > > you can share your config file I'll test it as well.
>> >
>> > I've tried kernel config from Ubuntu 16.04, and I can reproduce the hang
>> > as well. If I add "-o data=journal" or "-o data=writeback" mount option,
>> > I don't see the hang. So seems it only happens in data=ordered mode,
>> > which matches the code change in commit 06bd3c36a733, I think.
>>
>> Yeah, so this is what I kind of expected. From the backtraces you have
>> provided it is clear that:
>>
>> 1) There is process (xfs_io) doing O_SYNC write. That is blocked waiting
>> for transaction commit when it entered fsync path.
>>
>> 2) jbd2 thread is blocked waiting for PG_Writeback to be cleared - this
>> happens only in data=ordered mode.
>>
>> But what is not clear to me is: Why PG_Writeback doesn't get cleared for
>> the page? It should get cleared once the IO that was submitted completes...
>> Also how my change can trigger the problem - we have waited for
>> PG_Writeback in data=ordered mode even before. What my patch did is that we
>> are now avoiding filemap_fdatawrite() call before the filemap_fdatawait()
>> call. So I suspect this is a race that has always been there and the new
>> faster code path is just tickling it in your setup.
>>
>> I'll try to reproduce this problem in my setup (but my kvm instance fails
>> to boot with 4.7-rc1 so I'm debugging that currently) and if I succeed,
>> I'll debug this more. If I'm unable to reproduce this, I'll need you to
>> debug why the IO for that page does not complete. Probably attaching to the
>> hung kvm guest with gdb and looking through it is the simplest in that
>> case. Thanks for your report!
>
> So I was trying but I could not reproduce the hang either. Can you find out
> which page is jbd2 thread waiting for and dump page->index, page->flags and
> also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> page->private? Maybe that will shed some light...

It's a dumb question, but how can I find out which page the jbd2 thread
is waiting for, and dump page->index, page->flags, and also bh->b_state
and bh->b_blocknr of the 4 buffer heads attached to it via page->private?

--
Nikola

2016-06-03 10:16:20

by Eryu Guan

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
>
> So I was trying but I could not reproduce the hang either. Can you find out
> which page is jbd2 thread waiting for and dump page->index, page->flags and
> also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> page->private? Maybe that will shed some light...

I'm using crash on the live system when the hang happens, so I got the
page address from "bt -f":

#6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
ffff880212343b48: ffffea0002c23600 000000000000000d
ffff880212343b58: 0000000000000000 0000000000000000
ffff880212343b68: ffff880213251480 ffffffff810cd000
ffff880212343b78: ffff88021ff27218 ffff88021ff27218
ffff880212343b88: 00000000c1b4a75a ffff880212343c68
ffff880212343b98: ffffffff811901bf

The call is "wait_on_page_bit(page, PG_writeback);", and PG_writeback is
13 in the page flags enum, which is 0xd in hex. So ffffea0002c23600 should
be the page address (confirmed by following the page back to the correct
inode).

So page->index:
crash> page ffffea0002c23600 | grep index
index = 2441406,

page->flags:
crash> page ffffea0002c23600 | grep flags
flags = 9007197107267616,
crash> eval 9007197107267616
hexadecimal: 1fffff80002820
decimal: 9007197107267616
octal: 377777760000024040
binary: 0000000000011111111111111111111110000000000000000010100000100000
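
The flag bits can also be decoded mechanically. A small sketch follows; the bit positions are taken from a 4.7-era include/linux/page-flags.h and are not stable across kernel versions or configs, and the high 0x1fffff8... part of the word is the section/node/zone encoding, not flag bits.

```python
# Decode the low bits of page->flags as printed by crash. Bit positions
# follow include/linux/page-flags.h from a 4.7-era x86_64 kernel (an
# assumption; they differ between versions/configs). The high bits of
# the word encode section/node/zone, not flags, so only low bits matter.
PAGE_FLAG_BITS = {
    0: "locked", 1: "error", 2: "referenced", 3: "uptodate",
    4: "dirty", 5: "lru", 6: "active", 7: "slab",
    8: "owner_priv_1", 9: "arch_1", 10: "reserved",
    11: "private", 12: "private_2", 13: "writeback",
}

def decode_page_flags(flags):
    """Return the names of the defined low flag bits set in flags."""
    return [name for bit, name in sorted(PAGE_FLAG_BITS.items())
            if flags & (1 << bit)]

# 0x1fffff80002820: the low bits set are 5, 11 and 13.
print(decode_page_flags(9007197107267616))
# ['lru', 'private', 'writeback']
```

This matches the picture from the hang: the page is on the LRU, has buffer heads attached (private), and is under writeback, which is the bit the jbd2 thread is waiting on.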

page->private:
crash> page ffffea0002c23600 | grep private
private = 18446612141237651696,
crash> eval 18446612141237651696
hexadecimal: ffff880213e0c8f0

The 4 buffer heads:
crash> buffer_head ffff880213e0c8f0
struct buffer_head {
b_state = 0,
b_this_page = 0xffff880213e0c958,
b_page = 0xffffea0002c23600,
b_blocknr = 18446744073709551615,
b_size = 1024,
b_data = 0xffff8800b08d8000 "",
b_bdev = 0x0,
b_end_io = 0x0,
b_private = 0x0,
b_assoc_buffers = {
next = 0xffff880213e0c938,
prev = 0xffff880213e0c938
},
b_assoc_map = 0x0,
b_count = {
counter = 0
}
}
crash> buffer_head 0xffff880213e0c958
struct buffer_head {
b_state = 289,
b_this_page = 0xffff880213e0c9c0,
b_page = 0xffffea0002c23600,
b_blocknr = 19194,
b_size = 1024,
b_data = 0xffff8800b08d8400 "a",
b_bdev = 0xffff8802152009c0,
b_end_io = 0x0,
b_private = 0x0,
b_assoc_buffers = {
next = 0xffff880213e0c9a0,
prev = 0xffff880213e0c9a0
},
b_assoc_map = 0x0,
b_count = {
counter = 0
}
}
crash> buffer_head 0xffff880213e0c9c0
struct buffer_head {
b_state = 1,
b_this_page = 0xffff880213e0ca28,
b_page = 0xffffea0002c23600,
b_blocknr = 18446744073709551615,
b_size = 1024,
b_data = 0xffff8800b08d8800 "",
b_bdev = 0x0,
b_end_io = 0x0,
b_private = 0x0,
b_assoc_buffers = {
next = 0xffff880213e0ca08,
prev = 0xffff880213e0ca08
},
b_assoc_map = 0x0,
b_count = {
counter = 0
}
}
crash> buffer_head 0xffff880213e0ca28
struct buffer_head {
b_state = 1,
b_this_page = 0xffff880213e0c8f0,
b_page = 0xffffea0002c23600,
b_blocknr = 18446744073709551615,
b_size = 1024,
b_data = 0xffff8800b08d8c00 "",
b_bdev = 0x0,
b_end_io = 0x0,
b_private = 0x0,
b_assoc_buffers = {
next = 0xffff880213e0ca70,
prev = 0xffff880213e0ca70
},
b_assoc_map = 0x0,
b_count = {
counter = 0
}
}
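
The b_state values above can be decoded the same way. This is a sketch; the bit order is taken from enum bh_state_bits in a 4.7-era include/linux/buffer_head.h and may differ on other kernels.

```python
# Decode bh->b_state from the buffer_head dumps above. Bit order follows
# enum bh_state_bits in a 4.7-era include/linux/buffer_head.h (an
# assumption; verify against the kernel actually being debugged).
BH_STATE_BITS = [
    "Uptodate", "Dirty", "Lock", "Req", "Uptodate_Lock", "Mapped",
    "New", "Async_Read", "Async_Write", "Delay", "Boundary",
    "Write_EIO", "Unwritten",
]

def decode_bh_state(state):
    """Return the names of the bh state bits set in state."""
    return [name for bit, name in enumerate(BH_STATE_BITS)
            if state & (1 << bit)]

print(decode_bh_state(289))  # ['Uptodate', 'Mapped', 'Async_Write']
print(decode_bh_state(1))    # ['Uptodate']
print(decode_bh_state(0))    # []
```

Notably, only the second buffer head (b_state = 289, the one mapped to block 19194 and holding the "a" data) is marked Async_Write; the other three are unmapped (b_blocknr = -1). That is consistent with write IO having been set up for one 1k buffer of the page while PG_Writeback on the page is never cleared.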

If you need anything else please let me know.

And just a follow-up on the testing with different kernel configs: I can
still reproduce the hang easily with an OpenSuSE-based kernel, and I still
have trouble booting the kernel compiled with Ted's config file... Maybe
it's missing some modules, but there are so many of them that I tend to
give up :)

Thanks,
Eryu

2016-06-03 11:58:47

by Jan Kara

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Fri 03-06-16 18:16:12, Eryu Guan wrote:
> On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
> >
> > So I was trying but I could not reproduce the hang either. Can you find out
> > which page is jbd2 thread waiting for and dump page->index, page->flags and
> > also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> > page->private? Maybe that will shed some light...
>
> I'm using crash on live system when the hang happens, so I got the page
> address from "bt -f"
>
> #6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
> ffff880212343b48: ffffea0002c23600 000000000000000d
> ffff880212343b58: 0000000000000000 0000000000000000
> ffff880212343b68: ffff880213251480 ffffffff810cd000
> ffff880212343b78: ffff88021ff27218 ffff88021ff27218
> ffff880212343b88: 00000000c1b4a75a ffff880212343c68
> ffff880212343b98: ffffffff811901bf

Thanks for debugging! In the end I was able to reproduce the issue on my
UML instance as well and I'm debugging what's going on.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-08 12:56:35

by Jan Kara

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Fri 03-06-16 13:58:44, Jan Kara wrote:
> On Fri 03-06-16 18:16:12, Eryu Guan wrote:
> > On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
> > >
> > > So I was trying but I could not reproduce the hang either. Can you find out
> > > which page is jbd2 thread waiting for and dump page->index, page->flags and
> > > also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> > > page->private? Maybe that will shed some light...
> >
> > I'm using crash on live system when the hang happens, so I got the page
> > address from "bt -f"
> >
> > #6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
> > ffff880212343b48: ffffea0002c23600 000000000000000d
> > ffff880212343b58: 0000000000000000 0000000000000000
> > ffff880212343b68: ffff880213251480 ffffffff810cd000
> > ffff880212343b78: ffff88021ff27218 ffff88021ff27218
> > ffff880212343b88: 00000000c1b4a75a ffff880212343c68
> > ffff880212343b98: ffffffff811901bf
>
> Thanks for debugging! In the end I was able to reproduce the issue on my
> UML instance as well and I'm debugging what's going on.

The attached patch fixes the issue for me. I'll submit it once a full
xfstests run finishes for it (which may take a while, as our server room
is currently moving to a different place).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (1.32 kB)
0001-ext4-Fix-deadlock-during-page-writeback.patch (2.40 kB)

2016-06-08 14:30:06

by Holger Hoffstätte

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
(snip)
> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> run finishes for it (which may take a while as our server room is currently
> moving to a different place).
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> From: Jan Kara <[email protected]>
> Date: Wed, 8 Jun 2016 10:01:45 +0200
> Subject: [PATCH] ext4: Fix deadlock during page writeback
>
> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> deadlock in ext4_writepages() which was previously much harder to hit.
> After this commit xfstest generic/130 reproduces the deadlock on small
> filesystems.

Since you marked this for -stable, just a heads-up that the previous patch
for the data exposure was rejected from -stable (see [1]) because it has
the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist until 4.6.
I removed it locally, but Greg probably wants an official patch.

So both this patch and the previous one need to be submitted.

-h

[1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}


2016-06-09 07:23:33

by Nikola Pajkovsky

Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Holger Hoffstätte <[email protected]> writes:

> On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> (snip)
>> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> run finishes for it (which may take a while as our server room is currently
>> moving to a different place).
>>
>> Honza
>> --
>> Jan Kara <[email protected]>
>> SUSE Labs, CR
>> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> From: Jan Kara <[email protected]>
>> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> Subject: [PATCH] ext4: Fix deadlock during page writeback
>>
>> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> deadlock in ext4_writepages() which was previously much harder to hit.
>> After this commit xfstest generic/130 reproduces the deadlock on small
>> filesystems.
>
> Since you marked this for -stable, just a heads-up that the previous patch
> for the data exposure was rejected from -stable (see [1]) because it
> has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> until 4.6. I removed it locally but Greg probably wants an official patch.
>
> So both this and the previous patch need to be submitted.
>
> [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}

I'm just wondering whether Jan's patch is related to the blocked
processes in the following trace. It's very hard to hit and I don't have
a reproducer.
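For context, traces like the ones below are typically gathered with the
kernel's hung-task detector and magic SysRq. A minimal sketch (assuming root
and a kernel built with CONFIG_DETECT_HUNG_TASK and CONFIG_MAGIC_SYSRQ; the
guards make it a no-op elsewhere):

```shell
#!/bin/sh
# Report stalled D-state tasks sooner than the 120s default seen in the log:
sysctl -w kernel.hung_task_timeout_secs=30 2>/dev/null || true

# Dump all blocked tasks on demand; this produces the
# "SysRq : Show Blocked State" output quoted at the end of the trace:
{ echo w 2>/dev/null > /proc/sysrq-trigger; } 2>/dev/null || true
```

Without a reproducer, lowering the timeout and dumping blocked state when the
stall recurs is usually the only way to capture a fresh trace.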

kernel: INFO: task jbd2/vdc-8:4710 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: jbd2/vdc-8 D 0000000000000002 0 4710 2 0x00000080 severity=info
kernel: ffff88273a373b38 0000000000000046 ffff88273a373ae8 ffff88273a370010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273c76c7e0 ffff88273e322b20 severity=warning
kernel: ffff88273a373b38 ffff88277f092c40 ffff88273c76c7e0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffff8121021e>] __wait_on_buffer+0x2e/0x30 severity=warning
kernel: [<ffffffffa01180a1>] jbd2_journal_commit_transaction+0x861/0x16e0 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d847>] kjournald2+0x127/0x340 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d720>] ? jbd2_journal_clear_features+0x90/0x90 [jbd2] severity=warning
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: INFO: task postmaster:25406 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 000000000000000f 0 25406 25395 0x00000080 severity=info
kernel: ffff882485e5f9a8 0000000000000086 ffff882485e5f958 ffff882485e5c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882657c9d640 ffff88273e3b8000 severity=warning
kernel: ffff882485e5f9a8 ffff88277f3d2c40 ffff882657c9d640 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa0146c08>] ext4_reserve_inode_write+0x78/0xa0 [ext4] severity=warning
kernel: [<ffffffffa0146c82>] ext4_mark_inode_dirty+0x52/0x240 [ext4] severity=warning
kernel: [<ffffffffa0146eba>] ext4_dirty_inode+0x4a/0x70 [ext4] severity=warning
kernel: [<ffffffff8120644a>] __mark_inode_dirty+0x3a/0x290 severity=warning
kernel: [<ffffffff811f6f91>] update_time+0x81/0xc0 severity=warning
kernel: [<ffffffff811f7068>] file_update_time+0x98/0xe0 severity=warning
kernel: [<ffffffff811630b2>] __generic_file_write_iter+0x162/0x390 severity=warning
kernel: [<ffffffffa013deb9>] ext4_file_write_iter+0x119/0x430 [ext4] severity=warning
kernel: [<ffffffff811dbf22>] new_sync_write+0x92/0xd0 severity=warning
kernel: [<ffffffff811dc43e>] vfs_write+0xce/0x180 severity=warning
kernel: [<ffffffff811dca86>] SyS_write+0x56/0xd0 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task kworker/u36:2:20256 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: kworker/u36:2 D 0000000000000002 0 20256 2 0x00000080 severity=info
kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:32) severity=info
kernel: ffff881b4ef83448 0000000000000046 ffff881b4ef833f8 ffff881b4ef80010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273a238e60 ffff88273e322b20 severity=warning
kernel: ffff881b4ef83418 ffff88277f092c40 ffff88273a238e60 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa0175f37>] ? mb_mark_used+0x297/0x350 [ext4] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4] severity=warning
kernel: [<ffffffffa0177139>] ext4_mb_mark_diskspace_used+0x79/0x320 [ext4] severity=warning
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4] severity=warning
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4] severity=warning
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250 severity=warning
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4] severity=warning
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4] severity=warning
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4] severity=warning
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4] severity=warning
kernel: [<ffffffffa01746a4>] ? __ext4_journal_start_sb+0x74/0x100 [ext4] severity=warning
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4] severity=warning
kernel: [<ffffffff812c37e5>] ? blk_mq_flush_plug_list+0x135/0x150 severity=warning
kernel: [<ffffffff810a4073>] ? set_next_entity+0x93/0xa0 severity=warning
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40 severity=warning
kernel: [<ffffffff81205990>] __writeback_single_inode+0x40/0x220 severity=warning
kernel: [<ffffffff810ad72f>] ? wake_up_bit+0x2f/0x40 severity=warning
kernel: [<ffffffff81206c44>] writeback_sb_inodes+0x2b4/0x3c0 severity=warning
kernel: [<ffffffff811deef1>] ? put_super+0x31/0x40 severity=warning
kernel: [<ffffffff81206dee>] __writeback_inodes_wb+0x9e/0xd0 severity=warning
kernel: [<ffffffff81207063>] wb_writeback+0x243/0x2d0 severity=warning
kernel: [<ffffffff812071eb>] wb_do_writeback+0xfb/0x1e0 severity=warning
kernel: [<ffffffff81207340>] bdi_writeback_workfn+0x70/0x210 severity=warning
kernel: [<ffffffff810848c3>] process_one_work+0x143/0x4b0 severity=warning
kernel: [<ffffffff81085a33>] worker_thread+0x123/0x520 severity=warning
kernel: [<ffffffff81614095>] ? __schedule+0x375/0x750 severity=warning
kernel: [<ffffffff81085910>] ? maybe_create_worker+0x130/0x130 severity=warning
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: INFO: task postmaster:14855 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 0000000000000009 0 14855 25395 0x00000080 severity=info
kernel: ffff8822bf3877b8 0000000000000082 ffff8822bf387768 ffff8822bf384010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8825d71f5640 ffff88273e391cc0 severity=warning
kernel: ffff8822bf387788 ffff88277f252c40 ffff8825d71f5640 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff8120f780>] ? bh_lru_install+0x170/0x1a0 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4] severity=warning
kernel: [<ffffffffa01771ab>] ext4_mb_mark_diskspace_used+0xeb/0x320 [ext4] severity=warning
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4] severity=warning
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4] severity=warning
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250 severity=warning
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4] severity=warning
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4] severity=warning
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4] severity=warning
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4] severity=warning
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4] severity=warning
kernel: [<ffffffff81193472>] ? handle_mm_fault+0xb2/0x1a0 severity=warning
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40 severity=warning
kernel: [<ffffffff81160a69>] __filemap_fdatawrite_range+0x59/0x60 severity=warning
kernel: [<ffffffff81160b1a>] filemap_write_and_wait_range+0xaa/0x100 severity=warning
kernel: [<ffffffffa013eac3>] ext4_sync_file+0xf3/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14856 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 0000000000000006 0 14856 25395 0x00000080 severity=info
kernel: ffff88262affb8a8 0000000000000086 ffff88262affb858 ffff88262aff8010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266aab1cc0 ffff88273e3264a0 severity=warning
kernel: ffff88225c97d6a0 ffff88277f192c40 ffff88266aab1cc0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa011576e>] ? start_this_handle+0x28e/0x520 [jbd2] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa01406d4>] __ext4_new_inode+0x3a4/0x1110 [ext4] severity=warning
kernel: [<ffffffff8123c960>] ? __dquot_initialize+0x30/0x1e0 severity=warning
kernel: [<ffffffffa0150bbb>] ext4_create+0xab/0x170 [ext4] severity=warning
kernel: [<ffffffff811e9118>] vfs_create+0xd8/0x100 severity=warning
kernel: [<ffffffff811e982e>] lookup_open+0x13e/0x1d0 severity=warning
kernel: [<ffffffff811ed006>] do_last+0x416/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff811e77a9>] ? putname+0x29/0x40 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14857 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 0000000000000010 0 14857 25395 0x00000080 severity=info
kernel: ffff880748203df8 0000000000000082 ffff8826a94e67c8 ffff880748200010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266aab5640 ffff88273e3b8e60 severity=warning
kernel: ffff880748203e08 00000000023bc691 ffff882737c2e800 ffff880748203e38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14859 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 0000000000000001 0 14859 25395 0x00000080 severity=info
kernel: ffff8820d2017c38 0000000000000082 ffff882700000000 ffff8820d2014010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882657c99cc0 ffff88273e321cc0 severity=warning
kernel: ffff882657c99cc0 ffff8826feadc068 ffff882657c99cc0 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff816165b6>] ? mutex_lock+0x16/0x40 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14860 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 000000000000000e 0 14860 25395 0x00000080 severity=info
kernel: ffff8820d22fbdf8 0000000000000086 ffff8826947ba4d8 ffff8820d22f8010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266a93e4a0 ffff88273e3964a0 severity=warning
kernel: ffff8820d22fbe08 00000000023bc691 ffff882737c2e800 ffff8820d22fbe38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14861 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 0000000000000007 0 14861 25395 0x00000080 severity=info
kernel: ffff88041dfdfc38 0000000000000082 0000000000000000 ffff88041dfdc010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266a939cc0 ffff88273e390000 severity=warning
kernel: ffff88281ffd8d80 ffff8826feadc068 ffff88266a939cc0 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14868 blocked for more than 120 seconds. severity=err
kernel: Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster D 000000000000000a 0 14868 25395 0x00000080 severity=info
kernel: ffff8821bd68fdf8 0000000000000086 ffff88269980e7c8 ffff8821bd68c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273dd68e60 ffff88273e392b20 severity=warning
kernel: ffff8821bd68fe08 00000000023bc691 ffff882737c2e800 ffff8821bd68fe38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: SysRq : Show Blocked State severity=info
kernel: task PC stack pid father severity=info
kernel: jbd2/vdc-8 D 0000000000000002 0 4710 2 0x00000080 severity=info
kernel: ffff88273a373b38 0000000000000046 ffff88273a373ae8 ffff88273a370010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273c76c7e0 ffff88273e322b20 severity=warning
kernel: ffff88273a373b38 ffff88277f092c40 ffff88273c76c7e0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffff8121021e>] __wait_on_buffer+0x2e/0x30 severity=warning
kernel: [<ffffffffa01180a1>] jbd2_journal_commit_transaction+0x861/0x16e0 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d847>] kjournald2+0x127/0x340 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d720>] ? jbd2_journal_clear_features+0x90/0x90 [jbd2] severity=warning
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: postmaster D 000000000000000f 0 25406 25395 0x00000080 severity=info
kernel: ffff882485e5f9a8 0000000000000086 ffff882485e5f958 ffff882485e5c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882657c9d640 ffff88273e3b8000 severity=warning
kernel: ffff882485e5f9a8 ffff88277f3d2c40 ffff882657c9d640 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa0146c08>] ext4_reserve_inode_write+0x78/0xa0 [ext4] severity=warning
kernel: [<ffffffffa0146c82>] ext4_mark_inode_dirty+0x52/0x240 [ext4] severity=warning
kernel: [<ffffffffa0146eba>] ext4_dirty_inode+0x4a/0x70 [ext4] severity=warning
kernel: [<ffffffff8120644a>] __mark_inode_dirty+0x3a/0x290 severity=warning
kernel: [<ffffffff811f6f91>] update_time+0x81/0xc0 severity=warning
kernel: [<ffffffff811f7068>] file_update_time+0x98/0xe0 severity=warning
kernel: [<ffffffff811630b2>] __generic_file_write_iter+0x162/0x390 severity=warning
kernel: [<ffffffffa013deb9>] ext4_file_write_iter+0x119/0x430 [ext4] severity=warning
kernel: [<ffffffff811dbf22>] new_sync_write+0x92/0xd0 severity=warning
kernel: [<ffffffff811dc43e>] vfs_write+0xce/0x180 severity=warning
kernel: [<ffffffff811dca86>] SyS_write+0x56/0xd0 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster D 000000000000000b 0 26139 25395 0x00000080 severity=info
kernel: ffff882451267e28 0000000000000082 ffff882737404020 ffff882451264010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882641ea8e60 ffff88273e393980 severity=warning
kernel: 0000000000000003 ffff8826feadc068 ffff882641ea8e60 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616694>] __mutex_lock_killable_slowpath+0xb4/0x140 severity=warning
kernel: [<ffffffff81616762>] mutex_lock_killable+0x42/0x50 severity=warning
kernel: [<ffffffff811efc77>] iterate_dir+0x77/0x140 severity=warning
kernel: [<ffffffff8110e0cc>] ? __audit_syscall_entry+0xac/0x110 severity=warning
kernel: [<ffffffff811efef1>] SyS_getdents+0xb1/0x100 severity=warning
kernel: [<ffffffff811effd0>] ? SyS_old_readdir+0x90/0x90 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: kworker/u36:2 D 0000000000000002 0 20256 2 0x00000080 severity=info
kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:32) severity=info
kernel: ffff881b4ef83448 0000000000000046 ffff881b4ef833f8 ffff881b4ef80010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273a238e60 ffff88273e322b20 severity=warning
kernel: ffff881b4ef83418 ffff88277f092c40 ffff88273a238e60 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa0175f37>] ? mb_mark_used+0x297/0x350 [ext4] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4]
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4]
kernel: [<ffffffffa0177139>] ext4_mb_mark_diskspace_used+0x79/0x320 [ext4]
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4]
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4]
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4]
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4]
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4]
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4]
kernel: [<ffffffffa01746a4>] ? __ext4_journal_start_sb+0x74/0x100 [ext4]
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4]
kernel: [<ffffffff812c37e5>] ? blk_mq_flush_plug_list+0x135/0x150
kernel: [<ffffffff810a4073>] ? set_next_entity+0x93/0xa0
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40
kernel: [<ffffffff81205990>] __writeback_single_inode+0x40/0x220
kernel: [<ffffffff810ad72f>] ? wake_up_bit+0x2f/0x40
kernel: [<ffffffff81206c44>] writeback_sb_inodes+0x2b4/0x3c0
kernel: [<ffffffff811deef1>] ? put_super+0x31/0x40
kernel: [<ffffffff81206dee>] __writeback_inodes_wb+0x9e/0xd0
kernel: [<ffffffff81207063>] wb_writeback+0x243/0x2d0
kernel: [<ffffffff812071eb>] wb_do_writeback+0xfb/0x1e0
kernel: [<ffffffff81207340>] bdi_writeback_workfn+0x70/0x210
kernel: [<ffffffff810848c3>] process_one_work+0x143/0x4b0
kernel: [<ffffffff81085a33>] worker_thread+0x123/0x520
kernel: [<ffffffff81614095>] ? __schedule+0x375/0x750
kernel: [<ffffffff81085910>] ? maybe_create_worker+0x130/0x130
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70
kernel: postmaster D 0000000000000009 0 14855 25395 0x00000080
kernel: ffff8822bf3877b8 0000000000000082 ffff8822bf387768 ffff8822bf384010
kernel: 0000000000012c40 0000000000012c40 ffff8825d71f5640 ffff88273e391cc0
kernel: ffff8822bf387788 ffff88277f252c40 ffff8825d71f5640 ffffffff81614e10
kernel: Call Trace:
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90
kernel: [<ffffffff8120f780>] ? bh_lru_install+0x170/0x1a0
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2]
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4]
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4]
kernel: [<ffffffffa01771ab>] ext4_mb_mark_diskspace_used+0xeb/0x320 [ext4]
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4]
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4]
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4]
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4]
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4]
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4]
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4]
kernel: [<ffffffff81193472>] ? handle_mm_fault+0xb2/0x1a0
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40
kernel: [<ffffffff81160a69>] __filemap_fdatawrite_range+0x59/0x60
kernel: [<ffffffff81160b1a>] filemap_write_and_wait_range+0xaa/0x100
kernel: [<ffffffffa013eac3>] ext4_sync_file+0xf3/0x260 [ext4]
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000006 0 14856 25395 0x00000080
kernel: ffff88262affb8a8 0000000000000086 ffff88262affb858 ffff88262aff8010
kernel: 0000000000012c40 0000000000012c40 ffff88266aab1cc0 ffff88273e3264a0
kernel: ffff88225c97d6a0 ffff88277f192c40 ffff88266aab1cc0 ffffffff81614e10
kernel: Call Trace:
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90
kernel: [<ffffffffa011576e>] ? start_this_handle+0x28e/0x520 [jbd2]
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2]
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4]
kernel: [<ffffffffa01406d4>] __ext4_new_inode+0x3a4/0x1110 [ext4]
kernel: [<ffffffff8123c960>] ? __dquot_initialize+0x30/0x1e0
kernel: [<ffffffffa0150bbb>] ext4_create+0xab/0x170 [ext4]
kernel: [<ffffffff811e9118>] vfs_create+0xd8/0x100
kernel: [<ffffffff811e982e>] lookup_open+0x13e/0x1d0
kernel: [<ffffffff811ed006>] do_last+0x416/0x820
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480
kernel: [<ffffffff811e77a9>] ? putname+0x29/0x40
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000010 0 14857 25395 0x00000080
kernel: ffff880748203df8 0000000000000082 ffff8826a94e67c8 ffff880748200010
kernel: 0000000000012c40 0000000000012c40 ffff88266aab5640 ffff88273e3b8e60
kernel: ffff880748203e08 00000000023bc691 ffff882737c2e800 ffff880748203e38
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2]
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2]
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4]
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000001 0 14859 25395 0x00000080
kernel: ffff8820d2017c38 0000000000000082 ffff882700000000 ffff8820d2014010
kernel: 0000000000012c40 0000000000012c40 ffff882657c99cc0 ffff88273e321cc0
kernel: ffff882657c99cc0 ffff8826feadc068 ffff882657c99cc0 ffff8826feadc06c
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480
kernel: [<ffffffff816165b6>] ? mutex_lock+0x16/0x40
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 000000000000000e 0 14860 25395 0x00000080
kernel: ffff8820d22fbdf8 0000000000000086 ffff8826947ba4d8 ffff8820d22f8010
kernel: 0000000000012c40 0000000000012c40 ffff88266a93e4a0 ffff88273e3964a0
kernel: ffff8820d22fbe08 00000000023bc691 ffff882737c2e800 ffff8820d22fbe38
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2]
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2]
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4]
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000007 0 14861 25395 0x00000080
kernel: ffff88041dfdfc38 0000000000000082 0000000000000000 ffff88041dfdc010
kernel: 0000000000012c40 0000000000012c40 ffff88266a939cc0 ffff88273e390000
kernel: ffff88281ffd8d80 ffff8826feadc068 ffff88266a939cc0 ffff8826feadc06c
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 000000000000000a 0 14868 25395 0x00000080
kernel: ffff8821bd68fdf8 0000000000000086 ffff88269980e7c8 ffff8821bd68c010
kernel: 0000000000012c40 0000000000012c40 ffff88273dd68e60 ffff88273e392b20
kernel: ffff8821bd68fe08 00000000023bc691 ffff882737c2e800 ffff8821bd68fe38
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2]
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2]
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4]
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000001 0 14869 25395 0x00000080
kernel: ffff880ed01d3df8 0000000000000082 ffff8826995ed418 ffff880ed01d0010
kernel: 0000000000012c40 0000000000012c40 ffff88273dd6d640 ffff88273e321cc0
kernel: ffff880ed01d3e08 00000000023bc691 ffff882737c2e800 ffff880ed01d3e38
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2]
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2]
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4]
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 000000000000000b 0 14905 25395 0x00000080
kernel: ffff8824d12979a8 0000000000000086 ffff8824d1297958 ffff8824d1294010
kernel: 0000000000012c40 0000000000012c40 ffff8826317f3980 ffff88273e393980
kernel: 000000000000008d ffff88277f2d2c40 ffff8826317f3980 ffffffff81614e10
kernel: Call Trace:
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2]
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4]
kernel: [<ffffffffa0146c08>] ext4_reserve_inode_write+0x78/0xa0 [ext4]
kernel: [<ffffffffa0146c82>] ext4_mark_inode_dirty+0x52/0x240 [ext4]
kernel: [<ffffffffa0146eba>] ext4_dirty_inode+0x4a/0x70 [ext4]
kernel: [<ffffffff8120644a>] __mark_inode_dirty+0x3a/0x290
kernel: [<ffffffff811f6f91>] update_time+0x81/0xc0
kernel: [<ffffffff811fcc63>] ? mntput+0x23/0x40
kernel: [<ffffffff811f7068>] file_update_time+0x98/0xe0
kernel: [<ffffffff811e6925>] ? terminate_walk+0x35/0x40
kernel: [<ffffffff811630b2>] __generic_file_write_iter+0x162/0x390
kernel: [<ffffffffa013deb9>] ext4_file_write_iter+0x119/0x430 [ext4]
kernel: [<ffffffff81170c3b>] ? invalidate_mapping_pages+0x8b/0x210
kernel: [<ffffffff811dbf22>] new_sync_write+0x92/0xd0
kernel: [<ffffffff811dc43e>] vfs_write+0xce/0x180
kernel: [<ffffffff811dca86>] SyS_write+0x56/0xd0
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000000 0 14910 25395 0x00000080
kernel: ffff882641f7bdf8 0000000000000086 ffff8802a90d7398 ffff882641f78010
kernel: 0000000000012c40 0000000000012c40 ffff8800db8c0000 ffffffff81c1b4c0
kernel: ffff882641f7be08 00000000023bc691 ffff882737c2e800 ffff882641f7be38
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2]
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2]
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4]
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: smokerd plugin D 0000000000000009 0 15001 29979 0x00000084
kernel: ffff88264c437b48 0000000000000082 ffff88264c437af8 ffff88264c434010
kernel: 0000000000012c40 0000000000012c40 ffff8825d71f1cc0 ffff88273e391cc0
kernel: ffff8803362adc98 ffff88277f252c40 ffff8825d71f1cc0 ffffffff81614e10
kernel: Call Trace:
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90
kernel: [<ffffffffa0116cbf>] ? jbd2_journal_dirty_metadata+0xff/0x250 [jbd2]
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2]
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4]
kernel: [<ffffffffa014cb7f>] ext4_orphan_add+0xbf/0x2e0 [ext4]
kernel: [<ffffffffa0146cb8>] ? ext4_mark_inode_dirty+0x88/0x240 [ext4]
kernel: [<ffffffffa0151d3a>] ext4_unlink+0x2aa/0x2d0 [ext4]
kernel: [<ffffffff811e8bc3>] vfs_unlink+0xd3/0x140
kernel: [<ffffffff811ebd06>] do_unlinkat+0x2d6/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000007 0 15029 25395 0x00000080
kernel: ffff88206081f8a8 0000000000000086 ffff88206081f858 ffff88206081c010
kernel: 0000000000012c40 0000000000012c40 ffff8825ab41ab20 ffff88273e390000
kernel: ffff88277ec03700 ffff88277f1d2c40 ffff8825ab41ab20 ffffffff81614e10
kernel: Call Trace:
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90
kernel: [<ffffffffa011576e>] ? start_this_handle+0x28e/0x520 [jbd2]
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2]
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4]
kernel: [<ffffffffa0140ada>] __ext4_new_inode+0x7aa/0x1110 [ext4]
kernel: [<ffffffff8123c960>] ? __dquot_initialize+0x30/0x1e0
kernel: [<ffffffff8115fece>] ? find_get_entry+0x1e/0xa0
kernel: [<ffffffffa0150bbb>] ext4_create+0xab/0x170 [ext4]
kernel: [<ffffffff811e9118>] vfs_create+0xd8/0x100
kernel: [<ffffffff811e982e>] lookup_open+0x13e/0x1d0
kernel: [<ffffffff811ed006>] do_last+0x416/0x820
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0
kernel: [<ffffffff811e77a9>] ? putname+0x29/0x40
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000000 0 15175 25395 0x00000080
kernel: ffff88057229bb48 0000000000000086 ffff88057229baf8 ffff880572298010
kernel: 0000000000012c40 0000000000012c40 ffff88245105c7e0 ffffffff81c1b4c0
kernel: ffff88057229bb38 ffff88277f012c40 ffff88245105c7e0 ffffffff81614e10
kernel: Call Trace:
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90
kernel: [<ffffffffa0116cbf>] ? jbd2_journal_dirty_metadata+0xff/0x250 [jbd2]
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2]
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4]
kernel: [<ffffffffa014cb7f>] ext4_orphan_add+0xbf/0x2e0 [ext4]
kernel: [<ffffffffa0146cb8>] ? ext4_mark_inode_dirty+0x88/0x240 [ext4]
kernel: [<ffffffffa0151d3a>] ext4_unlink+0x2aa/0x2d0 [ext4]
kernel: [<ffffffff811e8bc3>] vfs_unlink+0xd3/0x140
kernel: [<ffffffff811ebd06>] do_unlinkat+0x2d6/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000003 0 15823 25395 0x00000084
kernel: ffff880424173db8 0000000000000086 ffff880424173de8 ffff880424170010
kernel: 0000000000012c40 0000000000012c40 ffff882254f49cc0 ffff88273e323980
kernel: ffff880424173dd8 ffff8826e845acb8 ffff882254f49cc0 ffff8826e845acbc
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000007 0 15855 25395 0x00000080
kernel: ffff88266a8c3db8 0000000000000086 ffff88266a8c3de8 ffff88266a8c0010
kernel: 0000000000012c40 0000000000012c40 ffff8820f680c7e0 ffff88273e390000
kernel: ffff88266a8c3dd8 ffff8826e845acb8 ffff8820f680c7e0 ffff8826e845acbc
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000006 0 16102 25395 0x00000084
kernel: ffff8820d23e7db8 0000000000000086 ffff8820d23e7de8 ffff8820d23e4010
kernel: 0000000000012c40 0000000000012c40 ffff881b4d6147e0 ffff88273e3264a0
kernel: ffff8820d23e7dd8 ffff8826e845acb8 ffff881b4d6147e0 ffff8826e845acbc
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 000000000000000c 0 16227 25395 0x00000084
kernel: ffff881b4d8efdb8 0000000000000082 ffff881b4d8efde8 ffff881b4d8ec010
kernel: 0000000000012c40 0000000000012c40 ffff88220a8fe4a0 ffff88273e3947e0
kernel: ffff881b4d8efdd8 ffff8826e845acb8 ffff88220a8fe4a0 ffff8826e845acbc
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 000000000000000d 0 16658 25395 0x00000084
kernel: ffff88251e143db8 0000000000000082 ffff88251e143de8 ffff88251e140010
kernel: 0000000000012c40 0000000000012c40 ffff88265b268000 ffff88273e395640
kernel: ffff88251e143dd8 ffff8826e845acb8 ffff88265b268000 ffff8826e845acbc
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 0000000000000003 0 24648 25395 0x00000084
kernel: ffff88266abb3db8 0000000000000086 ffff88266abb3de8 ffff88266abb0010
kernel: 0000000000012c40 0000000000012c40 ffff882254f4ab20 ffff88273e323980
kernel: ffff88266abb3dd8 ffff8826e845acb8 ffff882254f4ab20 ffff8826e845acbc
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: bash D 0000000000000004 0 25435 25434 0x00000084
kernel: ffff88265b2c3e28 0000000000000086 ffff882737404020 ffff88265b2c0010
kernel: 0000000000012c40 0000000000012c40 ffff88217e4764a0 ffff88273e3247e0
kernel: 0000000000000003 ffff88270bfe3c78 ffff88217e4764a0 ffff88270bfe3c7c
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616694>] __mutex_lock_killable_slowpath+0xb4/0x140
kernel: [<ffffffff81616762>] mutex_lock_killable+0x42/0x50
kernel: [<ffffffff811efc77>] iterate_dir+0x77/0x140
kernel: [<ffffffff8110e0cc>] ? __audit_syscall_entry+0xac/0x110
kernel: [<ffffffff811efef1>] SyS_getdents+0xb1/0x100
kernel: [<ffffffff811effd0>] ? SyS_old_readdir+0x90/0x90
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
kernel: postmaster D 000000000000000b 0 27811 25395 0x00000084
kernel: ffff8822bf0e7db8 0000000000000086 ffff8822bf0e7de8 ffff8822bf0e4010
kernel: 0000000000012c40 0000000000012c40 ffff882739808000 ffff88273e393980
kernel: ffff8822bf0e7dd8 ffff8826e845acb8 ffff882739808000 ffff8826e845acbc
kernel: Call Trace:
kernel: [<ffffffff816145c9>] schedule+0x29/0x70
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17
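[Editorial note: when wading through a long blocked-task dump like the one above, it can help to count how many D-state tasks are stuck in each interesting frame. The following small Python helper is a sketch, not part of the original thread; the regexes encode assumptions about this particular dmesg format (a `comm D ...` header line followed by `[<addr>] func+0x...` frames), and the `skip` set of generic sleep/lock wrappers is an illustrative choice.]

```python
import re
from collections import Counter

# "kernel: <comm> D <addr> ..." task header (comm may contain spaces)
HEADER = re.compile(r"kernel:\s+(.+?)\s+D\s")
# "kernel: [<addr>] func+0x..." stack frame; group 1 captures an optional "?"
FRAME = re.compile(r"kernel:\s+\[<[0-9a-f]+>\]\s+(\??\s*)(\S+?)\+0x")

# Generic scheduler/lock wrappers that appear at the top of every trace;
# skip these to find the frame that actually identifies what the task does.
SKIP = {
    "schedule", "io_schedule", "schedule_preempt_disabled",
    "bit_wait_io", "__wait_on_bit", "out_of_line_wait_on_bit",
    "__mutex_lock_slowpath", "mutex_lock",
    "__mutex_lock_killable_slowpath", "mutex_lock_killable",
}

def summarize(dump: str) -> Counter:
    """Count blocked tasks by the first certain (non-'?') interesting frame."""
    counts = Counter()
    current = None  # comm of the task whose trace we are inside
    found = None    # first interesting frame seen for that task
    for line in dump.splitlines():
        m = HEADER.search(line)
        if m:
            if current is not None and found is not None:
                counts[found] += 1
            current, found = m.group(1), None
            continue
        f = FRAME.search(line)
        if f and current is not None and found is None:
            qmark, func = f.group(1).strip(), f.group(2)
            # "?" frames are stale/uncertain entries; ignore them
            if not qmark and func not in SKIP:
                found = func
    if current is not None and found is not None:
        counts[found] += 1
    return counts
```

Fed the dump above, this would report, for example, how many tasks sit in `jbd2_log_wait_commit`, `do_get_write_access`, or `do_unlinkat`, which makes the fsync-vs-journal pattern easier to see at a glance.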

--
Nikola

2016-06-09 14:59:25

by Jan Kara

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Please use reply-to-all when replying to emails. I usually read mailing
lists much less frequently than directly sent email...

On Wed 08-06-16 14:23:49, Holger Hoffstätte wrote:
> On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> (snip)
> > Attached patch fixes the issue for me. I'll submit it once a full xfstests
> > run finishes for it (which may take a while as our server room is currently
> > moving to a different place).
> >
> > Honza
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
> > From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <[email protected]>
> > Date: Wed, 8 Jun 2016 10:01:45 +0200
> > Subject: [PATCH] ext4: Fix deadlock during page writeback
> >
> > Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> > deadlock in ext4_writepages() which was previously much harder to hit.
> > After this commit xfstest generic/130 reproduces the deadlock on small
> > filesystems.
>
> Since you marked this for -stable, just a heads-up that the previous patch
> for the data exposure was rejected from -stable (see [1]) because it
> has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> until 4.6. I removed it locally but Greg probably wants an official patch.
>
> So both this and the previous patch need to be submitted.

This patch is actually an independent fix from the previous one; the
previous patch just made the bug more visible. Regarding the backport - yeah, I
have it on my todo list.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-09 15:04:08

by Jan Kara

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> Holger Hoffstätte <[email protected]> writes:
>
> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> > (snip)
> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> run finishes for it (which may take a while as our server room is currently
> >> moving to a different place).
> >>
> >> Honza
> >> --
> >> Jan Kara <[email protected]>
> >> SUSE Labs, CR
> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> From: Jan Kara <[email protected]>
> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >>
> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> filesystems.
> >
> > Since you marked this for -stable, just a heads-up that the previous patch
> > for the data exposure was rejected from -stable (see [1]) because it
> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >
> > So both this and the previous patch need to be submitted.
> >
> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>
> I'm just wondering if the Jan's patch is not related to blocked
> processes in following trace. It very hard to hit it and I don't have
> any reproducer.

This looks like a different issue. Does the machine recover itself or is it
a hard hang and you have to press a reset button?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-10 05:52:59

by Nikola Pajkovsky

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Jan Kara <[email protected]> writes:

> On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> Holger Hoffstätte <[email protected]> writes:
>>
>> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> > (snip)
>> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> run finishes for it (which may take a while as our server room is currently
>> >> moving to a different place).
>> >>
>> >> Honza
>> >> --
>> >> Jan Kara <[email protected]>
>> >> SUSE Labs, CR
>> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> From: Jan Kara <[email protected]>
>> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >>
>> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> filesystems.
>> >
>> > Since you marked this for -stable, just a heads-up that the previous patch
>> > for the data exposure was rejected from -stable (see [1]) because it
>> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >
>> > So both this and the previous patch need to be submitted.
>> >
>> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>>
>> I'm just wondering if the Jan's patch is not related to blocked
>> processes in following trace. It very hard to hit it and I don't have
>> any reproducer.
>
> This looks like a different issue. Does the machine recover itself or is it
> a hard hang and you have to press a reset button?

The machine is a bit bigger than I have pretended. It's 18 vcpus with 160 GB
ram, and the machine has a dedicated mount point only for PostgreSQL data.

Nevertheless, I was always able to ssh to the machine, so the machine itself
was not in a hard hang, and ext4 mostly recovered by itself (it took
~30min). But I have seen a situation where every process that touched the ext4
filesystem went immediately to D state and did not recover even after an hour.

--
Nikola

2016-06-10 08:37:44

by Eryu Guan

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Wed, Jun 08, 2016 at 02:56:31PM +0200, Jan Kara wrote:
> On Fri 03-06-16 13:58:44, Jan Kara wrote:
> > On Fri 03-06-16 18:16:12, Eryu Guan wrote:
> > > On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
> > > >
> > > > So I was trying but I could not reproduce the hang either. Can you find out
> > > > which page is jbd2 thread waiting for and dump page->index, page->flags and
> > > > also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> > > > page->private? Maybe that will shed some light...
> > >
> > > I'm using crash on live system when the hang happens, so I got the page
> > > address from "bt -f"
> > >
> > > #6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
> > > ffff880212343b48: ffffea0002c23600 000000000000000d
> > > ffff880212343b58: 0000000000000000 0000000000000000
> > > ffff880212343b68: ffff880213251480 ffffffff810cd000
> > > ffff880212343b78: ffff88021ff27218 ffff88021ff27218
> > > ffff880212343b88: 00000000c1b4a75a ffff880212343c68
> > > ffff880212343b98: ffffffff811901bf
> >
> > Thanks for debugging! In the end I was able to reproduce the issue on my
> > UML instance as well and I'm debugging what's going on.
>
> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> run finishes for it (which may take a while as our server room is currently
> moving to a different place).

(Sorry for the late reply, I was on holiday yesterday)

Thanks for the fix! I'll give it a test as well.

Thanks,
Eryu

2016-06-12 03:28:30

by Eryu Guan

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Fri, Jun 10, 2016 at 04:37:36PM +0800, Eryu Guan wrote:
> On Wed, Jun 08, 2016 at 02:56:31PM +0200, Jan Kara wrote:
> > On Fri 03-06-16 13:58:44, Jan Kara wrote:
> > > On Fri 03-06-16 18:16:12, Eryu Guan wrote:
> > > > On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
> > > > >
> > > > > So I was trying but I could not reproduce the hang either. Can you find out
> > > > > which page is jbd2 thread waiting for and dump page->index, page->flags and
> > > > > also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> > > > > page->private? Maybe that will shed some light...
> > > >
> > > > I'm using crash on live system when the hang happens, so I got the page
> > > > address from "bt -f"
> > > >
> > > > #6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
> > > > ffff880212343b48: ffffea0002c23600 000000000000000d
> > > > ffff880212343b58: 0000000000000000 0000000000000000
> > > > ffff880212343b68: ffff880213251480 ffffffff810cd000
> > > > ffff880212343b78: ffff88021ff27218 ffff88021ff27218
> > > > ffff880212343b88: 00000000c1b4a75a ffff880212343c68
> > > > ffff880212343b98: ffffffff811901bf
> > >
> > > Thanks for debugging! In the end I was able to reproduce the issue on my
> > > UML instance as well and I'm debugging what's going on.
> >
> > Attached patch fixes the issue for me. I'll submit it once a full xfstests
> > run finishes for it (which may take a while as our server room is currently
> > moving to a different place).
>
> (Sorry for the late reply, I was on holiday yesterday)
>
> Thanks for the fix! I'll give it a test as well.

I tested this patch with xfstests on x86_64 and ppc64 hosts; all results
look fine, no regressions found. The test configurations were: 4k/2k/1k block
size ext4/3/2, and data=journal|writeback ext4.

Thanks,
Eryu

2016-06-16 13:26:23

by Jan Kara

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
> Jan Kara <[email protected]> writes:
> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> >> Holger Hoffstätte <[email protected]> writes:
> >>
> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> >> > (snip)
> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> >> run finishes for it (which may take a while as our server room is currently
> >> >> moving to a different place).
> >> >>
> >> >> Honza
> >> >> --
> >> >> Jan Kara <[email protected]>
> >> >> SUSE Labs, CR
> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> >> From: Jan Kara <[email protected]>
> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >> >>
> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> >> filesystems.
> >> >
> >> > Since you marked this for -stable, just a heads-up that the previous patch
> >> > for the data exposure was rejected from -stable (see [1]) because it
> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >> >
> >> > So both this and the previous patch need to be submitted.
> >> >
> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
> >>
> >> I'm just wondering if the Jan's patch is not related to blocked
> >> processes in following trace. It very hard to hit it and I don't have
> >> any reproducer.
> >
> > This looks like a different issue. Does the machine recover itself or is it
> > a hard hang and you have to press a reset button?
>
> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
> ram and machine has dedicated mount point only for PostgreSQL data.
>
> Nevertheless, I was able always to ssh to the machine, so machine itself
> was not in hard hang and ext4 mostly gets recover by itself (it took
> 30min). But I have seen situation, were every process who 'touch' the ext4
> goes immediately to D state and does not recover even after hour.

If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
dump stuck processes and also run 'iostat -x 1' for a while to see how much
IO is happening in the system? That should tell us more.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-16 14:43:02

by Nikola Pajkovsky

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Jan Kara <[email protected]> writes:

> On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
>> Jan Kara <[email protected]> writes:
>> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> >> Holger Hoffstätte <[email protected]> writes:
>> >>
>> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> >> > (snip)
>> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> >> run finishes for it (which may take a while as our server room is currently
>> >> >> moving to a different place).
>> >> >>
>> >> >> Honza
>> >> >> --
>> >> >> Jan Kara <[email protected]>
>> >> >> SUSE Labs, CR
>> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> >> From: Jan Kara <[email protected]>
>> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> >>
>> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> >> filesystems.
>> >> >
>> >> > Since you marked this for -stable, just a heads-up that the previous patch
>> >> > for the data exposure was rejected from -stable (see [1]) because it
>> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >> >
>> >> > So both this and the previous patch need to be submitted.
>> >> >
>> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> >>
>> >> I'm just wondering if the Jan's patch is not related to blocked
>> >> processes in following trace. It very hard to hit it and I don't have
>> >> any reproducer.
>> >
>> > This looks like a different issue. Does the machine recover itself or is it
>> > a hard hang and you have to press a reset button?
>>
>> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
>> ram and machine has dedicated mount point only for PostgreSQL data.
>>
>> Nevertheless, I was able always to ssh to the machine, so machine itself
>> was not in hard hang and ext4 mostly gets recover by itself (it took
>> 30min). But I have seen situation, were every process who 'touch' the ext4
>> goes immediately to D state and does not recover even after hour.
>
> If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
> dump stuck processes and also run 'iostat -x 1' for a while to see how much
> IO is happening in the system? That should tell us more.


A link to the 'echo w >/proc/sysrq-trigger' output is here, because it's a bit
too big to mail:

http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html

I was running iotop and there was roughly ~20 KB/s of write traffic.

What was a bit more interesting was looking at

cat /proc/vmstat | egrep "nr_dirty|nr_writeback"

nr_dirty had around 240 and was slowly counting up, but nr_writeback had
~8800 and was stuck for 120s.
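
For reference, a minimal self-contained sketch of that sampling (the helper
name and the canned sample file are mine; on a live system you would point it
at /proc/vmstat once per second):

```shell
#!/bin/sh
# Pull nr_dirty/nr_writeback out of a vmstat-style file. The anchored
# pattern with a trailing space avoids matching nr_dirty_threshold etc.
# The canned sample below mirrors the numbers observed above, so the
# sketch runs anywhere; nr_dirty climbing while nr_writeback stays flat
# suggests IO completions are stuck.
sample_vmstat() {
    grep -E '^nr_(dirty|writeback) ' "$1"
}

cat > /tmp/vmstat.sample <<'EOF'
nr_dirty 240
nr_writeback 8800
EOF

sample_vmstat /tmp/vmstat.sample
```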

--
Nikola

2016-06-20 11:39:53

by Jan Kara

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
> Jan Kara <[email protected]> writes:
>
> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
> >> Jan Kara <[email protected]> writes:
> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> >> >> Holger Hoffstätte <[email protected]> writes:
> >> >>
> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> >> >> > (snip)
> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> >> >> run finishes for it (which may take a while as our server room is currently
> >> >> >> moving to a different place).
> >> >> >>
> >> >> >> Honza
> >> >> >> --
> >> >> >> Jan Kara <[email protected]>
> >> >> >> SUSE Labs, CR
> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> >> >> From: Jan Kara <[email protected]>
> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >> >> >>
> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> >> >> filesystems.
> >> >> >
> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
> >> >> > for the data exposure was rejected from -stable (see [1]) because it
> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >> >> >
> >> >> > So both this and the previous patch need to be submitted.
> >> >> >
> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
> >> >>
> >> >> I'm just wondering if the Jan's patch is not related to blocked
> >> >> processes in following trace. It very hard to hit it and I don't have
> >> >> any reproducer.
> >> >
> >> > This looks like a different issue. Does the machine recover itself or is it
> >> > a hard hang and you have to press a reset button?
> >>
> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
> >> ram and machine has dedicated mount point only for PostgreSQL data.
> >>
> >> Nevertheless, I was able always to ssh to the machine, so machine itself
> >> was not in hard hang and ext4 mostly gets recover by itself (it took
> >> 30min). But I have seen situation, were every process who 'touch' the ext4
> >> goes immediately to D state and does not recover even after hour.
> >
> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
> > IO is happening in the system? That should tell us more.
>
>
> Link to 'echo w >/proc/sysrq-trigger' is here, because it's bit bigger
> to mail it.
>
> http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html

Can you upload it again please? I only got to looking at the file today
and it has already been deleted. Thanks!

> I was running iotop and there was traffic roughly ~20 KB/s write.
>
> What was bit more interesting, was looking at
>
> cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
>
> nr_drity had around 240 and was slowly counting up, but nr_writeback had
> ~8800 and was stuck for 120s.

Hum, interesting. This would suggest that IO completion got stuck for some
reason. Hopefully we'll see more from the stacktraces.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-20 13:00:03

by Nikola Pajkovsky

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Jan Kara <[email protected]> writes:

> On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
>> Jan Kara <[email protected]> writes:
>>
>> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
>> >> Jan Kara <[email protected]> writes:
>> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> >> >> Holger Hoffstätte <[email protected]> writes:
>> >> >>
>> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> >> >> > (snip)
>> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> >> >> run finishes for it (which may take a while as our server room is currently
>> >> >> >> moving to a different place).
>> >> >> >>
>> >> >> >> Honza
>> >> >> >> --
>> >> >> >> Jan Kara <[email protected]>
>> >> >> >> SUSE Labs, CR
>> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> >> >> From: Jan Kara <[email protected]>
>> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> >> >>
>> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> >> >> filesystems.
>> >> >> >
>> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
>> >> >> > for the data exposure was rejected from -stable (see [1]) because it
>> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >> >> >
>> >> >> > So both this and the previous patch need to be submitted.
>> >> >> >
>> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> >> >>
>> >> >> I'm just wondering if the Jan's patch is not related to blocked
>> >> >> processes in following trace. It very hard to hit it and I don't have
>> >> >> any reproducer.
>> >> >
>> >> > This looks like a different issue. Does the machine recover itself or is it
>> >> > a hard hang and you have to press a reset button?
>> >>
>> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
>> >> ram and machine has dedicated mount point only for PostgreSQL data.
>> >>
>> >> Nevertheless, I was able always to ssh to the machine, so machine itself
>> >> was not in hard hang and ext4 mostly gets recover by itself (it took
>> >> 30min). But I have seen situation, were every process who 'touch' the ext4
>> >> goes immediately to D state and does not recover even after hour.
>> >
>> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
>> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
>> > IO is happening in the system? That should tell us more.
>>
>>
>> Link to 'echo w >/proc/sysrq-trigger' is here, because it's bit bigger
>> to mail it.
>>
>> http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html
>
> Can you upload it again please? I've got to looking at the file only today
> and it is already deleted. Thanks!

http://expirebox.com/download/c010e712e55938435c446cdc01a0b523.html

>> I was running iotop and there was traffic roughly ~20 KB/s write.
>>
>> What was bit more interesting, was looking at
>>
>> cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
>>
>> nr_drity had around 240 and was slowly counting up, but nr_writeback had
>> ~8800 and was stuck for 120s.
>
> Hum, interesting. This would suggest like IO completion got stuck for some
> reason. We'll see more from the stacktraces hopefully.

I monitored /sys/kernel/debug/bdi/253:32/stats for 10 mins at 1-second
intervals. The values were all the same, as follows:

--[ Sun Jun 19 06:11:08 CEST 2016
BdiWriteback: 15840 kB
BdiReclaimable: 32320 kB
BdiDirtyThresh: 0 kB
DirtyThresh: 1048576 kB
BackgroundThresh: 131072 kB
BdiDirtied: 6131163680 kB
BdiWritten: 6130214880 kB
BdiWriteBandwidth: 324948 kBps
b_dirty: 2
b_io: 3
b_more_io: 0
bdi_list: 1
state: c

Maybe those values cause the issue by kicking in writeback too often and
blocking everyone else.

$ sysctl -a | grep dirty | grep -v ratio
vm.dirty_background_bytes = 134217728
vm.dirty_bytes = 1073741824
vm.dirty_expire_centisecs = 1500
vm.dirty_writeback_centisecs = 500
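
For what it's worth, these byte limits are exactly the DirtyThresh
(1048576 kB) and BackgroundThresh (131072 kB) figures in the bdi stats
above; the sysctls are in bytes while the stats report kB:

```shell
#!/bin/sh
# vm.dirty_bytes / vm.dirty_background_bytes are in bytes; dividing by
# 1024 shows they match the kB thresholds in the bdi stats.
dirty_bytes=1073741824
dirty_background_bytes=134217728
echo "DirtyThresh:      $((dirty_bytes / 1024)) kB"             # 1048576 kB
echo "BackgroundThresh: $((dirty_background_bytes / 1024)) kB"  # 131072 kB
```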

I even have the output of the following command, if you're interested.

$ trace-cmd record -e ext4 -e jbd2 -e writeback -e block sleep 600

--
Nikola

2016-06-21 10:23:46

by Jan Kara

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

On Mon 20-06-16 14:59:57, Nikola Pajkovsky wrote:
> Jan Kara <[email protected]> writes:
> > On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
> >> Jan Kara <[email protected]> writes:
> >>
> >> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
> >> >> Jan Kara <[email protected]> writes:
> >> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> >> >> >> Holger Hoffstätte <[email protected]> writes:
> >> >> >>
> >> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> >> >> >> > (snip)
> >> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> >> >> >> run finishes for it (which may take a while as our server room is currently
> >> >> >> >> moving to a different place).
> >> >> >> >>
> >> >> >> >> Honza
> >> >> >> >> --
> >> >> >> >> Jan Kara <[email protected]>
> >> >> >> >> SUSE Labs, CR
> >> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> >> >> >> From: Jan Kara <[email protected]>
> >> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >> >> >> >>
> >> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> >> >> >> filesystems.
> >> >> >> >
> >> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
> >> >> >> > for the data exposure was rejected from -stable (see [1]) because it
> >> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> >> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >> >> >> >
> >> >> >> > So both this and the previous patch need to be submitted.
> >> >> >> >
> >> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
> >> >> >>
> >> >> >> I'm just wondering if the Jan's patch is not related to blocked
> >> >> >> processes in following trace. It very hard to hit it and I don't have
> >> >> >> any reproducer.
> >> >> >
> >> >> > This looks like a different issue. Does the machine recover itself or is it
> >> >> > a hard hang and you have to press a reset button?
> >> >>
> >> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
> >> >> ram and machine has dedicated mount point only for PostgreSQL data.
> >> >>
> >> >> Nevertheless, I was able always to ssh to the machine, so machine itself
> >> >> was not in hard hang and ext4 mostly gets recover by itself (it took
> >> >> 30min). But I have seen situation, were every process who 'touch' the ext4
> >> >> goes immediately to D state and does not recover even after hour.
> >> >
> >> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
> >> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
> >> > IO is happening in the system? That should tell us more.
> >>
> >>
> >> Link to 'echo w >/proc/sysrq-trigger' is here, because it's bit bigger
> >> to mail it.
> >>
> >> http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html
> >
> > Can you upload it again please? I've got to looking at the file only today
> > and it is already deleted. Thanks!
>
> http://expirebox.com/download/c010e712e55938435c446cdc01a0b523.html

OK, I had a look into the traces and the JBD2 thread just waits for the buffers
it has submitted for IO to complete. The rest is just blocked on that. From
the message "INFO: task jbd2/vdc-8:4710 blocked for more than 120 seconds.
severity=err" we can see that the JBD2 process has been waiting for a
significant amount of time. Now the question is why it takes so long for
the IO to complete - likely not a fs problem but something below - the block
layer or the storage itself.

What is the underlying storage? And what IO scheduler do you use? Seeing
that the device is 'vdc' - that suggests you are running in a guest - is
there anything interesting happening on the host at that moment? Is IO from
other guests / the host stalled at that moment as well?

> >> I was running iotop and there was traffic roughly ~20 KB/s write.
> >>
> >> What was bit more interesting, was looking at
> >>
> >> cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
> >>
> >> nr_drity had around 240 and was slowly counting up, but nr_writeback had
> >> ~8800 and was stuck for 120s.
> >
> > Hum, interesting. This would suggest like IO completion got stuck for some
> > reason. We'll see more from the stacktraces hopefully.
>
> I have monitor /sys/kernel/debug/bdi/253:32/stats for 10 mins per 1 sec.
> Values are all same as follows:
>
> --[ Sun Jun 19 06:11:08 CEST 2016
> BdiWriteback: 15840 kB
> BdiReclaimable: 32320 kB
> BdiDirtyThresh: 0 kB
> DirtyThresh: 1048576 kB
> BackgroundThresh: 131072 kB
> BdiDirtied: 6131163680 kB
> BdiWritten: 6130214880 kB
> BdiWriteBandwidth: 324948 kBps
> b_dirty: 2
> b_io: 3
> b_more_io: 0
> bdi_list: 1
> state: c

OK, so all the IO looks stalled for that period of time.

> Maybe those values can cause issue and kicks in writeback to often and
> block everyone else.
>
> $ sysctl -a | grep dirty | grep -v ratio
> vm.dirty_background_bytes = 134217728
> vm.dirty_bytes = 1073741824
> vm.dirty_expire_centisecs = 1500
> vm.dirty_writeback_centisecs = 500

This looks healthy.

> I even have output of command, if you're interested.
>
> $ trace-cmd record -e ext4 -e jbd2 -e writeback -e block sleep 600

Traces from the block layer may be interesting, but you'd need the trace
started before the hang begins, so that you can see what happened with the IO
that jbd2/vdc-8:4710 is waiting for.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-22 09:02:58

by Nikola Pajkovsky

[permalink] [raw]
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Jan Kara <[email protected]> writes:

> On Mon 20-06-16 14:59:57, Nikola Pajkovsky wrote:
>> Jan Kara <[email protected]> writes:
>> > On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
>> >> Jan Kara <[email protected]> writes:
>> >>
>> >> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
>> >> >> Jan Kara <[email protected]> writes:
>> >> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> >> >> >> Holger Hoffstätte <[email protected]> writes:
>> >> >> >>
>> >> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> >> >> >> > (snip)
>> >> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> >> >> >> run finishes for it (which may take a while as our server room is currently
>> >> >> >> >> moving to a different place).
>> >> >> >> >>
>> >> >> >> >> Honza
>> >> >> >> >> --
>> >> >> >> >> Jan Kara <[email protected]>
>> >> >> >> >> SUSE Labs, CR
>> >> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> >> >> >> From: Jan Kara <[email protected]>
>> >> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> >> >> >>
>> >> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> >> >> >> filesystems.
>> >> >> >> >
>> >> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
>> >> >> >> > for the data exposure was rejected from -stable (see [1]) because it
>> >> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> >> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >> >> >> >
>> >> >> >> > So both this and the previous patch need to be submitted.
>> >> >> >> >
>> >> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> >> >> >>
>> >> >> >> I'm just wondering if the Jan's patch is not related to blocked
>> >> >> >> processes in following trace. It very hard to hit it and I don't have
>> >> >> >> any reproducer.
>> >> >> >
>> >> >> > This looks like a different issue. Does the machine recover itself or is it
>> >> >> > a hard hang and you have to press a reset button?
>> >> >>
>> >> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
>> >> >> ram and machine has dedicated mount point only for PostgreSQL data.
>> >> >>
>> >> >> Nevertheless, I was able always to ssh to the machine, so machine itself
>> >> >> was not in hard hang and ext4 mostly gets recover by itself (it took
>> >> >> 30min). But I have seen situation, were every process who 'touch' the ext4
>> >> >> goes immediately to D state and does not recover even after hour.
>> >> >
>> >> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
>> >> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
>> >> > IO is happening in the system? That should tell us more.
>> >>
>> >>
>> >> The output of 'echo w >/proc/sysrq-trigger' is linked here, because
>> >> it's a bit too big to mail:
>> >>
>> >> http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html
>> >
>> > Can you upload it again please? I only got around to looking at the
>> > file today and it has already been deleted. Thanks!
>>
>> http://expirebox.com/download/c010e712e55938435c446cdc01a0b523.html
>
> OK, I had a look at the traces and the JBD2 thread is just waiting for the
> buffers it has submitted for IO to complete. The rest is just blocked on
> that. From the message "INFO: task jbd2/vdc-8:4710 blocked for more than
> 120 seconds. severity=err" we can see that the JBD2 process has been
> waiting for a significant amount of time. Now the question is why it takes
> so long for the IO to complete - likely not a fs problem but something
> below - the block layer or the storage itself.
>
> What is the underlying storage? And what IO scheduler do you use? Seeing
> that the device is 'vdc' - that suggests you are running in a guest - is
> there anything interesting happening on the host at that moment? Is IO from
> other guests / the host stalled at that moment as well?

The underlying storage is 24 disks in hw raid6 with a 64k stripe. LVM is
used to manage partitions for the virt guests. Guests see just a block
device, which is formatted in the guest with jsize=2048 and mounted with
rw,noatime,nodiratime,user_xattr,acl.

Two guests are running 3.18.34 with virtio-blk and hence use multiqueue;
if I remember correctly, no IO scheduler is used with multiqueue. Each has
18 vcpus and 160 GB of RAM, and only PostgreSQL uses /dev/vdc.

There are two other, much smaller guests running the standard RHEL6 kernel
with the deadline IO scheduler.

We're plotting IO and read/write throughput from the host, and when
processes get blocked in one guest, we don't see any traffic going down to
the host/raid6. The other guests seem to be running just fine, since we
don't get any blocked-process reports from them.
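[When a stall like this hits, it helps to list exactly which tasks are in D state on a guest; a minimal procfs walk — the "pid comm" output format is my own:]

```shell
#!/bin/sh
# Sketch: list tasks in uninterruptible sleep (D state) by reading
# /proc/<pid>/status. Prints "<pid> <comm>" for each stuck task.
list_dstate() {
    for d in /proc/[0-9]*; do
        # The status line looks like: "State:  D (disk sleep)"
        state=$(awk '/^State:/ { print $2 }' "$d/status" 2>/dev/null)
        if [ "$state" = "D" ]; then
            echo "${d#/proc/} $(cat "$d/comm" 2>/dev/null)"
        fi
    done
}

list_dstate
```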

>> >> I was running iotop and there was roughly ~20 KB/s of write traffic.
>> >>
>> >> What was a bit more interesting was looking at
>> >>
>> >> cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
>> >>
>> >> nr_dirty was around 240 and slowly counting up, but nr_writeback was at
>> >> ~8800 and stayed stuck there for 120s.
>> >
>> > Hum, interesting. This suggests that IO completion got stuck for some
>> > reason. We'll hopefully see more from the stack traces.
>>
>> I have monitored /sys/kernel/debug/bdi/253:32/stats once per second for
>> 10 minutes. The values all stayed the same, as follows:
>>
>> --[ Sun Jun 19 06:11:08 CEST 2016
>> BdiWriteback: 15840 kB
>> BdiReclaimable: 32320 kB
>> BdiDirtyThresh: 0 kB
>> DirtyThresh: 1048576 kB
>> BackgroundThresh: 131072 kB
>> BdiDirtied: 6131163680 kB
>> BdiWritten: 6130214880 kB
>> BdiWriteBandwidth: 324948 kBps
>> b_dirty: 2
>> b_io: 3
>> b_more_io: 0
>> bdi_list: 1
>> state: c
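[The per-second sampling described above can be scripted; a minimal sketch — the 253:32 bdi id is taken from the report, and the 3-iteration default is just for illustration (the original run was 600 samples over ten minutes):]

```shell
#!/bin/sh
# Sketch: timestamped samples of the global dirty/writeback page counters
# and the per-bdi stats. BDI must match the device (253:32 in the report);
# debugfs must be mounted for the bdi stats file to exist.
BDI=${BDI:-253:32}

sample() {
    echo "--[ $(date)"
    grep -E 'nr_dirty|nr_writeback' /proc/vmstat
    if [ -r "/sys/kernel/debug/bdi/$BDI/stats" ]; then
        cat "/sys/kernel/debug/bdi/$BDI/stats"
    fi
}

# Run with e.g. COUNT=600 INTERVAL=1 for the ten-minute window used above
i=0
while [ "$i" -lt "${COUNT:-3}" ]; do
    sample
    sleep "${INTERVAL:-1}"
    i=$((i + 1))
done
```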
>
> OK, so all the IO looks stalled for that period.
>
>> Maybe those values are causing the issue by kicking in writeback too
>> often and blocking everyone else.
>>
>> $ sysctl -a | grep dirty | grep -v ratio
>> vm.dirty_background_bytes = 134217728
>> vm.dirty_bytes = 1073741824
>> vm.dirty_expire_centisecs = 1500
>> vm.dirty_writeback_centisecs = 500
>
> This looks healthy.
>
>> I even have the output of the following command, if you're interested.
>>
>> $ trace-cmd record -e ext4 -e jbd2 -e writeback -e block sleep 600
>
> Traces from the block layer may be interesting, but you'd need the trace
> started before the hang so that you can see what happened to the IO that
> jbd2/vdc-8:4710 is waiting for.
>
> Honza
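[To get the trace started before the hang as suggested above, the recording can be wrapped so it covers the whole window; a minimal sketch — the output path and 600s duration are illustrative, and since recording needs root and trace-cmd installed, DRY_RUN=1 just prints the command:]

```shell
#!/bin/sh
# Sketch: start the event tracing *before* the workload so the resulting
# trace covers the IO the journal thread later waits on.
OUT=${OUT:-/var/tmp/trace.dat}
DURATION=${DURATION:-600}

start_trace() {
    cmd="trace-cmd record -e ext4 -e jbd2 -e writeback -e block -o $OUT sleep $DURATION"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"    # show what would run, without root or trace-cmd
    else
        $cmd           # inspect afterwards with: trace-cmd report $OUT
    fi
}
```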

--
Nikola