I've been trying to figure out why generic/299 has occasionally been
stalling forever. After taking a closer look, it appears the problem
is that the fio process is stalling in userspace. Looking at the ps
listing, the fio process hasn't run in over six hours, and attaching
strace to the fio process shows that it's stalled in a FUTEX_WAIT.
Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
that I started seeing this when I started using a newer version of
fio. So I'm going to try rolling back to an older version of fio and see
if that causes the problem to go away.
- Ted
* Theodore Ts'o <[email protected]>:
> I've been trying to figure out why generic/299 has occasionally been
> stalling forever. After taking a closer look, it appears the problem
> is that the fio process is stalling in userspace. Looking at the ps
> listing, the fio process hasn't run in over six hours, and attaching
> strace to the fio process shows that it's stalled in a FUTEX_WAIT.
>
> Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
> that I started seeing this when I started using a newer version of
> fio. So I'm going to try rolling back to an older version of fio and see
> if that causes the problem to go away.
>
I've not seen that, but have only been running 4.1-rc* kernels recently. I
can say that I ran 1000 trials of generic/299 on nojournal using a vanilla
4.1-rc6 kernel on a Pandaboard successfully with fio 2.2.6. No failures, and
completed in ~61 hours for an average test duration of ~3.7 minutes, which
agrees well with what I see when sampling the logged output for individual
test times.
Eric
On Thu, Jun 18, 2015 at 11:53:37AM -0400, Theodore Ts'o wrote:
> I've been trying to figure out why generic/299 has occasionally been
> stalling forever. After taking a closer look, it appears the problem
> is that the fio process is stalling in userspace. Looking at the ps
> listing, the fio process hasn't run in over six hours, and attaching
> strace to the fio process shows that it's stalled in a FUTEX_WAIT.
>
> Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
> that I started seeing this when I started using a newer version of
> fio. So I'm going to try rolling back to an older version of fio and see
> if that causes the problem to go away.
I'm running on fio 2.1.3 at the moment and I haven't seen any
problems like this for months. Keep in mind that fio does tend to
break in strange ways fairly regularly, so I'd suggest an
upgrade/downgrade of fio as your first move.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Fri, Jun 19, 2015 at 09:34:30AM +1000, Dave Chinner wrote:
> On Thu, Jun 18, 2015 at 11:53:37AM -0400, Theodore Ts'o wrote:
> > I've been trying to figure out why generic/299 has occasionally been
> > stalling forever. After taking a closer look, it appears the problem
> > is that the fio process is stalling in userspace. Looking at the ps
> > listing, the fio process hasn't run in over six hours, and attaching
> > strace to the fio process shows that it's stalled in a FUTEX_WAIT.
> >
> > Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
> > that I started seeing this when I started using a newer version of
> > fio. So I'm going to try rolling back to an older version of fio and see
> > if that causes the problem to go away.
>
> I'm running on fio 2.1.3 at the moment and I haven't seen any
> problems like this for months. Keep in mind that fio does tend to
> break in strange ways fairly regularly, so I'd suggest an
> upgrade/downgrade of fio as your first move.
Yep, I've rolled back to fio-2.1.13-77-g0f9940a, which is what I had
been using previously for a long time, and the problem with
generic/299 seems to have gone away. I had hoped that the days when
new releases of fio would introduce random new bugs were a temporary
aberration, but obviously that was a bit over-optimistic.
- Ted
On Fri, Jun 19, 2015 at 09:34:30AM +1000, Dave Chinner wrote:
> On Thu, Jun 18, 2015 at 11:53:37AM -0400, Theodore Ts'o wrote:
> > I've been trying to figure out why generic/299 has occasionally been
> > stalling forever. After taking a closer look, it appears the problem
> > is that the fio process is stalling in userspace. Looking at the ps
> > listing, the fio process hasn't run in over six hours, and attaching
> > strace to the fio process shows that it's stalled in a FUTEX_WAIT.
> >
> > Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
> > that I started seeing this when I started using a newer version of
> > fio. So I'm going to try rolling back to an older version of fio and see
> > if that causes the problem to go away.
>
> I'm running on fio 2.1.3 at the moment and I haven't seen any
> problems like this for months. Keep in mind that fio does tend to
> break in strange ways fairly regularly, so I'd suggest an
> upgrade/downgrade of fio as your first move.
Out of curiosity, Dave, are you still using fio 2.1.3? I had upgraded
to the latest fio to fix other test breaks, and I'm still seeing the
occasional generic/299 test failure. In fact, it's been happening
often enough on one of my test platforms[1] that I decided to really
dig down and investigate it, and all of the threads were blocking on
td->verify_cond in fio's verify.c.
It bisected down to this commit:
commit e5437a073e658e8154b9e87bab5c7b3b06ed4255
Author: Vasily Tarasov <[email protected]>
Date: Sun Nov 9 20:22:24 2014 -0700
Fix for a race when fio prints I/O statistics periodically
Below is the demonstration for the latest code in git:
...
So generic/299 passes reliably with this commit's parent, and it fails
on this commit within a dozen tries or so. The commit first landed in
fio 2.1.14, so it's consistent with Dave's report from a year ago that
he was still using fio 2.1.3.
I haven't had time to do a deep analysis on what fio/verify.c does, or
the above patch, but the good news is that when fio hangs, it's just a
userspace hang, so I can log into the machine and attach gdb to the
process. The code in question isn't very well documented, so I'm
sending this out in the hopes that Jens and Vasily might see something
obvious, and because I'm curious whether anyone else has seen this
(since it seems to be a timing-related race in fio, so it's likely a
file system independent issue).
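For reference, the pattern those stuck threads are in is a completely
standard condition-variable wait. A stripped-down sketch of that
pattern (the names are illustrative stand-ins, not the actual fio
code) looks something like the following; a hang like this is what you
get if the wakeup on verify_cond never arrives:

#include <pthread.h>

/* Illustrative stand-ins for the relevant thread_data fields. */
struct verify_state {
        pthread_mutex_t io_mutex;
        pthread_cond_t  verify_cond;    /* what the stuck threads wait on */
        int             nr_pending;     /* queued verify requests */
        int             exit_requested; /* set by the main job thread */
};

/* Roughly what each verify_async worker does. */
static void *verify_async_worker(void *arg)
{
        struct verify_state *vs = arg;

        pthread_mutex_lock(&vs->io_mutex);
        while (!vs->exit_requested) {
                if (!vs->nr_pending) {
                        /* This is where the hung threads are parked. */
                        pthread_cond_wait(&vs->verify_cond, &vs->io_mutex);
                        continue;
                }
                vs->nr_pending--;
                pthread_mutex_unlock(&vs->io_mutex);
                /* ... verify one queued io_u here ... */
                pthread_mutex_lock(&vs->io_mutex);
        }
        pthread_mutex_unlock(&vs->io_mutex);
        return NULL;
}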
Thanks,
- Ted
[1] When running xfstests in a Google Compute Engine VM with an
SSD-backed Persistent Disk, using an n1-standard-2 machine type with a
recent kernel and testing ext4, the command "gce-xfstests -C 100
generic/299" will hang within a dozen runs of the test, so -C 100 to
run the test a hundred times was definitely overkill --- in fact fio
would usually hang after less than a half-dozen runs.
My bisecting technique (using the infrastructure at
https://github.com/tytso/xfstests-bld) was:
./build-all --fio-only
make tarball
gce-xfstests --update-xfstests -C 100 generic/299
and then waiting an hour or so to see whether fio hung, and then
following up with "(cd fio ; git bisect good)" or "(cd fio ; git
bisect bad)" as appropriate. I was using a Debian jessie build chroot
to compile fio and all of xfstests-bld.
On 09/28/2016 10:37 PM, Theodore Ts'o wrote:
> On Fri, Jun 19, 2015 at 09:34:30AM +1000, Dave Chinner wrote:
>> On Thu, Jun 18, 2015 at 11:53:37AM -0400, Theodore Ts'o wrote:
>>> I've been trying to figure out why generic/299 has occasionally been
>>> stalling forever. After taking a closer look, it appears the problem
>>> is that the fio process is stalling in userspace. Looking at the ps
>>> listing, the fio process hasn't run in over six hours, and attaching
>>> strace to the fio process shows that it's stalled in a FUTEX_WAIT.
>>>
>>> Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
>>> that I started seeing this when I started using a newer version of
>>> fio. So I'm going to try rolling back to an older version of fio and see
>>> if that causes the problem to go away.
>>
>> I'm running on fio 2.1.3 at the moment and I haven't seen any
>> problems like this for months. Keep in mind that fio does tend to
>> break in strange ways fairly regularly, so I'd suggest an
>> upgrade/downgrade of fio as your first move.
>
> Out of curiosity, Dave, are you still using fio 2.1.3? I had upgraded
> to the latest fio to fix other test breaks, and I'm still seeing the
> occasional generic/299 test failure. In fact, it's been happening
> often enough on one of my test platforms[1] that I decided to really
> dig down and investigate it, and all of the threads were blocking on
> td->verify_cond in fio's verify.c.
>
> It bisected down to this commit:
>
> commit e5437a073e658e8154b9e87bab5c7b3b06ed4255
> Author: Vasily Tarasov <[email protected]>
> Date: Sun Nov 9 20:22:24 2014 -0700
>
> Fix for a race when fio prints I/O statistics periodically
>
> Below is the demonstration for the latest code in git:
> ...
>
> So generic/299 passes reliably with this commit's parent, and it fails
> on this commit within a dozen tries or so. The commit first landed in
> fio 2.1.14, so it's consistent with Dave's report from a year ago that
> he was still using fio 2.1.3.
>
> I haven't had time to do a deep analysis on what fio/verify.c does, or
> the above patch, but the good news is that when fio hangs, it's just a
> userspace hang, so I can log into the machine and attach gdb to the
> process. The code in question isn't very well documented, so I'm
> sending this out in the hopes that Jens and Vasily might see something
> obvious, and because I'm curious whether anyone else has seen this
> (since it seems to be a timing-related race in fio, so it's likely a
> file system independent issue).
>
> Thanks,
>
> - Ted
>
> [1] When running xfstests in a Google Compute Engine VM with an
> SSD-backed Persistent Disk, using an n1-standard-2 machine type with a
> recent kernel and testing ext4, the command "gce-xfstests -C 100
> generic/299" will hang within a dozen runs of the test, so -C 100 to
> run the test a hundred times was definitely overkill --- in fact fio
> would usually hang after less than a half-dozen runs.
>
> My bisecting technique (using the infrastructure at
> https://github.com/tytso/xfstests-bld) was:
>
> ./build-all --fio-only
> make tarball
> gce-xfstests --update-xfstests -C 100 generic/299
>
> and then waiting an hour or so to see whether fio hung, and then
> following up with "(cd fio ; git bisect good)" or "(cd fio ; git
> bisect bad)" as appropriate. I was using a Debian jessie build chroot
> to compile fio and all of xfstests-bld.
Sorry for being late to the party here, was away. This issue should be
fixed in newer versions. Can you update to current -git and check that
it works without hangs?
--
Jens Axboe
On Thu, Sep 29, 2016 at 12:37:22AM -0400, Theodore Ts'o wrote:
> On Fri, Jun 19, 2015 at 09:34:30AM +1000, Dave Chinner wrote:
> > On Thu, Jun 18, 2015 at 11:53:37AM -0400, Theodore Ts'o wrote:
> > > I've been trying to figure out why generic/299 has occasionally been
> > > stalling forever. After taking a closer look, it appears the problem
> > > is that the fio process is stalling in userspace. Looking at the ps
> > > listing, the fio process hasn't run in over six hours, and attaching
> > > strace to the fio process shows that it's stalled in a FUTEX_WAIT.
> > >
> > > Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
> > > that I started seeing this when I started using a newer version of
> > > fio. So I'm going to try rolling back to an older version of fio and see
> > > if that causes the problem to go away.
> >
> > I'm running on fio 2.1.3 at the moment and I haven't seen any
> > problems like this for months. Keep in mind that fio does tend to
> > break in strange ways fairly regularly, so I'd suggest an
> > upgrade/downgrade of fio as your first move.
>
> Out of curiosity, Dave, are you still using fio 2.1.3? I had upgraded
No.
$ fio -v
fio-2.1.11
$
> to the latest fio to fix other test breaks, and I'm still seeing the
> occasional generic/299 test failure. In fact, it's been happening
> often enough on one of my test platforms[1] that I decided to really
> dig down and investigate it, and all of the threads were blocking on
> td->verify_cond in fio's verify.c.
>
> It bisected down to this commit:
>
> commit e5437a073e658e8154b9e87bab5c7b3b06ed4255
> Author: Vasily Tarasov <[email protected]>
> Date: Sun Nov 9 20:22:24 2014 -0700
>
> Fix for a race when fio prints I/O statistics periodically
>
> Below is the demonstration for the latest code in git:
> ...
>
> So generic/299 passes reliably with this commit's parent, and it fails
> on this commit within a dozen tries or so. The commit first landed in
> fio 2.1.14, so it's consistent with Dave's report from a year ago that
> he was still using fio 2.1.3.
But I'm still not using a fio recent enough to hit this.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On 10/12/2016 03:14 PM, Dave Chinner wrote:
> On Thu, Sep 29, 2016 at 12:37:22AM -0400, Theodore Ts'o wrote:
>> On Fri, Jun 19, 2015 at 09:34:30AM +1000, Dave Chinner wrote:
>>> On Thu, Jun 18, 2015 at 11:53:37AM -0400, Theodore Ts'o wrote:
>>>> I've been trying to figure out why generic/299 has occasionally been
>>>> stalling forever. After taking a closer look, it appears the problem
>>>> is that the fio process is stalling in userspace. Looking at the ps
>>>> listing, the fio process hasn't run in over six hours, and attaching
>>>> strace to the fio process shows that it's stalled in a FUTEX_WAIT.
>>>>
>>>> Has anyone else seen this? I'm using fio 2.2.6, and I have a feeling
>>>> that I started seeing this when I started using a newer version of
>>>> fio. So I'm going to try rolling back to an older version of fio and see
>>>> if that causes the problem to go away.
>>>
>>> I'm running on fio 2.1.3 at the moment and I haven't seen any
>>> problems like this for months. Keep in mind that fio does tend to
>>> break in strange ways fairly regularly, so I'd suggest an
>>> upgrade/downgrade of fio as your first move.
>>
>> Out of curiosity, Dave, are you still using fio 2.1.3? I had upgraded
>
> No.
>
> $ fio -v
> fio-2.1.11
> $
>
>> to the latest fio to fix other test breaks, and I'm still seeing the
>> occasional generic/299 test failure. In fact, it's been happening
>> often enough on one of my test platforms[1] that I decided to really
>> dig down and investigate it, and all of the threads were blocking on
>> td->verify_cond in fio's verify.c.
>>
>> It bisected down to this commit:
>>
>> commit e5437a073e658e8154b9e87bab5c7b3b06ed4255
>> Author: Vasily Tarasov <[email protected]>
>> Date: Sun Nov 9 20:22:24 2014 -0700
>>
>> Fix for a race when fio prints I/O statistics periodically
>>
>> Below is the demonstration for the latest code in git:
>> ...
>>
>> So generic/299 passes reliably with this commit's parent, and it fails
>> on this commit within a dozen tries or so. The commit first landed in
>> fio 2.1.14, so it's consistent with Dave's report from a year ago that
>> he was still using fio 2.1.3.
>
> But I'm still not using a fio recent enough to hit this.
FWIW, this is the commit that fixes it:
commit 39d13e67ef1f4b327c68431f8daf033a03920117
Author: Jens Axboe <[email protected]>
Date: Fri Aug 26 14:39:30 2016 -0600
backend: check if we need to update rusage stats, if stat_mutex is busy
2.14 and newer should not have the problem, but earlier versions may
depending on how old...
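The general idea behind that commit, as a rough sketch rather than the
actual fio code: the job thread can't just block on stat_mutex,
because the stats side may be holding it while waiting for the job
thread to refresh its rusage numbers. So the job thread trylocks and
services any pending rusage request first (the names below are
illustrative only):

#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t stat_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int rusage_update_needed;

static void update_rusage(void)
{
        /* ... refresh this thread's getrusage() numbers ... */
        atomic_store(&rusage_update_needed, 0);
}

static void job_thread_stat_checkpoint(void)
{
        /*
         * If the stats code currently holds stat_mutex, it may be
         * waiting for us to update our rusage numbers.  Service that
         * request instead of blocking, otherwise both sides can end
         * up waiting on each other forever.
         */
        while (pthread_mutex_trylock(&stat_mutex) != 0) {
                if (atomic_load(&rusage_update_needed))
                        update_rusage();
        }
        /* ... update the per-thread I/O statistics ... */
        pthread_mutex_unlock(&stat_mutex);
}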
--
Jens Axboe
On Wed, Oct 12, 2016 at 03:19:25PM -0600, Jens Axboe wrote:
>
> FWIW, this is the commit that fixes it:
>
> commit 39d13e67ef1f4b327c68431f8daf033a03920117
> Author: Jens Axboe <[email protected]>
> Date: Fri Aug 26 14:39:30 2016 -0600
>
> backend: check if we need to update rusage stats, if stat_mutex is busy
>
> 2.14 and newer should not have the problem, but earlier versions may
> depending on how old...
Unfortunately, I'm still seeing hangs in generic/299 with the latest version of fio:
fio fio-2.14-10-g0a301e9 (Fri, 23 Sep 2016 11:57:00 -0600)
If I use an older fio, it reliably does not hang. What can I do to
help debug this?
As I said, I can attach to the hanging fio using a gdb and give you
stackdumps for all of the threads if that would be helpful.
- Ted
On 10/12/2016 08:15 PM, Theodore Ts'o wrote:
> On Wed, Oct 12, 2016 at 03:19:25PM -0600, Jens Axboe wrote:
>>
>> FWIW, this is the commit that fixes it:
>>
>> commit 39d13e67ef1f4b327c68431f8daf033a03920117
>> Author: Jens Axboe <[email protected]>
>> Date: Fri Aug 26 14:39:30 2016 -0600
>>
>> backend: check if we need to update rusage stats, if stat_mutex is busy
>>
>> 2.14 and newer should not have the problem, but earlier versions may
>> depending on how old...
>
> Unfortunately, I'm still seeing hangs in generic/299 with the latest version of fio:
>
> fio fio-2.14-10-g0a301e9 (Fri, 23 Sep 2016 11:57:00 -0600)
Well, that's no good.
> If I use an older fio, it reliably does not hang. What can I do to
> help debug this?
If I just wanted to invoke generic/299 with a setup similar to yours,
what should I do? I can attempt to reproduce it.
> As I said, I can attach to the hanging fio using a gdb and give you
> stackdumps for all of the threads if that would be helpful.
That would definitely be helpful as well, especially if I cannot
reproduce it.
--
Jens Axboe
On Wed, Oct 12, 2016 at 10:15:52PM -0400, Theodore Ts'o wrote:
> On Wed, Oct 12, 2016 at 03:19:25PM -0600, Jens Axboe wrote:
> >
> > FWIW, this is the commit that fixes it:
> >
> > commit 39d13e67ef1f4b327c68431f8daf033a03920117
> > Author: Jens Axboe <[email protected]>
> > Date: Fri Aug 26 14:39:30 2016 -0600
> >
> > backend: check if we need to update rusage stats, if stat_mutex is busy
> >
> > 2.14 and newer should not have the problem, but earlier versions may
> > depending on how old...
>
> Unfortunately, I'm still seeing hangs in generic/299 with the latest version of fio:
>
> fio fio-2.14-10-g0a301e9 (Fri, 23 Sep 2016 11:57:00 -0600)
>
> If I use an older fio, it reliably does not hang. What can I do to
> help debug this?
>
> As I said, I can attach to the hanging fio using a gdb and give you
> stackdumps for all of the threads if that would be helpful.
I don't see fio hangs on generic/299 with git fio (just a lot of "fallocate: No
space left on device" in results/generic/299.full):
# fio -v
fio-2.14-27-gafd2c
# mount tmpfs -t tmpfs -o size=26g /ramdisk/
# fallocate -l 12g /ramdisk/testvol1 && losetup -f /ramdisk/testvol1
# fallocate -l 12g /ramdisk/scratch0 && losetup -f /ramdisk/scratch0
root@ttip:/home/mator/xfstests-dev# cat local.config
export TEST_DEV=/dev/loop0
export TEST_DIR=/testvol
export SCRATCH_DEV_POOL="/dev/loop1"
#export SCRATCH_DEV_POOL="/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3"
export SCRATCH_MNT=/mnt/scratch
# mkfs.xfs -f /dev/loop0
meta-data=/dev/loop0 isize=512 agcount=4, agsize=786432 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0, rmapbt=0
data = bsize=4096 blocks=3145728, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
root@ttip:/home/mator/xfstests-dev# ./check generic/299
FSTYP -- xfs (non-debug)
PLATFORM -- Linux/sparc64 ttip 4.8.0+
MKFS_OPTIONS -- -f -bsize=4096 /dev/loop1
MOUNT_OPTIONS -- /dev/loop1 /mnt/scratch
generic/299 103s
Ran: generic/299
Passed all 1 tests
kernel logs:
$ journalctl -k -e
Oct 13 15:35:41 ttip unknown: run fstests generic/299 at 2016-10-13 15:35:41
Oct 13 15:35:41 ttip kernel: XFS (loop1): Unmounting Filesystem
Oct 13 15:35:42 ttip kernel: XFS (loop1): Mounting V5 Filesystem
Oct 13 15:35:42 ttip kernel: XFS (loop1): Ending clean mount
Oct 13 15:37:24 ttip kernel: XFS (loop1): Unmounting Filesystem
Oct 13 15:37:25 ttip kernel: XFS (loop0): Unmounting Filesystem
results/generic/299.full in attachment
On Thu, Oct 13, 2016 at 04:08:37PM +0300, Anatoly Pugachev wrote:
> On Wed, Oct 12, 2016 at 10:15:52PM -0400, Theodore Ts'o wrote:
> > On Wed, Oct 12, 2016 at 03:19:25PM -0600, Jens Axboe wrote:
> > >
> > > FWIW, this is the commit that fixes it:
> > >
> > > commit 39d13e67ef1f4b327c68431f8daf033a03920117
> > > Author: Jens Axboe <[email protected]>
> > > Date: Fri Aug 26 14:39:30 2016 -0600
> > >
> > > backend: check if we need to update rusage stats, if stat_mutex is busy
> > >
> > > 2.14 and newer should not have the problem, but earlier versions may
> > > depending on how old...
> >
> > Unfortunately, I'm still seeing hangs in generic/299 with the latest version of fio:
> >
> > fio fio-2.14-10-g0a301e9 (Fri, 23 Sep 2016 11:57:00 -0600)
> >
> > If I use an older fio, it reliably does not hang. What can I do to
> > help debug this?
> >
> > As I said, I can attach to the hanging fio using a gdb and give you
> > stackdumps for all of the threads if that would be helpful.
I'm sorry, I didn't read that it was only with ext4; it hangs for me with ext4 as
well.
# mkfs.ext4 /dev/loop0
mke2fs 1.43.3 (04-Sep-2016)
/dev/loop0 contains a xfs file system
Proceed anyway? (y,n) y
Discarding device blocks: done
Creating filesystem with 3145728 4k blocks and 786432 inodes
Filesystem UUID: 18abfd64-3395-43c4-9c7d-f2c312e7516d
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
root@ttip:/home/mator/xfstests-dev# ./check generic/299
FSTYP -- ext4
PLATFORM -- Linux/sparc64 ttip 4.8.0+
MKFS_OPTIONS -- /dev/loop1
MOUNT_OPTIONS -- -o acl,user_xattr /dev/loop1 /mnt/scratch
generic/299 103s ... 104s
(hangs)
on another console:
$ pstree -Alap
| `-check,219419 ./check generic/299
| `-fsck,221209 -t ext4 -nf /dev/loop1
| `-fsck.ext4,221210 -nf /dev/loop1
fsck.ext4 can't finish its work
# strace \-p 221210
strace: Process 221210 attached
pread64(4, "\n\363R\0T\1\0\0\0\0\0\0\37\350\"\0\1\0\0\0x+\26\0*\350\"\0\1\0\0\0"..., 4096, 12793810944) = 4096
pread64(4, "\n\363\37\1T\1\0\0\0\0\0\0\260\356\"\0\1\0\0\0\227\256/\0\261\356\"\0\1\0\0\0"..., 4096, 3918262272) = 4096
pread64(4, "\n\363\357\0T\1\0\0\0\0045\254\32\6#\0\1\0\0\0\275\322 \0%\6#\0\1\0\0\0"..., 4096, 974860288) = 4096
pread64(4, "\n\363\226\0T\1\0\0\0\0\0\2\377\31#\0\1\0\0\0\311\257/\0\33\32#\0\1\0\0\0"..., 4096, 3927293952) = 4096
brk(0x10000d08000) = 0x10000d08000
pread64(4, "\n\363\306\0T\1\0\0\0\0\0\2L%#\0\1\0\0\0<3\10\0Y%#\0\1\0\0\0"..., 4096, 6654865408) = 4096
pread64(4, "\n\363\3\0T\1\0\0\0\0\0\0\2124#\0\1\0\0\0\272\263\2\0\2344#\0\1\0\0\0"..., 4096, 3800674304) = 4096
pread64(4, "\n\363\254\0T\1\0\0!\277\301~\3074#\0\1\0\0\0\"\310\16\0\3514#\0\1\0\0\0"..., 4096, 2717888512) = 4096
pread64(4, "\n\363Q\1T\1\0\0\255\255\255\255\307B#\0\1\0\0\0S\231\6\0\313B#\0\1\0\0\0"..., 4096, 4038303744) = 4096
pread64(4, "\n\363/\0T\1\0\0&o\r\201e^#\0\1\0\0\0\233\237\f\0\202^#\0\1\0\0\0"..., 4096, 4184920064) = 4096
pread64(4, "\n\363\244\0T\1\0\0\305\305\305\305\266b#\0\1\0\0\0n]\21\0\322b#\0\1\0\0\0"..., 4096, 6043922432) = 4096
pread64(4, "\n\363\"\1T\1\0\0\0\0\0\2\310q#\0\1\0\0\0 \337\26\0\335q#\0\1\0\0\0"..., 4096, 4194627584) = 4096
pread64(4, "\n\363\376\0T\1\0\0\0\0\0\2F\211#\0\1\0\0\0 \256/\0M\211#\0\1\0\0\0"..., 4096, 6049001472) = 4096
pread64(4, "\n\363]\0T\1\0\0\0\0\0\1\16\235#\0\1\0\0\0\272_\30\0\20\235#\0\1\0\0\0"..., 4096, 6567260160) = 4096
pread64(4, "\n\363\343\0T\1\0\0eeee\317\243#\0\1\0\0\0\264\222\20\0\332\243#\0\1\0\0\0"..., 4096, 6083014656) = 4096
pread64(4, "\n\363\332\0T\1\0\0\36\255\246d\216\264#\0\1\0\0\0.\212\17\0\242\264#\0\1\0\0\0"..., 4096, 8083423232) = 4096
pread64(4, "\n\363\334\0T\1\0\0EEEEW\306#\0\1\0\0\0^\273\20\0g\306#\0\1\0\0\0"..., 4096, 6083252224) = 4096
pread64(4, "\n\363\333\0T\1\0\0\0\0\0\0@\330#\0\1\0\0\0Z\353\23\0N\330#\0\1\0\0\0"..., 4096, 7774924800) = 4096
pread64(4, "\n\363x\0T\1\0\0{{{{H\347#\0\1\0\0\0\232~\27\0M\347#\0\1\0\0\0"..., 4096, 4735361024) = 4096
pread64(4, "\n\363\367\0T\1\0\0\377\377\377\377#\361#\0\1\0\0\0O\t\27\0b\361#\0\1\0\0\0"..., 4096, 4448808960) = 4096
pread64(4, "\n\363\t\0T\1\0\0aaaa\271\2$\0\1\0\0\0\351I\21\0\340\2$\0\1\0\0\0"..., 4096, 5948190720) = 4096
pread64(4, "\n\363\25\0T\1\0\0\0\0\0\0@\3$\0\1\0\0\0t\33.\0D\3$\0\1\0\0\0"..., 4096, 1714593792) = 4096
pread64(4, "\n\363b\0T\1\0\0\0\0\0\0g\4$\0\1\0\0\0\332\366\0\0n\4$\0\1\0\0\0"..., 4096, 1635794944) = 4096
pread64(4, "\n\363)\1T\1\0\0\0\0\0\0%\v$\0\1\0\0\0e\230\17\0003\v$\0\1\0\0\0"..., 4096, 2111643648) = 4096
pread64(4, "\n\363<\0T\1\0\0\0\0\0\0D $\0\1\0\0\0\1\355 \0T $\0\1\0\0\0"..., 4096, 165666816) = 4096
brk(0x10000d2a000) = 0x10000d2a000
pread64(4, "\n\363\n\0T\1\0\0\301\301\301\3012%$\0\1\0\0\0r\37\22\0004%$\0\1\0\0\0"..., 4096, 4871196672) = 4096
pread64(4, "\n\363\225\0T\1\0\0\0\0\0\1(&$\0\1\0\0\0\213\"\22\0@&$\0\1\0\0\0"..., 4096, 5401088000) = 4096
^Cstrace: Process 221210 detached
$ dpkg -l e2fsprogs
ii e2fsprogs 1.43.3-1 sparc64 ext2/ext3/ext4 file system utilities
299.full in attachment
On 10/13/2016 07:36 AM, Anatoly Pugachev wrote:
> On Thu, Oct 13, 2016 at 04:08:37PM +0300, Anatoly Pugachev wrote:
>> On Wed, Oct 12, 2016 at 10:15:52PM -0400, Theodore Ts'o wrote:
>>> On Wed, Oct 12, 2016 at 03:19:25PM -0600, Jens Axboe wrote:
>>>>
>>>> FWIW, this is the commit that fixes it:
>>>>
>>>> commit 39d13e67ef1f4b327c68431f8daf033a03920117
>>>> Author: Jens Axboe <[email protected]>
>>>> Date: Fri Aug 26 14:39:30 2016 -0600
>>>>
>>>> backend: check if we need to update rusage stats, if stat_mutex is busy
>>>>
>>>> 2.14 and newer should not have the problem, but earlier versions may
>>>> depending on how old...
>>>
>>> Unfortunately, I'm still seeing hangs in generic/299 with the latest version of fio:
>>>
>>> fio fio-2.14-10-g0a301e9 (Fri, 23 Sep 2016 11:57:00 -0600)
>>>
>>> If I use an older fio, it reliably does not hang. What can I do to
>>> help debug this?
>>>
>>> As I said, I can attach to the hanging fio using a gdb and give you
>>> stackdumps for all of the threads if that would be helpful.
>
> I'm sorry, I didn't read that it was only with ext4; it hangs for me with ext4 as
> well.
>
> # mkfs.ext4 /dev/loop0
> mke2fs 1.43.3 (04-Sep-2016)
> /dev/loop0 contains a xfs file system
> Proceed anyway? (y,n) y
> Discarding device blocks: done
> Creating filesystem with 3145728 4k blocks and 786432 inodes
> Filesystem UUID: 18abfd64-3395-43c4-9c7d-f2c312e7516d
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> root@ttip:/home/mator/xfstests-dev# ./check generic/299
> FSTYP -- ext4
> PLATFORM -- Linux/sparc64 ttip 4.8.0+
> MKFS_OPTIONS -- /dev/loop1
> MOUNT_OPTIONS -- -o acl,user_xattr /dev/loop1 /mnt/scratch
>
> generic/299 103s ... 104s
>
> (hangs)
>
> on another console:
>
> $ pstree -Alap
>
> | `-check,219419 ./check generic/299
> | `-fsck,221209 -t ext4 -nf /dev/loop1
> | `-fsck.ext4,221210 -nf /dev/loop1
>
> fsck.ext4 can't finish its work
>
> # strace \-p 221210
> strace: Process 221210 attached
> pread64(4, "\n\363R\0T\1\0\0\0\0\0\0\37\350\"\0\1\0\0\0x+\26\0*\350\"\0\1\0\0\0"..., 4096, 12793810944) = 4096
> pread64(4, "\n\363\37\1T\1\0\0\0\0\0\0\260\356\"\0\1\0\0\0\227\256/\0\261\356\"\0\1\0\0\0"..., 4096, 3918262272) = 4096
> pread64(4, "\n\363\357\0T\1\0\0\0\0045\254\32\6#\0\1\0\0\0\275\322 \0%\6#\0\1\0\0\0"..., 4096, 974860288) = 4096
> pread64(4, "\n\363\226\0T\1\0\0\0\0\0\2\377\31#\0\1\0\0\0\311\257/\0\33\32#\0\1\0\0\0"..., 4096, 3927293952) = 4096
> brk(0x10000d08000) = 0x10000d08000
> pread64(4, "\n\363\306\0T\1\0\0\0\0\0\2L%#\0\1\0\0\0<3\10\0Y%#\0\1\0\0\0"..., 4096, 6654865408) = 4096
> pread64(4, "\n\363\3\0T\1\0\0\0\0\0\0\2124#\0\1\0\0\0\272\263\2\0\2344#\0\1\0\0\0"..., 4096, 3800674304) = 4096
> pread64(4, "\n\363\254\0T\1\0\0!\277\301~\3074#\0\1\0\0\0\"\310\16\0\3514#\0\1\0\0\0"..., 4096, 2717888512) = 4096
> pread64(4, "\n\363Q\1T\1\0\0\255\255\255\255\307B#\0\1\0\0\0S\231\6\0\313B#\0\1\0\0\0"..., 4096, 4038303744) = 4096
> pread64(4, "\n\363/\0T\1\0\0&o\r\201e^#\0\1\0\0\0\233\237\f\0\202^#\0\1\0\0\0"..., 4096, 4184920064) = 4096
> pread64(4, "\n\363\244\0T\1\0\0\305\305\305\305\266b#\0\1\0\0\0n]\21\0\322b#\0\1\0\0\0"..., 4096, 6043922432) = 4096
> pread64(4, "\n\363\"\1T\1\0\0\0\0\0\2\310q#\0\1\0\0\0 \337\26\0\335q#\0\1\0\0\0"..., 4096, 4194627584) = 4096
> pread64(4, "\n\363\376\0T\1\0\0\0\0\0\2F\211#\0\1\0\0\0 \256/\0M\211#\0\1\0\0\0"..., 4096, 6049001472) = 4096
> pread64(4, "\n\363]\0T\1\0\0\0\0\0\1\16\235#\0\1\0\0\0\272_\30\0\20\235#\0\1\0\0\0"..., 4096, 6567260160) = 4096
> pread64(4, "\n\363\343\0T\1\0\0eeee\317\243#\0\1\0\0\0\264\222\20\0\332\243#\0\1\0\0\0"..., 4096, 6083014656) = 4096
> pread64(4, "\n\363\332\0T\1\0\0\36\255\246d\216\264#\0\1\0\0\0.\212\17\0\242\264#\0\1\0\0\0"..., 4096, 8083423232) = 4096
> pread64(4, "\n\363\334\0T\1\0\0EEEEW\306#\0\1\0\0\0^\273\20\0g\306#\0\1\0\0\0"..., 4096, 6083252224) = 4096
> pread64(4, "\n\363\333\0T\1\0\0\0\0\0\0@\330#\0\1\0\0\0Z\353\23\0N\330#\0\1\0\0\0"..., 4096, 7774924800) = 4096
> pread64(4, "\n\363x\0T\1\0\0{{{{H\347#\0\1\0\0\0\232~\27\0M\347#\0\1\0\0\0"..., 4096, 4735361024) = 4096
> pread64(4, "\n\363\367\0T\1\0\0\377\377\377\377#\361#\0\1\0\0\0O\t\27\0b\361#\0\1\0\0\0"..., 4096, 4448808960) = 4096
> pread64(4, "\n\363\t\0T\1\0\0aaaa\271\2$\0\1\0\0\0\351I\21\0\340\2$\0\1\0\0\0"..., 4096, 5948190720) = 4096
> pread64(4, "\n\363\25\0T\1\0\0\0\0\0\0@\3$\0\1\0\0\0t\33.\0D\3$\0\1\0\0\0"..., 4096, 1714593792) = 4096
> pread64(4, "\n\363b\0T\1\0\0\0\0\0\0g\4$\0\1\0\0\0\332\366\0\0n\4$\0\1\0\0\0"..., 4096, 1635794944) = 4096
> pread64(4, "\n\363)\1T\1\0\0\0\0\0\0%\v$\0\1\0\0\0e\230\17\0003\v$\0\1\0\0\0"..., 4096, 2111643648) = 4096
> pread64(4, "\n\363<\0T\1\0\0\0\0\0\0D $\0\1\0\0\0\1\355 \0T $\0\1\0\0\0"..., 4096, 165666816) = 4096
> brk(0x10000d2a000) = 0x10000d2a000
> pread64(4, "\n\363\n\0T\1\0\0\301\301\301\3012%$\0\1\0\0\0r\37\22\0004%$\0\1\0\0\0"..., 4096, 4871196672) = 4096
> pread64(4, "\n\363\225\0T\1\0\0\0\0\0\1(&$\0\1\0\0\0\213\"\22\0@&$\0\1\0\0\0"..., 4096, 5401088000) = 4096
> ^Cstrace: Process 221210 detached
>
> $ dpkg -l e2fsprogs
> ii e2fsprogs 1.43.3-1 sparc64 ext2/ext3/ext4 file system utilities
If it's fsck that's hung, then that's a different issue than the one Ted
is reporting, where it is fio that is hung. I've run your test case 10
times this morning, and I haven't seen any issues.
I'll try with two nvme devices and see if that changes the situation for
me. Ted, what devices are you using for data/scratch?
--
Jens Axboe
On Wed, Oct 12, 2016 at 08:39:57PM -0600, Jens Axboe wrote:
>
> If I just wanted to invoke generic/299 with a setup similar to yours,
> what should I do? I can attempt to reproduce it.
Well, if you're willing to get a Google Compute Engine account (it
will come with a $300 free trial credit, which is plenty for this
exercise), the simplest way to do an exact reproduction would be to
follow the instructions here:
https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
If you use the current default gce-xfstests image, which is
xfstests-201610131131 in the xfstests-cloud project, what I run which
triggers it quite reliably is:
gce-xfstests -C 10 generic/299
On a run that I just did using a 4.8.0+ kernel, it ran 12 times (10
times with the 4k configuration, and then 2 times with the 1k
configuration) before it hung.
FSTESTIMG: xfstests-cloud/xfstests-201610131131
FSTESTVER: e2fsprogs v1.43.3-30-g8df85fb (Sun, 4 Sep 2016 21:32:35 -0400)
FSTESTVER: fio fio-2.14-27-gafd2cef (Wed, 12 Oct 2016 08:59:25 -0600)
FSTESTVER: quota 81aca5c (Tue, 12 Jul 2016 16:15:45 +0200)
FSTESTVER: xfsprogs v4.5.0 (Tue, 15 Mar 2016 15:25:56 +1100)
FSTESTVER: xfstests-bld de1a881 (Fri, 30 Sep 2016 18:23:15 -0400)
FSTESTVER: xfstests linux-v3.8-1200-g218a62d (Thu, 29 Sep 2016 23:30:27 -0400)
FSTESTVER: kernel 4.8.0-ext4-14234-gfb44543 #21 SMP Wed Oct 12 23:40:33 EDT 2016 x86_64
So on a system which was hung in this way, I ran "apt-get update ;
apt-get install gdb", and uploaded the build tree (plus source code
for the fio used to build the image) to /root/fio.
root@xfstests-201610131148:~# ps axm -o pid,tid,class,pri,psr,pcpu,stat,wchan:20,comm
PID TID CLS PRI PSR %CPU STAT WCHAN COMMAND
....
19781 - - - - 1.6 - - fio
- 19781 TS 19 0 1.6 Ssl hrtimer_nanosleep -
- 19800 TS 19 0 0.0 Ssl futex_wait_queue_me -
- 19801 TS 19 0 0.0 Ssl futex_wait_queue_me -
- 19802 TS 19 0 0.0 Ssl futex_wait_queue_me -
- 19803 TS 19 0 0.0 Ssl futex_wait_queue_me -
....
root@xfstests-201610131148:~# gdb /root/fio/fio
....
(gdb) dir /root/fio
Source directories searched: /root/fio:$cdir:$cwd
(gdb) attach 19781
Attaching to program: /root/fio/fio, process 19781
....
(gdb) thread apply all bt
Thread 5 (Thread 0x7fb3ddfed700 (LWP 19800)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfed700) at pthread_create.c:309
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 4 (Thread 0x7fb3ddfe5700 (LWP 19801)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfe5700) at pthread_create.c:309
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 3 (Thread 0x7fb3ddfdd700 (LWP 19802)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfdd700) at pthread_create.c:309
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7fb3ddfd5700 (LWP 19803)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfd5700) at pthread_create.c:309
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7fb3de190740 (LWP 19781)):
#0 0x00007fb3dd35ff2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fb3dd387fb4 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x000000000045b541 in thread_main (data=0x18fca000) at backend.c:1738
#3 0x000000000045d4d3 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
#4 0x000000000045d80d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
#5 0x000000000040cbc8 in main (argc=2, argv=0x7ffd24b952e8, envp=<optimized out>) at fio.c:65
(gdb) thread apply all bt full
Thread 5 (Thread 0x7fb3ddfed700 (LWP 19800)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
list = {next = 0x7fb3ddfeced0, prev = 0x7fb3ddfeced0}
td = 0x7fb3c7590610
io_u = 0x202fac0
ret = 0
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfed700) at pthread_create.c:309
__res = <optimized out>
pd = 0x7fb3ddfed700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500339456, -2782054375790187915, 0, 140410502103136, 140410120428088,
140410500339456, 2739195402349996661, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
Thread 4 (Thread 0x7fb3ddfe5700 (LWP 19801)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
list = {next = 0x7fb3ddfe4ed0, prev = 0x7fb3ddfe4ed0}
td = 0x7fb3c7590610
io_u = 0x203d280
ret = 0
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfe5700) at pthread_create.c:309
__res = <optimized out>
pd = 0x7fb3ddfe5700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500306688, -2782054375790187915, 0, 140410502103136, 140410120428088,
140410500306688, 2739195406644963957, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
Thread 3 (Thread 0x7fb3ddfdd700 (LWP 19802)):
---Type <return> to continue, or q <return> to quit---
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
list = {next = 0x7fb3ddfdced0, prev = 0x7fb3ddfdced0}
td = 0x7fb3c7590610
io_u = 0x203cf80
ret = 0
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfdd700) at pthread_create.c:309
__res = <optimized out>
pd = 0x7fb3ddfdd700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500273920, -2782054375790187915, 0, 140410502103136, 140410120428088,
140410500273920, 2739195393760062069, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
Thread 2 (Thread 0x7fb3ddfd5700 (LWP 19803)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
list = {next = 0x7fb3ddfd4ed0, prev = 0x7fb3ddfd4ed0}
td = 0x7fb3c7590610
io_u = 0x203ccc0
ret = 0
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfd5700) at pthread_create.c:309
__res = <optimized out>
pd = 0x7fb3ddfd5700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500241152, -2782054375790187915, 0, 140410502103136, 140410120428088,
140410500241152, 2739195398055029365, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
Thread 1 (Thread 0x7fb3de190740 (LWP 19781)):
#0 0x00007fb3dd35ff2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
No locals.
---Type <return> to continue, or q <return> to quit---
#1 0x00007fb3dd387fb4 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
ts = {tv_sec = 0, tv_nsec = 1000000}
#2 0x000000000045b541 in thread_main (data=0x18fca000) at backend.c:1738
verify_bytes = 419209216
elapsed_us = {0, 0, 0}
td = 0x7fb3c7590610
sk_out = 0x2039600
clear_state = 33789440
#3 0x000000000045d4d3 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
ret = <optimized out>
pid = <optimized out>
this_jobs = 6
fd = 0x2029720
map = {0x7fb3c754c000, 0x7fb3c7559ad0, 0x7fb3c75675a0, 0x7fb3c7575070, 0x7fb3c7582b40, 0x7fb3c7590610, 0x0 <repeats 11 times>,
0x10000000, 0x0 <repeats 771 times>, 0x7fb3ddf83c1c <check_match+300>, 0x0, 0xd827524, 0x5, 0x7, 0x7fb3de192040,
0x7fb3ddf8458e <do_lookup_x+2334>, 0x0, 0x7ffd24b8ea80, 0x7fb3dd2a9d28, 0x7fb3dd2af020, 0x7ffd24b8eb90, 0x3609d4,
0x7ffd24b8eb80, 0x0, 0x0, 0x7fb3de1920c0, 0x7fb3de1939d8, 0x406851, 0x7fb3dd2b6d78, 0x401e18, 0x100000000, 0x100000375, 0x0,
0x7fb3de1920c0, 0x7ffd24b8ec20, 0x7fb3de19d500, 0x7ffd24b8ec48, 0x7fb3de19d1a8, 0x1,
0x7fb3ddf8473d <_dl_lookup_symbol_x+349>, 0x0, 0x7fb3de1920c0, 0x1, 0x0, 0x1, 0x7fb3de19d1a8, 0x0, 0x0, 0x0, 0x0, 0x0,
0x7fb3de19d500, 0x7ffd24b8eb90, 0x7ffd24b8eb80, 0xd827524, 0x406851, 0xffffffff, 0x0, 0x7fb3dd2af020, 0x7fb3de1939d8,
0x0 <repeats 11 times>, 0x698620 <[email protected]>, 0x0, 0x7ffd24b8ecf8, 0x7ffd24b8ecf8, 0x3, 0x7ffd24b8f6f0,
0x7fb3ddf88ecc <_dl_fixup+508>, 0x9000000000001, 0x0, 0x8ff5c28f5c28f, 0x7fb3dd2af020, 0x140, 0x7ffd24b8f6f0,
0x7ffd24b91720, 0x7fb3ddf8f2e5 <_dl_runtime_resolve+53>, 0x0, 0x20262a0, 0x2bc4, 0x7ffd24b92530, 0x7ffd24b8f960,
0x8ff5c28f5c200, 0x2026270, 0x7fb3de19d1a8, 0x54, 0x46144b <fio_test_cconv+3595>, 0x0, 0x7ffd24b8ece0,
0x0 <repeats 16 times>, 0x2026210, 0x0, 0x2026270, 0x0, 0x1, 0x40000000400, 0x1, 0x0, 0x1, 0x100000001, 0x100000000, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x100000001000, 0x1000, 0x0 <repeats 14 times>, 0x1, 0x0, 0x100000000, 0x100000001, 0x0, 0x0,
0x0, 0x100000001, 0x400, 0x0 <repeats 87 times>, 0x100000000, 0x100000000, 0x0, 0x89, 0x0 <repeats 20 times>, 0x6400000000,
0x6400000064, 0x200000, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1f400000000, 0x1000001f4...}
this_start = {tv_sec = 0, tv_usec = 0}
left = <optimized out>
td = 0x7fb3c7590610
i = 5
todo = 6
nr_running = 0
m_rate = 0
t_rate = 0
nr_started = 6
spent = <optimized out>
#4 0x000000000045d80d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
td = <optimized out>
i = <optimized out>
#5 0x000000000040cbc8 in main (argc=2, argv=0x7ffd24b952e8, envp=<optimized out>) at fio.c:65
ret = 1
> > As I said, I can attach to the hanging fio using a gdb and give you
> > stackdumps for all of the threads if that would be helpful.
>
> That would definitely be helpful as well, especially if I cannot
> reproduce it.
Here you go!
If I kill -9 the fio process, the tests continue to run, and there are
no kernel complaints, so I'm pretty sure this is a fio issue.
Many thanks!!
- Ted
P.S. In answer to your question from another e-mail message on this
thread, the block device involved is a Google Compute Engine
Persistent Disk device backed by an SSD. To a very *rough*
approximation, think of it as something kinda like a qemu qcow image
on a very fast flash device in terms of performance. I've seen
the problem before on a 5400 RPM laptop drive, but it's ***much***
rarer. It reproduces much more easily on faster block devices.
P.P.S. And here's the fio job file that it was running:
###########
# 299 test fio activity
# Filenames derived from jobsname and jobid like follows:
# ..
[global]
ioengine=libaio
bs=128k
directory=/xt-vdc
filesize=5368709120
size=999G
iodepth=128*1
continue_on_error=write
ignore_error=,ENOSPC
error_dump=0
create_on_open=1
fallocate=none
exitall=1
## Perform direct aio, to files which may be truncated
## by external task
[direct_aio]
direct=1
buffered=0
numjobs=4
rw=randwrite
runtime=100*1
time_based
# Perform direct aio and verify data
# This test case should check use-after-free issues
[aio-dio-verifier]
numjobs=1
verify=crc32c-intel
verify_fatal=1
verify_dump=1
verify_backlog=1024
verify_async=4
verifysort=1
direct=1
bs=4k
rw=randrw
filename=aio-dio-verifier
# Perform buffered aio and verify data
# This test case should check use-after-free issues
[buffered-aio-verifier]
numjobs=1
verify=crc32c-intel
verify_fatal=1
verify_dump=1
verify_backlog=1024
verify_async=4
verifysort=1
direct=0
buffered=1
bs=4k
rw=randrw
filename=buffered-aio-verifier
Jens, ping?
Have you had a chance to take a look at the stack traces? Is there
any other debugging information I can get for you?
Thanks,
- Ted
On Thu, Oct 13, 2016 at 07:19:23PM -0400, Theodore Ts'o wrote:
> On Wed, Oct 12, 2016 at 08:39:57PM -0600, Jens Axboe wrote:
> >
> > If I just wanted to invoke generic/299 with a setup similar to yours,
> > what should I do? I can attempt to reproduce it.
>
> Well, if you're willing to get a Google Compute Engine account (it
> will come with a $300 free trial credit, which is plenty for this
> exercise), the simplest way to do an exact reproduction would be to
> follow the instructions here:
>
> https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
>
> If you use the current default gce-xfstests image, which is
> xfstests-201610131131 in the xfstests-cloud project, what I run which
> triggers it quite reliably is:
>
> gce-xfstests -C 10 generic/299
>
> On a run that I just did using a 4.8.0+ kernel, it ran 12 times (10
> times with the 4k configuration, and then 2 times with the 1k
> configuration) before it hung.
>
> FSTESTIMG: xfstests-cloud/xfstests-201610131131
> FSTESTVER: e2fsprogs v1.43.3-30-g8df85fb (Sun, 4 Sep 2016 21:32:35 -0400)
> FSTESTVER: fio fio-2.14-27-gafd2cef (Wed, 12 Oct 2016 08:59:25 -0600)
> FSTESTVER: quota 81aca5c (Tue, 12 Jul 2016 16:15:45 +0200)
> FSTESTVER: xfsprogs v4.5.0 (Tue, 15 Mar 2016 15:25:56 +1100)
> FSTESTVER: xfstests-bld de1a881 (Fri, 30 Sep 2016 18:23:15 -0400)
> FSTESTVER: xfstests linux-v3.8-1200-g218a62d (Thu, 29 Sep 2016 23:30:27 -0400)
> FSTESTVER: kernel 4.8.0-ext4-14234-gfb44543 #21 SMP Wed Oct 12 23:40:33 EDT 2016 x86_64
>
> So on a system which was hung in this way, I ran "apt-get update ;
> apt-get install gdb", and uploaded the build tree (plus source code
> for the fio used to build the image) to /root/fio.
>
> root@xfstests-201610131148:~# ps axm -o pid,tid,class,pri,psr,pcpu,stat,wchan:20,comm
> PID TID CLS PRI PSR %CPU STAT WCHAN COMMAND
> ....
> 19781 - - - - 1.6 - - fio
> - 19781 TS 19 0 1.6 Ssl hrtimer_nanosleep -
> - 19800 TS 19 0 0.0 Ssl futex_wait_queue_me -
> - 19801 TS 19 0 0.0 Ssl futex_wait_queue_me -
> - 19802 TS 19 0 0.0 Ssl futex_wait_queue_me -
> - 19803 TS 19 0 0.0 Ssl futex_wait_queue_me -
> ....
> root@xfstests-201610131148:~# gdb /root/fio/fio
> ....
> (gdb) dir /root/fio
> Source directories searched: /root/fio:$cdir:$cwd
> (gdb) attach 19781
> Attaching to program: /root/fio/fio, process 19781
> ....
> (gdb) thread apply all bt
>
> Thread 5 (Thread 0x7fb3ddfed700 (LWP 19800)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfed700) at pthread_create.c:309
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Thread 4 (Thread 0x7fb3ddfe5700 (LWP 19801)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfe5700) at pthread_create.c:309
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Thread 3 (Thread 0x7fb3ddfdd700 (LWP 19802)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfdd700) at pthread_create.c:309
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Thread 2 (Thread 0x7fb3ddfd5700 (LWP 19803)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfd5700) at pthread_create.c:309
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Thread 1 (Thread 0x7fb3de190740 (LWP 19781)):
> #0 0x00007fb3dd35ff2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> #1 0x00007fb3dd387fb4 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
> #2 0x000000000045b541 in thread_main (data=0x18fca000) at backend.c:1738
> #3 0x000000000045d4d3 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
> #4 0x000000000045d80d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
> #5 0x000000000040cbc8 in main (argc=2, argv=0x7ffd24b952e8, envp=<optimized out>) at fio.c:65
>
> (gdb) thread apply all bt full
>
> Thread 5 (Thread 0x7fb3ddfed700 (LWP 19800)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> No locals.
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> list = {next = 0x7fb3ddfeced0, prev = 0x7fb3ddfeced0}
> td = 0x7fb3c7590610
> io_u = 0x202fac0
> ret = 0
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfed700) at pthread_create.c:309
> __res = <optimized out>
> pd = 0x7fb3ddfed700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500339456, -2782054375790187915, 0, 140410502103136, 140410120428088,
> 140410500339456, 2739195402349996661, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> __PRETTY_FUNCTION__ = "start_thread"
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> No locals.
>
> Thread 4 (Thread 0x7fb3ddfe5700 (LWP 19801)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> No locals.
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> list = {next = 0x7fb3ddfe4ed0, prev = 0x7fb3ddfe4ed0}
> td = 0x7fb3c7590610
> io_u = 0x203d280
> ret = 0
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfe5700) at pthread_create.c:309
> __res = <optimized out>
> pd = 0x7fb3ddfe5700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500306688, -2782054375790187915, 0, 140410502103136, 140410120428088,
> 140410500306688, 2739195406644963957, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> __PRETTY_FUNCTION__ = "start_thread"
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> No locals.
>
> Thread 3 (Thread 0x7fb3ddfdd700 (LWP 19802)):
> ---Type <return> to continue, or q <return> to quit---
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> No locals.
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> list = {next = 0x7fb3ddfdced0, prev = 0x7fb3ddfdced0}
> td = 0x7fb3c7590610
> io_u = 0x203cf80
> ret = 0
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfdd700) at pthread_create.c:309
> __res = <optimized out>
> pd = 0x7fb3ddfdd700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500273920, -2782054375790187915, 0, 140410502103136, 140410120428088,
> 140410500273920, 2739195393760062069, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> __PRETTY_FUNCTION__ = "start_thread"
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> No locals.
>
> Thread 2 (Thread 0x7fb3ddfd5700 (LWP 19803)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> No locals.
> #1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
> list = {next = 0x7fb3ddfd4ed0, prev = 0x7fb3ddfd4ed0}
> td = 0x7fb3c7590610
> io_u = 0x203ccc0
> ret = 0
> #2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfd5700) at pthread_create.c:309
> __res = <optimized out>
> pd = 0x7fb3ddfd5700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140410500241152, -2782054375790187915, 0, 140410502103136, 140410120428088,
> 140410500241152, 2739195398055029365, 2739194912765274741}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> __PRETTY_FUNCTION__ = "start_thread"
> #3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> No locals.
>
> Thread 1 (Thread 0x7fb3de190740 (LWP 19781)):
> #0 0x00007fb3dd35ff2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> No locals.
> ---Type <return> to continue, or q <return> to quit---
> #1 0x00007fb3dd387fb4 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
> ts = {tv_sec = 0, tv_nsec = 1000000}
> #2 0x000000000045b541 in thread_main (data=0x18fca000) at backend.c:1738
> verify_bytes = 419209216
> elapsed_us = {0, 0, 0}
> td = 0x7fb3c7590610
> sk_out = 0x2039600
> clear_state = 33789440
> #3 0x000000000045d4d3 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
> ret = <optimized out>
> pid = <optimized out>
> this_jobs = 6
> fd = 0x2029720
> map = {0x7fb3c754c000, 0x7fb3c7559ad0, 0x7fb3c75675a0, 0x7fb3c7575070, 0x7fb3c7582b40, 0x7fb3c7590610, 0x0 <repeats 11 times>,
> 0x10000000, 0x0 <repeats 771 times>, 0x7fb3ddf83c1c <check_match+300>, 0x0, 0xd827524, 0x5, 0x7, 0x7fb3de192040,
> 0x7fb3ddf8458e <do_lookup_x+2334>, 0x0, 0x7ffd24b8ea80, 0x7fb3dd2a9d28, 0x7fb3dd2af020, 0x7ffd24b8eb90, 0x3609d4,
> 0x7ffd24b8eb80, 0x0, 0x0, 0x7fb3de1920c0, 0x7fb3de1939d8, 0x406851, 0x7fb3dd2b6d78, 0x401e18, 0x100000000, 0x100000375, 0x0,
> 0x7fb3de1920c0, 0x7ffd24b8ec20, 0x7fb3de19d500, 0x7ffd24b8ec48, 0x7fb3de19d1a8, 0x1,
> 0x7fb3ddf8473d <_dl_lookup_symbol_x+349>, 0x0, 0x7fb3de1920c0, 0x1, 0x0, 0x1, 0x7fb3de19d1a8, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x7fb3de19d500, 0x7ffd24b8eb90, 0x7ffd24b8eb80, 0xd827524, 0x406851, 0xffffffff, 0x0, 0x7fb3dd2af020, 0x7fb3de1939d8,
> 0x0 <repeats 11 times>, 0x698620 <[email protected]>, 0x0, 0x7ffd24b8ecf8, 0x7ffd24b8ecf8, 0x3, 0x7ffd24b8f6f0,
> 0x7fb3ddf88ecc <_dl_fixup+508>, 0x9000000000001, 0x0, 0x8ff5c28f5c28f, 0x7fb3dd2af020, 0x140, 0x7ffd24b8f6f0,
> 0x7ffd24b91720, 0x7fb3ddf8f2e5 <_dl_runtime_resolve+53>, 0x0, 0x20262a0, 0x2bc4, 0x7ffd24b92530, 0x7ffd24b8f960,
> 0x8ff5c28f5c200, 0x2026270, 0x7fb3de19d1a8, 0x54, 0x46144b <fio_test_cconv+3595>, 0x0, 0x7ffd24b8ece0,
> 0x0 <repeats 16 times>, 0x2026210, 0x0, 0x2026270, 0x0, 0x1, 0x40000000400, 0x1, 0x0, 0x1, 0x100000001, 0x100000000, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x100000001000, 0x1000, 0x0 <repeats 14 times>, 0x1, 0x0, 0x100000000, 0x100000001, 0x0, 0x0,
> 0x0, 0x100000001, 0x400, 0x0 <repeats 87 times>, 0x100000000, 0x100000000, 0x0, 0x89, 0x0 <repeats 20 times>, 0x6400000000,
> 0x6400000064, 0x200000, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1f400000000, 0x1000001f4...}
> this_start = {tv_sec = 0, tv_usec = 0}
> left = <optimized out>
> td = 0x7fb3c7590610
> i = 5
> todo = 6
> nr_running = 0
> m_rate = 0
> t_rate = 0
> nr_started = 6
> spent = <optimized out>
> #4 0x000000000045d80d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
> td = <optimized out>
> i = <optimized out>
> #5 0x000000000040cbc8 in main (argc=2, argv=0x7ffd24b952e8, envp=<optimized out>) at fio.c:65
> ret = 1
>
> > > As I said, I can attach to the hanging fio using a gdb and give you
> > > stackdumps for all of the threads if that would be helpful.
> >
> > That would definitely be helpful as well, especially if I cannot
> > reproduce it.
>
> Here you go!
>
> If I kill -9 the fio process, the tests continue to run, and there are
> no kernel complaints, so I'm pretty sure this is a fio issue.
>
> Many thanks!!
>
> - Ted
>
> P.S. In answer to your question from another e-mail message on this
> thread, the block device involved is a Google Compute Engine
> Persistent Disk device backed by an SSD. To a very *rough*
> approximation, think of it as something kinda like a qemu qcow image
> on a very fast flash device in terms of performance. I've seen
> the problem before on a 5400 RPM laptop drive, but it's ***much***
> rarer. It reproduces much more easily on faster block devices.
>
> P.P.S. And here's the fio job file that it was running:
>
> ###########
> # 299 test fio activity
> # Filenames derived from jobsname and jobid like follows:
> # ..
> [global]
> ioengine=libaio
> bs=128k
> directory=/xt-vdc
> filesize=5368709120
> size=999G
> iodepth=128*1
> continue_on_error=write
> ignore_error=,ENOSPC
> error_dump=0
> create_on_open=1
> fallocate=none
> exitall=1
>
> ## Perform direct aio, to files which may be truncated
> ## by external task
> [direct_aio]
> direct=1
> buffered=0
> numjobs=4
> rw=randwrite
> runtime=100*1
> time_based
>
> # Perform direct aio and verify data
> # This test case should check use-after-free issues
> [aio-dio-verifier]
> numjobs=1
> verify=crc32c-intel
> verify_fatal=1
> verify_dump=1
> verify_backlog=1024
> verify_async=4
> verifysort=1
> direct=1
> bs=4k
> rw=randrw
> filename=aio-dio-verifier
>
> # Perform buffered aio and verify data
> # This test case should check use-after-free issues
> [buffered-aio-verifier]
> numjobs=1
> verify=crc32c-intel
> verify_fatal=1
> verify_dump=1
> verify_backlog=1024
> verify_async=4
> verifysort=1
> direct=0
> buffered=1
> bs=4k
> rw=randrw
> filename=buffered-aio-verifier
On 10/18/2016 12:01 PM, Theodore Ts'o wrote:
> Jens, ping?
>
> Have you had a chance to take a look at the stack traces? Is there
> any other debugging information I can get for you?
I'll take a look today. I agree, this definitely looks like a fio
bug. But not related to the mutex issue for the stat part, all verifier
threads are waiting to be woken up, but the main thread is done.
I'll see if I can reproduce. If not, I may ask you to dump more info
from the hung processes.
--
Jens Axboe
On 10/19/2016 08:06 AM, Jens Axboe wrote:
> On 10/18/2016 12:01 PM, Theodore Ts'o wrote:
>> Jens, ping?
>>
>> Have you had a chance to take a look at the stack traces? Is there
>> any other debugging information I can get for you?
>
> I'll take a look today. I agree, this definitely looks like a fio
> bug. But not related to the mutex issue for the stat part, all verifier
> threads are waiting to be woken up, but the main thread is done.
>
> I'll see if I can reproduce. If not, I may ask you to dump more info
> from the hung processes.
Been running for 4 hours now, in a loop, with bs=1k. I haven't been able
to reproduce. Before I go through the hassle of doing the google compute
engine thing, can you give me some details of the hardware?
Number of cores/nodes?
Memory size?
Rough speed and size of the device?
Any special mkfs options?
And whatever else might be relevant.
--
Jens Axboe
On Wed, Oct 19, 2016 at 11:49:12AM -0600, Jens Axboe wrote:
>
> Number of cores/nodes?
> Memory size?
I'm using a gce n1-standard-2 VM. So that's two CPUs and 7680M.
Each virtual CPU is implemented as a single hardware
hyper-thread on a 2.3 GHz Intel Xeon E5 v3 (Haswell). (I was using a
GCE zone that has Haswell processors; different GCE zones may have
different processors. See [1] and [2] for more details.)
[1] https://cloud.google.com/compute/docs/machine-types
[2] https://cloud.google.com/compute/docs/regions-zones/regions-zones
> Rough speed and size of the device?
I'm using a GCE PD backed by an SSD. To a first approximation, you can
think of it as a KVM qcow file stored on a fast flash device. I'm
running LVM on the disk, and fio is running on a 5 gig LVM volume.
> Any special mkfs options?
No. This particular error will trigger on 4k block file systems, 1k
block file systems, 4k file systems with journals disabled, etc. It's
fairly insensitive to the file system configuration.
> And whatever else might be relevant.
Note that the generic/299 test is running fio in an ENOSPC hitter
configuration, where there is an antagonist thread which is
constantly allocating all of the disk space available and then freeing
it all:
# FSQA Test No. 299
#
# AIO/DIO stress test
# Run random AIO/DIO activity and fallocate/truncate simultaneously
# Test will operate on huge sparsed files so ENOSPC is expected.
So some of the AIO/DIO operations will be failing with an error, and
I suspect that's very likely relevant to reproducing the failure.
The actual guts of the test from generic/299[1]:
[1] https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/tree/tests/generic/299
_workout()
{
echo ""
echo "Run fio with random aio-dio pattern"
echo ""
cat $fio_config >> $seqres.full
run_check $FIO_PROG $fio_config &
pid=$!
echo "Start fallocate/truncate loop"
for ((i=0; ; i++))
do
for ((k=1; k <= NUM_JOBS; k++))
do
$XFS_IO_PROG -f -c "falloc 0 $FILE_SIZE" \
$SCRATCH_MNT/direct_aio.$k.0 >> $seqres.full 2>&1
done
for ((k=1; k <= NUM_JOBS; k++))
do
$XFS_IO_PROG -c "truncate 0" \
$SCRATCH_MNT/direct_aio.$k.0 >> $seqres.full 2>&1
done
# Following like will check that pid is still run.
# Once fio exit we can stop fallocate/truncate loop
pgrep -f "$FIO_PROG" > /dev/null 2>&1 || break
done
wait $pid
}
So what's happening is that generic/299 is looping in the
fallocate/truncate loop until fio exits, but since fio never exits,
it ends up looping forever.
Cheers,
- Ted
On 10/19/2016 02:32 PM, Theodore Ts'o wrote:
> On Wed, Oct 19, 2016 at 11:49:12AM -0600, Jens Axboe wrote:
>>
>> Number of cores/nodes?
>> Memory size?
>
> I'm using a gce n1-standard-2 VM. So that's two CPUs and 7680M.
>
> Each virtual CPU is implemented as a single hardware
> hyper-thread on a 2.3 GHz Intel Xeon E5 v3 (Haswell). (I was using a
> GCE zone that has Haswell processors; different GCE zones may have
> different processors. See [1] and [2] for more details.)
>
> [1] https://cloud.google.com/compute/docs/machine-types
> [2] https://cloud.google.com/compute/docs/regions-zones/regions-zones
>
>> Rough speed and size of the device?
>
> I'm using a GCE PD backed by an SSD. To a first approximation, you can
> think of it as a KVM qcow file stored on a fast flash device. I'm
> running LVM on the disk, and fio is running on a 5 gig LVM volume.
>
>> Any special mkfs options?
>
> No. This particular error will trigger on 4k block file systems, 1k
> block file systems, 4k file systems with journals disabled, etc. It's
> fairly insensitive to the file system configuration.
>
>> And whatever else might be relevant.
>
> Note that the generic/299 test is running fio in an ENOSPC hitter
> configuration, where there is an antagonist thread which is
> constantly allocating all of the disk space available and then freeing
> it all:
>
> # FSQA Test No. 299
> #
> # AIO/DIO stress test
> # Run random AIO/DIO activity and fallocate/truncate simultaneously
> # Test will operate on huge sparsed files so ENOSPC is expected.
>
>
> So some of the AIO/DIO operations will be failing with an error, and
> I suspect that's very likely relevant to reproducing the failure.
>
> The actual guts of the test from generic/299[1]:
>
> [1] https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/tree/tests/generic/299
>
> _workout()
> {
> echo ""
> echo "Run fio with random aio-dio pattern"
> echo ""
> cat $fio_config >> $seqres.full
> run_check $FIO_PROG $fio_config &
> pid=$!
> echo "Start fallocate/truncate loop"
>
> for ((i=0; ; i++))
> do
> for ((k=1; k <= NUM_JOBS; k++))
> do
> $XFS_IO_PROG -f -c "falloc 0 $FILE_SIZE" \
> $SCRATCH_MNT/direct_aio.$k.0 >> $seqres.full 2>&1
> done
> for ((k=1; k <= NUM_JOBS; k++))
> do
> $XFS_IO_PROG -c "truncate 0" \
> $SCRATCH_MNT/direct_aio.$k.0 >> $seqres.full 2>&1
> done
> # Following like will check that pid is still run.
> # Once fio exit we can stop fallocate/truncate loop
> pgrep -f "$FIO_PROG" > /dev/null 2>&1 || break
> done
> wait $pid
> }
>
> So what's happening is that generic/299 is looping in the
> fallocate/truncate loop until fio exits, but since fio never exits,
> it ends up looping forever.
I'm setting up the GCE now, I've had the tests running for about 24h now
on another test box and haven't been able to trigger any hangs. I'll
match your setup as closely as I can, hopefully that'll work.
--
Jens Axboe
On Thu, Oct 20, 2016 at 08:22:00AM -0600, Jens Axboe wrote:
> > So what's happening is that generic/299 is looping in the
> > fallocate/truncate loop until fio exits, but since fio never exits,
> > it ends up looping forever.
>
> I'm setting up the GCE now, I've had the tests running for about 24h now
> on another test box and haven't been able to trigger any hangs. I'll
> match your setup as closely as I can, hopefully that'll work.
Any luck reproducing the problem?
On Wed, Oct 19, 2016 at 08:06:44AM -0600, Jens Axboe wrote:
>
> I'll take a look today. I agree, this definitely looks like a fio
> bug. But not related to the mutex issue for the stat part, all verifier
> threads are waiting to be woken up, but the main thread is done.
>
I was taking a closer look at this, and it does look like it's related
to the stat_mutex. The main thread (according to gdb) seems to be
stuck in this loop in backend.c line 1738 (in thread_main):
do {
check_update_rusage(td);
if (!fio_mutex_down_trylock(stat_mutex))
break;
usleep(1000); <----- line 1738
} while (1);
So it looks like it's not able to grab the stat_mutex. But I can't
figure out how the stat_mutex could be down. None of the stack
traces seem to show that, and I've looked at all of the places where
stat_mutex is taken, and it doesn't look like stat_mutex should ever
be down for more than, say, a second?
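(A side note for anyone following along: despite the name, fio's
stat_mutex is really a counting semaphore built on a pthread
mutex/cond pair, so "down" here just means the count has gone to zero
and nobody has upped it. Roughly this idea -- a paraphrase, not the
actual mutex.c source:

#include <pthread.h>

struct fio_sem_like {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int value;	/* how many downs are still available */
	int waiters;
};

static void sem_like_down(struct fio_sem_like *s)
{
	pthread_mutex_lock(&s->lock);
	while (!s->value) {
		/* count ourselves as a waiter and sleep until an up arrives */
		s->waiters++;
		pthread_cond_wait(&s->cond, &s->lock);
		s->waiters--;
	}
	s->value--;
	pthread_mutex_unlock(&s->lock);
}

static void sem_like_up(struct fio_sem_like *s)
{
	pthread_mutex_lock(&s->lock);
	/* wake a sleeper only if the count was zero and someone is waiting */
	if (!s->value && s->waiters)
		pthread_cond_signal(&s->cond);
	s->value++;
	pthread_mutex_unlock(&s->lock);
}

So "stat_mutex is down" really means its value is stuck at 0.)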
So as a temporary workaround, I'm considering adding a check to see if
we stay stuck in this loop for more than a thousand iterations, and if so, print
an error to stderr and then call _exit(1), or maybe just break out two
levels by jumping to line 1778 at "td_set_runstate(td, TD_FINISHING)"
and just give up on the usage statistics (since for xfstests we really
don't care about the usage stats).
- Ted
P.S. I can't see any way this could be happening other than perhaps a
pointer error that corrupted stat_mutex; nor can I see any way a thread
could legitimately leave stat_mutex down. WDYT?
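P.P.S. If it really is a stray pointer scribbling over stat_mutex, a
hardware watchpoint on the magic field might catch the culprit in the
act. Something like this is what I have in mind (just a thought,
untested):

(gdb) p stat_mutex->magic
(gdb) watch -l stat_mutex->magic
(gdb) continue

The magic value should never change after init, so any hit would be a
stray write.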
On Fri, Oct 21, 2016 at 06:15:51PM -0400, Theodore Ts'o wrote:
> I was taking a closer look at this, and it does look ike it's related
> to the stat_mutex. The main thread (according to gdb) seems to be
> stuck in this loop in backend.c line 1738 (in thread_main):
>
> do {
> check_update_rusage(td);
> if (!fio_mutex_down_trylock(stat_mutex))
> break;
> usleep(1000); <----- line 1738
> } while (1);
So I have something very strange to report. I sync'ed up to the
latest fio repo, at commit e291cff14e97feb3cf. The problem still
manifests with that commit. Given what I've observed with a thread
spinning in this do loop, I added this commit:
commit 0f2f71f51595f6b708b801f7ae1dc86c5b2f3705
Author: Theodore Ts'o <[email protected]>
Date: Sat Oct 22 10:32:41 2016 -0400
backend: if we can't grab stat_mutex, report a deadlock error and exit
Signed-off-by: Theodore Ts'o <[email protected]>
diff --git a/backend.c b/backend.c
index fb2a855..093b6a3 100644
--- a/backend.c
+++ b/backend.c
@@ -1471,6 +1471,7 @@ static void *thread_main(void *data)
struct thread_data *td = fd->td;
struct thread_options *o = &td->o;
struct sk_out *sk_out = fd->sk_out;
+ int deadlock_loop_cnt;
int clear_state;
int ret;
@@ -1731,11 +1732,17 @@ static void *thread_main(void *data)
* the rusage_sem, which would never get upped because
* this thread is waiting for the stat mutex.
*/
+ deadlock_loop_cnt = 0;
do {
check_update_rusage(td);
if (!fio_mutex_down_trylock(stat_mutex))
break;
usleep(1000);
+ if (deadlock_loop_cnt++ > 5000) {
+ log_err("fio seems to be stuck grabbing stat_mutex, forcibly exiting\n");
+ td->error = EDEADLOCK;
+ goto err;
+ }
} while (1);
if (td_read(td) && td->io_bytes[DDIR_READ])
With this commit, the fio run in the generic/299 test no longer hangs.
I've tried running it for a very long time, and it no longer reproduces at
all. Specifically, the log_err() and the EDEADLOCK error added by the
patch aren't triggering, and fio is no longer hanging. So merely
adding a loop counter seems to make the problem go away, which makes me
wonder whether we're seeing some kind of compiler or code generation
artifact. So I should mention which compiler I'm
currently using:
% schroot -c jessie64 -- gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Anyway, I have a workaround that seems to work for me, and which, even
if the deadlock_loop counter fires, will at least stop the test run
from hanging.
You may or may not want to include this in the fio upstream repo,
given that I can't explain why merely checking for the deadlock (or
the inability to grab the stat_mutex, anyway) makes the deadlock go
away. At least for the purposes of running the test, though, it does
seem to be a valid workaround.
Cheers,
- Ted
On 10/21/2016 04:15 PM, Theodore Ts'o wrote:
> On Thu, Oct 20, 2016 at 08:22:00AM -0600, Jens Axboe wrote:
>>> So what's happening is that generic/299 is looping in the
>>> fallocate/truncate loop until fio exits, but since fio never exits,
>>> it ends up looping forever.
>>
>> I'm setting up the GCE now, I've had the tests running for about 24h now
>> on another test box and haven't been able to trigger any hangs. I'll
>> match your setup as closely as I can, hopefully that'll work.
>
> Any luck reproducing the problem?
>
> On Wed, Oct 19, 2016 at 08:06:44AM -0600, Jens Axboe wrote:
>>
>> I'll take a look today. I agree, this definitely looks like a fio
>> bug. But not related to the mutex issue for the stat part, all verifier
>> threads are waiting to be woken up, but the main thread is done.
>>
>
> I was taking a closer look at this, and it does look like it's related
> to the stat_mutex. The main thread (according to gdb) seems to be
> stuck in this loop in backend.c line 1738 (in thread_main):
>
> do {
> check_update_rusage(td);
> if (!fio_mutex_down_trylock(stat_mutex))
> break;
> usleep(1000); <----- line 1738
> } while (1);
>
> So it looks like it's not able to grab the stat_mutex. But I can't
> figure out how the stat_mutex could be down. None of the stack
> traces seem to show that, and I've looked at all of the places where
> stat_mutex is taken, and it doesn't look like stat_mutex should ever
> be down for more than, say, a second?
>
> So as a temporary workaround, I'm considering adding a check to see if
> we stay stuck in this loop for more than a thousand iterations, and if so, print
> an error to stderr and then call _exit(1), or maybe just break out two
> levels by jumping to line 1778 at "td_set_runstate(td, TD_FINISHING)"
> and just give up on the usage statistics (since for xfstests we really
> don't care about the usage stats).
Very strange. Can you see who the owner is of stat_mutex->lock, that's
the pthread_mutex_t they are sleeping on.
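Something along these lines should show it, assuming a glibc/NPTL
build that records the owner TID inside the pthread mutex (untested,
and the internal field names can differ between glibc versions):

(gdb) p stat_mutex->value
(gdb) p stat_mutex->waiters
(gdb) p stat_mutex->lock.__data.__owner
(gdb) p stat_mutex->lock.__data.__lock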
For now, I'll apply the work-around you sent. I haven't been able to
reproduce this, but knowing that it's the stat_mutex will allow me to
better make up a test case to hit it.
--
Jens Axboe
On Sun, Oct 23, 2016 at 08:32:49AM -0600, Jens Axboe wrote:
>
> Very strange. Can you see who the owner is of stat_mutex->lock, that's
> the pthread_mutex_t they are sleeping on.
>
> For now, I'll apply the work-around you sent. I haven't been able to
> reproduce this, but knowing that it's the stat_mutex will allow me to
> better make up a test case to hit it.
Well, that's the weird thing. I can't see anything that could be
holding the stat_mutex. This is from a somewhat older version of fio
(fio-2.14-27-gafd2cef), but when things wedged, there were four
threads hanging here (with the exact same stack trace):
Thread 5 (Thread 0x7fb3ddfed700 (LWP 19800)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000043987b in verify_async_thread (data=0x7fb3c7590610) at verify.c:1241
#2 0x00007fb3dd85d0a4 in start_thread (arg=0x7fb3ddfed700) at pthread_create.c:309
#3 0x00007fb3dd38e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
... and the main thread was hanging here:
Thread 1 (Thread 0x7fb3de190740 (LWP 19781)):
#0 0x00007fb3dd35ff2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fb3dd387fb4 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x000000000045b541 in thread_main (data=0x18fca000) at backend.c:1738
#3 0x000000000045d4d3 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
#4 0x000000000045d80d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
#5 0x000000000040cbc8 in main (argc=2, argv=0x7ffd24b952e8, envp=<optimized out>) at fio.c:65
The verify threads were waiting on td->verify_cond, which it looks like
the main thread was supposed to wake up in its shutdown sequence when
it calls verify_async_exit(), or if there was work to be done via the
verify_io_u_async() end_io callback. But the verify threads don't
take the stat_mutex, and the only places that take the stat_mutex in
stat.c and backend.c don't hold it for long, and I can't see any
longjmps or other program flow discontinuities that could have left
stat_mutex locked.
I'm not an expert on pthreads debugging; is there some way to figure
out, a la CONFIG_PROVE_LOCKING, which thread took the mutex, and at what
line number? Or would we need to try hacking some kind of mutex
debugging code into fio_mutex_down(), et al.?
Cheers,
- Ted
OK, perhaps this is helpful. I reverted my last patch (so I could
repro the problem), and then added the following patch to fio/mutex.c
(see below).
When it hung, I got the following stack trace on the main thread:
(gdb) where
#0 0x00007ff8c018cf2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ff8c01b4fb4 in usleep (useconds=<optimized out>)
at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x000000000045b691 in thread_main (data=0x1bc64000) at backend.c:1738
#3 0x000000000045d623 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
#4 0x000000000045d95d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
#5 0x000000000040cc18 in main (argc=2, argv=0x7fff8446f0d8, envp=<optimized out>) at fio.c:65
.... and it looks like the reason why we can't take the mutex is that
somehow some thread managed to call stat_exit(), which calls
fio_mutex_down() followed by fio_mutex_remove():
(gdb) p (void *) stat_mutex->owner
$4 = (void *) 0x42fd60 <stat_exit+16>
How this could have happened, I have no idea. The only call to
stat_exit() appears to be at the end of fio_backend(), and the main
thread is still in the middle of calling thread_main() which is called
by run_threads().
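(For reference, my reading is that stat_exit() is literally just a
final down of stat_mutex followed by tearing it down, something like
the following -- paraphrasing, not the exact stat.c source:

void stat_exit(void)
{
	/*
	 * Take the stat mutex one last time before destroying it, so
	 * that nobody can still be inside a stats section.
	 */
	fio_mutex_down(stat_mutex);
	fio_mutex_remove(stat_mutex);
}

So once the recorded owner shows up as stat_exit(), that mutex is
never coming back.)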
Do you have any thoughts? Let me know if there's something I can try.
- Ted
diff --git a/mutex.c b/mutex.c
index 7580922..26e2955 100644
--- a/mutex.c
+++ b/mutex.c
@@ -108,6 +108,8 @@ int __fio_mutex_init(struct fio_mutex *mutex, int value)
mutex->value = value;
mutex->magic = FIO_MUTEX_MAGIC;
+ if (!value)
+ mutex->owner = __builtin_return_address(0);
ret = mutex_cond_init_pshared(&mutex->lock, &mutex->cond);
if (ret)
@@ -128,8 +130,11 @@ struct fio_mutex *fio_mutex_init(int value)
return NULL;
}
- if (!__fio_mutex_init(mutex, value))
+ if (!__fio_mutex_init(mutex, value)) {
+ if (!value)
+ mutex->owner = __builtin_return_address(0);
return mutex;
+ }
fio_mutex_remove(mutex);
return NULL;
@@ -194,7 +199,13 @@ bool fio_mutex_down_trylock(struct fio_mutex *mutex)
pthread_mutex_lock(&mutex->lock);
if (mutex->value) {
+ if (mutex->value > 1) {
+ log_err("Completely bogus mutex value?!? (%d, %p)\n",
+ mutex->value, mutex->owner);
+ abort();
+ }
mutex->value--;
+ mutex->owner = __builtin_return_address(0);
ret = false;
}
pthread_mutex_unlock(&mutex->lock);
@@ -215,6 +226,7 @@ void fio_mutex_down(struct fio_mutex *mutex)
}
mutex->value--;
+ mutex->owner = __builtin_return_address(0);
pthread_mutex_unlock(&mutex->lock);
}
@@ -228,6 +240,8 @@ void fio_mutex_up(struct fio_mutex *mutex)
read_barrier();
if (!mutex->value && mutex->waiters)
do_wake = 1;
+ if (!mutex->value)
+ mutex->owner = 0;
mutex->value++;
pthread_mutex_unlock(&mutex->lock);
diff --git a/mutex.h b/mutex.h
index 54009ba..ea43845 100644
--- a/mutex.h
+++ b/mutex.h
@@ -13,6 +13,7 @@ struct fio_mutex {
int value;
int waiters;
int magic;
+ unsigned long owner;
};
struct fio_rwlock {
P.P.S. As before, the only other threads are the verify threads:
(gdb) thread apply all bt
Thread 5 (Thread 0x7ff8c0e1a700 (LWP 13147)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000004398cb in verify_async_thread (data=0x7ff8aa3bd610) at verify.c:1241
#2 0x00007ff8c068a0a4 in start_thread (arg=0x7ff8c0e1a700) at pthread_create.c:309
#3 0x00007ff8c01bb62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 4 (Thread 0x7ff8c0e12700 (LWP 13148)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000004398cb in verify_async_thread (data=0x7ff8aa3bd610) at verify.c:1241
#2 0x00007ff8c068a0a4 in start_thread (arg=0x7ff8c0e12700) at pthread_create.c:309
#3 0x00007ff8c01bb62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 3 (Thread 0x7ff8c0e0a700 (LWP 13149)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000004398cb in verify_async_thread (data=0x7ff8aa3bd610) at verify.c:1241
#2 0x00007ff8c068a0a4 in start_thread (arg=0x7ff8c0e0a700) at pthread_create.c:309
#3 0x00007ff8c01bb62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7ff8c0e02700 (LWP 13150)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000004398cb in verify_async_thread (data=0x7ff8aa3bd610) at verify.c:1241
#2 0x00007ff8c068a0a4 in start_thread (arg=0x7ff8c0e02700) at pthread_create.c:309
#3 0x00007ff8c01bb62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ff8c0fbd740 (LWP 13139)):
#0 0x00007ff8c018cf2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ff8c01b4fb4 in usleep (useconds=<optimized out>)
at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x000000000045b691 in thread_main (data=0x1bc64000) at backend.c:1738
#3 0x000000000045d623 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
#4 0x000000000045d95d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
#5 0x000000000040cc18 in main (argc=2, argv=0x7fff8446f0d8, envp=<optimized out>) at fio.c:65
(gdb)
On 10/23/2016 03:24 PM, Theodore Ts'o wrote:
> OK, perhaps this is helpful. I reverted my last patch (so I could
> repro the problem), and then added the following patch to fio/mutex.c
> (see below).
>
> When it hung, I got the following stack trace on the main thread:
>
> (gdb) where
> #0 0x00007ff8c018cf2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> #1 0x00007ff8c01b4fb4 in usleep (useconds=<optimized out>)
> at ../sysdeps/unix/sysv/linux/usleep.c:32
> #2 0x000000000045b691 in thread_main (data=0x1bc64000) at backend.c:1738
> #3 0x000000000045d623 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
> #4 0x000000000045d95d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
> #5 0x000000000040cc18 in main (argc=2, argv=0x7fff8446f0d8, envp=<optimized out>) at fio.c:65
>
> .... and it looks like the reason why we can't take the mutex is that
> somehow some thread managed to call stat_exit(), which calls
> fio_mutex_down() followed by fio_mutex_remove():
>
> (gdb) p (void *) stat_mutex->owner
> $4 = (void *) 0x42fd60 <stat_exit+16>
>
> How this could have happened, I have no idea. The only call to
> stat_exit() appears to be at the end of fio_backend(), and the main
> thread is still in the middle of calling thread_main() which is called
> by run_threads().
>
> Do you have any thoughts? Let me know if there's something I can try.
Wow, that is very odd. That should only be possible if the main thread
exits too early, and calls stat_exit() while we still have jobs running.
That should trigger other funkiness too. I have no idea how that is
happening, but it obviously is since your debugging patch showed that
stat_exit() owns the mutex.
But this gives me more to work from, very useful, thanks Ted! I'll pound
on this tomorrow.
--
Jens Axboe
I enabled some more debugging and it's become more clear what's going
on. (See attached for the full log).
The main issue seems to be that once one of the fio jobs is done, it
kills off the other threads (actually, we're using processes):
process 31848 terminate group_id=0
process 31848 setting terminate on direct_aio/31846
process 31848 setting terminate on direct_aio/31848
process 31848 setting terminate on direct_aio/31849
process 31848 setting terminate on direct_aio/31851
process 31848 setting terminate on aio-dio-verifier/31852
process 31848 setting terminate on buffered-aio-verifier/31854
process 31851 pid=31851: runstate RUNNING -> FINISHING
process 31851 terminate group_id=0
process 31851 setting terminate on direct_aio/31846
process 31851 setting terminate on direct_aio/31848
process 31851 setting terminate on direct_aio/31849
process 31851 setting terminate on direct_aio/31851
process 31851 setting terminate on aio-dio-verifier/31852
process 31851 setting terminate on buffered-aio-verifier/31854
process 31852 pid=31852: runstate RUNNING -> FINISHING
process 31846 pid=31846: runstate RUNNING -> FINISHING
...
but one or more of the threads doesn't exit within 60 seconds:
fio: job 'direct_aio' (state=5) hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
process 31794 pid=31849: runstate RUNNING -> REAPED
fio: job 'buffered-aio-verifier' (state=5) hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
process 31794 pid=31854: runstate RUNNING -> REAPED
process 31794 terminate group_id=-1
The main thread then prints all of the statistics, and calls stat_exit():
stat_exit called by tid: 31794 <---- debugging message which prints gettid()
Unfortunately, these processes aren't actually killed; they are
marked as reaped, but they are still in the process listing:
root@xfstests:~# ps augxww | grep fio
root 1585 0.0 0.0 0 0 ? S< 18:45 0:00 [dm_bufio_cache]
root 7191 0.0 0.0 12732 2200 pts/1 S+ 23:05 0:00 grep fio
root 31849 1.5 0.2 407208 18876 ? Ss 22:36 0:26 /root/xfstests/bin/fio /tmp/31503.fio
root 31854 1.2 0.1 398480 10240 ? Ssl 22:36 0:22 /root/xfstests/bin/fio /tmp/31503.fio
And if you attach to them with gdb, they are spinning trying to grab
stat_mutex, which they can't get because the main thread has
already called stat_exit() and then exited. So these two threads
did eventually return, but only some time after the 60 seconds had
passed, and then they hung waiting for stat_mutex, which they will
never get because the main thread has already torn it down.
This probably also explains why you had trouble reproducing it. It
requires a disk whose performance is variable enough that under heavy
load, it might take more than 60 seconds for the direct_aio or
buffered-aio-verifier thread to close itself out.
And I suspect once the main thread exited, it probably also closed out
the debugging channel so the deadlock detector did probably trip, but
somehow we just didn't see the output.
So I can imagine some possible fixes. We could make the thread
timeout configurable, and/or increase it from 60 seconds to something like
300 seconds. We could make stat_exit() a no-op --- after all, if the
main thread is exiting, there's no real point to down and then destroy
the stat_mutex. And/or we could change the forced reap to send a kill
-9 to the thread instead of just marking it as reaped.
Cheers,
- Ted
On 10/23/2016 09:38 PM, Theodore Ts'o wrote:
> I enabled some more debugging and it's become more clear what's going
> on. (See attached for the full log).
>
> The main issue seems to be that once one of the fio jobs is done, it
> kills off the other threads (actually, we're using processes):
>
> process 31848 terminate group_id=0
> process 31848 setting terminate on direct_aio/31846
> process 31848 setting terminate on direct_aio/31848
> process 31848 setting terminate on direct_aio/31849
> process 31848 setting terminate on direct_aio/31851
> process 31848 setting terminate on aio-dio-verifier/31852
> process 31848 setting terminate on buffered-aio-verifier/31854
> process 31851 pid=31851: runstate RUNNING -> FINISHING
> process 31851 terminate group_id=0
> process 31851 setting terminate on direct_aio/31846
> process 31851 setting terminate on direct_aio/31848
> process 31851 setting terminate on direct_aio/31849
> process 31851 setting terminate on direct_aio/31851
> process 31851 setting terminate on aio-dio-verifier/31852
> process 31851 setting terminate on buffered-aio-verifier/31854
> process 31852 pid=31852: runstate RUNNING -> FINISHING
> process 31846 pid=31846: runstate RUNNING -> FINISHING
> ...
>
> but one or more of the threads doesn't exit within 60 seconds:
>
> fio: job 'direct_aio' (state=5) hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> process 31794 pid=31849: runstate RUNNING -> REAPED
> fio: job 'buffered-aio-verifier' (state=5) hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> process 31794 pid=31854: runstate RUNNING -> REAPED
> process 31794 terminate group_id=-1
>
> The main thread then prints all of the statistics, and calls stat_exit():
>
> stat_exit called by tid: 31794 <---- debugging message which prints gettid()
>
> Unfortunately, these processes aren't actually killed; they are
> marked as reaped, but they are still in the process listing:
>
> root@xfstests:~# ps augxww | grep fio
> root 1585 0.0 0.0 0 0 ? S< 18:45 0:00 [dm_bufio_cache]
> root 7191 0.0 0.0 12732 2200 pts/1 S+ 23:05 0:00 grep fio
> root 31849 1.5 0.2 407208 18876 ? Ss 22:36 0:26 /root/xfstests/bin/fio /tmp/31503.fio
> root 31854 1.2 0.1 398480 10240 ? Ssl 22:36 0:22 /root/xfstests/bin/fio /tmp/31503.fio
>
> And if you attach to them with gdb, they are spinning trying to grab
> stat_mutex, which they can't get because the main thread has
> already called stat_exit() and then exited. So these two threads
> did eventually return, but only some time after the 60 seconds had
> passed, and then they hung waiting for stat_mutex, which they will
> never get because the main thread has already torn it down.
>
> This probably also explains why you had trouble reproducing it. It
> requires a disk whose performance is variable enough that under heavy
> load, it might take more than 60 seconds for the direct_aio or
> buffered-aio-verifier thread to close itself out.
Good catch! Yes, that could certainly explain why we are stuck on that
stat_mutex and why the main thread just gave up on it and ended up in
stat_exit() with a thread (or more) still running.
> And I suspect once the main thread exited, it probably also closed out
> the debugging channel so the deadlock detector did probably trip, but
> somehow we just didn't see the output.
>
> So I can imagine some possible fixes. We could make the thread
> timeout configurable, and/or increase it from 60 seconds to something like
> 300 seconds. We could make stat_exit() a no-op --- after all, if the
> main thread is exiting, there's no real point to down and then destroy
> the stat_mutex. And/or we could change the forced reap to send a kill
> -9 to the thread instead of just marking it as reaped.
We have to clean up - for normal runs, it's not a big deal, but if fio
is run as a client/server setup, the backend will persist across runs.
If we leak, then that could be a concern.
How about the below? Bump the timeout to 5 min, 1 min is a little on the
short side, we want normal error handling to be out of the way before
that happens. And additionally, break out if we have been marked as
reaped/exited, so we avoid grabbing the stat mutex again.
diff --git a/backend.c b/backend.c
index 093b6a3a290e..f0927abfccb0 100644
--- a/backend.c
+++ b/backend.c
@@ -1723,6 +1723,14 @@ static void *thread_main(void *data)
}
}
+ /*
+ * If we took too long to shut down, the main thread could
+ * already consider us reaped/exited. If that happens, break
+ * out and clean up.
+ */
+ if (td->runstate >= TD_EXITED)
+ break;
+
clear_state = 1;
/*
diff --git a/fio.h b/fio.h
index 080842aef4f8..74c1b306af26 100644
--- a/fio.h
+++ b/fio.h
@@ -588,7 +588,7 @@ extern const char *runstate_to_name(int runstate);
* Allow 60 seconds for a job to quit on its own, otherwise reap with
* a vengeance.
*/
-#define FIO_REAP_TIMEOUT 60
+#define FIO_REAP_TIMEOUT 300
#define TERMINATE_ALL (-1U)
extern void fio_terminate_threads(unsigned int);
--
Jens Axboe
On Mon, Oct 24, 2016 at 10:28:14AM -0600, Jens Axboe wrote:
> How about the below? Bump the timeout to 5 min, 1 min is a little on the
> short side, we want normal error handling to be out of the way before
> that happens. And additionally, break out if we have been marked as
> reaped/exited, so we avoid grabbing the stat mutex again.
Yep, that works. I tried a test with just the second change:
> + /*
> + * If we took too long to shut down, the main thread could
> + * already consider us reaped/exited. If that happens, break
> + * out and clean up.
> + */
> + if (td->runstate >= TD_EXITED)
> + break;
> +
And that's sufficient to solve the problem.
Increasing the timeout to 5 minutes would also be a good idea, so we
can let the worker threads exit cleanly so the reported stats will be
completely accurate.
Thanks for your help in figuring out this long-standing problem!
- Ted
On 10/24/2016 08:54 PM, Theodore Ts'o wrote:
> On Mon, Oct 24, 2016 at 10:28:14AM -0600, Jens Axboe wrote:
>
>> How about the below? Bump the timeout to 5 min, 1 min is a little on the
>> short side, we want normal error handling to be out of the way before
>> that happens. And additionally, break out if we have been marked as
>> reaped/exited, so we avoid grabbing the stat mutex again.
>
> Yep, that works. I tried a test with just the second change:
>
>> + /*
>> + * If we took too long to shut down, the main thread could
>> + * already consider us reaped/exited. If that happens, break
>> + * out and clean up.
>> + */
>> + if (td->runstate >= TD_EXITED)
>> + break;
>> +
>
> And that's sufficient to solve the problem.
Yes, it should be, so glad that it is!
> Increasing the timeout to 5 minutes would also be a good idea, so we
> can let the worker threads exit cleanly so the reported stats will be
> completely accurate.
I made that separate change as well. If the job is stuck in the kernel
for some sync operation, we could feasibly be uninterruptible for
minutes. So 1 minute is too short in any case, and I'd rather just make
this check than send kill signals, since they won't fix the
uninterruptible problem.
> Thanks for your help in figuring out this long-standing problem!
It was easy based on all your info, since I could not reproduce. So
thanks for your help! Everything should be committed now, and I'll cut a
new release tomorrow so we can hopefully put this behind us.
--
Jens Axboe