Hi Ted,
first of all, I want to remind you about the sync_supers changes, which
you said you'd make sure get properly reviewed and tested for v3.5. I
am getting worried that they will miss 3.5 as well.
I wanted to re-send the patch-set, but I cannot test it because ext4
blocks when I test it with xfstests.
The HEAD of xfstests I use is:
7d14795 xfstests: introduce 286 for SEEK_DATA/SEEK_HOLE copy test
I run xfstests with the "check -g auto" command. My environment is:
export TEST_DIR=/mnt/xfstests-disk-1
export TEST_DEV=/dev/vdb
export SCRATCH_MNT=/mnt/xfstests-disk-2
export SCRATCH_DEV=/dev/vdc
export FSTYP=ext4
The disks are not very big, df says:
/dev/vdb 4181968 192780 3779476 5% /mnt/xfstests-disk-1
/dev/vdc 12539816 486160 11424512 5% /mnt/xfstests-disk-2
Is this a known issue? Please let me know if you need more information.
I can also try some patches for you.
I've just tested 3.5-rc5 in my kvm machine. Below is what dmesg gives
me:
[ 2.541716] EXT4-fs (vda1): re-mounted. Opts: (null)
[ 3.230435] systemd-fsck[508]: home: clean, 316114/2097152 files, 2959603/8388608 blocks
[ 3.296292] hrtimer: interrupt took 2148140 ns
[ 3.369675] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null)
[ 3.604614] /usr/sbin/crond[559]: (CRON) INFO (Syslog will be used instead of sendmail.): No such file or directory
[ 3.638686] /usr/sbin/crond[559]: (CRON) INFO (running with inotify support)
[ 50.659283] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: (null)
[ 50.687492] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: (null)
[ 52.098169] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 52.272963] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 56.323779] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 56.771414] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 57.459560] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 58.176195] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 59.117647] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 60.686415] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 61.116057] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 62.321826] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 69.970617] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 70.315529] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 70.344068] EXT4-fs (vdb): re-mounted. Opts: data=ordered
[ 71.086640] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 71.632241] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 71.840146] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 72.229582] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 73.526206] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 78.180546] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 78.518143] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 78.693992] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 80.758908] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 81.629718] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 83.278788] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 240.379170] INFO: task flush-253:32:1336 blocked for more than 120 seconds.
[ 240.379402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.379605] flush-253:32 D ffff8803118d8508 0 1336 2 0x00000000
[ 240.379608] ffff88030f93dae0 0000000000000046 ffff88030f93da80 ffffffff810bbb5d
[ 240.379612] ffff880312b13f80 ffff88030f93dfd8 ffff88030f93dfd8 ffff88030f93dfd8
[ 240.379615] ffff88031458df40 ffff880312b13f80 ffff88030f93db70 ffff88030c198800
[ 240.379618] Call Trace:
[ 240.379632] [<ffffffff810bbb5d>] ? trace_hardirqs_on+0xd/0x10
[ 240.379641] [<ffffffff81628029>] schedule+0x29/0x70
[ 240.379650] [<ffffffff81236e85>] ext4_force_commit+0x85/0xc0
[ 240.379656] [<ffffffff8107fcc0>] ? __init_waitqueue_head+0x60/0x60
[ 240.379660] [<ffffffff81214805>] ext4_write_inode+0x75/0x110
[ 240.379667] [<ffffffff811bcba1>] __writeback_single_inode+0x161/0x1a0
[ 240.379670] [<ffffffff811becdf>] writeback_sb_inodes+0x24f/0x440
[ 240.379672] [<ffffffff811bf094>] wb_writeback+0xf4/0x370
[ 240.379677] [<ffffffff81062647>] ? local_bh_enable_ip+0x97/0x100
[ 240.379680] [<ffffffff810bbb5d>] ? trace_hardirqs_on+0xd/0x10
[ 240.379683] [<ffffffff811c0ae2>] wb_do_writeback+0xc2/0x200
[ 240.379685] [<ffffffff8106b02e>] ? del_timer+0x8e/0x140
[ 240.379688] [<ffffffff811c0ca4>] bdi_writeback_thread+0x84/0x2d0
[ 240.379692] [<ffffffff811c0c20>] ? wb_do_writeback+0x200/0x200
[ 240.379695] [<ffffffff8107f19e>] kthread+0xae/0xc0
[ 240.379699] [<ffffffff81633374>] kernel_thread_helper+0x4/0x10
[ 240.379702] [<ffffffff81629c70>] ? retint_restore_args+0x13/0x13
[ 240.379704] [<ffffffff8107f0f0>] ? __init_kthread_worker+0x70/0x70
[ 240.379706] [<ffffffff81633370>] ? gs_change+0x13/0x13
[ 240.379708] no locks held by flush-253:32/1336.
[ 240.379710] INFO: task ext4lazyinit:14878 blocked for more than 120 seconds.
[ 240.379899] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.380129] ext4lazyinit D 0000000000000000 0 14878 2 0x00000000
[ 240.380132] ffff8803140dfd10 0000000000000046 ffff8803140dfcb0 ffffffff810bbb5d
[ 240.380135] ffff880312523f80 ffff8803140dffd8 ffff8803140dffd8 ffff8803140dffd8
[ 240.380138] ffffffff81c13420 ffff880312523f80 ffff8803140dfd00 ffff88030c19a000
[ 240.380141] Call Trace:
[ 240.380144] [<ffffffff810bbb5d>] ? trace_hardirqs_on+0xd/0x10
[ 240.380147] [<ffffffff81628029>] schedule+0x29/0x70
[ 240.380149] [<ffffffff812360c5>] ext4_journal_start_sb+0x1c5/0x200
[ 240.380152] [<ffffffff81210ddb>] ? ext4_init_inode_table+0xab/0x350
[ 240.380154] [<ffffffff8107fcc0>] ? __init_waitqueue_head+0x60/0x60
[ 240.380156] [<ffffffff81210ddb>] ext4_init_inode_table+0xab/0x350
[ 240.380159] [<ffffffff816260f0>] ? mutex_lock_nested+0x290/0x360
[ 240.380162] [<ffffffff8122558b>] ? ext4_lazyinit_thread+0x5b/0x2f0
[ 240.380165] [<ffffffff81225640>] ext4_lazyinit_thread+0x110/0x2f0
[ 240.380167] [<ffffffff81225530>] ? ext4_unregister_li_request+0x70/0x70
[ 240.380169] [<ffffffff8107f19e>] kthread+0xae/0xc0
[ 240.380172] [<ffffffff81633374>] kernel_thread_helper+0x4/0x10
[ 240.380174] [<ffffffff81629c70>] ? retint_restore_args+0x13/0x13
[ 240.380176] [<ffffffff8107f0f0>] ? __init_kthread_worker+0x70/0x70
[ 240.380178] [<ffffffff81633370>] ? gs_change+0x13/0x13
[ 240.380180] 1 lock held by ext4lazyinit/14878:
[ 240.380181] #0: (&eli->li_list_mtx){+.+...}, at: [<ffffffff8122558b>] ext4_lazyinit_thread+0x5b/0x2f0
[ 240.380191] INFO: task fsstress:15250 blocked for more than 120 seconds.
[ 240.380429] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.380718] fsstress D ffff880308e05880 0 15250 15249 0x00000000
[ 240.380721] ffff8803107ddc18 0000000000000046 ffff8803107ddbb8 ffffffff810bbb5d
[ 240.380724] ffff8803125adf40 ffff8803107ddfd8 ffff8803107ddfd8 ffff8803107ddfd8
[ 240.380727] ffff88031458df40 ffff8803125adf40 ffff880300000000 ffff880312be6140
[ 240.380730] Call Trace:
[ 240.380734] [<ffffffff810bbb5d>] ? trace_hardirqs_on+0xd/0x10
[ 240.380737] [<ffffffff81628029>] schedule+0x29/0x70
[ 240.380744] [<ffffffff81133b55>] __generic_file_aio_write+0xb5/0x440
[ 240.380747] [<ffffffff8107fcc0>] ? __init_waitqueue_head+0x60/0x60
[ 240.380749] [<ffffffff81133f54>] generic_file_aio_write+0x74/0xe0
[ 240.380753] [<ffffffff811a0da5>] ? putname+0x35/0x50
[ 240.380755] [<ffffffff8120df27>] ext4_file_write+0xc7/0x270
[ 240.380759] [<ffffffff81195072>] do_sync_write+0xd2/0x110
[ 240.380762] [<ffffffff81084bb3>] ? up_read+0x23/0x40
[ 240.380767] [<ffffffff81284a5c>] ? security_file_permission+0x2c/0xb0
[ 240.380770] [<ffffffff81195611>] ? rw_verify_area+0x61/0xf0
[ 240.380772] [<ffffffff81195973>] vfs_write+0xb3/0x180
[ 240.380774] [<ffffffff81195c9a>] sys_write+0x4a/0x90
[ 240.380777] [<ffffffff81631f29>] system_call_fastpath+0x16/0x1b
[ 240.380778] 1 lock held by fsstress/15250:
[ 240.380779] #0: (&sb->s_type->i_mutex_key#12){+.+...}, at: [<ffffffff81133f38>] generic_file_aio_write+0x58/0xe0
[ 240.380785] INFO: task fsstress:15251 blocked for more than 120 seconds.
[ 240.380966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.381194] fsstress D ffff8803125a8000 0 15251 15249 0x00000000
[ 240.381197] ffff880310651c38 0000000000000046 ffff8803125a86d8 0000000000000046
[ 240.381202] ffff8803125a8000 ffff880310651fd8 ffff880310651fd8 ffff880310651fd8
[ 240.381205] ffffffff81c13420 ffff8803125a8000 ffff880310651c98 7fffffffffffffff
[ 240.381208] Call Trace:
[ 240.381212] [<ffffffff81628029>] schedule+0x29/0x70
[ 240.381214] [<ffffffff81625975>] schedule_timeout+0x305/0x380
[ 240.381217] [<ffffffff810bb906>] ? mark_held_locks+0x86/0x140
[ 240.381219] [<ffffffff81629430>] ? _raw_spin_unlock_irq+0x30/0x50
[ 240.381222] [<ffffffff810bbac5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 240.381224] [<ffffffff81627ed2>] wait_for_common+0x122/0x170
[ 240.381228] [<ffffffff81092d90>] ? try_to_wake_up+0x2f0/0x2f0
[ 240.381231] [<ffffffff811c4630>] ? __sync_filesystem+0x90/0x90
[ 240.381234] [<ffffffff811c4630>] ? __sync_filesystem+0x90/0x90
[ 240.381236] [<ffffffff81627ffd>] wait_for_completion+0x1d/0x20
[ 240.381238] [<ffffffff811bd1ae>] sync_inodes_sb+0x12e/0x280
[ 240.381241] [<ffffffff811c4630>] ? __sync_filesystem+0x90/0x90
[ 240.381244] [<ffffffff811c4628>] __sync_filesystem+0x88/0x90
[ 240.381246] [<ffffffff811c464f>] sync_one_sb+0x1f/0x30
[ 240.381252] [<ffffffff81198cf1>] iterate_supers+0xf1/0x100
[ 240.381256] [<ffffffff811c4717>] sys_sync+0x47/0x70
[ 240.381260] [<ffffffff81631f29>] system_call_fastpath+0x16/0x1b
[ 240.381263] 1 lock held by fsstress/15251:
[ 240.381264] #0: (&type->s_umount_key#19){+++++.}, at: [<ffffffff81198c88>] iterate_supers+0x88/0x100
[ 240.381276] INFO: task xfs_io:15270 blocked for more than 120 seconds.
[ 240.381527] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.381724] xfs_io D 0000000000000002 0 15270 15266 0x00000000
[ 240.381727] ffff8803123b3d68 0000000000000046 0000000000000000 ffff880312ab9fc0
[ 240.381730] ffff880312ab9fc0 ffff8803123b3fd8 ffff8803123b3fd8 ffff8803123b3fd8
[ 240.381733] ffff88031458df40 ffff880312ab9fc0 ffff88030c198870 ffff880312ab9fc0
[ 240.381736] Call Trace:
[ 240.381740] [<ffffffff81628029>] schedule+0x29/0x70
[ 240.381742] [<ffffffff81628d75>] rwsem_down_failed_common+0xb5/0x150
[ 240.381744] [<ffffffff81628e23>] rwsem_down_write_failed+0x13/0x20
[ 240.381751] [<ffffffff812ed303>] call_rwsem_down_write_failed+0x13/0x20
[ 240.381754] [<ffffffff816271f5>] ? down_write+0x65/0x70
[ 240.381757] [<ffffffff81197ab8>] ? thaw_super+0x28/0xd0
[ 240.381759] [<ffffffff81197ab8>] thaw_super+0x28/0xd0
[ 240.381761] [<ffffffff811a0da5>] ? putname+0x35/0x50
[ 240.381765] [<ffffffff811a7f1c>] do_vfs_ioctl+0x36c/0x560
[ 240.381767] [<ffffffff810bbb5d>] ? trace_hardirqs_on+0xd/0x10
[ 240.381769] [<ffffffff81629c55>] ? retint_swapgs+0x13/0x1b
[ 240.381772] [<ffffffff811a81a1>] sys_ioctl+0x91/0xa0
[ 240.381774] [<ffffffff81631f29>] system_call_fastpath+0x16/0x1b
[ 240.381776] 1 lock held by xfs_io/15270:
[ 240.381776] #0: (&type->s_umount_key#19){+++++.}, at: [<ffffffff81197ab8>] thaw_super+0x28/0xd0
--
Best Regards,
Artem Bityutskiy
On Mon, Jul 02, 2012 at 03:15:16PM +0300, Artem Bityutskiy wrote:
> Hi Ted,
>
> first of all, I want to remind you about the sync_supers changes, which
> you said you'd make sure get properly reviewed and tested for v3.5. I
> am getting worried that they will miss 3.5 as well.
>
> I wanted to re-send the patch-set, but I cannot test it because ext4
> blocks when I test it with xfstests.
Sorry, I've just been totally swamped these past couple of months:
work deadlines, plus a lot of other things that have been keeping me
totally overloaded.
Can you tell me *which* xfstest this was blocking on? I generally run
xfstests under KVM, on the console, which is nice because the xfstests
output gets interleaved with any kernel oops/softlock up messages.
Also, for the sort of changes you are doing, "check -g quick" is
probably good enough; that will get you unblocked.
I don't recall anything going wrong with a -g auto run with a standard
4k block size, but I'll give it a try on my end and see what I get.
- Ted
On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
> Can you tell me *which* xfstest this was blocking on?
Test 068 which runs fsstress.
$ ps axjf
828 830 828 828 ? -1 S 500 0:00 | \_ sshd: dedekind@pts/0
830 831 831 831 pts/0 972 Ss 500 0:00 | \_ -bash
831 972 972 831 pts/0 972 S+ 0 0:00 | \_ sudo env PATH=/home/dedekind/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin ./test-ext4.sh elog
972 973 972 831 pts/0 972 S+ 0 0:00 | \_ /bin/sh -euf ./test-ext4.sh elog
973 997 972 831 pts/0 972 S+ 0 0:00 | \_ /bin/bash ./check -g auto
997 14830 972 831 pts/0 972 S+ 0 0:00 | \_ /bin/bash ./068
14830 15000 972 831 pts/0 972 S+ 0 0:00 | \_ /bin/bash ./068
15000 15153 15153 831 pts/0 972 S 0 0:00 | | \_ /home/dedekind/git/xfstests/ltp/fsstress -d /mnt/xfstests-disk-2/fsstress_test_dir -p 2 -n 200
15153 15154 15153 831 pts/0 972 D 0 0:00 | | \_ /home/dedekind/git/xfstests/ltp/fsstress -d /mnt/xfstests-disk-2/fsstress_test_dir -p 2 -n 200
15153 15155 15153 831 pts/0 972 D 0 0:00 | | \_ /home/dedekind/git/xfstests/ltp/fsstress -d /mnt/xfstests-disk-2/fsstress_test_dir -p 2 -n 200
14830 15170 972 831 pts/0 972 S+ 0 0:00 | \_ /bin/sh -f /usr/sbin/xfs_freeze -u /mnt/xfstests-disk-2
15170 15174 972 831 pts/0 972 D+ 0 0:00 | | \_ /usr/sbin/xfs_io -F -r -p xfs_freeze -x -c thaw /mnt/xfstests-disk-2
14830 15171 972 831 pts/0 972 S+ 0 0:00 | \_ tee -a 068.full
--
Best Regards,
Artem Bityutskiy
On Mon, Jul 02, 2012 at 05:54:22PM +0300, Artem Bityutskiy wrote:
> On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
> > Can you tell me *which* xfstest this was blocking on?
>
> Test 068 which runs fsstress.
Hmmm, I had just started an xfstests -g auto run using standard ext4
defaults with 3.5-rc2, and I'm not noting any problems. And I've
gotten past test 68 w/o any problems.
Can you tell me more about your KVM config?
I'm using 512 megs of memory and 2 CPUs under KVM, with a 5 gig
test partition running on a 5400 RPM laptop drive.
- Ted
I've removed linux-fsdevel and linux-kernel to avoid spamming folks
who don't need to see the blow-by-blow debugging....
On Mon, Jul 02, 2012 at 11:14:42AM -0400, Theodore Ts'o wrote:
> On Mon, Jul 02, 2012 at 05:54:22PM +0300, Artem Bityutskiy wrote:
> > On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
> > > Can you tell me *which* xfstest this was blocking on?
> >
> > Test 068 which runs fsstress.
How reliably can you reproduce this hang-up? I've just run the
equivalent of "check 68,68,68,68,68" and I haven't gotten hung up
once.
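[Editorial note: the repeated run described above can be scripted; a minimal, hedged sketch follows. The real invocation would be "./check 068" inside an xfstests checkout; the stub function below stands in for it so the loop is runnable anywhere.]

```shell
# Rerun one xfstest several times, stopping at the first failure (a hang
# would simply never return).  check() is a stand-in for "./check 068".
check() { true; }        # stub; in a real xfstests tree, call ./check "$@"
runs=0
for i in 1 2 3 4 5; do
    check 068 || break
    runs=$((runs + 1))
done
echo "completed $runs runs"   # prints: completed 5 runs
```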
And this is a stock 4k ext4 filesystem, right? Created with "mke2fs
-t ext4" w/o any other options?
- Ted
On Mon, Jul 02, 2012 at 11:14:42AM -0400, Theodore Ts'o wrote:
> On Mon, Jul 02, 2012 at 05:54:22PM +0300, Artem Bityutskiy wrote:
> > On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
> > > Can you tell me *which* xfstest this was blocking on?
> >
> > Test 068 which runs fsstress.
>
> Hmmm, I had just started an xfstests -g auto run using standard ext4
> defaults with 3.5-rc2, and I'm not noting any problems. And I've
> gotten past test 68 w/o any problems.
I just tried again with 3.5-rc5, and I was able to reproduce your
failure by using "check 68 68 68 68 68" (it failed after the 2nd time
it ran the fsstress test). What's interesting is that there have been
no changes in fs/ext4 or fs/jbd2 since 3.5-rc2, so this may have been
caused by a writeback-related change; I'm starting a bisect now.
- Ted
On Mon, 2012-07-02 at 11:14 -0400, Theodore Ts'o wrote:
> On Mon, Jul 02, 2012 at 05:54:22PM +0300, Artem Bityutskiy wrote:
> > On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
> > > Can you tell me *which* xfstest this was blocking on?
> >
> > Test 068 which runs fsstress.
>
> Hmmm, I had just started an xfstests -g auto run using standard ext4
> defaults with 3.5-rc2, and I'm not noting any problems. And I've
> gotten past test 68 w/o any problems.
>
> Can you tell me more about your KVM config?
>
> I'm using 512 megs of memory and 2 CPUs under KVM, with a 5 gig
> test partition running on a 5400 RPM laptop drive.
I have to run home now, but here are a few quick words about my setup.
Yes, it is 100% reproducible; I can probably even try to bisect it. It
used to work; I am not sure in which kernel, but I did have xfstests
finish successfully. I reproduce this on a stock v3.5-rc5 kernel.
This is how I run kvm:
IMAGE="np.img"
DEVEL_DISK1="devel-disk-1.img"
DEVEL_DISK2="devel-disk-2.img"
mv np.output np.output.prev
kvm -m 12384 -spice port=7837,disable-ticketing \
-cpu core2duo -smp 12,sockets=1,cores=6,threads=2 \
-enable-kvm -name np \
-drive file=$IMAGE,if=virtio,cache=none,boot=on \
-drive file=$DEVEL_DISK1,if=virtio,cache=none \
-drive file=$DEVEL_DISK2,if=virtio,cache=none \
-rtc base=utc \
-net nic,model=virtio -net user \
-device virtio-balloon-pci \
-nographic \
-redir tcp:7838::22 \
| tee np.output
The cpuinfo in kvm is attached.
My host system is a SandyBridge with 6 cores; hyperthreading gives 12.
The disk where the images sit is an Intel SSD.
My hacky scripts to run the tests are also attached. I just run
"./test-ext4". My .config is also attached. Yes, I just create the
filesystem with mkfs.ext4 without any additional options.
--
Best Regards,
Artem Bityutskiy
On Mon, 2012-07-02 at 12:45 -0400, Theodore Ts'o wrote:
> On Mon, Jul 02, 2012 at 11:14:42AM -0400, Theodore Ts'o wrote:
> > On Mon, Jul 02, 2012 at 05:54:22PM +0300, Artem Bityutskiy wrote:
> > > On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
> > > > Can you tell me *which* xfstest this was blocking on?
> > >
> > > Test 068 which runs fsstress.
> >
> > Hmmm, I had just started an xfstests -g auto run using standard ext4
> > defaults with 3.5-rc2, and I'm not noting any problems. And I've
> > gotten past test 68 w/o any problems.
>
> I just tried again with 3.5-rc5, and I was able to reproduce your
> failure by using "check 68 68 68 68 68" (it failed after the 2nd time
> it ran the fsstress test). What's interesting is that there have been
> no changes in fs/ext4 or fs/jbd2 since 3.5-rc2, so this may have been
> caused by a writeback-related change; I'm starting a bisect now.
OK, cool. Let me know if you need anything; I'll be able to help tomorrow.
--
Best Regards,
Artem Bityutskiy
On 07/02/2012 11:45 AM, Theodore Ts'o wrote:
> On Mon, Jul 02, 2012 at 11:14:42AM -0400, Theodore Ts'o wrote:
>> On Mon, Jul 02, 2012 at 05:54:22PM +0300, Artem Bityutskiy wrote:
>>> On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
>>>> Can you tell me *which* xfstest this was blocking on?
>>>
>>> Test 068 which runs fsstress.
>>
>> Hmmm, I had just started an xfstests -g auto run using standard ext4
>> defaults with 3.5-rc2, and I'm not noting any problems. And I've
>> gotten past test 68 w/o any problems.
>
> I just tried again with 3.5-rc5, and I was able to reproduce your
> failure by using "check 68 68 68 68 68" (it failed after the 2nd time
> it ran the fsstress test). What's interesting is that there have been
> no changes in fs/ext4 or fs/jbd2 since 3.5-rc2, so this may have been
> caused by a writeback-related change; I'm starting a bisect now.
Isn't this just part of the whole class of filesystem freeze races Jan
has been trying to work out?
-Eric
On Mon 02-07-12 11:54:05, Eric Sandeen wrote:
> On 07/02/2012 11:45 AM, Theodore Ts'o wrote:
> > On Mon, Jul 02, 2012 at 11:14:42AM -0400, Theodore Ts'o wrote:
> >> On Mon, Jul 02, 2012 at 05:54:22PM +0300, Artem Bityutskiy wrote:
> >>> On Mon, 2012-07-02 at 10:44 -0400, Theodore Ts'o wrote:
> >>>> Can you tell me *which* xfstest this was blocking on?
> >>>
> >>> Test 068 which runs fsstress.
> >>
> >> Hmmm, I had just started an xfstests -g auto run using standard ext4
> >> defaults with 3.5-rc2, and I'm not noting any problems. And I've
> >> gotten past test 68 w/o any problems.
> >
> > I just tried again with 3.5-rc5, and I was able to reproduce your
> > failure by using "check 68 68 68 68 68" (it failed after the 2nd time
> > it ran the fsstress test). What's interesting is that there have been
> > no changes in fs/ext4 or fs/jbd2 since 3.5-rc2, so this may have been
> > caused by a writeback-related change; I'm starting a bisect now.
>
> Isn't this just part of the whole class of filesystem freeze races Jan
> has been trying to work out?
Yes, I can hang the kernel on this test with basically any filesystem
if it runs long enough. My freezing patches should fix that; I am
currently waiting for Al to merge them...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, Jul 02, 2012 at 07:31:04PM +0200, Jan Kara wrote:
> Yes, I can hang the kernel on this test with basically any filesystem
> if it runs long enough. My freezing patches should fix that; I am
> currently waiting for Al to merge them...
So this is different from your suspend/resume patches, then?
Can you send me a pointer to your patches? Is this something that Al
will hopefully merge before 3.5 is released?
- Ted
On Mon 02-07-12 13:43:42, Ted Tso wrote:
> On Mon, Jul 02, 2012 at 07:31:04PM +0200, Jan Kara wrote:
> > Yes, I can hang the kernel on this test with basically any filesystem
> > when running long enough. My freezing patches should fix that - currently
> > waiting for Al to merge them...
>
> So this is different from your suspend/resume patches, then?
I'm not sure which suspend/resume patches you mean, so I suspect it
might be the same series :)
> Can you send me a pointer to your patches? Is this something that Al
http://www.spinics.net/lists/kernel/msg1355763.html
> will hopefully merge before 3.5 is released?
Hopefully. He is certainly aware of the series and wants to merge it.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, Jul 02, 2012 at 07:53:41PM +0200, Jan Kara wrote:
>
> > Can you send me a pointer to your patches? Is this something that Al
> http://www.spinics.net/lists/kernel/msg1355763.html
Ah, right. OK, I didn't realize until I took a closer look that
xfstests #68 tests filesystem freezes while running fsstress.
So Artem, what I'd suggest doing for now is to simply comment out test
#68 in your xfstests group file and avoid running that test until this
gets sorted out. I simply hadn't noticed, since for most of my quick
checks I tend to use -g quick and run -g auto more rarely (i.e., before
I push changes to Linus, and every so often during the dev cycle);
test #68 isn't run for -g quick, and it only fails about 5-10% of the
time in my test configuration.
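[Editorial note: a hedged sketch of that group-file edit. The sample group lines below are illustrative, not copied from a real xfstests tree, and the sed operates on a local example file; in an actual checkout you would edit the top-level "group" file in place.]

```shell
# Comment out test 068 in an xfstests-style "group" file so that
# "check -g auto" would skip it.  An example copy of the file is used here.
printf '067 auto quick\n068 other auto\n069 auto quick\n' > group.example
sed -i 's/^068 /#068 /' group.example
grep -c '^#068' group.example   # prints: 1
```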
- Ted