2014-04-13 20:22:09

by Theodore Ts'o

Subject: [PATCH] ext4: add fallocate mode blocking for debugging purposes

If a particular fallocate mode is causing test failures, give the
tester the ability to block a particular fallocate mode so that the
use of a particular fallocate mode will be reported as not supported.

For example, if the COLLAPSE_RANGE fallocate mode is causing test
failures, this allows us to suppress it so we can more easily test the
rest of the file system code.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/extents.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 64b4003..1bb3e4b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -29,6 +29,7 @@
  * - smart tree reduction
  */

+#include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/time.h>
 #include <linux/jbd2.h>
@@ -4862,6 +4863,12 @@ out_mutex:
 	return ret;
 }

+int ext4_fallocate_mode_block __read_mostly;
+
+module_param_named(fallocate_mode_block, ext4_fallocate_mode_block, int, 0644);
+MODULE_PARM_DESC(fallocate_mode_block,
+		 "Fallocate modes which are blocked for debugging purposes");
+
 /*
  * preallocate space for a file. This implements ext4's fallocate file
  * operation, which gets called from sys_fallocate system call.
@@ -4881,6 +4888,13 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	struct timespec tv;
 	unsigned int blkbits = inode->i_blkbits;

+	/*
+	 * for debugging purposes, allow certain fallocate operations
+	 * to be disabled
+	 */
+	if (unlikely(mode & ext4_fallocate_mode_block))
+		return -EOPNOTSUPP;
+
 	/* Return error if mode is not supported */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
--
1.9.0



2014-04-13 22:00:20

by Theodore Ts'o

Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Sun, Apr 13, 2014 at 04:21:58PM -0400, Theodore Ts'o wrote:
> If a particular fallocate mode is causing test failures, give the
> tester the ability to block a particular fallocate mode so that the
> use of a particular fallocate mode will be reported as not supported.
>
> For example, if the COLLAPSE_RANGE fallocate mode is causing test
> failures, this allows us to suppress it so we can more easily test the
> rest of the file system code.

Hi Namjae,

One of the reasons for this patch is that after Lukas added
COLLAPSE_RANGE support into fsx, we've started seeing a number of
failures which seem to be directly related to COLLAPSE_RANGE.

For your convenience, I've updated the root_fs.img file for the
kvm-xfstests system in the xfstests-bld system:

https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/root_fs.img.i386

This has the latest xfstests (plus some bug fixes not yet in the
xfstests upstream), which means it has the improved fsx. With this
updated xfstests, the tests which use fsx, such as generic/075 and
generic/091, are failing for the 1k and bigalloc configuration, and if
we apply this patch and then add to the kernel's boot command line:

ext4.fallocate_mode_block=0x08

this will allow these tests to pass again.

(An easy way to do this using kvm-xfstests is by creating the file
custom.config with the line EXTRA_ARG="ext4.fallocate_mode_block=0x08".)
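
To make the mask value concrete: 0x08 is the value of FALLOC_FL_COLLAPSE_RANGE in <linux/falloc.h>, so the check added by the patch rejects any request that has that bit set. The following user-space sketch models only that check. The function mode_blocked() and the locally defined FL_* constants are illustrative names chosen for this example, not kernel code.

```c
#include <errno.h>

/* Mode bits with the same values as in <linux/falloc.h>; redefined here
 * under illustrative names so the sketch builds without kernel headers. */
#define FL_KEEP_SIZE      0x01
#define FL_PUNCH_HOLE     0x02
#define FL_COLLAPSE_RANGE 0x08
#define FL_ZERO_RANGE     0x10

/*
 * User-space model of the check the patch adds to ext4_fallocate():
 * returns -EOPNOTSUPP if any requested mode bit is set in block_mask,
 * and 0 otherwise.
 */
static int mode_blocked(int mode, int block_mask)
{
	if (mode & block_mask)
		return -EOPNOTSUPP;
	return 0;
}
```

With block_mask set to 0x08, mode_blocked(FL_COLLAPSE_RANGE, 0x08) returns -EOPNOTSUPP, while FL_PUNCH_HOLE | FL_KEEP_SIZE still returns 0. A plain preallocation (mode 0) has no bits set, so it can never be blocked this way.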


In addition, although I haven't figured out what is needed to
create a reliable reproduction, test generic/091 (with COLLAPSE_RANGE
enabled) is sometimes causing file system corruption which causes
the xfstests run to abort. (It does seem to reproduce more reliably
if generic/091 is run as part of a "-g auto" run):

e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Inode 20 has out of order extents
(invalid logical block 3, physical block 492938, len 2)
Clear? yes

Inode 20, i_blocks is 368, should be 352. Fix? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(492938--492939)
Fix? yes

Free blocks count wrong for group #15 (32657, counted=32659).
Fix? yes

Free blocks count wrong (1136513, counted=1136515).
Fix? yes


/dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdb: 10137/327680 files (5.4% non-contiguous), 174205/1310720 blocks

(This was found in the standard 4k configuration, but in the test log
below it was showing up in the dioread_nolock configuration).

If you could take a look at some of these test failures, I'd be much
obliged.

Thanks!

- Ted

An excerpted log file generated using "kvm-xfstests generic/075
generic/091". The 1k configuration fails reliably. Some of the other
failures don't always fail, so your mileage may vary.

git versions:
fio fio-2.1-19-g0b14f0a (Thu, 23 May 2013 21:27:54 +0200)
quota 0d0a674 (Tue, 26 Mar 2013 17:13:33 +0100)
xfsprogs v3.2.0-alpha2-60-gaa210c4 (Thu, 13 Mar 2014 21:23:50 +1100)
xfstests-bld 66c8bf2 (Sat, 12 Apr 2014 13:32:12 -0400)
xfstests linux-v3.8-366-g5c0348d (Sat, 12 Apr 2014 19:11:54 -0400)
BEGIN TEST: Ext4 4k block Sun Apr 13 20:53:43 UTC 2014
Device: /dev/vdb
mk2fs options: -q
mount options: -o block_validity
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc

generic/075 101s ... [20:54:01] [20:55:44] 103s
generic/091 154s ... [20:56:00] [20:58:30] 150s
Ran: generic/075 generic/091
Passed all 2 tests

END TEST: Ext4 4k block Sun Apr 13 20:58:45 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdb: 27437/327680 files (8.1% non-contiguous), 591406/1310720 blocks
BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Sun Apr 13 20:59:03 UTC 2014
Device: /dev/vdd
mk2fs options: -q -O ^extents,^flex_bg,^uninit_bg
mount options: -o block_validity,nodelalloc
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q -O ^extents,^flex_bg,^uninit_bg /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,nodelalloc /dev/vdc /vdc

generic/075 62s ... [20:59:11] [21:00:12] 61s
generic/091 144s ... [21:00:15] [21:02:38] 143s
Ran: generic/075 generic/091
Passed all 2 tests

END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Sun Apr 13 21:02:41 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdd: 15/327680 files (6.7% non-contiguous), 55986/1310720 blocks
BEGIN TEST: Ext4 4k block w/ no journal Sun Apr 13 21:02:44 UTC 2014
Device: /dev/vdb
mk2fs options: -q -O ^has_journal
mount options: -o block_validity,noload
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q -O ^has_journal /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc

generic/075 60s ... [21:03:00] [21:03:57] 57s
generic/091 137s ... [21:04:10] [21:06:21] 131s
Ran: generic/075 generic/091
Passed all 2 tests

END TEST: Ext4 4k block w/ no journal Sun Apr 13 21:06:34 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdb: 27437/327680 files (8.1% non-contiguous), 591406/1310720 blocks
BEGIN TEST: Ext4 1k block Sun Apr 13 21:06:49 UTC 2014
Device: /dev/vdd
mk2fs options: -q -b 1024
mount options: -o block_validity
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q -b 1024 /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc

generic/075 [21:06:53] [21:06:59] [failed, exit status 1] - output mismatch (see /results/results-1k/generic/075.out.bad)
--- tests/generic/075.out 2014-04-12 23:20:55.000000000 +0000
+++ /results/results-1k/generic/075.out.bad 2014-04-13 21:06:59.742889855 +0000
@@ -4,15 +4,5 @@
-----------------------------------------------
fsx.0 : -d -N numops -S 0
-----------------------------------------------
-
------------------------------------------------
-fsx.1 : -d -N numops -S 0 -x
------------------------------------------------
...
(Run 'diff -u tests/generic/075.out /results/results-1k/generic/075.out.bad' to see the entire diff)
generic/091 187s ... [21:07:00] [21:09:15] [failed, exit status 1] - output mismatch (see /results/results-1k/generic/091.out.bad)
--- tests/generic/091.out 2014-04-12 23:20:55.000000000 +0000
+++ /results/results-1k/generic/091.out.bad 2014-04-13 21:09:15.726223187 +0000
@@ -5,3 +5,816 @@
fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
+fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
+mapped writes DISABLED
+skipping zero length zero range
+fallocating to largest ever: 0x137f5
...
(Run 'diff -u tests/generic/091.out /results/results-1k/generic/091.out.bad' to see the entire diff)
Ran: generic/075 generic/091
Failures: generic/075 generic/091
Failed 2 of 2 tests

END TEST: Ext4 1k block Sun Apr 13 21:09:17 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdd: 15/327680 files (6.7% non-contiguous), 120162/5242880 blocks
BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Sun Apr 13 21:09:19 UTC 2014
Device: /dev/vdd
mk2fs options: -q -O ^flex_bg
mount options: -o block_validity,nodelalloc
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q -O ^flex_bg /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,nodelalloc /dev/vdc /vdc

generic/075 63s ... [21:09:23] [21:10:23] 60s
generic/091 163s ... [21:10:24] [21:13:07] 163s
Ran: generic/075 generic/091
Passed all 2 tests

END TEST: Ext4 4k block w/nodelalloc and no flex_bg Sun Apr 13 21:13:08 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdd: 15/327680 files (6.7% non-contiguous), 55952/1310720 blocks
BEGIN TEST: Ext4 4k block w/dioread_nolock Sun Apr 13 21:13:09 UTC 2014
Device: /dev/vdb
mk2fs options: -q
mount options: -o block_validity,dioread_nolock
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,dioread_nolock /dev/vdc /vdc

generic/075 99s ... [21:13:27] [21:15:05] 98s
generic/091 [21:15:19] [21:17:36] [failed, exit status 1] - output mismatch (see /results/results-dioread_nolock/generic/091.out.bad)
--- tests/generic/091.out 2014-04-12 23:20:55.000000000 +0000
+++ /results/results-dioread_nolock/generic/091.out.bad 2014-04-13 21:17:36.366223190 +0000
@@ -5,3 +5,1100 @@
fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
+fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
+mapped writes DISABLED
+skipping zero length zero range
+fallocating to largest ever: 0x137f5
...
(Run 'diff -u tests/generic/091.out /results/results-dioread_nolock/generic/091.out.bad' to see the entire diff)
_check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /results/results-dioread_nolock/generic/091.full)
Ran: generic/075 generic/091
Failures: generic/091
Failed 1 of 2 tests
END TEST: Ext4 4k block w/dioread_nolock Sun Apr 13 21:17:50 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Inode 801 has out of order extents
(invalid logical block 9, physical block 1022972, len 2)
Clear? yes

Inode 801 has out of order extents
(invalid logical block 18, physical block 1023063, len 1)
Clear? yes

Inode 801 has out of order extents
(invalid logical block 16, physical block 1022970, len 2)
Clear? yes

Inode 801 has out of order extents
(invalid logical block 20, physical block 1023056, len 7)
Clear? yes

Inode 801 has out of order extents
(invalid logical block 19, physical block 1024820, len 2)
Clear? yes

Inode 801, i_blocks is 256, should be 136. Fix? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(1019465--1019470) -(1022972--1022973) -1023063 -(1024820--1024821) -(1024836--1024839)
Fix? yes

Free blocks count wrong for group #31 (22489, counted=22504).
Fix? yes

Free blocks count wrong (719201, counted=719216).
Fix? yes


/dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdb: 27437/327680 files (8.1% non-contiguous), 591504/1310720 blocks
BEGIN TEST: Ext4 4k block w/data=journal Sun Apr 13 21:18:03 UTC 2014
Device: /dev/vdb
mk2fs options: -q
mount options: -o block_validity,data=journal
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,data=journal /dev/vdc /vdc

[ 1488.281589] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
_check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /results/results-data_journal/check.full)
Passed all 0 tests
END TEST: Ext4 4k block w/data=journal Sun Apr 13 21:18:21 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Inode 801, i_blocks is 136, should be 144. Fix? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +(1019465--1019470) -(1022970--1022971) -(1023056--1023062) +(1024836--1024839)
Fix? yes

Free blocks count wrong for group #31 (22504, counted=22503).
Fix? yes

Free blocks count wrong (719216, counted=719215).
Fix? yes


/dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdb: 27437/327680 files (8.1% non-contiguous), 591505/1310720 blocks
BEGIN TEST: Ext4 4k block w/bigalloc Sun Apr 13 21:18:38 UTC 2014
Device: /dev/vde
mk2fs options: -q -O bigalloc
mount options: -o block_validity
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q -O bigalloc /dev/vdf
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdf /vdf

generic/075 [21:18:42] [21:18:44] [failed, exit status 1] - output mismatch (see /results/results-bigalloc/generic/075.out.bad)
--- tests/generic/075.out 2014-04-12 23:20:55.000000000 +0000
+++ /results/results-bigalloc/generic/075.out.bad 2014-04-13 21:18:44.126223139 +0000
@@ -4,15 +4,5 @@
-----------------------------------------------
fsx.0 : -d -N numops -S 0
-----------------------------------------------
-
------------------------------------------------
-fsx.1 : -d -N numops -S 0 -x
------------------------------------------------
...
(Run 'diff -u tests/generic/075.out /results/results-bigalloc/generic/075.out.bad' to see the entire diff)
generic/091 [21:18:44][ 1526.350948] EXT4-fs error (device vde): ext4_mb_free_metadata:4563: group 1, block 524432:Block already on to-be-freed list
[ 1526.358204] JBD2: Spotted dirty metadata buffer (dev = vde, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[21:18:45] [failed, exit status 1] - output mismatch (see /results/results-bigalloc/generic/091.out.bad)
--- tests/generic/091.out 2014-04-12 23:20:55.000000000 +0000
+++ /results/results-bigalloc/generic/091.out.bad 2014-04-13 21:18:45.116223190 +0000
@@ -1,7 +1,158 @@
QA output created by 091
fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
...
(Run 'diff -u tests/generic/091.out /results/results-bigalloc/generic/091.out.bad' to see the entire diff)
_check_generic_filesystem: filesystem on /dev/vde is inconsistent (see /results/results-bigalloc/generic/091.full)
Ran: generic/075 generic/091
Failures: generic/075 generic/091
Failed 2 of 2 tests
END TEST: Ext4 4k block w/bigalloc Sun Apr 13 21:18:45 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Inode 12, i_blocks is 640, should be 4096. Fix? yes


Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 12: 524842 524840 524841 524848 524849 524850 524851 524852 524853 524854 524855 524857 524858 524859 524860 524861 524862 524863 525136 525137 525138 525139 525140 525141 525142 525143 525144 525145 525146 525137 525138 525139 525140 525141 525137 525138 525139 525140 525172 525173 525174 525175 525176 525177 525178 525179 525180 525181 525182 525183 525184 525185 525190 525191 525192
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1 inodes containing multiply-claimed blocks.)

File /junk (inode #12, mod time Sun Apr 13 21:18:45 2014)
has 5 multiply-claimed block(s), shared with 0 file(s):
Clone multiply-claimed blocks? yes

clone_file_block: internal error: can't find dup_blk for 524848

clone_file_block: internal error: can't find dup_blk for 524849

clone_file_block: internal error: can't find dup_blk for 524850

clone_file_block: internal error: can't find dup_blk for 524851

clone_file_block: internal error: can't find dup_blk for 524852

clone_file_block: internal error: can't find dup_blk for 524853

clone_file_block: internal error: can't find dup_blk for 524854

clone_file_block: internal error: can't find dup_blk for 524855

clone_file_block: internal error: can't find dup_blk for 524857

clone_file_block: internal error: can't find dup_blk for 524858

clone_file_block: internal error: can't find dup_blk for 524859

clone_file_block: internal error: can't find dup_blk for 524860

clone_file_block: internal error: can't find dup_blk for 524861

clone_file_block: internal error: can't find dup_blk for 524862

clone_file_block: internal error: can't find dup_blk for 524863

clone_file_block: internal error: can't find dup_blk for 525184

clone_file_block: internal error: can't find dup_blk for 525185

clone_file_block: internal error: can't find dup_blk for 525190

clone_file_block: internal error: can't find dup_blk for 525191

clone_file_block: internal error: can't find dup_blk for 525192

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (31477, counted=31474).
Fix? yes

Free blocks count wrong for group #1 (32752, counted=32751).
Fix? yes

Free blocks count wrong (5188880, counted=5188816).
Fix? yes


/dev/vde: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vde: 15/327680 files (6.7% non-contiguous), 54064/5242880 blocks
BEGIN TEST: Ext4 1k block w/bigalloc Sun Apr 13 21:18:46 UTC 2014
Device: /dev/vdd
mk2fs options: -q -b 1024 -O bigalloc
mount options: -o block_validity
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
MKFS_OPTIONS -- -q -b 1024 -O bigalloc /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc

generic/075 [21:18:48][ 1533.195953] EXT4-fs error (device vdd): ext4_mb_free_metadata:4563: group 1, block 131376:Block already on to-be-freed list
[ 1533.210657] JBD2: Spotted dirty metadata buffer (dev = vdd, blocknr = 1). There's a risk of filesystem corruption in case of system crash.
[21:18:53] [failed, exit status 1] - output mismatch (see /results/results-bigalloc_1k/generic/075.out.bad)
--- tests/generic/075.out 2014-04-12 23:20:55.000000000 +0000
+++ /results/results-bigalloc_1k/generic/075.out.bad 2014-04-13 21:18:53.882889857 +0000
@@ -4,15 +4,5 @@
-----------------------------------------------
fsx.0 : -d -N numops -S 0
-----------------------------------------------
-
------------------------------------------------
-fsx.1 : -d -N numops -S 0 -x
------------------------------------------------
...
(Run 'diff -u tests/generic/075.out /results/results-bigalloc_1k/generic/075.out.bad' to see the entire diff)
_check_generic_filesystem: filesystem on /dev/vdd is inconsistent (see /results/results-bigalloc_1k/generic/075.full)
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests
END TEST: Ext4 1k block w/bigalloc Sun Apr 13 21:18:54 UTC 2014
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #1 (8176, counted=8175).
Fix? yes

Free blocks count wrong (5125872, counted=5125856).
Fix? yes


/dev/vdd: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdd: 12/327680 files (0.0% non-contiguous), 117024/5242880 blocks
[FAIL] startpar: service(s) returned failure: rc.local ... failed!

Last login: Sun Apr 13 21:18:54 UTC 2014 on ttyS3
Linux candygram 3.14.0-08699-gc282c15 #1801 SMP Sun Apr 13 16:46:52 EDT 2014 i686

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@candygram:~# QEMU: Terminated

2014-04-14 14:05:47

by Namjae Jeon

Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

>
> Hi Namjae,
>
> One of the reasons for this patch is that after Lukas added
> COLLAPSE_RANGE support into fsx, we've started seeing a number of
> failures which seem to be directly related to COLLAPSE_RANGE.
>
> For your convenience, I've updated the root_fs.img file for the
> kvm-xfstests system in the xfstests-bld system:
>
> https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/root_fs.img.i386
>
> This has the latest xfstests (plus some bug fixes not yet in the
> xfstests upstream), which means it has the improved fsx. With this
> updated xfstests, the tests which use fsx, such as generic/075 and
> generic/091, are failing for the 1k and bigalloc configuration, and if
> we apply this patch and then add to the kernel's boot command line:
>
> ext4.fallocate_mode_block=0x08
>
> this will allow these tests to pass again.
>
> (An easy way to do this using kvm-xfstests is by creating the file
> custom.config with the line EXTRA_ARG="ext4.fallocate_mode_block=0x08".)
>
>
> In addition, although I haven't figured out what is needed to
> create a reliable reproduction, test generic/091 (with COLLAPSE_RANGE
> enabled) is sometimes causing file system corruption which causes
> the xfstests run to abort. (It does seem to reproduce more reliably
> if generic/091 is run as part of a "-g auto" run):
>
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 20 has out of order extents
> (invalid logical block 3, physical block 492938, len 2)
> Clear? yes
>
> Inode 20, i_blocks is 368, should be 352. Fix? yes
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -(492938--492939)
> Fix? yes
>
> Free blocks count wrong for group #15 (32657, counted=32659).
> Fix? yes
>
> Free blocks count wrong (1136513, counted=1136515).
> Fix? yes
>
>
> /dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/vdb: 10137/327680 files (5.4% non-contiguous), 174205/1310720 blocks
>
> (This was found in the standard 4k configuration, but in the test log
> below it was showing up in the dioread_nolock configuration).
>
> If you could take a look at some of these test failures, I'd be much
> obliged.
Hi Ted,
I will take a look at this issue.
Thanks for your mail!
>
> Thanks!
>
> - Ted
>

2014-04-15 16:02:28

by Lukas Czerner

Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Sun, 13 Apr 2014, Theodore Ts'o wrote:

> Date: Sun, 13 Apr 2014 18:00:16 -0400
> From: Theodore Ts'o <[email protected]>
> To: Ext4 Developers List <[email protected]>
> Cc: Namjae Jeon <[email protected]>
> Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes
>
> On Sun, Apr 13, 2014 at 04:21:58PM -0400, Theodore Ts'o wrote:
> > If a particular fallocate mode is causing test failures, give the
> > tester the ability to block a particular fallocate mode so that the
> > use of a particular fallocate mode will be reported as not supported.
> >
> > For example, if the COLLAPSE_RANGE fallocate mode is causing test
> > failures, this allows us to suppress it so we can more easily test the
> > rest of the file system code.
>
> Hi Namjae,
>
> One of the reasons for this patch is that after Lukas added
> COLLAPSE_RANGE support into fsx, we've started seeing a number of
> failures which seem to be directly related to COLLAPSE_RANGE.

Ah, I did mention it when I added COLLAPSE_RANGE to fsx and
fsstress, but I forgot to cc you, Namjae; sorry about that.

But about the patch: it seems a little bit weird to change the kernel
for this. The way I am doing it is by changing ltp/fsx.c and
ltp/fsstress.c to disable the particular mode:

diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index 1eec11a..29b790b 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -208,7 +208,7 @@ opdesc_t ops[] = {
 	{ OP_MKNOD, "mknod", mknod_f, 2, 1 },
 	{ OP_PUNCH, "punch", punch_f, 1, 1 },
 	{ OP_ZERO, "zero", zero_f, 1, 1 },
-	{ OP_COLLAPSE, "collapse", collapse_f, 1, 1 },
+	{ OP_COLLAPSE, "collapse", collapse_f, 0, 1 },
 	{ OP_READ, "read", read_f, 1, 0 },
 	{ OP_READLINK, "readlink", readlink_f, 1, 0 },
 	{ OP_RENAME, "rename", rename_f, 2, 1 },
diff --git a/ltp/fsx.c b/ltp/fsx.c
index 47d3ee8..194d7a3 100644
--- a/ltp/fsx.c
+++ b/ltp/fsx.c
@@ -144,7 +144,7 @@ int mapped_writes = 1; /* -W flag disables */
 int fallocate_calls = 1; /* -F flag disables */
 int punch_hole_calls = 1; /* -H flag disables */
 int zero_range_calls = 1; /* -z flag disables */
-int collapse_range_calls = 1; /* -C flag disables */
+int collapse_range_calls = 0; /* -C flag disables */
 int mapped_reads = 1; /* -R flag disables it */
 int fsxgoodfd = 0;
 int o_direct; /* -Z */
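
For readers unfamiliar with fsstress: the fourth field of each ops[] entry acts as a relative frequency, so setting it to 0 means the operation is never selected. The sketch below illustrates that idea with a simplified table and picker. Both struct op_entry and pick_op() are illustrative stand-ins written for this example, not the actual fsstress code.

```c
#include <stddef.h>

/* Simplified stand-in for an fsstress ops[] entry. */
struct op_entry {
	const char *name;
	int freq;	/* relative weight; 0 disables the operation */
};

/*
 * Map a number r in [0, total weight) to an operation by walking the
 * cumulative weights. An entry with freq == 0 covers an empty interval
 * and is therefore never returned. Returns NULL if r is out of range.
 */
static const struct op_entry *pick_op(const struct op_entry *ops,
				      size_t n, int r)
{
	size_t i;

	if (r < 0)
		return NULL;
	for (i = 0; i < n; i++) {
		if (r < ops[i].freq)
			return &ops[i];
		r -= ops[i].freq;
	}
	return NULL;
}
```

In a table such as { "punch", 1 }, { "collapse", 0 }, { "read", 1 }, the valid values of r are 0 and 1, which select "punch" and "read" respectively; no value of r can land on "collapse".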


Or we can add environment variables to disable a particular
operation in fsx or fsstress. I am not really sure whether we need
to change the kernel in order to avoid testing a particular fallocate
mode.
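
The environment-variable idea could look roughly like the following. This is only a sketch under stated assumptions: the variable name FSX_DISABLE_COLLAPSE and the helpers env_disables() and apply_env_overrides() are hypothetical and do not exist in fsx; only the collapse_range_calls flag comes from ltp/fsx.c.

```c
#include <stdlib.h>

/* Mirrors the flag in ltp/fsx.c: nonzero means fsx may issue
 * COLLAPSE_RANGE calls. */
int collapse_range_calls = 1;

/*
 * Hypothetical helper: returns 1 if an environment value asks for the
 * operation to be disabled, i.e. it is set, non-empty, and not "0".
 */
int env_disables(const char *val)
{
	if (val == NULL || val[0] == '\0')
		return 0;
	if (val[0] == '0' && val[1] == '\0')
		return 0;
	return 1;
}

/* Hypothetical hook that would run once, before option parsing. */
void apply_env_overrides(void)
{
	if (env_disables(getenv("FSX_DISABLE_COLLAPSE")))
		collapse_range_calls = 0;
}
```

Running the test binary with FSX_DISABLE_COLLAPSE=1 in the environment would then have the same effect as the one-line change to collapse_range_calls above, without rebuilding xfstests.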

Thanks!
-Lukas


>
> For your convenience, I've updated the root_fs.img file for the
> kvm-xfstests system in the xfstests-bld system:
>
> https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/root_fs.img.i386
>
> This has the latest xfstests (plus some bug fixes not yet in the
> xfstests upstream), which means it has the improved fsx. With this
> updated xfstests, the tests which use fsx, such as generic/075 and
> generic/091, are failing for the 1k and bigalloc configuration, and if
> we apply this patch and then add to the kernel's boot command line:
>
> ext4.fallocate_mode_block=0x08
>
> this will allow these tests to pass again.
>
> (An easy way to do this using kvm-xfstests is by creating the file
> custom.config with the line EXTRA_ARG="ext4.fallocate_mode_block=0x08".)
>
>
> In addition, although I haven't figured out what is needed to
> create a reliable reproduction, test generic/091 (with COLLAPSE_RANGE
> enabled) is sometimes causing file system corruption which causes
> the xfstests run to abort. (It does seem to reproduce more reliably
> if generic/091 is run as part of a "-g auto" run):
>
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 20 has out of order extents
> (invalid logical block 3, physical block 492938, len 2)
> Clear? yes
>
> Inode 20, i_blocks is 368, should be 352. Fix? yes
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -(492938--492939)
> Fix? yes
>
> Free blocks count wrong for group #15 (32657, counted=32659).
> Fix? yes
>
> Free blocks count wrong (1136513, counted=1136515).
> Fix? yes
>
>
> /dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/vdb: 10137/327680 files (5.4% non-contiguous), 174205/1310720 blocks
>
> (This was found in the standard 4k configuration, but in the test log
> below it was showing up in the dioread_nolock configuration).
>
> If you could take a look at some of these test failures, I'd be much
> obliged.
>
> Thanks!
>
> - Ted
>
> An excerpted log file generated using "kvm-xfstests generic/075
> generic/091". The 1k configuration fails reliably. Some of the other
> failures don't always fail, so your mileage may vary.
>
> git versions:
> fio fio-2.1-19-g0b14f0a (Thu, 23 May 2013 21:27:54 +0200)
> quota 0d0a674 (Tue, 26 Mar 2013 17:13:33 +0100)
> xfsprogs v3.2.0-alpha2-60-gaa210c4 (Thu, 13 Mar 2014 21:23:50 +1100)
> xfstests-bld 66c8bf2 (Sat, 12 Apr 2014 13:32:12 -0400)
> xfstests linux-v3.8-366-g5c0348d (Sat, 12 Apr 2014 19:11:54 -0400)
> BEGIN TEST: Ext4 4k block Sun Apr 13 20:53:43 UTC 2014
> Device: /dev/vdb
> mk2fs options: -q
> mount options: -o block_validity
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
>
> generic/075 101s ... [20:54:01] [20:55:44] 103s
> generic/091 154s ... [20:56:00] [20:58:30] 150s
> Ran: generic/075 generic/091
> Passed all 2 tests
>
> END TEST: Ext4 4k block Sun Apr 13 20:58:45 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/vdb: 27437/327680 files (8.1% non-contiguous), 591406/1310720 blocks
> BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Sun Apr 13 20:59:03 UTC 2014
> Device: /dev/vdd
> mk2fs options: -q -O ^extents,^flex_bg,^uninit_bg
> mount options: -o block_validity,nodelalloc
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q -O ^extents,^flex_bg,^uninit_bg /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,nodelalloc /dev/vdc /vdc
>
> generic/075 62s ... [20:59:11] [21:00:12] 61s
> generic/091 144s ... [21:00:15] [21:02:38] 143s
> Ran: generic/075 generic/091
> Passed all 2 tests
>
> END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Sun Apr 13 21:02:41 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/vdd: 15/327680 files (6.7% non-contiguous), 55986/1310720 blocks
> BEGIN TEST: Ext4 4k block w/ no journal Sun Apr 13 21:02:44 UTC 2014
> Device: /dev/vdb
> mk2fs options: -q -O ^has_journal
> mount options: -o block_validity,noload
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q -O ^has_journal /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc
>
> generic/075 60s ... [21:03:00] [21:03:57] 57s
> generic/091 137s ... [21:04:10] [21:06:21] 131s
> Ran: generic/075 generic/091
> Passed all 2 tests
>
> END TEST: Ext4 4k block w/ no journal Sun Apr 13 21:06:34 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/vdb: 27437/327680 files (8.1% non-contiguous), 591406/1310720 blocks
> BEGIN TEST: Ext4 1k block Sun Apr 13 21:06:49 UTC 2014
> Device: /dev/vdd
> mk2fs options: -q -b 1024
> mount options: -o block_validity
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q -b 1024 /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
>
> generic/075 [21:06:53] [21:06:59] [failed, exit status 1] - output mismatch (see /results/results-1k/generic/075.out.bad)
> --- tests/generic/075.out 2014-04-12 23:20:55.000000000 +0000
> +++ /results/results-1k/generic/075.out.bad 2014-04-13 21:06:59.742889855 +0000
> @@ -4,15 +4,5 @@
> -----------------------------------------------
> fsx.0 : -d -N numops -S 0
> -----------------------------------------------
> -
> ------------------------------------------------
> -fsx.1 : -d -N numops -S 0 -x
> ------------------------------------------------
> ...
> (Run 'diff -u tests/generic/075.out /results/results-1k/generic/075.out.bad' to see the entire diff)
> generic/091 187s ... [21:07:00] [21:09:15] [failed, exit status 1] - output mismatch (see /results/results-1k/generic/091.out.bad)
> --- tests/generic/091.out 2014-04-12 23:20:55.000000000 +0000
> +++ /results/results-1k/generic/091.out.bad 2014-04-13 21:09:15.726223187 +0000
> @@ -5,3 +5,816 @@
> fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
> +fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> +mapped writes DISABLED
> +skipping zero length zero range
> +fallocating to largest ever: 0x137f5
> ...
> (Run 'diff -u tests/generic/091.out /results/results-1k/generic/091.out.bad' to see the entire diff)
> Ran: generic/075 generic/091
> Failures: generic/075 generic/091
> Failed 2 of 2 tests
>
> END TEST: Ext4 1k block Sun Apr 13 21:09:17 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/vdd: 15/327680 files (6.7% non-contiguous), 120162/5242880 blocks
> BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Sun Apr 13 21:09:19 UTC 2014
> Device: /dev/vdd
> mk2fs options: -q -O ^flex_bg
> mount options: -o block_validity,nodelalloc
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q -O ^flex_bg /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,nodelalloc /dev/vdc /vdc
>
> generic/075 63s ... [21:09:23] [21:10:23] 60s
> generic/091 163s ... [21:10:24] [21:13:07] 163s
> Ran: generic/075 generic/091
> Passed all 2 tests
>
> END TEST: Ext4 4k block w/nodelalloc and no flex_bg Sun Apr 13 21:13:08 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/vdd: 15/327680 files (6.7% non-contiguous), 55952/1310720 blocks
> BEGIN TEST: Ext4 4k block w/dioread_nolock Sun Apr 13 21:13:09 UTC 2014
> Device: /dev/vdb
> mk2fs options: -q
> mount options: -o block_validity,dioread_nolock
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,dioread_nolock /dev/vdc /vdc
>
> generic/075 99s ... [21:13:27] [21:15:05] 98s
> generic/091 [21:15:19] [21:17:36] [failed, exit status 1] - output mismatch (see /results/results-dioread_nolock/generic/091.out.bad)
> --- tests/generic/091.out 2014-04-12 23:20:55.000000000 +0000
> +++ /results/results-dioread_nolock/generic/091.out.bad 2014-04-13 21:17:36.366223190 +0000
> @@ -5,3 +5,1100 @@
> fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
> +fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> +mapped writes DISABLED
> +skipping zero length zero range
> +fallocating to largest ever: 0x137f5
> ...
> (Run 'diff -u tests/generic/091.out /results/results-dioread_nolock/generic/091.out.bad' to see the entire diff)
> _check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /results/results-dioread_nolock/generic/091.full)
> Ran: generic/075 generic/091
> Failures: generic/091
> Failed 1 of 2 tests
> END TEST: Ext4 4k block w/dioread_nolock Sun Apr 13 21:17:50 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 801 has out of order extents
> (invalid logical block 9, physical block 1022972, len 2)
> Clear? yes
>
> Inode 801 has out of order extents
> (invalid logical block 18, physical block 1023063, len 1)
> Clear? yes
>
> Inode 801 has out of order extents
> (invalid logical block 16, physical block 1022970, len 2)
> Clear? yes
>
> Inode 801 has out of order extents
> (invalid logical block 20, physical block 1023056, len 7)
> Clear? yes
>
> Inode 801 has out of order extents
> (invalid logical block 19, physical block 1024820, len 2)
> Clear? yes
>
> Inode 801, i_blocks is 256, should be 136. Fix? yes
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -(1019465--1019470) -(1022972--1022973) -1023063 -(1024820--1024821) -(1024836--1024839)
> Fix? yes
>
> Free blocks count wrong for group #31 (22489, counted=22504).
> Fix? yes
>
> Free blocks count wrong (719201, counted=719216).
> Fix? yes
>
>
> /dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/vdb: 27437/327680 files (8.1% non-contiguous), 591504/1310720 blocks
> BEGIN TEST: Ext4 4k block w/data=journal Sun Apr 13 21:18:03 UTC 2014
> Device: /dev/vdb
> mk2fs options: -q
> mount options: -o block_validity,data=journal
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,data=journal /dev/vdc /vdc
>
> [ 1488.281589] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
> _check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /results/results-data_journal/check.full)
> Passed all 0 tests
> END TEST: Ext4 4k block w/data=journal Sun Apr 13 21:18:21 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 801, i_blocks is 136, should be 144. Fix? yes
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: +(1019465--1019470) -(1022970--1022971) -(1023056--1023062) +(1024836--1024839)
> Fix? yes
>
> Free blocks count wrong for group #31 (22504, counted=22503).
> Fix? yes
>
> Free blocks count wrong (719216, counted=719215).
> Fix? yes
>
>
> /dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/vdb: 27437/327680 files (8.1% non-contiguous), 591505/1310720 blocks
> BEGIN TEST: Ext4 4k block w/bigalloc Sun Apr 13 21:18:38 UTC 2014
> Device: /dev/vde
> mk2fs options: -q -O bigalloc
> mount options: -o block_validity
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q -O bigalloc /dev/vdf
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdf /vdf
>
> generic/075 [21:18:42] [21:18:44] [failed, exit status 1] - output mismatch (see /results/results-bigalloc/generic/075.out.bad)
> --- tests/generic/075.out 2014-04-12 23:20:55.000000000 +0000
> +++ /results/results-bigalloc/generic/075.out.bad 2014-04-13 21:18:44.126223139 +0000
> @@ -4,15 +4,5 @@
> -----------------------------------------------
> fsx.0 : -d -N numops -S 0
> -----------------------------------------------
> -
> ------------------------------------------------
> -fsx.1 : -d -N numops -S 0 -x
> ------------------------------------------------
> ...
> (Run 'diff -u tests/generic/075.out /results/results-bigalloc/generic/075.out.bad' to see the entire diff)
> generic/091 [21:18:44][ 1526.350948] EXT4-fs error (device vde): ext4_mb_free_metadata:4563: group 1, block 524432:Block already on to-be-freed list
> [ 1526.358204] JBD2: Spotted dirty metadata buffer (dev = vde, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
> [21:18:45] [failed, exit status 1] - output mismatch (see /results/results-bigalloc/generic/091.out.bad)
> --- tests/generic/091.out 2014-04-12 23:20:55.000000000 +0000
> +++ /results/results-bigalloc/generic/091.out.bad 2014-04-13 21:18:45.116223190 +0000
> @@ -1,7 +1,158 @@
> QA output created by 091
> fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
> -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
> ...
> (Run 'diff -u tests/generic/091.out /results/results-bigalloc/generic/091.out.bad' to see the entire diff)
> _check_generic_filesystem: filesystem on /dev/vde is inconsistent (see /results/results-bigalloc/generic/091.full)
> Ran: generic/075 generic/091
> Failures: generic/075 generic/091
> Failed 2 of 2 tests
> END TEST: Ext4 4k block w/bigalloc Sun Apr 13 21:18:45 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 12, i_blocks is 640, should be 4096. Fix? yes
>
>
> Running additional passes to resolve blocks claimed by more than one inode...
> Pass 1B: Rescanning for multiply-claimed blocks
> Multiply-claimed block(s) in inode 12: 524842 524840 524841 524848 524849 524850 524851 524852 524853 524854 524855 524857 524858 524859 524860 524861 524862 524863 525136 525137 525138 525139 525140 525141 525142 525143 525144 525145 525146 525137 525138 525139 525140 525141 525137 525138 525139 525140 525172 525173 525174 525175 525176 525177 525178 525179 525180 525181 525182 525183 525184 525185 525190 525191 525192
> Pass 1C: Scanning directories for inodes with multiply-claimed blocks
> Pass 1D: Reconciling multiply-claimed blocks
> (There are 1 inodes containing multiply-claimed blocks.)
>
> File /junk (inode #12, mod time Sun Apr 13 21:18:45 2014)
> has 5 multiply-claimed block(s), shared with 0 file(s):
> Clone multiply-claimed blocks? yes
>
> clone_file_block: internal error: can't find dup_blk for 524848
>
> clone_file_block: internal error: can't find dup_blk for 524849
>
> clone_file_block: internal error: can't find dup_blk for 524850
>
> clone_file_block: internal error: can't find dup_blk for 524851
>
> clone_file_block: internal error: can't find dup_blk for 524852
>
> clone_file_block: internal error: can't find dup_blk for 524853
>
> clone_file_block: internal error: can't find dup_blk for 524854
>
> clone_file_block: internal error: can't find dup_blk for 524855
>
> clone_file_block: internal error: can't find dup_blk for 524857
>
> clone_file_block: internal error: can't find dup_blk for 524858
>
> clone_file_block: internal error: can't find dup_blk for 524859
>
> clone_file_block: internal error: can't find dup_blk for 524860
>
> clone_file_block: internal error: can't find dup_blk for 524861
>
> clone_file_block: internal error: can't find dup_blk for 524862
>
> clone_file_block: internal error: can't find dup_blk for 524863
>
> clone_file_block: internal error: can't find dup_blk for 525184
>
> clone_file_block: internal error: can't find dup_blk for 525185
>
> clone_file_block: internal error: can't find dup_blk for 525190
>
> clone_file_block: internal error: can't find dup_blk for 525191
>
> clone_file_block: internal error: can't find dup_blk for 525192
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong for group #0 (31477, counted=31474).
> Fix? yes
>
> Free blocks count wrong for group #1 (32752, counted=32751).
> Fix? yes
>
> Free blocks count wrong (5188880, counted=5188816).
> Fix? yes
>
>
> /dev/vde: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/vde: 15/327680 files (6.7% non-contiguous), 54064/5242880 blocks
> BEGIN TEST: Ext4 1k block w/bigalloc Sun Apr 13 21:18:46 UTC 2014
> Device: /dev/vdd
> mk2fs options: -q -b 1024 -O bigalloc
> mount options: -o block_validity
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.14.0-08699-gc282c15
> MKFS_OPTIONS -- -q -b 1024 -O bigalloc /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
>
> generic/075 [21:18:48][ 1533.195953] EXT4-fs error (device vdd): ext4_mb_free_metadata:4563: group 1, block 131376:Block already on to-be-freed list
> [ 1533.210657] JBD2: Spotted dirty metadata buffer (dev = vdd, blocknr = 1). There's a risk of filesystem corruption in case of system crash.
> [21:18:53] [failed, exit status 1] - output mismatch (see /results/results-bigalloc_1k/generic/075.out.bad)
> --- tests/generic/075.out 2014-04-12 23:20:55.000000000 +0000
> +++ /results/results-bigalloc_1k/generic/075.out.bad 2014-04-13 21:18:53.882889857 +0000
> @@ -4,15 +4,5 @@
> -----------------------------------------------
> fsx.0 : -d -N numops -S 0
> -----------------------------------------------
> -
> ------------------------------------------------
> -fsx.1 : -d -N numops -S 0 -x
> ------------------------------------------------
> ...
> (Run 'diff -u tests/generic/075.out /results/results-bigalloc_1k/generic/075.out.bad' to see the entire diff)
> _check_generic_filesystem: filesystem on /dev/vdd is inconsistent (see /results/results-bigalloc_1k/generic/075.full)
> Ran: generic/075
> Failures: generic/075
> Failed 1 of 1 tests
> END TEST: Ext4 1k block w/bigalloc Sun Apr 13 21:18:54 UTC 2014
> e2fsck 1.42.9 (4-Feb-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong for group #1 (8176, counted=8175).
> Fix? yes
>
> Free blocks count wrong (5125872, counted=5125856).
> Fix? yes
>
>
> /dev/vdd: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/vdd: 12/327680 files (0.0% non-contiguous), 117024/5242880 blocks
> [FAIL] startpar: service(s) returned failure: rc.local ... failed!
>
> Last login: Sun Apr 13 21:18:54 UTC 2014 on ttyS3
> Linux candygram 3.14.0-08699-gc282c15 #1801 SMP Sun Apr 13 16:46:52 EDT 2014 i686
>
> The programs included with the Debian GNU/Linux system are free software;
> the exact distribution terms for each program are described in the
> individual files in /usr/share/doc/*/copyright.
>
> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> permitted by applicable law.
> root@candygram:~# QEMU: Terminated

2014-04-15 16:15:45

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On 4/15/14, 11:02 AM, Lukáš Czerner wrote:
> On Sun, 13 Apr 2014, Theodore Ts'o wrote:
>
>> Date: Sun, 13 Apr 2014 18:00:16 -0400
>> From: Theodore Ts'o <[email protected]>
>> To: Ext4 Developers List <[email protected]>
>> Cc: Namjae Jeon <[email protected]>
>> Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes
>>
>> On Sun, Apr 13, 2014 at 04:21:58PM -0400, Theodore Ts'o wrote:
>>> If a particular fallocate mode is causing test failures, give the
>>> tester the ability to block a particular fallocate mode so that the
>>> use of a particular fallocate mode will be reported as not supported.
>>>
>>> For example, if the COLLAPSE_RANGE fallocate mode is causing test
>>> failures, this allows us to suppress it so we can more easily test the
>>> rest of the file system code.
>>
>> Hi Namjae,
>>
>> One of the reasons for this patch is that after Lukas added
>> COLLAPSE_RANGE support into fsx, we've started seeing a number of
>> failures which seem to be directly related to COLLAPSE_RANGE.
>
> Ah, I did mention it when I added COLLAPSE_RANGE to fsx and
> fsstress, but I forgot to cc you Namjae, sorry about that.
>
> But about the patch. It seems a little bit weird to change the kernel
> for this. The way I am doing it is by changing ltp/fsx.c and
> ltp/fsstress.c to disable the particular mode:

I tend to agree, better to fix the kernel than to add a knob to turn it
off. And fsx changes can happen a lot quicker than kernel changes. [1]

And if it's really unsafe, and you really want to add a knob, I'd at least
default it to off until it's non-corrupting, and add a message that
this tunable will go away as soon as it's stable, so you'll have no
qualms about quickly deprecating it...

-Eric

[1] it'd be nifty to make an env. var in xfstests which can globally
disable certain fsx operations across all tests which run fsx...

2014-04-15 18:44:47

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Tue, Apr 15, 2014 at 11:15:41AM -0500, Eric Sandeen wrote:
>
> I tend to agree, better to fix the kernel than to add a knob to turn it
> off. And fsx changes can happen a lot quicker than kernel changes. [1]
>
> And if it's really unsafe, and you really want to add a knob, I'd at least
> default it to off until it's non-corrupting, and add a message that
> this tunable will go away as soon as it's stable, so you'll have no
> qualms about quickly deprecating it...

Yeah, I went back and forth on this. One of the reasons why I added
a kernel knob is that *I* can make the kernel change a lot faster than
I could tweak all of the various xfstests programs to globally
disable certain operations in fsx, fsstress, etc.

I also had a sneaking suspicion that we might have a similar issue
with the INSERT RANGE patches which are coming down the pike, and so
having a general way of also being able to block INSERT RANGE, to
quickly determine whether a potential bug was caused by INSERT RANGE
or some other pending changes, might also be useful.

I freely admit it is a bit of a hack, though. Does the hack smell
less bad if we wrap it in CONFIG_EXT4FS_DEBUG?

> [1] it'd be nifty to make an env. var in xfstests which can globally
> disable certain fsx operations across all tests which run fsx...

Yes, although as I mentioned above, it would be really nice if it
worked across all of the various tests, and not just be limited to
fsx, or even just fsx and fsstress.

- Ted




2014-04-15 19:14:03

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On 4/15/14, 1:44 PM, Theodore Ts'o wrote:
> On Tue, Apr 15, 2014 at 11:15:41AM -0500, Eric Sandeen wrote:
>>
>> I tend to agree, better to fix the kernel than to add a knob to turn it
>> off. And fsx changes can happen a lot quicker than kernel changes. [1]
>>
>> And if it's really unsafe, and you really want to add a knob, I'd at least
>> default it to off until it's non-corrupting, and add a message that
>> this tunable will go away as soon as it's stable, so you'll have no
>> qualms about quickly deprecating it...
>
> Yeah, I went back and forth on this. One of the reasons why I added
> a kernel knob is that *I* can make the kernel change a lot faster than
> I could tweak all of the various xfstests programs to globally
> disable certain operations in fsx, fsstress, etc.
>
> I also had a sneaking suspicion that we might have a similar issue
> with the INSERT RANGE patches which are coming down the pike, and so
> having a general way of also being able to block INSERT RANGE, to
> quickly determine whether a potential bug was caused by INSERT RANGE
> or some other pending changes, might also be useful.
>
> I freely admit it is a bit of a hack, though. Does the hack smell
> less bad if we wrap it in CONFIG_EXT4FS_DEBUG?
>
>> [1] it'd be nifty to make an env. var in xfstests which can globally
>> disable certain fsx operations across all tests which run fsx...
>
> Yes, although as I mentioned above, it would be really nice if it
> worked across all of the various tests, and not just be limited to
> fsx, or even just fsx and fsstress.

Well, for tests which are specific to collapse range, it'd be trivial
to add a "collapse" group, and exclude it.

For generic stress tests which happen to do collapse range, it'd take
a bit more. But that'd probably still be the generic solution.
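
As a rough sketch of the group-exclusion idea: the group file below is
made up, and the "collapse" group is only the proposal above, not
something assumed to exist in xfstests. Filtering a test list by group
could look like this:

```shell
#!/bin/sh
# Hypothetical group file: test IDs and group names are invented for
# illustration, laid out as "<test> <group>..." one test per line.
group_file=$(mktemp)
cat > "$group_file" <<'EOF'
001 rw auto quick
002 rw auto collapse
003 auto prealloc collapse
004 auto quick
EOF

# Print the IDs of the tests that are NOT members of the given group.
tests_excluding_group() {
    awk -v group="$1" '
        {
            skip = 0
            for (i = 2; i <= NF; i++)
                if ($i == group)
                    skip = 1
            if (!skip)
                print $1
        }' "$2"
}

tests_excluding_group collapse "$group_file"
rm -f "$group_file"
```

With a real "collapse" group in place, the same effect should be
available from the harness itself via the exclude option of ./check
(something like "./check -g auto -x collapse"), with no extra scripting.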

XFS got bitten too, there were collapse range problems. But the fixes
are already in the pipe AFAIK.

-Eric

> - Ted
>
>
>


2014-04-15 22:32:47

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On 4/15/14, 1:44 PM, Theodore Ts'o wrote:
> On Tue, Apr 15, 2014 at 11:15:41AM -0500, Eric Sandeen wrote:
>>
>> I tend to agree, better to fix the kernel than to add a knob to turn it
>> off. And fsx changes can happen a lot quicker than kernel changes. [1]
>>
>> And if it's really unsafe, and you really want to add a knob, I'd at least
>> default it to off until it's non-corrupting, and add a message that
>> this tunable will go away as soon as it's stable, so you'll have no
>> qualms about quickly deprecating it...
>
> Yeah, I went back and forth on this. One of the reasons why I added
> a kernel knob is that *I* can make the kernel change a lot faster than
> I could tweak all of the various xfstests programs to globally
> disable certain operations in fsx, fsstress, etc.
>
> I also had a sneaking suspicion that we might have a similar issue
> with the INSERT RANGE patches which are coming down the pike, and so
> having a general way of also being able to block INSERT RANGE, to
> quickly determine whether a potential bug was caused by INSERT RANGE
> or some other pending changes, might also be useful.

Also: I'd humbly suggest just not merging those until they pass stringent
tests like fsx & fsstress...

Adding a pre-emptive knob to turn them off post-merge when they turn
out to be broken sounds backwards to me...

-Eric

2014-04-15 23:26:15

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Tue, Apr 15, 2014 at 02:44:42PM -0400, Theodore Ts'o wrote:
> On Tue, Apr 15, 2014 at 11:15:41AM -0500, Eric Sandeen wrote:
> >
> > I tend to agree, better to fix the kernel than to add a knob to turn it
> > off. And fsx changes can happen a lot quicker than kernel changes. [1]
> >
> > And if it's really unsafe, and you really want to add a knob, I'd at least
> > default it to off until it's non-corrupting, and add a message that
> > this tunable will go away as soon as it's stable, so you'll have no
> > qualms about quickly deprecating it...
>
> Yeah, I went back and forth on this. One of the reasons why I added
> a kernel knob is that *I* can make the kernel change a lot faster than
> I could tweak all of the various xfstests programs to globally
> disable certain operations in fsx, fsstress, etc.

Actually, we shouldn't be changing xfstests or adding workarounds in
the kernel to avoid certain operations. We should be fixing the damn
bugs that are being exposed.

Yes, the addition of zero range and collapse range to fsx and
fsstress has exposed bugs in the XFS code as well, and that causes
assert failures all over the place. But that's a *good thing* - now
those bugs are all fixed and ready to be sent to Linus:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs.git;a=shortlog;h=refs/heads/xfs-fixes-for-3.15-rc2

And so in a couple of days the problem goes away for everyone using
XFS. Do the same (i.e. fix the bugs) for ext4, and the problem goes
away.

> I also had a sneaking suspicion that we might have a similar issue
> with the INSERT RANGE patches which are coming down the pike, and so
> having a general way of also being able to block INSERT RANGE, to
> quickly determine whether a potential bug was caused by INSERT RANGE
> or some other pending changes, might also be useful.

Well, only if you ignore the lesson we've just learnt.

i.e. that we have to add FALLOC_FL_INSERT_RANGE to fsx and
fsstress as well as having corner case tests, and it needs to pass
those tests before XFS support is ready for upstream inclusion. At
least, that's the lesson I learnt as the xfstests and XFS
maintainer - we didn't put the QA bar for inclusion high enough, and
so problems slipped through.

If you want to add more strict testing requirements for ext4
inclusion, then you're welcome to request them for the ext4
implementation of that functionality. You don't have to accept the
code until you're happy with it....

Cheers,

Dave.

--
Dave Chinner
[email protected]

2014-04-15 23:30:42

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Tue, Apr 15, 2014 at 05:32:43PM -0500, Eric Sandeen wrote:
> > I also had a sneaking suspicion that we might have a similar issue
> > with the INSERT RANGE patches which are coming down the pike, and so
> > having a general way of also being able to block INSERT RANGE, to
> > quickly determine whether a potential bug was caused by INSERT RANGE
> > or some other pending changes, might also be useful.
>
> Also: I'd humbly suggest just not merging those until they pass stringent
> tests like fsx & fsstress...
>
> Adding a pre-emptive knob to turn them off post-merge when they turn
> out to be broken sounds backwards to me...

Having learned from COLLAPSE RANGE, I agree. The fact that we didn't
have full testing during the whole development cycle was unfortunate.
And we got lucky with the renameat patches, since I wasn't able to get
that testing done because the xfstests commits didn't get merged until
*after* the renameat commits got merged, and also because I
didn't notice that the i386 system call wasn't wired up when I was
doing my manual "just before I push to Linus" testing.

I plan on insisting that INSERT RANGE support be in the VFS, and be
fully enabled, and that we have full INSERT RANGE testing in
xfstests, during the development cycle. Some of the work that I've
been doing with kvm-xfstests and why I created a github tytso/xfstests
git tree is specifically to make sure things go much more smoothly
this time around. (That way, if there is some new fs feature patch,
such as COLLAPSE RANGE or renameat(2) where the tests are still being
refined for xfstests inclusion, we can still have something we can all
use on an interim basis during the development cycle.)

However, with all of this being said, while new feature patches such
as INSERT RANGE are cooking in the ext4.git tree, so that multiple
developers *can* do that testing, having a knob to turn the feature on
and off without having to do a kernel recompile is convenient.
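
A minimal sketch of how the knob might be driven, assuming a kernel
carrying the patch at the top of this thread. The mode bit values are
the ones defined in <linux/falloc.h>; the helper name mode_is_blocked
is only illustrative, and the sysfs path is what module_param_named()
with mode 0644 should produce for ext4:

```shell
#!/bin/sh
# fallocate mode bits, as defined in <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE=$((0x01))
FALLOC_FL_PUNCH_HOLE=$((0x02))
FALLOC_FL_COLLAPSE_RANGE=$((0x08))
FALLOC_FL_ZERO_RANGE=$((0x10))

# Mask that would block collapse range and zero range together.
mask=$(( $FALLOC_FL_COLLAPSE_RANGE | $FALLOC_FL_ZERO_RANGE ))
echo "$mask"

# Mirror of the patch's check: a call is refused if any bit of its mode
# is also set in the mask. Returns 0 (true) when the mode is blocked.
mode_is_blocked() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Applying it (as root, on a kernel carrying the patch) would then be:
#   echo "$mask" > /sys/module/ext4/parameters/fallocate_mode_block
```

Because the test is a plain "mode & mask", any shared bit blocks the
call: putting FALLOC_FL_KEEP_SIZE in the mask would also block punch
hole, which is always requested together with KEEP_SIZE.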

Cheers,

- Ted

2014-04-16 00:07:19

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

[added [email protected]]

On Tue, Apr 15, 2014 at 07:30:39PM -0400, Theodore Ts'o wrote:
> On Tue, Apr 15, 2014 at 05:32:43PM -0500, Eric Sandeen wrote:
> > > I also had a sneaking suspicion that we might have a similar issue
> > > with the INSERT RANGE patches which are coming down the pike, and so
> > > having a general way of also being able to block INSERT RANGE, to
> > > quickly determine whether a potential bug was caused by INSERT RANGE
> > > or some other pending changes, might also be useful.
> >
> > Also: I'd humbly suggest just not merging those until they pass stringent
> > tests like fsx & fsstress...
> >
> > Adding a pre-emptive knob to turn them off post-merge when they turn
> > out to be broken sounds backwards to me...
>
> Having learned from COLLAPSE RANGE, I agree. The fact that we didn't
> have full testing during the whole development cycle was unfortunate.
> And we got lucky with the renameat patches, since I wasn't able to get
> that testing done because the xfstests commits didn't get merged until
> *after* the renameat commits got merged, and also because I
> didn't notice that the i386 system call wasn't wired up when I was
> doing my manual "just before I push to Linus" testing.

I asked for renameat2 tests long before inclusion occurred. The fact
is that we can't co-ordinate xfstests inclusion for a feature that
we don't even know is going to be included until someone sends Linus
a pull request....

> I plan on insisting that INSERT RANGE support be in the VFS, and be
> fully enabled, and that we have full INSERT RANGE testing in
> xfstests, during the development cycle.

There wasn't a problem with the timing of xfstests inclusion - the
problem was with the fact we didn't have sufficient QA coverage in
xfstests when the initial upstream kernel commits occurred. This
time around, the difference will be that we'll have fsx
and fsstress coverage *before* kernel support is added, and I've
already asked for that:

http://oss.sgi.com/archives/xfs/2014-04/msg00121.html

> Some of the work that I've
> been doing with kvm-xfstests and why I created a github tytso/xfstests
> git tree is specifically to make sure things go much more smoothly
> this time around.

Ted, this looks and sounds like you're preparing to fork xfstests.
Why? What's the problem with working upstream on test development
and refinement like everyone else does?

This thread is a demonstration of how avoiding upstream interaction
results in nasty hacks being proposed. If you had asked the question on
the xfs mailing list of how to avoid various fsstress/fsx
operations, we would have told you that using FSSTRESS_AVOID and
adding an equivalent FSX_AVOID to xfstests is all that is needed.
i.e. we already have partial infrastructure support for what you
need in xfstests and it would be about 30 minutes work to add
FSX_AVOID....

Is that fast enough for you?

Indeed, we could also use similar env vars to ensure various
_requires_* checks fail and to populate FSSTRESS_AVOID/FSX_AVOID
automatically, so that tests that require this functionality are not
run.
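
To make that concrete, here is a minimal sketch of what FSX_AVOID
support could look like. The names FSX_PROG and run_fsx are
illustrative only, the fsx binary is stubbed out with echo so the
sketch runs anywhere, and "-C" is assumed to be the fsx switch that
disables collapse range (check the fsx usage output on your tree):

```shell
#!/bin/sh
# Stub: prints the command line instead of running fsx. In a real
# xfstests tree this would point at the fsx binary instead.
FSX_PROG="echo fsx"

# Run fsx with whatever switches FSX_AVOID holds placed before the
# per-test arguments, mirroring how FSSTRESS_AVOID is used for fsstress.
run_fsx() {
    # Word splitting of FSX_PROG and FSX_AVOID is intentional here:
    # both hold a space-separated list of words.
    $FSX_PROG $FSX_AVOID "$@"
}

# Ask every fsx run to skip collapse range, without touching the tests.
FSX_AVOID="-C"
run_fsx -N 10000 -l 500000 /mnt/test/junk
```

Tests would call the wrapper rather than fsx directly, so a single
setting in the environment or the local config covers every test that
runs fsx.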

IOWs, it's in your best interests to work with upstream to add the
functionality you require to xfstests. History tells us that forking
development into private repositories has never worked out well for
anyone, so I'd really, really like you to *at least try* to work
with upstream as your primary test development environment....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-04-16 00:23:53

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Wed, Apr 16, 2014 at 09:25:56AM +1000, Dave Chinner wrote:
> Actually, we shouldn't be changing xfstests or adding workarounds in
> the kernel to avoid certain operations. We should be fixing the damn
> bugs that are being exposed.

Well, I'm waiting for Namjae to look into the test failures.
Unfortunately I don't have time right now to fix it myself.

In the meantime, I wanted to do a full baseline test run to make sure
we didn't have any other regressions or failures post -rc1, and so
being able to filter out collapse range allowed me to kick off a test
of the rest of the patches I was hoping to push to Linus for -rc2.
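
The filtering itself is just a bitmask test on the fallocate mode. A
minimal user-space sketch of that check, assuming the flag values
from <linux/falloc.h>; the function name check_blocked() is
hypothetical and only mirrors the test the patch adds at the top of
ext4_fallocate():

```c
#include <errno.h>

/* Flag values as defined in <linux/falloc.h>, repeated here so the
 * sketch builds without kernel headers. */
#define FALLOC_FL_KEEP_SIZE      0x01
#define FALLOC_FL_PUNCH_HOLE     0x02
#define FALLOC_FL_COLLAPSE_RANGE 0x08
#define FALLOC_FL_ZERO_RANGE     0x10

/*
 * Hypothetical stand-in for the check in the patch: if any bit of
 * the requested mode is set in block_mask, report the operation as
 * not supported; otherwise let it proceed (0).
 */
int check_blocked(int mode, int block_mask)
{
    if (mode & block_mask)
        return -EOPNOTSUPP;
    return 0;
}
```

With a mask of 0x08, only requests carrying the COLLAPSE_RANGE bit
are refused; punch hole, zero range and plain preallocation still go
through, and a mask of 0 blocks nothing.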

> i.e. that we have to add the FALLOC_FL_INSERT_RANGE to fsx and
> fsstress as well as having corner case tests and it needs to pass
> those tests before XFS support is ready for upstream inclusion. At
> least, that's the lesson I learnt as the xfstests and XFS
> Maintainer - we didn't put the QA bar for inclusion high enough, and
> so problems slipped through.
>
> If you want to add more strict testing requirements for ext4
> inclusion, then you're welcome to request them for the ext4
> implementation of that functionality. You don't have to accept the
> code until you're happy with it....

No arguments here; I plan to do the same.

Regards,

- Ted

2014-04-16 05:47:47

by Theodore Ts'o

Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Wed, Apr 16, 2014 at 10:06:35AM +1000, Dave Chinner wrote:
> > Some of the work that I've
> > been doing with kvm-xfstests and why I created a github tytso/xfstests
> > git tree is specifically to make sure things go much more smoothly
> > this time around.
>
> Ted, this looks and sounds like you're preparing to fork xfstests.
> Why? What's the problem with working upstream on test development
> and refinement like everyone else does?

I'd prefer not to fork xfstests. However, I do want to get more ext4
developers using automated xfstests testing, so I can scale better.
In order to do that, I need to be able to make it really easy for
people who aren't hard-core xfstests people to be able to use it.

One of the nice things about kvm-xfstests is that it's a *lot* easier for
people to figure out how to use it. If I can lower the activation
energy required to get people to use xfstests, it saves me time in the
end.

The reason why I created the github repository is because if I'm going
to be shipping a KVM test appliance image that people can use in a
turn-key environment, I'd prefer that all of the sources, including
any local changes that I might need to make the tests run as smoothly
as possible, are available in a public repository. (And at one point,
I did have up to 12 local changes, which is why I wanted it tracked in
a repo.)

Every single local change I made was either a test or commit that
hadn't been accepted into the upstream xfstests repository yet, or a
fix I wrote that I sent upstream. And as soon as the fixes made it
into the upstream xfstests repository, I rebased them away. At the
moment, there's only one commit in my xfstests github repository
which isn't upstream and it's the:

check: add support for an external file containing tests to exclude

commit for which I've sent the V2 version to you.

So for the most part, I want to keep the repo as close to upstream as
possible, and ideally identical to upstream, and I've been working
towards that end.

> This thread is a demonstration of how avoiding upstream interaction
> results in nasty hacks being proposed. If you asked the question on
> the xfs mailing list of how to avoid various fsstress/fsx
> operations, we would have told you that using FSSTRESS_AVOID and
> adding an equivalent FSX_AVOID to xfstests is all that is needed.
> i.e. we already have partial infrastructure support for what you
> need in xfstests and it would be about 30 minutes work to add
> FSX_AVOID....
>
> Is that fast enough for you?
>
> Indeed, we could also use similar env vars to ensure various
> _requires_* checks fail and to populate FSSTRESS_AVOID/FSX_AVOID
> automatically and so tests that require this functionality are not
> run.

Well, it took me about 1 minute to write the dozen line kernel patch.
I really didn't want to ask you to make changes to xfstests for me,
but if you're willing to make those changes, that would be great. I
really didn't want to presume, though. And if the answer is that I
need to spend the time making all of these changes --- I'll try, but
if I don't have time, I may end up taking the more expedient path.

> IOWs, it's in your best interests to work with upstream to add the
> functionality you require to xfstests. History tells us that forking
> development into private repositories has never worked out well for
> anyone, so I'd really, really like you to *at least try* to work
> with upstream as your primary test development environment....

As I said, every single patch which I put in my local xfstests tree I
also sent upstream.

That being said, I wasn't sure whether you were going to accept that
last change, since there was similar, but for me, not usable
functionality in the form of the -X option. So if you weren't going
to accept a change to allow the excluded list of tests to be kept in a
single file outside of the tests/* subdirectory, I probably would have
carried it as a separate patch --- because it's something I need, and
the current -X functionality really isn't easy to maintain (you need
to have many more files, and they have to be dropped into the xfstests
tests/* subdirectory).

I know that you and I haven't seen eye to eye in the past. For
example, the NO_HIDE_STALE out of tree patch which is running on
thousands and thousands of machines inside Google, but which
the XFS folks have considered evil incarnate. I will freely admit
that I'm much more of a pragmatist and much less of a purist on
certain matters.

So sure, I'm certainly going to _try_ to work with upstream xfstests.
I've done that to date. But I'm certainly not going to presume that
you're going to like or accept all of the changes I might want to
propose.

Regards,

- Ted

2014-04-16 16:05:33

by Lukas Czerner

Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes

On Mon, 14 Apr 2014, Namjae Jeon wrote:

> Date: Mon, 14 Apr 2014 23:05:47 +0900
> From: Namjae Jeon <[email protected]>
> To: Theodore Ts'o <[email protected]>
> Cc: Ext4 Developers List <[email protected]>
> Subject: Re: [PATCH] ext4: add fallocate mode blocking for debugging purposes
>
> >
> > Hi Namjae,
> >
> > One of the reasons for this patch set is that after Lukas added
> > COLLAPSE_RANGE support into fsx, we've started seeing a number of
> > failures which seem to be directly related to COLLAPSE_RANGE.
> >
> > For your convenience, I've updated the root_fs.img file for the
> > kvm-xfstests system in the xfstests-bld system:
> >
> > https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/root_fs.img.i386
> >
> > This has the latest xfstests (plus some bug fixes not yet in the
> > xfstests upstream), which means it has the improved fsx. With this
> > updated xfstests, the tests which use fsx, such as generic/075 and
> > generic/091, are failing for the 1k and bigalloc configuration, and if
> > we apply this patch and then add to the kernel's boot command line:
> >
> > ext4.fallocate_mode_block=0x08
> >
> > this will allow these tests to pass again.
> >
> > (An easy way to do this using kvm-xfstests is by creating the file
> > custom.config with the line EXTRA_ARG="ext4.fallocate_mode_block=0x08")
> >
> >
> > In addition, although I haven't figured out what is needed to
> > create a reliable reproduction, test generic/091 (with COLLAPSE_RANGE
> > enabled) is sometimes causing a file system corruption which causes
> > the xfstests run to abort. (It does seem to reproduce more reliably
> > if generic/091 is run as part of a "-g auto" run):
> >
> > e2fsck 1.42.9 (4-Feb-2014)
> > Pass 1: Checking inodes, blocks, and sizes
> > Inode 20 has out of order extents
> > (invalid logical block 3, physical block 492938, len 2)
> > Clear? yes
> >
> > Inode 20, i_blocks is 368, should be 352. Fix? yes
> >
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > Pass 4: Checking reference counts
> > Pass 5: Checking group summary information
> > Block bitmap differences: -(492938--492939)
> > Fix? yes
> >
> > Free blocks count wrong for group #15 (32657, counted=32659).
> > Fix? yes
> >
> > Free blocks count wrong (1136513, counted=1136515).
> > Fix? yes
> >
> >
> > /dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
> > /dev/vdb: 10137/327680 files (5.4% non-contiguous), 174205/1310720 blocks
> >
> > (This was found in the standard 4k configuration, but in the test log
> > below it was showing up in the dioread_nolock configuration).
> >
> > If you could take a look at some of these test failures, I'd be much
> > obliged.
> Hi. Ted.
> I will take a look at this issue.
> Thanks for your mail!

Hi,

Just a heads up for those working on the fix for this problem.
I've got the fix and I am testing it right now. Will send patches
soon after I verify that it's working properly.

Thanks!
-Lukas

> >
> > Thanks!
> >
> > - Ted
> >